Context Navigation

← Previous Change
Next Change →

Lexer.cpp

Timestamp:

May 3, 2009, 9:49:35 AM (16 years ago)

Author:

Darin Adler

Message:

2009-05-02 Darin Adler <Darin Adler>

Reviewed by Maciej Stachowiak.

Bug 25519: streamline lexer by handling BOMs differently
https://bugs.webkit.org/show_bug.cgi?id=25519

Roughly 1% faster SunSpider.

parser/Grammar.y: Tweak formatting a bit.

parser/Lexer.cpp: (JSC::Lexer::Lexer): Remove unnnecessary initialization of data members that are set up by setCode. (JSC::Lexer::currentOffset): Added. Used where the old code would look at m_currentOffset. (JSC::Lexer::shift1): Replaces the old shift function. No longer does anything to handle BOM characters. (JSC::Lexer::shift2): Ditto. (JSC::Lexer::shift3): Ditto. (JSC::Lexer::shift4): Ditto. (JSC::Lexer::setCode): Updated for name change from yylineno to m_line. Removed now-unused m_eatNextIdentifier, m_stackToken, and m_restrKeyword. Replaced m_skipLF and m_skipCR with m_skipLineEnd. Replaced the old m_length with m_codeEnd and m_currentOffset with m_codeStart. Added code to scan for a BOM character and call copyCodeWithoutBOMs() if we find any. (JSC::Lexer::copyCodeWithoutBOMs): Added. (JSC::Lexer::nextLine): Updated for name change from yylineno to m_line. (JSC::Lexer::makeIdentifier): Moved up higher in the file. (JSC::Lexer::matchPunctuator): Moved up higher in the file and changed to use a switch statement instead of just if statements. (JSC::Lexer::isLineTerminator): Moved up higher in the file and changed to have fewer branches. (JSC::Lexer::lastTokenWasRestrKeyword): Added. This replaces the old m_restrKeyword boolean. (JSC::Lexer::isIdentStart): Moved up higher in the file. Changed to use fewer branches in the ASCII but not identifier case. (JSC::Lexer::isIdentPart): Ditto. (JSC::Lexer::singleEscape): Moved up higher in the file. (JSC::Lexer::convertOctal): Moved up higher in the file. (JSC::Lexer::convertHex): Moved up higher in the file. Changed to use toASCIIHexValue instead of rolling our own here. (JSC::Lexer::convertUnicode): Ditto. (JSC::Lexer::record8): Moved up higher in the file. (JSC::Lexer::record16): Moved up higher in the file. (JSC::Lexer::lex): Changed type of stringType to int. Replaced m_skipLF and m_skipCR with m_skipLineEnd, which requires fewer branches in the main lexer loop. Use currentOffset instead of m_currentOffset. Removed unneeded m_stackToken. Use isASCIIDigit instead of isDecimalDigit. Split out the two cases for InIdentifierOrKeyword and InIdentifier. Added special case tight loops for identifiers and other simple states. Removed a branch from the code that sets m_atLineStart to false using goto. Streamlined the number-handling code so we don't check for the same types twice for non-numeric cases and don't add a null to m_buffer8 when it's not being used. Removed m_eatNextIdentifier, which wasn't working anyway, and m_restrKeyword, which is redundant with m_lastToken. Set the m_delimited flag without using a branch. (JSC::Lexer::scanRegExp): Tweaked style a bit. (JSC::Lexer::clear): Clear m_codeWithoutBOMs so we don't use memory after parsing. Clear out UString objects in the more conventional way. (JSC::Lexer::sourceCode): Made this no-longer inline since it has more work to do in the case where we stripped BOMs.

parser/Lexer.h: Renamed yylineno to m_lineNumber. Removed convertHex function, which is the same as toASCIIHexValue. Removed isHexDigit function, which is the same as isASCIIHedDigit. Replaced shift with four separate shift functions. Removed isWhiteSpace function that passes m_current, instead just passing m_current explicitly. Removed isOctalDigit, which is the same as isASCIIOctalDigit. Eliminated unused arguments from matchPunctuator. Added copyCoodeWithoutBOMs and currentOffset. Moved the makeIdentifier function out of the header. Added lastTokenWasRestrKeyword function. Added new constants for m_skipLineEnd. Removed unused yycolumn, m_restrKeyword, m_skipLF, m_skipCR, m_eatNextIdentifier, m_stackToken, m_position, m_length, m_currentOffset, m_nextOffset1, m_nextOffset2, m_nextOffset3. Added m_skipLineEnd, m_codeStart, m_codeEnd, and m_codeWithoutBOMs.

parser/SourceProvider.h: Added hasBOMs function. In the future this can be used to tell the lexer about strings known not to have BOMs.

runtime/JSGlobalObjectFunctions.cpp: (JSC::globalFuncUnescape): Changed to use isASCIIHexDigit.

wtf/ASCIICType.h: Added using statements to match the design of the other WTF headers.

File:

: 1 edited

trunk/JavaScriptCore/parser/Lexer.cpp (modified) (12 diffs)

Legend:

: Unmodified
: Added
: Removed

trunk/JavaScriptCore/parser/Lexer.cpp

-              r43144
+              r43156
 #include <limits.h>
 #include <string.h>
-#include <wtf/ASCIICType.h>
 #include <wtf/Assertions.h>
 …
 using namespace Unicode;
 // we can't specify the namespace in yacc's C output, so do it here
+// We can't specify the namespace in yacc's C output, so do it here instead.
 using namespace JSC;
 …
 #include "Lexer.lut.h"
 // a bridge for yacc from the C world to C++
+// A bridge for yacc from the C world to the C++ world.
 int jscyylex(void* lvalp, void* llocp, void* globalData)
+{
 …
 namespace JSC {
+static bool isDecimalDigit(int);
+static const UChar byteOrderMark = 0xFEFF;
+// Values for m_skipLineEnd.
+static const unsigned char SkipLFShift = 0;
+static const unsigned char SkipCRShift = 1;
+static const unsigned char SkipLF = 1 << SkipLFShift;
+static const unsigned char SkipCR = 1 << SkipCRShift;
 Lexer::Lexer(JSGlobalData* globalData)
+    : yylineno(1)
+    , m_restrKeyword(false)
+    , m_eatNextIdentifier(false)
+    , m_stackToken(-1)
+    , m_lastToken(-1)
+    , m_position(0)
+    , m_code(0)
+    , m_length(0)
+    , m_isReparsing(false)
+    , m_atLineStart(true)
+    , m_current(0)
+    , m_next1(0)
+    , m_next2(0)
+    , m_next3(0)
+    , m_currentOffset(0)
+    , m_nextOffset1(0)
+    , m_nextOffset2(0)
+    , m_nextOffset3(0)
+    : m_isReparsing(false)
     , m_globalData(globalData)
     , m_mainTable(JSC::mainTable)
+    , m_keywordTable(JSC::mainTable)
+{
     m_buffer8.reserveInitialCapacity(initialReadBufferCapacity);
 …
 Lexer::~Lexer()
+{
+    m_mainTable.deleteTable();
+}
+ALWAYS_INLINE void Lexer::shift(unsigned p)
+{
+    // ECMA-262 calls for stripping Cf characters here, but we only do this for BOM,
+    // see <https://bugs.webkit.org/show_bug.cgi?id=4931>.
+    while (p--) {
+        m_current = m_next1;
+        m_next1 = m_next2;
+        m_next2 = m_next3;
+        m_currentOffset = m_nextOffset1;
+        m_nextOffset1 = m_nextOffset2;
+        m_nextOffset2 = m_nextOffset3;
+        do {
+            if (m_position >= m_length) {
+                m_nextOffset3 = m_position;
+                m_position++;
+                m_next3 = -1;
+                break;
+            }
+            m_nextOffset3 = m_position;
+            m_next3 = m_code[m_position++];
+        } while (UNLIKELY(m_next3 == 0xFEFF));
+    }
+    m_keywordTable.deleteTable();
+}
+inline int Lexer::currentOffset() const
+{
+    return m_code - 4 - m_codeStart;
+}
+ALWAYS_INLINE void Lexer::shift1()
+{
+    m_current = m_next1;
+    m_next1 = m_next2;
+    m_next2 = m_next3;
+    if (LIKELY(m_code < m_codeEnd))
+        m_next3 = m_code[0];
+    else
+        m_next3 = -1;
+    ++m_code;
+}
+ALWAYS_INLINE void Lexer::shift2()
+{
+    m_current = m_next2;
+    m_next1 = m_next3;
+    if (LIKELY(m_code + 1 < m_codeEnd)) {
+        m_next2 = m_code[0];
+        m_next3 = m_code[1];
+    } else {
+        m_next2 = m_code < m_codeEnd ? m_code[0] : -1;
+        m_next3 = -1;
+    }
+    m_code += 2;
+}
+ALWAYS_INLINE void Lexer::shift3()
+{
+    m_current = m_next3;
+    if (LIKELY(m_code + 2 < m_codeEnd)) {
+        m_next1 = m_code[0];
+        m_next2 = m_code[1];
+        m_next3 = m_code[2];
+    } else {
+        m_next1 = m_code < m_codeEnd ? m_code[0] : -1;
+        m_next2 = m_code + 1 < m_codeEnd ? m_code[1] : -1;
+        m_next3 = -1;
+    }
+    m_code += 3;
+}
+ALWAYS_INLINE void Lexer::shift4()
+{
+    if (LIKELY(m_code + 3 < m_codeEnd)) {
+        m_current = m_code[0];
+        m_next1 = m_code[1];
+        m_next2 = m_code[2];
+        m_next3 = m_code[3];
+    } else {
+        m_current = m_code < m_codeEnd ? m_code[0] : -1;
+        m_next1 = m_code + 1 < m_codeEnd ? m_code[1] : -1;
+        m_next2 = m_code + 2 < m_codeEnd ? m_code[2] : -1;
+        m_next3 = -1;
+    }
+    m_code += 4;
+}
 void Lexer::setCode(const SourceCode& source)
+{
+    yylineno = source.firstLine();
+    m_restrKeyword = false;
+    m_lineNumber = source.firstLine();
     m_delimited = false;
-    m_eatNextIdentifier = false;
-    m_stackToken = -1;
     m_lastToken = -1;
+    m_position = source.startOffset();
+    const UChar* data = source.provider()->data();
     m_source = &source;
     m_code = source.provider()->data();
     m_length = source.endOffset();
     m_skipLF = false;
     m_skipCR = false;
+    m_codeStart = data;
+    m_code = data + source.startOffset();
+    m_codeEnd = data + source.endOffset();
+    m_skipLineEnd = 0;
     m_error = false;
     m_atLineStart = true;
+    // read first characters
+    shift(4);
+    // ECMA-262 calls for stripping all Cf characters, but we only strip BOM characters.
+    // See <https://bugs.webkit.org/show_bug.cgi?id=4931> for details.
+    if (source.provider()->hasBOMs()) {
+        for (const UChar* p = m_codeStart; p < m_codeEnd; ++p) {
+            if (UNLIKELY(*p == byteOrderMark)) {
+                copyCodeWithoutBOMs();
+                break;
+            }
+        }
+    }
+    // Read the first characters into the 4-character buffer.
+    shift4();
+    ASSERT(currentOffset() == source.startOffset());
+}
+void Lexer::copyCodeWithoutBOMs()
+{
+    // Note: In this case, the character offset data for debugging will be incorrect.
+    // If it's important to correctly debug code with extraneous BOMs, then the caller
+    // should strip the BOMs when creating the SourceProvider object and do its own
+    // mapping of offsets within the stripped text to original text offset.
+    m_codeWithoutBOMs.reserveCapacity(m_codeEnd - m_code);
+    for (const UChar* p = m_code; p < m_codeEnd; ++p) {
+        UChar c = *p;
+        if (c != byteOrderMark)
+            m_codeWithoutBOMs.append(c);
+    }
+    ptrdiff_t startDelta = m_codeStart - m_code;
+    m_code = m_codeWithoutBOMs.data();
+    m_codeStart = m_code + startDelta;
+    m_codeEnd = m_codeWithoutBOMs.data() + m_codeWithoutBOMs.size();
+}
 …
 void Lexer::nextLine()
+{
     yylineno++;
+    ++m_lineNumber;
     m_atLineStart = true;
+}
 …
+}
+int Lexer::lex(void* p1, void* p2)
+{
+    YYSTYPE* lvalp = static_cast<YYSTYPE*>(p1);
+    YYLTYPE* llocp = static_cast<YYLTYPE*>(p2);
+    int token = 0;
+    m_state = Start;
+    unsigned short stringType = 0; // either single or double quotes
+    m_buffer8.resize(0);
+    m_buffer16.resize(0);
+    m_done = false;
+    m_terminator = false;
+    m_skipLF = false;
+    m_skipCR = false;
+    // did we push a token on the stack previously ?
+    // (after an automatic semicolon insertion)
+    if (m_stackToken >= 0) {
+        setDone(Other);
+        token = m_stackToken;
+        m_stackToken = 0;
+    }
+    int startOffset = m_currentOffset;
+    if (!m_done) {
+    while (true) {
+        if (m_skipLF && m_current != '\n') // found \r but not \n afterwards
+            m_skipLF = false;
+        if (m_skipCR && m_current != '\r') // found \n but not \r afterwards
+            m_skipCR = false;
+        if (m_skipLF || m_skipCR) { // found \r\n or \n\r -> eat the second one
+            m_skipLF = false;
+            m_skipCR = false;
+            shift(1);
+        }
+        switch (m_state) {
+            case Start:
+                startOffset = m_currentOffset;
+                if (isWhiteSpace()) {
+                    // do nothing
+                } else if (m_current == '/' && m_next1 == '/') {
+                    shift(1);
+                    m_state = InSingleLineComment;
+                } else if (m_current == '/' && m_next1 == '*') {
+                    shift(1);
+                    m_state = InMultiLineComment;
+                } else if (m_current == -1) {
+                    if (!m_terminator && !m_delimited && !m_isReparsing) {
+                        // automatic semicolon insertion if program incomplete
+                        token = ';';
+                        m_stackToken = 0;
+                        setDone(Other);
+                    } else
+                        setDone(Eof);
+                } else if (isLineTerminator()) {
+                    nextLine();
+                    m_terminator = true;
+                    if (m_restrKeyword) {
+                        token = ';';
+                        setDone(Other);
+                    }
+                } else if (m_current == '"' || m_current == '\'') {
+                    m_state = InString;
+                    stringType = static_cast<unsigned short>(m_current);
+                } else if (isIdentStart(m_current)) {
+                    record16(m_current);
+                    m_state = InIdentifierOrKeyword;
+                } else if (m_current == '\\')
+                    m_state = InIdentifierStartUnicodeEscapeStart;
+                else if (m_current == '0') {
+                    record8(m_current);
+                    m_state = InNum0;
+                } else if (isDecimalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InNum;
+                } else if (m_current == '.' && isDecimalDigit(m_next1)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                    // <!-- marks the beginning of a line comment (for www usage)
+                } else if (m_current == '<' && m_next1 == '!' && m_next2 == '-' && m_next3 == '-') {
+                    shift(3);
+                    m_state = InSingleLineComment;
+                    // same for -->
+                } else if (m_atLineStart && m_current == '-' && m_next1 == '-' &&  m_next2 == '>') {
+                    shift(2);
+                    m_state = InSingleLineComment;
+                } else {
+                    token = matchPunctuator(lvalp->intValue, m_current, m_next1, m_next2, m_next3);
+                    if (token != -1)
+                        setDone(Other);
+                    else
+                        setDone(Bad);
+                }
+                break;
+            case InString:
+                if (m_current == stringType) {
+                    shift(1);
+                    setDone(String);
+                } else if (isLineTerminator() || m_current == -1)
+                    setDone(Bad);
+                else if (m_current == '\\')
+                    m_state = InEscapeSequence;
+                else
+                    record16(m_current);
+                break;
+            // Escape Sequences inside of strings
+            case InEscapeSequence:
+                if (isOctalDigit(m_current)) {
+                    if (m_current >= '0' && m_current <= '3' &&
+                        isOctalDigit(m_next1) && isOctalDigit(m_next2)) {
+                        record16(convertOctal(m_current, m_next1, m_next2));
+                        shift(2);
+                        m_state = InString;
+                    } else if (isOctalDigit(m_current) && isOctalDigit(m_next1)) {
+                        record16(convertOctal('0', m_current, m_next1));
+                        shift(1);
+                        m_state = InString;
+                    } else if (isOctalDigit(m_current)) {
+                        record16(convertOctal('0', '0', m_current));
+                        m_state = InString;
+                    } else
+                        setDone(Bad);
+                } else if (m_current == 'x')
+                    m_state = InHexEscape;
+                else if (m_current == 'u')
+                    m_state = InUnicodeEscape;
+                else if (isLineTerminator()) {
+                    nextLine();
+                    m_state = InString;
+                } else {
+                    record16(singleEscape(static_cast<unsigned short>(m_current)));
+                    m_state = InString;
+                }
+                break;
+            case InHexEscape:
+                if (isHexDigit(m_current) && isHexDigit(m_next1)) {
+                    m_state = InString;
+                    record16(convertHex(m_current, m_next1));
+                    shift(1);
+                } else if (m_current == stringType) {
+                    record16('x');
+                    shift(1);
+                    setDone(String);
+                } else {
+                    record16('x');
+                    record16(m_current);
+                    m_state = InString;
+                }
+                break;
+            case InUnicodeEscape:
+                if (isHexDigit(m_current) && isHexDigit(m_next1) && isHexDigit(m_next2) && isHexDigit(m_next3)) {
+                    record16(convertUnicode(m_current, m_next1, m_next2, m_next3));
+                    shift(3);
+                    m_state = InString;
+                } else if (m_current == stringType) {
+                    record16('u');
+                    shift(1);
+                    setDone(String);
+                } else
+                    setDone(Bad);
+                break;
+            case InSingleLineComment:
+                if (isLineTerminator()) {
+                    nextLine();
+                    m_terminator = true;
+                    if (m_restrKeyword) {
+                        token = ';';
+                        setDone(Other);
+                    } else
+                        m_state = Start;
+                } else if (m_current == -1)
+                    setDone(Eof);
+                break;
+            case InMultiLineComment:
+                if (m_current == -1)
+                    setDone(Bad);
+                else if (isLineTerminator())
+                    nextLine();
+                else if (m_current == '*' && m_next1 == '/') {
+                    m_state = Start;
+                    shift(1);
+                }
+                break;
+            case InIdentifierOrKeyword:
+            case InIdentifier:
+                if (isIdentPart(m_current))
+                    record16(m_current);
+                else if (m_current == '\\')
+                    m_state = InIdentifierPartUnicodeEscapeStart;
+                else
+                    setDone(m_state == InIdentifierOrKeyword ? IdentifierOrKeyword : Identifier);
+                break;
+            case InNum0:
+                if (m_current == 'x' || m_current == 'X') {
+                    record8(m_current);
+                    m_state = InHex;
+                } else if (m_current == '.') {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else if (isOctalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InOctal;
+                } else if (isDecimalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else
+                    setDone(Number);
+                break;
+            case InHex:
+                if (isHexDigit(m_current))
+                    record8(m_current);
+                else
+                    setDone(Hex);
+                break;
+            case InOctal:
+                if (isOctalDigit(m_current))
+                    record8(m_current);
+                else if (isDecimalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else
+                    setDone(Octal);
+                break;
+            case InNum:
+                if (isDecimalDigit(m_current))
+                    record8(m_current);
+                else if (m_current == '.') {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else
+                    setDone(Number);
+                break;
+            case InDecimal:
+                if (isDecimalDigit(m_current))
+                    record8(m_current);
+                else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else
+                    setDone(Number);
+                break;
+            case InExponentIndicator:
+                if (m_current == '+' || m_current == '-')
+                    record8(m_current);
+                else if (isDecimalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InExponent;
+                } else
+                    setDone(Bad);
+                break;
+            case InExponent:
+                if (isDecimalDigit(m_current))
+                    record8(m_current);
+                else
+                    setDone(Number);
+                break;
+            case InIdentifierStartUnicodeEscapeStart:
+                if (m_current == 'u')
+                    m_state = InIdentifierStartUnicodeEscape;
+                else
+                    setDone(Bad);
+                break;
+            case InIdentifierPartUnicodeEscapeStart:
+                if (m_current == 'u')
+                    m_state = InIdentifierPartUnicodeEscape;
+                else
+                    setDone(Bad);
+                break;
+            case InIdentifierStartUnicodeEscape:
+                if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
+                    setDone(Bad);
+                    break;
+                }
+                token = convertUnicode(m_current, m_next1, m_next2, m_next3);
+                shift(3);
+                if (!isIdentStart(token)) {
+                    setDone(Bad);
+                    break;
+                }
+                record16(token);
+                m_state = InIdentifier;
+                break;
+            case InIdentifierPartUnicodeEscape:
+                if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
+                    setDone(Bad);
+                    break;
+                }
+                token = convertUnicode(m_current, m_next1, m_next2, m_next3);
+                shift(3);
+                if (!isIdentPart(token)) {
+                    setDone(Bad);
+                    break;
+                }
+                record16(token);
+                m_state = InIdentifier;
+                break;
+            default:
+                ASSERT(!"Unhandled state in switch statement");
+        }
+        if (m_state != Start && m_state != InSingleLineComment)
+            m_atLineStart = false;
+        if (m_done)
+            break;
+        shift(1);
+    }
+    }
+    // no identifiers allowed directly after numeric literal, e.g. "3in" is bad
+    if ((m_state == Number || m_state == Octal || m_state == Hex) && isIdentStart(m_current))
+        m_state = Bad;
+    // terminate string
+    m_buffer8.append('\0');
+#ifdef JSC_DEBUG_LEX
+    fprintf(stderr, "line: %d ", lineNo());
+    fprintf(stderr, "yytext (%x): ", m_buffer8[0]);
+    fprintf(stderr, "%s ", m_buffer8.data());
+#endif
+    double dval = 0;
+    if (m_state == Number)
+        dval = WTF::strtod(m_buffer8.data(), 0L);
+    else if (m_state == Hex) { // scan hex numbers
+        const char* p = m_buffer8.data() + 2;
+        while (char c = *p++) {
+            dval *= 16;
+            dval += convertHex(c);
+        }
+        if (dval >= mantissaOverflowLowerBound)
+            dval = parseIntOverflow(m_buffer8.data() + 2, p - (m_buffer8.data() + 3), 16);
+        m_state = Number;
+    } else if (m_state == Octal) {   // scan octal number
+        const char* p = m_buffer8.data() + 1;
+        while (char c = *p++) {
+            dval *= 8;
+            dval += c - '0';
+        }
+        if (dval >= mantissaOverflowLowerBound)
+            dval = parseIntOverflow(m_buffer8.data() + 1, p - (m_buffer8.data() + 2), 8);
+        m_state = Number;
+    }
+#ifdef JSC_DEBUG_LEX
+    switch (m_state) {
+        case Eof:
+            printf("(EOF)\n");
+            break;
+        case Other:
+            printf("(Other)\n");
+            break;
+        case Identifier:
+            printf("(Identifier)/(Keyword)\n");
+            break;
+        case String:
+            printf("(String)\n");
+            break;
+        case Number:
+            printf("(Number)\n");
+            break;
+        default:
+            printf("(unknown)");
+    }
+#endif
+    if (m_state != Identifier)
+        m_eatNextIdentifier = false;
+    m_restrKeyword = false;
+    m_delimited = false;
+    llocp->first_line = yylineno;
+    llocp->last_line = yylineno;
+    llocp->first_column = startOffset;
+    llocp->last_column = m_currentOffset;
+    switch (m_state) {
+        case Eof:
+            token = 0;
+            break;
+        case Other:
+            if (token == '}' || token == ';')
+                m_delimited = true;
+            break;
+        case Identifier:
+            // Apply anonymous-function hack below (eat the identifier).
+            if (m_eatNextIdentifier) {
+                m_eatNextIdentifier = false;
+                token = lex(lvalp, llocp);
+                break;
+            }
+            lvalp->ident = makeIdentifier(m_buffer16);
+            token = IDENT;
+            break;
+        case IdentifierOrKeyword: {
+            lvalp->ident = makeIdentifier(m_buffer16);
+            const HashEntry* entry = m_mainTable.entry(m_globalData, *lvalp->ident);
+            if (!entry) {
+                // Lookup for keyword failed, means this is an identifier.
+                token = IDENT;
+                break;
+            }
+            token = entry->lexerValue();
+            // Hack for "f = function somename() { ... }"; too hard to get into the grammar.
+            m_eatNextIdentifier = token == FUNCTION && m_lastToken == '=';
+            if (token == CONTINUE || token == BREAK || token == RETURN || token == THROW)
+                m_restrKeyword = true;
+            break;
+        }
+        case String:
+            // Atomize constant strings in case they're later used in property lookup.
+            lvalp->ident = makeIdentifier(m_buffer16);
+            token = STRING;
+            break;
+        case Number:
+            lvalp->doubleValue = dval;
+            token = NUMBER;
+            break;
+        case Bad:
+#ifdef JSC_DEBUG_LEX
+            fprintf(stderr, "yylex: ERROR.\n");
+#endif
+            m_error = true;
+            return -1;
+        default:
+            ASSERT(!"unhandled numeration value in switch");
+            m_error = true;
+            return -1;
+    }
+    m_lastToken = token;
+    return token;
+}
+bool Lexer::isWhiteSpace() const
+{
+    return isWhiteSpace(m_current);
+}
+bool Lexer::isLineTerminator()
+{
+    bool cr = (m_current == '\r');
+    bool lf = (m_current == '\n');
+    if (cr)
+        m_skipLF = true;
+    else if (lf)
+        m_skipCR = true;
+    return cr || lf || m_current == 0x2028 || m_current == 0x2029;
+}
+bool Lexer::isIdentStart(int c)
+{
+    return isASCIIAlpha(c) || c == '$' || c == '_' || (!isASCII(c) && (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other)));
+}
+bool Lexer::isIdentPart(int c)
+{
+    return isASCIIAlphanumeric(c) || c == '$' || c == '_' || (!isASCII(c) && (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other
+                            | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector)));
+}
+static bool isDecimalDigit(int c)
+{
+    return isASCIIDigit(c);
+}
+bool Lexer::isHexDigit(int c)
+{
+    return isASCIIHexDigit(c);
+}
+bool Lexer::isOctalDigit(int c)
+{
+    return isASCIIOctalDigit(c);
+}
+int Lexer::matchPunctuator(int& charPos, int c1, int c2, int c3, int c4)
+{
+    if (c1 == '>' && c2 == '>' && c3 == '>' && c4 == '=') {
+        shift(4);
+        return URSHIFTEQUAL;
+    }
+    if (c1 == '=' && c2 == '=' && c3 == '=') {
+        shift(3);
+        return STREQ;
+    }
+    if (c1 == '!' && c2 == '=' && c3 == '=') {
+        shift(3);
+        return STRNEQ;
+    }
+    if (c1 == '>' && c2 == '>' && c3 == '>') {
+        shift(3);
+        return URSHIFT;
+    }
+    if (c1 == '<' && c2 == '<' && c3 == '=') {
+        shift(3);
+        return LSHIFTEQUAL;
+    }
+    if (c1 == '>' && c2 == '>' && c3 == '=') {
+        shift(3);
+        return RSHIFTEQUAL;
+    }
+    if (c1 == '<' && c2 == '=') {
+        shift(2);
+        return LE;
+    }
+    if (c1 == '>' && c2 == '=') {
+        shift(2);
+        return GE;
+    }
+    if (c1 == '!' && c2 == '=') {
+        shift(2);
+        return NE;
+    }
+    if (c1 == '+' && c2 == '+') {
+        shift(2);
+        if (m_terminator)
+            return AUTOPLUSPLUS;
+        return PLUSPLUS;
+    }
+    if (c1 == '-' && c2 == '-') {
+        shift(2);
+        if (m_terminator)
+            return AUTOMINUSMINUS;
+        return MINUSMINUS;
+    }
+    if (c1 == '=' && c2 == '=') {
+        shift(2);
+        return EQEQ;
+    }
+    if (c1 == '+' && c2 == '=') {
+        shift(2);
+        return PLUSEQUAL;
+    }
+    if (c1 == '-' && c2 == '=') {
+        shift(2);
+        return MINUSEQUAL;
+    }
+    if (c1 == '*' && c2 == '=') {
+        shift(2);
+        return MULTEQUAL;
+    }
+    if (c1 == '/' && c2 == '=') {
+        shift(2);
+        return DIVEQUAL;
+    }
+    if (c1 == '&' && c2 == '=') {
+        shift(2);
+        return ANDEQUAL;
+    }
+    if (c1 == '^' && c2 == '=') {
+        shift(2);
+        return XOREQUAL;
+    }
+    if (c1 == '%' && c2 == '=') {
+        shift(2);
+        return MODEQUAL;
+    }
+    if (c1 == '|' && c2 == '=') {
+        shift(2);
+        return OREQUAL;
+    }
+    if (c1 == '<' && c2 == '<') {
+        shift(2);
+        return LSHIFT;
+    }
+    if (c1 == '>' && c2 == '>') {
+        shift(2);
+        return RSHIFT;
+    }
+    if (c1 == '&' && c2 == '&') {
+        shift(2);
+        return AND;
+    }
+    if (c1 == '|' && c2 == '|') {
+        shift(2);
+        return OR;
+    }
+    switch (c1) {
+ALWAYS_INLINE JSC::Identifier* Lexer::makeIdentifier(const Vector<UChar>& buffer)
+{
+    m_identifiers.append(JSC::Identifier(m_globalData, buffer.data(), buffer.size()));
+    return &m_identifiers.last();
+}
+ALWAYS_INLINE int Lexer::matchPunctuator(int& charPos)
+{
+    switch (m_current) {
+        case '>':
+            if (m_next1 == '>' && m_next2 == '>') {
+                if (m_next3 == '=') {
+                    shift4();
+                    return URSHIFTEQUAL;
+                }
+                shift3();
+                return URSHIFT;
+            }
+            if (m_next1 == '>') {
+                if (m_next2 == '=') {
+                    shift3();
+                    return RSHIFTEQUAL;
+                }
+                shift2();
+                return RSHIFT;
+            }
+            if (m_next1 == '=') {
+                shift2();
+                return GE;
+            }
+            shift1();
+            return '>';
         case '=':
+        case '>':
+            if (m_next1 == '=') {
+                if (m_next2 == '=') {
+                    shift3();
+                    return STREQ;
+                }
+                shift2();
+                return EQEQ;
+            }
+            shift1();
+            return '=';
+        case '!':
+            if (m_next1 == '=') {
+                if (m_next2 == '=') {
+                    shift3();
+                    return STRNEQ;
+                }
+                shift2();
+                return NE;
+            }
+            shift1();
+            return '!';
         case '<':
+            if (m_next1 == '<') {
+                if (m_next2 == '=') {
+                    shift3();
+                    return LSHIFTEQUAL;
+                }
+                shift2();
+                return LSHIFT;
+            }
+            if (m_next1 == '=') {
+                shift2();
+                return LE;
+            }
+            shift1();
+            return '<';
+        case '+':
+            if (m_next1 == '+') {
+                shift2();
+                if (m_terminator)
+                    return AUTOPLUSPLUS;
+                return PLUSPLUS;
+            }
+            if (m_next1 == '=') {
+                shift2();
+                return PLUSEQUAL;
+            }
+            shift1();
+            return '+';
+        case '-':
+            if (m_next1 == '-') {
+                shift2();
+                if (m_terminator)
+                    return AUTOMINUSMINUS;
+                return MINUSMINUS;
+            }
+            if (m_next1 == '=') {
+                shift2();
+                return MINUSEQUAL;
+            }
+            shift1();
+            return '-';
+        case '*':
+            if (m_next1 == '=') {
+                shift2();
+                return MULTEQUAL;
+            }
+            shift1();
+            return '*';
+        case '/':
+            if (m_next1 == '=') {
+                shift2();
+                return DIVEQUAL;
+            }
+            shift1();
+            return '/';
+        case '&':
+            if (m_next1 == '&') {
+                shift2();
+                return AND;
+            }
+            if (m_next1 == '=') {
+                shift2();
+                return ANDEQUAL;
+            }
+            shift1();
+            return '&';
+        case '^':
+            if (m_next1 == '=') {
+                shift2();
+                return XOREQUAL;
+            }
+            shift1();
+            return '^';
+        case '%':
+            if (m_next1 == '=') {
+                shift2();
+                return MODEQUAL;
+            }
+            shift1();
+            return '%';
+        case '|':
+            if (m_next1 == '=') {
+                shift2();
+                return OREQUAL;
+            }
+            if (m_next1 == '|') {
+                shift2();
+                return OR;
+            }
+            shift1();
+            return '|';
         case ',':
+        case '!':
+            shift1();
+            return ',';
         case '~':
+            shift1();
+            return '~';
         case '?':
+            shift1();
+            return '?';
         case ':':
+            shift1();
+            return ':';
         case '.':
+        case '+':
+        case '-':
+        case '*':
+        case '/':
+        case '&':
+        case '|':
+        case '^':
+        case '%':
+            shift1();
+            return '.';
         case '(':
+            shift1();
+            return '(';
         case ')':
+            shift1();
+            return ')';
         case '[':
+            shift1();
+            return '[';
         case ']':
+            shift1();
+            return ']';
         case ';':
             shift(1);
             return static_cast<int>(c1);
+            shift1();
+            return ';';
         case '{':
             charPos = m_currentOffset;
             shift(1);
+            charPos = currentOffset();
+            shift1();
             return OPENBRACE;
         case '}':
             charPos = m_currentOffset;
             shift(1);
+            charPos = currentOffset();
+            shift1();
             return CLOSEBRACE;
+        default:
+            return -1;
+    }
+}
+unsigned short Lexer::singleEscape(unsigned short c)
+    }
+    return -1;
+}
+ALWAYS_INLINE bool Lexer::isLineTerminator()
+{
+    bool cr = m_current == '\r';
+    bool lf = m_current == '\n';
+    m_skipLineEnd |= (cr << SkipLFShift) | (lf << SkipCRShift);
+    return cr | lf | ((m_current & ~1) == 0x2028);
+}
+inline bool Lexer::lastTokenWasRestrKeyword() const
+{
+    return m_lastToken == CONTINUE || m_lastToken == BREAK || m_lastToken == RETURN || m_lastToken == THROW;
+}
+static NEVER_INLINE bool isNonASCIIIdentStart(int c)
+{
+    return category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other);
+}
+static inline bool isIdentStart(int c)
+{
+    return isASCII(c) ? isASCIIAlpha(c) || c == '$' || c == '_' : isNonASCIIIdentStart(c);
+}
+static NEVER_INLINE bool isNonASCIIIdentPart(int c)
+{
+    return category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other
+        | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector);
+}
+static inline bool isIdentPart(int c)
+{
+    return isASCII(c) ? isASCIIAlphanumeric(c) || c == '$' || c == '_' : isNonASCIIIdentPart(c);
+}
+static int singleEscape(int c)
+{
     switch (c) {
 …
+}
+unsigned short Lexer::convertOctal(int c1, int c2, int c3)
+{
+    return static_cast<unsigned short>((c1 - '0') * 64 + (c2 - '0') * 8 + c3 - '0');
+}
+unsigned char Lexer::convertHex(int c)
+{
+    if (c >= '0' && c <= '9')
+        return static_cast<unsigned char>(c - '0');
+    if (c >= 'a' && c <= 'f')
+        return static_cast<unsigned char>(c - 'a' + 10);
+    return static_cast<unsigned char>(c - 'A' + 10);
+}
+unsigned char Lexer::convertHex(int c1, int c2)
+{
+    return ((convertHex(c1) << 4) + convertHex(c2));
+}
+UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
+{
+    unsigned char highByte = (convertHex(c1) << 4) + convertHex(c2);
+    unsigned char lowByte = (convertHex(c3) << 4) + convertHex(c4);
+    return (highByte << 8 | lowByte);
+}
+void Lexer::record8(int c)
+static inline int convertOctal(int c1, int c2, int c3)
+{
+    return (c1 - '0') * 64 + (c2 - '0') * 8 + c3 - '0';
+}
+inline void Lexer::record8(int c)
+{
     ASSERT(c >= 0);
     ASSERT(c <= 0xff);
+    ASSERT(c <= 0xFF);
     m_buffer8.append(static_cast<char>(c));
+}
 void Lexer::record16(int c)
+inline void Lexer::record16(int c)
+{
     ASSERT(c >= 0);
 …
+}
 void Lexer::record16(UChar c)
+inline void Lexer::record16(UChar c)
+{
     m_buffer16.append(c);
+}
+int Lexer::lex(void* p1, void* p2)
+{
+    YYSTYPE* lvalp = static_cast<YYSTYPE*>(p1);
+    YYLTYPE* llocp = static_cast<YYLTYPE*>(p2);
+    int token = 0;
+    m_state = Start;
+    int stringType = 0; // either single or double quotes
+    m_buffer8.resize(0);
+    m_buffer16.resize(0);
+    m_done = false;
+    m_terminator = false;
+    m_skipLineEnd = 0;
+    int startOffset = currentOffset();
+    while (true) {
+        if (m_skipLineEnd) {
+            if (m_current != '\n') // found \r but not \n afterwards
+                m_skipLineEnd &= ~SkipLF;
+            if (m_current != '\r') // found \n but not \r afterwards
+                m_skipLineEnd &= ~SkipCR;
+            if (m_skipLineEnd) { // found \r\n or \n\r -> eat the second one
+                m_skipLineEnd = 0;
+                shift1();
+            }
+        }
+        switch (m_state) {
+            case Start:
+                startOffset = currentOffset();
+                if (isWhiteSpace(m_current)) {
+                    // do nothing
+                } else if (m_current == '/' && m_next1 == '/') {
+                    shift1();
+                    m_state = InSingleLineComment;
+                } else if (m_current == '/' && m_next1 == '*') {
+                    shift1();
+                    m_state = InMultiLineComment;
+                } else if (m_current == -1) {
+                    if (!m_terminator && !m_delimited && !m_isReparsing) {
+                        // automatic semicolon insertion if program incomplete
+                        token = ';';
+                        setDone(Other);
+                    } else
+                        setDone(Eof);
+                } else if (isLineTerminator()) {
+                    nextLine();
+                    m_terminator = true;
+                    if (lastTokenWasRestrKeyword()) {
+                        token = ';';
+                        setDone(Other);
+                    }
+                } else if (m_current == '"' || m_current == '\'') {
+                    m_state = InString;
+                    stringType = m_current;
+                } else if (isIdentStart(m_current)) {
+                    record16(m_current);
+                    m_state = InIdentifierOrKeyword;
+                } else if (m_current == '\\')
+                    m_state = InIdentifierStartUnicodeEscapeStart;
+                else if (m_current == '0') {
+                    record8(m_current);
+                    m_state = InNum0;
+                } else if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InNum;
+                } else if (m_current == '.' && isASCIIDigit(m_next1)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else if (m_current == '<' && m_next1 == '!' && m_next2 == '-' && m_next3 == '-') {
+                    // <!-- marks the beginning of a line comment (for www usage)
+                    shift3();
+                    m_state = InSingleLineComment;
+                } else if (m_atLineStart && m_current == '-' && m_next1 == '-' && m_next2 == '>') {
+                    // same for -->
+                    shift2();
+                    m_state = InSingleLineComment;
+                } else {
+                    token = matchPunctuator(lvalp->intValue);
+                    if (token != -1)
+                        setDone(Other);
+                    else
+                        setDone(Bad);
+                }
+                goto stillAtLineStart;
+            case InString:
+                if (m_current == stringType) {
+                    shift1();
+                    setDone(String);
+                } else if (isLineTerminator() || m_current == -1)
+                    setDone(Bad);
+                else if (m_current == '\\')
+                    m_state = InEscapeSequence;
+                else
+                    record16(m_current);
+                break;
+            // Escape Sequences inside of strings
+            case InEscapeSequence:
+                if (isASCIIOctalDigit(m_current)) {
+                    if (m_current >= '0' && m_current <= '3' && isASCIIOctalDigit(m_next1) && isASCIIOctalDigit(m_next2)) {
+                        record16(convertOctal(m_current, m_next1, m_next2));
+                        shift2();
+                        m_state = InString;
+                    } else if (isASCIIOctalDigit(m_current) && isASCIIOctalDigit(m_next1)) {
+                        record16(convertOctal('0', m_current, m_next1));
+                        shift1();
+                        m_state = InString;
+                    } else if (isASCIIOctalDigit(m_current)) {
+                        record16(convertOctal('0', '0', m_current));
+                        m_state = InString;
+                    } else
+                        setDone(Bad);
+                } else if (m_current == 'x')
+                    m_state = InHexEscape;
+                else if (m_current == 'u')
+                    m_state = InUnicodeEscape;
+                else if (isLineTerminator()) {
+                    nextLine();
+                    m_state = InString;
+                } else {
+                    record16(singleEscape(m_current));
+                    m_state = InString;
+                }
+                break;
+            case InHexEscape:
+                if (isASCIIHexDigit(m_current) && isASCIIHexDigit(m_next1)) {
+                    m_state = InString;
+                    record16(convertHex(m_current, m_next1));
+                    shift1();
+                } else if (m_current == stringType) {
+                    record16('x');
+                    shift1();
+                    setDone(String);
+                } else {
+                    record16('x');
+                    record16(m_current);
+                    m_state = InString;
+                }
+                break;
+            case InUnicodeEscape:
+                if (isASCIIHexDigit(m_current) && isASCIIHexDigit(m_next1) && isASCIIHexDigit(m_next2) && isASCIIHexDigit(m_next3)) {
+                    record16(convertUnicode(m_current, m_next1, m_next2, m_next3));
+                    shift3();
+                    m_state = InString;
+                } else if (m_current == stringType) {
+                    record16('u');
+                    shift1();
+                    setDone(String);
+                } else
+                    setDone(Bad);
+                break;
+            case InSingleLineComment:
+                if (isLineTerminator()) {
+                    nextLine();
+                    m_terminator = true;
+                    if (lastTokenWasRestrKeyword()) {
+                        token = ';';
+                        setDone(Other);
+                    } else
+                        m_state = Start;
+                } else if (m_current == -1)
+                    setDone(Eof);
+                goto stillAtLineStart;
+            case InMultiLineComment:
+                if (isLineTerminator())
+                    nextLine();
+                else if (m_current == '*' && m_next1 == '/') {
+                    m_state = Start;
+                    shift1();
+                } else if (m_current == -1)
+                    setDone(Bad);
+                break;
+            case InIdentifierOrKeyword:
+                if (isIdentPart(m_current)) {
+                    record16(m_current);
+                    while (isIdentPart(m_next1)) {
+                        shift1();
+                        record16(m_current);
+                    }
+                } else if (m_current == '\\')
+                    m_state = InIdentifierPartUnicodeEscapeStart;
+                else
+                    setDone(IdentifierOrKeyword);
+                break;
+            case InIdentifier:
+                if (isIdentPart(m_current)) {
+                    record16(m_current);
+                    while (isIdentPart(m_next1)) {
+                        shift1();
+                        record16(m_current);
+                    }
+                } else if (m_current == '\\')
+                    m_state = InIdentifierPartUnicodeEscapeStart;
+                else
+                    setDone(Identifier);
+                break;
+            case InNum0:
+                if (m_current == 'x' || m_current == 'X') {
+                    record8(m_current);
+                    m_state = InHex;
+                } else if (m_current == '.') {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else if (isASCIIOctalDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InOctal;
+                } else if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else
+                    setDone(Number);
+                break;
+            case InHex:
+                if (isASCIIHexDigit(m_current)) {
+                    record8(m_current);
+                    while (isASCIIHexDigit(m_next1)) {
+                        shift1();
+                        record8(m_current);
+                    }
+                } else
+                    setDone(Hex);
+                break;
+            case InOctal:
+                if (isASCIIOctalDigit(m_current)) {
+                    record8(m_current);
+                    while (isASCIIOctalDigit(m_next1)) {
+                        shift1();
+                        record8(m_current);
+                    }
+                } else if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else
+                    setDone(Octal);
+                break;
+            case InNum:
+                if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    while (isASCIIDigit(m_next1)) {
+                        shift1();
+                        record8(m_current);
+                    }
+                } else if (m_current == '.') {
+                    record8(m_current);
+                    m_state = InDecimal;
+                } else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else
+                    setDone(Number);
+                break;
+            case InDecimal:
+                if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    while (isASCIIDigit(m_next1)) {
+                        shift1();
+                        record8(m_current);
+                    }
+                } else if (m_current == 'e' || m_current == 'E') {
+                    record8(m_current);
+                    m_state = InExponentIndicator;
+                } else
+                    setDone(Number);
+                break;
+            case InExponentIndicator:
+                if (m_current == '+' || m_current == '-')
+                    record8(m_current);
+                else if (isASCIIDigit(m_current)) {
+                    record8(m_current);
+                    m_state = InExponent;
+                } else
+                    setDone(Bad);
+                break;
+            case InExponent:
+                if (isASCIIDigit(m_current))
+                    record8(m_current);
+                else
+                    setDone(Number);
+                break;
+            case InIdentifierStartUnicodeEscapeStart:
+                if (m_current == 'u')
+                    m_state = InIdentifierStartUnicodeEscape;
+                else
+                    setDone(Bad);
+                break;
+            case InIdentifierPartUnicodeEscapeStart:
+                if (m_current == 'u')
+                    m_state = InIdentifierPartUnicodeEscape;
+                else
+                    setDone(Bad);
+                break;
+            case InIdentifierStartUnicodeEscape:
+                if (!isASCIIHexDigit(m_current) || !isASCIIHexDigit(m_next1) || !isASCIIHexDigit(m_next2) || !isASCIIHexDigit(m_next3)) {
+                    setDone(Bad);
+                    break;
+                }
+                token = convertUnicode(m_current, m_next1, m_next2, m_next3);
+                shift3();
+                if (!isIdentStart(token)) {
+                    setDone(Bad);
+                    break;
+                }
+                record16(token);
+                m_state = InIdentifier;
+                break;
+            case InIdentifierPartUnicodeEscape:
+                if (!isASCIIHexDigit(m_current) || !isASCIIHexDigit(m_next1) || !isASCIIHexDigit(m_next2) || !isASCIIHexDigit(m_next3)) {
+                    setDone(Bad);
+                    break;
+                }
+                token = convertUnicode(m_current, m_next1, m_next2, m_next3);
+                shift3();
+                if (!isIdentPart(token)) {
+                    setDone(Bad);
+                    break;
+                }
+                record16(token);
+                m_state = InIdentifier;
+                break;
+            default:
+                ASSERT_NOT_REACHED();
+        }
+        m_atLineStart = false;
+stillAtLineStart:
+        if (m_done)
+            break;
+        shift1();
+    }
+    if (m_state == Number || m_state == Octal || m_state == Hex) {
+        // No identifiers allowed directly after numeric literal, e.g. "3in" is bad.
+        if (isIdentStart(m_current))
+            m_state = Bad;
+        else {
+            // terminate string
+            m_buffer8.append('\0');
+            if (m_state == Number)
+                lvalp->doubleValue = WTF::strtod(m_buffer8.data(), 0L);
+            else if (m_state == Hex) { // scan hex numbers
+                double dval = 0;
+                const char* p = m_buffer8.data() + 2;
+                while (char c = *p++) {
+                    dval *= 16;
+                    dval += toASCIIHexValue(c);
+                }
+                if (dval >= mantissaOverflowLowerBound)
+                    dval = parseIntOverflow(m_buffer8.data() + 2, p - (m_buffer8.data() + 3), 16);
+                m_state = Number;
+                lvalp->doubleValue = dval;
+            } else {   // scan octal number
+                double dval = 0;
+                const char* p = m_buffer8.data() + 1;
+                while (char c = *p++) {
+                    dval *= 8;
+                    dval += c - '0';
+                }
+                if (dval >= mantissaOverflowLowerBound)
+                    dval = parseIntOverflow(m_buffer8.data() + 1, p - (m_buffer8.data() + 2), 8);
+                m_state = Number;
+                lvalp->doubleValue = dval;
+            }
+        }
+    }
+    m_delimited = false;
+    int lineNumber = m_lineNumber;
+    llocp->first_line = lineNumber;
+    llocp->last_line = lineNumber;
+    llocp->first_column = startOffset;
+    llocp->last_column = currentOffset();
+    switch (m_state) {
+        case Eof:
+            token = 0;
+            break;
+        case Other:
+            m_delimited = (token == '}') | (token == ';');
+            break;
+        case Identifier:
+            lvalp->ident = makeIdentifier(m_buffer16);
+            token = IDENT;
+            break;
+        case IdentifierOrKeyword: {
+            lvalp->ident = makeIdentifier(m_buffer16);
+            const HashEntry* entry = m_keywordTable.entry(m_globalData, *lvalp->ident);
+            if (!entry) {
+                // Lookup for keyword failed, means this is an identifier.
+                token = IDENT;
+                break;
+            }
+            token = entry->lexerValue();
+            break;
+        }
+        case String:
+            // Atomize constant strings in case they're later used in property lookup.
+            lvalp->ident = makeIdentifier(m_buffer16);
+            token = STRING;
+            break;
+        case Number:
+            token = NUMBER;
+            break;
+        default:
+            ASSERT_NOT_REACHED();
+            // Fall through.
+        case Bad:
+            m_error = true;
+            return -1;
+    }
+    m_lastToken = token;
+    return token;
+}
 …
         if (isLineTerminator() || m_current == -1)
             return false;
         else if (m_current != '/' || lastWasEscape == true || inBrackets == true) {
+        else if (m_current != '/' || lastWasEscape || inBrackets) {
             // keep track of '[' and ']'
             if (!lastWasEscape) {
                 if ( m_current == '[' && !inBrackets )
+                if (m_current == '[' && !inBrackets)
                     inBrackets = true;
                 if ( m_current == ']' && inBrackets )
+                if (m_current == ']' && inBrackets)
                     inBrackets = false;
+            }
             record16(m_current);
+            lastWasEscape =
+            !lastWasEscape && (m_current == '\\');
+            lastWasEscape = !lastWasEscape && m_current == '\\';
         } else { // end of regexp
             m_pattern = UString(m_buffer16);
             m_buffer16.resize(0);
             shift(1);
+            shift1();
             break;
+        }
         shift(1);
+        shift1();
+    }
     while (isIdentPart(m_current)) {
         record16(m_current);
         shift(1);
+        shift1();
+    }
     m_flags = UString(m_buffer16);
 …
+{
     m_identifiers.clear();
+    m_codeWithoutBOMs.clear();
     Vector<char> newBuffer8;
 …
     m_isReparsing = false;
+    m_pattern = 0;
+    m_flags = 0;
+    m_pattern = UString();
+    m_flags = UString();
+}
+SourceCode Lexer::sourceCode(int openBrace, int closeBrace, int firstLine)
+{
+    if (m_codeWithoutBOMs.isEmpty())
+        return SourceCode(m_source->provider(), openBrace, closeBrace + 1, firstLine);
+    const UChar* data = m_source->provider()->data();
+    ASSERT(openBrace < closeBrace);
+    int numBOMsBeforeOpenBrace = 0;
+    int numBOMsBetweenBraces = 0;
+    int i;
+    for (i = m_source->startOffset(); i < openBrace; ++i)
+        numBOMsBeforeOpenBrace += data[i] == byteOrderMark;
+    for (; i < closeBrace; ++i)
+        numBOMsBetweenBraces += data[i] == byteOrderMark;
+    return SourceCode(m_source->provider(), openBrace + numBOMsBeforeOpenBrace,
+        closeBrace + numBOMsBeforeOpenBrace + numBOMsBetweenBraces + 1, firstLine);
+}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 43156 in webkit for trunk/JavaScriptCore/parser/Lexer.cpp

Legend:

trunk/JavaScriptCore/parser/Lexer.cpp

Download in other formats: