Lexer.h

Timestamp:

May 3, 2009, 9:49:35 AM (16 years ago)

Author:

Darin Adler

Message:

2009-05-02 Darin Adler <Darin Adler>

Reviewed by Maciej Stachowiak.

Bug 25519: streamline lexer by handling BOMs differently
https://bugs.webkit.org/show_bug.cgi?id=25519

Roughly 1% faster SunSpider.

parser/Grammar.y: Tweak formatting a bit.

parser/Lexer.cpp:
(JSC::Lexer::Lexer): Remove unnecessary initialization of data members that are set up by setCode.
(JSC::Lexer::currentOffset): Added. Used where the old code would look at m_currentOffset.
(JSC::Lexer::shift1): Replaces the old shift function. No longer does anything to handle BOM characters.
(JSC::Lexer::shift2): Ditto.
(JSC::Lexer::shift3): Ditto.
(JSC::Lexer::shift4): Ditto.
(JSC::Lexer::setCode): Updated for name change from yylineno to m_line. Removed now-unused m_eatNextIdentifier, m_stackToken, and m_restrKeyword. Replaced m_skipLF and m_skipCR with m_skipLineEnd. Replaced the old m_length with m_codeEnd and m_currentOffset with m_codeStart. Added code to scan for a BOM character and call copyCodeWithoutBOMs() if we find any.
(JSC::Lexer::copyCodeWithoutBOMs): Added.
(JSC::Lexer::nextLine): Updated for name change from yylineno to m_line.
(JSC::Lexer::makeIdentifier): Moved up higher in the file.
(JSC::Lexer::matchPunctuator): Moved up higher in the file and changed to use a switch statement instead of just if statements.
(JSC::Lexer::isLineTerminator): Moved up higher in the file and changed to have fewer branches.
(JSC::Lexer::lastTokenWasRestrKeyword): Added. This replaces the old m_restrKeyword boolean.
(JSC::Lexer::isIdentStart): Moved up higher in the file. Changed to use fewer branches in the ASCII but not identifier case.
(JSC::Lexer::isIdentPart): Ditto.
(JSC::Lexer::singleEscape): Moved up higher in the file.
(JSC::Lexer::convertOctal): Moved up higher in the file.
(JSC::Lexer::convertHex): Moved up higher in the file. Changed to use toASCIIHexValue instead of rolling our own here.
(JSC::Lexer::convertUnicode): Ditto.
(JSC::Lexer::record8): Moved up higher in the file.
(JSC::Lexer::record16): Moved up higher in the file.
(JSC::Lexer::lex): Changed type of stringType to int. Replaced m_skipLF and m_skipCR with m_skipLineEnd, which requires fewer branches in the main lexer loop. Use currentOffset instead of m_currentOffset. Removed unneeded m_stackToken. Use isASCIIDigit instead of isDecimalDigit. Split out the two cases for InIdentifierOrKeyword and InIdentifier. Added special case tight loops for identifiers and other simple states. Removed a branch from the code that sets m_atLineStart to false using goto. Streamlined the number-handling code so we don't check for the same types twice for non-numeric cases and don't add a null to m_buffer8 when it's not being used. Removed m_eatNextIdentifier, which wasn't working anyway, and m_restrKeyword, which is redundant with m_lastToken. Set the m_delimited flag without using a branch.
(JSC::Lexer::scanRegExp): Tweaked style a bit.
(JSC::Lexer::clear): Clear m_codeWithoutBOMs so we don't use memory after parsing. Clear out UString objects in the more conventional way.
(JSC::Lexer::sourceCode): Made this no longer inline since it has more work to do in the case where we stripped BOMs.
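
The BOM handling described for setCode and copyCodeWithoutBOMs can be sketched roughly as follows. This is an illustration only, not the code from Lexer.cpp: the names CodeUnit, containsBOM, and copyWithoutBOMs are hypothetical, and std::vector stands in for the WTF Vector<UChar> that backs m_codeWithoutBOMs. The idea is that the common case (no BOM anywhere) costs one linear scan up front, so the per-character shift functions no longer need to test for U+FEFF.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the lexer's UTF-16 code unit type (UChar).
using CodeUnit = char16_t;

constexpr CodeUnit byteOrderMark = 0xFEFF;

// Returns true if any code unit in [start, end) is a BOM.
// Assumed to run once, when the source is handed to the lexer.
bool containsBOM(const CodeUnit* start, const CodeUnit* end)
{
    for (const CodeUnit* p = start; p < end; ++p) {
        if (*p == byteOrderMark)
            return true;
    }
    return false;
}

// Copies [start, end) into a fresh buffer, dropping every BOM.
// Only needed on the rare path where containsBOM() returned true.
std::vector<CodeUnit> copyWithoutBOMs(const CodeUnit* start, const CodeUnit* end)
{
    std::vector<CodeUnit> result;
    result.reserve(static_cast<std::size_t>(end - start));
    for (const CodeUnit* p = start; p < end; ++p) {
        if (*p != byteOrderMark)
            result.push_back(*p);
    }
    return result;
}
```

Because the copy is shorter than the original whenever a BOM was dropped, offsets into it no longer line up with offsets into the original source, which is presumably why sourceCode has more work to do when BOMs were stripped.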

parser/Lexer.h: Renamed yylineno to m_lineNumber. Removed convertHex function, which is the same as toASCIIHexValue. Removed isHexDigit function, which is the same as isASCIIHexDigit. Replaced shift with four separate shift functions. Removed isWhiteSpace function that passes m_current, instead just passing m_current explicitly. Removed isOctalDigit, which is the same as isASCIIOctalDigit. Eliminated unused arguments from matchPunctuator. Added copyCodeWithoutBOMs and currentOffset. Moved the makeIdentifier function out of the header. Added lastTokenWasRestrKeyword function. Added new constants for m_skipLineEnd. Removed unused yycolumn, m_restrKeyword, m_skipLF, m_skipCR, m_eatNextIdentifier, m_stackToken, m_position, m_length, m_currentOffset, m_nextOffset1, m_nextOffset2, m_nextOffset3. Added m_skipLineEnd, m_codeStart, m_codeEnd, and m_codeWithoutBOMs.
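
To show how the remaining two-digit convertHex and four-digit convertUnicode build on a single-digit hex conversion, here is a self-contained sketch. The function hexValue is a hypothetical stand-in for WTF's toASCIIHexValue and assumes its argument is already a valid ASCII hex digit; the other two functions mirror the inline definitions in the diff of Lexer.h, with char16_t assumed in place of UChar.

```cpp
// Hypothetical stand-in for toASCIIHexValue. Assumes c is one of
// '0'-'9', 'A'-'F', or 'a'-'f'; other inputs give meaningless results.
inline int hexValue(int c)
{
    // The low nibble of '0'-'9' is already the digit value; for letters
    // the low nibble is 1-6, so adding 9 gives 10-15.
    return (c & 0xF) + (c < 'A' ? 0 : 9);
}

// Combines two hex digits into one byte, e.g. '4','1' -> 0x41.
inline unsigned char convertHex(int c1, int c2)
{
    return static_cast<unsigned char>((hexValue(c1) << 4) | hexValue(c2));
}

// Combines four hex digits into one UTF-16 code unit, as needed for
// \uXXXX escapes, e.g. '2','0','2','8' -> 0x2028.
inline char16_t convertUnicode(int c1, int c2, int c3, int c4)
{
    return static_cast<char16_t>((convertHex(c1, c2) << 8) | convertHex(c3, c4));
}
```

Callers are expected to have validated the digits first (for example with isASCIIHexDigit), since none of these helpers checks its input.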

parser/SourceProvider.h: Added hasBOMs function. In the future this can be used to tell the lexer about strings known not to have BOMs.

runtime/JSGlobalObjectFunctions.cpp: (JSC::globalFuncUnescape): Changed to use isASCIIHexDigit.

wtf/ASCIICType.h: Added using statements to match the design of the other WTF headers.

File:

1 edited

trunk/JavaScriptCore/parser/Lexer.h (modified) (8 diffs)

trunk/JavaScriptCore/parser/Lexer.h

-              r43144
+              r43156
 /*
  *  Copyright (C) 1999-2000 Harri Porten ([email protected])
- *  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
+ *  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
+ *
  *  This library is free software; you can redistribute it and/or
 …
 #define Lexer_h
-#include "Identifier.h"
 #include "Lookup.h"
 #include "SegmentedVector.h"
 #include "SourceCode.h"
+#include <wtf/ASCIICType.h>
 #include <wtf/Vector.h>
 #include <wtf/unicode/Unicode.h>
 …
     class Lexer : Noncopyable {
     public:
+        // Character manipulation functions.
+        static bool isWhiteSpace(int character);
+        static bool isLineTerminator(int character);
+        static unsigned char convertHex(int c1, int c2);
+        static UChar convertUnicode(int c1, int c2, int c3, int c4);
+        // Functions to set up parsing.
         void setCode(const SourceCode&);
         void setIsReparsing() { m_isReparsing = true; }
+        // Functions for the parser itself.
         int lex(void* lvalp, void* llocp);
+        int lineNumber() const { return m_lineNumber; }
+        bool prevTerminator() const { return m_terminator; }
+        SourceCode sourceCode(int openBrace, int closeBrace, int firstLine);
+        bool scanRegExp();
+        const UString& pattern() const { return m_pattern; }
+        const UString& flags() const { return m_flags; }
-        int lineNo() const { return yylineno; }
+        // Functions for use after parsing.
+        bool sawError() const { return m_error; }
+        void clear();
-        bool prevTerminator() const { return m_terminator; }
+    private:
+        friend class JSGlobalData;
+        Lexer(JSGlobalData*);
+        ~Lexer();
         enum State {
 …
         };
-        bool scanRegExp();
-        const UString& pattern() const { return m_pattern; }
-        const UString& flags() const { return m_flags; }
-        static unsigned char convertHex(int);
-        static unsigned char convertHex(int c1, int c2);
-        static UChar convertUnicode(int c1, int c2, int c3, int c4);
-        static bool isIdentStart(int);
-        static bool isIdentPart(int);
-        static bool isHexDigit(int);
-        bool sawError() const { return m_error; }
-        void clear();
-        SourceCode sourceCode(int openBrace, int closeBrace, int firstLine) { return SourceCode(m_source->provider(), openBrace, closeBrace + 1, firstLine); }
-        static inline bool isWhiteSpace(int ch)
-        {
-            return ch == '\t' || ch == 0x0b || ch == 0x0c || WTF::Unicode::isSeparatorSpace(ch);
-        }
-        static inline bool isLineTerminator(int ch)
-        {
-            return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
-        }
-    private:
-        friend class JSGlobalData;
-        Lexer(JSGlobalData*);
-        ~Lexer();
         void setDone(State);
-        void shift(unsigned int p);
+        void shift1();
+        void shift2();
+        void shift3();
+        void shift4();
         void nextLine();
         int lookupKeyword(const char *);
-        bool isWhiteSpace() const;
         bool isLineTerminator();
-        static bool isOctalDigit(int);
-        ALWAYS_INLINE int matchPunctuator(int& charPos, int c1, int c2, int c3, int c4);
+        static unsigned short singleEscape(unsigned short);
+        static unsigned short convertOctal(int c1, int c2, int c3);
+        int matchPunctuator(int& charPos);
         void record8(int);
 …
         void record16(UChar);
-        ALWAYS_INLINE JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer)
-        {
-            m_identifiers.append(JSC::Identifier(m_globalData, buffer.data(), buffer.size()));
-            return &m_identifiers.last();
-        }
+        void copyCodeWithoutBOMs();
+        int currentOffset() const;
+        JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer);
+        bool lastTokenWasRestrKeyword() const;
         static const size_t initialReadBufferCapacity = 32;
         static const size_t initialIdentifierTableCapacity = 64;
-        int yylineno;
-        int yycolumn;
+        int m_lineNumber;
         bool m_done;
 …
         Vector<UChar> m_buffer16;
         bool m_terminator;
-        bool m_restrKeyword;
         bool m_delimited; // encountered delimiter like "'" and "}" on last run
-        bool m_skipLF;
-        bool m_skipCR;
-        bool m_eatNextIdentifier;
-        int m_stackToken;
+        unsigned char m_skipLineEnd;
         int m_lastToken;
         State m_state;
-        unsigned int m_position;
         const SourceCode* m_source;
         const UChar* m_code;
-        unsigned int m_length;
+        const UChar* m_codeStart;
+        const UChar* m_codeEnd;
         bool m_isReparsing;
-        int m_atLineStart;
+        bool m_atLineStart;
         bool m_error;
 …
         int m_next3;
-        int m_currentOffset;
-        int m_nextOffset1;
-        int m_nextOffset2;
-        int m_nextOffset3;
         SegmentedVector<JSC::Identifier, initialIdentifierTableCapacity> m_identifiers;
 …
         UString m_flags;
+        const HashTable m_mainTable;
+        const HashTable m_keywordTable;
+        Vector<UChar> m_codeWithoutBOMs;
     };
+    inline bool Lexer::isWhiteSpace(int ch)
+    {
+        return isASCII(ch) ? (ch == ' ' || ch == '\t' || ch == 0xB || ch == 0xC) : WTF::Unicode::isSeparatorSpace(ch);
+    }
+    inline bool Lexer::isLineTerminator(int ch)
+    {
+        return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
+    }
+    inline unsigned char Lexer::convertHex(int c1, int c2)
+    {
+        return (toASCIIHexValue(c1) << 4) | toASCIIHexValue(c2);
+    }
+    inline UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
+    {
+        return (convertHex(c1, c2) << 8) | convertHex(c3, c4);
+    }
 } // namespace JSC

Changeset 43156 in webkit for trunk/JavaScriptCore/parser/Lexer.h
