Changeset 43156 in webkit for trunk/JavaScriptCore/parser/Lexer.h


Timestamp:
May 3, 2009, 9:49:35 AM (16 years ago)
Author:
Darin Adler
Message:

2009-05-02 Darin Adler <Darin Adler>

Reviewed by Maciej Stachowiak.

Bug 25519: streamline lexer by handling BOMs differently
https://bugs.webkit.org/show_bug.cgi?id=25519

Roughly 1% faster SunSpider.

  • parser/Grammar.y: Tweak formatting a bit.
  • parser/Lexer.cpp:
      (JSC::Lexer::Lexer): Remove unnecessary initialization of data members that are set up by setCode.
      (JSC::Lexer::currentOffset): Added. Used where the old code would look at m_currentOffset.
      (JSC::Lexer::shift1): Replaces the old shift function. No longer does anything to handle BOM characters.
      (JSC::Lexer::shift2): Ditto.
      (JSC::Lexer::shift3): Ditto.
      (JSC::Lexer::shift4): Ditto.
      (JSC::Lexer::setCode): Updated for name change from yylineno to m_lineNumber. Removed now-unused m_eatNextIdentifier, m_stackToken, and m_restrKeyword. Replaced m_skipLF and m_skipCR with m_skipLineEnd. Replaced the old m_length with m_codeEnd and m_currentOffset with m_codeStart. Added code to scan for a BOM character and call copyCodeWithoutBOMs() if we find any.
      (JSC::Lexer::copyCodeWithoutBOMs): Added.
      (JSC::Lexer::nextLine): Updated for name change from yylineno to m_lineNumber.
      (JSC::Lexer::makeIdentifier): Moved up higher in the file.
      (JSC::Lexer::matchPunctuator): Moved up higher in the file and changed to use a switch statement instead of just if statements.
      (JSC::Lexer::isLineTerminator): Moved up higher in the file and changed to have fewer branches.
      (JSC::Lexer::lastTokenWasRestrKeyword): Added. This replaces the old m_restrKeyword boolean.
      (JSC::Lexer::isIdentStart): Moved up higher in the file. Changed to use fewer branches in the ASCII but not identifier case.
      (JSC::Lexer::isIdentPart): Ditto.
      (JSC::Lexer::singleEscape): Moved up higher in the file.
      (JSC::Lexer::convertOctal): Moved up higher in the file.
      (JSC::Lexer::convertHex): Moved up higher in the file. Changed to use toASCIIHexValue instead of rolling our own here.
      (JSC::Lexer::convertUnicode): Ditto.
      (JSC::Lexer::record8): Moved up higher in the file.
      (JSC::Lexer::record16): Moved up higher in the file.
      (JSC::Lexer::lex): Changed type of stringType to int. Replaced m_skipLF and m_skipCR with m_skipLineEnd, which requires fewer branches in the main lexer loop. Use currentOffset instead of m_currentOffset. Removed unneeded m_stackToken. Use isASCIIDigit instead of isDecimalDigit. Split out the two cases for InIdentifierOrKeyword and InIdentifier. Added special case tight loops for identifiers and other simple states. Removed a branch from the code that sets m_atLineStart to false using goto. Streamlined the number-handling code so we don't check for the same types twice for non-numeric cases and don't add a null to m_buffer8 when it's not being used. Removed m_eatNextIdentifier, which wasn't working anyway, and m_restrKeyword, which is redundant with m_lastToken. Set the m_delimited flag without using a branch.
      (JSC::Lexer::scanRegExp): Tweaked style a bit.
      (JSC::Lexer::clear): Clear m_codeWithoutBOMs so we don't use memory after parsing. Clear out UString objects in the more conventional way.
      (JSC::Lexer::sourceCode): Made this no longer inline since it has more work to do in the case where we stripped BOMs.
  • parser/Lexer.h: Renamed yylineno to m_lineNumber. Removed convertHex function, which is the same as toASCIIHexValue. Removed isHexDigit function, which is the same as isASCIIHexDigit. Replaced shift with four separate shift functions. Removed the isWhiteSpace overload that implicitly tested m_current; callers now pass m_current explicitly. Removed isOctalDigit, which is the same as isASCIIOctalDigit. Eliminated unused arguments from matchPunctuator. Added copyCodeWithoutBOMs and currentOffset. Moved the makeIdentifier function out of the header. Added lastTokenWasRestrKeyword function. Added new constants for m_skipLineEnd. Removed unused yycolumn, m_restrKeyword, m_skipLF, m_skipCR, m_eatNextIdentifier, m_stackToken, m_position, m_length, m_currentOffset, m_nextOffset1, m_nextOffset2, m_nextOffset3. Added m_skipLineEnd, m_codeStart, m_codeEnd, and m_codeWithoutBOMs.
  • parser/SourceProvider.h: Added hasBOMs function. In the future this can be used to tell the lexer about strings known not to have BOMs.
  • runtime/JSGlobalObjectFunctions.cpp: (JSC::globalFuncUnescape): Changed to use isASCIIHexDigit.
  • wtf/ASCIICType.h: Added using statements to match the design of the other WTF headers.
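The BOM handling described for setCode and copyCodeWithoutBOMs can be sketched roughly as follows. This is only an illustration of the idea (scan once up front, copy only when a BOM is found, so the per-character shift functions no longer need a BOM check); it is not the code from Lexer.cpp. The names CodeUnit, containsBOM, and copyWithoutBOMs are hypothetical, and std::vector stands in for WTF's Vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for WebKit's UChar (one UTF-16 code unit).
using CodeUnit = char16_t;

constexpr CodeUnit byteOrderMark = 0xFEFF;

// Returns true if any code unit in [start, end) is a BOM.
// In the common case (no BOMs) this single scan is the only extra cost.
bool containsBOM(const CodeUnit* start, const CodeUnit* end)
{
    for (const CodeUnit* p = start; p < end; ++p) {
        if (*p == byteOrderMark)
            return true;
    }
    return false;
}

// Copies [start, end) into a new buffer, skipping every BOM.
// Assumed to be called only when containsBOM() returned true.
std::vector<CodeUnit> copyWithoutBOMs(const CodeUnit* start, const CodeUnit* end)
{
    std::vector<CodeUnit> result;
    result.reserve(static_cast<std::size_t>(end - start));
    for (const CodeUnit* p = start; p < end; ++p) {
        if (*p != byteOrderMark)
            result.push_back(*p);
    }
    return result;
}
```

One consequence of this design, which the sourceCode entry in the message hints at, is that offsets into the stripped copy no longer match offsets into the original source whenever BOMs were removed, so anything that maps lexer positions back to the original text has more work to do in that case.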
File:
1 edited

  • trunk/JavaScriptCore/parser/Lexer.h

--- trunk/JavaScriptCore/parser/Lexer.h (r43144)
+++ trunk/JavaScriptCore/parser/Lexer.h (r43156)
@@ -1,5 +1,5 @@
 /*
  *  Copyright (C) 1999-2000 Harri Porten ([email protected])
- *  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
+ *  Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
  *
  *  This library is free software; you can redistribute it and/or
@@ -23,8 +23,8 @@
 #define Lexer_h
 
-#include "Identifier.h"
 #include "Lookup.h"
 #include "SegmentedVector.h"
 #include "SourceCode.h"
+#include <wtf/ASCIICType.h>
 #include <wtf/Vector.h>
 #include <wtf/unicode/Unicode.h>
@@ -36,11 +36,32 @@
     class Lexer : Noncopyable {
     public:
+        // Character manipulation functions.
+        static bool isWhiteSpace(int character);
+        static bool isLineTerminator(int character);
+        static unsigned char convertHex(int c1, int c2);
+        static UChar convertUnicode(int c1, int c2, int c3, int c4);
+
+        // Functions to set up parsing.
         void setCode(const SourceCode&);
         void setIsReparsing() { m_isReparsing = true; }
+
+        // Functions for the parser itself.
         int lex(void* lvalp, void* llocp);
+        int lineNumber() const { return m_lineNumber; }
+        bool prevTerminator() const { return m_terminator; }
+        SourceCode sourceCode(int openBrace, int closeBrace, int firstLine);
+        bool scanRegExp();
+        const UString& pattern() const { return m_pattern; }
+        const UString& flags() const { return m_flags; }
 
-        int lineNo() const { return yylineno; }
+        // Functions for use after parsing.
+        bool sawError() const { return m_error; }
+        void clear();
 
-        bool prevTerminator() const { return m_terminator; }
+    private:
+        friend class JSGlobalData;
+
+        Lexer(JSGlobalData*);
+        ~Lexer();
 
         enum State {
@@ -76,47 +97,15 @@
         };
 
-        bool scanRegExp();
-        const UString& pattern() const { return m_pattern; }
-        const UString& flags() const { return m_flags; }
-
-        static unsigned char convertHex(int);
-        static unsigned char convertHex(int c1, int c2);
-        static UChar convertUnicode(int c1, int c2, int c3, int c4);
-        static bool isIdentStart(int);
-        static bool isIdentPart(int);
-        static bool isHexDigit(int);
-
-        bool sawError() const { return m_error; }
-
-        void clear();
-        SourceCode sourceCode(int openBrace, int closeBrace, int firstLine) { return SourceCode(m_source->provider(), openBrace, closeBrace + 1, firstLine); }
-
-        static inline bool isWhiteSpace(int ch)
-        {
-            return ch == '\t' || ch == 0x0b || ch == 0x0c || WTF::Unicode::isSeparatorSpace(ch);
-        }
-
-        static inline bool isLineTerminator(int ch)
-        {
-            return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
-        }
-
-    private:
-        friend class JSGlobalData;
-        Lexer(JSGlobalData*);
-        ~Lexer();
-
         void setDone(State);
-        void shift(unsigned int p);
+        void shift1();
+        void shift2();
+        void shift3();
+        void shift4();
         void nextLine();
         int lookupKeyword(const char *);
 
-        bool isWhiteSpace() const;
         bool isLineTerminator();
-        static bool isOctalDigit(int);
 
-        ALWAYS_INLINE int matchPunctuator(int& charPos, int c1, int c2, int c3, int c4);
-        static unsigned short singleEscape(unsigned short);
-        static unsigned short convertOctal(int c1, int c2, int c3);
+        int matchPunctuator(int& charPos);
 
         void record8(int);
@@ -124,15 +113,16 @@
         void record16(UChar);
 
-        ALWAYS_INLINE JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer)
-        {
-            m_identifiers.append(JSC::Identifier(m_globalData, buffer.data(), buffer.size()));
-            return &m_identifiers.last();
-        }
+        void copyCodeWithoutBOMs();
+
+        int currentOffset() const;
+
+        JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer);
+
+        bool lastTokenWasRestrKeyword() const;
 
         static const size_t initialReadBufferCapacity = 32;
         static const size_t initialIdentifierTableCapacity = 64;
 
-        int yylineno;
-        int yycolumn;
+        int m_lineNumber;
 
         bool m_done;
@@ -140,19 +130,15 @@
         Vector<UChar> m_buffer16;
         bool m_terminator;
-        bool m_restrKeyword;
         bool m_delimited; // encountered delimiter like "'" and "}" on last run
-        bool m_skipLF;
-        bool m_skipCR;
-        bool m_eatNextIdentifier;
-        int m_stackToken;
+        unsigned char m_skipLineEnd;
         int m_lastToken;
 
         State m_state;
-        unsigned int m_position;
         const SourceCode* m_source;
         const UChar* m_code;
-        unsigned int m_length;
+        const UChar* m_codeStart;
+        const UChar* m_codeEnd;
         bool m_isReparsing;
-        int m_atLineStart;
+        bool m_atLineStart;
         bool m_error;
 
@@ -163,9 +149,4 @@
         int m_next3;
 
-        int m_currentOffset;
-        int m_nextOffset1;
-        int m_nextOffset2;
-        int m_nextOffset3;
-
         SegmentedVector<JSC::Identifier, initialIdentifierTableCapacity> m_identifiers;
 
@@ -175,6 +156,28 @@
         UString m_flags;
 
-        const HashTable m_mainTable;
+        const HashTable m_keywordTable;
+
+        Vector<UChar> m_codeWithoutBOMs;
     };
+
+    inline bool Lexer::isWhiteSpace(int ch)
+    {
+        return isASCII(ch) ? (ch == ' ' || ch == '\t' || ch == 0xB || ch == 0xC) : WTF::Unicode::isSeparatorSpace(ch);
+    }
+
+    inline bool Lexer::isLineTerminator(int ch)
+    {
+        return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
+    }
+
+    inline unsigned char Lexer::convertHex(int c1, int c2)
+    {
+        return (toASCIIHexValue(c1) << 4) | toASCIIHexValue(c2);
+    }
+
+    inline UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
+    {
+        return (convertHex(c1, c2) << 8) | convertHex(c3, c4);
+    }
 
 } // namespace JSC
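
The new inline convertHex and convertUnicode build a value from hex digit characters by shifting and OR-ing. A standalone sketch of the same arithmetic follows, for readers without the WTF headers at hand. hexDigitValue is a hypothetical stand-in for toASCIIHexValue and assumes its argument is already a valid hex digit; hexPair and unicodeEscape are likewise illustrative names, not WebKit API.

```cpp
// Hypothetical stand-in for WTF's toASCIIHexValue.
// Assumes c is one of 0-9, A-F, a-f; no validation is done here.
inline int hexDigitValue(int c)
{
    // The low nibble of '0'..'9' is 0..9; the low nibble of 'A'..'F' and
    // 'a'..'f' is 1..6, so letters need 9 added.
    return (c & 0xF) + (c < 'A' ? 0 : 9);
}

// Two hex digits form one byte: high nibble from c1, low nibble from c2.
inline unsigned char hexPair(int c1, int c2)
{
    return static_cast<unsigned char>((hexDigitValue(c1) << 4) | hexDigitValue(c2));
}

// Four hex digits (as in a \uXXXX escape) form one UTF-16 code unit.
inline char16_t unicodeEscape(int c1, int c2, int c3, int c4)
{
    return static_cast<char16_t>((hexPair(c1, c2) << 8) | hexPair(c3, c4));
}
```

For instance, under these assumptions the digits `2`, `0`, `2`, `8` combine to 0x2028, the line separator that isLineTerminator checks for.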