source: webkit/trunk/JavaScriptCore/parser/Lexer.h@ 43156

Last change on this file since 43156 was 43156, checked in by Darin Adler, 16 years ago

2009-05-02 Darin Adler <Darin Adler>

Reviewed by Maciej Stachowiak.

Bug 25519: streamline lexer by handling BOMs differently
https://bugs.webkit.org/show_bug.cgi?id=25519

Roughly 1% faster SunSpider.
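Per the Lexer.cpp entries in this changeset, BOM handling moves out of the per-character shift functions. setCode now scans the source once for a BOM (U+FEFF) and calls copyCodeWithoutBOMs() only if it finds one. The sketch below illustrates that "scan once, copy only when needed" idea. It is a hypothetical standalone version, not WebKit's code: it assumes the source is a `std::u16string_view` rather than JSC's `SourceCode`/`UChar` types, and `copyWithoutBOMsIfNeeded` and `kByteOrderMark` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch (not WebKit's code): UTF-16 source held in a
// std::u16string_view instead of JSC's SourceCode/UChar types.
constexpr char16_t kByteOrderMark = 0xFEFF;

// Scan once for a BOM. Only if one exists, copy the source into `out` with
// every BOM removed. In the common BOM-free case this returns false and
// leaves `out` untouched, so no copy or allocation happens.
bool copyWithoutBOMsIfNeeded(std::u16string_view source, std::u16string& out)
{
    std::size_t firstBOM = source.find(kByteOrderMark);
    if (firstBOM == std::u16string_view::npos)
        return false; // Lex directly from the original buffer.

    out.assign(source.data(), firstBOM); // Prefix before the first BOM, unchanged.
    for (std::size_t i = firstBOM + 1; i < source.size(); ++i) {
        if (source[i] != kByteOrderMark)
            out.push_back(source[i]);
    }
    assert(out.size() < source.size()); // At least one BOM was dropped.
    return true;
}
```

Because the check happens once up front, the hot shift functions no longer need a per-character BOM test, which is presumably where much of the reported speedup comes from.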

  • parser/Grammar.y: Tweak formatting a bit.
  • parser/Lexer.cpp:
      (JSC::Lexer::Lexer): Removed unnecessary initialization of data members that are set up by setCode.
      (JSC::Lexer::currentOffset): Added. Used where the old code would look at m_currentOffset.
      (JSC::Lexer::shift1): Replaces the old shift function. No longer does anything to handle BOM characters.
      (JSC::Lexer::shift2): Ditto.
      (JSC::Lexer::shift3): Ditto.
      (JSC::Lexer::shift4): Ditto.
      (JSC::Lexer::setCode): Updated for name change from yylineno to m_lineNumber. Removed now-unused m_eatNextIdentifier, m_stackToken, and m_restrKeyword. Replaced m_skipLF and m_skipCR with m_skipLineEnd. Replaced the old m_length with m_codeEnd and m_currentOffset with m_codeStart. Added code to scan for a BOM character and call copyCodeWithoutBOMs() if we find any.
      (JSC::Lexer::copyCodeWithoutBOMs): Added.
      (JSC::Lexer::nextLine): Updated for name change from yylineno to m_lineNumber.
      (JSC::Lexer::makeIdentifier): Moved higher up in the file.
      (JSC::Lexer::matchPunctuator): Moved higher up in the file and changed to use a switch statement instead of a chain of if statements.
      (JSC::Lexer::isLineTerminator): Moved higher up in the file and changed to have fewer branches.
      (JSC::Lexer::lastTokenWasRestrKeyword): Added. This replaces the old m_restrKeyword boolean.
      (JSC::Lexer::isIdentStart): Moved higher up in the file. Changed to use fewer branches in the ASCII-but-not-identifier case.
      (JSC::Lexer::isIdentPart): Ditto.
      (JSC::Lexer::singleEscape): Moved higher up in the file.
      (JSC::Lexer::convertOctal): Moved higher up in the file.
      (JSC::Lexer::convertHex): Moved higher up in the file. Changed to use toASCIIHexValue instead of rolling our own here.
      (JSC::Lexer::convertUnicode): Ditto.
      (JSC::Lexer::record8): Moved higher up in the file.
      (JSC::Lexer::record16): Moved higher up in the file.
      (JSC::Lexer::lex): Changed type of stringType to int. Replaced m_skipLF and m_skipCR with m_skipLineEnd, which requires fewer branches in the main lexer loop. Use currentOffset instead of m_currentOffset. Removed unneeded m_stackToken. Use isASCIIDigit instead of isDecimalDigit. Split out the two cases for InIdentifierOrKeyword and InIdentifier. Added special-case tight loops for identifiers and other simple states. Used a goto to remove a branch from the code that sets m_atLineStart to false. Streamlined the number-handling code so we don't check for the same types twice for non-numeric cases and don't add a null to m_buffer8 when it's not being used. Removed m_eatNextIdentifier, which wasn't working anyway, and m_restrKeyword, which is redundant with m_lastToken. Set the m_delimited flag without using a branch.
      (JSC::Lexer::scanRegExp): Tweaked style a bit.
      (JSC::Lexer::clear): Clear m_codeWithoutBOMs so we don't use memory after parsing. Clear out UString objects in the more conventional way.
      (JSC::Lexer::sourceCode): Made this no longer inline since it has more work to do in the case where we stripped BOMs.
  • parser/Lexer.h: Renamed yylineno to m_lineNumber. Removed the single-character convertHex function, which is the same as toASCIIHexValue. Removed isHexDigit function, which is the same as isASCIIHexDigit. Replaced shift with four separate shift functions. Removed the isWhiteSpace overload that tested m_current implicitly; callers now pass m_current explicitly. Removed isOctalDigit, which is the same as isASCIIOctalDigit. Eliminated unused arguments from matchPunctuator. Added copyCodeWithoutBOMs and currentOffset. Moved the definition of makeIdentifier out of the header. Added lastTokenWasRestrKeyword function. Added new constants for m_skipLineEnd. Removed unused yycolumn, m_restrKeyword, m_skipLF, m_skipCR, m_eatNextIdentifier, m_stackToken, m_position, m_length, m_currentOffset, m_nextOffset1, m_nextOffset2, m_nextOffset3. Added m_skipLineEnd, m_codeStart, m_codeEnd, and m_codeWithoutBOMs.
  • parser/SourceProvider.h: Added hasBOMs function. In the future this can be used to tell the lexer about strings known not to have BOMs.
  • runtime/JSGlobalObjectFunctions.cpp: (JSC::globalFuncUnescape): Changed to use isASCIIHexDigit.
  • wtf/ASCIICType.h: Added using statements to match the design of the other WTF headers.
  • Property svn:eol-style set to native
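The lex entry replaces the m_skipLF/m_skipCR pair with a single m_skipLineEnd, so the main loop needs one comparison instead of two flag tests. A minimal sketch of that idea, written as a standalone line counter, follows. It is hypothetical, not WebKit's code: `countLines` and the `-1` "nothing to skip" sentinel are assumptions (the real member is an `unsigned char` with its own constants), and it assumes, as the two old flags suggest, that both "\r\n" and "\n\r" count as one line end. ECMAScript's own LineTerminatorSequence only pairs CR LF. The `isLineTerminator` predicate is copied from Lexer.h.

```cpp
#include <string_view>

// Same predicate as Lexer::isLineTerminator(int) in Lexer.h.
constexpr bool isLineTerminator(int ch)
{
    return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
}

// Hypothetical sketch of the single m_skipLineEnd idea (not WebKit's code).
// After '\r' we remember '\n', and after '\n' we remember '\r', so the second
// character of a "\r\n" or "\n\r" pair is swallowed instead of starting
// another line. -1 means "nothing to skip"; no UTF-16 unit can equal it.
constexpr int countLines(std::u16string_view source)
{
    int line = 1;
    int skipLineEnd = -1;
    for (char16_t ch : source) {
        bool skip = (ch == skipLineEnd); // a single compare per character
        skipLineEnd = -1;
        if (skip)
            continue;
        if (isLineTerminator(ch)) {
            ++line;
            if (ch == '\r')
                skipLineEnd = '\n';
            else if (ch == '\n')
                skipLineEnd = '\r';
        }
    }
    return line;
}
```

Because the pending partner character is stored as a value rather than as two booleans, "is this the second half of a pair?" collapses into one equality test, which matches the changeset's stated goal of fewer branches in the main loop.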
File size: 5.2 KB
/*
 * Copyright (C) 1999-2000 Harri Porten ([email protected])
 * Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public License
 * along with this library; see the file COPYING.LIB. If not, write to
 * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 * Boston, MA 02110-1301, USA.
 *
 */

#ifndef Lexer_h
#define Lexer_h

#include "Lookup.h"
#include "SegmentedVector.h"
#include "SourceCode.h"
#include <wtf/ASCIICType.h>
#include <wtf/Vector.h>
#include <wtf/unicode/Unicode.h>

namespace JSC {

    class RegExp;

    class Lexer : Noncopyable {
    public:
        // Character manipulation functions.
        static bool isWhiteSpace(int character);
        static bool isLineTerminator(int character);
        static unsigned char convertHex(int c1, int c2);
        static UChar convertUnicode(int c1, int c2, int c3, int c4);

        // Functions to set up parsing.
        void setCode(const SourceCode&);
        void setIsReparsing() { m_isReparsing = true; }

        // Functions for the parser itself.
        int lex(void* lvalp, void* llocp);
        int lineNumber() const { return m_lineNumber; }
        bool prevTerminator() const { return m_terminator; }
        SourceCode sourceCode(int openBrace, int closeBrace, int firstLine);
        bool scanRegExp();
        const UString& pattern() const { return m_pattern; }
        const UString& flags() const { return m_flags; }

        // Functions for use after parsing.
        bool sawError() const { return m_error; }
        void clear();

    private:
        friend class JSGlobalData;

        Lexer(JSGlobalData*);
        ~Lexer();

        enum State {
            Start,
            IdentifierOrKeyword,
            Identifier,
            InIdentifierOrKeyword,
            InIdentifier,
            InIdentifierStartUnicodeEscapeStart,
            InIdentifierStartUnicodeEscape,
            InIdentifierPartUnicodeEscapeStart,
            InIdentifierPartUnicodeEscape,
            InSingleLineComment,
            InMultiLineComment,
            InNum,
            InNum0,
            InHex,
            InOctal,
            InDecimal,
            InExponentIndicator,
            InExponent,
            Hex,
            Octal,
            Number,
            String,
            Eof,
            InString,
            InEscapeSequence,
            InHexEscape,
            InUnicodeEscape,
            Other,
            Bad
        };

        void setDone(State);
        void shift1();
        void shift2();
        void shift3();
        void shift4();
        void nextLine();
        int lookupKeyword(const char *);

        bool isLineTerminator();

        int matchPunctuator(int& charPos);

        void record8(int);
        void record16(int);
        void record16(UChar);

        void copyCodeWithoutBOMs();

        int currentOffset() const;

        JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer);

        bool lastTokenWasRestrKeyword() const;

        static const size_t initialReadBufferCapacity = 32;
        static const size_t initialIdentifierTableCapacity = 64;

        int m_lineNumber;

        bool m_done;
        Vector<char> m_buffer8;
        Vector<UChar> m_buffer16;
        bool m_terminator;
        bool m_delimited; // encountered delimiter like "'" and "}" on last run
        unsigned char m_skipLineEnd;
        int m_lastToken;

        State m_state;
        const SourceCode* m_source;
        const UChar* m_code;
        const UChar* m_codeStart;
        const UChar* m_codeEnd;
        bool m_isReparsing;
        bool m_atLineStart;
        bool m_error;

        // current and following unicode characters (int to allow for -1 for end-of-file marker)
        int m_current;
        int m_next1;
        int m_next2;
        int m_next3;

        SegmentedVector<JSC::Identifier, initialIdentifierTableCapacity> m_identifiers;

        JSGlobalData* m_globalData;

        UString m_pattern;
        UString m_flags;

        const HashTable m_keywordTable;

        Vector<UChar> m_codeWithoutBOMs;
    };

    inline bool Lexer::isWhiteSpace(int ch)
    {
        return isASCII(ch) ? (ch == ' ' || ch == '\t' || ch == 0xB || ch == 0xC) : WTF::Unicode::isSeparatorSpace(ch);
    }

    inline bool Lexer::isLineTerminator(int ch)
    {
        return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
    }

    inline unsigned char Lexer::convertHex(int c1, int c2)
    {
        return (toASCIIHexValue(c1) << 4) | toASCIIHexValue(c2);
    }

    inline UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
    {
        return (convertHex(c1, c2) << 8) | convertHex(c3, c4);
    }

} // namespace JSC

#endif // Lexer_h
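The inline convertHex and convertUnicode helpers above just combine hex-digit values: two digits make one byte (as in a "\x41" escape), and two such bytes make one UTF-16 code unit (as in "\u2028"). The sketch below reproduces that arithmetic in standalone form so it can be checked without WebKit. The `toASCIIHexValue` here is a hypothetical stand-in for the WTF function of the same name in wtf/ASCIICType.h, and `UChar` is assumed to be `char16_t`; neither is the real WTF definition.

```cpp
// Stand-in for WTF's UChar: one UTF-16 code unit.
using UChar = char16_t;

// Hypothetical stand-in for toASCIIHexValue from wtf/ASCIICType.h,
// reimplemented so the sketch compiles without WebKit. Precondition: c is an
// ASCII hex digit ('0'-'9', 'a'-'f', 'A'-'F'); callers are expected to have
// validated the digits before converting.
constexpr int toASCIIHexValue(int c)
{
    // Digits map directly. For letters, 'A'/'a' -> 10 ... 'F'/'f' -> 15:
    // the & 0xF folds the lowercase range onto the uppercase one.
    return c < 'A' ? c - '0' : (c - 'A' + 10) & 0xF;
}

// Same arithmetic as Lexer::convertHex: two hex digits -> one byte.
constexpr unsigned char convertHex(int c1, int c2)
{
    return static_cast<unsigned char>((toASCIIHexValue(c1) << 4) | toASCIIHexValue(c2));
}

// Same arithmetic as Lexer::convertUnicode: four hex digits -> one UTF-16 unit.
constexpr UChar convertUnicode(int c1, int c2, int c3, int c4)
{
    return static_cast<UChar>((convertHex(c1, c2) << 8) | convertHex(c3, c4));
}
```

For example, `convertHex('4', '1')` yields 0x41 ('A'), and `convertUnicode('2', '0', '2', '8')` yields 0x2028, one of the characters `isLineTerminator` accepts.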