Lexer.h@ 43156

Last change on this file since 43156 was 43156, checked in by Darin Adler, 16 years ago

2009-05-02 Darin Adler <Darin Adler>

Reviewed by Maciej Stachowiak.

Bug 25519: streamline lexer by handling BOMs differently
https://bugs.webkit.org/show_bug.cgi?id=25519

Roughly 1% faster SunSpider.

parser/Grammar.y: Tweak formatting a bit.

parser/Lexer.cpp:
(JSC::Lexer::Lexer): Remove unnecessary initialization of data members that are set up by setCode.
(JSC::Lexer::currentOffset): Added. Used where the old code would look at m_currentOffset.
(JSC::Lexer::shift1): Replaces the old shift function. No longer does anything to handle BOM characters.
(JSC::Lexer::shift2): Ditto.
(JSC::Lexer::shift3): Ditto.
(JSC::Lexer::shift4): Ditto.
(JSC::Lexer::setCode): Updated for name change from yylineno to m_lineNumber. Removed now-unused m_eatNextIdentifier, m_stackToken, and m_restrKeyword. Replaced m_skipLF and m_skipCR with m_skipLineEnd. Replaced the old m_length with m_codeEnd and m_currentOffset with m_codeStart. Added code to scan for a BOM character and call copyCodeWithoutBOMs() if we find any.
(JSC::Lexer::copyCodeWithoutBOMs): Added.
(JSC::Lexer::nextLine): Updated for name change from yylineno to m_lineNumber.
(JSC::Lexer::makeIdentifier): Moved up higher in the file.
(JSC::Lexer::matchPunctuator): Moved up higher in the file and changed to use a switch statement instead of just if statements.
(JSC::Lexer::isLineTerminator): Moved up higher in the file and changed to have fewer branches.
(JSC::Lexer::lastTokenWasRestrKeyword): Added. This replaces the old m_restrKeyword boolean.
(JSC::Lexer::isIdentStart): Moved up higher in the file. Changed to use fewer branches in the ASCII but not identifier case.
(JSC::Lexer::isIdentPart): Ditto.
(JSC::Lexer::singleEscape): Moved up higher in the file.
(JSC::Lexer::convertOctal): Moved up higher in the file.
(JSC::Lexer::convertHex): Moved up higher in the file. Changed to use toASCIIHexValue instead of rolling our own here.
(JSC::Lexer::convertUnicode): Ditto.
(JSC::Lexer::record8): Moved up higher in the file.
(JSC::Lexer::record16): Moved up higher in the file.
(JSC::Lexer::lex): Changed type of stringType to int. Replaced m_skipLF and m_skipCR with m_skipLineEnd, which requires fewer branches in the main lexer loop. Use currentOffset instead of m_currentOffset. Removed unneeded m_stackToken. Use isASCIIDigit instead of isDecimalDigit. Split out the two cases for InIdentifierOrKeyword and InIdentifier. Added special case tight loops for identifiers and other simple states. Removed a branch from the code that sets m_atLineStart to false using goto. Streamlined the number-handling code so we don't check for the same types twice for non-numeric cases and don't add a null to m_buffer8 when it's not being used. Removed m_eatNextIdentifier, which wasn't working anyway, and m_restrKeyword, which is redundant with m_lastToken. Set the m_delimited flag without using a branch.
(JSC::Lexer::scanRegExp): Tweaked style a bit.
(JSC::Lexer::clear): Clear m_codeWithoutBOMs so we don't use memory after parsing. Clear out UString objects in the more conventional way.
(JSC::Lexer::sourceCode): Made this no longer inline since it has more work to do in the case where we stripped BOMs.
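
The BOM handling described for setCode and copyCodeWithoutBOMs can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from Lexer.cpp: std::u16string stands in for the UChar buffers (the header stores the stripped copy in m_codeWithoutBOMs), and the names byteOrderMark, containsBOM, copyWithoutBOMs, and prepareSource are hypothetical. The idea is to scan once up front and copy only when a BOM is actually present, so the per-character shift functions never need to test for one.

```cpp
#include <cassert>
#include <string>

// U+FEFF, the byte order mark code unit (hypothetical constant name).
constexpr char16_t byteOrderMark = 0xFEFF;

// One linear scan to find out whether the source contains any BOM at all.
bool containsBOM(const std::u16string& code)
{
    return code.find(byteOrderMark) != std::u16string::npos;
}

// Builds a copy of the source with every BOM code unit dropped.
std::u16string copyWithoutBOMs(const std::u16string& code)
{
    std::u16string result;
    result.reserve(code.size());
    for (char16_t c : code) {
        if (c != byteOrderMark)
            result.push_back(c);
    }
    assert(!containsBOM(result)); // postcondition: nothing left to skip later
    return result;
}

// Assumed setup step: pay for the copy only when a BOM was found.
std::u16string prepareSource(const std::u16string& code)
{
    if (containsBOM(code))
        return copyWithoutBOMs(code);
    return code;
}
```

With this shape, source text without BOMs (presumably the common case) costs one scan, and the main lexer loop carries no BOM branch.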

parser/Lexer.h: Renamed yylineno to m_lineNumber. Removed convertHex function, which is the same as toASCIIHexValue. Removed isHexDigit function, which is the same as isASCIIHexDigit. Replaced shift with four separate shift functions. Removed isWhiteSpace function that passes m_current, instead just passing m_current explicitly. Removed isOctalDigit, which is the same as isASCIIOctalDigit. Eliminated unused arguments from matchPunctuator. Added copyCodeWithoutBOMs and currentOffset. Moved the makeIdentifier function out of the header. Added lastTokenWasRestrKeyword function. Added new constants for m_skipLineEnd. Removed unused yycolumn, m_restrKeyword, m_skipLF, m_skipCR, m_eatNextIdentifier, m_stackToken, m_position, m_length, m_currentOffset, m_nextOffset1, m_nextOffset2, m_nextOffset3. Added m_skipLineEnd, m_codeStart, m_codeEnd, and m_codeWithoutBOMs.
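
As a rough illustration of the lookahead members declared in the header (m_current, m_next1, m_next2, m_next3, m_code, m_codeEnd), a one-character shift might look like the sketch below. The struct LookaheadWindow and its read helper are hypothetical, std::u16string stands in for the UChar buffer, and the body of the real shift1 in Lexer.cpp is not shown on this page, so treat this as an assumption about its general shape. The only detail taken from the header is that -1 marks end of file.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the lexer's four-character lookahead window.
// The source string must outlive the window, since only pointers are kept.
struct LookaheadWindow {
    const char16_t* code;    // next unread code unit
    const char16_t* codeEnd; // one past the last code unit
    int current = -1;        // int so that -1 can mark end of file
    int next1 = -1;
    int next2 = -1;
    int next3 = -1;

    explicit LookaheadWindow(const std::u16string& source)
        : code(source.data())
        , codeEnd(source.data() + source.size())
    {
        // Four shifts fill the window from an empty state.
        for (int i = 0; i < 4; ++i)
            shift1();
    }

    // Reads one code unit, or -1 once the input is exhausted.
    int read()
    {
        return code < codeEnd ? *code++ : -1;
    }

    // Advances the window by one character. There is no BOM test here,
    // which assumes BOMs were stripped before lexing started.
    void shift1()
    {
        current = next1;
        next1 = next2;
        next2 = next3;
        next3 = read();
    }
};
```

A shift by two, three, or four characters would follow the same pattern with more reads per call, which is presumably why the change splits the old shift into four functions.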

parser/SourceProvider.h: Added hasBOMs function. In the future this can be used to tell the lexer about strings known not to have BOMs.

runtime/JSGlobalObjectFunctions.cpp: (JSC::globalFuncUnescape): Changed to use isASCIIHexDigit.

wtf/ASCIICType.h: Added using statements to match the design of the other WTF headers.

Property svn:eol-style set to native

File size: 5.2 KB

/*
 * Copyright (C) 1999-2000 Harri Porten ([email protected])
 * Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public License
 * along with this library; see the file COPYING.LIB. If not, write to
 * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 * Boston, MA 02110-1301, USA.
 *
 */

#ifndef Lexer_h
#define Lexer_h

#include "Lookup.h"
#include "SegmentedVector.h"
#include "SourceCode.h"
#include <wtf/ASCIICType.h>
#include <wtf/Vector.h>
#include <wtf/unicode/Unicode.h>

namespace JSC {

class RegExp;

class Lexer : Noncopyable {
public:
    // Character manipulation functions.
    static bool isWhiteSpace(int character);
    static bool isLineTerminator(int character);
    static unsigned char convertHex(int c1, int c2);
    static UChar convertUnicode(int c1, int c2, int c3, int c4);

    // Functions to set up parsing.
    void setCode(const SourceCode&);
    void setIsReparsing() { m_isReparsing = true; }

    // Functions for the parser itself.
    int lex(void* lvalp, void* llocp);
    int lineNumber() const { return m_lineNumber; }
    bool prevTerminator() const { return m_terminator; }
    SourceCode sourceCode(int openBrace, int closeBrace, int firstLine);
    bool scanRegExp();
    const UString& pattern() const { return m_pattern; }
    const UString& flags() const { return m_flags; }

    // Functions for use after parsing.
    bool sawError() const { return m_error; }
    void clear();

private:
    friend class JSGlobalData;

    Lexer(JSGlobalData*);
    ~Lexer();

    enum State {
        Start,
        IdentifierOrKeyword,
        Identifier,
        InIdentifierOrKeyword,
        InIdentifier,
        InIdentifierStartUnicodeEscapeStart,
        InIdentifierStartUnicodeEscape,
        InIdentifierPartUnicodeEscapeStart,
        InIdentifierPartUnicodeEscape,
        InSingleLineComment,
        InMultiLineComment,
        InNum,
        InNum0,
        InHex,
        InOctal,
        InDecimal,
        InExponentIndicator,
        InExponent,
        Hex,
        Octal,
        Number,
        String,
        Eof,
        InString,
        InEscapeSequence,
        InHexEscape,
        InUnicodeEscape,
        Other,
        Bad
    };

    void setDone(State);
    void shift1();
    void shift2();
    void shift3();
    void shift4();
    void nextLine();
    int lookupKeyword(const char *);

    bool isLineTerminator();

    int matchPunctuator(int& charPos);

    void record8(int);
    void record16(int);
    void record16(UChar);

    void copyCodeWithoutBOMs();

    int currentOffset() const;

    JSC::Identifier* makeIdentifier(const Vector<UChar>& buffer);

    bool lastTokenWasRestrKeyword() const;

    static const size_t initialReadBufferCapacity = 32;
    static const size_t initialIdentifierTableCapacity = 64;

    int m_lineNumber;

    bool m_done;
    Vector<char> m_buffer8;
    Vector<UChar> m_buffer16;
    bool m_terminator;
    bool m_delimited; // encountered delimiter like "'" and "}" on last run
    unsigned char m_skipLineEnd;
    int m_lastToken;

    State m_state;
    const SourceCode* m_source;
    const UChar* m_code;
    const UChar* m_codeStart;
    const UChar* m_codeEnd;
    bool m_isReparsing;
    bool m_atLineStart;
    bool m_error;

    // current and following unicode characters (int to allow for -1 for end-of-file marker)
    int m_current;
    int m_next1;
    int m_next2;
    int m_next3;

    SegmentedVector<JSC::Identifier, initialIdentifierTableCapacity> m_identifiers;

    JSGlobalData* m_globalData;

    UString m_pattern;
    UString m_flags;

    const HashTable m_keywordTable;

    Vector<UChar> m_codeWithoutBOMs;
};

inline bool Lexer::isWhiteSpace(int ch)
{
    return isASCII(ch) ? (ch == ' ' || ch == '\t' || ch == 0xB || ch == 0xC) : WTF::Unicode::isSeparatorSpace(ch);
}

inline bool Lexer::isLineTerminator(int ch)
{
    return ch == '\r' || ch == '\n' || ch == 0x2028 || ch == 0x2029;
}

inline unsigned char Lexer::convertHex(int c1, int c2)
{
    return (toASCIIHexValue(c1) << 4) | toASCIIHexValue(c2);
}

inline UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
{
    return (convertHex(c1, c2) << 8) | convertHex(c3, c4);
}

} // namespace JSC

#endif // Lexer_h
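
To make the arithmetic in convertHex and convertUnicode concrete, here is a small standalone sketch that mirrors the two inline functions above outside of WebKit. The helper hexValue is a hypothetical stand-in for WTF's toASCIIHexValue, and char16_t stands in for UChar; both are assumptions made only so that the example compiles on its own. Each pair of hex digits becomes one byte (high nibble shifted left by 4), and two such bytes become one UTF-16 code unit (high byte shifted left by 8).

```cpp
#include <cassert>

// Hypothetical stand-in for WTF's toASCIIHexValue: maps one ASCII hex
// digit to its numeric value. The caller must pass a valid hex digit.
int hexValue(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    assert(c >= 'A' && c <= 'F');
    return c - 'A' + 10;
}

// Two hex digits form one byte: high nibble first.
unsigned char convertHex(int c1, int c2)
{
    return static_cast<unsigned char>((hexValue(c1) << 4) | hexValue(c2));
}

// Four hex digits form one UTF-16 code unit: high byte first.
char16_t convertUnicode(int c1, int c2, int c3, int c4)
{
    return static_cast<char16_t>((convertHex(c1, c2) << 8) | convertHex(c3, c4));
}
```

For example, the digits of the escape \u2028 combine as (0x20 << 8) | 0x28 = 0x2028, which is one of the line terminators accepted by isLineTerminator.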

source: webkit/trunk/JavaScriptCore/parser/Lexer.h@ 43156
