lexer.cpp@ 38061

Last change on this file since 38061 was 37184, checked in by [email protected], 17 years ago

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Preliminary step toward dynamic recompilation: Standardized and
simplified the parsing interface.

The main goal in this patch is to make it easy to ask for a duplicate
compilation, and get back a duplicate result -- same source URL, same
debugger / profiler ID, same toString behavior, etc.

The basic unit of compilation and evaluation is now SourceCode, which
encompasses a SourceProvider, a range in that provider, and a starting
line number.

A SourceProvider now encompasses a source URL, and *is* a source ID,
since a pointer is a unique identifier.
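Here is a minimal C++ sketch of the model described in the two paragraphs above. It assumes std::shared_ptr in place of WebKit's RefPtr and std::u16string in place of UString. The member names follow the accessors this entry lists (url, asId, provider, firstLine, data, length), but the bodies are illustrative only and are not JavaScriptCore's code. Clamping firstLine to at least 1 is also an assumption, based on the Parser.cpp note that SourceCode now does the clamping.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for SourceProvider: owns a script's full text and its URL.
// Its own address doubles as the source ID, since no two live providers share one.
class SourceProvider {
public:
    SourceProvider(std::u16string text, std::string url)
        : m_text(std::move(text))
        , m_url(std::move(url))
    {
    }

    const char16_t* data() const { return m_text.data(); }
    int length() const { return static_cast<int>(m_text.size()); }
    const std::string& url() const { return m_url; }
    std::intptr_t asId() const { return reinterpret_cast<std::intptr_t>(this); }

private:
    std::u16string m_text;
    std::string m_url;
};

// Sketch of SourceCode: a provider, a [start, end) range within it, and the
// line number on which that range starts.
class SourceCode {
public:
    SourceCode(std::shared_ptr<SourceProvider> provider, int start, int end, int firstLine)
        : m_provider(std::move(provider))
        , m_start(start)
        , m_end(end)
        , m_firstLine(std::max(firstLine, 1)) // clamped here so callers need not
    {
    }

    const SourceProvider* provider() const { return m_provider.get(); }
    const char16_t* data() const { return m_provider->data() + m_start; }
    int length() const { return m_end - m_start; }
    int firstLine() const { return m_firstLine; }

private:
    std::shared_ptr<SourceProvider> m_provider;
    int m_start;
    int m_end;
    int m_firstLine;
};

// Convenience helper in the spirit of makeSource(): wrap an entire string.
SourceCode makeSource(std::u16string text, std::string url = std::string(), int firstLine = 1)
{
    auto provider = std::make_shared<SourceProvider>(std::move(text), std::move(url));
    int length = provider->length();
    return SourceCode(std::move(provider), 0, length, firstLine);
}
```

Two providers with identical text and URL still get distinct IDs, because each ID is the provider's address. Any SourceCode that shares a provider reports that provider's ID, which is what makes a duplicate compilation produce a duplicate result.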

API/JSBase.cpp: (JSEvaluateScript): (JSCheckScriptSyntax): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.

VM/CodeBlock.h: (JSC::EvalCodeCache::get): Provide a SourceCode to the Interpreter, since other APIs are no longer supported. (JSC::CodeBlock::CodeBlock): ASSERT something that used to be ASSERTed by our caller -- this is a better bottleneck.

VM/CodeGenerator.cpp: (JSC::CodeGenerator::CodeGenerator): Updated for the fact that FunctionBodyNode's parameters are no longer a WTF::Vector.

kjs/Arguments.cpp: (JSC::Arguments::Arguments): ditto

kjs/DebuggerCallFrame.cpp: (JSC::DebuggerCallFrame::evaluate): Provide a SourceCode to the Parser, since other APIs are no longer supported.

kjs/FunctionConstructor.cpp: (JSC::constructFunction): Provide a SourceCode to the Parser, since other APIs are no longer supported. Adopt FunctionBodyNode's new "finishParsing" API.

kjs/JSFunction.cpp: (JSC::JSFunction::lengthGetter): (JSC::JSFunction::getParameterName): Updated for the fact that FunctionBodyNode's parameters are no longer a WTF::Vector.

kjs/JSFunction.h: Nixed some cruft.

kjs/JSGlobalObjectFunctions.cpp: (JSC::globalFuncEval): Provide a SourceCode to the Parser, since other APIs are no longer supported.

kjs/Parser.cpp: (JSC::Parser::parse): Require a SourceCode argument, instead of a bunch of broken out parameters. Stop tracking sourceId as an integer, since we use the SourceProvider pointer for this now. Don't clamp the startingLineNumber, since SourceCode does that now.

kjs/Parser.h: (JSC::Parser::parse): Standardized the parsing interface to require a SourceCode.

kjs/Shell.cpp: (functionRun): (functionLoad): (prettyPrintScript): (runWithScripts): (runInteractive): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.

kjs/SourceProvider.h: (JSC::SourceProvider::SourceProvider): (JSC::SourceProvider::url): (JSC::SourceProvider::asId): (JSC::UStringSourceProvider::create): (JSC::UStringSourceProvider::UStringSourceProvider): Added new responsibilities described above.

kjs/SourceRange.h: (JSC::SourceCode::SourceCode): (JSC::SourceCode::toString): (JSC::SourceCode::provider): (JSC::SourceCode::firstLine): (JSC::SourceCode::data): (JSC::SourceCode::length): Added new responsibilities described above. Renamed SourceRange to SourceCode, based on review feedback. Added a makeSource function for convenience.

kjs/debugger.h: Provide a SourceCode to the client, since other APIs are no longer supported.

kjs/grammar.y: Provide startingLineNumber when creating a SourceCode.

kjs/debugger.h: Treat sourceId as intptr_t to avoid loss of precision on 64-bit platforms.
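A pointer-derived source ID does not fit in an int on common 64-bit platforms, where int stays 32 bits wide while pointers are 64 bits (this holds for both LP64 and LLP64 data models). The following sketch uses hypothetical helpers, toSourceID and fromSourceID, to show the round trip that std::intptr_t preserves:

```cpp
#include <cstdint>

// An ID type that can hold a pointer without losing its high bits.
static_assert(sizeof(std::intptr_t) >= sizeof(void*), "intptr_t must be able to hold a pointer");

// Hypothetical helpers: turn a provider's address into an ID and back again.
std::intptr_t toSourceID(const void* provider)
{
    return reinterpret_cast<std::intptr_t>(provider);
}

const void* fromSourceID(std::intptr_t id)
{
    return reinterpret_cast<const void*>(id);
}
```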

kjs/interpreter.cpp: (JSC::Interpreter::checkSyntax): (JSC::Interpreter::evaluate):
kjs/interpreter.h: Require a SourceCode instead of broken out arguments.

kjs/lexer.cpp: (JSC::Lexer::setCode):
kjs/lexer.h: (JSC::Lexer::sourceRange): Fold together the SourceProvider and line number into a SourceCode. Fixed a bug where the Lexer would accidentally keep alive the last SourceProvider forever.
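The keep-alive bug is an ownership problem. A Lexer is long-lived, so any owning reference it keeps to the last SourceProvider outlives the parse. In the listing that follows, setCode() stores only non-owning pointers (m_source and m_code). The sketch below shows the difference using std::shared_ptr as a stand-in for RefPtr, together with two hypothetical types, OwningLexer and BorrowingLexer. The changelog does not describe the earlier member, so the owning variant is only an illustration.

```cpp
#include <memory>
#include <string>

struct SourceProvider {
    std::u16string text;
};

// Leaky shape: the lexer owns a reference to the most recent provider, keeping it
// alive until the next setCode(), which for the last script parsed may be never.
struct OwningLexer {
    std::shared_ptr<SourceProvider> m_provider;
    void setCode(const std::shared_ptr<SourceProvider>& p) { m_provider = p; }
};

// Fixed shape: the lexer only borrows the provider for the duration of a parse;
// the caller's SourceCode is what keeps the provider alive.
struct BorrowingLexer {
    const SourceProvider* m_provider = nullptr;
    void setCode(const std::shared_ptr<SourceProvider>& p) { m_provider = p.get(); }
};
```

The borrowing variant is only safe while the caller's SourceCode is alive. Its pointer must not be dereferenced after the parse finishes.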

kjs/nodes.cpp: (JSC::ScopeNode::ScopeNode): (JSC::ProgramNode::ProgramNode): (JSC::ProgramNode::create): (JSC::EvalNode::EvalNode): (JSC::EvalNode::generateCode): (JSC::EvalNode::create): (JSC::FunctionBodyNode::FunctionBodyNode): (JSC::FunctionBodyNode::finishParsing): (JSC::FunctionBodyNode::create): (JSC::FunctionBodyNode::generateCode): (JSC::ProgramNode::generateCode): (JSC::FunctionBodyNode::paramString):
kjs/nodes.h: (JSC::ScopeNode::): (JSC::ScopeNode::sourceId): (JSC::FunctionBodyNode::): (JSC::FunctionBodyNode::parameterCount): (JSC::FuncExprNode::): (JSC::FuncDeclNode::): Store a SourceCode in all ScopeNodes, since SourceCode is now responsible for tracking URL, ID, etc. Streamlined some ad hoc FunctionBodyNode fixups into a "finishParsing" function, to help make clear what you need to do in order to finish parsing a FunctionBodyNode.

wtf/Vector.h: (WTF::::releaseBuffer): Don't ASSERT that releaseBuffer() is only called when buffer is not 0, since FunctionBodyNode is more than happy to get back a 0 buffer, and other functions like RefPtr::release() allow for 0, too.
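The releaseBuffer() change relaxes a precondition rather than adding new behavior. The sketch below illustrates the contract with a hypothetical IntVector, not WTF::Vector. Ownership of the storage moves to the caller, and a vector that never allocated simply returns a null pointer, just as releasing an empty smart pointer does.

```cpp
#include <cstddef>

// Hypothetical minimal growable array of ints.
class IntVector {
public:
    IntVector() = default;
    IntVector(const IntVector&) = delete;
    IntVector& operator=(const IntVector&) = delete;
    ~IntVector() { delete[] m_buffer; }

    void append(int value)
    {
        if (m_size == m_capacity) {
            std::size_t newCapacity = m_capacity ? m_capacity * 2 : 4;
            int* newBuffer = new int[newCapacity];
            for (std::size_t i = 0; i < m_size; ++i)
                newBuffer[i] = m_buffer[i];
            delete[] m_buffer;
            m_buffer = newBuffer;
            m_capacity = newCapacity;
        }
        m_buffer[m_size++] = value;
    }

    std::size_t size() const { return m_size; }

    // Transfers ownership of the storage to the caller (who must delete[] it).
    // A never-grown vector has no buffer, so this returns nullptr instead of asserting.
    int* releaseBuffer()
    {
        int* buffer = m_buffer;
        m_buffer = nullptr;
        m_size = 0;
        m_capacity = 0;
        return buffer;
    }

private:
    int* m_buffer = nullptr;
    std::size_t m_size = 0;
    std::size_t m_capacity = 0;
};
```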

JavaScriptGlue:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

JSRun.cpp: (JSRun::Evaluate): (JSRun::CheckSyntax): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.

WebCore:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Updated for JavaScriptCore API changes: use a SourceCode instead of
broken out parameters; treat sourceId as intptr_t.

ForwardingHeaders/kjs/SourceRange.h: Copied from ForwardingHeaders/kjs/SourceProvider.h.
bindings/js/JSXMLHttpRequestCustom.cpp: (WebCore::JSXMLHttpRequest::send):
bindings/js/ScriptController.cpp: (WebCore::ScriptController::evaluate):
bindings/js/StringSourceProvider.h: (WebCore::StringSourceProvider::create): (WebCore::StringSourceProvider::StringSourceProvider): (WebCore::makeSource): Added a makeSource function for convenience.

bindings/objc/WebScriptObject.mm: (-[WebScriptObject evaluateWebScript:]):
bridge/NP_jsobject.cpp: (_NPN_Evaluate):
bridge/jni/jni_jsobject.mm: (JavaJSObject::call): (JavaJSObject::eval): (JavaJSObject::getMember): (JavaJSObject::setMember): (JavaJSObject::removeMember):

bridge/jni/jni_runtime.h: (JSC::Bindings::JavaString::operator UString): Replaced the explicit ustring() function with an implicit operator because this class already holds a UString::rep.
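The jni_runtime.h change replaces an explicit accessor with a conversion operator, so a JavaString can be passed anywhere a UString is expected. The sketch below shows the pattern with hypothetical stand-in types: UStr for UString, StringRep for UString::Rep, and JavaStringLike for JavaString. It shares the representation through std::shared_ptr and is not the actual JNI bridge code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared string representation and a string class built on it.
struct StringRep {
    std::string chars;
};

class UStr {
public:
    explicit UStr(std::shared_ptr<StringRep> rep) : m_rep(std::move(rep)) {}
    const std::string& str() const { return m_rep->chars; }

private:
    std::shared_ptr<StringRep> m_rep;
};

// A wrapper that already holds the rep converts implicitly,
// instead of exposing an explicit ustring() accessor.
class JavaStringLike {
public:
    explicit JavaStringLike(std::string s)
        : m_rep(std::make_shared<StringRep>(StringRep{std::move(s)}))
    {
    }

    operator UStr() const { return UStr(m_rep); } // shares the rep, no copy of the chars

private:
    std::shared_ptr<StringRep> m_rep;
};

// Any function taking a UStr now accepts a JavaStringLike directly.
std::size_t lengthOf(const UStr& s) { return s.str().size(); }
```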

page/Console.cpp: (WebCore::retrieveLastCaller): (WebCore::Console::trace):
page/InspectorController.cpp: (WebCore::jsStringRef): (WebCore::InspectorController::addBreakpoint): (WebCore::InspectorController::removeBreakpoint): (WebCore::InspectorController::didParseSource): (WebCore::InspectorController::failedToParseSource):
page/InspectorController.h:
page/JavaScriptCallFrame.cpp: (WebCore::JavaScriptCallFrame::JavaScriptCallFrame):
page/JavaScriptCallFrame.h: (WebCore::JavaScriptCallFrame::create): (WebCore::JavaScriptCallFrame::sourceIdentifier): (WebCore::JavaScriptCallFrame::update):
page/JavaScriptDebugListener.h:
page/JavaScriptDebugServer.cpp: (WebCore::JavaScriptDebugServer::addBreakpoint): (WebCore::JavaScriptDebugServer::removeBreakpoint): (WebCore::JavaScriptDebugServer::hasBreakpoint): (WebCore::dispatchDidParseSource): (WebCore::dispatchFailedToParseSource): (WebCore::JavaScriptDebugServer::sourceParsed): (WebCore::JavaScriptDebugServer::callEvent): (WebCore::JavaScriptDebugServer::atStatement): (WebCore::JavaScriptDebugServer::returnEvent): (WebCore::JavaScriptDebugServer::exception): (WebCore::JavaScriptDebugServer::willExecuteProgram): (WebCore::JavaScriptDebugServer::didExecuteProgram): (WebCore::JavaScriptDebugServer::didReachBreakpoint):
page/JavaScriptDebugServer.h:
page/inspector/ScriptsPanel.js: Renamed internal uses of sourceId and sourceIdentifier to sourceID.

WebKit/mac:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Updated for JavaScriptCore API changes: use a SourceCode instead of
broken out parameters; treat sourceId as intptr_t.

We still treat sourceId as int in some cases because of DashCode. See
<rdar://problem/6263293> WebScriptDebugDelegate should use intptr_t for
sourceId, not int.

WebView/WebScriptDebugger.h:
WebView/WebScriptDebugger.mm: (toNSString): (WebScriptDebugger::sourceParsed): (WebScriptDebugger::callEvent): (WebScriptDebugger::atStatement): (WebScriptDebugger::returnEvent): (WebScriptDebugger::exception): (WebScriptDebugger::willExecuteProgram): (WebScriptDebugger::didExecuteProgram): (WebScriptDebugger::didReachBreakpoint):

Property svn:eol-style set to native

File size: 27.3 KB

/*
 * Copyright (C) 1999-2000 Harri Porten ([email protected])
 * Copyright (C) 2006, 2007, 2008 Apple Inc. All Rights Reserved.
 * Copyright (C) 2007 Cameron Zwarich ([email protected])
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public License
 * along with this library; see the file COPYING.LIB. If not, write to
 * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 * Boston, MA 02110-1301, USA.
 *
 */

#include "config.h"
#include "lexer.h"

#include "dtoa.h"
#include "JSFunction.h"
#include "nodes.h"
#include "NodeInfo.h"
#include "JSGlobalObjectFunctions.h"
#include <ctype.h>
#include <limits.h>
#include <string.h>
#include <wtf/Assertions.h>
#include <wtf/unicode/Unicode.h>

using namespace WTF;
using namespace Unicode;

// we can't specify the namespace in yacc's C output, so do it here
using namespace JSC;

#ifndef KDE_USE_FINAL
#include "grammar.h"
#endif

#include "lookup.h"
#include "lexer.lut.h"

// a bridge for yacc from the C world to C++
int kjsyylex(void* lvalp, void* llocp, void* globalData)
{
    return static_cast<JSGlobalData*>(globalData)->lexer->lex(lvalp, llocp);
}

namespace JSC {

static bool isDecimalDigit(int);

static const size_t initialReadBufferCapacity = 32;
static const size_t initialStringTableCapacity = 64;

Lexer::Lexer(JSGlobalData* globalData)
    : yylineno(1)
    , m_restrKeyword(false)
    , m_eatNextIdentifier(false)
    , m_stackToken(-1)
    , m_lastToken(-1)
    , m_position(0)
    , m_code(0)
    , m_length(0)
    , m_atLineStart(true)
    , m_current(0)
    , m_next1(0)
    , m_next2(0)
    , m_next3(0)
    , m_currentOffset(0)
    , m_nextOffset1(0)
    , m_nextOffset2(0)
    , m_nextOffset3(0)
    , m_globalData(globalData)
    , m_mainTable(JSC::mainTable)
{
    m_buffer8.reserveCapacity(initialReadBufferCapacity);
    m_buffer16.reserveCapacity(initialReadBufferCapacity);
    m_strings.reserveCapacity(initialStringTableCapacity);
    m_identifiers.reserveCapacity(initialStringTableCapacity);
}

Lexer::~Lexer()
{
    m_mainTable.deleteTable();
}

void Lexer::setCode(const SourceCode& source)
{
    yylineno = source.firstLine();
    m_restrKeyword = false;
    m_delimited = false;
    m_eatNextIdentifier = false;
    m_stackToken = -1;
    m_lastToken = -1;

    m_position = 0;
    m_source = &source;
    m_code = source.data();
    m_length = source.length();
    m_skipLF = false;
    m_skipCR = false;
    m_error = false;
    m_atLineStart = true;

    // read first characters
    shift(4);
}

void Lexer::shift(unsigned p)
{
    // ECMA-262 calls for stripping Cf characters here, but we only do this for BOM,
    // see <https://bugs.webkit.org/show_bug.cgi?id=4931>.

    while (p--) {
        m_current = m_next1;
        m_next1 = m_next2;
        m_next2 = m_next3;
        m_currentOffset = m_nextOffset1;
        m_nextOffset1 = m_nextOffset2;
        m_nextOffset2 = m_nextOffset3;
        do {
            if (m_position >= m_length) {
                m_nextOffset3 = m_position;
                m_position++;
                m_next3 = -1;
                break;
            }
            m_nextOffset3 = m_position;
            m_next3 = m_code[m_position++];
        } while (m_next3 == 0xFEFF);
    }
}

// called on each new line
void Lexer::nextLine()
{
    yylineno++;
    m_atLineStart = true;
}

void Lexer::setDone(State s)
{
    m_state = s;
    m_done = true;
}

int Lexer::lex(void* p1, void* p2)
{
    YYSTYPE* lvalp = static_cast<YYSTYPE*>(p1);
    YYLTYPE* llocp = static_cast<YYLTYPE*>(p2);
    int token = 0;
    m_state = Start;
    unsigned short stringType = 0; // either single or double quotes
    m_buffer8.clear();
    m_buffer16.clear();
    m_done = false;
    m_terminator = false;
    m_skipLF = false;
    m_skipCR = false;

    // did we push a token on the stack previously ?
    // (after an automatic semicolon insertion)
    if (m_stackToken >= 0) {
        setDone(Other);
        token = m_stackToken;
        m_stackToken = 0;
    }
    int startOffset = m_currentOffset;
    while (!m_done) {
        if (m_skipLF && m_current != '\n') // found \r but not \n afterwards
            m_skipLF = false;
        if (m_skipCR && m_current != '\r') // found \n but not \r afterwards
            m_skipCR = false;
        if (m_skipLF || m_skipCR) { // found \r\n or \n\r -> eat the second one
            m_skipLF = false;
            m_skipCR = false;
            shift(1);
        }
        switch (m_state) {
        case Start:
            startOffset = m_currentOffset;
            if (isWhiteSpace()) {
                // do nothing
            } else if (m_current == '/' && m_next1 == '/') {
                shift(1);
                m_state = InSingleLineComment;
            } else if (m_current == '/' && m_next1 == '*') {
                shift(1);
                m_state = InMultiLineComment;
            } else if (m_current == -1) {
                if (!m_terminator && !m_delimited) {
                    // automatic semicolon insertion if program incomplete
                    token = ';';
                    m_stackToken = 0;
                    setDone(Other);
                } else
                    setDone(Eof);
            } else if (isLineTerminator()) {
                nextLine();
                m_terminator = true;
                if (m_restrKeyword) {
                    token = ';';
                    setDone(Other);
                }
            } else if (m_current == '"' || m_current == '\'') {
                m_state = InString;
                stringType = static_cast<unsigned short>(m_current);
            } else if (isIdentStart(m_current)) {
                record16(m_current);
                m_state = InIdentifierOrKeyword;
            } else if (m_current == '\\')
                m_state = InIdentifierStartUnicodeEscapeStart;
            else if (m_current == '0') {
                record8(m_current);
                m_state = InNum0;
            } else if (isDecimalDigit(m_current)) {
                record8(m_current);
                m_state = InNum;
            } else if (m_current == '.' && isDecimalDigit(m_next1)) {
                record8(m_current);
                m_state = InDecimal;
                // <!-- marks the beginning of a line comment (for www usage)
            } else if (m_current == '<' && m_next1 == '!' && m_next2 == '-' && m_next3 == '-') {
                shift(3);
                m_state = InSingleLineComment;
                // same for -->
            } else if (m_atLineStart && m_current == '-' && m_next1 == '-' && m_next2 == '>') {
                shift(2);
                m_state = InSingleLineComment;
            } else {
                token = matchPunctuator(lvalp->intValue, m_current, m_next1, m_next2, m_next3);
                if (token != -1)
                    setDone(Other);
                else
                    setDone(Bad);
            }
            break;
        case InString:
            if (m_current == stringType) {
                shift(1);
                setDone(String);
            } else if (isLineTerminator() || m_current == -1)
                setDone(Bad);
            else if (m_current == '\\')
                m_state = InEscapeSequence;
            else
                record16(m_current);
            break;
        // Escape Sequences inside of strings
        case InEscapeSequence:
            if (isOctalDigit(m_current)) {
                if (m_current >= '0' && m_current <= '3' &&
                    isOctalDigit(m_next1) && isOctalDigit(m_next2)) {
                    record16(convertOctal(m_current, m_next1, m_next2));
                    shift(2);
                    m_state = InString;
                } else if (isOctalDigit(m_current) && isOctalDigit(m_next1)) {
                    record16(convertOctal('0', m_current, m_next1));
                    shift(1);
                    m_state = InString;
                } else if (isOctalDigit(m_current)) {
                    record16(convertOctal('0', '0', m_current));
                    m_state = InString;
                } else
                    setDone(Bad);
            } else if (m_current == 'x')
                m_state = InHexEscape;
            else if (m_current == 'u')
                m_state = InUnicodeEscape;
            else if (isLineTerminator()) {
                nextLine();
                m_state = InString;
            } else {
                record16(singleEscape(static_cast<unsigned short>(m_current)));
                m_state = InString;
            }
            break;
        case InHexEscape:
            if (isHexDigit(m_current) && isHexDigit(m_next1)) {
                m_state = InString;
                record16(convertHex(m_current, m_next1));
                shift(1);
            } else if (m_current == stringType) {
                record16('x');
                shift(1);
                setDone(String);
            } else {
                record16('x');
                record16(m_current);
                m_state = InString;
            }
            break;
        case InUnicodeEscape:
            if (isHexDigit(m_current) && isHexDigit(m_next1) && isHexDigit(m_next2) && isHexDigit(m_next3)) {
                record16(convertUnicode(m_current, m_next1, m_next2, m_next3));
                shift(3);
                m_state = InString;
            } else if (m_current == stringType) {
                record16('u');
                shift(1);
                setDone(String);
            } else
                setDone(Bad);
            break;
        case InSingleLineComment:
            if (isLineTerminator()) {
                nextLine();
                m_terminator = true;
                if (m_restrKeyword) {
                    token = ';';
                    setDone(Other);
                } else
                    m_state = Start;
            } else if (m_current == -1)
                setDone(Eof);
            break;
        case InMultiLineComment:
            if (m_current == -1)
                setDone(Bad);
            else if (isLineTerminator())
                nextLine();
            else if (m_current == '*' && m_next1 == '/') {
                m_state = Start;
                shift(1);
            }
            break;
        case InIdentifierOrKeyword:
        case InIdentifier:
            if (isIdentPart(m_current))
                record16(m_current);
            else if (m_current == '\\')
                m_state = InIdentifierPartUnicodeEscapeStart;
            else
                setDone(m_state == InIdentifierOrKeyword ? IdentifierOrKeyword : Identifier);
            break;
        case InNum0:
            if (m_current == 'x' || m_current == 'X') {
                record8(m_current);
                m_state = InHex;
            } else if (m_current == '.') {
                record8(m_current);
                m_state = InDecimal;
            } else if (m_current == 'e' || m_current == 'E') {
                record8(m_current);
                m_state = InExponentIndicator;
            } else if (isOctalDigit(m_current)) {
                record8(m_current);
                m_state = InOctal;
            } else if (isDecimalDigit(m_current)) {
                record8(m_current);
                m_state = InDecimal;
            } else
                setDone(Number);
            break;
        case InHex:
            if (isHexDigit(m_current))
                record8(m_current);
            else
                setDone(Hex);
            break;
        case InOctal:
            if (isOctalDigit(m_current))
                record8(m_current);
            else if (isDecimalDigit(m_current)) {
                record8(m_current);
                m_state = InDecimal;
            } else
                setDone(Octal);
            break;
        case InNum:
            if (isDecimalDigit(m_current))
                record8(m_current);
            else if (m_current == '.') {
                record8(m_current);
                m_state = InDecimal;
            } else if (m_current == 'e' || m_current == 'E') {
                record8(m_current);
                m_state = InExponentIndicator;
            } else
                setDone(Number);
            break;
        case InDecimal:
            if (isDecimalDigit(m_current))
                record8(m_current);
            else if (m_current == 'e' || m_current == 'E') {
                record8(m_current);
                m_state = InExponentIndicator;
            } else
                setDone(Number);
            break;
        case InExponentIndicator:
            if (m_current == '+' || m_current == '-')
                record8(m_current);
            else if (isDecimalDigit(m_current)) {
                record8(m_current);
                m_state = InExponent;
            } else
                setDone(Bad);
            break;
        case InExponent:
            if (isDecimalDigit(m_current))
                record8(m_current);
            else
                setDone(Number);
            break;
        case InIdentifierStartUnicodeEscapeStart:
            if (m_current == 'u')
                m_state = InIdentifierStartUnicodeEscape;
            else
                setDone(Bad);
            break;
        case InIdentifierPartUnicodeEscapeStart:
            if (m_current == 'u')
                m_state = InIdentifierPartUnicodeEscape;
            else
                setDone(Bad);
            break;
        case InIdentifierStartUnicodeEscape:
            if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
                setDone(Bad);
                break;
            }
            token = convertUnicode(m_current, m_next1, m_next2, m_next3);
            shift(3);
            if (!isIdentStart(token)) {
                setDone(Bad);
                break;
            }
            record16(token);
            m_state = InIdentifier;
            break;
        case InIdentifierPartUnicodeEscape:
            if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
                setDone(Bad);
                break;
            }
            token = convertUnicode(m_current, m_next1, m_next2, m_next3);
            shift(3);
            if (!isIdentPart(token)) {
                setDone(Bad);
                break;
            }
            record16(token);
            m_state = InIdentifier;
            break;
        default:
            ASSERT(!"Unhandled state in switch statement");
        }

        // move on to the next character
        if (!m_done)
            shift(1);
        if (m_state != Start && m_state != InSingleLineComment)
            m_atLineStart = false;
    }

    // no identifiers allowed directly after numeric literal, e.g. "3in" is bad
    if ((m_state == Number || m_state == Octal || m_state == Hex) && isIdentStart(m_current))
        m_state = Bad;

    // terminate string
    m_buffer8.append('\0');

#ifdef KJS_DEBUG_LEX
    fprintf(stderr, "line: %d ", lineNo());
    fprintf(stderr, "yytext (%x): ", m_buffer8[0]);
    fprintf(stderr, "%s ", m_buffer8.data());
#endif

    double dval = 0;
    if (m_state == Number)
        dval = strtod(m_buffer8.data(), 0L);
    else if (m_state == Hex) { // scan hex numbers
        const char* p = m_buffer8.data() + 2;
        while (char c = *p++) {
            dval *= 16;
            dval += convertHex(c);
        }

        if (dval >= mantissaOverflowLowerBound)
            dval = parseIntOverflow(m_buffer8.data() + 2, p - (m_buffer8.data() + 3), 16);

        m_state = Number;
    } else if (m_state == Octal) { // scan octal number
        const char* p = m_buffer8.data() + 1;
        while (char c = *p++) {
            dval *= 8;
            dval += c - '0';
        }

        if (dval >= mantissaOverflowLowerBound)
            dval = parseIntOverflow(m_buffer8.data() + 1, p - (m_buffer8.data() + 2), 8);

        m_state = Number;
    }

#ifdef KJS_DEBUG_LEX
    switch (m_state) {
    case Eof:
        printf("(EOF)\n");
        break;
    case Other:
        printf("(Other)\n");
        break;
    case Identifier:
        printf("(Identifier)/(Keyword)\n");
        break;
    case String:
        printf("(String)\n");
        break;
    case Number:
        printf("(Number)\n");
        break;
    default:
        printf("(unknown)");
    }
#endif

    if (m_state != Identifier)
        m_eatNextIdentifier = false;

    m_restrKeyword = false;
    m_delimited = false;
    llocp->first_line = yylineno;
    llocp->last_line = yylineno;
    llocp->first_column = startOffset;
    llocp->last_column = m_currentOffset;
    switch (m_state) {
    case Eof:
        token = 0;
        break;
    case Other:
        if (token == '}' || token == ';')
            m_delimited = true;
        break;
    case Identifier:
        // Apply anonymous-function hack below (eat the identifier).
        if (m_eatNextIdentifier) {
            m_eatNextIdentifier = false;
            token = lex(lvalp, llocp);
            break;
        }
        lvalp->ident = makeIdentifier(m_buffer16);
        token = IDENT;
        break;
    case IdentifierOrKeyword: {
        lvalp->ident = makeIdentifier(m_buffer16);
        const HashEntry* entry = m_mainTable.entry(m_globalData, *lvalp->ident);
        if (!entry) {
            // Lookup for keyword failed, means this is an identifier.
            token = IDENT;
            break;
        }
        token = entry->lexerValue();
        // Hack for "f = function somename() { ... }"; too hard to get into the grammar.
        m_eatNextIdentifier = token == FUNCTION && m_lastToken == '=';
        if (token == CONTINUE || token == BREAK || token == RETURN || token == THROW)
            m_restrKeyword = true;
        break;
    }
    case String:
        // Atomize constant strings in case they're later used in property lookup.
        lvalp->ident = makeIdentifier(m_buffer16);
        token = STRING;
        break;
    case Number:
        lvalp->doubleValue = dval;
        token = NUMBER;
        break;
    case Bad:
#ifdef KJS_DEBUG_LEX
        fprintf(stderr, "yylex: ERROR.\n");
#endif
        m_error = true;
        return -1;
    default:
        ASSERT(!"unhandled numeration value in switch");
        m_error = true;
        return -1;
    }
    m_lastToken = token;
    return token;
}

bool Lexer::isWhiteSpace() const
{
    return m_current == '\t' || m_current == 0x0b || m_current == 0x0c || isSeparatorSpace(m_current);
}

bool Lexer::isLineTerminator()
{
    bool cr = (m_current == '\r');
    bool lf = (m_current == '\n');
    if (cr)
        m_skipLF = true;
    else if (lf)
        m_skipCR = true;
    return cr || lf || m_current == 0x2028 || m_current == 0x2029;
}

bool Lexer::isIdentStart(int c)
{
    return (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other))
        || c == '$' || c == '_';
}

bool Lexer::isIdentPart(int c)
{
    return (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other
                           | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector))
        || c == '$' || c == '_';
}

static bool isDecimalDigit(int c)
{
    return (c >= '0' && c <= '9');
}

bool Lexer::isHexDigit(int c)
{
    return (c >= '0' && c <= '9'
            || c >= 'a' && c <= 'f'
            || c >= 'A' && c <= 'F');
}

bool Lexer::isOctalDigit(int c)
{
    return (c >= '0' && c <= '7');
}

int Lexer::matchPunctuator(int& charPos, int c1, int c2, int c3, int c4)
{
    if (c1 == '>' && c2 == '>' && c3 == '>' && c4 == '=') {
        shift(4);
        return URSHIFTEQUAL;
    }
    if (c1 == '=' && c2 == '=' && c3 == '=') {
        shift(3);
        return STREQ;
    }
    if (c1 == '!' && c2 == '=' && c3 == '=') {
        shift(3);
        return STRNEQ;
    }
    if (c1 == '>' && c2 == '>' && c3 == '>') {
        shift(3);
        return URSHIFT;
    }
    if (c1 == '<' && c2 == '<' && c3 == '=') {
        shift(3);
        return LSHIFTEQUAL;
    }
    if (c1 == '>' && c2 == '>' && c3 == '=') {
        shift(3);
        return RSHIFTEQUAL;
    }
    if (c1 == '<' && c2 == '=') {
        shift(2);
        return LE;
    }
    if (c1 == '>' && c2 == '=') {
        shift(2);
        return GE;
    }
    if (c1 == '!' && c2 == '=') {
        shift(2);
        return NE;
    }
    if (c1 == '+' && c2 == '+') {
        shift(2);
        if (m_terminator)
            return AUTOPLUSPLUS;
        return PLUSPLUS;
    }
    if (c1 == '-' && c2 == '-') {
        shift(2);
        if (m_terminator)
            return AUTOMINUSMINUS;
        return MINUSMINUS;
    }
    if (c1 == '=' && c2 == '=') {
        shift(2);
        return EQEQ;
    }
    if (c1 == '+' && c2 == '=') {
        shift(2);
        return PLUSEQUAL;
    }
    if (c1 == '-' && c2 == '=') {
        shift(2);
        return MINUSEQUAL;
    }
    if (c1 == '*' && c2 == '=') {
        shift(2);
        return MULTEQUAL;
    }
    if (c1 == '/' && c2 == '=') {
        shift(2);
        return DIVEQUAL;
    }
    if (c1 == '&' && c2 == '=') {
        shift(2);
        return ANDEQUAL;
    }
    if (c1 == '^' && c2 == '=') {
        shift(2);
        return XOREQUAL;
    }
    if (c1 == '%' && c2 == '=') {
        shift(2);
        return MODEQUAL;
    }
    if (c1 == '|' && c2 == '=') {
        shift(2);
        return OREQUAL;
    }
    if (c1 == '<' && c2 == '<') {
        shift(2);
        return LSHIFT;
    }
    if (c1 == '>' && c2 == '>') {
        shift(2);
        return RSHIFT;
    }
    if (c1 == '&' && c2 == '&') {
        shift(2);
        return AND;
    }
    if (c1 == '|' && c2 == '|') {
        shift(2);
        return OR;
    }

    switch (c1) {
    case '=':
    case '>':
    case '<':
    case ',':
    case '!':
    case '~':
    case '?':
    case ':':
    case '.':
    case '+':
    case '-':
    case '*':
    case '/':
    case '&':
    case '|':
    case '^':
    case '%':
    case '(':
    case ')':
    case '[':
    case ']':
    case ';':
        shift(1);
        return static_cast<int>(c1);
    case '{':
        charPos = m_position - 4;
        shift(1);
        return OPENBRACE;
    case '}':
        charPos = m_position - 4;
        shift(1);
        return CLOSEBRACE;
    default:
        return -1;
    }
}

unsigned short Lexer::singleEscape(unsigned short c)
{
    switch (c) {
    case 'b':
        return 0x08;
    case 't':
        return 0x09;
    case 'n':
        return 0x0A;
    case 'v':
        return 0x0B;
    case 'f':
        return 0x0C;
    case 'r':
        return 0x0D;
    case '"':
        return 0x22;
    case '\'':
        return 0x27;
    case '\\':
        return 0x5C;
    default:
        return c;
    }
}

unsigned short Lexer::convertOctal(int c1, int c2, int c3)
{
    return static_cast<unsigned short>((c1 - '0') * 64 + (c2 - '0') * 8 + c3 - '0');
}

unsigned char Lexer::convertHex(int c)
{
    if (c >= '0' && c <= '9')
        return static_cast<unsigned char>(c - '0');
    if (c >= 'a' && c <= 'f')
        return static_cast<unsigned char>(c - 'a' + 10);
    return static_cast<unsigned char>(c - 'A' + 10);
}

unsigned char Lexer::convertHex(int c1, int c2)
{
    return ((convertHex(c1) << 4) + convertHex(c2));
}

UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
{
    unsigned char highByte = (convertHex(c1) << 4) + convertHex(c2);
    unsigned char lowByte = (convertHex(c3) << 4) + convertHex(c4);
    return (highByte << 8 | lowByte);
}

void Lexer::record8(int c)
{
    ASSERT(c >= 0);
    ASSERT(c <= 0xff);
    m_buffer8.append(static_cast<char>(c));
}

void Lexer::record16(int c)
{
    ASSERT(c >= 0);
    ASSERT(c <= USHRT_MAX);
    record16(UChar(static_cast<unsigned short>(c)));
}

void Lexer::record16(UChar c)
{
    m_buffer16.append(c);
}

bool Lexer::scanRegExp()
{
    m_buffer16.clear();
    bool lastWasEscape = false;
    bool inBrackets = false;

    while (1) {
        if (isLineTerminator() || m_current == -1)
            return false;
        else if (m_current != '/' || lastWasEscape == true || inBrackets == true) {
            // keep track of '[' and ']'
            if (!lastWasEscape) {
                if ( m_current == '[' && !inBrackets )
                    inBrackets = true;
                if ( m_current == ']' && inBrackets )
                    inBrackets = false;
            }
            record16(m_current);
            lastWasEscape =
                !lastWasEscape && (m_current == '\\');
        } else { // end of regexp
            m_pattern = UString(m_buffer16);
            m_buffer16.clear();
            shift(1);
            break;
        }
        shift(1);
    }

    while (isIdentPart(m_current)) {
        record16(m_current);
        shift(1);
    }
    m_flags = UString(m_buffer16);

    return true;
}

void Lexer::clear()
{
    deleteAllValues(m_strings);
    Vector<UString*> newStrings;
    newStrings.reserveCapacity(initialStringTableCapacity);
    m_strings.swap(newStrings);

    deleteAllValues(m_identifiers);
    Vector<JSC::Identifier*> newIdentifiers;
    newIdentifiers.reserveCapacity(initialStringTableCapacity);
    m_identifiers.swap(newIdentifiers);

    Vector<char> newBuffer8;
    newBuffer8.reserveCapacity(initialReadBufferCapacity);
    m_buffer8.swap(newBuffer8);

    Vector<UChar> newBuffer16;
    newBuffer16.reserveCapacity(initialReadBufferCapacity);
    m_buffer16.swap(newBuffer16);

    m_pattern = 0;
    m_flags = 0;
}

Identifier* Lexer::makeIdentifier(const Vector<UChar>& buffer)
{
    JSC::Identifier* identifier = new JSC::Identifier(m_globalData, buffer.data(), buffer.size());
    m_identifiers.append(identifier);
    return identifier;
}

} // namespace JSC
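The string-escape cases in lex() come down to small pieces of digit arithmetic in convertOctal, convertHex, and convertUnicode. InEscapeSequence consumes three octal digits only when the first digit is 0 through 3, so the result never exceeds 0377 (255). Shorter forms are passed in as convertOctal('0', '0', c) or convertOctal('0', c1, c2). The free functions below restate that arithmetic so it can be checked in isolation. Their names are hypothetical and are not part of JavaScriptCore.

```cpp
// Value of one hex digit; assumes the caller already checked it is a hex digit,
// as the lexer does with isHexDigit().
int hexDigitValue(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return c - 'A' + 10;
}

// \ooo: three octal digits weighted 64, 8 and 1 (mirrors convertOctal).
char16_t octalEscape(int c1, int c2, int c3)
{
    return static_cast<char16_t>((c1 - '0') * 64 + (c2 - '0') * 8 + (c3 - '0'));
}

// \xHH: two hex digits form one byte (mirrors convertHex(int, int)).
char16_t hexEscape(int c1, int c2)
{
    return static_cast<char16_t>((hexDigitValue(c1) << 4) + hexDigitValue(c2));
}

// \uHHHH: a high byte and a low byte form one UTF-16 code unit (mirrors convertUnicode).
char16_t unicodeEscape(int c1, int c2, int c3, int c4)
{
    int highByte = (hexDigitValue(c1) << 4) + hexDigitValue(c2);
    int lowByte = (hexDigitValue(c3) << 4) + hexDigitValue(c4);
    return static_cast<char16_t>((highByte << 8) | lowByte);
}
```

For example, "\101", "\x41", and "\u0041" all decode to 'A'. A lone "\7" reaches the lexer as octalEscape('0', '0', '7') and yields 7.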

source: webkit/trunk/JavaScriptCore/kjs/lexer.cpp@ 38061
