source: webkit/trunk/JavaScriptCore/kjs/lexer.cpp@38061

Last change on this file since 38061 was 37184, checked in by [email protected], 17 years ago

JavaScriptCore:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Preliminary step toward dynamic recompilation: Standardized and
simplified the parsing interface.


The main goal in this patch is to make it easy to ask for a duplicate
compilation, and get back a duplicate result -- same source URL, same
debugger / profiler ID, same toString behavior, etc.


The basic unit of compilation and evaluation is now SourceCode, which
encompasses a SourceProvider, a range in that provider, and a starting
line number.

A SourceProvider now encompasses a source URL, and *is* a source ID,
since a pointer is a unique identifier.
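
The relationship described above can be sketched roughly as follows. This is a minimal illustration, not the code from kjs/SourceProvider.h or kjs/SourceRange.h: the classes are hypothetical stand-ins, std::shared_ptr and std::u16string replace WTF's RefPtr and UString, and clamping the first line to a minimum of 1 is an assumption based on the Parser.cpp note below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a source provider: owns the text and its URL.
class SourceProvider {
public:
    SourceProvider(std::u16string text, std::string url)
        : m_text(std::move(text)), m_url(std::move(url)) {}

    const std::string& url() const { return m_url; }
    const char16_t* data() const { return m_text.data(); }
    std::size_t length() const { return m_text.size(); }

    // The provider's address doubles as its source ID.
    std::intptr_t asId() const { return reinterpret_cast<std::intptr_t>(this); }

private:
    std::u16string m_text;
    std::string m_url;
};

// Hypothetical stand-in for SourceCode: a provider, a range in it, and a first line.
class SourceCode {
public:
    SourceCode(std::shared_ptr<SourceProvider> provider, std::size_t start, std::size_t end, int firstLine)
        : m_provider(std::move(provider))
        , m_start(start)
        , m_end(end)
        , m_firstLine(firstLine < 1 ? 1 : firstLine) // assumed clamp
    {
        assert(m_provider);
        assert(m_start <= m_end && m_end <= m_provider->length());
    }

    SourceProvider* provider() const { return m_provider.get(); }
    int firstLine() const { return m_firstLine; }
    const char16_t* data() const { return m_provider->data() + m_start; }
    std::size_t length() const { return m_end - m_start; }
    std::u16string toString() const { return std::u16string(data(), length()); }

private:
    std::shared_ptr<SourceProvider> m_provider;
    std::size_t m_start;
    std::size_t m_end;
    int m_firstLine;
};

// Convenience helper in the spirit of makeSource: wraps a whole string.
inline SourceCode makeSource(std::u16string text, std::string url = std::string(), int firstLine = 1)
{
    auto provider = std::make_shared<SourceProvider>(std::move(text), std::move(url));
    std::size_t length = provider->length();
    return SourceCode(std::move(provider), 0, length, firstLine);
}
```

Copying such a SourceCode shares the provider, so a duplicate compilation sees the same URL and the same ID; calling makeSource again on identical text creates a new provider and therefore a different ID.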

  • API/JSBase.cpp: (JSEvaluateScript): (JSCheckScriptSyntax): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.


  • VM/CodeBlock.h: (JSC::EvalCodeCache::get): Provide a SourceCode to the Interpreter, since other APIs are no longer supported. (JSC::CodeBlock::CodeBlock): ASSERT something that used to be ASSERTed by our caller -- this is a better bottleneck.
  • VM/CodeGenerator.cpp: (JSC::CodeGenerator::CodeGenerator): Updated for the fact that FunctionBodyNode's parameters are no longer a WTF::Vector.
  • kjs/Arguments.cpp: (JSC::Arguments::Arguments): ditto
  • kjs/DebuggerCallFrame.cpp: (JSC::DebuggerCallFrame::evaluate): Provide a SourceCode to the Parser, since other APIs are no longer supported.
  • kjs/FunctionConstructor.cpp: (JSC::constructFunction): Provide a SourceCode to the Parser, since other APIs are no longer supported. Adopt FunctionBodyNode's new "finishParsing" API.
  • kjs/JSFunction.cpp: (JSC::JSFunction::lengthGetter): (JSC::JSFunction::getParameterName): Updated for the fact that FunctionBodyNode's parameters are no longer a WTF::Vector.
  • kjs/JSFunction.h: Nixed some cruft.
  • kjs/JSGlobalObjectFunctions.cpp: (JSC::globalFuncEval): Provide a SourceCode to the Parser, since other APIs are no longer supported.
  • kjs/Parser.cpp: (JSC::Parser::parse): Require a SourceCode argument, instead of a bunch of broken out parameters. Stop tracking sourceId as an integer, since we use the SourceProvider pointer for this now. Don't clamp the startingLineNumber, since SourceCode does that now.
  • kjs/Parser.h: (JSC::Parser::parse): Standardized the parsing interface to require a SourceCode.
  • kjs/Shell.cpp: (functionRun): (functionLoad): (prettyPrintScript): (runWithScripts): (runInteractive): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.
  • kjs/SourceProvider.h: (JSC::SourceProvider::SourceProvider): (JSC::SourceProvider::url): (JSC::SourceProvider::asId): (JSC::UStringSourceProvider::create): (JSC::UStringSourceProvider::UStringSourceProvider): Added new responsibilities described above.
  • kjs/SourceRange.h: (JSC::SourceCode::SourceCode): (JSC::SourceCode::toString): (JSC::SourceCode::provider): (JSC::SourceCode::firstLine): (JSC::SourceCode::data): (JSC::SourceCode::length): Added new responsibilities described above. Renamed SourceRange to SourceCode, based on review feedback. Added a makeSource function for convenience.
  • kjs/debugger.h: Provide a SourceCode to the client, since other APIs are no longer supported.
  • kjs/grammar.y: Provide startingLineNumber when creating a SourceCode.
  • kjs/debugger.h: Treat sourceId as intptr_t to avoid loss of precision on 64-bit platforms.
  • kjs/interpreter.cpp: (JSC::Interpreter::checkSyntax): (JSC::Interpreter::evaluate):
  • kjs/interpreter.h: Require a SourceCode instead of broken out arguments.
  • kjs/lexer.cpp: (JSC::Lexer::setCode):
  • kjs/lexer.h: (JSC::Lexer::sourceRange): Fold together the SourceProvider and line number into a SourceCode. Fixed a bug where the Lexer would accidentally keep alive the last SourceProvider forever.
  • kjs/nodes.cpp: (JSC::ScopeNode::ScopeNode): (JSC::ProgramNode::ProgramNode): (JSC::ProgramNode::create): (JSC::EvalNode::EvalNode): (JSC::EvalNode::generateCode): (JSC::EvalNode::create): (JSC::FunctionBodyNode::FunctionBodyNode): (JSC::FunctionBodyNode::finishParsing): (JSC::FunctionBodyNode::create): (JSC::FunctionBodyNode::generateCode): (JSC::ProgramNode::generateCode): (JSC::FunctionBodyNode::paramString):
  • kjs/nodes.h: (JSC::ScopeNode::): (JSC::ScopeNode::sourceId): (JSC::FunctionBodyNode::): (JSC::FunctionBodyNode::parameterCount): (JSC::FuncExprNode::): (JSC::FuncDeclNode::): Store a SourceCode in all ScopeNodes, since SourceCode is now responsible for tracking URL, ID, etc. Streamlined some ad hoc FunctionBodyNode fixups into a "finishParsing" function, to help make clear what you need to do in order to finish parsing a FunctionBodyNode.
  • wtf/Vector.h: (WTF::::releaseBuffer): Don't ASSERT that releaseBuffer() is only called when buffer is not 0, since FunctionBodyNode is more than happy to get back a 0 buffer, and other functions like RefPtr::release() allow for 0, too.

JavaScriptGlue:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

  • JSRun.cpp: (JSRun::Evaluate): (JSRun::CheckSyntax): Provide a SourceCode to the Interpreter, since other APIs are no longer supported.

WebCore:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Updated for JavaScriptCore API changes: use a SourceCode instead of
broken out parameters; treat sourceId as intptr_t.

  • ForwardingHeaders/kjs/SourceRange.h: Copied from ForwardingHeaders/kjs/SourceProvider.h.
  • bindings/js/JSXMLHttpRequestCustom.cpp: (WebCore::JSXMLHttpRequest::send):
  • bindings/js/ScriptController.cpp: (WebCore::ScriptController::evaluate):
  • bindings/js/StringSourceProvider.h: (WebCore::StringSourceProvider::create): (WebCore::StringSourceProvider::StringSourceProvider): (WebCore::makeSource): Added a makeSource function for convenience.

  • bindings/objc/WebScriptObject.mm: (-[WebScriptObject evaluateWebScript:]):
  • bridge/NP_jsobject.cpp: (_NPN_Evaluate):
  • bridge/jni/jni_jsobject.mm: (JavaJSObject::call): (JavaJSObject::eval): (JavaJSObject::getMember): (JavaJSObject::setMember): (JavaJSObject::removeMember):
  • bridge/jni/jni_runtime.h: (JSC::Bindings::JavaString::operator UString): Replaced the explicit ustring() function with an implicit operator because this class already holds a UString::rep.
  • page/Console.cpp: (WebCore::retrieveLastCaller): (WebCore::Console::trace):
  • page/InspectorController.cpp: (WebCore::jsStringRef): (WebCore::InspectorController::addBreakpoint): (WebCore::InspectorController::removeBreakpoint): (WebCore::InspectorController::didParseSource): (WebCore::InspectorController::failedToParseSource):
  • page/InspectorController.h:
  • page/JavaScriptCallFrame.cpp: (WebCore::JavaScriptCallFrame::JavaScriptCallFrame):
  • page/JavaScriptCallFrame.h: (WebCore::JavaScriptCallFrame::create): (WebCore::JavaScriptCallFrame::sourceIdentifier): (WebCore::JavaScriptCallFrame::update):
  • page/JavaScriptDebugListener.h:
  • page/JavaScriptDebugServer.cpp: (WebCore::JavaScriptDebugServer::addBreakpoint): (WebCore::JavaScriptDebugServer::removeBreakpoint): (WebCore::JavaScriptDebugServer::hasBreakpoint): (WebCore::dispatchDidParseSource): (WebCore::dispatchFailedToParseSource): (WebCore::JavaScriptDebugServer::sourceParsed): (WebCore::JavaScriptDebugServer::callEvent): (WebCore::JavaScriptDebugServer::atStatement): (WebCore::JavaScriptDebugServer::returnEvent): (WebCore::JavaScriptDebugServer::exception): (WebCore::JavaScriptDebugServer::willExecuteProgram): (WebCore::JavaScriptDebugServer::didExecuteProgram): (WebCore::JavaScriptDebugServer::didReachBreakpoint):
  • page/JavaScriptDebugServer.h:
  • page/inspector/ScriptsPanel.js: Renamed internal uses of sourceId and sourceIdentifier to sourceID.

WebKit/mac:

2008-10-01 Geoffrey Garen <[email protected]>

Reviewed by Darin Adler and Cameron Zwarich.

Updated for JavaScriptCore API changes: use a SourceCode instead of
broken out parameters; treat sourceId as intptr_t.


We still treat sourceId as int in some cases because of DashCode. See
<rdar://problem/6263293> WebScriptDebugDelegate should use intptr_t for
sourceId, not int.
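
To illustrate why a pointer-derived sourceId wants intptr_t rather than int, here is a small sketch. The helper names sourceIdFor and survivesNarrowing are hypothetical and do not appear in WebKit; the point is only that round-tripping an address-sized value through a narrower integer can change it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive a source ID from an object's address.
inline std::intptr_t sourceIdFor(const void* provider)
{
    return reinterpret_cast<std::intptr_t>(provider);
}

// Round-trips an ID through another integer type and reports whether
// the value came back unchanged.
template<typename Narrow>
bool survivesNarrowing(std::intptr_t id)
{
    return static_cast<std::intptr_t>(static_cast<Narrow>(id)) == id;
}

// Recover the original address from an ID stored as intptr_t.
inline const void* addressFor(std::intptr_t id)
{
    assert(survivesNarrowing<std::intptr_t>(id)); // trivially true; documents the contract
    return reinterpret_cast<const void*>(id);
}
```

On a platform where int is narrower than a pointer, survivesNarrowing<int> returns false for large IDs, which is the loss of precision that a delegate API taking int would be exposed to.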

  • WebView/WebScriptDebugger.h:
  • WebView/WebScriptDebugger.mm: (toNSString): (WebScriptDebugger::sourceParsed): (WebScriptDebugger::callEvent): (WebScriptDebugger::atStatement): (WebScriptDebugger::returnEvent): (WebScriptDebugger::exception): (WebScriptDebugger::willExecuteProgram): (WebScriptDebugger::didExecuteProgram): (WebScriptDebugger::didReachBreakpoint):
  • Property svn:eol-style set to native
File size: 27.3 KB
1/*
2 * Copyright (C) 1999-2000 Harri Porten ([email protected])
3 * Copyright (C) 2006, 2007, 2008 Apple Inc. All Rights Reserved.
4 * Copyright (C) 2007 Cameron Zwarich ([email protected])
5 *
6 * This library is free software; you can redistribute it and/or
7 * modify it under the terms of the GNU Library General Public
8 * License as published by the Free Software Foundation; either
9 * version 2 of the License, or (at your option) any later version.
10 *
11 * This library is distributed in the hope that it will be useful,
12 * but WITHOUT ANY WARRANTY; without even the implied warranty of
13 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
14 * Library General Public License for more details.
15 *
16 * You should have received a copy of the GNU Library General Public License
17 * along with this library; see the file COPYING.LIB. If not, write to
18 * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
19 * Boston, MA 02110-1301, USA.
20 *
21 */
22
23#include "config.h"
24#include "lexer.h"
25
26#include "dtoa.h"
27#include "JSFunction.h"
28#include "nodes.h"
29#include "NodeInfo.h"
30#include "JSGlobalObjectFunctions.h"
31#include <ctype.h>
32#include <limits.h>
33#include <string.h>
34#include <wtf/Assertions.h>
35#include <wtf/unicode/Unicode.h>
36
37using namespace WTF;
38using namespace Unicode;
39
40// we can't specify the namespace in yacc's C output, so do it here
41using namespace JSC;
42
43#ifndef KDE_USE_FINAL
44#include "grammar.h"
45#endif
46
47#include "lookup.h"
48#include "lexer.lut.h"
49
50// a bridge for yacc from the C world to C++
51int kjsyylex(void* lvalp, void* llocp, void* globalData)
52{
53 return static_cast<JSGlobalData*>(globalData)->lexer->lex(lvalp, llocp);
54}
55
56namespace JSC {
57
58static bool isDecimalDigit(int);
59
60static const size_t initialReadBufferCapacity = 32;
61static const size_t initialStringTableCapacity = 64;
62
63Lexer::Lexer(JSGlobalData* globalData)
64 : yylineno(1)
65 , m_restrKeyword(false)
66 , m_eatNextIdentifier(false)
67 , m_stackToken(-1)
68 , m_lastToken(-1)
69 , m_position(0)
70 , m_code(0)
71 , m_length(0)
72 , m_atLineStart(true)
73 , m_current(0)
74 , m_next1(0)
75 , m_next2(0)
76 , m_next3(0)
77 , m_currentOffset(0)
78 , m_nextOffset1(0)
79 , m_nextOffset2(0)
80 , m_nextOffset3(0)
81 , m_globalData(globalData)
82 , m_mainTable(JSC::mainTable)
83{
84 m_buffer8.reserveCapacity(initialReadBufferCapacity);
85 m_buffer16.reserveCapacity(initialReadBufferCapacity);
86 m_strings.reserveCapacity(initialStringTableCapacity);
87 m_identifiers.reserveCapacity(initialStringTableCapacity);
88}
89
90Lexer::~Lexer()
91{
92 m_mainTable.deleteTable();
93}
94
95void Lexer::setCode(const SourceCode& source)
96{
97 yylineno = source.firstLine();
98 m_restrKeyword = false;
99 m_delimited = false;
100 m_eatNextIdentifier = false;
101 m_stackToken = -1;
102 m_lastToken = -1;
103
104 m_position = 0;
105 m_source = &source;
106 m_code = source.data();
107 m_length = source.length();
108 m_skipLF = false;
109 m_skipCR = false;
110 m_error = false;
111 m_atLineStart = true;
112
113 // read first characters
114 shift(4);
115}
116
117void Lexer::shift(unsigned p)
118{
119 // ECMA-262 calls for stripping Cf characters here, but we only do this for BOM,
120 // see <https://bugs.webkit.org/show_bug.cgi?id=4931>.
121
122 while (p--) {
123 m_current = m_next1;
124 m_next1 = m_next2;
125 m_next2 = m_next3;
126 m_currentOffset = m_nextOffset1;
127 m_nextOffset1 = m_nextOffset2;
128 m_nextOffset2 = m_nextOffset3;
129 do {
130 if (m_position >= m_length) {
131 m_nextOffset3 = m_position;
132 m_position++;
133 m_next3 = -1;
134 break;
135 }
136 m_nextOffset3 = m_position;
137 m_next3 = m_code[m_position++];
138 } while (m_next3 == 0xFEFF);
139 }
140}
141
142// called on each new line
143void Lexer::nextLine()
144{
145 yylineno++;
146 m_atLineStart = true;
147}
148
149void Lexer::setDone(State s)
150{
151 m_state = s;
152 m_done = true;
153}
154
155int Lexer::lex(void* p1, void* p2)
156{
157 YYSTYPE* lvalp = static_cast<YYSTYPE*>(p1);
158 YYLTYPE* llocp = static_cast<YYLTYPE*>(p2);
159 int token = 0;
160 m_state = Start;
161 unsigned short stringType = 0; // either single or double quotes
162 m_buffer8.clear();
163 m_buffer16.clear();
164 m_done = false;
165 m_terminator = false;
166 m_skipLF = false;
167 m_skipCR = false;
168
169 // Did we push a token on the stack previously
170 // (after an automatic semicolon insertion)?
171 if (m_stackToken >= 0) {
172 setDone(Other);
173 token = m_stackToken;
174 m_stackToken = 0;
175 }
176 int startOffset = m_currentOffset;
177 while (!m_done) {
178 if (m_skipLF && m_current != '\n') // found \r but not \n afterwards
179 m_skipLF = false;
180 if (m_skipCR && m_current != '\r') // found \n but not \r afterwards
181 m_skipCR = false;
182 if (m_skipLF || m_skipCR) { // found \r\n or \n\r -> eat the second one
183 m_skipLF = false;
184 m_skipCR = false;
185 shift(1);
186 }
187 switch (m_state) {
188 case Start:
189 startOffset = m_currentOffset;
190 if (isWhiteSpace()) {
191 // do nothing
192 } else if (m_current == '/' && m_next1 == '/') {
193 shift(1);
194 m_state = InSingleLineComment;
195 } else if (m_current == '/' && m_next1 == '*') {
196 shift(1);
197 m_state = InMultiLineComment;
198 } else if (m_current == -1) {
199 if (!m_terminator && !m_delimited) {
200 // automatic semicolon insertion if program incomplete
201 token = ';';
202 m_stackToken = 0;
203 setDone(Other);
204 } else
205 setDone(Eof);
206 } else if (isLineTerminator()) {
207 nextLine();
208 m_terminator = true;
209 if (m_restrKeyword) {
210 token = ';';
211 setDone(Other);
212 }
213 } else if (m_current == '"' || m_current == '\'') {
214 m_state = InString;
215 stringType = static_cast<unsigned short>(m_current);
216 } else if (isIdentStart(m_current)) {
217 record16(m_current);
218 m_state = InIdentifierOrKeyword;
219 } else if (m_current == '\\')
220 m_state = InIdentifierStartUnicodeEscapeStart;
221 else if (m_current == '0') {
222 record8(m_current);
223 m_state = InNum0;
224 } else if (isDecimalDigit(m_current)) {
225 record8(m_current);
226 m_state = InNum;
227 } else if (m_current == '.' && isDecimalDigit(m_next1)) {
228 record8(m_current);
229 m_state = InDecimal;
230 // <!-- marks the beginning of a line comment (for www usage)
231 } else if (m_current == '<' && m_next1 == '!' && m_next2 == '-' && m_next3 == '-') {
232 shift(3);
233 m_state = InSingleLineComment;
234 // same for -->
235 } else if (m_atLineStart && m_current == '-' && m_next1 == '-' && m_next2 == '>') {
236 shift(2);
237 m_state = InSingleLineComment;
238 } else {
239 token = matchPunctuator(lvalp->intValue, m_current, m_next1, m_next2, m_next3);
240 if (token != -1)
241 setDone(Other);
242 else
243 setDone(Bad);
244 }
245 break;
246 case InString:
247 if (m_current == stringType) {
248 shift(1);
249 setDone(String);
250 } else if (isLineTerminator() || m_current == -1)
251 setDone(Bad);
252 else if (m_current == '\\')
253 m_state = InEscapeSequence;
254 else
255 record16(m_current);
256 break;
257 // Escape Sequences inside of strings
258 case InEscapeSequence:
259 if (isOctalDigit(m_current)) {
260 if (m_current >= '0' && m_current <= '3' &&
261 isOctalDigit(m_next1) && isOctalDigit(m_next2)) {
262 record16(convertOctal(m_current, m_next1, m_next2));
263 shift(2);
264 m_state = InString;
265 } else if (isOctalDigit(m_current) && isOctalDigit(m_next1)) {
266 record16(convertOctal('0', m_current, m_next1));
267 shift(1);
268 m_state = InString;
269 } else if (isOctalDigit(m_current)) {
270 record16(convertOctal('0', '0', m_current));
271 m_state = InString;
272 } else
273 setDone(Bad);
274 } else if (m_current == 'x')
275 m_state = InHexEscape;
276 else if (m_current == 'u')
277 m_state = InUnicodeEscape;
278 else if (isLineTerminator()) {
279 nextLine();
280 m_state = InString;
281 } else {
282 record16(singleEscape(static_cast<unsigned short>(m_current)));
283 m_state = InString;
284 }
285 break;
286 case InHexEscape:
287 if (isHexDigit(m_current) && isHexDigit(m_next1)) {
288 m_state = InString;
289 record16(convertHex(m_current, m_next1));
290 shift(1);
291 } else if (m_current == stringType) {
292 record16('x');
293 shift(1);
294 setDone(String);
295 } else {
296 record16('x');
297 record16(m_current);
298 m_state = InString;
299 }
300 break;
301 case InUnicodeEscape:
302 if (isHexDigit(m_current) && isHexDigit(m_next1) && isHexDigit(m_next2) && isHexDigit(m_next3)) {
303 record16(convertUnicode(m_current, m_next1, m_next2, m_next3));
304 shift(3);
305 m_state = InString;
306 } else if (m_current == stringType) {
307 record16('u');
308 shift(1);
309 setDone(String);
310 } else
311 setDone(Bad);
312 break;
313 case InSingleLineComment:
314 if (isLineTerminator()) {
315 nextLine();
316 m_terminator = true;
317 if (m_restrKeyword) {
318 token = ';';
319 setDone(Other);
320 } else
321 m_state = Start;
322 } else if (m_current == -1)
323 setDone(Eof);
324 break;
325 case InMultiLineComment:
326 if (m_current == -1)
327 setDone(Bad);
328 else if (isLineTerminator())
329 nextLine();
330 else if (m_current == '*' && m_next1 == '/') {
331 m_state = Start;
332 shift(1);
333 }
334 break;
335 case InIdentifierOrKeyword:
336 case InIdentifier:
337 if (isIdentPart(m_current))
338 record16(m_current);
339 else if (m_current == '\\')
340 m_state = InIdentifierPartUnicodeEscapeStart;
341 else
342 setDone(m_state == InIdentifierOrKeyword ? IdentifierOrKeyword : Identifier);
343 break;
344 case InNum0:
345 if (m_current == 'x' || m_current == 'X') {
346 record8(m_current);
347 m_state = InHex;
348 } else if (m_current == '.') {
349 record8(m_current);
350 m_state = InDecimal;
351 } else if (m_current == 'e' || m_current == 'E') {
352 record8(m_current);
353 m_state = InExponentIndicator;
354 } else if (isOctalDigit(m_current)) {
355 record8(m_current);
356 m_state = InOctal;
357 } else if (isDecimalDigit(m_current)) {
358 record8(m_current);
359 m_state = InDecimal;
360 } else
361 setDone(Number);
362 break;
363 case InHex:
364 if (isHexDigit(m_current))
365 record8(m_current);
366 else
367 setDone(Hex);
368 break;
369 case InOctal:
370 if (isOctalDigit(m_current))
371 record8(m_current);
372 else if (isDecimalDigit(m_current)) {
373 record8(m_current);
374 m_state = InDecimal;
375 } else
376 setDone(Octal);
377 break;
378 case InNum:
379 if (isDecimalDigit(m_current))
380 record8(m_current);
381 else if (m_current == '.') {
382 record8(m_current);
383 m_state = InDecimal;
384 } else if (m_current == 'e' || m_current == 'E') {
385 record8(m_current);
386 m_state = InExponentIndicator;
387 } else
388 setDone(Number);
389 break;
390 case InDecimal:
391 if (isDecimalDigit(m_current))
392 record8(m_current);
393 else if (m_current == 'e' || m_current == 'E') {
394 record8(m_current);
395 m_state = InExponentIndicator;
396 } else
397 setDone(Number);
398 break;
399 case InExponentIndicator:
400 if (m_current == '+' || m_current == '-')
401 record8(m_current);
402 else if (isDecimalDigit(m_current)) {
403 record8(m_current);
404 m_state = InExponent;
405 } else
406 setDone(Bad);
407 break;
408 case InExponent:
409 if (isDecimalDigit(m_current))
410 record8(m_current);
411 else
412 setDone(Number);
413 break;
414 case InIdentifierStartUnicodeEscapeStart:
415 if (m_current == 'u')
416 m_state = InIdentifierStartUnicodeEscape;
417 else
418 setDone(Bad);
419 break;
420 case InIdentifierPartUnicodeEscapeStart:
421 if (m_current == 'u')
422 m_state = InIdentifierPartUnicodeEscape;
423 else
424 setDone(Bad);
425 break;
426 case InIdentifierStartUnicodeEscape:
427 if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
428 setDone(Bad);
429 break;
430 }
431 token = convertUnicode(m_current, m_next1, m_next2, m_next3);
432 shift(3);
433 if (!isIdentStart(token)) {
434 setDone(Bad);
435 break;
436 }
437 record16(token);
438 m_state = InIdentifier;
439 break;
440 case InIdentifierPartUnicodeEscape:
441 if (!isHexDigit(m_current) || !isHexDigit(m_next1) || !isHexDigit(m_next2) || !isHexDigit(m_next3)) {
442 setDone(Bad);
443 break;
444 }
445 token = convertUnicode(m_current, m_next1, m_next2, m_next3);
446 shift(3);
447 if (!isIdentPart(token)) {
448 setDone(Bad);
449 break;
450 }
451 record16(token);
452 m_state = InIdentifier;
453 break;
454 default:
455 ASSERT(!"Unhandled state in switch statement");
456 }
457
458 // move on to the next character
459 if (!m_done)
460 shift(1);
461 if (m_state != Start && m_state != InSingleLineComment)
462 m_atLineStart = false;
463 }
464
465 // no identifiers allowed directly after numeric literal, e.g. "3in" is bad
466 if ((m_state == Number || m_state == Octal || m_state == Hex) && isIdentStart(m_current))
467 m_state = Bad;
468
469 // terminate string
470 m_buffer8.append('\0');
471
472#ifdef KJS_DEBUG_LEX
473 fprintf(stderr, "line: %d ", lineNo());
474 fprintf(stderr, "yytext (%x): ", m_buffer8[0]);
475 fprintf(stderr, "%s ", m_buffer8.data());
476#endif
477
478 double dval = 0;
479 if (m_state == Number)
480 dval = strtod(m_buffer8.data(), 0L);
481 else if (m_state == Hex) { // scan hex numbers
482 const char* p = m_buffer8.data() + 2;
483 while (char c = *p++) {
484 dval *= 16;
485 dval += convertHex(c);
486 }
487
488 if (dval >= mantissaOverflowLowerBound)
489 dval = parseIntOverflow(m_buffer8.data() + 2, p - (m_buffer8.data() + 3), 16);
490
491 m_state = Number;
492 } else if (m_state == Octal) { // scan octal number
493 const char* p = m_buffer8.data() + 1;
494 while (char c = *p++) {
495 dval *= 8;
496 dval += c - '0';
497 }
498
499 if (dval >= mantissaOverflowLowerBound)
500 dval = parseIntOverflow(m_buffer8.data() + 1, p - (m_buffer8.data() + 2), 8);
501
502 m_state = Number;
503 }
504
505#ifdef KJS_DEBUG_LEX
506 switch (m_state) {
507 case Eof:
508 printf("(EOF)\n");
509 break;
510 case Other:
511 printf("(Other)\n");
512 break;
513 case Identifier:
514 printf("(Identifier)/(Keyword)\n");
515 break;
516 case String:
517 printf("(String)\n");
518 break;
519 case Number:
520 printf("(Number)\n");
521 break;
522 default:
523 printf("(unknown)");
524 }
525#endif
526
527 if (m_state != Identifier)
528 m_eatNextIdentifier = false;
529
530 m_restrKeyword = false;
531 m_delimited = false;
532 llocp->first_line = yylineno;
533 llocp->last_line = yylineno;
534 llocp->first_column = startOffset;
535 llocp->last_column = m_currentOffset;
536 switch (m_state) {
537 case Eof:
538 token = 0;
539 break;
540 case Other:
541 if (token == '}' || token == ';')
542 m_delimited = true;
543 break;
544 case Identifier:
545 // Apply anonymous-function hack below (eat the identifier).
546 if (m_eatNextIdentifier) {
547 m_eatNextIdentifier = false;
548 token = lex(lvalp, llocp);
549 break;
550 }
551 lvalp->ident = makeIdentifier(m_buffer16);
552 token = IDENT;
553 break;
554 case IdentifierOrKeyword: {
555 lvalp->ident = makeIdentifier(m_buffer16);
556 const HashEntry* entry = m_mainTable.entry(m_globalData, *lvalp->ident);
557 if (!entry) {
558 // Lookup for keyword failed, means this is an identifier.
559 token = IDENT;
560 break;
561 }
562 token = entry->lexerValue();
563 // Hack for "f = function somename() { ... }"; too hard to get into the grammar.
564 m_eatNextIdentifier = token == FUNCTION && m_lastToken == '=';
565 if (token == CONTINUE || token == BREAK || token == RETURN || token == THROW)
566 m_restrKeyword = true;
567 break;
568 }
569 case String:
570 // Atomize constant strings in case they're later used in property lookup.
571 lvalp->ident = makeIdentifier(m_buffer16);
572 token = STRING;
573 break;
574 case Number:
575 lvalp->doubleValue = dval;
576 token = NUMBER;
577 break;
578 case Bad:
579#ifdef KJS_DEBUG_LEX
580 fprintf(stderr, "yylex: ERROR.\n");
581#endif
582 m_error = true;
583 return -1;
584 default:
585 ASSERT(!"unhandled enumeration value in switch");
586 m_error = true;
587 return -1;
588 }
589 m_lastToken = token;
590 return token;
591}
592
593bool Lexer::isWhiteSpace() const
594{
595 return m_current == '\t' || m_current == 0x0b || m_current == 0x0c || isSeparatorSpace(m_current);
596}
597
598bool Lexer::isLineTerminator()
599{
600 bool cr = (m_current == '\r');
601 bool lf = (m_current == '\n');
602 if (cr)
603 m_skipLF = true;
604 else if (lf)
605 m_skipCR = true;
606 return cr || lf || m_current == 0x2028 || m_current == 0x2029;
607}
608
609bool Lexer::isIdentStart(int c)
610{
611 return (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other))
612 || c == '$' || c == '_';
613}
614
615bool Lexer::isIdentPart(int c)
616{
617 return (category(c) & (Letter_Uppercase | Letter_Lowercase | Letter_Titlecase | Letter_Modifier | Letter_Other
618 | Mark_NonSpacing | Mark_SpacingCombining | Number_DecimalDigit | Punctuation_Connector))
619 || c == '$' || c == '_';
620}
621
622static bool isDecimalDigit(int c)
623{
624 return (c >= '0' && c <= '9');
625}
626
627bool Lexer::isHexDigit(int c)
628{
629 return ((c >= '0' && c <= '9')
630 || (c >= 'a' && c <= 'f')
631 || (c >= 'A' && c <= 'F'));
632}
633
634bool Lexer::isOctalDigit(int c)
635{
636 return (c >= '0' && c <= '7');
637}
638
639int Lexer::matchPunctuator(int& charPos, int c1, int c2, int c3, int c4)
640{
641 if (c1 == '>' && c2 == '>' && c3 == '>' && c4 == '=') {
642 shift(4);
643 return URSHIFTEQUAL;
644 }
645 if (c1 == '=' && c2 == '=' && c3 == '=') {
646 shift(3);
647 return STREQ;
648 }
649 if (c1 == '!' && c2 == '=' && c3 == '=') {
650 shift(3);
651 return STRNEQ;
652 }
653 if (c1 == '>' && c2 == '>' && c3 == '>') {
654 shift(3);
655 return URSHIFT;
656 }
657 if (c1 == '<' && c2 == '<' && c3 == '=') {
658 shift(3);
659 return LSHIFTEQUAL;
660 }
661 if (c1 == '>' && c2 == '>' && c3 == '=') {
662 shift(3);
663 return RSHIFTEQUAL;
664 }
665 if (c1 == '<' && c2 == '=') {
666 shift(2);
667 return LE;
668 }
669 if (c1 == '>' && c2 == '=') {
670 shift(2);
671 return GE;
672 }
673 if (c1 == '!' && c2 == '=') {
674 shift(2);
675 return NE;
676 }
677 if (c1 == '+' && c2 == '+') {
678 shift(2);
679 if (m_terminator)
680 return AUTOPLUSPLUS;
681 return PLUSPLUS;
682 }
683 if (c1 == '-' && c2 == '-') {
684 shift(2);
685 if (m_terminator)
686 return AUTOMINUSMINUS;
687 return MINUSMINUS;
688 }
689 if (c1 == '=' && c2 == '=') {
690 shift(2);
691 return EQEQ;
692 }
693 if (c1 == '+' && c2 == '=') {
694 shift(2);
695 return PLUSEQUAL;
696 }
697 if (c1 == '-' && c2 == '=') {
698 shift(2);
699 return MINUSEQUAL;
700 }
701 if (c1 == '*' && c2 == '=') {
702 shift(2);
703 return MULTEQUAL;
704 }
705 if (c1 == '/' && c2 == '=') {
706 shift(2);
707 return DIVEQUAL;
708 }
709 if (c1 == '&' && c2 == '=') {
710 shift(2);
711 return ANDEQUAL;
712 }
713 if (c1 == '^' && c2 == '=') {
714 shift(2);
715 return XOREQUAL;
716 }
717 if (c1 == '%' && c2 == '=') {
718 shift(2);
719 return MODEQUAL;
720 }
721 if (c1 == '|' && c2 == '=') {
722 shift(2);
723 return OREQUAL;
724 }
725 if (c1 == '<' && c2 == '<') {
726 shift(2);
727 return LSHIFT;
728 }
729 if (c1 == '>' && c2 == '>') {
730 shift(2);
731 return RSHIFT;
732 }
733 if (c1 == '&' && c2 == '&') {
734 shift(2);
735 return AND;
736 }
737 if (c1 == '|' && c2 == '|') {
738 shift(2);
739 return OR;
740 }
741
742 switch (c1) {
743 case '=':
744 case '>':
745 case '<':
746 case ',':
747 case '!':
748 case '~':
749 case '?':
750 case ':':
751 case '.':
752 case '+':
753 case '-':
754 case '*':
755 case '/':
756 case '&':
757 case '|':
758 case '^':
759 case '%':
760 case '(':
761 case ')':
762 case '[':
763 case ']':
764 case ';':
765 shift(1);
766 return static_cast<int>(c1);
767 case '{':
768 charPos = m_position - 4;
769 shift(1);
770 return OPENBRACE;
771 case '}':
772 charPos = m_position - 4;
773 shift(1);
774 return CLOSEBRACE;
775 default:
776 return -1;
777 }
778}
779
780unsigned short Lexer::singleEscape(unsigned short c)
781{
782 switch (c) {
783 case 'b':
784 return 0x08;
785 case 't':
786 return 0x09;
787 case 'n':
788 return 0x0A;
789 case 'v':
790 return 0x0B;
791 case 'f':
792 return 0x0C;
793 case 'r':
794 return 0x0D;
795 case '"':
796 return 0x22;
797 case '\'':
798 return 0x27;
799 case '\\':
800 return 0x5C;
801 default:
802 return c;
803 }
804}
805
806unsigned short Lexer::convertOctal(int c1, int c2, int c3)
807{
808 return static_cast<unsigned short>((c1 - '0') * 64 + (c2 - '0') * 8 + c3 - '0');
809}
810
811unsigned char Lexer::convertHex(int c)
812{
813 if (c >= '0' && c <= '9')
814 return static_cast<unsigned char>(c - '0');
815 if (c >= 'a' && c <= 'f')
816 return static_cast<unsigned char>(c - 'a' + 10);
817 return static_cast<unsigned char>(c - 'A' + 10);
818}
819
820unsigned char Lexer::convertHex(int c1, int c2)
821{
822 return ((convertHex(c1) << 4) + convertHex(c2));
823}
824
825UChar Lexer::convertUnicode(int c1, int c2, int c3, int c4)
826{
827 unsigned char highByte = (convertHex(c1) << 4) + convertHex(c2);
828 unsigned char lowByte = (convertHex(c3) << 4) + convertHex(c4);
829 return (highByte << 8 | lowByte);
830}
831
832void Lexer::record8(int c)
833{
834 ASSERT(c >= 0);
835 ASSERT(c <= 0xff);
836 m_buffer8.append(static_cast<char>(c));
837}
838
839void Lexer::record16(int c)
840{
841 ASSERT(c >= 0);
842 ASSERT(c <= USHRT_MAX);
843 record16(UChar(static_cast<unsigned short>(c)));
844}
845
846void Lexer::record16(UChar c)
847{
848 m_buffer16.append(c);
849}
850
851bool Lexer::scanRegExp()
852{
853 m_buffer16.clear();
854 bool lastWasEscape = false;
855 bool inBrackets = false;
856
857 while (1) {
858 if (isLineTerminator() || m_current == -1)
859 return false;
860 else if (m_current != '/' || lastWasEscape || inBrackets) {
861 // keep track of '[' and ']'
862 if (!lastWasEscape) {
863 if (m_current == '[' && !inBrackets)
864 inBrackets = true;
865 if (m_current == ']' && inBrackets)
866 inBrackets = false;
866 inBrackets = false;
867 }
868 record16(m_current);
869 lastWasEscape =
870 !lastWasEscape && (m_current == '\\');
871 } else { // end of regexp
872 m_pattern = UString(m_buffer16);
873 m_buffer16.clear();
874 shift(1);
875 break;
876 }
877 shift(1);
878 }
879
880 while (isIdentPart(m_current)) {
881 record16(m_current);
882 shift(1);
883 }
884 m_flags = UString(m_buffer16);
885
886 return true;
887}
888
889void Lexer::clear()
890{
891 deleteAllValues(m_strings);
892 Vector<UString*> newStrings;
893 newStrings.reserveCapacity(initialStringTableCapacity);
894 m_strings.swap(newStrings);
895
896 deleteAllValues(m_identifiers);
897 Vector<JSC::Identifier*> newIdentifiers;
898 newIdentifiers.reserveCapacity(initialStringTableCapacity);
899 m_identifiers.swap(newIdentifiers);
900
901 Vector<char> newBuffer8;
902 newBuffer8.reserveCapacity(initialReadBufferCapacity);
903 m_buffer8.swap(newBuffer8);
904
905 Vector<UChar> newBuffer16;
906 newBuffer16.reserveCapacity(initialReadBufferCapacity);
907 m_buffer16.swap(newBuffer16);
908
909 m_pattern = 0;
910 m_flags = 0;
911}
912
913Identifier* Lexer::makeIdentifier(const Vector<UChar>& buffer)
914{
915 JSC::Identifier* identifier = new JSC::Identifier(m_globalData, buffer.data(), buffer.size());
916 m_identifiers.append(identifier);
917 return identifier;
918}
919
920} // namespace JSC
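
The four-character window maintained by Lexer::shift() in the listing above can be isolated in a standalone sketch. The class below is a hypothetical reduction, not part of the file: it drops the offset bookkeeping (m_currentOffset and friends), uses std::u16string in place of the UChar buffer, and keeps only the sliding window, the -1 end-of-input sentinel, and the skipping of U+FEFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reduction of the lexer's lookahead: one current character
// plus three characters of lookahead, with -1 marking end of input.
class LookaheadWindow {
public:
    explicit LookaheadWindow(std::u16string code)
        : m_code(std::move(code))
    {
        shift(4); // prime the current slot and all three lookahead slots
    }

    void shift(unsigned count)
    {
        while (count--) {
            assert(m_position <= m_code.size());
            m_current = m_next1;
            m_next1 = m_next2;
            m_next2 = m_next3;
            do {
                if (m_position >= m_code.size()) {
                    m_next3 = -1; // past the end: feed the sentinel
                    break;
                }
                m_next3 = m_code[m_position++];
            } while (m_next3 == 0xFEFF); // byte order marks never enter the window
        }
    }

    int current() const { return m_current; }
    int next1() const { return m_next1; }
    int next2() const { return m_next2; }
    int next3() const { return m_next3; }

    // Example of a four-character test, like the "<!--" check in lex().
    bool atHtmlCommentOpen() const
    {
        return m_current == '<' && m_next1 == '!' && m_next2 == '-' && m_next3 == '-';
    }

private:
    std::u16string m_code;
    std::size_t m_position = 0;
    int m_current = 0;
    int m_next1 = 0;
    int m_next2 = 0;
    int m_next3 = 0;
};
```

Because the BOM is filtered as characters enter the window, every multi-character comparison (comment openers, punctuators, escape digits) can look at the four slots directly without special-casing it.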