ParserTokens.h

Timestamp:

Mar 16, 2020, 5:12:17 PM (5 years ago)

Author:

Message:

JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
https://bugs.webkit.org/show_bug.cgi?id=208998

Reviewed by Michael Saboff.

JSTests:

stress/unicode-identifiers-with-surrogate-pairs.js: Added.

(let.c.of.chars.eval.foo):
(throwsSyntaxError):
(let.c.of.continueChars.throwsSyntaxError.foo):

Source/JavaScriptCore:

This patch fixes a bug in the parser so that it now allows surrogate pairs when parsing identifiers.
It also makes a few other changes to the parser:

1) When looking for keywords we just need to check that the subsequent
character cannot be an identifier part or an escape start.

2) The only time we call parseIdentifierSlowCase is when we hit an
escape start or a surrogate pair, so we can optimize that to just
copy everything up to the slow character into our buffer.

3) We shouldn't allow for asking if a UChar is an identifier start/part.
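
To illustrate point 3, here is a minimal sketch of why a single UTF-16 code unit cannot answer the identifier start/part question: an astral symbol occupies two code units, and only the decoded code point can be classified. The names `decodeAt`, `DecodedCodePoint`, and `invalidCodePoint` are hypothetical helpers for this illustration, not the functions in Lexer.cpp, and the actual ID_Start/ID_Continue lookup is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not the code in Lexer.cpp.
constexpr bool isLeadSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
constexpr bool isTrailSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Sentinel returned for an unpaired surrogate.
constexpr char32_t invalidCodePoint = 0xFFFFFFFF;

struct DecodedCodePoint {
    char32_t value;     // decoded code point, or invalidCodePoint
    std::size_t length; // number of UTF-16 code units consumed (1 or 2)
};

// Decodes the code point that starts at text[index].
// The caller must guarantee index < size.
inline DecodedCodePoint decodeAt(const char16_t* text, std::size_t size, std::size_t index)
{
    char16_t first = text[index];
    if (isLeadSurrogate(first)) {
        if (index + 1 < size && isTrailSurrogate(text[index + 1])) {
            char32_t high = static_cast<char32_t>(first) - 0xD800;
            char32_t low = static_cast<char32_t>(text[index + 1]) - 0xDC00;
            return { 0x10000 + (high << 10) + low, 2 };
        }
        // Lead surrogate with no trail: not a valid code point.
        return { invalidCodePoint, 1 };
    }
    if (isTrailSurrogate(first)) {
        // Trail surrogate with no preceding lead.
        return { invalidCodePoint, 1 };
    }
    return { first, 1 };
}
```

A caller would classify `value` as an identifier start or part only after decoding, and advance by `length` code units. Under this sketch an unpaired surrogate yields `invalidCodePoint` and would be reported as an error.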

KeywordLookupGenerator.py:

(Trie.printSubTreeAsC):
(Trie.printAsC):

parser/Lexer.cpp:

(JSC::isNonLatin1IdentStart):
(JSC::isIdentStart):
(JSC::isSingleCharacterIdentStart):
(JSC::cannotBeIdentStart):
(JSC::isIdentPart):
(JSC::isSingleCharacterIdentPart):
(JSC::cannotBeIdentPartOrEscapeStart):
(JSC::Lexer<LChar>::currentCodePoint const):
(JSC::Lexer<UChar>::currentCodePoint const):
(JSC::Lexer<LChar>::parseIdentifier):
(JSC::Lexer<UChar>::parseIdentifier):
(JSC::Lexer<CharacterType>::parseIdentifierSlowCase):
(JSC::Lexer<T>::lexWithoutClearingLineTerminator):
(JSC::Lexer<T>::scanRegExp):
(JSC::isIdentPartIncludingEscapeTemplate): Deleted.
(JSC::isIdentPartIncludingEscape): Deleted.

parser/Lexer.h:

(JSC::Lexer::setOffsetFromSourcePtr): Deleted.

parser/Parser.cpp:

(JSC::Parser<LexerType>::printUnexpectedTokenText):

parser/ParserTokens.h:

Source/WTF:

wtf/text/WTFString.cpp:

(WTF::String::fromCodePoint):

wtf/text/WTFString.h:

LayoutTests:

Fix broken test that asserted a non-ID_START codepoint was a start codepoint and
an ID_START codepoint was not a valid codepoint...

js/script-tests/unicode-escape-sequences.js:
js/unicode-escape-sequences-expected.txt:

File:

: 1 edited

trunk/Source/JavaScriptCore/parser/ParserTokens.h (modified) (3 diffs)

trunk/Source/JavaScriptCore/parser/ParserTokens.h

-              r255440
+              r258531
 enum {
-    // Token Bitfield: 0b000000000RTE000IIIIPPPPKUXXXXXXX
+    // Token Bitfield: 0b000000000RTE00IIIIPPPPKUXXXXXXXX
     // R = right-associative bit
     // T = unterminated error flag
 …
     //
     // We must keep the upper 8bit (1byte) region empty. JSTokenType must be 24bits.
-    UnaryOpTokenFlag = 128,
-    KeywordTokenFlag = 256,
-    BinaryOpTokenPrecedenceShift = 9,
+    UnaryOpTokenFlag = 1 << 8,
+    KeywordTokenFlag = 1 << 9,
+    BinaryOpTokenPrecedenceShift = 10,
     BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
     BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
-    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 7),
+    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
     UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
     RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1
     UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
     RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1
 …
     INVALID_TEMPLATE_LITERAL_ERRORTOK = 15 | ErrorTokenFlag,
     UNEXPECTED_ESCAPE_ERRORTOK = 16 | ErrorTokenFlag,
+    INVALID_UNICODE_ENCODING_ERRORTOK = 17 | ErrorTokenFlag,
+    INVALID_IDENTIFIER_UNICODE_ERRORTOK = 18 | ErrorTokenFlag,
 };
 static_assert(static_cast<unsigned>(POW) <= 0x00ffffffU, "JSTokenType must be 24bits.");
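
The bit arithmetic in the diff can be checked in isolation. The sketch below copies the flag values from the r258531 side into a standalone enum and asserts that the error flag stays at bit 20 (the old `9 + 4 + 7` and the new `10 + 4 + 6` both sum to 20), so widening the low field by one bit does not push anything into the reserved upper byte. `isErrorToken` and `LowFieldMask` are hypothetical names added for this illustration, and the enum is not the full JSTokenType.

```cpp
#include <cassert>
#include <cstdint>

// Flag values copied from the r258531 side of the diff; this enum is a
// standalone illustration, not the real JSTokenType.
enum : uint32_t {
    UnaryOpTokenFlag = 1 << 8,
    KeywordTokenFlag = 1 << 9,
    BinaryOpTokenPrecedenceShift = 10,
    BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
    BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
    UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
    RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1,
    INVALID_UNICODE_ENCODING_ERRORTOK = 17 | ErrorTokenFlag,
    INVALID_IDENTIFIER_UNICODE_ERRORTOK = 18 | ErrorTokenFlag,
};

// Hypothetical name: the eight low "X" bits in the new bitfield comment.
constexpr uint32_t LowFieldMask = 0xFF;

// The error flag is still bit 20, exactly where the old layout put it.
static_assert(ErrorTokenFlag == (1u << 20), "error flag moved");
static_assert(ErrorTokenFlag == (1u << (4 + 9 + 7)), "differs from old layout");

// The highest flag is bit 22, so the upper byte stays empty.
static_assert(RightAssociativeBinaryOpTokenFlag == (1u << 22), "unexpected top flag");
static_assert((RightAssociativeBinaryOpTokenFlag & 0xFF000000u) == 0, "must fit in 24 bits");

// The low field does not overlap the unary-op or keyword flags.
static_assert((LowFieldMask & (UnaryOpTokenFlag | KeywordTokenFlag)) == 0, "low field overlaps flags");

// Hypothetical helper: tests whether a token value carries the error flag.
constexpr bool isErrorToken(uint32_t token) { return (token & ErrorTokenFlag) != 0; }
```

Because the checks are `static_assert`s, a layout mistake would fail at compile time rather than at run time.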


Changeset 258531 in webkit for trunk/Source/JavaScriptCore/parser/ParserTokens.h
