Timestamp:
Mar 16, 2020, 5:12:17 PM
Author:
[email protected]
Message:

JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
https://bugs.webkit.org/show_bug.cgi?id=208998

Reviewed by Michael Saboff.

JSTests:

  • stress/unicode-identifiers-with-surrogate-pairs.js: Added.

(let.c.of.chars.eval.foo):
(throwsSyntaxError):
(let.c.of.continueChars.throwsSyntaxError.foo):

Source/JavaScriptCore:

This patch fixes a bug in the parser so that surrogate pairs are now allowed when parsing identifiers.
It also makes a few other changes to the parser:

1) When looking for keywords we just need to check that the subsequent
character cannot be an identifier part or an escape start.

2) The only time we call parseIdentifierSlowCase is when we hit an
escape start or a surrogate pair, so we can optimize that to just
copy everything up to the slow character into our buffer.

3) We shouldn't allow for asking if a UChar is an identifier start/part.
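
The surrogate-pair handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from Lexer.cpp: the names codePointAt, combineSurrogates, and invalidCodePoint are hypothetical, and the real lexer presumably relies on ICU/WTF helpers and then tests the resulting code point for identifier start/part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; these are not the JSC or ICU APIs.
constexpr bool isLeadSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
constexpr bool isTrailSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Combine a lead/trail pair into one astral code point (U+10000 and above).
constexpr char32_t combineSurrogates(char16_t lead, char16_t trail)
{
    return 0x10000
        + ((static_cast<char32_t>(lead) - 0xD800) << 10)
        + (static_cast<char32_t>(trail) - 0xDC00);
}

// Sentinel returned for a lone (unpaired) surrogate; an assumption of this sketch.
constexpr char32_t invalidCodePoint = 0xFFFFFFFF;

// Decode the code point starting at index i of a UTF-16 buffer.
// `length` receives the number of code units that were consumed (1 or 2).
inline char32_t codePointAt(const char16_t* data, std::size_t size, std::size_t i, std::size_t& length)
{
    assert(i < size);
    const char16_t unit = data[i];
    length = 1;
    if (isLeadSurrogate(unit)) {
        if (i + 1 < size && isTrailSurrogate(data[i + 1])) {
            length = 2;
            return combineSurrogates(unit, data[i + 1]);
        }
        return invalidCodePoint; // lead surrogate without a trail
    }
    if (isTrailSurrogate(unit))
        return invalidCodePoint; // trail surrogate without a lead
    return unit; // ordinary BMP code unit
}
```

Asking whether a single 16-bit unit is an identifier start or part cannot give a correct answer for either half of a pair, which is why the question only makes sense on the decoded code point.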

  • KeywordLookupGenerator.py:

(Trie.printSubTreeAsC):
(Trie.printAsC):

  • parser/Lexer.cpp:

(JSC::isNonLatin1IdentStart):
(JSC::isIdentStart):
(JSC::isSingleCharacterIdentStart):
(JSC::cannotBeIdentStart):
(JSC::isIdentPart):
(JSC::isSingleCharacterIdentPart):
(JSC::cannotBeIdentPartOrEscapeStart):
(JSC::Lexer<LChar>::currentCodePoint const):
(JSC::Lexer<UChar>::currentCodePoint const):
(JSC::Lexer<LChar>::parseIdentifier):
(JSC::Lexer<UChar>::parseIdentifier):
(JSC::Lexer<CharacterType>::parseIdentifierSlowCase):
(JSC::Lexer<T>::lexWithoutClearingLineTerminator):
(JSC::Lexer<T>::scanRegExp):
(JSC::isIdentPartIncludingEscapeTemplate): Deleted.
(JSC::isIdentPartIncludingEscape): Deleted.

  • parser/Lexer.h:

(JSC::Lexer::setOffsetFromSourcePtr): Deleted.

  • parser/Parser.cpp:

(JSC::Parser<LexerType>::printUnexpectedTokenText):

  • parser/ParserTokens.h:

Source/WTF:

  • wtf/text/WTFString.cpp:

(WTF::String::fromCodePoint):

  • wtf/text/WTFString.h:

LayoutTests:

Fix broken test that asserted a non-ID_START codepoint was a start codepoint and
an ID_START codepoint was not a valid codepoint...

  • js/script-tests/unicode-escape-sequences.js:
  • js/unicode-escape-sequences-expected.txt:
File:
1 edited

  • trunk/Source/JavaScriptCore/parser/ParserTokens.h

--- trunk/Source/JavaScriptCore/parser/ParserTokens.h (r255440)
+++ trunk/Source/JavaScriptCore/parser/ParserTokens.h (r258531)
@@ -34,5 +34,5 @@

 enum {
-    // Token Bitfield: 0b000000000RTE000IIIIPPPPKUXXXXXXX
+    // Token Bitfield: 0b000000000RTE00IIIIPPPPKUXXXXXXXX
     // R = right-associative bit
     // T = unterminated error flag
@@ -44,10 +44,10 @@
     //
     // We must keep the upper 8bit (1byte) region empty. JSTokenType must be 24bits.
-    UnaryOpTokenFlag = 128,
-    KeywordTokenFlag = 256,
-    BinaryOpTokenPrecedenceShift = 9,
+    UnaryOpTokenFlag = 1 << 8,
+    KeywordTokenFlag = 1 << 9,
+    BinaryOpTokenPrecedenceShift = 10,
     BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
     BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
-    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 7),
+    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
     UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
     RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1
@@ -193,4 +193,6 @@
     INVALID_TEMPLATE_LITERAL_ERRORTOK = 15 | ErrorTokenFlag,
     UNEXPECTED_ESCAPE_ERRORTOK = 16 | ErrorTokenFlag,
+    INVALID_UNICODE_ENCODING_ERRORTOK = 17 | ErrorTokenFlag,
+    INVALID_IDENTIFIER_UNICODE_ERRORTOK = 18 | ErrorTokenFlag,
 };
 static_assert(static_cast<unsigned>(POW) <= 0x00ffffffU, "JSTokenType must be 24bits.");
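
The bit arithmetic in the change above can be checked with a small standalone sketch. It copies only the flag values shown in the diff into a hypothetical `sketch` namespace; `tokenId` and `isErrorToken` are illustrative helpers, not functions from ParserTokens.h. Under those assumptions it confirms that the low field grows from 7 to 8 bits while ErrorTokenFlag stays at bit 20, because 4 + 9 + 7 and 4 + 10 + 6 are both 20.

```cpp
#include <cstdint>

namespace sketch {

// New layout from the diff: 0b000000000RTE00IIIIPPPPKUXXXXXXXX
enum : std::uint32_t {
    UnaryOpTokenFlag = 1 << 8,
    KeywordTokenFlag = 1 << 9,
    BinaryOpTokenPrecedenceShift = 10,
    BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
    BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
    UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
    RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1,
    INVALID_IDENTIFIER_UNICODE_ERRORTOK = 18 | ErrorTokenFlag,
};

// Old ErrorTokenFlag expression from r255440, kept for comparison.
constexpr std::uint32_t oldErrorTokenFlag = 1u << (4 + 9 + 7);

// Hypothetical helpers: read the low 8-bit "X" field and the error bit.
constexpr std::uint32_t tokenId(std::uint32_t token) { return token & 0xFFu; }
constexpr bool isErrorToken(std::uint32_t token) { return (token & ErrorTokenFlag) != 0; }

static_assert(ErrorTokenFlag == (1u << 20), "E sits at bit 20");
static_assert(ErrorTokenFlag == oldErrorTokenFlag, "E did not move");
static_assert(UnterminatedErrorTokenFlag == (1u << 21), "T sits at bit 21");
static_assert(RightAssociativeBinaryOpTokenFlag == (1u << 22), "R sits at bit 22");
static_assert(RightAssociativeBinaryOpTokenFlag <= 0x00ffffffU, "still fits in 24 bits");
static_assert((UnaryOpTokenFlag & 0xFFu) == 0, "low 8 bits are free of flags");

} // namespace sketch
```

For example, `sketch::tokenId(sketch::INVALID_IDENTIFIER_UNICODE_ERRORTOK)` yields 18 and `sketch::isErrorToken` reports true for it.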