Lexer.cpp

Timestamp: Jan 27, 2017, 7:09:12 PM
Author: Yusuke Suzuki
Message:

Lift template escape sequence restrictions in tagged templates
https://bugs.webkit.org/show_bug.cgi?id=166871

Reviewed by Saam Barati.

JSTests:

Update the error messages and add new tests.

ChakraCore/test/es6/unicode_6_identifier_Blue524737.baseline-jsc:
stress/lift-template-literal.js: Added.

(dump):
(testTag.return.tag):
(testTag):

stress/template-literal-syntax.js:

Source/JavaScriptCore:

This patch implements the stage 3 proposal Lifting Template Literal Restriction[1].
Prior to this patch, a template literal was a syntax error if it contained
invalid escape sequences. But that is too restrictive: a template literal
has cooked and raw representations, and only the cooked representation
interprets escape sequences. So even if invalid escape sequences are included,
the raw representation can still be valid.
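
As an illustration of the two representations, here is a minimal sketch in
JavaScript. The tag function `inspectStrings` is a hypothetical helper written
for this example and is not part of the patch; it only copies what the engine
passes to any tag function.

```javascript
// Hypothetical helper: returns the cooked and raw strings a tag function receives.
function inspectStrings(strings) {
    return { cooked: Array.from(strings), raw: Array.from(strings.raw) };
}

const both = inspectStrings`line1\nline2`;

// Cooked: the escape sequence \n has been interpreted as a line feed.
console.log(JSON.stringify(both.cooked[0]));
// Raw: the backslash and the letter n are preserved as written in the source.
console.log(JSON.stringify(both.raw[0]));
```

The cooked string is one character shorter than the raw string, because the
two source characters `\n` collapse into a single line feed.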

Lifting Template Literal Restriction relaxes the above restriction.
When a template literal containing an invalid escape sequence is used as a
tagged template, we make the cooked value of the template string that
includes the invalid escape sequence undefined instead of raising a
SyntaxError immediately. This allows us to accept tagged templates whose
raw representations include invalid escape sequences.
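
The behavior described above can be sketched as follows, assuming an engine
that implements the proposal. The tag function `describeTemplate` is a
hypothetical helper for this example, not something the patch adds.

```javascript
// Hypothetical helper: returns the cooked and raw strings a tag function receives.
function describeTemplate(strings) {
    return { cooked: Array.from(strings), raw: Array.from(strings.raw) };
}

// \unicode and \xerxes are not valid escape sequences: \u and \x are not
// followed by the hex digits they require.
const lifted = describeTemplate`\unicode and \xerxes ${0} ok\n`;

// The first template string contains invalid escapes, so its cooked value is
// undefined, while its raw value keeps the source text.
console.log(lifted.cooked[0] === undefined);
console.log(JSON.stringify(lifted.raw[0]));

// The second template string has only valid escapes, so it is cooked as usual.
console.log(JSON.stringify(lifted.cooked[1]));
```

Only the template string that contains the invalid escape loses its cooked
value; the other strings of the same template are unaffected.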

On the other hand, the raw representation is only used in tagged templates.
So if invalid escape sequences are included in ordinary (untagged) template
literals, we still raise a SyntaxError as before.
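
The contrast between untagged and tagged templates can be checked with a short
sketch, again assuming an engine that implements the proposal. The source is
compiled at run time through `new Function` so that the expected SyntaxError
can be caught; `tryCompile` is a hypothetical helper for this example.

```javascript
// Hypothetical helper: compiles a function body and returns the error, or null on success.
function tryCompile(source) {
    try {
        new Function(source);
        return null;
    } catch (error) {
        return error;
    }
}

// Untagged template with an invalid \u escape: still rejected at parse time.
const untaggedError = tryCompile("return `\\unicode`;");
// The same template used as a tagged template: accepted by the parser.
const taggedError = tryCompile("return String.raw`\\unicode`;");

console.log(untaggedError instanceof SyntaxError);
console.log(taggedError === null);
```

`String.raw` is used as the tag here only because it is a built-in that reads
the raw representation; any tag function would be accepted by the parser.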

[1]: https://github.com/tc39/proposal-template-literal-revision

bytecompiler/BytecodeGenerator.cpp:

(JSC::BytecodeGenerator::emitGetTemplateObject):

bytecompiler/NodesCodegen.cpp:

(JSC::TemplateStringNode::emitBytecode):
(JSC::TemplateLiteralNode::emitBytecode):

parser/ASTBuilder.h:

(JSC::ASTBuilder::createTemplateString):

parser/Lexer.cpp:

(JSC::Lexer<CharacterType>::parseUnicodeEscape):
(JSC::Lexer<T>::parseTemplateLiteral):
(JSC::Lexer<T>::lex):
(JSC::Lexer<T>::scanTemplateString):
(JSC::Lexer<T>::scanTrailingTemplateString): Deleted.

parser/Lexer.h:
parser/NodeConstructors.h:

(JSC::TemplateStringNode::TemplateStringNode):

parser/Nodes.h:

(JSC::TemplateStringNode::cooked):
(JSC::TemplateStringNode::raw):

parser/Parser.cpp:

(JSC::Parser<LexerType>::parseAssignmentElement):
(JSC::Parser<LexerType>::parseTemplateString):
(JSC::Parser<LexerType>::parseTemplateLiteral):
(JSC::Parser<LexerType>::parsePrimaryExpression):
(JSC::Parser<LexerType>::parseMemberExpression):

parser/ParserTokens.h:
parser/SyntaxChecker.h:

(JSC::SyntaxChecker::createTemplateString):

runtime/TemplateRegistry.cpp:

(JSC::TemplateRegistry::getTemplateObject):

runtime/TemplateRegistryKey.h:

(JSC::TemplateRegistryKey::cookedStrings):
(JSC::TemplateRegistryKey::create):
(JSC::TemplateRegistryKey::TemplateRegistryKey):

runtime/TemplateRegistryKeyTable.cpp:

(JSC::TemplateRegistryKeyTable::createKey):

runtime/TemplateRegistryKeyTable.h:

LayoutTests:

Update the error messages.

inspector/runtime/parse-expected.txt:
js/unicode-escape-sequences-expected.txt:

File: 1 edited

trunk/Source/JavaScriptCore/parser/Lexer.cpp (modified) (19 diffs)

trunk/Source/JavaScriptCore/parser/Lexer.cpp

-              r209632
+              r211319
 };
+template<typename CharacterType> ParsedUnicodeEscapeValue Lexer<CharacterType>::parseUnicodeEscape()
+template<typename CharacterType>
+ParsedUnicodeEscapeValue Lexer<CharacterType>::parseUnicodeEscape()
+{
     if (m_current == '{') {
 …
                 return m_current ? ParsedUnicodeEscapeValue::Invalid : ParsedUnicodeEscapeValue::Incomplete;
             codePoint = (codePoint << 4) | toASCIIHexValue(m_current);
+            if (codePoint > UCHAR_MAX_VALUE)
+                return ParsedUnicodeEscapeValue::Invalid;
+            if (codePoint > UCHAR_MAX_VALUE) {
+                // For raw template literal syntax, we consume `NotEscapeSequence`.
+                // Here, we consume NotCodePoint's HexDigits.
+                //
+                // NotEscapeSequence ::
+                //     u { [lookahead not one of HexDigit]
+                //     u { NotCodePoint
+                //     u { CodePoint [lookahead != }]
+                //
+                // NotCodePoint ::
+                //     HexDigits but not if MV of HexDigits <= 0x10FFFF
+                //
+                // CodePoint ::
+                //     HexDigits but not if MV of HexDigits > 0x10FFFF
+                shift();
+                while (isASCIIHexDigit(m_current))
+                    shift();
+                return atEnd() ? ParsedUnicodeEscapeValue::Incomplete : ParsedUnicodeEscapeValue::Invalid;
+            }
             shift();
         } while (m_current != '}');
 …
     auto character3 = peek(2);
     auto character4 = peek(3);
+    if (UNLIKELY(!isASCIIHexDigit(m_current) || !isASCIIHexDigit(character2) || !isASCIIHexDigit(character3) || !isASCIIHexDigit(character4)))
+        return (m_code + 4) >= m_codeEnd ? ParsedUnicodeEscapeValue::Incomplete : ParsedUnicodeEscapeValue::Invalid;
+    if (UNLIKELY(!isASCIIHexDigit(m_current) || !isASCIIHexDigit(character2) || !isASCIIHexDigit(character3) || !isASCIIHexDigit(character4))) {
+        auto result = (m_code + 4) >= m_codeEnd ? ParsedUnicodeEscapeValue::Incomplete : ParsedUnicodeEscapeValue::Invalid;
+        // For raw template literal syntax, we consume `NotEscapeSequence`.
+        //
+        // NotEscapeSequence ::
+        //     u [lookahead not one of HexDigit][lookahead != {]
+        //     u HexDigit [lookahead not one of HexDigit]
+        //     u HexDigit HexDigit [lookahead not one of HexDigit]
+        //     u HexDigit HexDigit HexDigit [lookahead not one of HexDigit]
+        while (isASCIIHexDigit(m_current))
+            shift();
+        return result;
+    }
     auto result = convertUnicode(m_current, character2, character3, character4);
     shift();
 …
 template <typename T>
 template <bool shouldBuildStrings> ALWAYS_INLINE auto Lexer<T>::parseComplexEscape(EscapeParseMode escapeParseMode, bool strictMode, T stringQuoteCharacter) -> StringParseResult
+template <bool shouldBuildStrings, LexerEscapeParseMode escapeParseMode> ALWAYS_INLINE auto Lexer<T>::parseComplexEscape(bool strictMode, T stringQuoteCharacter) -> StringParseResult
+{
     if (m_current == 'x') {
         shift();
         if (!isASCIIHexDigit(m_current) || !isASCIIHexDigit(peek(1))) {
+            // For raw template literal syntax, we consume `NotEscapeSequence`.
+            //
+            // NotEscapeSequence ::
+            //     x [lookahead not one of HexDigit]
+            //     x HexDigit [lookahead not one of HexDigit]
+            if (isASCIIHexDigit(m_current))
+                shift();
+            ASSERT(!isASCIIHexDigit(m_current));
             m_lexErrorMessage = ASCIILiteral("\\x can only be followed by a hex character sequence");
+            return StringCannotBeParsed;
+        }
+            return atEnd() ? StringUnterminated : StringCannotBeParsed;
+        }
         T prev = m_current;
         shift();
 …
             record16(convertHex(prev, m_current));
         shift();
         return StringParsedSuccessfully;
+    }
 …
         shift();
         if (escapeParseMode == EscapeParseMode::String && m_current == stringQuoteCharacter) {
+        if (escapeParseMode == LexerEscapeParseMode::String && m_current == stringQuoteCharacter) {
             if (shouldBuildStrings)
                 record16('u');
 …
         m_lexErrorMessage = ASCIILiteral("\\u can only be followed by a Unicode character sequence");
         return character.isIncomplete() ? StringUnterminated : StringCannotBeParsed;
+        return atEnd() ? StringUnterminated : StringCannotBeParsed;
+    }
 …
             shift();
             if (character1 != '0' || isASCIIDigit(m_current)) {
+                // For raw template literal syntax, we consume `NotEscapeSequence`.
+                //
+                // NotEscapeSequence ::
+                //     0 DecimalDigit
+                //     DecimalDigit but not 0
+                if (character1 == '0')
+                    shift();
                 m_lexErrorMessage = ASCIILiteral("The only valid numeric escape in strict mode is '\\0'");
                 return StringCannotBeParsed;
+                return atEnd() ? StringUnterminated : StringCannotBeParsed;
+            }
             if (shouldBuildStrings)
 …
                 shiftLineTerminator();
             else {
                 StringParseResult result = parseComplexEscape<shouldBuildStrings>(EscapeParseMode::String, strictMode, stringQuoteCharacter);
+                StringParseResult result = parseComplexEscape<shouldBuildStrings, LexerEscapeParseMode::String>(strictMode, stringQuoteCharacter);
                 if (result != StringParsedSuccessfully)
                     return result;
 …
 template <typename T>
+template <bool shouldBuildStrings> typename Lexer<T>::StringParseResult Lexer<T>::parseTemplateLiteral(JSTokenData* tokenData, RawStringsBuildMode rawStringsBuildMode)
+{
+typename Lexer<T>::StringParseResult Lexer<T>::parseTemplateLiteral(JSTokenData* tokenData, RawStringsBuildMode rawStringsBuildMode)
+{
+    bool parseCookedFailed = false;
     const T* stringStart = currentSourcePtr();
     const T* rawStringStart = currentSourcePtr();
 …
         if (UNLIKELY(m_current == '\\')) {
             lineNumberAdder.clear();
             if (stringStart != currentSourcePtr() && shouldBuildStrings)
+            if (stringStart != currentSourcePtr())
                 append16(stringStart, currentSourcePtr() - stringStart);
             shift();
 …
             // Most common escape sequences first.
             if (escape) {
+                if (shouldBuildStrings)
+                    record16(escape);
+                record16(escape);
                 shift();
             } else if (UNLIKELY(isLineTerminator(m_current))) {
                 // Normalize <CR>, <CR><LF> to <LF>.
                 if (m_current == '\r') {
+                    if (shouldBuildStrings) {
+                        ASSERT_WITH_MESSAGE(rawStringStart != currentSourcePtr(), "We should have at least shifted the escape.");
+                        if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings) {
+                            m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+                            m_bufferForRawTemplateString16.append('\n');
+                        }
+                    ASSERT_WITH_MESSAGE(rawStringStart != currentSourcePtr(), "We should have at least shifted the escape.");
+                    if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings) {
+                        m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+                        m_bufferForRawTemplateString16.append('\n');
+                    }
 …
             } else {
                 bool strictMode = true;
+                StringParseResult result = parseComplexEscape<shouldBuildStrings>(EscapeParseMode::Template, strictMode, '`');
+                if (result != StringParsedSuccessfully)
+                    return result;
+                StringParseResult result = parseComplexEscape<true, LexerEscapeParseMode::Template>(strictMode, '`');
+                if (result != StringParsedSuccessfully) {
+                    if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings && result == StringCannotBeParsed)
+                        parseCookedFailed = true;
+                    else
+                        return result;
+                }
+            }
 …
                 if (m_current == '\r') {
                     // Normalize <CR>, <CR><LF> to <LF>.
+                    if (shouldBuildStrings) {
+                        if (stringStart != currentSourcePtr())
+                            append16(stringStart, currentSourcePtr() - stringStart);
+                        if (rawStringStart != currentSourcePtr() && rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+                            m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+                        record16('\n');
+                        if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+                            m_bufferForRawTemplateString16.append('\n');
+                    }
+                    if (stringStart != currentSourcePtr())
+                        append16(stringStart, currentSourcePtr() - stringStart);
+                    if (rawStringStart != currentSourcePtr() && rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+                        m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+                    record16('\n');
+                    if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+                        m_bufferForRawTemplateString16.append('\n');
                     lineNumberAdder.add(m_current);
                     shift();
 …
     bool isTail = m_current == '`';
+    if (shouldBuildStrings) {
+        if (currentSourcePtr() != stringStart)
+            append16(stringStart, currentSourcePtr() - stringStart);
+        if (rawStringStart != currentSourcePtr() && rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+            m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+    }
+    if (shouldBuildStrings) {
+    if (currentSourcePtr() != stringStart)
+        append16(stringStart, currentSourcePtr() - stringStart);
+    if (rawStringStart != currentSourcePtr() && rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+        m_bufferForRawTemplateString16.append(rawStringStart, currentSourcePtr() - rawStringStart);
+    if (!parseCookedFailed)
         tokenData->cooked = makeIdentifier(m_buffer16.data(), m_buffer16.size());
         // Line terminator normalization (e.g. <CR> => <LF>) should be applied to both the raw and cooked representations.
         if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+            tokenData->raw = makeIdentifier(m_bufferForRawTemplateString16.data(), m_bufferForRawTemplateString16.size());
         else
             tokenData->raw = makeEmptyIdentifier();
     } else {
         tokenData->cooked = makeEmptyIdentifier();
         tokenData->raw = makeEmptyIdentifier();
+    }
+    else
+        tokenData->cooked = nullptr;
+    // Line terminator normalization (e.g. <CR> => <LF>) should be applied to both the raw and cooked representations.
+    if (rawStringsBuildMode == RawStringsBuildMode::BuildRawStrings)
+        tokenData->raw = makeIdentifier(m_bufferForRawTemplateString16.data(), m_bufferForRawTemplateString16.size());
+    else
+        tokenData->raw = nullptr;
     tokenData->isTail = isTail;
 …
         shift();
         token = SEMICOLON;
+        break;
+    case CharacterBackQuote:
+        shift();
+        token = BACKQUOTE;
         break;
     case CharacterOpenBrace:
 …
         break;
+        }
-    case CharacterBackQuote: {
-        // Skip backquote.
-        shift();
-        StringParseResult result = StringCannotBeParsed;
-        if (lexerFlags & LexerFlagsDontBuildStrings)
-            result = parseTemplateLiteral<false>(tokenData, RawStringsBuildMode::BuildRawStrings);
-        else
-            result = parseTemplateLiteral<true>(tokenData, RawStringsBuildMode::BuildRawStrings);
-        if (UNLIKELY(result != StringParsedSuccessfully)) {
-            token = result == StringUnterminated ? UNTERMINATED_TEMPLATE_LITERAL_ERRORTOK : INVALID_TEMPLATE_LITERAL_ERRORTOK;
-            goto returnError;
+        }
-        token = TEMPLATE;
-        break;
+        }
     case CharacterIdentifierStart:
         ASSERT(isIdentStart(m_current));
 …
 template <typename T>
 JSTokenType Lexer<T>::scanTrailingTemplateString(JSToken* tokenRecord, RawStringsBuildMode rawStringsBuildMode)
+JSTokenType Lexer<T>::scanTemplateString(JSToken* tokenRecord, RawStringsBuildMode rawStringsBuildMode)
+{
     JSTokenData* tokenData = &tokenRecord->m_data;
 …
     ASSERT(m_buffer16.isEmpty());
     // Leading closing brace } is already shifted in the previous token scan.
+    // Leading backquote ` (for template head) or closing brace } (for template trailing) are already shifted in the previous token scan.
     // So in this re-scan phase, shift() is not needed here.
     StringParseResult result = parseTemplateLiteral<true>(tokenData, rawStringsBuildMode);
+    StringParseResult result = parseTemplateLiteral(tokenData, rawStringsBuildMode);
     JSTokenType token = ERRORTOK;
     if (UNLIKELY(result != StringParsedSuccessfully)) {

Changeset 211319 in webkit for trunk/Source/JavaScriptCore/parser/Lexer.cpp