Lexer.cpp

Timestamp:

Jul 3, 2010, 1:30:24 PM

Author:

[email protected]

Message:

Move BOM handling out of the lexer and parser
https://bugs.webkit.org/show_bug.cgi?id=41539

Reviewed by Geoffrey Garen.

JavaScriptCore:

Doing the BOM stripping in the lexer meant that we could
end up having to strip the BOMs from a source multiple times.
To deal with this we now require all strings provided by
a SourceProvider to already have had the BOMs stripped.
This also simplifies some of the lexer logic.
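
As an illustration of the new requirement, here is a minimal sketch of stripping BOMs once, before the text reaches a SourceProvider. It uses std::u16string instead of WebKit's own string classes, and the names stripBOMs and kByteOrderMark are hypothetical; it is not the copyStringWithoutBOMs implementation from this change.

```cpp
#include <string>

// Hypothetical constant: U+FEFF, the byte order mark.
constexpr char16_t kByteOrderMark = 0xFEFF;

// Returns a copy of `source` with every U+FEFF code unit removed.
// Assumption: the caller runs this once, before handing the text to the parser.
std::u16string stripBOMs(const std::u16string& source)
{
    // Fast path: most sources contain no BOM, so skip the per-character copy.
    if (source.find(kByteOrderMark) == std::u16string::npos)
        return source;

    std::u16string result;
    result.reserve(source.size());
    for (char16_t c : source) {
        if (c != kByteOrderMark)
            result.push_back(c);
    }
    return result;
}
```

Because the stripping happens once at the provider boundary, the lexer can treat its input as BOM-free and offsets into the provider's data need no adjustment.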

parser/Lexer.cpp:

(JSC::Lexer::setCode):
(JSC::Lexer::sourceCode):

parser/SourceProvider.h:

(JSC::SourceProvider::SourceProvider):
(JSC::UStringSourceProvider::create):
(JSC::UStringSourceProvider::getRange):
(JSC::UStringSourceProvider::UStringSourceProvider):

wtf/text/StringImpl.h:

(WebCore::StringImpl::copyStringWithoutBOMs):

WebCore:

Update WebCore to ensure that SourceProviders don't
produce strings with BOMs in them.

bindings/js/ScriptSourceProvider.h:

(WebCore::ScriptSourceProvider::ScriptSourceProvider):

bindings/js/StringSourceProvider.h:

(WebCore::StringSourceProvider::StringSourceProvider):

loader/CachedScript.cpp:

(WebCore::CachedScript::CachedScript):
(WebCore::CachedScript::script):

loader/CachedScript.h:

(WebCore::CachedScript::):

CachedScript now stores decoded data with the BOMs stripped,
and caches the presence of BOMs across memory purges.
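
One plausible reading of this note, sketched under assumptions: the decoded text is stored already stripped, and a small state flag survives a purge so that re-decoding can skip the stripping pass when no BOMs were seen the first time. The class CachedScriptSketch, its members, and the BOMState enum are hypothetical and do not mirror the real WebCore::CachedScript.

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Hypothetical constant: U+FEFF, the byte order mark.
constexpr char16_t kBOM = 0xFEFF;

// Hypothetical sketch; not the real WebCore::CachedScript.
class CachedScriptSketch {
public:
    enum class BOMState { Unknown, Present, Absent };

    explicit CachedScriptSketch(std::u16string input)
        : m_input(std::move(input))
    {
    }

    // Returns the script text with BOMs stripped, rebuilding it after a purge.
    const std::u16string& script()
    {
        if (!m_hasScript) {
            m_script = m_input; // stands in for decoding the raw resource bytes
            // Skip the stripping pass if an earlier decode found no BOMs.
            if (m_bomState != BOMState::Absent) {
                const auto sizeBefore = m_script.size();
                m_script.erase(std::remove(m_script.begin(), m_script.end(), kBOM), m_script.end());
                m_bomState = m_script.size() != sizeBefore ? BOMState::Present : BOMState::Absent;
            }
            m_hasScript = true;
        }
        return m_script;
    }

    // Drops the decoded text but deliberately keeps m_bomState.
    void purgeDecodedData()
    {
        m_script.clear();
        m_hasScript = false;
    }

    BOMState bomState() const { return m_bomState; }

private:
    std::u16string m_input;
    std::u16string m_script;
    bool m_hasScript = false;
    BOMState m_bomState = BOMState::Unknown;
};
```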

File:

1 edited

trunk/JavaScriptCore/parser/Lexer.cpp (modified) (4 diffs)

trunk/JavaScriptCore/parser/Lexer.cpp

-              r62416
+              r62449
 namespace JSC {
-static const UChar byteOrderMark = 0xFEFF;
 enum CharacterTypes {
 …
     m_buffer16.reserveInitialCapacity((m_codeEnd - m_code) / 2);
-    // ECMA-262 calls for stripping all Cf characters, but we only strip BOM characters.
-    // See <https://bugs.webkit.org/show_bug.cgi?id=4931> for details.
-    if (source.provider()->hasBOMs()) {
-        for (const UChar* p = m_codeStart; p < m_codeEnd; ++p) {
-            if (UNLIKELY(*p == byteOrderMark)) {
-                copyCodeWithoutBOMs();
-                break;
-            }
-        }
-    }
     if (LIKELY(m_code < m_codeEnd))
         m_current = *m_code;
 …
         m_current = -1;
     ASSERT(currentOffset() == source.startOffset());
 }
-void Lexer::copyCodeWithoutBOMs()
-{
-    // Note: In this case, the character offset data for debugging will be incorrect.
-    // If it's important to correctly debug code with extraneous BOMs, then the caller
-    // should strip the BOMs when creating the SourceProvider object and do its own
-    // mapping of offsets within the stripped text to original text offset.
-    m_codeWithoutBOMs.reserveCapacity(m_codeEnd - m_code);
-    for (const UChar* p = m_code; p < m_codeEnd; ++p) {
-        UChar c = *p;
-        if (c != byteOrderMark)
-            m_codeWithoutBOMs.append(c);
-    }
-    ptrdiff_t startDelta = m_codeStart - m_code;
-    m_code = m_codeWithoutBOMs.data();
-    m_codeStart = m_code + startDelta;
-    m_codeEnd = m_codeWithoutBOMs.data() + m_codeWithoutBOMs.size();
-}
 …
 SourceCode Lexer::sourceCode(int openBrace, int closeBrace, int firstLine)
 {
-    if (m_codeWithoutBOMs.isEmpty())
-        return SourceCode(m_source->provider(), openBrace, closeBrace + 1, firstLine);
-    const UChar* data = m_source->provider()->data();
-    ASSERT(openBrace < closeBrace);
-    int i;
-    for (i = m_source->startOffset(); i < openBrace; ++i) {
-        if (data[i] == byteOrderMark) {
-            openBrace++;
-            closeBrace++;
-        }
-    }
-    for (; i < closeBrace; ++i) {
-        if (data[i] == byteOrderMark)
-            closeBrace++;
-    }
-    ASSERT(openBrace < closeBrace);
     return SourceCode(m_source->provider(), openBrace, closeBrace + 1, firstLine);
 }

Changeset 62449 in webkit for trunk/JavaScriptCore/parser/Lexer.cpp
