Timestamp:
Mar 8, 2016, 10:35:58 AM
Author:
[email protected]
Message:

[ES6] Regular Expression canonicalization tables for Unicode need to be updated to use Unicode CaseFolding.txt
https://bugs.webkit.org/show_bug.cgi?id=155114

Reviewed by Darin Adler.

Source/JavaScriptCore:

Extracted out the Unicode canonicalization table creation from
YarrCanonicalizeUnicode.js into a new Python script, generateYarrCanonicalizeUnicode.
That script generates the Unicode tables as the file YarrCanonicalizeUnicode.cpp in
DerivedSources/JavaScriptCore.
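The generator is written in Python. To illustrate only the table-building step, here is a minimal C++ sketch that assumes the standard CaseFolding.txt line format `<code>; <status>; <mapping>; # <name>`. `parseSimpleCaseFolding` is a hypothetical name and is not the generator's code. The sketch keeps only the C (common) and S (simple) entries, since ES6 Canonicalize() uses a simple or common case folding mapping for Unicode patterns.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of generateYarrCanonicalizeUnicode: reads
// CaseFolding.txt-style lines and returns the simple case folding
// (status C and S entries) as a code point -> folded code point map.
std::map<char32_t, char32_t> parseSimpleCaseFolding(std::istream& in)
{
    std::map<char32_t, char32_t> folding;
    std::string line;
    while (std::getline(in, line)) {
        // Strip the trailing "# name" comment; skip lines left empty.
        line = line.substr(0, line.find('#'));
        if (line.find_first_not_of(" \t\r") == std::string::npos)
            continue;

        std::istringstream fields(line);
        std::string code, status, mapping;
        if (!std::getline(fields, code, ';') || !std::getline(fields, status, ';')
            || !std::getline(fields, mapping, ';'))
            continue;

        // Status is C, F, S or T; F (full, multi-code-point) and T (Turkic)
        // mappings are not used by the simple folding.
        status.erase(0, status.find_first_not_of(' '));
        if (status != "C" && status != "S")
            continue;

        // std::stoul skips the leading space before the hex mapping.
        folding[static_cast<char32_t>(std::stoul(code, nullptr, 16))] =
            static_cast<char32_t>(std::stoul(mapping, nullptr, 16));
    }
    return folding;
}
```

Given the real entries `1E9E; F; 0073 0073;` and `1E9E; S; 00DF;` for LATIN CAPITAL LETTER SHARP S, the sketch keeps only the simple mapping U+1E9E → U+00DF.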

Updated the processing of ignore case to make the ASCII shortcuts dependent on whether
or not the pattern is a Unicode pattern.
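The effect of making the ASCII shortcut depend on the pattern type can be shown with a small C++ sketch. It assumes a tiny excerpt of a simple case-folding table. The entries are real CaseFolding.txt mappings, but the table is far from complete. It also defines local stand-ins for WTF's `isASCII` and `toASCIIUpper`. `matchesIgnoringCase` is a hypothetical name. The non-Unicode, non-ASCII path, which JSC handles with the legacy UCS2 table, is deliberately not modeled.

```cpp
#include <map>

// Hypothetical excerpt of a simple case-folding table (C/S entries of
// CaseFolding.txt); the generated tables cover all of Unicode.
const std::map<char32_t, char32_t> simpleFolding = {
    { U'K', U'k' },
    { U'S', U's' },
    { 0x017F, U's' }, // LATIN SMALL LETTER LONG S
    { 0x212A, U'k' }, // KELVIN SIGN
};

// Local stand-ins for WTF's helpers of the same name.
bool isASCII(char32_t ch) { return ch < 0x80; }

char32_t toASCIIUpper(char32_t ch)
{
    // Only a-z change; every other code point is returned unchanged.
    return (ch >= U'a' && ch <= U'z') ? ch - (U'a' - U'A') : ch;
}

char32_t simpleFold(char32_t ch)
{
    auto it = simpleFolding.find(ch);
    return it == simpleFolding.end() ? ch : it->second;
}

// Returns true if a and b match when the pattern has the ignoreCase flag.
bool matchesIgnoringCase(char32_t a, char32_t b, bool unicode)
{
    if (!unicode && (isASCII(a) || isASCII(b))) {
        // Legacy rule: a non-ASCII character never matches an ASCII one.
        // toASCIIUpper leaves non-ASCII code points alone, so comparing the
        // uppercased values is enough.
        return toASCIIUpper(a) == toASCIIUpper(b);
    }
    if (unicode) {
        // Unicode patterns: compare the simple case foldings, so ASCII and
        // non-ASCII characters can match each other.
        return simpleFold(a) == simpleFold(b);
    }
    // Non-Unicode, both non-ASCII: not modeled here (JSC uses the legacy
    // UCS2 canonicalization table); fall back to exact equality.
    return a == b;
}
```

With this sketch, `matchesIgnoringCase(U'k', 0x212A, true)` is true because KELVIN SIGN folds to 'k'. The same call with `unicode == false` is false, because the ASCII shortcut still applies to non-Unicode patterns.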

Renamed yarr/YarrCanonicalizeUnicode.{cpp,js} back to their prior names,
YarrCanonicalizeUCS2.{cpp,js}.
Renamed yarr/YarrCanonicalizeUnicode.h to YarrCanonicalize.h as it declares both the
legacy UCS2 and Unicode tables.

  • CMakeLists.txt:
  • DerivedSources.make:
  • JavaScriptCore.xcodeproj/project.pbxproj:
  • generateYarrCanonicalizeUnicode: Added.
  • ucd: Added.
  • ucd/CaseFolding.txt: Added. The current version, 8.0, of the Unicode CaseFolding table.
  • yarr/YarrCanonicalizeUCS2.cpp: Copied from Source/JavaScriptCore/yarr/YarrCanonicalizeUnicode.cpp.
  • yarr/YarrCanonicalize.h: Copied from Source/JavaScriptCore/yarr/YarrCanonicalizeUnicode.h.
  • yarr/YarrCanonicalizeUCS2.js: Copied from Source/JavaScriptCore/yarr/YarrCanonicalizeUnicode.js.

(printHeader):

  • yarr/YarrCanonicalizeUnicode.cpp: Removed.
  • yarr/YarrCanonicalizeUnicode.h: Removed.
  • yarr/YarrCanonicalizeUnicode.js: Removed.
  • yarr/YarrInterpreter.cpp:

(JSC::Yarr::Interpreter::tryConsumeBackReference):

  • yarr/YarrJIT.cpp:
  • yarr/YarrPattern.cpp:

(JSC::Yarr::CharacterClassConstructor::putChar):

LayoutTests:

Updated test cases.

  • js/regexp-unicode-expected.txt:
  • js/script-tests/regexp-unicode.js:
File:
1 edited

  • trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp

    --- r197534
    +++ r197781
    @@ -29,5 +29,5 @@
     
     #include "Yarr.h"
    -#include "YarrCanonicalizeUnicode.h"
    +#include "YarrCanonicalize.h"
     #include <wtf/BumpPointerAllocator.h>
     #include <wtf/DataLog.h>
    @@ -378,7 +378,8 @@
     
                 if (pattern->m_ignoreCase) {
    -                // The definition for canonicalize (see ES 6.0, 15.10.2.8) means that
    -                // unicode values are never allowed to match against ascii ones.
    -                if (isASCII(oldCh) || isASCII(ch)) {
    +                // See ES 6.0, 21.2.2.8.2 for the definition of Canonicalize(). For non-Unicode
    +                // patterns, Unicode values are never allowed to match against ASCII ones.
    +                // For Unicode, we need to check all canonical equivalents of a character.
    +                if (!unicode && (isASCII(oldCh) || isASCII(ch))) {
                         if (toASCIIUpper(oldCh) == toASCIIUpper(ch))
                             continue;
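The new comment notes that Unicode patterns must consider all canonical equivalents of a character. One way to enumerate them is to invert a simple case-folding map into equivalence classes. The C++17 sketch below assumes such a map (code point → folded code point) is already available. `buildEquivalenceClasses` and `canonicalEquivalents` are hypothetical names, not JSC functions, and the layout of the generated tables is not necessarily like this.

```cpp
#include <map>
#include <set>

// Hypothetical helper: groups every code point with the others that share
// its simple case folding, keyed by the folded value. The folded value is a
// member of its own class.
std::map<char32_t, std::set<char32_t>> buildEquivalenceClasses(const std::map<char32_t, char32_t>& folding)
{
    std::map<char32_t, std::set<char32_t>> classes;
    for (const auto& [from, to] : folding) {
        classes[to].insert(to);
        classes[to].insert(from);
    }
    return classes;
}

// Returns every character canonically equivalent to ch, including ch itself.
std::set<char32_t> canonicalEquivalents(char32_t ch, const std::map<char32_t, char32_t>& folding,
                                        const std::map<char32_t, std::set<char32_t>>& classes)
{
    auto folded = folding.find(ch);
    char32_t key = folded == folding.end() ? ch : folded->second;
    auto found = classes.find(key);
    // Characters with no folding entry are only equivalent to themselves.
    return found == classes.end() ? std::set<char32_t>{ ch } : found->second;
}
```

With the real entries `004B; C; 006B;` and `212A; C; 006B;`, the class of 'k' is { 'K', 'k', U+212A KELVIN SIGN }. Because this class mixes ASCII and non-ASCII characters, the ASCII-only shortcut cannot be used for Unicode patterns.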