Timestamp:
Dec 8, 2010, 9:40:29 PM
Author:
[email protected]
Message:

Permit Character Class Escape in CharacterRange in Character Class.
https://bugs.webkit.org/show_bug.cgi?id=50483
https://bugs.webkit.org/show_bug.cgi?id=50538
https://bugs.webkit.org/show_bug.cgi?id=50654
https://bugs.webkit.org/show_bug.cgi?id=50646

Reviewed by Sam Weinig.

We recently tightened up our spec conformance by generating syntax
errors in these cases; however, testing in the wild has shown this
to be problematic. This change reverts the previous one, once again
accepting class escapes (e.g. \d) at either end of a range in a
character class ([]), where the hyphen is then treated as a literal.
It does retain somewhat closer conformance to the spec by only
allowing ranges that the grammar rules in the spec would permit
(e.g. in /[\d-a-z]/, "a-z" cannot be considered a range).

JavaScriptCore:

  • yarr/RegexParser.h:

(JSC::Yarr::Parser::CharacterClassParserDelegate::atomPatternCharacter):
(JSC::Yarr::Parser::CharacterClassParserDelegate::atomBuiltInCharacterClass):
(JSC::Yarr::Parser::parse):

LayoutTests:

  • fast/js/regexp-ranges-and-escaped-hyphens-expected.txt:
  • fast/js/script-tests/regexp-ranges-and-escaped-hyphens.js:
  • fast/regex/invalid-range-in-class-expected.txt:
  • fast/regex/pcre-test-1-expected.txt:
  • fast/regex/script-tests/invalid-range-in-class.js:
  • fast/regex/script-tests/pcre-test-1.js:
File:
1 edited

  • trunk/JavaScriptCore/yarr/RegexParser.h

    --- trunk/JavaScriptCore/yarr/RegexParser.h (r72999)
    +++ trunk/JavaScriptCore/yarr/RegexParser.h (r73594)
    @@ -59,5 +59,4 @@
             ParenthesesTypeInvalid,
             CharacterClassUnmatched,
    -        CharacterClassInvalidRange,
             CharacterClassOutOfOrder,
             EscapeUnterminated,
    @@ -143,7 +142,14 @@
                     return;
     
    +                // See coment in atomBuiltInCharacterClass below.
    +                // This too is technically an error, per ECMA-262, and again we
    +                // we chose to allow this.  Note a subtlely here that while we
    +                // diverge from the spec's definition of CharacterRange we do
    +                // remain in compliance with the grammar.  For example, consider
    +                // the expression /[\d-a-z]/.  We comply with the grammar in
    +                // this case by not allowing a-z to be matched as a range.
                 case AfterCharacterClassHyphen:
    -                // Error! We have something like /[\d-x]/.
    -                m_err = CharacterClassInvalidRange;
    +                m_delegate.atomCharacterClassAtom(ch);
    +                m_state = Empty;
                     return;
                 }
    @@ -168,10 +174,19 @@
                     return;
     
    +                // If we hit either of these cases, we have an invalid range that
    +                // looks something like /[x-\d]/ or /[\d-\d]/.
    +                // According to ECMA-262 this should be a syntax error, but
    +                // empirical testing shows this to break teh webz.  Instead we
    +                // comply with to the ECMA-262 grammar, and assume the grammar to
    +                // have matched the range correctly, but tweak our interpretation
    +                // of CharacterRange.  Effectively we implicitly handle the hyphen
    +                // as if it were escaped, e.g. /[\w-_]/ is treated as /[\w\-_]/.
                 case CachedCharacterHyphen:
    +                m_delegate.atomCharacterClassAtom(m_character);
    +                m_delegate.atomCharacterClassAtom('-');
    +                // fall through
                 case AfterCharacterClassHyphen:
    -                // Error! If we hit either of these cases, we have an
    -                // invalid range that looks something like /[x-\d]/
    -                // or /[\d-\d]/.
    -                m_err = CharacterClassInvalidRange;
    +                m_delegate.atomCharacterClassBuiltIn(classID, invert);
    +                m_state = Empty;
                     return;
                 }
    @@ -682,5 +697,4 @@
                 "unrecognized character after (?",
                 "missing terminating ] for character class",
    -            "invalid range in character class",
                 "range out of order in character class",
                 "\\ at end of pattern"