Timestamp:
Nov 13, 2007, 9:25:26 AM (18 years ago)
Author:
Darin Adler
Message:

JavaScriptCore:

Reviewed by Geoff.

+ single-digit sequences like \4 should be treated as octal
  character constants, unless there is a sufficient number
  of brackets for them to be treated as backreferences

+ \8 turns into the character "8", not a binary zero character
  followed by "8" (same for 9)

+ only the first 3 digits should be considered part of an
  octal character constant (the old behavior was to decode
  an arbitrarily long sequence and then mask with 0xFF)

+ if \x is followed by anything other than two valid hex digits,
  then it should simply be treated as the letter "x"; that includes
  not supporting the \x{41} syntax

+ if \u is followed by anything less than four valid hex digits,
  then it should simply be treated as the letter "u"

+ an extra "+" should be a syntax error, rather than being treated
  as the "possessive quantifier"

+ if a "]" character appears immediately after a "[" character that
  starts a character class, then that's an empty character class,
  rather than being the start of a character class that includes a
  "]" character

+ a "$" should not match a terminating newline; we could have gotten
  PCRE to handle this the way we wanted by passing an appropriate option
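
The numeric-escape rules in the list above can be sketched as a small standalone decoder. This is only an illustration, not the code in pcre_compile.cpp: `NumericEscape`, `decodeNumericEscape`, and `groupCount` (standing in for the bracket count) are hypothetical names, and the sketch assumes that a whole run of decimal digits is compared against the bracket count before falling back to octal. It does not address what should happen to octal values above 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of PCRE or JavaScriptCore.
struct NumericEscape {
    enum Kind { Backreference, Character } kind;
    int value;          // group number, or character code
    std::size_t length; // characters consumed after the backslash
};

// `s` is the pattern text immediately after the backslash; it must start
// with a decimal digit. `groupCount` is the number of capturing brackets.
NumericEscape decodeNumericEscape(const std::string& s, int groupCount)
{
    assert(!s.empty() && s[0] >= '0' && s[0] <= '9');

    // A run of decimal digits not starting with 0 is a backreference only if
    // enough brackets exist (assumption: the whole run is read as one number).
    if (s[0] != '0') {
        long n = 0;
        std::size_t i = 0;
        bool tooBig = false;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            if (!tooBig) {
                n = n * 10 + (s[i] - '0');
                if (n > groupCount)
                    tooBig = true; // stop accumulating, so n cannot overflow
            }
            ++i;
        }
        if (!tooBig)
            return { NumericEscape::Backreference, static_cast<int>(n), i };
    }

    // \8 and \9 cannot start an octal constant: they are the literal digit.
    if (s[0] >= '8')
        return { NumericEscape::Character, s[0], 1 };

    // Otherwise decode an octal constant of at most three digits.
    int c = 0;
    std::size_t i = 0;
    while (i < 3 && i < s.size() && s[i] >= '0' && s[i] <= '7') {
        c = c * 8 + (s[i] - '0');
        ++i;
    }
    return { NumericEscape::Character, c, i };
}
```

Under these assumptions, `decodeNumericEscape("4", 0)` yields the character with code 4, while `decodeNumericEscape("4", 5)` yields a backreference to group 4.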

Test: fast/js/regexp-no-extensions.html

  • pcre/pcre_compile.cpp:
      (check_escape): Check backreferences against bracount to catch both overflows and things that should be treated as octal. Rewrite octal loop to not go on indefinitely. Rewrite both hex loops to match and remove \x{} support.
      (compile_branch): Restructure loops so that we don't special-case a "]" at the beginning of a character class. Remove code that treated "+" as the possessive quantifier.
      (jsRegExpCompile): Change the "]" handling here too.
  • pcre/pcre_exec.cpp: (match): Changed CIRC to match the DOLL implementation. Changed DOLL to remove handling of "terminating newline", a Perl concept which we don't need.
  • tests/mozilla/expected.html: Two tests are fixed now: ecma_3/RegExp/regress-100199.js and ecma_3/RegExp/regress-188206.js. One test fails now: ecma_3/RegExp/perlstress-002.js; it passed before only because of a bug (we treated all 1-character numeric escapes as backreferences). Both date tests also expect success now; their earlier failures were probably caused by the time being close to a DST shift, so maybe we need to get rid of those tests.
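
The hex-escape rule described for check_escape (exactly two hex digits after \x, exactly four after \u, otherwise the letter itself) might look roughly like the sketch below. `HexEscape`, `hexDigitValue`, and `decodeHexEscape` are hypothetical names used only for illustration; the real loops in pcre_compile.cpp are not shown in this changeset view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of PCRE or JavaScriptCore.
struct HexEscape {
    int value;          // decoded code unit, or the letter 'x' / 'u' itself
    std::size_t length; // characters consumed after the backslash
};

// Returns 0-15 for a hex digit, or -1 for anything else.
static int hexDigitValue(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// `s` is the pattern text immediately after the backslash and must start
// with 'x' or 'u'.
HexEscape decodeHexEscape(const std::string& s)
{
    assert(!s.empty() && (s[0] == 'x' || s[0] == 'u'));
    const std::size_t digits = (s[0] == 'x') ? 2 : 4;

    // Too few characters left: the escape is just the letter.
    if (s.size() < 1 + digits)
        return { s[0], 1 };

    int value = 0;
    for (std::size_t i = 1; i <= digits; ++i) {
        int d = hexDigitValue(s[i]);
        if (d < 0)
            return { s[0], 1 }; // e.g. "\x{41}" falls back to the letter "x"
        value = value * 16 + d;
    }
    return { value, 1 + digits };
}
```

With this sketch, `decodeHexEscape("x41")` decodes to 0x41, while `decodeHexEscape("x{41}")` and `decodeHexEscape("u004")` both fall back to the plain letter and consume one character.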

LayoutTests:

Reviewed by Geoff.

  • fast/js/regexp-no-extensions-expected.txt: Added.
  • fast/js/regexp-no-extensions.html: Added.
  • fast/js/resources/regexp-no-extensions.js: Added.
File:
1 edited

  • trunk/JavaScriptCore/pcre/pcre_exec.cpp

--- trunk/JavaScriptCore/pcre/pcre_exec.cpp (r27733)
+++ trunk/JavaScriptCore/pcre/pcre_exec.cpp (r27752)
@@ -604,5 +604,4 @@
     BEGIN_OPCODE(KETRMIN):
     BEGIN_OPCODE(KETRMAX):
-      {
       frame->prev = frame->ecode - GET(frame->ecode, 1);
       frame->saved_eptr = frame->eptrb->epb_saved_eptr;
@@ -682,38 +681,20 @@
         if (is_match) RRETURN;
         }
-      }
     RRETURN;
 
-    /* Start of subject unless notbol, or after internal newline if multiline */
+    /* Start of subject, or after internal newline if multiline. */
 
     BEGIN_OPCODE(CIRC):
-    if (md->multiline)
-      {
-      if (frame->eptr != md->start_subject && !IS_NEWLINE(frame->eptr[-1]))
-        RRETURN_NO_MATCH;
-      frame->ecode++;
-      NEXT_OPCODE;
-      }
-    if (frame->eptr != md->start_subject) RRETURN_NO_MATCH;
+    if (frame->eptr != md->start_subject && (!md->multiline || !IS_NEWLINE(frame->eptr[-1])))
+      RRETURN_NO_MATCH;
     frame->ecode++;
     NEXT_OPCODE;
 
-    /* Assert before internal newline if multiline, or before a terminating
-    newline unless endonly is set, else end of subject unless noteol is set. */
+    /* End of subject, or before internal newline if multiline. */
 
     BEGIN_OPCODE(DOLL):
-    if (md->multiline)
-      {
-      if (frame->eptr < md->end_subject)
-        { if (!IS_NEWLINE(*frame->eptr)) RRETURN_NO_MATCH; }
-      frame->ecode++;
-      }
-    else
-      {
-      if (frame->eptr < md->end_subject - 1 ||
-         (frame->eptr == md->end_subject - 1 && !IS_NEWLINE(*frame->eptr)))
-        RRETURN_NO_MATCH;
-      frame->ecode++;
-      }
+    if (frame->eptr < md->end_subject && (!md->multiline || !IS_NEWLINE(*frame->eptr)))
+      RRETURN_NO_MATCH;
+    frame->ecode++;
     NEXT_OPCODE;
 
@@ -722,10 +703,8 @@
     BEGIN_OPCODE(NOT_WORD_BOUNDARY):
     BEGIN_OPCODE(WORD_BOUNDARY):
-      {
       /* Find out if the previous and current characters are "word" characters.
       It takes a bit more work in UTF-8 mode. Characters > 128 are assumed to
       be "non-word" characters. */
 
-        {
         if (frame->eptr == md->start_subject) prev_is_word = false; else
           {
@@ -740,5 +719,4 @@
           cur_is_word = c < 128 && (md->ctypes[c] & ctype_word) != 0;
           }
-        }
 
       /* Now see if the situation is what we want */
@@ -747,5 +725,4 @@
            cur_is_word == prev_is_word : cur_is_word != prev_is_word)
         RRETURN_NO_MATCH;
-      }
     NEXT_OPCODE;
 
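
To make the effect of the CIRC and DOLL changes easier to see, the conditions in the diff can be restated as standalone predicates over a subject string and a position. This is a sketch only: `caretMatches`, `dollarMatches`, `oldDollarMatches`, and `isNewline` are hypothetical names, and `isNewline` is assumed to cover just `\n` and `\r`, which may be narrower than the real `IS_NEWLINE` macro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: only '\n' and '\r' count as newlines in this sketch.
static bool isNewline(char c) { return c == '\n' || c == '\r'; }

// New CIRC rule: start of subject, or after an internal newline if multiline.
bool caretMatches(const std::string& subject, std::size_t pos, bool multiline)
{
    assert(pos <= subject.size());
    if (pos == 0)
        return true;
    return multiline && isNewline(subject[pos - 1]);
}

// New DOLL rule: end of subject, or before an internal newline if multiline.
bool dollarMatches(const std::string& subject, std::size_t pos, bool multiline)
{
    assert(pos <= subject.size());
    if (pos == subject.size())
        return true;
    return multiline && isNewline(subject[pos]);
}

// Old DOLL rule, for comparison: without multiline it also matched just
// before a single terminating newline.
bool oldDollarMatches(const std::string& subject, std::size_t pos, bool multiline)
{
    assert(pos <= subject.size());
    if (pos == subject.size())
        return true;
    if (multiline)
        return isNewline(subject[pos]);
    return pos + 1 == subject.size() && isNewline(subject[pos]);
}
```

For the subject `"abc\n"` without multiline, `oldDollarMatches` accepts position 3 (just before the trailing newline) while `dollarMatches` accepts only position 4, which is the behavior change the commit message describes for "$".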