pcre_exec.cpp

Timestamp:

Nov 13, 2007, 9:25:26 AM

Author:

Darin Adler

Message:

JavaScriptCore:

Reviewed by Geoff.

fix http://bugs.webkit.org/show_bug.cgi?id=11231 RegExp bug when handling newline characters and a number of other differences between PCRE behavior and JavaScript regular expressions:

+ single-digit sequences like \4 should be treated as octal
  character constants, unless there is a sufficient number
  of brackets for them to be treated as backreferences

+ \8 turns into the character "8", not a binary zero character
  followed by "8" (same for 9)

+ only the first 3 digits should be considered part of an
  octal character constant (the old behavior was to decode
  an arbitrarily long sequence and then mask with 0xFF)

+ if \x is followed by anything other than two valid hex digits,
  then it should simply be treated as the letter "x"; that includes
  not supporting the \x{41} syntax

+ if \u is followed by anything less than four valid hex digits,
  then it should simply be treated as the letter "u"

+ an extra "+" should be a syntax error, rather than being treated
  as the "possessive quantifier"

+ if a "]" character appears immediately after a "[" character that
  starts a character class, then that's an empty character class,
  rather than being the start of a character class that includes a
  "]" character

+ a "$" should not match a terminating newline; we could have gotten
  PCRE to handle this the way we wanted by passing an appropriate option
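
The \x and \u rules in the list above can be illustrated with a small standalone sketch. This is not the code from pcre_compile.cpp, which is not part of this diff; `HexEscape`, `hexDigitValue`, and `decodeHexEscape` are hypothetical names chosen for the example, and the input is assumed to start at the letter that follows the backslash.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the decoded character code and the number of
// pattern characters consumed after the backslash.
struct HexEscape {
    int value;
    size_t length;
};

// Returns the numeric value of a hex digit, or -1 if c is not a hex digit.
static int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper. `text` starts at the 'x' or 'u' after the backslash.
// \x needs exactly two hex digits and \u exactly four; otherwise the escape
// is treated as the plain letter. There is no \x{...} support.
static HexEscape decodeHexEscape(const std::string& text)
{
    if (text.empty() || (text[0] != 'x' && text[0] != 'u'))
        return { 0, 0 };
    const size_t digits = text[0] == 'x' ? 2 : 4;
    int value = 0;
    for (size_t i = 1; i <= digits; ++i) {
        int d = i < text.size() ? hexDigitValue(text[i]) : -1;
        if (d < 0)
            return { text[0], 1 }; // fall back to the literal letter
        value = value * 16 + d;
    }
    return { value, digits + 1 };
}
```

Under these assumptions, `decodeHexEscape("x41")` yields the code for "A", while `decodeHexEscape("x{41}")` yields the letter "x" and consumes only one character, leaving `{41}` to be parsed as ordinary pattern text.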

Test: fast/js/regexp-no-extensions.html

pcre/pcre_compile.cpp:
(check_escape): Check backreferences against bracount to catch both overflows and things that should be treated as octal. Rewrite octal loop to not go on indefinitely. Rewrite both hex loops to match and remove \x{} support.
(compile_branch): Restructure loops so that we don't special-case a "]" at the beginning of a character class. Remove code that treated "+" as the possessive quantifier.
(jsRegExpCompile): Change the "]" handling here too.
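
As a rough illustration of the bounded octal loop described for check_escape, here is a minimal sketch. The real function is not shown in this changeset, so `OctalEscape` and `decodeOctalEscape` are hypothetical names, and the backreference check against bracount is deliberately left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the decoded character code and the number of
// pattern characters consumed after the backslash.
struct OctalEscape {
    int value;
    size_t length;
};

// Hypothetical helper. `text` starts at the first character after the
// backslash. At most three octal digits are consumed; a leading 8 or 9 is
// not octal and simply stands for itself.
static OctalEscape decodeOctalEscape(const std::string& text)
{
    if (text.empty())
        return { 0, 0 };
    if (text[0] < '0' || text[0] > '7')
        return { text[0], 1 }; // e.g. "\8" becomes the character "8"
    int value = 0;
    size_t i = 0;
    while (i < 3 && i < text.size() && text[i] >= '0' && text[i] <= '7') {
        value = value * 8 + (text[i] - '0');
        ++i;
    }
    return { value, i };
}
```

With this sketch, `decodeOctalEscape("1012")` consumes only `101` and leaves the trailing `2` as a literal, rather than decoding the whole run and masking the result with 0xFF.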

pcre/pcre_exec.cpp: (match): Changed CIRC to match the DOLL implementation. Changed DOLL to remove handling of "terminating newline", a Perl concept which we don't need.
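
The simplified CIRC and DOLL checks can be restated as standalone predicates. This is only a sketch: it uses a `std::string` and an index in place of the matcher's frame and match data, and `isNewline` is a hypothetical stand-in for the IS_NEWLINE macro, whose definition does not appear in this changeset (only `\n` and `\r` are assumed here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for IS_NEWLINE; the real macro may accept more
// characters than these two.
static bool isNewline(char c)
{
    return c == '\n' || c == '\r';
}

// "^": start of subject, or just after a newline if multiline.
static bool circMatches(const std::string& subject, size_t pos, bool multiline)
{
    if (pos != 0 && (!multiline || !isNewline(subject[pos - 1])))
        return false;
    return true;
}

// "$": end of subject, or just before a newline if multiline.
// There is no special case for a newline that terminates the subject.
static bool dollMatches(const std::string& subject, size_t pos, bool multiline)
{
    if (pos < subject.size() && (!multiline || !isNewline(subject[pos])))
        return false;
    return true;
}
```

For the subject `"ab\n"` without multiline, `dollMatches` is false at index 2 (just before the final newline) and true only at index 3, which is the behavior change described for DOLL.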

tests/mozilla/expected.html: Two tests are fixed now: ecma_3/RegExp/regress-100199.js and ecma_3/RegExp/regress-188206.js. One test fails now: ecma_3/RegExp/perlstress-002.js -- our success before was due to a bug (we treated all 1-character numeric escapes as backreferences). The date tests also now both expect success -- whatever was making them fail before was probably due to the time being close to a DST shift; maybe we need to get rid of those tests.

LayoutTests:

Reviewed by Geoff.

test for http://bugs.webkit.org/show_bug.cgi?id=11231 RegExp bug when handling newline characters and other regular expression behavior that is different for JavaScript and PCRE

fast/js/regexp-no-extensions-expected.txt: Added.
fast/js/regexp-no-extensions.html: Added.
fast/js/resources/regexp-no-extensions.js: Added.

File:

1 edited

trunk/JavaScriptCore/pcre/pcre_exec.cpp (modified) (5 diffs)


trunk/JavaScriptCore/pcre/pcre_exec.cpp

-              r27733
+              r27752
     BEGIN_OPCODE(KETRMIN):
     BEGIN_OPCODE(KETRMAX):
+      {
       frame->prev = frame->ecode - GET(frame->ecode, 1);
       frame->saved_eptr = frame->eptrb->epb_saved_eptr;
 …
         if (is_match) RRETURN;
+        }
+      }
     RRETURN;
-    /* Start of subject unless notbol, or after internal newline if multiline */
+    /* Start of subject, or after internal newline if multiline. */
     BEGIN_OPCODE(CIRC):
-    if (md->multiline)
-      {
-      if (frame->eptr != md->start_subject && !IS_NEWLINE(frame->eptr[-1]))
-        RRETURN_NO_MATCH;
-      frame->ecode++;
-      NEXT_OPCODE;
-      }
-    if (frame->eptr != md->start_subject) RRETURN_NO_MATCH;
+    if (frame->eptr != md->start_subject && (!md->multiline || !IS_NEWLINE(frame->eptr[-1])))
+      RRETURN_NO_MATCH;
     frame->ecode++;
     NEXT_OPCODE;
-    /* Assert before internal newline if multiline, or before a terminating
-    newline unless endonly is set, else end of subject unless noteol is set. */
+    /* End of subject, or before internal newline if multiline. */
     BEGIN_OPCODE(DOLL):
-    if (md->multiline)
-      {
-      if (frame->eptr < md->end_subject)
-        { if (!IS_NEWLINE(*frame->eptr)) RRETURN_NO_MATCH; }
-      frame->ecode++;
-      }
-    else
-      {
-      if (frame->eptr < md->end_subject - 1 ||
-         (frame->eptr == md->end_subject - 1 && !IS_NEWLINE(*frame->eptr)))
-        RRETURN_NO_MATCH;
-      frame->ecode++;
-      }
+    if (frame->eptr < md->end_subject && (!md->multiline || !IS_NEWLINE(*frame->eptr)))
+      RRETURN_NO_MATCH;
+    frame->ecode++;
     NEXT_OPCODE;
 …
     BEGIN_OPCODE(NOT_WORD_BOUNDARY):
     BEGIN_OPCODE(WORD_BOUNDARY):
+      {
       /* Find out if the previous and current characters are "word" characters.
       It takes a bit more work in UTF-8 mode. Characters > 128 are assumed to
       be "non-word" characters. */
+        {
         if (frame->eptr == md->start_subject) prev_is_word = false; else
+          {
 …
           cur_is_word = c < 128 && (md->ctypes[c] & ctype_word) != 0;
+          }
+        }
       /* Now see if the situation is what we want */
 …
            cur_is_word == prev_is_word : cur_is_word != prev_is_word)
         RRETURN_NO_MATCH;
+      }
     NEXT_OPCODE;

Changeset 27752 in webkit for trunk/JavaScriptCore/pcre/pcre_exec.cpp
