
regexp.cpp

Timestamp:

Jul 19, 2007, 2:10:40 PM (18 years ago)

Author:

darin

Message:

Reviewed by Geoff.

fix <rdar://problem/5345440> PCRE computes wrong length for expressions with quantifiers on named recursion or subexpressions

It's challenging to implement proper preflighting for compiling these advanced features.
But we don't want them in the JavaScript engine anyway.
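As the word "preflighting" suggests, PCRE at this point compiled a pattern in two passes: a preflight pass that computes how much memory the compiled code needs, then a pass that writes the code into a buffer of that size. The bug in the title is a disagreement between those passes. The toy compiler below is a hypothetical illustration, far simpler than PCRE, and all its names are made up. It shows the invariant the two passes must keep: for a quantified atom, the preflight must count the atom's code once per repetition, exactly as the emit pass copies it.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy grammar (hypothetical): lowercase letters, each optionally followed by
// a single-digit quantifier such as {3}. Each literal compiles to two code
// units: OP_CHAR followed by the character itself.
constexpr unsigned char OP_CHAR = 1;

// Validates the quantifier whose '{' is at pattern[i] and returns its count.
static int quantifierCount(const std::string& pattern, std::size_t i) {
    if (i == 0 || i + 2 >= pattern.size() || pattern[i + 2] != '}' ||
        !std::isdigit(static_cast<unsigned char>(pattern[i + 1])) ||
        pattern[i + 1] == '0' ||
        !std::islower(static_cast<unsigned char>(pattern[i - 1])))
        throw std::invalid_argument("malformed quantifier");
    return pattern[i + 1] - '0';
}

// Pass 1 ("preflight"): compute the exact size the emit pass will need.
std::size_t computeLength(const std::string& pattern) {
    std::size_t length = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '{') {
            int n = quantifierCount(pattern, i);
            length += 2 * static_cast<std::size_t>(n - 1); // atom now appears n times
            i += 2;                                         // skip the digit and '}'
        } else if (std::islower(static_cast<unsigned char>(pattern[i]))) {
            length += 2;
        } else {
            throw std::invalid_argument("unsupported character");
        }
    }
    return length;
}

// Pass 2: emit into a buffer of exactly the preflighted size. A disagreement
// between the passes is reported instead of writing past the end.
std::vector<unsigned char> compile(const std::string& pattern) {
    std::vector<unsigned char> code(computeLength(pattern));
    std::size_t pos = 0;
    auto put = [&](unsigned char unit) {
        if (pos >= code.size())
            throw std::length_error("preflight underestimated the length");
        code[pos++] = unit;
    };
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '{') {
            int n = quantifierCount(pattern, i);
            for (int k = 1; k < n; ++k) { // copy the previous atom n - 1 more times
                put(OP_CHAR);
                put(static_cast<unsigned char>(pattern[i - 1]));
            }
            i += 2;
        } else {
            put(OP_CHAR);
            put(static_cast<unsigned char>(pattern[i]));
        }
    }
    if (pos != code.size())
        throw std::length_error("preflight overestimated the length");
    return code;
}
```

For example, `ab{3}c` preflights to 10 code units, and the emit pass fills exactly those 10. A preflight that forgot the repetition would undercount, which is the class of error the commit avoids by dropping the features whose length it could not compute correctly.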

Turned off the following features of PCRE (some of these are simply parsed and not implemented):

\C \E \G \L \N \P \Q \U \X \Z
\e \l \p \u \z
[::] [..] [==]
(?#) (?<=) (?<!) (?>)
(?C) (?P) (?R)
(?0) (and 1-9)
(?imsxUX)
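The group constructs in the list above all begin with "(?", so a JavaScript-mode parser can recognize them from the character or two after that prefix while still accepting "(?:", "(?=" and "(?!", which JavaScript supports. The helper below is a hypothetical sketch of that check; its name and interface are assumptions, not code from PCRE.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: `pos` is the index of a '(' in `pattern`. Returns true
// if the group starting there is one of the constructs the JavaScript build
// turns off, according to the list above.
bool isDisabledGroupSyntax(const std::string& pattern, std::size_t pos) {
    // Need "(?" plus at least one more character to classify the group.
    if (pos + 3 > pattern.size() || pattern.compare(pos, 2, "(?") != 0)
        return false;
    char c = pattern[pos + 2];
    if (c == '<') {
        // (?<= and (?<! are lookbehind assertions.
        return pos + 3 < pattern.size() &&
               (pattern[pos + 3] == '=' || pattern[pos + 3] == '!');
    }
    // (?#) comment, (?>) atomic group, (?C) callout, (?P) named group,
    // (?R) and (?0)..(?9) recursion, (?imsxUX) option setting.
    // The c != '\0' test matters because strchr also finds the terminator.
    return c != '\0' && std::strchr("#>CPR0123456789imsxUX", c) != nullptr;
}
```

With this check, `a(?>b)` and `(?i)abc` are rejected while `(?:ab)` and `(?=a)` pass through to the normal group parser.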

Added the following (here \u is the JavaScript \uhhhh escape, not the Perl-style \u listed above):

\u \v

Because of \v, the js1_2/regexp/special_characters.js test now passes.
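A minimal sketch of decoding the two added escapes follows. It assumes the pattern is held as UTF-16 (the diff below passes `uint16_t` data to `pcre_compile`) and that the caller passes the index just after the backslash. The function names are hypothetical, and returning `std::nullopt` for a \u without four hex digits is an assumption; the commit does not say how the real code handles that case. In JavaScript, \v denotes U+000B (vertical tab).

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
static int hexValue(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Hypothetical helper: `pos` is the index just after the backslash. On success,
// returns the UTF-16 code unit and stores the index after the escape in `next`.
std::optional<char16_t> decodeJavaScriptEscape(const std::u16string& pattern,
                                               std::size_t pos, std::size_t& next) {
    if (pos >= pattern.size()) return std::nullopt;
    if (pattern[pos] == u'v') {            // \v is VERTICAL TAB, U+000B
        next = pos + 1;
        return u'\v';
    }
    if (pattern[pos] == u'u' && pos + 4 < pattern.size()) {
        int value = 0;
        for (std::size_t k = 1; k <= 4; ++k) { // exactly four hex digits
            int digit = hexValue(pattern[pos + k]);
            if (digit < 0) return std::nullopt;
            value = value * 16 + digit;
        }
        next = pos + 5;
        return static_cast<char16_t>(value);
    }
    return std::nullopt;                   // not one of the two escapes handled here
}
```

Decoding inside the compiler means a \u-encoded metacharacter is simply a literal code unit, so no source rewriting (and no re-escaping) is needed.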

To be conservative, I left some features that JavaScript doesn't want, such as
\012 and \x{2013}, in place. We can revisit these later; they're not closely enough
related to avoiding the incorrect preflighting.

I also didn't try to remove the now-unused opcodes or the code that handles them in the execution engine.
Doing so could reduce code size and speed things up a bit, but it would require more changes.

kjs/regexp.h:
kjs/regexp.cpp: (KJS::RegExp::RegExp): Removed the sanitizePattern workaround for the lack of \u support, since the PCRE code now supports \u.

pcre/pcre-config.h: Set JAVASCRIPT to 1.
pcre/pcre_internal.h: Added ESC_v.

pcre/pcre_compile.c: Added a separate escape table, used when JAVASCRIPT is set, that omits all the escapes we don't want interpreted and includes '\v'.
(check_escape): Put #if !JAVASCRIPT around the code for '\l', '\L', '\N', '\u', and '\U', and, when JAVASCRIPT is set, added code to handle \u escapes such as '\u2013'.
(compile_branch): Put #if !JAVASCRIPT around all the code implementing the features we don't want.
(pcre_compile2): Ditto.
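The `#if !JAVASCRIPT` gating described for pcre_compile.c is a common preprocessor pattern: one source file, with the unwanted branches compiled out when the configuration macro is 1. The sketch below illustrates that pattern on a small escape classifier. The enum, the function name and the treatment of unlisted escapes as literals are assumptions for illustration and do not reproduce PCRE's actual escape table.

```cpp
#ifndef JAVASCRIPT
#define JAVASCRIPT 1 // stand-in for the setting in pcre/pcre-config.h
#endif

// Hypothetical escape categories for this sketch; PCRE's real table differs.
enum class Escape {
    Literal, Control, Digit, NotDigit, Space, NotSpace, Word, NotWord,
    WordBoundary, NotWordBoundary, VerticalTab, UnicodeHex, PcreOnly
};

// Classifies the character that follows a backslash.
Escape classifyEscape(char c) {
    switch (c) {
    case 'f': case 'n': case 'r': case 't':
        return Escape::Control;           // \f \n \r \t
    case 'd': return Escape::Digit;
    case 'D': return Escape::NotDigit;
    case 's': return Escape::Space;
    case 'S': return Escape::NotSpace;
    case 'w': return Escape::Word;
    case 'W': return Escape::NotWord;
    case 'b': return Escape::WordBoundary;
    case 'B': return Escape::NotWordBoundary;
#if JAVASCRIPT
    case 'v': return Escape::VerticalTab; // added for JavaScript (U+000B)
    case 'u': return Escape::UnicodeHex;  // \uhhhh; the caller decodes the digits
#else
    // Escapes from the commit's "turned off" list; only a non-JavaScript
    // build recognizes them here.
    case 'C': case 'E': case 'G': case 'L': case 'N': case 'P': case 'Q':
    case 'U': case 'X': case 'Z': case 'e': case 'l': case 'p': case 'u': case 'z':
        return Escape::PcreOnly;
#endif
    default:
        return Escape::Literal;           // assumption: anything else is taken literally
    }
}
```

With the default JAVASCRIPT of 1, `classifyEscape('Q')` falls through to the literal case; defining JAVASCRIPT as 0 restores the PCRE-only branch and drops \v and \u.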

tests/mozilla/expected.html: Updated since js1_2/regexp/special_characters.js now passes.

File:

1 edited

trunk/JavaScriptCore/kjs/regexp.cpp (modified) (2 diffs)

Legend:

(no prefix) Unmodified
+ Added
- Removed

trunk/JavaScriptCore/kjs/regexp.cpp

-              r18517
+              r24453
                         options, &errorMessage, &errorOffset, NULL);
   if (!m_regex) {
-    // Try again, this time handle any \u we might find.
-    UString uPattern = sanitizePattern(p);
-    m_regex = pcre_compile(reinterpret_cast<const uint16_t*>(uPattern.data()), uPattern.size(),
-                          options, &errorMessage, &errorOffset, NULL);
-    if (!m_regex) {
-      m_constructionError = strdup(errorMessage);
-      return;
-    }
+    m_constructionError = strdup(errorMessage);
+    return;
   }
 …
 }
-UString RegExp::sanitizePattern(const UString& p)
-{
-  UString newPattern;
-  int startPos = 0;
-  int pos = p.find("\\u", 0) + 2; // Skip the \u
-  while (pos != 1) { // p.find failing is -1 + 2 = 1
-    if (pos + 3 < p.size()) {
-      if (isHexDigit(p[pos]) && isHexDigit(p[pos + 1]) &&
-          isHexDigit(p[pos + 2]) && isHexDigit(p[pos + 3])) {
-        newPattern.append(p.substr(startPos, pos - startPos - 2));
-        UChar escapedUnicode(convertUnicode(p[pos], p[pos + 1],
-                                            p[pos + 2], p[pos + 3]));
-        // \u encoded characters should be treated as if they were escaped,
-        // so add an escape for certain characters that need it.
-        switch (escapedUnicode.unicode()) {
-          case '|':
-          case '+':
-          case '*':
-          case '(':
-          case ')':
-          case '[':
-          case ']':
-          case '{':
-          case '}':
-          case '?':
-          case '\\':
-            newPattern.append('\\');
-        }
-        newPattern.append(escapedUnicode);
-        startPos = pos + 4;
-      }
-    }
-    pos = p.find("\\u", pos) + 2;
-  }
-  newPattern.append(p.substr(startPos, p.size() - startPos));
-  return newPattern;
-}
 bool RegExp::isHexDigit(UChar uc)
 {
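For reference, the removed workaround can be restated outside KJS. The sketch below is a hypothetical standalone version: it uses std::u16string in place of UString, and the names sanitizePatternSketch, isHexDigit16 and hexDigitValue are made up for illustration. It keeps the original's behavior of rewriting each \uhhhh into the literal character and backslash-escaping only the characters in the original switch. That list has no '.', '^' or '$', so \u002E, for example, came out as an unescaped '.' that matches almost any character.

```cpp
#include <cstddef>
#include <string>

// True if c is an ASCII hexadecimal digit.
static bool isHexDigit16(char16_t c) {
    return (c >= u'0' && c <= u'9') || (c >= u'a' && c <= u'f') ||
           (c >= u'A' && c <= u'F');
}

// Value of a hex digit; only called after isHexDigit16 has accepted c.
static int hexDigitValue(char16_t c) {
    if (c <= u'9') return c - u'0';
    if (c <= u'F') return c - u'A' + 10;
    return c - u'a' + 10;
}

// Hypothetical restatement of the removed RegExp::sanitizePattern.
std::u16string sanitizePatternSketch(const std::u16string& p) {
    std::u16string out;
    std::size_t start = 0;                   // first character not yet copied
    std::size_t found = p.find(u"\\u");
    while (found != std::u16string::npos) {
        std::size_t pos = found + 2;         // index of the first hex digit
        if (pos + 3 < p.size() && isHexDigit16(p[pos]) && isHexDigit16(p[pos + 1]) &&
            isHexDigit16(p[pos + 2]) && isHexDigit16(p[pos + 3])) {
            out.append(p, start, found - start); // text before the backslash
            char16_t ch = static_cast<char16_t>(
                (hexDigitValue(p[pos]) << 12) | (hexDigitValue(p[pos + 1]) << 8) |
                (hexDigitValue(p[pos + 2]) << 4) | hexDigitValue(p[pos + 3]));
            // Same escape list as the original switch.
            switch (ch) {
            case u'|': case u'+': case u'*': case u'(': case u')':
            case u'[': case u']': case u'{': case u'}': case u'?': case u'\\':
                out.push_back(u'\\');
                break;
            default:
                break;
            }
            out.push_back(ch);
            start = pos + 4;
        }
        found = p.find(u"\\u", pos);
    }
    out.append(p, start, std::u16string::npos); // copy whatever remains
    return out;
}
```

For example, `a\u0062c` becomes `abc` and `\u002B` becomes `\+`, while a \u without four hex digits is left untouched.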

Changeset 24453 in webkit for trunk/JavaScriptCore/kjs/regexp.cpp