Changeset 24453 in webkit for trunk/JavaScriptCore/kjs/regexp.cpp


Timestamp: Jul 19, 2007, 2:10:40 PM
Author: darin
Message:

Reviewed by Geoff.

  • fix <rdar://problem/5345440> PCRE computes wrong length for expressions with quantifiers on named recursion or subexpressions

It's challenging to implement proper preflighting for compiling these advanced features.
But we don't want them in the JavaScript engine anyway.

Turned off the following features of PCRE (some of these are simply parsed and not implemented):

\C \E \G \L \N \P \Q \U \X \Z
\e \l \p \u \z
[::] .. [==]
(?#) (?<=) (?<!) (?>)
(?C) (?P) (?R)
(?0) (and 1-9)
(?imsxUX)

Added the following:

\u \v

Because of \v, the js1_2/regexp/special_characters.js test now passes.

To be conservative, I left in place some features that JavaScript doesn't want, such as
\012 and \x{2013}. We can revisit these later; they're not related directly enough
to avoiding the incorrect preflighting.

I also didn't try to remove unused opcodes and remove code from the execution engine.
That could save code size and speed things up a bit, but it would require more changes.

  • kjs/regexp.h:
  • kjs/regexp.cpp: (KJS::RegExp::RegExp): Remove the sanitizePattern workaround for lack of \u support, since the PCRE code now has \u support.
  • pcre/pcre-config.h: Set JAVASCRIPT to 1.
  • pcre/pcre_internal.h: Added ESC_v.
  • pcre/pcre_compile.c: Added a different escape table for when JAVASCRIPT is set that omits all the escapes we don't want interpreted and includes '\v'. (check_escape): Put !JAVASCRIPT around the code for '\l', '\L', '\N', '\u', and '\U', and added code to handle '\u2013' inside JAVASCRIPT. (compile_branch): Put a !JAVASCRIPT #if around all the code implementing the features we don't want. (pcre_compile2): Ditto.
  • tests/mozilla/expected.html: Updated since js1_2/regexp/special_characters.js now passes.
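
For illustration only, handling an escape such as '\u2013' at compile time amounts to reading four hex digits after the `\u` and producing one UTF-16 code unit. The sketch below is not the PCRE code from this changeset; `hexValue` and `decodeUnicodeEscape` are hypothetical names, and the use of `std::u16string` is an assumption made to keep the example self-contained.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
static int hexValue(char16_t c)
{
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Hypothetical helper: if pattern[pos..] begins with "\uXXXX" (exactly four
// hex digits), store the decoded UTF-16 code unit in `out` and return true.
// Otherwise leave `out` untouched and return false.
bool decodeUnicodeEscape(const std::u16string& pattern, std::size_t pos, char16_t& out)
{
    if (pattern.size() < 6 || pos > pattern.size() - 6)
        return false; // not enough characters left for "\uXXXX"
    if (pattern[pos] != u'\\' || pattern[pos + 1] != u'u')
        return false;

    unsigned value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        int digit = hexValue(pattern[pos + 2 + i]);
        if (digit < 0)
            return false; // malformed escape, caller decides what to do
        value = value * 16 + static_cast<unsigned>(digit);
    }
    out = static_cast<char16_t>(value);
    return true;
}
```

A caller would pass the offset of the backslash. On failure it can fall back to treating the `u` as an ordinary character, which is an assumption about the desired JavaScript behavior rather than something stated in this log.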
File:
1 edited

  • trunk/JavaScriptCore/kjs/regexp.cpp

--- trunk/JavaScriptCore/kjs/regexp.cpp (revision 18517)
+++ trunk/JavaScriptCore/kjs/regexp.cpp (revision 24453)
@@ -51,12 +51,6 @@
                         options, &errorMessage, &errorOffset, NULL);
   if (!m_regex) {
-    // Try again, this time handle any \u we might find.
-    UString uPattern = sanitizePattern(p);
-    m_regex = pcre_compile(reinterpret_cast<const uint16_t*>(uPattern.data()), uPattern.size(),
-                          options, &errorMessage, &errorOffset, NULL);
-    if (!m_regex) {
-      m_constructionError = strdup(errorMessage);
-      return;
-    }
+    m_constructionError = strdup(errorMessage);
+    return;
   }
 
@@ -190,46 +184,4 @@
 }
 
-UString RegExp::sanitizePattern(const UString& p)
-{
-  UString newPattern;
-
-  int startPos = 0;
-  int pos = p.find("\\u", 0) + 2; // Skip the \u
-
-  while (pos != 1) { // p.find failing is -1 + 2 = 1
-    if (pos + 3 < p.size()) {
-      if (isHexDigit(p[pos]) && isHexDigit(p[pos + 1]) &&
-          isHexDigit(p[pos + 2]) && isHexDigit(p[pos + 3])) {
-        newPattern.append(p.substr(startPos, pos - startPos - 2));
-        UChar escapedUnicode(convertUnicode(p[pos], p[pos + 1],
-                                            p[pos + 2], p[pos + 3]));
-        // \u encoded characters should be treated as if they were escaped,
-        // so add an escape for certain characters that need it.
-        switch (escapedUnicode.unicode()) {
-          case '|':
-          case '+':
-          case '*':
-          case '(':
-          case ')':
-          case '[':
-          case ']':
-          case '{':
-          case '}':
-          case '?':
-          case '\\':
-            newPattern.append('\\');
-        }
-        newPattern.append(escapedUnicode);
-
-        startPos = pos + 4;
-      }
-    }
-    pos = p.find("\\u", pos) + 2;
-  }
-  newPattern.append(p.substr(startPos, p.size() - startPos));
-
-  return newPattern;
-}
-
 bool RegExp::isHexDigit(UChar uc)
 {
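
For readers who want to experiment with what the removed workaround did, here is a minimal standalone sketch of the same pre-pass written against `std::u16string` instead of `UString`. The names `expandUnicodeEscapes`, `hexDigitValue`, and `needsEscape` are hypothetical and do not appear in WebKit. The sketch compares against `npos` rather than relying on the `-1 + 2 = 1` sentinel used above. Like the removed code, it does not check whether the backslash before `u` is itself escaped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
static int hexDigitValue(char16_t c)
{
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Same character set as the switch statement in the removed sanitizePattern.
static bool needsEscape(char16_t c)
{
    switch (c) {
    case u'|': case u'+': case u'*': case u'(': case u')': case u'[':
    case u']': case u'{': case u'}': case u'?': case u'\\':
        return true;
    default:
        return false;
    }
}

// Replace each well-formed "\uXXXX" with the code unit it names, adding a
// backslash in front of decoded characters that are regex metacharacters.
std::u16string expandUnicodeEscapes(const std::u16string& pattern)
{
    const std::u16string marker = u"\\u";
    std::u16string result;
    std::size_t copyFrom = 0;   // start of text not yet copied to result
    std::size_t searchFrom = 0; // where the next search for "\u" begins

    for (;;) {
        std::size_t pos = pattern.find(marker, searchFrom);
        if (pos == std::u16string::npos || pattern.size() - pos < 6)
            break; // no marker left, or too short for four hex digits
        searchFrom = pos + 2;

        unsigned value = 0;
        bool valid = true;
        for (std::size_t i = 0; i < 4 && valid; ++i) {
            int digit = hexDigitValue(pattern[pos + 2 + i]);
            if (digit < 0)
                valid = false;
            else
                value = value * 16 + static_cast<unsigned>(digit);
        }
        if (!valid)
            continue; // leave malformed escapes in the output untouched

        result.append(pattern, copyFrom, pos - copyFrom);
        char16_t unit = static_cast<char16_t>(value);
        if (needsEscape(unit))
            result.push_back(u'\\');
        result.push_back(unit);
        copyFrom = searchFrom = pos + 6;
    }
    result.append(pattern, copyFrom, std::u16string::npos);
    return result;
}
```

Under these assumptions, a pattern containing `\u002B` comes out as an escaped `+`, so the decoded character is matched literally rather than acting as a quantifier.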