Ignore:
Timestamp:
Dec 8, 2017, 12:32:42 PM (7 years ago)
Author:
[email protected]
Message:

YARR: JIT RegExps with greedy parenthesized sub patterns
https://bugs.webkit.org/show_bug.cgi?id=180538

Reviewed by JF Bastien.

This patch adds JIT support for regular expressions containing greedy counted
parenthesis. An example expression that couldn't be JIT'ed before is /q(a|b)*q/.

Just like in the interpreter, expressions with nested parenthetical subpatterns
require saving the results of previous matches of the parentheses contents along
with any associated state. This saved state is needed in the case that we need
to backtrack. This state is called ParenContext within the code space allocated
for this ParenContext is managed using a simple block allocator within the JIT'ed
code. The raw space managed by this allocator is passed into the JIT'ed function.

Since this fixed sized space may be exceeded, this patch adds a fallback mechanism.
If the JIT'ed code exhausts all its ParenContext space, it returns a new error
JSRegExpJITCodeFailure. The caller will then bytecompile and interpret the
expression.

Due to increased register usage by the parenthesis handling code, the use of
registers by the JIT engine was restructured, with registers used for Unicode
pattern matching replaced with constants.

Reworked some of the context structures that are used across the interpreter
and JIT implementations to make them a little more uniform and to handle the
needs of JIT'ing the new parentheses forms.

To help with development and debugging of this code, compiled patterns dumping
code was enhanced. Also added the ability to also dump interpreter ByteCodes.

  • runtime/RegExp.cpp:

(JSC::byteCodeCompilePattern):
(JSC::RegExp::byteCodeCompileIfNecessary):
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):

  • runtime/RegExp.h:
  • runtime/RegExpInlines.h:

(JSC::RegExp::matchInline):

  • testRegExp.cpp:

(parseRegExpLine):
(runFromFiles):

  • yarr/Yarr.h:
  • yarr/YarrInterpreter.cpp:

(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::dumpDisjunction):

  • yarr/YarrJIT.cpp:

(JSC::Yarr::YarrGenerator::ParenContextSizes::ParenContextSizes):
(JSC::Yarr::YarrGenerator::ParenContextSizes::numSubpatterns):
(JSC::Yarr::YarrGenerator::ParenContextSizes::frameSlots):
(JSC::Yarr::YarrGenerator::ParenContext::sizeFor):
(JSC::Yarr::YarrGenerator::ParenContext::nextOffset):
(JSC::Yarr::YarrGenerator::ParenContext::beginOffset):
(JSC::Yarr::YarrGenerator::ParenContext::matchAmountOffset):
(JSC::Yarr::YarrGenerator::ParenContext::subpatternOffset):
(JSC::Yarr::YarrGenerator::ParenContext::savedFrameOffset):
(JSC::Yarr::YarrGenerator::initParenContextFreeList):
(JSC::Yarr::YarrGenerator::allocatePatternContext):
(JSC::Yarr::YarrGenerator::freePatternContext):
(JSC::Yarr::YarrGenerator::savePatternContext):
(JSC::Yarr::YarrGenerator::restorePatternContext):
(JSC::Yarr::YarrGenerator::tryReadUnicodeCharImpl):
(JSC::Yarr::YarrGenerator::storeToFrame):
(JSC::Yarr::YarrGenerator::generateJITFailReturn):
(JSC::Yarr::YarrGenerator::clearMatches):
(JSC::Yarr::YarrGenerator::generate):
(JSC::Yarr::YarrGenerator::backtrack):
(JSC::Yarr::YarrGenerator::opCompileParenthesesSubpattern):
(JSC::Yarr::YarrGenerator::generateEnter):
(JSC::Yarr::YarrGenerator::generateReturn):
(JSC::Yarr::YarrGenerator::YarrGenerator):
(JSC::Yarr::YarrGenerator::compile):

  • yarr/YarrJIT.h:

(JSC::Yarr::YarrCodeBlock::execute):

  • yarr/YarrPattern.cpp:

(JSC::Yarr::indentForNestingLevel):
(JSC::Yarr::dumpUChar32):
(JSC::Yarr::dumpCharacterClass):
(JSC::Yarr::PatternTerm::dump):
(JSC::Yarr::YarrPattern::dumpPattern):

  • yarr/YarrPattern.h:

(JSC::Yarr::PatternTerm::containsAnyCaptures):
(JSC::Yarr::BackTrackInfoParenthesesOnce::returnAddressIndex):
(JSC::Yarr::BackTrackInfoParentheses::beginIndex):
(JSC::Yarr::BackTrackInfoParentheses::returnAddressIndex):
(JSC::Yarr::BackTrackInfoParentheses::matchAmountIndex):
(JSC::Yarr::BackTrackInfoParentheses::patternContextHeadIndex):
(JSC::Yarr::BackTrackInfoAlternative::offsetIndex): Deleted.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/ChangeLog

    r225693 r225695  
     12017-12-08  Michael Saboff  <[email protected]>
     2
     3        YARR: JIT RegExps with greedy parenthesized sub patterns
     4        https://bugs.webkit.org/show_bug.cgi?id=180538
     5
     6        Reviewed by JF Bastien.
     7
     8        This patch adds JIT support for regular expressions containing greedy counted
     9        parenthesis.  An example expression that couldn't be JIT'ed before is /q(a|b)*q/.
     10
     11        Just like in the interpreter, expressions with nested parenthetical subpatterns
     12        require saving the results of previous matches of the parentheses contents along
     13        with any associated state.  This saved state is needed in the case that we need
     14        to backtrack.  This state is called ParenContext within the code space allocated
     15        for this ParenContext is managed using a simple block allocator within the JIT'ed
     16        code.  The raw space managed by this allocator is passed into the JIT'ed function.
     17
     18        Since this fixed sized space may be exceeded, this patch adds a fallback mechanism.
     19        If the JIT'ed code exhausts all its ParenContext space, it returns a new error
     20        JSRegExpJITCodeFailure.  The caller will then bytecompile and interpret the
     21        expression.
     22
     23        Due to increased register usage by the parenthesis handling code, the use of
     24        registers by the JIT engine was restructured, with registers used for Unicode
     25        pattern matching replaced with constants.
     26
     27        Reworked some of the context structures that are used across the interpreter
     28        and JIT implementations to make them a little more uniform and to handle the
     29        needs of JIT'ing the new parentheses forms.
     30
     31        To help with development and debugging of this code, compiled patterns dumping
     32        code was enhanced.  Also added the ability to also dump interpreter ByteCodes.
     33
     34        * runtime/RegExp.cpp:
     35        (JSC::byteCodeCompilePattern):
     36        (JSC::RegExp::byteCodeCompileIfNecessary):
     37        (JSC::RegExp::compile):
     38        (JSC::RegExp::compileMatchOnly):
     39        * runtime/RegExp.h:
     40        * runtime/RegExpInlines.h:
     41        (JSC::RegExp::matchInline):
     42        * testRegExp.cpp:
     43        (parseRegExpLine):
     44        (runFromFiles):
     45        * yarr/Yarr.h:
     46        * yarr/YarrInterpreter.cpp:
     47        (JSC::Yarr::ByteCompiler::compile):
     48        (JSC::Yarr::ByteCompiler::dumpDisjunction):
     49        * yarr/YarrJIT.cpp:
     50        (JSC::Yarr::YarrGenerator::ParenContextSizes::ParenContextSizes):
     51        (JSC::Yarr::YarrGenerator::ParenContextSizes::numSubpatterns):
     52        (JSC::Yarr::YarrGenerator::ParenContextSizes::frameSlots):
     53        (JSC::Yarr::YarrGenerator::ParenContext::sizeFor):
     54        (JSC::Yarr::YarrGenerator::ParenContext::nextOffset):
     55        (JSC::Yarr::YarrGenerator::ParenContext::beginOffset):
     56        (JSC::Yarr::YarrGenerator::ParenContext::matchAmountOffset):
     57        (JSC::Yarr::YarrGenerator::ParenContext::subpatternOffset):
     58        (JSC::Yarr::YarrGenerator::ParenContext::savedFrameOffset):
     59        (JSC::Yarr::YarrGenerator::initParenContextFreeList):
     60        (JSC::Yarr::YarrGenerator::allocatePatternContext):
     61        (JSC::Yarr::YarrGenerator::freePatternContext):
     62        (JSC::Yarr::YarrGenerator::savePatternContext):
     63        (JSC::Yarr::YarrGenerator::restorePatternContext):
     64        (JSC::Yarr::YarrGenerator::tryReadUnicodeCharImpl):
     65        (JSC::Yarr::YarrGenerator::storeToFrame):
     66        (JSC::Yarr::YarrGenerator::generateJITFailReturn):
     67        (JSC::Yarr::YarrGenerator::clearMatches):
     68        (JSC::Yarr::YarrGenerator::generate):
     69        (JSC::Yarr::YarrGenerator::backtrack):
     70        (JSC::Yarr::YarrGenerator::opCompileParenthesesSubpattern):
     71        (JSC::Yarr::YarrGenerator::generateEnter):
     72        (JSC::Yarr::YarrGenerator::generateReturn):
     73        (JSC::Yarr::YarrGenerator::YarrGenerator):
     74        (JSC::Yarr::YarrGenerator::compile):
     75        * yarr/YarrJIT.h:
     76        (JSC::Yarr::YarrCodeBlock::execute):
     77        * yarr/YarrPattern.cpp:
     78        (JSC::Yarr::indentForNestingLevel):
     79        (JSC::Yarr::dumpUChar32):
     80        (JSC::Yarr::dumpCharacterClass):
     81        (JSC::Yarr::PatternTerm::dump):
     82        (JSC::Yarr::YarrPattern::dumpPattern):
     83        * yarr/YarrPattern.h:
     84        (JSC::Yarr::PatternTerm::containsAnyCaptures):
     85        (JSC::Yarr::BackTrackInfoParenthesesOnce::returnAddressIndex):
     86        (JSC::Yarr::BackTrackInfoParentheses::beginIndex):
     87        (JSC::Yarr::BackTrackInfoParentheses::returnAddressIndex):
     88        (JSC::Yarr::BackTrackInfoParentheses::matchAmountIndex):
     89        (JSC::Yarr::BackTrackInfoParentheses::patternContextHeadIndex):
     90        (JSC::Yarr::BackTrackInfoAlternative::offsetIndex): Deleted.
     91
    1922017-12-08  Joseph Pecoraro  <[email protected]>
    293
Note: See TracChangeset for help on using the changeset viewer.