Ignore:
Timestamp:
Dec 8, 2017, 12:32:42 PM (7 years ago)
Author:
[email protected]
Message:

YARR: JIT RegExps with greedy parenthesized sub patterns
https://bugs.webkit.org/show_bug.cgi?id=180538

Reviewed by JF Bastien.

This patch adds JIT support for regular expressions containing greedy counted
parenthesis. An example expression that couldn't be JIT'ed before is /q(a|b)*q/.

Just like in the interpreter, expressions with nested parenthetical subpatterns
require saving the results of previous matches of the parentheses contents along
with any associated state. This saved state is needed in the case that we need
to backtrack. This state is called ParenContext within the code space allocated
for this ParenContext is managed using a simple block allocator within the JIT'ed
code. The raw space managed by this allocator is passed into the JIT'ed function.

Since this fixed sized space may be exceeded, this patch adds a fallback mechanism.
If the JIT'ed code exhausts all its ParenContext space, it returns a new error
JSRegExpJITCodeFailure. The caller will then bytecompile and interpret the
expression.

Due to increased register usage by the parenthesis handling code, the use of
registers by the JIT engine was restructured, with registers used for Unicode
pattern matching replaced with constants.

Reworked some of the context structures that are used across the interpreter
and JIT implementations to make them a little more uniform and to handle the
needs of JIT'ing the new parentheses forms.

To help with development and debugging of this code, compiled patterns dumping
code was enhanced. Also added the ability to also dump interpreter ByteCodes.

  • runtime/RegExp.cpp:

(JSC::byteCodeCompilePattern):
(JSC::RegExp::byteCodeCompileIfNecessary):
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):

  • runtime/RegExp.h:
  • runtime/RegExpInlines.h:

(JSC::RegExp::matchInline):

  • testRegExp.cpp:

(parseRegExpLine):
(runFromFiles):

  • yarr/Yarr.h:
  • yarr/YarrInterpreter.cpp:

(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::dumpDisjunction):

  • yarr/YarrJIT.cpp:

(JSC::Yarr::YarrGenerator::ParenContextSizes::ParenContextSizes):
(JSC::Yarr::YarrGenerator::ParenContextSizes::numSubpatterns):
(JSC::Yarr::YarrGenerator::ParenContextSizes::frameSlots):
(JSC::Yarr::YarrGenerator::ParenContext::sizeFor):
(JSC::Yarr::YarrGenerator::ParenContext::nextOffset):
(JSC::Yarr::YarrGenerator::ParenContext::beginOffset):
(JSC::Yarr::YarrGenerator::ParenContext::matchAmountOffset):
(JSC::Yarr::YarrGenerator::ParenContext::subpatternOffset):
(JSC::Yarr::YarrGenerator::ParenContext::savedFrameOffset):
(JSC::Yarr::YarrGenerator::initParenContextFreeList):
(JSC::Yarr::YarrGenerator::allocatePatternContext):
(JSC::Yarr::YarrGenerator::freePatternContext):
(JSC::Yarr::YarrGenerator::savePatternContext):
(JSC::Yarr::YarrGenerator::restorePatternContext):
(JSC::Yarr::YarrGenerator::tryReadUnicodeCharImpl):
(JSC::Yarr::YarrGenerator::storeToFrame):
(JSC::Yarr::YarrGenerator::generateJITFailReturn):
(JSC::Yarr::YarrGenerator::clearMatches):
(JSC::Yarr::YarrGenerator::generate):
(JSC::Yarr::YarrGenerator::backtrack):
(JSC::Yarr::YarrGenerator::opCompileParenthesesSubpattern):
(JSC::Yarr::YarrGenerator::generateEnter):
(JSC::Yarr::YarrGenerator::generateReturn):
(JSC::Yarr::YarrGenerator::YarrGenerator):
(JSC::Yarr::YarrGenerator::compile):

  • yarr/YarrJIT.h:

(JSC::Yarr::YarrCodeBlock::execute):

  • yarr/YarrPattern.cpp:

(JSC::Yarr::indentForNestingLevel):
(JSC::Yarr::dumpUChar32):
(JSC::Yarr::dumpCharacterClass):
(JSC::Yarr::PatternTerm::dump):
(JSC::Yarr::YarrPattern::dumpPattern):

  • yarr/YarrPattern.h:

(JSC::Yarr::PatternTerm::containsAnyCaptures):
(JSC::Yarr::BackTrackInfoParenthesesOnce::returnAddressIndex):
(JSC::Yarr::BackTrackInfoParentheses::beginIndex):
(JSC::Yarr::BackTrackInfoParentheses::returnAddressIndex):
(JSC::Yarr::BackTrackInfoParentheses::matchAmountIndex):
(JSC::Yarr::BackTrackInfoParentheses::patternContextHeadIndex):
(JSC::Yarr::BackTrackInfoAlternative::offsetIndex): Deleted.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/runtime/RegExp.cpp

    r223010 r225695  
    272272}
    273273
     274
     275static std::unique_ptr<Yarr::BytecodePattern> byteCodeCompilePattern(VM* vm, Yarr::YarrPattern& pattern)
     276{
     277    return Yarr::byteCompile(pattern, &vm->m_regExpAllocator, &vm->m_regExpAllocatorLock);
     278}
     279
     280void RegExp::byteCodeCompileIfNecessary(VM* vm)
     281{
     282    if (m_regExpBytecode)
     283        return;
     284
     285    Yarr::YarrPattern pattern(m_patternString, m_flags, &m_constructionError, vm->stackLimit());
     286    if (m_constructionError) {
     287        RELEASE_ASSERT_NOT_REACHED();
     288#if COMPILER_QUIRK(CONSIDERS_UNREACHABLE_CODE)
     289        m_state = ParseError;
     290        return;
     291#endif
     292    }
     293    ASSERT(m_numSubpatterns == pattern.m_numSubpatterns);
     294
     295    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
     296}
     297
    274298void RegExp::compile(VM* vm, Yarr::YarrCharSize charSize)
    275299{
     
    304328#endif
    305329
     330    if (Options::dumpCompiledRegExpPatterns())
     331        dataLog("Can't JIT this regular expression: \"", m_patternString, "\"\n");
     332
    306333    m_state = ByteCode;
    307     m_regExpBytecode = Yarr::byteCompile(pattern, &vm->m_regExpAllocator, &vm->m_regExpAllocatorLock);
     334    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
    308335}
    309336
     
    357384#endif
    358385
     386    if (Options::dumpCompiledRegExpPatterns())
     387        dataLog("Can't JIT this regular expression: \"", m_patternString, "\"\n");
     388
    359389    m_state = ByteCode;
    360     m_regExpBytecode = Yarr::byteCompile(pattern, &vm->m_regExpAllocator, &vm->m_regExpAllocatorLock);
     390    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
    361391}
    362392
Note: See TracChangeset for help on using the changeset viewer.