Timestamp:
Oct 12, 2015, 10:56:26 AM
Author:
[email protected]
Message:

FTL should generate code to call slow paths lazily
https://bugs.webkit.org/show_bug.cgi?id=149936

Reviewed by Saam Barati.

Source/JavaScriptCore:

We often have complex slow paths in FTL-generated code. Those slow paths may never run. Even
if they do run, they don't need stellar performance. So, it doesn't make sense to have LLVM
worry about compiling such slow path code.

This patch enables us to use our own MacroAssembler for compiling the slow path inside FTL
code. It does this by using a crazy lambda thingy (see FTLLowerDFGToLLVM.cpp's lazySlowPath()
and its documentation). The result is quite natural to use.

Even for straight slow path calls via something like vmCall(), the lazySlowPath offers the
benefit that the call marshalling and the exception checking are not expressed using LLVM IR
and do not require LLVM to think about it. It also has the benefit that we never generate the
code if it never runs. That's great, since function calls usually involve ~10 instructions
total (move arguments to argument registers, make the call, check exception, etc.).
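
The "generate only if it ever runs" idea can be sketched outside of any JIT. The following is a toy model, not the FTL's implementation: ToyLazySlowPath, GeneratedCode, Generator, and doubleWithSlowPath are hypothetical names, and "generated code" is just a std::function rather than machine code emitted by a MacroAssembler.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy stand-in for machine code: a callable taking and returning int.
using GeneratedCode = std::function<int(int)>;
// Toy stand-in for the lambda that would emit the slow path's machine code.
using Generator = std::function<GeneratedCode()>;

class ToyLazySlowPath {
public:
    explicit ToyLazySlowPath(Generator generator)
        : m_generator(std::move(generator))
    {
    }

    // Reached only when the fast path bails out.
    int call(int argument)
    {
        if (!m_code) {
            // First execution: run the generator and cache its result.
            m_code = m_generator();
            assert(m_code);
            ++m_timesGenerated;
        }
        return m_code(argument);
    }

    int timesGenerated() const { return m_timesGenerated; }

private:
    Generator m_generator;
    GeneratedCode m_code; // Empty until the slow path first runs.
    int m_timesGenerated { 0 };
};

// The fast path handles small values inline; everything else takes the slow path.
inline int doubleWithSlowPath(ToyLazySlowPath& slowPath, int value)
{
    if (value < 1000)
        return value * 2;
    return slowPath.call(value);
}
```

In this sketch, a program that only ever passes small values never pays for generating the slow path at all, and one that does hit it pays exactly once.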

This patch adds the lazy slow path abstraction and uses it for some slow paths in the FTL.
The code we generate with lazy slow paths is worse than the code that LLVM would have
generated. Therefore, a lazy slow path only makes sense when we have strong evidence that
the slow path will execute infrequently relative to the fast path. This completely precludes
the use of lazy slow paths for out-of-line Nodes that unconditionally call a C++ function.
It also precludes their use for the GetByVal out-of-bounds handler, since when we generate
a GetByVal with an out-of-bounds handler it means that we only know that the out-of-bounds
case executed at least once. So, for all we know, it may actually be the common case.
Consequently, this patch deploys lazy slow paths only for GC slow paths and
masquerades-as-undefined slow paths. It makes sense for GC slow paths because those have a
statistical guarantee of slow path frequency, probably bounded at less than 1/10. It makes
sense for masquerades-as-undefined because we can say quite confidently that this is an
uncommon scenario on the modern Web.

Something that's always been challenging about abstractions involving the MacroAssembler is
that linking is a separate phase, and there is no way for someone who is just given access to
the MacroAssembler& to emit code that requires linking, since linking happens once we have
emitted all code and we are creating the LinkBuffer. Moreover, the FTL requires that the
final parts of linking happen on the main thread. This patch ran into this issue, and solved
it comprehensively, by introducing MacroAssembler::addLinkTask(). This takes a lambda and
runs it at the bitter end of linking - when performFinalization() is called. This ensures that
the task added by addLinkTask() runs on the main thread. This patch doesn't replace all of
the previously existing idioms for dealing with this issue; we can do that later.
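
The deferred-linking idiom described above can be illustrated with a minimal sketch. This is not the JavaScriptCore code: ToyAssembler, ToyLinkBuffer, and emitPatchableByte are hypothetical names, the "code" is a byte vector, and threading is ignored. Only the names addLinkTask and performFinalization mirror the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class ToyLinkBuffer;

class ToyAssembler {
public:
    using LinkTask = std::function<void(ToyLinkBuffer&)>;

    // Emit one byte of "code" and return its offset.
    std::size_t emitByte(std::uint8_t byte)
    {
        m_bytes.push_back(byte);
        return m_bytes.size() - 1;
    }

    // Register work that can only happen once the final code exists.
    void addLinkTask(LinkTask task) { m_linkTasks.push_back(std::move(task)); }

private:
    friend class ToyLinkBuffer;
    std::vector<std::uint8_t> m_bytes;
    std::vector<LinkTask> m_linkTasks;
};

class ToyLinkBuffer {
public:
    // Copies the emitted bytes and takes ownership of the pending link tasks.
    explicit ToyLinkBuffer(ToyAssembler& assembler)
        : m_code(assembler.m_bytes)
        , m_linkTasks(std::move(assembler.m_linkTasks))
    {
    }

    void patchByte(std::size_t offset, std::uint8_t value) { m_code.at(offset) = value; }
    std::uint8_t byteAt(std::size_t offset) const { return m_code.at(offset); }

    // The last step of linking: run every deferred task exactly once.
    void performFinalization()
    {
        assert(!m_finalized);
        m_finalized = true;
        for (auto& task : m_linkTasks)
            task(*this);
        m_linkTasks.clear();
    }

private:
    std::vector<std::uint8_t> m_code;
    std::vector<ToyAssembler::LinkTask> m_linkTasks;
    bool m_finalized { false };
};

// A helper that is only handed the assembler, yet emits code needing a link-time patch.
inline std::size_t emitPatchableByte(ToyAssembler& assembler, std::uint8_t finalValue)
{
    std::size_t offset = assembler.emitByte(0); // Placeholder until linking.
    assembler.addLinkTask([offset, finalValue](ToyLinkBuffer& linkBuffer) {
        linkBuffer.patchByte(offset, finalValue);
    });
    return offset;
}
```

The point of the sketch is that emitPatchableByte() never sees the link buffer when it emits code; it only leaves a task behind, and whoever calls performFinalization() decides when (and, in the real FTL, on which thread) that task runs.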

This shows small speed-ups on a bunch of things. No big win on any benchmark aggregate. But
mainly this is done for https://bugs.webkit.org/show_bug.cgi?id=149852, where we found that
outlining the slow path in this way was a significant speed boost.

  • CMakeLists.txt:
  • JavaScriptCore.vcxproj/JavaScriptCore.vcxproj:
  • JavaScriptCore.xcodeproj/project.pbxproj:
  • assembler/AbstractMacroAssembler.h:

(JSC::AbstractMacroAssembler::replaceWithAddressComputation):
(JSC::AbstractMacroAssembler::addLinkTask):
(JSC::AbstractMacroAssembler::AbstractMacroAssembler):

  • assembler/LinkBuffer.cpp:

(JSC::LinkBuffer::linkCode):
(JSC::LinkBuffer::allocate):
(JSC::LinkBuffer::performFinalization):

  • assembler/LinkBuffer.h:

(JSC::LinkBuffer::wasAlreadyDisassembled):
(JSC::LinkBuffer::didAlreadyDisassemble):
(JSC::LinkBuffer::vm):
(JSC::LinkBuffer::executableOffsetFor):

  • bytecode/CodeOrigin.h:

(JSC::CodeOrigin::CodeOrigin):
(JSC::CodeOrigin::isSet):
(JSC::CodeOrigin::operator bool):
(JSC::CodeOrigin::isHashTableDeletedValue):
(JSC::CodeOrigin::operator!): Deleted.

  • ftl/FTLCompile.cpp:

(JSC::FTL::mmAllocateDataSection):

  • ftl/FTLInlineCacheDescriptor.h:

(JSC::FTL::InlineCacheDescriptor::InlineCacheDescriptor):
(JSC::FTL::CheckInDescriptor::CheckInDescriptor):
(JSC::FTL::LazySlowPathDescriptor::LazySlowPathDescriptor):

  • ftl/FTLJITCode.h:
  • ftl/FTLJITFinalizer.cpp:

(JSC::FTL::JITFinalizer::finalizeFunction):

  • ftl/FTLJITFinalizer.h:
  • ftl/FTLLazySlowPath.cpp: Added.

(JSC::FTL::LazySlowPath::LazySlowPath):
(JSC::FTL::LazySlowPath::~LazySlowPath):
(JSC::FTL::LazySlowPath::generate):

  • ftl/FTLLazySlowPath.h: Added.

(JSC::FTL::LazySlowPath::createGenerator):
(JSC::FTL::LazySlowPath::patchpoint):
(JSC::FTL::LazySlowPath::usedRegisters):
(JSC::FTL::LazySlowPath::callSiteIndex):
(JSC::FTL::LazySlowPath::stub):

  • ftl/FTLLazySlowPathCall.h: Added.

(JSC::FTL::createLazyCallGenerator):

  • ftl/FTLLowerDFGToLLVM.cpp:

(JSC::FTL::DFG::LowerDFGToLLVM::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToLLVM::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNotifyWrite):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIsObjectOrNull):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIsFunction):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIn):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToLLVM::compileCheckWatchdogTimer):
(JSC::FTL::DFG::LowerDFGToLLVM::allocatePropertyStorageWithSizeImpl):
(JSC::FTL::DFG::LowerDFGToLLVM::allocateObject):
(JSC::FTL::DFG::LowerDFGToLLVM::allocateJSArray):
(JSC::FTL::DFG::LowerDFGToLLVM::buildTypeOf):
(JSC::FTL::DFG::LowerDFGToLLVM::sensibleDoubleToInt32):
(JSC::FTL::DFG::LowerDFGToLLVM::lazySlowPath):
(JSC::FTL::DFG::LowerDFGToLLVM::speculate):
(JSC::FTL::DFG::LowerDFGToLLVM::emitStoreBarrier):

  • ftl/FTLOperations.cpp:

(JSC::FTL::operationMaterializeObjectInOSR):
(JSC::FTL::compileFTLLazySlowPath):

  • ftl/FTLOperations.h:
  • ftl/FTLSlowPathCall.cpp:

(JSC::FTL::SlowPathCallContext::SlowPathCallContext):
(JSC::FTL::SlowPathCallContext::~SlowPathCallContext):
(JSC::FTL::SlowPathCallContext::keyWithTarget):
(JSC::FTL::SlowPathCallContext::makeCall):
(JSC::FTL::callSiteIndexForCodeOrigin):
(JSC::FTL::storeCodeOrigin): Deleted.
(JSC::FTL::callOperation): Deleted.

  • ftl/FTLSlowPathCall.h:

(JSC::FTL::callOperation):

  • ftl/FTLState.h:
  • ftl/FTLThunks.cpp:

(JSC::FTL::genericGenerationThunkGenerator):
(JSC::FTL::osrExitGenerationThunkGenerator):
(JSC::FTL::lazySlowPathGenerationThunkGenerator):
(JSC::FTL::registerClobberCheck):

  • ftl/FTLThunks.h:
  • interpreter/CallFrame.h:

(JSC::CallSiteIndex::CallSiteIndex):
(JSC::CallSiteIndex::operator bool):
(JSC::CallSiteIndex::bits):

  • jit/CCallHelpers.h:

(JSC::CCallHelpers::setupArgument):
(JSC::CCallHelpers::setupArgumentsWithExecState):

  • jit/JITOperations.cpp:

Source/WTF:

Enables SharedTask to handle any function type, not just void().

It's probably better to use SharedTask instead of std::function in performance-sensitive
code. std::function uses the system malloc and has copy semantics. SharedTask uses FastMalloc
and has aliasing semantics. So, you can just trust that it will have sensible performance
characteristics.
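
A rough sketch of what "any function type" plus "aliasing semantics" means in practice follows. It is an assumption-laden illustration rather than the WTF code: ToySharedTask, ToySharedTaskFunctor, createToySharedTask, and runTwiceThroughAliases are hypothetical names, and std::shared_ptr stands in for WTF's reference counting, so it does not model the FastMalloc side of the argument.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template<typename FunctionType> class ToySharedTask;

// Abstract, shareable task with an arbitrary signature, not just void().
template<typename Result, typename... Arguments>
class ToySharedTask<Result(Arguments...)> {
public:
    virtual ~ToySharedTask() = default;
    virtual Result run(Arguments...) = 0;
};

template<typename FunctionType, typename Functor> class ToySharedTaskFunctor;

// Concrete task that stores the functor inline, in the same allocation as the task.
template<typename Result, typename... Arguments, typename Functor>
class ToySharedTaskFunctor<Result(Arguments...), Functor> final
    : public ToySharedTask<Result(Arguments...)> {
public:
    explicit ToySharedTaskFunctor(Functor functor)
        : m_functor(std::move(functor))
    {
    }

    Result run(Arguments... arguments) override
    {
        return m_functor(std::forward<Arguments>(arguments)...);
    }

private:
    Functor m_functor;
};

template<typename FunctionType, typename Functor>
std::shared_ptr<ToySharedTask<FunctionType>> createToySharedTask(Functor functor)
{
    return std::make_shared<ToySharedTaskFunctor<FunctionType, Functor>>(std::move(functor));
}

// Aliasing demo: both handles refer to one task, so the lambda's state is shared.
inline int runTwiceThroughAliases()
{
    auto task = createToySharedTask<int(int)>(
        [calls = 0](int x) mutable { return x + ++calls; });
    auto alias = task; // Copies the handle, not the functor.
    assert(task.get() == alias.get());
    task->run(0);
    return alias->run(0); // Observes the call made through the other handle.
}
```

With a copied std::function, each copy would carry its own counter; here the second handle sees the first handle's call, which is the aliasing behavior the paragraph above refers to.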

  • wtf/ParallelHelperPool.cpp:

(WTF::ParallelHelperClient::~ParallelHelperClient):
(WTF::ParallelHelperClient::setTask):
(WTF::ParallelHelperClient::doSomeHelping):
(WTF::ParallelHelperClient::runTaskInParallel):
(WTF::ParallelHelperClient::finish):
(WTF::ParallelHelperClient::claimTask):
(WTF::ParallelHelperClient::runTask):
(WTF::ParallelHelperPool::doSomeHelping):
(WTF::ParallelHelperPool::helperThreadBody):

  • wtf/ParallelHelperPool.h:

(WTF::ParallelHelperClient::setFunction):
(WTF::ParallelHelperClient::runFunctionInParallel):
(WTF::ParallelHelperClient::pool):

  • wtf/SharedTask.h:

(WTF::createSharedTask):
(WTF::SharedTask::SharedTask): Deleted.
(WTF::SharedTask::~SharedTask): Deleted.
(WTF::SharedTaskFunctor::SharedTaskFunctor): Deleted.

File:
1 edited

  • trunk/Source/JavaScriptCore/ftl/FTLThunks.cpp

--- trunk/Source/JavaScriptCore/ftl/FTLThunks.cpp (r189575)
+++ trunk/Source/JavaScriptCore/ftl/FTLThunks.cpp (r190860)
@@ -32,4 +32,5 @@
 #include "FPRInfo.h"
 #include "FTLOSRExitCompiler.h"
+#include "FTLOperations.h"
 #include "FTLSaveRestore.h"
 #include "GPRInfo.h"
@@ -40,9 +41,10 @@
 using namespace DFG;
 
-MacroAssemblerCodeRef osrExitGenerationThunkGenerator(VM* vm)
+static MacroAssemblerCodeRef genericGenerationThunkGenerator(
+    VM* vm, FunctionPtr generationFunction, const char* name, unsigned extraPopsToRestore)
 {
     AssemblyHelpers jit(vm, 0);
 
-    // Note that the "return address" will be the OSR exit ID.
+    // Note that the "return address" will be the ID that we pass to the generation function.
 
     ptrdiff_t stackMisalignment = MacroAssembler::pushToSaveByteOffset();
@@ -91,9 +93,12 @@
         jit.popToRestore(GPRInfo::regT1);
     jit.popToRestore(MacroAssembler::framePointerRegister);
-
-    // At this point we're sitting on the return address - so if we did a jump right now, the
-    // tail-callee would be happy. Instead we'll stash the callee in the return address and then
-    // restore all registers.
-
+
+    // When we came in here, there was an additional thing pushed to the stack. Some clients want it
+    // popped before proceeding.
+    while (extraPopsToRestore--)
+        jit.popToRestore(GPRInfo::regT1);
+
+    // Put the return address wherever the return instruction wants it. On all platforms, this
+    // ensures that the return address is out of the way of register restoration.
     jit.restoreReturnAddressBeforeReturn(GPRInfo::regT0);
 
@@ -103,6 +108,20 @@
 
     LinkBuffer patchBuffer(*vm, jit, GLOBAL_THUNK_ID);
-    patchBuffer.link(functionCall, compileFTLOSRExit);
-    return FINALIZE_CODE(patchBuffer, ("FTL OSR exit generation thunk"));
+    patchBuffer.link(functionCall, generationFunction);
+    return FINALIZE_CODE(patchBuffer, ("%s", name));
+}
+
+MacroAssemblerCodeRef osrExitGenerationThunkGenerator(VM* vm)
+{
+    unsigned extraPopsToRestore = 0;
+    return genericGenerationThunkGenerator(
+        vm, compileFTLOSRExit, "FTL OSR exit generation thunk", extraPopsToRestore);
+}
+
+MacroAssemblerCodeRef lazySlowPathGenerationThunkGenerator(VM* vm)
+{
+    unsigned extraPopsToRestore = 1;
+    return genericGenerationThunkGenerator(
+        vm, compileFTLLazySlowPath, "FTL lazy slow path generation thunk", extraPopsToRestore);
 }
 