Timestamp: Oct 12, 2015, 10:56:26 AM
Author: [email protected]
Message:

FTL should generate code to call slow paths lazily
https://bugs.webkit.org/show_bug.cgi?id=149936

Reviewed by Saam Barati.

Source/JavaScriptCore:

We often have complex slow paths in FTL-generated code. Those slow paths may never run. Even
if they do run, they don't need stellar performance. So, it doesn't make sense to have LLVM
worry about compiling such slow path code.

This patch enables us to use our own MacroAssembler for compiling the slow path inside FTL
code. It does this by using a crazy lambda thingy (see FTLLowerDFGToLLVM.cpp's lazySlowPath()
and its documentation). The result is quite natural to use.
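
To give a feel for the idiom, lowering a node whose helper call is expected to be rare looks
roughly like the following. This is a hedged sketch, not code lifted from the patch: the exact
parameter lists of lazySlowPath() and createLazyCallGenerator() may differ, and
operationDoSomethingSlow, someCell, and the use of InvalidGPRReg to mean "no result register"
are placeholders for this illustration.

    // The fast path stays in LLVM IR; the slow path is just a patchpoint plus a
    // generator lambda that is only invoked the first time the slow path is hit.
    lazySlowPath(
        [=] (const Vector<Location>&) -> RefPtr<LazySlowPath::Generator> {
            // Runs lazily: emits the MacroAssembler code that marshals the
            // arguments, calls the C++ operation, and checks for exceptions.
            return createLazyCallGenerator(
                operationDoSomethingSlow, InvalidGPRReg,
                CCallHelpers::TrustedImmPtr(someCell));
        });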

Even for straight slow path calls via something like vmCall(), the lazySlowPath offers the
benefit that the call marshalling and the exception checking are not expressed using LLVM IR
and do not require LLVM to think about them. It also has the benefit that we never generate the
code if it never runs. That's great, since function calls usually involve ~10 instructions
total (move arguments to argument registers, make the call, check exception, etc.).
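
Concretely, the marshalling that now gets emitted directly by the MacroAssembler boils down to
something like this (a sketch in terms of CCallHelpers; jit, objectGPR, uid, and exceptions are
placeholders for whatever the call site has in hand):

    // Marshal the arguments (ExecState plus operands) into argument registers,
    // make the call, then branch to the exception handler if the callee threw.
    jit.setupArgumentsWithExecState(objectGPR, CCallHelpers::TrustedImmPtr(uid));
    MacroAssembler::Call slowCall = jit.call();
    exceptions.append(jit.emitExceptionCheck());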

This patch adds the lazy slow path abstraction and uses it for some slow paths in the FTL.
The code we generate with lazy slow paths is worse than the code that LLVM would have
generated. Therefore, a lazy slow path only makes sense when we have strong evidence that
the slow path will execute infrequently relative to the fast path. This completely precludes
the use of lazy slow paths for out-of-line Nodes that unconditionally call a C++ function.
It also precludes their use for the GetByVal out-of-bounds handler, since when we generate
a GetByVal with an out-of-bounds handler it means that we only know that the out-of-bounds
case executed at least once. For all we know, it may actually be the common case. So this
patch deploys the lazy slow path only for GC slow paths and masquerades-as-undefined
slow paths. It makes sense for GC slow paths because those have a statistical guarantee of
slow path frequency - probably bounded at less than 1/10. It makes sense for masquerades-as-
undefined because we can say quite confidently that this is an uncommon scenario on the
modern Web.

Something that's always been challenging about abstractions involving the MacroAssembler is
that linking is a separate phase, and there is no way for someone who is just given access to
the MacroAssembler& to emit code that requires linking, since linking happens once we have
emitted all code and we are creating the LinkBuffer. Moreover, the FTL requires that the
final parts of linking happen on the main thread. This patch ran into this issue, and solved
it comprehensively, by introducing MacroAssembler::addLinkTask(). This takes a lambda and
runs it at the bitter end of linking - when performFinalization() is called. This ensures that
the task added by addLinkTask() runs on the main thread. This patch doesn't replace all of
the previously existing idioms for dealing with this issue; we can do that later.
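
The shape of the new idiom, modeled on the SlowPathCallContext::makeCall() change in the diff
below (here jit is a MacroAssembler&/CCallHelpers&, and thunk is a stand-in for whatever code
the call site ultimately wants to jump to):

    // Emit the call now, but defer resolving its target until the LinkBuffer
    // exists. The lambda runs during performFinalization(), on the main thread.
    MacroAssembler::Call call = jit.call();
    jit.addLinkTask(
        [=] (LinkBuffer& linkBuffer) {
            linkBuffer.link(call, CodeLocationLabel(thunk.code()));
        });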

This shows small speed-ups on a bunch of things. No big win on any benchmark aggregate. But
mainly this is done for https://bugs.webkit.org/show_bug.cgi?id=149852, where we found that
outlining the slow path in this way was a significant speed boost.

  • CMakeLists.txt:
  • JavaScriptCore.vcxproj/JavaScriptCore.vcxproj:
  • JavaScriptCore.xcodeproj/project.pbxproj:
  • assembler/AbstractMacroAssembler.h:

(JSC::AbstractMacroAssembler::replaceWithAddressComputation):
(JSC::AbstractMacroAssembler::addLinkTask):
(JSC::AbstractMacroAssembler::AbstractMacroAssembler):

  • assembler/LinkBuffer.cpp:

(JSC::LinkBuffer::linkCode):
(JSC::LinkBuffer::allocate):
(JSC::LinkBuffer::performFinalization):

  • assembler/LinkBuffer.h:

(JSC::LinkBuffer::wasAlreadyDisassembled):
(JSC::LinkBuffer::didAlreadyDisassemble):
(JSC::LinkBuffer::vm):
(JSC::LinkBuffer::executableOffsetFor):

  • bytecode/CodeOrigin.h:

(JSC::CodeOrigin::CodeOrigin):
(JSC::CodeOrigin::isSet):
(JSC::CodeOrigin::operator bool):
(JSC::CodeOrigin::isHashTableDeletedValue):
(JSC::CodeOrigin::operator!): Deleted.

  • ftl/FTLCompile.cpp:

(JSC::FTL::mmAllocateDataSection):

  • ftl/FTLInlineCacheDescriptor.h:

(JSC::FTL::InlineCacheDescriptor::InlineCacheDescriptor):
(JSC::FTL::CheckInDescriptor::CheckInDescriptor):
(JSC::FTL::LazySlowPathDescriptor::LazySlowPathDescriptor):

  • ftl/FTLJITCode.h:
  • ftl/FTLJITFinalizer.cpp:

(JSC::FTL::JITFinalizer::finalizeFunction):

  • ftl/FTLJITFinalizer.h:
  • ftl/FTLLazySlowPath.cpp: Added.

(JSC::FTL::LazySlowPath::LazySlowPath):
(JSC::FTL::LazySlowPath::~LazySlowPath):
(JSC::FTL::LazySlowPath::generate):

  • ftl/FTLLazySlowPath.h: Added.

(JSC::FTL::LazySlowPath::createGenerator):
(JSC::FTL::LazySlowPath::patchpoint):
(JSC::FTL::LazySlowPath::usedRegisters):
(JSC::FTL::LazySlowPath::callSiteIndex):
(JSC::FTL::LazySlowPath::stub):

  • ftl/FTLLazySlowPathCall.h: Added.

(JSC::FTL::createLazyCallGenerator):

  • ftl/FTLLowerDFGToLLVM.cpp:

(JSC::FTL::DFG::LowerDFGToLLVM::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToLLVM::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToLLVM::compileNotifyWrite):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIsObjectOrNull):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIsFunction):
(JSC::FTL::DFG::LowerDFGToLLVM::compileIn):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToLLVM::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToLLVM::compileCheckWatchdogTimer):
(JSC::FTL::DFG::LowerDFGToLLVM::allocatePropertyStorageWithSizeImpl):
(JSC::FTL::DFG::LowerDFGToLLVM::allocateObject):
(JSC::FTL::DFG::LowerDFGToLLVM::allocateJSArray):
(JSC::FTL::DFG::LowerDFGToLLVM::buildTypeOf):
(JSC::FTL::DFG::LowerDFGToLLVM::sensibleDoubleToInt32):
(JSC::FTL::DFG::LowerDFGToLLVM::lazySlowPath):
(JSC::FTL::DFG::LowerDFGToLLVM::speculate):
(JSC::FTL::DFG::LowerDFGToLLVM::emitStoreBarrier):

  • ftl/FTLOperations.cpp:

(JSC::FTL::operationMaterializeObjectInOSR):
(JSC::FTL::compileFTLLazySlowPath):

  • ftl/FTLOperations.h:
  • ftl/FTLSlowPathCall.cpp:

(JSC::FTL::SlowPathCallContext::SlowPathCallContext):
(JSC::FTL::SlowPathCallContext::~SlowPathCallContext):
(JSC::FTL::SlowPathCallContext::keyWithTarget):
(JSC::FTL::SlowPathCallContext::makeCall):
(JSC::FTL::callSiteIndexForCodeOrigin):
(JSC::FTL::storeCodeOrigin): Deleted.
(JSC::FTL::callOperation): Deleted.

  • ftl/FTLSlowPathCall.h:

(JSC::FTL::callOperation):

  • ftl/FTLState.h:
  • ftl/FTLThunks.cpp:

(JSC::FTL::genericGenerationThunkGenerator):
(JSC::FTL::osrExitGenerationThunkGenerator):
(JSC::FTL::lazySlowPathGenerationThunkGenerator):
(JSC::FTL::registerClobberCheck):

  • ftl/FTLThunks.h:
  • interpreter/CallFrame.h:

(JSC::CallSiteIndex::CallSiteIndex):
(JSC::CallSiteIndex::operator bool):
(JSC::CallSiteIndex::bits):

  • jit/CCallHelpers.h:

(JSC::CCallHelpers::setupArgument):
(JSC::CCallHelpers::setupArgumentsWithExecState):

  • jit/JITOperations.cpp:

Source/WTF:

Enables SharedTask to handle any function type, not just void().

It's probably better to use SharedTask instead of std::function in performance-sensitive
code. std::function uses the system malloc and has copy semantics. SharedTask uses FastMalloc
and has aliasing semantics. So, you can just trust that it will have sensible performance
characteristics.
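
For instance, with this change a task can have any signature, not just void(). A minimal sketch
(auto is used because the exact smart-pointer type returned by createSharedTask() isn't
important here):

    // A shared task with an int(int, int) signature; run() forwards the arguments.
    auto task = createSharedTask<int(int, int)>(
        [] (int a, int b) {
            return a + b;
        });
    int sum = task->run(2, 3); // sum == 5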

  • wtf/ParallelHelperPool.cpp:

(WTF::ParallelHelperClient::~ParallelHelperClient):
(WTF::ParallelHelperClient::setTask):
(WTF::ParallelHelperClient::doSomeHelping):
(WTF::ParallelHelperClient::runTaskInParallel):
(WTF::ParallelHelperClient::finish):
(WTF::ParallelHelperClient::claimTask):
(WTF::ParallelHelperClient::runTask):
(WTF::ParallelHelperPool::doSomeHelping):
(WTF::ParallelHelperPool::helperThreadBody):

  • wtf/ParallelHelperPool.h:

(WTF::ParallelHelperClient::setFunction):
(WTF::ParallelHelperClient::runFunctionInParallel):
(WTF::ParallelHelperClient::pool):

  • wtf/SharedTask.h:

(WTF::createSharedTask):
(WTF::SharedTask::SharedTask): Deleted.
(WTF::SharedTask::~SharedTask): Deleted.
(WTF::SharedTaskFunctor::SharedTaskFunctor): Deleted.

  • trunk/Source/JavaScriptCore/ftl/FTLSlowPathCall.cpp

--- FTLSlowPathCall.cpp (r188932)
+++ FTLSlowPathCall.cpp (r190860)
@@ -1,4 +1,4 @@
 /*
- * Copyright (C) 2013, 2014 Apple Inc. All rights reserved.
+ * Copyright (C) 2013-2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -31,4 +31,5 @@
 #include "CCallHelpers.h"
 #include "FTLState.h"
+#include "FTLThunks.h"
 #include "GPRInfo.h"
 #include "JSCInlines.h"
@@ -36,181 +37,108 @@
 namespace JSC { namespace FTL {

-namespace {
-
 // This code relies on us being 64-bit. FTL is currently always 64-bit.
 static const size_t wordSize = 8;

-// This will be an RAII thingy that will set up the necessary stack sizes and offsets and such.
-class CallContext {
-public:
-    CallContext(
-        State& state, const RegisterSet& usedRegisters, CCallHelpers& jit,
-        unsigned numArgs, GPRReg returnRegister)
-        : m_state(state)
-        , m_usedRegisters(usedRegisters)
-        , m_jit(jit)
-        , m_numArgs(numArgs)
-        , m_returnRegister(returnRegister)
-    {
-        // We don't care that you're using callee-save, stack, or hardware registers.
-        m_usedRegisters.exclude(RegisterSet::stackRegisters());
-        m_usedRegisters.exclude(RegisterSet::reservedHardwareRegisters());
-        m_usedRegisters.exclude(RegisterSet::calleeSaveRegisters());
+SlowPathCallContext::SlowPathCallContext(
+    RegisterSet usedRegisters, CCallHelpers& jit, unsigned numArgs, GPRReg returnRegister)
+    : m_jit(jit)
+    , m_numArgs(numArgs)
+    , m_returnRegister(returnRegister)
+{
+    // We don't care that you're using callee-save, stack, or hardware registers.
+    usedRegisters.exclude(RegisterSet::stackRegisters());
+    usedRegisters.exclude(RegisterSet::reservedHardwareRegisters());
+    usedRegisters.exclude(RegisterSet::calleeSaveRegisters());

-        // The return register doesn't need to be saved.
-        if (m_returnRegister != InvalidGPRReg)
-            m_usedRegisters.clear(m_returnRegister);
+    // The return register doesn't need to be saved.
+    if (m_returnRegister != InvalidGPRReg)
+        usedRegisters.clear(m_returnRegister);

-        size_t stackBytesNeededForReturnAddress = wordSize;
+    size_t stackBytesNeededForReturnAddress = wordSize;

-        m_offsetToSavingArea =
-            (std::max(m_numArgs, NUMBER_OF_ARGUMENT_REGISTERS) - NUMBER_OF_ARGUMENT_REGISTERS) * wordSize;
+    m_offsetToSavingArea =
+        (std::max(m_numArgs, NUMBER_OF_ARGUMENT_REGISTERS) - NUMBER_OF_ARGUMENT_REGISTERS) * wordSize;

-        for (unsigned i = std::min(NUMBER_OF_ARGUMENT_REGISTERS, numArgs); i--;)
-            m_argumentRegisters.set(GPRInfo::toArgumentRegister(i));
-        m_callingConventionRegisters.merge(m_argumentRegisters);
-        if (returnRegister != InvalidGPRReg)
-            m_callingConventionRegisters.set(GPRInfo::returnValueGPR);
-        m_callingConventionRegisters.filter(m_usedRegisters);
+    for (unsigned i = std::min(NUMBER_OF_ARGUMENT_REGISTERS, numArgs); i--;)
+        m_argumentRegisters.set(GPRInfo::toArgumentRegister(i));
+    m_callingConventionRegisters.merge(m_argumentRegisters);
+    if (returnRegister != InvalidGPRReg)
+        m_callingConventionRegisters.set(GPRInfo::returnValueGPR);
+    m_callingConventionRegisters.filter(usedRegisters);

-        unsigned numberOfCallingConventionRegisters =
-            m_callingConventionRegisters.numberOfSetRegisters();
+    unsigned numberOfCallingConventionRegisters =
+        m_callingConventionRegisters.numberOfSetRegisters();

-        size_t offsetToThunkSavingArea =
-            m_offsetToSavingArea +
-            numberOfCallingConventionRegisters * wordSize;
+    size_t offsetToThunkSavingArea =
+        m_offsetToSavingArea +
+        numberOfCallingConventionRegisters * wordSize;

-        m_stackBytesNeeded =
-            offsetToThunkSavingArea +
-            stackBytesNeededForReturnAddress +
-            (m_usedRegisters.numberOfSetRegisters() - numberOfCallingConventionRegisters) * wordSize;
+    m_stackBytesNeeded =
+        offsetToThunkSavingArea +
+        stackBytesNeededForReturnAddress +
+        (usedRegisters.numberOfSetRegisters() - numberOfCallingConventionRegisters) * wordSize;

-        m_stackBytesNeeded = (m_stackBytesNeeded + stackAlignmentBytes() - 1) & ~(stackAlignmentBytes() - 1);
+    m_stackBytesNeeded = (m_stackBytesNeeded + stackAlignmentBytes() - 1) & ~(stackAlignmentBytes() - 1);

-        m_jit.subPtr(CCallHelpers::TrustedImm32(m_stackBytesNeeded), CCallHelpers::stackPointerRegister);
+    m_jit.subPtr(CCallHelpers::TrustedImm32(m_stackBytesNeeded), CCallHelpers::stackPointerRegister);
+
+    m_thunkSaveSet = usedRegisters;

-        m_thunkSaveSet = m_usedRegisters;
+    // This relies on all calling convention registers also being temp registers.
+    unsigned stackIndex = 0;
+    for (unsigned i = GPRInfo::numberOfRegisters; i--;) {
+        GPRReg reg = GPRInfo::toRegister(i);
+        if (!m_callingConventionRegisters.get(reg))
+            continue;
+        m_jit.storePtr(reg, CCallHelpers::Address(CCallHelpers::stackPointerRegister, m_offsetToSavingArea + (stackIndex++) * wordSize));
+        m_thunkSaveSet.clear(reg);
+    }

-        // This relies on all calling convention registers also being temp registers.
-        unsigned stackIndex = 0;
-        for (unsigned i = GPRInfo::numberOfRegisters; i--;) {
-            GPRReg reg = GPRInfo::toRegister(i);
-            if (!m_callingConventionRegisters.get(reg))
-                continue;
-            m_jit.storePtr(reg, CCallHelpers::Address(CCallHelpers::stackPointerRegister, m_offsetToSavingArea + (stackIndex++) * wordSize));
-            m_thunkSaveSet.clear(reg);
-        }
-
-        m_offset = offsetToThunkSavingArea;
+    m_offset = offsetToThunkSavingArea;
+}
+
+SlowPathCallContext::~SlowPathCallContext()
+{
+    if (m_returnRegister != InvalidGPRReg)
+        m_jit.move(GPRInfo::returnValueGPR, m_returnRegister);
+
+    unsigned stackIndex = 0;
+    for (unsigned i = GPRInfo::numberOfRegisters; i--;) {
+        GPRReg reg = GPRInfo::toRegister(i);
+        if (!m_callingConventionRegisters.get(reg))
+            continue;
+        m_jit.loadPtr(CCallHelpers::Address(CCallHelpers::stackPointerRegister, m_offsetToSavingArea + (stackIndex++) * wordSize), reg);
     }

-    ~CallContext()
-    {
-        if (m_returnRegister != InvalidGPRReg)
-            m_jit.move(GPRInfo::returnValueGPR, m_returnRegister);
-
-        unsigned stackIndex = 0;
-        for (unsigned i = GPRInfo::numberOfRegisters; i--;) {
-            GPRReg reg = GPRInfo::toRegister(i);
-            if (!m_callingConventionRegisters.get(reg))
-                continue;
-            m_jit.loadPtr(CCallHelpers::Address(CCallHelpers::stackPointerRegister, m_offsetToSavingArea + (stackIndex++) * wordSize), reg);
-        }
-
-        m_jit.addPtr(CCallHelpers::TrustedImm32(m_stackBytesNeeded), CCallHelpers::stackPointerRegister);
-    }
-
-    RegisterSet usedRegisters() const
-    {
-        return m_thunkSaveSet;
-    }
-
-    ptrdiff_t offset() const
-    {
-        return m_offset;
-    }
-
-    SlowPathCallKey keyWithTarget(void* callTarget) const
-    {
-        return SlowPathCallKey(usedRegisters(), callTarget, m_argumentRegisters, offset());
-    }
-
-    MacroAssembler::Call makeCall(void* callTarget, MacroAssembler::JumpList* exceptionTarget)
-    {
-        MacroAssembler::Call result = m_jit.call();
-        m_state.finalizer->slowPathCalls.append(SlowPathCall(
-            result, keyWithTarget(callTarget)));
-        if (exceptionTarget)
-            exceptionTarget->append(m_jit.emitExceptionCheck());
-        return result;
-    }
-
-private:
-    State& m_state;
-    RegisterSet m_usedRegisters;
-    RegisterSet m_argumentRegisters;
-    RegisterSet m_callingConventionRegisters;
-    CCallHelpers& m_jit;
-    unsigned m_numArgs;
-    GPRReg m_returnRegister;
-    size_t m_offsetToSavingArea;
-    size_t m_stackBytesNeeded;
-    RegisterSet m_thunkSaveSet;
-    ptrdiff_t m_offset;
-};
-
-} // anonymous namespace
-
-void storeCodeOrigin(State& state, CCallHelpers& jit, CodeOrigin codeOrigin)
-{
-    if (!codeOrigin.isSet())
-        return;
-
-    CallSiteIndex callSite = state.jitCode->common.addCodeOrigin(codeOrigin);
-    unsigned locationBits = callSite.bits();
-    jit.store32(
-        CCallHelpers::TrustedImm32(locationBits),
-        CCallHelpers::tagFor(static_cast<VirtualRegister>(JSStack::ArgumentCount)));
+    m_jit.addPtr(CCallHelpers::TrustedImm32(m_stackBytesNeeded), CCallHelpers::stackPointerRegister);
 }

-MacroAssembler::Call callOperation(
-    State& state, const RegisterSet& usedRegisters, CCallHelpers& jit,
-    CodeOrigin codeOrigin, MacroAssembler::JumpList* exceptionTarget,
-    J_JITOperation_ESsiCI operation, GPRReg result, StructureStubInfo* stubInfo,
-    GPRReg object, const UniquedStringImpl* uid)
+SlowPathCallKey SlowPathCallContext::keyWithTarget(void* callTarget) const
 {
-    storeCodeOrigin(state, jit, codeOrigin);
-    CallContext context(state, usedRegisters, jit, 4, result);
-    jit.setupArgumentsWithExecState(
-        CCallHelpers::TrustedImmPtr(stubInfo), object, CCallHelpers::TrustedImmPtr(uid));
-    return context.makeCall(bitwise_cast<void*>(operation), exceptionTarget);
+    return SlowPathCallKey(m_thunkSaveSet, callTarget, m_argumentRegisters, m_offset);
 }

-MacroAssembler::Call callOperation(
-    State& state, const RegisterSet& usedRegisters, CCallHelpers& jit,
-    CodeOrigin codeOrigin, MacroAssembler::JumpList* exceptionTarget,
-    J_JITOperation_ESsiJI operation, GPRReg result, StructureStubInfo* stubInfo,
-    GPRReg object, UniquedStringImpl* uid)
+SlowPathCall SlowPathCallContext::makeCall(void* callTarget)
 {
-    storeCodeOrigin(state, jit, codeOrigin);
-    CallContext context(state, usedRegisters, jit, 4, result);
-    jit.setupArgumentsWithExecState(
-        CCallHelpers::TrustedImmPtr(stubInfo), object,
-        CCallHelpers::TrustedImmPtr(uid));
-    return context.makeCall(bitwise_cast<void*>(operation), exceptionTarget);
+    SlowPathCall result = SlowPathCall(m_jit.call(), keyWithTarget(callTarget));
+
+    m_jit.addLinkTask(
+        [result] (LinkBuffer& linkBuffer) {
+            VM& vm = linkBuffer.vm();
+
+            MacroAssemblerCodeRef thunk =
+                vm.ftlThunks->getSlowPathCallThunk(vm, result.key());
+
+            linkBuffer.link(result.call(), CodeLocationLabel(thunk.code()));
+        });
+
+    return result;
 }

-MacroAssembler::Call callOperation(
-    State& state, const RegisterSet& usedRegisters, CCallHelpers& jit,
-    CodeOrigin codeOrigin, MacroAssembler::JumpList* exceptionTarget,
-    V_JITOperation_ESsiJJI operation, StructureStubInfo* stubInfo, GPRReg value,
-    GPRReg object, UniquedStringImpl* uid)
+CallSiteIndex callSiteIndexForCodeOrigin(State& state, CodeOrigin codeOrigin)
 {
-    storeCodeOrigin(state, jit, codeOrigin);
-    CallContext context(state, usedRegisters, jit, 5, InvalidGPRReg);
-    jit.setupArgumentsWithExecState(
-        CCallHelpers::TrustedImmPtr(stubInfo), value, object,
-        CCallHelpers::TrustedImmPtr(uid));
-    return context.makeCall(bitwise_cast<void*>(operation), exceptionTarget);
+    if (codeOrigin)
+        return state.jitCode->common.addCodeOrigin(codeOrigin);
+    return CallSiteIndex();
 }