Ignore:
Timestamp:
Aug 28, 2014, 12:09:48 PM (11 years ago)
Author:
[email protected]
Message:

FTL should be able to do polymorphic call inlining
https://bugs.webkit.org/show_bug.cgi?id=135145

Reviewed by Geoffrey Garen.
Source/JavaScriptCore:


Added a log-based high-fidelity call edge profiler that runs in DFG JIT (and optionally
baseline JIT) code. Used it to do precise polymorphic inlining in the FTL. Potential
inlining sites use the call edge profile if it is available, but they will still fall back
on the call inline cache and rare case counts if it's not. Polymorphic inlining means that
multiple possible callees can be inlined with a switch to guard them. The slow path may
either be an OSR exit or a virtual call.

The call edge profiling added in this patch is very precise - it will tell you about every
call that has ever happened. It took some effort to reduce the overhead of this profiling.
This mostly involved ensuring that we don't do it unnecessarily. For example, we avoid it
in the baseline JIT (you can conditionally enable it but it's off by default) and we only do
it in the DFG JIT if we know that the regular inline cache profiling wasn't precise enough.
I also experimented with reducing the precision of the profiling. This led to a significant
reduction in the speed-up, so I avoided this approach. I also explored making log processing
concurrent, but that didn't help. Also, I tested the overhead of the log processing and
found that most of the overhead of this profiling is actually in putting things into the log
rather than in processing the log - that part appears to be surprisingly cheap.

Polymorphic inlining could be enabled in the DFG if we enabled baseline call edge profiling,
and if we guarded such inlining sites with some profiling mechanism to detect
polyvariant monomorphisation opportunities (where the callsite being inlined reveals that
it's actually monomorphic).

This is a ~28% speed-up on deltablue and a ~7% speed-up on richards, with small speed-ups on
other programs as well. It's about a 2% speed-up on Octane version 2, and never a regression
on anything we care about. Some aggregates, like V8Spider, see a regression. This is
highlighting the increase in profiling overhead. But since this doesn't show up on any major
score (code-load or SunSpider), it's probably not relevant.

Relanding after fixing debug assertions in fast/storage/serialized-script-value.html.

(JSC::CallEdge::dump):

  • bytecode/CallEdge.h: Added.

(JSC::CallEdge::operator!):
(JSC::CallEdge::callee):
(JSC::CallEdge::count):
(JSC::CallEdge::despecifiedClosure):
(JSC::CallEdge::CallEdge):

  • bytecode/CallEdgeProfile.cpp: Added.

(JSC::CallEdgeProfile::callEdges):
(JSC::CallEdgeProfile::numCallsToKnownCells):
(JSC::worthDespecifying):
(JSC::CallEdgeProfile::worthDespecifying):
(JSC::CallEdgeProfile::visitWeak):
(JSC::CallEdgeProfile::addSlow):
(JSC::CallEdgeProfile::mergeBack):
(JSC::CallEdgeProfile::fadeByHalf):
(JSC::CallEdgeLog::CallEdgeLog):
(JSC::CallEdgeLog::~CallEdgeLog):
(JSC::CallEdgeLog::isEnabled):
(JSC::operationProcessCallEdgeLog):
(JSC::CallEdgeLog::emitLogCode):
(JSC::CallEdgeLog::processLog):

  • bytecode/CallEdgeProfile.h: Added.

(JSC::CallEdgeProfile::numCallsToNotCell):
(JSC::CallEdgeProfile::numCallsToUnknownCell):
(JSC::CallEdgeProfile::totalCalls):

  • bytecode/CallEdgeProfileInlines.h: Added.

(JSC::CallEdgeProfile::CallEdgeProfile):
(JSC::CallEdgeProfile::add):

  • bytecode/CallLinkInfo.cpp:

(JSC::CallLinkInfo::visitWeak):

  • bytecode/CallLinkInfo.h:
  • bytecode/CallLinkStatus.cpp:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::computeFromLLInt):
(JSC::CallLinkStatus::computeFor):
(JSC::CallLinkStatus::computeExitSiteData):
(JSC::CallLinkStatus::computeFromCallLinkInfo):
(JSC::CallLinkStatus::computeFromCallEdgeProfile):
(JSC::CallLinkStatus::computeDFGStatuses):
(JSC::CallLinkStatus::isClosureCall):
(JSC::CallLinkStatus::makeClosureCall):
(JSC::CallLinkStatus::dump):
(JSC::CallLinkStatus::function): Deleted.
(JSC::CallLinkStatus::internalFunction): Deleted.
(JSC::CallLinkStatus::intrinsicFor): Deleted.

  • bytecode/CallLinkStatus.h:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::isSet):
(JSC::CallLinkStatus::couldTakeSlowPath):
(JSC::CallLinkStatus::edges):
(JSC::CallLinkStatus::size):
(JSC::CallLinkStatus::at):
(JSC::CallLinkStatus::operator[]):
(JSC::CallLinkStatus::canOptimize):
(JSC::CallLinkStatus::canTrustCounts):
(JSC::CallLinkStatus::isClosureCall): Deleted.
(JSC::CallLinkStatus::callTarget): Deleted.
(JSC::CallLinkStatus::executable): Deleted.
(JSC::CallLinkStatus::makeClosureCall): Deleted.

  • bytecode/CallVariant.cpp: Added.

(JSC::CallVariant::dump):

  • bytecode/CallVariant.h: Added.

(JSC::CallVariant::CallVariant):
(JSC::CallVariant::operator!):
(JSC::CallVariant::despecifiedClosure):
(JSC::CallVariant::rawCalleeCell):
(JSC::CallVariant::internalFunction):
(JSC::CallVariant::function):
(JSC::CallVariant::isClosureCall):
(JSC::CallVariant::executable):
(JSC::CallVariant::nonExecutableCallee):
(JSC::CallVariant::intrinsicFor):
(JSC::CallVariant::functionExecutable):
(JSC::CallVariant::isHashTableDeletedValue):
(JSC::CallVariant::operator==):
(JSC::CallVariant::operator!=):
(JSC::CallVariant::operator<):
(JSC::CallVariant::operator>):
(JSC::CallVariant::operator<=):
(JSC::CallVariant::operator>=):
(JSC::CallVariant::hash):
(JSC::CallVariant::deletedToken):
(JSC::CallVariantHash::hash):
(JSC::CallVariantHash::equal):

  • bytecode/CodeOrigin.h:

(JSC::InlineCallFrame::isNormalCall):

  • bytecode/ExitKind.cpp:

(JSC::exitKindToString):

  • bytecode/ExitKind.h:
  • bytecode/GetByIdStatus.cpp:

(JSC::GetByIdStatus::computeForStubInfo):

  • bytecode/PutByIdStatus.cpp:

(JSC::PutByIdStatus::computeForStubInfo):

  • dfg/DFGAbstractInterpreterInlines.h:

(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):

  • dfg/DFGBackwardsPropagationPhase.cpp:

(JSC::DFG::BackwardsPropagationPhase::propagate):

  • dfg/DFGBasicBlock.cpp:

(JSC::DFG::BasicBlock::~BasicBlock):

  • dfg/DFGBasicBlock.h:

(JSC::DFG::BasicBlock::takeLast):
(JSC::DFG::BasicBlock::didLink):

  • dfg/DFGByteCodeParser.cpp:

(JSC::DFG::ByteCodeParser::processSetLocalQueue):
(JSC::DFG::ByteCodeParser::removeLastNodeFromGraph):
(JSC::DFG::ByteCodeParser::addCallWithoutSettingResult):
(JSC::DFG::ByteCodeParser::addCall):
(JSC::DFG::ByteCodeParser::handleCall):
(JSC::DFG::ByteCodeParser::emitFunctionChecks):
(JSC::DFG::ByteCodeParser::undoFunctionChecks):
(JSC::DFG::ByteCodeParser::inliningCost):
(JSC::DFG::ByteCodeParser::inlineCall):
(JSC::DFG::ByteCodeParser::cancelLinkingForBlock):
(JSC::DFG::ByteCodeParser::attemptToInlineCall):
(JSC::DFG::ByteCodeParser::handleInlining):
(JSC::DFG::ByteCodeParser::handleConstantInternalFunction):
(JSC::DFG::ByteCodeParser::prepareToParseBlock):
(JSC::DFG::ByteCodeParser::clearCaches):
(JSC::DFG::ByteCodeParser::parseBlock):
(JSC::DFG::ByteCodeParser::linkBlock):
(JSC::DFG::ByteCodeParser::linkBlocks):
(JSC::DFG::ByteCodeParser::parseCodeBlock):

  • dfg/DFGCPSRethreadingPhase.cpp:

(JSC::DFG::CPSRethreadingPhase::freeUnnecessaryNodes):

  • dfg/DFGClobberize.h:

(JSC::DFG::clobberize):

  • dfg/DFGCommon.h:
  • dfg/DFGConstantFoldingPhase.cpp:

(JSC::DFG::ConstantFoldingPhase::foldConstants):

  • dfg/DFGDoesGC.cpp:

(JSC::DFG::doesGC):

  • dfg/DFGDriver.cpp:

(JSC::DFG::compileImpl):

  • dfg/DFGFixupPhase.cpp:

(JSC::DFG::FixupPhase::fixupNode):

  • dfg/DFGGraph.cpp:

(JSC::DFG::Graph::dump):
(JSC::DFG::Graph::getBlocksInPreOrder):
(JSC::DFG::Graph::visitChildren):

  • dfg/DFGJITCompiler.cpp:

(JSC::DFG::JITCompiler::link):

  • dfg/DFGLazyJSValue.cpp:

(JSC::DFG::LazyJSValue::switchLookupValue):

  • dfg/DFGLazyJSValue.h:

(JSC::DFG::LazyJSValue::switchLookupValue): Deleted.

  • dfg/DFGNode.cpp:

(WTF::printInternal):

  • dfg/DFGNode.h:

(JSC::DFG::OpInfo::OpInfo):
(JSC::DFG::Node::hasHeapPrediction):
(JSC::DFG::Node::hasCellOperand):
(JSC::DFG::Node::cellOperand):
(JSC::DFG::Node::setCellOperand):
(JSC::DFG::Node::canBeKnownFunction): Deleted.
(JSC::DFG::Node::hasKnownFunction): Deleted.
(JSC::DFG::Node::knownFunction): Deleted.
(JSC::DFG::Node::giveKnownFunction): Deleted.
(JSC::DFG::Node::hasFunction): Deleted.
(JSC::DFG::Node::function): Deleted.
(JSC::DFG::Node::hasExecutable): Deleted.
(JSC::DFG::Node::executable): Deleted.

  • dfg/DFGNodeType.h:
  • dfg/DFGPhantomCanonicalizationPhase.cpp:

(JSC::DFG::PhantomCanonicalizationPhase::run):

  • dfg/DFGPhantomRemovalPhase.cpp:

(JSC::DFG::PhantomRemovalPhase::run):

  • dfg/DFGPredictionPropagationPhase.cpp:

(JSC::DFG::PredictionPropagationPhase::propagate):

  • dfg/DFGSafeToExecute.h:

(JSC::DFG::safeToExecute):

  • dfg/DFGSpeculativeJIT.cpp:

(JSC::DFG::SpeculativeJIT::emitSwitch):

  • dfg/DFGSpeculativeJIT32_64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGSpeculativeJIT64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGStructureRegistrationPhase.cpp:

(JSC::DFG::StructureRegistrationPhase::run):

  • dfg/DFGTierUpCheckInjectionPhase.cpp:

(JSC::DFG::TierUpCheckInjectionPhase::run):
(JSC::DFG::TierUpCheckInjectionPhase::removeFTLProfiling):

  • dfg/DFGValidate.cpp:

(JSC::DFG::Validate::validate):

  • dfg/DFGWatchpointCollectionPhase.cpp:

(JSC::DFG::WatchpointCollectionPhase::handle):

  • ftl/FTLCapabilities.cpp:

(JSC::FTL::canCompile):

  • ftl/FTLLowerDFGToLLVM.cpp:

(JSC::FTL::ftlUnreachable):
(JSC::FTL::LowerDFGToLLVM::lower):
(JSC::FTL::LowerDFGToLLVM::compileNode):
(JSC::FTL::LowerDFGToLLVM::compileCheckCell):
(JSC::FTL::LowerDFGToLLVM::compileCheckBadCell):
(JSC::FTL::LowerDFGToLLVM::compileGetExecutable):
(JSC::FTL::LowerDFGToLLVM::compileNativeCallOrConstruct):
(JSC::FTL::LowerDFGToLLVM::compileSwitch):
(JSC::FTL::LowerDFGToLLVM::buildSwitch):
(JSC::FTL::LowerDFGToLLVM::compileCheckFunction): Deleted.
(JSC::FTL::LowerDFGToLLVM::compileCheckExecutable): Deleted.

  • heap/Heap.cpp:

(JSC::Heap::collect):

  • jit/AssemblyHelpers.h:

(JSC::AssemblyHelpers::storeValue):
(JSC::AssemblyHelpers::loadValue):

  • jit/CCallHelpers.h:

(JSC::CCallHelpers::setupArguments):

  • jit/GPRInfo.h:

(JSC::JSValueRegs::uses):

  • jit/JITCall.cpp:

(JSC::JIT::compileOpCall):

  • jit/JITCall32_64.cpp:

(JSC::JIT::compileOpCall):

  • runtime/Options.h:
  • runtime/VM.cpp:

(JSC::VM::ensureCallEdgeLog):

  • runtime/VM.h:
  • tests/stress/fold-profiled-call-to-call.js: Added. This test pinpoints the problem we saw in fast/storage/serialized-script-value.html.
  • tests/stress/new-array-then-exit.js: Added.
  • tests/stress/poly-call-exit-this.js: Added.
  • tests/stress/poly-call-exit.js: Added.

Source/WTF:


Add some power that I need for call edge profiling.

  • wtf/OwnPtr.h:

(WTF::OwnPtr<T>::createTransactionally):

  • wtf/Spectrum.h:

(WTF::Spectrum::add):
(WTF::Spectrum::addAll):
(WTF::Spectrum::get):
(WTF::Spectrum::size):
(WTF::Spectrum::KeyAndCount::KeyAndCount):
(WTF::Spectrum::clear):
(WTF::Spectrum::removeIf):

LayoutTests:

  • js/regress/script-tests/simple-poly-call-nested.js: Added.
  • js/regress/script-tests/simple-poly-call.js: Added.
  • js/regress/simple-poly-call-expected.txt: Added.
  • js/regress/simple-poly-call-nested-expected.txt: Added.
  • js/regress/simple-poly-call-nested.html: Added.
  • js/regress/simple-poly-call.html: Added.
File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/bytecode/CallLinkStatus.h

    r172961 r173069  
    4747public:
    4848    CallLinkStatus()
    49         : m_executable(0)
    50         , m_couldTakeSlowPath(false)
     49        : m_couldTakeSlowPath(false)
    5150        , m_isProved(false)
     51        , m_canTrustCounts(false)
    5252    {
    5353    }
     
    6262    explicit CallLinkStatus(JSValue);
    6363   
    64     CallLinkStatus(ExecutableBase* executable)
    65         : m_executable(executable)
     64    CallLinkStatus(CallVariant variant)
     65        : m_edges(1, CallEdge(variant, 1))
    6666        , m_couldTakeSlowPath(false)
    6767        , m_isProved(false)
     68        , m_canTrustCounts(false)
    6869    {
    6970    }
     
    9394    // Computes the status assuming that we never took slow path and never previously
    9495    // exited.
    95     static CallLinkStatus computeFor(const ConcurrentJITLocker&, CallLinkInfo&);
    96     static CallLinkStatus computeFor(const ConcurrentJITLocker&, CallLinkInfo&, ExitSiteData);
     96    static CallLinkStatus computeFor(const ConcurrentJITLocker&, CodeBlock*, CallLinkInfo&);
     97    static CallLinkStatus computeFor(
     98        const ConcurrentJITLocker&, CodeBlock*, CallLinkInfo&, ExitSiteData);
    9799#endif
    98100   
     
    108110        CodeBlock*, CodeOrigin, const CallLinkInfoMap&, const ContextMap&);
    109111   
    110     bool isSet() const { return m_callTarget || m_executable || m_couldTakeSlowPath; }
     112    bool isSet() const { return !m_edges.isEmpty() || m_couldTakeSlowPath; }
    111113   
    112114    bool operator!() const { return !isSet(); }
    113115   
    114116    bool couldTakeSlowPath() const { return m_couldTakeSlowPath; }
    115     bool isClosureCall() const { return m_executable && !m_callTarget; }
    116117   
    117     JSValue callTarget() const { return m_callTarget; }
    118     JSFunction* function() const;
    119     InternalFunction* internalFunction() const;
    120     Intrinsic intrinsicFor(CodeSpecializationKind) const;
    121     ExecutableBase* executable() const { return m_executable; }
     118    CallEdgeList edges() const { return m_edges; }
     119    unsigned size() const { return m_edges.size(); }
     120    CallEdge at(unsigned i) const { return m_edges[i]; }
     121    CallEdge operator[](unsigned i) const { return at(i); }
    122122    bool isProved() const { return m_isProved; }
    123     bool canOptimize() const { return (m_callTarget || m_executable) && !m_couldTakeSlowPath; }
     123    bool canOptimize() const { return !m_edges.isEmpty(); }
     124    bool canTrustCounts() const { return m_canTrustCounts; }
     125   
     126    bool isClosureCall() const; // Returns true if any callee is a closure call.
    124127   
    125128    void dump(PrintStream&) const;
    126129   
    127130private:
    128     void makeClosureCall()
    129     {
    130         ASSERT(!m_isProved);
    131         // Turn this into a closure call.
    132         m_callTarget = JSValue();
    133     }
     131    void makeClosureCall();
    134132   
    135133    static CallLinkStatus computeFromLLInt(const ConcurrentJITLocker&, CodeBlock*, unsigned bytecodeIndex);
     134#if ENABLE(JIT)
     135    static CallLinkStatus computeFromCallEdgeProfile(CallEdgeProfile*);
     136    static CallLinkStatus computeFromCallLinkInfo(
     137        const ConcurrentJITLocker&, CallLinkInfo&);
     138#endif
    136139   
    137     JSValue m_callTarget;
    138     ExecutableBase* m_executable;
     140    CallEdgeList m_edges;
    139141    bool m_couldTakeSlowPath;
    140142    bool m_isProved;
     143    bool m_canTrustCounts;
    141144};
    142145
Note: See TracChangeset for help on using the changeset viewer.