Timestamp:
Aug 28, 2014, 12:09:48 PM
Author:
[email protected]
Message:

FTL should be able to do polymorphic call inlining
https://bugs.webkit.org/show_bug.cgi?id=135145

Reviewed by Geoffrey Garen.
Source/JavaScriptCore:


Added a log-based high-fidelity call edge profiler that runs in DFG JIT (and optionally
baseline JIT) code. Used it to do precise polymorphic inlining in the FTL. Potential
inlining sites use the call edge profile if it is available, but they will still fall back
on the call inline cache and rare case counts if it's not. Polymorphic inlining means that
multiple possible callees can be inlined with a switch to guard them. The slow path may
either be an OSR exit or a virtual call.
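
As a rough illustration of the shape described above (not the FTL's actual generated code), a polymorphically inlined call site can be pictured as a guard over the observed callees, each arm containing that callee's inlined body, plus a fallback. The names `squareOf`, `negationOf`, `successorOf`, `slowPathCall` and `callSite` are hypothetical, and an if-chain over function pointers stands in for the switch:

```cpp
#include <cassert>

// Hypothetical callees; in the engine these would be JavaScript functions.
int squareOf(int x) { return x * x; }
int negationOf(int x) { return -x; }
int successorOf(int x) { return x + 1; }

using Callee = int (*)(int);

// Slow path stand-in: an ordinary indirect ("virtual") call.
// The message above notes the slow path may instead be an OSR exit.
int slowPathCall(Callee callee, int x)
{
    assert(callee);
    return callee(x);
}

// Conceptual shape of a call site after polymorphic inlining: the two
// callees the profile saw most often are inlined behind identity guards.
int callSite(Callee callee, int x)
{
    if (callee == squareOf)
        return x * x; // inlined body of squareOf()
    if (callee == negationOf)
        return -x;    // inlined body of negationOf()
    return slowPathCall(callee, x); // anything else takes the slow path
}
```

Calling `callSite(successorOf, 3)` falls through both guards and reaches the slow path, which is the case the profile is meant to make rare.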

The call edge profiling added in this patch is very precise - it will tell you about every
call that has ever happened. It took some effort to reduce the overhead of this profiling.
This mostly involved ensuring that we don't do it unnecessarily. For example, we avoid it
in the baseline JIT (you can conditionally enable it but it's off by default) and we only do
it in the DFG JIT if we know that the regular inline cache profiling wasn't precise enough.
I also experimented with reducing the precision of the profiling. This led to a significant
reduction in the speed-up, so I avoided this approach. I also explored making log processing
concurrent, but that didn't help. Also, I tested the overhead of the log processing and
found that most of the overhead of this profiling is actually in putting things into the log
rather than in processing the log - that part appears to be surprisingly cheap.
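
The split between appending to the log and processing it can be sketched as follows. This is a minimal model under stated assumptions, not the CallEdgeLog implementation: `CallEdgeLogSketch`, `LogEntry`, the integer ids, and the four-entry buffer are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical log entry: integer ids stand in for the call site's profile
// and the callee cell.
struct LogEntry {
    int callSite;
    int callee;
};

class CallEdgeLogSketch {
public:
    // Hot path: executed on every profiled call, so it only appends.
    void record(int callSite, int callee)
    {
        if (m_size == m_entries.size())
            processLog(); // drain only when the buffer is full
        assert(m_size < m_entries.size());
        m_entries[m_size++] = LogEntry { callSite, callee };
    }

    // Cold path: fold the buffered entries into per-edge counts.
    void processLog()
    {
        for (std::size_t i = 0; i < m_size; ++i)
            ++m_counts[{ m_entries[i].callSite, m_entries[i].callee }];
        m_size = 0;
    }

    // Number of calls seen from callSite to callee, including buffered ones.
    unsigned count(int callSite, int callee)
    {
        processLog();
        auto it = m_counts.find({ callSite, callee });
        return it == m_counts.end() ? 0 : it->second;
    }

private:
    std::array<LogEntry, 4> m_entries {}; // tiny capacity, for illustration
    std::size_t m_size = 0;
    std::map<std::pair<int, int>, unsigned> m_counts;
};
```

Because every call is appended and nothing is sampled, the counts are exact, which matches the "every call that has ever happened" property described above.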

Polymorphic inlining could be enabled in the DFG if we enabled baseline call edge profiling,
and if we guarded such inlining sites with some profiling mechanism to detect
polyvariant monomorphisation opportunities (where the callsite being inlined reveals that
it's actually monomorphic).

This is a ~28% speed-up on deltablue and a ~7% speed-up on richards, with small speed-ups on
other programs as well. It's about a 2% speed-up on Octane version 2, and never a regression
on anything we care about. Some aggregates, like V8Spider, see a regression, which reflects
the increase in profiling overhead. But since this doesn't show up on any major score
(code-load or SunSpider), it's probably not relevant.

Relanding after fixing debug assertions in fast/storage/serialized-script-value.html.

  • bytecode/CallEdge.cpp: Added.

(JSC::CallEdge::dump):

  • bytecode/CallEdge.h: Added.

(JSC::CallEdge::operator!):
(JSC::CallEdge::callee):
(JSC::CallEdge::count):
(JSC::CallEdge::despecifiedClosure):
(JSC::CallEdge::CallEdge):

  • bytecode/CallEdgeProfile.cpp: Added.

(JSC::CallEdgeProfile::callEdges):
(JSC::CallEdgeProfile::numCallsToKnownCells):
(JSC::worthDespecifying):
(JSC::CallEdgeProfile::worthDespecifying):
(JSC::CallEdgeProfile::visitWeak):
(JSC::CallEdgeProfile::addSlow):
(JSC::CallEdgeProfile::mergeBack):
(JSC::CallEdgeProfile::fadeByHalf):
(JSC::CallEdgeLog::CallEdgeLog):
(JSC::CallEdgeLog::~CallEdgeLog):
(JSC::CallEdgeLog::isEnabled):
(JSC::operationProcessCallEdgeLog):
(JSC::CallEdgeLog::emitLogCode):
(JSC::CallEdgeLog::processLog):

  • bytecode/CallEdgeProfile.h: Added.

(JSC::CallEdgeProfile::numCallsToNotCell):
(JSC::CallEdgeProfile::numCallsToUnknownCell):
(JSC::CallEdgeProfile::totalCalls):

  • bytecode/CallEdgeProfileInlines.h: Added.

(JSC::CallEdgeProfile::CallEdgeProfile):
(JSC::CallEdgeProfile::add):

  • bytecode/CallLinkInfo.cpp:

(JSC::CallLinkInfo::visitWeak):

  • bytecode/CallLinkInfo.h:
  • bytecode/CallLinkStatus.cpp:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::computeFromLLInt):
(JSC::CallLinkStatus::computeFor):
(JSC::CallLinkStatus::computeExitSiteData):
(JSC::CallLinkStatus::computeFromCallLinkInfo):
(JSC::CallLinkStatus::computeFromCallEdgeProfile):
(JSC::CallLinkStatus::computeDFGStatuses):
(JSC::CallLinkStatus::isClosureCall):
(JSC::CallLinkStatus::makeClosureCall):
(JSC::CallLinkStatus::dump):
(JSC::CallLinkStatus::function): Deleted.
(JSC::CallLinkStatus::internalFunction): Deleted.
(JSC::CallLinkStatus::intrinsicFor): Deleted.

  • bytecode/CallLinkStatus.h:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::isSet):
(JSC::CallLinkStatus::couldTakeSlowPath):
(JSC::CallLinkStatus::edges):
(JSC::CallLinkStatus::size):
(JSC::CallLinkStatus::at):
(JSC::CallLinkStatus::operator[]):
(JSC::CallLinkStatus::canOptimize):
(JSC::CallLinkStatus::canTrustCounts):
(JSC::CallLinkStatus::isClosureCall): Deleted.
(JSC::CallLinkStatus::callTarget): Deleted.
(JSC::CallLinkStatus::executable): Deleted.
(JSC::CallLinkStatus::makeClosureCall): Deleted.

  • bytecode/CallVariant.cpp: Added.

(JSC::CallVariant::dump):

  • bytecode/CallVariant.h: Added.

(JSC::CallVariant::CallVariant):
(JSC::CallVariant::operator!):
(JSC::CallVariant::despecifiedClosure):
(JSC::CallVariant::rawCalleeCell):
(JSC::CallVariant::internalFunction):
(JSC::CallVariant::function):
(JSC::CallVariant::isClosureCall):
(JSC::CallVariant::executable):
(JSC::CallVariant::nonExecutableCallee):
(JSC::CallVariant::intrinsicFor):
(JSC::CallVariant::functionExecutable):
(JSC::CallVariant::isHashTableDeletedValue):
(JSC::CallVariant::operator==):
(JSC::CallVariant::operator!=):
(JSC::CallVariant::operator<):
(JSC::CallVariant::operator>):
(JSC::CallVariant::operator<=):
(JSC::CallVariant::operator>=):
(JSC::CallVariant::hash):
(JSC::CallVariant::deletedToken):
(JSC::CallVariantHash::hash):
(JSC::CallVariantHash::equal):

  • bytecode/CodeOrigin.h:

(JSC::InlineCallFrame::isNormalCall):

  • bytecode/ExitKind.cpp:

(JSC::exitKindToString):

  • bytecode/ExitKind.h:
  • bytecode/GetByIdStatus.cpp:

(JSC::GetByIdStatus::computeForStubInfo):

  • bytecode/PutByIdStatus.cpp:

(JSC::PutByIdStatus::computeForStubInfo):

  • dfg/DFGAbstractInterpreterInlines.h:

(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):

  • dfg/DFGBackwardsPropagationPhase.cpp:

(JSC::DFG::BackwardsPropagationPhase::propagate):

  • dfg/DFGBasicBlock.cpp:

(JSC::DFG::BasicBlock::~BasicBlock):

  • dfg/DFGBasicBlock.h:

(JSC::DFG::BasicBlock::takeLast):
(JSC::DFG::BasicBlock::didLink):

  • dfg/DFGByteCodeParser.cpp:

(JSC::DFG::ByteCodeParser::processSetLocalQueue):
(JSC::DFG::ByteCodeParser::removeLastNodeFromGraph):
(JSC::DFG::ByteCodeParser::addCallWithoutSettingResult):
(JSC::DFG::ByteCodeParser::addCall):
(JSC::DFG::ByteCodeParser::handleCall):
(JSC::DFG::ByteCodeParser::emitFunctionChecks):
(JSC::DFG::ByteCodeParser::undoFunctionChecks):
(JSC::DFG::ByteCodeParser::inliningCost):
(JSC::DFG::ByteCodeParser::inlineCall):
(JSC::DFG::ByteCodeParser::cancelLinkingForBlock):
(JSC::DFG::ByteCodeParser::attemptToInlineCall):
(JSC::DFG::ByteCodeParser::handleInlining):
(JSC::DFG::ByteCodeParser::handleConstantInternalFunction):
(JSC::DFG::ByteCodeParser::prepareToParseBlock):
(JSC::DFG::ByteCodeParser::clearCaches):
(JSC::DFG::ByteCodeParser::parseBlock):
(JSC::DFG::ByteCodeParser::linkBlock):
(JSC::DFG::ByteCodeParser::linkBlocks):
(JSC::DFG::ByteCodeParser::parseCodeBlock):

  • dfg/DFGCPSRethreadingPhase.cpp:

(JSC::DFG::CPSRethreadingPhase::freeUnnecessaryNodes):

  • dfg/DFGClobberize.h:

(JSC::DFG::clobberize):

  • dfg/DFGCommon.h:
  • dfg/DFGConstantFoldingPhase.cpp:

(JSC::DFG::ConstantFoldingPhase::foldConstants):

  • dfg/DFGDoesGC.cpp:

(JSC::DFG::doesGC):

  • dfg/DFGDriver.cpp:

(JSC::DFG::compileImpl):

  • dfg/DFGFixupPhase.cpp:

(JSC::DFG::FixupPhase::fixupNode):

  • dfg/DFGGraph.cpp:

(JSC::DFG::Graph::dump):
(JSC::DFG::Graph::getBlocksInPreOrder):
(JSC::DFG::Graph::visitChildren):

  • dfg/DFGJITCompiler.cpp:

(JSC::DFG::JITCompiler::link):

  • dfg/DFGLazyJSValue.cpp:

(JSC::DFG::LazyJSValue::switchLookupValue):

  • dfg/DFGLazyJSValue.h:

(JSC::DFG::LazyJSValue::switchLookupValue): Deleted.

  • dfg/DFGNode.cpp:

(WTF::printInternal):

  • dfg/DFGNode.h:

(JSC::DFG::OpInfo::OpInfo):
(JSC::DFG::Node::hasHeapPrediction):
(JSC::DFG::Node::hasCellOperand):
(JSC::DFG::Node::cellOperand):
(JSC::DFG::Node::setCellOperand):
(JSC::DFG::Node::canBeKnownFunction): Deleted.
(JSC::DFG::Node::hasKnownFunction): Deleted.
(JSC::DFG::Node::knownFunction): Deleted.
(JSC::DFG::Node::giveKnownFunction): Deleted.
(JSC::DFG::Node::hasFunction): Deleted.
(JSC::DFG::Node::function): Deleted.
(JSC::DFG::Node::hasExecutable): Deleted.
(JSC::DFG::Node::executable): Deleted.

  • dfg/DFGNodeType.h:
  • dfg/DFGPhantomCanonicalizationPhase.cpp:

(JSC::DFG::PhantomCanonicalizationPhase::run):

  • dfg/DFGPhantomRemovalPhase.cpp:

(JSC::DFG::PhantomRemovalPhase::run):

  • dfg/DFGPredictionPropagationPhase.cpp:

(JSC::DFG::PredictionPropagationPhase::propagate):

  • dfg/DFGSafeToExecute.h:

(JSC::DFG::safeToExecute):

  • dfg/DFGSpeculativeJIT.cpp:

(JSC::DFG::SpeculativeJIT::emitSwitch):

  • dfg/DFGSpeculativeJIT32_64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGSpeculativeJIT64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGStructureRegistrationPhase.cpp:

(JSC::DFG::StructureRegistrationPhase::run):

  • dfg/DFGTierUpCheckInjectionPhase.cpp:

(JSC::DFG::TierUpCheckInjectionPhase::run):
(JSC::DFG::TierUpCheckInjectionPhase::removeFTLProfiling):

  • dfg/DFGValidate.cpp:

(JSC::DFG::Validate::validate):

  • dfg/DFGWatchpointCollectionPhase.cpp:

(JSC::DFG::WatchpointCollectionPhase::handle):

  • ftl/FTLCapabilities.cpp:

(JSC::FTL::canCompile):

  • ftl/FTLLowerDFGToLLVM.cpp:

(JSC::FTL::ftlUnreachable):
(JSC::FTL::LowerDFGToLLVM::lower):
(JSC::FTL::LowerDFGToLLVM::compileNode):
(JSC::FTL::LowerDFGToLLVM::compileCheckCell):
(JSC::FTL::LowerDFGToLLVM::compileCheckBadCell):
(JSC::FTL::LowerDFGToLLVM::compileGetExecutable):
(JSC::FTL::LowerDFGToLLVM::compileNativeCallOrConstruct):
(JSC::FTL::LowerDFGToLLVM::compileSwitch):
(JSC::FTL::LowerDFGToLLVM::buildSwitch):
(JSC::FTL::LowerDFGToLLVM::compileCheckFunction): Deleted.
(JSC::FTL::LowerDFGToLLVM::compileCheckExecutable): Deleted.

  • heap/Heap.cpp:

(JSC::Heap::collect):

  • jit/AssemblyHelpers.h:

(JSC::AssemblyHelpers::storeValue):
(JSC::AssemblyHelpers::loadValue):

  • jit/CCallHelpers.h:

(JSC::CCallHelpers::setupArguments):

  • jit/GPRInfo.h:

(JSC::JSValueRegs::uses):

  • jit/JITCall.cpp:

(JSC::JIT::compileOpCall):

  • jit/JITCall32_64.cpp:

(JSC::JIT::compileOpCall):

  • runtime/Options.h:
  • runtime/VM.cpp:

(JSC::VM::ensureCallEdgeLog):

  • runtime/VM.h:
  • tests/stress/fold-profiled-call-to-call.js: Added. This test pinpoints the problem we saw in fast/storage/serialized-script-value.html.
  • tests/stress/new-array-then-exit.js: Added.
  • tests/stress/poly-call-exit-this.js: Added.
  • tests/stress/poly-call-exit.js: Added.

Source/WTF:


Add some utilities to OwnPtr and Spectrum that call edge profiling needs.

  • wtf/OwnPtr.h:

(WTF::OwnPtr<T>::createTransactionally):

  • wtf/Spectrum.h:

(WTF::Spectrum::add):
(WTF::Spectrum::addAll):
(WTF::Spectrum::get):
(WTF::Spectrum::size):
(WTF::Spectrum::KeyAndCount::KeyAndCount):
(WTF::Spectrum::clear):
(WTF::Spectrum::removeIf):

LayoutTests:

  • js/regress/script-tests/simple-poly-call-nested.js: Added.
  • js/regress/script-tests/simple-poly-call.js: Added.
  • js/regress/simple-poly-call-expected.txt: Added.
  • js/regress/simple-poly-call-nested-expected.txt: Added.
  • js/regress/simple-poly-call-nested.html: Added.
  • js/regress/simple-poly-call.html: Added.
File:
1 edited

  • trunk/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp

--- trunk/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp (r172961)
+++ trunk/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp (r173069)
@@ -33,4 +33,5 @@
 #include "JSCInlines.h"
 #include <wtf/CommaPrinter.h>
+#include <wtf/ListDump.h>
 
 namespace JSC {
@@ -39,46 +40,13 @@
 
 CallLinkStatus::CallLinkStatus(JSValue value)
-    : m_callTarget(value)
-    , m_executable(0)
-    , m_couldTakeSlowPath(false)
+    : m_couldTakeSlowPath(false)
     , m_isProved(false)
 {
-    if (!value || !value.isCell())
+    if (!value || !value.isCell()) {
+        m_couldTakeSlowPath = true;
         return;
-
-    if (!value.asCell()->inherits(JSFunction::info()))
-        return;
-
-    m_executable = jsCast<JSFunction*>(value.asCell())->executable();
-}
-
-JSFunction* CallLinkStatus::function() const
-{
-    if (!m_callTarget || !m_callTarget.isCell())
-        return 0;
-
-    if (!m_callTarget.asCell()->inherits(JSFunction::info()))
-        return 0;
-
-    return jsCast<JSFunction*>(m_callTarget.asCell());
-}
-
-InternalFunction* CallLinkStatus::internalFunction() const
-{
-    if (!m_callTarget || !m_callTarget.isCell())
-        return 0;
-
-    if (!m_callTarget.asCell()->inherits(InternalFunction::info()))
-        return 0;
-
-    return jsCast<InternalFunction*>(m_callTarget.asCell());
-}
-
-Intrinsic CallLinkStatus::intrinsicFor(CodeSpecializationKind kind) const
-{
-    if (!m_executable)
-        return NoIntrinsic;
-
-    return m_executable->intrinsicFor(kind);
+    }
+
+    m_edges.append(CallEdge(CallVariant(value.asCell()), 1));
 }
 
@@ -88,5 +56,5 @@
     UNUSED_PARAM(bytecodeIndex);
 #if ENABLE(DFG_JIT)
-    if (profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadFunction))) {
+    if (profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadCell))) {
         // We could force this to be a closure call, but instead we'll just assume that it
         // takes slow path.
@@ -126,5 +94,5 @@
         return computeFromLLInt(locker, profiledBlock, bytecodeIndex);
 
-    return computeFor(locker, *callLinkInfo, exitSiteData);
+    return computeFor(locker, profiledBlock, *callLinkInfo, exitSiteData);
 #else
     return CallLinkStatus();
@@ -140,8 +108,8 @@
 #if ENABLE(DFG_JIT)
     exitSiteData.m_takesSlowPath =
-        profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadCache, exitingJITType))
+        profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadType, exitingJITType))
         || profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadExecutable, exitingJITType));
     exitSiteData.m_badFunction =
-        profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadFunction, exitingJITType));
+        profiledBlock->hasExitSite(locker, DFG::FrequentExitSite(bytecodeIndex, BadCell, exitingJITType));
 #else
     UNUSED_PARAM(locker);
@@ -155,5 +123,32 @@
 
 #if ENABLE(JIT)
-CallLinkStatus CallLinkStatus::computeFor(const ConcurrentJITLocker&, CallLinkInfo& callLinkInfo)
+CallLinkStatus CallLinkStatus::computeFor(
+    const ConcurrentJITLocker& locker, CodeBlock* profiledBlock, CallLinkInfo& callLinkInfo)
+{
+    // We don't really need this, but anytime we have to debug this code, it becomes indispensable.
+    UNUSED_PARAM(profiledBlock);
+
+    if (Options::callStatusShouldUseCallEdgeProfile()) {
+        // Always trust the call edge profile over anything else since this has precise counts.
+        // It can make the best possible decision because it never "forgets" what happened for any
+        // call, with the exception of fading out the counts of old calls (for example if the
+        // counter type is 16-bit then calls that happened more than 2^16 calls ago are given half
+        // weight, and this compounds for every 2^15 [sic] calls after that). The combination of
+        // high fidelity for recent calls and fading for older calls makes this the most useful
+        // mechamism of choosing how to optimize future calls.
+        CallEdgeProfile* edgeProfile = callLinkInfo.callEdgeProfile.get();
+        WTF::loadLoadFence();
+        if (edgeProfile) {
+            CallLinkStatus result = computeFromCallEdgeProfile(edgeProfile);
+            if (!!result)
+                return result;
+        }
+    }
+
+    return computeFromCallLinkInfo(locker, callLinkInfo);
+}
+
+CallLinkStatus CallLinkStatus::computeFromCallLinkInfo(
+    const ConcurrentJITLocker&, CallLinkInfo& callLinkInfo)
 {
     // Note that despite requiring that the locker is held, this code is racy with respect
@@ -178,5 +173,5 @@
     JSFunction* target = callLinkInfo.lastSeenCallee.get();
     if (!target)
-        return CallLinkStatus();
+        return takesSlowPath();
 
     if (callLinkInfo.hasSeenClosure)
@@ -186,13 +181,41 @@
 }
 
+CallLinkStatus CallLinkStatus::computeFromCallEdgeProfile(CallEdgeProfile* edgeProfile)
+{
+    // In cases where the call edge profile saw nothing, use the CallLinkInfo instead.
+    if (!edgeProfile->totalCalls())
+        return CallLinkStatus();
+
+    // To do anything meaningful, we require that the majority of calls are to something we
+    // know how to handle.
+    unsigned numCallsToKnown = edgeProfile->numCallsToKnownCells();
+    unsigned numCallsToUnknown = edgeProfile->numCallsToNotCell() + edgeProfile->numCallsToUnknownCell();
+
+    // We require that the majority of calls were to something that we could possibly inline.
+    if (numCallsToKnown <= numCallsToUnknown)
+        return takesSlowPath();
+
+    // We require that the number of such calls is greater than some minimal threshold, so that we
+    // avoid inlining completely cold calls.
+    if (numCallsToKnown < Options::frequentCallThreshold())
+        return takesSlowPath();
+
+    CallLinkStatus result;
+    result.m_edges = edgeProfile->callEdges();
+    result.m_couldTakeSlowPath = !!numCallsToUnknown;
+    result.m_canTrustCounts = true;
+
+    return result;
+}
+
 CallLinkStatus CallLinkStatus::computeFor(
-    const ConcurrentJITLocker& locker, CallLinkInfo& callLinkInfo, ExitSiteData exitSiteData)
-{
-    if (exitSiteData.m_takesSlowPath)
-        return takesSlowPath();
-
-    CallLinkStatus result = computeFor(locker, callLinkInfo);
+    const ConcurrentJITLocker& locker, CodeBlock* profiledBlock, CallLinkInfo& callLinkInfo,
+    ExitSiteData exitSiteData)
+{
+    CallLinkStatus result = computeFor(locker, profiledBlock, callLinkInfo);
     if (exitSiteData.m_badFunction)
         result.makeClosureCall();
+    if (exitSiteData.m_takesSlowPath)
+        result.m_couldTakeSlowPath = true;
 
     return result;
@@ -228,5 +251,5 @@
         {
             ConcurrentJITLocker locker(dfgCodeBlock->m_lock);
-            map.add(info.codeOrigin, computeFor(locker, info, exitSiteData));
+            map.add(info.codeOrigin, computeFor(locker, dfgCodeBlock, info, exitSiteData));
         }
     }
@@ -257,4 +280,29 @@
 }
 
+bool CallLinkStatus::isClosureCall() const
+{
+    for (unsigned i = m_edges.size(); i--;) {
+        if (m_edges[i].callee().isClosureCall())
+            return true;
+    }
+    return false;
+}
+
+void CallLinkStatus::makeClosureCall()
+{
+    ASSERT(!m_isProved);
+    for (unsigned i = m_edges.size(); i--;)
+        m_edges[i] = m_edges[i].despecifiedClosure();
+
+    if (!ASSERT_DISABLED) {
+        // Doing this should not have created duplicates, because the CallEdgeProfile
+        // should despecify closures if doing so would reduce the number of known callees.
+        for (unsigned i = 0; i < m_edges.size(); ++i) {
+            for (unsigned j = i + 1; j < m_edges.size(); ++j)
+                ASSERT(m_edges[i].callee() != m_edges[j].callee());
+        }
+    }
+}
+
 void CallLinkStatus::dump(PrintStream& out) const
 {
@@ -272,12 +320,5 @@
         out.print(comma, "Could Take Slow Path");
 
-    if (m_callTarget)
-        out.print(comma, "Known target: ", m_callTarget);
-
-    if (m_executable) {
-        out.print(comma, "Executable/CallHash: ", RawPointer(m_executable));
-        if (!isCompilationThread())
-            out.print("/", m_executable->hashFor(CodeForCall));
-    }
+    out.print(listDump(m_edges));
 }
 
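
The decision made by `computeFromCallEdgeProfile` in the diff above, together with the count fading its comment mentions, can be modelled in isolation roughly like this. It is a simplified sketch: `ProfileSketch`, `StatusSketch`, `Decision` and `computeStatus` are hypothetical names, and halving every counter is only an assumed reading of what "fading" means, not the actual `fadeByHalf` code. The threshold parameter plays the role of `Options::frequentCallThreshold()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edge: an integer id stands in for the callee.
struct EdgeSketch {
    int callee;
    std::uint32_t count;
};

struct ProfileSketch {
    std::vector<EdgeSketch> knownEdges;
    std::uint32_t callsToNotCell = 0;
    std::uint32_t callsToUnknownCell = 0;

    std::uint32_t callsToKnown() const
    {
        std::uint32_t sum = 0;
        for (const EdgeSketch& edge : knownEdges)
            sum += edge.count;
        return sum;
    }

    std::uint32_t callsToUnknown() const { return callsToNotCell + callsToUnknownCell; }
    std::uint32_t totalCalls() const { return callsToKnown() + callsToUnknown(); }

    // Assumed fading model: halve every counter so older calls weigh less.
    void fadeByHalf()
    {
        for (EdgeSketch& edge : knownEdges)
            edge.count /= 2;
        callsToNotCell /= 2;
        callsToUnknownCell /= 2;
    }
};

enum class Decision { NoInformation, TakesSlowPath, Inline };

struct StatusSketch {
    Decision decision;
    bool couldTakeSlowPath;
    std::vector<EdgeSketch> edges;
};

StatusSketch computeStatus(const ProfileSketch& profile, std::uint32_t frequentCallThreshold)
{
    // The profile saw nothing: the caller falls back to other profiling.
    if (!profile.totalCalls())
        return { Decision::NoInformation, false, {} };

    std::uint32_t known = profile.callsToKnown();
    std::uint32_t unknown = profile.callsToUnknown();

    // Require a majority of calls to inlinable callees, and enough of them.
    if (known <= unknown || known < frequentCallThreshold)
        return { Decision::TakesSlowPath, true, {} };

    return { Decision::Inline, unknown != 0, profile.knownEdges };
}
```

With two known callees at 30 and 10 calls, 5 calls to unknown targets, and a threshold of 20, the sketch chooses to inline and keeps the slow path reachable. After one fade the known total is 20, which still meets a threshold of 20 but not 21.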