Ignore:
Timestamp:
Feb 2, 2015, 10:38:08 AM (11 years ago)
Author:
[email protected]
Message:

Polymorphic call inlining should be based on polymorphic call inline caching rather than logging
https://bugs.webkit.org/show_bug.cgi?id=140660

Reviewed by Geoffrey Garen.

When we first implemented polymorphic call inlining, we did the profiling based on a call
edge log. The idea was to store each call edge (a tuple of call site and callee) into a
global log that was processed lazily. Processing the log would give precise counts of call
edges, and could be used to drive well-informed inlining decisions - polymorphic or not.
This was a speed-up on throughput tests but a slow-down for latency tests. It was a net win
nonetheless.

Experience with this code shows three things. First, the call edge profiler is buggy and
complex. It would take work to fix the bugs. Second, the call edge profiler incurs lots of
overhead for latency code that we care deeply about. Third, it's not at all clear that
having call edge counts for every possible callee is any better than just having call edge
counts for the limited number of callees that an inline cache would catch.

So, this patch removes the call edge profiler and replaces it with a polymorphic call inline
cache. If we miss the basic call inline cache, we inflate the cache to be a jump to an
out-of-line stub that cases on the previously known callees. If that misses again, then we
rewrite that stub to include the new callee. We do this up to some number of callees. If we
hit the limit then we switch to using a plain virtual call.

Substantial speed-up on V8Spider; undoes the slow-down that the original call edge profiler
caused. Might be a SunSpider speed-up (below 1%), depending on hardware.

Rolling this back in after fixing https://bugs.webkit.org/show_bug.cgi?id=141107.

(JSC::CallEdge::count):
(JSC::CallEdge::CallEdge):

  • bytecode/CallEdgeProfile.cpp: Removed.
  • bytecode/CallEdgeProfile.h: Removed.
  • bytecode/CallEdgeProfileInlines.h: Removed.
  • bytecode/CallLinkInfo.cpp:

(JSC::CallLinkInfo::unlink):
(JSC::CallLinkInfo::visitWeak):

  • bytecode/CallLinkInfo.h:
  • bytecode/CallLinkStatus.cpp:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::computeFor):
(JSC::CallLinkStatus::computeFromCallLinkInfo):
(JSC::CallLinkStatus::isClosureCall):
(JSC::CallLinkStatus::makeClosureCall):
(JSC::CallLinkStatus::dump):
(JSC::CallLinkStatus::computeFromCallEdgeProfile): Deleted.

  • bytecode/CallLinkStatus.h:

(JSC::CallLinkStatus::CallLinkStatus):
(JSC::CallLinkStatus::isSet):
(JSC::CallLinkStatus::variants):
(JSC::CallLinkStatus::size):
(JSC::CallLinkStatus::at):
(JSC::CallLinkStatus::operator[]):
(JSC::CallLinkStatus::canOptimize):
(JSC::CallLinkStatus::edges): Deleted.
(JSC::CallLinkStatus::canTrustCounts): Deleted.

  • bytecode/CallVariant.cpp:

(JSC::variantListWithVariant):
(JSC::despecifiedVariantList):

  • bytecode/CallVariant.h:
  • bytecode/CodeBlock.cpp:

(JSC::CodeBlock::~CodeBlock):
(JSC::CodeBlock::linkIncomingPolymorphicCall):
(JSC::CodeBlock::unlinkIncomingCalls):
(JSC::CodeBlock::noticeIncomingCall):

  • bytecode/CodeBlock.h:

(JSC::CodeBlock::isIncomingCallAlreadyLinked): Deleted.

  • dfg/DFGAbstractInterpreterInlines.h:

(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):

  • dfg/DFGByteCodeParser.cpp:

(JSC::DFG::ByteCodeParser::addCallWithoutSettingResult):
(JSC::DFG::ByteCodeParser::handleCall):
(JSC::DFG::ByteCodeParser::handleInlining):

  • dfg/DFGClobberize.h:

(JSC::DFG::clobberize):

  • dfg/DFGConstantFoldingPhase.cpp:

(JSC::DFG::ConstantFoldingPhase::foldConstants):

  • dfg/DFGDoesGC.cpp:

(JSC::DFG::doesGC):

  • dfg/DFGDriver.cpp:

(JSC::DFG::compileImpl):

  • dfg/DFGFixupPhase.cpp:

(JSC::DFG::FixupPhase::fixupNode):

  • dfg/DFGNode.h:

(JSC::DFG::Node::hasHeapPrediction):

  • dfg/DFGNodeType.h:
  • dfg/DFGOperations.cpp:
  • dfg/DFGPredictionPropagationPhase.cpp:

(JSC::DFG::PredictionPropagationPhase::propagate):

  • dfg/DFGSafeToExecute.h:

(JSC::DFG::safeToExecute):

  • dfg/DFGSpeculativeJIT32_64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGSpeculativeJIT64.cpp:

(JSC::DFG::SpeculativeJIT::emitCall):
(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGTierUpCheckInjectionPhase.cpp:

(JSC::DFG::TierUpCheckInjectionPhase::run):
(JSC::DFG::TierUpCheckInjectionPhase::removeFTLProfiling): Deleted.

  • ftl/FTLCapabilities.cpp:

(JSC::FTL::canCompile):

  • heap/Heap.cpp:

(JSC::Heap::collect):

  • jit/BinarySwitch.h:
  • jit/ClosureCallStubRoutine.cpp: Removed.
  • jit/ClosureCallStubRoutine.h: Removed.
  • jit/JITCall.cpp:

(JSC::JIT::compileOpCall):

  • jit/JITCall32_64.cpp:

(JSC::JIT::compileOpCall):

  • jit/JITOperations.cpp:
  • jit/JITOperations.h:

(JSC::operationLinkPolymorphicCallFor):
(JSC::operationLinkClosureCallFor): Deleted.

  • jit/JITStubRoutine.h:
  • jit/JITWriteBarrier.h:
  • jit/PolymorphicCallStubRoutine.cpp: Added.

(JSC::PolymorphicCallNode::~PolymorphicCallNode):
(JSC::PolymorphicCallNode::unlink):
(JSC::PolymorphicCallCase::dump):
(JSC::PolymorphicCallStubRoutine::PolymorphicCallStubRoutine):
(JSC::PolymorphicCallStubRoutine::~PolymorphicCallStubRoutine):
(JSC::PolymorphicCallStubRoutine::variants):
(JSC::PolymorphicCallStubRoutine::edges):
(JSC::PolymorphicCallStubRoutine::visitWeak):
(JSC::PolymorphicCallStubRoutine::markRequiredObjectsInternal):

  • jit/PolymorphicCallStubRoutine.h: Added.

(JSC::PolymorphicCallNode::PolymorphicCallNode):
(JSC::PolymorphicCallCase::PolymorphicCallCase):
(JSC::PolymorphicCallCase::variant):
(JSC::PolymorphicCallCase::codeBlock):

  • jit/Repatch.cpp:

(JSC::linkSlowFor):
(JSC::linkFor):
(JSC::revertCall):
(JSC::unlinkFor):
(JSC::linkVirtualFor):
(JSC::linkPolymorphicCall):
(JSC::linkClosureCall): Deleted.

  • jit/Repatch.h:
  • jit/ThunkGenerators.cpp:

(JSC::linkPolymorphicCallForThunkGenerator):
(JSC::linkPolymorphicCallThunkGenerator):
(JSC::linkPolymorphicCallThatPreservesRegsThunkGenerator):
(JSC::linkClosureCallForThunkGenerator): Deleted.
(JSC::linkClosureCallThunkGenerator): Deleted.
(JSC::linkClosureCallThatPreservesRegsThunkGenerator): Deleted.

  • jit/ThunkGenerators.h:

(JSC::linkPolymorphicCallThunkGeneratorFor):
(JSC::linkClosureCallThunkGeneratorFor): Deleted.

  • llint/LLIntSlowPaths.cpp:

(JSC::LLInt::jitCompileAndSetHeuristics):

  • runtime/Options.h:
  • runtime/VM.cpp:

(JSC::VM::prepareToDiscardCode):
(JSC::VM::ensureCallEdgeLog): Deleted.

  • runtime/VM.h:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp

    r179392 r179478  
    11/*
    2  * Copyright (C) 2012, 2013, 2014 Apple Inc. All rights reserved.
     2 * Copyright (C) 2012-2015 Apple Inc. All rights reserved.
    33 *
    44 * Redistribution and use in source and binary forms, with or without
     
    4848    }
    4949   
    50     m_edges.append(CallEdge(CallVariant(value.asCell()), 1));
     50    m_variants.append(CallVariant(value.asCell()));
    5151}
    5252
     
    130130    UNUSED_PARAM(profiledBlock);
    131131   
    132     if (Options::callStatusShouldUseCallEdgeProfile()) {
    133         // Always trust the call edge profile over anything else since this has precise counts.
    134         // It can make the best possible decision because it never "forgets" what happened for any
    135         // call, with the exception of fading out the counts of old calls (for example if the
    136         // counter type is 16-bit then calls that happened more than 2^16 calls ago are given half
    137         // weight, and this compounds for every 2^15 [sic] calls after that). The combination of
    138         // high fidelity for recent calls and fading for older calls makes this the most useful
    139         // mechamism of choosing how to optimize future calls.
    140         CallEdgeProfile* edgeProfile = callLinkInfo.callEdgeProfile.get();
    141         WTF::loadLoadFence();
    142         if (edgeProfile) {
    143             CallLinkStatus result = computeFromCallEdgeProfile(edgeProfile);
    144             if (!!result)
    145                 return result;
    146         }
    147     }
    148    
    149132    return computeFromCallLinkInfo(locker, callLinkInfo);
    150133}
     
    166149    // is probably OK for now.
    167150   
     151    // PolymorphicCallStubRoutine is a GCAwareJITStubRoutine, so if non-null, it will stay alive
     152    // until next GC even if the CallLinkInfo is concurrently cleared. Also, the variants list is
     153    // never mutated after the PolymorphicCallStubRoutine is instantiated. We have some conservative
     154    // fencing in place to make sure that we see the variants list after construction.
     155    if (PolymorphicCallStubRoutine* stub = callLinkInfo.stub.get()) {
     156        WTF::loadLoadFence();
     157       
     158        CallEdgeList edges = stub->edges();
     159       
     160        // Now that we've loaded the edges list, there are no further concurrency concerns. We will
     161        // just manipulate and prune this list to our liking - mostly removing entries that are too
     162        // infrequent and ensuring that it's sorted in descending order of frequency.
     163       
     164        RELEASE_ASSERT(edges.size());
     165       
     166        std::sort(
     167            edges.begin(), edges.end(),
     168            [] (CallEdge a, CallEdge b) {
     169                return a.count() > b.count();
     170            });
     171        RELEASE_ASSERT(edges.first().count() >= edges.last().count());
     172       
     173        double totalCallsToKnown = 0;
     174        double totalCallsToUnknown = callLinkInfo.slowPathCount;
     175        CallVariantList variants;
     176        for (size_t i = 0; i < edges.size(); ++i) {
     177            CallEdge edge = edges[i];
     178            // If the call is at the tail of the distribution, then we don't optimize it and we
     179            // treat it as if it was a call to something unknown. We define the tail as being either
     180            // a call that doesn't belong to the N most frequent callees (N =
     181            // maxPolymorphicCallVariantsForInlining) or that has a total call count that is too
     182            // small.
     183            if (i >= Options::maxPolymorphicCallVariantsForInlining()
     184                || edge.count() < Options::frequentCallThreshold())
     185                totalCallsToUnknown += edge.count();
     186            else {
     187                totalCallsToKnown += edge.count();
     188                variants.append(edge.callee());
     189            }
     190        }
     191       
     192        // Bail if we didn't find any calls that qualified.
     193        RELEASE_ASSERT(!!totalCallsToKnown == !!variants.size());
     194        if (variants.isEmpty())
     195            return takesSlowPath();
     196       
     197        // We require that the distribution of callees is skewed towards a handful of common ones.
     198        if (totalCallsToKnown / totalCallsToUnknown < Options::minimumCallToKnownRate())
     199            return takesSlowPath();
     200       
     201        RELEASE_ASSERT(totalCallsToKnown);
     202        RELEASE_ASSERT(variants.size());
     203       
     204        CallLinkStatus result;
     205        result.m_variants = variants;
     206        result.m_couldTakeSlowPath = !!totalCallsToUnknown;
     207        return result;
     208    }
     209   
    168210    if (callLinkInfo.slowPathCount >= Options::couldTakeSlowCaseMinimumCount())
    169211        return takesSlowPath();
    170    
    171     if (ClosureCallStubRoutine* stub = callLinkInfo.stub.get())
    172         return CallLinkStatus(stub->executable());
    173212   
    174213    JSFunction* target = callLinkInfo.lastSeenCallee.get();
     
    180219
    181220    return CallLinkStatus(target);
    182 }
    183 
    184 CallLinkStatus CallLinkStatus::computeFromCallEdgeProfile(CallEdgeProfile* edgeProfile)
    185 {
    186     // In cases where the call edge profile saw nothing, use the CallLinkInfo instead.
    187     if (!edgeProfile->totalCalls())
    188         return CallLinkStatus();
    189    
    190     // To do anything meaningful, we require that the majority of calls are to something we
    191     // know how to handle.
    192     unsigned numCallsToKnown = edgeProfile->numCallsToKnownCells();
    193     unsigned numCallsToUnknown = edgeProfile->numCallsToNotCell() + edgeProfile->numCallsToUnknownCell();
    194    
    195     // We require that the majority of calls were to something that we could possibly inline.
    196     if (numCallsToKnown <= numCallsToUnknown)
    197         return takesSlowPath();
    198    
    199     // We require that the number of such calls is greater than some minimal threshold, so that we
    200     // avoid inlining completely cold calls.
    201     if (numCallsToKnown < Options::frequentCallThreshold())
    202         return takesSlowPath();
    203    
    204     CallLinkStatus result;
    205     result.m_edges = edgeProfile->callEdges();
    206     result.m_couldTakeSlowPath = !!numCallsToUnknown;
    207     result.m_canTrustCounts = true;
    208    
    209     return result;
    210221}
    211222
     
    283294bool CallLinkStatus::isClosureCall() const
    284295{
    285     for (unsigned i = m_edges.size(); i--;) {
    286         if (m_edges[i].callee().isClosureCall())
     296    for (unsigned i = m_variants.size(); i--;) {
     297        if (m_variants[i].isClosureCall())
    287298            return true;
    288299    }
     
    292303void CallLinkStatus::makeClosureCall()
    293304{
    294     ASSERT(!m_isProved);
    295     for (unsigned i = m_edges.size(); i--;)
    296         m_edges[i] = m_edges[i].despecifiedClosure();
    297    
    298     if (!ASSERT_DISABLED) {
    299         // Doing this should not have created duplicates, because the CallEdgeProfile
    300         // should despecify closures if doing so would reduce the number of known callees.
    301         for (unsigned i = 0; i < m_edges.size(); ++i) {
    302             for (unsigned j = i + 1; j < m_edges.size(); ++j)
    303                 ASSERT(m_edges[i].callee() != m_edges[j].callee());
    304         }
    305     }
     305    m_variants = despecifiedVariantList(m_variants);
    306306}
    307307
     
    321321        out.print(comma, "Could Take Slow Path");
    322322   
    323     out.print(listDump(m_edges));
     323    if (!m_variants.isEmpty())
     324        out.print(comma, listDump(m_variants));
    324325}
    325326
Note: See TracChangeset for help on using the changeset viewer.