Ignore:
Timestamp:
Dec 5, 2017, 9:53:57 AM (8 years ago)
Author:
[email protected]
Message:

GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934

Reviewed by JF Bastien.
PerformanceTests:


Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.

  • Octane/splay.js: Added.

(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):

Source/JavaScriptCore:


This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.

The constraint solver supports running constraints in parallel in two different ways:

  • Run multiple constraints in parallel to each other. This only works for constraints that can tolerate other constraints running concurrently to them (constraint.concurrency() == ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We could probably make them concurrent, but I'm playing it safe for now.


  • A constraint can create parallel work for itself, which the constraint solver will interleave with other stuff. A constraint can report that it has parallel work by returning ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available, for as long as that function wants to run.


It's not possible to have a non-concurrent constraint that creates parallel work.

The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:

  • No need to start any other threads.


  • The constraints all want to be passed a SlotVisitor. Running on the marker threads means having access to those threads' SlotVisitors. Also, it means less load balancing. The solver will create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker thread, that thread will have work it can start doing immediately. Before this change, we had to contribute the work found by the constraint solver to the global worklist so that it could be distributed to the marker threads by load balancing. This change probably helps to avoid that load balancing step.


A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).

  • API/JSMarkingConstraintPrivate.cpp:

(JSContextGroupAddMarkingConstraint):

  • API/JSVirtualMachine.mm:

(scanExternalObjectGraph):
(scanExternalRememberedSet):

  • JavaScriptCore.xcodeproj/project.pbxproj:
  • Sources.txt:
  • bytecode/AccessCase.cpp:

(JSC::AccessCase::propagateTransitions const):

  • bytecode/CodeBlock.cpp:

(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):

  • dfg/DFGWorklist.cpp:
  • ftl/FTLCompile.cpp:

(JSC::FTL::compile):

  • heap/ConstraintParallelism.h: Added.

(WTF::printInternal):

  • heap/Heap.cpp:

(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.

  • heap/Heap.h:

(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):

  • heap/HeapInlines.h:

(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.

  • heap/HeapSnapshotBuilder.cpp:

(JSC::HeapSnapshotBuilder::appendNode):

  • heap/LargeAllocation.h:

(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.

  • heap/LockDuringMarking.h:

(JSC::lockDuringMarking):

  • heap/MarkedAllocator.cpp:

(JSC::MarkedAllocator::parallelNotEmptyBlockSource):

  • heap/MarkedAllocator.h:
  • heap/MarkedBlock.h:

(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.

  • heap/MarkedSpace.h:

(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):

  • heap/MarkingConstraint.cpp:

(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):

  • heap/MarkingConstraint.h:

(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.

  • heap/MarkingConstraintSet.cpp:

(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.

  • heap/MarkingConstraintSet.h:
  • heap/MarkingConstraintSolver.cpp: Added.

(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):

  • heap/MarkingConstraintSolver.h: Added.
  • heap/OpaqueRootSet.h: Removed.
  • heap/ParallelSourceAdapter.h: Added.

(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):

  • heap/SimpleMarkingConstraint.cpp: Added.

(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):

  • heap/SimpleMarkingConstraint.h: Added.
  • heap/SlotVisitor.cpp:

(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.

  • heap/SlotVisitor.h:
  • heap/SlotVisitorInlines.h:

(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):

  • heap/Subspace.cpp:

(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):

  • heap/Subspace.h:
  • heap/SubspaceInlines.h:

(JSC::Subspace::forEachMarkedCellInParallel):

  • heap/VisitCounter.h: Added.

(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):

  • heap/VisitingTimeout.h: Removed.
  • heap/WeakBlock.cpp:

(JSC::WeakBlock::specializedVisit):

  • runtime/Structure.cpp:

(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):

Source/WebCore:

No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.

This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.

  • ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
  • ForwardingHeaders/heap/VisitingTimeout.h: Removed.
  • Sources.txt:
  • WebCore.xcodeproj/project.pbxproj:
  • bindings/js/DOMGCOutputConstraint.cpp: Added.

(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):

  • bindings/js/DOMGCOutputConstraint.h: Added.
  • bindings/js/WebCoreJSClientData.cpp:

(WebCore::JSVMClientData::initNormalWorld):

  • dom/Node.cpp:

(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):

Source/WTF:


This does some changes to make it easier to do parallel constraint solving:

  • I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse people about what it means to have a dependency chain. I took that as an opportunity to grealy simplify the GC's use of dependency chaining.


  • Added more logic to Deque<>, since I use it for part of the load balancer.



  • Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that to pick a lock in WebCore.


  • Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions sorta like Java's StampedLock.
  • WTF.xcodeproj/project.pbxproj:
  • wtf/Atomics.h:

(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.

  • wtf/BitVector.h:

(WTF::BitVector::iterator::operator++):

  • wtf/CMakeLists.txt:
  • wtf/ConcurrentPtrHashSet.cpp: Added.

(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):

  • wtf/ConcurrentPtrHashSet.h: Added.

(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):

  • wtf/Deque.h:

(WTF::inlineCapacity>::takeFirst):

  • wtf/FastMalloc.h:
  • wtf/Lock.cpp:

(WTF::LockBase::lockSlow):

  • wtf/Locker.h:

(WTF::holdLockIf):

  • wtf/ScopedLambda.h:
  • wtf/SharedTask.h:

(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.

  • wtf/StackShot.h: Added.

(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):

  • wtf/StackShotProfiler.h: Added.

(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):

Tools:

  • Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
  • TestWebKitAPI/CMakeLists.txt:
  • TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
  • TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).

(TestWebKitAPI::TEST):

File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/heap/MarkingConstraint.cpp

    r210844 r225524  
    2828
    2929#include "JSCInlines.h"
     30#include "VisitCounter.h"
    3031
    3132namespace JSC {
    3233
    33 MarkingConstraint::MarkingConstraint(
    34     CString abbreviatedName, CString name,
    35     ::Function<void(SlotVisitor&, const VisitingTimeout&)> executeFunction,
    36     ConstraintVolatility volatility)
     34static constexpr bool verboseMarkingConstraint = false;
     35
     36MarkingConstraint::MarkingConstraint(CString abbreviatedName, CString name, ConstraintVolatility volatility, ConstraintConcurrency concurrency, ConstraintParallelism parallelism)
    3737    : m_abbreviatedName(abbreviatedName)
    3838    , m_name(WTFMove(name))
    39     , m_executeFunction(WTFMove(executeFunction))
    4039    , m_volatility(volatility)
    41 {
    42 }
    43 
    44 MarkingConstraint::MarkingConstraint(
    45     CString abbreviatedName, CString name,
    46     ::Function<void(SlotVisitor&, const VisitingTimeout&)> executeFunction,
    47     ::Function<double(SlotVisitor&)> quickWorkEstimateFunction,
    48     ConstraintVolatility volatility)
    49     : m_abbreviatedName(abbreviatedName)
    50     , m_name(WTFMove(name))
    51     , m_executeFunction(WTFMove(executeFunction))
    52     , m_quickWorkEstimateFunction(WTFMove(quickWorkEstimateFunction))
    53     , m_volatility(volatility)
     40    , m_concurrency(concurrency)
     41    , m_parallelism(parallelism)
    5442{
    5543}
     
    6452}
    6553
    66 void MarkingConstraint::execute(SlotVisitor& visitor, bool& didVisitSomething, MonotonicTime timeout)
     54ConstraintParallelism MarkingConstraint::execute(SlotVisitor& visitor)
     55{
     56    VisitCounter visitCounter(visitor);
     57    ConstraintParallelism result = executeImpl(visitor);
     58    m_lastVisitCount += visitCounter.visitCount();
     59    if (verboseMarkingConstraint && visitCounter.visitCount())
     60        dataLog("(", abbreviatedName(), " visited ", visitCounter.visitCount(), " in execute)");
     61    if (result == ConstraintParallelism::Parallel) {
     62        // It's illegal to produce parallel work if you haven't advertised it upfront because the solver
     63        // has optimizations for constraints that promise to never produce parallel work.
     64        RELEASE_ASSERT(m_parallelism == ConstraintParallelism::Parallel);
     65    }
     66    return result;
     67}
     68
     69double MarkingConstraint::quickWorkEstimate(SlotVisitor&)
     70{
     71    return 0;
     72}
     73
     74double MarkingConstraint::workEstimate(SlotVisitor& visitor)
     75{
     76    return lastVisitCount() + quickWorkEstimate(visitor);
     77}
     78
     79void MarkingConstraint::prepareToExecute(const AbstractLocker& constraintSolvingLocker, SlotVisitor& visitor)
    6780{
    6881    if (Options::logGC())
    6982        dataLog(abbreviatedName());
    70     VisitingTimeout visitingTimeout(visitor, didVisitSomething, timeout);
    71     m_executeFunction(visitor, visitingTimeout);
    72     m_lastVisitCount = visitingTimeout.visitCount(visitor);
    73     didVisitSomething = visitingTimeout.didVisitSomething(visitor);
     83    VisitCounter visitCounter(visitor);
     84    prepareToExecuteImpl(constraintSolvingLocker, visitor);
     85    m_lastVisitCount = visitCounter.visitCount();
     86    if (verboseMarkingConstraint && visitCounter.visitCount())
     87        dataLog("(", abbreviatedName(), " visited ", visitCounter.visitCount(), " in prepareToExecute)");
     88}
     89
     90void MarkingConstraint::doParallelWork(SlotVisitor& visitor)
     91{
     92    VisitCounter visitCounter(visitor);
     93    doParallelWorkImpl(visitor);
     94    if (verboseMarkingConstraint && visitCounter.visitCount())
     95        dataLog("(", abbreviatedName(), " visited ", visitCounter.visitCount(), " in doParallelWork)");
     96    {
     97        auto locker = holdLock(m_lock);
     98        m_lastVisitCount += visitCounter.visitCount();
     99    }
     100}
     101
     102void MarkingConstraint::finishParallelWork(SlotVisitor& visitor)
     103{
     104    VisitCounter visitCounter(visitor);
     105    finishParallelWorkImpl(visitor);
     106    m_lastVisitCount += visitCounter.visitCount();
     107    if (verboseMarkingConstraint && visitCounter.visitCount())
     108        dataLog("(", abbreviatedName(), " visited ", visitCounter.visitCount(), " in finishParallelWork)");
     109}
     110
     111void MarkingConstraint::prepareToExecuteImpl(const AbstractLocker&, SlotVisitor&)
     112{
     113}
     114
     115void MarkingConstraint::doParallelWorkImpl(SlotVisitor&)
     116{
     117    UNREACHABLE_FOR_PLATFORM();
     118}
     119
     120void MarkingConstraint::finishParallelWorkImpl(SlotVisitor&)
     121{
    74122}
    75123
Note: See TracChangeset for help on using the changeset viewer.