The collector thread should only start when the mutator doesn't have heap access
https://bugs.webkit.org/show_bug.cgi?id=167737
Reviewed by Keith Miller.
JSTests:
Add versions of splay that flash heap access, to simulate what might happen if a third-party app
was running concurrent GC. In this case, we might actually start the collector thread.
- stress/splay-flash-access-1ms.js: Added.
(performance.now):
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(jscSetUp):
(jscTearDown):
(jscRun):
(averageAbovePercentile):
(printPercentile):
- stress/splay-flash-access.js: Added.
(performance.now):
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(jscSetUp):
(jscTearDown):
(jscRun):
(averageAbovePercentile):
(printPercentile):
Source/JavaScriptCore:
This turns the collector thread's workflow into a state machine, so that the mutator thread can
run it directly. This reduces the amount of synchronization we do with the collector thread, and
means that most apps will never start the collector thread. The collector thread will still start
when we need to finish collecting and we don't have heap access.
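As a rough illustration of the state-machine idea (not the actual Heap.cpp code), the sketch below models a collection as a sequence of phases that any thread can advance one step at a time. The phase names mirror the run*Phase functions listed in this change; ToyHeap, its members, and the amount of marking work are hypothetical.

```cpp
#include <cassert>

// Phase names follow the run*Phase functions in this change; everything else is illustrative.
enum class CollectorPhase { NotRunning, Begin, Fixpoint, Concurrent, Reloop, End };

class ToyHeap {
public:
    void requestCollection() { m_requested = true; }
    CollectorPhase phase() const { return m_phase; }
    unsigned phasesRun() const { return m_phasesRun; }

    // Runs exactly one phase without blocking. Returns true if the caller should
    // call again. Because no state lives on a thread's stack between calls,
    // either the mutator or a collector thread can drive the collection.
    bool runCurrentPhase()
    {
        switch (m_phase) {
        case CollectorPhase::NotRunning:
            if (!m_requested)
                return false;
            m_requested = false;
            ++m_phasesRun;
            m_phase = CollectorPhase::Begin;
            return true;
        case CollectorPhase::Begin:
            ++m_phasesRun;
            m_remainingMarkingWork = 2; // Hypothetical amount of concurrent work.
            m_phase = CollectorPhase::Fixpoint;
            return true;
        case CollectorPhase::Fixpoint:
            ++m_phasesRun;
            m_phase = m_remainingMarkingWork ? CollectorPhase::Concurrent : CollectorPhase::End;
            return true;
        case CollectorPhase::Concurrent:
            ++m_phasesRun;
            --m_remainingMarkingWork;
            m_phase = CollectorPhase::Reloop;
            return true;
        case CollectorPhase::Reloop:
            ++m_phasesRun;
            m_phase = CollectorPhase::Fixpoint;
            return true;
        case CollectorPhase::End:
            ++m_phasesRun;
            m_phase = CollectorPhase::NotRunning;
            return false;
        }
        return false;
    }

private:
    CollectorPhase m_phase { CollectorPhase::NotRunning };
    bool m_requested { false };
    unsigned m_remainingMarkingWork { 0 };
    unsigned m_phasesRun { 0 };
};
```

In this model, a mutator that holds heap access can simply loop on runCurrentPhase() itself, which is why a dedicated collector thread is only needed when nobody with heap access is around to drive the phases.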
In this new world, "stopping the world" means relinquishing control of collection to the mutator.
This means tracking who is conducting collection. I use the GCConductor enum to say who is
conducting. It's either GCConductor::Mutator or GCConductor::Collector. I use the term "conn" to
refer to the concept of conducting (having the conn, relinquishing the conn, taking the conn).
So, stopping the world means giving the mutator the conn. Releasing heap access means giving the
collector the conn.
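The conn handoff can be pictured with a small model like the one below. Only the GCConductor enum and its two values come from this change; ToyConnState, its initial state, and the rule that acquiring access leaves the conn where it is are assumptions made for the sketch.

```cpp
#include <cassert>

// The enum and its values are named in this change; the class below is a toy.
enum class GCConductor { Mutator, Collector };

class ToyConnState {
public:
    GCConductor conductor() const { return m_conductor; }
    bool mutatorHasAccess() const { return m_mutatorHasAccess; }

    // Releasing heap access gives the collector the conn.
    void releaseAccess()
    {
        m_mutatorHasAccess = false;
        m_conductor = GCConductor::Collector;
    }

    // Assumption: acquiring access does not by itself move the conn.
    void acquireAccess() { m_mutatorHasAccess = true; }

    // "Stopping the world" gives the mutator the conn. If the mutator has no
    // heap access there is nobody to hand the conn to, so the collector keeps it.
    bool stopTheWorld()
    {
        if (!m_mutatorHasAccess)
            return false;
        m_conductor = GCConductor::Mutator;
        return true;
    }

private:
    // Assumed initial state for this sketch: mutator has access and the conn.
    bool m_mutatorHasAccess { true };
    GCConductor m_conductor { GCConductor::Mutator };
};
```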
This meant bringing back the conservative scan of the calling thread. It turns out that this
scan was too slow to be called on each GC increment because apparently setjmp() now does system
calls. So, I wrote our own callee save register saving for the GC. Then I had doubts about
whether or not it was correct, so I also made it so that the GC only rarely asks for the register
state. I think we still want to use my register saving code instead of setjmp because setjmp
seems to save things we don't need, and that could make us overly conservative.
It turns out that this new scheduling discipline makes the old space-time scheduler perform
better than the new stochastic space-time scheduler on systems with fewer than 4 cores. This is
because the mutator having the conn enables us to time the mutator<->collector context switches
by polling. The OS is never involved. So, we can use super precise timing. This allows the old
space-time scheduler to shine like it hadn't before.
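To show why polling gives precise timing, here is a minimal sketch of a fixed-period schedule that the mutator consults at poll points. It is not the actual space-time scheduler: ToyPeriodicScheduler, its fields, and the numbers in the usage note are all hypothetical, and the real scheduler adapts the split to heap state.

```cpp
#include <cassert>
#include <chrono>

using Micros = std::chrono::microseconds;

// Hypothetical fixed-period schedule: the first collectorSlice of every period
// belongs to the collector, the remainder to the mutator.
struct ToyPeriodicScheduler {
    Micros period;
    Micros collectorSlice;
    Micros start;

    // Called at a mutator poll point. Assumes now >= start. The decision is a
    // pure function of the clock, so no OS wakeup or context switch is involved.
    bool collectorShouldRun(Micros now) const
    {
        return ((now - start) % period) < collectorSlice;
    }
};

// Simulates polling every `step` for `total` time and counts how many poll
// points would have been spent doing collector work.
inline unsigned countCollectorPolls(const ToyPeriodicScheduler& scheduler, Micros step, Micros total)
{
    unsigned count = 0;
    for (Micros t = scheduler.start; t < scheduler.start + total; t += step) {
        if (scheduler.collectorShouldRun(t))
            ++count;
    }
    return count;
}
```

With a 2000 microsecond period and a 500 microsecond collector slice, polling every 100 microseconds for 4000 microseconds gives the collector 10 of the 40 poll points, exactly the intended quarter.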
The splay results imply that this is all a good thing. On 2-core systems, this reduces pause
times by 40% and it increases throughput about 5%. On 1-core systems, this reduces pause times by
half and reduces throughput by 8%. On 4-or-more-core systems, this doesn't seem to have much
effect.
- CMakeLists.txt:
- JavaScriptCore.xcodeproj/project.pbxproj:
- dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::ThreadBody::ThreadBody):
(JSC::DFG::Worklist::dump):
(JSC::DFG::numberOfWorklists):
(JSC::DFG::ensureWorklistForIndex):
(JSC::DFG::existingWorklistForIndexOrNull):
(JSC::DFG::existingWorklistForIndex):
(JSC::DFG::numberOfWorklists): Deleted.
(JSC::DFG::ensureWorklistForIndex): Deleted.
(JSC::DFG::existingWorklistForIndexOrNull): Deleted.
(JSC::DFG::existingWorklistForIndex): Deleted.
- heap/CollectingScope.h: Added.
(JSC::CollectingScope::CollectingScope):
(JSC::CollectingScope::~CollectingScope):
- heap/CollectorPhase.cpp: Added.
(JSC::worldShouldBeSuspended):
(WTF::printInternal):
- heap/CollectorPhase.h: Added.
- heap/EdenGCActivityCallback.cpp:
(JSC::EdenGCActivityCallback::lastGCLength):
- heap/FullGCActivityCallback.cpp:
(JSC::FullGCActivityCallback::doCollection):
(JSC::FullGCActivityCallback::lastGCLength):
- heap/GCConductor.cpp: Added.
(JSC::gcConductorShortName):
(WTF::printInternal):
- heap/GCConductor.h: Added.
- heap/Heap.cpp:
(JSC::Heap::Thread::Thread):
(JSC::Heap::Heap):
(JSC::Heap::lastChanceToFinalize):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::shouldCollectInCollectorThread):
(JSC::Heap::collectInCollectorThread):
(JSC::Heap::checkConn):
(JSC::Heap::runCurrentPhase):
(JSC::Heap::runNotRunningPhase):
(JSC::Heap::runBeginPhase):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::runConcurrentPhase):
(JSC::Heap::runReloopPhase):
(JSC::Heap::runEndPhase):
(JSC::Heap::changePhase):
(JSC::Heap::finishChangingPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::stopTheMutator):
(JSC::Heap::resumeTheMutator):
(JSC::Heap::stopIfNecessarySlow):
(JSC::Heap::collectInMutatorThread):
(JSC::Heap::collectInMutatorThreadImpl):
(JSC::Heap::waitForCollector):
(JSC::Heap::acquireAccessSlow):
(JSC::Heap::releaseAccessSlow):
(JSC::Heap::relinquishConn):
(JSC::Heap::finishRelinquishingConn):
(JSC::Heap::handleNeedFinalize):
(JSC::Heap::notifyThreadStopping):
(JSC::Heap::finalize):
(JSC::Heap::requestCollection):
(JSC::Heap::waitForCollection):
(JSC::Heap::updateAllocationLimits):
(JSC::Heap::didFinishCollection):
(JSC::Heap::collectIfNecessaryOrDefer):
(JSC::Heap::preventCollection):
(JSC::Heap::performIncrement):
(JSC::Heap::markToFixpoint): Deleted.
(JSC::Heap::shouldCollectInThread): Deleted.
(JSC::Heap::collectInThread): Deleted.
(JSC::Heap::stopTheWorld): Deleted.
(JSC::Heap::resumeTheWorld): Deleted.
(JSC::Heap::machineThreads):
(JSC::Heap::lastFullGCLength):
(JSC::Heap::lastEdenGCLength):
(JSC::Heap::increaseLastFullGCLength):
(JSC::Heap::mutatorIsStopped): Deleted.
- heap/HeapStatistics.cpp: Removed.
- heap/HeapStatistics.h: Removed.
- heap/HelpingGCScope.h: Removed.
- heap/MachineStackMarker.cpp:
(JSC::MachineThreads::gatherFromCurrentThread):
(JSC::MachineThreads::gatherConservativeRoots):
- heap/MachineStackMarker.h:
- heap/MarkedBlock.cpp:
(JSC::MarkedBlock::Handle::sweep):
(WTF::printInternal):
- heap/MutatorState.h:
- heap/RegisterState.h: Added.
- heap/SlotVisitor.cpp:
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::donateAll):
- heap/StochasticSpaceTimeMutatorScheduler.cpp:
(JSC::StochasticSpaceTimeMutatorScheduler::beginCollection):
(JSC::StochasticSpaceTimeMutatorScheduler::synchronousDrainingDidStall):
(JSC::StochasticSpaceTimeMutatorScheduler::timeToStop):
- heap/SweepingScope.h: Added.
(JSC::SweepingScope::SweepingScope):
(JSC::SweepingScope::~SweepingScope):
(JSC::JITWorklist::Thread::Thread):
(GlobalObject::finishCreation):
(functionFlashHeapAccess):
- runtime/InitializeThreading.cpp:
(JSC::initializeThreading):
(JSC::JSCell::classInfo):
(JSC::overrideDefaults):
- runtime/Options.h:
- runtime/TestRunnerUtils.cpp:
(JSC::finalizeStatsAtEndOfTesting):
Source/WebCore:
Added new tests in JSTests and LayoutTests.
The WebCore changes involve:
- Refactoring around new header discipline.
- Adding crazy GC APIs to window.internals to enable us to test the GC's runloop discipline.
- ForwardingHeaders/heap/GCFinalizationCallback.h: Added.
- ForwardingHeaders/heap/IncrementalSweeper.h: Added.
- ForwardingHeaders/heap/MachineStackMarker.h: Added.
- ForwardingHeaders/heap/RunningScope.h: Added.
- bindings/js/CommonVM.cpp:
- testing/Internals.cpp:
(WebCore::Internals::parserMetaData):
(WebCore::Internals::isReadableStreamDisturbed):
(WebCore::Internals::isGCRunning):
(WebCore::Internals::addGCFinalizationCallback):
(WebCore::Internals::stopSweeping):
(WebCore::Internals::startSweeping):
- testing/Internals.h:
- testing/Internals.idl:
Source/WTF:
Extend the use of AbstractLocker so that we can use more locking idioms.
(WTF::AutomaticThreadCondition::notifyOne):
(WTF::AutomaticThreadCondition::notifyAll):
(WTF::AutomaticThreadCondition::add):
(WTF::AutomaticThreadCondition::remove):
(WTF::AutomaticThreadCondition::contains):
(WTF::AutomaticThread::AutomaticThread):
(WTF::AutomaticThread::tryStop):
(WTF::AutomaticThread::isWaiting):
(WTF::AutomaticThread::notify):
(WTF::AutomaticThread::start):
(WTF::AutomaticThread::threadIsStopping):
- wtf/AutomaticThread.h:
- wtf/NumberOfCores.cpp:
(WTF::numberOfProcessorCores): Allow this to be overridden for testing.
- wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperClient::finish):
(WTF::ParallelHelperClient::claimTask):
(WTF::ParallelHelperPool::Thread::Thread):
(WTF::ParallelHelperPool::didMakeWorkAvailable):
(WTF::ParallelHelperPool::hasClientWithTask):
(WTF::ParallelHelperPool::getClientWithTask):
- wtf/ParallelHelperPool.h:
Tools:
Make more tests collect continuously.
- Scripts/run-jsc-stress-tests:
LayoutTests:
When running in WebCore, the JSC GC may find itself completing draining in the parallel helpers
at a time when the main thread runloop is idle. If the mutator has the conn, then there will not
be any GC threads to receive the notification from the shared mark stack condition variable. So
nobody will know that we need to reloop.
Fortunately, the SlotVisitor now knows that it has to schedule the stopIfNecessary timer in
addition to notifying the condition variable.
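The double notification can be sketched roughly as follows. This is a toy, not SlotVisitor's real code: ToySharedMarkStack and the injected callback standing in for the stopIfNecessary timer are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <utility>

class ToySharedMarkStack {
public:
    // scheduleTimer is a hypothetical stand-in for scheduling the stopIfNecessary timer.
    explicit ToySharedMarkStack(std::function<void()> scheduleTimer)
        : m_scheduleTimer(std::move(scheduleTimer))
    {
    }

    // Called by a parallel helper when it finishes draining.
    void didFinishDraining()
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_drained = true;
        }
        // Wakes any GC thread waiting on the condition variable. If the mutator
        // has the conn, there may be no such thread.
        m_condition.notify_all();
        // So also ping the run loop, letting an idle main thread notice the reloop.
        m_scheduleTimer();
    }

    bool isDrained()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        return m_drained;
    }

private:
    std::mutex m_lock;
    std::condition_variable m_condition;
    bool m_drained { false };
    std::function<void()> m_scheduleTimer;
};
```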
This adds a variant of splay that quickly builds up a big enough heap to cause significant GCs to
happen and then waits until a GC is running. When it's running, it registers a callback to the
GC's finalize phase. When the callback runs, it finishes the test. This is a barely-sound test
that uses a lot of white box API from Internals, but it proves that the SlotVisitor's runloop
ping works: if I comment it out, this test will always fail. Otherwise it always succeeds.
- js/dom/gc-slot-visitor-parallel-drain-pings-runloop-when-done.html: Added.