Ignore:
Timestamp:
Jan 24, 2018, 8:19:44 PM (8 years ago)
Author:
[email protected]
Message:

JSC GC should support TLCs (thread local caches)
https://bugs.webkit.org/show_bug.cgi?id=181559

Reviewed by Mark Lam and Saam Barati.
Source/JavaScriptCore:


This is a big step towards object distancing by site origin. This patch implements TLCs, or
thread-local caches, which allow each thread to allocate from its own free lists. It also
means that any given thread can context-switch TLCs. This will allow us to do separate
allocation for separate site origins. Eventually, once we reshape how MarkedBlock looks, this
will allow us to have a hard distancing constraint between objects from different origins.

In this new design, every "size class" is represented as a BlockDirectory (formerly known as
MarkedAllocator, prior to r226822). This contains a bag of blocks allocated using some
aligned memory allocator (which roughly represents which cage you came out of), and anyone
using the same allocator can share those blocks - but so long as they are in that
BlockDirectory, they will have the size and type of that directory. Previously, each
BlockDirectory had exactly one FreeList. Now, each BlockDirectory has a double-linked-list of
LocalAllocators, each of which has a FreeList.

To decide which LocalAllocator to allocate out of, we need a ThreadLocalCache and a
BlockDirectory. The directory gives us an offset-within-the-ThreadLocalCache, which we simply
call the Allocator (which is just a POD type that contains a 32-bit offset). Each allocation
starts by figuring out what Allocator it wants (often we have this information at JIT time).
Then the allocation loads its ThreadLocalCache::Data from a fast TLS slot. Then we add the
Allocator offset to the ThreadLocalCache::Data to get the LocalAllocator. Note that we use
offsets as opposed to indices to make it easy to do the math on each allocation (if
LocalAllocator had a weird size then every allocation would have to do an imul).

This is a definite slow-down on GC-heavy benchmarks, but by a small margin, and only on
unusually heavy tests. For example, boyer and splay are both 3% regressed, but the Octane
geomean is just fine. The JetStream score regressed by 0.5% with p = 0.08 (so maybe there is
something there, but it's not significant according to our threshold).

  • JavaScriptCore.xcodeproj/project.pbxproj:
  • Sources.txt:
  • b3/B3LowerToAir.cpp:
  • b3/B3PatchpointSpecial.cpp:

(JSC::B3::PatchpointSpecial::admitsStack):

  • b3/B3StackmapSpecial.cpp:

(JSC::B3::StackmapSpecial::forEachArgImpl):
(JSC::B3::StackmapSpecial::isArgValidForRep):

  • b3/B3StackmapValue.cpp:

(JSC::B3::StackmapValue::appendSomeRegisterWithClobber):

  • b3/B3StackmapValue.h:
  • b3/B3Validate.cpp:
  • b3/B3ValueRep.cpp:

(JSC::B3::ValueRep::addUsedRegistersTo const):
(JSC::B3::ValueRep::dump const):
(WTF::printInternal):

  • b3/B3ValueRep.h:

(JSC::B3::ValueRep::ValueRep):

  • bytecode/AccessCase.cpp:

(JSC::AccessCase::generateImpl):

  • bytecode/ObjectAllocationProfile.h:

(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::clear):

  • bytecode/ObjectAllocationProfileInlines.h:

(JSC::ObjectAllocationProfile::initializeProfile):

  • dfg/DFGSpeculativeJIT.cpp:

(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileCreateThis):
(JSC::DFG::SpeculativeJIT::compileNewObject):

  • dfg/DFGSpeculativeJIT.h:

(JSC::DFG::SpeculativeJIT::emitAllocateJSCell):
(JSC::DFG::SpeculativeJIT::emitAllocateJSObject):

  • ftl/FTLAbstractHeapRepository.h:
  • ftl/FTLLowerDFGToB3.cpp:

(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorageWithSizeImpl):
(JSC::FTL::DFG::LowerDFGToB3::allocateHeapCell):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::allocatorForSize):
(JSC::FTL::DFG::LowerDFGToB3::allocateVariableSizedObject):
(JSC::FTL::DFG::LowerDFGToB3::allocateVariableSizedCell):

  • heap/Allocator.cpp: Added.

(JSC::Allocator::cellSize const):

  • heap/Allocator.h: Added.

(JSC::Allocator::Allocator):
(JSC::Allocator::offset const):
(JSC::Allocator::operator== const):
(JSC::Allocator::operator!= const):
(JSC::Allocator::operator bool const):

  • heap/AllocatorInlines.h: Added.

(JSC::Allocator::allocate const):
(JSC::Allocator::tryAllocate const):

  • heap/BlockDirectory.cpp:

(JSC::BlockDirectory::BlockDirectory):
(JSC::BlockDirectory::findBlockForAllocation):
(JSC::BlockDirectory::stopAllocating):
(JSC::BlockDirectory::prepareForAllocation):
(JSC::BlockDirectory::stopAllocatingForGood):
(JSC::BlockDirectory::resumeAllocating):
(JSC::BlockDirectory::endMarking):
(JSC::BlockDirectory::isFreeListedCell):
(JSC::BlockDirectory::didConsumeFreeList): Deleted.
(JSC::BlockDirectory::tryAllocateWithoutCollecting): Deleted.
(JSC::BlockDirectory::allocateIn): Deleted.
(JSC::BlockDirectory::tryAllocateIn): Deleted.
(JSC::BlockDirectory::doTestCollectionsIfNeeded): Deleted.
(JSC::BlockDirectory::allocateSlowCase): Deleted.

  • heap/BlockDirectory.h:

(JSC::BlockDirectory::cellKind const):
(JSC::BlockDirectory::allocator const):
(JSC::BlockDirectory::freeList const): Deleted.
(JSC::BlockDirectory::offsetOfFreeList): Deleted.
(JSC::BlockDirectory::offsetOfCellSize): Deleted.

  • heap/BlockDirectoryInlines.h:

(JSC::BlockDirectory::isFreeListedCell const): Deleted.
(JSC::BlockDirectory::allocate): Deleted.

  • heap/CompleteSubspace.cpp:

(JSC::CompleteSubspace::CompleteSubspace):
(JSC::CompleteSubspace::allocatorFor):
(JSC::CompleteSubspace::allocate):
(JSC::CompleteSubspace::allocateNonVirtual):
(JSC::CompleteSubspace::allocatorForSlow):
(JSC::CompleteSubspace::allocateSlow):
(JSC::CompleteSubspace::tryAllocateSlow):

  • heap/CompleteSubspace.h:

(JSC::CompleteSubspace::allocatorForSizeStep):
(JSC::CompleteSubspace::allocatorForNonVirtual):

  • heap/FreeList.h:
  • heap/GCDeferralContext.h:
  • heap/Heap.cpp:

(JSC::Heap::Heap):
(JSC::Heap::lastChanceToFinalize):

  • heap/Heap.h:

(JSC::Heap::threadLocalCacheLayout):

  • heap/IsoCellSet.h:
  • heap/IsoSubspace.cpp:

(JSC::IsoSubspace::IsoSubspace):
(JSC::IsoSubspace::allocatorFor):
(JSC::IsoSubspace::allocate):
(JSC::IsoSubspace::allocateNonVirtual):

  • heap/IsoSubspace.h:

(JSC::IsoSubspace::allocatorForNonVirtual):

  • heap/LocalAllocator.cpp: Added.

(JSC::LocalAllocator::LocalAllocator):
(JSC::LocalAllocator::reset):
(JSC::LocalAllocator::~LocalAllocator):
(JSC::LocalAllocator::stopAllocating):
(JSC::LocalAllocator::resumeAllocating):
(JSC::LocalAllocator::prepareForAllocation):
(JSC::LocalAllocator::stopAllocatingForGood):
(JSC::LocalAllocator::allocateSlowCase):
(JSC::LocalAllocator::didConsumeFreeList):
(JSC::LocalAllocator::tryAllocateWithoutCollecting):
(JSC::LocalAllocator::allocateIn):
(JSC::LocalAllocator::tryAllocateIn):
(JSC::LocalAllocator::doTestCollectionsIfNeeded):
(JSC::LocalAllocator::isFreeListedCell const):

  • heap/LocalAllocator.h: Added.

(JSC::LocalAllocator::offsetOfFreeList):
(JSC::LocalAllocator::offsetOfCellSize):

  • heap/LocalAllocatorInlines.h: Added.

(JSC::LocalAllocator::allocate):

  • heap/MarkedSpace.cpp:

(JSC::MarkedSpace::stopAllocatingForGood):

  • heap/MarkedSpace.h:
  • heap/SlotVisitor.cpp:
  • heap/SlotVisitor.h:
  • heap/Subspace.h:
  • heap/ThreadLocalCache.cpp: Added.

(JSC::ThreadLocalCache::create):
(JSC::ThreadLocalCache::ThreadLocalCache):
(JSC::ThreadLocalCache::~ThreadLocalCache):
(JSC::ThreadLocalCache::allocateData):
(JSC::ThreadLocalCache::destroyData):
(JSC::ThreadLocalCache::installSlow):
(JSC::ThreadLocalCache::installData):
(JSC::ThreadLocalCache::allocatorSlow):
(JSC::ThreadLocalCache::destructor):

  • heap/ThreadLocalCache.h: Added.

(JSC::ThreadLocalCache::offsetOfSize):
(JSC::ThreadLocalCache::offsetOfFirstAllocator):

  • heap/ThreadLocalCacheInlines.h: Added.

(JSC::ThreadLocalCache::getImpl):
(JSC::ThreadLocalCache::get):
(JSC::ThreadLocalCache::install):
(JSC::ThreadLocalCache::allocator):
(JSC::ThreadLocalCache::tryGetAllocator):

  • heap/ThreadLocalCacheLayout.cpp: Added.

(JSC::ThreadLocalCacheLayout::ThreadLocalCacheLayout):
(JSC::ThreadLocalCacheLayout::~ThreadLocalCacheLayout):
(JSC::ThreadLocalCacheLayout::allocateOffset):
(JSC::ThreadLocalCacheLayout::snapshot):
(JSC::ThreadLocalCacheLayout::directory):

  • heap/ThreadLocalCacheLayout.h: Added.
  • jit/AssemblyHelpers.cpp:

(JSC::AssemblyHelpers::emitAllocateWithNonNullAllocator):
(JSC::AssemblyHelpers::emitAllocate):
(JSC::AssemblyHelpers::emitAllocateVariableSized):

  • jit/AssemblyHelpers.h:

(JSC::AssemblyHelpers::vm):
(JSC::AssemblyHelpers::emitAllocateJSCell):
(JSC::AssemblyHelpers::emitAllocateJSObject):
(JSC::AssemblyHelpers::emitAllocateJSObjectWithKnownSize):
(JSC::AssemblyHelpers::emitAllocateWithNonNullAllocator): Deleted.
(JSC::AssemblyHelpers::emitAllocate): Deleted.
(JSC::AssemblyHelpers::emitAllocateVariableSized): Deleted.

  • jit/JITOpcodes.cpp:

(JSC::JIT::emit_op_new_object):
(JSC::JIT::emit_op_create_this):

  • jit/JITOpcodes32_64.cpp:

(JSC::JIT::emit_op_new_object):
(JSC::JIT::emit_op_create_this):

  • runtime/ButterflyInlines.h:

(JSC::Butterfly::createUninitialized):
(JSC::Butterfly::tryCreate):
(JSC::Butterfly::growArrayRight):

  • runtime/DirectArguments.cpp:

(JSC::DirectArguments::overrideThings):

  • runtime/GenericArgumentsInlines.h:

(JSC::GenericArguments<Type>::initModifiedArgumentsDescriptor):

  • runtime/HashMapImpl.h:

(JSC::HashMapBuffer::create):

  • runtime/JSArray.cpp:

(JSC::JSArray::tryCreateUninitializedRestricted):
(JSC::JSArray::unshiftCountSlowCase):

  • runtime/JSArray.h:

(JSC::JSArray::tryCreate):

  • runtime/JSArrayBufferView.cpp:

(JSC::JSArrayBufferView::ConstructionContext::ConstructionContext):

  • runtime/JSCellInlines.h:

(JSC::tryAllocateCellHelper):

  • runtime/JSGlobalObject.cpp:

(JSC::JSGlobalObject::JSGlobalObject):

  • runtime/JSGlobalObject.h:

(JSC::JSGlobalObject::threadLocalCache const):

  • runtime/JSLock.cpp:

(JSC::JSLock::didAcquireLock):

  • runtime/Options.h:
  • runtime/RegExpMatchesArray.h:

(JSC::tryCreateUninitializedRegExpMatchesArray):

  • runtime/VM.cpp:

(JSC::VM::VM):

  • runtime/VM.h:
  • runtime/VMEntryScope.cpp:

(JSC::VMEntryScope::VMEntryScope):

Source/WTF:

  • wtf/Bitmap.h: Just fixing a compile error.
File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/jit/AssemblyHelpers.cpp

    r226440 r227592  
    11/*
    2  * Copyright (C) 2011-2017 Apple Inc. All rights reserved.
     2 * Copyright (C) 2011-2018 Apple Inc. All rights reserved.
    33 *
    44 * Redistribution and use in source and binary forms, with or without
     
    582582}
    583583#endif
     584
     585void AssemblyHelpers::emitAllocateWithNonNullAllocator(GPRReg resultGPR, const JITAllocator& allocator, GPRReg allocatorGPR, GPRReg scratchGPR, JumpList& slowPath)
     586{
     587    // NOTE: This is carefully written so that we can call it while we disallow scratch
     588    // register usage.
     589       
     590    if (Options::forceGCSlowPaths()) {
     591        slowPath.append(jump());
     592        return;
     593    }
     594   
     595    Jump popPath;
     596    Jump done;
     597   
     598#if ENABLE(FAST_TLS_JIT)
     599    loadFromTLSPtr(fastTLSOffsetForKey(WTF_GC_TLC_KEY), scratchGPR);
     600#else
     601    loadPtr(&vm().threadLocalCacheData, scratchGPR);
     602#endif
     603    if (allocator.isConstant()) {
     604        slowPath.append(branch32(BelowOrEqual, Address(scratchGPR, ThreadLocalCache::offsetOfSizeInData()), TrustedImm32(allocator.allocator().offset())));
     605        addPtr(TrustedImm32(ThreadLocalCache::offsetOfFirstAllocatorInData() + allocator.allocator().offset()), scratchGPR, allocatorGPR);
     606    } else {
     607        slowPath.append(branch32(BelowOrEqual, Address(scratchGPR, ThreadLocalCache::offsetOfSizeInData()), allocatorGPR));
     608        addPtr(TrustedImm32(ThreadLocalCache::offsetOfFirstAllocatorInData()), allocatorGPR);
     609        addPtr(scratchGPR, allocatorGPR);
     610    }
     611
     612    load32(Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfRemaining()), resultGPR);
     613    popPath = branchTest32(Zero, resultGPR);
     614    if (allocator.isConstant())
     615        add32(TrustedImm32(-allocator.allocator().cellSize(vm().heap)), resultGPR, scratchGPR);
     616    else {
     617        if (isX86()) {
     618            move(resultGPR, scratchGPR);
     619            sub32(Address(allocatorGPR, LocalAllocator::offsetOfCellSize()), scratchGPR);
     620        } else {
     621            load32(Address(allocatorGPR, LocalAllocator::offsetOfCellSize()), scratchGPR);
     622            sub32(resultGPR, scratchGPR, scratchGPR);
     623        }
     624    }
     625    negPtr(resultGPR);
     626    store32(scratchGPR, Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfRemaining()));
     627    Address payloadEndAddr = Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfPayloadEnd());
     628    if (isX86())
     629        addPtr(payloadEndAddr, resultGPR);
     630    else {
     631        loadPtr(payloadEndAddr, scratchGPR);
     632        addPtr(scratchGPR, resultGPR);
     633    }
     634       
     635    done = jump();
     636       
     637    popPath.link(this);
     638       
     639    loadPtr(Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfScrambledHead()), resultGPR);
     640    if (isX86())
     641        xorPtr(Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfSecret()), resultGPR);
     642    else {
     643        loadPtr(Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfSecret()), scratchGPR);
     644        xorPtr(scratchGPR, resultGPR);
     645    }
     646    slowPath.append(branchTestPtr(Zero, resultGPR));
     647       
     648    // The object is half-allocated: we have what we know is a fresh object, but
     649    // it's still on the GC's free list.
     650    loadPtr(Address(resultGPR), scratchGPR);
     651    storePtr(scratchGPR, Address(allocatorGPR, LocalAllocator::offsetOfFreeList() + FreeList::offsetOfScrambledHead()));
     652       
     653    done.link(this);
     654}
     655
     656void AssemblyHelpers::emitAllocate(GPRReg resultGPR, const JITAllocator& allocator, GPRReg allocatorGPR, GPRReg scratchGPR, JumpList& slowPath)
     657{
     658    if (allocator.isConstant()) {
     659        if (!allocator.allocator()) {
     660            slowPath.append(jump());
     661            return;
     662        }
     663    }
     664    emitAllocateWithNonNullAllocator(resultGPR, allocator, allocatorGPR, scratchGPR, slowPath);
     665}
     666
     667void AssemblyHelpers::emitAllocateVariableSized(GPRReg resultGPR, CompleteSubspace& subspace, GPRReg allocationSize, GPRReg scratchGPR1, GPRReg scratchGPR2, JumpList& slowPath)
     668{
     669    static_assert(!(MarkedSpace::sizeStep & (MarkedSpace::sizeStep - 1)), "MarkedSpace::sizeStep must be a power of two.");
     670   
     671    unsigned stepShift = getLSBSet(MarkedSpace::sizeStep);
     672   
     673    add32(TrustedImm32(MarkedSpace::sizeStep - 1), allocationSize, scratchGPR1);
     674    urshift32(TrustedImm32(stepShift), scratchGPR1);
     675    slowPath.append(branch32(Above, scratchGPR1, TrustedImm32(MarkedSpace::largeCutoff >> stepShift)));
     676    move(TrustedImmPtr(subspace.allocatorForSizeStep() - 1), scratchGPR2);
     677    load32(BaseIndex(scratchGPR2, scratchGPR1, TimesFour), scratchGPR1);
     678   
     679    emitAllocate(resultGPR, JITAllocator::variable(), scratchGPR1, scratchGPR2, slowPath);
     680}
    584681
    585682void AssemblyHelpers::restoreCalleeSavesFromEntryFrameCalleeSavesBuffer(EntryFrame*& topEntryFrame)
Note: See TracChangeset for help on using the changeset viewer.