Timestamp:
Apr 20, 2017, 10:55:44 AM
Author:
[email protected]
Message:

Optimize SharedArrayBuffer in the DFG+FTL
https://bugs.webkit.org/show_bug.cgi?id=164108

Reviewed by Saam Barati.

JSTests:

Added a fairly comprehensive test of the intrinsics. This creates a function for each possible
combination of type and operation, then exercises each one first with valid inputs and then with a
bunch of erroneous conditions such as out-of-bounds (OOB) accesses.

  • stress/SharedArrayBuffer-opt.js: Added.

(string_appeared_here.switch):
(string_appeared_here.str):
(runAtomic):
(shouldFail):
(Symbol):
(string_appeared_here.a.of.arrays.m.of.atomics):

  • stress/SharedArrayBuffer.js:

Source/JavaScriptCore:

This adds atomics intrinsics to the DFG and wires them through to the DFG and FTL backends. This
was super easy in the FTL since B3 already has comprehensive atomic intrinsics, which are more
powerful than what we need right now. In the DFG backend, I went with an easy-to-write
implementation that just reduces everything to a weak CAS loop. It's very inefficient with
registers (it needs ~8) but it's the DFG backend, so it's not obvious how much we care.
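
To illustrate what "reduces everything to a weak CAS loop" means, here is a minimal sketch in portable C++ rather than in emitted machine code. It is not the DFG implementation; the helper names (atomicReadModifyWrite, atomicAdd, atomicExchange) are hypothetical and std::atomic stands in for the typed array element.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): applies `op` to the cell
// atomically and returns the value that was there before, which is what the
// Atomics read-modify-write operations return.
template<typename T, typename Op>
T atomicReadModifyWrite(std::atomic<T>& cell, Op op)
{
    T oldValue = cell.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously. On any failure it stores the
    // cell's current contents into oldValue, so the loop simply retries with
    // a freshly computed new value.
    while (!cell.compare_exchange_weak(
        oldValue, static_cast<T>(op(oldValue)),
        std::memory_order_seq_cst, std::memory_order_relaxed)) { }
    return oldValue;
}

// Every operation becomes the same loop with a different `op`.
template<typename T>
T atomicAdd(std::atomic<T>& cell, T operand)
{
    return atomicReadModifyWrite(cell, [operand](T old) { return old + operand; });
}

template<typename T>
T atomicExchange(std::atomic<T>& cell, T operand)
{
    return atomicReadModifyWrite(cell, [operand](T) { return operand; });
}
```

The appeal of this shape is that one loop covers add, and, or, sub, xor and exchange. The cost is that every operation pays for a load, a compare and a retry branch even where the hardware has a dedicated instruction.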

To make the rare cases easy to handle, I refactored AtomicsObject.cpp so that the operations for
the slow paths can share code with the native functions.

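This also fixes register handling in the X86 implementations of CAS in the case that
expectedAndResult is not %rax: the address operand is now rewritten with withSwappedRegister before
expectedAndResult is swapped with %rax. This also fixes the ARM64 implementation of branchWeakCAS.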
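
The X86 register problem can be modeled with a toy register file. This is a simplified sketch, not the MacroAssembler code: Reg, Address, Machine and addressSeenByCmpxchg are hypothetical stand-ins, and only the withSwappedRegister(eax, expectedAndResult) call shape is taken from the diff. cmpxchg implicitly uses %rax, so the assembler swaps expectedAndResult with %rax around the instruction; if the address names either register, it must be renamed to match.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical miniature register set, for illustration only.
enum Reg : int { eax, ecx, edx, ebx, NumRegs };

struct Address {
    Reg base;
    int32_t offset;

    // If the base is one of the two registers being swapped, name the other
    // one, so the address still refers to the same value after the swap.
    Address withSwappedRegister(Reg left, Reg right) const
    {
        Address result = *this;
        if (base == left)
            result.base = right;
        else if (base == right)
            result.base = left;
        return result;
    }
};

struct Machine {
    std::array<int64_t, NumRegs> regs{};

    int64_t effectiveAddress(const Address& address) const
    {
        return regs[address.base] + address.offset;
    }

    void xchg(Reg a, Reg b) { std::swap(regs[a], regs[b]); }
};

// Returns the effective address cmpxchg would use after expectedAndResult has
// been swapped into eax. The machine is taken by value so callers can compare
// both behaviours from the same starting state.
int64_t addressSeenByCmpxchg(Machine machine, Reg expectedAndResult, Address address, bool applyFix)
{
    if (applyFix)
        address = address.withSwappedRegister(eax, expectedAndResult);
    machine.xchg(expectedAndResult, eax);
    return machine.effectiveAddress(address);
}
```

With the base pointer in eax and the expected value in ecx, the unfixed path computes the address from the expected value instead of the pointer, while the fixed path follows the pointer into ecx.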

I adapted the CascadeLock from WTF/benchmarks/ToyLocks.h as a microbenchmark of lock performance.
This benchmark runs 2.5x faster, in both the contended and uncontended cases, thanks to this
change. It's still about 3x slower than native. I investigated this only a bit. I suspect that
the story will be different in asm.js code, which will get constant-folding of the typed array
backing store by virtue of how it uses lexically scoped variables as pointers to the heap arrays.
It's worth noting that the native lock I was comparing against, the very nicely-tuned
CascadeLock, is at the very high end of lock throughput under virtually all conditions
(uncontended, microcontended, held for a long time). I also compared to WTF::Lock and others, and
the only ones that performed better in this microbenchmark were spinlocks. I don't recommend
using those. So, when I say this is 3x slower than native, I really mean that it's 3x slower than
the fastest native lock that I have in my arsenal.

Also worth noting is that I experimented with exposing Atomics.yield(), which uses sched_yield,
as a way of testing if adding a yield loop to the JS cascadeLock would help. It does not help. I
did not investigate why.

  • assembler/AbstractMacroAssembler.h:

(JSC::AbstractMacroAssembler::JumpList::append):

  • assembler/CPU.h:

(JSC::is64Bit):
(JSC::is32Bit):

  • b3/B3Common.h:

(JSC::B3::is64Bit): Deleted.
(JSC::B3::is32Bit): Deleted.

  • b3/B3LowerToAir.cpp:

(JSC::B3::Air::LowerToAir::appendTrapping):
(JSC::B3::Air::LowerToAir::appendCAS):
(JSC::B3::Air::LowerToAir::appendGeneralAtomic):

  • dfg/DFGAbstractInterpreterInlines.h:

(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):

  • dfg/DFGByteCodeParser.cpp:

(JSC::DFG::ByteCodeParser::handleIntrinsicCall):

  • dfg/DFGClobberize.h:

(JSC::DFG::clobberize):

  • dfg/DFGDoesGC.cpp:

(JSC::DFG::doesGC):

  • dfg/DFGFixupPhase.cpp:

(JSC::DFG::FixupPhase::fixupNode):

  • dfg/DFGNode.h:

(JSC::DFG::Node::hasHeapPrediction):
(JSC::DFG::Node::hasArrayMode):

  • dfg/DFGNodeType.h:

(JSC::DFG::isAtomicsIntrinsic):
(JSC::DFG::numExtraAtomicsArgs):

  • dfg/DFGPredictionPropagationPhase.cpp:
  • dfg/DFGSSALoweringPhase.cpp:

(JSC::DFG::SSALoweringPhase::handleNode):

  • dfg/DFGSafeToExecute.h:

(JSC::DFG::safeToExecute):

  • dfg/DFGSpeculativeJIT.cpp:

(JSC::DFG::SpeculativeJIT::loadFromIntTypedArray):
(JSC::DFG::SpeculativeJIT::setIntTypedArrayLoadResult):
(JSC::DFG::SpeculativeJIT::compileGetByValOnIntTypedArray):
(JSC::DFG::SpeculativeJIT::getIntTypedArrayStoreOperand):
(JSC::DFG::SpeculativeJIT::compilePutByValForIntTypedArray):

  • dfg/DFGSpeculativeJIT.h:

(JSC::DFG::SpeculativeJIT::callOperation):

  • dfg/DFGSpeculativeJIT32_64.cpp:

(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGSpeculativeJIT64.cpp:

(JSC::DFG::SpeculativeJIT::compile):

  • ftl/FTLAbstractHeapRepository.cpp:

(JSC::FTL::AbstractHeapRepository::decorateFencedAccess):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):

  • ftl/FTLAbstractHeapRepository.h:
  • ftl/FTLCapabilities.cpp:

(JSC::FTL::canCompile):

  • ftl/FTLLowerDFGToB3.cpp:

(JSC::FTL::DFG::LowerDFGToB3::compileNode):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsReadModifyWrite):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsIsLockFree):
(JSC::FTL::DFG::LowerDFGToB3::compileGetByVal):
(JSC::FTL::DFG::LowerDFGToB3::compilePutByVal):
(JSC::FTL::DFG::LowerDFGToB3::pointerIntoTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::loadFromIntTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::storeType):
(JSC::FTL::DFG::LowerDFGToB3::setIntTypedArrayLoadResult):
(JSC::FTL::DFG::LowerDFGToB3::getIntTypedArrayStoreOperand):
(JSC::FTL::DFG::LowerDFGToB3::vmCall):

  • ftl/FTLOutput.cpp:

(JSC::FTL::Output::store):
(JSC::FTL::Output::store32As8):
(JSC::FTL::Output::store32As16):
(JSC::FTL::Output::atomicXchgAdd):
(JSC::FTL::Output::atomicXchgAnd):
(JSC::FTL::Output::atomicXchgOr):
(JSC::FTL::Output::atomicXchgSub):
(JSC::FTL::Output::atomicXchgXor):
(JSC::FTL::Output::atomicXchg):
(JSC::FTL::Output::atomicStrongCAS):

  • ftl/FTLOutput.h:

(JSC::FTL::Output::store32):
(JSC::FTL::Output::store64):
(JSC::FTL::Output::storePtr):
(JSC::FTL::Output::storeFloat):
(JSC::FTL::Output::storeDouble):

  • jit/JITOperations.h:
  • runtime/AtomicsObject.cpp:

(JSC::atomicsFuncAdd):
(JSC::atomicsFuncAnd):
(JSC::atomicsFuncCompareExchange):
(JSC::atomicsFuncExchange):
(JSC::atomicsFuncIsLockFree):
(JSC::atomicsFuncLoad):
(JSC::atomicsFuncOr):
(JSC::atomicsFuncStore):
(JSC::atomicsFuncSub):
(JSC::atomicsFuncWait):
(JSC::atomicsFuncWake):
(JSC::atomicsFuncXor):
(JSC::operationAtomicsAdd):
(JSC::operationAtomicsAnd):
(JSC::operationAtomicsCompareExchange):
(JSC::operationAtomicsExchange):
(JSC::operationAtomicsIsLockFree):
(JSC::operationAtomicsLoad):
(JSC::operationAtomicsOr):
(JSC::operationAtomicsStore):
(JSC::operationAtomicsSub):
(JSC::operationAtomicsXor):

  • runtime/AtomicsObject.h:

Source/WTF:

Made small changes as part of benchmarking the JS versions of these locks.

  • benchmarks/LockSpeedTest.cpp:
  • benchmarks/ToyLocks.h:
  • wtf/Range.h:

(WTF::Range::dump):

LayoutTests:

Add a test of futex performance.

  • workers/sab/cascade_lock-worker.js: Added.

(onmessage):

  • workers/sab/cascade_lock.html: Added.
  • workers/sab/worker-resources.js:

(cascadeLockSlow):
(cascadeLock):
(cascadeUnlock):

File:
1 edited

  • trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h

--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h	(r214384)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h	(r215565)
@@ -3024,90 +3024,90 @@
     void atomicStrongCAS8(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS8(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     void atomicStrongCAS16(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS16(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     void atomicStrongCAS32(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS32(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address, RegisterID result)
     {
-        atomicStrongCAS(cond, expectedAndResult, result, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(cond, expectedAndResult, result, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     void atomicStrongCAS8(RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS8(RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     void atomicStrongCAS16(RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS16(RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     void atomicStrongCAS32(RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
     }
 
     void atomicStrongCAS32(RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        atomicStrongCAS(expectedAndResult, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        atomicStrongCAS(expectedAndResult, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     Jump branchAtomicStrongCAS8(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base); });
     }
 
     Jump branchAtomicStrongCAS8(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgb_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     Jump branchAtomicStrongCAS16(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base); });
     }
 
     Jump branchAtomicStrongCAS16(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgw_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
     Jump branchAtomicStrongCAS32(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, Address address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base); });
     }
 
     Jump branchAtomicStrongCAS32(StatusCondition cond, RegisterID expectedAndResult, RegisterID newValue, BaseIndex address)
     {
-        return branchAtomicStrongCAS(cond, expectedAndResult, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
+        return branchAtomicStrongCAS(cond, expectedAndResult, address, [&] { m_assembler.cmpxchgl_rm(newValue, address.offset, address.base, address.index, address.scale); });
     }
 
@@ -4073,7 +4073,8 @@
     }
 
-    template<typename Func>
-    void atomicStrongCAS(StatusCondition cond, RegisterID expectedAndResult, RegisterID result, const Func& func)
-    {
+    template<typename AddressType, typename Func>
+    void atomicStrongCAS(StatusCondition cond, RegisterID expectedAndResult, RegisterID result, AddressType& address, const Func& func)
+    {
+        address = address.withSwappedRegister(X86Registers::eax, expectedAndResult);
         swap(expectedAndResult, X86Registers::eax);
         m_assembler.lock();
@@ -4083,7 +4084,8 @@
     }
 
-    template<typename Func>
-    void atomicStrongCAS(RegisterID expectedAndResult, const Func& func)
-    {
+    template<typename AddressType, typename Func>
+    void atomicStrongCAS(RegisterID expectedAndResult, AddressType& address, const Func& func)
+    {
+        address = address.withSwappedRegister(X86Registers::eax, expectedAndResult);
         swap(expectedAndResult, X86Registers::eax);
         m_assembler.lock();
@@ -4092,7 +4094,8 @@
     }
 
-    template<typename Func>
-    Jump branchAtomicStrongCAS(StatusCondition cond, RegisterID expectedAndResult, const Func& func)
-    {
+    template<typename AddressType, typename Func>
+    Jump branchAtomicStrongCAS(StatusCondition cond, RegisterID expectedAndResult, AddressType& address, const Func& func)
+    {
+        address = address.withSwappedRegister(X86Registers::eax, expectedAndResult);
         swap(expectedAndResult, X86Registers::eax);
         m_assembler.lock();