Timestamp:
Apr 20, 2017, 10:55:44 AM (8 years ago)
Author:
[email protected]
Message:

Optimize SharedArrayBuffer in the DFG+FTL
https://bugs.webkit.org/show_bug.cgi?id=164108

Reviewed by Saam Barati.

JSTests:

Added a fairly comprehensive test of the intrinsics. This creates a function for each possible
combination of type and operation, and then first uses it nicely and then tries a bunch of
erroneous conditions like OOB.

  • stress/SharedArrayBuffer-opt.js: Added.

(string_appeared_here.switch):
(string_appeared_here.str):
(runAtomic):
(shouldFail):
(Symbol):
(string_appeared_here.a.of.arrays.m.of.atomics):

  • stress/SharedArrayBuffer.js:

Source/JavaScriptCore:

This adds atomics intrinsics to the DFG and wires them through to the DFG and FTL backends. This
was super easy in the FTL since B3 already has comprehensive atomic intrinsics, which are more
powerful than what we need right now. In the DFG backend, I went with an easy-to-write
implementation that just reduces everything to a weak CAS loop. It's very inefficient with
registers (it needs ~8) but it's the DFG backend, so it's not obvious how much we care.
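
To illustrate what reducing everything to a weak CAS loop means, here is a minimal C++ sketch. It is not the DFG code: it uses std::atomic instead of MacroAssembler instructions, it only handles 32-bit unsigned cells, and the helper names (atomicReadModifyWrite, atomicAdd, atomicXor, atomicExchange) are hypothetical. The point is that one retry loop is enough to express several read-modify-write operations.

```cpp
#include <atomic>
#include <cstdint>

// Generic read-modify-write built from a weak compare-and-swap loop.
// Returns the value the cell held before the update, which is also what
// the JavaScript Atomics read-modify-write functions return.
template<typename Op>
uint32_t atomicReadModifyWrite(std::atomic<uint32_t>& cell, uint32_t operand, Op op)
{
    uint32_t oldValue = cell.load();
    // compare_exchange_weak can fail spuriously or because another thread
    // changed the cell. In both cases it refreshes oldValue, so the new
    // value is recomputed and the CAS is retried.
    while (!cell.compare_exchange_weak(oldValue, op(oldValue, operand))) { }
    return oldValue;
}

// Each operation only differs in how the new value is computed.
uint32_t atomicAdd(std::atomic<uint32_t>& cell, uint32_t operand)
{
    return atomicReadModifyWrite(cell, operand, [](uint32_t a, uint32_t b) { return a + b; });
}

uint32_t atomicXor(std::atomic<uint32_t>& cell, uint32_t operand)
{
    return atomicReadModifyWrite(cell, operand, [](uint32_t a, uint32_t b) { return a ^ b; });
}

uint32_t atomicExchange(std::atomic<uint32_t>& cell, uint32_t operand)
{
    // Exchange ignores the old value when computing the new one.
    return atomicReadModifyWrite(cell, operand, [](uint32_t, uint32_t b) { return b; });
}
```

With a cell holding 5, atomicAdd(cell, 3) returns 5 and leaves 8 in the cell. The trade-off of this shape is simplicity over efficiency: every operation pays for a load, a computation, and a CAS, even where the hardware has a dedicated instruction.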

To make the rare cases easy to handle, I refactored AtomicsObject.cpp so that the operations for
the slow paths can share code with the native functions.
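
The general shape of that kind of sharing can be sketched in C++ as follows. This is an illustration under stated assumptions, not the code in AtomicsObject.cpp: SharedInt32Array, atomicsCore, nativeEntryAdd and slowPathEntryAdd are hypothetical names, and a bool return value stands in for exception handling. One templated core does the bounds check and the atomic access, and two thin entry points differ only in how their arguments arrive.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an Int32Array over shared memory.
struct SharedInt32Array {
    std::atomic<int32_t>* cells;
    size_t length;
};

// Shared core: bounds-check the index, then apply func to the cell.
// Returns false for an out-of-bounds index (modelling the error case).
template<typename Func>
bool atomicsCore(SharedInt32Array array, size_t index, const Func& func, int32_t& result)
{
    if (index >= array.length)
        return false;
    result = func(array.cells[index]);
    return true;
}

// "Native function" style entry: arguments arrive as a generic list and
// are unpacked here before reaching the shared core.
bool nativeEntryAdd(SharedInt32Array array, const std::vector<int64_t>& args, int32_t& result)
{
    if (args.size() < 2 || args[0] < 0)
        return false;
    int32_t operand = static_cast<int32_t>(args[1]);
    return atomicsCore(array, static_cast<size_t>(args[0]), [&](std::atomic<int32_t>& cell) {
        return cell.fetch_add(operand);
    }, result);
}

// "Slow path" style entry: the caller passes already-unpacked arguments,
// but the actual work goes through the same core.
bool slowPathEntryAdd(SharedInt32Array array, size_t index, int32_t operand, int32_t& result)
{
    return atomicsCore(array, index, [&](std::atomic<int32_t>& cell) {
        return cell.fetch_add(operand);
    }, result);
}
```

Keeping the check-and-access logic in one place means the rare cases behave the same whichever entry point reaches them.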

This also fixes register handling in the X86 implementations of CAS, in the case that
expectedAndResult is not %rax. This also fixes the ARM64 implementation of branchWeakCAS.

I adapted the CascadeLock from WTF/benchmarks/ToyLocks.h as a microbenchmark of lock performance.
This benchmark performs 2.5x faster, in both the contended and uncontended case, thanks to this
change. It's still about 3x slower than native. I investigated this only a bit. I suspect that
the story will be different in asm.js code, which will get constant-folding of the typed array
backing store by virtue of how it uses lexically scoped variables as pointers to the heap arrays.
It's worth noting that the native lock I was comparing against, the very nicely-tuned
CascadeLock, is at the very high end of lock throughput under virtually all conditions
(uncontended, microcontended, held for a long time). I also compared to WTF::Lock and others, and
the only ones that performed better in this microbenchmark were spinlocks. I don't recommend
using those. So, when I say this is 3x slower than native, I really mean that it's 3x slower than
the fastest native lock that I have in my arsenal.

Also worth noting is that I experimented with exposing Atomics.yield(), which uses sched_yield,
as a way of testing if adding a yield loop to the JS cascadeLock would help. It does not help. I
did not investigate why.

  • assembler/AbstractMacroAssembler.h:

(JSC::AbstractMacroAssembler::JumpList::append):

  • assembler/CPU.h:

(JSC::is64Bit):
(JSC::is32Bit):

  • b3/B3Common.h:

(JSC::B3::is64Bit): Deleted.
(JSC::B3::is32Bit): Deleted.

  • b3/B3LowerToAir.cpp:

(JSC::B3::Air::LowerToAir::appendTrapping):
(JSC::B3::Air::LowerToAir::appendCAS):
(JSC::B3::Air::LowerToAir::appendGeneralAtomic):

  • dfg/DFGAbstractInterpreterInlines.h:

(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):

  • dfg/DFGByteCodeParser.cpp:

(JSC::DFG::ByteCodeParser::handleIntrinsicCall):

  • dfg/DFGClobberize.h:

(JSC::DFG::clobberize):

  • dfg/DFGDoesGC.cpp:

(JSC::DFG::doesGC):

  • dfg/DFGFixupPhase.cpp:

(JSC::DFG::FixupPhase::fixupNode):

  • dfg/DFGNode.h:

(JSC::DFG::Node::hasHeapPrediction):
(JSC::DFG::Node::hasArrayMode):

  • dfg/DFGNodeType.h:

(JSC::DFG::isAtomicsIntrinsic):
(JSC::DFG::numExtraAtomicsArgs):

  • dfg/DFGPredictionPropagationPhase.cpp:
  • dfg/DFGSSALoweringPhase.cpp:

(JSC::DFG::SSALoweringPhase::handleNode):

  • dfg/DFGSafeToExecute.h:

(JSC::DFG::safeToExecute):

  • dfg/DFGSpeculativeJIT.cpp:

(JSC::DFG::SpeculativeJIT::loadFromIntTypedArray):
(JSC::DFG::SpeculativeJIT::setIntTypedArrayLoadResult):
(JSC::DFG::SpeculativeJIT::compileGetByValOnIntTypedArray):
(JSC::DFG::SpeculativeJIT::getIntTypedArrayStoreOperand):
(JSC::DFG::SpeculativeJIT::compilePutByValForIntTypedArray):

  • dfg/DFGSpeculativeJIT.h:

(JSC::DFG::SpeculativeJIT::callOperation):

  • dfg/DFGSpeculativeJIT32_64.cpp:

(JSC::DFG::SpeculativeJIT::compile):

  • dfg/DFGSpeculativeJIT64.cpp:

(JSC::DFG::SpeculativeJIT::compile):

  • ftl/FTLAbstractHeapRepository.cpp:

(JSC::FTL::AbstractHeapRepository::decorateFencedAccess):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):

  • ftl/FTLAbstractHeapRepository.h:
  • ftl/FTLCapabilities.cpp:

(JSC::FTL::canCompile):

  • ftl/FTLLowerDFGToB3.cpp:

(JSC::FTL::DFG::LowerDFGToB3::compileNode):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsReadModifyWrite):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsIsLockFree):
(JSC::FTL::DFG::LowerDFGToB3::compileGetByVal):
(JSC::FTL::DFG::LowerDFGToB3::compilePutByVal):
(JSC::FTL::DFG::LowerDFGToB3::pointerIntoTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::loadFromIntTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::storeType):
(JSC::FTL::DFG::LowerDFGToB3::setIntTypedArrayLoadResult):
(JSC::FTL::DFG::LowerDFGToB3::getIntTypedArrayStoreOperand):
(JSC::FTL::DFG::LowerDFGToB3::vmCall):

  • ftl/FTLOutput.cpp:

(JSC::FTL::Output::store):
(JSC::FTL::Output::store32As8):
(JSC::FTL::Output::store32As16):
(JSC::FTL::Output::atomicXchgAdd):
(JSC::FTL::Output::atomicXchgAnd):
(JSC::FTL::Output::atomicXchgOr):
(JSC::FTL::Output::atomicXchgSub):
(JSC::FTL::Output::atomicXchgXor):
(JSC::FTL::Output::atomicXchg):
(JSC::FTL::Output::atomicStrongCAS):

  • ftl/FTLOutput.h:

(JSC::FTL::Output::store32):
(JSC::FTL::Output::store64):
(JSC::FTL::Output::storePtr):
(JSC::FTL::Output::storeFloat):
(JSC::FTL::Output::storeDouble):

  • jit/JITOperations.h:
  • runtime/AtomicsObject.cpp:

(JSC::atomicsFuncAdd):
(JSC::atomicsFuncAnd):
(JSC::atomicsFuncCompareExchange):
(JSC::atomicsFuncExchange):
(JSC::atomicsFuncIsLockFree):
(JSC::atomicsFuncLoad):
(JSC::atomicsFuncOr):
(JSC::atomicsFuncStore):
(JSC::atomicsFuncSub):
(JSC::atomicsFuncWait):
(JSC::atomicsFuncWake):
(JSC::atomicsFuncXor):
(JSC::operationAtomicsAdd):
(JSC::operationAtomicsAnd):
(JSC::operationAtomicsCompareExchange):
(JSC::operationAtomicsExchange):
(JSC::operationAtomicsIsLockFree):
(JSC::operationAtomicsLoad):
(JSC::operationAtomicsOr):
(JSC::operationAtomicsStore):
(JSC::operationAtomicsSub):
(JSC::operationAtomicsXor):

  • runtime/AtomicsObject.h:

Source/WTF:

Made small changes as part of benchmarking the JS versions of these locks.

  • benchmarks/LockSpeedTest.cpp:
  • benchmarks/ToyLocks.h:
  • wtf/Range.h:

(WTF::Range::dump):

LayoutTests:

Add a test of futex performance.

  • workers/sab/cascade_lock-worker.js: Added.

(onmessage):

  • workers/sab/cascade_lock.html: Added.
  • workers/sab/worker-resources.js:

(cascadeLockSlow):
(cascadeLock):
(cascadeUnlock):

File:
1 edited

  • trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp

--- trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp (r211670)
+++ trunk/Source/JavaScriptCore/ftl/FTLOutput.cpp (r215565)
@@ -1,4 +1,4 @@
 /*
- * Copyright (C) 2013-2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2013-2017 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -30,4 +30,5 @@
 
 #include "B3ArgumentRegValue.h"
+#include "B3AtomicValue.h"
 #include "B3BasicBlockInlines.h"
 #include "B3CCallValue.h"
@@ -441,8 +442,9 @@
 }
 
-void Output::store(LValue value, TypedPointer pointer)
+LValue Output::store(LValue value, TypedPointer pointer)
 {
     LValue store = m_block->appendNew<MemoryValue>(m_proc, Store, origin(), value, pointer.value());
     m_heaps->decorateMemory(pointer.heap(), store);
+    return store;
 }
 
@@ -455,14 +457,16 @@
 }
 
-void Output::store32As8(LValue value, TypedPointer pointer)
+LValue Output::store32As8(LValue value, TypedPointer pointer)
 {
     LValue store = m_block->appendNew<MemoryValue>(m_proc, Store8, origin(), value, pointer.value());
     m_heaps->decorateMemory(pointer.heap(), store);
-}
-
-void Output::store32As16(LValue value, TypedPointer pointer)
+    return store;
+}
+
+LValue Output::store32As16(LValue value, TypedPointer pointer)
 {
     LValue store = m_block->appendNew<MemoryValue>(m_proc, Store16, origin(), value, pointer.value());
     m_heaps->decorateMemory(pointer.heap(), store);
+    return store;
 }
 
@@ -664,4 +668,53 @@
 }
 
+LValue Output::atomicXchgAdd(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchgAdd, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicXchgAnd(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchgAnd, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicXchgOr(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchgOr, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicXchgSub(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchgSub, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicXchgXor(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchgXor, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicXchg(LValue operand, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicXchg, origin(), width, operand, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
+LValue Output::atomicStrongCAS(LValue expected, LValue newValue, TypedPointer pointer, Width width)
+{
+    LValue result = m_block->appendNew<AtomicValue>(m_proc, AtomicStrongCAS, origin(), width, expected, newValue, pointer.value(), 0, HeapRange(), HeapRange());
+    m_heaps->decorateMemory(pointer.heap(), result);
+    return result;
+}
+
 void Output::jump(LBasicBlock destination)
 {
@@ -777,30 +830,24 @@
 }
 
-void Output::store(LValue value, TypedPointer pointer, StoreType type)
+LValue Output::store(LValue value, TypedPointer pointer, StoreType type)
 {
     switch (type) {
     case Store32As8:
-        store32As8(value, pointer);
-        return;
+        return store32As8(value, pointer);
     case Store32As16:
-        store32As16(value, pointer);
-        return;
+        return store32As16(value, pointer);
     case Store32:
-        store32(value, pointer);
-        return;
+        return store32(value, pointer);
     case Store64:
-        store64(value, pointer);
-        return;
+        return store64(value, pointer);
     case StorePtr:
-        storePtr(value, pointer);
-        return;
+        return storePtr(value, pointer);
     case StoreFloat:
-        storeFloat(value, pointer);
-        return;
+        return storeFloat(value, pointer);
     case StoreDouble:
-        storeDouble(value, pointer);
-        return;
+        return storeDouble(value, pointer);
     }
     RELEASE_ASSERT_NOT_REACHED();
+    return nullptr;
 }
 