Ignore:
Timestamp:
Dec 6, 2009, 1:42:03 AM (15 years ago)
Author:
[email protected]
Message:

2009-12-05 Maciej Stachowiak <[email protected]>

Reviewed by Oliver Hunt.

conway benchmark spends half it's time in op_less (jump fusion fails)
https://bugs.webkit.org/show_bug.cgi?id=32190

<1% speedup on SunSpider and V8
2x speedup on "conway" benchmark


Two optimizations:

1) Improve codegen for logical operators &&,
and ! in a condition context


When generating code for combinations of &&,
and !, in a

condition context (i.e. in an if statement or loop condition), we
used to produce a value, and then separately jump based on its
truthiness. Now we pass the false and true targets in, and let the
logical operators generate jumps directly. This helps in four
ways:

a) Individual clauses of a short-circuit logical operator can now
jump directly to the then or else clause of an if statement (or to
the top or exit of a loop) instead of jumping to a jump.


b) It used to be that jump fusion with the condition of the first
clause of a logical operator was inhibited, because the register
was ref'd to be used later, in the actual condition jump; this no
longer happens since a jump straight to the final target is
generated directly.

c) It used to be that jump fusion with the condition of the second
clause of a logical operator was inhibited, because there was a
jump target right after the second clause and before the actual
condition jump. But now it's no longer necessary for the first
clause to jump there so jump fusion is not blocked.

d) We avoid generating excess mov statements in some cases.


As a concrete example this source:


if (!((x < q && y < q)
(t < q && z < q))) {

...

}


Used to generate this bytecode:


[ 34] less r1, r-15, r-19
[ 38] jfalse r1, 7(->45)
[ 41] less r1, r-16, r-19
[ 45] jtrue r1, 14(->59)
[ 48] less r1, r-17, r-19
[ 52] jfalse r1, 7(->59)
[ 55] less r1, r-18, r-19
[ 59] jtrue r1, 17(->76)


And now generates this bytecode (also taking advantage of the second optimization below):


[ 34] jnless r-15, r-19, 8(->42)
[ 38] jless r-16, r-19, 26(->64)
[ 42] jnless r-17, r-19, 8(->50)
[ 46] jless r-18, r-19, 18(->64)


Note the jump fusion and the fact that there's less jump
indirection - three of the four jumps go straight to the target
clause instead of indirecting through another jump.


2) Implement jless opcode to take advantage of the above, since we'll now often generate
a less followed by a jtrue where fusion is not forbidden.


  • parser/Nodes.h: (JSC::ExpressionNode::hasConditionContextCodegen): Helper function to determine whether a node supports special conditional codegen. Return false as this is the default. (JSC::ExpressionNode::emitBytecodeInConditionContext): Assert not reached - only really defined for nodes that do have conditional codegen. (JSC::UnaryOpNode::expr): Add const version. (JSC::LogicalNotNode::hasConditionContextCodegen): Returne true only if subexpression supports it. (JSC::LogicalOpNode::hasConditionContextCodegen): Return true.
  • parser/Nodes.cpp: (JSC::LogicalNotNode::emitBytecodeInConditionContext): Implemented - just swap the true and false targets for the child node. (JSC::LogicalOpNode::emitBytecodeInConditionContext): Implemented - handle jumps directly, improving codegen quality. Also handles further nested conditional codegen. (JSC::ConditionalNode::emitBytecode): Use condition context codegen when available. (JSC::IfNode::emitBytecode): ditto (JSC::IfElseNode::emitBytecode): ditto (JSC::DoWhileNode::emitBytecode): ditto (JSC::WhileNode::emitBytecode): ditto (JSC::ForNode::emitBytecode): ditto
  • bytecode/Opcode.h:
  • Added loop_if_false opcode - needed now that falsey jumps can be backwards.
  • Added jless opcode to take advantage of new fusion opportunities.
  • bytecode/CodeBlock.cpp: (JSC::CodeBlock::dump): Handle above.
  • bytecompiler/BytecodeGenerator.cpp: (JSC::BytecodeGenerator::emitJumpIfTrue): Add peephole for less + jtrue ==> jless. (JSC::BytecodeGenerator::emitJumpIfFalse): Add handling of backwrds falsey jumps.
  • bytecompiler/BytecodeGenerator.h: (JSC::BytecodeGenerator::emitNodeInConditionContext): Wrapper to handle tracking of overly deep expressions etc.
  • interpreter/Interpreter.cpp: (JSC::Interpreter::privateExecute): Implement the two new opcodes (loop_if_false, jless).
  • jit/JIT.cpp: (JSC::JIT::privateCompileMainPass): Implement JIT support for the two new opcodes. (JSC::JIT::privateCompileSlowCases): ditto
  • jit/JIT.h:
  • jit/JITArithmetic.cpp: (JSC::JIT::emit_op_jless): (JSC::JIT::emitSlow_op_jless): ditto (JSC::JIT::emitBinaryDoubleOp): ditto
  • jit/JITOpcodes.cpp: (JSC::JIT::emitSlow_op_loop_if_less): ditto (JSC::JIT::emit_op_loop_if_false): ditto (JSC::JIT::emitSlow_op_loop_if_false): ditto
  • jit/JITStubs.cpp:
  • jit/JITStubs.h: (JSC::):

2009-12-05 Maciej Stachowiak <[email protected]>

Reviewed by Oliver Hunt.

conway benchmark spends half it's time in op_less (jump fusion fails)
https://bugs.webkit.org/show_bug.cgi?id=32190

  • fast/js/codegen-loops-logical-nodes-expected.txt:
  • fast/js/script-tests/codegen-loops-logical-nodes.js: Update to test some newly sensitive cases of codegen that were not already covered.
File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/JavaScriptCore/jit/JITArithmetic.cpp

    r50599 r51735  
    147147    stubCall.call();
    148148    emitJumpSlowToHot(branchTest32(Zero, regT0), target);
     149}
     150
     151void JIT::emit_op_jless(Instruction* currentInstruction)
     152{
     153    unsigned op1 = currentInstruction[1].u.operand;
     154    unsigned op2 = currentInstruction[2].u.operand;
     155    unsigned target = currentInstruction[3].u.operand;
     156
     157    JumpList notInt32Op1;
     158    JumpList notInt32Op2;
     159
     160    // Int32 less.
     161    if (isOperandConstantImmediateInt(op1)) {
     162        emitLoad(op2, regT3, regT2);
     163        notInt32Op2.append(branch32(NotEqual, regT3, Imm32(JSValue::Int32Tag)));
     164        addJump(branch32(GreaterThan, regT2, Imm32(getConstantOperand(op1).asInt32())), target);
     165    } else if (isOperandConstantImmediateInt(op2)) {
     166        emitLoad(op1, regT1, regT0);
     167        notInt32Op1.append(branch32(NotEqual, regT1, Imm32(JSValue::Int32Tag)));
     168        addJump(branch32(LessThan, regT0, Imm32(getConstantOperand(op2).asInt32())), target);
     169    } else {
     170        emitLoad2(op1, regT1, regT0, op2, regT3, regT2);
     171        notInt32Op1.append(branch32(NotEqual, regT1, Imm32(JSValue::Int32Tag)));
     172        notInt32Op2.append(branch32(NotEqual, regT3, Imm32(JSValue::Int32Tag)));
     173        addJump(branch32(LessThan, regT0, regT2), target);
     174    }
     175
     176    if (!supportsFloatingPoint()) {
     177        addSlowCase(notInt32Op1);
     178        addSlowCase(notInt32Op2);
     179        return;
     180    }
     181    Jump end = jump();
     182
     183    // Double less.
     184    emitBinaryDoubleOp(op_jless, target, op1, op2, OperandTypes(), notInt32Op1, notInt32Op2, !isOperandConstantImmediateInt(op1), isOperandConstantImmediateInt(op1) || !isOperandConstantImmediateInt(op2));
     185    end.link(this);
     186}
     187
     188void JIT::emitSlow_op_jless(Instruction* currentInstruction, Vector<SlowCaseEntry>::iterator& iter)
     189{
     190    unsigned op1 = currentInstruction[1].u.operand;
     191    unsigned op2 = currentInstruction[2].u.operand;
     192    unsigned target = currentInstruction[3].u.operand;
     193
     194    if (!supportsFloatingPoint()) {
     195        if (!isOperandConstantImmediateInt(op1) && !isOperandConstantImmediateInt(op2))
     196            linkSlowCase(iter); // int32 check
     197        linkSlowCase(iter); // int32 check
     198    } else {
     199        if (!isOperandConstantImmediateInt(op1)) {
     200            linkSlowCase(iter); // double check
     201            linkSlowCase(iter); // int32 check
     202        }
     203        if (isOperandConstantImmediateInt(op1) || !isOperandConstantImmediateInt(op2))
     204            linkSlowCase(iter); // double check
     205    }
     206
     207    JITStubCall stubCall(this, cti_op_jless);
     208    stubCall.addArgument(op1);
     209    stubCall.addArgument(op2);
     210    stubCall.call();
     211    emitJumpSlowToHot(branchTest32(NonZero, regT0), target);
    149212}
    150213
     
    832895                addJump(branchDouble(DoubleLessThanOrEqualOrUnordered, fpRegT0, fpRegT2), dst);
    833896                break;
     897            case op_jless:
     898                emitLoadDouble(op1, fpRegT2);
     899                addJump(branchDouble(DoubleLessThan, fpRegT2, fpRegT0), dst);
     900                break;
    834901            case op_jnlesseq:
    835902                emitLoadDouble(op1, fpRegT2);
     
    885952                addJump(branchDouble(DoubleLessThanOrEqualOrUnordered, fpRegT1, fpRegT0), dst);
    886953                break;
     954            case op_jless:
     955                emitLoadDouble(op2, fpRegT1);
     956                addJump(branchDouble(DoubleLessThan, fpRegT0, fpRegT1), dst);
     957                break;
    887958            case op_jnlesseq:
    888959                emitLoadDouble(op2, fpRegT1);
     
    14531524        stubCall.call();
    14541525        emitJumpSlowToHot(branchTest32(Zero, regT0), target);
     1526    }
     1527}
     1528
     1529void JIT::emit_op_jless(Instruction* currentInstruction)
     1530{
     1531    unsigned op1 = currentInstruction[1].u.operand;
     1532    unsigned op2 = currentInstruction[2].u.operand;
     1533    unsigned target = currentInstruction[3].u.operand;
     1534
     1535    // We generate inline code for the following cases in the fast path:
     1536    // - int immediate to constant int immediate
     1537    // - constant int immediate to int immediate
     1538    // - int immediate to int immediate
     1539
     1540    if (isOperandConstantImmediateInt(op2)) {
     1541        emitGetVirtualRegister(op1, regT0);
     1542        emitJumpSlowCaseIfNotImmediateInteger(regT0);
     1543#if USE(JSVALUE64)
     1544        int32_t op2imm = getConstantOperandImmediateInt(op2);
     1545#else
     1546        int32_t op2imm = static_cast<int32_t>(JSImmediate::rawValue(getConstantOperand(op2)));
     1547#endif
     1548        addJump(branch32(LessThan, regT0, Imm32(op2imm)), target);
     1549    } else if (isOperandConstantImmediateInt(op1)) {
     1550        emitGetVirtualRegister(op2, regT1);
     1551        emitJumpSlowCaseIfNotImmediateInteger(regT1);
     1552#if USE(JSVALUE64)
     1553        int32_t op1imm = getConstantOperandImmediateInt(op1);
     1554#else
     1555        int32_t op1imm = static_cast<int32_t>(JSImmediate::rawValue(getConstantOperand(op1)));
     1556#endif
     1557        addJump(branch32(GreaterThan, regT1, Imm32(op1imm)), target);
     1558    } else {
     1559        emitGetVirtualRegisters(op1, regT0, op2, regT1);
     1560        emitJumpSlowCaseIfNotImmediateInteger(regT0);
     1561        emitJumpSlowCaseIfNotImmediateInteger(regT1);
     1562
     1563        addJump(branch32(LessThan, regT0, regT1), target);
     1564    }
     1565}
     1566
     1567void JIT::emitSlow_op_jless(Instruction* currentInstruction, Vector<SlowCaseEntry>::iterator& iter)
     1568{
     1569    unsigned op1 = currentInstruction[1].u.operand;
     1570    unsigned op2 = currentInstruction[2].u.operand;
     1571    unsigned target = currentInstruction[3].u.operand;
     1572
     1573    // We generate inline code for the following cases in the slow path:
     1574    // - floating-point number to constant int immediate
     1575    // - constant int immediate to floating-point number
     1576    // - floating-point number to floating-point number.
     1577
     1578    if (isOperandConstantImmediateInt(op2)) {
     1579        linkSlowCase(iter);
     1580
     1581        if (supportsFloatingPoint()) {
     1582#if USE(JSVALUE64)
     1583            Jump fail1 = emitJumpIfNotImmediateNumber(regT0);
     1584            addPtr(tagTypeNumberRegister, regT0);
     1585            movePtrToDouble(regT0, fpRegT0);
     1586#else
     1587            Jump fail1;
     1588            if (!m_codeBlock->isKnownNotImmediate(op1))
     1589                fail1 = emitJumpIfNotJSCell(regT0);
     1590
     1591            Jump fail2 = checkStructure(regT0, m_globalData->numberStructure.get());
     1592            loadDouble(Address(regT0, OBJECT_OFFSETOF(JSNumberCell, m_value)), fpRegT0);
     1593#endif
     1594           
     1595            int32_t op2imm = getConstantOperand(op2).asInt32();
     1596                   
     1597            move(Imm32(op2imm), regT1);
     1598            convertInt32ToDouble(regT1, fpRegT1);
     1599
     1600            emitJumpSlowToHot(branchDouble(DoubleLessThan, fpRegT0, fpRegT1), target);
     1601
     1602            emitJumpSlowToHot(jump(), OPCODE_LENGTH(op_jnless));
     1603
     1604#if USE(JSVALUE64)
     1605            fail1.link(this);
     1606#else
     1607            if (!m_codeBlock->isKnownNotImmediate(op1))
     1608                fail1.link(this);
     1609            fail2.link(this);
     1610#endif
     1611        }
     1612
     1613        JITStubCall stubCall(this, cti_op_jless);
     1614        stubCall.addArgument(regT0);
     1615        stubCall.addArgument(op2, regT2);
     1616        stubCall.call();
     1617        emitJumpSlowToHot(branchTest32(Zero, regT0), target);
     1618
     1619    } else if (isOperandConstantImmediateInt(op1)) {
     1620        linkSlowCase(iter);
     1621
     1622        if (supportsFloatingPoint()) {
     1623#if USE(JSVALUE64)
     1624            Jump fail1 = emitJumpIfNotImmediateNumber(regT1);
     1625            addPtr(tagTypeNumberRegister, regT1);
     1626            movePtrToDouble(regT1, fpRegT1);
     1627#else
     1628            Jump fail1;
     1629            if (!m_codeBlock->isKnownNotImmediate(op2))
     1630                fail1 = emitJumpIfNotJSCell(regT1);
     1631           
     1632            Jump fail2 = checkStructure(regT1, m_globalData->numberStructure.get());
     1633            loadDouble(Address(regT1, OBJECT_OFFSETOF(JSNumberCell, m_value)), fpRegT1);
     1634#endif
     1635           
     1636            int32_t op1imm = getConstantOperand(op1).asInt32();
     1637                   
     1638            move(Imm32(op1imm), regT0);
     1639            convertInt32ToDouble(regT0, fpRegT0);
     1640
     1641            emitJumpSlowToHot(branchDouble(DoubleLessThan, fpRegT0, fpRegT1), target);
     1642
     1643            emitJumpSlowToHot(jump(), OPCODE_LENGTH(op_jnless));
     1644
     1645#if USE(JSVALUE64)
     1646            fail1.link(this);
     1647#else
     1648            if (!m_codeBlock->isKnownNotImmediate(op2))
     1649                fail1.link(this);
     1650            fail2.link(this);
     1651#endif
     1652        }
     1653
     1654        JITStubCall stubCall(this, cti_op_jless);
     1655        stubCall.addArgument(op1, regT2);
     1656        stubCall.addArgument(regT1);
     1657        stubCall.call();
     1658        emitJumpSlowToHot(branchTest32(Zero, regT0), target);
     1659
     1660    } else {
     1661        linkSlowCase(iter);
     1662
     1663        if (supportsFloatingPoint()) {
     1664#if USE(JSVALUE64)
     1665            Jump fail1 = emitJumpIfNotImmediateNumber(regT0);
     1666            Jump fail2 = emitJumpIfNotImmediateNumber(regT1);
     1667            Jump fail3 = emitJumpIfImmediateInteger(regT1);
     1668            addPtr(tagTypeNumberRegister, regT0);
     1669            addPtr(tagTypeNumberRegister, regT1);
     1670            movePtrToDouble(regT0, fpRegT0);
     1671            movePtrToDouble(regT1, fpRegT1);
     1672#else
     1673            Jump fail1;
     1674            if (!m_codeBlock->isKnownNotImmediate(op1))
     1675                fail1 = emitJumpIfNotJSCell(regT0);
     1676
     1677            Jump fail2;
     1678            if (!m_codeBlock->isKnownNotImmediate(op2))
     1679                fail2 = emitJumpIfNotJSCell(regT1);
     1680
     1681            Jump fail3 = checkStructure(regT0, m_globalData->numberStructure.get());
     1682            Jump fail4 = checkStructure(regT1, m_globalData->numberStructure.get());
     1683            loadDouble(Address(regT0, OBJECT_OFFSETOF(JSNumberCell, m_value)), fpRegT0);
     1684            loadDouble(Address(regT1, OBJECT_OFFSETOF(JSNumberCell, m_value)), fpRegT1);
     1685#endif
     1686
     1687            emitJumpSlowToHot(branchDouble(DoubleLessThan, fpRegT0, fpRegT1), target);
     1688
     1689            emitJumpSlowToHot(jump(), OPCODE_LENGTH(op_jnless));
     1690
     1691#if USE(JSVALUE64)
     1692            fail1.link(this);
     1693            fail2.link(this);
     1694            fail3.link(this);
     1695#else
     1696            if (!m_codeBlock->isKnownNotImmediate(op1))
     1697                fail1.link(this);
     1698            if (!m_codeBlock->isKnownNotImmediate(op2))
     1699                fail2.link(this);
     1700            fail3.link(this);
     1701            fail4.link(this);
     1702#endif
     1703        }
     1704
     1705        linkSlowCase(iter);
     1706        JITStubCall stubCall(this, cti_op_jless);
     1707        stubCall.addArgument(regT0);
     1708        stubCall.addArgument(regT1);
     1709        stubCall.call();
     1710        emitJumpSlowToHot(branchTest32(NotZero, regT0), target);
    14551711    }
    14561712}
Note: See TracChangeset for help on using the changeset viewer.