Ignore:
Timestamp:
Dec 6, 2009, 1:42:03 AM (15 years ago)
Author:
[email protected]
Message:

2009-12-05 Maciej Stachowiak <[email protected]>

Reviewed by Oliver Hunt.

conway benchmark spends half it's time in op_less (jump fusion fails)
https://bugs.webkit.org/show_bug.cgi?id=32190

<1% speedup on SunSpider and V8
2x speedup on "conway" benchmark


Two optimizations:

1) Improve codegen for logical operators &&,
and ! in a condition context


When generating code for combinations of &&,
and !, in a

condition context (i.e. in an if statement or loop condition), we
used to produce a value, and then separately jump based on its
truthiness. Now we pass the false and true targets in, and let the
logical operators generate jumps directly. This helps in four
ways:

a) Individual clauses of a short-circuit logical operator can now
jump directly to the then or else clause of an if statement (or to
the top or exit of a loop) instead of jumping to a jump.


b) It used to be that jump fusion with the condition of the first
clause of a logical operator was inhibited, because the register
was ref'd to be used later, in the actual condition jump; this no
longer happens since a jump straight to the final target is
generated directly.

c) It used to be that jump fusion with the condition of the second
clause of a logical operator was inhibited, because there was a
jump target right after the second clause and before the actual
condition jump. But now it's no longer necessary for the first
clause to jump there so jump fusion is not blocked.

d) We avoid generating excess mov statements in some cases.


As a concrete example this source:


if (!((x < q && y < q)
(t < q && z < q))) {

...

}


Used to generate this bytecode:


[ 34] less r1, r-15, r-19
[ 38] jfalse r1, 7(->45)
[ 41] less r1, r-16, r-19
[ 45] jtrue r1, 14(->59)
[ 48] less r1, r-17, r-19
[ 52] jfalse r1, 7(->59)
[ 55] less r1, r-18, r-19
[ 59] jtrue r1, 17(->76)


And now generates this bytecode (also taking advantage of the second optimization below):


[ 34] jnless r-15, r-19, 8(->42)
[ 38] jless r-16, r-19, 26(->64)
[ 42] jnless r-17, r-19, 8(->50)
[ 46] jless r-18, r-19, 18(->64)


Note the jump fusion and the fact that there's less jump
indirection - three of the four jumps go straight to the target
clause instead of indirecting through another jump.


2) Implement jless opcode to take advantage of the above, since we'll now often generate
a less followed by a jtrue where fusion is not forbidden.


  • parser/Nodes.h: (JSC::ExpressionNode::hasConditionContextCodegen): Helper function to determine whether a node supports special conditional codegen. Return false as this is the default. (JSC::ExpressionNode::emitBytecodeInConditionContext): Assert not reached - only really defined for nodes that do have conditional codegen. (JSC::UnaryOpNode::expr): Add const version. (JSC::LogicalNotNode::hasConditionContextCodegen): Returne true only if subexpression supports it. (JSC::LogicalOpNode::hasConditionContextCodegen): Return true.
  • parser/Nodes.cpp: (JSC::LogicalNotNode::emitBytecodeInConditionContext): Implemented - just swap the true and false targets for the child node. (JSC::LogicalOpNode::emitBytecodeInConditionContext): Implemented - handle jumps directly, improving codegen quality. Also handles further nested conditional codegen. (JSC::ConditionalNode::emitBytecode): Use condition context codegen when available. (JSC::IfNode::emitBytecode): ditto (JSC::IfElseNode::emitBytecode): ditto (JSC::DoWhileNode::emitBytecode): ditto (JSC::WhileNode::emitBytecode): ditto (JSC::ForNode::emitBytecode): ditto
  • bytecode/Opcode.h:
  • Added loop_if_false opcode - needed now that falsey jumps can be backwards.
  • Added jless opcode to take advantage of new fusion opportunities.
  • bytecode/CodeBlock.cpp: (JSC::CodeBlock::dump): Handle above.
  • bytecompiler/BytecodeGenerator.cpp: (JSC::BytecodeGenerator::emitJumpIfTrue): Add peephole for less + jtrue ==> jless. (JSC::BytecodeGenerator::emitJumpIfFalse): Add handling of backwrds falsey jumps.
  • bytecompiler/BytecodeGenerator.h: (JSC::BytecodeGenerator::emitNodeInConditionContext): Wrapper to handle tracking of overly deep expressions etc.
  • interpreter/Interpreter.cpp: (JSC::Interpreter::privateExecute): Implement the two new opcodes (loop_if_false, jless).
  • jit/JIT.cpp: (JSC::JIT::privateCompileMainPass): Implement JIT support for the two new opcodes. (JSC::JIT::privateCompileSlowCases): ditto
  • jit/JIT.h:
  • jit/JITArithmetic.cpp: (JSC::JIT::emit_op_jless): (JSC::JIT::emitSlow_op_jless): ditto (JSC::JIT::emitBinaryDoubleOp): ditto
  • jit/JITOpcodes.cpp: (JSC::JIT::emitSlow_op_loop_if_less): ditto (JSC::JIT::emit_op_loop_if_false): ditto (JSC::JIT::emitSlow_op_loop_if_false): ditto
  • jit/JITStubs.cpp:
  • jit/JITStubs.h: (JSC::):

2009-12-05 Maciej Stachowiak <[email protected]>

Reviewed by Oliver Hunt.

conway benchmark spends half it's time in op_less (jump fusion fails)
https://bugs.webkit.org/show_bug.cgi?id=32190

  • fast/js/codegen-loops-logical-nodes-expected.txt:
  • fast/js/script-tests/codegen-loops-logical-nodes.js: Update to test some newly sensitive cases of codegen that were not already covered.
File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/JavaScriptCore/interpreter/Interpreter.cpp

    r51671 r51735  
    26482648        NEXT_INSTRUCTION();
    26492649    }
     2650    DEFINE_OPCODE(op_loop_if_false) {
     2651        /* loop_if_true cond(r) target(offset)
     2652         
     2653           Jumps to offset target from the current instruction, if and
     2654           only if register cond converts to boolean as false.
     2655
     2656           Additionally this loop instruction may terminate JS execution is
     2657           the JS timeout is reached.
     2658         */
     2659        int cond = vPC[1].u.operand;
     2660        int target = vPC[2].u.operand;
     2661        if (!callFrame->r(cond).jsValue().toBoolean(callFrame)) {
     2662            vPC += target;
     2663            CHECK_FOR_TIMEOUT();
     2664            NEXT_INSTRUCTION();
     2665        }
     2666       
     2667        vPC += OPCODE_LENGTH(op_loop_if_true);
     2668        NEXT_INSTRUCTION();
     2669    }
    26502670    DEFINE_OPCODE(op_jtrue) {
    26512671        /* jtrue cond(r) target(offset)
     
    28092829
    28102830        vPC += OPCODE_LENGTH(op_jnless);
     2831        NEXT_INSTRUCTION();
     2832    }
     2833    DEFINE_OPCODE(op_jless) {
     2834        /* jless src1(r) src2(r) target(offset)
     2835
     2836           Checks whether register src1 is less than register src2, as
     2837           with the ECMAScript '<' operator, and then jumps to offset
     2838           target from the current instruction, if and only if the
     2839           result of the comparison is true.
     2840        */
     2841        JSValue src1 = callFrame->r(vPC[1].u.operand).jsValue();
     2842        JSValue src2 = callFrame->r(vPC[2].u.operand).jsValue();
     2843        int target = vPC[3].u.operand;
     2844
     2845        bool result = jsLess(callFrame, src1, src2);
     2846        CHECK_FOR_EXCEPTION();
     2847       
     2848        if (result) {
     2849            vPC += target;
     2850            NEXT_INSTRUCTION();
     2851        }
     2852
     2853        vPC += OPCODE_LENGTH(op_jless);
    28112854        NEXT_INSTRUCTION();
    28122855    }
Note: See TracChangeset for help on using the changeset viewer.