JavaScript Engine Performance
About me

•   Senior engineer at Baidu

•   Currently working mainly on performance optimization

•   Member of the W3C "HTML" and "Web Performance" working groups

             @nwind
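Please note

•   I am not a virtual machine expert; this is only a hobby

•   Much of the content is simplified; the real picture is far more complex

•   The opinions here are my own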
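Outline

•   Basic principles of virtual machines

•   How JavaScript engines optimize performance

•   An introduction to V8, Dart and Node.js

•   How to write high-performance JavaScript code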
VM basic
Virtual Machine history
•   pascal 1970

•   smalltalk 1980

•   self 1986

•   python 1991

•   java 1995

•   javascript 1995
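The Smalltalk demo showed three amazing achievements, including how
computers could be networked and how object-oriented programming worked.

But Jobs and his team were not interested in these, because their attention was captured by...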
How Does a Virtual Machine Work?

•   Parser

•   Intermediate Representation

•   Interpreter, JIT

•   Runtime, Garbage Collection
Parser


•   Tokenize

•   AST
Tokenize

    var foo = 10;

    var   keyword
    foo   identifier
    =     equal
    10    number
    ;     semicolon
AST

               Assign
              /      \
   Variable foo      Constant 10
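The two parser stages above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the token type names come from the slide labels, while `tokenize`, `parseVarDeclaration` and the node field names are hypothetical and handle nothing beyond statements shaped like `var foo = 10;`.

```javascript
// Hypothetical keyword table; a real engine knows many more keywords.
const KEYWORDS = new Set(["var"]);

// Split source text into tokens. Whitespace is matched and skipped.
function tokenize(source) {
  const tokens = [];
  // Sticky regex: one capture group per token class.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+)|(=)|(;)/y;
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const m = pattern.exec(source);
    if (m === null) throw new SyntaxError(`Unexpected character at ${pos}`);
    pos = pattern.lastIndex;
    if (m[1] !== undefined) {
      tokens.push({ type: KEYWORDS.has(m[1]) ? "keyword" : "identifier", text: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ type: "number", text: m[2] });
    } else if (m[3] !== undefined) {
      tokens.push({ type: "equal", text: m[3] });
    } else if (m[4] !== undefined) {
      tokens.push({ type: "semicolon", text: m[4] });
    }
  }
  return tokens;
}

// Build the Assign node from the slide out of exactly five tokens.
function parseVarDeclaration(tokens) {
  const [kw, name, eq, value, semi] = tokens;
  const shapeOk =
    tokens.length === 5 &&
    kw.type === "keyword" &&
    name.type === "identifier" &&
    eq.type === "equal" &&
    value.type === "number" &&
    semi.type === "semicolon";
  if (!shapeOk) throw new SyntaxError("expected: var <identifier> = <number>;");
  return {
    type: "Assign",
    target: { type: "Variable", name: name.text },
    value: { type: "Constant", value: Number(value.text) },
  };
}

const tokens = tokenize("var foo = 10;");
console.log(tokens.map((t) => t.type).join(" "));
// prints "keyword identifier equal number semicolon"
console.log(JSON.stringify(parseVarDeclaration(tokens)));
```

Production parsers are usually hand-written recursive-descent parsers rather than regex-driven, but the division of labor between the two stages is the same.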
Intermediate Representation


•   Bytecode

•   Stack vs. register
Bytecode (SpiderMonkey)

function foo(bar) {
    return bar + 1;
}

foo(2);

00000:   deffun 0 null
00005:   nop
00006:   callvar 0
00009:   int8 2
00011:   call 1
00014:   pop
00015:   stop

foo:
00020:   getarg 0
00023:   one
00024:   add
00025:   return
00026:   stop
Bytecode (JSC)

function foo(bar) {
    return bar + 1;
}

foo(2);

8 m_instructions; 168 bytes at 0x7fc1ba3070e0;
1 parameter(s); 10 callee register(s)

[   0]   enter
[   1]   mov                   r0, undefined(@k0)
[   4]   get_global_var        r1, 5
[   7]   mov                   r2, undefined(@k0)
[  10]   mov                   r3, 2(@k1)
[  13]   call                  r1, 2, 10
[  17]   op_call_put_result    r0
[  19]   end                   r0

Constants:
   k0 = undefined
   k1 = 2

3 m_instructions; 64 bytes at 0x7fc1ba306e80;
2 parameter(s); 1 callee register(s)

[   0]   enter
[   1]   add                   r0, r-7, 1(@k0)
[   6]   ret                   r0

Constants:
   k0 = 1

End: 3
Stack vs. register
•   Stack

    •   JVM, .NET, PHP, Python, Old JavaScript engine

•   Register

    •   Lua, Dalvik, Modern JavaScript engine

    •   Smaller, Faster (about 20%~30%)

    •   RISC
Stack vs. register

Stack-based (Lua 4.0):

local a,t,i    1:   PUSHNIL      3
a=a+i          2:   GETLOCAL     0 ; a
               3:   GETLOCAL     2 ; i
               4:   ADD
               5:   SETLOCAL     0 ; a
a=a+1          6:   GETLOCAL     0 ; a
               7:   ADDI         1
               8:   SETLOCAL     0 ; a
a=t[i]         9:   GETLOCAL     1 ; t
              10:   GETINDEXED   2 ; i
              11:   SETLOCAL     0 ; a

Register-based (Lua 5.0):

local a,t,i    1:   LOADNIL    0   2   0
a=a+i          2:   ADD        0   0   2
a=a+1          3:   ADD        0   0   250 ; a
a=t[i]         4:   GETTABLE   0   1   2
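To make the difference in instruction count concrete, here is a minimal sketch of both machine styles running `a = a + i; a = a + 1`. The opcode names loosely follow the listing above, but `runStack`, `runRegister` and the `ADDK` opcode (add an immediate constant) are hypothetical simplifications, not Lua's actual encoding.

```javascript
// Stack machine: operands are pushed and popped implicitly.
function runStack(program, locals) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "GETLOCAL": stack.push(locals[arg]); break;
      case "SETLOCAL": locals[arg] = stack.pop(); break;
      case "ADD": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case "ADDI": stack.push(stack.pop() + arg); break;
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return locals;
}

// Register machine: each instruction names its destination and sources.
function runRegister(program, regs) {
  for (const [op, dst, src1, src2] of program) {
    switch (op) {
      case "ADD": regs[dst] = regs[src1] + regs[src2]; break;
      case "ADDK": regs[dst] = regs[src1] + src2; break; // src2 is an immediate
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return regs;
}

// Slot layout for both machines: 0 = a, 1 = t (unused here), 2 = i.
const stackProgram = [
  ["GETLOCAL", 0], ["GETLOCAL", 2], ["ADD"], ["SETLOCAL", 0], // a = a + i
  ["GETLOCAL", 0], ["ADDI", 1], ["SETLOCAL", 0],              // a = a + 1
];
const registerProgram = [
  ["ADD", 0, 0, 2],  // a = a + i
  ["ADDK", 0, 0, 1], // a = a + 1
];

console.log(runStack(stackProgram, [10, null, 5])[0]);       // 16
console.log(runRegister(registerProgram, [10, null, 5])[0]); // 16
console.log(stackProgram.length, registerProgram.length);    // 7 2
```

The register program dispatches fewer instructions, but each instruction carries more operands, which is the trade-off the slide summarizes.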
Interpreter


•   Switch statement

•   Direct threading, Indirect threading, Token threading ...
Switch statement
while (true) {
    switch (opcode) {
    case ADD:
        ...
        break;

    case SUB:
        ...
        break;
    ...
    }
}
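A runnable version of the switch-dispatch loop might look like the sketch below. The opcode set (`PUSH`, `ADD`, `SUB`, `HALT`) and the byte encoding are assumptions made for the example and do not correspond to any real engine.

```javascript
// Hypothetical opcode numbering for a tiny stack-based bytecode.
const OP = { PUSH: 0, ADD: 1, SUB: 2, HALT: 3 };

function interpret(code) {
  const stack = [];
  let pc = 0; // program counter: index of the next byte to decode
  while (true) {
    const opcode = code[pc++];
    switch (opcode) {
      case OP.PUSH:
        stack.push(code[pc++]); // operand is stored inline after the opcode
        break;
      case OP.ADD: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case OP.SUB: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left - right);
        break;
      }
      case OP.HALT:
        return stack.pop();
      default:
        throw new Error(`bad opcode ${opcode} at ${pc - 1}`);
    }
  }
}

// (2 + 1) - 4
const code = Uint8Array.of(OP.PUSH, 2, OP.PUSH, 1, OP.ADD, OP.PUSH, 4, OP.SUB, OP.HALT);
console.log(interpret(code)); // -1
```

Every instruction goes through the same `switch`, so the loop funnels all dispatch through one indirect branch. The threading techniques on the following slides aim to spread that branch out so the hardware can predict it better.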
Direct threading
typedef void *Inst;
Inst program[] = { &&ADD, &&SUB };
Inst *ip = program;
goto *ip++;

ADD:
      ...
      goto *ip++;

SUB:
       ...

http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
Threaded Code
http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg
Context Threading

Essence of our solution: package bodies as subroutines and call them

Bytecode         CTT - Context Threading Table    Bytecode bodies
                 (generated code)                 (ret terminated)
…
iload_1          call   iload_1                   iload_1:
iload_1          call   iload_1                     ..
iadd             call   iadd                        ret;
istore_1         call   istore_1
iload_1          call   iload_1                   iadd:
bipush 64        ..                                 ..
if_icmplt 2                                         ret;
…

Return Branch Predictor Stack

Source: "Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters"
Garbage Collection

•   Reference counting (php, python ...), Smart pointer

•   Tracing

    •   Generational

    •   Stop-the-world, Concurrent, Incremental
    •   Copying, Sweep, Compact
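As a rough illustration of the tracing family, the sketch below implements a stop-the-world mark-and-sweep pass over a toy heap. The `Heap` class, its `roots` set and the `refs` arrays are hypothetical stand-ins for what a real collector discovers by scanning stacks and object layouts.

```javascript
class Heap {
  constructor() {
    this.objects = new Set(); // every allocated object
    this.roots = new Set();   // objects reachable by definition (globals, stack)
  }

  alloc(name) {
    const obj = { name, refs: [], marked: false };
    this.objects.add(obj);
    return obj;
  }

  // Stop-the-world mark-and-sweep; returns the number of freed objects.
  collect() {
    // Mark: walk everything reachable from the roots.
    const worklist = [...this.roots];
    while (worklist.length > 0) {
      const obj = worklist.pop();
      if (obj.marked) continue;
      obj.marked = true;
      worklist.push(...obj.refs);
    }
    // Sweep: drop unmarked objects and reset marks for the next cycle.
    let freed = 0;
    for (const obj of this.objects) {
      if (obj.marked) {
        obj.marked = false;
      } else {
        this.objects.delete(obj);
        freed++;
      }
    }
    return freed;
  }
}

const heap = new Heap();
const a = heap.alloc("a");
const b = heap.alloc("b");
const c = heap.alloc("c");
const d = heap.alloc("d");
heap.roots.add(a);
a.refs.push(b);
c.refs.push(d);
d.refs.push(c); // c and d form an unreachable cycle

const freedCount = heap.collect();
console.log(freedCount, [...heap.objects].map((o) => o.name).join(","));
// prints "2 a,b"
```

The cycle between `c` and `d` is reclaimed because neither is reachable from a root. Plain reference counting would keep both alive, since each still holds a reference to the other.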
Why is JavaScript slow?

•   Dynamic Type

•   Weak Type

•   Need to parse every time

•   GC
Fight with Weak Type
Object model in most VM
     typedef union {
       void *p;
       double d;
       long l;
     } Value;

     typedef struct {
       unsigned char type;
       Value value;
     } Object;

     Object a;
Tagged pointer
On almost all systems, pointer addresses are aligned (to 4 or 8 bytes)

         http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
This means
0xc00ab958               the lowest 2 or 3 bits of a pointer are always 0

             so the lowest bit can be set to 1 to mark a pointer

    1 0 0 1   (9)   Pointer
    1 0 0 0   (8)   Small Number
Tagged pointer
                Memory
                    ...
var a = 1           2

var b = {a:1}   0x3d2aa00
                    ...
                    ...
                 object b
                    ...
Small Number

$2^{30} - 1 = 1073741823$

$-2^{30} = -1073741824$

31 bits can represent about a billion, which is enough for most applications
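The tagging scheme and the small-number range can be checked with 32-bit integer operations. This is a sketch assuming a 32-bit machine word; the function names are hypothetical, and a real engine does this in C++ or machine code rather than in JavaScript.

```javascript
// Range of a 31-bit signed small integer, as on the slide.
const SMI_MAX = 2 ** 30 - 1; // 1073741823
const SMI_MIN = -(2 ** 30);  // -1073741824

function fitsInSmi(n) {
  return Number.isInteger(n) && n >= SMI_MIN && n <= SMI_MAX;
}

// Small number: shift left by one, leaving the lowest bit 0.
function tagSmi(n) {
  if (!fitsInSmi(n)) throw new RangeError(`${n} does not fit in 31 bits`);
  return n << 1;
}

// Pointer: aligned addresses have a free low bit, so set it to 1.
function tagPointer(address) {
  if (address % 4 !== 0) throw new RangeError("address must be 4-byte aligned");
  return (address | 1) >>> 0;
}

function isPointer(word) {
  return (word & 1) === 1;
}

function untagSmi(word) {
  return word >> 1; // arithmetic shift keeps the sign
}

function untagPointer(word) {
  return (word & ~1) >>> 0;
}

console.log(tagSmi(1));                            // 2
console.log(tagPointer(0x3d2aa00).toString(16));   // 3d2aa01
console.log(untagSmi(tagSmi(-5)));                 // -5
console.log(isPointer(tagSmi(8)), isPointer(tagPointer(0xc00ab958))); // false true
```

A single bit test then distinguishes the two cases without a separate type field, which is why integers outside this range have to be stored some other way.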
External Fixed Typed Array

•   Strong type, Fixed length

•   Out of VM heap

•   Example: Int32Array, Float64Array
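The "strong type, fixed length" properties are visible directly from script. The snippet below is a small sketch using only standard typed array APIs; where the engine places the backing store is an implementation detail that cannot be observed from JavaScript.

```javascript
// Every element is a 32-bit signed integer; the length is fixed at creation.
const counts = new Int32Array(4);
counts[0] = 3.7;     // fractional part is dropped
counts[1] = 2 ** 31; // wraps around to the 32-bit signed range
console.log(counts[0], counts[1], counts.length); // 3 -2147483648 4

// A loop over a typed array always sees the same element type.
function sum(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i];
  return total;
}
const samples = Float64Array.from([0.5, 1.5, 2.0]);
console.log(sum(samples)); // 4

// Several views can share one raw buffer.
const buffer = new ArrayBuffer(16);
const asFloats = new Float64Array(buffer);
const asBytes = new Uint8Array(buffer);
console.log(asFloats.length, asBytes.length); // 2 16
```

Because the element type never changes, the engine does not need per-element type checks or boxing when it compiles a loop such as `sum`.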
Small Number + Typed Array

[Bar chart: fannkuch-redux run time in seconds (smaller is better). C/C++ 50, Java (HotSpot) 70, V8 80, PHP 3180, Ruby 4020, Python 4200. Annotation on the chart: 40x.]

http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
Warning: Benchmark lies
ES6 will have struct
ES6 StructType
Point2D = new StructType({
    x: uint32,
    y: uint32
});

Color = new StructType({
    r: uint8,
    g: uint8,
    b: uint8
});

Pixel = new StructType({
    point: Point2D,
    color: Color
});
Use typed array to run faster
Fight with Dynamic Type
foo.bar
foo.bar in C

movl 4(%edx), %ecx   //get
movl %ecx, 4(%edx)   //put
foo.bar in JavaScript
found = HashTable.FindEntry(key)
if (found) return found;

for (pt = GetPrototype();
       pt != null;
       pt = pt.GetPrototype()) {
    found = pt.HashTable.FindEntry(key)
    if (found) return found;
}
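The same lookup can be written as runnable code. `DictObject` below is a hypothetical model of an object backed by a hash table plus a prototype link, not an engine's real object representation.

```javascript
class DictObject {
  constructor(prototype = null) {
    this.table = new Map(); // own properties, stored in a hash table
    this.prototype = prototype;
  }

  set(key, value) {
    this.table.set(key, value);
  }

  // Probe the object's own table, then each prototype in turn.
  get(key) {
    for (let obj = this; obj !== null; obj = obj.prototype) {
      // Use has() so that falsy values such as 0 are still found.
      if (obj.table.has(key)) return obj.table.get(key);
    }
    return undefined;
  }
}

const base = new DictObject();
base.set("greet", "hello");
const foo = new DictObject(base);
foo.set("bar", 0);

console.log(foo.get("bar"));     // 0
console.log(foo.get("greet"));   // hello
console.log(foo.get("missing")); // undefined
```

Each access costs at least one hash probe and possibly a walk along the prototype chain, compared with the single `movl` at a fixed offset in the C version.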
How to optimize?
First, we need to know the object layout
Add Type for object

[Figure: hidden class transitions as property x and then property y are added]

                     http://code.google.com/apis/v8/design.html
Inline Cache

•   Slow lookup the first time

•   Modify the JIT code in place

•   The next access jumps directly to the cached address
Inline cache made simple

function fun(foo) {
    return foo.bar;
}

Before caching:

    return foo.lookupProperty(bar);

After caching:

    if (foo[hiddenClass] == 0xfe1) {
        return foo[indexOf_bar];
    }
    return foo.lookupProperty(bar);
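Hidden classes and a monomorphic inline cache can be simulated in plain JavaScript. Everything here (`HiddenClass`, `FastObject`, `makeGetter`, the `stats` counters) is a hypothetical model for illustration; a real engine patches generated machine code instead of keeping the cache in closure variables.

```javascript
// A hidden class maps property names to slot offsets and remembers transitions.
class HiddenClass {
  constructor(offsets = new Map()) {
    this.offsets = offsets;
    this.transitions = new Map();
  }

  // Adding the same property to the same class always yields the same class.
  withProperty(name) {
    let next = this.transitions.get(name);
    if (next === undefined) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size); // next free slot index
      next = new HiddenClass(offsets);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const ROOT = new HiddenClass();

class FastObject {
  constructor() {
    this.hiddenClass = ROOT;
    this.slots = [];
  }

  // Simplification: assumes each property name is added only once.
  add(name, value) {
    this.hiddenClass = this.hiddenClass.withProperty(name);
    this.slots.push(value);
  }
}

// One access site with its own cache.
function makeGetter(name) {
  let cachedClass = null;
  let cachedOffset = -1;
  const stats = { hits: 0, misses: 0 };
  function get(obj) {
    if (obj.hiddenClass === cachedClass) {
      stats.hits++;
      return obj.slots[cachedOffset]; // fast path: one compare, one load
    }
    stats.misses++;
    const offset = obj.hiddenClass.offsets.get(name); // slow lookup
    if (offset === undefined) return undefined;
    cachedClass = obj.hiddenClass;
    cachedOffset = offset;
    return obj.slots[offset];
  }
  return { get, stats };
}

const p1 = new FastObject(); p1.add("x", 1); p1.add("y", 2);
const p2 = new FastObject(); p2.add("x", 3); p2.add("y", 4);
const p3 = new FastObject(); p3.add("y", 5); p3.add("x", 6); // different order

const yGetter = makeGetter("y");
console.log(yGetter.get(p1), yGetter.get(p2), yGetter.get(p3)); // 2 4 5
console.log(yGetter.stats.hits, yGetter.stats.misses);          // 1 2
```

`p1` and `p2` share a hidden class because their properties were added in the same order, so the second access hits the cache. `p3` has the same properties in a different order and misses.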
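JavaScript in real-world code is not that dynamic
delete operations account for only 0.1%
                     “An Analysis of the Dynamic Behavior of JavaScript...”

99% of primitive types can be determined through static analysis
97% of property accesses can be served by an inline cache

                    “TypeCastor: Demystify Dynamic Typing of JavaScript...”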
V8 can’t handle delete yet

                                         20x slower!

      http://jsperf.com/test-v8-delete
Avoid altering object property layout
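In practice this advice usually means creating every property up front, always in the same order, and clearing values instead of deleting them. The functions below are hypothetical examples of the two styles; how much difference it makes depends on the engine and version.

```javascript
// Layout depends on the code path: objects end up with different shapes.
function makePointUnstable(x, y, withLabel) {
  const point = {};
  point.x = x;
  if (withLabel) point.label = "pt"; // only some objects get this property
  point.y = y;
  return point;
}

// Stable layout: every object has the same properties in the same order.
function makePoint(x, y, label = null) {
  return { x, y, label };
}

// Clear the value but keep the property, instead of `delete point.label`.
function clearLabel(point) {
  point.label = null;
}

const plain = makePoint(1, 2);
const labelled = makePoint(3, 4, "pt");
clearLabel(labelled);

console.log(Object.keys(plain).join(","));    // x,y,label
console.log(Object.keys(labelled).join(",")); // x,y,label
console.log(Object.keys(makePointUnstable(1, 2, true)).join(","));  // x,label,y
console.log(Object.keys(makePointUnstable(1, 2, false)).join(",")); // x,y
```

The property order shown by `Object.keys` is only a visible hint of the layout, but it makes the point: the two `makePoint` objects stay interchangeable for code that caches on object layout, while the `makePointUnstable` objects do not.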
Faster Data Structure & Algorithm
Is Array push faster than String concat?
http://jsperf.com/nwind-string-concat-vs-array-push
Why?
other string optimizations

•   Adaptive string search

    •   Single char, Linear, Boyer-Moore-Horspool

•   Adaptive ascii and utf-8

•   Zero copy sub string
Feel free to use String in modern engines
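To compare the two approaches on a particular engine, a quick harness such as the one below is enough. The function names and the input size are arbitrary choices for this sketch, and the timings it prints will vary by engine and machine, so no result is claimed here.

```javascript
// Build a string by repeated concatenation.
function buildWithConcat(parts) {
  let out = "";
  for (const part of parts) out += part;
  return out;
}

// Build the same string by collecting pieces and joining once.
function buildWithArray(parts) {
  const pieces = [];
  for (const part of parts) pieces.push(part);
  return pieces.join("");
}

const parts = Array.from({ length: 100000 }, (_, i) => String(i % 10));

console.time("string concat");
const viaConcat = buildWithConcat(parts);
console.timeEnd("string concat");

console.time("array push + join");
const viaArray = buildWithArray(parts);
console.timeEnd("array push + join");

console.log(viaConcat === viaArray); // true
```

Both functions produce the same string, so the choice can be made on readability unless measurements on the target engine say otherwise.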
Just-In-Time (JIT)
JIT

•   Method JIT, Trace JIT, Regular expression JIT

•   Register allocation

•   Code generation
How does JIT work?

•   mmap, malloc (mprotect)

•   generate native code

•   cast (C), reinterpret_cast (C++)

•   call the function
V8
V8

•   Lars Bak

•   Hidden Class, PICs

•   Some built-in objects are written in JavaScript

•   Crankshaft
•   Precise generational GC
Lars Bak
•   has implemented VMs since 1988

•   Beta

•   Self

•   JVM (VM architect at Sun)

•   V8 (Google)
Lines of code (VM only)

               .cpp/.c      .h
HotSpot        359986     110831
V8             224038      70787
SpiderMonkey   135547      63975
JSC             83920      80867
Ruby           120941       8043
CPython        108280      15475
PHP-Zend        44646      42113
Crankshaft
Source code  ->  Native code  ->  (runtime profiling)  ->  High-Level IR  ->  Low-Level IR  ->  Optimized native code

Crankshaft covers the stages from High-Level IR to optimized native code.
Crankshaft

•   Profiling

•   Compiler optimization

•   Generate new JIT code

•   On-stack replacement
•   Deoptimize
High-Level IR (Hydrogen)
•   AST to SSA

•   Type inference (type feedback from inline cache)

•   Compiler optimization

    •   Function inlining
    •   Loop-invariant code motion, Global value numbering

    •   Eliminate dead phis

    •   ...
Loop-invariant code motion

Before:

for (i = 0; i < n; i++) {
    a[i] = x + y;
}

After:

tmp = x + y;
for (i = 0; i < n; i++) {
    a[i] = tmp;
}
Function inlining limits for now
•   big functions (larger than 600 bytes)

•   recursive functions

•   functions with unsupported statements

    •   with, switch
    •   try/catch/finally

    •   ...
Avoid “with”, “switch” and “try” in hot paths
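One common way to follow this advice is to keep the unsupported statement in a thin wrapper and put the hot loop in a separate function. The names below are hypothetical, and the benefit reflects the Crankshaft-era limits listed above, so it may not apply to other engines or later versions.

```javascript
// Hot path: a plain loop with no try/catch, with or switch statements.
function sumSquares(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i] * values[i];
  }
  return total;
}

// Cold wrapper: error handling lives here, outside the hot function.
function safeSumSquares(values) {
  try {
    return sumSquares(values);
  } catch (err) {
    return NaN; // for example when `values` is null or undefined
  }
}

console.log(safeSumSquares([1, 2, 3])); // 14
console.log(safeSumSquares(null));      // NaN
```

The wrapper runs once per call while the loop body runs once per element, so moving the `try` out costs little and leaves the loop free of the statements the optimizer rejects.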
Built-in objects written in JS
   function ArraySort(comparefn) {
     ...
     // In-place QuickSort algorithm.
     // For short (length <= 22) arrays, insertion sort is used for efficiency.

    if (!IS_SPEC_FUNCTION(comparefn)) {
      comparefn = function (x, y) {
        if (x === y) return 0;
        if (%_IsSmi(x) && %_IsSmi(y)) {
           return %SmiLexicographicCompare(x, y);
         }
         x = ToString(x);
         y = ToString(y);
        if (x == y) return 0;
        else return x < y ? -1 : 1;
      };
    }
    ...



                               v8/src/array.js
GC

•   Precise

•   Stop-the-world

•   Generational

•   Incremental (2011-10)
V8 performance



     Why?
V8 performance



Unfair, they are using gmp library
Warning: Benchmark lies
Node.JS
•   Pros

    •   Easy to write async I/O

    •   One language for everything

    •   Maybe faster than PHP, Python

    •   Betting on JavaScript is safe

•   Cons

    •   Lack of great libraries

    •   Large JS codebases are hard to maintain

    •   Easy to leak memory (compared to PHP, Erlang)

    •   Still too young, unproven
Why Dart?

•   Built for large applications

    •   optional types, structured, libraries, tools

•   Performance

    •   lightweight processes like Erlang
    •   easier to write a fast VM for than JavaScript
The future of Dart?

•   It will not replace JS

•   But it may replace GWT, and become a better choice for
    building large front-end applications

    •   with a great IDE, mature libraries

    •   and some way to communicate with JavaScript
How to make
JavaScript faster?
How to make JavaScript faster?
 •   Wait for ES6: StructType, const, WeakMap, yield...

 •   High-performance built-in libraries

 •   WebCL

 •   Embed another language

     •   KL(FabricEngine), GLSL(WebGL)

 •   Wait for quantum computers :)
Other things you can learn
•   NaN tagging

•   Polymorphic Inline Cache

•   Type Inference

•   Regex JIT

•   Runtime optimization

•   ...
References
•   The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures
•   Virtual Machine Showdown: Stack Versus Registers
•   The Implementation of Lua 5.0
•   Why Is the New Google V8 Engine so Fast?
•   Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
•   Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences
•   Smalltalk-80: The Language and its Implementation
•   Design of the Java HotSpot(TM) Client Compiler for Java 6
•   Oracle JRockit: The Definitive Guide
•   Virtual Machines: Versatile Platforms for Systems and Processes
•   Fast and Precise Hybrid Type Inference for JavaScript
•   LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
•   Emscripten: An LLVM-to-JavaScript Compiler
•   An Analysis of the Dynamic Behavior of JavaScript Programs
•   Adaptive Optimization for SELF
•   Bytecodes meet Combinators: invokedynamic on the JVM
•   Efficient Implementation of the Smalltalk-80 System
•   Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler
•   Optimizing Direct Threaded Code by Selective Inlining
•   Linear Scan Register Allocation
•   Optimizing Invokedynamic
•   Threaded Code
•   Why Not a Bytecode VM?
•   A Survey of Adaptive Optimization in Virtual Machines
•   An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes
•   Making the Compilation "Pipeline" Explicit: Dynamic Compilation Using Trace Tree Specialization
•   Uniprocessor Garbage Collection Techniques
•   Representing Type Information in Dynamically Typed Languages
•   Trace-based Just-in-Time Type Specialization for Dynamic Languages
•   The Structure and Performance of Efficient Interpreters
•   Know Your Engines: How to Make Your JavaScript Fast
•   IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog’s Blog, RednaxelaFX’s Blog, David Mandelin’s Blog, Brendan Eich’s Blog...
Thank you
