How to create a programming language
TABLE OF CONTENTS

 1. Table of Contents
 2. Introduction
       1. Summary
       2. About The Author
       3. Before We Begin
 3. Overview
       1. The Four Parts of a Language
       2. Meet Awesome: Our Toy Language
 4. Lexer
       1. Lex (Flex)
       2. Ragel
       3. Python Style Indentation For Awesome
       4. Do It Yourself I
 5. Parser
       1. Bison (Yacc)
       2. Lemon
       3. ANTLR
       4. PEGs
       5. Operator Precedence
       6. Connecting The Lexer and Parser in Awesome
       7. Do It Yourself II
 6. Runtime Model
       1. Procedural
       2. Class-based
       3. Prototype-based
       4. Functional
       5. Our Awesome Runtime
       6. Do It Yourself III
 7. Interpreter
       1. Do It Yourself IV
 8. Compilation
       1. Using LLVM from Ruby
       2. Compiling Awesome to Machine Code
 9. Virtual Machine
       1. Byte-code
       2. Types of VM
       3. Prototyping a VM in Ruby
10. Going Further
       1. Homoiconicity
       2. Self-Hosting
       3. What’s Missing?
11. Resources
       1. Books & Papers
       2. Events
       3. Forums and Blogs
       4. Interesting Languages
12. Solutions to Do It Yourself
       1. Solutions to Do It Yourself I
       2. Solutions to Do It Yourself II
       3. Solutions to Do It Yourself III
       4. Solutions to Do It Yourself IV
13. Appendix: Mio, a minimalist homoiconic language
       1. Homoicowhat?
       2. Messages all the way down
       3. The Runtime
       4. Implementing Mio in Mio
       5. But it’s ugly
14. Farewell!
Published November 2011.

Cover background image © Asja Boros

Content of this book is © Marc-André Cournoyer. All rights reserved. This eBook copy
is for a single user. You may not share it in any way unless you have written permission
of the author.
This is a sample chapter.
Buy the full book online at
   createyourproglang.com
PARSER

    By themselves, the tokens output by the lexer are just building blocks. The parser
    contextualizes them by organizing them in a structure. The lexer produces an array of
    tokens; the parser produces a tree of nodes.

    Let's take the tokens from the previous section:


    [IDENTIFIER print] [STRING "I ate"] [COMMA]
                       [NUMBER 3] [COMMA]
                       [IDENTIFIER pies]



    The most common parser output is an Abstract Syntax Tree, or AST. It’s a tree of
    nodes that represents what the code means to the language. The previous lexer
    tokens will produce the following:


    [<Call name=print,
           arguments=[<String value="I ate">,
                      <Number value=3>,
                      <Local name=pies>]
    >]



    Or as a visual tree:

    Figure 2
    The parser found that print was a method call and that the following tokens are its
    arguments.
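    To make the idea concrete, here is a minimal hand-written sketch (not the generated
    parser the book builds) that turns the token list above into the Call node. The
    [TYPE, value] token format and the CallNode, StringNode, NumberNode and LocalNode
    classes are assumptions made for this illustration; it only understands a single call
    with comma-separated arguments.

```ruby
# Hypothetical AST node classes mirroring the nodes printed above.
CallNode   = Struct.new(:name, :arguments)
StringNode = Struct.new(:value)
NumberNode = Struct.new(:value)
LocalNode  = Struct.new(:name)

# Assumed token format: [TYPE, value] pairs matching the lexer output above.
TOKENS = [
  [:IDENTIFIER, "print"], [:STRING, "I ate"], [:COMMA, ","],
  [:NUMBER, 3], [:COMMA, ","],
  [:IDENTIFIER, "pies"]
].freeze

# Turns a single argument token into the matching literal or local-variable node.
def parse_argument(token)
  type, value = token
  case type
  when :STRING     then StringNode.new(value)
  when :NUMBER     then NumberNode.new(value)
  when :IDENTIFIER then LocalNode.new(value)
  else raise ArgumentError, "unexpected token #{token.inspect}"
  end
end

# A call is an IDENTIFIER followed by arguments separated by COMMA tokens.
def parse_call(tokens)
  tokens = tokens.dup
  type, name = tokens.shift
  raise ArgumentError, "expected IDENTIFIER, got #{type.inspect}" unless type == :IDENTIFIER

  arguments = []
  until tokens.empty?
    arguments << parse_argument(tokens.shift)
    break if tokens.empty?

    separator = tokens.shift
    raise ArgumentError, "expected COMMA, got #{separator.inspect}" unless separator.first == :COMMA
    raise ArgumentError, "trailing comma" if tokens.empty?
  end
  CallNode.new(name, arguments)
end

p parse_call(TOKENS)
```

    The flat array goes in and a nested structure comes out: the call node now owns its
    three argument nodes, which is exactly the tree shown in the listing above.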

    Parser generators are commonly used to accomplish the otherwise tedious task of
    building a parser. Much like the English language, a programming language needs a
    grammar to define its rules. The parser generator will convert this grammar into a parser
    that will compile lexer tokens into AST nodes.


    BISON (YACC)

    Bison is a modern version of Yacc, the most widely used parser generator. Yacc stands
    for Yet Another Compiler Compiler, because it compiles the grammar to a compiler of
    tokens. It’s used in several mainstream languages, like Ruby. Most often used with Lex, it
    has been ported to several target languages:

             Racc for Ruby
             PLY for Python
             JavaCC for Java

    Just as Lex, from the previous chapter, compiles rules into a lexer, Yacc compiles a
    grammar into a parser. Here’s how a Yacc grammar rule is defined:


    Call: /* Name of the rule */
        Expression '.' IDENTIFIER                   { $$ = CallNode_new($1, $3, NULL); }
      | Expression '.' IDENTIFIER '(' ArgList ')'   { $$ = CallNode_new($1, $3, $5); }
      /* $1        $2  $3         $4  $5      $6    <= values from the rule are stored in
                                                       these variables. */
      ;



    On the left, we define how the rule can be matched using tokens and other rules.
    On the right, between braces, is the action to execute when the rule matches.
    In that block, we can reference the values being matched using $1, $2, etc. Finally, we
    store the result in $$.
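    The $1/$$ plumbing can be mimicked in a few lines of Ruby. In this sketch, a
    hypothetical reduce helper picks the alternative of the Call rule whose symbols were
    matched and runs its action on the matched values; values[0] stands in for $1 and the
    action's return value for $$. It deliberately skips the part a real Bison parser works
    out from its generated tables, namely deciding when to reduce.

```ruby
# Hypothetical Ruby stand-in for the CallNode_new() C function used above.
CallNode = Struct.new(:receiver, :name, :arguments)

# The two alternatives of the Call rule, each paired with its action.
# An action receives the matched values in order: values[0] is $1,
# values[2] is $3, and so on. Whatever it returns plays the role of $$.
CALL_RULE = [
  {
    symbols: ["Expression", ".", "IDENTIFIER"],
    action: ->(values) { CallNode.new(values[0], values[2], nil) }
  },
  {
    symbols: ["Expression", ".", "IDENTIFIER", "(", "ArgList", ")"],
    action: ->(values) { CallNode.new(values[0], values[2], values[4]) }
  }
]

# Simulates one reduction: find the alternative that was matched and run its action.
def reduce(rule, symbols, values)
  alternative = rule.find { |alt| alt[:symbols] == symbols }
  raise ArgumentError, "no alternative matches #{symbols.inspect}" unless alternative

  alternative[:action].call(values)
end

node = reduce(CALL_RULE,
              ["Expression", ".", "IDENTIFIER", "(", "ArgList", ")"],
              ["list", ".", "push", "(", [1, 2], ")"])
p node
```

    Racc, which we use for Awesome below, follows the same idea: inside an action, val is
    the array of matched values and you assign the new node to result.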


LEMON

Lemon is quite similar to Yacc, with a few differences. From its website:

        Lemon uses a different grammar syntax which is less prone to programming errors.

        The parser generated by Lemon is both re-entrant and thread-safe.

        Lemon includes the concept of a non-terminal destructor, which makes it much
        easier to write a parser that does not leak memory.


For more information, refer to the manual or check real examples inside Potion.


ANTLR

ANTLR is another parsing tool. This one lets you declare lexing and parsing rules in
the same grammar. It can generate parsers for several target languages.


PEGS

Parsing Expression Grammars, or PEGs, are very powerful at parsing complex
languages. I’ve used a PEG generated with peg/leg in tinyrb to parse Ruby’s
infamous syntax with encouraging results (tinyrb’s grammar).

Treetop is an interesting Ruby tool for creating PEG parsers.


OPERATOR PRECEDENCE

    One of the common pitfalls of language parsing is operator precedence. Parsing
    x + y * z should not produce the same result as (x + y) * z, and the same goes for all
    other operators. Each language has an operator precedence table, often based on the
    mathematical order of operations. There are several ways to handle this. In Yacc-based
    parsers, you give a precedence level to each kind of operator, and the generator uses
    those levels to resolve conflicts in its parsing tables, following the same idea of
    precedence and associativity as the Shunting Yard algorithm. Operators are declared in
    Bison and Yacc with the %left and %right declarations. Read more in Bison’s manual.

    Here’s the operator precedence table for our language, based on the C language
    operator precedence:


    left  '.'
    right '!'
    left  '*' '/'
    left  '+' '-'
    left  '>' '>=' '<' '<='
    left  '==' '!='
    left  '&&'
    left  '||'
    right '='
    left  ','



    The higher the precedence (top is higher), the sooner the operator will be parsed. If
    the line a + b * c is being parsed, the part b * c will be parsed first since *
    has higher precedence than +. Now, if several operators with the same
    precedence are competing to be parsed at once, the conflict is resolved using
    associativity, declared with the left and right keywords before the tokens. For
    example, take the expression a = b = c. Since = is right-associative, parsing
    starts from the right with b = c, resulting in a = (b = c).
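    The Shunting Yard algorithm mentioned above is also a handy way to see these rules at
    work. The following is a minimal Ruby sketch (an illustration of precedence and
    associativity, not what Bison generates) that uses the binary operators from the table
    and fully parenthesizes its input so the chosen grouping is visible. It assumes
    space-separated tokens and leaves out parentheses, the unary '!', and the '.' and ','
    operators; the group method name is hypothetical.

```ruby
# Binary operators from the precedence table above, highest precedence first.
PRECEDENCE_TABLE = [
  [:left,  %w[* /]],
  [:left,  %w[+ -]],
  [:left,  %w[> >= < <=]],
  [:left,  %w[== !=]],
  [:left,  %w[&&]],
  [:left,  %w[||]],
  [:right, %w[=]]
]

# Maps each operator to [precedence, associativity]; a higher number binds tighter.
OPERATORS = {}
PRECEDENCE_TABLE.each_with_index do |(assoc, ops), index|
  ops.each { |op| OPERATORS[op] = [PRECEDENCE_TABLE.size - index, assoc] }
end

# Shunting Yard over space-separated infix tokens, building a fully
# parenthesized string instead of the usual postfix output.
def group(expression)
  output = []     # operands and already-grouped sub-expressions
  operators = []  # operators waiting for their right-hand operand

  apply = lambda do
    op = operators.pop
    right = output.pop
    left = output.pop
    raise ArgumentError, "missing operand for #{op}" unless left && right

    output.push("(#{left} #{op} #{right})")
  end

  expression.split.each do |token|
    if OPERATORS.key?(token)
      prec, assoc = OPERATORS[token]
      # Apply pending operators that bind tighter, or equally tight when the
      # incoming operator is left-associative.
      while (top = operators.last)
        top_prec = OPERATORS[top][0]
        break unless top_prec > prec || (top_prec == prec && assoc == :left)

        apply.call
      end
      operators.push(token)
    else
      output.push(token)
    end
  end
  apply.call until operators.empty?
  output.first
end

puts group("a + b * c")   # prints (a + (b * c))
puts group("a = b = c")   # prints (a = (b = c))
puts group("a - b - c")   # prints ((a - b) - c)
```

    The only difference between the last two examples is the associativity check: an
    incoming '=' never forces the pending '=' to be applied, so the rightmost assignment
    is grouped first, while '-' is left-associative and groups from the left.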

    For other types of parsers (ANTLR and PEG), a simpler but less efficient alternative
    can be used: declaring the grammar rules in the right order will produce the
    desired result:
    expression:     equality
    equality:       additive ( ( '==' | '!=' ) additive )*
    additive:       multiplicative ( ( '+' | '-' ) multiplicative )*
    multiplicative: primary ( ( '*' | '/' ) primary )*
    primary:        '(' expression ')' | NUMBER | VARIABLE | '-' primary



     The parser will try to match rules recursively, starting from expression and
     finding its way down to primary. Since multiplicative is reached later in that chain
     than additive, it binds more tightly, giving * and / greater precedence than + and -.
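     Here is what those rules can look like as a hand-written recursive-descent parser, a
     minimal Ruby sketch that evaluates the expression instead of building nodes. Each
     grammar rule becomes one method, and each ( ... )* repetition becomes a while loop,
     which also makes the operators left-associative. The ExpressionParser class, its
     tokenizer and the variables hash are assumptions for this example; unlike a generated
     PEG parser it never backtracks, which this grammar does not need because the next
     token always decides which alternative to take.

```ruby
require "strscan"

# Recursive-descent evaluator: one method per rule of the grammar above.
class ExpressionParser
  TOKEN = %r{\d+|[A-Za-z_]\w*|==|!=|[-+*/()]}

  def initialize(source, variables = {})
    @tokens = tokenize(source)
    @variables = variables
  end

  def parse
    value = expression
    raise ArgumentError, "unexpected #{@tokens.first.inspect}" unless @tokens.empty?

    value
  end

  private

  # Splits the source into numbers, names, operators and parentheses.
  def tokenize(source)
    scanner = StringScanner.new(source)
    tokens = []
    until scanner.eos?
      next if scanner.skip(/\s+/)

      token = scanner.scan(TOKEN)
      raise ArgumentError, "unexpected character at #{scanner.pos}" unless token

      tokens << token
    end
    tokens
  end

  # expression: equality
  def expression
    equality
  end

  # equality: additive ( ( '==' | '!=' ) additive )*
  def equality
    left = additive
    while %w[== !=].include?(@tokens.first)
      operator = @tokens.shift
      right = additive
      left = operator == "==" ? left == right : left != right
    end
    left
  end

  # additive: multiplicative ( ( '+' | '-' ) multiplicative )*
  def additive
    left = multiplicative
    while %w[+ -].include?(@tokens.first)
      operator = @tokens.shift
      left = left.public_send(operator, multiplicative)
    end
    left
  end

  # multiplicative: primary ( ( '*' | '/' ) primary )*
  def multiplicative
    left = primary
    while %w[* /].include?(@tokens.first)
      operator = @tokens.shift
      left = left.public_send(operator, primary)
    end
    left
  end

  # primary: '(' expression ')' | NUMBER | VARIABLE | '-' primary
  def primary
    token = @tokens.shift
    case token
    when "("
      value = expression
      raise ArgumentError, "expected ')'" unless @tokens.shift == ")"

      value
    when /\A\d+\z/          then token.to_i
    when /\A[A-Za-z_]\w*\z/ then @variables.fetch(token)
    when "-"                then -primary
    else raise ArgumentError, "unexpected #{token.inspect}"
    end
  end
end

puts ExpressionParser.new("1 + 2 * 3").parse     # prints 7
puts ExpressionParser.new("(1 + 2) * 3").parse   # prints 9
```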


     CONNECTING THE LEXER AND PARSER IN AWESOME

     For our Awesome parser we’ll use Racc, the Ruby version of Yacc. It’s much harder
     to build a parser from scratch than it is to create a lexer. However, most languages
     end up writing their own parser because the result is faster and provides better error
     reporting.

     The input file you supply to Racc contains the grammar of your language and is very
     similar to a Yacc grammar.


    grammar.y

    class Parser

    # Declare tokens produced by the lexer
    token IF ELSE
    token DEF
    token CLASS
    token NEWLINE
    token NUMBER
    token STRING
    token TRUE FALSE NIL
    token IDENTIFIER
    token CONSTANT
    token INDENT DEDENT
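    The token declarations above only name the token types; the lexer is what produces
    them. In a Racc-generated parser, the generated do_parse method pulls tokens by calling
    a next_token method that you define, and treats [false, false] as the end of input.
    The sketch below shows only that hand-off and runs without the racc gem. The tokenize
    function is a hypothetical, much-simplified stand-in for the lexer from the previous
    chapter, the TokenFeed class name is made up, and passing the comma through as a
    literal ',' token is an assumption based on the declarations above, which list no
    COMMA type.

```ruby
# Hypothetical, much-simplified stand-in for the Awesome lexer: it emits
# [TYPE, value] pairs, the token shape a Racc parser consumes.
def tokenize(code)
  code.scan(/\d+|"[^"]*"|\w+|,/).map do |lexeme|
    case lexeme
    when /\A\d+\z/ then [:NUMBER, lexeme.to_i]
    when /\A"/     then [:STRING, lexeme[1..-2]]
    when ","       then [",", ","] # punctuation passed through as a literal token
    else                [:IDENTIFIER, lexeme]
    end
  end
end

# The meeting point of lexer and parser. A Racc-generated parser's do_parse
# calls next_token repeatedly; [false, false] signals the end of input.
class TokenFeed
  def initialize(code)
    @tokens = tokenize(code)
  end

  def next_token
    @tokens.shift || [false, false]
  end
end

feed = TokenFeed.new('print "I ate", 3, pies')
while (token = feed.next_token).first
  p token
end
```

    In a real grammar file, next_token would live inside the generated Parser class
    itself, next to a method that runs the lexer and then calls do_parse.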