Introduction of Compiler Design
• A compiler is software that converts a
program written in a high-level language (source
language) into a low-level language
(object/target/machine language).
• A cross compiler runs on a machine ‘A’ and
produces code for another machine ‘B’.
• It is capable of creating code for a platform
other than the one on which the compiler is
running.
• A source-to-source compiler (also called a
transcompiler or transpiler) translates
source code written in one programming
language into source code in another
programming language.
Language processing systems (using
Compiler)
• We know a computer is a logical assembly of
software and hardware.
• The hardware understands a language that is hard
for us to grasp; consequently, we tend to write
programs in a high-level language, which is much
easier for us to comprehend and maintain.
• These programs then go through a series of
transformations so that they can readily be used
by machines.
• This is where language processing systems come
in handy.
Compiler Design Introduction
• High Level Language – A program that contains
pre-processor directives such as #include or #define is
written in a high-level language (HLL). HLLs are closer to
humans but far from machines. These (#) tags are called
pre-processor directives; they tell the pre-processor
what to do.
• Pre-Processor – The pre-processor resolves all #include
directives by inserting the named files (file inclusion) and
all #define directives by macro expansion. It performs
file inclusion, augmentation, macro-processing, etc.
• Assembly Language – It is neither in binary form nor high
level. It is an intermediate state: a combination of
machine instructions and some other useful data needed
for execution.
• Assembler – For every platform (hardware + OS) there is
an assembler. Assemblers are not universal, since each
platform has its own. The output of the assembler is called
an object file. It translates assembly language into
machine code.
• Interpreter – An interpreter converts a high-level language
into low-level machine language, just like a compiler, but
the two differ in how they read the input. The
compiler reads the input in one go, does the processing,
and produces the target code, whereas the interpreter does
the same line by line. A compiler scans the entire program
and translates it as a whole into machine code, whereas an
interpreter translates the program one statement at a time.
Interpreted programs are usually slower than
compiled ones.
• Relocatable Machine Code – It can be loaded at any point
in memory and then run. Addresses within the program are
arranged so that the code can be moved without
breaking.
• Loader/Linker – It converts the relocatable code into
absolute code and tries to run the program, resulting in a
running program or an error message (or sometimes both).
The linker combines a number of object files into a
single executable file; the loader then loads it into
memory and executes it.
Phases of a Compiler
• There are two major phases of compilation,
which in turn have many parts.
• Each phase takes as input the output of
the previous one, and all work in a coordinated
way.
Analysis Phase
• An intermediate representation is created
from the given source code:
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
• Lexical analyzer divides the program into
“tokens”,
• Syntax analyzer recognizes “sentences” in the
program using syntax of language and
• Semantic analyzer checks static semantics of
each construct.
• Intermediate Code Generator generates
“abstract” code.
Synthesis Phase
• Equivalent target program is created from the
intermediate representation.
• It has two parts :
1. Code Optimizer
2. Code Generator
• Code Optimizer optimizes the abstract code,
and
• final Code Generator translates abstract
intermediate code into specific machine
instructions.
Compiler construction tools
• The compiler writer can use some specialized
tools that help in implementing various
phases of a compiler.
• These tools assist in the creation of an entire
compiler or its parts.
• Some commonly used compiler construction
tools include:
1. Parser Generator –
• It produces syntax analyzers (parsers) from the
input that is based on a grammatical
description of programming language or on a
context-free grammar.
• It is useful because the syntax analysis phase is
highly complex and would otherwise consume a great
deal of manual effort and time.
Example: PIC, EQM
2. Scanner Generator
• It generates lexical analyzers from the input
that consists of regular expression description
based on tokens of a language.
• It generates a finite automaton to recognize
the regular expressions.
Example: Lex
• Syntax directed translation engines –
They generate intermediate code in three-address
format from an input that consists of a
parse tree. These engines have routines to
traverse the parse tree and produce the
intermediate code. Each node of the parse
tree is associated with one or more translations.
• Automatic code generators –
They generate machine language for a target
machine. Each operation of the intermediate
language is translated using a collection of rules
and is then taken as input by the code
generator. A template-matching process is used: an
intermediate language statement is replaced by
its equivalent machine language statement using
templates.
• Data-flow analysis engines –
They are used in code optimization.
• Data-flow analysis is a key part of code
optimization; it gathers information about
how values flow from one part of a
program to another.
• Compiler construction toolkits –
They provide an integrated set of routines that
aid in building compiler components or in the
construction of the various phases of a compiler.
Symbol Table in Compiler
• The symbol table is an important data structure
created and maintained by the compiler in
order to keep track of the semantics of names,
i.e. it stores scope and binding
information about names, and information
about instances of various entities such as
variable and function names, classes, objects,
etc.
• It is built in lexical and syntax analysis phases.
• The information is collected by the analysis
phases of compiler and is used by synthesis
phases of compiler to generate code.
• It is used by the compiler to achieve compile-time
efficiency.
• It is used by the various phases of the compiler as follows:
• Lexical Analysis: Creates new entries in the table,
e.g. entries for tokens.
• Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in
the table.
• Semantic Analysis: Uses the available information in the
table to check semantics, i.e. to verify that
expressions and assignments are semantically
correct (type checking), and updates the table accordingly.
• Intermediate Code generation: Refers to the symbol table
to know how much run-time storage is allocated and of
what type; the table also helps in adding temporary-variable
information.
• Code Optimization: Uses information present in
symbol table for machine dependent optimization.
• Target Code generation: Generates code by using
address information of identifier present in the table.
• Symbol Table entries –
• Each entry in the symbol table is associated with
attributes that support the compiler in its different
phases.
Items stored in Symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
• Operations of Symbol table –
• The basic operations defined on a symbol
table include allocate, free, insert, lookup,
set_attribute and get_attribute.
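These operations can be sketched with a hash-table implementation; the class and method names below are illustrative assumptions, not from the slides.

```python
# A minimal symbol table sketch (illustrative; names are assumptions).
# Supports the basic operations insert, lookup and attribute update,
# backed by a hash table (Python dict).

class SymbolTable:
    def __init__(self):
        self._table = {}                 # name -> attribute dict

    def insert(self, name, **attrs):
        """Add a new entry; later phases fill in more attributes."""
        self._table[name] = dict(attrs)

    def lookup(self, name):
        """Return the entry's attributes, or None if the name is undeclared."""
        return self._table.get(name)

    def set_attribute(self, name, key, value):
        self._table[name][key] = value

st = SymbolTable()
st.insert("max", kind="function", type="int")   # added by lexical/syntax analysis
st.set_attribute("max", "scope", "global")      # refined by semantic analysis
print(st.lookup("max"))
```

A lookup for an undeclared name returns `None`, which is how a semantic analyzer would detect an "undeclared variable" error.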
Implementation of Symbol table
Consider the following expression and
construct a DAG for it:
(a + b) * (a + b + c)
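One way to build such a DAG is value numbering: every distinct (operator, operand, operand) triple gets exactly one node, so the repeated sub-expression a + b is shared. The helper names below are assumptions for illustration.

```python
# Sketch of DAG construction with node sharing: identical
# sub-expressions map to the same node, so (a + b) appears only
# once in the DAG for (a + b) * (a + b + c).

nodes = {}          # (op, left_id, right_id) or ('leaf', name) -> node id

def node(key):
    if key not in nodes:
        nodes[key] = len(nodes)
    return nodes[key]

def leaf(name):
    return node(('leaf', name))

def op(o, l, r):
    return node((o, l, r))

ab   = op('+', leaf('a'), leaf('b'))          # shared subexpression a + b
abc  = op('+', ab, leaf('c'))                 # (a + b) + c reuses that node
root = op('*', ab, abc)

assert ab == op('+', leaf('a'), leaf('b'))    # same key -> same node (sharing)
print(len(nodes))  # 6 nodes: a, b, c, a+b, (a+b)+c, product
```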
Intermediate Code Generation in
Compiler Design
• In the analysis-synthesis model of a compiler, the
front end of the compiler translates a source
program into a machine-independent intermediate code;
the back end of the compiler then uses this
intermediate code to generate the target code
(which can be understood by the machine).
• The benefits of using machine independent
intermediate code are:
• Because the intermediate code is machine-independent,
portability is enhanced.
• For example, if a compiler translated the
source language directly into its target machine
language, without the option of generating
intermediate code, then a full native compiler
would be required for each new machine,
because the compiler itself would have to be
modified according to each machine's
specifications.
• Retargeting is facilitated.
• It is easier to improve the performance of the
generated code by applying optimizations to
the intermediate code.
• If we generated machine code directly from
source code, then for n target machines we would
need n optimisers and n code generators, but with
a machine-independent intermediate code,
we need only one optimiser.
• Intermediate code can be either language-specific
(e.g., bytecode for Java) or language-independent
(e.g., three-address code).
The following are commonly used
intermediate code representations:
• Postfix Notation –
• The ordinary (infix) way of writing the sum of a and b is
with operator in the middle : a + b
• The postfix notation for the same expression places the
operator at the right end as ab +.
• In general, if e1 and e2 are any postfix expressions and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• No parentheses are needed in postfix notation, because the
position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression.
• In postfix notation the operator follows the operands.
• Example – The postfix representation of the expression (a –
b) * (c + d) + (a – b) is: ab- cd+ * ab- +.
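The "only one way to decode" property makes evaluation a single left-to-right pass with a stack; a small illustrative evaluator (helper name and sample values are assumptions):

```python
# A small postfix evaluator sketch: because position and arity
# determine structure, one scan with a stack suffices, with no
# parentheses or precedence handling needed.

def eval_postfix(tokens):
    stack = []
    for t in tokens:
        if t in '+-*/':
            b = stack.pop()            # right operand is on top
            a = stack.pop()
            stack.append({'+': a + b, '-': a - b,
                          '*': a * b, '/': a / b}[t])
        else:
            stack.append(float(t))
    return stack.pop()

# (a - b) * (c + d) + (a - b) with a=5, b=3, c=2, d=4  ->  ab- cd+ * ab- +
print(eval_postfix(['5', '3', '-', '2', '4', '+', '*', '5', '3', '-', '+']))  # 14.0
```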
Introduction of Lexical Analysis
• Lexical analysis is the first phase of the compiler;
the component performing it is also known as the scanner.
• It converts the high-level input program into a
sequence of tokens.
• Lexical analysis can be implemented with
deterministic finite automata.
• The output is a sequence of tokens that is sent
to the parser for syntax analysis.
• What is a token?
A lexical token is a sequence of characters that
can be treated as a unit in the grammar of a
programming language.
• Example of tokens:
• Type token (id, number, real, . . . )
• Punctuation tokens (IF, void, return, . . . )
• Alphabetic tokens (keywords)
Examples of non-tokens:
Comments, preprocessor directives, macros, blanks, tabs,
newlines, etc.
• Lexeme:
• The sequence of characters matched by a
pattern to form the corresponding token or a
sequence of input characters that comprises a
single token is called a lexeme.
• eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”,
“;” .
• How the Lexical Analyzer functions:
• 1. Tokenization, i.e. dividing the program into
valid tokens.
2. Removing white-space characters.
3. Removing comments.
4. Helping to generate error messages by
providing row and column numbers.
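These functions can be sketched with a regular-expression scanner; the token classes below are assumptions, not the slides' exact set.

```python
# A toy scanner sketch: it splits the input into tokens, skips white
# space and comments (the non-tokens), and keeps each lexeme alongside
# its token class.

import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_]\w*'),       # identifiers and keywords alike
    ('OP',     r'[+\-*/=;(),]'),
    ('SKIP',   r'[ \t]+|//[^\n]*'),    # white space and comments: non-tokens
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(code):
    tokens = []
    for m in MASTER.finditer(code):
        if m.lastgroup != 'SKIP':
            tokens.append((m.lastgroup, m.group()))  # (token class, lexeme)
    return tokens

print(tokenize('int max(int i);'))  # yields 7 tokens
```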
• Exercise 2:
• Count number of tokens :
• int max(int i);
• The lexical analyzer first reads int, finds it
valid, and accepts it as a token.
• max is read next and found to be a valid
function name after reading (.
• int is also a token, then i is another
token, and finally ;.
• In total there are 7 tokens.
Error detection and Recovery in
Compiler
• In this phase of compilation, all possible errors
made by the user are detected and reported
to the user in the form of error messages. This
process of locating errors and reporting them to
the user is called error handling.
Functions of Error handler
• Detection
• Reporting
• Recovery
Compile time errors are of three
types
Lexical phase errors
• These errors are detected during the lexical
analysis phase. Typical lexical errors are:
1. Exceeding the length of identifiers or numeric
constants.
2. Appearance of illegal characters.
3. Unmatched strings.
Syntactic phase errors
• These errors are detected during the syntax
analysis phase. Typical syntax errors are:
• Errors in structure
• Missing operator
• Misspelled keywords
• Unbalanced parenthesis
Semantic errors
• These errors are detected during the semantic
analysis phase. Typical semantic errors are:
1. Incompatible type of operands
2. Undeclared variables
3. Mismatch between actual arguments and
formal ones.
Code Optimization in Compiler Design
• Code optimization in the synthesis phase is a
program transformation technique which tries to
improve the intermediate code by making it
consume fewer resources (CPU, memory), so
that faster-running machine code results.
The compiler's optimizing process should meet the
following objectives:
• The optimization must be correct, it must not, in
any way, change the meaning of the program.
• Optimization should increase the speed and
performance of the program.
• The compilation time must be kept reasonable.
• The optimization process should not delay the
overall compiling process.
• When to Optimize?
Optimization of the code is often performed at the end
of the development stage since it reduces readability
and adds code that is used to increase the
performance.
• Why Optimize?
Optimizing an algorithm is beyond the scope of the
code optimization phase; instead, the generated program
is optimized, which may also involve reducing the size
of the code. Optimization helps to:
• Reduce the space consumed and increase the speed
of compilation.
• Avoid tedious manual work: just as manually analyzing
datasets takes a lot of time and is better done with
software such as Tableau, manually performing
optimization is tedious and is better done by a
code optimizer.
• Promote re-usability, since optimized code is often
more reusable.
Types of Code Optimization – The optimization process
can be broadly classified into two types:
• Machine Independent Optimization – This code
optimization phase attempts to improve
the intermediate code to get a better target code as
the output. The part of the intermediate code which is
transformed here does not involve any CPU registers or
absolute memory locations.
• Machine Dependent Optimization – Machine-
dependent optimization is done after the target
code has been generated and when the code is
transformed according to the target machine
architecture. It involves CPU registers and may have
absolute memory references rather than relative
references. Machine-dependent optimizers strive
to take maximum advantage of the memory hierarchy.
Code Optimization is done in the
following different ways
For example, if x = a appears earlier, then after variable propagation
a*b and x*b will be identified as a common sub-expression.
Three address code in Compiler
• Three address code is a type of intermediate
code which is easy to generate and can be
easily converted to machine code.
• It uses at most three addresses and
one operator to represent an expression, and
the value computed at each instruction is
stored in a temporary variable generated by the
compiler.
• The compiler decides the order of operations
given by the three-address code.
General representation: a = b op c
• where a, b and c represent operands such as
names, constants or compiler-generated
temporaries, and op represents the operator.
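Generating such a = b op c instructions from an expression tree can be sketched as follows; the helper name and tree format are assumptions for illustration.

```python
# Sketch of three-address code generation for an expression tree:
# each emitted instruction has the form a = b op c, with results
# stored in compiler-generated temporaries t1, t2, ...

code, counter = [], 0

def gen(tree):
    """tree is ('op', left, right), or an operand name/constant string."""
    global counter
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    b, c = gen(left), gen(right)       # operands first (post-order)
    counter += 1
    a = f't{counter}'
    code.append(f'{a} = {b} {operator} {c}')   # one a = b op c instruction
    return a

gen(('+', 'a', ('*', 'b', 'c')))       # the expression a + b * c
print(code)   # ['t1 = b * c', 't2 = a + t1']
```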
Parse Tree
• Parse: to resolve (a sentence) into its
component parts and describe their syntactic
roles; simply, it is the act of parsing a string or
a text.
• Tree: a tree is a widely used abstract
data type that simulates a hierarchical tree
structure, with a root value and subtrees of
children with a parent node, represented as
a set of linked nodes.
• Uses of Parse Tree :
• It aids syntax analysis by reflecting
the syntax of the input language.
• It uses an in-memory representation of the
input with a structure that conforms to the
grammar.
• An advantage of using parse trees rather
than immediate semantic actions: you can make
multiple passes over the data without having to
re-parse the input.
Types of Parsers in Compiler Design
• The parser is the phase of the compiler that takes
a token string as input and, with the help of the
existing grammar, converts it into the
corresponding parse tree.
• The parser is also known as the syntax analyzer.
Bottom-Up or Shift-Reduce Parsers
• Bottom-up parsers / shift-reduce parsers
build the parse tree from the leaves to the root.
• Bottom-up parsing can be defined as an
attempt to reduce the input string w to the
start symbol of the grammar by tracing out the
rightmost derivation of w in reverse.
Recursive Descent Parser
• Parsing is the process of determining whether
the start symbol can derive the program.
• If parsing succeeds, the program is
a valid program; otherwise the program is
invalid.
There are generally two types of
Parsers: top-down and bottom-up
Recursive Descent Parser
• It is a kind of Top-Down Parser.
• A top-down parser builds the parse tree from
the top to down, starting with the start non-
terminal.
• A Predictive Parser is a special case of
Recursive Descent Parser, where no Back
Tracking is required.
• By carefully writing the grammar (eliminating
left recursion and left factoring it), the
result is a grammar that can be parsed by a
recursive descent parser.
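A minimal predictive recursive-descent parser can be sketched for the hypothetical left-recursion-free grammar E -> T E', E' -> + T E' | Є, T -> id; one token of lookahead selects each production, so no backtracking is needed.

```python
# Predictive recursive descent sketch: one procedure per non-terminal,
# with the next input token deciding which production to apply.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else '$'

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f'expected {expected}, got {peek()}')
        pos += 1

    def E():            # E -> T E'
        T()
        E_prime()

    def E_prime():      # E' -> + T E' | epsilon
        if peek() == '+':
            match('+')
            T()
            E_prime()   # epsilon alternative: do nothing

    def T():            # T -> id
        match('id')

    E()
    return peek() == '$'        # valid iff all input was consumed

print(parse(['id', '+', 'id', '+', 'id']))  # True
```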
• The most general form of shift-reduce parsing is LR parsing.
• The L stands for scanning the input from left
to right, and the R stands for constructing a
rightmost derivation in reverse.
Benefits of LR parsing:
• Many programming languages use some
variation of an LR parser; notably, C++ and
Perl are exceptions.
• An LR parser can be implemented very efficiently.
• Here we will look at the construction of the GOTO
graph of a grammar using all four LR
parsing techniques.
S – attributed and L – attributed SDTs
in Syntax directed translation
• Types of attributes –
Attributes may be of two types – Synthesized or
Inherited.
1. Synthesized attributes –
A Synthesized attribute is an attribute of the
non-terminal on the left-hand side of a
production.
• Synthesized attributes represent information
that is being passed up the parse tree.
• The attribute can take value only from its
children (Variables in the RHS of the
production).
• For example, if A -> BC is a production of a
grammar and A's attribute depends on B's
attributes or C's attributes, then it is a
synthesized attribute.
2. Inherited attributes –
An attribute of a nonterminal on the right-
hand side of a production is called an
inherited attribute.
• The attribute can take value either from its
parent or from its siblings (variables in the LHS
or RHS of the production).
• For example, let’s say A -> BC is a production
of a grammar and B’s attribute is dependent
on A’s attributes or C’s attributes then it will
be inherited attribute.
• Now, let's discuss S-attributed and L-attributed
SDTs.
• S-attributed SDT :
– If an SDT uses only synthesized attributes, it is
called as S-attributed SDT.
– S-attributed SDTs are evaluated in bottom-up
parsing, as the values of the parent nodes depend
upon the values of the child nodes.
– Semantic actions are placed in rightmost place of
RHS.
• L-attributed SDT:
– If an SDT uses both synthesized attributes and
inherited attributes with a restriction that
inherited attribute can inherit values from left
siblings only, it is called as L-attributed SDT.
– Attributes in L-attributed SDTs are evaluated in
a depth-first, left-to-right manner.
– Semantic actions are placed anywhere in RHS.
• For example,
• A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an
L-attributed SDT,
• since Y.S = A.S and Y.S = X.S are allowed, but Y.S
= Z.S violates the L-attributed SDT definition: the
attribute would inherit a value from its right
sibling.
Compiler Design | Detection of a Loop
in Three Address Code
• Loop optimization is the phase after the
Intermediate Code Generation.
• The main intention of this phase is to reduce
the number of lines in a program.
• In an iterative program, the majority of the
execution time is spent inside loops.
• In a recursive program, there is a block in
which the majority of the time is spent.
• Loop Optimization –
• To apply loop optimization we must first
detect loops.
• For detecting loops we use Control Flow
Analysis (CFA) on a Program Flow Graph (PFG).
• To build the program flow graph we need to find
the basic blocks.
• Basic Block –
• A basic block is a sequence of three address
statements where control enters at the beginning
and leaves only at the end without any jumps or
halts.
• Finding the Basic Block –
To find the basic blocks, we first find
the leaders in the program.
• A basic block then runs from one leader up to,
but not including, the next leader.
That is, if line 1 is a leader and line 15 is the
next leader, then lines 1 to 14 form a basic
block, not including line 15.
• Identifying leaders in a Basic Block –
• The first statement is always a leader.
• A statement that is the target of a conditional
or unconditional jump is a leader.
• A statement that immediately follows a
conditional or unconditional jump is a
leader.
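The three leader rules can be sketched directly; the instruction encoding below is an assumption, with jump targets given as instruction indices.

```python
# Leader identification over numbered three-address instructions.
# Each instruction is (text, jump_target_or_None, is_jump).

def find_leaders(instrs):
    leaders = {0}                              # rule 1: first statement
    for i, (_, target, is_jump) in enumerate(instrs):
        if target is not None:
            leaders.add(target)                # rule 2: target of a jump
        if is_jump and i + 1 < len(instrs):
            leaders.add(i + 1)                 # rule 3: statement after a jump
    return sorted(leaders)

prog = [
    ('i = 0',        None, False),   # 0
    ('t = i < 10',   None, False),   # 1
    ('if t goto 4',  4,    True),    # 2
    ('goto 7',       7,    True),    # 3
    ('x = x + i',    None, False),   # 4
    ('i = i + 1',    None, False),   # 5
    ('goto 1',       1,    True),    # 6
    ('halt',         None, False),   # 7
]
print(find_leaders(prog))  # [0, 1, 3, 4, 7]
```

The basic blocks then run from each leader up to, but not including, the next: here [0..2], [3], [4..6], [7].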
Language Processors: Assembler,
Compiler and Interpreter
• Language Processors –
Assembly language is machine-dependent, yet the
mnemonics used to represent its instructions are
not directly understandable by the machine, while
high-level languages are machine-independent.
• A computer understands instructions in machine
code, i.e. in the form of 0s and 1s.
• It is a tedious task to write a computer program
directly in machine code.
• The programs are written mostly in high level
languages like Java, C++, Python etc. and are
called source code.
• This source code cannot be executed
directly by the computer and must be
converted into machine language to be
executed.
• Hence, a special translator system software
called a Language Processor is used to translate
a program written in a high-level language into
machine code; the translated program is called
the object program (object code).
The language processors can be any
of the following three types:
• Compiler –
The language processor that reads the complete source
program written in a high-level language as a whole in
one go and translates it into an equivalent program in
machine language is called a compiler.
Example: C, C++, C#, Java. In a compiler, the source
code is translated to object code successfully only if it is
free of errors.
• The compiler reports the errors, with line numbers, at
the end of compilation when there are any errors in the
source code.
• The errors must be removed before the source code can
be successfully recompiled.
• Assembler –
The Assembler is used to translate the
program written in Assembly language into
machine code.
• The source program, containing assembly
language instructions, is the input of the assembler.
• The output generated by the assembler is the
object code or machine code understandable
by the computer.
• Interpreter –
A language processor that translates a single
statement of the source program into machine code
and executes it immediately, before moving on to
the next line, is called an interpreter.
• If there is an error in the statement, the
interpreter terminates its translating process at
that statement and displays an error message.
• The interpreter moves on to the next line for
execution only after removal of the error.
• An Interpreter directly executes instructions
written in a programming or scripting language
without previously converting them to an object
code or machine code.
Example: Perl, Python and Matlab.
FIRST Set in Syntax Analysis
• FIRST(X) is the set of terminals that can begin
the strings derivable from X; if X can derive the
empty string, Є is also in FIRST(X).
FOLLOW Set in Syntax Analysis
• FOLLOW(X) is defined as the set of terminals
that can appear immediately to the right of the
non-terminal X in some sentential form.
• 1) FOLLOW(S) = { $ } // where S is the starting
Non-Terminal
• 2) If A -> pBq is a production, where p, B and q
are any grammar symbols, then everything in
FIRST(q) except Є is in FOLLOW(B).
• 3) If A->pB is a production, then everything in
FOLLOW(A) is in FOLLOW(B).
• 4) If A->pBq is a production and FIRST(q)
contains Є, then FOLLOW(B) contains {
FIRST(q) – Є } U FOLLOW(A)
• FOLLOW Sets
• FOLLOW(E) = { $ , ) } // ')' is there because of the 5th rule
• FOLLOW(E’) = FOLLOW(E) = { $ , ) } // see the 1st production rule
• FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
• FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
• FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { * , + , $ , ) }
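These sets correspond to the classic expression grammar E -> T E', E' -> + T E' | Є, T -> F T', T' -> * F T' | Є, F -> ( E ) | id (the grammar itself was presumably on a slide image). Assuming that grammar, the four FOLLOW rules can be run to a fixed point:

```python
# Illustrative sketch: compute FIRST and FOLLOW by fixed-point
# iteration. 'eps' stands for epsilon.

GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}
NT = set(GRAMMAR)

def first_of(symbols, FIRST):
    """FIRST of a symbol string; contains 'eps' iff every symbol can vanish."""
    out = set()
    for s in symbols:
        f = FIRST[s] if s in NT else {s}
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')
    return out

FIRST = {nt: set() for nt in NT}
changed = True
while changed:                          # fixed point for FIRST
    changed = False
    for nt, prods in GRAMMAR.items():
        for prod in prods:
            f = first_of([s for s in prod if s != 'eps'], FIRST)
            if not f <= FIRST[nt]:
                FIRST[nt] |= f
                changed = True

FOLLOW = {nt: set() for nt in NT}
FOLLOW['E'].add('$')                    # rule 1: $ is in FOLLOW(start)
changed = True
while changed:                          # fixed point for FOLLOW
    changed = False
    for nt, prods in GRAMMAR.items():
        for prod in prods:
            for i, B in enumerate(prod):
                if B not in NT:
                    continue
                tail = first_of(prod[i + 1:], FIRST)
                new = tail - {'eps'}            # rule 2: FIRST of what follows B
                if 'eps' in tail:
                    new |= FOLLOW[nt]           # rules 3 and 4: add FOLLOW(A)
                if not new <= FOLLOW[B]:
                    FOLLOW[B] |= new
                    changed = True

print(sorted(FOLLOW['F']))  # ['$', ')', '*', '+']
```

Running this reproduces exactly the sets listed above.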
Classification of Context Free
Grammars
• Context Free Grammars (CFG) can be classified on the
basis of following two properties:
• 1) Based on the number of strings the grammar generates.
• If a CFG generates a finite number of strings, it
is non-recursive (the grammar is said to be a
non-recursive grammar).
• If a CFG can generate an infinite number of strings,
the grammar is said to be a recursive grammar.
• During compilation, the parser uses the grammar of
the language to build a parse tree (or derivation tree)
from the source code. The grammar used must be
unambiguous; an ambiguous grammar must not be
used for parsing.
• 2) Based on the number of derivation trees.
• If every string has only one derivation tree, the CFG
is unambiguous.
• If some string has more than one derivation tree,
the CFG is ambiguous.
Examples of Recursive Grammars
Examples of Non-Recursive Grammars
Ambiguous Grammar
• Context Free Grammars(CFGs) are classified
based on:
1. Number of Derivation trees
2. Number of strings
Depending on Number of Derivation trees, CFGs
are sub-divided into 2 types:
1. Ambiguous grammars
2. Unambiguous grammars
Ambiguous grammar – a grammar for which some
string has more than one parse tree.
• For Example:
1. Let us consider this grammar: E -> E+E | id
• We can create 2 parse trees from this grammar
for the string id+id+id:
• Note that both of the parse trees are derived
from the same grammar rules, yet the two
trees are different.
• Hence the grammar is ambiguous.
• From the above grammar, the string 3*2+5 can be
derived in 2 ways:
• Derivation tree:
• It shows how the string is derived from S using
the production rules, as shown in Figure 1.
Removing Left Recursion
Inherently ambiguous Language
• Note: Ambiguity of a grammar is undecidable,
i.e. there is no general algorithm for detecting
or removing the ambiguity of a grammar, but we
can sometimes remove ambiguity by:
• Disambiguating the grammar, i.e. rewriting it
such that only one derivation or parse tree is
possible for each string of the language the
grammar represents.
Clone a Directed Acyclic Graph
• A directed acyclic graph (DAG) is a graph
which doesn’t contain a cycle and has directed
edges.
• Given a DAG, we need to clone it, i.e.
create another graph that has a copy of its
vertices and of the edges connecting them.
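The clone can be sketched with a visited map; the Node class below is an assumed minimal representation. Each original node is copied exactly once, so shared sub-structure stays shared in the clone.

```python
# DAG cloning sketch: a dict maps each original node to its copy,
# so every node is duplicated once and sharing is preserved.

class Node:
    def __init__(self, val):
        self.val = val
        self.neighbors = []           # directed edges

def clone_dag(node, cloned=None):
    if cloned is None:
        cloned = {}                   # original node -> its copy
    if node in cloned:
        return cloned[node]           # reuse the copy: preserves sharing
    copy = Node(node.val)
    cloned[node] = copy
    copy.neighbors = [clone_dag(n, cloned) for n in node.neighbors]
    return copy

# Diamond DAG: a -> b, a -> c, b -> d, c -> d (d is shared)
a, b, c, d = Node('a'), Node('b'), Node('c'), Node('d')
a.neighbors = [b, c]; b.neighbors = [d]; c.neighbors = [d]

a2 = clone_dag(a)
print(a2 is not a,
      a2.neighbors[0].neighbors[0] is a2.neighbors[1].neighbors[0])  # True True
```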
Syntax Directed Translation in
Compiler Design
• Background: the parser uses a CFG (context-free
grammar) to validate the input string and
produce output for the next phase of the
compiler.
• The output could be either a parse tree or an
abstract syntax tree.
• Now to interleave semantic analysis with
syntax analysis phase of the compiler, we use
Syntax Directed Translation.
• Definition
Syntax-directed translation consists of rules
augmented to the grammar that facilitate semantic
analysis. SDT involves passing information bottom-up
and/or top-down through the parse tree in the form of
attributes attached to the nodes.
• Syntax directed translation rules use 1) lexical
values of nodes, 2) constants & 3) attributes
associated to the non-terminals in their
definitions.
• The general approach to Syntax-Directed
Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the
nodes of the tree by visiting them in some order.
In many cases, translation can be done during
parsing without building an explicit tree.
• To evaluate translation rules, we can employ one
depth-first traversal of the parse tree.
• This is possible only because SDT rules impose no
specific order of evaluation beyond requiring that
children's attributes be computed before their
parents', which holds for a grammar whose
attributes are all synthesized.
• Otherwise, we would have to figure out the best-suited
plan for traversing the parse tree and
evaluating all the attributes in one or more
traversals.
• For better understanding, we will move bottom
up in left to right fashion for computing
translation rules of our example.
• Syntax Directed Definition (SDD) is a kind of
abstract specification.
• It is a generalization of a context-free grammar in
which each grammar production X –> a has
associated with it a set of semantic rules of
the form s = f(b1, b2, ……bk), where s is the
attribute obtained from the function f.
• The attribute can be a string, number, type or
a memory location.
• Semantic rules are fragments of code which
are embedded usually at the end of
production and enclosed in curly braces ({ }).
• Let us assume an input string 4 * 5 + 6 for
computing synthesized attributes.
• The annotated parse tree for the input string
is
• For the computation of attributes, we start from
the leftmost bottom node.
• The rule F –> digit is used to reduce digit to F and
the value of digit is obtained from lexical analyzer
which becomes value of F i.e. from semantic
action F.val = digit.lexval.
• Hence, F.val = 4 and since T is parent node of F so,
we get T.val = 4 from semantic action T.val = F.val.
Then, for T –> T1 * F production, the
corresponding semantic action is T.val = T1.val *
F.val . Hence, T.val = 4 * 5 = 20
• Similarly, the combination E1.val + T.val becomes
E.val, i.e. E.val = E1.val + T.val = 20 + 6 = 26.
• Then, the production S –> E is applied to reduce
E.val = 26 and semantic action associated with it
prints the result E.val .
• Hence, the output will be 26.
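The bottom-up computation above can be sketched in code. The Node class and helper names below are illustrative, not part of any real compiler; only the semantic actions come from the example.

```python
# Bottom-up evaluation of synthesized attributes for the input 4 * 5 + 6,
# applying the semantic actions F.val = digit.lexval, T.val = T1.val * F.val,
# and E.val = E1.val + T.val.

class Node:
    def __init__(self, val, children=()):
        self.val = val            # the synthesized attribute
        self.children = children

def digit(lexval):
    # F -> digit : F.val = digit.lexval (value supplied by the lexer)
    return Node(lexval)

def times(t1, f):
    # T -> T1 * F : T.val = T1.val * F.val
    return Node(t1.val * f.val, (t1, f))

def plus(e1, t):
    # E -> E1 + T : E.val = E1.val + T.val
    return Node(e1.val + t.val, (e1, t))

# Reductions happen leftmost-bottom first, exactly as in the text:
t = times(digit(4), digit(5))   # T.val = 4 * 5 = 20
e = plus(t, digit(6))           # E.val = 20 + 6 = 26
print(e.val)                    # S -> E prints E.val: 26
```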
• Let us take the input string int a, c to illustrate the computation of inherited attributes.
• In the annotated parse tree for this string, the value of the L nodes is obtained from T.type (their sibling), which is the lexical value int, float or double.
• The L nodes then give the type to the identifiers a and c. The computation of types is done top-down, i.e. in a preorder traversal.
• Using the function Enter_type, the type of the identifiers a and c is inserted into the symbol table at the corresponding id.entry.
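The top-down propagation of the type can be sketched as follows; the dict-based symbol table and the enter_type/declare helpers are simplified stand-ins for the compiler's real routines, not an actual API.

```python
# Preorder propagation of the inherited attribute for "int a, c":
# L.in receives T.type, and Enter_type records each identifier's type
# at its symbol-table entry.

symbol_table = {}

def enter_type(id_entry, typ):
    # insert the type at the identifier's symbol-table entry
    symbol_table[id_entry] = typ

def declare(t_type, id_list):
    # D -> T L : L.in = T.type; L hands the inherited type to each id
    for ident in id_list:
        enter_type(ident, t_type)

declare("int", ["a", "c"])
print(symbol_table)   # {'a': 'int', 'c': 'int'}
```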
Removing Direct and Indirect Left
Recursion in a Grammar
Check whether the given grammar contains left recursion; if it does, separate out those productions and work on them.
In our example:
S --> S a | S b | c | d
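The standard transformation A -> A a | b becomes A -> b A', A' -> a A' | epsilon. A minimal sketch for grammars with immediate left recursion, applied to the example above (the list-of-symbols encoding is an assumption for illustration):

```python
# Removing immediate left recursion. Productions are lists of symbols;
# "epsilon" marks the empty production.

def remove_left_recursion(nonterm, productions):
    recursive = [p[1:] for p in productions if p[0] == nonterm]  # the "a" parts
    others = [p for p in productions if p[0] != nonterm]         # the "b" parts
    if not recursive:
        return {nonterm: productions}
    new = nonterm + "'"
    return {
        nonterm: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [["epsilon"]],
    }

# Our example: S -> S a | S b | c | d
g = remove_left_recursion("S", [["S", "a"], ["S", "b"], ["c"], ["d"]])
print(g)
# {'S': [['c', "S'"], ['d', "S'"]], "S'": [['a', "S'"], ['b', "S'"], ['epsilon']]}
```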
Bootstrapping in Compiler Design
• Step-4: Thus we get a compiler written in ASM
which compiles C and generates code in ASM.
Peephole Optimization in Compiler
Design
• Peephole optimization is a type of code optimization performed on a small part of the code.
• It is performed on a very small set of instructions in a segment of code (the "peephole").
• It works on the principle of replacement: a part of the code is replaced by shorter and faster code without any change in the output.
• Peephole optimization is a machine-dependent optimization.
Objectives of Peephole Optimization
• The objectives of peephole optimization are:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
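A minimal sketch of such a replacement pass: it slides over adjacent instruction pairs and drops a redundant load that immediately follows a store to the same location, a classic peephole pattern. The instruction tuples are invented for illustration, not a real instruction set.

```python
# A toy peephole pass: a LOAD of x right after a STORE to x is redundant,
# because the stored value is still in the register.

def peephole(code):
    out = []
    for ins in code:
        if (out and ins[0] == "LOAD" and out[-1][0] == "STORE"
                and ins[1] == out[-1][1]):
            continue   # redundant: drop the LOAD, output is unchanged
        out.append(ins)
    return out

code = [("STORE", "x"), ("LOAD", "x"), ("ADD", "y")]
print(peephole(code))   # [('STORE', 'x'), ('ADD', 'y')]
```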
Peephole Optimization Techniques
Construction of LL(1) Parsing Table
• A top-down parser builds the parse tree from
the top down, starting with the start non-
terminal.
• There are two types of Top-Down Parsers:
1. Top-Down Parsers with Backtracking
2. Top-Down Parsers without Backtracking
• Top-Down Parsers without backtracking can further be divided into two parts:
• In the table, the rows contain the Non-Terminals and the columns contain the Terminal Symbols.
• All the null productions of the grammar go under the FOLLOW elements, and the remaining productions lie under the elements of the FIRST sets.
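This placement rule can be sketched for a small LL(1) grammar, E -> T X, X -> + T X | epsilon, T -> id; the FIRST sets of each right-hand side and the FOLLOW sets below are worked out by hand, and the grammar itself is an illustrative choice rather than the one on the slides.

```python
# Filling an LL(1) table: each production goes under FIRST of its
# right-hand side; a null (epsilon) production goes under FOLLOW of
# its left-hand side.

EPS = "epsilon"
first_of_rhs = {
    "E -> T X": {"id"},
    "X -> + T X": {"+"},
    "X -> epsilon": {EPS},
    "T -> id": {"id"},
}
follow = {"E": {"$"}, "X": {"$"}, "T": {"+", "$"}}

table = {}
for prod, fs in first_of_rhs.items():
    lhs = prod.split(" -> ")[0]
    for a in fs:
        if a == EPS:
            for b in follow[lhs]:        # null production: FOLLOW columns
                table[(lhs, b)] = prod
        else:
            table[(lhs, a)] = prod       # otherwise: FIRST columns

print(table[("X", "$")])   # X -> epsilon
print(table[("X", "+")])   # X -> + T X
```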
• As you can see, all the null productions are placed under the FOLLOW set of their symbol, and all the remaining productions lie under the FIRST of their symbol.
• Note: not every grammar is feasible for an LL(1) parsing table.
• It may happen that one cell contains more than one production; such a grammar is not LL(1).
SLR, CLR and LALR Parsers
• SLR Parser
The SLR parser is similar to the LR(0) parser except for the reduce entries.
• The reduce productions are written only in the FOLLOW of the variable whose production is reduced.
CLR PARSER
• In the SLR method we were working with LR(0) items.
• In CLR parsing we will be using LR(1) items.
• An LR(k) item is defined to be an item using lookaheads of length k.
• So the LR(1) item is comprised of two parts: the LR(0) item and the lookahead associated with the item.
• LR(1) parsers are more powerful than SLR parsers.
• For LR(1) items we modify the CLOSURE and GOTO functions.
Introduction to YACC
• A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language. Historically, parser generators are also called compiler-compilers.
• YACC (Yet Another Compiler-Compiler) is an LALR(1) (Look-Ahead, Left-to-right scan, Rightmost derivation in reverse, with 1 lookahead token) parser generator.
• YACC was originally designed to be complemented by Lex.
Input File
Operator grammar and precedence
parser
• A grammar that is used to define
mathematical operators is called an operator
grammar or operator precedence grammar.
• Such grammars have the restriction that no production has either an empty right-hand side (a null production) or two adjacent non-terminals on its right-hand side.
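This restriction can be checked mechanically; a minimal sketch, assuming a toy encoding where non-terminals are upper-case symbols such as "E" and productions are lists of symbols:

```python
# Checking the operator-grammar restriction: reject any production whose
# right-hand side is empty or contains two adjacent non-terminals.

def is_operator_grammar(productions):
    for rhs_list in productions.values():
        for rhs in rhs_list:
            if not rhs:                        # null production
                return False
            for x, y in zip(rhs, rhs[1:]):     # adjacent symbol pairs
                if x.isupper() and y.isupper():
                    return False
    return True

print(is_operator_grammar({"E": [["E", "+", "E"], ["id"]]}))  # True
print(is_operator_grammar({"E": [["E", "E"], ["id"]]}))       # False
```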
• Operator precedence parser –
An operator precedence parser is a bottom-up
parser that interprets an operator grammar.
• This parser is used only for operator grammars.
• Ambiguous grammars are not allowed in any of these parsers except the operator precedence parser, which can handle certain ambiguous grammars.
• There are two methods for determining what
precedence relations should hold between a
pair of terminals:
1. Use the conventional associativity and precedence of the operators.
2. The second method of selecting operator-
precedence relations is first to construct an
unambiguous grammar for the language, a
grammar that reflects the correct
associativity and precedence in its parse
trees.
• This parser relies on the following three
precedence relations: ⋖, ≐, ⋗
• a ⋖ b means a “yields precedence to” b.
• a ⋗ b means a “takes precedence over” b.
• a ≐ b means a “has the same precedence as” b.
• No relation is given between id and id, since id will never be compared with id: two variables cannot appear side by side.
• This table also has a disadvantage – if we have n operators, then the size of the table is n × n and the space complexity is O(n²).
• In order to decrease the size of the table, we use an operator function table.
• Operator precedence parsers usually do not store
the precedence table with the relations; rather
they are implemented in a special way.
• Operator precedence parsers use precedence
functions that map terminal symbols to integers,
and the precedence relations between the
symbols are implemented by numerical
comparison.
• The parsing table can be encoded by two
precedence functions f and g that map terminal
symbols to integers. We select f and g such that:
• f(a) < g(b) whenever a yields precedence to b
• f(a) = g(b) whenever a and b have the same
precedence
• f(a) > g(b) whenever a takes precedence over b
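The comparison can be sketched as follows; the integer values for +, *, id and $ are illustrative choices, not the only possible ones.

```python
# Deciding precedence relations by numerical comparison of the precedence
# functions f and g instead of looking up an n-by-n relation table.

f = {"+": 2, "*": 4, "id": 4, "$": 0}
g = {"+": 1, "*": 3, "id": 5, "$": 0}

def relation(a, b):
    if f[a] < g[b]:
        return "<."    # a yields precedence to b
    if f[a] > g[b]:
        return ".>"    # a takes precedence over b
    return "=."        # a and b have the same precedence

print(relation("+", "*"))   # '<.'  (+ yields precedence to *)
print(relation("*", "+"))   # '.>'  (* takes precedence over +)
```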
• Since there is no cycle in the graph, we can
make this function table:
The size of the function table is 2n (one row for f and one for g, over n terminals).
• One disadvantage of function tables is that even where the relation table has blank entries, the function table has non-blank entries.
• Blank entries are also called errors.
• Hence the error-detection capability of the relation table is greater than that of the function table.
Compiler Design Introduction

  • 2. • Compiler is a software which converts a program written in high level language (Source Language) to low level language (Object/Target/Machine Language).
  • 3. • Cross Compiler that runs on a machine ‘A’ and produces a code for another machine ‘B’. • It is capable of creating code for a platform other than the one on which the compiler is running. • Source-to-source Compiler or transcompiler or transpiler is a compiler that translates source code written in one programming language into source code of another programming language.
  • 4. Language processing systems (using Compiler) • We know a computer is a logical assembly of Software and Hardware. • The hardware knows a language, that is hard for us to grasp, consequently we tend to write programs in high-level language, that is much less complicated for us to comprehend and maintain in thoughts. • Now these programs go through a series of transformation so that they can readily be used machines. • This is where language procedure systems come handy.
  • 6. • High Level Language – If a program contains #define or #include directives such as #include or #define it is called HLL. They are closer to humans but far from machines. These (#) tags are called pre-processor directives. They direct the pre-processor about what to do. • Pre-Processor – The pre-processor removes all the #include directives by including the files called file inclusion and all the #define directives using macro expansion. It performs file inclusion, augmentation, macro-processing etc. • Assembly Language – Its neither in binary form nor high level. It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution. • Assembler – For every platform (Hardware + OS) we will have a assembler. They are not universal since for each platform we have one. The output of assembler is called object file. Its translates assembly language to machine code.
  • 7. • Interpreter – An interpreter converts high level language into low level machine language, just like a compiler. But they are different in the way they read the input. The Compiler in one go reads the inputs, does the processing and executes the source code whereas the interpreter does the same line by line. Compiler scans the entire program and translates it as a whole into machine code whereas an interpreter translates the program one statement at a time. Interpreted programs are usually slower with respect to compiled ones. • Relocatable Machine Code – It can be loaded at any point and can be run. The address within the program will be in such a way that it will cooperate for the program movement. • Loader/Linker – It converts the relocatable code into absolute code and tries to run the program resulting in a running program or an error message (or sometimes both can happen). Linker loads a variety of object files into a single file to make it executable. Then loader loads it in memory and executes it.
  • 8. Phases of a Compiler • There are two major phases of compilation, which in turn have many parts. • Each of them take input from the output of the previous level and work in a coordinated way.
  • 10. Analysis Phase • An intermediate representation is created from the give source code : 1. Lexical Analyzer 2. Syntax Analyzer 3. Semantic Analyzer 4. Intermediate Code Generator
  • 11. • Lexical analyzer divides the program into “tokens”, • Syntax analyzer recognizes “sentences” in the program using syntax of language and • Semantic analyzer checks static semantics of each construct. • Intermediate Code Generator generates “abstract” code.
  • 12. Synthesis Phase • Equivalent target program is created from the intermediate representation. • It has two parts : 1. Code Optimizer 2. Code Generator • Code Optimizer optimizes the abstract code, and • final Code Generator translates abstract intermediate code into specific machine instructions.
  • 13. Compiler construction tools • The compiler writer can use some specialized tools that help in implementing various phases of a compiler. • These tools assist in the creation of an entire compiler or its parts. • Some commonly used compiler construction tools include:
  • 14. 1. Parser Generator – • It produces syntax analyzers (parsers) from the input that is based on a grammatical description of programming language or on a context-free grammar. • It is useful as the syntax analysis phase is highly complex and consumes more manual and compilation time. Example: PIC, EQM
  • 16. 2. Scanner Generator • It generates lexical analyzers from the input that consists of regular expression description based on tokens of a language. • It generates a finite automation to recognize the regular expression. Example: Lex
  • 18. • Syntax directed translation engines – It generates intermediate code with three address format from the input that consists of a parse tree. These engines have routines to traverse the parse tree and then produces the intermediate code. In this, each node of the parse tree is associated with one or more translations. • Automatic code generators – It generates the machine language for a target machine. Each operation of the intermediate language is translated using a collection of rules and then is taken as an input by the code generator. Template matching process is used. An intermediate language statement is replaced by its equivalent machine language statement using templates.
  • 19. • Data-flow analysis engines – It is used in code optimization. • Data flow analysis is a key part of the code optimization that gathers the information, that is the values that flow from one part of a program to another. • Compiler construction toolkits – It provides an integrated set of routines that aids in building compiler components or in the construction of various phases of compiler.
  • 20. Symbol Table in Compiler • Symbol Table is an important data structure created and maintained by the compiler in order to keep track of semantics of variable i.e. it stores information about scope and binding information about names, information about instances of various entities such as variable and function names, classes, objects, etc.
  • 21. • It is built in lexical and syntax analysis phases. • The information is collected by the analysis phases of compiler and is used by synthesis phases of compiler to generate code. • It is used by compiler to achieve compile time efficiency.
  • 22. • It is used by various phases of compiler as follows :- • Lexical Analysis: Creates new table entries in the table, example like entries about token. • Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc in the table. • Semantic Analysis: Uses available information in the table to check for semantics i.e. to verify that expressions and assignments are semantically correct(type checking) and update it accordingly. • Intermediate Code generation: Refers symbol table for knowing how much and what type of run-time is allocated and table helps in adding temporary variable information. • Code Optimization: Uses information present in symbol table for machine dependent optimization. • Target Code generation: Generates code by using address information of identifier present in the table.
  • 23. • Symbol Table entries – • Each entry in symbol table is associated with attributes that support compiler in different phases. Items stored in Symbol table: • Variable names and constants • Procedure and function names • Literal constants and strings • Compiler generated temporaries • Labels in source languages
  • 25. • Operations of Symbol table – • The basic operations defined on a symbol table include:
  • 34. Consider the following expression and construct a DAG for it ( a + b ) x ( a + b + c )
  • 36. Intermediate Code Generation in Compiler Design • In the analysis-synthesis model of a compiler, the front end of a compiler translates a source program into an independent intermediate code, then the back end of the compiler uses this intermediate code to generate the target code (which can be understood by the machine). • The benefits of using machine independent intermediate code are: • Because of the machine independent intermediate code, portability will be enhanced.
  • 37. • For ex, suppose, if a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required. • Because, obviously, there were some modifications in the compiler itself according to the machine specifications. • Retargeting is facilitated • It is easier to apply source code modification to improve the performance of source code by optimising the intermediate code.
  • 39. • If we generate machine code directly from source code then for n target machine we will have n optimisers and n code generators but if we will have a machine independent intermediate code, we will have only one optimiser. • Intermediate code can be either language specific (e.g., Bytecode for Java) or language. independent (three-address code).
  • 40. The following are commonly used intermediate code representation: • Postfix Notation – • The ordinary (infix) way of writing the sum of a and b is with operator in the middle : a + b • The postfix notation for the same expression places the operator at the right end as ab +. • In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is postfix notation by e1e2 +. • No parentheses are needed in postfix notation because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression. • In postfix notation the operator follows the operand. • Example – The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + *ab -+. Read more: Infix to Postfix
  • 44. Introduction of Lexical Analysis • Lexical Analysis is the first phase of compiler also known as scanner. • It converts the High level input program into a sequence of Tokens. • Lexical Analysis can be implemented with the Deterministic finite Automata. • The output is a sequence of tokens that is sent to the parser for syntax analysis
  • 46. • What is a token? A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages. • Example of tokens: • Type token (id, number, real, . . . ) • Punctuation tokens (IF, void, return, . . . ) • Alphabetic tokens (keywords)
  • 47. Example of Non-Tokens: Comments, preprocessor directive, macros, blanks, tabs, newline etc
  • 48. • Lexeme: • The sequence of characters matched by a pattern to form the corresponding token or a sequence of input characters that comprises a single token is called a lexeme. • eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
  • 49. • How Lexical Analyzer functions • 1. Tokenization .i.e Dividing the program into valid tokens. 2. Remove white space characters. 3. Remove comments. 4. It also provides help in generating error message by providing row number and column number.
  • 51. • Exercise 2: • Count number of tokens : • int max(int i); • Lexical analyzer first read int and finds it to be valid and accepts as token • max is read by it and found to be valid function name after reading ( • int is also a token , then again i as another token and finally ;
  • 52. Error detection and Recovery in Compiler • In this phase of compilation, all possible errors made by the user are detected and reported to the user in form of error messages. This process of locating errors and reporting it to user is called Error Handling process. Functions of Error handler • Detection • Reporting • Recovery
  • 54. Compile time errors are of three types • These errors are detected during the lexical analysis phase. Typical lexical errors are 1. Exceeding length of identifier or numeric constants. 2. Appearance of illegal characters 3. Unmatched string
  • 56. Syntactic phase errors • These errors are detected during syntax analysis phase. Typical syntax errors are • Errors in structure • Missing operator • Misspelled keywords • Unbalanced parenthesis
  • 58. Semantic errors • These errors are detected during semantic analysis phase. Typical semantic errors are 1. Incompatible type of operands 2. Undeclared variables 3. Not matching of actual arguments with formal one
  • 60. Code Optimization in Compiler Design • The code optimization in the synthesis phase is a program transformation technique, which tries to improve the intermediate code by making it consume fewer resources (i.e. CPU, Memory) so that faster-running machine code will result. Compiler optimizing process should meet the following objectives : • The optimization must be correct, it must not, in any way, change the meaning of the program. • Optimization should increase the speed and performance of the program. • The compilation time must be kept reasonable. • The optimization process should not delay the overall compiling process.
  • 61. • When to Optimize? Optimization of the code is often performed at the end of the development stage since it reduces readability and adds code that is used to increase the performance. • Why Optimize? Optimizing an algorithm is beyond the scope of the code optimization phase. So the program is optimized. And it may involve reducing the size of the code. So optimization helps to: • Reduce the space consumed and increases the speed of compilation. • Manually analyzing datasets involves a lot of time. Hence we make use of software like Tableau for data analysis. Similarly manually performing the optimization is also tedious and is better done using a code optimizer. • An optimized code often promotes re-usability.
  • 62. Types of Code Optimization –The optimization process can be broadly classified into two types : • Machine Independent Optimization – This code optimization phase attempts to improve the intermediate code to get a better target code as the output. The part of the intermediate code which is transformed here does not involve any CPU registers or absolute memory locations. • Machine Dependent Optimization – Machine- dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references. Machine-dependent optimizers put efforts to take maximum advantage of the memory hierarchy
  • 63. Code Optimization is done in the following different ways
  • 64. Hence, after variable propagation, a*b and x*b will be identified as common sub- expression.
  • 69. Three address code in Compiler • Three address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. • It makes use of at most three addresses and one operator to represent an expression and the value computed at each instruction is stored in temporary variable generated by compiler. • The compiler decides the order of operation given by three address code.
  • 70. General representation • Where a, b or c represents operands like names, constants or compiler generated temporaries and op represents the operator
  • 72. • Example-2: Write three address code for following code
  • 74. Parse Tree • Parse : It means to resolve (a sentence) into its component parts and describe their syntactic roles or simply it is an act of parsing a string or a text. • Tree : A tree may be a widely used abstract data type that simulates a hierarchical tree structure, with a root value and sub-trees of youngsters with a parent node, represented as a group of linked nodes.
  • 77. • Uses of Parse Tree : • It helps in making syntax analysis by reflecting the syntax of the input language. • It uses an in-memory representation of the input with a structure that conforms to the grammar. • The advantages of using parse trees rather than semantic actions: you’ll make multiple passes over the info without having to re- parse the input.
  • 78. Types of Parsers in Compiler Design • Parser is that phase of compiler which takes token string as input and with the help of existing grammar, converts it into the corresponding parse tree. • Parser is also known as Syntax Analyzer.
  • 84. Bottom Up or Shift Reduce Parsers | Set 2 • Bottom Up Parsers / Shift Reduce Parsers Build the parse tree from leaves to root. • Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of grammar by tracing out the rightmost derivations of w in reverse. Eg.
  • 89. Recursive Descent Parser • Parsing is the process to determine whether the start symbol can derive the program or not. • If the Parsing is successful then the program is a valid program otherwise the program is invalid.
  • 90. There are generally two types of Parsers
  • 91. Recursive Descent Parser • It is a kind of Top-Down Parser. • A top-down parser builds the parse tree from the top to down, starting with the start non- terminal. • A Predictive Parser is a special case of Recursive Descent Parser, where no Back Tracking is required. • By carefully writing a grammar means eliminating left recursion and left factoring from it, the resulting grammar will be a grammar that can be parsed by a recursive descent parser.
  • 92. **Here e is Epsilon
  • 94. • A general shift reduce parsing is LR parsing. • The L stands for scanning the input from left to right and R stands for constructing a rightmost derivation in reverse. Benefits of LR parsing: • Many programming languages using some variations of an LR parser. It should be noted that C++ and Perl are exceptions to it. • LR Parser can be implemented very efficiently • Here we will look at the construction of GOTO graph of grammar by using all the four LR parsing techniques
  • 106. S – attributed and L – attributed SDTs in Syntax directed translation • Types of attributes – Attributes may be of two types – Synthesized or Inherited. 1. Synthesized attributes – A Synthesized attribute is an attribute of the non-terminal on the left-hand side of a production. • Synthesized attributes represent information that is being passed up the parse tree. • The attribute can take value only from its children (Variables in the RHS of the production).
  • 107. • For eg. let’s say A -> BC is a production of a grammar, and A’s attribute is dependent on B’s attributes or C’s attributes then it will be synthesized attribute.
• 108. 2. Inherited attributes – An attribute of a non-terminal on the right-hand side of a production is called an inherited attribute. • The attribute can take its value either from its parent or from its siblings (the variables in the LHS or RHS of the production). • For example, if A -> BC is a production of a grammar and B's attribute depends on A's or C's attributes, then it is an inherited attribute.
• 109. • Now, let's discuss S-attributed and L-attributed SDTs. • S-attributed SDT: – If an SDT uses only synthesized attributes, it is called an S-attributed SDT. – S-attributed SDTs are evaluated during bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes. – Semantic actions are placed at the rightmost end of the RHS.
• 110. • L-attributed SDT: – If an SDT uses both synthesized and inherited attributes, with the restriction that an inherited attribute can inherit values from left siblings only, it is called an L-attributed SDT. – Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner. – Semantic actions may be placed anywhere in the RHS.
• 111. • For example, • A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an L-attributed grammar, • since Y.S = A.S and Y.S = X.S are allowed but Y.S = Z.S violates the L-attributed SDT definition: the attribute inherits a value from its right sibling.
• 114. Compiler Design | Detection of a Loop in Three Address Code • Loop optimization is the phase after intermediate code generation. • The main intention of this phase is to reduce the number of lines executed in a program. • In an iterative program, the majority of the execution time is spent inside loops. • In a recursive program, there is a block in which the majority of the time is spent.
• 115. • Loop Optimization – • To apply loop optimization we must first detect loops. • For detecting loops we use Control Flow Analysis (CFA) on the Program Flow Graph (PFG). • To build the program flow graph we need to find the basic blocks.
• 116. • Basic Block – • A basic block is a sequence of three-address statements where control enters at the beginning and leaves only at the end, without any jumps or halts in between. • Finding the Basic Blocks – In order to find the basic blocks, we need to find the leaders in the program. • A basic block then runs from one leader up to, but not including, the next leader. So if line 1 is a leader and line 15 is the next leader, then lines 1 to 14 form a basic block that does not include line 15.
• 117. • Identifying leaders of basic blocks – • The first statement is always a leader. • A statement that is the target of a conditional or unconditional jump is a leader. • A statement that immediately follows a conditional or unconditional jump is a leader.
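The three leader rules above translate directly into code. A minimal sketch in Python, assuming a simplified three-address-code representation (each instruction is an (op, target) pair, where target is an instruction index for jumps and None otherwise — this encoding is illustrative, not from the slides):

```python
# Sketch: find leaders, then slice the code into basic blocks.
# Instruction format assumed: (op, target); target is an int index
# for 'goto'/'if_goto', None for everything else.

def find_leaders(code):
    leaders = {0}                      # rule 1: first statement
    for i, (op, target) in enumerate(code):
        if op in ('goto', 'if_goto'):
            leaders.add(target)        # rule 2: target of a jump
            if i + 1 < len(code):
                leaders.add(i + 1)     # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to (not including) the next."""
    ls = find_leaders(code)
    return [list(range(ls[i], ls[i + 1] if i + 1 < len(ls) else len(code)))
            for i in range(len(ls))]
```

For example, for the four-instruction program `[('assign', None), ('if_goto', 0), ('assign', None), ('goto', 1)]` the leaders are instructions 0, 1 and 2, giving blocks [0], [1] and [2, 3].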
• 119. Language Processors: Assembler, Compiler and Interpreter • Language Processors – Assembly language is machine dependent, yet the mnemonics used to represent its instructions are not directly understandable by the machine, while high-level languages are machine independent. • A computer understands instructions in machine code, i.e. in the form of 0s and 1s. • It is a tedious task to write a computer program directly in machine code. • Programs are mostly written in high-level languages like Java, C++ or Python, and are called source code.
• 120. • This source code cannot be executed directly by the computer and must be converted into machine language. • Hence, a special translator system software that converts a program written in a high-level language into machine code is used; it is called a language processor, and the translated program is called the object program (object code).
• 121. The language processors can be any of the following three types: • Compiler – The language processor that reads the complete source program written in a high-level language as a whole in one go and translates it into an equivalent program in machine language is called a compiler. Example: C, C++, C#, Java. In a compiler, the source code is translated to object code successfully only if it is free of errors. • When there are errors in the source code, the compiler reports them at the end of compilation, with line numbers. • The errors must be removed before the source code can be successfully recompiled.
• 123. • Assembler – The assembler is used to translate a program written in assembly language into machine code. • The input of the assembler is a source program containing assembly language instructions. • The output generated by the assembler is the object code, i.e. machine code understandable by the computer.
• 125. • Interpreter – A language processor that translates a single statement of the source program into machine code and executes it immediately, before moving on to the next line, is called an interpreter. • If there is an error in a statement, the interpreter terminates its translation at that statement and displays an error message. • The interpreter moves on to the next line for execution only after the error is removed. • An interpreter directly executes instructions written in a programming or scripting language without previously converting them to object code or machine code. Example: Perl, Python and Matlab.
  • 126. FIRST Set in Syntax Analysis
  • 129. Note
• 130. FOLLOW Set in Syntax Analysis • FOLLOW(X) is defined as the set of terminals that can appear immediately to the right of the non-terminal X in some sentential form.
• 132. • 1) FOLLOW(S) = { $ } // where S is the starting non-terminal • 2) If A -> pBq is a production, where p and q are any strings of grammar symbols and B is a non-terminal, then everything in FIRST(q) except Є is in FOLLOW(B). • 3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B). • 4) If A -> pBq is a production and FIRST(q) contains Є, then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A)
  • 134. • FOLLOW Set FOLLOW(E) = { $ , ) } // Note ')' is there because of 5th rule • FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) } FOLLOW(T’) = FOLLOW(T) = { + , $ , ) } FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
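The FIRST and FOLLOW rules above can be computed mechanically with a fixed-point iteration: keep applying the rules until no set grows. A compact Python sketch, assuming a grammar encoded as a dict of productions where each production is a list of symbols and an empty list denotes Є (written 'eps' inside the sets):

```python
# Sketch: fixed-point computation of FIRST and FOLLOW sets.
# Grammar encoding assumed: {nonterminal: [production, ...]},
# each production a list of symbols; [] means the epsilon production.

def first_follow(grammar, start):
    nts = set(grammar)
    first = {nt: set() for nt in nts}
    follow = {nt: set() for nt in nts}
    follow[start].add('$')            # rule 1: $ is in FOLLOW(start)

    def first_of(seq):
        """FIRST of a string of grammar symbols."""
        out = set()
        for sym in seq:
            f = first[sym] if sym in nts else {sym}
            out |= f - {'eps'}
            if 'eps' not in f:
                return out
        out.add('eps')                # every symbol in seq can vanish
        return out

    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                add = first_of(prod)              # FIRST(nt) grows
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
                for i, sym in enumerate(prod):    # rules 2-4 for FOLLOW
                    if sym in nts:
                        tail = first_of(prod[i + 1:])
                        add = tail - {'eps'}
                        if 'eps' in tail:
                            add |= follow[nt]
                        if not add <= follow[sym]:
                            follow[sym] |= add
                            changed = True
    return first, follow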
• 138. Classification of Context Free Grammars • Context-free grammars (CFGs) can be classified on the basis of the following two properties: • 1) Based on the number of strings generated. • If a CFG generates a finite number of strings, the grammar is said to be non-recursive. • If a CFG can generate an infinite number of strings, the grammar is said to be recursive. • During compilation, the parser uses the grammar of the language to build a parse tree (or derivation tree) out of the source code. The grammar used must be unambiguous; an ambiguous grammar must not be used for parsing.
• 139. • 2) Based on the number of derivation trees. • If every string has only one derivation tree, the CFG is unambiguous. • If some string has more than one derivation tree, the CFG is ambiguous.
  • 143. Ambiguous Grammar • Context Free Grammars(CFGs) are classified based on: 1. Number of Derivation trees 2. Number of strings Depending on Number of Derivation trees, CFGs are sub-divided into 2 types: 1. Ambiguous grammars 2. Unambiguous grammars
• 145. • For example: 1. Consider the grammar: E -> E+E | id • We can create 2 parse trees from this grammar for the string id+id+id:
• 146. Note • Both of the above parse trees are derived from the same grammar rules, yet the trees are different. • Hence the grammar is ambiguous.
• 148. • From the above grammar, the string 3*2+5 can be derived in 2 ways:
• 150. • Derivation tree: • It shows how a string is derived from S using the production rules, as shown in Figure 1.
• 159. • Note: Ambiguity of a grammar is undecidable, i.e. there is no general algorithm for detecting or removing the ambiguity of a grammar, but we can often remove ambiguity by: • Disambiguating the grammar, i.e. rewriting it so that only one derivation or parse tree is possible for each string of the language which the grammar represents.
• 160. Clone a Directed Acyclic Graph • A directed acyclic graph (DAG) is a graph with directed edges and no cycles. • Given a DAG, we need to clone it, i.e. create another graph that has a copy of its vertices and of the edges connecting them.
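The key point when cloning a DAG is that a node may be shared by several parents, so it must be copied exactly once. A depth-first traversal with a visited map handles this; the Node class below is an assumed minimal representation:

```python
# Sketch: clone a DAG with DFS and a seen-map so shared nodes
# are copied exactly once (node identity is used as the map key).

class Node:
    def __init__(self, val):
        self.val = val
        self.children = []

def clone_dag(node, seen=None):
    if seen is None:
        seen = {}
    if node in seen:
        return seen[node]          # reuse the copy of a shared node
    copy = Node(node.val)
    seen[node] = copy              # record before recursing (cheap insurance)
    copy.children = [clone_dag(c, seen) for c in node.children]
    return copy
```

In a diamond-shaped DAG (a -> b, a -> c, b -> d, c -> d), the clone of d is created once and shared by the clones of b and c, mirroring the original structure.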
• 161. Syntax Directed Translation in Compiler Design • Background: The parser uses a CFG (context-free grammar) to validate the input string and produce output for the next phase of the compiler. • The output could be either a parse tree or an abstract syntax tree. • To interleave semantic analysis with the syntax analysis phase of the compiler, we use syntax-directed translation.
• 162. • Definition: Syntax-directed translation consists of rules that augment the grammar to facilitate semantic analysis. SDT involves passing information bottom-up and/or top-down through the parse tree in the form of attributes attached to the nodes. • Syntax-directed translation rules use 1) lexical values of nodes, 2) constants and 3) attributes associated with the non-terminals in their definitions. • The general approach to syntax-directed translation is to construct a parse tree or syntax tree and compute the values of attributes at the nodes of the tree by visiting them in some order. In many cases, translation can be done during parsing without building an explicit tree.
• 166. • To evaluate translation rules, we can employ one depth-first traversal of the parse tree. • This is possible because, for a grammar whose attributes are all synthesized, SDT rules impose no specific evaluation order beyond computing children's attributes before their parents'. • Otherwise, we would have to figure out a suitable plan to traverse the parse tree and evaluate all the attributes in one or more traversals. • For better understanding, we will move bottom-up, left to right, when computing the translation rules of our example.
• 170. • A Syntax Directed Definition (SDD) is a kind of abstract specification. • It is a generalization of a context-free grammar in which each grammar production X –> a has associated with it a set of semantic rules of the form s = f(b1, b2, …, bk), where s is the attribute obtained from the function f. • The attribute can be a string, a number, a type or a memory location. • Semantic rules are fragments of code which are usually embedded at the end of a production and enclosed in curly braces ({ }).
  • 175. • Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. • The annotated parse tree for the input string is
• 176. • For the computation of attributes we start from the leftmost bottom node. • The rule F –> digit is used to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it becomes the value of F via the semantic action F.val = digit.lexval. • Hence F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic action T.val = F.val. Then, for the production T –> T1 * F, the corresponding semantic action is T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20. • Similarly, the combination E1.val + T.val becomes E.val, i.e. E.val = E1.val + T.val = 26. • Then the production S –> E is applied, and the semantic action associated with it prints the result E.val. • Hence, the output will be 26.
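The bottom-up evaluation just described can be sketched as a recursive walk over the (already built) parse tree: each node's val is computed purely from its children, mirroring F.val = digit.lexval, T.val = T1.val * F.val and E.val = E1.val + T.val. The tuple-based tree shape is assumed for illustration:

```python
# Sketch: synthesized-attribute evaluation over a parse tree encoded
# as nested tuples ('digit', n), ('+', left, right), ('*', left, right).

def val(node):
    op, *kids = node
    if op == 'digit':
        return kids[0]                        # F.val = digit.lexval
    if op == '+':
        return val(kids[0]) + val(kids[1])    # E.val = E1.val + T.val
    if op == '*':
        return val(kids[0]) * val(kids[1])    # T.val = T1.val * F.val

# The tree for the slide's input 4 * 5 + 6:
tree = ('+', ('*', ('digit', 4), ('digit', 5)), ('digit', 6))
```

Evaluating the tree for 4 * 5 + 6 gives 26, matching the annotated parse tree on the slide.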
  • 180. • Let us assume an input string int a, c for computing inherited attributes. • The annotated parse tree for the input string is
• 181. • The value at the L nodes is obtained from T.type (a sibling), which is the lexical value int, float or double. • The L node then gives the type to the identifiers a and c. The computation of the type is done top-down, i.e. in a preorder traversal. • Using the function Enter_type, the type of the identifiers a and c is inserted into the symbol table at the corresponding id.entry.
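The effect of the inherited-attribute flow for "int a, c" can be sketched very simply: the type determined at T is handed down to every identifier in the list, and each identifier is entered into the symbol table with that type. The tuple encoding and function name below are assumptions for illustration, not the slides' notation:

```python
# Sketch: propagate the inherited type attribute from T down the L
# nodes into the symbol table, as Enter_type does on the slide.
# decl format assumed: (type_name, [identifiers]).

def enter_types(decl, symtab):
    type_name, identifiers = decl
    for name in identifiers:
        symtab[name] = type_name   # Enter_type(id.entry, inherited type)
    return symtab

symtab = enter_types(('int', ['a', 'c']), {})
```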
  • 184. Removing Direct and Indirect Left Recursion in a Grammar
• 185. Check whether the given grammar contains left recursion; if present, separate those productions and start working on them. In our example, S --> S a | S b | c | d
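The standard rewrite for direct left recursion, A -> Aα | β becoming A -> βA', A' -> αA' | Є, can be sketched mechanically. The encoding below (productions as symbol lists, an empty list for Є, the primed name built by appending an apostrophe) is an illustrative assumption:

```python
# Sketch: eliminate direct left recursion for one non-terminal.
# For A -> A a1 | ... | A an | b1 | ... | bm produce
#   A  -> b1 A' | ... | bm A'
#   A' -> a1 A' | ... | an A' | epsilon   ([] denotes epsilon)

def remove_direct_left_recursion(nt, prods):
    alphas = [p[1:] for p in prods if p and p[0] == nt]   # A -> A alpha
    betas = [p for p in prods if not p or p[0] != nt]     # A -> beta
    if not alphas:
        return {nt: prods}                # nothing to do
    new = nt + "'"
    return {
        nt: [b + [new] for b in betas],
        new: [a + [new] for a in alphas] + [[]],
    }
```

Applied to the example S --> S a | S b | c | d, this yields S --> c S' | d S' and S' --> a S' | b S' | Є.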
  • 195. • Step-4: Thus we get a compiler written in ASM which compiles C and generates code in ASM.
• 196. Peephole Optimization in Compiler Design • Peephole optimization is a type of code optimization performed on a small part of the code. • It is performed on a very small set of instructions within a segment of code. • It basically works on the principle of replacement: a part of the code is replaced by shorter and faster code without a change in output. • Peephole optimization is machine dependent.
  • 197. Objectives of Peephole Optimization • The objective of peephole optimization is: 1. To improve performance 2. To reduce memory footprint 3. To reduce code size
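Two classic peephole rules, applied over a small sliding window, illustrate the replacement principle: eliminating a redundant store-then-load pair and removing an algebraic identity (adding 0). The tuple instruction format is an assumption made for this sketch, not a real target's instruction set:

```python
# Sketch of a peephole pass over a list of pseudo-instructions.
# Rules: (1) MOV a,b followed by MOV b,a  -> keep only the first;
#        (2) ADD x, 0 does nothing        -> drop it.

def peephole(code):
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # rule 1: redundant reverse move right after a move
        if nxt and ins[0] == 'MOV' and nxt == ('MOV', ins[2], ins[1]):
            out.append(ins)
            i += 2          # skip the redundant second MOV
            continue
        # rule 2: algebraic identity x = x + 0
        if ins[0] == 'ADD' and ins[2] == 0:
            i += 1
            continue
        out.append(ins)
        i += 1
    return out
```

The pass shortens the code without changing what it computes, which is exactly the peephole objective listed above.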
• 202. Construction of LL(1) Parsing Table • A top-down parser builds the parse tree from the top down, starting with the start non-terminal. • There are two types of top-down parsers: 1. Top-down parsers with backtracking 2. Top-down parsers without backtracking • Top-down parsers without backtracking can further be divided into two parts:
• 206. • In the table, the rows contain the non-terminals and the columns contain the terminal symbols. • All the null (ε) productions of the grammar go under the FOLLOW elements, and the remaining productions go under the elements of the FIRST set.
• 210. • As you can see, all the null productions are put under the FOLLOW set of their symbol, and all the remaining productions lie under the FIRST of their symbol. • Note: Not every grammar is feasible for an LL(1) parsing table. • It may be possible that one cell contains more than one production.
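The table-filling rule just described can be sketched directly: production A -> α goes under every terminal in FIRST(α), and if α can derive Є it also goes under every terminal in FOLLOW(A). FIRST/FOLLOW are assumed precomputed here; the toy grammar's bodies start with a terminal or are empty, so its FIRST function is trivial:

```python
# Sketch: fill an LL(1) table from FIRST/FOLLOW information.
# table[(nonterminal, terminal)] = production to expand with.

def build_ll1_table(grammar, first_of, follow):
    table = {}
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of(prod)
            for t in f - {'eps'}:
                table[(nt, t)] = prod          # under FIRST(prod)
            if 'eps' in f:
                for t in follow[nt]:
                    table[(nt, t)] = prod      # null production: FOLLOW(nt)
    return table

# Toy grammar S -> a S | epsilon ([] denotes the null production);
# bodies here start with a terminal or are empty, so FIRST is trivial.
def toy_first_of(prod):
    return {prod[0]} if prod else {'eps'}

grammar = {'S': [['a', 'S'], []]}
follow = {'S': {'$'}}
table = build_ll1_table(grammar, toy_first_of, follow)
```

If two productions ever land in the same cell, the grammar is not LL(1), which is exactly the conflict the note above warns about.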
• 213. SLR, CLR and LALR Parsers • SLR Parser The SLR parser is similar to the LR(0) parser except for the reduce entries. • The reduce actions are written only under the FOLLOW set of the variable whose production is reduced.
• 216. CLR PARSER • In the SLR method we were working with LR(0) items. • In CLR parsing we use LR(1) items. • An LR(k) item is defined to be an item using lookaheads of length k. • So, the LR(1) item is comprised of two parts: the LR(0) item and the lookahead associated with the item. LR(1) parsers are more powerful. For LR(1) items we modify the CLOSURE and GOTO functions.
  • 219. Introduction to YACC • A parser generator is a program that takes as input a specification of a syntax, and produces as output a procedure for recognizing that language. Historically, they are also called compiler-compilers. YACC (yet another compiler-compiler) is an LALR(1) (LookAhead, Left-to-right, Rightmost derivation producer with 1 lookahead token) parser generator. • YACC was originally designed for being complemented by Lex.
  • 226. Operator grammar and precedence parser • A grammar that is used to define mathematical operators is called an operator grammar or operator precedence grammar. • Such grammars have the restriction that no production has either an empty right-hand side (null productions) or two adjacent non- terminals in its right-hand side.
• 228. • Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets an operator grammar. • This parser is used only for operator grammars. • Ambiguous grammars are not allowed in any parser except the operator precedence parser. • There are two methods for determining what precedence relations should hold between a pair of terminals:
• 229. 1. Use the conventional associativity and precedence of the operators. 2. The second method of selecting operator-precedence relations is first to construct an unambiguous grammar for the language, one that reflects the correct associativity and precedence in its parse trees.
  • 230. • This parser relies on the following three precedence relations: ⋖, ≐, ⋗ • a ⋖ b This means a “yields precedence to” b. a ⋗ b This means a “takes precedence over” b. a ≐ b This means a “has same precedence as” b.
• 231. • No relation is given between id and id, since an id is never compared with another id and two variables cannot appear side by side. • A disadvantage of this table is that if we have n operators, the size of the table is n*n and the complexity is O(n²). • In order to decrease the size of the table, we use an operator function table.
  • 232. • Operator precedence parsers usually do not store the precedence table with the relations; rather they are implemented in a special way. • Operator precedence parsers use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. • The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that: • f(a) < g(b) whenever a yields precedence to b • f(a) = g(b) whenever a and b have the same precedence • f(a) > g(b) whenever a takes precedence over b
  • 234. • Since there is no cycle in the graph, we can make this function table: Size of the table is 2n
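A parser driven by the two functions can be sketched directly: compare f(topmost stack terminal) with g(next input terminal); shift while the top yields precedence, reduce while it takes precedence, and accept when both sides are $. The numeric values of f and g below are a standard textbook-style assignment for +, *, id and $, assumed here for illustration:

```python
# Sketch: operator-precedence parsing with precedence functions f and g.
# f(a) < g(b): a yields to b (shift); f(a) > g(b): a takes precedence
# (reduce). Values below are an assumed assignment, not from the slides.

f = {'+': 2, '*': 4, 'id': 4, '$': 0}
g = {'+': 1, '*': 3, 'id': 5, '$': 0}

def precedence_parse(tokens):
    """Parse and evaluate id + id * id style input; ints stand for ids."""
    syms = [('id', t) if isinstance(t, int) else (t, None)
            for t in tokens] + [('$', None)]
    stack = [('$', None)]   # terminal stack (non-terminals are invisible)
    vals = []               # operand stack filled while reducing
    i = 0
    while True:
        a = stack[-1][0]    # topmost terminal
        b = syms[i][0]
        if a == '$' and b == '$':
            return vals[0]                 # accept
        if f[a] <= g[b]:                   # shift
            stack.append(syms[i])
            i += 1
        else:                              # reduce the handle on top
            sym, v = stack.pop()
            if sym == 'id':
                vals.append(v)
            else:                          # binary operator
                rhs, lhs = vals.pop(), vals.pop()
                vals.append(lhs + rhs if sym == '+' else lhs * rhs)
```

With this assignment, 3 + 4 * 5 reduces the * before the +, and equal-precedence operators associate to the left, as the relation table intends.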
• 235. • One disadvantage of function tables is that blank entries in the relation table become non-blank entries in the function table. • Blank entries signal errors. • Hence the error-detection capability of the relation table is greater than that of the function table.