Introduction of Compiler Design
• A compiler is software that converts a
program written in a high-level language (source
language) into a low-level language
(object/target/machine language).
• A cross compiler runs on a machine ‘A’ and
produces code for another machine ‘B’.
• It is capable of creating code for a platform
other than the one on which the compiler is
running.
• A source-to-source compiler (also called a
transcompiler or transpiler) translates
source code written in one programming
language into source code in another
programming language.
Language processing systems (using
Compiler)
• We know a computer is a logical assembly of
software and hardware.
• The hardware understands a language that is hard
for us to grasp; consequently, we tend to write
programs in a high-level language, which is much
easier for us to comprehend and maintain.
• These programs then go through a series of
transformations so that they can readily be used
by machines.
• This is where language processing systems come
in handy.
Compiler Design Introduction
• High Level Language – A program that contains
pre-processor directives such as #include or #define is
written in a high-level language (HLL). HLLs are closer to
humans but far from machines. These (#) tags are called
pre-processor directives; they tell the pre-processor
what to do.
• Pre-Processor – The pre-processor resolves all #include
directives by inserting the named files (file inclusion) and
all #define directives by macro expansion. It performs
file inclusion, augmentation, macro-processing, etc.
• Assembly Language – It is neither in binary form nor high
level. It is an intermediate state: a combination of
machine instructions and some other useful data needed
for execution.
• Assembler – For every platform (hardware + OS) there is
an assembler. Assemblers are not universal, since each
platform has its own. The output of the assembler is called
an object file. It translates assembly language into
machine code.
• Interpreter – An interpreter converts a high-level language
into low-level machine language, just like a compiler, but
the two differ in how they read the input. The
compiler reads the input in one go, does the processing,
and produces the target code, whereas the interpreter does
the same line by line. A compiler scans the entire program
and translates it as a whole into machine code, whereas an
interpreter translates the program one statement at a time.
Interpreted programs are usually slower than
compiled ones.
• Relocatable Machine Code – It can be loaded at any point
in memory and then run. Addresses within the program are
arranged so that the code can be moved without
breaking.
• Loader/Linker – It converts the relocatable code into
absolute code and tries to run the program, resulting in a
running program or an error message (or sometimes both).
The linker combines a number of object files into a
single executable file; the loader then loads it into
memory and executes it.
Phases of a Compiler
• There are two major phases of compilation,
which in turn have many parts.
• Each phase takes as input the output of
the previous one, and all work in a coordinated
way.
Analysis Phase
• An intermediate representation is created
from the given source code:
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
• Lexical analyzer divides the program into
“tokens”,
• Syntax analyzer recognizes “sentences” in the
program using syntax of language and
• Semantic analyzer checks static semantics of
each construct.
• Intermediate Code Generator generates
“abstract” code.
Synthesis Phase
• Equivalent target program is created from the
intermediate representation.
• It has two parts :
1. Code Optimizer
2. Code Generator
• Code Optimizer optimizes the abstract code,
and
• final Code Generator translates abstract
intermediate code into specific machine
instructions.
Compiler construction tools
• The compiler writer can use some specialized
tools that help in implementing various
phases of a compiler.
• These tools assist in the creation of an entire
compiler or its parts.
• Some commonly used compiler construction
tools include:
1. Parser Generator –
• It produces syntax analyzers (parsers) from the
input that is based on a grammatical
description of programming language or on a
context-free grammar.
• It is useful because the syntax analysis phase is
highly complex and would otherwise consume a great
deal of manual effort and time.
Example: PIC, EQM
2. Scanner Generator
• It generates lexical analyzers from the input
that consists of regular expression description
based on tokens of a language.
• It generates a finite automaton to recognize
the regular expressions.
Example: Lex
• Syntax directed translation engines –
They generate intermediate code in three-address
format from an input that consists of a
parse tree. These engines have routines to
traverse the parse tree and produce the
intermediate code. Each node of the parse
tree is associated with one or more translations.
• Automatic code generators –
They generate machine language for a target
machine. Each operation of the intermediate
language is translated using a collection of rules
and is then taken as input by the code
generator. A template-matching process is used: an
intermediate language statement is replaced by
its equivalent machine language statement using
templates.
• Data-flow analysis engines –
They are used in code optimization.
• Data-flow analysis is a key part of code
optimization; it gathers information about
how values flow from one part of a
program to another.
• Compiler construction toolkits –
They provide an integrated set of routines that
aid in building compiler components or in the
construction of the various phases of a compiler.
Symbol Table in Compiler
• The symbol table is an important data structure
created and maintained by the compiler in
order to keep track of the semantics of names,
i.e. it stores scope and binding
information about names, and information
about instances of various entities such as
variable and function names, classes, objects,
etc.
• It is built in lexical and syntax analysis phases.
• The information is collected by the analysis
phases of compiler and is used by synthesis
phases of compiler to generate code.
• It is used by the compiler to achieve compile-time
efficiency.
• It is used by the various phases of the compiler as follows:
• Lexical Analysis: Creates new entries in the table,
e.g. entries for tokens.
• Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in
the table.
• Semantic Analysis: Uses the available information in the
table to check semantics, i.e. to verify that
expressions and assignments are semantically
correct (type checking), and updates the table accordingly.
• Intermediate Code generation: Refers to the symbol table
to know how much run-time storage is allocated and of
what type; the table also helps in adding temporary-variable
information.
• Code Optimization: Uses information present in
symbol table for machine dependent optimization.
• Target Code generation: Generates code by using
address information of identifier present in the table.
• Symbol Table entries –
• Each entry in the symbol table is associated with
attributes that support the compiler in its different
phases.
Items stored in Symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
• Operations of Symbol table –
• The basic operations defined on a symbol
table include allocate, free, insert, lookup,
set_attribute and get_attribute.
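These operations can be sketched with a hash-table implementation; the class and method names below are illustrative assumptions, not from the slides.

```python
# A minimal symbol table sketch (illustrative; names are assumptions).
# Supports the basic operations insert, lookup and attribute update,
# backed by a hash table (Python dict).

class SymbolTable:
    def __init__(self):
        self._table = {}                 # name -> attribute dict

    def insert(self, name, **attrs):
        """Add a new entry; later phases fill in more attributes."""
        self._table[name] = dict(attrs)

    def lookup(self, name):
        """Return the entry's attributes, or None if the name is undeclared."""
        return self._table.get(name)

    def set_attribute(self, name, key, value):
        self._table[name][key] = value

st = SymbolTable()
st.insert("max", kind="function", type="int")   # added by lexical/syntax analysis
st.set_attribute("max", "scope", "global")      # refined by semantic analysis
print(st.lookup("max"))
```

A lookup for an undeclared name returns `None`, which is how a semantic analyzer would detect an "undeclared variable" error.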
Implementation of Symbol table
Consider the following expression and
construct a DAG for it:
(a + b) * (a + b + c)
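One way to build such a DAG is value numbering: every distinct (operator, operand, operand) triple gets exactly one node, so the repeated sub-expression a + b is shared. The helper names below are assumptions for illustration.

```python
# Sketch of DAG construction with node sharing: identical
# sub-expressions map to the same node, so (a + b) appears only
# once in the DAG for (a + b) * (a + b + c).

nodes = {}          # (op, left_id, right_id) or ('leaf', name) -> node id

def node(key):
    if key not in nodes:
        nodes[key] = len(nodes)
    return nodes[key]

def leaf(name):
    return node(('leaf', name))

def op(o, l, r):
    return node((o, l, r))

ab   = op('+', leaf('a'), leaf('b'))          # shared subexpression a + b
abc  = op('+', ab, leaf('c'))                 # (a + b) + c reuses that node
root = op('*', ab, abc)

assert ab == op('+', leaf('a'), leaf('b'))    # same key -> same node (sharing)
print(len(nodes))  # 6 nodes: a, b, c, a+b, (a+b)+c, product
```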
Intermediate Code Generation in
Compiler Design
• In the analysis-synthesis model of a compiler, the
front end of the compiler translates a source
program into a machine-independent intermediate code;
the back end of the compiler then uses this
intermediate code to generate the target code
(which can be understood by the machine).
• The benefits of using machine independent
intermediate code are:
• Because the intermediate code is machine-independent,
portability is enhanced.
• For example, if a compiler translated the
source language directly into its target machine
language, without the option of generating
intermediate code, then a full native compiler
would be required for each new machine,
because the compiler itself would have to be
modified according to each machine's
specifications.
• Retargeting is facilitated.
• It is easier to improve the performance of the
generated code by applying optimizations to
the intermediate code.
• If we generated machine code directly from
source code, then for n target machines we would
need n optimisers and n code generators, but with
a machine-independent intermediate code,
we need only one optimiser.
• Intermediate code can be either language-specific
(e.g., bytecode for Java) or language-independent
(e.g., three-address code).
The following are commonly used
intermediate code representations:
• Postfix Notation –
• The ordinary (infix) way of writing the sum of a and b is
with operator in the middle : a + b
• The postfix notation for the same expression places the
operator at the right end as ab +.
• In general, if e1 and e2 are any postfix expressions and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is written in postfix notation as e1 e2 +.
• No parentheses are needed in postfix notation, because the
position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression.
• In postfix notation the operator follows the operands.
• Example – The postfix representation of the expression (a –
b) * (c + d) + (a – b) is: ab- cd+ * ab- +.
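The "only one way to decode" property makes evaluation a single left-to-right pass with a stack; a small illustrative evaluator (helper name and sample values are assumptions):

```python
# A small postfix evaluator sketch: because position and arity
# determine structure, one scan with a stack suffices, with no
# parentheses or precedence handling needed.

def eval_postfix(tokens):
    stack = []
    for t in tokens:
        if t in '+-*/':
            b = stack.pop()            # right operand is on top
            a = stack.pop()
            stack.append({'+': a + b, '-': a - b,
                          '*': a * b, '/': a / b}[t])
        else:
            stack.append(float(t))
    return stack.pop()

# (a - b) * (c + d) + (a - b) with a=5, b=3, c=2, d=4  ->  ab- cd+ * ab- +
print(eval_postfix(['5', '3', '-', '2', '4', '+', '*', '5', '3', '-', '+']))  # 14.0
```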
Introduction of Lexical Analysis
• Lexical analysis is the first phase of the compiler;
the component performing it is also known as the scanner.
• It converts the high-level input program into a
sequence of tokens.
• Lexical analysis can be implemented with
deterministic finite automata.
• The output is a sequence of tokens that is sent
to the parser for syntax analysis.
• What is a token?
A lexical token is a sequence of characters that
can be treated as a unit in the grammar of a
programming language.
• Example of tokens:
• Type token (id, number, real, . . . )
• Punctuation tokens (IF, void, return, . . . )
• Alphabetic tokens (keywords)
Examples of non-tokens:
Comments, preprocessor directives, macros, blanks, tabs,
newlines, etc.
• Lexeme:
• The sequence of characters matched by a
pattern to form the corresponding token or a
sequence of input characters that comprises a
single token is called a lexeme.
• eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”,
“;” .
• How the Lexical Analyzer functions:
• 1. Tokenization, i.e. dividing the program into
valid tokens.
2. Removing white-space characters.
3. Removing comments.
4. Helping to generate error messages by
providing row and column numbers.
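These functions can be sketched with a regular-expression scanner; the token classes below are assumptions, not the slides' exact set.

```python
# A toy scanner sketch: it splits the input into tokens, skips white
# space and comments (the non-tokens), and keeps each lexeme alongside
# its token class.

import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_]\w*'),       # identifiers and keywords alike
    ('OP',     r'[+\-*/=;(),]'),
    ('SKIP',   r'[ \t]+|//[^\n]*'),    # white space and comments: non-tokens
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(code):
    tokens = []
    for m in MASTER.finditer(code):
        if m.lastgroup != 'SKIP':
            tokens.append((m.lastgroup, m.group()))  # (token class, lexeme)
    return tokens

print(tokenize('int max(int i);'))  # yields 7 tokens
```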
• Exercise 2:
• Count number of tokens :
• int max(int i);
• The lexical analyzer first reads int, finds it
valid, and accepts it as a token.
• max is read next and found to be a valid
function name after reading (.
• int is also a token, then i is another
token, and finally ;.
• In total there are 7 tokens.
Error detection and Recovery in
Compiler
• In this phase of compilation, all possible errors
made by the user are detected and reported
to the user in the form of error messages. This
process of locating errors and reporting them to
the user is called error handling.
Functions of Error handler
• Detection
• Reporting
• Recovery
Compile time errors are of three
types
Lexical phase errors
• These errors are detected during the lexical
analysis phase. Typical lexical errors are:
1. Exceeding the length of identifiers or numeric
constants.
2. Appearance of illegal characters.
3. Unmatched strings.
Syntactic phase errors
• These errors are detected during the syntax
analysis phase. Typical syntax errors are:
• Errors in structure
• Missing operator
• Misspelled keywords
• Unbalanced parenthesis
Semantic errors
• These errors are detected during the semantic
analysis phase. Typical semantic errors are:
1. Incompatible type of operands
2. Undeclared variables
3. Mismatch between actual arguments and
formal ones.
Code Optimization in Compiler Design
• Code optimization in the synthesis phase is a
program transformation technique which tries to
improve the intermediate code by making it
consume fewer resources (CPU, memory), so
that faster-running machine code results.
The compiler's optimizing process should meet the
following objectives:
• The optimization must be correct, it must not, in
any way, change the meaning of the program.
• Optimization should increase the speed and
performance of the program.
• The compilation time must be kept reasonable.
• The optimization process should not delay the
overall compiling process.
• When to Optimize?
Optimization of the code is often performed at the end
of the development stage since it reduces readability
and adds code that is used to increase the
performance.
• Why Optimize?
Optimizing an algorithm is beyond the scope of the
code optimization phase; instead, the generated program
is optimized, which may also involve reducing the size
of the code. Optimization helps to:
• Reduce the space consumed and increase the speed
of compilation.
• Avoid tedious manual work: just as manually analyzing
datasets takes a lot of time and is better done with
software such as Tableau, manually performing
optimization is tedious and is better done by a
code optimizer.
• Promote re-usability, since optimized code is often
more reusable.
Types of Code Optimization – The optimization process
can be broadly classified into two types:
• Machine Independent Optimization – This code
optimization phase attempts to improve
the intermediate code to get a better target code as
the output. The part of the intermediate code which is
transformed here does not involve any CPU registers or
absolute memory locations.
• Machine Dependent Optimization – Machine-
dependent optimization is done after the target
code has been generated and when the code is
transformed according to the target machine
architecture. It involves CPU registers and may have
absolute memory references rather than relative
references. Machine-dependent optimizers strive
to take maximum advantage of the memory hierarchy.
Code Optimization is done in the
following different ways
For example, if x = a appears earlier, then after variable propagation
a*b and x*b will be identified as a common sub-expression.
Three address code in Compiler
• Three address code is a type of intermediate
code which is easy to generate and can be
easily converted to machine code.
• It uses at most three addresses and
one operator to represent an expression, and
the value computed at each instruction is
stored in a temporary variable generated by the
compiler.
• The compiler decides the order of operations
given by the three-address code.
General representation: a = b op c
• where a, b and c represent operands such as
names, constants or compiler-generated
temporaries, and op represents the operator.
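Generating such a = b op c instructions from an expression tree can be sketched as follows; the helper name and tree format are assumptions for illustration.

```python
# Sketch of three-address code generation for an expression tree:
# each emitted instruction has the form a = b op c, with results
# stored in compiler-generated temporaries t1, t2, ...

code, counter = [], 0

def gen(tree):
    """tree is ('op', left, right), or an operand name/constant string."""
    global counter
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    b, c = gen(left), gen(right)       # operands first (post-order)
    counter += 1
    a = f't{counter}'
    code.append(f'{a} = {b} {operator} {c}')   # one a = b op c instruction
    return a

gen(('+', 'a', ('*', 'b', 'c')))       # the expression a + b * c
print(code)   # ['t1 = b * c', 't2 = a + t1']
```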
Parse Tree
• Parse: to resolve (a sentence) into its
component parts and describe their syntactic
roles; simply, it is the act of parsing a string or
a text.
• Tree: a tree is a widely used abstract
data type that simulates a hierarchical tree
structure, with a root value and subtrees of
children with a parent node, represented as
a set of linked nodes.
• Uses of Parse Tree :
• It aids syntax analysis by reflecting
the syntax of the input language.
• It uses an in-memory representation of the
input with a structure that conforms to the
grammar.
• An advantage of using parse trees rather
than immediate semantic actions: you can make
multiple passes over the data without having to
re-parse the input.
Types of Parsers in Compiler Design
• The parser is the phase of the compiler that takes
a token string as input and, with the help of the
existing grammar, converts it into the
corresponding parse tree.
• The parser is also known as the syntax analyzer.
Bottom-Up or Shift-Reduce Parsers
• Bottom-up parsers / shift-reduce parsers
build the parse tree from the leaves to the root.
• Bottom-up parsing can be defined as an
attempt to reduce the input string w to the
start symbol of the grammar by tracing out the
rightmost derivation of w in reverse.
Recursive Descent Parser
• Parsing is the process of determining whether
the start symbol can derive the program.
• If parsing succeeds, the program is
a valid program; otherwise the program is
invalid.
There are generally two types of
Parsers: top-down and bottom-up
Recursive Descent Parser
• It is a kind of Top-Down Parser.
• A top-down parser builds the parse tree from
the top to down, starting with the start non-
terminal.
• A Predictive Parser is a special case of
Recursive Descent Parser, where no Back
Tracking is required.
• By carefully writing the grammar (eliminating
left recursion and left factoring it), the
result is a grammar that can be parsed by a
recursive descent parser.
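A minimal predictive recursive-descent parser can be sketched for the hypothetical left-recursion-free grammar E -> T E', E' -> + T E' | Є, T -> id; one token of lookahead selects each production, so no backtracking is needed.

```python
# Predictive recursive descent sketch: one procedure per non-terminal,
# with the next input token deciding which production to apply.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else '$'

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f'expected {expected}, got {peek()}')
        pos += 1

    def E():            # E -> T E'
        T()
        E_prime()

    def E_prime():      # E' -> + T E' | epsilon
        if peek() == '+':
            match('+')
            T()
            E_prime()   # epsilon alternative: do nothing

    def T():            # T -> id
        match('id')

    E()
    return peek() == '$'        # valid iff all input was consumed

print(parse(['id', '+', 'id', '+', 'id']))  # True
```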
• The most general form of shift-reduce parsing is LR parsing.
• The L stands for scanning the input from left
to right, and the R stands for constructing a
rightmost derivation in reverse.
Benefits of LR parsing:
• Many programming languages use some
variation of an LR parser; notably, C++ and
Perl are exceptions.
• An LR parser can be implemented very efficiently.
• Here we will look at the construction of the GOTO
graph of a grammar using all four LR
parsing techniques.
S – attributed and L – attributed SDTs
in Syntax directed translation
• Types of attributes –
Attributes may be of two types – Synthesized or
Inherited.
1. Synthesized attributes –
A Synthesized attribute is an attribute of the
non-terminal on the left-hand side of a
production.
• Synthesized attributes represent information
that is being passed up the parse tree.
• The attribute can take value only from its
children (Variables in the RHS of the
production).
• For example, if A -> BC is a production of a
grammar and A's attribute depends on B's
attributes or C's attributes, then it is a
synthesized attribute.
2. Inherited attributes –
An attribute of a nonterminal on the right-
hand side of a production is called an
inherited attribute.
• The attribute can take value either from its
parent or from its siblings (variables in the LHS
or RHS of the production).
• For example, let’s say A -> BC is a production
of a grammar and B’s attribute is dependent
on A’s attributes or C’s attributes then it will
be inherited attribute.
• Now, let's discuss S-attributed and L-attributed
SDTs.
• S-attributed SDT :
– If an SDT uses only synthesized attributes, it is
called as S-attributed SDT.
– S-attributed SDTs are evaluated in bottom-up
parsing, as the values of the parent nodes depend
upon the values of the child nodes.
– Semantic actions are placed in rightmost place of
RHS.
• L-attributed SDT:
– If an SDT uses both synthesized attributes and
inherited attributes with a restriction that
inherited attribute can inherit values from left
siblings only, it is called as L-attributed SDT.
– Attributes in L-attributed SDTs are evaluated in
a depth-first, left-to-right manner.
– Semantic actions are placed anywhere in RHS.
• For example,
• A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an
L-attributed SDT,
• since Y.S = A.S and Y.S = X.S are allowed, but Y.S
= Z.S violates the L-attributed SDT definition: the
attribute would inherit a value from its right
sibling.
Compiler Design | Detection of a Loop
in Three Address Code
• Loop optimization is the phase after the
Intermediate Code Generation.
• The main intention of this phase is to reduce
the number of lines in a program.
• In an iterative program, the majority of the
execution time is spent inside loops.
• In a recursive program, there is a block in
which the majority of the time is spent.
• Loop Optimization –
• To apply loop optimization we must first
detect loops.
• For detecting loops we use Control Flow
Analysis (CFA) on a Program Flow Graph (PFG).
• To build the program flow graph we need to find
the basic blocks.
• Basic Block –
• A basic block is a sequence of three address
statements where control enters at the beginning
and leaves only at the end without any jumps or
halts.
• Finding the Basic Block –
To find the basic blocks, we first find
the leaders in the program.
• A basic block then runs from one leader up to,
but not including, the next leader.
That is, if line 1 is a leader and line 15 is the
next leader, then lines 1 to 14 form a basic
block, not including line 15.
• Identifying leaders in a Basic Block –
• The first statement is always a leader.
• A statement that is the target of a conditional
or unconditional jump is a leader.
• A statement that immediately follows a
conditional or unconditional jump is a
leader.
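The three leader rules can be sketched directly; the instruction encoding below is an assumption, with jump targets given as instruction indices.

```python
# Leader identification over numbered three-address instructions.
# Each instruction is (text, jump_target_or_None, is_jump).

def find_leaders(instrs):
    leaders = {0}                              # rule 1: first statement
    for i, (_, target, is_jump) in enumerate(instrs):
        if target is not None:
            leaders.add(target)                # rule 2: target of a jump
        if is_jump and i + 1 < len(instrs):
            leaders.add(i + 1)                 # rule 3: statement after a jump
    return sorted(leaders)

prog = [
    ('i = 0',        None, False),   # 0
    ('t = i < 10',   None, False),   # 1
    ('if t goto 4',  4,    True),    # 2
    ('goto 7',       7,    True),    # 3
    ('x = x + i',    None, False),   # 4
    ('i = i + 1',    None, False),   # 5
    ('goto 1',       1,    True),    # 6
    ('halt',         None, False),   # 7
]
print(find_leaders(prog))  # [0, 1, 3, 4, 7]
```

The basic blocks then run from each leader up to, but not including, the next: here [0..2], [3], [4..6], [7].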
Language Processors: Assembler,
Compiler and Interpreter
• Language Processors –
Assembly language is machine-dependent, yet the
mnemonics used to represent its instructions are
not directly understandable by the machine, while
high-level languages are machine-independent.
• A computer understands instructions in machine
code, i.e. in the form of 0s and 1s.
• It is a tedious task to write a computer program
directly in machine code.
• The programs are written mostly in high level
languages like Java, C++, Python etc. and are
called source code.
• This source code cannot be executed
directly by the computer and must be
converted into machine language to be
executed.
• Hence, a special translator system software
called a Language Processor is used to translate
a program written in a high-level language into
machine code; the translated program is called
the object program (object code).
The language processors can be any
of the following three types:
• Compiler –
The language processor that reads the complete source
program written in a high-level language as a whole in
one go and translates it into an equivalent program in
machine language is called a compiler.
Example: C, C++, C#, Java. In a compiler, the source
code is translated to object code successfully only if it is
free of errors.
• The compiler reports the errors, with line numbers, at
the end of compilation when there are any errors in the
source code.
• The errors must be removed before the source code can
be successfully recompiled.
• Assembler –
The Assembler is used to translate the
program written in Assembly language into
machine code.
• The source program, containing assembly
language instructions, is the input of the assembler.
• The output generated by the assembler is the
object code or machine code understandable
by the computer.
• Interpreter –
A language processor that translates a single
statement of the source program into machine code
and executes it immediately, before moving on to
the next line, is called an interpreter.
• If there is an error in the statement, the
interpreter terminates its translating process at
that statement and displays an error message.
• The interpreter moves on to the next line for
execution only after removal of the error.
• An Interpreter directly executes instructions
written in a programming or scripting language
without previously converting them to an object
code or machine code.
Example: Perl, Python and Matlab.
FIRST Set in Syntax Analysis
• FIRST(X) is the set of terminals that can begin
the strings derivable from X; if X can derive the
empty string, Є is also in FIRST(X).
FOLLOW Set in Syntax Analysis
• FOLLOW(X) is defined as the set of terminals
that can appear immediately to the right of the
non-terminal X in some sentential form.
• 1) FOLLOW(S) = { $ } // where S is the starting
Non-Terminal
• 2) If A -> pBq is a production, where p, B and q
are any grammar symbols, then everything in
FIRST(q) except Є is in FOLLOW(B).
• 3) If A->pB is a production, then everything in
FOLLOW(A) is in FOLLOW(B).
• 4) If A->pBq is a production and FIRST(q)
contains Є, then FOLLOW(B) contains {
FIRST(q) – Є } U FOLLOW(A)
• FOLLOW Sets
• FOLLOW(E) = { $ , ) } // ')' is there because of the 5th rule
• FOLLOW(E’) = FOLLOW(E) = { $ , ) } // see the 1st production rule
• FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
• FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
• FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { * , + , $ , ) }
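These sets correspond to the classic expression grammar E -> T E', E' -> + T E' | Є, T -> F T', T' -> * F T' | Є, F -> ( E ) | id (the grammar itself was presumably on a slide image). Assuming that grammar, the four FOLLOW rules can be run to a fixed point:

```python
# Illustrative sketch: compute FIRST and FOLLOW by fixed-point
# iteration. 'eps' stands for epsilon.

GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}
NT = set(GRAMMAR)

def first_of(symbols, FIRST):
    """FIRST of a symbol string; contains 'eps' iff every symbol can vanish."""
    out = set()
    for s in symbols:
        f = FIRST[s] if s in NT else {s}
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')
    return out

FIRST = {nt: set() for nt in NT}
changed = True
while changed:                          # fixed point for FIRST
    changed = False
    for nt, prods in GRAMMAR.items():
        for prod in prods:
            f = first_of([s for s in prod if s != 'eps'], FIRST)
            if not f <= FIRST[nt]:
                FIRST[nt] |= f
                changed = True

FOLLOW = {nt: set() for nt in NT}
FOLLOW['E'].add('$')                    # rule 1: $ is in FOLLOW(start)
changed = True
while changed:                          # fixed point for FOLLOW
    changed = False
    for nt, prods in GRAMMAR.items():
        for prod in prods:
            for i, B in enumerate(prod):
                if B not in NT:
                    continue
                tail = first_of(prod[i + 1:], FIRST)
                new = tail - {'eps'}            # rule 2: FIRST of what follows B
                if 'eps' in tail:
                    new |= FOLLOW[nt]           # rules 3 and 4: add FOLLOW(A)
                if not new <= FOLLOW[B]:
                    FOLLOW[B] |= new
                    changed = True

print(sorted(FOLLOW['F']))  # ['$', ')', '*', '+']
```

Running this reproduces exactly the sets listed above.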
Classification of Context Free
Grammars
• Context Free Grammars (CFG) can be classified on the
basis of following two properties:
• 1) Based on the number of strings the grammar generates.
• If a CFG generates a finite number of strings, it
is non-recursive (the grammar is said to be a
non-recursive grammar).
• If a CFG can generate an infinite number of strings,
the grammar is said to be a recursive grammar.
• During compilation, the parser uses the grammar of
the language to build a parse tree (or derivation tree)
from the source code. The grammar used must be
unambiguous; an ambiguous grammar must not be
used for parsing.
• 2) Based on the number of derivation trees.
• If every string has only one derivation tree, the CFG
is unambiguous.
• If some string has more than one derivation tree,
the CFG is ambiguous.
Examples of Recursive Grammars
Examples of Non-Recursive Grammars
Ambiguous Grammar
• Context Free Grammars(CFGs) are classified
based on:
1. Number of Derivation trees
2. Number of strings
Depending on Number of Derivation trees, CFGs
are sub-divided into 2 types:
1. Ambiguous grammars
2. Unambiguous grammars
Ambiguous grammar – a grammar for which some
string has more than one parse tree.
• For Example:
1. Let us consider this grammar: E -> E+E | id
• We can create 2 parse trees from this grammar
for the string id+id+id:
• Note that both of the parse trees are derived
from the same grammar rules, yet the two
trees are different.
• Hence the grammar is ambiguous.
• From the above grammar, the string 3*2+5 can be
derived in 2 ways:
• Derivation tree:
• It shows how the string is derived from S using
the production rules, as shown in Figure 1.
Removing Left Recursion
Inherently ambiguous Language
• Note: Ambiguity of a grammar is undecidable,
i.e. there is no general algorithm for detecting
or removing the ambiguity of a grammar, but we
can sometimes remove ambiguity by:
• Disambiguating the grammar, i.e. rewriting it
such that only one derivation or parse tree is
possible for each string of the language the
grammar represents.
Clone a Directed Acyclic Graph
• A directed acyclic graph (DAG) is a graph
which doesn’t contain a cycle and has directed
edges.
• Given a DAG, we need to clone it, i.e.
create another graph that has a copy of its
vertices and of the edges connecting them.
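The clone can be sketched with a visited map; the Node class below is an assumed minimal representation. Each original node is copied exactly once, so shared sub-structure stays shared in the clone.

```python
# DAG cloning sketch: a dict maps each original node to its copy,
# so every node is duplicated once and sharing is preserved.

class Node:
    def __init__(self, val):
        self.val = val
        self.neighbors = []           # directed edges

def clone_dag(node, cloned=None):
    if cloned is None:
        cloned = {}                   # original node -> its copy
    if node in cloned:
        return cloned[node]           # reuse the copy: preserves sharing
    copy = Node(node.val)
    cloned[node] = copy
    copy.neighbors = [clone_dag(n, cloned) for n in node.neighbors]
    return copy

# Diamond DAG: a -> b, a -> c, b -> d, c -> d (d is shared)
a, b, c, d = Node('a'), Node('b'), Node('c'), Node('d')
a.neighbors = [b, c]; b.neighbors = [d]; c.neighbors = [d]

a2 = clone_dag(a)
print(a2 is not a,
      a2.neighbors[0].neighbors[0] is a2.neighbors[1].neighbors[0])  # True True
```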
Syntax Directed Translation in
Compiler Design
• Background: the parser uses a CFG (context-free
grammar) to validate the input string and
produce output for the next phase of the
compiler.
• The output could be either a parse tree or an
abstract syntax tree.
• Now to interleave semantic analysis with
syntax analysis phase of the compiler, we use
Syntax Directed Translation.
• Definition
Syntax-directed translation consists of rules
augmented to the grammar that facilitate semantic
analysis. SDT involves passing information bottom-up
and/or top-down through the parse tree in the form of
attributes attached to the nodes.
• Syntax directed translation rules use 1) lexical
values of nodes, 2) constants & 3) attributes
associated to the non-terminals in their
definitions.
• The general approach to Syntax-Directed
Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the
nodes of the tree by visiting them in some order.
In many cases, translation can be done during
parsing without building an explicit tree.
• To evaluate translation rules, we can employ one
depth-first traversal of the parse tree.
• This is possible only because SDT rules impose no
specific order of evaluation beyond requiring that
children's attributes be computed before their
parents', which holds for a grammar whose
attributes are all synthesized.
• Otherwise, we would have to figure out the best-suited
plan for traversing the parse tree and
evaluating all the attributes in one or more
traversals.
• For better understanding, we will move bottom
up in left to right fashion for computing
translation rules of our example.
• Syntax Directed Definition (SDD) is a kind of
abstract specification.
• It is a generalization of a context-free grammar in
which each grammar production X –> a has
associated with it a set of semantic rules of
the form s = f(b1, b2, ……bk), where s is the
attribute obtained from the function f.
• The attribute can be a string, number, type or
a memory location.
• Semantic rules are fragments of code which
are embedded usually at the end of
production and enclosed in curly braces ({ }).
• Let us assume an input string 4 * 5 + 6 for
computing synthesized attributes.
• The annotated parse tree for the input string
is
• For the computation of attributes, we start from
the leftmost bottom node.
• The rule F –> digit is used to reduce digit to F and
the value of digit is obtained from lexical analyzer
which becomes value of F i.e. from semantic
action F.val = digit.lexval.
• Hence, F.val = 4 and since T is parent node of F so,
we get T.val = 4 from semantic action T.val = F.val.
Then, for T –> T1 * F production, the
corresponding semantic action is T.val = T1.val *
F.val . Hence, T.val = 4 * 5 = 20
• Similarly, the combination E1.val + T.val becomes
E.val, i.e. E.val = E1.val + T.val = 20 + 6 = 26.
• Then, the production S –> E is applied to reduce
E.val = 26 and semantic action associated with it
prints the result E.val .
• Hence, the output will be 26.
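The bottom-up computation above can be sketched in code. The Node class and helper names below are illustrative, not part of any real compiler; only the semantic actions come from the example.

```python
# Bottom-up evaluation of synthesized attributes for the input 4 * 5 + 6,
# applying the semantic actions F.val = digit.lexval, T.val = T1.val * F.val,
# and E.val = E1.val + T.val.

class Node:
    def __init__(self, val, children=()):
        self.val = val            # the synthesized attribute
        self.children = children

def digit(lexval):
    # F -> digit : F.val = digit.lexval (value supplied by the lexer)
    return Node(lexval)

def times(t1, f):
    # T -> T1 * F : T.val = T1.val * F.val
    return Node(t1.val * f.val, (t1, f))

def plus(e1, t):
    # E -> E1 + T : E.val = E1.val + T.val
    return Node(e1.val + t.val, (e1, t))

# Reductions happen leftmost-bottom first, exactly as in the text:
t = times(digit(4), digit(5))   # T.val = 4 * 5 = 20
e = plus(t, digit(6))           # E.val = 20 + 6 = 26
print(e.val)                    # S -> E prints E.val: 26
```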
• Let us take the input string int a, c to illustrate the computation of inherited attributes.
• In the annotated parse tree for this string, the value of the L nodes is obtained from T.type (their sibling), which is the lexical value int, float or double.
• The L nodes then give the type to the identifiers a and c. The computation of types is done top-down, i.e. in a preorder traversal.
• Using the function Enter_type, the type of the identifiers a and c is inserted into the symbol table at the corresponding id.entry.
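The top-down propagation of the type can be sketched as follows; the dict-based symbol table and the enter_type/declare helpers are simplified stand-ins for the compiler's real routines, not an actual API.

```python
# Preorder propagation of the inherited attribute for "int a, c":
# L.in receives T.type, and Enter_type records each identifier's type
# at its symbol-table entry.

symbol_table = {}

def enter_type(id_entry, typ):
    # insert the type at the identifier's symbol-table entry
    symbol_table[id_entry] = typ

def declare(t_type, id_list):
    # D -> T L : L.in = T.type; L hands the inherited type to each id
    for ident in id_list:
        enter_type(ident, t_type)

declare("int", ["a", "c"])
print(symbol_table)   # {'a': 'int', 'c': 'int'}
```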
Removing Direct and Indirect Left
Recursion in a Grammar
Check whether the given grammar contains left recursion; if it does, separate out those productions and work on them.
In our example:
S --> S a | S b | c | d
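The standard transformation A -> A a | b becomes A -> b A', A' -> a A' | epsilon. A minimal sketch for grammars with immediate left recursion, applied to the example above (the list-of-symbols encoding is an assumption for illustration):

```python
# Removing immediate left recursion. Productions are lists of symbols;
# "epsilon" marks the empty production.

def remove_left_recursion(nonterm, productions):
    recursive = [p[1:] for p in productions if p[0] == nonterm]  # the "a" parts
    others = [p for p in productions if p[0] != nonterm]         # the "b" parts
    if not recursive:
        return {nonterm: productions}
    new = nonterm + "'"
    return {
        nonterm: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [["epsilon"]],
    }

# Our example: S -> S a | S b | c | d
g = remove_left_recursion("S", [["S", "a"], ["S", "b"], ["c"], ["d"]])
print(g)
# {'S': [['c', "S'"], ['d', "S'"]], "S'": [['a', "S'"], ['b', "S'"], ['epsilon']]}
```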
Bootstrapping in Compiler Design
• Step-4: Thus we get a compiler written in ASM
which compiles C and generates code in ASM.
Peephole Optimization in Compiler
Design
• Peephole optimization is a type of code optimization performed on a small part of the code.
• It is performed on a very small set of instructions in a segment of code (the "peephole").
• It works on the principle of replacement: a part of the code is replaced by shorter and faster code without any change in the output.
• Peephole optimization is a machine-dependent optimization.
Objectives of Peephole Optimization
• The objectives of peephole optimization are:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
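A minimal sketch of such a replacement pass: it slides over adjacent instruction pairs and drops a redundant load that immediately follows a store to the same location, a classic peephole pattern. The instruction tuples are invented for illustration, not a real instruction set.

```python
# A toy peephole pass: a LOAD of x right after a STORE to x is redundant,
# because the stored value is still in the register.

def peephole(code):
    out = []
    for ins in code:
        if (out and ins[0] == "LOAD" and out[-1][0] == "STORE"
                and ins[1] == out[-1][1]):
            continue   # redundant: drop the LOAD, output is unchanged
        out.append(ins)
    return out

code = [("STORE", "x"), ("LOAD", "x"), ("ADD", "y")]
print(peephole(code))   # [('STORE', 'x'), ('ADD', 'y')]
```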
Peephole Optimization Techniques
Construction of LL(1) Parsing Table
• A top-down parser builds the parse tree from
the top down, starting with the start non-
terminal.
• There are two types of Top-Down Parsers:
1. Top-Down Parsers with Backtracking
2. Top-Down Parsers without Backtracking
• Top-Down Parsers without backtracking can further be divided into two parts:
• In the table, the rows contain the Non-Terminals and the columns contain the Terminal Symbols.
• All the null productions of the grammar go under the FOLLOW elements, and the remaining productions lie under the elements of the FIRST sets.
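This placement rule can be sketched for a small LL(1) grammar, E -> T X, X -> + T X | epsilon, T -> id; the FIRST sets of each right-hand side and the FOLLOW sets below are worked out by hand, and the grammar itself is an illustrative choice rather than the one on the slides.

```python
# Filling an LL(1) table: each production goes under FIRST of its
# right-hand side; a null (epsilon) production goes under FOLLOW of
# its left-hand side.

EPS = "epsilon"
first_of_rhs = {
    "E -> T X": {"id"},
    "X -> + T X": {"+"},
    "X -> epsilon": {EPS},
    "T -> id": {"id"},
}
follow = {"E": {"$"}, "X": {"$"}, "T": {"+", "$"}}

table = {}
for prod, fs in first_of_rhs.items():
    lhs = prod.split(" -> ")[0]
    for a in fs:
        if a == EPS:
            for b in follow[lhs]:        # null production: FOLLOW columns
                table[(lhs, b)] = prod
        else:
            table[(lhs, a)] = prod       # otherwise: FIRST columns

print(table[("X", "$")])   # X -> epsilon
print(table[("X", "+")])   # X -> + T X
```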
• As you can see, all the null productions are placed under the FOLLOW set of their symbol, and all the remaining productions lie under the FIRST of their symbol.
• Note: not every grammar is feasible for an LL(1) parsing table.
• It may happen that one cell contains more than one production; such a grammar is not LL(1).
SLR, CLR and LALR Parsers
• SLR Parser
The SLR parser is similar to the LR(0) parser except for the reduce entries.
• The reduce productions are written only in the FOLLOW of the variable whose production is reduced.
CLR PARSER
• In the SLR method we were working with LR(0) items.
• In CLR parsing we will be using LR(1) items.
• An LR(k) item is defined to be an item using lookaheads of length k.
• So the LR(1) item is comprised of two parts: the LR(0) item and the lookahead associated with the item.
• LR(1) parsers are more powerful than SLR parsers.
• For LR(1) items we modify the CLOSURE and GOTO functions.
Introduction to YACC
• A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language. Historically, parser generators are also called compiler-compilers.
• YACC (Yet Another Compiler-Compiler) is an LALR(1) (Look-Ahead, Left-to-right scan, Rightmost derivation in reverse, with 1 lookahead token) parser generator.
• YACC was originally designed to be complemented by Lex.
Input File
Operator grammar and precedence
parser
• A grammar that is used to define
mathematical operators is called an operator
grammar or operator precedence grammar.
• Such grammars have the restriction that no production has either an empty right-hand side (a null production) or two adjacent non-terminals on its right-hand side.
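This restriction can be checked mechanically; a minimal sketch, assuming a toy encoding where non-terminals are upper-case symbols such as "E" and productions are lists of symbols:

```python
# Checking the operator-grammar restriction: reject any production whose
# right-hand side is empty or contains two adjacent non-terminals.

def is_operator_grammar(productions):
    for rhs_list in productions.values():
        for rhs in rhs_list:
            if not rhs:                        # null production
                return False
            for x, y in zip(rhs, rhs[1:]):     # adjacent symbol pairs
                if x.isupper() and y.isupper():
                    return False
    return True

print(is_operator_grammar({"E": [["E", "+", "E"], ["id"]]}))  # True
print(is_operator_grammar({"E": [["E", "E"], ["id"]]}))       # False
```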
• Operator precedence parser –
An operator precedence parser is a bottom-up
parser that interprets an operator grammar.
• This parser is used only for operator grammars.
• Ambiguous grammars are not allowed in any of these parsers except the operator precedence parser, which can handle certain ambiguous grammars.
• There are two methods for determining what
precedence relations should hold between a
pair of terminals:
1. Use the conventional associativity and precedence of the operators.
2. The second method of selecting operator-
precedence relations is first to construct an
unambiguous grammar for the language, a
grammar that reflects the correct
associativity and precedence in its parse
trees.
• This parser relies on the following three
precedence relations: ⋖, ≐, ⋗
• a ⋖ b means a “yields precedence to” b.
• a ⋗ b means a “takes precedence over” b.
• a ≐ b means a “has the same precedence as” b.
• No relation is given between id and id, since id will never be compared with id: two variables cannot appear side by side.
• This table also has a disadvantage – if we have n operators, then the size of the table is n × n and the space complexity is O(n²).
• In order to decrease the size of the table, we use an operator function table.
• Operator precedence parsers usually do not store
the precedence table with the relations; rather
they are implemented in a special way.
• Operator precedence parsers use precedence
functions that map terminal symbols to integers,
and the precedence relations between the
symbols are implemented by numerical
comparison.
• The parsing table can be encoded by two
precedence functions f and g that map terminal
symbols to integers. We select f and g such that:
• f(a) < g(b) whenever a yields precedence to b
• f(a) = g(b) whenever a and b have the same
precedence
• f(a) > g(b) whenever a takes precedence over b
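The comparison can be sketched as follows; the integer values for +, *, id and $ are illustrative choices, not the only possible ones.

```python
# Deciding precedence relations by numerical comparison of the precedence
# functions f and g instead of looking up an n-by-n relation table.

f = {"+": 2, "*": 4, "id": 4, "$": 0}
g = {"+": 1, "*": 3, "id": 5, "$": 0}

def relation(a, b):
    if f[a] < g[b]:
        return "<."    # a yields precedence to b
    if f[a] > g[b]:
        return ".>"    # a takes precedence over b
    return "=."        # a and b have the same precedence

print(relation("+", "*"))   # '<.'  (+ yields precedence to *)
print(relation("*", "+"))   # '.>'  (* takes precedence over +)
```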
• Since there is no cycle in the graph, we can
make this function table:
The size of the function table is 2n (one row for f and one for g, over n terminals).
• One disadvantage of function tables is that even where the relation table has blank entries, the function table has non-blank entries.
• Blank entries are also called errors.
• Hence the error-detection capability of the relation table is greater than that of the function table.
Compiler Design Introduction

  • 2. • Compiler is a software which converts a program written in high level language (Source Language) to low level language (Object/Target/Machine Language).
  • 3. • Cross Compiler that runs on a machine ‘A’ and produces a code for another machine ‘B’. • It is capable of creating code for a platform other than the one on which the compiler is running. • Source-to-source Compiler or transcompiler or transpiler is a compiler that translates source code written in one programming language into source code of another programming language.
  • 4. Language processing systems (using Compiler) • We know a computer is a logical assembly of Software and Hardware. • The hardware knows a language, that is hard for us to grasp, consequently we tend to write programs in high-level language, that is much less complicated for us to comprehend and maintain in thoughts. • Now these programs go through a series of transformation so that they can readily be used machines. • This is where language procedure systems come handy.
  • 6. • High Level Language – If a program contains #define or #include directives such as #include or #define it is called HLL. They are closer to humans but far from machines. These (#) tags are called pre-processor directives. They direct the pre-processor about what to do. • Pre-Processor – The pre-processor removes all the #include directives by including the files called file inclusion and all the #define directives using macro expansion. It performs file inclusion, augmentation, macro-processing etc. • Assembly Language – Its neither in binary form nor high level. It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution. • Assembler – For every platform (Hardware + OS) we will have a assembler. They are not universal since for each platform we have one. The output of assembler is called object file. Its translates assembly language to machine code.
  • 7. • Interpreter – An interpreter converts high level language into low level machine language, just like a compiler. But they are different in the way they read the input. The Compiler in one go reads the inputs, does the processing and executes the source code whereas the interpreter does the same line by line. Compiler scans the entire program and translates it as a whole into machine code whereas an interpreter translates the program one statement at a time. Interpreted programs are usually slower with respect to compiled ones. • Relocatable Machine Code – It can be loaded at any point and can be run. The address within the program will be in such a way that it will cooperate for the program movement. • Loader/Linker – It converts the relocatable code into absolute code and tries to run the program resulting in a running program or an error message (or sometimes both can happen). Linker loads a variety of object files into a single file to make it executable. Then loader loads it in memory and executes it.
  • 8. Phases of a Compiler • There are two major phases of compilation, which in turn have many parts. • Each of them take input from the output of the previous level and work in a coordinated way.
  • 10. Analysis Phase • An intermediate representation is created from the give source code : 1. Lexical Analyzer 2. Syntax Analyzer 3. Semantic Analyzer 4. Intermediate Code Generator
  • 11. • Lexical analyzer divides the program into “tokens”, • Syntax analyzer recognizes “sentences” in the program using syntax of language and • Semantic analyzer checks static semantics of each construct. • Intermediate Code Generator generates “abstract” code.
  • 12. Synthesis Phase • Equivalent target program is created from the intermediate representation. • It has two parts : 1. Code Optimizer 2. Code Generator • Code Optimizer optimizes the abstract code, and • final Code Generator translates abstract intermediate code into specific machine instructions.
  • 13. Compiler construction tools • The compiler writer can use some specialized tools that help in implementing various phases of a compiler. • These tools assist in the creation of an entire compiler or its parts. • Some commonly used compiler construction tools include:
  • 14. 1. Parser Generator – • It produces syntax analyzers (parsers) from the input that is based on a grammatical description of programming language or on a context-free grammar. • It is useful as the syntax analysis phase is highly complex and consumes more manual and compilation time. Example: PIC, EQM
  • 16. 2. Scanner Generator • It generates lexical analyzers from the input that consists of regular expression description based on tokens of a language. • It generates a finite automation to recognize the regular expression. Example: Lex
  • 18. • Syntax directed translation engines – It generates intermediate code with three address format from the input that consists of a parse tree. These engines have routines to traverse the parse tree and then produces the intermediate code. In this, each node of the parse tree is associated with one or more translations. • Automatic code generators – It generates the machine language for a target machine. Each operation of the intermediate language is translated using a collection of rules and then is taken as an input by the code generator. Template matching process is used. An intermediate language statement is replaced by its equivalent machine language statement using templates.
  • 19. • Data-flow analysis engines – It is used in code optimization. • Data flow analysis is a key part of the code optimization that gathers the information, that is the values that flow from one part of a program to another. • Compiler construction toolkits – It provides an integrated set of routines that aids in building compiler components or in the construction of various phases of compiler.
  • 20. Symbol Table in Compiler • Symbol Table is an important data structure created and maintained by the compiler in order to keep track of semantics of variable i.e. it stores information about scope and binding information about names, information about instances of various entities such as variable and function names, classes, objects, etc.
  • 21. • It is built in lexical and syntax analysis phases. • The information is collected by the analysis phases of compiler and is used by synthesis phases of compiler to generate code. • It is used by compiler to achieve compile time efficiency.
  • 22. • It is used by various phases of compiler as follows :- • Lexical Analysis: Creates new table entries in the table, example like entries about token. • Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc in the table. • Semantic Analysis: Uses available information in the table to check for semantics i.e. to verify that expressions and assignments are semantically correct(type checking) and update it accordingly. • Intermediate Code generation: Refers symbol table for knowing how much and what type of run-time is allocated and table helps in adding temporary variable information. • Code Optimization: Uses information present in symbol table for machine dependent optimization. • Target Code generation: Generates code by using address information of identifier present in the table.
  • 23. • Symbol Table entries – • Each entry in symbol table is associated with attributes that support compiler in different phases. Items stored in Symbol table: • Variable names and constants • Procedure and function names • Literal constants and strings • Compiler generated temporaries • Labels in source languages
  • 25. • Operations of Symbol table – • The basic operations defined on a symbol table include:
  • 34. Consider the following expression and construct a DAG for it ( a + b ) x ( a + b + c )
  • 36. Intermediate Code Generation in Compiler Design • In the analysis-synthesis model of a compiler, the front end of a compiler translates a source program into an independent intermediate code, then the back end of the compiler uses this intermediate code to generate the target code (which can be understood by the machine). • The benefits of using machine independent intermediate code are: • Because of the machine independent intermediate code, portability will be enhanced.
  • 37. • For ex, suppose, if a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required. • Because, obviously, there were some modifications in the compiler itself according to the machine specifications. • Retargeting is facilitated • It is easier to apply source code modification to improve the performance of source code by optimising the intermediate code.
  • 39. • If we generate machine code directly from source code then for n target machine we will have n optimisers and n code generators but if we will have a machine independent intermediate code, we will have only one optimiser. • Intermediate code can be either language specific (e.g., Bytecode for Java) or language. independent (three-address code).
  • 40. The following are commonly used intermediate code representation: • Postfix Notation – • The ordinary (infix) way of writing the sum of a and b is with operator in the middle : a + b • The postfix notation for the same expression places the operator at the right end as ab +. • In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is postfix notation by e1e2 +. • No parentheses are needed in postfix notation because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression. • In postfix notation the operator follows the operand. • Example – The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + *ab -+. Read more: Infix to Postfix
  • 44. Introduction of Lexical Analysis • Lexical Analysis is the first phase of compiler also known as scanner. • It converts the High level input program into a sequence of Tokens. • Lexical Analysis can be implemented with the Deterministic finite Automata. • The output is a sequence of tokens that is sent to the parser for syntax analysis
  • 46. • What is a token? A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages. • Example of tokens: • Type token (id, number, real, . . . ) • Punctuation tokens (IF, void, return, . . . ) • Alphabetic tokens (keywords)
  • 47. Example of Non-Tokens: Comments, preprocessor directive, macros, blanks, tabs, newline etc
  • 48. • Lexeme: • The sequence of characters matched by a pattern to form the corresponding token or a sequence of input characters that comprises a single token is called a lexeme. • eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
  • 49. • How Lexical Analyzer functions • 1. Tokenization .i.e Dividing the program into valid tokens. 2. Remove white space characters. 3. Remove comments. 4. It also provides help in generating error message by providing row number and column number.
  • 51. • Exercise 2: • Count number of tokens : • int max(int i); • Lexical analyzer first read int and finds it to be valid and accepts as token • max is read by it and found to be valid function name after reading ( • int is also a token , then again i as another token and finally ;
  • 52. Error detection and Recovery in Compiler • In this phase of compilation, all possible errors made by the user are detected and reported to the user in form of error messages. This process of locating errors and reporting it to user is called Error Handling process. Functions of Error handler • Detection • Reporting • Recovery
  • 54. Compile time errors are of three types • These errors are detected during the lexical analysis phase. Typical lexical errors are 1. Exceeding length of identifier or numeric constants. 2. Appearance of illegal characters 3. Unmatched string
  • 56. Syntactic phase errors • These errors are detected during syntax analysis phase. Typical syntax errors are • Errors in structure • Missing operator • Misspelled keywords • Unbalanced parenthesis
  • 58. Semantic errors • These errors are detected during semantic analysis phase. Typical semantic errors are 1. Incompatible type of operands 2. Undeclared variables 3. Not matching of actual arguments with formal one
  • 60. Code Optimization in Compiler Design • The code optimization in the synthesis phase is a program transformation technique, which tries to improve the intermediate code by making it consume fewer resources (i.e. CPU, Memory) so that faster-running machine code will result. Compiler optimizing process should meet the following objectives : • The optimization must be correct, it must not, in any way, change the meaning of the program. • Optimization should increase the speed and performance of the program. • The compilation time must be kept reasonable. • The optimization process should not delay the overall compiling process.
  • 61. • When to Optimize? Optimization of the code is often performed at the end of the development stage since it reduces readability and adds code that is used to increase the performance. • Why Optimize? Optimizing an algorithm is beyond the scope of the code optimization phase. So the program is optimized. And it may involve reducing the size of the code. So optimization helps to: • Reduce the space consumed and increases the speed of compilation. • Manually analyzing datasets involves a lot of time. Hence we make use of software like Tableau for data analysis. Similarly manually performing the optimization is also tedious and is better done using a code optimizer. • An optimized code often promotes re-usability.
  • 62. Types of Code Optimization –The optimization process can be broadly classified into two types : • Machine Independent Optimization – This code optimization phase attempts to improve the intermediate code to get a better target code as the output. The part of the intermediate code which is transformed here does not involve any CPU registers or absolute memory locations. • Machine Dependent Optimization – Machine- dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references. Machine-dependent optimizers put efforts to take maximum advantage of the memory hierarchy
  • 63. Code Optimization is done in the following different ways
  • 64. Hence, after variable propagation, a*b and x*b will be identified as common sub- expression.
  • 69. Three address code in Compiler • Three address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. • It makes use of at most three addresses and one operator to represent an expression and the value computed at each instruction is stored in temporary variable generated by compiler. • The compiler decides the order of operation given by three address code.
  • 70. General representation • Where a, b or c represents operands like names, constants or compiler generated temporaries and op represents the operator
  • 72. • Example-2: Write three address code for following code
  • 74. Parse Tree • Parse : It means to resolve (a sentence) into its component parts and describe their syntactic roles or simply it is an act of parsing a string or a text. • Tree : A tree may be a widely used abstract data type that simulates a hierarchical tree structure, with a root value and sub-trees of youngsters with a parent node, represented as a group of linked nodes.
  • 77. • Uses of Parse Tree : • It helps in making syntax analysis by reflecting the syntax of the input language. • It uses an in-memory representation of the input with a structure that conforms to the grammar. • The advantages of using parse trees rather than semantic actions: you’ll make multiple passes over the info without having to re- parse the input.
  • 78. Types of Parsers in Compiler Design • Parser is that phase of compiler which takes token string as input and with the help of existing grammar, converts it into the corresponding parse tree. • Parser is also known as Syntax Analyzer.
  • 84. Bottom Up or Shift Reduce Parsers | Set 2 • Bottom Up Parsers / Shift Reduce Parsers Build the parse tree from leaves to root. • Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of grammar by tracing out the rightmost derivations of w in reverse. Eg.
  • 89. Recursive Descent Parser • Parsing is the process to determine whether the start symbol can derive the program or not. • If the Parsing is successful then the program is a valid program otherwise the program is invalid.
  • 90. There are generally two types of Parsers
  • 91. Recursive Descent Parser • It is a kind of Top-Down Parser. • A top-down parser builds the parse tree from the top to down, starting with the start non- terminal. • A Predictive Parser is a special case of Recursive Descent Parser, where no Back Tracking is required. • By carefully writing a grammar means eliminating left recursion and left factoring from it, the resulting grammar will be a grammar that can be parsed by a recursive descent parser.
  • 92. **Here e is Epsilon
  • 94. • A general shift reduce parsing is LR parsing. • The L stands for scanning the input from left to right and R stands for constructing a rightmost derivation in reverse. Benefits of LR parsing: • Many programming languages using some variations of an LR parser. It should be noted that C++ and Perl are exceptions to it. • LR Parser can be implemented very efficiently • Here we will look at the construction of GOTO graph of grammar by using all the four LR parsing techniques
  • 106. S – attributed and L – attributed SDTs in Syntax directed translation • Types of attributes – Attributes may be of two types – Synthesized or Inherited. 1. Synthesized attributes – A Synthesized attribute is an attribute of the non-terminal on the left-hand side of a production. • Synthesized attributes represent information that is being passed up the parse tree. • The attribute can take value only from its children (Variables in the RHS of the production).
  • 107. • For eg. let’s say A -> BC is a production of a grammar, and A’s attribute is dependent on B’s attributes or C’s attributes then it will be synthesized attribute.
• 108. 2. Inherited attributes – An attribute of a non-terminal on the right-hand side of a production is called an inherited attribute. • The attribute can take its value either from its parent or from its siblings (the variables in the LHS or RHS of the production). • For example, if A -> BC is a production of a grammar and B's attribute depends on A's or C's attributes, then it is an inherited attribute.
• 109. • Now, let's discuss S-attributed and L-attributed SDTs. • S-attributed SDT: – If an SDT uses only synthesized attributes, it is called an S-attributed SDT. – S-attributed SDTs are evaluated during bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes. – Semantic actions are placed at the rightmost end of the RHS.
• 110. • L-attributed SDT: – If an SDT uses both synthesized and inherited attributes, with the restriction that an inherited attribute can inherit values from left siblings only, it is called an L-attributed SDT. – Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner. – Semantic actions may be placed anywhere in the RHS.
• 111. • For example, • A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an L-attributed grammar, • since Y.S = A.S and Y.S = X.S are allowed but Y.S = Z.S violates the L-attributed SDT definition: the attribute inherits a value from its right sibling.
• 114. Compiler Design | Detection of a Loop in Three Address Code • Loop optimization is the phase after intermediate code generation. • The main intention of this phase is to reduce the number of lines executed in a program. • In an iterative program, the majority of the execution time is spent inside loops. • In a recursive program, there is a block in which the majority of the time is spent.
• 115. • Loop Optimization – • To apply loop optimization we must first detect loops. • For detecting loops we use Control Flow Analysis (CFA) on the Program Flow Graph (PFG). • To build the program flow graph we need to find the basic blocks.
• 116. • Basic Block – • A basic block is a sequence of three-address statements where control enters at the beginning and leaves only at the end, without any jumps or halts in between. • Finding the Basic Blocks – In order to find the basic blocks, we need to find the leaders in the program. • A basic block then runs from one leader up to, but not including, the next leader. So if line 1 is a leader and line 15 is the next leader, then lines 1 to 14 form a basic block that does not include line 15.
• 117. • Identifying leaders of basic blocks – • The first statement is always a leader. • A statement that is the target of a conditional or unconditional jump is a leader. • A statement that immediately follows a conditional or unconditional jump is a leader.
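The three leader rules above translate directly into code. A minimal sketch in Python, assuming a simplified three-address-code representation (each instruction is an (op, target) pair, where target is an instruction index for jumps and None otherwise — this encoding is illustrative, not from the slides):

```python
# Sketch: find leaders, then slice the code into basic blocks.
# Instruction format assumed: (op, target); target is an int index
# for 'goto'/'if_goto', None for everything else.

def find_leaders(code):
    leaders = {0}                      # rule 1: first statement
    for i, (op, target) in enumerate(code):
        if op in ('goto', 'if_goto'):
            leaders.add(target)        # rule 2: target of a jump
            if i + 1 < len(code):
                leaders.add(i + 1)     # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to (not including) the next."""
    ls = find_leaders(code)
    return [list(range(ls[i], ls[i + 1] if i + 1 < len(ls) else len(code)))
            for i in range(len(ls))]
```

For example, for the four-instruction program `[('assign', None), ('if_goto', 0), ('assign', None), ('goto', 1)]` the leaders are instructions 0, 1 and 2, giving blocks [0], [1] and [2, 3].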
• 119. Language Processors: Assembler, Compiler and Interpreter • Language Processors – Assembly language is machine dependent, yet the mnemonics used to represent its instructions are not directly understandable by the machine, while high-level languages are machine independent. • A computer understands instructions in machine code, i.e. in the form of 0s and 1s. • It is a tedious task to write a computer program directly in machine code. • Programs are mostly written in high-level languages like Java, C++ or Python, and are called source code.
• 120. • This source code cannot be executed directly by the computer and must be converted into machine language. • Hence, a special translator system software that converts a program written in a high-level language into machine code is used; it is called a language processor, and the translated program is called the object program (object code).
• 121. The language processors can be any of the following three types: • Compiler – The language processor that reads the complete source program written in a high-level language as a whole in one go and translates it into an equivalent program in machine language is called a compiler. Example: C, C++, C#, Java. In a compiler, the source code is translated to object code successfully only if it is free of errors. • When there are errors in the source code, the compiler reports them at the end of compilation, with line numbers. • The errors must be removed before the source code can be successfully recompiled.
• 123. • Assembler – The assembler is used to translate a program written in assembly language into machine code. • The input of the assembler is a source program containing assembly language instructions. • The output generated by the assembler is the object code, i.e. machine code understandable by the computer.
• 125. • Interpreter – A language processor that translates a single statement of the source program into machine code and executes it immediately, before moving on to the next line, is called an interpreter. • If there is an error in a statement, the interpreter terminates its translation at that statement and displays an error message. • The interpreter moves on to the next line for execution only after the error is removed. • An interpreter directly executes instructions written in a programming or scripting language without previously converting them to object code or machine code. Example: Perl, Python and Matlab.
  • 126. FIRST Set in Syntax Analysis
  • 129. Note
• 130. FOLLOW Set in Syntax Analysis • FOLLOW(X) is defined as the set of terminals that can appear immediately to the right of the non-terminal X in some sentential form.
• 132. • 1) FOLLOW(S) = { $ } // where S is the starting non-terminal • 2) If A -> pBq is a production, where p and q are any strings of grammar symbols and B is a non-terminal, then everything in FIRST(q) except Є is in FOLLOW(B). • 3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B). • 4) If A -> pBq is a production and FIRST(q) contains Є, then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A)
  • 134. • FOLLOW Set FOLLOW(E) = { $ , ) } // Note ')' is there because of 5th rule • FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) } FOLLOW(T’) = FOLLOW(T) = { + , $ , ) } FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
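The FIRST and FOLLOW rules above can be computed mechanically with a fixed-point iteration: keep applying the rules until no set grows. A compact Python sketch, assuming a grammar encoded as a dict of productions where each production is a list of symbols and an empty list denotes Є (written 'eps' inside the sets):

```python
# Sketch: fixed-point computation of FIRST and FOLLOW sets.
# Grammar encoding assumed: {nonterminal: [production, ...]},
# each production a list of symbols; [] means the epsilon production.

def first_follow(grammar, start):
    nts = set(grammar)
    first = {nt: set() for nt in nts}
    follow = {nt: set() for nt in nts}
    follow[start].add('$')            # rule 1: $ is in FOLLOW(start)

    def first_of(seq):
        """FIRST of a string of grammar symbols."""
        out = set()
        for sym in seq:
            f = first[sym] if sym in nts else {sym}
            out |= f - {'eps'}
            if 'eps' not in f:
                return out
        out.add('eps')                # every symbol in seq can vanish
        return out

    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                add = first_of(prod)              # FIRST(nt) grows
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
                for i, sym in enumerate(prod):    # rules 2-4 for FOLLOW
                    if sym in nts:
                        tail = first_of(prod[i + 1:])
                        add = tail - {'eps'}
                        if 'eps' in tail:
                            add |= follow[nt]
                        if not add <= follow[sym]:
                            follow[sym] |= add
                            changed = True
    return first, follow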
• 138. Classification of Context Free Grammars • Context-free grammars (CFGs) can be classified on the basis of the following two properties: • 1) Based on the number of strings generated. • If a CFG generates a finite number of strings, the grammar is said to be non-recursive. • If a CFG can generate an infinite number of strings, the grammar is said to be recursive. • During compilation, the parser uses the grammar of the language to build a parse tree (or derivation tree) out of the source code. The grammar used must be unambiguous; an ambiguous grammar must not be used for parsing.
• 139. • 2) Based on the number of derivation trees. • If every string has only one derivation tree, the CFG is unambiguous. • If some string has more than one derivation tree, the CFG is ambiguous.
  • 143. Ambiguous Grammar • Context Free Grammars(CFGs) are classified based on: 1. Number of Derivation trees 2. Number of strings Depending on Number of Derivation trees, CFGs are sub-divided into 2 types: 1. Ambiguous grammars 2. Unambiguous grammars
• 145. • For example: 1. Consider the grammar: E -> E+E | id • We can create 2 parse trees from this grammar for the string id+id+id:
• 146. Note • Both of the above parse trees are derived from the same grammar rules, yet the trees are different. • Hence the grammar is ambiguous.
• 148. • From the above grammar, the string 3*2+5 can be derived in 2 ways:
• 150. • Derivation tree: • It shows how a string is derived from S using the production rules, as shown in Figure 1.
• 159. • Note: Ambiguity of a grammar is undecidable, i.e. there is no general algorithm for detecting or removing the ambiguity of a grammar, but we can often remove ambiguity by: • Disambiguating the grammar, i.e. rewriting it so that only one derivation or parse tree is possible for each string of the language which the grammar represents.
• 160. Clone a Directed Acyclic Graph • A directed acyclic graph (DAG) is a graph with directed edges and no cycles. • Given a DAG, we need to clone it, i.e. create another graph that has a copy of its vertices and of the edges connecting them.
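The key point when cloning a DAG is that a node may be shared by several parents, so it must be copied exactly once. A depth-first traversal with a visited map handles this; the Node class below is an assumed minimal representation:

```python
# Sketch: clone a DAG with DFS and a seen-map so shared nodes
# are copied exactly once (node identity is used as the map key).

class Node:
    def __init__(self, val):
        self.val = val
        self.children = []

def clone_dag(node, seen=None):
    if seen is None:
        seen = {}
    if node in seen:
        return seen[node]          # reuse the copy of a shared node
    copy = Node(node.val)
    seen[node] = copy              # record before recursing (cheap insurance)
    copy.children = [clone_dag(c, seen) for c in node.children]
    return copy
```

In a diamond-shaped DAG (a -> b, a -> c, b -> d, c -> d), the clone of d is created once and shared by the clones of b and c, mirroring the original structure.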
• 161. Syntax Directed Translation in Compiler Design • Background: The parser uses a CFG (context-free grammar) to validate the input string and produce output for the next phase of the compiler. • The output could be either a parse tree or an abstract syntax tree. • To interleave semantic analysis with the syntax analysis phase of the compiler, we use syntax-directed translation.
• 162. • Definition: Syntax-directed translation consists of rules that augment the grammar to facilitate semantic analysis. SDT involves passing information bottom-up and/or top-down through the parse tree in the form of attributes attached to the nodes. • Syntax-directed translation rules use 1) lexical values of nodes, 2) constants and 3) attributes associated with the non-terminals in their definitions. • The general approach to syntax-directed translation is to construct a parse tree or syntax tree and compute the values of attributes at the nodes of the tree by visiting them in some order. In many cases, translation can be done during parsing without building an explicit tree.
• 166. • To evaluate translation rules, we can employ one depth-first traversal of the parse tree. • This is possible because, for a grammar whose attributes are all synthesized, SDT rules impose no specific evaluation order beyond computing children's attributes before their parents'. • Otherwise, we would have to figure out a suitable plan to traverse the parse tree and evaluate all the attributes in one or more traversals. • For better understanding, we will move bottom-up, left to right, when computing the translation rules of our example.
• 170. • A Syntax Directed Definition (SDD) is a kind of abstract specification. • It is a generalization of a context-free grammar in which each grammar production X –> a has associated with it a set of semantic rules of the form s = f(b1, b2, …, bk), where s is the attribute obtained from the function f. • The attribute can be a string, a number, a type or a memory location. • Semantic rules are fragments of code which are usually embedded at the end of a production and enclosed in curly braces ({ }).
  • 175. • Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. • The annotated parse tree for the input string is
• 176. • For the computation of attributes we start from the leftmost bottom node. • The rule F –> digit is used to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it becomes the value of F via the semantic action F.val = digit.lexval. • Hence F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic action T.val = F.val. Then, for the production T –> T1 * F, the corresponding semantic action is T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20. • Similarly, the combination E1.val + T.val becomes E.val, i.e. E.val = E1.val + T.val = 26. • Then the production S –> E is applied, and the semantic action associated with it prints the result E.val. • Hence, the output will be 26.
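The bottom-up evaluation just described can be sketched as a recursive walk over the (already built) parse tree: each node's val is computed purely from its children, mirroring F.val = digit.lexval, T.val = T1.val * F.val and E.val = E1.val + T.val. The tuple-based tree shape is assumed for illustration:

```python
# Sketch: synthesized-attribute evaluation over a parse tree encoded
# as nested tuples ('digit', n), ('+', left, right), ('*', left, right).

def val(node):
    op, *kids = node
    if op == 'digit':
        return kids[0]                        # F.val = digit.lexval
    if op == '+':
        return val(kids[0]) + val(kids[1])    # E.val = E1.val + T.val
    if op == '*':
        return val(kids[0]) * val(kids[1])    # T.val = T1.val * F.val

# The tree for the slide's input 4 * 5 + 6:
tree = ('+', ('*', ('digit', 4), ('digit', 5)), ('digit', 6))
```

Evaluating the tree for 4 * 5 + 6 gives 26, matching the annotated parse tree on the slide.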
  • 180. • Let us assume an input string int a, c for computing inherited attributes. • The annotated parse tree for the input string is
• 181. • The value at the L nodes is obtained from T.type (a sibling), which is the lexical value int, float or double. • The L node then gives the type to the identifiers a and c. The computation of the type is done top-down, i.e. in a preorder traversal. • Using the function Enter_type, the type of the identifiers a and c is inserted into the symbol table at the corresponding id.entry.
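The effect of the inherited-attribute flow for "int a, c" can be sketched very simply: the type determined at T is handed down to every identifier in the list, and each identifier is entered into the symbol table with that type. The tuple encoding and function name below are assumptions for illustration, not the slides' notation:

```python
# Sketch: propagate the inherited type attribute from T down the L
# nodes into the symbol table, as Enter_type does on the slide.
# decl format assumed: (type_name, [identifiers]).

def enter_types(decl, symtab):
    type_name, identifiers = decl
    for name in identifiers:
        symtab[name] = type_name   # Enter_type(id.entry, inherited type)
    return symtab

symtab = enter_types(('int', ['a', 'c']), {})
```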
  • 184. Removing Direct and Indirect Left Recursion in a Grammar
• 185. Check whether the given grammar contains left recursion; if present, separate those productions and start working on them. In our example, S --> S a | S b | c | d
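The standard rewrite for direct left recursion, A -> Aα | β becoming A -> βA', A' -> αA' | Є, can be sketched mechanically. The encoding below (productions as symbol lists, an empty list for Є, the primed name built by appending an apostrophe) is an illustrative assumption:

```python
# Sketch: eliminate direct left recursion for one non-terminal.
# For A -> A a1 | ... | A an | b1 | ... | bm produce
#   A  -> b1 A' | ... | bm A'
#   A' -> a1 A' | ... | an A' | epsilon   ([] denotes epsilon)

def remove_direct_left_recursion(nt, prods):
    alphas = [p[1:] for p in prods if p and p[0] == nt]   # A -> A alpha
    betas = [p for p in prods if not p or p[0] != nt]     # A -> beta
    if not alphas:
        return {nt: prods}                # nothing to do
    new = nt + "'"
    return {
        nt: [b + [new] for b in betas],
        new: [a + [new] for a in alphas] + [[]],
    }
```

Applied to the example S --> S a | S b | c | d, this yields S --> c S' | d S' and S' --> a S' | b S' | Є.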
  • 195. • Step-4: Thus we get a compiler written in ASM which compiles C and generates code in ASM.
• 196. Peephole Optimization in Compiler Design • Peephole optimization is a type of code optimization performed on a small part of the code. • It is performed on a very small set of instructions within a segment of code. • It basically works on the principle of replacement: a part of the code is replaced by shorter and faster code without a change in output. • Peephole optimization is machine dependent.
  • 197. Objectives of Peephole Optimization • The objective of peephole optimization is: 1. To improve performance 2. To reduce memory footprint 3. To reduce code size
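Two classic peephole rules, applied over a small sliding window, illustrate the replacement principle: eliminating a redundant store-then-load pair and removing an algebraic identity (adding 0). The tuple instruction format is an assumption made for this sketch, not a real target's instruction set:

```python
# Sketch of a peephole pass over a list of pseudo-instructions.
# Rules: (1) MOV a,b followed by MOV b,a  -> keep only the first;
#        (2) ADD x, 0 does nothing        -> drop it.

def peephole(code):
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # rule 1: redundant reverse move right after a move
        if nxt and ins[0] == 'MOV' and nxt == ('MOV', ins[2], ins[1]):
            out.append(ins)
            i += 2          # skip the redundant second MOV
            continue
        # rule 2: algebraic identity x = x + 0
        if ins[0] == 'ADD' and ins[2] == 0:
            i += 1
            continue
        out.append(ins)
        i += 1
    return out
```

The pass shortens the code without changing what it computes, which is exactly the peephole objective listed above.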
• 202. Construction of LL(1) Parsing Table • A top-down parser builds the parse tree from the top down, starting with the start non-terminal. • There are two types of top-down parsers: 1. Top-down parsers with backtracking 2. Top-down parsers without backtracking • Top-down parsers without backtracking can further be divided into two parts:
• 206. • In the table, the rows contain the non-terminals and the columns contain the terminal symbols. • All the null (ε) productions of the grammar go under the FOLLOW elements, and the remaining productions go under the elements of the FIRST set.
• 210. • As you can see, all the null productions are put under the FOLLOW set of their symbol, and all the remaining productions lie under the FIRST of their symbol. • Note: Not every grammar is feasible for an LL(1) parsing table. • It may be possible that one cell contains more than one production.
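The table-filling rule just described can be sketched directly: production A -> α goes under every terminal in FIRST(α), and if α can derive Є it also goes under every terminal in FOLLOW(A). FIRST/FOLLOW are assumed precomputed here; the toy grammar's bodies start with a terminal or are empty, so its FIRST function is trivial:

```python
# Sketch: fill an LL(1) table from FIRST/FOLLOW information.
# table[(nonterminal, terminal)] = production to expand with.

def build_ll1_table(grammar, first_of, follow):
    table = {}
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of(prod)
            for t in f - {'eps'}:
                table[(nt, t)] = prod          # under FIRST(prod)
            if 'eps' in f:
                for t in follow[nt]:
                    table[(nt, t)] = prod      # null production: FOLLOW(nt)
    return table

# Toy grammar S -> a S | epsilon ([] denotes the null production);
# bodies here start with a terminal or are empty, so FIRST is trivial.
def toy_first_of(prod):
    return {prod[0]} if prod else {'eps'}

grammar = {'S': [['a', 'S'], []]}
follow = {'S': {'$'}}
table = build_ll1_table(grammar, toy_first_of, follow)
```

If two productions ever land in the same cell, the grammar is not LL(1), which is exactly the conflict the note above warns about.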
• 213. SLR, CLR and LALR Parsers • SLR Parser The SLR parser is similar to the LR(0) parser except for the reduce entries. • The reduce actions are written only under the FOLLOW set of the variable whose production is reduced.
• 216. CLR PARSER • In the SLR method we were working with LR(0) items. • In CLR parsing we use LR(1) items. • An LR(k) item is defined to be an item using lookaheads of length k. • So, the LR(1) item is comprised of two parts: the LR(0) item and the lookahead associated with the item. LR(1) parsers are more powerful. For LR(1) items we modify the CLOSURE and GOTO functions.
  • 219. Introduction to YACC • A parser generator is a program that takes as input a specification of a syntax, and produces as output a procedure for recognizing that language. Historically, they are also called compiler-compilers. YACC (yet another compiler-compiler) is an LALR(1) (LookAhead, Left-to-right, Rightmost derivation producer with 1 lookahead token) parser generator. • YACC was originally designed for being complemented by Lex.
  • 226. Operator grammar and precedence parser • A grammar that is used to define mathematical operators is called an operator grammar or operator precedence grammar. • Such grammars have the restriction that no production has either an empty right-hand side (null productions) or two adjacent non- terminals in its right-hand side.
• 228. • Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets an operator grammar. • This parser is used only for operator grammars. • Ambiguous grammars are not allowed in any parser except the operator precedence parser. • There are two methods for determining what precedence relations should hold between a pair of terminals:
• 229. 1. Use the conventional associativity and precedence of the operators. 2. The second method of selecting operator-precedence relations is first to construct an unambiguous grammar for the language, one that reflects the correct associativity and precedence in its parse trees.
  • 230. • This parser relies on the following three precedence relations: ⋖, ≐, ⋗ • a ⋖ b This means a “yields precedence to” b. a ⋗ b This means a “takes precedence over” b. a ≐ b This means a “has same precedence as” b.
• 231. • No relation is given between id and id, since an id is never compared with another id and two variables cannot appear side by side. • A disadvantage of this table is that if we have n operators, the size of the table is n*n and the complexity is O(n²). • In order to decrease the size of the table, we use an operator function table.
  • 232. • Operator precedence parsers usually do not store the precedence table with the relations; rather they are implemented in a special way. • Operator precedence parsers use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. • The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that: • f(a) < g(b) whenever a yields precedence to b • f(a) = g(b) whenever a and b have the same precedence • f(a) > g(b) whenever a takes precedence over b
  • 234. • Since there is no cycle in the graph, we can make this function table: Size of the table is 2n
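A parser driven by the two functions can be sketched directly: compare f(topmost stack terminal) with g(next input terminal); shift while the top yields precedence, reduce while it takes precedence, and accept when both sides are $. The numeric values of f and g below are a standard textbook-style assignment for +, *, id and $, assumed here for illustration:

```python
# Sketch: operator-precedence parsing with precedence functions f and g.
# f(a) < g(b): a yields to b (shift); f(a) > g(b): a takes precedence
# (reduce). Values below are an assumed assignment, not from the slides.

f = {'+': 2, '*': 4, 'id': 4, '$': 0}
g = {'+': 1, '*': 3, 'id': 5, '$': 0}

def precedence_parse(tokens):
    """Parse and evaluate id + id * id style input; ints stand for ids."""
    syms = [('id', t) if isinstance(t, int) else (t, None)
            for t in tokens] + [('$', None)]
    stack = [('$', None)]   # terminal stack (non-terminals are invisible)
    vals = []               # operand stack filled while reducing
    i = 0
    while True:
        a = stack[-1][0]    # topmost terminal
        b = syms[i][0]
        if a == '$' and b == '$':
            return vals[0]                 # accept
        if f[a] <= g[b]:                   # shift
            stack.append(syms[i])
            i += 1
        else:                              # reduce the handle on top
            sym, v = stack.pop()
            if sym == 'id':
                vals.append(v)
            else:                          # binary operator
                rhs, lhs = vals.pop(), vals.pop()
                vals.append(lhs + rhs if sym == '+' else lhs * rhs)
```

With this assignment, 3 + 4 * 5 reduces the * before the +, and equal-precedence operators associate to the left, as the relation table intends.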
• 235. • One disadvantage of function tables is that blank entries in the relation table become non-blank entries in the function table. • Blank entries signal errors. • Hence the error-detection capability of the relation table is greater than that of the function table.