Introduction to Compilation
Lecture 01
What is a compiler?
• Programming problems are easier to solve in high-level
languages
– Languages closer to the level of the problem domain, e.g.,
• SmallTalk: OO programming
• JavaScript: Web pages
• Solutions are usually more efficient (faster, smaller) when
written in machine language
– Language that reflects the cycle-by-cycle working of a processor
• Compilers are the bridges:
– Tools to translate programs written in high-level languages to
efficient executable code
What is a compiler?
A program that reads a program written in one language
and translates it into another language.
Source language → Compiler → Target language
Traditionally, compilers go from high-level languages to low-level
languages.
The compilation task is full of variety
• Thousands of source languages
– Fortran, Pascal, C, C++, Java, ……
• Thousands of target languages
– Some other lower level language (assembly language),
machine language
• Compilation process has similar variety
– Single pass, multi-pass, load-and-go, optimizing, …
• The variety is overwhelming
• The good news is:
– A few basic techniques are sufficient to cover all this variety
– Many efficient tools are available
Requirement
• In order to translate statements in a language, one
needs to understand both
– the structure of the language: the way “sentences” are
constructed in the language, and
– the meaning of the language: what each “sentence”
stands for.
• Terminology:
Structure ≡ Syntax
Meaning ≡ Semantics
Analysis-Synthesis model of compilation
Source Language → Front End (language specific) → Intermediate Language → Back End (machine specific) → Target Language
• Two parts
– Analysis
• Breaks up the source program into constituents
– Synthesis
• Constructs the target program
Software tools that perform analysis
• Structure Editors
– Give the program a hierarchical structure
– Suggest keywords/structures automatically
• Pretty Printers
– Give the program an organized and structured look
– Use different colors, fonts, and indentation
• Static Checkers
– Try to discover potential bugs without running the program
– Identify logical errors, perform type checking, detect dead code,
etc.
• Interpreters
– Do not produce a target program
– Execute the operations implied by the program
Compilation Steps/Phases
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → intermediate language → Code Optimizer → intermediate language → Code Generator → target language
The Symbol Table is connected to the phases.
Compilation Steps/Phases
• Lexical Analysis Phase: Generates the “tokens” in the source
program
• Syntax Analysis Phase: Recognizes “sentences” in the program
using the syntax of the language
• Semantic Analysis Phase: Infers information about the program
using the semantics of the language
• Intermediate Code Generation Phase: Generates “abstract” code
based on the syntactic structure of the program and the semantic
information from the semantic analysis phase
• Optimization Phase: Refines the generated code using a series of
optimizing transformations
• Final Code Generation Phase: Translates the abstract intermediate
code into specific machine instructions
Lexical Analysis
• Convert the stream of characters representing the input
program into a sequence of tokens
• Tokens are the “words” of the programming language
• Lexeme
– The characters comprising a token
• For instance, the sequence of characters “static int” is
recognized as two tokens, representing the two words
“static” and “int”
• The sequence of characters “*x++” is recognized as
three tokens, representing “*”, “x” and “++”
• Removes the white spaces
• Removes the comments
Lexical Analysis
• Input: result = a + b * 10
• Tokens:
‘result’, ‘=’, ‘a’, ‘+’, ‘b’, ‘*’, ‘10’
– Identifiers: result, a, b
– Operators: =, +, *
– Number: 10
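The scanning step above can be sketched in a few lines of Python. This is only an illustrative sketch: the token class names (NUMBER, ID, OP, SKIP) and the small operator set are assumptions made for this example, not part of the lecture.

```python
import re

# Hypothetical token classes for this sketch; a real language defines many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"\+\+|[=+*]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("result = a + b * 10"))
```

Each pair holds the token class and its lexeme, so `tokenize("*x++")` yields the three tokens mentioned on the previous slide.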
Syntax Analysis (Parsing)
• Uncover the structure of a sentence in the program from a
stream of tokens.
• For instance, the phrase “x = +y”, which is recognized as
four tokens, representing “x”, “=”, “+” and “y”, has the
structure =(x,+(y)), i.e., an assignment expression that
operates on “x” and the expression “+(y)”.
• Build a tree called a parse tree that reflects the structure of
the input sentence.
Syntax Analysis: Grammars
• Expression grammar
Exp    ::= Exp ‘+’ Exp
         | Exp ‘*’ Exp
         | ID
         | NUMBER
Assign ::= ID ‘=’ Exp
Syntax Tree
Input: result = a + b * 10
Assign
├── result
└── +
    ├── a
    └── *
        ├── b
        └── 10
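A small recursive-descent parser can build this tree. The expression grammar on the previous slide is ambiguous as written, so this sketch assumes the usual convention that ‘*’ binds tighter than ‘+’, which is also what the tree above shows. The function names and the tuple representation of tree nodes are choices made for this example only.

```python
import re

def tokenize(text):
    # Minimal scanner for this sketch: identifiers, integers and the operators = + *.
    # Any other character is silently ignored by findall.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", text)

def parse_assign(tokens):
    """Assign ::= ID '=' Exp, returned as the tuple ('Assign', id, exp)."""
    if len(tokens) < 3 or not tokens[0].isidentifier() or tokens[1] != "=":
        raise SyntaxError("expected: ID '=' Exp")
    tree, pos = parse_sum(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("Assign", tokens[0], tree)

def parse_sum(tokens, pos):
    # Sum ::= Product ('+' Product)*   (assumed: '+' binds looser than '*')
    left, pos = parse_product(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_product(tokens, pos + 1)
        left = ("+", left, right)
    return left, pos

def parse_product(tokens, pos):
    # Product ::= Operand ('*' Operand)*
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_operand(tokens, pos + 1)
        left = ("*", left, right)
    return left, pos

def parse_operand(tokens, pos):
    # Operand ::= ID | NUMBER
    if pos < len(tokens) and (tokens[pos].isidentifier() or tokens[pos].isdigit()):
        return tokens[pos], pos + 1
    raise SyntaxError("expected an identifier or a number")

print(parse_assign(tokenize("result = a + b * 10")))
# ('Assign', 'result', ('+', 'a', ('*', 'b', '10')))
```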
Semantic Analysis
• Concerned with the semantics (meaning) of the
program
• Performs type checking
– Operator/operand compatibility
Intermediate Code Generation
• Translate the hierarchical structure, represented as a decorated
tree, into intermediate code
• The result is a program for an abstract machine
• Properties of intermediate codes
– Should be easy to generate
– Should be easy to translate
• Intermediate code hides many machine-level details,
but has instruction-level mapping to many assembly
languages
• Main motivation: portability
• One commonly used form is “Three-address Code”
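As an illustration of three-address code, the sketch below flattens an expression tree into instructions with at most one operator each. The tuple-based tree, the `temp` naming and the function name are assumptions of this example; the input is the type-checked tree for result = a + b * 10 used in the lecture's worked example.

```python
def gen_three_address(tree):
    """Flatten an ('Assign', target, expr) tree into three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves (identifiers and constants) are already valid operands.
        if not isinstance(node, tuple):
            return node
        if node[0] == "INTTOREAL":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := INTTOREAL({arg})")
            return temp
        op, left, right = node
        # The right operand is visited first only so that the temporaries
        # are numbered in the same order as in the lecture's example.
        r = walk(right)
        l = walk(left)
        temp = new_temp()
        code.append(f"{temp} := {l} {op} {r}")
        return temp

    _, target, expr = tree
    code.append(f"{target} := {walk(expr)}")
    return code

tree = ("Assign", "id1", ("+", "id2", ("*", "id3", ("INTTOREAL", "10"))))
for line in gen_three_address(tree):
    print(line)
```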
Code Optimization
• Apply a series of transformations to improve the time
and space efficiency of the generated code.
• Peephole optimizations: generate new instructions by
combining/expanding on a small number of
consecutive instructions.
• Global optimizations: reorder, remove or add
instructions to change the structure of generated code
• Consumes a significant fraction of the compilation
time
• Optimization capability varies widely
• Simple optimization techniques can be very valuable
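Two such simple transformations can be sketched as follows: evaluating an integer-to-real conversion of a constant at compile time, and merging a computation with the copy that immediately follows it. The `(dest, op, arg1, arg2)` tuple format and the `COPY` and `INTTOREAL` operation names are assumptions of this sketch, and it further assumes that every compiler temporary is used only once.

```python
def optimize(code):
    """code is a list of (dest, op, arg1, arg2) tuples; returns an improved list."""
    # Pass 1: evaluate INTTOREAL on integer constants at compile time and
    # substitute the resulting real constant wherever the temporary was used.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "INTTOREAL" and a.isdigit():
            constants[dest] = f"{a}.0"
            continue
        folded.append((dest, op, a, b))

    # Pass 2: merge "t := x op y" followed directly by "d := t" into "d := x op y".
    result = []
    for instr in folded:
        dest, op, a, b = instr
        if op == "COPY" and result and result[-1][0] == a and a.startswith("temp"):
            prev = result.pop()
            result.append((dest,) + prev[1:])
        else:
            result.append(instr)
    return result

before = [
    ("temp1", "INTTOREAL", "10", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "COPY", "temp3", None),
]
print(optimize(before))
# [('temp2', '*', 'id3', '10.0'), ('id1', '+', 'id2', 'temp2')]
```

The sketch does not renumber the surviving temporaries, so `temp2` plays the role that a hand-written version would probably call `temp1`.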
Code Generation
• Map instructions in the intermediate code to specific
machine instructions.
• Memory management, register allocation, instruction
selection, instruction scheduling, …
• Generates sufficient information to enable symbolic
debugging.
Symbol Table
• Records the identifiers used in the source program
– Collects various associated information as attributes
• Variables: type, scope, storage allocation
• Procedures: number and types of arguments, method of
argument passing
• It is a data structure holding a collection of records
– Different fields are collected and used at different
phases of compilation
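A minimal sketch of such a table is shown below. The field names and the `insert`/`lookup` interface are assumptions made for this example; the point is only that a record is created when an identifier is first seen and that later phases fill in further attributes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    name: str
    kind: str = "variable"            # "variable" or "procedure"
    type: Optional[str] = None        # filled in during semantic analysis
    scope: int = 0
    address: Optional[int] = None     # filled in when storage is allocated
    param_types: list = field(default_factory=list)   # procedures only

class SymbolTable:
    def __init__(self):
        self._records = {}

    def insert(self, name, **attributes):
        """Return the record for name, creating it on first sight and updating attributes."""
        record = self._records.setdefault(name, Symbol(name))
        for key, value in attributes.items():
            setattr(record, key, value)
        return record

    def lookup(self, name):
        """Return the record for name, or None if it was never inserted."""
        return self._records.get(name)

table = SymbolTable()
table.insert("result")                   # lexical analysis: the identifier is seen
table.insert("result", type="real")      # semantic analysis: the type becomes known
table.insert("result", address=0)        # later phase: storage is allocated
print(table.lookup("result"))
```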
Error Detection, Recovery and Reporting
• Each phase can encounter errors
• Specific types of error can be detected by specific
phases
– Lexical Error: int abc, 1num;
– Syntax Error: total = capital + rate year;
– Semantic Error: value = myarray [realIndex];
• The compiler should be able to proceed and process the rest of the
program after an error is detected
• The compiler should be able to link the error with the source
program
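The last two points can be illustrated for the lexical phase with a small sketch. The scanner below records each error with its line and column and then carries on, rather than stopping at the first problem. The token set and the wording of the error messages are assumptions of this example.

```python
import re

# BAD is tried first so that "1num" is reported as one malformed identifier
# instead of being split into the number 1 and the identifier num.
PATTERN = re.compile(
    r"(?P<BAD>\d+[A-Za-z_]\w*)|(?P<TOKEN>[A-Za-z_]\w*|\d+|[=+*,;\[\]])|(?P<SPACE>\s+)"
)

def scan(source):
    """Return (tokens, errors); scanning continues after each error."""
    tokens, errors = [], []
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            m = PATTERN.match(line, pos)
            if m is None:
                errors.append((line_no, pos + 1, f"illegal character {line[pos]!r}"))
                pos += 1          # recovery: skip the offending character
                continue
            if m.lastgroup == "BAD":
                errors.append((line_no, pos + 1, f"malformed identifier {m.group()!r}"))
            elif m.lastgroup == "TOKEN":
                tokens.append(m.group())
            pos = m.end()
    return tokens, errors

tokens, errors = scan("int abc, 1num;\ntotal = capital + rate;")
print(errors)   # [(1, 10, "malformed identifier '1num'")]
```

The second line is still tokenized in full even though the first line contained an error.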
Error Detection, Recovery and Reporting
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Code Optimizer → Code Generator → target language
The Symbol Table and the Error Handler are connected to the phases.
Translation of a statement
Input: result = a + b * 10
Lexical Analyzer output: id1 = id2 + id3 * 10
Symbol Table:
1  result  …….
2  a       …….
3  b       …….
Syntax Analyzer output:
Assign
├── id1
└── +
    ├── id2
    └── *
        ├── id3
        └── 10
Translation of a statement
Semantic Analyzer input:
Assign
├── id1
└── +
    ├── id2
    └── *
        ├── id3
        └── 10
Semantic Analyzer output:
Assign
├── id1
└── +
    ├── id2
    └── *
        ├── id3
        └── INTTOREAL
            └── 10
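The conversion step shown above can be sketched as a tree walk that wraps an integer operand in an INTTOREAL node whenever the other operand is real. The declared types below are assumptions, since the slides do not state them (the inserted conversion implies that id3 is real). The tuple tree format and the decision to reject a real value assigned to an int variable are also choices made for this sketch.

```python
# Assumed declarations for this sketch; the slides do not state the types.
SYMBOL_TYPES = {"id1": "real", "id2": "real", "id3": "real"}

def type_of(node):
    """Return 'int' or 'real' for a leaf or an already-checked subtree."""
    if isinstance(node, tuple):
        if node[0] == "INTTOREAL":
            return "real"
        return type_of(node[1])   # after checking, both operands share one type
    if node.isdigit():
        return "int"
    return SYMBOL_TYPES[node]

def check(node):
    """Return a copy of the tree with INTTOREAL wrapped around int operands of mixed operations."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = check(left), check(right)
    lt, rt = type_of(left), type_of(right)
    if lt == "real" and rt == "int":
        right = ("INTTOREAL", right)
    elif lt == "int" and rt == "real":
        if op == "Assign":
            raise TypeError("cannot assign a real value to an int variable")
        left = ("INTTOREAL", left)
    return (op, left, right)

tree = ("Assign", "id1", ("+", "id2", ("*", "id3", "10")))
print(check(tree))
# ('Assign', 'id1', ('+', 'id2', ('*', 'id3', ('INTTOREAL', '10'))))
```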
Translation of a statement
Intermediate Code Generator output:
temp1 := INTTOREAL (10)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimizer output:
temp1 := id3 * 10.0
id1 := id2 + temp1
Code Generator output:
MOVF id3, R2
MULF #10.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
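The final step can be imitated with a very naive code generator. This sketch assumes a hypothetical two-register machine and a `(dest, op, arg1, arg2)` tuple format for the optimized intermediate code. It keeps each temporary in a register and never frees that register, so it only works for short sequences like this one.

```python
def generate(code):
    """Translate (dest, op, arg1, arg2) tuples into assembly-like text lines."""
    free = ["R1", "R2"]              # hypothetical register pool; R2 is handed out first
    location = {}                    # temporary name -> register holding its value
    opcode = {"+": "ADDF", "*": "MULF"}
    asm = []

    def operand(x):
        if x in location:                      # value lives in a register
            return location[x]
        if x.replace(".", "", 1).isdigit():    # numeric constant -> immediate operand
            return f"#{x}"
        return x                               # plain memory operand

    for dest, op, a, b in code:
        reg = free.pop()
        asm.append(f"MOVF {operand(a)}, {reg}")
        asm.append(f"{opcode[op]} {operand(b)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg               # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")  # store the result to memory
            free.append(reg)
    return asm

optimized = [("temp1", "*", "id3", "10.0"), ("id1", "+", "id2", "temp1")]
print("\n".join(generate(optimized)))
```

For this input the sketch produces the same five instructions as the Code Generator output above.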
Syntax Analyzer versus Lexical Analyzer
• Which constructs of a program should be recognized by
the lexical analyzer, and which ones by the syntax
analyzer?
– Both of them do similar things, but the lexical analyzer deals
with simple non-recursive constructs of the language.
– The syntax analyzer deals with recursive constructs of the
language.
– The lexical analyzer simplifies the job of the syntax analyzer.
– The lexical analyzer recognizes the smallest meaningful
units (tokens) in a source program.
– The syntax analyzer works on the smallest meaningful units
(tokens) in a source program to recognize meaningful
structures in our programming language.
Cousins of the Compiler
• Preprocessor
– Macro preprocessing
• Define and use shorthand for longer constructs
– File inclusion
• Include header files
– “Rational” Preprocessors
• Augment older languages with modern flow-of-control or
data-structures
– Language Extension
• Add capabilities to a language
• Equel: query language embedded in C
Assemblers
Source program → Compiler → Assembly program → Assembler → Relocatable machine code → Loader/link-editor → Absolute machine code
Two-Pass Assembly
• Simplest form of assembler
• First pass
– All the identifiers are stored in a symbol table
– Storage is allocated
• Second pass
– Translates each operation code into machine language
Source:
MOV a, R1
ADD #2, R1
MOV R1, b
After first pass (symbol table):
Identifier  Address
a           0
b           4
After second pass (relocatable machine code):
Instruction code  Register  Addressing mode  Address   Relocation bit
0001              01        00               00000000  *
0011              01        10               00000010
0010              01        00               00000100  *
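The two passes can be sketched for exactly this three-instruction example. The bit patterns are taken from the table above, but the reading of them is an assumption of this sketch: 0001 as "load from memory", 0010 as "store to memory", 0011 as "add", mode 10 as "immediate", and four address units per identifier.

```python
WORD_SIZE = 4   # assumed size of each identifier, matching a = 0 and b = 4 above

def is_register(operand):
    return operand.startswith("R") and operand[1:].isdigit()

def first_pass(lines):
    """Collect identifiers into a symbol table and allocate storage for them."""
    symbols = {}
    next_address = 0
    for line in lines:
        _, rest = line.split(None, 1)
        for operand in (o.strip() for o in rest.split(",")):
            if not is_register(operand) and not operand.startswith("#") and operand not in symbols:
                symbols[operand] = next_address
                next_address += WORD_SIZE
    return symbols

def second_pass(lines, symbols):
    """Translate each instruction into its bit pattern."""
    output = []
    for line in lines:
        mnemonic, rest = line.split(None, 1)
        src, dst = (o.strip() for o in rest.split(","))
        if mnemonic == "MOV" and is_register(dst):     # load: memory -> register
            opcode, reg, operand = "0001", dst, src
        elif mnemonic == "MOV":                        # store: register -> memory
            opcode, reg, operand = "0010", src, dst
        elif mnemonic == "ADD":
            opcode, reg, operand = "0011", dst, src
        else:
            raise ValueError(f"unknown mnemonic {mnemonic!r}")
        if operand.startswith("#"):                    # immediate value, not relocatable
            mode, value, reloc = "10", int(operand[1:]), ""
        else:                                          # symbolic address, relocatable
            mode, value, reloc = "00", symbols[operand], " *"
        output.append(f"{opcode} {int(reg[1:]):02b} {mode} {value:08b}{reloc}")
    return output

source = ["MOV a, R1", "ADD #2, R1", "MOV R1, b"]
symbols = first_pass(source)
print(symbols)                       # {'a': 0, 'b': 4}
print("\n".join(second_pass(source, symbols)))
```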
Loaders and Link-Editors
• Convert the relocatable machine code into absolute
machine code
– Map the relocatable address
• Place altered instructions and data in memory
• Make a single program from several files of
relocatable machine code
– Several files of relocatable codes
– Library files
Example: address space of data to be loaded starting at location L = 00001111
Relocatable code         Absolute code
0001 01 00 00000000 *    0001 01 00 00001111
0011 01 10 00000010      0011 01 10 00000010
0010 01 00 00000100 *    0010 01 00 00010011
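The relocation step in this example amounts to adding L to every address field that carries the relocation mark. A minimal sketch, assuming the instructions are held as text in the same field layout as above:

```python
def relocate(words, load_address):
    """Add load_address to the address field of every word marked with '*'."""
    absolute = []
    for word in words:
        fields = word.split()
        if fields[-1] == "*":
            opcode, reg, mode, address = fields[:4]
            address = format(int(address, 2) + load_address, "08b")
            absolute.append(f"{opcode} {reg} {mode} {address}")
        else:
            absolute.append(" ".join(fields))   # immediate operands are left unchanged
    return absolute

relocatable = [
    "0001 01 00 00000000 *",
    "0011 01 10 00000010",
    "0010 01 00 00000100 *",
]
print("\n".join(relocate(relocatable, 0b00001111)))
```

With L = 00001111 (decimal 15), the address of a becomes 0 + 15 = 15 and the address of b becomes 4 + 15 = 19, i.e. 00010011.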
Multi Pass Compilers
• Passes
– Several phases of the compiler are grouped into passes
– Often passes generate an explicit output file
– In each pass the whole input file/source is processed
Figure: one pass in which the Syntax Analyzer is connected to the Lexical Analyzer and to the Intermediate Code Generator (including semantic analysis)
How many passes?
• Relatively few passes are desirable
– Reading and writing intermediate files takes time
– But grouping phases into few passes may require keeping the entire
program in memory
• One phase may generate information in a different order than
the next phase needs it
• The memory space required is not trivial in some cases
• Grouping into the same pass incurs some problems
– Intermediate code generation and code generation in
the same pass is difficult
• e.g. the target of a ‘goto’ that jumps forward is not yet known
• ‘Backpatching’ can be a remedy
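Backpatching can be sketched as follows: a forward jump is emitted with a hole in place of its target, the position of the hole is remembered, and the hole is filled in once the label is finally seen. The statement format and the use of instruction indices as jump targets are assumptions of this example.

```python
def emit_with_backpatching(statements):
    """statements: list of ('label', name), ('goto', name) or ('op', text)."""
    code = []        # emitted instructions
    labels = {}      # label name -> index of the instruction it marks
    pending = {}     # label name -> indices of gotos still waiting for it
    for kind, value in statements:
        if kind == "label":
            labels[value] = len(code)
            # Backpatch every earlier jump that was waiting for this label.
            for index in pending.pop(value, []):
                code[index] = f"goto {labels[value]}"
        elif kind == "goto":
            if value in labels:                  # backward jump: target already known
                code.append(f"goto {labels[value]}")
            else:                                # forward jump: leave a hole for now
                pending.setdefault(value, []).append(len(code))
                code.append("goto ?")
        else:
            code.append(value)
    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return code

program = [
    ("goto", "L1"),        # forward jump, target unknown at this point
    ("op", "x := 1"),
    ("label", "L1"),
    ("op", "y := 2"),
    ("goto", "L1"),        # backward jump, target known
]
print(emit_with_backpatching(program))
# ['goto 2', 'x := 1', 'y := 2', 'goto 2']
```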
Issues Driving Compiler Design
• Correctness
• Speed (runtime and compile time)
– Degrees of optimization
– Multiple passes
• Space
• Feedback to user
• Debugging
Other Applications
• In addition to the development of a compiler, the techniques
used in compiler design are applicable to many problems in
computer science.
– Techniques used in a lexical analyzer can be used in text editors,
information retrieval systems, and pattern recognition programs.
– Techniques used in a parser can be used in a query processing
system such as SQL.
– Much software with a complex front end may need techniques
used in compiler design.
• For example, a symbolic equation solver that takes an equation as
input must parse the given equation.
– Most of the techniques used in compiler design can be used in
Natural Language Processing (NLP) systems.
