Compilers
A compiler is a translator program that takes a program written in a high-level language (HLL), called the source program, and translates it into an equivalent program in a machine-level language (MLL), called the target program. An important part of a compiler's job is reporting errors to the programmer.
Executing a program written in an HLL is basically a two-part process. The source program must first be compiled, that is, translated into an object program. The resulting object program is then loaded into memory and executed.
A compiler bridges the semantic gap between a programming-language domain and an execution domain and generates a target program. The target program may be a machine language program or an object module.
TYPES OF COMPILERS
Based on the specific input it takes and the output it produces, compilers can be classified into the following types:
● Traditional compilers (C, C++, Pascal): These compilers convert a source program in an HLL into its equivalent native machine code or object code.
● Interpreters (LISP, SNOBOL, Java 1.0): These first convert source code into intermediate code and then interpret (emulate) that intermediate code directly instead of producing a native executable.
● Cross-compilers: These compilers run on one machine and produce code for another machine.
● Incremental compilers: These compilers divide the source into user-defined steps, compile or recompile it step by step, and interpret the steps in a given order.
● Converters (e.g., COBOL to C++): These programs translate from one high-level language to another.
● Just-in-time (JIT) compilers (Java, Microsoft .NET): These are runtime compilers that translate an intermediate language (bytecode, MSIL) into executable native machine code. They perform type-based verification, which makes the executable code more trustworthy.
● Ahead-of-time (AOT) compilers (e.g., .NET ngen): These precompile Java or .NET intermediate code into native code before the program is run.
● Binary compilation: These compilers translate the object code of one platform into object code of another platform.
PHASES OF A COMPILER
Due to the complexity of the compilation task, a compiler typically proceeds in a sequence of compilation phases. The phases communicate with each other via clearly defined interfaces. Generally an interface consists of a data structure (e.g., a tree) and a set of exported functions. Each phase works on an abstract intermediate representation of the source program, not on the source program text itself (except the first phase).
Compiler phases are individual modules that are executed in order, each performing its own sub-activity; together their results produce the target code.
LEXICAL ANALYZER (SCANNER):
The scanner is the first phase. It works as the interface between the compiler and the source program and performs the following functions:
● Reads the characters of the source program and groups them into a stream of tokens, where each token represents a logically cohesive sequence of characters, such as an identifier, a keyword, a punctuation mark, or a multi-character operator like :=.
● The character sequence forming a token is called the lexeme of the token.
● The scanner generates a token id and also enters the identifier's name into the symbol table if it does not already exist there.
● Removes comments and unnecessary white space.
The format of a token is <token name, attribute value>.
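The scanner functions listed above can be sketched in C. This is a minimal sketch, not a production scanner: the token names (TK_ID, TK_NUM, TK_ASSIGN, TK_PUNCT, TK_EOF) and the function next_token are hypothetical, comments are assumed to use C's /* ... */ form, and the attribute value is simply the lexeme text rather than a symbol-table pointer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token names for this sketch. */
typedef enum { TK_ID, TK_NUM, TK_ASSIGN, TK_PUNCT, TK_EOF } TokenName;

/* <token name, attribute value>; here the attribute is just the lexeme. */
typedef struct {
    TokenName name;
    char lexeme[32];
} Token;

/* Scans one token starting at *p and advances *p past it. */
Token next_token(const char **p) {
    Token t = { TK_EOF, "" };
    const char *s = *p;

    for (;;) {                                   /* skip blanks and comments */
        while (isspace((unsigned char)*s)) s++;
        if (s[0] == '/' && s[1] == '*') {
            const char *end = strstr(s + 2, "*/");
            s = end ? end + 2 : s + strlen(s);   /* unterminated comment: stop at end */
        } else {
            break;
        }
    }

    const char *start = s;
    if (*s == '\0') {
        t.name = TK_EOF;
    } else if (isalpha((unsigned char)*s)) {     /* identifier: letter (letter|digit)* */
        while (isalnum((unsigned char)*s)) s++;
        t.name = TK_ID;
    } else if (isdigit((unsigned char)*s)) {     /* number: digit+ */
        while (isdigit((unsigned char)*s)) s++;
        t.name = TK_NUM;
    } else if (s[0] == ':' && s[1] == '=') {     /* multi-character operator */
        s += 2;
        t.name = TK_ASSIGN;
    } else {                                     /* any other single character */
        s++;
        t.name = TK_PUNCT;
    }

    size_t len = (size_t)(s - start);
    if (len >= sizeof t.lexeme) len = sizeof t.lexeme - 1;   /* truncate long lexemes */
    memcpy(t.lexeme, start, len);
    t.lexeme[len] = '\0';
    *p = s;
    return t;
}
```

Calling next_token repeatedly on `count := count /* incr */ + 10;` yields the identifier count, the operator :=, count, +, the number 10, and ;, with the comment and blanks discarded. A real scanner would also enter count into the symbol table and use the entry as the attribute.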
SYNTAX ANALYZER (PARSER)
The parser interacts with the scanner and with its subsequent phase, the semantic analyzer, and performs the following functions:
● Groups the token stream received from the scanner into syntactic structures, usually a structure called a parse tree whose leaves are tokens.
● Each interior node of this tree represents a group of tokens that logically belong together.
● In other words, it checks the syntax of program elements.
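The parser's syntax check can be illustrated with a small recursive-descent recognizer, one function per grammar rule. This sketch assumes a toy expression grammar with single-character operands (shown in the comment) and only reports whether the input is syntactically valid; it does not build the parse tree. All names are hypothetical.

```c
#include <ctype.h>

/* Hypothetical grammar for this sketch (single-character operands):
 *   expr   -> term   { ('+' | '-') term }
 *   term   -> factor { ('*' | '/') factor }
 *   factor -> letter | digit | '(' expr ')'
 */
typedef struct { const char *p; int ok; } Parser;

static void expr(Parser *ps);

static void skip_blanks(Parser *ps) {
    while (*ps->p == ' ') ps->p++;
}

static void factor(Parser *ps) {
    skip_blanks(ps);
    if (isalnum((unsigned char)*ps->p)) {
        ps->p++;                         /* a single letter or digit */
    } else if (*ps->p == '(') {
        ps->p++;
        expr(ps);
        skip_blanks(ps);
        if (*ps->p == ')') ps->p++;
        else ps->ok = 0;                 /* missing ')' */
    } else {
        ps->ok = 0;                      /* symbol cannot start a factor */
    }
}

static void term(Parser *ps) {
    factor(ps);
    skip_blanks(ps);
    while (ps->ok && (*ps->p == '*' || *ps->p == '/')) {
        ps->p++;
        factor(ps);
        skip_blanks(ps);
    }
}

static void expr(Parser *ps) {
    term(ps);
    skip_blanks(ps);
    while (ps->ok && (*ps->p == '+' || *ps->p == '-')) {
        ps->p++;
        term(ps);
        skip_blanks(ps);
    }
}

/* Returns 1 if the whole string is a syntactically valid expression. */
int is_valid_expr(const char *text) {
    Parser ps = { text, 1 };
    expr(&ps);
    skip_blanks(&ps);
    return ps.ok && *ps.p == '\0';
}
```

With these rules, is_valid_expr("a + b * (c - 2)") returns 1, while is_valid_expr("a + * b") returns 0 because * cannot start a factor.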
SEMANTIC ANALYZER
This phase receives the syntax tree as input and checks the semantic correctness of the program. Even when the tokens are valid and the program is syntactically correct, it may still be semantically wrong. Therefore the semantic analyzer checks the semantics (meaning) of the statements formed.
● The syntactically and semantically correct structures are produced here in the form of a syntax tree, a DAG, or some other sequential representation such as a matrix.
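A typical semantic check is type checking. The sketch below assumes a tiny type system (T_INT, T_FLOAT, T_BOOL) and hypothetical rules: arithmetic on a Boolean is an error, and an int value may be widened to a float. Under these rules a statement such as flag = count + 1.5, with flag declared Boolean, parses correctly but is rejected here.

```c
/* Hypothetical type set for this sketch. */
typedef enum { T_INT, T_FLOAT, T_BOOL, T_ERROR } Type;

/* Result type of an arithmetic operator (+, -, *, /) applied to a and b.
 * Assumed rules: int op int -> int; mixing int and float widens to float;
 * any Boolean operand is a semantic error. */
Type check_arith(Type a, Type b) {
    if (a == T_ERROR || b == T_ERROR) return T_ERROR;   /* propagate earlier errors */
    if (a == T_BOOL || b == T_BOOL) return T_ERROR;
    if (a == T_FLOAT || b == T_FLOAT) return T_FLOAT;
    return T_INT;
}

/* An assignment target = value is accepted if the types match, or if an
 * int value can be widened to a float target. */
int check_assign(Type target, Type value) {
    if (target == T_ERROR || value == T_ERROR) return 0;
    return target == value || (target == T_FLOAT && value == T_INT);
}
```

For the Boolean example, check_assign(T_BOOL, check_arith(T_INT, T_FLOAT)) returns 0, so the analyzer would report a type error.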
INTERMEDIATE CODE GENERATOR (ICG)
This phase takes the syntactically and semantically correct structure as input and produces an equivalent intermediate representation of the source program. The intermediate code should have two important properties:
● It should be easy to produce.
● It should be easy to translate into the target program.
Example intermediate code forms are:
● Three-address code
● Polish notation, etc.
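To make three-address code concrete, the sketch below walks an expression tree bottom-up and emits one instruction per operator, inventing temporaries t1, t2, ... for intermediate results. The tree layout and the names Node, Gen, and gen_assign are assumptions for illustration; the output buffer is fixed-size and silently truncates very long code.

```c
#include <stdio.h>
#include <string.h>

/* Expression tree: a leaf holds a variable name, an interior node an operator. */
typedef struct Node {
    char op;                          /* '+', '-', '*', '/' or 0 for a leaf */
    const char *name;                 /* variable name when op == 0 */
    const struct Node *left, *right;
} Node;

typedef struct {
    char out[256];                    /* emitted code, one instruction per line */
    int next_temp;                    /* counter used to name t1, t2, ... */
} Gen;

/* Emits code for n and writes the name that holds its value into place. */
static void gen(Gen *g, const Node *n, char *place, size_t cap) {
    if (n->op == 0) {                 /* a leaf needs no code */
        snprintf(place, cap, "%s", n->name);
        return;
    }
    char l[16], r[16];
    gen(g, n->left, l, sizeof l);     /* operands first (bottom-up) */
    gen(g, n->right, r, sizeof r);
    snprintf(place, cap, "t%d", ++g->next_temp);
    size_t used = strlen(g->out);
    snprintf(g->out + used, sizeof g->out - used, "%s = %s %c %s\n",
             place, l, n->op, r);
}

/* Translates target = expr into three-address code and returns the text. */
const char *gen_assign(Gen *g, const char *target, const Node *expr) {
    char place[16];
    g->out[0] = '\0';
    g->next_temp = 0;
    gen(g, expr, place, sizeof place);
    size_t used = strlen(g->out);
    snprintf(g->out + used, sizeof g->out - used, "%s = %s\n", target, place);
    return g->out;
}
```

For a = b + c * d the sketch emits t1 = c * d, then t2 = b + t1, then a = t2: each instruction has at most one operator and at most three addresses.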
CODE OPTIMIZER
This phase is optional in some compilers, but it is very useful because it makes the generated program faster or smaller, saving execution time and machine resources. This phase performs the following specific functions:
● Attempts to improve the intermediate code (IC) so that it yields faster machine code. Typical techniques include loop optimization, removal of redundant computations, strength reduction, frequency reduction, etc.
● Sometimes the data structures used to represent the intermediate forms may also be changed.
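Strength reduction replaces an expensive operation with a cheaper equivalent one. The sketch below applies it to a single three-address instruction, assuming the hypothetical Quad layout shown and the common (though machine-dependent) assumption that an addition is cheaper than a multiplication. A real optimizer applies many such rules, typically inside loops where the savings are repeated.

```c
#include <string.h>

/* One three-address instruction: result = arg1 op arg2 (hypothetical layout). */
typedef struct {
    char result[8], arg1[8], arg2[8];
    char op;
} Quad;

/* Strength reduction: a multiplication by the constant 2 becomes an
 * addition (y * 2 or 2 * y turns into y + y). Returns 1 if q changed. */
int reduce_strength(Quad *q) {
    if (q->op != '*') return 0;
    if (strcmp(q->arg2, "2") == 0) {
        strcpy(q->arg2, q->arg1);        /* y * 2  ->  y + y */
    } else if (strcmp(q->arg1, "2") == 0) {
        strcpy(q->arg1, q->arg2);        /* 2 * y  ->  y + y */
    } else {
        return 0;                        /* rule does not apply */
    }
    q->op = '+';
    return 1;
}
```

Applied to t1 = i * 2, the instruction becomes t1 = i + i; t3 = i * 3 is left unchanged because this single rule does not cover it.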
CODE GENERATOR
This is the final phase of the compiler. It generates the target code, which normally consists of relocatable machine code, assembly code, or absolute machine code.
● Memory locations are selected for each variable used, and variables are assigned to registers.
● Intermediate instructions are translated into a sequence of machine instructions.
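The translation of intermediate instructions into machine instructions can be sketched for one three-address instruction. The target is a hypothetical load/store machine with a single register R0 and mnemonics LD, ADD, SUB, MUL, DIV, and ST in the style of common textbook examples; real code generators also allocate registers across many instructions, which this sketch skips.

```c
#include <stdio.h>
#include <string.h>

/* Emits code for result = arg1 op arg2 on a hypothetical one-register
 * load/store machine. Returns the snprintf result, or -1 for an
 * unsupported operator. */
int gen_quad(char *out, size_t cap, const char *result,
             const char *arg1, char op, const char *arg2) {
    const char *mnemonic;
    switch (op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    case '/': mnemonic = "DIV"; break;
    default:  return -1;                 /* operator not supported */
    }
    /* load the first operand, operate with the second, store the result */
    return snprintf(out, cap, "LD R0, %s\n%s R0, R0, %s\nST %s, R0\n",
                    arg1, mnemonic, arg2, result);
}
```

For t1 = c * d it writes LD R0, c then MUL R0, R0, d then ST t1, R0.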
The compiler also performs symbol-table management and error handling throughout the compilation process. A symbol table is a data structure that stores information about source-language constructs, such as identifiers, and the tokens encountered during compilation. These two activities interact with all phases of the compiler.
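Symbol tables are commonly implemented as hash tables. This is a minimal sketch with separate chaining; the entry fields, the bucket count, the multiply-by-31 string hash, and the names st_insert and st_lookup are assumptions, and entries are never freed to keep it short.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211                  /* a prime bucket count, chosen arbitrarily */

/* One entry; real compilers also store type, scope, address, and so on. */
typedef struct Entry {
    char *name;
    int token;                       /* token code, e.g. for ID or a keyword */
    struct Entry *next;              /* chaining for hash collisions */
} Entry;

typedef struct { Entry *bucket[BUCKETS]; } SymbolTable;

static unsigned st_hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % BUCKETS;
}

Entry *st_lookup(const SymbolTable *st, const char *name) {
    for (Entry *e = st->bucket[st_hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Enters name only if it is not already present (as the scanner does for
 * identifiers) and returns its entry, or NULL if memory runs out. */
Entry *st_insert(SymbolTable *st, const char *name, int token) {
    Entry *e = st_lookup(st, name);
    if (e) return e;
    e = malloc(sizeof *e);
    if (!e) return NULL;
    e->name = malloc(strlen(name) + 1);
    if (!e->name) { free(e); return NULL; }
    strcpy(e->name, name);
    e->token = token;
    unsigned h = st_hash(name);
    e->next = st->bucket[h];         /* push onto the front of the chain */
    st->bucket[h] = e;
    return e;
}
```

Inserting count twice returns the same entry, which matches the rule that an identifier is entered only if it does not already exist.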
OVERVIEW OF LEXICAL ANALYSIS
● To identify the tokens, we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce regular expressions, a notation that can describe essentially all the tokens of a programming language.
● Second, having decided what the tokens are, we need some mechanism to recognize them in the input stream. This is done by token recognizers, which are designed using transition diagrams and finite automata.
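A finite automaton can be stored as a transition table. The sketch below encodes the transition diagram for identifiers, letter (letter | digit)*, as a deterministic finite automaton; the state numbering, the character classes, and the name is_identifier are choices made for this example.

```c
#include <ctype.h>

/* Character classes that label the edges of the transition diagram. */
enum { LETTER, DIGIT, OTHER, NCLASSES };

/* States: 0 = start, 1 = inside an identifier (accepting), 2 = dead. */
static const int delta[3][NCLASSES] = {
    /* LETTER DIGIT OTHER */
    {  1,     2,    2 },   /* start: must begin with a letter */
    {  1,     1,    2 },   /* ident: letters or digits may follow */
    {  2,     2,    2 },   /* dead: no way back to acceptance */
};

static int char_class(char c) {
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Runs the DFA over the whole string; accepts iff it ends in state 1. */
int is_identifier(const char *s) {
    int state = 0;
    for (; *s; s++)
        state = delta[state][char_class(*s)];
    return state == 1;
}
```

is_identifier("rate2") accepts, while "2rate" falls into the dead state on its first character. Changing the token definition only means changing the table, not the driver loop.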
ROLE OF LEXICAL ANALYZER
The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a token for each lexeme in the source program. This stream of tokens is sent to the parser for syntax analysis. It is common for the lexical analyzer to interact with the symbol table as well: when it discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table.
Token
A token is a sequence of characters that can be treated as a single logical entity. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes.
Typical tokens are: 1) identifiers, 2) keywords, 3) operators, 4) special symbols, 5) constants.
Pattern
● A pattern is a rule that describes the set of strings in the input for which the same token is produced as output. In the case of a keyword as a token, the pattern is just the sequence of characters that forms the keyword.
● In other words, a pattern describes the set of lexemes that can represent a particular token in the source program.
● Example: [a-zA-Z][a-zA-Z0-9]* is the pattern for identifiers.
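On a POSIX system, the identifier pattern above can be tried out directly with the <regex.h> functions regcomp, regexec, and regfree. This only illustrates the pattern as a rule; Lex does not use this library but instead compiles its patterns into an automaton. The wrapper name matches_identifier is hypothetical, and the bracket ranges assume the default "C" locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s is entirely matched by [a-zA-Z][a-zA-Z0-9]*, 0 if not,
 * and -1 if the pattern fails to compile. */
int matches_identifier(const char *s) {
    regex_t re;
    /* ^ and $ anchor the pattern so the whole string must match. */
    if (regcomp(&re, "^[a-zA-Z][a-zA-Z0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

score is a lexeme of this pattern, while 9lives and total_1 are not (the pattern allows no leading digit and no underscore).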
Lexeme
A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.
Example: In the following C statement,
printf("Total = %d\n", score);
both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching the pattern for a string literal.
Input Buffering
The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers, the lexeme-begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
Input buffering is an important concept in compiler design that refers to the way in which the compiler reads input from the source code. Reading the source one character at a time, with a separate read operation for each character, can be a slow and inefficient process. Input buffering is a technique that allows the compiler to read input in larger chunks, which can improve performance and reduce overhead.
The basic idea behind input buffering is to read a block of input from the
source code into a buffer, and then process that buffer before reading
the next block. The size of the buffer can vary depending on the specific
needs of the compiler and the characteristics of the source code being
compiled.
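A classic way to implement input buffering is a pair of buffer halves, each followed by a sentinel character, so that the forward pointer needs only one test per character in the common case. This sketch assumes an in-memory string as a stand-in for the source file (a real scanner would refill each half with fread), uses '\0' as the sentinel (so the input must not contain NUL characters), and omits the lexeme-begin pointer bp; with bp included, a lexeme must not be longer than one half. HALF is kept tiny so that reloads happen often, and all names are hypothetical.

```c
#include <stdio.h>    /* EOF */
#include <string.h>

#define HALF 8            /* tiny block size so reloads happen often */
#define SENTINEL '\0'     /* assumes the input contains no NUL characters */

/* Layout: buf[0..HALF-1] | buf[HALF] sentinel |
 *         buf[HALF+1..2*HALF] | buf[2*HALF+1] sentinel */
typedef struct {
    char buf[2 * (HALF + 1)];
    const char *src;      /* stands in for the source file */
    size_t len, pos;      /* input length and how much has been loaded */
    char *fp;             /* forward pointer */
} InputBuffer;

/* Copies the next block (at most HALF chars) into one half and ends it
 * with a sentinel, which lands early only when the input runs out. */
static void load_half(InputBuffer *b, char *half) {
    size_t n = b->len - b->pos;
    if (n > HALF) n = HALF;
    memcpy(half, b->src + b->pos, n);
    b->pos += n;
    half[n] = SENTINEL;
}

void ib_init(InputBuffer *b, const char *src) {
    b->src = src;
    b->len = strlen(src);
    b->pos = 0;
    load_half(b, b->buf);
    b->fp = b->buf;
}

/* Returns the next character, reloading the other half when fp reaches
 * the sentinel at the end of a half; returns EOF at end of input. */
int ib_next(InputBuffer *b) {
    for (;;) {
        char c = *b->fp++;
        if (c != SENTINEL) return (unsigned char)c;   /* common case */
        char *hit = b->fp - 1;
        if (hit == b->buf + HALF) {                   /* end of first half */
            load_half(b, b->buf + HALF + 1);          /* fp already points there */
        } else if (hit == b->buf + 2 * HALF + 1) {    /* end of second half */
            load_half(b, b->buf);
            b->fp = b->buf;
        } else {                                      /* sentinel inside a half */
            b->fp = hit;                              /* stay put: EOF again next call */
            return EOF;
        }
    }
}
```

Reading "int count = limit + 42;" (23 characters) through this buffer reloads the second half once and the first half once, and the caller still sees the characters as one continuous stream.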
Lex: The Lexical Analyzer Generator
Lex is a tool used to generate lexical analyzers. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler. The Lex compiler transforms the input patterns into a transition diagram and generates code in a file called lex.yy.c; this C program is then compiled by the C compiler to give the object code.
The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions. C declarations, such as the manifest constants, are enclosed between %{ and %}.
In the translation rules section, we place pattern-action pairs, where each pair has the form
Pattern {Action}
The auxiliary function definitions section includes the definitions of functions used to install identifiers and numbers in the symbol table. The three sections are separated by lines containing %%.
LEX Program Example:
%{
/* definitions of manifest constants LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
LT, LE, EQ, NE, GT, GE: attribute values for the relational operators <, <=, =, <>, >, >=
IF, THEN, ELSE: keywords of the conditional statement
ID: identifiers (variable names)
NUMBER: numeric literals
RELOP: token name for relational operators
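The relational-operator part of this example can be recognized by a transition diagram, coded here by hand to show roughly what the Lex-generated automaton does for those patterns. The numeric values of the manifest constants are assumptions (starting at 256 to stay clear of single-character codes), <> is taken to mean "not equal" as in Pascal, and relop is a hypothetical helper that returns the attribute value (which of LT ... GE was seen) for the token RELOP.

```c
/* Manifest constants from the Lex example; the numeric values are assumed. */
enum { LT = 256, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP };

/* Hand-coded relational-operator transition diagram. Returns the attribute
 * (LT ... GE) and sets *len to the lexeme length, or returns 0 and sets
 * *len to 0 if s does not start with a relational operator. */
int relop(const char *s, int *len) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1; return LT;     /* next char is not part of the lexeme ("retract") */
    case '=':
        *len = 1; return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1; return GT;     /* "retract" as above */
    default:
        *len = 0; return 0;      /* not a relational operator */
    }
}
```

For the input <=x the helper reports LE with a two-character lexeme, leaving x for the next token.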
Recognition of tokens:
Recognition of tokens is the process of identifying and classifying the basic symbols, or building blocks, of a programming language, such as keywords, identifiers, literals, and symbols.
Token Recognition Techniques
1. *Regular Expressions*: Used to define patterns for matching tokens, such as keywords, identifiers, and literals.
2. *Finite State Machines*: Used to recognize tokens by traversing a finite state machine, where each accepting state corresponds to a specific token.
3. *Lexical Analysis*: Involves scanning the input stream and identifying tokens based on their lexical structure.
Token Recognition Process
1. *Scanning*: The input stream is scanned character by character.
2. *Pattern Matching*: The scanner applies regular expressions or finite
state machines to match the input characters against known token
patterns.
3. *Token Identification*: When a match is found, the scanner identifies
the token and returns its type and value.
4. *Error Handling*: If the input stream contains invalid characters or
token patterns, the scanner reports an error.
Token Types
1. *Keywords*: Reserved words with special meanings, such as "if,"
"while," or "function."
2. *Identifiers*: Names given to variables, functions, or labels.
3. *Literals*: Values that are represented directly in the code, such as
numbers, strings, or booleans.
4. *Symbols*: Special characters used in the programming language,
such as operators (+, -, *, /), separators (,), or punctuation (.).
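Keywords such as if and while also match the identifier pattern, so a scanner commonly recognizes a word first and then checks it against a table of reserved words. The keyword list and the names below are assumptions for illustration; the comparison is case-sensitive, as in C.

```c
#include <string.h>

/* Hypothetical word classes for this sketch. */
typedef enum { KEYWORD, IDENTIFIER } WordClass;

/* Reserved words (an illustrative list, not any particular language's). */
static const char *const keywords[] = { "if", "then", "else", "while", "return" };

/* Classifies a lexeme already matched by the identifier pattern. */
WordClass classify_word(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0) return KEYWORD;
    return IDENTIFIER;
}
```

With this table, while is a keyword, whilst is an identifier, and If is also an identifier because the lookup is case-sensitive. An alternative design preloads the keywords into the symbol table so that one lookup serves both purposes.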