System design using HDL - Module 3

SYSTEM DESIGN USING HDL (ECE43)
Digital system design using Verilog,
Charles Roth, Lizy Kurian John, Byeong Kil Lee,
1st Edition, 2016, Cengage Learning
Module 1: Sections 2.1, 2.2, 2.3 - 2.8, 2.11, 2.13 - 2.15
Module 2: Sections 2.9, 2.10, 2.12, 2.16 - 2.19, 8.1, 8.2
Module 3: Sections 3.1 - 3.4, 5.1, 5.2.1, 5.3
Module 4: Sections 4.1 - 4.5, 4.8, 4.6, 4.7, 4.9, 4.11
Module 5: Sections 6.1 - 6.5, 6.7 - 6.12
INTRODUCTION TO
PROGRAMMABLE
LOGIC DEVICES

Brief overview of
Programmable Logic
Devices

Need of programmable logic devices:
• Implementation of a significant amount of functionality
into one physical chip.
• Removes the need for multiple off-the-shelf devices.
• Easy reprogramming, therefore increased ability to
change the design.
• Easier to change the design in case of errors or change
in the design specifications.

Classification of programmable logic:
• Factory programmable devices
   • ROM (Read Only Memory)
   • MPGA (Mask Programmable Gate Array)
• Field programmable devices
   • SPLD (Simple Programmable Logic Device)
      • PROM (Programmable Read Only Memory)
      • PLA (Programmable Logic Array)
      • PAL (Programmable Array Logic)
      • GAL (Generic Array Logic)
   • CPLD (Complex Programmable Logic Device)
   • FPGA (Field Programmable Gate Array)

• Factory Programmable Devices: Generic devices that
are programmed at the factory to meet the Customer’s
requirements. Programming can be done only once.
Examples: ROM, MPGA
• ROM: Primarily meant for memory, but can be used to
implement combinational circuits.
• MPGA: Also called as gate arrays, they have been a
popular technology for creating ASIC.

• Field Programmable Devices: Devices that are
programmed by the user, rather than in factory.
Factor | SPLD | CPLD | FPGA
Density | Low (few hundred gates) | Low to medium (500 to 12,000 gates) | Medium to high (3,000 to 5,000,000 gates)
Timing | Predictable | Predictable | Unpredictable
Cost | Low | Low to medium | Medium to high
Major vendors (with device families) | Lattice (GAL16LV8, GAL22V10), Cypress (PALCE16V8), AMD (22V10) | Xilinx (CoolRunner, XC9500), Altera (MAX) | Xilinx (Kintex, Artix, Virtex, Spartan), Altera (Stratix, Cyclone, Arria), Lattice (Mach, ECP), Microsemi (Axcelerator, Fusion)

• PLA: It consists of programmable AND array &
programmable OR array.
• PAL: It is a special case of PLA, where OR array is
fixed and only AND array is programmable. It can also
contain flip-flops.
• Earlier programmable devices were only one time
programmable (OTP, PROM); later on, the advent of
ultraviolet-erasable and electrically erasable technologies
gradually led to re-programmable logic devices.

CMOS Electrically Erasable PLDs:
• These devices contain macroblocks with arrays of gates,
flip-flops, multiplexers, or standard building blocks.
• PLAs, PALs, GALs & PLDs are collectively referred to as
SPLDs.
GAL (Generic Array Logic):
• Lattice Semiconductor created similar devices with easy
reprogrammability, and named its line of devices
GALs.

• ROM consists of an array of semiconductor
devices that are interconnected to store an
array of binary data.
• Data stored in ROM can be read out when
required, but cannot be changed under
normal operating conditions.
• Output pattern stored in ROM is called a
word.
• Each input serves as address, which selects
one of the stored words as output.

• Size of ROM is given as 2^n X m, where “n” is the
number of input (address) lines and “m” is the number
of output lines, i.e., the width of each stored word.

• The size of a ROM with 3 input lines and 4 output lines
can be written as 2^3 X 4, i.e., 8 words X 4 bits.
• In the following example, when ABC=010, F0F1F2F3=0111.

• ROM consists of a decoder and a memory array. When a
pattern of 1’s and 0’s is applied as input to the decoder,
exactly one of its outputs becomes 1, which in turn selects
the corresponding stored word from the array.
• Types of ROM:
   • Mask programmable ROM
   • PROM (user programmable)
   • EPROM (UV erasure)
   • EEPROM (Electrically erasable)
   • Flash memory

• Mask programmable ROM: Data array is permanently stored
during manufacture, by selectively including or omitting the
switching elements, in the cross-point switch matrix. Special
masks are used for this purpose, which is an expensive process.
• PROM: One time, user programmable (fuse / antifuse).
• EPROM: The programmer uses voltage pulses to store electric
charge in the memory array locations. UV light is used to
erase the complete stored data.
• EEPROM: Uses electrical pulses for erasure of data. It can be
reprogrammed only 100 to 1000 times.
• Flash memories: They have built-in programming and erasure
capabilities, and data can be written while in-circuit, without
needing any separate programmer.

• ROM can implement any combinational circuit, by
storing the outputs for all of the input combinations.
Hence, this method is also called the look-up table (LUT) method.
Ex-1: Implement a 2 bit adder using ROM:
Solution: Input : two 2-bit numbers.
Output : Sum having 3-bits.
• Can be implemented with 16 X 3 ROM.

Data to be stored in memory:
0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
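The stored data can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes that the 4-bit address is formed with the first operand in the upper two bits and the second operand in the lower two bits, and the name `build_adder_rom` is hypothetical.

```python
def build_adder_rom():
    """Return the 16 x 3 ROM contents for a 2-bit adder.

    Assumed address layout: bits [3:2] = operand A, bits [1:0] = operand B.
    """
    rom = []
    for address in range(16):
        a = (address >> 2) & 0b11   # upper two address bits
        b = address & 0b11          # lower two address bits
        rom.append(a + b)           # 3-bit sum, range 0..6
    return rom


rom = build_adder_rom()
print(rom)
```

Every stored word fits in 3 bits, which is why a 16 X 3 ROM is sufficient.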

Ex-2: Compute the size of the ROM to implement an 8:3
priority encoder.
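Solution: As there are 8 inputs, there will be 2^8 = 256 entries in the ROM.
Size of the ROM: 2^8 X 4, i.e., 256 words X 4 bits (three encoded
output bits, plus one bit to indicate that at least one input is active).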
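A possible way to fill such a ROM is sketched below in Python. The source does not define the output format, so the sketch assumes that input 7 has the highest priority and that each 4-bit word holds the 3-bit index of the highest active input followed by a "valid" bit; `build_priority_encoder_rom` is a hypothetical name.

```python
def build_priority_encoder_rom():
    """Return 256 four-bit words, each laid out as (index << 1) | valid.

    Assumption: address bit 7 is the highest-priority input.
    """
    rom = []
    for address in range(256):
        if address == 0:
            rom.append(0)                      # no input active: valid = 0
        else:
            index = address.bit_length() - 1   # highest set bit position
            rom.append((index << 1) | 1)       # 3-bit code plus valid = 1
    return rom


encoder_rom = build_priority_encoder_rom()
print(len(encoder_rom), "words x 4 bits")
```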

Ex-3: Implement the following state machine, of BCD to
Excess-3 code converter, using ROM.
PS    NS (X=0)   NS (X=1)   Z (X=0)   Z (X=1)
S0    S1         S2         1         0
S1    S3         S4         1         0
S2    S4         S4         0         1
S3    S5         S5         0         1
S4    S5         S6         1         0
S5    S0         S0         0         1
S6    S0         -          1         -

• Sequential circuit is designed
using ROM and flip-flops.
• ROM is used to realize the output
functions and the next state
equations.
• The state of the circuit is stored in
a register of D flip-flops, and fed
back to the input of the ROM.
• To realize the given Mealy
machine, a ROM and 3 flip-flops
are necessary.

• The ROM will generate
the next state equations
and output Z, from the
present states and input X.
• Q1, Q2, Q3 and X are
connected to the address
lines, with X connected to
the LSB.
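• With a straight binary state
assignment (S0 = 000 to S6 = 110)
and each word stored as
Q1+ Q2+ Q3+ Z, the contents of
ROM (in hex) are:
3, 4, 7, 8, 8, 9, A, B,
B, C, 0, 1, 1, 0, 0, 0.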
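The ROM-plus-register structure can be checked with a small behavioural model. The Python sketch below rests on assumptions that the slides do not state explicitly: a straight binary state assignment (S0 = 000 to S6 = 110), ROM words stored as Q1+ Q2+ Q3+ Z, the BCD digit shifted in LSB first, and the don't-care and unused words filled with 0. The names `build_rom` and `convert` are hypothetical.

```python
# State table from Ex-3: state -> ((next, Z) for X=0, (next, Z) for X=1).
# None marks the don't-care entry for S6 with X=1.
STATE_TABLE = {
    0: ((1, 1), (2, 0)),
    1: ((3, 1), (4, 0)),
    2: ((4, 0), (4, 1)),
    3: ((5, 0), (5, 1)),
    4: ((5, 1), (6, 0)),
    5: ((0, 0), (0, 1)),
    6: ((0, 1), None),
}


def build_rom():
    """Address = Q1 Q2 Q3 X (X is the LSB); word = Q1+ Q2+ Q3+ Z."""
    rom = [0] * 16                      # unused and don't-care words stay 0
    for state, entries in STATE_TABLE.items():
        for x, entry in enumerate(entries):
            if entry is not None:
                next_state, z = entry
                rom[(state << 1) | x] = (next_state << 1) | z
    return rom


def convert(bcd_digit, rom):
    """Shift a BCD digit in LSB first and collect the serial output Z."""
    state, result = 0, 0
    for i in range(4):
        x = (bcd_digit >> i) & 1
        word = rom[(state << 1) | x]    # ROM lookup
        result |= (word & 1) << i       # Z is the LSB of the ROM word
        state = word >> 1               # models the 3 D flip-flops
    return result


rom = build_rom()
for digit in range(10):
    print(digit, "->", convert(digit, rom))
```

Each BCD digit takes four clock cycles, after which the machine is back in S0 and ready for the next digit.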

Programmable Logic
Array (PLA)

• PLA with n-input lines and
m-output lines, can realize
m-functions of n-variables.
• When compared to ROM,
instead of decoder, AND
array is used to realize the
product terms.
• Later on, OR array is used
to sum the product terms.

Ex-4: Using PLA, Realize the following functions:
F0 = ⅀m(0,1,4,6) = (A'B'+AC')
F1 = ⅀m(2,3,4,6,7) = (B+AC')
F2 = ⅀m(0,1,2,6) = (A'B'+BC')
F3 = ⅀m(2,3,5,6,7) = (AC+B)
Solution: There are 3 inputs: A, B & C. There are 5 distinct
product terms in the 4 outputs.
Unlike ROM, in a PLA implementation, the product terms
can be shared among the functions.
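The sharing of product terms can be illustrated with a small software model of the two arrays. This Python sketch is not a hardware description; the dictionary layout and the name `pla_minterms` are choices made only for this illustration.

```python
# AND array: each product term lists the input values it requires.
# Inputs that are absent from a term are not connected (don't-care).
PRODUCT_TERMS = {
    "A'B'": {"A": 0, "B": 0},
    "AC'":  {"A": 1, "C": 0},
    "B":    {"B": 1},
    "BC'":  {"B": 1, "C": 0},
    "AC":   {"A": 1, "C": 1},
}

# OR array: which product terms feed each output.
OR_ARRAY = {
    "F0": ["A'B'", "AC'"],
    "F1": ["B", "AC'"],
    "F2": ["A'B'", "BC'"],
    "F3": ["AC", "B"],
}


def pla_minterms(output):
    """Return the minterm numbers (A is the MSB) for which the output is 1."""
    minterms = []
    for m in range(8):
        inputs = {"A": (m >> 2) & 1, "B": (m >> 1) & 1, "C": m & 1}
        fired = any(
            all(inputs[name] == value
                for name, value in PRODUCT_TERMS[term].items())
            for term in OR_ARRAY[output]
        )
        if fired:
            minterms.append(m)
    return minterms


for name in OR_ARRAY:
    print(name, pla_minterms(name))
```

Only five rows appear in the AND array, although the four outputs together use eight term references.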

• Instead of AND-OR logic, a PLA
may use NOR-NOR logic.
• A 2-input NOR gate can be built
using nMOS transistors.
• NOR-NOR logic with inverters at
the inputs and outputs is
equivalent to AND-OR logic.

Ex-5: Using PLA, Realize the following functions:
F1 = ⅀m(2,3,5,7,8,9,10,11,13,15)
F2 = ⅀m(2,3,5,6,7,10,11,14,15)
F3 = ⅀m(6,7,8,9,13,14,15)
Solution: After minimization, the simplified functions are :
F1 = ⅀m(2,3,5,7,8,9,10,11,13,15) = bd+b'c+ab'
F2 = ⅀m(2,3,5,6,7,10,11,14,15) = c+a'bd
F3 = ⅀m(6,7,8,9,13,14,15) = bc+ab'c'+abd
Here, the PLA requires 8 different product terms.

• To reduce the number of rows
in PLA, these functions can be
reorganized using K-map.
F1 = a'bd+abd+b'c+ab'c'
F2 = b'c+bc+a'bd
F3 = bc+ab'c'+abd
There are now only 5 different
product terms, so the
PLA table has only 5 rows.

• In case of PLA, unlike
memory, the number of terms
in each equation is not
important, as the size of PLA
does not depend on the number
of terms within an equation.
• To reduce the number of rows
in a PLA, the Espresso algorithm
can be used instead of K-maps.
This is a more elaborate
algorithm, which is used for
logic minimization in
VLSI synthesis.
F1 = a'bd+abd+b'c+ab'c'
F2 = b'c+bc+a'bd
F3 = bc+ab'c'+abd
The PLA implementation has 4
inputs, 5 product terms & 3 outputs.
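The effect of the reorganization can be checked numerically. The Python sketch below expands both sets of equations into minterms (with a as the MSB) and counts the distinct product terms, which is the number of PLA rows. The helper names `covers`, `minterms` and `pla_rows` are hypothetical.

```python
def covers(term, m):
    """Check whether a product term such as "ab'c'" is 1 for minterm m."""
    bits = {"a": (m >> 3) & 1, "b": (m >> 2) & 1,
            "c": (m >> 1) & 1, "d": m & 1}
    i = 0
    while i < len(term):
        complemented = i + 1 < len(term) and term[i + 1] == "'"
        if bits[term[i]] != (0 if complemented else 1):
            return False
        i += 2 if complemented else 1
    return True


def minterms(terms):
    """Minterm list of a sum of product terms."""
    return [m for m in range(16) if any(covers(t, m) for t in terms)]


def pla_rows(functions):
    """Number of distinct product terms over all outputs."""
    return len({t for terms in functions.values() for t in terms})


MINIMIZED = {
    "F1": ["bd", "b'c", "ab'"],
    "F2": ["c", "a'bd"],
    "F3": ["bc", "ab'c'", "abd"],
}
SHARED = {
    "F1": ["a'bd", "abd", "b'c", "ab'c'"],
    "F2": ["b'c", "bc", "a'bd"],
    "F3": ["bc", "ab'c'", "abd"],
}

for name in MINIMIZED:
    print(name, minterms(MINIMIZED[name]) == minterms(SHARED[name]))
print(pla_rows(MINIMIZED), "rows versus", pla_rows(SHARED), "rows")
```

The individually minimized equations have fewer literals, but the shared form needs fewer rows, which is what determines the PLA size.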

Programmable
Array Logic
(PAL)

• It is a special case of PLA, in which the AND array is programmable
and the OR array is fixed.
• For this reason, a PAL is less expensive than a PLA, and is easier
to program as well.
• The following figure represents a segment of an un-programmed
PAL, along with the input buffers, each of which provides two
outputs (the true and the complemented input).

Ex-6: Implement I1I2'+I1'I2.
Solution:
• As OR gates cannot be
programmed, AND
terms cannot be
shared among two or
more OR gates.
• Typical PALs have 10
to 20 inputs, and 2 to
10 outputs, with 2 to 8
AND gates driving
each OR gate.

Ex-7: Implement a full-adder using PAL.
Solution:
SUM = X'Y'Cin+X'YCin'+XY'Cin'+XYCin
COUT = XY+YCin+XCin
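The two sum-of-products equations can be checked against ordinary addition. This is a minimal Python sketch; the function name `full_adder_sop` is hypothetical.

```python
from itertools import product


def full_adder_sop(x, y, cin):
    """Evaluate the SUM and COUT equations term by term, as in the AND array."""
    nx, ny, ncin = 1 - x, 1 - y, 1 - cin
    sum_bit = (nx & ny & cin) | (nx & y & ncin) | (x & ny & ncin) | (x & y & cin)
    cout = (x & y) | (y & cin) | (x & cin)
    return sum_bit, cout


results = {
    (x, y, cin): full_adder_sop(x, y, cin)
    for x, y, cin in product((0, 1), repeat=3)
}
for inputs, outputs in results.items():
    print(inputs, outputs)
```

SUM uses four AND gates and COUT uses three, so each output fits on a single OR gate of a typical PAL.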

• PALs were made available that contained D flip-flops as
well, and were called as “sequential PALs”.
Ex-8: Implement Q+ = D = A'BQ'+AB'Q.
Solution:

PLD/GAL
(Programmable Logic Device
/ Generic Array Logic)

• PALs and PLAs are good for implementing small
circuitry. But, they are not re-programmable.
• When they are made as erasable/reprogrammable, by
incorporating Flash memory, such PALs are often
referred as PLDs/GALs.
• An example is 22CEV10, which is a CMOS electrically
erasable PLD, that can realize both combinational as
well as sequential circuits.

• 22CEV10 contains:
   • 12 dedicated input pins
   • 10 pins that can be programmed as input / output
   • Programmable AND array (8 to 16 AND gates feeding each OR gate)
   • 10 OR gates, each of which drives an output macrocell
   • 10 D Flip-flops, with asynchronous reset and synchronous preset
   • Each macrocell contains the D Flip-flop, multiplexer,
     and additional programmability at the output
• 22CEV10 => 22 signal pins, out of which 10 are bidirectional


• Each macrocell has 2 programmable interconnect bits: S1 & S0.
• When the particular bit is programmed, it is connected to 0 V.
• Erasing that bit disconnects it from 0 V, and it floats at logic-1.
S1 S0 Output
0 0 D Flip-flop output
0 1 D Flip-flop output inverted
1 0 OR output
1 1 OR output inverted

• CAD programs are available for PAL/PLD
programming. These programs accept logic equations,
truth tables, state graphs or state tables as inputs.
• They automatically generate the required bit patterns,
which can be downloaded into a PLD programmer,
which will create the necessary connections.
• PALASM (PAL Assembler) from MMI (Monolithic Memories,
Inc.) & AMD, and ABEL (Advanced Boolean Expression
Language) from DATA I/O are the two popular languages
that are used for programming.

CPLD
(Complex Programmable
Logic Device)

• This is a programmable IC which is equivalent to several
PLDs on the same silicon chip. Typically a CPLD comprises
500 to 10,000 logic gates.
• It consists of a number of PAL-like logic blocks, along with a
programmable interconnect. The interconnect matrix is
implemented using a crossbar switch. Even though it is
expensive, it results in predictable timing.
• CPLDs are electrically erasable and reprogrammable,
and hence are sometimes referred to as EPLDs (Erasable
Programmable Logic Devices).

• Typically a CPLD contains a
number of macrocells that are
grouped into function blocks.
• Each macrocell contains a flip-flop
and an OR gate, and the
macrocell has its inputs
connected to an AND gate array.
• The major manufacturers of
CPLDs are: Xilinx, Altera,
Lattice, Cypress and Atmel.

AN EXAMPLE
XILINX COOLRUNNER
(XCR3064XL)

• This CPLD has 4 function blocks, and each block has 16
associated macrocells. A function block is a programmable
AND-OR array, which is configured as a PLA.
• Each macrocell contains a flip-flop and additional
multiplexers, that route the signals from the function blocks
to the I/O blocks or to the interconnect array.
• The interconnect array selects signals from the macrocell
outputs and the I/O blocks, and connects them back to
function blocks. Thus, a signal generated from any function
block can be used as an input to any other function block.

Ex-9: Implement a Mealy sequential machine with
2 inputs and 2 outputs.
• First, two D-inputs
have to be generated for
the flip-flops.
• Then, two outputs
(Z1, Z2) have to be
generated, by utilizing
the flip-flop outputs.
• Hence, four macrocells
are required for the
implementation of the
required Mealy machine.

Ex-10: Implement a parallel adder with accumulator.

• The accumulator register
needs one FF for each bit.
• But that bit also needs to
generate the sum and
carry bits corresponding
to that particular bit.
• Hence, each bit of an
adder requires two
macrocells, one for the
sum and the accumulator,
and the other for the carry.

FPGA
(Field Programmable Gate Array)
They contain an array of identical logic blocks
with programmable interconnections.
User can program the functions realized by
each logic block, and can flexibly program
the connections between them.

MPGA versus FPGA
Advantages of FPGAs:
• The time-to-market of an FPGA product is much shorter.
• With an FPGA, it is easier to correct the mistakes in the design.
• The prototyping cost is much lower with an FPGA.
• At low volumes, FPGAs are cheaper than MPGAs.
Disadvantages of FPGAs:
• FPGAs are less dense than MPGAs.
• FPGAs are slower, due to the RC delay in the programmable points.
• Interconnect delays in FPGAs are unpredictable.
• Programming overhead is much higher, because of the resources
devoted to programmability.

When compared to CPLD
the major advantage of FPGA is its
highly flexible programmable
interconnect, and due to this fact
itself, the major disadvantage is its
unpredictable interconnect delay.

FPGA typically contains three
programmable elements:
1. Programmable logic blocks
(Configurable Logic Blocks)
2. Programmable routing resources
3. Programmable I/O blocks

• Programmable logic blocks
   • These are created by Muxes, LUTs, and AND-OR arrays.
   • Programming refers to: a) Changing the contents of LUT,
     b) Changing the I/O signals to the Muxes, c) Selecting or not
     selecting the particular gates in the AND-OR arrays.
• Programmable interconnect
   • For making or breaking the specific connections.
   • For connecting various blocks in the chip to each other.
   • For connecting specific I/O pins to specific logic blocks.
• Programmable I/O blocks
   • I/O pads can be programmed as i/p, o/p or bidirectional.
   • They also can be programmed as inverting, non-inverting,
     tri-state, slew rate adjustable, passive pull-up etc.

Architectures of FPGA
• Based on the topology in which the logic blocks and the
interconnect resources are distributed inside, there are
four basic architectures of FPGAs that have been
on the market since the 1980s:
   • Matrix based architecture
   • Row based architecture
   • Hierarchical PLD architecture
   • Sea-of-gates architecture
• Modern FPGAs on the market contain special
purpose blocks, including microprocessors.

1. Matrix based architecture (e.g., Most Xilinx FPGAs)
• This architecture is also called as “symmetrical array”, and it contains 8X8
arrays in smaller chips, and 100X100 or larger arrays in larger chips.
• Routing is called two-dimensional channeled routing, since routing
resources are available in horizontal and vertical directions.

2. Row based architecture (e.g., some Microsemi FPGAs)
• The logic blocks are organized into rows, and hence, there are rows of logic
blocks, and rows of routing resources.
• Routing is called one-dimensional channeled routing, as the routing
resources are channeled between the rows.

3. Hierarchical PLD architecture (e.g., Altera APEX20, APEX II)
• At the lower level, the FPGAs contain clusters of logic blocks with localized
resources for interconnection.
• At the higher level, the global interconnect is used for interconnection
between the clusters of logic blocks.

4. Sea-of-gates architecture (e.g., Microsemi Fusion)
• FPGAs contain a large number of gates, and there is an interconnect
superimposed on the sea-of-gates.
• There are other terminologies such as sea-of-cells or sea-of-tiles, to indicate
the topology with a large number of logic blocks.

FPGA Programming Technologies
• The term “Programming technology”
is used to denote the technology by
which the programmability in an
FPGA is achieved, especially for the
programmable interconnect.
• Some of the techniques are:
   • SRAM programming technology
   • EPROM / EEPROM / Flash
     programming technology
   • Antifuse programming technology

SRAM Programming Technology
• SRAM cells are used to store the “configuration bits” of the FPGA.
As in the case of a ROM, a block of SRAM cells can serve as an LUT.
• e.g., Sixteen SRAM cells can implement any function of
four variables.
• The programmable interconnect can be achieved by
SRAM, in the following two ways:
• Pass transistor is used for connecting two points
• Routing matrices are implemented by using mux

Disadvantages of SRAM Programming Technology
1. Six transistors are required for every SRAM cell.
• e.g., if FPGA has 1 million programmable points, 6 million transistors are
required for achieving this programmability.
2. Since SRAM is volatile, all the contents are lost during power failure. This is a
serious setback when an FPGA is used in the final product.
• As a solution, EPROM can be used as “boot ROM”, to store the
configuration bits, and its contents can be transferred to SRAM whenever
power gets resumed.
Advantages of SRAM Programming Technology
1. As SRAM is a rewritable memory, new contents can be written again and again,
thus providing flexibility during prototyping.
2. Fabrication steps for manufacturing SRAM are same as that for manufacturing
other logic cells.

EPROM / EEPROM / Flash Programming Technology
• Instead of SRAM, EPROM cells are used to control the programmable
interconnections. Each EPROM cell contains a MOSFET, which has two gates: Control
gate and Floating gate.
• The drain of the transistor can be connected to VDD by means of a pull-up resistor.
When a high voltage (10 - 13 V) is applied to the control gate, electrons get injected into
the floating gate, and the transistor turns OFF.
• The electrons remain trapped at the floating gate. The trapped negative charges can be
removed, by exposing the EPROM to UV light.

Disadvantages of EPROM Programming Technology
1. EPROM is slower than SRAM, because of the dual-gate structure.
2. While manufacturing, EPROMs require more processing steps than SRAMs.
3. EPROM based switches have high ON-resistance, and also have high static-
power-consumption.
4. For erasure, the EPROM chip has to be physically removed from the PCB.
• EEPROM is similar to EPROM, but removal of the gate charge can be done
electrically. Hence, for erasure, the chip need not be removed from the PCB.
• The memory cells can be selectively erased and rewritten, and this does
not require any additional equipment.
• Flash is a form of EEPROM, in which a block of cells can be erased at once, by
applying a large voltage at the control gate, which pulls the trapped electrons off
the floating gate.
• By sensing the amount of current flow, which depends on the number of trapped
electrons, each cell in Flash can store multiple bits of information.
• For write operations, Flash is faster than EEPROM, but slower than SRAM.

Antifuse Programming Technology
• Antifuse programming element changes from high resistance (open - OFF) to low
resistance (closed - ON), when a high voltage is applied.
• Antifuses are built by dielectric layers between N+ diffusion and polysilicon
layers, or by amorphous silicon in between metal layers.
Advantages:
• When compared to MOSFETs, the area
consumed by the antifuse is smaller.
• Antifuse based connections are faster than
SRAM / EPROM technologies.
Disadvantages:
• The antifuse connection is one-time programmable (OTP).
• Because of this, design change is not possible.
Comparison of FPGA Programming Technologies
Programming technology | Storage | Programmability | Area overhead | Resistance | Capacitance
SRAM | Volatile | In-circuit reprogrammable | Large | Medium to high | High
EPROM | Non-volatile | Out-of-circuit reprogrammable | Small | High | High
EEPROM / Flash | Non-volatile | In-circuit reprogrammable | Medium to large | High | High
Antifuse | Non-volatile | Not reprogrammable | Small | Low | Low

I. Programmable logic block architectures
• Manufacturers use different names to denote their logic blocks:
   • Xilinx calls them Configurable Logic Blocks (CLBs).
   • Microsemi calls them VersaTiles.
   • Altera calls them Logic Elements (LEs), and a group
     of LEs is called a Logic Array Block (LAB).
• Mainly two types of logic blocks are used in FPGAs:
   1. LUT based programmable logic blocks.
   2. Mux based programmable logic blocks.

• Look Up Table contains memory cells along with multiplexers.
• The output for each input combination is stored in memory cells.
• The input combination is used as control inputs to the multiplexer.
• For a 2-variable function, 4 memory cells and a 4:1 mux are required.
• For an n-input function, 2^n memory cells and a 2^n:1 mux are required.

1. LUT (Look-Up Table) based Programmable logic block
• Each block contains two
LUT4 and two flip-flops.
• The LUT4 can generate any
one function of 4 variables.
• The flip-flop has chip enable,
set and reset inputs.
• A multiplexer is used to select
in between the combinational
and the latched version of the
LUT4 output.
• The multiplexer is controlled
by a bit stored in memory.

Ex-11: Implement the function F1 = A'B'C+A'BC'+AB, using LUT.
• Choosing X1 as LSB and X4 as
MSB, the X4 input need not be used,
as F1 uses only 3 variables.
• To store the contents in the LUT,
the truth table of the function has
to be constructed.
• From the truth table (with A, B, C
on X3, X2, X1), the contents
of LUT to implement the function
F1 will be {0,1,1,0,0,0,1,1}.
• As LUT4 contains 16 memory
cells, the same 8 bits are stored
again in the other 8 cells, so that
the output does not depend on X4.
• Thus, the contents of LUT are
{0,1,1,0,0,0,1,1,0,1,1,0,0,0,1,1}.
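The LUT contents for this function can be generated directly from the truth table. The Python sketch below assumes the input mapping X1 = C, X2 = B, X3 = A (with X4 unused); the names `lut_contents`, `f1` and `lut_read` are hypothetical.

```python
def lut_contents(function, num_inputs):
    """Truth-table bits, indexed by the LUT address (X1 is the LSB)."""
    return [function(address) for address in range(2 ** num_inputs)]


def f1(address):
    # Assumed mapping: X1 = C, X2 = B, X3 = A; X4 (bit 3) is ignored.
    c = address & 1
    b = (address >> 1) & 1
    a = (address >> 2) & 1
    return ((1 - a) & (1 - b) & c) | ((1 - a) & b & (1 - c)) | (a & b)


def lut_read(lut, x4, x3, x2, x1):
    """The inputs act as the select lines of a 16:1 multiplexer."""
    return lut[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1]


lut4 = lut_contents(f1, 4)
print(lut4)
```

Because the function ignores X4, the two halves of the 16-entry table come out identical.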

2. Multiplexer based Programmable logic block
• With LUT, it is not necessary to minimize the function, as the number of
terms in the function is not important (all o/p bits need to be stored).
• But LUT requires storage space. To save it, multiplexers along with
basic gates, can be used.

Ex-12: Implement the function F1 = A'B'C+A'BC'+AB, using mux.
• As there are 3 variables, we can choose a 4:1 mux, which has 2 select lines.
• The truth table can be constructed, so as to define the output in terms of C.
• The mux select lines can be A & B, and the mux input lines can be connected in
accordance with the last column in the truth table.
A  B  C  F1  Mux i/p in terms of C
0  0  0  0   C
0  0  1  1   C
0  1  0  1   C'
0  1  1  0   C'
1  0  0  0   0
1  0  1  0   0
1  1  0  1   1
1  1  1  1   1
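The last column of the truth table can be derived mechanically: for each value of the select lines A and B, the function is evaluated at C = 0 and C = 1. The Python sketch below shows this under the same choice of select lines; the name `mux_data_inputs` is hypothetical.

```python
def f1(a, b, c):
    return ((1 - a) & (1 - b) & c) | ((1 - a) & b & (1 - c)) | (a & b)


def mux_data_inputs(function):
    """For each (A, B) select value, express the output as 0, 1, C or C'."""
    # Key: (output when C = 0, output when C = 1)
    labels = {(0, 0): "0", (1, 1): "1", (0, 1): "C", (1, 0): "C'"}
    inputs = []
    for select in range(4):
        a, b = (select >> 1) & 1, select & 1
        inputs.append(labels[(function(a, b, 0), function(a, b, 1))])
    return inputs


data_inputs = mux_data_inputs(f1)
print(data_inputs)
```

The list is ordered by the select value AB = 00, 01, 10, 11, i.e., mux inputs I0 to I3.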

II. Programmable interconnect
1. General purpose interconnect
• The interconnect in between the logic blocks should provide flexible
interconnection in between the rows and columns (e.g., row-column,
row-row, column-column).
• Crosspoint switch matrix: The completely non-blocking switch matrix is
very expensive. e.g., in a 4X4 matrix, out of 16 switches, only 4 switches
are utilized at any point of time.
• 6-way switch: To reduce the number of multiple connections for a single
route, the crosspoint can be configured as a 6-way switch. But, this
crosspoint is more complicated than the earlier one.

2. Direct interconnect
• To reduce the delay in the switch matrix, many FPGAs provide direct
connections between the logic blocks, by means of dedicated switches.
• Figure: direct interconnect to 4 neighbors; special connections to 8 neighbors.

3. Global interconnect lines
• For high fan-out and low-skew clock distribution, FPGAs provide routing
lines that span the entire width & height of the device.
• When the clock is distributed to a few million gates in the chip, the delay
in the wire causes the clock edges to arrive at different times at different
parts of the chip. This is called “clock skew”, which needs to be minimized
for the circuitry on the chip to function correctly.

Interconnects in row-based FPGAs
• The interconnects discussed previously are applicable to the matrix-based
architecture, which has symmetrical arrays. The row-based architecture is
one-dimensional: it has arrays of switches in the routing channel, which is
situated in between the rows of logic blocks.
• When the 3 connections required (example nets) are x, y & z, they can be
routed in 2 ways: i) non-segmented (full length track, faster),
ii) segmented (reduced resources, slower).

III. Programmable I/O blocks
• I/O blocks on modern FPGAs allow the
use of a pin as true or inverted, direct or
latched, input or output, and so on.
• The I/O options can be selected by means
of the configuration memory cells,
indicated in the figure as “M”.
• The inversion is performed using an XOR
gate, and one memory-bit.
• The direction of the pin is decided using a
tri-state buffer, and its control can be
selected as active high or active low, using
another memory-bit.
• Similarly, the rate-of-change of output (slew rate), and the pull-up option (open drain, built-
in resistor), can be configured using the memory cells (SRAM, EEPROM / Flash, antifuse).

Dedicated Specialized
Components in FPGA
1. Dedicated memory: The embedded RAM, can be
used to implement the memory needs of the
circuit, that is being designed.
2. Dedicated Arithmetic Units: A custom
implementation of adders and multipliers inside the
FPGA is smaller and faster than its counterpart
implemented using the general-purpose logic
blocks of the FPGA.
3. DSP Blocks: To support DSP applications, the
vendors provide the hardware inside the FPGA for
encryption/decryption, FFTs, FIR filters, IIR
filters, compression/decompression, and so forth.
4. Embedded Processors: This is a hybrid solution
where part of the design is in a programmable
processor (high flexibility), and the remaining part
is implemented in hardware (better performance).
5. Content Addressable Memory: This is a special
kind of memory in which the content, and not the
address, is used to search the memory.

Applications of FPGA
1. Rapid Prototyping
• As FPGAs contain 5 million or more gates, many large real-world systems can be prototyped very
quickly using a single FPGA.
• If a single FPGA will not suffice, multiple FPGAs can be interconnected to realize larger systems,
by plugging the boards into a backplane.
2. Final Products in Medium Speed Systems
• Circuits realized using FPGAs typically operate in the range of 150-200 MHz. If this speed is
sufficient, FPGAs can be used for the final product, and not only for the prototype.
• In the final product, if enhancements to the system are required, they can be done as software
updates, rather than hardware changes.
3. Glue Logic
• This is digital circuitry that works as an interface between two different logic modules.
• Using SRAM FPGAs, the new interface logic can be implemented on the same FPGA.
4. Hardware Accelerators / Coprocessors
• For a software application, an FPGA can be used as a coprocessor that implements a
key kernel, and thus the application can be accelerated.
• Examples of such applications are - pattern matching, computer architecture simulator, emulator
boards, hardware testing boards, and so on.

Design Flow for FPGA
1. Create a behavioral, RTL or structural model of the design using HDL.
2. Simulate and debug the design.
3. Synthesize the design targeting the desired device.
4. Run a mapping of the design, that will break the logic diagram into
pieces that will fit into the CLBs.
5. Run the place-and-route program, to place the logic blocks in the FPGA
and to route the interconnections.
6. Run a program that will generate the bit pattern that is necessary to
program the FPGA.
7. Download the bit pattern into the configuration cells and test the
operation of the FPGA.

STATE MACHINE CHARTS
• A “State Machine” is used to control a digital system that carries
out a step-by-step procedure or an algorithm.
• A “State Diagram” or “State Graph” is used to specify the
operation of such a state machine.
• A “State Machine Chart” (SM chart) is an alternative to the state
diagram, and the SM chart has the following advantages:
• It offers an easier understanding of the digital system.
• It automatically satisfies the conditions of the state graph
(exactly one true transition from a state at any time, unique
definition of the next state for every input combination).
• It directly leads to a hardware realization of the system.

• An SM chart contains 3 principal components: the state box, the
decision box and the conditional output box.
• An SM chart is constructed from SM blocks, where each SM
block describes the machine operation during one state.
• Therefore, each SM block contains exactly one state box,
together with decision boxes and conditional output boxes
that are associated with that particular state.
• Thus, an SM block contains exactly one entrance path, and
one or more exit paths.

• A path through an SM block from
entrance to exit is called as “link path”.
• In an SM block, when the system
enters that state, the outputs in the
state box become true.
• e.g., when state S1 is entered, Z1 & Z2
become 1. If X1 = 0, then Z3 & Z4 also
become 1. If X2 = 0, then the machine
goes to the next state via exit path 1.
During this condition, Z5 remains at 0.
• If X1 = 1, then Z3 & Z4 remain at 0,
and if X3 = 0, then Z5 becomes 1, and
the machine goes to the next state via
exit path 3.

• A given SM block can be
drawn in different forms, as
shown in the figure.
• Here, Z1 = A + BC. As this
is a combinational circuit,
there is only one state, and
there is no state change.
• The second SM chart
allows for individual testing
of input variables, and the
function is, Z1 = A + A'BC,
which is the same.
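That the two forms describe the same output can be confirmed by exhaustive comparison. In the Python sketch below, the first function evaluates the whole expression in one decision, while the second tests one variable per decision, as in the second SM chart; both function names are hypothetical.

```python
from itertools import product


def z1_parallel(a, b, c):
    """Single decision on the whole expression: Z1 = A + BC."""
    return int(a or (b and c))


def z1_serial(a, b, c):
    """One variable per decision box: Z1 = A + A'BC."""
    if a:
        return 1
    if b:
        if c:
            return 1
    return 0


equivalent = all(
    z1_parallel(a, b, c) == z1_serial(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(equivalent)
```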

Rules for constructing an SM block
1. For every valid combination of input variables, exactly
one exit path must be defined.
2. Within an SM block, no internal feedback is allowed.
3. SM block can be drawn either in a serial form or in a
parallel form. Both are equivalent, as all the tests take
place within one clock time.

A given state graph can be converted into an equivalent SM chart, as shown. This
state graph has 3 Moore outputs (Za, Zb, Zc) and 2 Mealy outputs (Z1, Z2). Hence,
the Moore outputs will appear in state boxes and Mealy outputs will appear in
conditional output boxes. Each SM block will have only one decision box, as
there is only one input variable to be tested.

Example: Derivation of SM chart for a Binary multiplier
• Abbreviations: St = Start, Sh = Shift,
Ad = Add, M = current multiplier bit,
K = completion signal.
• If M = 1, the multiplicand is added to the
contents of accumulator, followed by a right
shift. If M = 0, then the addition is skipped,
and only the right shift occurs.
• Conversion of the SM chart into Verilog
code is a straightforward process.
• “case” statement can be used to specify
each state, and “if” statement can be used
for the conditional output boxes.

Verilog code for the
Binary multiplier

Realization of SM charts
Example-1:
• As there are 3 states, the state
assignments can be 00, 01 & 11.
• Taking these values as A & B,
Za = A'B', Zb = A'B, Zc = AB,
Z1 = ABX', Z2 = ABX.
• From the link paths 2 & 3, the
next state of A can be written as,
A+ = A'BX + ABX
• From the link paths 1, 2 & 3, the
next state of B is written as,
B+ = A'B'X + A'BX + ABX

Procedure for deriving the next state equation
1. Perform state assignment for all of the states.
2. Write the output equations directly from the SM chart.
3. For the next state, identify all the states in which Q = 1.
4. Find all the link paths that lead into the particular state.
5. For each link path, find a term that has value equal to 1.
6. The expression for Q+ is formed by ORing all the terms.
7. Q+ is realized using D-FF and combinational circuit.

Example-2:
• As there are 4 states, the state
assignments can be 00, 01, 10 & 11,
respectively for S0, S1, S2 & S3.
• Load = A'B'St, Ad = A'BM,
Sh = A'BM' + AB'
• A is true in S2 & S3. Hence,
A+ = A'BM + A'BM'K + AB'K
• B is true in S1 & S3. Hence,
B+ = A'B'St + A'BM'K' + AB'K'
+ A'BM'K + AB'K
Or, B+ = A'B'St + A'BM' + AB'

State transition table for multiplier control
State  A  B  St  M  K  A+  B+  Load  Sh  Ad  Done
S0     0  0  0   -  -  0   0   0     0   0   0
S0     0  0  1   -  -  0   1   1     0   0   0
S1     0  1  -   0  0  0   1   0     1   0   0
S1     0  1  -   0  1  1   1   0     1   0   0
S1     0  1  -   1  -  1   0   0     0   1   0
S2     1  0  -   -  0  0   1   0     1   0   0
S2     1  0  -   -  1  1   1   0     1   0   0
S3     1  1  -   -  -  0   0   0     0   0   1
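The derived equations can be cross-checked against the table for all input combinations. In the Python sketch below, `control_equations` evaluates the Boolean equations of Example-2 and `control_chart` reads the same behaviour row by row from the transition table; both names are hypothetical, and Done = AB is an assumption inferred from the S3 row of the table.

```python
from itertools import product


def control_equations(a, b, st, m, k):
    """Next-state and output equations; returns (A+, B+, Load, Sh, Ad, Done)."""
    na, nb, nm = 1 - a, 1 - b, 1 - m
    a_next = (na & b & m) | (na & b & nm & k) | (a & nb & k)
    b_next = (na & nb & st) | (na & b & nm) | (a & nb)
    load = na & nb & st
    ad = na & b & m
    sh = (na & b & nm) | (a & nb)
    done = a & b          # assumed: Done is asserted only in S3
    return a_next, b_next, load, sh, ad, done


def control_chart(a, b, st, m, k):
    """Same controller, written as one branch per state."""
    state = (a << 1) | b
    if state == 0:                                  # S0
        return (0, 1, 1, 0, 0, 0) if st else (0, 0, 0, 0, 0, 0)
    if state == 1:                                  # S1
        if m:
            return (1, 0, 0, 0, 1, 0)               # Ad, go to S2
        return (k, 1, 0, 1, 0, 0)                   # Sh, go to S3 if K else S1
    if state == 2:                                  # S2
        return (k, 1, 0, 1, 0, 0)                   # Sh, go to S3 if K else S1
    return (0, 0, 0, 0, 0, 1)                       # S3: Done, go to S0


mismatches = [
    inputs for inputs in product((0, 1), repeat=5)
    if control_equations(*inputs) != control_chart(*inputs)
]
print(len(mismatches))
```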
