11/17/2019 1
A. Computer Architecture
Single Cycle Datapath
The CPU
• Processor (CPU): the active part of the computer,
which does all the work (data manipulation and
decision-making)
– Datapath: portion of the processor which contains
hardware necessary to perform all operations
required by the computer
– Control: portion of the processor (also in
hardware) which tells the datapath what needs to
be done (the brain)
The Processor: Datapath & Control
Abstract View of the DataPath
• The data path contains 2 types of logic elements:
– Combinational: Elements that operate on data values. Their
outputs depend only on their current inputs. The ALU is a
combinational element.
– State: Elements with internal storage. Their state is defined
by the values they contain (memory and registers).
[Figure: abstract datapath: PC → Instruction memory (Address, Instruction) → Registers (Register #, Data) → ALU → Data memory (Address, Data)]
Clocking Methodology
Our Implementation
Clocking Methodology
[Figure: Registers → ALU → Register, with the Read and Write phases marked]
Instruction Datapath
[Figure: PC → Read address of the Instruction Memory → Instruction; an adder computes PC + 4]
• Instructions will be held in
the instruction memory
• The instruction to fetch is at
the location specified by the
PC
– Instr. = M[PC]
Note: Regular instruction width (32 bits for MIPS) makes this easy
• After we fetch one
instruction, the PC must be
incremented to the next
instruction
• All instructions are 4 bytes
• PC = PC + 4
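The fetch step above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original slides: the dictionary `instr_mem` and the function name `fetch` are assumptions, and the two instruction words are encodings of `add $t1,$t2,$t3` and `lw $t1,4($t2)`.

```python
# Hypothetical model: instruction memory as a dict keyed by byte address.
instr_mem = {
    0: 0x014B4820,  # add $t1, $t2, $t3
    4: 0x8D490004,  # lw  $t1, 4($t2)
}

def fetch(instr_mem, pc):
    """Return (instruction, next_pc) for a word-aligned PC."""
    if pc % 4 != 0:
        raise ValueError("PC must be word-aligned")
    instr = instr_mem[pc]             # Instr. = M[PC]
    next_pc = (pc + 4) & 0xFFFFFFFF   # PC = PC + 4, wrapped to 32 bits
    return instr, next_pc

instr, next_pc = fetch(instr_mem, 0)
```

Because every instruction is 4 bytes wide, the adder needs no information from the instruction itself to find the next one.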
R-type Instruction Datapath
[Figure: register file (read register numbers A and B, write register number, write register data, read register data A and B) feeding the ALU (Result, Zero)]
• R-type Instructions have three registers
– Two read (Rs, Rt) to provide data to the ALU
– One write (Rd) to receive data from the ALU
• We’ll need to specify the operation to the ALU (later...)
• We might be interested if the result of the ALU is zero (later...)
Memory Operations
[Figure: register file, 16-to-32-bit sign extend, ALU (Result, Zero), and Data Memory (Read address, Write address, Write data, Read data)]
• Memory operations first need to compute the effective address
– LW $t1, 450($s3) # E.A. = 450 + $s3
– Add together one register and 16 bits of immediate data
– Immediate data needs to be converted from 16-bit to 32-bit
• Memory then performs the load or store: a load writes the value read from memory into register Rt; a store writes the value of Rt to memory
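As a rough illustration of the effective-address computation, the sketch below sign-extends a 16-bit immediate and adds it to a base register value. The function names are hypothetical, and the base value 1000 for $s3 is made up for the example.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base, imm16):
    """E.A. = base register + sign-extended 16-bit immediate (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

# LW $t1, 450($s3) with an assumed $s3 value of 1000
ea = effective_address(1000, 450)
```

A negative offset such as -4 is stored in the instruction as 0xFFFC, which is why the immediate must be sign-extended rather than zero-extended.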
Branches
[Figure: the register file and ALU compare two registers (Zero goes to the control logic); the 16-bit offset is sign-extended to 32 bits, shifted left by 2, and added to PC + 4]
• Branches conditionally change the next instruction
– BEQ $2, $1, 42
– The offset is specified as the number of words to be added to the address of the next instruction (PC+4)
• Control logic has to decide if the branch is taken
– Uses the ‘zero’ output of the ALU
• Take the offset, multiply by 4
– Shift left two
• Add this to PC+4 (from the PC logic)
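The branch-target arithmetic can be written out as a small sketch. The function names and the PC value 100 are assumptions made for the example; the comparison models the ALU subtracting the two registers and checking the zero output.

```python
def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended word offset << 2)."""
    off = offset16 & 0xFFFF
    if off & 0x8000:          # sign-extend the 16-bit offset
        off -= 0x10000
    return (pc + 4 + (off << 2)) & 0xFFFFFFFF

def next_pc_beq(pc, offset16, reg_a, reg_b):
    """BEQ: take the branch only when the ALU's zero output is set."""
    zero = ((reg_a - reg_b) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF

# BEQ $2, $1, 42 fetched from an assumed address of 100
taken = next_pc_beq(100, 42, 7, 7)      # registers equal
not_taken = next_pc_beq(100, 42, 7, 8)  # registers differ
```

With an offset of 42 words, the taken target is 100 + 4 + 42*4 = 272, and the fall-through address is 104.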
Integrating the R-types and Memory
• R-types and Load/Stores are similar in many respects
• Differences:
– 2nd ALU source: R-types use a register, I-types use the
immediate
– Write Data: R-types use the ALU result, loads use the data read from memory
• Mux the conflicting datapaths together
[Figure: memory datapath with two muxes: one selects the 2nd ALU operand (register or sign-extended immediate), the other selects the register write data (ALU result or memory read data)]
Adding the instruction memory
Simply add the instruction memory and PC to the beginning of the datapath.
[Figure: PC, Add 4, and Instruction Memory (Instruction [31-0]) placed in front of the register file, ALU, and data memory datapath]
Adding the Branch Datapath
[Figure: the combined datapath with the branch adder (PC + 4 plus the sign-extended offset shifted left by 2) and a mux that selects the next PC]
Now we have the datapath for R-type, I-type, and branch instructions.
On to the control logic!
When does everything happen?
[Figure: single-cycle design; the clocked elements are marked clk]
Combinational Logic: Just does it! Outputs are always just a function of the inputs (with some delay).
Registers: Written at the end of the clock cycle (rising-edge triggered).
What do we need to control?
[Figure: single-cycle datapath annotated with the control points listed below]
• ALU: what is the operation?
• Memory: read, write, or neither?
• Mux: are we branching or not?
• Mux: where does the 2nd ALU operand come from?
• Registers: should we write data?
• Mux: result from ALU or memory?
Almost all of the information we need is in the instruction!
The ALU
• The ALU is stuck right in the middle of everything...
• It must:
– Add, Subtract, And, or Or for arithmetic instructions
– Subtract for a branch on equal
– Subtract and set for a SLT
– Add for a memory access
[Figure: 1-bit ALU slice with inputs A, B, Less, CarryIn, and BInvert; a 4-input mux (0-3) selected by Operation; outputs Result and CarryOut]
Function   BInvert  Op  CarryIn  Result
And        0        00  0        R = A AND B
Or         0        01  0        R = A OR B
Add        0        10  0        R = A + B
Subtract   1        10  1        R = A - B
SLT        1        11  1        R = 1 if A < B, 0 if A ≥ B
BInvert and CarryIn are always the same: combine them into one signal called “sub”
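The function table can be modelled at word level as follows. This is a sketch under stated assumptions, not the slide's gate-level design: it works on 32-bit values, the name `alu` and its argument order are made up, and SLT simply takes the sign bit of A - B, so signed overflow is ignored.

```python
MASK = 0xFFFFFFFF

def alu(a, b, sub, op):
    """Return (result, zero) for the 1-bit 'sub' and 2-bit 'op' controls."""
    a &= MASK
    b &= MASK
    if sub:
        b = ~b & MASK          # BInvert: one's complement of B
    carry_in = sub             # CarryIn is tied to the same signal
    if op == 0b00:
        result = a & b
    elif op == 0b01:
        result = a | b
    else:
        total = (a + b + carry_in) & MASK   # A + B, or A - B when sub = 1
        if op == 0b10:
            result = total
        else:                               # op == 0b11: set on less than
            result = (total >> 31) & 1      # sign bit of A - B
    return result, int(result == 0)
```

For example, `alu(5, 5, 1, 0b10)` subtracts equal values and sets the zero output, which is the case a branch on equal looks for.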
Setting the ALU controls
• The instruction Opcode and Function give us the info we need
– For R-type instructions, Opcode is zero, function code determines ALU controls
– For I-type instructions, Opcode determines ALU controls
• New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type
Instruction   Opcode  ALUOp  Funct. Code  ALU action  ALU control (sub, op)
add           R-type  10     100000       add         0  10
sub           R-type  10     100010       subtract    1  10
and           R-type  10     100100       and         0  00
or            R-type  10     100101       or          0  01
SLT           R-type  10     101010       SLT         1  11
load word     LW      00     xxxxxx       add         0  10
store word    SW      00     xxxxxx       add         0  10
branch equal  BEQ     01     xxxxxx       subtract    1  10
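One way to read the table is as a two-level lookup: ALUOp decides whether the function code matters at all. The sketch below encodes exactly the rows listed; the names `FUNCT_TO_CONTROL` and `alu_control` are illustrative assumptions.

```python
# (sub, op) for each R-type function code in the table
FUNCT_TO_CONTROL = {
    0b100000: (0, 0b10),  # add
    0b100010: (1, 0b10),  # sub
    0b100100: (0, 0b00),  # and
    0b100101: (0, 0b01),  # or
    0b101010: (1, 0b11),  # slt
}

def alu_control(alu_op, funct=0):
    """Map ALUOp (and funct, for R-types) to the (sub, op) ALU controls."""
    if alu_op == 0b00:            # memory access: add
        return (0, 0b10)
    if alu_op == 0b01:            # branch on equal: subtract
        return (1, 0b10)
    if alu_op == 0b10:            # R-type: the function code decides
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("ALUOp 11 is not used in this table")
```

Splitting the decoding this way keeps the main control unit small, since it only has to produce the 2-bit ALUOp.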
Decoding the Instruction - Data
The instruction holds the key to all of the data signals
R-type:
Field  Opcode         RS           RT           RD          ShAmt     Function
Bits   31-26          25-21        20-16        15-11       10-6      5-0
Use    To ctrl logic  Read reg. A  Read reg. B  Write reg.  Not used  To ALU control
Memory, Branch:
Field  Opcode         RS           RT                      Immediate Data
Bits   31-26          25-21        20-16                   15-0
Use    To ctrl logic  Read reg. A  Write reg./Read reg. B  Memory address or Branch Offset
One problem - Write register number must come from two different places.
Instruction Decoding
[Figure: single-cycle datapath with the instruction bus split into fields; a mux (0/1) selects the write register number]
• Read Reg A: Rs [25-21]
• Read Reg B: Rt [20-16]
• Write Reg: Either Rd [15-11] or Rt [20-16]
• Immediate Data: [15-0]
• Opcode: [31-26], sent to the control logic (Ctrl)
We can decode the data simply by dividing up the instruction bus
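In software, "dividing up the instruction bus" corresponds to shifting and masking. The sketch below is a minimal illustration; the function name `decode` and the dictionary keys are assumptions, and the two sample words encode `add $t1,$t2,$t3` and `lw $t1,4($t2)`.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fixed bit fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31-26
        "rs":    (instr >> 21) & 0x1F,   # bits 25-21
        "rt":    (instr >> 16) & 0x1F,   # bits 20-16
        "rd":    (instr >> 11) & 0x1F,   # bits 15-11
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6
        "funct": instr & 0x3F,           # bits 5-0
        "imm":   instr & 0xFFFF,         # bits 15-0
    }

add_fields = decode(0x014B4820)  # add $t1, $t2, $t3
lw_fields = decode(0x8D490004)   # lw  $t1, 4($t2)
```

All fields are extracted unconditionally, as in the hardware; the control signals decide later which of them are actually used.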
Control Signals
[Figure: single-cycle datapath with the control unit (Ctrl) and ALU control (ALU Ctrl) added; instruction fields Op:[31-26], Rs:[25-21], Rt:[20-16], Rd:[15-11], Imm:[15-0], FC:[5-0]]
ALU Control - A function of: ALUOp and the function code
– ALUOp: 00: Memory, 01: Branch, 10: R-type
Each control signal is asserted for:
– RegWrite: Load, R-type
– MemToReg: Load
– MemWrite: Store
– MemRead: Load
– ALUSrc: Memory
– PCSrc: BEQ and zero
– RegDest: R-type
Inside the control oval
• This control logic can be decoded in several ways:
– Random logic, PLA, PAL
• Just build hardware that looks for the 4 opcodes
– For each opcode, assert the appropriate signals
Instruction  Opcode  RegWrite  ALUSrc  MemToReg  RegDest  MemRead  MemWrite  PCSrc  ALUOp
BEQ          000100  0         0       x         x        0        0         1      01
R-format     000000  1         0       0         1        0        0         0      10
LW           100011  1         1       1         0        1        0         0      00
SW           101011  0         1       x         x        0        1         0      00
Legend: RegDest 0:Rt, 1:Rd; ALUSrc 0:Reg, 1:Imm; MemToReg 0:ALU, 1:Mem; PCSrc 1:Branch; ALUOp 00:Mem, 01:Branch, 10:R-type
Note: BEQ must also check the zero output of the ALU...
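The truth table can be sketched as a plain lookup keyed by opcode. The names below (`Control`, `CONTROL`, `pc_src`) are hypothetical, don't-care entries are represented by `None`, and the PCSrc column is stored as a `branch` flag that is combined with the ALU's zero output, as the note about BEQ requires.

```python
from collections import namedtuple

Control = namedtuple(
    "Control",
    "reg_write alu_src mem_to_reg reg_dest mem_read mem_write branch alu_op",
)

X = None  # don't care

CONTROL = {
    0b000100: Control(0, 0, X, X, 0, 0, 1, 0b01),  # BEQ
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: Control(1, 1, 1, 0, 1, 0, 0, 0b00),  # LW
    0b101011: Control(0, 1, X, X, 0, 1, 0, 0b00),  # SW
}

def pc_src(opcode, zero):
    """The branch mux select: the branch flag ANDed with the zero output."""
    return CONTROL[opcode].branch & zero
```

A PLA or random-logic implementation computes the same mapping in hardware; the dictionary only makes the table explicit.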
Control Unit Implementation
Control Signals
[Figure: datapath with control; the control unit drives RegDest, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, and BEQ]
We must AND BEQ and Zero to produce PCSrc.
Jumping
[Figure: datapath with jump support. The 26-bit field J:[25-0] is shifted left by 2 to give 28 bits and concatenated with the 4 bits [31-28] to form a 32-bit address; a mux controlled by the Jump signal selects it as the next PC]
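The jump-target wiring in the figure can be sketched as below. One assumption is labeled loudly: the figure only shows the bit range [31-28], and this sketch takes those four bits from PC + 4, as in the standard MIPS definition. The function name and the sample address are made up for illustration.

```python
def jump_target(pc, instr):
    """Build the 32-bit jump address from the 26-bit J field."""
    j_field = instr & 0x03FFFFFF              # J:[25-0]
    low28 = j_field << 2                      # shift left 2 -> 28 bits
    # Assumption: the upper 4 bits come from PC + 4.
    high4 = ((pc + 4) & 0xFFFFFFFF) & 0xF0000000
    return high4 | low28                      # concatenate to 32 bits

# j 0x00400010, fetched from an assumed address of 0x00400000
target = jump_target(0x00400000, 0x08100004)
```

Because only 28 bits come from the instruction, a jump can only reach addresses that share the same top four bits.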
Complete Control
Operation of the Datapath
• Let's see the stages of execution of an R-type instruction
add $t1,$t2,$t3:
1. An instruction is fetched from memory, the PC is incremented
2. Two registers $t2 and $t3 are read from the register file.
3. The ALU operates on the data read from the register file.
4. The result of the ALU is written into the register $t1.
• Let's look at lw $t1,offset($t2)
1. An instruction is fetched from memory, the PC is incremented
2. The register $t2 is read from the register file.
3. The ALU computes the sum of $t2 and the sign-extended offset.
4. The sum from the ALU is used as the address for the data memory.
5. The data from memory is written into register $t1.
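The two walkthroughs above can be traced with a toy single-step simulator. It is a sketch covering only `add` and `lw`; the function `step`, the memory dictionaries, and all register and memory values are assumptions chosen for the example.

```python
def step(pc, regs, imem, dmem):
    """Execute one instruction in a single 'cycle' and return the next PC."""
    instr = imem[pc]                          # 1. fetch, PC incremented
    next_pc = pc + 4
    op = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    a = regs[rs]                              # 2. register read
    if op == 0 and (instr & 0x3F) == 0b100000:    # add
        result = (a + regs[rt]) & 0xFFFFFFFF      # 3. ALU operates
        regs[rd] = result                         # 4. write back to Rd
    elif op == 0b100011:                          # lw
        imm = instr & 0xFFFF
        if imm & 0x8000:
            imm -= 0x10000                        # sign-extend the offset
        addr = (a + imm) & 0xFFFFFFFF             # 3. ALU computes the address
        regs[rt] = dmem[addr]                     # 4-5. read memory, write Rt
    else:
        raise NotImplementedError("only add and lw are modelled")
    return next_pc

regs = [0] * 32
regs[10], regs[11] = 100, 23                  # $t2 = 100, $t3 = 23
imem = {0: 0x014B4820, 4: 0x8D490004}         # add $t1,$t2,$t3 ; lw $t1,4($t2)
dmem = {104: 0xDEADBEEF}

pc = step(0, regs, imem, dmem)
after_add = regs[9]                           # $t1 after the add
pc = step(pc, regs, imem, dmem)
after_lw = regs[9]                            # $t1 after the load
```

Each call to `step` stands for one clock cycle: everything in the function body happens combinationally, and the register and PC updates take effect together at the end.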
Performance of Single-Cycle Machines
• Let's assume that the operation time for the following units is:
Memory - 2 nanoseconds (ns), ALU and adders - 2 ns, Register
file - 1 ns. We will assume that MUXs, control, sign-extension,
PC accesses, and wires have no delays.
• Which implementation is faster?
1. Every instruction operates in 1 clock cycle of fixed length.
2. Every instruction operates in a clock cycle of varying length.
• Let's look at the time needed by each instruction (in ns):
Inst.    Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
R-Type   2      1        2       0       1        6
Load     2      1        2       2       1        8
Store    2      1        2       2       -        7
Branch   2      1        2       -       -        5
Jump     2      -        -       -       -        2
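The totals in the table follow from summing the delays of the units each instruction class uses. The sketch below recomputes them from the stated unit times; the dictionary names are illustrative assumptions.

```python
# Unit delays in ns, as assumed on the slide
UNIT_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Functional units on the critical path of each instruction class
USES = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Load":   ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "Store":  ["fetch", "reg_read", "alu", "mem"],
    "Branch": ["fetch", "reg_read", "alu"],
    "Jump":   ["fetch"],
}

LATENCY_NS = {name: sum(UNIT_NS[u] for u in units) for name, units in USES.items()}
fixed_cycle_ns = max(LATENCY_NS.values())   # the slowest instruction sets the clock
```

The load uses every unit, so it defines the 8 ns worst-case path that a fixed-length cycle must accommodate.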
Fixed vs. Variable Cycle Length
• Let's assume a program has the following instruction mix: 24%
loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
• For the fixed cycle length the cycle time is 8 ns, long enough for
the longest instruction (load). Thus each instruction takes 8 ns
to execute.
• For the variable cycle time the average CPU clock cycle is:
8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.34 ns ≈ 6.3 ns
• The variable clock implementation is faster, but
it is extremely hard to implement.
• Variable clock implementation is 8/6.3 ≈ 1.27 times faster
• When instructions such as multiply and divide are added, which take
many times longer than a load, the fixed-cycle scheme becomes too slow,
because every instruction must take as long as the slowest one.
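The average above is a weighted sum, which is easy to check numerically. In this sketch the variable names are assumptions; the latencies and the instruction mix are the ones given on the slides.

```python
LATENCY_NS = {"Load": 8, "Store": 7, "R-type": 6, "Branch": 5, "Jump": 2}
MIX = {"Load": 0.24, "Store": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}

fixed_ns = max(LATENCY_NS.values())                          # every instruction: 8 ns
variable_ns = sum(LATENCY_NS[k] * MIX[k] for k in MIX)       # weighted average
speedup = fixed_ns / variable_ns
```

Without intermediate rounding the average is 6.34 ns and the speedup is about 1.26; the slide's 1.27 comes from rounding the average to 6.3 ns before dividing.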
Observations on the Single Cycle Design
• The single-cycle datapath is straightforward, but...
– It has to use 3 separate ALUs (the main ALU plus the PC + 4 and branch-target adders)
– It has separate Instruction and Data memories
– Cycle time is determined by the worst-case path
• A multi-cycle datapath might be better
– We can reuse some of the hardware
– We can combine the memories
– Cycle time is still constant, but instructions may
take differing numbers of cycles

3. Single Cycle Data Path in computer architecture
