CS 465
Computer Architecture
Fall 2009
Lecture 01: Introduction
Daniel Barbará (cs.gmu.edu/~dbarbara)
[Adapted from Computer Organization and Design,
Patterson & Hennessy, © 2005, UCB]
Course Administration
 Instructor: Daniel Barbará
dbarbara@gmu.edu
4420 Eng. Bldg.
 Text: Required: Computer Organization & Design –
The Hardware Software Interface, Patterson &
Hennessy, the 4th Edition
Grading Information
 Grade determinants
 Midterm Exam ~25%
 Final Exam 1 ~35%
 Homeworks ~40%
- Due at the beginning of class (or, if it’s code to be submitted
electronically, by 17:00 on the due date). No late assignments
will be accepted.
 Course prerequisites
 grade of C or better in CS 367
Acknowledgements
 Slides adopted from Dr. Zhong
 Contributions from Dr. Setia
 Slides also adopt materials from many other universities
 IMPORTANT:
- Slides are not intended as replacement for the text
- You spent the money on the book, please read it!
Course Topics (Tentative)
 Instruction set architecture (Chapter 2)
 MIPS
 Arithmetic operations & data (Chapter 3)
 System performance (Chapter 4)
 Processor (Chapter 5)
 Datapath and control
 Pipelining to improve performance (Chapter 6)
 Memory hierarchy (Chapter 7)
 I/O (Chapter 8)
Focus of the Course
 How computers work
 MIPS instruction set architecture
 The implementation of MIPS instruction set architecture – MIPS
processor design
 Issues affecting modern processors
 Pipelining – processor performance improvement
 Cache – memory system, I/O systems
Why Learn Computer Architecture?
 You want to call yourself a “computer scientist”
 Computer architecture impacts every other aspect of computer science
 You need to make a purchasing decision or offer “expert” advice
 You want to build software people use – sell many, many copies
(you need performance)
 Both hardware and software affect performance
- Algorithm determines number of source-level statements
- Language/compiler/architecture determine machine instructions (Chapter 2
and 3)
- Processor/memory determine how fast instructions are executed (Chapter 5,
6, and 7)
- Assessing and understanding performance (Chapter 4)
Outline Today
 Course logistics
 Computer architectures overview
 Trends in computer architectures
Computer Systems
 Software
 Application software – Word Processors, Email, Internet
Browsers, Games
 Systems software – Compilers, Operating Systems
 Hardware
 CPU
 Memory
 I/O devices (mouse, keyboard, display, disks, networks,……..)
[Figure: software layers. Applications software (e.g., LaTeX) runs on
systems software – compilers (gcc), assemblers (as), and the operating
system (virtual memory, file system, I/O device drivers) – which runs on
the hardware.]
[Figure: the instruction set is the boundary between software and hardware.]
Instruction Set Architecture
 One of the most important abstractions is ISA
 A critical interface between HW and SW
 Example: MIPS
 Desired properties
 Convenience (from software side)
 Efficiency (from hardware side)
What is Computer Architecture
 Programmer’s view: a pleasant environment
 Operating system’s view: a set of resources (hw
& sw)
 System architecture view: a set of components
 Compiler’s view: an instruction set architecture
with OS help
 Microprocessor architecture view: a set of
functional units
 VLSI designer’s view: a set of transistors
implementing logic
 Mechanical engineer’s view: a heater!
What is Computer Architecture
 Patterson & Hennessy: Computer
architecture = Instruction set architecture
+ Machine organization + Hardware
 For this course, computer architecture
mainly refers to ISA (Instruction Set
Architecture)
 Programmer-visible, serves as the boundary
between the software and hardware
 Modern ISA examples: MIPS, SPARC,
PowerPC, DEC Alpha
Organization and Hardware
 Organization: high-level aspects of a computer’s
design
 Principal components: memory, CPU, I/O, …
 How components are interconnected
 How information flows between components
 E.g. AMD Opteron 64 and Intel Pentium 4: same ISA
but different organizations
 Hardware: detailed logic design and the
packaging technology of a computer
 E.g. Pentium 4 and Mobile Pentium 4: nearly identical
organizations but different hardware details
Types of computers and their applications
 Desktop
 Run third-party software
 Office to home applications
 30 years old
 Servers
 Modern version of what used to be called mainframes,
minicomputers and supercomputers
 Large workloads
 Built using the same technology as desktops but with higher capacity
- Expandable
- Scalable
- Reliable
 Large spectrum: from low-end (file storage, small businesses) to
supercomputers (high end scientific and engineering
applications)
- Gigabytes to Terabytes to Petabytes of storage
 Examples: file servers, web servers, database servers
Types of computers…
 Embedded
 Microprocessors everywhere! (washing machines, cell phones,
automobiles, video games)
 Run one or a few applications
 Specialized hardware integrated with the application (not your
common processor)
 Usually stringent limitations (battery power)
 Low tolerance for failure (you don’t want your airplane avionics to
fail!)
 Becoming ubiquitous
 Engineered using processor cores
- The core allows the engineer to integrate other functions into the
processor for fabrication on the same chip
- Using hardware description languages: Verilog, VHDL
Where is the Market?
[Bar chart: millions of computers sold per year, 1998–2002.]

Year  Embedded  Desktop  Servers
1998       290       93        3
1999       488      114        3
2000       892      135        4
2001       862      129        4
2002      1122      131        5
In this class you will learn
 How programs written in a high-level language (e.g.,
Java) translate into the language of the hardware and
how the hardware executes them.
 The interface between software and hardware and how
software instructs hardware to perform the needed
functions.
 The factors that determine the performance of a program
 The techniques that hardware designers employ to
improve performance.
As a consequence, you will understand what features may
make one computer design better than another for a
particular application
High-level to Machine Language
High-level language program
(in C)
Assembly language program
(for MIPS)
Binary machine language program
(for MIPS)
Compiler
Assembler
Evolution…
 In the beginning there were only bits… and people spent
countless hours trying to program in machine language
01100011001 011001110100
 Finally before everybody went insane, the assembler
was invented: write in mnemonics called assembly
language and let the assembler translate (a one to one
translation)
Add A,B
 This wasn’t for everybody, obviously… (imagine writing modern
applications in assembly!), so high-level languages were born (and
with them compilers to translate them into assembly, a one-to-many
translation: one source statement becomes many machine instructions)
C= A*(SQRT(B)+3.0)
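To make the one-to-many translation concrete, here is a sketch in C (a hypothetical function; the exact instructions emitted depend on the compiler and target, but the shape is typical of MIPS output):

    #include <math.h>

    double compute(double a, double b) {
        /* One high-level statement... */
        double c = a * (sqrt(b) + 3.0);
        /* ...might compile to several MIPS instructions, roughly:
             sqrt.d $f2, $f14        # f2 = sqrt(b)
             add.d  $f2, $f2, $f16   # f2 = f2 + 3.0 (constant loaded into f16)
             mul.d  $f0, $f12, $f2   # f0 = a * f2 (return value in f0)
           One source statement, many machine instructions. */
        return c;
    }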
THE BIG IDEA
 Levels of abstraction: each layer provides its own
(simplified) view and hides the details of the next.
Instruction Set Architecture (ISA)
 ISA: An abstract interface between the hardware and the
lowest level software of a machine that encompasses all
the information necessary to write a machine language
program that will run correctly, including instructions,
registers, memory access, I/O, and so on.
“... the attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data flows and
controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964
 Enables implementations of varying cost and performance to run
identical software
 ABI (application binary interface): The user portion of the
instruction set plus the operating system interfaces used
by application programmers. Defines a standard for
binary portability across computers.
ISA Type Sales
[Bar chart: millions of processors sold per year, 1998–2002, stacked by
ISA: ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other.
PowerPoint “comic” bar chart with approximate values (see text for
correct values).]
Organization of a computer
[Figure: high-level block diagram of a computer’s organization.]
Anatomy of Computer
[Figure: a personal computer’s five classic components – the processor
(control, the “brain”, plus datapath, the “brawn”), memory (where
programs and data live when running), input devices (keyboard, mouse),
output devices (display, printer), and disk (where programs and data
live when not running).]
5 classic components
 Datapath: performs arithmetic operations
 Control: guides the operation of other components based on the user
instructions
PC Motherboard Closeup
[Photo: closeup of a PC motherboard.]
Inside the Pentium 4
[Photo: Pentium 4 die.]
Moore’s Law
 In 1965, Gordon Moore predicted that the number of
transistors that can be integrated on a die would double
every 18 to 24 months (i.e., grow exponentially with
time).
 Amazingly visionary – million transistor/chip barrier was
crossed in the 1980’s.
 2300 transistors, 1 MHz clock (Intel 4004) - 1971
 16 Million transistors (Ultra Sparc III)
 42 Million transistors, 2 GHz clock (Intel Xeon) – 2001
 55 Million transistors, 3 GHz, 130nm technology, 250 mm² die
(Intel Pentium 4) - 2004
 140 Million transistor (HP PA-8500)
Processor Performance Increase
[Chart: SPECint performance, 1987–2003, log scale from 1 to 10,000.
Data points include SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000,
HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha
5/300, DEC Alpha 5/500, DEC Alpha 21264/600, DEC Alpha 21264A/667,
Intel Xeon/2000, Intel Pentium 4/3000.]
Moore’s Law
Trend: Microprocessor Capacity
[Chart: transistors per chip, 1970–2000, log scale from 1,000 to
100,000,000: i4004, i8080, i8086, i80286, i80386, i80486, Pentium.
CMOS improvements: die size 2X every 3 yrs; line width halves every 7 yrs.
Transistor counts: Itanium II 241 million; Pentium 4 55 million; Alpha
21264 15 million; Alpha 21164 9.3 million; PowerPC 620 6.9 million;
Pentium Pro 5.5 million; Sparc Ultra 5.2 million.]
Moore’s Law
 “Cramming More Components onto Integrated Circuits”
 Gordon Moore, Electronics, 1965
 # of transistors per cost-effective integrated circuit doubles every 18 months
“Transistor capacity doubles every 18-24 months”
Speed 2x / 1.5 years (since ‘85);
100X performance in last decade
Trend: Microprocessor Performance
[Chart: microprocessor performance over time.]
Memory
 Dynamic Random Access Memory (DRAM)
 The choice for main memory
 Volatile (contents go away when power is lost)
 Fast
 Relatively small
 DRAM capacity: 2x / 2 years (since ‘96);
64x size improvement in last decade
 Static Random Access Memory (SRAM)
 The choice for cache
 Much faster than DRAM, but less dense and more costly
 Magnetic disks
 The choice for secondary memory
 Non-volatile
 Slower
 Relatively large
 Capacity: 2x / 1 year (since ‘97)
250X size in last decade
 Solid state (Flash) memory
 The choice for embedded computers
 Non-volatile
Memory
 Optical disks
 Removable, therefore capacity is effectively unlimited
 Slower than disks
 Magnetic tape
 Even slower
 Sequential (non-random) access
 The choice for archival
DRAM Capacity Growth
[Chart: DRAM capacity (Kbits) per chip by year of introduction,
1976–2002, log scale: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M,
512M.]
Trend: Memory Capacity
[Chart: DRAM bits per chip by year, 1970–2000, log scale.]

Year   Size (Mbit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1998   128
2000   256
2002   512
2006   2048

• Now 1.4X/yr, or 2X every 2 years
• More than 10,000X since 1980!
Growth of capacity per chip
(Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta – each prefix is 1024×
the previous)
Come up with a clever mnemonic, fame!
Dramatic Technology Change
 State-of-the-art PC when you graduate:
(at least…)
 Processor clock speed: 5000 MegaHertz
(5.0 GigaHertz)
 Memory capacity: 4000 MegaBytes
(4.0 GigaBytes)
 Disk capacity: 2000 GigaBytes
(2.0 TeraBytes)
 New units! Mega => Giga, Giga => Tera
Example Machine Organization
 Workstation design target
 25% of cost on processor
 25% of cost on memory (minimum memory size)
 Rest on I/O devices, power supplies, box
[Figure: the computer – CPU (control + datapath), memory, and
input/output devices.]
MIPS R3000 Instruction Set Architecture
 Instruction Categories
 Load/Store
 Computational
 Jump and Branch
 Floating Point
- coprocessor
 Memory Management
 Special
Registers: R0 - R31, PC, HI, LO

3 Instruction Formats, all 32 bits wide:
R-type: OP | rs | rt | rd | sa | funct
I-type: OP | rs | rt | immediate
J-type: OP | jump target
Defining Performance
 Which airplane has the best performance?
(§1.4 Performance)
[Charts: Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50
compared on passenger capacity, cruising range (miles), cruising speed
(mph), and passengers × mph.]
Response Time and Throughput
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
- e.g., tasks/transactions/… per hour
 How are response time and throughput affected by
 Replacing the processor with a faster version?
 Adding more processors?
 We’ll focus on response time for now…
Relative Performance
 Define Performance = 1/Execution Time
 “X is n times faster than Y”
n = Performance_X / Performance_Y = Execution time_Y / Execution time_X
 Example: time taken to run a program
 10s on A, 15s on B
 Execution Time_B / Execution Time_A = 15s / 10s = 1.5
 So A is 1.5 times faster than B
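The same arithmetic as a minimal C sketch (values taken from the example above):

    #include <stdio.h>

    int main(void) {
        double time_a = 10.0, time_b = 15.0;   /* seconds on A and on B */

        /* Performance = 1/Execution time, so
           Performance_A / Performance_B = Time_B / Time_A. */
        double n = time_b / time_a;
        printf("A is %.1f times faster than B\n", n);   /* prints 1.5 */
        return 0;
    }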
Measuring Execution Time
 Elapsed time
 Total response time, including all aspects
- Processing, I/O, OS overhead, idle time
 Determines system performance
 CPU time
 Time spent processing a given job
- Discounts I/O time, other jobs’ shares
 Comprises user CPU time and system CPU time
 Different programs are affected differently by CPU and system
performance
CPU Clocking
 Operation of digital hardware is governed by a constant-rate clock
[Timing diagram: each clock period consists of data transfer and
computation, followed by a state update.]
 Clock period: duration of a clock cycle
 e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
 Clock frequency (rate): cycles per second
 e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
CPU Time
 Performance improved by
 Reducing number of clock cycles
 Increasing clock rate
 Hardware designer must often trade off clock rate against cycle
count
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
CPU Time Example
 Computer A: 2GHz clock, 10s CPU time
 Designing Computer B
 Aim for 6s CPU time
 Can do faster clock, but causes 1.2 × clock cycles
 How fast must Computer B clock be?
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20×10⁹ cycles
Clock Cycles_B = 1.2 × Clock Cycles_A = 24×10⁹ cycles
Clock Rate_B = Clock Cycles_B / CPU Time_B = 24×10⁹ / 6s = 4GHz
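A minimal C check of this example (variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        /* Computer A: 2 GHz clock, 10 s of CPU time. */
        double rate_a   = 2e9;               /* Hz */
        double cycles_a = 10.0 * rate_a;     /* time x rate = 20e9 cycles */

        /* Computer B: must finish in 6 s but needs 1.2x as many cycles. */
        double cycles_b = 1.2 * cycles_a;    /* 24e9 cycles */
        double rate_b   = cycles_b / 6.0;    /* required clock rate */

        printf("Computer B needs a %.1f GHz clock\n", rate_b / 1e9);  /* 4.0 */
        return 0;
    }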
Instruction Count and CPI
 Instruction Count for a program
 Determined by program, ISA and compiler
 Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI
- Average CPI affected by instruction mix
Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time
         = Instruction Count × CPI / Clock Rate
CPI Example
 Computer A: Cycle Time = 250ps, CPI = 2.0
 Computer B: Cycle Time = 500ps, CPI = 1.2
 Same ISA
 Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A
           = I × 2.0 × 250ps = 500ps × I        ← A is faster…
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B
           = I × 1.2 × 500ps = 600ps × I
CPU Time_B / CPU Time_A = (600ps × I) / (500ps × I) = 1.2   ← …by this much
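The comparison in C (the instruction count cancels, so any value works):

    #include <stdio.h>

    int main(void) {
        double ic = 1e9;   /* any count; the ratio is independent of it */

        /* CPU time = Instruction count x CPI x Cycle time */
        double time_a = ic * 2.0 * 250e-12;  /* A: CPI 2.0, 250 ps cycle */
        double time_b = ic * 1.2 * 500e-12;  /* B: CPI 1.2, 500 ps cycle */

        printf("A: %.2f s  B: %.2f s  B/A = %.1f\n",
               time_a, time_b, time_b / time_a);  /* B/A = 1.2 */
        return 0;
    }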
CPI in More Detail
 If different instruction classes take different numbers of
cycles
Clock Cycles = Σ (CPI_i × Instruction Count_i), summed over the n
instruction classes i = 1…n

 Weighted average CPI
CPI = Clock Cycles / Instruction Count
    = Σ (CPI_i × Instruction Count_i / Instruction Count)

where Instruction Count_i / Instruction Count is the relative frequency
of class i
CPI Example
 Alternative compiled code sequences using instructions in classes A,
B, C
Class             A  B  C
CPI for class     1  2  3
IC in sequence 1  2  1  2
IC in sequence 2  4  1  1
 Sequence 1: IC = 5
 Clock Cycles
= 2×1 + 1×2 + 2×3
= 10
 Avg. CPI = 10/5 = 2.0
 Sequence 2: IC = 6
 Clock Cycles
= 4×1 + 1×2 + 1×3
= 9
 Avg. CPI = 9/6 = 1.5
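A short C sketch of the weighted-CPI computation for the two sequences above:

    #include <stdio.h>

    int main(void) {
        double cpi[3]  = {1, 2, 3};      /* CPI for classes A, B, C */
        double seq1[3] = {2, 1, 2};      /* instruction counts, sequence 1 */
        double seq2[3] = {4, 1, 1};      /* instruction counts, sequence 2 */
        const double *seqs[2] = {seq1, seq2};

        for (int s = 0; s < 2; s++) {
            double cycles = 0, count = 0;
            for (int i = 0; i < 3; i++) {
                cycles += cpi[i] * seqs[s][i];   /* sum of CPI_i x IC_i */
                count  += seqs[s][i];
            }
            printf("Sequence %d: %.0f cycles, avg CPI = %.1f\n",
                   s + 1, cycles, cycles / count);
        }
        return 0;   /* prints 10 cycles, CPI 2.0; then 9 cycles, CPI 1.5 */
    }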
Performance Summary
 Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc
The BIG Picture
CPU Time = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle
Power Trends
 In CMOS IC technology (§1.5 The Power Wall)
Power = Capacitive load × Voltage² × Frequency
[Chart: clock frequency grew ×1000 while supply voltage dropped from
5V to 1V; power grew ×30.]
Reducing Power
 Suppose a new CPU has
 85% of capacitive load of old CPU
 15% voltage and 15% frequency reduction
P_new / P_old = (C_old × 0.85) × (V_old × 0.85)² × (F_old × 0.85) /
                (C_old × V_old² × F_old)
              = 0.85⁴ ≈ 0.52
 The power wall
 We can’t reduce voltage further
 We can’t remove more heat
 How else can we improve performance?
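The power ratio as a one-liner in C:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Power ~ C x V^2 x F. With 85% capacitive load and 15% reductions
           in both voltage and frequency, every factor scales by 0.85. */
        printf("P_new / P_old = %.2f\n", pow(0.85, 4));   /* about 0.52 */
        return 0;
    }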
Uniprocessor Performance
(§1.6 The Sea Change: The Switch to Multiprocessors)
[Chart: uniprocessor performance over time – growth now constrained by
power, instruction-level parallelism, and memory latency.]
Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Compare with instruction level parallelism
- Hardware executes multiple instructions at once
- Hidden from the programmer
 Hard to do
- Programming for performance
- Load balancing
- Optimizing communication and synchronization
SPEC CPU Benchmark
 Programs used to measure performance
 Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)
 Develops benchmarks for CPU, I/O, Web, …
 SPEC CPU2006
 Elapsed time to execute a selection of programs
- Negligible I/O, so focuses on CPU performance
 Normalize relative to reference machine
 Summarize as geometric mean of performance ratios
- CINT2006 (integer) and CFP2006 (floating-point)
SPECratio summary = (∏ Execution time ratio_i)^(1/n), the geometric
mean over the n benchmarks
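A small C sketch of the geometric-mean summary (the SPECratios are taken from the CINT2006 table on the next slide):

    #include <stdio.h>
    #include <math.h>

    /* Geometric mean: the n-th root of the product of the ratios.
       Computed in log space to avoid overflow for large n. */
    double geometric_mean(const double *ratios, int n) {
        double log_sum = 0.0;
        for (int i = 0; i < n; i++)
            log_sum += log(ratios[i]);
        return exp(log_sum / n);
    }

    int main(void) {
        double specratios[] = {15.3, 11.8, 11.1, 6.8, 14.6, 10.5,
                               14.5, 19.8, 22.3, 9.1, 9.1, 6.0};
        int n = sizeof specratios / sizeof specratios[0];
        printf("Geometric mean = %.1f\n",
               geometric_mean(specratios, n));   /* about 11.7 */
        return 0;
    }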
CINT2006 for Opteron X4 2356
Name        Description                    IC×10⁹   CPI   Tc (ns)  Exec time  Ref time  SPECratio
perl        Interpreted string processing   2,118   0.75   0.40      637       9,777      15.3
bzip2       Block-sorting compression       2,389   0.85   0.40      817       9,650      11.8
gcc         GNU C Compiler                  1,050   1.72   0.47      724       8,050      11.1
mcf         Combinatorial optimization        336  10.00   0.40    1,345       9,120       6.8
go          Go game (AI)                    1,658   1.09   0.40      721      10,490      14.6
hmmer       Search gene sequence            2,783   0.80   0.40      890       9,330      10.5
sjeng       Chess game (AI)                 2,176   0.96   0.48      837      12,100      14.5
libquantum  Quantum computer simulation     1,623   1.61   0.40    1,047      20,720      19.8
h264avc     Video compression               3,102   0.80   0.40      993      22,130      22.3
omnetpp     Discrete event simulation         587   2.94   0.40      690       6,250       9.1
astar       Games/path finding              1,082   1.79   0.40      773       7,020       9.1
xalancbmk   XML parsing                     1,058   2.70   0.40    1,143       6,900       6.0
Geometric mean                                                                             11.7
(High-CPI entries such as mcf reflect high cache miss rates.)
SPEC Power Benchmark
 Power consumption of server at different workload levels
 Performance: ssj_ops/sec
 Power: Watts (Joules/sec)
Overall ssj_ops per Watt = Σ ssj_ops_i / Σ power_i, summed over the 11
load levels i = 0…10
SPECpower_ssj2008 for X4
Target Load % Performance (ssj_ops/sec) Average Power (Watts)
100% 231,867 295
90% 211,282 286
80% 185,803 275
70% 163,427 265
60% 140,160 256
50% 118,324 246
40% 92,035 233
30% 70,500 222
20% 47,126 206
10% 23,066 180
0% 0 141
Overall sum 1,283,590 2,605
∑ssj_ops/ ∑power 493
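Verifying the overall figure in C, using the rows of the table above (with the 40% entry read as 92,035):

    #include <stdio.h>

    int main(void) {
        /* Performance (ssj_ops/sec) and average power (W), 100% load down
           to 0%. */
        double ssj_ops[11] = {231867, 211282, 185803, 163427, 140160,
                              118324, 92035, 70500, 47126, 23066, 0};
        double watts[11]   = {295, 286, 275, 265, 256,
                              246, 233, 222, 206, 180, 141};

        double ops_sum = 0, power_sum = 0;
        for (int i = 0; i < 11; i++) {
            ops_sum   += ssj_ops[i];
            power_sum += watts[i];
        }
        printf("Overall ssj_ops per watt = %.0f\n",
               ops_sum / power_sum);   /* about 493 */
        return 0;
    }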
Pitfall: Amdahl’s Law
 Improving an aspect of a computer and expecting a proportional
improvement in overall performance
(§1.8 Fallacies and Pitfalls)

T_improved = T_affected / improvement factor + T_unaffected

 Example: multiply accounts for 80s/100s
 How much improvement in multiply performance to get 5× overall?
 We would need 20 = 80/n + 20, i.e., 80/n = 0 – can’t be done!
 Corollary: make the common case fast
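A small C loop showing how the overall speedup saturates (program and split taken from the example above):

    #include <stdio.h>

    int main(void) {
        /* Amdahl's Law: T_improved = T_affected / factor + T_unaffected.
           Multiply accounts for 80 s of a 100 s program. */
        double t_affected = 80.0, t_unaffected = 20.0;

        for (double n = 2; n <= 64; n *= 2) {
            double t = t_affected / n + t_unaffected;
            printf("multiply %2.0fx faster: total %5.1f s (%.2fx overall)\n",
                   n, t, 100.0 / t);
        }
        /* Even an infinitely fast multiply leaves the 20 s unaffected part,
           so the overall speedup can never reach 100/20 = 5x. */
        return 0;
    }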
Fallacy: Low Power at Idle
 Look back at X4 power benchmark
 At 100% load: 295W
 At 50% load: 246W (83%)
 At 10% load: 180W (61%)
 Google data center
 Mostly operates at 10% – 50% load
 At 100% load less than 1% of the time
 Consider designing processors to make power
proportional to load
Pitfall: MIPS as a Performance Metric
 MIPS: Millions of Instructions Per Second
 Doesn’t account for
- Differences in ISAs between computers
- Differences in complexity between instructions
MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10⁶)
     = Clock rate / (CPI × 10⁶)
 CPI varies between programs on a given CPU
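A hypothetical two-machine comparison in C showing why the metric misleads (both machines and all numbers are made up for illustration):

    #include <stdio.h>

    int main(void) {
        double clock = 2e9;          /* 2 GHz clock on both machines */

        /* Same program: machine 1's ISA needs more, simpler instructions. */
        double ic1 = 4e9, cpi1 = 1.0;
        double ic2 = 1e9, cpi2 = 2.0;

        double time1 = ic1 * cpi1 / clock;     /* 2.0 s */
        double time2 = ic2 * cpi2 / clock;     /* 1.0 s */
        double mips1 = clock / (cpi1 * 1e6);   /* 2000 MIPS */
        double mips2 = clock / (cpi2 * 1e6);   /* 1000 MIPS */

        /* Machine 1 has twice the MIPS rating yet takes twice as long. */
        printf("machine 1: %4.0f MIPS, %.1f s\n", mips1, time1);
        printf("machine 2: %4.0f MIPS, %.1f s\n", mips2, time2);
        return 0;
    }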
Concluding Remarks (§1.9)
 Cost/performance is improving
 Due to underlying technology development
 Hierarchical layers of abstraction
 In both hardware and software
 Instruction set architecture
 The hardware/software interface
 Execution time: the best performance measure
 Power is a limiting factor
 Use parallelism to improve performance