PPoPP 2010, Bangalore, India.
Basic Architecture Concepts
CPU Architecture
4 stages of instruction execution: Fetch, Decode, Execute, Write
Executing one instruction at a time gives too many cycles per instruction (CPI)
To reduce the CPI, introduce pipelined execution
Needs buffers (pipeline registers) to store results between stages
A cache to handle slow memory access times
Further additions: multilevel caches, out-of-order execution, branch prediction, ...
[Figure: pipelined execution, four instructions each passing through Fetch, Decode, Execute, Write, overlapped in time (t = 1, 2, 3, 4, 5); a cache alongside the pipeline]
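To make the CPI argument concrete, here is a minimal Python sketch. It assumes an ideal 4-stage pipeline (every stage takes one cycle, no stalls or hazards) and uses the standard single-level average memory access time (AMAT) formula to show what the cache buys. The function names and the cache numbers are hypothetical illustrations, not figures from the slides.

```python
# Minimal sketch, assuming an ideal k-stage pipeline: one cycle per stage,
# no hazards, no stalls, no cache misses inside the pipeline.


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline (n_stages cycles); after that,
    one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def cpi(total_cycles: int, n_instructions: int) -> float:
    """Cycles per instruction."""
    return total_cycles / n_instructions


def average_memory_access_time(hit_time: float, miss_rate: float,
                               miss_penalty: float) -> float:
    """Single-level cache: AMAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    stages = 4  # Fetch, Decode, Execute, Write
    for n in (4, 1000):
        plain = unpipelined_cycles(n, stages)
        piped = pipelined_cycles(n, stages)
        print(f"{n} instructions: CPI {cpi(plain, n):.2f} unpipelined, "
              f"{cpi(piped, n):.2f} pipelined")
    # Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle memory access.
    print(average_memory_access_time(1, 0.05, 100))
```

With 4 instructions the unpipelined version needs 16 cycles (CPI 4) and the pipelined one 7 cycles (CPI 1.75); as the instruction count grows, the pipelined CPI approaches 1. The hypothetical cache parameters give an average access time of about 6 cycles instead of 100.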
Basic Architecture Concepts
CPU architecture is getting too complex.
The added complexity is not translating into equivalent performance benefits.
Traditional CPU architectures need a rethink.
Basic Architecture Concepts
Couple this with the new conventional wisdom in computer architecture:
- Memory Wall: memory latencies are far higher than processor cycle times
- ILP Wall: diminishing returns from instruction-level parallelism
- Power Wall: power consumption grows as clock rates increase
Multicore is the way forward
Examples: GPUs, Cell, Intel quad-core, ...
100+ core computers were predicted to become a reality soon.
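The Power Wall can be made concrete with the usual CMOS dynamic-power model, P = αCV²f (activity factor, switched capacitance, supply voltage, clock frequency). The sketch below assumes, as a simplification, that running at a higher clock requires raising the supply voltage roughly in proportion; the function names are hypothetical.

```python
# Sketch of the CMOS dynamic-power model P = alpha * C * V^2 * f.
# Simplifying assumption: a higher clock needs a proportionally higher voltage.


def dynamic_power(activity: float, capacitance: float, voltage: float,
                  frequency: float) -> float:
    """Switching power in watts (farads, volts, hertz)."""
    return activity * capacitance * voltage ** 2 * frequency


def power_ratio_when_scaling_clock(factor: float, scale_voltage: bool = True) -> float:
    """Dynamic power relative to the baseline after multiplying the clock by
    `factor`. If scale_voltage is True, voltage is assumed to rise by the same
    factor; otherwise voltage stays fixed."""
    v_factor = factor if scale_voltage else 1.0
    return v_factor ** 2 * factor


if __name__ == "__main__":
    # One core at twice the clock (voltage scaled along with it).
    print(power_ratio_when_scaling_clock(2.0))
    # Two cores at the original clock and voltage.
    print(2 * power_ratio_when_scaling_clock(1.0))
```

Under this assumption, doubling the clock costs about 8x the dynamic power, while two cores at the original clock cost about 2x for, ideally, up to 2x the throughput. This is one way to see why multicore is attractive once the Power Wall is hit.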
Multicore and Manycore Processors
IBM Cell
NVIDIA GeForce 8800 (128 scalar processors) and Tesla
Sun T1 and T2
Tilera Tile64
Picochip combines 430 simple RISC cores
Cisco 188
TRIPS 
The Case for the GPUs
GPUs are now common, and they offer more computing power per dollar than CPUs.
Today's computer systems have both a CPU and a GPU, with the GPU used primarily for graphics.
GPUs are good at some tasks and not so good at others.
They are especially good at processing large data sets such as images.
Let us use the right processor for the right task.
Goal: increase the overall throughput of the computer system on the given task by using the CPU and GPU synergistically.
Evolution of GPUs
Graphics: a few hundred triangles/vertices map to a few hundred thousand pixels
Process pixels in parallel: do the same thing on a large number of different items.
Data-parallel model: the parallelism is provided by the data
Thousands to millions of data elements
Same program/instruction on all of them
Hardware by 2005: 8-16 cores to process vertices and 64-128 to process pixels
Less versatile than CPU cores
SIMD mode of computation, so less hardware is needed for instruction issue
No caching, branch prediction, out-of-order execution, etc.
More cores can be packed into the same silicon die area
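The data-parallel model described above can be sketched in Python: one small kernel is applied independently to every pixel, and because no output depends on any other, the elements can be processed in any order, or all at once on a GPU. The sequential loop here only stands in for the hardware; `brighten` is a hypothetical per-pixel kernel chosen for illustration.

```python
import random
from typing import Callable, List


def brighten(pixel: int) -> int:
    """Hypothetical per-pixel kernel: add 40 to an 8-bit intensity, clamped at 255."""
    return min(pixel + 40, 255)


def data_parallel_map(kernel: Callable[[int], int], pixels: List[int],
                      shuffle: bool = False) -> List[int]:
    """Run `kernel` once per element. Each output depends only on its own
    input, so the processing order does not matter."""
    out = [0] * len(pixels)
    order = list(range(len(pixels)))
    if shuffle:
        random.shuffle(order)  # emulate an arbitrary hardware schedule
    for i in order:
        out[i] = kernel(pixels[i])
    return out


if __name__ == "__main__":
    image = [0, 100, 200, 250]
    print(data_parallel_map(brighten, image))
    print(data_parallel_map(brighten, image, shuffle=True))  # same result
```

Every element executes the same instruction stream, which is what lets a GPU share instruction-issue hardware across many simple cores (the SIMD point above).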
GPUs as a Case Study
GPGPU: General-Purpose Programming on GPUs
Originally done through OpenGL extensions
Very difficult to program
Recently, manufacturers started supporting C-like programming abstractions to program GPUs
CUDA from NVIDIA
Other benefits of GPGPU
Affordable cost, easy availability, computational power
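To illustrate the C-like abstraction CUDA offers, the sketch below emulates in Python (it is not real CUDA) how a kernel launch maps threads to data. In CUDA C each thread typically computes its global index as `blockIdx.x * blockDim.x + threadIdx.x` and guards with `if (i < n)`, since the grid may be larger than the data. The `launch` helper is a hypothetical stand-in for the `kernel<<<grid, block>>>(...)` launch syntax; the other names are also illustrative.

```python
from typing import Callable, List


def vector_add_kernel(block_idx: int, block_dim: int, thread_idx: int,
                      a: List[float], b: List[float], c: List[float]) -> None:
    """What one CUDA thread would do for c = a + b."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA C
    if i < len(c):  # the last block may have threads past the end of the data
        c[i] = a[i] + b[i]


def blocks_needed(n: int, block_dim: int) -> int:
    """Ceiling division: enough blocks to cover all n elements."""
    return (n + block_dim - 1) // block_dim


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Hypothetical stand-in for kernel<<<grid_dim, block_dim>>>(args).
    Runs every (block, thread) pair sequentially; a GPU runs them in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(a: List[float], b: List[float], block_dim: int = 4) -> List[float]:
    """Host-side wrapper: allocate the output, size the grid, launch."""
    c = [0.0] * len(a)
    launch(vector_add_kernel, blocks_needed(len(a), block_dim), block_dim, a, b, c)
    return c


if __name__ == "__main__":
    n = 10
    print(vector_add([float(i) for i in range(n)], [10.0] * n))
```

With 10 elements and 4 threads per block, 3 blocks (12 threads) are launched and the bounds check leaves the last 2 threads idle.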
GPUs as a Case Study
GPUs are suited to routines with high arithmetic intensity (many arithmetic operations per memory access).
Memory latency is high, and it depends on the nature of the access.
Memory accesses should be overlapped with arithmetic.
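Arithmetic intensity is usually measured as floating-point operations per byte of memory traffic. The sketch below compares two textbook kernels and adds a deliberately simplified timing model for overlapping memory with arithmetic. It assumes 4-byte single-precision floats and idealized traffic (each operand read once, each result written once); the function names are hypothetical.

```python
FLOAT_BYTES = 4  # assumption: single-precision floats


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations per byte of memory traffic."""
    return flops / bytes_moved


def saxpy_intensity(n: int) -> float:
    """y[i] = a * x[i] + y[i]: 2 flops per element; read x[i] and y[i], write y[i]."""
    return arithmetic_intensity(2 * n, 3 * FLOAT_BYTES * n)


def matmul_intensity(n: int) -> float:
    """C = A B for n x n matrices: 2 n^3 flops. Traffic assumes A and B are read
    once and C written once, which needs good on-chip reuse (best case)."""
    return arithmetic_intensity(2 * n ** 3, 3 * FLOAT_BYTES * n ** 2)


def kernel_time(flops: float, bytes_moved: float, peak_flops: float,
                bandwidth: float, overlap: bool = True) -> float:
    """Simplified model: with overlap, the slower of compute and memory
    dominates; without overlap, the two times add up."""
    compute = flops / peak_flops
    memory = bytes_moved / bandwidth
    return max(compute, memory) if overlap else compute + memory


if __name__ == "__main__":
    print(saxpy_intensity(1_000_000))  # independent of n
    print(matmul_intensity(1024))      # grows as n / 6
```

SAXPY stays at 1/6 flop per byte regardless of size, so it is bound by memory bandwidth; the matrix product's intensity grows with n, which is the kind of routine GPUs handle well. The timing model shows why overlap matters: when compute and memory can proceed concurrently, the cost is the larger of the two rather than their sum.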
CPU Vs GPU
CPU: a few powerful cores vs. GPU: lots of small cores
GPUs: for good performance, applications need high arithmetic intensity
GPUs: no system-managed cache
GPGPU as a Case Study
Regular algorithms
Map well to the data-parallel model of GPUs
Each work item operates by itself or with a few neighbors
Example setting: image processing
Threads can share data, e.g., apron pixels (the border region a tile also needs from its neighbors) in an image-processing kernel
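Apron pixels can be illustrated with a tiled box blur, shown here in one dimension for brevity. Each tile (one "thread block") copies its pixels plus `radius` apron pixels on each side into a local buffer that stands in for GPU shared memory, then every "thread" computes one output from that buffer. This is a Python emulation under those assumptions, with clamp-to-edge boundary handling chosen for illustration; a direct version is included to check that tiling does not change the result.

```python
from typing import List


def clamp(i: int, n: int) -> int:
    """Clamp-to-edge indexing for pixels outside the image."""
    return max(0, min(i, n - 1))


def box_blur_direct(src: List[float], radius: int) -> List[float]:
    """Reference: average of the 2*radius+1 neighbours of every pixel."""
    n = len(src)
    width = 2 * radius + 1
    return [sum(src[clamp(i + k, n)] for k in range(-radius, radius + 1)) / width
            for i in range(n)]


def box_blur_tiled(src: List[float], radius: int, tile: int) -> List[float]:
    """Same blur, computed tile by tile from a local buffer with an apron."""
    n = len(src)
    width = 2 * radius + 1
    out = [0.0] * n
    for start in range(0, n, tile):  # one "thread block" per tile
        # Load the tile plus `radius` apron pixels on each side. Neighbouring
        # tiles load overlapping aprons: this is the data the threads share.
        local = [src[clamp(j, n)] for j in range(start - radius, start + tile + radius)]
        for t in range(min(tile, n - start)):  # one "thread" per output pixel
            # Output pixel start+t sits at local index t+radius, so its window
            # is local[t : t + width].
            out[start + t] = sum(local[t:t + width]) / width
    return out


if __name__ == "__main__":
    image = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(box_blur_tiled(image, radius=1, tile=3))
    print(box_blur_tiled(image, radius=1, tile=3) == box_blur_direct(image, radius=1))
```

The apron is what makes the work items "operate with a few neighbors": each tile reads slightly more than it writes, but all reads come from fast local storage once loaded.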
GPU as a Case Study
Irregular algorithms
Applications whose data accesses are not regular in nature
Occur in settings such as graph algorithms, building data structures, etc.
High efficiency is difficult to achieve because of the high memory latency of these accesses.
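Breadth-first search is a typical irregular workload. The sketch below assumes the graph is stored in compressed sparse row (CSR) form, a common layout that the slides do not specify, and processes one frontier level at a time, as GPU BFS implementations often do. The addresses read in the inner loop depend on the graph itself, so they are scattered and hard to coalesce or cache. Function names are hypothetical.

```python
from typing import List, Tuple


def to_csr(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (row_offsets, col_indices) for an undirected graph."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    row_offsets = [0]
    col_indices: List[int] = []
    for nbrs in adj:
        col_indices.extend(nbrs)
        row_offsets.append(len(col_indices))
    return row_offsets, col_indices


def bfs_levels(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS; returns each vertex's level (-1 if unreachable)."""
    n = len(row_offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for u in frontier:  # on a GPU: roughly one thread per frontier vertex
            # Where u's neighbour list starts and how long it is both depend
            # on the data, so work per thread is uneven.
            for e in range(row_offsets[u], row_offsets[u + 1]):
                v = col_indices[e]  # data-dependent, scattered memory read
                # On a GPU this check-and-set would need an atomic operation
                # or a tolerated benign race.
                if level[v] == -1:
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level


if __name__ == "__main__":
    offsets, cols = to_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
    print(bfs_levels(offsets, cols, 0))
```

Unlike the image kernels of the regular case, neither the amount of work per vertex nor the memory locations touched are known until run time, which is why such algorithms struggle to hide memory latency on GPUs.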
GPGPU Tools and APIs
OpenGL
CUDA
OpenCL
Brook


Basic Computer Architecture and the Case for GPUs : NOTES