ADVANCED COMPUTER ARCHITECTURE
AND PARALLEL PROCESSING
Shared memory architecture
1- Shared memory systems form a major category of multiprocessors.
2- In this category all processors share a global memory.
3- Communication between tasks running on different processors is performed
through writing to and reading from the global memory.
4- All interprocessor coordination and synchronization is also accomplished via
the global memory.
5- A shared memory computer system consists of :
- a set of independent processors.
- a set of memory modules, and an interconnection network as shown in
following Figure 4.1
6- Two main problems need to be addressed when designing a shared memory
system:
• performance degradation due to contention, and
• coherence problems
Shared memory architecture cont.
(Figure 4.1: a shared memory system consisting of processors, memory modules, and an interconnection network; figure not reproduced.)
Shared memory architecture cont.
• Performance degradation might occur when multiple processors try to
access the shared memory simultaneously.
• Caches: keeping multiple copies of data spread throughout the caches may lead
to a coherence problem.
- The copies in the caches are coherent if they are all equal to the same value.
- If one of the processors overwrites the value of one of the copies, that copy
becomes inconsistent because it no longer equals the value of the other copies.
4.1 CLASSIFICATION OF SHARED
MEMORY SYSTEMS
• The simplest shared memory system consists of one memory module (M)
that can be accessed from two processors (P1 and P2).
• An arbitration unit within the memory module passes requests through to a
memory controller. If the memory module is not busy and a single request
arrives, then the arbitration unit passes that request to the memory
controller and the request is satisfied. The module is placed in the busy
state while a request is being serviced. If a new request arrives while the
memory is busy servicing a previous request, the memory module sends a
wait signal, through the memory controller, to the processor making the new
request.
4.1 CLASSIFICATION OF SHARED
MEMORY SYSTEMS cont.
• In response, the requesting processor may hold its request on the line until the
memory becomes free or it may repeat its request some time later. If the
arbitration unit receives two requests, it selects one of them and passes it to the
memory controller. Again, the denied request can be either held to be served
next or it may be repeated some time later.
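The arbitration behavior described above can be sketched as a toy model (the class and method names here are illustrative, not from the text; real arbitration units are implemented in hardware):

```python
# Minimal sketch of the memory-module arbitration described above.
# All names (MemoryModule, request, complete) are hypothetical.

class MemoryModule:
    def __init__(self):
        self.busy = False

    def request(self, processor_id):
        """Return 'granted' if the module is free, else a 'wait' signal."""
        if self.busy:
            return "wait"        # module busy: processor must hold or retry later
        self.busy = True         # module enters the busy state
        return "granted"         # request passed on to the memory controller

    def complete(self):
        self.busy = False        # service finished; module is free again

m = MemoryModule()
first = m.request("P1")          # free module: request is granted
second = m.request("P2")         # arrives while busy: P2 receives a wait signal
m.complete()
third = m.request("P2")          # P2 repeats its request once the module is free
```

A processor that receives `"wait"` either holds its request on the line or reissues it later, exactly as the text describes.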
• Based on the interconnection network, shared memory systems fall into three
categories:
 Uniform Memory Access (UMA)
 Non-Uniform Memory Access (NUMA)
 Cache-Only Memory Architecture (COMA)
Uniform Memory Access (UMA)
 In a UMA system, the shared memory is accessible by all processors through an
interconnection network in the same way a single processor accesses its
memory.
 All processors have equal access time to any memory location.
 The interconnection network can be a single bus, multiple buses, or a crossbar
switch (with straight and diagonal switch settings).
 UMA systems are also called SMP (symmetric multiprocessor) systems because
access to shared memory is balanced: each processor has equal opportunity to
read/write to memory, including equal access speed.
Nonuniform Memory Access
(NUMA)
• Each processor has part of the shared memory attached to it, but the memory has a
single address space. Therefore, any processor can access any memory
location directly using its real address. However, the access time to a module
depends on its distance from the processor.
• Tree and hierarchical bus networks are typical architectures used to interconnect
the processors and memory modules.
Cache-Only Memory Architecture
(COMA)
• As in NUMA, each processor has part of the shared memory in
COMA. However, in this case the shared memory consists of cache memory.
• COMA requires that data be migrated to the processor requesting it.
• There is no memory hierarchy, and the address space is made up of all the caches.
A cache directory (D) helps in remote cache access.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS
 Shared memory systems can be designed using:
• bus-based interconnection networks (the simplest network for shared memory systems), or
• switch-based interconnection networks.
 The bus/cache architecture alleviates the need for expensive multi-ported
memories and interface circuitry, as well as the need to adopt a message-
passing paradigm when developing application software.
 However, the bus may get saturated if multiple processors try to access the shared
memory (via the bus) simultaneously (the contention problem); caches are used
to solve it.
 High-speed caches connected to each processor on one side and the bus on the
other side mean that local copies of instructions and data can be supplied at the
highest possible rate.
 If the local processor finds all of its instructions and data in the local cache, the hit
rate is 100%. Otherwise, on a cache miss, the data must be copied from the
global memory, across the bus, into the cache, and then passed on to the local
processor.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
 One of the goals of the cache is to maintain a high hit rate, or low miss rate
under high processor loads. A high hit rate means the processors are not using
the bus as much. Hit rates are determined by a number of factors, ranging from
the application programs being run to the manner in which cache hardware is
implemented.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• We define the variables for hit rate, number of processors, processor speed, bus
speed, and processor duty cycle rates as follows:
 N = number of processors;
 h = hit rate of each cache, assumed to be the same for all caches;
 (1 - h) = miss rate of all caches;
 B = bandwidth of the bus, measured in cycles/second;
 I = processor duty cycle, assumed to be identical for all processors, in
fetches/cycle; and
 V = peak processor speed, in fetches/second.
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• The effective bandwidth of the bus is BI fetches/second. If each processor runs
at a speed of V, then each generates misses at a rate of (1 − h)V. For an
N-processor system, misses are generated at an aggregate rate of N(1 − h)V.
The bus saturates when this rate exceeds the effective bus bandwidth, so to
avoid saturation we require N(1 − h)V ≤ BI. The maximum number of processors
with cache memories that the bus can support is therefore given by the relation

N ≤ BI / ((1 − h)V)
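This bound is easy to check numerically (a small sketch; the function name is hypothetical, the symbols follow the definitions above):

```python
def max_processors(B, I, h, V):
    """Largest N satisfying N*(1-h)*V <= B*I (symbols as defined above)."""
    return (B * I) / ((1 - h) * V)

# With B = 1e6 cycles/s, I = 1 fetch/cycle, h = 0.97, V = 1e7 fetches/s:
n = max_processors(1e6, 1, 0.97, 1e7)   # ~3.33, so at most 3 processors
```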
4.2 BUS-BASED SYMMETRIC
MULTIPROCESSORS cont.
• Example 1
Suppose a shared memory system is constructed from processors that can
execute V = 10^7 fetches/s and the processor duty cycle is I = 1. The caches
are designed to support a hit rate of 97%, and the bus supports a peak
bandwidth of B = 10^6 cycles/s. Then (1 − h) = 0.03, and the maximum number
of processors N is

N ≤ (10^6 × 1) / (0.03 × 10^7) = 3.33

Thus, the system we have in mind can support only three processors!
We might ask what hit rate is needed to support a 30-processor system. In this
case, h = 1 − BI/(NV) = 1 − (10^6 × 1)/((30)(10^7)) = 1 − 1/300, so for the system we
have in mind, h = 0.9967. Increasing h by about 2.8% results in supporting a
factor of ten more processors.
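The second part of the example, solving the bound for h instead of N, can be sketched the same way (hypothetical function name; symbols as defined above):

```python
def required_hit_rate(N, B, I, V):
    """Hit rate needed so that N*(1-h)*V <= B*I, i.e. h >= 1 - B*I/(N*V)."""
    return 1 - (B * I) / (N * V)

# Hit rate needed for a 30-processor system with the same bus and processors:
h30 = required_hit_rate(30, 1e6, 1, 1e7)   # 1 - 1/300, i.e. about 0.9967
```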
4.3 BASIC CACHE COHERENCY
METHODS
• Multiple copies of data, spread throughout the caches, lead to a coherence
problem among the caches. The copies in the caches are coherent if they all
equal the same value. However, if one of the processors writes over the value of
one of the copies, then the copy becomes inconsistent because it no longer
equals the value of the other copies. If data are allowed to become inconsistent
(incoherent), incorrect results will be propagated through the system, leading to
incorrect final results. Cache coherence algorithms are needed to maintain a
level of consistency throughout the parallel system.
Cache–Memory Coherence
 In a single cache system, coherence between memory and the cache is
maintained using one of two policies:
• write-through:
The memory is updated every time the cache is updated.
• write-back:
The memory is updated only when the block in the cache is being
replaced.
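The two policies can be contrasted with a toy single-block cache model (all names here are hypothetical; a real cache operates on blocks in hardware):

```python
class Cache:
    """Toy one-block cache illustrating write-through vs. write-back."""

    def __init__(self, memory, policy):
        self.memory = memory          # backing memory: dict of address -> value
        self.policy = policy          # "write-through" or "write-back"
        self.addr = None
        self.value = None
        self.dirty = False

    def write(self, addr, value):
        if self.addr not in (None, addr):
            self.evict()              # replacing the currently cached block
        self.addr, self.value = addr, value
        if self.policy == "write-through":
            self.memory[addr] = value # memory updated on every cache write
        else:
            self.dirty = True         # write-back: memory updated only on replacement

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.memory[self.addr] = self.value   # memory finally updated
        self.addr, self.value, self.dirty = None, None, False

mem = {}
wt = Cache(mem, "write-through")
wt.write(0, 42)                       # memory sees the new value immediately

mem2 = {}
wb = Cache(mem2, "write-back")
wb.write(0, 42)                       # memory is still stale
wb.write(1, 7)                        # block 0 replaced: only now does memory get 42
```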
Cache–Cache Coherence
• In a multiprocessing system, when a task running on processor P requests the data in
global memory location X, the contents of X are copied to processor P’s
local cache, from which they are passed on to P. Now suppose processor Q also accesses X.
What happens if Q wants to write a new value over the old value of X?
 There are two fundamental cache coherence policies:
 write-invalidate
Maintains consistency by reading from local caches until a write occurs.
 write-update
Maintains consistency by immediately updating all copies in all caches.
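The two policies can be sketched over a list of per-processor copies (hypothetical names; real protocols operate on cache lines via the bus):

```python
def write(caches, writer, value, policy):
    """Write `value` from the cache at index `writer`; keep the others coherent.

    policy: "write-invalidate" removes the other copies;
            "write-update" overwrites them with the new value.
    A cache holding None has no copy of the location.
    """
    caches[writer] = value
    for i in range(len(caches)):
        if i == writer or caches[i] is None:
            continue                  # writer's copy, or no copy in this cache
        if policy == "write-invalidate":
            caches[i] = None          # other copies invalidated
        else:
            caches[i] = value         # other copies updated immediately

copies = [5, 5, 5]                    # three caches each hold X = 5
write(copies, 0, 9, "write-invalidate")   # -> [9, None, None]

copies2 = [5, 5, 5]
write(copies2, 0, 9, "write-update")      # -> [9, 9, 9]
```

With write-invalidate, a later read by another processor misses and re-fetches the new value; with write-update, every copy is kept current at the cost of extra bus traffic.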
Shared Memory System
Coherence
• The four combinations to maintain coherence among all caches and
global memory are:
 Write-update and write-through
 Write-update and write-back
 Write-invalidate and write-through
 Write-invalidate and write-back
• If we permit write-update and write-through directly on a global memory
location X, the bus starts to get busy and ultimately all processors
would be idle while waiting for writes to complete. With write-update and write-
back, only the copies in the caches are updated. Conversely, if the write is
limited to the copy of X in cache Q, the caches become inconsistent on X.
Setting the dirty bit prevents the spread of inconsistent values of X, but at
some point the inconsistent copies must be updated.
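The write-invalidate with write-back combination, the last case above, can be sketched for a single location X (toy model, hypothetical names): the writer keeps the only valid copy and marks it dirty, and memory is brought up to date only when that copy is written back.

```python
def invalidate_and_write(caches, memory_x, writer, value):
    """Write-invalidate + write-back on a single location X.

    caches: per-processor entries, each None (no copy) or (value, dirty).
    Returns (caches, memory_x); memory stays stale until write-back.
    """
    for i in range(len(caches)):
        caches[i] = None              # invalidate every copy
    caches[writer] = (value, True)    # writer's copy is valid and dirty
    return caches, memory_x           # memory_x unchanged: still the old value

def write_back(caches, memory_x, owner):
    value, dirty = caches[owner]
    if dirty:
        memory_x = value              # memory finally updated
        caches[owner] = (value, False)
    return caches, memory_x

caches = [(5, False), (5, False)]     # P and Q both hold X = 5, clean
caches, mem_x = invalidate_and_write(caches, 5, 1, 9)   # Q writes X = 9
# caches == [None, (9, True)], mem_x == 5 (stale)
caches, mem_x = write_back(caches, mem_x, 1)            # mem_x becomes 9
```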