Introduction: 1-1
CS 3006 Parallel and Distributed Computing
Fall 2022
Week # 1 – Lecture # 1, 2, 3
22nd, 23rd, 24th August 2022
23rd, 24th, 25th Muḥarram ul Haram, 1444
Dr. Nadeem Kafi Khan
Lecture # 1 - Topics
• Introduction
• Definition and Architecture block diagram
• Shared Memory Systems
• Distributed Memory Systems
CS 3006
• A key course for BS (CS)
• Most large-scale computation is now parallel and distributed, and large-scale storage is now
distributed.
• Course Instructor:
• Dr. Nadeem Kafi Khan, Assistant Professor (CS)
• Office: Main Campus, CS Block. Ext. 131.
• Email: nadeem.kafi@nu.edu.pk (please send email from your @nu.edu.pk account)
• Please pay attention to my emails and Google Classroom posts.
• Course slides and other materials will be posted on Google Classroom.
• Participation in class and on Google Classroom
CS 3006
• Textbook
• Introduction to Parallel Computing, 2nd Ed.
By Ananth Grama, Anshul Gupta, George Karypis,
Vipin Kumar
• Reference Materials
• Will be posted on Google Classroom
CS 3006
• Contact Hours
• Lectures: see timetable
• Consultation hours: will be posted later
• Interactions: Google Classroom and/or email
• Course Pre-requisites
• Programming, data structures and Operating Systems
• Computer Organization and Assembly Language
CS 3006
• Evaluation Criteria
• Assignments (14%) – lab-based (the last two will constitute the semester project)
• Quiz (6%)
• Mid Term (15+15=30%)
• Final (50%)
• Active reading of textbook REQUIRED.
• Plagiarized work will be marked zero.
• Late submissions are not allowed.
• Required attendance: 80%
PDC Topics Discussed in Lecture # 1
• Motivation for Parallel and Distributed Computing
• Why do we need PDC? …Real-world example(s).
• Parallel Computing paradigm
• Shared-memory architecture exploited by multi-threaded programs.
• Distributed Memory paradigm
• Distributed-memory architecture exploited by multiple cooperating processes.
• A computational task is submitted to a master process, which distributes work
(execution of code or processing of data) to worker (slave) processes running on
different computers of the cluster. The worker processes execute their parts in
parallel and send results back to the master, which is responsible for displaying them
(see the MPI sketch after this list).
• Why is a cluster of 60 computers a distributed system?
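To make the master/worker description above concrete, here is a minimal sketch of the distributed-memory paradigm using MPI in C. It is an assumed illustration, not the course's official code: rank 0 plays the master, hands each worker a range of integers to sum, then collects and displays the partial results. Run across cluster nodes with, e.g., mpirun -np 4 ./a.out.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                       /* master process */
            double grand_total = 0.0, partial;
            for (int w = 1; w < size; w++) {   /* distribute one range of work per worker */
                long range[2] = { (w - 1) * 1000L, w * 1000L };
                MPI_Send(range, 2, MPI_LONG, w, 0, MPI_COMM_WORLD);
            }
            for (int w = 1; w < size; w++) {   /* collect partial results and display them */
                MPI_Recv(&partial, 1, MPI_DOUBLE, w, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                grand_total += partial;
            }
            printf("total = %.0f\n", grand_total);
        } else {                               /* worker (slave) processes */
            long range[2];
            double partial = 0.0;
            MPI_Recv(range, 2, MPI_LONG, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (long i = range[0]; i < range[1]; i++)
                partial += (double)i;          /* execute the assigned piece of work */
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }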
Lecture # 2 - Topics
• Parallel Execution Terms and their definitions
• Scalability
PDC CLOs as per FAST-NU official document
Some General Parallel Terminology
• Task
• A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor.
• Parallel Task
• A task that can be executed by multiple processors safely (yields correct
results)
• Serial Execution
• Execution of a program sequentially, one statement at a time. In the simplest
sense, this is what happens on a one processor machine. However, virtually
all parallel tasks will have sections of a parallel program that must be
executed serially.
Symmetric vs. Asymmetric Multiprocessing Architecture
• Same type of processing elements vs. different types of processing elements used in the computation.
• Same type of computation vs. different types of computation performed on the same processing elements.
• Parallel Execution
• Execution of a program by more than one task, with each task being able to
execute the same or different statement at the same moment in time.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture
where all processors have direct (usually bus based) access to common
physical memory. In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
• Distributed Memory
• In hardware, refers to network based memory access for physical memory
that is not common. As a programming model, tasks can only logically "see"
local machine memory and must use communications to access memory on
other machines where other tasks are executing.
Some General Parallel Terminology
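As a contrast to the message-passing sketch earlier, the following minimal OpenMP example (an assumed illustration, not from the slides) shows the shared-memory model: all threads read and write the same array in one address space, with no explicit communication.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)   /* threads share the array a[] */
        for (long i = 0; i < N; i++) {
            a[i] = (double)i;                       /* every thread addresses the  */
            sum += a[i];                            /* same logical memory         */
        }

        printf("threads saw one address space; sum = %.0f\n", sum);
        return 0;
    }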
Shared Memory vs. Distributed Memory
a) Shared Memory b) Distributed Memory
[Figure: in (b), the network interconnect is typically a very high-speed switch, e.g. ~10 Gbit Ethernet or an even faster interconnect such as InfiniBand.]
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can
be accomplished, such as through a shared memory bus or over a network,
however the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
• Synchronization
• The coordination of parallel tasks in real time, very often associated with
communications. Often implemented by establishing a synchronization point
within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task, and can
therefore cause a parallel application's wall clock execution time to increase.
Some General Parallel Terminology
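A minimal sketch of a synchronization point, assuming OpenMP threads (illustrative only): the barrier forces every thread to finish phase 1 before any thread starts phase 2, so at least one thread waits, which is exactly the wall-clock cost mentioned above.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        #pragma omp parallel num_threads(4)
        {
            int id = omp_get_thread_num();

            printf("thread %d: phase 1 done\n", id);   /* work before the point   */

            #pragma omp barrier                        /* synchronization point   */

            printf("thread %d: phase 2 starts\n", id); /* only after all arrive   */
        }
        return 0;
    }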
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the
addition of more processors. Factors that contribute to scalability
include:
• Hardware - particularly memory-CPU bandwidth and network communication
• Application algorithm
• Related parallel overhead
• Characteristics of your specific application and coding
Some General Parallel Terminology
Lecture # 3 - Topics
• Overhead in Parallel and Distributed Computing
• Speed-up and Amdahl's Law
• Flynn’s Taxonomy
• Granularity
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
• Task termination time
• Massively Parallel
• Refers to the hardware that comprises a given parallel system - having many
processors. The meaning of many keeps increasing, but currently IBM Blue
Gene/L pushes this number to 6 digits.
Some General Parallel Terminology
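One way to see parallel overhead directly is to time a parallel region that does no useful work; the measured time is almost entirely thread start-up, synchronization, and termination cost. This is a rough, assumed illustration using OpenMP, not a prescribed measurement method.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double t0 = omp_get_wtime();

        for (int rep = 0; rep < 1000; rep++) {
            #pragma omp parallel        /* fork threads, do no real work, join */
            { /* empty region */ }
        }

        double t1 = omp_get_wtime();
        printf("avg fork/join overhead: %g s\n", (t1 - t0) / 1000.0);
        return 0;
    }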
• Observed Speedup
• Observed speedup of a code which has been parallelized, defined as:
Speedup = (wall-clock time of serial execution) / (wall-clock time of parallel execution)
• One of the simplest and most widely used indicators for a parallel program's
performance.
Some General Parallel Terminology
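A hedged sketch of measuring observed speedup with OpenMP (the workload and timing scheme are assumed for illustration): the same summation is timed serially and in parallel, and the ratio of wall-clock times gives the speedup defined above.

    #include <omp.h>
    #include <stdio.h>

    #define N 100000000L

    int main(void) {
        double t0 = omp_get_wtime();            /* wall-clock time, serial run   */
        double serial_sum = 0.0;
        for (long i = 0; i < N; i++)
            serial_sum += 1.0 / (double)(i + 1);
        double t_serial = omp_get_wtime() - t0;

        t0 = omp_get_wtime();                   /* wall-clock time, parallel run */
        double parallel_sum = 0.0;
        #pragma omp parallel for reduction(+:parallel_sum)
        for (long i = 0; i < N; i++)
            parallel_sum += 1.0 / (double)(i + 1);
        double t_parallel = omp_get_wtime() - t0;

        printf("sums %.6f / %.6f, observed speedup = %.2f\n",
               serial_sum, parallel_sum, t_serial / t_parallel);
        return 0;
    }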
Flynn’s Taxonomy
PU = Processing Unit
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is
being acted on by the CPU during any one clock
cycle
• Single data: only one data stream is being used as
input during any one clock cycle
• Deterministic execution
• This is the oldest and until recently, the most
prevalent form of computer
• Examples: most PCs, single CPU workstations and
mainframes
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer
• Single instruction: All processing units execute the same instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal
network, and a very large array of very small-capacity instruction units.
• Best suited for specialized problems characterized by a high degree of regularity, such as image
processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples: Vectorization is a prime example of SIMD, in which the same instruction is
performed across multiple data elements (see the vectorizable-loop sketch after this slide).
A variant of SIMD is single instruction, multiple threads (SIMT), which is commonly used to
describe GPU workgroups.
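The loop below is an assumed illustration of the vectorization example mentioned above: every iteration applies the same add to different data elements, so a vectorizing C compiler (e.g. gcc -O3) can map the loop onto SIMD instructions that operate on several elements at once.

    #include <stdio.h>

    #define N 1024

    int main(void) {
        float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* one instruction stream, many data elements per vector operation */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[0]=%g c[%d]=%g\n", c[0], N - 1, c[N - 1]);
        return 0;
    }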
Multiple Instruction, Single Data (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via independent
instruction streams. This is not a common architecture.
• Few actual examples of this class of parallel computer have ever existed.
One is the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.
• Redundant computation on the same data. This is used in
highly fault-tolerant approaches such as spacecraft
controllers. Because spacecraft are in high radiation
environments, these often run two copies of each
calculation and compare the output of the two.
Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel computer. Most modern
computers fall into this category.
• Multiple Instruction: every processor may be executing a different
instruction stream
• Multiple Data: every processor may be working with a different data
stream
• Execution can be synchronous or asynchronous, deterministic or non-
deterministic
• Examples: most current supercomputers, networked parallel computer
"grids" and multi-processor SMP computers - including some types of
PCs.
• The final category has parallelization in both
instructions and data and is referred to as MIMD.
This category describes multi-core parallel
architectures that comprise the majority of large
parallel systems.