Introduction: 1-1
CS 3006 Parallel and Distributed Computing
Fall 2022
Week # 1 – Lecture # 1, 2, 3
22nd, 23rd, 24th August 2022
23rd, 24th, 25th Muḥarram ul Haram, 1444
Dr. Nadeem Kafi Khan
Lecture # 1 - Topics
• Introduction
• Definition and Architecture block diagram
• Shared Memory Systems
• Distributed Memory Systems
CS 3006
• A key course for BS (CS)
• Most large-scale computation is now parallel and distributed, and large-scale storage is now
distributed.
• Course Instructor:
• Dr. Nadeem Kafi Khan, Assistant Professor (CS)
• Office: Main Campus, CS Block. Ext. 131.
• Email: nadeem.kafi@nu.edu.pk (please send email from your @nu.edu.pk account)
• Please pay attention to my emails and Google Classroom posts.
• Course slides and other materials will be posted on Google Classroom.
• Participation in class and on Google Classroom
CS 3006
• Textbook
• Introduction to Parallel Computing, 2nd Ed.
By Ananth Grama, Anshul Gupta, George Karypis,
Vipin Kumar
• Reference Materials
• Will be posted on Google Classroom
CS 3006
• Contact Hours
• Lectures: see timetable
• Consultation hours: will be posted later
• Interactions: Google Classroom and/or email
• Course Pre-requisites
• Programming, data structures and Operating Systems
• Computer Organization and Assembly Language
CS 3006
• Evaluation Criteria
• Assignments (14%) – lab-based (the last two will constitute the semester project)
• Quiz (6%)
• Mid Term (15+15=30%)
• Final (50%)
• Active reading of textbook REQUIRED.
• Plagiarized work will be marked zero.
• Late submissions are not allowed.
• Required attendance: 80%
PDC Topics Discussed in Lecture # 1
• Motivation for Parallel and Distributed Computing
• Why do we need PDC? …Real-world example(s).
• Parallel Computing paradigm
• Shared-memory architecture exploited by multi-threaded programs.
• Distributed Memory paradigm
• Distributed-memory architecture exploited by multiple cooperating processes.
• A computational task is submitted to a master process, which distributes work
(execution of code or processing of data) to worker (slave) processes running on
different computers of the cluster. The worker processes execute their parts in
parallel and send results back to the master, which is responsible for displaying them
(see the MPI sketch after this list).
• Why is a cluster of 60 computers a distributed system?
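To make the master/worker description above concrete, here is a minimal sketch of the distributed-memory paradigm using MPI in C. It is an assumed illustration, not the course's official code: rank 0 plays the master, hands each worker a range of integers to sum, then collects and displays the partial results. Run across cluster nodes with, e.g., mpirun -np 4 ./a.out.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                       /* master process */
            double grand_total = 0.0, partial;
            for (int w = 1; w < size; w++) {   /* distribute one range of work per worker */
                long range[2] = { (w - 1) * 1000L, w * 1000L };
                MPI_Send(range, 2, MPI_LONG, w, 0, MPI_COMM_WORLD);
            }
            for (int w = 1; w < size; w++) {   /* collect partial results and display them */
                MPI_Recv(&partial, 1, MPI_DOUBLE, w, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                grand_total += partial;
            }
            printf("total = %.0f\n", grand_total);
        } else {                               /* worker (slave) processes */
            long range[2];
            double partial = 0.0;
            MPI_Recv(range, 2, MPI_LONG, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (long i = range[0]; i < range[1]; i++)
                partial += (double)i;          /* execute the assigned piece of work */
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }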
Lecture # 2 - Topics
• Parallel Execution Terms and their definitions
• Scalability
PDC CLOs as per FAST-NU official document
Some General Parallel Terminology
• Task
• A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor.
• Parallel Task
• A task that can be executed by multiple processors safely (yields correct
results)
• Serial Execution
• Execution of a program sequentially, one statement at a time. In the simplest
sense, this is what happens on a one processor machine. However, virtually
all parallel tasks will have sections of a parallel program that must be
executed serially.
Symmetric vs. Asymmetric Multiprocessing Architecture
• Same type of processing elements vs. different types of processing elements used in the computation.
• Same type of computation vs. different types of computation performed on the same processing elements.
• Parallel Execution
• Execution of a program by more than one task, with each task being able to
execute the same or different statement at the same moment in time.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture
where all processors have direct (usually bus based) access to common
physical memory. In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
• Distributed Memory
• In hardware, refers to network based memory access for physical memory
that is not common. As a programming model, tasks can only logically "see"
local machine memory and must use communications to access memory on
other machines where other tasks are executing.
Some General Parallel Terminology
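As a contrast to the message-passing sketch earlier, the following minimal OpenMP example (an assumed illustration, not from the slides) shows the shared-memory model: all threads read and write the same array in one address space, with no explicit communication.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)   /* threads share the array a[] */
        for (long i = 0; i < N; i++) {
            a[i] = (double)i;                       /* every thread addresses the  */
            sum += a[i];                            /* same logical memory         */
        }

        printf("threads saw one address space; sum = %.0f\n", sum);
        return 0;
    }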
Shared Memory vs. Distributed Memory
a) Shared Memory b) Distributed Memory
[Figure: in (b), the network interconnect is typically a very high-speed switch, e.g. ~10 Gbit Ethernet or an even faster interconnect such as InfiniBand.]
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can
be accomplished, such as through a shared memory bus or over a network,
however the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
• Synchronization
• The coordination of parallel tasks in real time, very often associated with
communications. Often implemented by establishing a synchronization point
within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task, and can
therefore cause a parallel application's wall clock execution time to increase.
Some General Parallel Terminology
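A minimal sketch of a synchronization point, assuming OpenMP threads (illustrative only): the barrier forces every thread to finish phase 1 before any thread starts phase 2, so at least one thread waits, which is exactly the wall-clock cost mentioned above.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        #pragma omp parallel num_threads(4)
        {
            int id = omp_get_thread_num();

            printf("thread %d: phase 1 done\n", id);   /* work before the point   */

            #pragma omp barrier                        /* synchronization point   */

            printf("thread %d: phase 2 starts\n", id); /* only after all arrive   */
        }
        return 0;
    }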
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the
addition of more processors. Factors that contribute to scalability
include:
• Hardware - particularly memory-CPU bandwidth and network communication
• Application algorithm
• Related parallel overhead
• Characteristics of your specific application and coding
Some General Parallel Terminology
Lecture # 3 - Topics
• Overhead in Parallel and Distributed Computing
• Speed-up and Amdahl's Law
• Flynn’s Taxonomy
• Granularity
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
• Task termination time
• Massively Parallel
• Refers to the hardware that comprises a given parallel system - having many
processors. The meaning of many keeps increasing, but currently IBM Blue
Gene/L pushes this number to 6 digits.
Some General Parallel Terminology
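One way to see parallel overhead directly is to time a parallel region that does no useful work; the measured time is almost entirely thread start-up, synchronization, and termination cost. This is a rough, assumed illustration using OpenMP, not a prescribed measurement method.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double t0 = omp_get_wtime();

        for (int rep = 0; rep < 1000; rep++) {
            #pragma omp parallel        /* fork threads, do no real work, join */
            { /* empty region */ }
        }

        double t1 = omp_get_wtime();
        printf("avg fork/join overhead: %g s\n", (t1 - t0) / 1000.0);
        return 0;
    }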
• Observed Speedup
• Observed speedup of a code which has been parallelized, defined as:
Speedup = (wall-clock time of serial execution) / (wall-clock time of parallel execution)
• One of the simplest and most widely used indicators for a parallel program's
performance.
Some General Parallel Terminology
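A hedged sketch of measuring observed speedup with OpenMP (the workload and timing scheme are assumed for illustration): the same summation is timed serially and in parallel, and the ratio of wall-clock times gives the speedup defined above.

    #include <omp.h>
    #include <stdio.h>

    #define N 100000000L

    int main(void) {
        double t0 = omp_get_wtime();            /* wall-clock time, serial run   */
        double serial_sum = 0.0;
        for (long i = 0; i < N; i++)
            serial_sum += 1.0 / (double)(i + 1);
        double t_serial = omp_get_wtime() - t0;

        t0 = omp_get_wtime();                   /* wall-clock time, parallel run */
        double parallel_sum = 0.0;
        #pragma omp parallel for reduction(+:parallel_sum)
        for (long i = 0; i < N; i++)
            parallel_sum += 1.0 / (double)(i + 1);
        double t_parallel = omp_get_wtime() - t0;

        printf("sums %.6f / %.6f, observed speedup = %.2f\n",
               serial_sum, parallel_sum, t_serial / t_parallel);
        return 0;
    }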
Flynn’s Taxonomy
PU = Processing Unit
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is
being acted on by the CPU during any one clock
cycle
• Single data: only one data stream is being used as
input during any one clock cycle
• Deterministic execution
• This is the oldest and until recently, the most
prevalent form of computer
• Examples: most PCs, single CPU workstations and
mainframes
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer
• Single instruction: All processing units execute the same instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal
network, and a very large array of very small-capacity instruction units.
• Best suited for specialized problems characterized by a high degree of regularity, such as image
processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples: Vectorization is a prime example of SIMD, in which the same instruction is
performed across multiple data elements (see the vectorizable-loop sketch after this slide).
A variant of SIMD is single instruction, multiple threads (SIMT), which is commonly used to
describe GPU workgroups.
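The loop below is an assumed illustration of the vectorization example mentioned above: every iteration applies the same add to different data elements, so a vectorizing C compiler (e.g. gcc -O3) can map the loop onto SIMD instructions that operate on several elements at once.

    #include <stdio.h>

    #define N 1024

    int main(void) {
        float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* one instruction stream, many data elements per vector operation */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[0]=%g c[%d]=%g\n", c[0], N - 1, c[N - 1]);
        return 0;
    }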
Multiple Instruction, Single Data (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via independent
instruction streams. This is not a common architecture.
• Few actual examples of this class of parallel computer have ever existed.
One is the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message.
• Redundant computation on the same data. This is used in
highly fault-tolerant approaches such as spacecraft
controllers. Because spacecraft are in high radiation
environments, these often run two copies of each
calculation and compare the output of the two.
Multiple Instruction, Multiple Data (MIMD)
• Currently, the most common type of parallel computer. Most modern
computers fall into this category.
• Multiple Instruction: every processor may be executing a different
instruction stream
• Multiple Data: every processor may be working with a different data
stream
• Execution can be synchronous or asynchronous, deterministic or non-
deterministic
• Examples: most current supercomputers, networked parallel computer
"grids" and multi-processor SMP computers - including some types of
PCs.
• The final category has parallelization in both
instructions and data and is referred to as MIMD.
This category describes multi-core parallel
architectures that comprise the majority of large
parallel systems.