1
Introduction
Parallel and Distributed Computing
Lecture 1-2 / 18
High Performance Computing most
generally refers to the practice of
aggregating computing power in a
way that delivers much higher
performance than one could get out
of a typical desktop computer or
workstation in order to solve large
problems in science, engineering, or
business.
2
Topics
 Introduction - today’s lecture
 System Architectures (Single Instruction - Single Data, Single Instruction -
Multiple Data, Multiple Instruction - Multiple Data, Shared Memory, Distributed
Memory, Cluster, Multiple Instruction - Single Data)
 Performance Analysis of parallel calculations (speedup, efficiency, algorithm execution time…)
 Parallel numerical methods (Principles of Parallel Algorithm Design, Analytical Modeling of Parallel Programs, Matrix Operations, Matrix-Vector Operations, Graph Algorithms…)
 Software (Programming Using the Message-Passing Interface, OpenMP, CUDA,
Corba…)
3
What will you learn today?
 Why Use Parallel Computing?
 Motivation for parallelism (Moore’s law)
 What is traditional programming view?
 What is parallel computing?
 What is distributed computing?
 Concepts and terminology: von Neumann Computer Architecture
4
What is traditional programming view?
 Von Neumann View
- Program Counter + Registers = Thread/process
- Sequential change of Machine state
 Comprised of four main components:
 Memory
 Control Unit
 Arithmetic Logic Unit
 Input/Output
5
Von Neumann Architecture
 Read/write, random access memory is used to store both program
instructions and data
 Program instructions tell the computer to do something
 Data is simply information to be used by the program
 Control unit fetches instructions/data from memory, decodes the
instructions and then sequentially coordinates operations to accomplish
the programmed task.
 Arithmetic Logic Unit performs basic arithmetic and logical operations.
 Input/Output is the interface to the human operator.
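To make the fetch-decode-execute cycle concrete, here is a minimal sketch of a toy von Neumann machine in C; the three-instruction set, memory layout and values are invented purely for illustration:

```c
#include <stdio.h>

/* Hypothetical 3-instruction machine, invented for illustration only:
   opcode 0 = LOAD addr   (acc = mem[addr])
   opcode 1 = ADD  addr   (acc = acc + mem[addr])
   opcode 2 = STORE addr  (mem[addr] = acc), then halt in this demo */
int main(void) {
    int mem[16] = { /* program */ 0, 10, 1, 11, 2, 12, /* padding */ 0, 0, 0, 0,
                    /* data    */ 5, 7, 0 };
    int pc = 0, acc = 0, running = 1;

    while (running) {                 /* one iteration = one instruction cycle */
        int opcode  = mem[pc];        /* fetch */
        int operand = mem[pc + 1];
        pc += 2;                      /* program counter advances sequentially */
        switch (opcode) {             /* decode and execute */
            case 0: acc = mem[operand];               break;
            case 1: acc = acc + mem[operand];         break;
            case 2: mem[operand] = acc; running = 0;  break;
        }
    }
    printf("result stored at mem[12] = %d\n", mem[12]);   /* 5 + 7 = 12 */
    return 0;
}
```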
6
All of the algorithms we’ve seen so far are
sequential:
• They have one “thread” of execution
• One step follows another in sequence
• One processor is all that is needed to run
the algorithm
Traditional (sequential) Processing View
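For example (a hypothetical sequential C program, not from the slides), summing an array uses a single thread and a single processor, one step after another:

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;   /* fill with sample data */

    /* one "thread" of execution: each step follows the previous one */
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
```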
7
The Computational Power Argument
Moore's law states [1965]:
2X transistors/Chip Every 1.5 or 2 years
Microprocessors have become smaller,
denser, and more powerful.
Gordon Moore is a co-founder of Intel.
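Written as a formula (a worked illustration only, assuming a fixed doubling period T of roughly 2 years):

```latex
% N_0: transistors per chip today; N(t): transistors after t years; T: doubling period
N(t) = N_0 \cdot 2^{t/T}
% e.g. with T = 2 years, after 20 years: N(20)/N_0 = 2^{10} = 1024,
% i.e. roughly a thousand-fold increase in transistors per chip.
```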
What problems are there?
With the increased use of computers in every sphere of human activity, computer scientists are faced with two crucial issues today:
 Processing has to be done faster than ever before
 Larger and more complex computational problems need to be solved
8
What problems are there?
 Increasing the number of transistors as per Moore’s Law is not a solution by itself, as it also increases power consumption.
 Higher power consumption leads to the problem of processor heating…
The perfect solution is PARALLELISM - in
hardware as well as software.
9
What is PARALLELISM?
PARALLELISM is a form of computation in
which many instructions are carried out
simultaneously, operating on the principle that
large problems can often be divided into smaller
ones, which are then solved concurrently (in
parallel).
10
12
Why Use PARALLELISM?
Save time and/or money
Example: parallel clusters can be built from cheap, commodity components.
Solve larger problems
Example: many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.
Provide concurrency
Example: multiple computing resources can do many things simultaneously.
Use of non-local resources
Overcome the limits of serial computing: available memory and performance.
We can run…
 Larger problems
 Faster
 More cases
Parallel programming view
 Parallel computing is a form of computation in which many
calculations are carried out simultaneously.
 In the simplest sense, it is the simultaneous use of multiple
compute resources to solve a computational problem:
 1. To be run using multiple CPUs
 2. A problem is broken into discrete parts that can be solved concurrently
 3. Each part is further broken down into a series of instructions
 4. Instructions from each part execute simultaneously on different CPUs
13
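A minimal sketch of the same array sum written with OpenMP (assuming an OpenMP-capable compiler, e.g. gcc -fopenmp); the loop is broken into discrete parts that execute simultaneously on different CPUs:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;

    /* the iterations are split into discrete parts; each part runs
       as a series of instructions on a different CPU/core simultaneously */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f, computed by up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```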
Parallel / Distributed (Cluster) / Grid
Computing
 Parallel computing: use of multiple computers or processors working
together on a common task.
 Each processor works on its section of the problem.
 Processors can exchange information
 Distributed (cluster) computing is where several different computers (processing elements) work separately and are connected by a network.
 Distributed computers are highly scalable.
 Grid Computing makes use of computers communicating over the Internet to work on a given problem (so that in some respects they can be regarded as a single computer).
14
Parallel Computer Memory
Architectures
 Shared Memory
 Uniform Memory Access (UMA)
 Non-Uniform Memory Access (NUMA)
 Distributed Memory
 Hybrid Distributed-Shared Memory
18
Shared Memory
General Characteristics:
• Shared memory parallel computers vary widely, but generally have in common
the ability for all processors to access all memory as global address space.
• Multiple processors can operate independently but share the same memory
resources.
• Changes in a memory location effected by one processor are visible to all
other processors.
• Shared memory machines can be divided into two main classes based upon
memory access times: UMA and NUMA.
19
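A small illustrative OpenMP sketch (not from the slides) of what a global address space means in practice: all threads update the same shared location, and the update must be made atomic so that no increment is lost:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;   /* one location in the shared (global) address space */

    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        /* every thread reads and writes the same memory location;
           'atomic' makes each update indivisible so no increment is lost */
        #pragma omp atomic
        counter++;
    }

    printf("counter = %ld (expected 1000000)\n", counter);
    return 0;
}
```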
Shared Memory (UMA)
20
Uniform Memory Access (UMA):
 Identical processors, Symmetric Multiprocessor
(SMP)
 Equal access and access times to memory
 Sometimes called CC-UMA - Cache Coherent
UMA.
Cache coherent means if one processor updates
a location in shared memory, all the other
processors know about the update.
Shared Memory (NUMA)
Non-Uniform Memory Access (NUMA):
 Often made by physically linking two or more SMPs
 One SMP can directly access memory of another SMP
 Not all processors have equal access time to all memories
 Memory access across link is slower
 If cache coherency is maintained, then may also be called CC-
NUMA - Cache Coherent NUMA
21
Distributed Memory
Distributed memory systems require a communication network to connect inter-
processor memory.
 Processors have their own local memory.
 Because each processor has its own local memory, it operates independently.
Hence, the concept of cache coherency does not apply.
 When a processor needs access to data in another processor’s memory, it is usually the task of the programmer to explicitly define how and when the data is communicated.
22
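A minimal message-passing sketch using MPI (assuming an MPI installation; launched with something like mpirun -np 2): data that lives in one processor’s local memory must be sent explicitly to another:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                         /* lives only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the programmer states explicitly how and when the data arrives */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```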
Hybrid Distributed-Shared Memory
 The largest and fastest computers in the world today employ both shared and
distributed memory architectures.
 The shared memory component is usually a cache coherent SMP machine. Processors on a given SMP can address that machine’s memory as global.
 The distributed memory component is the networking of multiple SMPs. SMPs
know only about their own memory - not the memory on another SMP.
Therefore, network communications are required to move data from one SMP
to another.
23
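Such machines are commonly programmed with a hybrid of the two models; the sketch below (illustrative only) uses OpenMP threads for the shared-memory part within each SMP node and MPI messages for the distributed-memory part between nodes:

```c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* one MPI process per SMP node; threads inside a node share its memory */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0, global = 0.0;

    #pragma omp parallel for reduction(+:local)   /* shared-memory part */
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    /* distributed-memory part: combine per-node results over the network */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```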
Key Difference Between Data And Task
Parallelism
Data Parallelism
 A single thread (process), instruction stream or task is divided internally into sub-parts for execution.
 A task ‘A’ is divided into sub-parts and then processed.
Task Parallelism
 The division of work is among the threads (processes), instruction streams or tasks themselves.
 A task ‘A’ and a task ‘B’ are processed separately by different processors.
24
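The contrast can be sketched in C with OpenMP (a hypothetical example): data parallelism splits one task ‘A’ across threads, while task parallelism runs distinct tasks ‘A’ and ‘B’ on different threads:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000

void task_a(void) { printf("task A done\n"); }
void task_b(void) { printf("task B done\n"); }

int main(void) {
    int x[N], sum = 0;
    for (int i = 0; i < N; i++) x[i] = i;

    /* data parallelism: one task (summing x) is divided into sub-parts,
       each thread processes its own chunk of the data */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    /* task parallelism: two different tasks run on different threads */
    #pragma omp parallel sections
    {
        #pragma omp section
        task_a();
        #pragma omp section
        task_b();
    }

    printf("sum = %d\n", sum);
    return 0;
}
```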
Implementation Of Parallel Computing
In Software
 When implemented in software (or rather, in algorithms), it is called ‘parallel programming’.
 An algorithm is split into pieces and then executed, as seen earlier.
 Important Points In Parallel Programming
 Dependencies - a typical scenario is when line 6 of an algorithm depends on lines 2, 3, 4 and 5 (see the sketch below)
 Application Checkpoints - like saving the state of the computation, or creating a backup point.
 Automatic Parallelisation - identifying dependencies and parallelising algorithms automatically. This has achieved limited success.
25
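As an illustration of the dependency point above (example code, not from the slides): the first loop carries a dependency between iterations and cannot simply be run in parallel, while the second loop’s iterations are independent and can be parallelised, automatically or by hand:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* loop-carried dependency: iteration i needs the result of iteration i-1,
       so the iterations must run in order (hard to parallelise automatically) */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    /* independent iterations: no iteration reads another's result,
       so a compiler or programmer can safely execute them in parallel */
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * b[i];

    printf("a[N-1] = %f, b[N-1] = %f\n", a[N - 1], b[N - 1]);
    return 0;
}
```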
Implementation Of Parallel Computing
In Hardware
 When implemented in hardware, it is called ‘parallel processing’.
 Typically, a chunk of the workload is divided for processing among units such as cores, processors, CPUs, etc.
26
27
Who is doing Parallel Computing?
What are they using it for?
Physics is parallel.
The Human World is parallel too!
Sequence is unusual
Computer programs = models, distributed processes, increasingly parallel
28
Application Examples with
Massive Parallelism
Artificial Intelligence and Automation
AI is the intelligence exhibited by machines or software.
AI systems require large amounts of parallel computing for the tasks they are used for:
1. Image processing
2. Expert Systems
3. Natural Language Processing (NLP)
4. Pattern Recognition
29
Application Examples with
Massive Parallelism
Genetic Engineering
Several of these analyses produce huge amounts of information, which becomes difficult to handle using single processing units; this is why parallel processing algorithms are used.
30
Application Examples with
Massive Parallelism
Medical Applications
 Parallel computing is used in medical image processing
 Used for scanning the human body and the human brain
 Used in MRI reconstruction
 Used for vertebra detection and segmentation in X-ray
images
 Used for brain fiber tracking
31
Impediments to Parallel Computing
 Algorithm development is harder
—complexity of specifying and coordinating concurrent activities
 Software development is much harder
—lack of standardized & effective development tools, programming models,
and environments
 Rapid changes in computer system architecture
—today’s hot parallel algorithm may not be suitable for tomorrow’s parallel
computer!
32
Next lecture overview
 Some General Parallel Terminology
 Flynn's Taxonomy
33
Test questions
 What is traditional programming view?
 Who is doing Parallel Computing?
 What are they using it for?
 Types of Parallel Computer Hardware.
34
Textbooks
• Course Textbook:
1. Ananth Grama, George Karypis, Vipin Kumar, Anshul
Gupta “Introduction to Parallel Computing” (2nd Edition)
2. Marc Snir, William Gropp “MPI: The Complete
Reference (2-volume set)”
3. Victor Eijkhout with Edmond Chow, Robert van de Geijn “Introduction to High-Performance Scientific Computing”
• Reserve Texts (recommended that you look at periodically)
Editor's Notes
  • #4: The term parallel computation is generally applied to any data processing, in which several computer operations can be executed simultaneously. Achieving parallelism is only possible if the following requirements to architectural principles of computer systems are met: independent functioning of separate computer devices – this requirement equally concerns all the main computer system components - processors, storage devices, input/output devices; redundancy of computer system elements – redundancy can be realized in the following basic forms: use of specialized devices such as separate processors for integer and real valued arithmetic, multilevel memory devices (registers, cache); duplication of computer devices by means of using separate processors of the same type or several RAM devices, etc. Processor pipelines may be an additional form of achieving parallelism when carrying out operations in the devices is represented as executing a sequence of subcommands which constitute an operation. As a result, when such devices are engaged in computation several different data elements may be at different processing stages simultaneously. Possible ways of achieving parallelism are discussed in detail in Patterson and Hennessy (1996), Culler and Singh (1998); the same works describe the history of parallel computations and give particular examples of parallel computers (see also Xu and Hwang (1998), Culler, Singh and Gupta (1998) Buyya (1999)). Considering the problems of parallel computations one should distinguish the following modes of independent program parts execution: Multitasking (time shared) mode. In multitasking mode a single processor is used for carrying out processes. This mode is pseudo-parallel when only one process is active (is being carried out) while the other processes are in the stand-by mode queuing to use the processor. The use of time shared mode can make computations more efficient (e.g. if one of the processes can not be carried out because the data input is expected, the processor can be used for carrying out the process ready for execution - see Tanenbaum (2001)). Such parallel computation effects as the necessity of processes mutual exclusion and synchronization etc also manifest themselves in this mode and as a result this mode can be used for initial preparation of parallel programs; Parallel execution. In case of parallel execution several instructions of data processing can be carried out simultaneously. This computational mode can be provided not only if several processors are available but also by means of pipeline and vector processing devices; Distributed computations. This term is used to denote parallel data processing which involves the use of several processing devices located at a distance from each other. As the data transmission through communication lines among the processing devices leads to considerable time delays, efficient data processing in this computational mode is possible only for parallel algorithms with low intensity of interprocessor data transmission streams. The above mentioned conditions are typical of the computations in multicomputer systems which are created when several separate computers are connected by LAN or WAN communication channels.
  • #7: Traditionally, software has been written for serial computation: 1. To be run on a single computer having a single Central Processing Unit (CPU); 2. A problem is broken into a discrete series of instructions. 3. Instructions are executed one after another. 4. Only one instruction may execute at any moment in time.
  • #8: Moore’s law is the main motivation for the move to parallel processing.
  • #12: compare
  • #13: Parallel computing allows us to solve problems that do not fit in a single CPU’s memory space, and problems that cannot be solved in a reasonable time.
  • #14: The diversity of parallel computing systems is immense. In a sense each system is unique – each systems use various types of hardware: processors (Intel, IBM, AMD, HP, NEC, Cray, …), interconnection networks (Ethernet, Myrinet, Infiniband, SCI, …). They operate under various operating systems (Unix/Linux versions, Windows , …) and they use different software. It may seem impossible to find something common for all these system types. But it is not so. Later we will try to formulate some well-known variants of parallel computer systems classifications, but before that we will analyze some examples.
  • #15: A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer. Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not. The most common type of cluster is the Beowulf cluster, which is a cluster implemented on multiple identical commercial off-the-shelf computers connected with a TCP/IP Ethernet local area network. Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem. Because of the low bandwidth and extremely high latency available on the Internet, grid computing typically deals only with embarrassingly parallel problems. Most grid computing applications use middleware, software that sits between the operating system and the application to manage network resources and standardize the software interface.
  • #16: There is no such thing as "multiprocessor" or "multicore" programming. The distinction between "multiprocessor" and "multicore" computers is probably not relevant to you as an application programmer; it has to do with subtleties of how the cores share access to memory. In order to take advantage of a multicore (or multiprocessor) computer, you need a program written in such a way that it can be run in parallel, and a runtime that will allow the program to actually be executed in parallel on multiple cores (and operating system, although any operating system you can run on your PC will do this). This is really parallel programming, although there are different approaches to parallel programming. A multicore processor is a processor that includes multiple execution units ("cores") on the same chip. These processors differ from superscalar processors, which can issue multiple instructions per cycle from one instruction stream (thread); by contrast, a multicore processor can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar as well—that is, on every cycle, each core can issue multiple instructions from one instruction stream. Parallel computers can be roughly classified according to the level at which the hardware supports parallelism—with multi-core and multi-processor computers having multiple processing elements within a single machine,
  • #17: Parallel computers can be roughly classified according to the level at which the hardware supports parallelism—with multi-core and multi-processor computers having multiple processing elements within a single machine, while clusters, MPPs, and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors, for accelerating specific tasks. Parallel computer programs are more difficult to write than sequential ones,[5] because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically one of the greatest obstacles to getting good parallel program performance. A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer.[26] Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not. The most common type of cluster is the Beowulf cluster, which is a cluster implemented on multiple identical commercial off-the-shelf computers connected with a TCP/IP Ethernet local area network.[27] Beowulf technology was originally developed by Thomas Sterling and Donald Becker. The vast majority of the TOP500 supercomputers are clusters.[28] A cluster is group of computers connected in a local area network (LAN). A cluster is able to function as a unified computational resource. It implies higher reliability and efficiency than an LAN as well as a considerably lower cost in comparison to the other parallel computing system types (due to the use of standard hardware and software solutions). The beginning of cluster era was signified by the first project with the primary purpose of establishing connection among computers - ARPANET2 project. That was the period when the first principles were formulated which proved to be fundamental. Those principles later lead to the creation of local and global computational networks and of course to the creation of world wide computer network, the Internet. It’s true however that the first cluster appeared more than 20 years later. Those years were marked by a giant breakthrough in hardware development, the emergence of microprocessors and PCs which conquered the market, the accretion of parallel programming concepts and techniques, which eventually lead to the solution to the age-long problem, the problem of each parallel computational facility unicity which was the development of standards for the creation of parallel programs for systems with shared and distributed memory. In addition to that the available solutions in the area of highly efficient systems were very expensive at that time as they implied the use of high performance and specific components. The constant improvement of PC cost/performance ratio should also be taken into consideration. In the light of all those facts the emergence of clusters was inevitable.
  • #18: the problems of parallel computations one should distinguish the following modes of independent program parts execution: Multitasking (time shared) mode. In multitasking mode a single processor is used for carrying out processes. This mode is pseudo-parallel when only one process is active (is being carried out) while the other processes are in the stand-by mode queuing to use the processor. The use of time shared mode can make computations more efficient (e.g. if one of the processes can not be carried out because the data input is expected, the processor can be used for carrying out the process ready for execution - see Tanenbaum (2001)). Such parallel computation effects as the necessity of processes mutual exclusion and synchronization etc also manifest themselves in this mode and as a result this mode can be used for initial preparation of parallel programs; Parallel execution. In case of parallel execution several instructions of data processing can be carried out simultaneously. This computational mode can be provided not only if several processors are available but also by means of pipeline and vector processing devices; Distributed computations. This term is used to denote parallel data processing which involves the use of several processing devices located at a distance from each other. As the data transmission through communication lines among the processing devices leads to considerable time delays, efficient data processing in this computational mode is possible only for parallel algorithms with low intensity of interprocessor data transmission streams. The above mentioned conditions are typical of the computations in multicomputer systems which are created when several separate computers are connected by LAN or WAN communication channels.
  • #19: Classes of parallel computers. Multicore computing: A multicore processor is a processor that includes multiple execution units. These processors differ from superscalar processors, which can issue multiple instructions per cycle from one instruction stream (thread); by contrast, a multicore processor can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar as well—that is, on every cycle, each core can issue multiple instructions from one instruction stream. Symmetric multiprocessing: A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus. Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors. Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that a sufficient amount of memory bandwidth exists. Parallel computers can be classified according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes.
  • #23: Synchronization between tasks is likewise the programmer’s responsibility.
  • #28: Science: global climate modeling; biology (genomics, protein folding, drug design); astrophysical modeling; computational chemistry; computational materials science and nanoscience. Engineering: semiconductor design; earthquake and structural modeling; computational fluid dynamics (airplane design); combustion (engine design); crash simulation. Business: financial and economic modeling; transaction processing, web services and search engines. Defense: nuclear weapons (tested by simulations); cryptography.
  • #32: Vocabulary: rapid pace of change; impediments (i.e., obstacles).