Heterogeneous Computing with OpenCL
Benedict Gaster
Lee Howes
David R. Kaeli
Perhaad Mistry
Dana Schaa
Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrange-
ments with organizations such as the Copyright Clearance Center and the Copyright Licensing
Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods or professional
practices may become necessary. Practitioners and researchers must always rely on their own
experience and knowledge in evaluating and using any information or methods described
herein. In using such information or methods they should be mindful of their own safety and
the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage to persons or property as a matter of product
liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language)
I. Gaster, Benedict.
QA76.642.H48 2012
005.2’752–dc23
2011020169
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-12-387766-6
For information on all MK publications
visit our website at www.mkp.com
Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1
Foreword
For more than two decades, the computer industry has been inspired and motivated
by the observation made by Gordon Moore (a.k.a. "Moore's law") that the density
of transistors on a die doubles roughly every 18 months. This observation created the
anticipation that the performance a given application achieves on one generation of
processors would double within two years, when the next generation of processors
was announced. Constant improvement in manufacturing and processor technologies
was the main driver of this trend, since it allowed each new processor generation
to shrink all of a transistor's dimensions by the "golden factor" of 0.3 (ideal shrink)
and to reduce the power supply accordingly. Thus, each new processor generation
could double the density of transistors and gain a 50% speed improvement (frequency)
while consuming the same power and keeping the same power density. When better
performance was required, computer architects focused on using the extra transistors
to push the frequency beyond what the shrink provided, and to add
new architectural features aimed mainly at improving performance
for existing and new applications.
During the mid 2000s, the transistor size became so small that the "physics of
small devices" started to govern the behavior of the entire chip. Thus, frequency
improvement and density increase could no longer be achieved without
a significant increase in power consumption and power density. A recent report
by the International Technology Roadmap for Semiconductors (ITRS) supports this
observation and indicates that this trend will continue for the foreseeable future,
most likely becoming the most significant factor affecting technology scaling and
the future of computer-based systems.
To cope with the expectation of doubling performance every known period of
time (no longer 2 years), two major changes have occurred: (1) instead of increasing
the frequency, modern processors increase the number of cores on each die. This
trend forces the software to change as well. Since we cannot expect the hardware
to achieve significantly better performance for a given application anymore, we need
to develop new implementations of the same application that take advantage of
the multicore architecture. (2) Thermal and power considerations have become
first-class citizens in the design of any future architecture. These trends encourage
the community to look at heterogeneous solutions: systems assembled from different
subsystems, each of them optimized for a different design point or
workload. For example, many systems combine a "traditional" CPU
architecture with special-purpose FPGAs or graphics processors (GPUs). Such
integration can be done at different levels: at the system level, at the board level,
and recently at the core level.
Developing software for homogeneous parallel and distributed systems is considered
a non-trivial task, even though such development uses well-known paradigms,
well-established programming languages, development methods,
algorithms, debugging tools, and so on. Developing software to support general-purpose
heterogeneous systems is relatively new, and so it is less mature and much more difficult.
As heterogeneous systems become unavoidable, many of the major software
and hardware manufacturers have started creating software environments to support them.
AMD proposed the use of the Brook language, developed at Stanford University,
to handle streaming computations, later extending the software environment to include
Close to Metal (CTM) and the Compute Abstraction Layer (CAL) for accessing
their low-level streaming hardware primitives in order to take advantage of their
highly threaded parallel architecture. NVIDIA took a similar approach, co-designing
their recent generations of GPUs and the CUDA programming environment to take
advantage of the highly threaded GPU environment. Intel proposed extending the use
of multi-core programming to program its Larrabee architecture. IBM proposed
the use of message-passing-based software to take advantage of its heterogeneous,
non-coherent Cell architecture, and FPGA-based solutions integrate libraries
written in VHDL with C- or C++-based programs to achieve the best of both
environments. Each of these programming environments offers scope for benefiting
domain-specific applications, but they all fail to address the requirement for
general-purpose software that can serve different hardware architectures in the way
that, for example, Java code can run on very different ISAs.
The Open Computing Language (OpenCL) was designed to meet this important
need. It is defined and managed by the nonprofit technology consortium, the Khronos
Group. The language and its development environment "borrow" many basic concepts
from very successful, hardware-specific environments such as CUDA, CAL, and
CTM, and blend them to create a hardware-independent software development
environment. It supports different levels of parallelism and efficiently maps to
homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs,
GPUs, FPGAs, and potentially other future devices. To support future devices,
OpenCL defines a set of mechanisms that, if met, allow a device to be seamlessly
included as part of the OpenCL environment. OpenCL also defines run-time support
for managing resources and combining different types of hardware under the
same execution environment; hopefully, in the future it will also allow computations,
power, and other resources such as the memory hierarchy to be balanced dynamically
in a more general manner.
This book is a textbook that aims to teach students how to program heterogeneous
environments. It starts with a very important discussion of how to program
parallel systems and defines the concepts students need to understand
before starting to program any heterogeneous system. It also provides a taxonomy
that can be used to understand the different models used for parallel and distributed
systems. Chapters 2–4 build the students' understanding, step by step, of the
basic structures of OpenCL (Chapter 2), including the host and the device architecture
(Chapter 3). Chapter 4 provides an example that puts these concepts together using a
non-trivial example.
Chapters 5 and 6 extend the concepts learned so far with a better understanding
of the notions of concurrency and run-time execution in OpenCL (Chapter 5) and
the dissection of the CPU and the GPU (Chapter 6). After building the basics,
the book dedicates four chapters (7–10) to more sophisticated examples. These sections
are vital for students to understand that OpenCL can be used for a wide range of
applications beyond any domain-specific mode of operation. The book also
demonstrates how the same program can be run on different platforms, such as
NVIDIA's or AMD's. The book ends with three chapters dedicated to advanced
topics.
There is no doubt that this is a very important book that provides students and
researchers with a better understanding of the world of heterogeneous computers
in general and the solutions provided by OpenCL in particular. The book is well
written, fits students of different experience levels, and so can be used either as a
textbook in a course on OpenCL, or in parts to extend other courses; e.g.,
the first two chapters are well suited to a course on parallel programming, and some
of the examples can be used as part of advanced courses.
Dr. Avi Mendelson
Microsoft R&D Israel
Adjunct Professor, Technion
Preface
OUR HETEROGENEOUS WORLD
Our world is heterogeneous in nature. This kind of diversity provides a richness and
detail that is difficult to describe. At the same time, it provides a level of complexity
and interaction in which a wide range of different entities are optimized for specific
tasks and environments.
In computing, heterogeneous computer systems also add richness by allowing the
programmer to select the best architecture to execute the task at hand or to choose the
right task to make optimal use of a given architecture. These two views of the flex-
ibility of a heterogeneous system both become apparent when solving a computa-
tional problem involves a variety of different tasks. Recently, there has been an
upsurge in the computer design community experimenting with building heteroge-
neous systems. We are seeing new systems on the market that combine a number of
different classes of architectures. What has slowed this progression is the lack
of a standardized programming environment that can manage the diverse set of
resources in a common framework.
OPENCL
OpenCL has been developed specifically to ease the programming burden when writ-
ing applications for heterogeneous systems. OpenCL also addresses the current trend
to increase the number of cores on a given architecture. The OpenCL framework sup-
ports execution on multi-core central processing units, digital signal processors, field
programmable gate arrays, graphics processing units, and heterogeneous accelerated
processing units. The architectures already supported cover a wide range of ap-
proaches to extracting parallelism and efficiency from memory systems and instruc-
tion streams. Such diversity in architectures allows the designer to provide an
optimized solution to his or her problem—a solution that, if designed within the
OpenCL specification, can scale with the growth and breadth of available architec-
tures. OpenCL’s standard abstractions and interfaces allow the programmer to seam-
lessly “stitch” together an application within which execution can occur on a rich set
of heterogeneous devices from one or many manufacturers.
THIS TEXT
Until now, there has not been a single definitive text that can help programmers and
software engineers leverage the power and flexibility of the OpenCL programming
standard. This is our attempt to address this void. With this goal in mind, we have not
attempted to create a syntax guide—there are numerous good sources in which
programmers can find a complete and up-to-date description of OpenCL syntax.
Instead, this text is an attempt to show a developer or student how to leverage
the OpenCL framework to build interesting and useful applications. We provide a
number of examples of real applications to demonstrate the power of this program-
ming standard.
Our hope is that the reader will embrace this new programming framework and
explore the full benefits of heterogeneous computing that it provides. We welcome
comments on how to improve upon this text, and we hope that this text will help you
build your next heterogeneous application.
Acknowledgments
We thank Manju Hegde for proposing the book project, BaoHuong Phan and Todd
Green for their management and input from the AMD and Morgan Kaufmann sides
of the project, and Jay Owen for connecting the participants on this project with each
other. On the technical side, we thank Jay Cornwall for his thorough work editing
much of this text, and we thank Joachim Deguara, Takahiro Harada, Justin Hensley,
Marc Romankewicz, and Byunghyun Jang for their significant contributions to in-
dividual chapters, particularly the sequence of case studies that could not have been
produced without their help. Also instrumental were Jari Nikara, Tomi Aarnio, and
Eero Aho from the Nokia Research Center and Janne Pietiäinen from the Tampere
University of Technology.
About the Authors
Benedict R. Gaster is a software architect working on programming models for
next-generation heterogeneous processors, particularly examining high-level ab-
stractions for parallel programming on the emerging class of processors that contain
both CPUs and accelerators such as GPUs. He has contributed extensively to
OpenCL's design and has represented AMD at the Khronos Group open standards
consortium. He has a Ph.D. in computer science for his work on type systems for
extensible records and variants.
Lee Howes has spent the past 2 years working at AMD and currently focuses on
programming models for the future of heterogeneous computing. His interests lie
in declaratively representing mappings of iteration domains to data and in commu-
nicating complicated architectural concepts and optimizations succinctly to a devel-
oper audience, both through programming model improvements and through
education. He has a Ph.D. in computer science from Imperial College London for
work in this area.
David Kaeli received a B.S. and Ph.D. in electrical engineering from Rutgers Uni-
versity and an M.S. in computer engineering from Syracuse University. He is Asso-
ciate Dean of Undergraduate Programs in the College of Engineering and a Full
Professor on the ECE faculty at Northeastern University, where he directs the North-
eastern University Computer Architecture Research Laboratory (NUCAR). Prior to
joining Northeastern in 1993, he spent 12 years at IBM, the last 7 at T. J. Watson
Research Center, Yorktown Heights, NY. He has co-authored more than 200 criti-
cally reviewed publications. His research spans a range of areas, including micro-
architecture to back-end compilers and software engineering. He leads a number
of research projects in the area of GPU computing. He currently serves as the Chair
of the IEEE Technical Committee on Computer Architecture. He is an IEEE Fellow
and a member of the ACM.
Perhaad Mistry is a Ph.D. candidate at Northeastern University. He received a B.S.
in electronics engineering from the University of Mumbai and an M.S. in computer
engineering from Northeastern University. He is currently a member of the North-
eastern University Computer Architecture Research Laboratory (NUCAR) and is ad-
vised by Dr. David Kaeli. He works on a variety of parallel computing projects. He
has designed scalable data structures for the physics simulations for GPGPU plat-
forms and has also implemented medical reconstruction algorithms for heteroge-
neous devices. His current research focuses on the design of profiling tools for
heterogeneous computing. He is studying the potential of using standards such as
OpenCL for building tools that simplify parallel programming and performance
analysis across the variety of heterogeneous devices available today.
Dana Schaa received a B.S. in computer engineering from California Polytechnic
State University, San Luis Obispo, and an M.S. in electrical and computer engineer-
ing from Northeastern University, where he is also currently a Ph.D. candidate.
His research interests include parallel programming models and abstractions,
particularly for GPU architectures. He has developed GPU-based implementations
of several medical imaging research projects ranging from real-time visualization
to image reconstruction in distributed, heterogeneous environments. He married
his wonderful wife, Jenny, in 2010, and they live together in Boston with their
charming cats.
CHAPTER 1
Introduction to Parallel Programming
INTRODUCTION
Today’s computing environments are becoming more multifaceted, exploiting the
capabilities of a range of multi-core microprocessors, central processing units
(CPUs), digital signal processors, reconfigurable hardware (FPGAs), and graphics
processing units (GPUs). Presented with so much heterogeneity, the process of
developing efficient software for such a wide array of architectures poses a number of
challenges to the programming community.
Applications possess a number of workload behaviors, ranging from control
intensive (e.g., searching, sorting, and parsing) to data intensive (e.g., image
processing, simulation and modeling, and data mining). Applications can also
be characterized as compute intensive (e.g., iterative methods, numerical methods,
and financial modeling), where the overall throughput of the application is heavily
dependent on the computational efficiency of the underlying hardware. Each of
these workload classes typically executes most efficiently on a specific style of
hardware architecture. No single architecture is best for running all classes of
workloads, and most applications possess a mix of the workload characteristics.
For instance, control-intensive applications tend to run faster on superscalar CPUs,
where significant die real estate has been devoted to branch prediction mecha-
nisms, whereas data-intensive applications tend to run fast on vector architectures,
where the same operation is applied to multiple data items concurrently.
OPENCL
The Open Computing Language (OpenCL) is a heterogeneous programming
framework that is managed by the nonprofit technology consortium Khronos
Group. OpenCL is a framework for developing applications that execute across
a range of device types made by different vendors. It supports a wide range of
levels of parallelism and efficiently maps to homogeneous or heterogeneous,
single- or multiple-device systems consisting of CPUs, GPUs, and other types of de-
vices limited only by the imagination of vendors. The OpenCL definition offers both
a device-side language and a host management layer for the devices in a system.
The device-side language is designed to efficiently map to a wide range of memory
systems. The host language aims to support efficient plumbing of complicated
concurrent programs with low overhead. Together, these provide the developer with
a path to efficiently move from algorithm design to implementation.
OpenCL provides parallel computing using task-based and data-based parallel-
ism. It currently supports CPUs that include x86, ARM, and PowerPC, and it has
been adopted into graphics card drivers by both AMD (called the Accelerated
Parallel Processing SDK) and NVIDIA. Support for OpenCL is rapidly expanding
as a wide range of platform vendors have adopted OpenCL and support or plan to
support it for their hardware platforms. These vendors fall within a wide range of
market segments, from the embedded vendors (ARM and Imagination Technologies)
to the HPC vendors (AMD, Intel, NVIDIA, and IBM). The architectures supported
include multi-core CPUs, throughput and vector processors such as GPUs, and fine-
grained parallel devices such as FPGAs.
Most important, OpenCL’s cross-platform, industrywide support makes it an
excellent programming model for developers to learn and use, with the confidence
that it will continue to be widely available for years to come with ever-increasing
scope and applicability.
THE GOALS OF THIS BOOK
This book is the first of its kind to present OpenCL programming in a fashion appro-
priate for the classroom. The book is organized to address the need for teaching par-
allel programming on current system architectures using OpenCL as the target
language, and it includes examples for CPUs, GPUs, and their integration in the ac-
celerated processing unit (APU). Another major goal of this text is to provide a guide
to programmers to develop well-designed programs in OpenCL targeting parallel
systems. The book leads the programmer through the various abstractions and fea-
tures provided by the OpenCL programming environment. The examples offer the
reader a simple introduction and more complicated optimizations, and they suggest
further development and goals at which to aim. It also discusses tools for improving
the development process in terms of profiling and debugging such that the reader
need not feel lost in the development process.
The book is accompanied by a set of instructor slides and programming exam-
ples, which support the use of this text by an OpenCL instructor. Please visit
http://heterogeneouscomputingwithopencl.org/ for additional information.
THINKING PARALLEL
Most applications are first programmed to run on a single processor. In the field
of high-performance computing, classical approaches have been used to accelerate
computation when provided with multiple computing resources. Standard approaches
include “divide-and-conquer” and “scatter–gather” problem decomposition methods,
providing the programmer with a set of strategies to effectively exploit the parallel
resources available in high-performance systems. Divide-and-conquer methods iter-
atively break a problem into subproblems until the subproblems fit well on the com-
putational resources provided. Scatter–gather methods send a subset of the input data
set to each parallel resource and then collect the results of the computation and com-
bine them into a result data set. As before, the partitioning takes account of the size of
the subsets based on the capabilities of the parallel resources. Figure 1.1 shows how
popular applications such as sorting and a vector–scalar multiply can be effectively
mapped to parallel resources to accelerate processing.
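The divide-and-conquer strategy described above can be sketched for the sorting example of Figure 1.1. This is only an illustrative sketch in plain C++ (the book's own examples use OpenCL, which is introduced later); the cutoff value and the use of std::async are assumptions for illustration, not part of the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Divide-and-conquer sort over v[lo, hi): recursively split the range
// until a subproblem is small enough to solve directly, sort the two
// halves in parallel, and merge the results (the "combine" step).
void parallel_sort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    const std::size_t kCutoff = 1024;  // below this, solve serially
    if (hi - lo <= kCutoff) {
        std::sort(v.begin() + lo, v.begin() + hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async,
                           [&] { parallel_sort(v, lo, mid); });
    parallel_sort(v, mid, hi);  // sort the right half on this thread
    left.wait();
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}
```

The recursion depth, and hence the number of parallel tasks, is chosen implicitly by the cutoff: in practice this value would be tuned to the capabilities of the parallel resources, as the text notes.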
The programming task becomes increasingly challenging when faced with the
growing parallelism and heterogeneity present in contemporary parallel processors.
Given the power and thermal limits of complementary metal-oxide semiconductor
(CMOS) technology, microprocessor vendors find it difficult to scale the frequency
of these devices to derive more performance and have instead decided to place mul-
tiple processors, sometimes specialized, on a single chip. In doing so, the problem of
extracting parallelism from an application is left to the programmer, who must de-
compose the underlying algorithms in the applications and map them efficiently to a
diverse variety of target hardware platforms.
In the past 5 years, parallel computing devices have been increasing in number
and processing capabilities. GPUs have also appeared on the computing scene and
FIGURE 1.1
(A) Simple sorting and (B) dot product examples.
are providing new levels of processing capability at very low cost. Driven by the
demand for real-time three-dimensional graphics rendering, a highly data-parallel
problem, GPUs have evolved rapidly as very powerful, fully programmable, task
and data-parallel architectures. Hardware manufacturers are now combining CPUs
and GPUs on a single die, ushering in a new generation of heterogeneous computing.
Compute-intensive and data-intensive portions of a given application, called kernels,
may be offloaded to the GPU, providing significant performance per watt and raw
performance gains, while the host CPU continues to execute nonkernel tasks.
Many systems and phenomena in both the natural world and the man-made world
present us with different classes of parallelism and concurrency:
• Molecular dynamics
• Weather and ocean patterns
• Multimedia systems
• Tectonic plate drift
• Cell growth
• Automobile assembly lines
• Sound and light wave propagation
Parallel computing, as defined by Almasi and Gottlieb (1989), is “a form of compu-
tation in which many calculations are carried out simultaneously, operating on the
principle that large problems can often be divided into smaller ones, which are then
solved concurrently (i.e., in parallel).” The degree of parallelism that can be achieved
is dependent on the inherent nature of the problem at hand (remember that there ex-
ists significant parallelism in the world), and the skill of the algorithm or software
designer is to identify the different forms of parallelism present in the underlying
problem. We begin with a discussion of two simple examples to demonstrate inher-
ent parallel computation: vector multiplication and text searching.
Our first example carries out multiplication of the elements of two arrays A and B,
each with N elements, storing the result of each multiply in a corresponding array C.
Figure 1.2 shows the computation we would like to carry out. The serial C++
program for this code would look as follows:

for (int i = 0; i < N; i++)
    C[i] = A[i] * B[i];
This code possesses significant parallelism but very little arithmetic intensity. The
computation of every element in C is independent of every other element. If we were
to parallelize this code, we could choose to generate a separate execution instance to
perform the computation of each element of C. This code possesses significant data-
level parallelism because the same operation is applied across all of A and B to pro-
duce C. We could also view this breakdown as a simple form of task parallelism
where each task operates on a subset of the same data; however, task parallelism gen-
eralizes further to execution on pipelines of data or even more sophisticated parallel
interactions. Figure 1.3 shows an example of task parallelism in a pipeline to support
filtering of images in frequency space using an FFT.
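The element-wise decomposition just described can be made concrete with host threads. This is an illustrative sketch using C++ std::thread rather than OpenCL (whose syntax is introduced in Chapter 2); each worker owns a disjoint slice of C, so the computations need no communication, mirroring Figure 1.2.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Multiply arrays A and B element-wise into C, splitting the index
// range across num_threads workers. Each worker writes a disjoint
// slice of C, so no synchronization between workers is needed.
void parallel_multiply(const float* A, const float* B, float* C,
                       std::size_t N, unsigned num_threads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (N + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(N, begin + chunk);
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i)
                C[i] = A[i] * B[i];
        });
    }
    for (auto& w : workers) w.join();
}
```

On a device with many parallel lanes, one could instead launch one execution instance per element, as the text suggests; the chunked form shown here is the typical compromise on a multi-core CPU.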
Let us consider a second example. The computation we are trying to carry out is
to find the number of occurrences of a string of characters in a body of text
(Figure 1.4). Assume that the body of text has already been parsed into a set of N
words. We could choose to divide the task of comparing the string against the N po-
tential matches into N comparisons (i.e., tasks), where each string of characters is
matched against the text string. This approach, although rather naïve in terms of
search efficiency, is highly parallel. The process of the text string being compared
against the set of potential words presents N parallel tasks, each carrying out the same
FIGURE 1.2
Multiplying two arrays: This example provides for parallel computation without any need for
communication.
FIGURE 1.3
Filtering a series of images using an FFT shows clear task parallelism as a series of tasks
operate together in a pipeline to compute the overall result.
set of operations. There is even further parallelism within a single comparison task,
where the matching on a character-by-character basis presents a finer-grained degree
of parallelism. This example exhibits both data-level parallelism (we are going to be
performing the same operation on multiple data items) and task-level parallelism (we
can compare the string to all words concurrently).
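The naive task-level decomposition above, one comparison task per word, can be sketched as follows. This is an illustrative C++ sketch, not the book's OpenCL code; launching a real thread per word is deliberately literal here, and a practical implementation would batch the tasks.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Count occurrences of `pattern` in `words`, launching one task per
// word as in the naive decomposition described above. Each comparison
// is independent; the shared atomic counter is the only point of
// contact between tasks.
int count_matches(const std::vector<std::string>& words,
                  const std::string& pattern) {
    std::atomic<int> matches{0};
    std::vector<std::thread> tasks;
    for (const auto& word : words) {
        tasks.emplace_back([&matches, &word, &pattern] {
            if (word == pattern)  // per-task string compare
                matches.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : tasks) t.join();
    return matches.load();
}
```

The finer-grained character-by-character parallelism mentioned in the text would live inside the per-task compare; here that level is left to the string equality operator.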
Once the number of matches is determined, we need to accumulate them to provide
the total number of occurrences. Again, this summing can exploit parallelism. In this
step, we introduce the concept of "reduction," where we can utilize the availability of
parallel resources to combine partial sums in a very efficient manner. Figure 1.5 shows
the reduction tree, which illustrates this summation process in log N steps.
CONCURRENCY AND PARALLEL PROGRAMMING MODELS
Here, we discuss concurrency and parallel processing models so that when attempt-
ing to map an application developed in OpenCL to a parallel platform, we can select
the right model to pursue. Although all of the following models can be supported in
OpenCL, the underlying hardware may restrict which model will be practical to use.
FIGURE 1.4
An example of both task-level and data-level parallelism. We can have parallel tasks that
count the occurrences of a string in a body of text. The lower portion of the figure shows that the
string comparison can be broken down to finer-grained character-by-character parallel processing.
Concurrency is concerned with two or more activities happening at the same
time. We find concurrency in the real world all the time—for example, carrying a
child in one arm while crossing a road or, more generally, thinking about something
while doing something else with one’s hands.
When talking about concurrency in terms of computer programming, we mean a
single system performing multiple tasks independently. Although it is possible that
concurrent tasks may be executed at the same time (i.e., in parallel), this is not a re-
quirement. For example, consider a simple drawing application, which is either re-
ceiving input from the user via the mouse and keyboard or updating the display with
the current image. Conceptually, receiving and processing input are different oper-
ations (i.e., tasks) from updating the display. These tasks can be expressed in terms of
concurrency, but they do not need to be performed in parallel. In fact, in the case in
which they are executing on a single core of a CPU, they cannot be performed in
parallel. In this case, the application or the operating system should switch between
the tasks, allowing both some time to run on the core.
Parallelism is concerned with running two or more activities in parallel with the
explicit goal of increasing overall performance. For example, consider the following
assignments:
FIGURE 1.5
After all string comparisons are completed, we can sum up the number of matches in a
combining (reduction) network.
step 1) A = B + C
step 2) D = E + G
step 3) R = A + D
The assignments of A and D in steps 1 and 2 (respectively) are said to be independent
of each other because there is no data flow between these two steps (i.e., the variables
E and G on the right side of step 2 do not appear on the left side of step 1, and vice versa,
the variables B and C on the right sides of step 1 do not appear on the left side of
step 2.). Also the variable on the left side of step 1 (A) is not the same as the variable
on the left side of step 2 (D). This means that steps 1 and 2 can be executed in parallel
(i.e., at the same time). Step 3 is dependent on both steps 1 and 2, so it cannot be
executed in parallel with either step 1 or step 2.
Parallel programs must be concurrent, but concurrent programs need not be parallel.
Although many concurrent programs can be executed in parallel, interdependencies
between concurrent tasks may preclude this. For example, an interleaved execution would
still satisfy the definition of concurrency while not executing in parallel. As a result, only
a subset of concurrent programs are parallel, and the set of all concurrent programs is
itself a subset of all programs. Figure 1.6 shows this relationship.
In the remainder of this section, some well-known approaches to programming con-
current and parallel systems are introduced with the aim of providing a foundation
before introducing OpenCL in Chapter 2.
FIGURE 1.6
Parallel and concurrent programs are subsets of all programs.
Threads and Shared Memory
A running program may consist of multiple subprograms that maintain their own in-
dependent control flow and that are allowed to run concurrently. These subprograms
are defined as threads. Communication between threads is via updates and access to
memory appearing in the same address space. Each thread has its own pool of local
memory—that is, variables—but all threads see the same set of global variables. A
simple analogy that can be used to describe the use of threads is the concept of a main
program that includes a number of subroutines. The main program is scheduled to
run by the operating system and performs necessary loading and acquisition of sys-
tem and user resources to run. Execution of the main program begins by performing
some serial work and then continues by creating a number of tasks that can be sched-
uled and run by the operating system concurrently using threads.
Each thread benefits from a global view of memory because it shares the same
memory address space of the main program. Threads communicate with each other
through global memory. This can require synchronization constructs to ensure that
more than one thread is not updating the same global address.
A memory consistency model is defined to manage load and store ordering. All
processors see the same address space and have direct access to these addresses without
the help of other processors. Mechanisms such as locks/semaphores are commonly
used to control access to shared memory that is accessed by multiple tasks. A key
feature of the shared memory model is the fact that the programmer is not responsible
for managing data movement, although depending on the consistency model imple-
mented in the hardware or runtime system, some level of memory consistency may
have to be enforced manually. This relaxes the requirement to specify explicitly the
communication of data between tasks, and as a result, parallel code development can
often be simplified.
There is a significant cost to supporting a fully consistent shared memory model in
hardware. For multiprocessor systems, the hardware structures required to support
this model become a limiting factor. Shared buses become bottlenecks in the design.
The extra hardware required typically grows exponentially in terms of its complexity
as we attempt to add additional processors. This has slowed the introduction of multi-
core and multiprocessor systems at the low end, and it has limited the number of cores
working together in a consistent shared memory system to relatively low numbers
because shared buses and coherence protocol overheads become bottlenecks. More
relaxed shared memory systems scale further, although in all cases scaling shared
memory systems comes at the cost of complicated and expensive interconnects.
Most multi-core CPU platforms support shared memory in one form or another.
OpenCL supports execution on shared memory devices.
Message-Passing Communication
The message-passing communication model enables explicit intercommunication of
a set of concurrent tasks that may use memory during computation. Multiple tasks
can reside on the same physical device and/or across an arbitrary number of devices.
Tasks exchange data through communications by sending and receiving explicit
messages. Data transfer usually requires cooperative operations to be performed
by each process. For example, a send operation must have a matching receive
operation.
From a programming perspective, message-passing implementations commonly
comprise a library of hardware-independent routines for sending and receiving mes-
sages. The programmer is responsible for explicitly managing communication be-
tween tasks. Historically, a variety of message-passing libraries have been
available since the 1980s. MPI is currently the most popular message-passing mid-
dleware. These implementations differ substantially from each other, making it dif-
ficult for programmers to develop portable applications.
Different Grains of Parallelism
In parallel computing, granularity is a measure of the ratio of computation to com-
munication. Periods of computation are typically separated from periods of commu-
nication by synchronization events. The grain of parallelism is constrained by the
inherent characteristics of the algorithms constituting the application. It is important
that the parallel programmer selects the right granularity in order to reap the full ben-
efits of the underlying platform because choosing the right grain size can help to ex-
pose additional degrees of parallelism. Sometimes this selection is referred to as
“chunking,” determining the amount of data to assign to each task. Selecting the right
chunk size can help provide for further acceleration on parallel hardware. Next, we
consider some of the trade-offs associated with identifying the right grain size.
• Fine-grained parallelism
• Low arithmetic intensity.
• May not have enough work to hide long-duration asynchronous communication.
• Facilitates load balancing by providing a larger number of more manageable
(i.e., smaller) work units.
• If the granularity is too fine, it is possible that the overhead required for com-
munication and synchronization between tasks can actually produce a slower
parallel implementation than the original serial execution.
• Coarse-grained parallelism
• High arithmetic intensity.
• Complete applications can serve as the grain of parallelism.
• More difficult to load balance efficiently.
Given these trade-offs, which granularity will lead to the best implementation? The
most efficient granularity is dependent on the algorithm and the hardware environ-
ment in which it is run. In most cases, if the overhead associated with communication
and synchronization is high relative to the time of the computation task at hand, it
will generally be advantageous to work at a coarser granularity. Fine-grained paral-
lelism can help reduce overheads due to load imbalance or memory delays (this is
particularly true on a GPU, which depends on zero-overhead fine-grained thread
switching to hide memory latencies). Fine-grained parallelism can even occur at an
instruction level (this approach is used in very long instruction word (VLIW) and
superscalar architectures).
Data Sharing and Synchronization
Consider the case in which two applications run that do not share any data. As long as
the runtime system or operating system has access to adequate execution resources,
they can be run concurrently and even in parallel. If halfway through the execution of
one application it generated a result that was subsequently required by the second
application, then we would have to introduce some form of synchronization
into the system, and parallel execution—at least across the synchronization
point—becomes impossible.
When writing concurrent software, data sharing and synchronization play a crit-
ical role. Examples of data sharing in concurrent programs include
• the input of a task is dependent on the result of another task—for example, in a
producer/consumer or pipeline execution model; and
• when intermediate results are combined together (e.g., as part of a reduction, as in
our word search example shown in Figure 1.4).
Ideally, we would only attempt to parallelize portions of an application that are void
of data dependencies, but this is not always possible. Explicit synchronization prim-
itives such as barriers or locks may be used to support synchronization when neces-
sary. Although we only raise this issue here, later chapters revisit this question when
support for communication between host and device programs or when synchroni-
zation between tasks is required.
STRUCTURE
The remainder of the book is organized as follows:
Chapter 2 presents an introduction to OpenCL, including key concepts such as
kernels, platforms, and devices; the four different abstraction models; and devel-
oping your first OpenCL kernel. Understanding these different models is critical
to fully appreciate the richness of OpenCL’s programming model.
Chapter 3 presents some of the architectures that OpenCL does or might target,
including x86 CPUs, GPUs, and APUs. The text includes discussion of different
styles of architectures, including single instruction multiple data and VLIW. This
chapter also covers the concepts of multi-core and throughput-oriented systems,
as well as advances in heterogeneous architectures.
Chapter 4 introduces basic matrix multiplication, image rotation, and convolution
implementations to help the reader learn OpenCL by example.
Chapter 5 discusses concurrency and execution in the OpenCL programming
model. In this chapter, we discuss kernels, work items, and the OpenCL execu-
tion and memory hierarchies. We also show how queuing and synchronization
work in OpenCL such that the reader gains an understanding of how to write
OpenCL programs that interact with memory correctly.
Chapter 6 shows how OpenCL maps to an example architecture. For this study,
we choose a system comprising an AMD Phenom II CPU and an AMD Radeon
HD6970 GPU. This chapter allows us to show how the mappings of the OpenCL
programming model for largely serial architectures such as CPUs and vector/
throughput architectures such as GPUs differ, giving some idea how to optimize
for specific architectural styles.
Chapter 7 presents a case study that accelerates a convolution algorithm. Issues
related to memory space utilization and efficiency are considered, as well as work
item scheduling, wavefront occupancy, and overall efficiency. These techniques
are the foundations necessary for developing high-performance code using OpenCL.
Chapter 8 presents a case study targeting video processing, utilizing OpenCL to
build performant image processing effects that can be applied to video streams.
Chapter 9 presents another case study examining how to optimize the perfor-
mance of a histogramming application. In particular, it highlights how careful
design of workgroup size and memory access patterns can make a vast difference
to performance in memory-bound applications such as histograms.
Chapter 10 discusses how to leverage a heterogeneous CPU–GPU environment.
The target application is a mixed particle simulation (as illustrated on the cover of
this book) in which work is distributed across both the CPU and the GPU depend-
ing on the grain size of particles in the system.
Chapter 11 shows how to use OpenCL extensions using the device fission and
double precision extensions as examples.
Chapter 12 introduces the reader to debugging and analyzing OpenCL programs.
The right debugging tool can save a developer hundreds of wasted hours,
allowing him or her instead to learn the specific computer language and solve
the problem at hand.
Chapter 13 provides an overview and performance trade-offs of WebCL. WebCL
is not yet a product, but imagine what could be possible if the web were powered
by OpenCL. This chapter describes example OpenCL bindings for JavaScript,
discussing an implementation of a Firefox plug-in that allows web applications
to leverage the powerful parallel computing capabilities of modern CPUs and
GPUs.
Reference
Almasi, G. S., & Gottlieb, A. (1989). Highly Parallel Computing. Redwood City, CA: Benjamin Cummings.
Further Reading and Relevant Websites
Chapman, B., Jost, G., van der Pas, R., & Kuck, D. J. (2007). Using OpenMP: Portable Shared
Memory Parallel Programming. Cambridge, MA: MIT Press.
Duffy, J. (2008). Concurrent Programming on Windows. Upper Saddle River, NJ: Addison-
Wesley.
Gropp, W., Lusk, E., & Skjellum, A. (1994). Using MPI: Portable Parallel Programming with
the Message-Passing Interface. MIT Press Scientific and Engineering Computation
Series. Cambridge, MA: MIT Press.
Herlihy, M., & Shavit, N. (2008). The Art of Multiprocessor Programming. Burlington, MA:
Morgan Kaufmann.
Khronos Group. OpenCL. www.khronos.org/opencl.
Mattson, T. G., Sanders, B. A., & Massingill, B. L. (2004). Patterns for Parallel Programming.
Upper Saddle River, NJ: Addison-Wesley.
NVIDIA. CUDA Zone. http://www.nvidia.com/object/cuda_home_new.html.
AMD. OpenCL Zone. http://developer.amd.com/openclzone.
CHAPTER 2
Introduction to OpenCL
INTRODUCTION
This chapter introduces OpenCL, the programming fabric that will allow us to weave
our application to execute concurrently. Programmers familiar with C and C++
should have little trouble understanding the OpenCL syntax. We begin by reviewing
the OpenCL standard.
The OpenCL Standard
Open programming standards designers are tasked with a very challenging objective:
arrive at a common set of programming standards that are acceptable to a range of
competing needs and requirements. The Khronos consortium that manages the
OpenCL standard has done a good job addressing these requirements. The consor-
tium has developed an applications programming interface (API) that is general
enough to run on significantly different architectures while being adaptable enough
that each hardware platform can still obtain high performance. Using the core lan-
guage and correctly following the specification, any program designed for one ven-
dor can execute on another’s hardware. The model set forth by OpenCL creates
portable, vendor- and device-independent programs that are capable of being accel-
erated on many different hardware platforms.
The OpenCL API is a C API, with a C++ wrapper API that is defined in terms of the C
API. There are third-party bindings for many languages, including Java, Python, and
.NET. The code that executes on an OpenCL device, which in general is not the same
device as the host CPU, is written in the OpenCL C language. OpenCL C is a
restricted version of the C99 language with extensions appropriate for executing
data-parallel code on a variety of heterogeneous devices.
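For a flavor of OpenCL C, the device-side kernel below performs elementwise vector addition (a minimal illustrative kernel, not from the text; the host code to compile and launch kernels appears in later chapters). Note the __kernel and __global qualifiers and the get_global_id built-in, which are among the extensions to C99.

```c
// OpenCL C device code: each work-item adds one element pair,
// selected by its global work-item index.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    int gid = get_global_id(0);   // this work-item's index in dimension 0
    c[gid] = a[gid] + b[gid];
}
```

Because this is device source, it is compiled at runtime by the OpenCL driver rather than by the host compiler.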
The OpenCL Specification
The OpenCL specification is defined in four parts, called models, that can be sum-
marized as follows:
Heterogeneous Computing with OpenCL
© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.
1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
Heterogeneous Computing with OpenCL 1st Edition Perhaad Mistry and Dana Schaa (Auth.)

  • 6. Heterogeneous Computing with OpenCL
Benedict Gaster
Lee Howes
David R. Kaeli
Perhaad Mistry
Dana Schaa
  • 7. Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of product liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language) I. Gaster, Benedict.
QA76.642.H48 2012
005.2’752–dc23
2011020169

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-387766-6

For information on all MK publications visit our website at www.mkp.com

Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1
  • 8. Foreword

For more than two decades, the computer industry has been inspired and motivated by the observation made by Gordon Moore (a.k.a. “Moore’s law”) that the density of transistors on die was doubling every 18 months. This observation created the anticipation that the performance a certain application achieves on one generation of processors would be doubled within two years, when the next generation of processors was announced. Constant improvement in manufacturing and processor technologies was the main drive of this trend, since it allowed any new processor generation to shrink all the transistors’ dimensions within the “golden factor”, 0.3 (ideal shrink), and to reduce the power supply accordingly. Thus, any new processor generation could double the density of transistors, to gain 50% speed improvement (frequency) while consuming the same power and keeping the same power density. When better performance was required, computer architects focused on using the extra transistors for pushing the frequency beyond what the shrink provided, and for adding new architectural features that mainly aim at gaining performance improvement for existing and new applications.

During the mid 2000s, the transistor size became so small that the “physics of small devices” started to govern the characterization of the entire chip. Thus frequency improvement and density increase could not be achieved anymore without a significant increase of power consumption and of power density. A recent report by the International Technology Roadmap for Semiconductors (ITRS) supports this observation and indicates that this trend will continue for the foreseeable future, and it will most likely become the most significant factor affecting technology scaling and the future of computer-based systems.

To cope with the expectation of doubling the performance every known period of time (not two years anymore), two major changes happened: (1) instead of increasing the frequency, modern processors increase the number of cores on each die. This trend forces the software to be changed as well. Since we cannot expect the hardware to achieve significantly better performance for a given application anymore, we need to develop new implementations of the same application that will take advantage of the multicore architecture. And (2) thermal and power become first-class citizens in any design of future architectures. These trends encourage the community to start looking at heterogeneous solutions: systems assembled from different subsystems, each of them optimized to achieve different optimization points or to address different workloads. For example, many systems combine “traditional” CPU architecture with special-purpose FPGAs or graphics processors (GPUs). Such an integration can be done at different levels; e.g., at the system level, at the board level, and recently at the core level.

Developing software for homogeneous parallel and distributed systems is considered to be a non-trivial task, even though such development uses well-known paradigms and well-established programming languages, development methods, algorithms, debugging tools, etc. Developing software to support general-purpose heterogeneous systems is relatively new, and so less mature and much more difficult. As heterogeneous systems are becoming unavoidable, many of the major software and hardware manufacturers have started creating software environments to support them. AMD proposed the use of the Brook language, developed at Stanford University, to handle streaming computations, later extending the software environment to include Close to Metal (CTM) and the Compute Abstraction Layer (CAL) for accessing their low-level streaming hardware primitives in order to take advantage of their highly threaded parallel architecture. NVIDIA took a similar approach, co-designing their recent generations of GPUs and the CUDA programming environment to take advantage of the highly threaded GPU environment. Intel proposed to extend the use of multi-core programming to program their Larrabee architecture. IBM proposed the use of message-passing-based software in order to take advantage of its heterogeneous, non-coherent Cell architecture, and FPGA-based solutions integrate libraries written in VHDL with C or C++ based programs to achieve the best of two environments. Each of these programming environments offers scope for benefiting domain-specific applications, but they all failed to address the requirement for general-purpose software that can serve different hardware architectures in the way that, for example, Java code can run on very different ISA architectures.

The Open Computing Language (OpenCL) was designed to meet this important need. It was defined and is managed by the nonprofit technology consortium Khronos. The language and its development environment “borrow” many of their basic concepts from very successful, hardware-specific environments such as CUDA, CAL, and CTM, and blend them to create a hardware-independent software development environment. It supports different levels of parallelism and efficiently maps to homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs, GPUs, FPGAs, and potentially other future devices. In order to support future devices, OpenCL defines a set of mechanisms that, if met, allow a device to be seamlessly included as part of the OpenCL environment. OpenCL also defines run-time support that allows managing the resources and combining different types of hardware under the same execution environment; hopefully, in the future it will allow dynamically balancing computations, power, and other resources such as the memory hierarchy in a more general manner.

This book is a textbook that aims to teach students how to program heterogeneous environments. The book starts with a very important discussion on how to program parallel systems and defines the concepts the students need to understand before starting to program any heterogeneous system. It also provides a taxonomy that can be used for understanding the different models used for parallel and distributed systems. Chapters 2–4 build the students’ step-by-step understanding of the basic structures of OpenCL (Chapter 2), including the host and the device architecture (Chapter 3). Chapter 4 provides an example that puts together these concepts using a non-trivial example.

Chapters 5 and 6 extend the concepts learned so far with a better understanding of the notions of concurrency and run-time execution in OpenCL (Chapter 5) and the dissection between the CPU and the GPU (Chapter 6). After building the basics, the book dedicates four chapters (7–10) to more sophisticated examples. These sections are vital for students to understand that OpenCL can be used for a wide range of applications, which are beyond any domain-specific mode of operation. The book also demonstrates how the same program can be run on different platforms, such as Nvidia or AMD. The book ends with three chapters which are dedicated to advanced topics.

No doubt this is a very important book that provides students and researchers with a better understanding of the world of heterogeneous computers in general and the solutions provided by OpenCL in particular. The book is well written and fits students of different experience levels, and so it can be used either as a textbook in a course on OpenCL, or different parts of the book can be used to extend other courses; e.g., the first two chapters are well fitted for a course on parallel programming, and some of the examples can be used as part of advanced courses.

Dr. Avi Mendelson
Microsoft R&D Israel
Adjunct Professor, Technion
  • 11. Preface OUR HETEROGENEOUS WORLD Our world is heterogeneous in nature. This kind of diversity provides a richness and detail that is difficult to describe. At the same time, it provides a level of complexity and interaction in which a wide range of different entities are optimized for specific tasks and environments. In computing, heterogeneous computer systems also add richness by allowing the programmer to select the best architecture to execute the task at hand or to choose the right task to make optimal use of a given architecture. These two views of the flex- ibility of a heterogeneous system both become apparent when solving a computa- tional problem involves a variety of different tasks. Recently, there has been an upsurge in the computer design community experimenting with building heteroge- neous systems. We are seeing new systems on the market that combine a number of different classes of architectures. What has slowed this progression has been a lack of standardized programming environment that can manage the diverse set of resources in a common framework. OPENCL OpenCL has been developed specifically to ease the programming burden when writ- ing applications for heterogeneous systems. OpenCL also addresses the current trend to increase the number of cores on a given architecture. The OpenCL framework sup- ports execution on multi-core central processing units, digital signal processors, field programmable gate arrays, graphics processing units, and heterogeneous accelerated processing units. The architectures already supported cover a wide range of ap- proaches to extracting parallelism and efficiency from memory systems and instruc- tion streams. Such diversity in architectures allows the designer to provide an optimized solution to his or her problem—a solution that, if designed within the OpenCL specification, can scale with the growth and breadth of available architec- tures. 
OpenCL’s standard abstractions and interfaces allow the programmer to seam- lessly “stitch” together an application within which execution can occur on a rich set of heterogeneous devices from one or many manufacturers. THIS TEXT Until now, there has not been a single definitive text that can help programmers and software engineers leverage the power and flexibility of the OpenCL programming standard. This is our attempt to address this void. With this goal in mind, we have not attempted to create a syntax guide—there are numerous good sources in which programmers can find a complete and up-to-date description of OpenCL syntax. xi
Instead, this text is an attempt to show a developer or student how to leverage the OpenCL framework to build interesting and useful applications. We provide a number of examples of real applications to demonstrate the power of this programming standard. Our hope is that the reader will embrace this new programming framework and explore the full benefits of heterogeneous computing that it provides. We welcome comments on how to improve upon this text, and we hope that this text will help you build your next heterogeneous application.
Acknowledgments

We thank Manju Hegde for proposing the book project, BaoHuong Phan and Todd Green for their management and input from the AMD and Morgan Kaufmann sides of the project, and Jay Owen for connecting the participants on this project with each other. On the technical side, we thank Jay Cornwall for his thorough work editing much of this text, and we thank Joachim Deguara, Takahiro Harada, Justin Hensley, Marc Romankewicz, and Byunghyun Jang for their significant contributions to individual chapters, particularly the sequence of case studies that could not have been produced without their help. Also instrumental were Jari Nikara, Tomi Aarnio, and Eero Aho from the Nokia Research Center and Janne Pietiäinen from the Tampere University of Technology.
About the Authors

Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, particularly examining high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. He has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. He has a Ph.D. in computer science for his work on type systems for extensible records and variants.

Lee Howes has spent the past 2 years working at AMD and currently focuses on programming models for the future of heterogeneous computing. His interests lie in declaratively representing mappings of iteration domains to data and in communicating complicated architectural concepts and optimizations succinctly to a developer audience, both through programming model improvements and through education. He has a Ph.D. in computer science from Imperial College London for work in this area.

David Kaeli received a B.S. and Ph.D. in electrical engineering from Rutgers University and an M.S. in computer engineering from Syracuse University. He is Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, he spent 12 years at IBM, the last 7 at T. J. Watson Research Center, Yorktown Heights, NY. He has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He currently serves as the Chair of the IEEE Technical Committee on Computer Architecture. He is an IEEE Fellow and a member of the ACM.

Perhaad Mistry is a Ph.D.
candidate at Northeastern University. He received a B.S. in electronics engineering from the University of Mumbai and an M.S. in computer engineering from Northeastern University. He is currently a member of the Northeastern University Computer Architecture Research Laboratory (NUCAR) and is advised by Dr. David Kaeli. He works on a variety of parallel computing projects. He has designed scalable data structures for the physics simulations for GPGPU platforms and has also implemented medical reconstruction algorithms for heterogeneous devices. His current research focuses on the design of profiling tools for heterogeneous computing. He is studying the potential of using standards such as OpenCL for building tools that simplify parallel programming and performance analysis across the variety of heterogeneous devices available today.
Dana Schaa received a B.S. in computer engineering from California Polytechnic State University, San Luis Obispo, and an M.S. in electrical and computer engineering from Northeastern University, where he is also currently a Ph.D. candidate. His research interests include parallel programming models and abstractions, particularly for GPU architectures. He has developed GPU-based implementations of several medical imaging research projects ranging from real-time visualization to image reconstruction in distributed, heterogeneous environments. He married his wonderful wife, Jenny, in 2010, and they live together in Boston with their charming cats.
CHAPTER 1

Introduction to Parallel Programming

INTRODUCTION

Today’s computing environments are becoming more multifaceted, exploiting the capabilities of a range of multi-core microprocessors, central processing units (CPUs), digital signal processors, reconfigurable hardware (FPGAs), and graphics processing units (GPUs). Presented with so much heterogeneity, the process of developing efficient software for such a wide array of architectures poses a number of challenges to the programming community.

Applications possess a number of workload behaviors, ranging from control intensive (e.g., searching, sorting, and parsing) to data intensive (e.g., image processing, simulation and modeling, and data mining). Applications can also be characterized as compute intensive (e.g., iterative methods, numerical methods, and financial modeling), where the overall throughput of the application is heavily dependent on the computational efficiency of the underlying hardware. Each of these workload classes typically executes most efficiently on a specific style of hardware architecture. No single architecture is best for running all classes of workloads, and most applications possess a mix of the workload characteristics. For instance, control-intensive applications tend to run faster on superscalar CPUs, where significant die real estate has been devoted to branch prediction mechanisms, whereas data-intensive applications tend to run fast on vector architectures, where the same operation is applied to multiple data items concurrently.

OPENCL

The Open Computing Language (OpenCL) is a heterogeneous programming framework that is managed by the nonprofit technology consortium Khronos Group. OpenCL is a framework for developing applications that execute across a range of device types made by different vendors.
It supports a wide range of levels of parallelism and efficiently maps to homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs, GPUs, and other types of devices limited only by the imagination of vendors. The OpenCL definition offers both a device-side language and a host management layer for the devices in a system.

Heterogeneous Computing with OpenCL © 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
The device-side language is designed to efficiently map to a wide range of memory systems. The host language aims to support efficient plumbing of complicated concurrent programs with low overhead. Together, these provide the developer with a path to efficiently move from algorithm design to implementation.

OpenCL provides parallel computing using task-based and data-based parallelism. It currently supports CPUs that include x86, ARM, and PowerPC, and it has been adopted into graphics card drivers by both AMD (called the Accelerated Parallel Processing SDK) and NVIDIA. Support for OpenCL is rapidly expanding as a wide range of platform vendors have adopted OpenCL and support or plan to support it for their hardware platforms. These vendors fall within a wide range of market segments, from the embedded vendors (ARM and Imagination Technologies) to the HPC vendors (AMD, Intel, NVIDIA, and IBM). The architectures supported include multi-core CPUs, throughput and vector processors such as GPUs, and fine-grained parallel devices such as FPGAs. Most important, OpenCL’s cross-platform, industrywide support makes it an excellent programming model for developers to learn and use, with the confidence that it will continue to be widely available for years to come with ever-increasing scope and applicability.

THE GOALS OF THIS BOOK

This book is the first of its kind to present OpenCL programming in a fashion appropriate for the classroom. The book is organized to address the need for teaching parallel programming on current system architectures using OpenCL as the target language, and it includes examples for CPUs, GPUs, and their integration in the accelerated processing unit (APU). Another major goal of this text is to provide a guide to programmers to develop well-designed programs in OpenCL targeting parallel systems. The book leads the programmer through the various abstractions and features provided by the OpenCL programming environment.
The examples offer the reader a simple introduction and more complicated optimizations, and they suggest further development and goals at which to aim. It also discusses tools for improving the development process in terms of profiling and debugging such that the reader need not feel lost in the development process.

The book is accompanied by a set of instructor slides and programming examples, which support the use of this text by an OpenCL instructor. Please visit http://heterogeneouscomputingwithopencl.org/ for additional information.

THINKING PARALLEL

Most applications are first programmed to run on a single processor. In the field of high-performance computing, classical approaches have been used to accelerate computation when provided with multiple computing resources. Standard approaches
include “divide-and-conquer” and “scatter–gather” problem decomposition methods, providing the programmer with a set of strategies to effectively exploit the parallel resources available in high-performance systems. Divide-and-conquer methods iteratively break a problem into subproblems until the subproblems fit well on the computational resources provided. Scatter–gather methods send a subset of the input data set to each parallel resource and then collect the results of the computation and combine them into a result data set. As before, the partitioning takes account of the size of the subsets based on the capabilities of the parallel resources. Figure 1.1 shows how popular applications such as sorting and a vector–scalar multiply can be effectively mapped to parallel resources to accelerate processing.

FIGURE 1.1 (A) Simple sorting and (B) dot product examples.

The programming task becomes increasingly challenging when faced with the growing parallelism and heterogeneity present in contemporary parallel processors. Given the power and thermal limits of complementary metal-oxide semiconductor (CMOS) technology, microprocessor vendors find it difficult to scale the frequency of these devices to derive more performance and have instead decided to place multiple processors, sometimes specialized, on a single chip. In doing so, the problem of extracting parallelism from an application is left to the programmer, who must decompose the underlying algorithms in the applications and map them efficiently to a diverse variety of target hardware platforms.

In the past 5 years, parallel computing devices have been increasing in number and processing capabilities. GPUs have also appeared on the computing scene and
are providing new levels of processing capability at very low cost. Driven by the demand for real-time three-dimensional graphics rendering, a highly data-parallel problem, GPUs have evolved rapidly as very powerful, fully programmable, task and data-parallel architectures. Hardware manufacturers are now combining CPUs and GPUs on a single die, ushering in a new generation of heterogeneous computing. Compute-intensive and data-intensive portions of a given application, called kernels, may be offloaded to the GPU, providing significant performance per watt and raw performance gains, while the host CPU continues to execute nonkernel tasks.

Many systems and phenomena in both the natural world and the man-made world present us with different classes of parallelism and concurrency:
• Molecular dynamics
• Weather and ocean patterns
• Multimedia systems
• Tectonic plate drift
• Cell growth
• Automobile assembly lines
• Sound and light wave propagation

Parallel computing, as defined by Almasi and Gottlieb (1989), is “a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (i.e., in parallel).” The degree of parallelism that can be achieved is dependent on the inherent nature of the problem at hand (remember that there exists significant parallelism in the world), and the skill of the algorithm or software designer is to identify the different forms of parallelism present in the underlying problem. We begin with a discussion of two simple examples to demonstrate inherent parallel computation: vector multiplication and text searching.

Our first example carries out multiplication of the elements of two arrays A and B, each with N elements, storing the result of each multiply in a corresponding array C. Figure 1.2 shows the computation we would like to carry out.
The serial C++ code would look as follows:

for (i = 0; i < N; i++)
    C[i] = A[i] * B[i];

This code possesses significant parallelism but very little arithmetic intensity. The computation of every element in C is independent of every other element. If we were to parallelize this code, we could choose to generate a separate execution instance to perform the computation of each element of C. This code possesses significant data-level parallelism because the same operation is applied across all of A and B to produce C. We could also view this breakdown as a simple form of task parallelism where each task operates on a subset of the same data; however, task parallelism generalizes further to execution on pipelines of data or even more sophisticated parallel interactions. Figure 1.3 shows an example of task parallelism in a pipeline to support filtering of images in frequency space using an FFT.
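As an aside (this sketch is not from the text), the independent element-wise multiplies can be mapped onto host threads to make the data-parallel decomposition concrete. The chunking scheme and the worker count of 4 are arbitrary choices for illustration:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Each worker computes a contiguous chunk of C; chunks are independent,
// so no synchronization is needed between them.
void multiply_chunk(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t begin, std::size_t end) {
  for (std::size_t i = begin; i < end; ++i)
    C[i] = A[i] * B[i];
}

void parallel_multiply(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, unsigned num_workers = 4) {
  const std::size_t N = A.size();
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    std::size_t begin = N * w / num_workers;        // chunk boundaries
    std::size_t end   = N * (w + 1) / num_workers;
    workers.emplace_back(multiply_chunk, std::cref(A), std::cref(B),
                         std::ref(C), begin, end);
  }
  for (auto& t : workers) t.join();                  // wait for all chunks
}
```

In OpenCL the same decomposition is expressed by launching one work item per element rather than managing threads by hand; this host-thread version only illustrates the independence of the computations.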
FIGURE 1.2 Multiplying two arrays: This example provides for parallel computation without any need for communication.

FIGURE 1.3 Filtering a series of images using an FFT shows clear task parallelism as a series of tasks operate together in a pipeline to compute the overall result.

Let us consider a second example. The computation we are trying to carry out is to find the number of occurrences of a string of characters in a body of text (Figure 1.4). Assume that the body of text has already been parsed into a set of N words. We could choose to divide the task of comparing the string against the N potential matches into N comparisons (i.e., tasks), where each string of characters is matched against the text string. This approach, although rather naïve in terms of search efficiency, is highly parallel. The process of the text string being compared against the set of potential words presents N parallel tasks, each carrying out the same
set of operations. There is even further parallelism within a single comparison task, where the matching on a character-by-character basis presents a finer-grained degree of parallelism. This example exhibits both data-level parallelism (we are going to be performing the same operation on multiple data items) and task-level parallelism (we can compare the string to all words concurrently).

Once the number of matches is determined, we need to accumulate them to provide the total number of occurrences. Again, this summing can exploit parallelism. In this step, we introduce the concept of “reduction,” where we can utilize the availability of parallel resources to combine partial sums in a very efficient manner. Figure 1.5 shows the reduction tree, which illustrates this summation process in log N steps.

FIGURE 1.4 An example of both task-level and data-level parallelism. We can have parallel tasks that count the occurrences of a string in a body of text. The lower portion of the figure shows that the string comparison can be broken down to finer-grained parallel processing.

CONCURRENCY AND PARALLEL PROGRAMMING MODELS

Here, we discuss concurrency and parallel processing models so that when attempting to map an application developed in OpenCL to a parallel platform, we can select the right model to pursue. Although all of the following models can be supported in OpenCL, the underlying hardware may restrict which model will be practical to use.
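Before moving on, the log N pairwise reduction described in the previous section can be made concrete with a short C++ sketch (an illustration, not from the text). It is written sequentially for clarity, but all additions within one level of the tree are independent and could execute in parallel:

```cpp
#include <vector>

// Pairwise tree reduction: each pass halves the number of partial sums,
// so n partial match counts are combined in ceil(log2(n)) passes.
int tree_reduce(std::vector<int> counts) {
  if (counts.empty()) return 0;
  while (counts.size() > 1) {
    std::vector<int> next_level;
    for (std::size_t i = 0; i + 1 < counts.size(); i += 2)
      next_level.push_back(counts[i] + counts[i + 1]);  // independent adds
    if (counts.size() % 2 == 1)
      next_level.push_back(counts.back());              // odd element carries over
    counts = next_level;
  }
  return counts[0];
}
```

For example, eight per-task match counts are combined in three passes (8 → 4 → 2 → 1), matching the combining network of Figure 1.5.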
Concurrency is concerned with two or more activities happening at the same time. We find concurrency in the real world all the time—for example, carrying a child in one arm while crossing a road or, more generally, thinking about something while doing something else with one’s hands.

When talking about concurrency in terms of computer programming, we mean a single system performing multiple tasks independently. Although it is possible that concurrent tasks may be executed at the same time (i.e., in parallel), this is not a requirement. For example, consider a simple drawing application, which is either receiving input from the user via the mouse and keyboard or updating the display with the current image. Conceptually, receiving and processing input are different operations (i.e., tasks) from updating the display. These tasks can be expressed in terms of concurrency, but they do not need to be performed in parallel. In fact, in the case in which they are executing on a single core of a CPU, they cannot be performed in parallel. In this case, the application or the operating system should switch between the tasks, allowing both some time to run on the core.

FIGURE 1.5 After all string comparisons are completed, we can sum up the number of matches in a combining network.

Parallelism is concerned with running two or more activities in parallel with the explicit goal of increasing overall performance. For example, consider the following assignments:
step 1) A = B + C
step 2) D = E + G
step 3) R = A + D

The assignments of A and D in steps 1 and 2 (respectively) are said to be independent of each other because there is no data flow between these two steps (i.e., the variables E and G on the right side of step 2 do not appear on the left side of step 1 and, vice versa, the variables B and C on the right side of step 1 do not appear on the left side of step 2). Also, the variable on the left side of step 1 (A) is not the same as the variable on the left side of step 2 (D). This means that steps 1 and 2 can be executed in parallel (i.e., at the same time). Step 3 is dependent on both steps 1 and 2, so it cannot be executed in parallel with either step 1 or 2.

Parallel programs must be concurrent, but concurrent programs need not be parallel. Although many concurrent programs can be executed in parallel, interdependencies between concurrent tasks may preclude this. For example, an interleaved execution would still satisfy the definition of concurrency while not executing in parallel. As a result, only a subset of concurrent programs are parallel, and the set of all concurrent programs is itself a subset of all programs. Figure 1.6 shows this relationship.

FIGURE 1.6 Parallel and concurrent programs are subsets of programs.

In the remainder of this section, some well-known approaches to programming concurrent and parallel systems are introduced with the aim of providing a foundation before introducing OpenCL in Chapter 2.
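Returning to the three-step example above, the dependence structure can be sketched in C++ (an illustration, not from the text): the independent steps 1 and 2 are launched concurrently with std::async, and step 3 waits on both results before combining them:

```cpp
#include <future>

// Steps 1 and 2 are independent, so they may run in parallel;
// step 3 must wait for both of their results.
int run_steps(int B, int C, int E, int G) {
  auto stepA = std::async(std::launch::async, [=] { return B + C; });  // step 1
  auto stepD = std::async(std::launch::async, [=] { return E + G; });  // step 2
  int A = stepA.get();   // blocks until step 1 completes
  int D = stepD.get();   // blocks until step 2 completes
  return A + D;          // step 3: dependent on both
}
```

The two get() calls are the synchronization points: they encode the data dependence of step 3 on steps 1 and 2.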
Threads and Shared Memory

A running program may consist of multiple subprograms that maintain their own independent control flow and that are allowed to run concurrently. These subprograms are defined as threads. Communication between threads is via updates and access to memory appearing in the same address space. Each thread has its own pool of local memory—that is, variables—but all threads see the same set of global variables. A simple analogy that can be used to describe the use of threads is the concept of a main program that includes a number of subroutines. The main program is scheduled to run by the operating system and performs necessary loading and acquisition of system and user resources to run. Execution of the main program begins by performing some serial work and then continues by creating a number of tasks that can be scheduled and run by the operating system concurrently using threads.

Each thread benefits from a global view of memory because it shares the same memory address space of the main program. Threads communicate with each other through global memory. This can require synchronization constructs to ensure that more than one thread is not updating the same global address. A memory consistency model is defined to manage load and store ordering. All processors see the same address space and have direct access to these addresses without the help of other processors. Mechanisms such as locks/semaphores are commonly used to control access to shared memory that is accessed by multiple tasks. A key feature of the shared memory model is the fact that the programmer is not responsible for managing data movement, although depending on the consistency model implemented in the hardware or runtime system, some level of memory consistency may have to be enforced manually. This relaxes the requirement to specify explicitly the communication of data between tasks, and as a result, parallel code development can often be simplified.
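As a small illustration of the locking constructs just mentioned (a sketch, not from the text), two threads update one shared counter through the same address space; without the lock, the increments could interleave and lose updates:

```cpp
#include <mutex>
#include <thread>

// A shared counter protected by a lock, as in the shared memory model:
// both threads see the same variable and must synchronize their updates.
struct SharedCounter {
  int value = 0;
  std::mutex lock;
  void add(int n) {
    for (int i = 0; i < n; ++i) {
      std::lock_guard<std::mutex> guard(lock);  // serialize each update
      ++value;
    }
  }
};

int count_with_two_threads(int per_thread) {
  SharedCounter counter;
  std::thread t1(&SharedCounter::add, &counter, per_thread);
  std::thread t2(&SharedCounter::add, &counter, per_thread);
  t1.join();
  t2.join();
  return counter.value;
}
```

Note that no data is explicitly moved between the threads; the shared address space carries the communication, and the lock supplies the ordering.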
There is a significant cost to supporting a fully consistent shared memory model in hardware. For multiprocessor systems, the hardware structures required to support this model become a limiting factor. Shared buses become bottlenecks in the design. The extra hardware required typically grows exponentially in terms of its complexity as we attempt to add additional processors. This has slowed the introduction of multi-core and multiprocessor systems at the low end, and it has limited the number of cores working together in a consistent shared memory system to relatively low numbers because shared buses and coherence protocol overheads become bottlenecks. More relaxed shared memory systems scale further, although in all cases scaling shared memory systems comes at the cost of complicated and expensive interconnects. Most multi-core CPU platforms support shared memory in one form or another. OpenCL supports execution on shared memory devices.

Message-Passing Communication

The message-passing communication model enables explicit intercommunication of a set of concurrent tasks that may use memory during computation. Multiple tasks can reside on the same physical device and/or across an arbitrary number of devices.
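As an illustrative sketch (not from the text), explicit message exchange between two tasks can be modeled with a small blocking channel, with a thread-safe queue standing in for a real transport such as MPI:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// A minimal blocking channel: send() and receive() form the cooperative
// pair of operations characteristic of the message-passing model.
class Channel {
  std::queue<int> messages;
  std::mutex lock;
  std::condition_variable ready;
 public:
  void send(int msg) {
    std::lock_guard<std::mutex> guard(lock);
    messages.push(msg);
    ready.notify_one();
  }
  int receive() {  // blocks until a matching send has occurred
    std::unique_lock<std::mutex> guard(lock);
    ready.wait(guard, [&] { return !messages.empty(); });
    int msg = messages.front();
    messages.pop();
    return msg;
  }
};

// A producer task sends n messages; the consumer receives and sums them.
// All data moves through explicit messages, not shared variables.
int exchange_sum(int n) {
  Channel ch;
  std::thread producer([&] {
    for (int i = 1; i <= n; ++i) ch.send(i);
  });
  int total = 0;
  for (int i = 0; i < n; ++i) total += ch.receive();
  producer.join();
  return total;
}
```

Unlike the shared memory sketch earlier, here the programmer manages every data transfer explicitly, which is exactly the burden (and the portability benefit) of the message-passing style.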
Tasks exchange data through communications by sending and receiving explicit messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation. From a programming perspective, message-passing implementations commonly comprise a library of hardware-independent routines for sending and receiving messages. The programmer is responsible for explicitly managing communication between tasks. Historically, a variety of message-passing libraries have been available since the 1980s. MPI is currently the most popular message-passing middleware. These implementations differ substantially from each other, making it difficult for programmers to develop portable applications.

Different Grains of Parallelism

In parallel computing, granularity is a measure of the ratio of computation to communication. Periods of computation are typically separated from periods of communication by synchronization events. The grain of parallelism is constrained by the inherent characteristics of the algorithms constituting the application. It is important that the parallel programmer selects the right granularity in order to reap the full benefits of the underlying platform because choosing the right grain size can help to expose additional degrees of parallelism. Sometimes this selection is referred to as “chunking,” determining the amount of data to assign to each task. Selecting the right chunk size can help provide for further acceleration on parallel hardware. Next, we consider some of the trade-offs associated with identifying the right grain size.

• Fine-grained parallelism
  • Low arithmetic intensity.
  • May not have enough work to hide long-duration asynchronous communication.
  • Facilitates load balancing by providing a larger number of more manageable (i.e., smaller) work units.
  • If the granularity is too fine, it is possible that the overhead required for communication and synchronization between tasks can actually produce a slower parallel implementation than the original serial execution.
• Coarse-grained parallelism
  • High arithmetic intensity.
  • Complete applications can serve as the grain of parallelism.
  • More difficult to load balance efficiently.

Given these trade-offs, which granularity will lead to the best implementation? The most efficient granularity is dependent on the algorithm and the hardware environment in which it is run. In most cases, if the overhead associated with communication and synchronization is high relative to the time of the computation task at hand, it will generally be advantageous to work at a coarser granularity. Fine-grained parallelism can help reduce overheads due to load imbalance or memory delays (this is particularly true on a GPU, which depends on zero-overhead fine-grained thread
switching to hide memory latencies). Fine-grained parallelism can even occur at an instruction level (this approach is used in very long instruction word (VLIW) and superscalar architectures).

Data Sharing and Synchronization

Consider the case in which two applications run that do not share any data. As long as the runtime system or operating system has access to adequate execution resources, they can be run concurrently and even in parallel. If halfway through the execution of one application it generated a result that was subsequently required by the second application, then we would have to introduce some form of synchronization into the system, and parallel execution—at least across the synchronization point—becomes impossible.

When writing concurrent software, data sharing and synchronization play a critical role. Examples of data sharing in concurrent programs include
• the input of a task is dependent on the result of another task—for example, in a producer/consumer or pipeline execution model; and
• when intermediate results are combined together (e.g., as part of a reduction, as in our word search example shown in Figure 1.4).

Ideally, we would only attempt to parallelize portions of an application that are void of data dependencies, but this is not always possible. Explicit synchronization primitives such as barriers or locks may be used to support synchronization when necessary. Although we only raise this issue here, later chapters revisit this question when support for communication between host and device programs or synchronization between tasks is required.

STRUCTURE

The remainder of the book is organized as follows:

Chapter 2 presents an introduction to OpenCL, including key concepts such as kernels, platforms, and devices; the four different abstraction models; and developing your first OpenCL kernel. Understanding these different models is critical to fully appreciate the richness of OpenCL’s programming model.
Chapter 3 presents some of the architectures that OpenCL does or might target, including x86 CPUs, GPUs, and APUs. The text includes discussion of different styles of architectures, including single instruction multiple data and VLIW. This chapter also covers the concepts of multi-core and throughput-oriented systems, as well as advances in heterogeneous architectures.

Chapter 4 introduces basic matrix multiplication, image rotation, and convolution implementations to help the reader learn OpenCL by example.
Chapter 5 discusses concurrency and execution in the OpenCL programming model. In this chapter, we discuss kernels, work items, and the OpenCL execution and memory hierarchies. We also show how queuing and synchronization work in OpenCL such that the reader gains an understanding of how to write OpenCL programs that interact with memory correctly.

Chapter 6 shows how OpenCL maps to an example architecture. For this study, we choose a system comprising an AMD Phenom II CPU and an AMD Radeon HD6970 GPU. This chapter allows us to show how the mappings of the OpenCL programming model for largely serial architectures such as CPUs and vector/throughput architectures such as GPUs differ, giving some idea how to optimize for specific architectural styles.

Chapter 7 presents a case study that accelerates a convolution algorithm. Issues related to memory space utilization and efficiency are considered, as well as work item scheduling, wavefront occupancy, and overall efficiency. These techniques are the foundations necessary for developing high-performance code using OpenCL.

Chapter 8 presents a case study targeting video processing, utilizing OpenCL to build performant image processing effects that can be applied to video streams.

Chapter 9 presents another case study examining how to optimize the performance of a histogramming application. In particular, it highlights how careful design of workgroup size and memory access patterns can make a vast difference to performance in memory-bound applications such as histograms.

Chapter 10 discusses how to leverage a heterogeneous CPU–GPU environment. The target application is a mixed particle simulation (as illustrated on the cover of this book) in which work is distributed across both the CPU and the GPU depending on the grain size of particles in the system.

Chapter 11 shows how to use OpenCL extensions, using the device fission and double precision extensions as examples.
Chapter 12 introduces the reader to debugging and analyzing OpenCL programs. The right debugging tool can save a developer hundreds of wasted hours, allowing him or her instead to learn the specific computer language and solve the problem at hand.

Chapter 13 provides an overview and performance trade-offs of WebCL. WebCL is not yet a product, but imagine what could be possible if the web were powered by OpenCL. This chapter describes example OpenCL bindings for JavaScript, discussing an implementation of a Firefox plug-in that allows web applications to leverage the powerful parallel computing capabilities of modern CPUs and GPUs.

Reference

Almasi, G. S., & Gottlieb, A. (1989). Highly Parallel Computing. Redwood City, CA: Benjamin Cummings.
Further Reading and Relevant Websites

Chapman, B., Jost, G., van der Pas, R., & Kuck, D. J. (2007). Using OpenMP: Portable Shared Memory Parallel Programming. Cambridge, MA: MIT Press.

Duffy, J. (2008). Concurrent Programming on Windows. Upper Saddle River, NJ: Addison-Wesley.

Gropp, W., Lusk, E., & Skjellum, A. (1994). Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press Scientific and Engineering Computation Series. Cambridge, MA: MIT Press.

Herlihy, M., & Shavit, N. (2008). The Art of Multiprocessor Programming. Burlington, MA: Morgan Kaufmann.

Khronos Group. OpenCL. www.khronos.org/opencl.

Mattson, T. G., Sanders, B. A., & Massingill, B. L. (2004). Patterns for Parallel Programming. Upper Saddle River, NJ: Addison-Wesley.

NVIDIA. CUDA Zone. http://www.nvidia.com/object/cuda_home_new.html.

AMD. OpenCL Zone. http://developer.amd.com/openclzone.
CHAPTER 2

Introduction to OpenCL

INTRODUCTION

This chapter introduces OpenCL, the programming fabric that will allow us to weave our application to execute concurrently. Programmers familiar with C and C++ should have little trouble understanding the OpenCL syntax. We begin by reviewing the OpenCL standard.

The OpenCL Standard

Open programming standards designers are tasked with a very challenging objective: arrive at a common set of programming standards that are acceptable to a range of competing needs and requirements. The Khronos consortium that manages the OpenCL standard has done a good job addressing these requirements. The consortium has developed an application programming interface (API) that is general enough to run on significantly different architectures while being adaptable enough that each hardware platform can still obtain high performance. Using the core language and correctly following the specification, any program designed for one vendor can execute on another's hardware. The model set forth by OpenCL creates portable, vendor- and device-independent programs that are capable of being accelerated on many different hardware platforms.

The OpenCL API is a C API with a C++ wrapper API that is defined in terms of the C API. There are third-party bindings for many languages, including Java, Python, and .NET. The code that executes on an OpenCL device, which in general is not the same device as the host CPU, is written in the OpenCL C language. OpenCL C is a restricted version of the C99 language with extensions appropriate for executing data-parallel code on a variety of heterogeneous devices.

The OpenCL Specification

The OpenCL specification is defined in four parts, called models, that can be summarized as follows:

Heterogeneous Computing with OpenCL. © 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.