Heterogeneous Computing with OpenCL
Benedict Gaster
Lee Howes
David R. Kaeli
Perhaad Mistry
Dana Schaa
Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrange-
ments with organizations such as the Copyright Clearance Center and the Copyright Licensing
Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods or professional
practices may become necessary. Practitioners and researchers must always rely on their own
experience and knowledge in evaluating and using any information or methods described
herein. In using such information or methods they should be mindful of their own safety and
the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage to persons or property as a matter of product
liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language)
I. Gaster, Benedict.
QA76.642.H48 2012
005.2’752–dc23
2011020169
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-12-387766-6
For information on all MK publications
visit our website at www.mkp.com
Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1
Foreword
For more than two decades, the computer industry has been inspired and motivated
by the observation made by Gordon Moore (a.k.a. "Moore's law") that the density
of transistors on a die doubles roughly every 18 months. This observation created the
anticipation that the performance a given application achieves on one generation of
processors would double within two years, when the next generation of processors
was announced. Constant improvement in manufacturing and processor technologies
was the main driver of this trend, since it allowed each new processor generation
to shrink all of a transistor's dimensions by the "golden factor" of 0.3 (ideal shrink)
and to reduce the power supply accordingly. Thus, each new processor generation
could double the density of transistors and gain a 50% speed improvement (frequency)
while consuming the same power and keeping the same power density. When better
performance was required, computer architects focused on using the extra transistors
to push the frequency beyond what the shrink provided, and to add
new architectural features aimed mainly at improving performance
for existing and new applications.
During the mid 2000s, the transistor size became so small that the "physics of
small devices" started to govern the behavior of the entire chip. Thus, frequency
improvement and density increase could no longer be achieved without
a significant increase in power consumption and power density. A recent report
by the International Technology Roadmap for Semiconductors (ITRS) supports this
observation and indicates that this trend will continue for the foreseeable future,
most likely becoming the most significant factor affecting technology scaling and
the future of computer-based systems.
To cope with the expectation of doubling performance every known period of
time (no longer 2 years), two major changes have occurred: (1) instead of increasing
the frequency, modern processors increase the number of cores on each die. This
trend forces the software to change as well. Since we cannot expect the hardware
to achieve significantly better performance for a given application anymore, we need
to develop new implementations of the same application that take advantage of
the multicore architecture. (2) Thermal and power considerations have become
first-class citizens in the design of any future architecture. These trends encourage
the community to look at heterogeneous solutions: systems assembled from different
subsystems, each of them optimized for a different design point or
workload. For example, many systems combine a "traditional" CPU
architecture with special-purpose FPGAs or graphics processors (GPUs). Such
integration can be done at different levels: at the system level, at the board level,
and recently at the core level.
Developing software for homogeneous parallel and distributed systems is considered
a non-trivial task, even though such development uses well-known paradigms,
well-established programming languages, development methods,
algorithms, debugging tools, and so on. Developing software to support general-purpose
heterogeneous systems is relatively new, and so it is less mature and much more difficult.
As heterogeneous systems become unavoidable, many of the major software
and hardware manufacturers have started creating software environments to support them.
AMD proposed the use of the Brook language, developed at Stanford University,
to handle streaming computations, later extending the software environment to include
Close to Metal (CTM) and the Compute Abstraction Layer (CAL) for accessing
their low-level streaming hardware primitives in order to take advantage of their
highly threaded parallel architecture. NVIDIA took a similar approach, co-designing
their recent generations of GPUs and the CUDA programming environment to take
advantage of the highly threaded GPU environment. Intel proposed extending the use
of multi-core programming to program its Larrabee architecture. IBM proposed
the use of message-passing-based software to take advantage of its heterogeneous,
non-coherent Cell architecture, and FPGA-based solutions integrate libraries
written in VHDL with C- or C++-based programs to achieve the best of both
environments. Each of these programming environments offers scope for benefiting
domain-specific applications, but they all fail to address the requirement for
general-purpose software that can serve different hardware architectures in the way
that, for example, Java code can run on very different ISAs.
The Open Computing Language (OpenCL) was designed to meet this important
need. It is defined and managed by the nonprofit technology consortium, the Khronos
Group. The language and its development environment "borrow" many basic concepts
from very successful, hardware-specific environments such as CUDA, CAL, and
CTM, and blend them to create a hardware-independent software development
environment. It supports different levels of parallelism and efficiently maps to
homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs,
GPUs, FPGAs, and potentially other future devices. To support future devices,
OpenCL defines a set of mechanisms that, if met, allow a device to be seamlessly
included as part of the OpenCL environment. OpenCL also defines run-time support
for managing resources and combining different types of hardware under the
same execution environment; hopefully, in the future it will also allow computations,
power, and other resources such as the memory hierarchy to be balanced dynamically
in a more general manner.
This book is a textbook that aims to teach students how to program heterogeneous
environments. It starts with a very important discussion of how to program
parallel systems and defines the concepts students need to understand
before starting to program any heterogeneous system. It also provides a taxonomy
that can be used to understand the different models used for parallel and distributed
systems. Chapters 2–4 build the students' understanding, step by step, of the
basic structures of OpenCL (Chapter 2), including the host and the device architecture
(Chapter 3). Chapter 4 provides an example that puts these concepts together using a
non-trivial example.
Chapters 5 and 6 extend the concepts learned so far with a better understanding
of the notions of concurrency and run-time execution in OpenCL (Chapter 5) and
the dissection of the CPU and the GPU (Chapter 6). After building the basics,
the book dedicates four chapters (7–10) to more sophisticated examples. These sections
are vital for students to understand that OpenCL can be used for a wide range of
applications beyond any domain-specific mode of operation. The book also
demonstrates how the same program can be run on different platforms, such as
NVIDIA's or AMD's. The book ends with three chapters dedicated to advanced
topics.
There is no doubt that this is a very important book that provides students and
researchers with a better understanding of the world of heterogeneous computers
in general and the solutions provided by OpenCL in particular. The book is well
written, fits students of different experience levels, and so can be used either as a
textbook in a course on OpenCL, or in parts to extend other courses; e.g.,
the first two chapters are well suited to a course on parallel programming, and some
of the examples can be used as part of advanced courses.
Dr. Avi Mendelson
Microsoft R&D Israel
Adjunct Professor, Technion
Preface
OUR HETEROGENEOUS WORLD
Our world is heterogeneous in nature. This kind of diversity provides a richness and
detail that is difficult to describe. At the same time, it provides a level of complexity
and interaction in which a wide range of different entities are optimized for specific
tasks and environments.
In computing, heterogeneous computer systems also add richness by allowing the
programmer to select the best architecture to execute the task at hand or to choose the
right task to make optimal use of a given architecture. These two views of the flex-
ibility of a heterogeneous system both become apparent when solving a computa-
tional problem involves a variety of different tasks. Recently, there has been an
upsurge in the computer design community experimenting with building heteroge-
neous systems. We are seeing new systems on the market that combine a number of
different classes of architectures. What has slowed this progression is the lack
of a standardized programming environment that can manage the diverse set of
resources in a common framework.
OPENCL
OpenCL has been developed specifically to ease the programming burden when writ-
ing applications for heterogeneous systems. OpenCL also addresses the current trend
to increase the number of cores on a given architecture. The OpenCL framework sup-
ports execution on multi-core central processing units, digital signal processors, field
programmable gate arrays, graphics processing units, and heterogeneous accelerated
processing units. The architectures already supported cover a wide range of ap-
proaches to extracting parallelism and efficiency from memory systems and instruc-
tion streams. Such diversity in architectures allows the designer to provide an
optimized solution to his or her problem—a solution that, if designed within the
OpenCL specification, can scale with the growth and breadth of available architec-
tures. OpenCL’s standard abstractions and interfaces allow the programmer to seam-
lessly “stitch” together an application within which execution can occur on a rich set
of heterogeneous devices from one or many manufacturers.
THIS TEXT
Until now, there has not been a single definitive text that can help programmers and
software engineers leverage the power and flexibility of the OpenCL programming
standard. This is our attempt to address this void. With this goal in mind, we have not
attempted to create a syntax guide—there are numerous good sources in which
programmers can find a complete and up-to-date description of OpenCL syntax.
Instead, this text is an attempt to show a developer or student how to leverage
the OpenCL framework to build interesting and useful applications. We provide a
number of examples of real applications to demonstrate the power of this program-
ming standard.
Our hope is that the reader will embrace this new programming framework and
explore the full benefits of heterogeneous computing that it provides. We welcome
comments on how to improve upon this text, and we hope that this text will help you
build your next heterogeneous application.
Acknowledgments
We thank Manju Hegde for proposing the book project, BaoHuong Phan and Todd
Green for their management and input from the AMD and Morgan Kaufmann sides
of the project, and Jay Owen for connecting the participants on this project with each
other. On the technical side, we thank Jay Cornwall for his thorough work editing
much of this text, and we thank Joachim Deguara, Takahiro Harada, Justin Hensley,
Marc Romankewicz, and Byunghyun Jang for their significant contributions to in-
dividual chapters, particularly the sequence of case studies that could not have been
produced without their help. Also instrumental were Jari Nikara, Tomi Aarnio, and
Eero Aho from the Nokia Research Center and Janne Pietiäinen from the Tampere
University of Technology.
About the Authors
Benedict R. Gaster is a software architect working on programming models for
next-generation heterogeneous processors, particularly examining high-level ab-
stractions for parallel programming on the emerging class of processors that contain
both CPUs and accelerators such as GPUs. He has contributed extensively to
OpenCL's design and has represented AMD at the Khronos Group open standards
consortium. He has a Ph.D. in computer science for his work on type systems for
extensible records and variants.
Lee Howes has spent the past 2 years working at AMD and currently focuses on
programming models for the future of heterogeneous computing. His interests lie
in declaratively representing mappings of iteration domains to data and in commu-
nicating complicated architectural concepts and optimizations succinctly to a devel-
oper audience, both through programming model improvements and through
education. He has a Ph.D. in computer science from Imperial College London for
work in this area.
David Kaeli received a B.S. and Ph.D. in electrical engineering from Rutgers Uni-
versity and an M.S. in computer engineering from Syracuse University. He is Asso-
ciate Dean of Undergraduate Programs in the College of Engineering and a Full
Professor on the ECE faculty at Northeastern University, where he directs the North-
eastern University Computer Architecture Research Laboratory (NUCAR). Prior to
joining Northeastern in 1993, he spent 12 years at IBM, the last 7 at T. J. Watson
Research Center, Yorktown Heights, NY. He has co-authored more than 200 criti-
cally reviewed publications. His research spans a range of areas, including micro-
architecture to back-end compilers and software engineering. He leads a number
of research projects in the area of GPU computing. He currently serves as the Chair
of the IEEE Technical Committee on Computer Architecture. He is an IEEE Fellow
and a member of the ACM.
Perhaad Mistry is a Ph.D. candidate at Northeastern University. He received a B.S.
in electronics engineering from the University of Mumbai and an M.S. in computer
engineering from Northeastern University. He is currently a member of the North-
eastern University Computer Architecture Research Laboratory (NUCAR) and is ad-
vised by Dr. David Kaeli. He works on a variety of parallel computing projects. He
has designed scalable data structures for the physics simulations for GPGPU plat-
forms and has also implemented medical reconstruction algorithms for heteroge-
neous devices. His current research focuses on the design of profiling tools for
heterogeneous computing. He is studying the potential of using standards such as
OpenCL for building tools that simplify parallel programming and performance
analysis across the variety of heterogeneous devices available today.
Dana Schaa received a B.S. in computer engineering from California Polytechnic
State University, San Luis Obispo, and an M.S. in electrical and computer engineer-
ing from Northeastern University, where he is also currently a Ph.D. candidate.
His research interests include parallel programming models and abstractions,
particularly for GPU architectures. He has developed GPU-based implementations
of several medical imaging research projects ranging from real-time visualization
to image reconstruction in distributed, heterogeneous environments. He married
his wonderful wife, Jenny, in 2010, and they live together in Boston with their
charming cats.
CHAPTER 1
Introduction to Parallel Programming
INTRODUCTION
Today’s computing environments are becoming more multifaceted, exploiting the
capabilities of a range of multi-core microprocessors, central processing units
(CPUs), digital signal processors, reconfigurable hardware (FPGAs), and graphics
processing units (GPUs). Presented with so much heterogeneity, the process of
developing efficient software for such a wide array of architectures poses a number of
challenges to the programming community.
Applications possess a number of workload behaviors, ranging from control
intensive (e.g., searching, sorting, and parsing) to data intensive (e.g., image
processing, simulation and modeling, and data mining). Applications can also
be characterized as compute intensive (e.g., iterative methods, numerical methods,
and financial modeling), where the overall throughput of the application is heavily
dependent on the computational efficiency of the underlying hardware. Each of
these workload classes typically executes most efficiently on a specific style of
hardware architecture. No single architecture is best for running all classes of
workloads, and most applications possess a mix of the workload characteristics.
For instance, control-intensive applications tend to run faster on superscalar CPUs,
where significant die real estate has been devoted to branch prediction mecha-
nisms, whereas data-intensive applications tend to run fast on vector architectures,
where the same operation is applied to multiple data items concurrently.
OPENCL
The Open Computing Language (OpenCL) is a heterogeneous programming
framework that is managed by the nonprofit technology consortium Khronos
Group. OpenCL is a framework for developing applications that execute across
a range of device types made by different vendors. It supports a wide range of
levels of parallelism and efficiently maps to homogeneous or heterogeneous,
single- or multiple-device systems consisting of CPUs, GPUs, and other types of de-
vices limited only by the imagination of vendors. The OpenCL definition offers both
a device-side language and a host management layer for the devices in a system.
The device-side language is designed to efficiently map to a wide range of memory
systems. The host language aims to support efficient plumbing of complicated
concurrent programs with low overhead. Together, these provide the developer with
a path to efficiently move from algorithm design to implementation.
OpenCL provides parallel computing using task-based and data-based parallel-
ism. It currently supports CPUs that include x86, ARM, and PowerPC, and it has
been adopted into graphics card drivers by both AMD (called the Accelerated
Parallel Processing SDK) and NVIDIA. Support for OpenCL is rapidly expanding
as a wide range of platform vendors have adopted OpenCL and support or plan to
support it for their hardware platforms. These vendors fall within a wide range of
market segments, from the embedded vendors (ARM and Imagination Technologies)
to the HPC vendors (AMD, Intel, NVIDIA, and IBM). The architectures supported
include multi-core CPUs, throughput and vector processors such as GPUs, and fine-
grained parallel devices such as FPGAs.
Most important, OpenCL’s cross-platform, industrywide support makes it an
excellent programming model for developers to learn and use, with the confidence
that it will continue to be widely available for years to come with ever-increasing
scope and applicability.
THE GOALS OF THIS BOOK
This book is the first of its kind to present OpenCL programming in a fashion appro-
priate for the classroom. The book is organized to address the need for teaching par-
allel programming on current system architectures using OpenCL as the target
language, and it includes examples for CPUs, GPUs, and their integration in the ac-
celerated processing unit (APU). Another major goal of this text is to provide a guide
to programmers to develop well-designed programs in OpenCL targeting parallel
systems. The book leads the programmer through the various abstractions and fea-
tures provided by the OpenCL programming environment. The examples offer the
reader a simple introduction and more complicated optimizations, and they suggest
further development and goals at which to aim. It also discusses tools for improving
the development process in terms of profiling and debugging such that the reader
need not feel lost in the development process.
The book is accompanied by a set of instructor slides and programming exam-
ples, which support the use of this text by an OpenCL instructor. Please visit
http://heterogeneouscomputingwithopencl.org/ for additional information.
THINKING PARALLEL
Most applications are first programmed to run on a single processor. In the field
of high-performance computing, classical approaches have been used to accelerate
computation when provided with multiple computing resources. Standard approaches
include “divide-and-conquer” and “scatter–gather” problem decomposition methods,
providing the programmer with a set of strategies to effectively exploit the parallel
resources available in high-performance systems. Divide-and-conquer methods iter-
atively break a problem into subproblems until the subproblems fit well on the com-
putational resources provided. Scatter–gather methods send a subset of the input data
set to each parallel resource and then collect the results of the computation and com-
bine them into a result data set. As before, the partitioning takes account of the size of
the subsets based on the capabilities of the parallel resources. Figure 1.1 shows how
popular applications such as sorting and a vector–scalar multiply can be effectively
mapped to parallel resources to accelerate processing.
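The divide-and-conquer strategy described above can be sketched for the sorting example of Figure 1.1. This is only an illustrative sketch in plain C++ (the book's own examples use OpenCL, which is introduced later); the cutoff value and the use of std::async are assumptions for illustration, not part of the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Divide-and-conquer sort over v[lo, hi): recursively split the range
// until a subproblem is small enough to solve directly, sort the two
// halves in parallel, and merge the results (the "combine" step).
void parallel_sort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    const std::size_t kCutoff = 1024;  // below this, solve serially
    if (hi - lo <= kCutoff) {
        std::sort(v.begin() + lo, v.begin() + hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async,
                           [&] { parallel_sort(v, lo, mid); });
    parallel_sort(v, mid, hi);  // sort the right half on this thread
    left.wait();
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}
```

The recursion depth, and hence the number of parallel tasks, is chosen implicitly by the cutoff: in practice this value would be tuned to the capabilities of the parallel resources, as the text notes.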
The programming task becomes increasingly challenging when faced with the
growing parallelism and heterogeneity present in contemporary parallel processors.
Given the power and thermal limits of complementary metal-oxide semiconductor
(CMOS) technology, microprocessor vendors find it difficult to scale the frequency
of these devices to derive more performance and have instead decided to place mul-
tiple processors, sometimes specialized, on a single chip. In doing so, the problem of
extracting parallelism from an application is left to the programmer, who must de-
compose the underlying algorithms in the applications and map them efficiently to a
diverse variety of target hardware platforms.
In the past 5 years, parallel computing devices have been increasing in number
and processing capabilities. GPUs have also appeared on the computing scene and
FIGURE 1.1
(A) Simple sorting and (B) dot product examples.
are providing new levels of processing capability at very low cost. Driven by the
demand for real-time three-dimensional graphics rendering, a highly data-parallel
problem, GPUs have evolved rapidly as very powerful, fully programmable, task
and data-parallel architectures. Hardware manufacturers are now combining CPUs
and GPUs on a single die, ushering in a new generation of heterogeneous computing.
Compute-intensive and data-intensive portions of a given application, called kernels,
may be offloaded to the GPU, providing significant performance per watt and raw
performance gains, while the host CPU continues to execute nonkernel tasks.
Many systems and phenomena in both the natural world and the man-made world
present us with different classes of parallelism and concurrency:
• Molecular dynamics
• Weather and ocean patterns
• Multimedia systems
• Tectonic plate drift
• Cell growth
• Automobile assembly lines
• Sound and light wave propagation
Parallel computing, as defined by Almasi and Gottlieb (1989), is “a form of compu-
tation in which many calculations are carried out simultaneously, operating on the
principle that large problems can often be divided into smaller ones, which are then
solved concurrently (i.e., in parallel).” The degree of parallelism that can be achieved
is dependent on the inherent nature of the problem at hand (remember that there ex-
ists significant parallelism in the world), and the skill of the algorithm or software
designer is to identify the different forms of parallelism present in the underlying
problem. We begin with a discussion of two simple examples to demonstrate inher-
ent parallel computation: vector multiplication and text searching.
Our first example carries out multiplication of the elements of two arrays A and B,
each with N elements, storing the result of each multiply in a corresponding array C.
Figure 1.2 shows the computation we would like to carry out. The serial C++
program for this code would look as follows:

for (int i = 0; i < N; i++)
    C[i] = A[i] * B[i];
This code possesses significant parallelism but very little arithmetic intensity. The
computation of every element in C is independent of every other element. If we were
to parallelize this code, we could choose to generate a separate execution instance to
perform the computation of each element of C. This code possesses significant data-
level parallelism because the same operation is applied across all of A and B to pro-
duce C. We could also view this breakdown as a simple form of task parallelism
where each task operates on a subset of the same data; however, task parallelism gen-
eralizes further to execution on pipelines of data or even more sophisticated parallel
interactions. Figure 1.3 shows an example of task parallelism in a pipeline to support
filtering of images in frequency space using an FFT.
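The element-wise decomposition just described can be made concrete with host threads. This is an illustrative sketch using C++ std::thread rather than OpenCL (whose syntax is introduced in Chapter 2); each worker owns a disjoint slice of C, so the computations need no communication, mirroring Figure 1.2.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Multiply arrays A and B element-wise into C, splitting the index
// range across num_threads workers. Each worker writes a disjoint
// slice of C, so no synchronization between workers is needed.
void parallel_multiply(const float* A, const float* B, float* C,
                       std::size_t N, unsigned num_threads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (N + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(N, begin + chunk);
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i)
                C[i] = A[i] * B[i];
        });
    }
    for (auto& w : workers) w.join();
}
```

On a device with many parallel lanes, one could instead launch one execution instance per element, as the text suggests; the chunked form shown here is the typical compromise on a multi-core CPU.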
Let us consider a second example. The computation we are trying to carry out is
to find the number of occurrences of a string of characters in a body of text
(Figure 1.4). Assume that the body of text has already been parsed into a set of N
words. We could choose to divide the task of comparing the string against the N po-
tential matches into N comparisons (i.e., tasks), where each string of characters is
matched against the text string. This approach, although rather naïve in terms of
search efficiency, is highly parallel. The process of the text string being compared
against the set of potential words presents N parallel tasks, each carrying out the same
FIGURE 1.2
Multiplying two arrays: This example provides for parallel computation without any need for
communication.
FIGURE 1.3
Filtering a series of images using an FFT shows clear task parallelism as a series of tasks
operate together in a pipeline to compute the overall result.
set of operations. There is even further parallelism within a single comparison task,
where the matching on a character-by-character basis presents a finer-grained degree
of parallelism. This example exhibits both data-level parallelism (we are going to be
performing the same operation on multiple data items) and task-level parallelism (we
can compare the string to all words concurrently).
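The naive task-level decomposition above, one comparison task per word, can be sketched as follows. This is an illustrative C++ sketch, not the book's OpenCL code; launching a real thread per word is deliberately literal here, and a practical implementation would batch the tasks.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Count occurrences of `pattern` in `words`, launching one task per
// word as in the naive decomposition described above. Each comparison
// is independent; the shared atomic counter is the only point of
// contact between tasks.
int count_matches(const std::vector<std::string>& words,
                  const std::string& pattern) {
    std::atomic<int> matches{0};
    std::vector<std::thread> tasks;
    for (const auto& word : words) {
        tasks.emplace_back([&matches, &word, &pattern] {
            if (word == pattern)  // per-task string compare
                matches.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : tasks) t.join();
    return matches.load();
}
```

The finer-grained character-by-character parallelism mentioned in the text would live inside the per-task compare; here that level is left to the string equality operator.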
Once the number of matches is determined, we need to accumulate them to provide
the total number of occurrences. Again, this summing can exploit parallelism. In this
step, we introduce the concept of "reduction," where we can utilize the availability of
parallel resources to combine partial sums in a very efficient manner. Figure 1.5 shows
the reduction tree, which illustrates this summation process in log N steps.
CONCURRENCY AND PARALLEL PROGRAMMING MODELS
Here, we discuss concurrency and parallel processing models so that when attempt-
ing to map an application developed in OpenCL to a parallel platform, we can select
the right model to pursue. Although all of the following models can be supported in
OpenCL, the underlying hardware may restrict which model will be practical to use.
FIGURE 1.4
An example of both task-level and data-level parallelism. We can have parallel tasks that
count the occurrences of a string in a body of text. The lower portion of the figure shows that the
string comparison can be broken down to finer-grained character-by-character parallel processing.
Concurrency is concerned with two or more activities happening at the same
time. We find concurrency in the real world all the time—for example, carrying a
child in one arm while crossing a road or, more generally, thinking about something
while doing something else with one’s hands.
When talking about concurrency in terms of computer programming, we mean a
single system performing multiple tasks independently. Although it is possible that
concurrent tasks may be executed at the same time (i.e., in parallel), this is not a re-
quirement. For example, consider a simple drawing application, which is either re-
ceiving input from the user via the mouse and keyboard or updating the display with
the current image. Conceptually, receiving and processing input are different oper-
ations (i.e., tasks) from updating the display. These tasks can be expressed in terms of
concurrency, but they do not need to be performed in parallel. In fact, in the case in
which they are executing on a single core of a CPU, they cannot be performed in
parallel. In this case, the application or the operating system should switch between
the tasks, allowing both some time to run on the core.
Parallelism is concerned with running two or more activities in parallel with the
explicit goal of increasing overall performance. For example, consider the following
assignments:
FIGURE 1.5
After all string comparisons are completed, we can sum up the number of matches in a
combining (reduction) network.
step 1) A = B + C
step 2) D = E + G
step 3) R = A + D
The assignments of A and D in steps 1 and 2 (respectively) are said to be independent
of each other because there is no data flow between these two steps (i.e., the variables
E and G on the right side of step 2 do not appear on the left side of step 1, and vice versa,
the variables B and C on the right sides of step 1 do not appear on the left side of
step 2.). Also the variable on the left side of step 1 (A) is not the same as the variable
on the left side of step 2 (D). This means that steps 1 and 2 can be executed in parallel
(i.e., at the same time). Step 3 is dependent on both steps 1 and 2, so it cannot be
executed in parallel with either step 1 or step 2.
Parallel programs must be concurrent, but concurrent programs need not be parallel.
Although many concurrent programs can be executed in parallel, interdependencies
between concurrent tasks may preclude this. For example, an interleaved execution would
still satisfy the definition of concurrency while not executing in parallel. As a result, only
a subset of concurrent programs are parallel, and the set of all concurrent programs is
itself a subset of all programs. Figure 1.6 shows this relationship.
In the remainder of this section, some well-known approaches to programming con-
current and parallel systems are introduced with the aim of providing a foundation
before introducing OpenCL in Chapter 2.
FIGURE 1.6
Parallel and concurrent programs are subsets of all programs.
Threads and Shared Memory
A running program may consist of multiple subprograms that maintain their own in-
dependent control flow and that are allowed to run concurrently. These subprograms
are defined as threads. Communication between threads is via updates and access to
memory appearing in the same address space. Each thread has its own pool of local
memory—that is, variables—but all threads see the same set of global variables. A
simple analogy that can be used to describe the use of threads is the concept of a main
program that includes a number of subroutines. The main program is scheduled to
run by the operating system and performs necessary loading and acquisition of sys-
tem and user resources to run. Execution of the main program begins by performing
some serial work and then continues by creating a number of tasks that can be sched-
uled and run by the operating system concurrently using threads.
Each thread benefits from a global view of memory because it shares the same
memory address space of the main program. Threads communicate with each other
through global memory. This can require synchronization constructs to ensure that
more than one thread is not updating the same global address.
A memory consistency model is defined to manage load and store ordering. All
processors see the same address space and have direct access to these addresses without
the help of other processors. Mechanisms such as locks/semaphores are commonly
used to control access to shared memory that is accessed by multiple tasks. A key
feature of the shared memory model is the fact that the programmer is not responsible
for managing data movement, although depending on the consistency model imple-
mented in the hardware or runtime system, some level of memory consistency may
have to be enforced manually. This relaxes the requirement to specify explicitly the
communication of data between tasks, and as a result, parallel code development can
often be simplified.
There is a significant cost to supporting a fully consistent shared memory model in
hardware. For multiprocessor systems, the hardware structures required to support
this model become a limiting factor. Shared buses become bottlenecks in the design.
The extra hardware required typically grows exponentially in terms of its complexity
as we attempt to add additional processors. This has slowed the introduction of multi-
core and multiprocessor systems at the low end, and it has limited the number of cores
working together in a consistent shared memory system to relatively low numbers
because shared buses and coherence protocol overheads become bottlenecks. More
relaxed shared memory systems scale further, although in all cases scaling shared
memory systems comes at the cost of complicated and expensive interconnects.
Most multi-core CPU platforms support shared memory in one form or another.
OpenCL supports execution on shared memory devices.
Message-Passing Communication
The message-passing communication model enables explicit intercommunication of
a set of concurrent tasks that may use memory during computation. Multiple tasks
can reside on the same physical device and/or across an arbitrary number of devices.
Tasks exchange data through communications by sending and receiving explicit
messages. Data transfer usually requires cooperative operations to be performed
by each process. For example, a send operation must have a matching receive
operation.
From a programming perspective, message-passing implementations commonly
comprise a library of hardware-independent routines for sending and receiving mes-
sages. The programmer is responsible for explicitly managing communication be-
tween tasks. Historically, a variety of message-passing libraries have been
available since the 1980s. MPI is currently the most popular message-passing mid-
dleware. These implementations differ substantially from each other, making it dif-
ficult for programmers to develop portable applications.
Different Grains of Parallelism
In parallel computing, granularity is a measure of the ratio of computation to com-
munication. Periods of computation are typically separated from periods of commu-
nication by synchronization events. The grain of parallelism is constrained by the
inherent characteristics of the algorithms constituting the application. It is important
that the parallel programmer selects the right granularity in order to reap the full ben-
efits of the underlying platform because choosing the right grain size can help to ex-
pose additional degrees of parallelism. Sometimes this selection is referred to as
“chunking,” determining the amount of data to assign to each task. Selecting the right
chunk size can help provide for further acceleration on parallel hardware. Next, we
consider some of the trade-offs associated with identifying the right grain size.
• Fine-grained parallelism
• Low arithmetic intensity.
• May not have enough work to hide long-duration asynchronous communication.
• Facilitates load balancing by providing a larger number of more manageable
(i.e., smaller) work units.
• If the granularity is too fine, it is possible that the overhead required for com-
munication and synchronization between tasks can actually produce a slower
parallel implementation than the original serial execution.
• Coarse-grained parallelism
• High arithmetic intensity.
• Complete applications can serve as the grain of parallelism.
• More difficult to load balance efficiently.
Given these trade-offs, which granularity will lead to the best implementation? The
most efficient granularity is dependent on the algorithm and the hardware environ-
ment in which it is run. In most cases, if the overhead associated with communication
and synchronization is high relative to the time of the computation task at hand, it
will generally be advantageous to work at a coarser granularity. Fine-grained paral-
lelism can help reduce overheads due to load imbalance or memory delays (this is
particularly true on a GPU, which depends on zero-overhead fine-grained thread
switching to hide memory latencies). Fine-grained parallelism can even occur at an
instruction level (this approach is used in very long instruction word (VLIW) and
superscalar architectures).
Data Sharing and Synchronization
Consider the case in which two applications run that do not share any data. As long as
the runtime system or operating system has access to adequate execution resources,
they can be run concurrently and even in parallel. If halfway through the execution of
one application it generated a result that was subsequently required by the second
application, then we would have to introduce some form of synchronization
into the system, and parallel execution—at least across the synchronization
point—becomes impossible.
When writing concurrent software, data sharing and synchronization play a crit-
ical role. Examples of data sharing in concurrent programs include
• the input of a task is dependent on the result of another task—for example, in a
producer/consumer or pipeline execution model; and
• when intermediate results are combined together (e.g., as part of a reduction, as in
our word search example shown in Figure 1.4).
Ideally, we would only attempt to parallelize portions of an application that are void
of data dependencies, but this is not always possible. Explicit synchronization prim-
itives such as barriers or locks may be used to support synchronization when neces-
sary. Although we only raise this issue here, later chapters revisit this question when
support for communication between host and device programs or when synchroni-
zation between tasks is required.
STRUCTURE
The remainder of the book is organized as follows:
Chapter 2 presents an introduction to OpenCL, including key concepts such as
kernels, platforms, and devices; the four different abstraction models; and devel-
oping your first OpenCL kernel. Understanding these different models is critical
to fully appreciate the richness of OpenCL’s programming model.
Chapter 3 presents some of the architectures that OpenCL does or might target,
including x86 CPUs, GPUs, and APUs. The text includes discussion of different
styles of architectures, including single instruction multiple data and VLIW. This
chapter also covers the concepts of multi-core and throughput-oriented systems,
as well as advances in heterogeneous architectures.
Chapter 4 introduces basic matrix multiplication, image rotation, and convolution
implementations to help the reader learn OpenCL by example.
Chapter 5 discusses concurrency and execution in the OpenCL programming
model. In this chapter, we discuss kernels, work items, and the OpenCL execu-
tion and memory hierarchies. We also show how queuing and synchronization
work in OpenCL such that the reader gains an understanding of how to write
OpenCL programs that interact with memory correctly.
Chapter 6 shows how OpenCL maps to an example architecture. For this study,
we choose a system comprising an AMD Phenom II CPU and an AMD Radeon
HD6970 GPU. This chapter allows us to show how the mappings of the OpenCL
programming model for largely serial architectures such as CPUs and vector/
throughput architectures such as GPUs differ, giving some idea how to optimize
for specific architectural styles.
Chapter 7 presents a case study that accelerates a convolution algorithm. Issues
related to memory space utilization and efficiency are considered, as well as work
item scheduling, wavefront occupancy, and overall efficiency. These techniques
are the foundations necessary for developing high-performance code using OpenCL.
Chapter 8 presents a case study targeting video processing, utilizing OpenCL to
build performant image processing effects that can be applied to video streams.
Chapter 9 presents another case study examining how to optimize the perfor-
mance of a histogramming application. In particular, it highlights how careful
design of workgroup size and memory access patterns can make a vast difference
to performance in memory-bound applications such as histograms.
Chapter 10 discusses how to leverage a heterogeneous CPU–GPU environment.
The target application is a mixed particle simulation (as illustrated on the cover of
this book) in which work is distributed across both the CPU and the GPU depend-
ing on the grain size of particles in the system.
Chapter 11 shows how to use OpenCL extensions using the device fission and
double precision extensions as examples.
Chapter 12 introduces the reader to debugging and analyzing OpenCL programs.
The right debugging tool can save a developer hundreds of wasted hours,
allowing him or her instead to learn the specific computer language and solve
the problem at hand.
Chapter 13 provides an overview and performance trade-offs of WebCL. WebCL
is not yet a product, but imagine what could be possible if the web were powered
by OpenCL. This chapter describes example OpenCL bindings for JavaScript,
discussing an implementation of a Firefox plug-in that allows web applications
to leverage the powerful parallel computing capabilities of modern CPUs and
GPUs.
Reference
Almasi, G. S., & Gottlieb, A. (1989). Highly Parallel Computing. Redwood City, CA: Benjamin Cummings.
Further Reading and Relevant Websites
Chapman, B., Jost, G., van der Pas, R., & Kuck, D. J. (2007). Using OpenMP: Portable Shared
Memory Parallel Programming. Cambridge, MA: MIT Press.
Duffy, J. (2008). Concurrent Programming on Windows. Upper Saddle River, NJ: Addison-
Wesley.
Gropp, W., Lusk, E., & Skjellum, A. (1994). Using MPI: Portable Parallel Programming with
the Message-Passing Interface. MIT Press Scientific and Engineering Computation
Series. Cambridge, MA: MIT Press.
Herlihy, M., & Shavit, N. (2008). The Art of Multiprocessor Programming. Burlington, MA:
Morgan Kaufmann.
Khronos Group. OpenCL. www.khronos.org/opencl.
Mattson, T. G., Sanders, B. A., & Massingill, B. L. (2004). Patterns for Parallel Programming.
Upper Saddle River, NJ: Addison-Wesley.
NVIDIA. CUDA Zone. http://www.nvidia.com/object/cuda_home_new.html.
AMD. OpenCL Zone. http://developer.amd.com/openclzone.
CHAPTER 2
Introduction to OpenCL
INTRODUCTION
This chapter introduces OpenCL, the programming fabric that will allow us to weave
our application to execute concurrently. Programmers familiar with C and C++
should have little trouble understanding the OpenCL syntax. We begin by reviewing
the OpenCL standard.
The OpenCL Standard
Open programming standards designers are tasked with a very challenging objective:
arrive at a common set of programming standards that are acceptable to a range of
competing needs and requirements. The Khronos consortium that manages the
OpenCL standard has done a good job addressing these requirements. The consor-
tium has developed an applications programming interface (API) that is general
enough to run on significantly different architectures while being adaptable enough
that each hardware platform can still obtain high performance. Using the core lan-
guage and correctly following the specification, any program designed for one ven-
dor can execute on another’s hardware. The model set forth by OpenCL creates
portable, vendor- and device-independent programs that are capable of being accel-
erated on many different hardware platforms.
The OpenCL API is a C API, with a C++ wrapper API that is defined in terms of the C
API. There are third-party bindings for many languages, including Java, Python, and
.NET. The code that executes on an OpenCL device, which in general is not the same
device as the host CPU, is written in the OpenCL C language. OpenCL C is a
restricted version of the C99 language with extensions appropriate for executing
data-parallel code on a variety of heterogeneous devices.
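For a flavor of OpenCL C, the device-side kernel below performs elementwise vector addition (a minimal illustrative kernel, not from the text; the host code to compile and launch kernels appears in later chapters). Note the __kernel and __global qualifiers and the get_global_id built-in, which are among the extensions to C99.

```c
// OpenCL C device code: each work-item adds one element pair,
// selected by its global work-item index.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    int gid = get_global_id(0);   // this work-item's index in dimension 0
    c[gid] = a[gid] + b[gid];
}
```

Because this is device source, it is compiled at runtime by the OpenCL driver rather than by the host compiler.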
The OpenCL Specification
The OpenCL specification is defined in four parts, called models, that can be sum-
marized as follows:
Heterogeneous Computing with OpenCL
© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.
1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
Heterogeneous Computing with OpenCL 1st Edition Perhaad Mistry and Dana Schaa (Auth.)

  • 6. Heterogeneous Computing with OpenCL
Benedict Gaster
Lee Howes
David R. Kaeli
Perhaad Mistry
Dana Schaa
  • 7. Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of product liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language) I. Gaster, Benedict.
QA76.642.H48 2012
005.2’752–dc23
2011020169

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-387766-6

For information on all MK publications visit our website at www.mkp.com

Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1
  • 8. Foreword

For more than two decades, the computer industry has been inspired and motivated by the observation made by Gordon Moore (a.k.a. “Moore’s law”) that the density of transistors on die was doubling every 18 months. This observation created the anticipation that the performance a certain application achieves on one generation of processors would be doubled within two years, when the next generation of processors was announced. Constant improvement in manufacturing and processor technologies was the main drive of this trend, since it allowed any new processor generation to shrink all the transistors’ dimensions within the “golden factor”, 0.3 (ideal shrink), and to reduce the power supply accordingly. Thus, any new processor generation could double the density of transistors, to gain 50% speed improvement (frequency) while consuming the same power and keeping the same power density. When better performance was required, computer architects focused on using the extra transistors for pushing the frequency beyond what the shrink provided, and for adding new architectural features that mainly aim at gaining performance improvement for existing and new applications.

During the mid 2000s, the transistor size became so small that the “physics of small devices” started to govern the characterization of the entire chip. Thus frequency improvement and density increase could not be achieved anymore without a significant increase of power consumption and of power density. A recent report by the International Technology Roadmap for Semiconductors (ITRS) supports this observation and indicates that this trend will continue for the foreseeable future, and it will most likely become the most significant factor affecting technology scaling and the future of computer-based systems.

To cope with the expectation of doubling the performance every known period of time (not two years anymore), two major changes happened: (1) instead of increasing the frequency, modern processors increase the number of cores on each die. This trend forces the software to be changed as well. Since we cannot expect the hardware to achieve significantly better performance for a given application anymore, we need to develop new implementations of the same application that will take advantage of the multicore architecture. And (2) thermal and power become first-class citizens in any design of future architectures. These trends encourage the community to start looking at heterogeneous solutions: systems assembled from different subsystems, each of them optimized to achieve different optimization points or to address different workloads. For example, many systems combine “traditional” CPU architecture with special-purpose FPGAs or graphics processors (GPUs). Such an integration can be done at different levels; e.g., at the system level, at the board level, and recently at the core level.

Developing software for homogeneous parallel and distributed systems is considered to be a non-trivial task, even though such development uses well-known paradigms and well-established programming languages, development methods, algorithms, debugging tools, etc. Developing software to support general-purpose heterogeneous systems is relatively new, and so less mature and much more difficult. As heterogeneous systems are becoming unavoidable, many of the major software and hardware manufacturers have started creating software environments to support them. AMD proposed the use of the Brook language, developed at Stanford University, to handle streaming computations, later extending the software environment to include Close to Metal (CTM) and the Compute Abstraction Layer (CAL) for accessing their low-level streaming hardware primitives in order to take advantage of their highly threaded parallel architecture. NVIDIA took a similar approach, co-designing their recent generations of GPUs and the CUDA programming environment to take advantage of the highly threaded GPU environment. Intel proposed to extend the use of multi-core programming to program their Larrabee architecture. IBM proposed the use of message-passing-based software in order to take advantage of its heterogeneous, non-coherent Cell architecture, and FPGA-based solutions integrate libraries written in VHDL with C or C++ based programs to achieve the best of two environments. Each of these programming environments offers scope for benefiting domain-specific applications, but they all failed to address the requirement for general-purpose software that can serve different hardware architectures in the way that, for example, Java code can run on very different ISA architectures.

The Open Computing Language (OpenCL) was designed to meet this important need. It was defined and is managed by the nonprofit technology consortium Khronos. The language and its development environment “borrow” many of their basic concepts from very successful, hardware-specific environments such as CUDA, CAL, and CTM, and blend them to create a hardware-independent software development environment. It supports different levels of parallelism and efficiently maps to homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs, GPUs, FPGAs, and potentially other future devices. In order to support future devices, OpenCL defines a set of mechanisms that, if met, allow a device to be seamlessly included as part of the OpenCL environment. OpenCL also defines run-time support that allows managing the resources and combining different types of hardware under the same execution environment; hopefully, in the future it will allow dynamically balancing computations, power, and other resources such as the memory hierarchy in a more general manner.

This book is a textbook that aims to teach students how to program heterogeneous environments. The book starts with a very important discussion on how to program parallel systems and defines the concepts the students need to understand before starting to program any heterogeneous system. It also provides a taxonomy that can be used for understanding the different models used for parallel and distributed systems. Chapters 2–4 build the students’ step-by-step understanding of the basic structures of OpenCL (Chapter 2), including the host and the device architecture (Chapter 3). Chapter 4 provides an example that puts together these concepts using a non-trivial example.

Chapters 5 and 6 extend the concepts learned so far with a better understanding of the notions of concurrency and run-time execution in OpenCL (Chapter 5) and the dissection between the CPU and the GPU (Chapter 6). After building the basics, the book dedicates four chapters (7–10) to more sophisticated examples. These sections are vital for students to understand that OpenCL can be used for a wide range of applications, which are beyond any domain-specific mode of operation. The book also demonstrates how the same program can be run on different platforms, such as Nvidia or AMD. The book ends with three chapters which are dedicated to advanced topics.

No doubt this is a very important book that provides students and researchers with a better understanding of the world of heterogeneous computers in general and the solutions provided by OpenCL in particular. The book is well written and fits students of different experience levels, and so it can be used either as a textbook in a course on OpenCL, or different parts of the book can be used to extend other courses; e.g., the first two chapters are well fitted for a course on parallel programming, and some of the examples can be used as part of advanced courses.

Dr. Avi Mendelson
Microsoft R&D Israel
Adjunct Professor, Technion
  • 11. Preface OUR HETEROGENEOUS WORLD Our world is heterogeneous in nature. This kind of diversity provides a richness and detail that is difficult to describe. At the same time, it provides a level of complexity and interaction in which a wide range of different entities are optimized for specific tasks and environments. In computing, heterogeneous computer systems also add richness by allowing the programmer to select the best architecture to execute the task at hand or to choose the right task to make optimal use of a given architecture. These two views of the flex- ibility of a heterogeneous system both become apparent when solving a computa- tional problem involves a variety of different tasks. Recently, there has been an upsurge in the computer design community experimenting with building heteroge- neous systems. We are seeing new systems on the market that combine a number of different classes of architectures. What has slowed this progression has been a lack of standardized programming environment that can manage the diverse set of resources in a common framework. OPENCL OpenCL has been developed specifically to ease the programming burden when writ- ing applications for heterogeneous systems. OpenCL also addresses the current trend to increase the number of cores on a given architecture. The OpenCL framework sup- ports execution on multi-core central processing units, digital signal processors, field programmable gate arrays, graphics processing units, and heterogeneous accelerated processing units. The architectures already supported cover a wide range of ap- proaches to extracting parallelism and efficiency from memory systems and instruc- tion streams. Such diversity in architectures allows the designer to provide an optimized solution to his or her problem—a solution that, if designed within the OpenCL specification, can scale with the growth and breadth of available architec- tures. 
OpenCL’s standard abstractions and interfaces allow the programmer to seam- lessly “stitch” together an application within which execution can occur on a rich set of heterogeneous devices from one or many manufacturers. THIS TEXT Until now, there has not been a single definitive text that can help programmers and software engineers leverage the power and flexibility of the OpenCL programming standard. This is our attempt to address this void. With this goal in mind, we have not attempted to create a syntax guide—there are numerous good sources in which programmers can find a complete and up-to-date description of OpenCL syntax. xi
Instead, this text is an attempt to show a developer or student how to leverage the OpenCL framework to build interesting and useful applications. We provide a number of examples of real applications to demonstrate the power of this programming standard. Our hope is that the reader will embrace this new programming framework and explore the full benefits of heterogeneous computing that it provides. We welcome comments on how to improve upon this text, and we hope that this text will help you build your next heterogeneous application.
Acknowledgments

We thank Manju Hegde for proposing the book project, BaoHuong Phan and Todd Green for their management and input from the AMD and Morgan Kaufmann sides of the project, and Jay Owen for connecting the participants on this project with each other. On the technical side, we thank Jay Cornwall for his thorough work editing much of this text, and we thank Joachim Deguara, Takahiro Harada, Justin Hensley, Marc Romankewicz, and Byunghyun Jang for their significant contributions to individual chapters, particularly the sequence of case studies that could not have been produced without their help. Also instrumental were Jari Nikara, Tomi Aarnio, and Eero Aho from the Nokia Research Center and Janne Pietiäinen from the Tampere University of Technology.
About the Authors

Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, particularly examining high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. He has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. He has a Ph.D. in computer science for his work on type systems for extensible records and variants.

Lee Howes has spent the past 2 years working at AMD and currently focuses on programming models for the future of heterogeneous computing. His interests lie in declaratively representing mappings of iteration domains to data and in communicating complicated architectural concepts and optimizations succinctly to a developer audience, both through programming model improvements and through education. He has a Ph.D. in computer science from Imperial College London for work in this area.

David Kaeli received a B.S. and Ph.D. in electrical engineering from Rutgers University and an M.S. in computer engineering from Syracuse University. He is Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, he spent 12 years at IBM, the last 7 at T. J. Watson Research Center, Yorktown Heights, NY. He has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He currently serves as the Chair of the IEEE Technical Committee on Computer Architecture. He is an IEEE Fellow and a member of the ACM.

Perhaad Mistry is a Ph.D.
candidate at Northeastern University. He received a B.S. in electronics engineering from the University of Mumbai and an M.S. in computer engineering from Northeastern University. He is currently a member of the Northeastern University Computer Architecture Research Laboratory (NUCAR) and is advised by Dr. David Kaeli. He works on a variety of parallel computing projects. He has designed scalable data structures for the physics simulations for GPGPU platforms and has also implemented medical reconstruction algorithms for heterogeneous devices. His current research focuses on the design of profiling tools for heterogeneous computing. He is studying the potential of using standards such as OpenCL for building tools that simplify parallel programming and performance analysis across the variety of heterogeneous devices available today.
Dana Schaa received a B.S. in computer engineering from California Polytechnic State University, San Luis Obispo, and an M.S. in electrical and computer engineering from Northeastern University, where he is also currently a Ph.D. candidate. His research interests include parallel programming models and abstractions, particularly for GPU architectures. He has developed GPU-based implementations of several medical imaging research projects ranging from real-time visualization to image reconstruction in distributed, heterogeneous environments. He married his wonderful wife, Jenny, in 2010, and they live together in Boston with their charming cats.
CHAPTER 1

Introduction to Parallel Programming

INTRODUCTION

Today’s computing environments are becoming more multifaceted, exploiting the capabilities of a range of multi-core microprocessors, central processing units (CPUs), digital signal processors, reconfigurable hardware (FPGAs), and graphics processing units (GPUs). Presented with so much heterogeneity, the process of developing efficient software for such a wide array of architectures poses a number of challenges to the programming community.

Applications possess a number of workload behaviors, ranging from control intensive (e.g., searching, sorting, and parsing) to data intensive (e.g., image processing, simulation and modeling, and data mining). Applications can also be characterized as compute intensive (e.g., iterative methods, numerical methods, and financial modeling), where the overall throughput of the application is heavily dependent on the computational efficiency of the underlying hardware. Each of these workload classes typically executes most efficiently on a specific style of hardware architecture. No single architecture is best for running all classes of workloads, and most applications possess a mix of the workload characteristics. For instance, control-intensive applications tend to run faster on superscalar CPUs, where significant die real estate has been devoted to branch prediction mechanisms, whereas data-intensive applications tend to run fast on vector architectures, where the same operation is applied to multiple data items concurrently.

OPENCL

The Open Computing Language (OpenCL) is a heterogeneous programming framework that is managed by the nonprofit technology consortium Khronos Group. OpenCL is a framework for developing applications that execute across a range of device types made by different vendors.
It supports a wide range of levels of parallelism and efficiently maps to homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs, GPUs, and other types of devices limited only by the imagination of vendors. The OpenCL definition offers both a device-side language and a host management layer for the devices in a system.

Heterogeneous Computing with OpenCL © 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.
The device-side language is designed to efficiently map to a wide range of memory systems. The host language aims to support efficient plumbing of complicated concurrent programs with low overhead. Together, these provide the developer with a path to efficiently move from algorithm design to implementation.

OpenCL provides parallel computing using task-based and data-based parallelism. It currently supports CPUs that include x86, ARM, and PowerPC, and it has been adopted into graphics card drivers by both AMD (called the Accelerated Parallel Processing SDK) and NVIDIA. Support for OpenCL is rapidly expanding as a wide range of platform vendors have adopted OpenCL and support or plan to support it for their hardware platforms. These vendors fall within a wide range of market segments, from the embedded vendors (ARM and Imagination Technologies) to the HPC vendors (AMD, Intel, NVIDIA, and IBM). The architectures supported include multi-core CPUs, throughput and vector processors such as GPUs, and fine-grained parallel devices such as FPGAs. Most important, OpenCL’s cross-platform, industrywide support makes it an excellent programming model for developers to learn and use, with the confidence that it will continue to be widely available for years to come with ever-increasing scope and applicability.

THE GOALS OF THIS BOOK

This book is the first of its kind to present OpenCL programming in a fashion appropriate for the classroom. The book is organized to address the need for teaching parallel programming on current system architectures using OpenCL as the target language, and it includes examples for CPUs, GPUs, and their integration in the accelerated processing unit (APU). Another major goal of this text is to provide a guide to programmers to develop well-designed programs in OpenCL targeting parallel systems. The book leads the programmer through the various abstractions and features provided by the OpenCL programming environment.
The examples offer the reader a simple introduction and more complicated optimizations, and they suggest further development and goals at which to aim. It also discusses tools for improving the development process in terms of profiling and debugging such that the reader need not feel lost in the development process.

The book is accompanied by a set of instructor slides and programming examples, which support the use of this text by an OpenCL instructor. Please visit http://heterogeneouscomputingwithopencl.org/ for additional information.

THINKING PARALLEL

Most applications are first programmed to run on a single processor. In the field of high-performance computing, classical approaches have been used to accelerate computation when provided with multiple computing resources. Standard approaches
include “divide-and-conquer” and “scatter–gather” problem decomposition methods, providing the programmer with a set of strategies to effectively exploit the parallel resources available in high-performance systems. Divide-and-conquer methods iteratively break a problem into subproblems until the subproblems fit well on the computational resources provided. Scatter–gather methods send a subset of the input data set to each parallel resource and then collect the results of the computation and combine them into a result data set. As before, the partitioning takes account of the size of the subsets based on the capabilities of the parallel resources. Figure 1.1 shows how popular applications such as sorting and a vector–scalar multiply can be effectively mapped to parallel resources to accelerate processing.

FIGURE 1.1 (A) Simple sorting and (B) dot product examples.

The programming task becomes increasingly challenging when faced with the growing parallelism and heterogeneity present in contemporary parallel processors. Given the power and thermal limits of complementary metal-oxide semiconductor (CMOS) technology, microprocessor vendors find it difficult to scale the frequency of these devices to derive more performance and have instead decided to place multiple processors, sometimes specialized, on a single chip. In doing so, the problem of extracting parallelism from an application is left to the programmer, who must decompose the underlying algorithms in the applications and map them efficiently to a diverse variety of target hardware platforms.

In the past 5 years, parallel computing devices have been increasing in number and processing capabilities. GPUs have also appeared on the computing scene and
are providing new levels of processing capability at very low cost. Driven by the demand for real-time three-dimensional graphics rendering, a highly data-parallel problem, GPUs have evolved rapidly as very powerful, fully programmable, task and data-parallel architectures. Hardware manufacturers are now combining CPUs and GPUs on a single die, ushering in a new generation of heterogeneous computing. Compute-intensive and data-intensive portions of a given application, called kernels, may be offloaded to the GPU, providing significant performance per watt and raw performance gains, while the host CPU continues to execute nonkernel tasks.

Many systems and phenomena in both the natural world and the man-made world present us with different classes of parallelism and concurrency:
• Molecular dynamics
• Weather and ocean patterns
• Multimedia systems
• Tectonic plate drift
• Cell growth
• Automobile assembly lines
• Sound and light wave propagation

Parallel computing, as defined by Almasi and Gottlieb (1989), is “a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (i.e., in parallel).” The degree of parallelism that can be achieved is dependent on the inherent nature of the problem at hand (remember that there exists significant parallelism in the world), and the skill of the algorithm or software designer is to identify the different forms of parallelism present in the underlying problem. We begin with a discussion of two simple examples to demonstrate inherent parallel computation: vector multiplication and text searching.

Our first example carries out multiplication of the elements of two arrays A and B, each with N elements, storing the result of each multiply in a corresponding array C. Figure 1.2 shows the computation we would like to carry out.
The serial C++ code would look as follows:

for (i = 0; i < N; i++)
    C[i] = A[i] * B[i];

This code possesses significant parallelism but very little arithmetic intensity. The computation of every element in C is independent of every other element. If we were to parallelize this code, we could choose to generate a separate execution instance to perform the computation of each element of C. This code possesses significant data-level parallelism because the same operation is applied across all of A and B to produce C. We could also view this breakdown as a simple form of task parallelism where each task operates on a subset of the same data; however, task parallelism generalizes further to execution on pipelines of data or even more sophisticated parallel interactions. Figure 1.3 shows an example of task parallelism in a pipeline to support filtering of images in frequency space using an FFT.
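As an aside (this sketch is not from the text), the independent element-wise multiplies can be mapped onto host threads to make the data-parallel decomposition concrete. The chunking scheme and the worker count of 4 are arbitrary choices for illustration:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Each worker computes a contiguous chunk of C; chunks are independent,
// so no synchronization is needed between them.
void multiply_chunk(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t begin, std::size_t end) {
  for (std::size_t i = begin; i < end; ++i)
    C[i] = A[i] * B[i];
}

void parallel_multiply(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, unsigned num_workers = 4) {
  const std::size_t N = A.size();
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    std::size_t begin = N * w / num_workers;        // chunk boundaries
    std::size_t end   = N * (w + 1) / num_workers;
    workers.emplace_back(multiply_chunk, std::cref(A), std::cref(B),
                         std::ref(C), begin, end);
  }
  for (auto& t : workers) t.join();                  // wait for all chunks
}
```

In OpenCL the same decomposition is expressed by launching one work item per element rather than managing threads by hand; this host-thread version only illustrates the independence of the computations.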
FIGURE 1.2 Multiplying two arrays: This example provides for parallel computation without any need for communication.

FIGURE 1.3 Filtering a series of images using an FFT shows clear task parallelism as a series of tasks operate together in a pipeline to compute the overall result.

Let us consider a second example. The computation we are trying to carry out is to find the number of occurrences of a string of characters in a body of text (Figure 1.4). Assume that the body of text has already been parsed into a set of N words. We could choose to divide the task of comparing the string against the N potential matches into N comparisons (i.e., tasks), where each string of characters is matched against the text string. This approach, although rather naïve in terms of search efficiency, is highly parallel. The process of the text string being compared against the set of potential words presents N parallel tasks, each carrying out the same
set of operations. There is even further parallelism within a single comparison task, where the matching on a character-by-character basis presents a finer-grained degree of parallelism. This example exhibits both data-level parallelism (we are going to be performing the same operation on multiple data items) and task-level parallelism (we can compare the string to all words concurrently).

Once the number of matches is determined, we need to accumulate them to provide the total number of occurrences. Again, this summing can exploit parallelism. In this step, we introduce the concept of “reduction,” where we can utilize the availability of parallel resources to combine partial sums in a very efficient manner. Figure 1.5 shows the reduction tree, which illustrates this summation process in log N steps.

FIGURE 1.4 An example of both task-level and data-level parallelism. We can have parallel tasks that count the occurrences of a string in a body of text. The lower portion of the figure shows that the string comparison can be broken down to finer-grained parallel processing.

CONCURRENCY AND PARALLEL PROGRAMMING MODELS

Here, we discuss concurrency and parallel processing models so that when attempting to map an application developed in OpenCL to a parallel platform, we can select the right model to pursue. Although all of the following models can be supported in OpenCL, the underlying hardware may restrict which model will be practical to use.
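Before moving on, the log N pairwise reduction described in the previous section can be made concrete with a short C++ sketch (an illustration, not from the text). It is written sequentially for clarity, but all additions within one level of the tree are independent and could execute in parallel:

```cpp
#include <vector>

// Pairwise tree reduction: each pass halves the number of partial sums,
// so n partial match counts are combined in ceil(log2(n)) passes.
int tree_reduce(std::vector<int> counts) {
  if (counts.empty()) return 0;
  while (counts.size() > 1) {
    std::vector<int> next_level;
    for (std::size_t i = 0; i + 1 < counts.size(); i += 2)
      next_level.push_back(counts[i] + counts[i + 1]);  // independent adds
    if (counts.size() % 2 == 1)
      next_level.push_back(counts.back());              // odd element carries over
    counts = next_level;
  }
  return counts[0];
}
```

For example, eight per-task match counts are combined in three passes (8 → 4 → 2 → 1), matching the combining network of Figure 1.5.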
Concurrency is concerned with two or more activities happening at the same time. We find concurrency in the real world all the time—for example, carrying a child in one arm while crossing a road or, more generally, thinking about something while doing something else with one’s hands.

When talking about concurrency in terms of computer programming, we mean a single system performing multiple tasks independently. Although it is possible that concurrent tasks may be executed at the same time (i.e., in parallel), this is not a requirement. For example, consider a simple drawing application, which is either receiving input from the user via the mouse and keyboard or updating the display with the current image. Conceptually, receiving and processing input are different operations (i.e., tasks) from updating the display. These tasks can be expressed in terms of concurrency, but they do not need to be performed in parallel. In fact, in the case in which they are executing on a single core of a CPU, they cannot be performed in parallel. In this case, the application or the operating system should switch between the tasks, allowing both some time to run on the core.

FIGURE 1.5 After all string comparisons are completed, we can sum up the number of matches in a combining network.

Parallelism is concerned with running two or more activities in parallel with the explicit goal of increasing overall performance. For example, consider the following assignments:
step 1) A = B + C
step 2) D = E + G
step 3) R = A + D

The assignments of A and D in steps 1 and 2 (respectively) are said to be independent of each other because there is no data flow between these two steps (i.e., the variables E and G on the right side of step 2 do not appear on the left side of step 1 and, vice versa, the variables B and C on the right side of step 1 do not appear on the left side of step 2). Also, the variable on the left side of step 1 (A) is not the same as the variable on the left side of step 2 (D). This means that steps 1 and 2 can be executed in parallel (i.e., at the same time). Step 3 is dependent on both steps 1 and 2, so it cannot be executed in parallel with either step 1 or 2.

Parallel programs must be concurrent, but concurrent programs need not be parallel. Although many concurrent programs can be executed in parallel, interdependencies between concurrent tasks may preclude this. For example, an interleaved execution would still satisfy the definition of concurrency while not executing in parallel. As a result, only a subset of concurrent programs are parallel, and the set of all concurrent programs is itself a subset of all programs. Figure 1.6 shows this relationship.

FIGURE 1.6 Parallel and concurrent programs are subsets of programs.

In the remainder of this section, some well-known approaches to programming concurrent and parallel systems are introduced with the aim of providing a foundation before introducing OpenCL in Chapter 2.
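Returning to the three-step example above, the dependence structure can be sketched in C++ (an illustration, not from the text): the independent steps 1 and 2 are launched concurrently with std::async, and step 3 waits on both results before combining them:

```cpp
#include <future>

// Steps 1 and 2 are independent, so they may run in parallel;
// step 3 must wait for both of their results.
int run_steps(int B, int C, int E, int G) {
  auto stepA = std::async(std::launch::async, [=] { return B + C; });  // step 1
  auto stepD = std::async(std::launch::async, [=] { return E + G; });  // step 2
  int A = stepA.get();   // blocks until step 1 completes
  int D = stepD.get();   // blocks until step 2 completes
  return A + D;          // step 3: dependent on both
}
```

The two get() calls are the synchronization points: they encode the data dependence of step 3 on steps 1 and 2.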
Threads and Shared Memory

A running program may consist of multiple subprograms that maintain their own independent control flow and that are allowed to run concurrently. These subprograms are defined as threads. Communication between threads is via updates and access to memory appearing in the same address space. Each thread has its own pool of local memory—that is, variables—but all threads see the same set of global variables. A simple analogy that can be used to describe the use of threads is the concept of a main program that includes a number of subroutines. The main program is scheduled to run by the operating system and performs necessary loading and acquisition of system and user resources to run. Execution of the main program begins by performing some serial work and then continues by creating a number of tasks that can be scheduled and run by the operating system concurrently using threads.

Each thread benefits from a global view of memory because it shares the same memory address space of the main program. Threads communicate with each other through global memory. This can require synchronization constructs to ensure that more than one thread is not updating the same global address. A memory consistency model is defined to manage load and store ordering. All processors see the same address space and have direct access to these addresses without the help of other processors. Mechanisms such as locks/semaphores are commonly used to control access to shared memory that is accessed by multiple tasks. A key feature of the shared memory model is the fact that the programmer is not responsible for managing data movement, although depending on the consistency model implemented in the hardware or runtime system, some level of memory consistency may have to be enforced manually. This relaxes the requirement to specify explicitly the communication of data between tasks, and as a result, parallel code development can often be simplified.
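As a small illustration of the locking constructs just mentioned (a sketch, not from the text), two threads update one shared counter through the same address space; without the lock, the increments could interleave and lose updates:

```cpp
#include <mutex>
#include <thread>

// A shared counter protected by a lock, as in the shared memory model:
// both threads see the same variable and must synchronize their updates.
struct SharedCounter {
  int value = 0;
  std::mutex lock;
  void add(int n) {
    for (int i = 0; i < n; ++i) {
      std::lock_guard<std::mutex> guard(lock);  // serialize each update
      ++value;
    }
  }
};

int count_with_two_threads(int per_thread) {
  SharedCounter counter;
  std::thread t1(&SharedCounter::add, &counter, per_thread);
  std::thread t2(&SharedCounter::add, &counter, per_thread);
  t1.join();
  t2.join();
  return counter.value;
}
```

Note that no data is explicitly moved between the threads; the shared address space carries the communication, and the lock supplies the ordering.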
There is a significant cost to supporting a fully consistent shared memory model in hardware. For multiprocessor systems, the hardware structures required to support this model become a limiting factor. Shared buses become bottlenecks in the design. The extra hardware required typically grows exponentially in terms of its complexity as we attempt to add additional processors. This has slowed the introduction of multi-core and multiprocessor systems at the low end, and it has limited the number of cores working together in a consistent shared memory system to relatively low numbers because shared buses and coherence protocol overheads become bottlenecks. More relaxed shared memory systems scale further, although in all cases scaling shared memory systems comes at the cost of complicated and expensive interconnects. Most multi-core CPU platforms support shared memory in one form or another. OpenCL supports execution on shared memory devices.

Message-Passing Communication

The message-passing communication model enables explicit intercommunication of a set of concurrent tasks that may use memory during computation. Multiple tasks can reside on the same physical device and/or across an arbitrary number of devices.
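As an illustrative sketch (not from the text), explicit message exchange between two tasks can be modeled with a small blocking channel, with a thread-safe queue standing in for a real transport such as MPI:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// A minimal blocking channel: send() and receive() form the cooperative
// pair of operations characteristic of the message-passing model.
class Channel {
  std::queue<int> messages;
  std::mutex lock;
  std::condition_variable ready;
 public:
  void send(int msg) {
    std::lock_guard<std::mutex> guard(lock);
    messages.push(msg);
    ready.notify_one();
  }
  int receive() {  // blocks until a matching send has occurred
    std::unique_lock<std::mutex> guard(lock);
    ready.wait(guard, [&] { return !messages.empty(); });
    int msg = messages.front();
    messages.pop();
    return msg;
  }
};

// A producer task sends n messages; the consumer receives and sums them.
// All data moves through explicit messages, not shared variables.
int exchange_sum(int n) {
  Channel ch;
  std::thread producer([&] {
    for (int i = 1; i <= n; ++i) ch.send(i);
  });
  int total = 0;
  for (int i = 0; i < n; ++i) total += ch.receive();
  producer.join();
  return total;
}
```

Unlike the shared memory sketch earlier, here the programmer manages every data transfer explicitly, which is exactly the burden (and the portability benefit) of the message-passing style.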
Tasks exchange data through communications by sending and receiving explicit messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation. From a programming perspective, message-passing implementations commonly comprise a library of hardware-independent routines for sending and receiving messages. The programmer is responsible for explicitly managing communication between tasks. Historically, a variety of message-passing libraries have been available since the 1980s. MPI is currently the most popular message-passing middleware. These implementations differ substantially from each other, making it difficult for programmers to develop portable applications.

Different Grains of Parallelism

In parallel computing, granularity is a measure of the ratio of computation to communication. Periods of computation are typically separated from periods of communication by synchronization events. The grain of parallelism is constrained by the inherent characteristics of the algorithms constituting the application. It is important that the parallel programmer selects the right granularity in order to reap the full benefits of the underlying platform because choosing the right grain size can help to expose additional degrees of parallelism. Sometimes this selection is referred to as “chunking,” determining the amount of data to assign to each task. Selecting the right chunk size can help provide for further acceleration on parallel hardware. Next, we consider some of the trade-offs associated with identifying the right grain size.

• Fine-grained parallelism
  • Low arithmetic intensity.
  • May not have enough work to hide long-duration asynchronous communication.
  • Facilitates load balancing by providing a larger number of more manageable (i.e., smaller) work units.
  • If the granularity is too fine, it is possible that the overhead required for communication and synchronization between tasks can actually produce a slower parallel implementation than the original serial execution.
• Coarse-grained parallelism
  • High arithmetic intensity.
  • Complete applications can serve as the grain of parallelism.
  • More difficult to load balance efficiently.

Given these trade-offs, which granularity will lead to the best implementation? The most efficient granularity is dependent on the algorithm and the hardware environment in which it is run. In most cases, if the overhead associated with communication and synchronization is high relative to the time of the computation task at hand, it will generally be advantageous to work at a coarser granularity. Fine-grained parallelism can help reduce overheads due to load imbalance or memory delays (this is particularly true on a GPU, which depends on zero-overhead fine-grained thread
switching to hide memory latencies). Fine-grained parallelism can even occur at an instruction level (this approach is used in very long instruction word (VLIW) and superscalar architectures).

Data Sharing and Synchronization

Consider the case in which two applications run that do not share any data. As long as the runtime system or operating system has access to adequate execution resources, they can be run concurrently and even in parallel. If halfway through the execution of one application it generated a result that was subsequently required by the second application, then we would have to introduce some form of synchronization into the system, and parallel execution—at least across the synchronization point—becomes impossible.

When writing concurrent software, data sharing and synchronization play a critical role. Examples of data sharing in concurrent programs include
• the input of a task is dependent on the result of another task—for example, in a producer/consumer or pipeline execution model; and
• when intermediate results are combined together (e.g., as part of a reduction, as in our word search example shown in Figure 1.4).

Ideally, we would only attempt to parallelize portions of an application that are void of data dependencies, but this is not always possible. Explicit synchronization primitives such as barriers or locks may be used to support synchronization when necessary. Although we only raise this issue here, later chapters revisit this question when support for communication between host and device programs or synchronization between tasks is required.

STRUCTURE

The remainder of the book is organized as follows:

Chapter 2 presents an introduction to OpenCL, including key concepts such as kernels, platforms, and devices; the four different abstraction models; and developing your first OpenCL kernel. Understanding these different models is critical to fully appreciate the richness of OpenCL’s programming model.
Chapter 3 presents some of the architectures that OpenCL does or might target, including x86 CPUs, GPUs, and APUs. The text includes discussion of different styles of architectures, including single instruction multiple data and VLIW. This chapter also covers the concepts of multi-core and throughput-oriented systems, as well as advances in heterogeneous architectures.

Chapter 4 introduces basic matrix multiplication, image rotation, and convolution implementations to help the reader learn OpenCL by example.
Chapter 5 discusses concurrency and execution in the OpenCL programming model. In this chapter, we discuss kernels, work items, and the OpenCL execution and memory hierarchies. We also show how queuing and synchronization work in OpenCL such that the reader gains an understanding of how to write OpenCL programs that interact with memory correctly.

Chapter 6 shows how OpenCL maps to an example architecture. For this study, we choose a system comprising an AMD Phenom II CPU and an AMD Radeon HD6970 GPU. This chapter allows us to show how the mappings of the OpenCL programming model for largely serial architectures such as CPUs and vector/throughput architectures such as GPUs differ, giving some idea how to optimize for specific architectural styles.

Chapter 7 presents a case study that accelerates a convolution algorithm. Issues related to memory space utilization and efficiency are considered, as well as work item scheduling, wavefront occupancy, and overall efficiency. These techniques are the foundations necessary for developing high-performance code using OpenCL.

Chapter 8 presents a case study targeting video processing, utilizing OpenCL to build performant image processing effects that can be applied to video streams.

Chapter 9 presents another case study examining how to optimize the performance of a histogramming application. In particular, it highlights how careful design of workgroup size and memory access patterns can make a vast difference to performance in memory-bound applications such as histograms.

Chapter 10 discusses how to leverage a heterogeneous CPU–GPU environment. The target application is a mixed particle simulation (as illustrated on the cover of this book) in which work is distributed across both the CPU and the GPU depending on the grain size of particles in the system.

Chapter 11 shows how to use OpenCL extensions, using the device fission and double precision extensions as examples.
Chapter 12 introduces the reader to debugging and analyzing OpenCL programs. The right debugging tool can save a developer hundreds of wasted hours, allowing him or her instead to learn the specific computer language and solve the problem at hand.

Chapter 13 provides an overview and performance trade-offs of WebCL. WebCL is not yet a product, but imagine what could be possible if the web were powered by OpenCL. This chapter describes example OpenCL bindings for JavaScript, discussing an implementation of a Firefox plug-in that allows web applications to leverage the powerful parallel computing capabilities of modern CPUs and GPUs.

Reference

Almasi, G. S., & Gottlieb, A. (1989). Highly Parallel Computing. Redwood City, CA: Benjamin Cummings.
Further Reading and Relevant Websites

Chapman, B., Jost, G., van der Pas, R., & Kuck, D. J. (2007). Using OpenMP: Portable Shared Memory Parallel Programming. Cambridge, MA: MIT Press.

Duffy, J. (2008). Concurrent Programming on Windows. Upper Saddle River, NJ: Addison-Wesley.

Gropp, W., Lusk, E., & Skjellum, A. (1994). Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press Scientific and Engineering Computation Series. Cambridge, MA: MIT Press.

Herlihy, M., & Shavit, N. (2008). The Art of Multiprocessor Programming. Burlington, MA: Morgan Kaufmann.

Khronos Group. OpenCL. www.khronos.org/opencl.

Mattson, T. G., Sanders, B. A., & Massingill, B. L. (2004). Patterns for Parallel Programming. Upper Saddle River, NJ: Addison-Wesley.

NVIDIA. CUDA Zone. http://www.nvidia.com/object/cuda_home_new.html.

AMD. OpenCL Zone. http://developer.amd.com/openclzone.
CHAPTER 2

Introduction to OpenCL

INTRODUCTION

This chapter introduces OpenCL, the programming fabric that will allow us to weave our application to execute concurrently. Programmers familiar with C and C++ should have little trouble understanding the OpenCL syntax. We begin by reviewing the OpenCL standard.

The OpenCL Standard

Open programming standards designers are tasked with a very challenging objective: arrive at a common set of programming standards that are acceptable to a range of competing needs and requirements. The Khronos consortium that manages the OpenCL standard has done a good job addressing these requirements. The consortium has developed an application programming interface (API) that is general enough to run on significantly different architectures while being adaptable enough that each hardware platform can still obtain high performance. Using the core language and correctly following the specification, any program designed for one vendor can execute on another's hardware. The model set forth by OpenCL creates portable, vendor- and device-independent programs that are capable of being accelerated on many different hardware platforms.

The OpenCL API is a C API with a C++ wrapper API that is defined in terms of the C API. There are third-party bindings for many languages, including Java, Python, and .NET. The code that executes on an OpenCL device, which in general is not the same device as the host CPU, is written in the OpenCL C language. OpenCL C is a restricted version of the C99 language with extensions appropriate for executing data-parallel code on a variety of heterogeneous devices.

The OpenCL Specification

The OpenCL specification is defined in four parts, called models, that can be summarized as follows:

Heterogeneous Computing with OpenCL. © 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.