Achieving Improved Performance in Multi-threaded Programming with GPU Computing
CSE 4120: Technical Writing & Seminar
MD. Mesbah Uddin Khan [mesbahuk@gmail.com]
Dated July 13, 2011

Things we need to know
  • CPU
  • GPU
  • Threads
  • Multi-thread Programming
  • Parallel Programming
  • OpenCL
  • GPU Computing

Introduction
Using graphics hardware to enhance CPU-based standard desktop applications is a question not only of programming models but also of the critical optimizations required to achieve true performance improvements.

Hardware Trends
Two major hardware trends make parallel programming a crucial issue for all software engineers today:
  • The rise of many-core CPU architectures.
  • The inclusion of powerful graphics processing units (GPUs).

Central Processing Unit (CPU)
  • CPUs support parallel programming, which assigns threads to different tasks and coordinates their activities.
  • Newer programming models also consider the non-uniform memory architecture (NUMA) of modern desktop systems.
  • They rely on the underlying concept of parallel threads.
  • CPU hardware is optimized for such coarse-grained, task-parallel programming with synchronized shared memory.

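A minimal sketch of the coarse-grained, task-parallel style described above, assuming plain Python threads stand in for CPU threads. Each thread is assigned a different task, and a lock synchronizes access to the shared result store. The function name `run_tasks` and the three example tasks are illustrative choices, not part of the original slides.

```python
import threading


def run_tasks(data):
    """Run three different tasks on separate threads and collect the results."""
    results = {}                      # memory shared by all threads
    lock = threading.Lock()           # coordinates access to the shared dict
    tasks = {"sum": sum, "min": min, "max": max}  # hypothetical example tasks

    def worker(name, func):
        value = func(data)            # coarse-grained work, done independently
        with lock:                    # synchronized write to shared memory
            results[name] = value

    threads = [threading.Thread(target=worker, args=(name, func))
               for name, func in tasks.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()                 # wait until every task has finished
    return results


if __name__ == "__main__":
    print(run_tasks([3, 1, 4, 1, 5]))
```

Only a few threads are involved, each doing a sizeable and distinct piece of work. That is the pattern CPU hardware favors.
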
Graphics Processing Unit (GPU)
  • GPUs are mainly designed for fine-grained, data-parallel computation.
  • Graphics processing is an embarrassingly parallel problem.
  • GPU hardware is optimized for heavy loads.
  • It aims at combining a maximum number of simple parallel processing elements, each having only a small amount of local memory.
  • For example, the Nvidia GeForce GTX 480 graphics card supports up to 1,536 GPU threads on each of its 15 compute units. So, at full operational capacity, it can run 23,040 parallel execution streams.

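The capacity figure above is simple arithmetic, sketched here for checking. The helper `waves_needed` is a hypothetical illustration: it assumes one GPU thread per work item and ignores real scheduling constraints such as work-group size and register usage, so it is only a rough lower bound on how many full passes a large problem would take.

```python
COMPUTE_UNITS = 15          # compute units on the GTX 480, per the slide
THREADS_PER_UNIT = 1536     # maximum GPU threads per compute unit, per the slide


def max_parallel_streams(units=COMPUTE_UNITS, threads_per_unit=THREADS_PER_UNIT):
    """Total number of threads the card can hold at full capacity."""
    return units * threads_per_unit


def waves_needed(work_items, capacity):
    """Rounds needed if each round processes at most `capacity` work items.

    Assumption: one thread per work item, no other resource limits.
    """
    return (work_items + capacity - 1) // capacity  # integer ceiling division


if __name__ == "__main__":
    capacity = max_parallel_streams()
    print(capacity)                         # 15 * 1536
    print(waves_needed(700_000, capacity))  # a problem of 700,000 work items
```
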
From CPU to GPU
  • Applications running on a computer can access GPU resources with the help of a control API implemented in user-mode libraries and the graphics card driver.
  • Companies with a stake in GPU computing defined the Open Computing Language (OpenCL), a vendor-neutral way of accessing compute resources.

Example Application - Sudoku (1/2)
A Sudoku field typically consists of 3x3 subfields, each having 3x3 places.
Three facts make this problem a representative example of algorithms appropriate for GPU execution:
  • Data validation is the primary application task.
  • The computational effort grows with the game field's size (i.e., the problem size).
  • The workload belongs to the GPU-friendly class of embarrassingly parallel problems that have only a very small serial execution portion.

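To see why Sudoku validation counts as embarrassingly parallel, here is a minimal sketch, not the implementation measured in the slides. Every row, column, and subfield can be checked independently of the others, and only the final combination of results is serial. The names `units`, `unit_is_valid`, and `is_valid_sudoku` are hypothetical. A thread pool is used only to show that the checks are independent. Python threads will not reproduce GPU speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def units(n):
    """List the cells of every row, column, and subfield of an (n*n) x (n*n) field."""
    size = n * n
    rows = [[(r, c) for c in range(size)] for r in range(size)]
    cols = [[(r, c) for r in range(size)] for c in range(size)]
    boxes = [[(br * n + r, bc * n + c) for r in range(n) for c in range(n)]
             for br in range(n) for bc in range(n)]
    return rows + cols + boxes


def unit_is_valid(field, cells):
    """A unit is valid if it holds each value 1..size exactly once."""
    size = len(field)
    values = sorted(field[r][c] for r, c in cells)
    return values == list(range(1, size + 1))


def is_valid_sudoku(field, n=3):
    """Check all units independently, then combine the results (the serial part)."""
    with ThreadPoolExecutor() as pool:
        checks = pool.map(lambda cells: unit_is_valid(field, cells), units(n))
        return all(checks)


if __name__ == "__main__":
    # A known-valid 9x9 field built from a shifting pattern.
    solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
              for r in range(9)]
    print(is_valid_sudoku(solved))
```

On a GPU, each unit check (or each place) would typically map to its own work item, which is why the effort scales well with the field size.
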
Example Application - Sudoku (2/2)
Fig: Execution time of the Sudoku validation on different compute devices for (a) problem sizes of 10,000 to 50,000 possible Sudoku places and (b) problem sizes of 100,000 to 700,000 Sudoku places.

Best CPU-GPU Practices
Pushing the performance of GPU-enabled desktop applications even further requires fine-grained tuning of data placement and parallel activities on the GPU card.
  • Algorithm Design
  • Memory Transfer
  • Control Flow
  • Memory Types
  • Memory Access
  • Sizing
  • Instructions
  • Precision

Developer Support
  • Vendors (like Nvidia and AMD) offer software development kits with different C compilers for Windows- and Linux-based systems.
  • Developers can also use special libraries, such as AMD's Core Math Library and Nvidia's libraries for basic linear algebra subroutines and fast Fourier transforms.
  • Nvidia and AMD also provide large knowledge bases with tutorials, examples, articles, use cases, and developer forums on their websites.

Concluding Remarks
  • The GPU market continues to evolve quickly.
  • Nvidia has already started distinguishing between GPU computing on normal graphics cards and GPU computing as a sole-purpose activity on processors such as its Tesla series.
  • Higher-level languages like Java and C# can benefit from GPU computing by using GPU-based libraries.

References
  • Frank Feinbube, Peter Tröger, and Andreas Polze, "Joint Forces: From Multithreaded Programming to GPU Computing," IEEE Software, Jan/Feb 2011.
  • Pseudorandomness.
  • Advanced Micro Devices, ATI Stream Computing OpenCL Programming Guide, June 2010.
  • Nvidia, OpenCL Best Practices Guide, Version 2.3, August 2009.

Thank you all...