Raspberry Pi Computer Cluster
Noel J. Petit
Ken Johnson
Paulina Vo
Don Vo
Christian Grant
April 10, 2015
Computer Science
Augsburg College
2211 Riverside Avenue
Minneapolis, MN 55454
petit@augsburg.edu
vop@augsburg.edu
1 Abstract
A cluster of Raspberry Pi computers has been created to test and demonstrate the
effectiveness of small computers in parallel processing. The computers use MPICH,
an implementation of the MPI message-passing standard, to share a large task among
a group of 8 or more Raspberry Pi computers and then coordinate their results at
the end of the processing. We compare single-CPU performance (such as that of a
common Intel Core i5) against cluster performance on various parallel algorithms
such as sorting and searching. We found limits on the ability to speed up a process
because of the communication and operating system overhead among processors. We
tested various operating systems and configurations for their ability to speed up a
process shared by many CPUs. Results were improved by using simpler operating
systems and limiting the tasks assigned to the Raspberry Pi. For example, by
removing services such as Network Time Protocol, Remote Desktop, and system
logging, we were able to approximately double the processing speed of an 8 node
parallel system. In addition, we tested file sharing with NFS and SecureNFS as well
as file sharing with file systems outside the cluster (for example, Google storage).
We will review the technical findings as well as what we learned by building a
cluster of small computers to simulate a high performance processor.
2 Introduction
We created a cluster of inexpensive Raspberry Pi computers to demonstrate the
power of parallel computing. Each node of the cluster is a Raspberry Pi, a
minicomputer in a family introduced in 2012 as a full-featured computer for $35.
At that cost, it is possible to acquire many computers and network them via
Ethernet or WiFi. In our case we used an Ethernet switch to join 6 to 8 model B
Raspberry Pis and demonstrate various parallel processing features.
3 Raspberry Pi
This minicomputer has passed through three versions – B, B+, and 2. Here is
a rough comparison of the models available.
Model Number   B                       B+                      2
Processor      Broadcom BCM2835        Broadcom BCM2835        Broadcom BCM2836
CPU Speed      700 MHz single-core     700 MHz single-core     900 MHz quad-core
               ARM1176                 ARM1176                 ARM Cortex-A7
Memory         512 MB SDRAM            512 MB SDRAM            1 GB SDRAM
All have USB 2.0 ports, audio and video outputs, as well as Ethernet and power
connectors. Each draws about 700 mA at 5 volts, for a power draw of roughly
3.5 watts.
A number of operating systems are available. The install manager for the
Raspberry Pi is NOOBS. The operating systems included with NOOBS are:
• Arch Linux ARM
• OpenELEC
• Pidora (Fedora Remix)
• Puppy Linux
• Raspbmc and XBMC open source digital media center
• RISC OS - The operating system of the first ARM-based computer
• Raspbian
In our case we wanted to have as many features and languages as possible, so we
used the Raspbian distribution available from the Raspberry Pi Foundation. It may
not be the fastest of these operating systems, but it is very close to the Ubuntu
that students use in our public lab and provides the widest range of features. The
B and B+ processors were measured at about 40 megaflops, with the quad-core Pi 2
processor at 98 megaflops (1). At the CPU level the performance is similar to a
300 MHz Pentium II of 1997-1999. The machines are connected with a simple
100 Mbps Ethernet switch, and each processor in our cluster is assigned a fixed
IP address.
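On the Raspbian releases of that era, a fixed address is commonly set in
/etc/network/interfaces. The snippet below is only a sketch of one such
configuration; the addresses are hypothetical examples, not the ones used in our
lab.

    auto eth0
    iface eth0 inet static
        address 192.168.1.101
        netmask 255.255.255.0
        gateway 192.168.1.1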
4 MPICH
MPICH is maintained by a consortium of contributors who develop and support a
library of message-passing routines for communication among computers. MPICH
includes compiler wrappers for FORTRAN, C, and C++. In our case we chose C as our
language and used many of the example parallel programs to demonstrate the
cluster. MPICH includes a specialized compiler wrapper and runtime for C which
manage the distribution and collection of tasks and data among the connected
processors. To allow processors to share compiled programs and their data, all
computers share files via the Network File System (NFS), served from either one of
the Raspberry Pis or a separate Ubuntu server running an NFS server.
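As a rough illustration of this file sharing setup (the paths and subnet below are
hypothetical examples, not our exact configuration), the server exports a
directory in /etc/exports and each Pi mounts it, for example via /etc/fstab:

    # on the NFS server, in /etc/exports
    /srv/mpi  192.168.1.0/24(rw,sync,no_subtree_check)

    # on each Raspberry Pi, in /etc/fstab
    192.168.1.100:/srv/mpi  /srv/mpi  nfs  defaults  0  0

After editing /etc/exports, the export list is reloaded with exportfs -ra.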
For simple programs, MPICH starts all of the programs on as many processors as
specified and runs all to completion. Each processor is aware of how many other
processors are in the cluster as well as its index in the array of processors.
Thus, every processor knows who it is as well as how to address all of the other
neighbor processors. Some of the programs distribute the task among processors by
breaking the shared data into blocks. Some distribute the tasks by sending data to
each processor and waiting for the “master” processor to receive data from all the
slaves.
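The following short C program is a minimal sketch of this pattern (it is not one
of the benchmark programs used below): every process learns its rank and the
cluster size, does some local work, and rank 0 collects the results.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's index */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes started */

        /* Stand-in for the real work: each process contributes its rank. */
        local = rank;
        total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d processes reported; sum of ranks = %d\n", size, total);

        MPI_Finalize();
        return 0;
    }

With MPICH such a program is typically compiled with the mpicc wrapper and
launched across the cluster with something like mpiexec -f machinefile -n 8
./a.out, where the machine file lists the nodes' addresses.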
5 Parallel vs Single Processor Tasks
As a start, let’s consider running a simple mathematical task on a single
processor and then distributing this task among several processors. There will
always be start-up time spent distributing the task to multiple processors, so
we expect short tasks to take longer when distributed among processors.
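This trade-off can be summarized with a simple back-of-the-envelope model (an
approximation, not a measurement): if T1 is the single-processor runtime and
Tcomm(p) is the time spent distributing work to, and collecting results from,
p processes, then roughly

    T(p) ≈ T1 / p + Tcomm(p)

so a task is only worth distributing when the T1/p term shrinks faster than the
communication term grows; for very short tasks the communication term dominates.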
5.1 Calculating Prime Numbers
Prime MPI calculates prime numbers and distributes the work across a varying
number of processes. The Prime MPI code is derived from an example by J. Burkardt
(jburkardt@fsu.edu). The work is divided between two Raspberry Pi 2s, each of
which has a quad-core processor. The way the work division is represented in this
paper is shown as
(Raspberry Pi #1) / (Raspberry Pi #2)
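The heart of the work division can be sketched as follows. This is a paraphrase of
the cyclic distribution used in Burkardt's prime_mpi example rather than a
verbatim copy: the process with index rank tests the candidates 2+rank,
2+rank+size, 2+rank+2·size, and so on, and rank 0 later sums the per-process
counts with MPI_Reduce.

    #include <mpi.h>

    /* Count the primes in 2..n, dealing the candidate integers out
       cyclically to the MPI processes (a sketch, not the exact code). */
    int count_primes(int n, int rank, int size)
    {
        int i, j, is_prime, count = 0;

        for (i = 2 + rank; i <= n; i += size) {   /* this process's candidates */
            is_prime = 1;
            for (j = 2; j * j <= i; j++) {
                if (i % j == 0) { is_prime = 0; break; }
            }
            count += is_prime;
        }
        return count;   /* summed on rank 0 with MPI_Reduce(..., MPI_SUM, ...) */
    }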
The work distribution of the two Raspberry Pis was tested on prime number lists of
size 3.2 · 10^4, 6.4 · 10^4, 1.28 · 10^5, 2.56 · 10^5, and 5.12 · 10^5. We
predicted that the runtimes would fall off smoothly as processes were added,
giving a logarithmic-looking plot, and that corresponding divisions would display
close to the same results; for example, 1/0 should display the same data as 1/1.
The process should run faster as more processes are added, but the time must also
account for the communication between the Pis. We also predicted that at 5/4 and
5/5 the runtimes would increase, since the code would then be divided among more
processes than there are processor cores. The data collected is shown below;
runtimes are in seconds.
Number of Processes per Pi
Number of Primes   1/0     1/1     2/1     2/2     3/2     3/3     4/3     4/4     5/4     5/5
32,000             3.66    3.66    1.84    1.84    0.92    0.92    0.65    0.92    0.95    1.25
64,000             13.6    13.6    6.84    6.84    3.4     3.4     N/A     2.37    3.46    3.1
128,000            51.3    51.3    25.7    25.7    13.0    13.0    N/A     8.62    13.0    10.6
256,000            192.8   192.8   96.6    96.6    48.5    48.5    N/A     32.5    48.6    40.1
512,000            729.9   729.9   367.3   367.3   183.6   183.6   N/A     122.4   188.6   150.1
Using Microsoft Excel, the data was plotted as runtime versus process count: the
horizontal axis represents the number of processes the work is distributed among,
and the vertical axis represents the runtime in seconds for each test.
The data collected bears out the predictions. There was some variation once the
program was run on eight (4/4) and nine (5/4) processes. The variation between the
two is fairly small except for size 512,000 (a difference of 66.2 seconds). There
is a maximum of eight processor cores, so when the code was run with nine
processes the runtime increased, which was also predicted.
5.2 Ring MPI
Ring MPI sends messages of size 100, 1000, 10000, 100000, and 1000000 from
processor 0 to 1 to 2 to ... to P-1 then back to 0. P represents the number of
processors used. We expect the program to take longer with more processors
since the message needs to be relayed to more processors before returning to
processor 0.
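A sketch of the pattern is shown below, with the message size fixed at 100,000
doubles and the round trip timed on rank 0. This is an illustration written for
this summary, not the benchmark source, and it assumes at least two processes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, next, prev, i, n = 100000;
        double *buf, start;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        buf = malloc(n * sizeof *buf);
        for (i = 0; i < n; i++)
            buf[i] = (double) i;

        next = (rank + 1) % size;          /* neighbor to send to */
        prev = (rank + size - 1) % size;   /* neighbor to receive from */
        start = MPI_Wtime();

        if (rank == 0) {
            /* Start the message around the ring, then wait for it to return. */
            MPI_Send(buf, n, MPI_DOUBLE, next, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("round trip with %d processes: %f seconds\n",
                   size, MPI_Wtime() - start);
        } else {
            /* Receive from the previous node and pass the message along. */
            MPI_Recv(buf, n, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_DOUBLE, next, 0, MPI_COMM_WORLD);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }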
Number of Processes per Pi
Values Sent 1/0 1/1 2/1 2/2 3/2 3/3 4/3 4/4 5/4 5/5
100 N/A 0.001 0.001 0.003 0.004 0.006 0.007 0.010 0.010 0.011
1000 N/A 0.003 0.003 0.006 0.008 0.009 0.009 0.015 0.015 0.015
10000 N/A 0.018 0.017 0.039 0.034 0.051 0.051 0.070 0.074 0.086
100000 N/A 0.139 0.141 0.277 0.280 0.417 0.419 0.560 0.573 0.705
1000000 N/A 1.365 1.385 2.730 2.740 4.095 4.117 5.464 5.496 6.834
As expected, relaying the message took longer as more processors were used. There
is no result for 1/0 because there is no other processor for processor 0 to
communicate with.
5.3 Search MPI
Search MPI uses parallel programming to find a value J which satisfies the
condition F(J) = C for some target value C. It works by searching the integers
between two endpoint values, A and B, and evaluating each integer with a function
F. Based on the number of processes available, it divides up the “search” work
accordingly. We expect the time the program takes to run to decrease as the number
of processors increases.
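One simple way to split the range is into one contiguous block per process, as
sketched below. The function f here is a made-up placeholder and the block
partition is our own illustration; the actual benchmark defines its own f and may
deal the work out differently.

    #include <stdio.h>
    #include <mpi.h>

    /* Placeholder for the function being searched; purely illustrative. */
    static int f(int j)
    {
        return (int)(((long long) j * j) % 997);
    }

    int main(int argc, char *argv[])
    {
        int rank, size, j, lo, hi;
        int a = 1, b = 100000000, c = 42;   /* search f(j) == c for j in [a, b] */
        int found = -1, result = -1;
        long span;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Give each process one contiguous slice of [a, b]. */
        span = (long) b - a + 1;
        lo = a + (int)(span * rank / size);
        hi = a + (int)(span * (rank + 1) / size) - 1;

        for (j = lo; j <= hi; j++) {
            if (f(j) == c) { found = j; break; }
        }

        /* Rank 0 keeps the largest index any process found (-1 if none). */
        MPI_Reduce(&found, &result, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("result: j = %d\n", result);

        MPI_Finalize();
        return 0;
    }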
Number of Processes per Pi (runtimes in seconds)
Range       1/0     1/1     2/1     2/2     3/2     3/3     4/3     4/4     5/4     5/5
1 · 10^7    3.0     1.5     1.0     0.75    0.64    0.51    0.46    0.38    0.39    0.41
1 · 10^8    29.7    14.8    9.9     7.4     6.0     5.0     4.2     3.7     4.0     4.0
1 · 10^9    297.0   148.0   99.0    74.0    60.0    50.0    42.0    37.0    43.0    37.0
Looking at the table above, we can see that the time the function takes to execute
decays as more processors are employed. The second row shows a runtime of 14.8
seconds when two processes are active and 4.0 seconds when ten are active. This
output is what we expected and agrees with our assumption about the relationship
between the number of processors and the time to execute.
6 Conclusion
After a series of tests, it can be concluded that there are benefits as well as
limitations when implementing parallel processing on Raspberry Pis. Some of these
limitations are simply due to the communication overhead that is inherent to many
parallel processing structures. For certain problems there is a significant gain
in performance; however, the best cases are divisible problems whose pieces do not
depend on results from other nodes. Cases that depend on computation results from
other nodes in a parallelized cluster experience less of a gain in performance.
This method of parallel processing is ideally suited to Monte Carlo simulations.
7 References
B. Peguero. (2014). MpichCluster in Ubuntu [Online].
Available: https://help.ubuntu.com/community/MpichCluster
J. Burkardt. (2011). MPI C Examples [Online].
Available: http://people.sc.fsu.edu/~jburkardt/c_src/mpi/mpi.html
MPICH. (2015). MPICH Guide [Online].
Available: http://www.mpich.org/documentation/guides/
J. Fitzpatrick. (2013). The HTG Guide to Getting Started with Raspberry Pi [Online].
Available: http://www.howtogeek.com/138281/the-htg-guide-to-getting-started-with-raspberry-pi/all/