Apache Hadoop cluster
on Macintosh OSX
The Trigger #DIY
The Kitchen Setup
The Network
Master Chef a.k.a. Namenode
Helpers a.k.a. Datanode(s)
The Base Ingredients
• Hive 0.13.0
• OSX 10.7.5
• Homebrew 0.9.5
• Network: 200 MB/s
• Hadoop 2.4.0
• Java 1.7.0_55
• MySQL 5.6.17
Basics
• Ensure that all the namenode and datanode machines are running the same OSX version
• For the purpose of this POC, I have selected OSX 10.7.5. All sample commands are specific to this OS; you may need to tweak the commands to suit your OS version
• I am a Homebrew fan, so I have used the good old Ruby-based platform to download all the software needed to run the POC. You may very well opt to download the installers individually and tweak the process if you wish
• You will need a fair bit of understanding of OSX and Hadoop to follow along. If not, no worries – most of the material can be looked up online with a simple Google search
• The “Namenode” machine needs more RAM than the “Datanode” machines. Please configure the namenode machine with at least 8 GB of RAM
The Cooking
• Ensure that ALL datanode and namenode machines are running the same OSX version and preferably have a regulated software update strategy (i.e. automatic software updates disabled)
• Disable the automatic “sleep” options on the machines (from System Preferences) to prevent them from going into hibernation
• Download and install the “Xcode command line tools for Lion” (skip if Xcode is present)
• As of today, Hadoop is not IPv6 friendly, so please disable IPv6 on all machines (a loop that automates this across all network services is sketched below):
– The “networksetup -listallnetworkservices” command will display all the network names that your machine uses to connect to your network (e.g. Ethernet, Wi-Fi, etc.)
– “networksetup -setv6off Ethernet” will disable IPv6 over Ethernet (you may need to change the network name if yours is different)
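A minimal sketch that turns IPv6 off for every service the machine knows about, rather than naming each one by hand (assumes the usual one-line header in the networksetup output and that your user can sudo):

# Disable IPv6 on every network service (run once per machine).
# The first output line is a header, and disabled services carry a leading "*".
networksetup -listallnetworkservices | tail -n +2 | sed 's/^\*//' | while read svc; do
  sudo networksetup -setv6off "$svc"
done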
The Cooking..
• Give logical names to ALL machines, e.g. namenode.local, datanode01.local, datanode02.local, et al. (from System Preferences -> Sharing -> Computer Name; a command-line alternative is sketched below)
• Enable the following services from the Sharing panel of System Preferences:
– File Sharing
– Remote Login
– Remote Management
• Create one universal username (with Administrator privileges) on all machines, e.g. hadoopuser. Preferably use the same password everywhere
• For the rest of the steps, please log in as this user and execute the commands
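If you prefer the terminal over System Preferences, the machine names can also be set with scutil (a sketch; the name “namenode” is illustrative, substitute each machine’s own name):

# Set the computer name and the Bonjour (.local) name from the terminal.
sudo scutil --set ComputerName namenode
sudo scutil --set LocalHostName namenode    # machine becomes reachable as namenode.local
sudo scutil --set HostName namenode.local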
The Cooking
• On the namenode, run the command:
vi /etc/hosts
• Add all datanode hostnames, one host per line (an illustrative /etc/hosts is sketched below)
• On each of the datanodes, run the command:
vi /etc/hosts
• Add the namenode hostname
• On ALL machines, run the command:
sudo visudo
• Add an entry on the last line of the file as under:
hadoopuser ALL=(ALL) NOPASSWD: ALL
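For illustration, the resulting /etc/hosts entries might look like this (the IP addresses are placeholders; use your own LAN addresses):

# /etc/hosts on the namenode — one line per datanode
192.168.1.11   datanode01.local
192.168.1.12   datanode02.local

# /etc/hosts on each datanode — the namenode entry
192.168.1.10   namenode.local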
Coffee Time
• Install the Java JDK and JRE on all machines from the Oracle site (http://bit.ly/1s2i7VC). Configure $JAVA_HOME as below
• Set $JAVA_HOME on ALL machines. Usually it is best to configure it in your .profile file. Run the following command to open your .profile:
vi ~/.profile
• Paste the following line into the file and save it:
export JAVA_HOME="`/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home`"
• You may additionally paste the following lines into the same file:
export PATH=$PATH:/usr/local/sbin
export PS1="\H : \d \t: \w : "
These are helpful for housekeeping activities (a quick verification follows below)
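A quick sanity check of the Java setup (a sketch, assuming the JDK installed cleanly):

# Reload the profile and confirm Java is wired up on this machine.
source ~/.profile
echo $JAVA_HOME
java -version    # should report 1.7.0_55 or similar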
The Brewing
• Install “brew” and then the other components with it
– Run on terminal:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
[the quotes need to be there]
– Run the following command on terminal to ensure that it has been installed properly:
brew doctor
– Run the following commands, in this order, on terminal:
brew install makedepend
brew install wget
brew install ssh-copy-id
brew install hadoop
– Run the following commands on the “namenode” machine:
brew install hive
brew install mysql
[the assumption is that the namenode will host the resourcemanager, jobtracker, hive metastore and hiveserver; brew installs the software in the “/usr/local/Cellar” location]
– Run the following command to set up keyless login from the namenode to ALL datanodes. Run it on the namenode:
ssh-keygen
[press the Enter key twice to accept the default RSA key and empty passphrase]
– Run the following command once for each datanode hostname, on the namenode (a loop covering all datanodes is sketched below):
ssh-copy-id hadoopuser@datanode01.local
Provide the password when prompted. The command is verbose and tells you whether the key was installed properly. You may validate it by executing:
ssh hadoopuser@datanode01.local
It should NOT ask you to supply a password anymore.
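A sketch that pushes the key to every datanode and checks each login in one pass (the hostnames are illustrative):

# Run on the namenode: copy the public key to each datanode and verify.
for node in datanode01.local datanode02.local; do
  ssh-copy-id hadoopuser@$node    # asks for the password once per node
  ssh hadoopuser@$node hostname   # should print the name with no password prompt
done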
After the requisite software has been installed, the next step is to configure the different components in a stepwise manner. Hadoop works in a distributed mode with the “namenode” being the central hub of the cluster. This gives enough reason to have the common configuration files created on the namenode first, and then copied in an automated manner to all the datanodes. Let’s start with the .profile changes on the namenode machine first.
The Sauté
– We are going to configure Hive to use MySQL as the metastore for this POC. All we need is to create a DB user “hiveuser” with a valid password in the MySQL DB installed and running on the namenode, AND copy the MySQL driver jar into the Hive lib directory (a sketch of these two steps follows below)
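A minimal sketch of those two steps (the database name “metastore”, the password, and the driver jar filename are illustrative; download the MySQL Connector/J jar separately):

# On the namenode: create the Hive metastore database and user.
mysql -u root -e "CREATE DATABASE metastore;
  CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
  GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';
  FLUSH PRIVILEGES;"

# Copy the MySQL JDBC driver into Hive's lib directory.
cp mysql-connector-java-5.1.30-bin.jar /usr/local/Cellar/hive/0.13.0/libexec/lib/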
– On the namenode, please fire the command to go to your HADOOP_CONF_DIR location:
cd /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Here, we need to create/modify the following set of files (minimal samples for slaves and core-site.xml are sketched after this list):
slaves
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
log4j.properties
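For illustration only (leave feedback per The Plating if you want the author’s full configuration samples), a bare-minimum slaves file and core-site.xml might look like this:

# slaves — one datanode hostname per line (illustrative names)
datanode01.local
datanode02.local

<!-- core-site.xml: point every node at the namenode's HDFS endpoint -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.local:9000</value>
  </property>
</configuration>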
– On the namenode, please fire the command to go to your HIVE_CONF_DIR location:
cd /usr/local/Cellar/hive/0.13.0/libexec/conf
Here, we need to create/modify the following set of files (the metastore-related portion of hive-site.xml is sketched below):
hive-site.xml
hive-log4j.properties
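A sketch of the metastore properties in hive-site.xml, matching the MySQL user created earlier (database name and password are illustrative):

<!-- hive-site.xml: wire the Hive metastore to the local MySQL instance -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>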
The Slow cooking
– Please find attached a simple script that, if installed on the namenode, can help you copy your config files to ALL datanodes (I call it the config-push)
– Please find attached another simple script that I use for rebooting all the datanodes
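The attached scripts are not reproduced here; minimal sketches of what such scripts might look like (hostnames and paths are illustrative, and the passwordless sudo from the earlier sudoers entry is assumed):

#!/bin/bash
# config-push (sketch): copy the Hadoop config directory to every datanode.
DATANODES="datanode01.local datanode02.local"
CONF_DIR=/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
for node in $DATANODES; do
  scp "$CONF_DIR"/* hadoopuser@$node:"$CONF_DIR"/
done

# reboot-all (sketch): restart every datanode over SSH.
for node in $DATANODES; do
  ssh hadoopuser@$node 'sudo reboot'
done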
The Plating
• You may wish to take the following next steps if desired:
– Install ZooKeeper
– Configure and run journalnodes
– Go for a High Availability cluster implementation with multiple namenodes
• Leave feedback if you wish to see the Hadoop configuration samples
The Garnishing
Disclaimer: Don’t sue me for any damage/infringement, I am not rich :)