IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 03 Issue: 12 | Dec-2014, Available @ http://www.ijret.org
IMPROVING PERFORMANCE OF APRIORI ALGORITHM USING HADOOP
Ravindra Bachate (1), Hyder Ali Hingoliwala (2)
(1) Department of Computer Engineering, JSPM’s JSCOE, Pune, Maharashtra, India
(2) Department of Computer Engineering, JSPM’s JSCOE, Pune, Maharashtra, India
Abstract
Spatial data is data that carries geographic information. This paper explores the use of the Hadoop framework to improve the
performance of the Apriori algorithm for spatial data mining. The FP-growth algorithm generally performs better than Apriori, but it
fails in certain situations. By running the Apriori algorithm in parallel on spatial data using the Hadoop framework, we can achieve
good performance compared to FP-growth. This paper covers clustering based on geographic location, classification based on
mineral resource type, and spatial coherence between mineral resources. Spatial data mining discovers association rules in the
spatial data by using the Apriori algorithm. The results of the paper predict the occurrence of one mineral commodity with
respect to another.
Keywords: Hadoop, data mining, association rules, clustering, spatial coherence
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
The FP-growth algorithm is a very popular algorithm for
association mining because of its performance and low
storage requirement. Alongside these advantages, it has
some limitations that are rarely considered. This paper
focuses on the limitations of the FP-growth association
mining algorithm and suggests improvements to the Apriori
algorithm that can overcome them. Since hardware cost is
decreasing day by day, there is less need to concentrate on
the storage requirement. FP-growth has two limitations:
1. It is difficult to use for interactive mining, where the
user may change the support value as required. 2. It is
not suitable where the data keeps growing over time.
To deal with these limitations of FP-growth, this paper
suggests implementing the Apriori algorithm on Hadoop for
association rule mining.
1.1 Mineral Resources Data System
The Mineral Resources Data System (MRDS) is a collection of
data describing metallic and nonmetallic mineral resources
throughout the world [7]. It includes resource name, location,
commodity, geologic characteristics, resource description,
production, reserves, and references. As MRDS contains
mineral resources data from around the world, it is large and
complex. If the data size grows beyond a terabyte, it becomes
difficult to process and mine using the FP-growth algorithm.
The performance of the FP-growth algorithm suffers when new
records are added and when the support value is changed. As
the mineral resources data set is collected from various
regions and people, we need to perform ETL (extract,
transform and load) operations on the data before processing
and mining it.
1.2 Hadoop Map Reduce
To deal with unstructured, big data like the Mineral
Resources Data System (MRDS), we need a technology that
can cope with it. There are two options available:
parallel DBMS and Hadoop MapReduce. Hadoop also has other
projects that can be used for mining, such as Hive, but
internally these projects again use the Hadoop MapReduce
technique [5]. MapReduce gives better data processing
performance at lower cost and in less time than a parallel
DBMS because it works with commodity hardware. Hadoop
stores data in the form of blocks on the Hadoop Distributed
File System (HDFS). The Hadoop framework provides a
solution for massive data processing because it runs
applications on large clusters built of commodity hardware
with fault tolerance [4]. Unstructured data can be processed
with the Hadoop MapReduce technique, which is not possible
with an RDBMS. MapReduce provides flexibility and fault
tolerance that a parallel DBMS does not. MapReduce provides
automatic parallelization, data partitioning and task
scheduling, handles machine failures and manages
inter-machine communication. Hadoop is completely
transparent to the end user. Unstructured data is growing
much faster than structured data. Unstructured data includes
media files, large text files, CSV files, log files, etc.
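To make the map, shuffle and reduce stages described above concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and not the authors' implementation; the record layout and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions chosen for illustration. In a real cluster the framework would run the map and reduce functions on different nodes and perform the grouping itself.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (commodity, 1) for every commodity listed in one record."""
    for commodity in record["commodities"]:
        yield commodity, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce sequentially in one process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample records, not taken from MRDS.
SAMPLE = [
    {"state": "Alaska", "commodities": ["Gold", "Copper"]},
    {"state": "Alaska", "commodities": ["Gold"]},
    {"state": "California", "commodities": ["Gold", "Silver"]},
]
counts = run_job(SAMPLE)
```

Counting how often each commodity occurs is the same kind of job that the support calculation in association mining relies on.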
2. RELATED WORK
Association rule mining involves computing coverage,
support, confidence, lift and interestingness [4]. Coverage
is the proportion of records that match the left-hand
side of the rule. Support of an association rule is the
percentage of task-relevant data for which the pattern is true.
Confidence gives the trustworthiness associated with the
patterns discovered. Lift is a measure of the importance of
the association. Interestingness gives the strength of the
associations between sets of items in the association rule.
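The measures above can be sketched in a few lines of Python. This is an illustrative sketch using the standard textbook definitions (coverage as the share of records containing the left-hand side, lift as confidence divided by the share of records containing the right-hand side), which the paper does not spell out. The function name `rule_measures` and the sample transactions are hypothetical.

```python
def rule_measures(transactions, lhs, rhs):
    """Compute coverage, support, confidence and lift for the rule lhs => rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)          # records with the LHS
    n_rhs = sum(1 for t in transactions if rhs <= t)          # records with the RHS
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)  # records with both
    coverage = n_lhs / n
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return {"coverage": coverage, "support": support,
            "confidence": confidence, "lift": lift}

# Hypothetical transactions: each set lists the commodities at one site.
TRANSACTIONS = [
    {"Gold", "Copper"},
    {"Gold"},
    {"Copper"},
    {"Silver"},
]
measures = rule_measures(TRANSACTIONS, {"Gold"}, {"Copper"})
```

With these four records, gold appears in two of them and gold together with copper in one, so the rule has coverage 0.5, support 0.25, confidence 0.5 and lift 1.0. A lift of 1.0 means gold and copper co-occur no more often than chance would suggest.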
Hongyong Yu and Deshuai Wang [1] proposed a system for
processing and mining the log data of a SaaS cloud using
Hadoop. This paper focuses on Hadoop’s MapReduce
technique and the data mining algorithm they used.
They suggest that by applying the Apriori algorithm
concurrently in a distributed system, its performance
can be increased in proportion to the number of nodes
in the distributed system [1].
Every day, 2.5 quintillion bytes of data are created, and 90
percent of the data in the world today was produced within
the past two years [3]. Commonly used software cannot cope
with such massive data, and the big challenge is to extract
important information from it. Big data has large volume,
heterogeneous formats and decentralized data control.
Examples of big data applications are Facebook, Twitter and
Google. Managing and mining massive data is a big challenge
because of its volume, its different file formats and the
rate at which data is growing worldwide. Big data poses
many challenges, such as storage, processing, variety and
cost.
3. PROPOSED SYSTEM
This paper proposes a system that overcomes the problems
faced with the FP-growth association mining algorithm. To
improve the performance of the Apriori algorithm, it is
implemented in parallel using the Hadoop MapReduce technique.
The proposed system has three modules:
1. Spatial Clustering
2. Spatial Classification
3. Spatial Coherence
Before implementing these modules, we need to perform
Extract, Transform and Load (ETL) operations on the raw data,
because the available data may be in various forms and may
contain unnecessary information. So we first extract the
required data from the raw data and then transform it into
CSV format. To process this data, it must be loaded onto
HDFS.
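The extract and transform steps can be sketched as follows. This is a minimal illustration, not the authors' ETL code: the column names in `KEEP`, the raw record layout and the helper names are all assumptions, since the paper does not list the MRDS fields it retained. The load step is only indicated in a comment; copying a file onto HDFS is normally done with the `hdfs dfs -put` command.

```python
import csv
import io

# Hypothetical column names; the paper does not say which MRDS fields were kept.
KEEP = ["site_name", "state", "commodity"]

def extract_transform(raw_rows):
    """Keep only the needed columns, trim whitespace, drop incomplete rows."""
    for row in raw_rows:
        cleaned = {k: (row.get(k) or "").strip() for k in KEEP}
        if all(cleaned.values()):
            cleaned["state"] = cleaned["state"].title()  # normalise state names
            yield cleaned

def to_csv(rows):
    """Serialise the cleaned rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=KEEP, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical raw records with an unneeded field and one incomplete row.
RAW = [
    {"site_name": " Mine A ", "state": "alaska", "commodity": "Gold, Copper", "notes": "n/a"},
    {"site_name": "Mine B", "state": "", "commodity": "Gold", "notes": ""},
]
csv_text = to_csv(extract_transform(RAW))
# Load step (not run here): write csv_text to a file and copy it to HDFS,
# for example with `hdfs dfs -put <local file> <HDFS directory>`.
```

The second raw record is dropped because its state is empty, which is one simple way to handle the "unnecessary information" the text mentions.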
Fig-1 Proposed System
3.1 Spatial Clustering
Various algorithms are available for clustering, such as
k-means, but in this paper the Hadoop partitioning technique
is used to perform spatial clustering based on geographic
location, such as Alaska, California, etc. [2]. It runs in
parallel on different nodes concurrently. Because of this,
the time required for spatial clustering is less than with
the k-means clustering algorithm. In this paper, the U.S.
mineral resources dataset is used. In effect, 50 clusters are
formed, one for each state in the United States.
Each cluster contains all the records that belong to the
respective state. The intention behind forming the clusters
is to specialize the data and reduce the processing time and cost.
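A rough Python sketch of partition-based clustering by state is given here. In Hadoop this role is played by a partitioner, which maps each key to a partition index; the sketch mimics that idea in a single process. The helper names, the record layout and the choice of partition index (position of the state in sorted order, modulo the number of partitions) are assumptions for illustration, not the authors' scheme.

```python
from collections import defaultdict

def get_partition(state, num_partitions, state_order):
    """Map a state name to a partition index (hypothetical helper)."""
    return state_order.index(state) % num_partitions

def cluster_by_state(records, num_partitions):
    """Route every record to the partition chosen for its state."""
    state_order = sorted({r["state"] for r in records})
    partitions = defaultdict(list)
    for record in records:
        idx = get_partition(record["state"], num_partitions, state_order)
        partitions[idx].append(record)
    return dict(partitions)

# Hypothetical records, not taken from MRDS.
RECORDS = [
    {"site": "A", "state": "Alaska", "commodity": "Gold"},
    {"site": "B", "state": "California", "commodity": "Copper"},
    {"site": "C", "state": "Alaska", "commodity": "Silver"},
]
clusters = cluster_by_state(RECORDS, num_partitions=2)
```

With one partition per state, each partition holds exactly the records of that state. Unlike k-means, no iterative distance computation is needed, because the cluster label is already present in the data.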
3.2 Spatial Classification
Spatial classification is implemented with the same
technique that is used for spatial clustering. In this
module, the data set is clustered again according to the type
of commodity, which is required for the next module. In
spatial classification, the records having the same commodity
are put into a separate cluster to make it more specialized.
For example, the Alaska cluster may have sub-clusters by
commodity type, such as gold, silver, copper, etc.
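The sub-clustering step can be illustrated with a short Python sketch. It assumes, hypothetically, that each record lists one or more commodities and that a record with several commodities is placed in every matching sub-cluster; the paper does not state how such records are handled. The names `classify_by_commodity` and `STATE_CLUSTERS` are illustrative only.

```python
from collections import defaultdict

def classify_by_commodity(state_clusters):
    """Split each state cluster into sub-clusters keyed by commodity."""
    result = {}
    for state, records in state_clusters.items():
        sub = defaultdict(list)
        for record in records:
            # Assumption: a multi-commodity record joins every matching sub-cluster.
            for commodity in record["commodities"]:
                sub[commodity].append(record["site"])
        result[state] = dict(sub)
    return result

# Hypothetical state clusters, not taken from MRDS.
STATE_CLUSTERS = {
    "Alaska": [
        {"site": "A", "commodities": ["Gold", "Copper"]},
        {"site": "C", "commodities": ["Gold"]},
    ],
    "California": [{"site": "B", "commodities": ["Silver"]}],
}
sub_clusters = classify_by_commodity(STATE_CLUSTERS)
```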
3.3 Spatial Coherence
The objective of this paper is to find the spatial coherence.
Spatial coherence means that if we are mining for gold, what
are the chances of finding other mineral resources at the
same location. For this, association mining is required. As
discussed above, the FP-growth algorithm fails in two
situations. To overcome this, this paper introduces a
parallel Apriori algorithm for spatial association mining. In
the Apriori algorithm, we first have to find the candidate
itemsets. After this, we have to perform two tasks: one is
to find the support values and the second is to find the
confidence. The formulas to calculate the support and
confidence for gold with copper are as follows.
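1. Support Formula
$$\mathrm{Support}(\mathrm{Gold} \Rightarrow \mathrm{Copper}) = \frac{\mathrm{count}(\mathrm{Gold} \cap \mathrm{Copper})}{N}$$
where N is the number of all records in a cluster and count(Gold ∩ Copper) is the number of records containing both gold and copper.
2. Confidence Formula
$$\mathrm{Confidence}(\mathrm{Gold} \Rightarrow \mathrm{Copper}) = \frac{\mathrm{count}(\mathrm{Gold} \cap \mathrm{Copper})}{\mathrm{count}(\mathrm{Gold})}$$
The formulas above calculate the support and confidence for
gold. Support of (gold => copper) gives the probability of
finding gold together with copper among all the records,
whereas confidence gives the probability of finding gold
together with copper among all the gold records in a cluster.
The value of support is never greater than the confidence,
because the number of gold records cannot exceed N.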
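The candidate generation, support counting and confidence steps can be sketched as a small single-process Apriori in Python. This is a minimal sketch of the classic level-wise algorithm, not the authors' parallel Hadoop implementation; in a MapReduce version the counting step would be distributed across nodes. The function names, the sample transactions and the `min_support` value are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count step: how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def confidence(frequent, lhs, rhs):
    """Confidence of lhs => rhs, computed from the stored supports."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return frequent[lhs | rhs] / frequent[lhs]

# Hypothetical transactions: commodities found at five sites in one cluster.
SITES = [
    {"Gold", "Copper"},
    {"Gold", "Silver"},
    {"Gold"},
    {"Gold", "Copper", "Silver"},
    {"Copper"},
]
frequent = apriori(SITES, min_support=0.4)
gold_copper_conf = confidence(frequent, {"Gold"}, {"Copper"})
```

Here gold occurs at four of the five sites and gold with copper at two, so the rule gold => copper has support 0.4 and confidence 0.5. Because the support threshold is only an argument to `apriori`, it can be changed between runs without rebuilding any data structure, which is the interactive-mining scenario the paper raises against FP-growth.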
4. RESULTS
To implement this idea, we have taken the MRDS dataset,
which is a spatial data set of mineral resources in the U.S. [7].
This sample data has around 3.5 lakh (about 350,000) records.
Table -1: Support and Confidence
Rule                             Support (%)   Confidence (%)
(Gold -> Copper)                 2.89          9.30
(Gold -> Copper, Lead, Silver)   0.72          2.3
(Gold -> Silver)                 2.17          6.97
(Gold -> null)                   24.63         79.06
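As a consistency check on Table 1, note that with support and confidence as the text describes them, support divided by confidence equals the share of gold records in the cluster, so that ratio should be about the same for every rule. The short Python sketch below performs this arithmetic on the reported values. The derived gold share is a calculation from the table, not a figure reported by the authors.

```python
# Reported values from Table 1: rule -> (support %, confidence %).
TABLE_1 = {
    "Gold -> Copper": (2.89, 9.30),
    "Gold -> Copper, Lead, Silver": (0.72, 2.3),
    "Gold -> Silver": (2.17, 6.97),
    "Gold -> null": (24.63, 79.06),
}

# support / confidence = count(Gold) / N, the share of gold records.
gold_share = {rule: s / c for rule, (s, c) in TABLE_1.items()}
```

All four ratios come out at roughly 0.31, which suggests the rows are mutually consistent and that about 31% of the records in the cluster contain gold.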
The Apriori algorithm is run on single-node and multi-node
Hadoop setups to compare the performance. We also find
the association rules for this spatial data for a single
cluster.
Table -2: Execution on Single and Multinode System
No. of Nodes   Execution Time (s)
1              5.2
2              3.1
3              2.3
The results in Table-1 show the different association rules
with respect to gold. Table-2 shows the performance of the
Apriori algorithm on single-node and multi-node systems.
It is observed that running the Apriori algorithm in
parallel on Hadoop improves performance.
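The improvement in Table 2 can be quantified with the usual speedup and parallel efficiency measures (speedup = single-node time / n-node time, efficiency = speedup / n). The Python sketch below applies them to the reported times. These derived numbers are calculations from the table, not results stated by the authors, and the function name is hypothetical.

```python
# Reported execution times from Table 2: number of nodes -> seconds.
TIMES = {1: 5.2, 2: 3.1, 3: 2.3}

def speedup_and_efficiency(times):
    """Return {nodes: (speedup, efficiency)} relative to the single-node run."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}

results = speedup_and_efficiency(TIMES)
```

This gives a speedup of about 1.68 on two nodes and about 2.26 on three, with efficiency of about 0.84 and 0.75 respectively. The gain is therefore real but somewhat below linear in the number of nodes for this small run.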
5. CONCLUSION
The FP-growth algorithm has two limitations: it is not
suitable for data that grows over time, nor for cases where
the support value needs to change according to the situation.
By applying the Apriori algorithm concurrently using Hadoop,
we can overcome these problems of the FP-growth association
mining algorithm. We can also improve the speed of
association rule mining for spatial data compared to the
FP-growth algorithm, because Hadoop is a distributed system.
ACKNOWLEDGEMENTS
I express my sincere gratitude to my project guide
Prof. H.A. Hingoliwala, Associate Professor, Computer
Department, for his invaluable co-operation and guidance
throughout my project. I specially thank our
P.G. coordinator Prof. M. D. Ingle for inspiring me and
providing me with all the lab facilities. I would also like
to express my appreciation and thanks to HOD Prof. S.M.
Shinde, JSCOE Principal Dr. M.G. Jadhav and all my
friends who knowingly or unknowingly have assisted me
throughout my hard work.
REFERENCES
[1]. Hongyong Yu, Deshuai Wang, “Mass Log Data Processing and
Mining Based on Hadoop and Cloud Computing,” The 7th
International Conference on Computer Science & Education
(ICCSE 2012), July 14-17, 2012, Melbourne, Australia.
[2]. Duck-Ho Bae, Ji-Haeng Baek, Hyun-Kyo Oh, Ju-Won Song,
Sang-Wook Kim (Coll. of Inf. & Commun., Hanyang Univ., Seoul,
South Korea), “SD-Miner: A Spatial Data Mining System,”
Network Infrastructure and Digital Content, 2009.
[3]. Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, “Data
Mining with Big Data,” IEEE Transactions on Knowledge and
Data Engineering, vol. 26, no. 1, January 2014.
[4]. Binbin He, Ying Cui, Jianhua Chen, Pingjing Xie, “A
Spatial Data Mining Method for Mineral Resources Potential
Assessment,” IEEE, 978-1-4244-8351-8/11, 2011.
[5]. Tom White, Hadoop: The Definitive Guide, 3rd ed.,
O’Reilly, 2012.
[6]. Hadoop, http://hadoop.apache.org/
[7]. MRDS, http://tin.er.usgs.gov/mrds/
Ad

Recommended

AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
IJDKP
 
Ijetcas14 316
Ijetcas14 316
Iasir Journals
 
43_Sameer_Kumar_Das2
43_Sameer_Kumar_Das2
Mr.Sameer Kumar Das
 
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET Journal
 
Paper id 25201498
Paper id 25201498
IJRAT
 
A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance
A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance
IJECEIAES
 
A sql implementation on the map reduce framework
A sql implementation on the map reduce framework
eldariof
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET Journal
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
MapR Technologies
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
 
Parallel Key Value Pattern Matching Model
Parallel Key Value Pattern Matching Model
ijsrd.com
 
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
idescitation
 
Effect of countries in performance of hadoop.
Effect of countries in performance of hadoop.
Computer Science Journals
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
IRJET Journal
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
riyaniaes
 
An efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence api
João Gabriel Lima
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
IRJET Journal
 
Improved Map reduce Framework using High Utility Transactional Databases
Improved Map reduce Framework using High Utility Transactional Databases
International Journal of Engineering Inventions www.ijeijournal.com
 
A data aware caching 2415
A data aware caching 2415
SANTOSH WAYAL
 
Hadoop interview questions
Hadoop interview questions
barbie0909
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
iosrjce
 
Graphlab Ted Dunning Clustering
Graphlab Ted Dunning Clustering
MapR Technologies
 
Introduction to HADOOP
Introduction to HADOOP
Shital Kat
 
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET Journal
 
InternReport
InternReport
Swetha Tanamala
 
Paratransit Services Research
Paratransit Services Research
P.C.D.I Healthcare and Consultants of Texas L.L.C.
 

More Related Content

What's hot (20)

Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
MapR Technologies
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
 
Parallel Key Value Pattern Matching Model
Parallel Key Value Pattern Matching Model
ijsrd.com
 
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
idescitation
 
Effect of countries in performance of hadoop.
Effect of countries in performance of hadoop.
Computer Science Journals
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
IRJET Journal
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
riyaniaes
 
An efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence api
João Gabriel Lima
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
IRJET Journal
 
Improved Map reduce Framework using High Utility Transactional Databases
Improved Map reduce Framework using High Utility Transactional Databases
International Journal of Engineering Inventions www.ijeijournal.com
 
A data aware caching 2415
A data aware caching 2415
SANTOSH WAYAL
 
Hadoop interview questions
Hadoop interview questions
barbie0909
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
iosrjce
 
Graphlab Ted Dunning Clustering
Graphlab Ted Dunning Clustering
MapR Technologies
 
Introduction to HADOOP
Introduction to HADOOP
Shital Kat
 
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET Journal
 
InternReport
InternReport
Swetha Tanamala
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
MapR Technologies
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
 
Parallel Key Value Pattern Matching Model
Parallel Key Value Pattern Matching Model
ijsrd.com
 
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
idescitation
 
Effect of countries in performance of hadoop.
Effect of countries in performance of hadoop.
Computer Science Journals
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
IRJET Journal
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
riyaniaes
 
An efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence api
João Gabriel Lima
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
IRJET Journal
 
A data aware caching 2415
A data aware caching 2415
SANTOSH WAYAL
 
Hadoop interview questions
Hadoop interview questions
barbie0909
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
iosrjce
 
Graphlab Ted Dunning Clustering
Graphlab Ted Dunning Clustering
MapR Technologies
 
Introduction to HADOOP
Introduction to HADOOP
Shital Kat
 
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET Journal
 

Viewers also liked (16)

Paratransit Services Research
Paratransit Services Research
P.C.D.I Healthcare and Consultants of Texas L.L.C.
 
Chapters 1 and 2 Summary
Chapters 1 and 2 Summary
livvy milner
 
Chapter 7 Summary
Chapter 7 Summary
livvy milner
 
CHEM2500OutlineSample
CHEM2500OutlineSample
Cole French, CET, B.Sc Chemistry Student
 
Major 2 p pt
Major 2 p pt
Rahul Agarwal
 
Webinar: Inside Social (April 2016)
Webinar: Inside Social (April 2016)
Universum Webinars
 
RF 55f - Fondation Raoul Follereau - Rapport Annuel 2008
RF 55f - Fondation Raoul Follereau - Rapport Annuel 2008
Bernard hardy
 
Frontend performance metrics
Frontend performance metrics
Артем Захарченко
 
Sunk Costs
Sunk Costs
tutor2u
 
Trabajo ctm
Trabajo ctm
Matias ascanio
 
Fixed, variable and Incremental Cost
Fixed, variable and Incremental Cost
Rahat Inayat Ali
 
Utterback_Douglas - Resume
Utterback_Douglas - Resume
Douglas Utterback
 
Building a Modern Windows App
Building a Modern Windows App
Brent Edwards
 
Apriori algorithm
Apriori algorithm
Ashis Kumar Chanda
 
Александр Кашеверов — Коротко про WEB: HTML, CSS, JS.
Александр Кашеверов — Коротко про WEB: HTML, CSS, JS.
DataArt
 
Ad

Similar to Improving performance of apriori algorithm using hadoop (20)

LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
ijdpsjournal
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
acijjournal
 
Cache mechanism to avoid dulpication of same thing in hadoop system to speed ...
Cache mechanism to avoid dulpication of same thing in hadoop system to speed ...
eSAT Journals
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
Mahantesh Angadi
 
Mining High Utility Patterns in Large Databases using Mapreduce Framework
Mining High Utility Patterns in Large Databases using Mapreduce Framework
IRJET Journal
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
IRJET Journal
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET Journal
 
B017320612
B017320612
IOSR Journals
 
IJSRED-V2I3P84
IJSRED-V2I3P84
IJSRED
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
 
Review: Data Driven Traffic Flow Forecasting using MapReduce in Distributed M...
Review: Data Driven Traffic Flow Forecasting using MapReduce in Distributed M...
AM Publications
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
Editor IJMTER
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
 
IJET-V2I6P25
IJET-V2I6P25
IJET - International Journal of Engineering and Techniques
 
B1803031217
B1803031217
IOSR Journals
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
 
Multi-Cloud Services
Multi-Cloud Services
IRJET Journal
 
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
ijdpsjournal
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
acijjournal
 
Cache mechanism to avoid dulpication of same thing in hadoop system to speed ...
Cache mechanism to avoid dulpication of same thing in hadoop system to speed ...
eSAT Journals
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
Mahantesh Angadi
 
Mining High Utility Patterns in Large Databases using Mapreduce Framework
Mining High Utility Patterns in Large Databases using Mapreduce Framework
IRJET Journal
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
IRJET Journal
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET Journal
 
IJSRED-V2I3P84
IJSRED-V2I3P84
IJSRED
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
 
Review: Data Driven Traffic Flow Forecasting using MapReduce in Distributed M...
Review: Data Driven Traffic Flow Forecasting using MapReduce in Distributed M...
AM Publications
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
Editor IJMTER
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
 
Multi-Cloud Services
Multi-Cloud Services
IRJET Journal
 
Ad

More from eSAT Journals (20)

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavements
eSAT Journals
 
Material management in construction – a case study
Material management in construction – a case study
eSAT Journals
 
Managing drought short term strategies in semi arid regions a case study
Managing drought short term strategies in semi arid regions a case study
eSAT Journals
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangalore
eSAT Journals
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
eSAT Journals
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...
eSAT Journals
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
eSAT Journals
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizer
eSAT Journals
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
eSAT Journals
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
eSAT Journals
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concrete
eSAT Journals
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...
eSAT Journals
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
eSAT Journals
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabs
eSAT Journals
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in india
eSAT Journals
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...
Improving performance of apriori algorithm using hadoop

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 03 Issue: 12 | Dec-2014, Available @ http://www.ijret.org 277

IMPROVING PERFORMANCE OF APRIORI ALGORITHM USING HADOOP

Ravindra Bachate1, Hyder Ali Hingoliwala2
1 Department of Computer Engineering, JSPM’s JSCOE, Pune, Maharashtra, India
2 Department of Computer Engineering, JSPM’s JSCOE, Pune, Maharashtra, India

Abstract
Spatial data is data that carries geological information. This paper explores the use of the Hadoop framework to improve the performance of the Apriori algorithm for spatial data mining. The FP growth algorithm is better than Apriori, but it fails in certain situations. By applying the Apriori algorithm in parallel to spatial data using the Hadoop framework, we can perform well compared to FP growth. This paper covers clustering based on geological location, classification based on mineral resource type, and spatial coherence between mineral resources. Spatial data mining uses the Apriori algorithm to find association rules in the spatial data. The result of the paper indicates an accurate prediction of the occurrence of one mineral commodity with respect to another.

Keywords: Hadoop, data mining, association rules, clustering, spatial coherence

1. INTRODUCTION
The FP growth algorithm is a very popular algorithm for association mining because of its performance and low storage requirement. Alongside these advantages, it has some limitations that are rarely considered.
This paper focuses on the limitations of the FP growth association mining algorithm and suggests improvements to the Apriori algorithm that can overcome them. Because hardware cost is decreasing day by day, there is less need to concentrate on the storage requirement. FP growth has two limitations:
1. It is difficult to use for interactive mining, where the user may change the support value as required.
2. It is not suitable where the data keeps growing over time.
To deal with these limitations, this paper suggests implementing the Apriori algorithm on Hadoop for association rule mining.

1.1 Mineral Resources Data System
The Mineral Resources Data System (MRDS) is a collection of data describing metallic and nonmetallic mineral resources in the world [7]. It includes resource name, location, commodity, geologic characteristics, resource description, production, reserves, and references. Because MRDS contains mineral resources data from around the world, it is large and complex. If the data size grows beyond a terabyte, it is difficult to process and mine using the FP growth algorithm. The performance of the FP growth algorithm also degrades when new records are added and when the support value is changed. As the mineral resources data set is collected from various regions and people, we need to perform ETL (extract, transform and load) operations on the data before processing and mining.

1.2 Hadoop Map Reduce
To deal with unstructured, big data such as the Mineral Resources Data System (MRDS), we need a technology that can cope with it. Two options are available: a parallel DBMS and the Hadoop Map Reduce technology. Hadoop also has other projects, such as Hive, that can be used for mining, but internally these projects again use the Hadoop Map Reduce technique [5].
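The map, shuffle and reduce phases named above can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not Hadoop code: the records, the function names and the choice of commodity as the key are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (state, commodity) records; not taken from MRDS.
records = [
    ("Alaska", "gold"),
    ("Alaska", "copper"),
    ("California", "gold"),
    ("Alaska", "gold"),
]


def map_phase(record):
    """Emit one (key, 1) pair per record, keyed by commodity."""
    _state, commodity = record
    yield commodity, 1


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the values collected for one key."""
    return key, sum(values)


mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real Hadoop job the map and reduce calls would run on different nodes and the framework would perform the grouping; here everything runs in one process only to show the data flow.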
Hadoop Map Reduce gives better data processing performance at lower cost and time than a parallel DBMS because it works with commodity hardware. Hadoop stores data in the form of blocks on the Hadoop Distributed File System (HDFS). The Hadoop framework provides a solution for problems of massive data processing because it runs applications on large clusters built of commodity hardware with failure tolerance [4]. Unstructured data, which cannot be processed with an RDBMS, can be processed with the Hadoop Map Reduce technique. Map Reduce provides flexibility and fault tolerance that a parallel DBMS does not. It provides automatic parallelization, data partitioning and task scheduling, handles machine failures, and manages inter-machine communication. Hadoop is totally transparent to the end user. Unstructured data is growing much faster than structured data; it includes media files, large text files, CSV files, log files, etc.

2. RELATED WORK
Association rule mining involves finding coverage, support, confidence, lift and interestingness [4]. Coverage is the proportion of cases that match the left-hand side of the rule. Support of an association rule is the percentage of task-relevant data for which the pattern is true. Confidence gives the trustworthiness associated with the discovered patterns. Lift is a measure of the importance of the association. Interestingness gives the strength of the associations between sets of items in the association rule.
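The measures above can be made concrete on a toy data set. The sketch below is illustrative only: the transactions are invented, the function name `rule_metrics` is hypothetical, and lift is computed with the commonly used definition (confidence divided by the support of the right-hand side), which the paper does not state explicitly. Interestingness is left out because the paper gives no formula for it.

```python
def rule_metrics(transactions, lhs, rhs):
    """Coverage, support, confidence and lift for the rule lhs => rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)

    coverage = n_lhs / n                      # share of records matching the left-hand side
    support = n_both / n                      # share of records matching both sides
    confidence = n_both / n_lhs if n_lhs else 0.0
    rhs_support = n_rhs / n
    # Assumed definition: lift = confidence / support(rhs).
    lift = confidence / rhs_support if rhs_support else 0.0
    return {
        "coverage": coverage,
        "support": support,
        "confidence": confidence,
        "lift": lift,
    }


# Hypothetical records: each set lists the commodities found at one site.
transactions = [
    {"gold", "copper"},
    {"gold"},
    {"gold", "silver"},
    {"gold", "copper", "silver"},
    {"copper"},
    {"lead"},
    {"silver"},
    {"copper", "lead"},
]

metrics = rule_metrics(transactions, {"gold"}, {"copper"})
print(metrics)
```

With these eight invented records, gold appears in four, copper in four, and both together in two, so a lift of 1.0 would indicate that gold and copper occur together no more often than chance.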
Hongyong Yu and Deshuai Wang [1] proposed a system for processing and mining the log data of a SaaS cloud using Hadoop. This paper focuses on Hadoop’s Map Reduce technique and on the data mining algorithm used by Hongyong Yu and Deshuai Wang. They suggest that by applying the Apriori algorithm concurrently in a distributed system, its performance can be increased in proportion to the number of nodes in the system [1].

Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today was produced within the past two years [3]. Commonly used software technology cannot cope with massive data, and the big challenge is to extract important information from it. Big data has large volume, heterogeneous formats and decentralized data control. Examples of big data applications are Facebook, Twitter and Google. Managing and mining massive data is a big challenge because of its volume, its different file formats and its growth rate. Big data poses many challenges, such as storage, processing, variety and cost.

3. PROPOSED SYSTEM
This paper proposes a system that overcomes the problems faced by the FP growth association mining algorithm. To improve the performance of the Apriori algorithm, it is implemented in parallel using the Hadoop Map Reduce technique. The proposed system has three modules:
1. Spatial Clustering
2. Spatial Classification
3. Spatial Coherence
Before implementing these modules, we need to perform Extract, Transform and Load operations on the raw data, because the available data may come in various forms and may contain unnecessary information.
So we first extract the required data from the raw data and then transform it into CSV format. To process this data, it must be loaded onto HDFS.

Fig-1 Proposed System

3.1 Spatial Clustering
Various algorithms are available for clustering, such as k-means, but in this paper the Hadoop partition technique is used to perform spatial clustering based on geological location, such as Alaska, California, etc. [2]. It runs on different nodes concurrently. Because of this, the time required for spatial clustering is less than with the k-means clustering algorithm. This paper uses the U.S. mineral resources dataset. In effect, 50 clusters are formed, one for each state in the United States. Each cluster holds all the records that belong to the respective state. The intention behind forming the clusters is to specialize the data and to reduce processing time and cost.

3.2 Spatial Classification
Spatial classification is implemented with the same technique that is used for spatial clustering. In this module, the data set is clustered again, this time according to the type of commodity, which is required for the next module. Records with the same commodity are put into a separate cluster to make it more specialized. For example, the Alaska cluster may have sub-clusters by commodity type, such as gold, silver, copper, etc.

3.3 Spatial Coherence
The objective of this paper is to find the spatial coherence. Spatial coherence answers the question: if we are mining for gold, what are the possibilities of finding other mineral resources at the same location? This requires association mining. As discussed above, the FP growth algorithm fails in two situations. To overcome this, this paper introduces a parallel Apriori algorithm for spatial association mining. In the Apriori algorithm, we first have to find the candidate itemsets.
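The paper does not spell out how the candidates are found. The following is a minimal sketch of the standard Apriori join-and-prune step, under the assumption that this is what is meant; the function name `generate_candidates` and the sample commodity pairs are hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_itemsets, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Join: take the union of two frequent (k-1)-itemsets when it has k items.
    Prune: keep the union only if every (k-1)-subset is itself frequent.
    """
    frequent = set(frequent_itemsets)
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets of commodities.
frequent_pairs = {
    frozenset(p)
    for p in [
        ("gold", "copper"),
        ("gold", "silver"),
        ("copper", "silver"),
        ("gold", "lead"),
    ]
}

candidates = generate_candidates(frequent_pairs, 3)
print(candidates)
```

Here only gold, copper and silver survive as a 3-item candidate, because any triple containing lead has a 2-item subset (such as copper with lead) that is not frequent. Pruning this way is what keeps the number of itemsets whose support has to be counted small.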
After this, we have to perform two tasks: one is to find the support values and the other is to find the confidence. The formulas for calculating the support and confidence of gold with copper are as follows.
1. Support Formula

$$\mathrm{Support}(\mathrm{Gold} \Rightarrow \mathrm{Copper}) = \frac{\text{number of records containing both gold and copper}}{N}$$

where N is the number of all records in a cluster.

2. Confidence Formula

$$\mathrm{Confidence}(\mathrm{Gold} \Rightarrow \mathrm{Copper}) = \frac{\text{number of records containing both gold and copper}}{\text{number of records containing gold}}$$

These formulas give the support and confidence of gold with copper. Support of (Gold => Copper) gives the probability of finding gold with copper among all the records, whereas confidence gives the probability of finding gold with copper with respect to all the gold records in a cluster. Because the gold records are a subset of all N records, the value of support is never greater than the confidence.

4. RESULTS
To implement this idea, we have taken the MRDS dataset, which is a spatial data set of mineral resources in the U.S. [7]. This sample data has around 3.5 lakh (350,000) records.

Table -1: Support and Confidence
Commodity                        Support (%)   Confidence (%)
(Gold -> Copper)                 2.89          9.30
(Gold -> Copper, Lead, Silver)   0.72          2.3
(Gold -> Silver)                 2.17          6.97
(Gold -> null)                   24.63         79.06

Here, the Apriori algorithm is run on single-node and multi-node Hadoop to compare the performance. We also find the association rules for this spatial data for a single cluster.

Table -2: Execution on Single and Multinode System
No. of Nodes   Execution Time (s)
1              5.2
2              3.1
3              2.3

Table-1 shows the different association rules with respect to gold. Table-2 shows the performance of the Apriori algorithm on single-node and multi-node systems. It is observed that running the Apriori algorithm in parallel on Hadoop improves performance: execution time drops from 5.2 s on one node to 2.3 s on three nodes.

5. CONCLUSION
The FP growth algorithm has two limitations: it cannot be used where the data size is dynamic, or where the support needs to change according to the situation. By applying the Apriori algorithm concurrently using Hadoop, we can overcome the problems faced by the FP growth association mining algorithm.
We can also improve the speed of association rule mining for spatial data compared to the FP growth algorithm, as Hadoop is a distributed system.

ACKNOWLEDGEMENTS
I express my true sense of gratitude towards my project guide Prof. H.A. Hingoliwala, Associate Professor, Computer Department, for the invaluable co-operation and guidance that he gave me throughout my project. I specially thank our P.G. coordinator Prof. M. D. Ingle for inspiring me and providing me with all the lab facilities. I would also like to express my appreciation and thanks to HOD Prof. S.M. Shinde, JSCOE Principal Dr. M.G. Jadhav, and all my friends who knowingly or unknowingly have assisted me throughout my hard work.

REFERENCES
[1]. Hongyong Yu, Deshuai Wang, “Mass Log Data Processing and Mining Based on Hadoop and Cloud Computing,” The 7th International Conference on Computer Science & Education (ICCSE 2012), July 14-17, 2012, Melbourne, Australia.
[2]. Duck-Ho Bae, Ji-Haeng Baek, Hyun-Kyo Oh, Ju-Won Song, Sang-Wook Kim (Coll. of Inf. & Commun., Hanyang Univ., Seoul, South Korea), “SD-Miner: A Spatial Data Mining System,” Network Infrastructure and Digital Content, 2009.
[3]. Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, “Data mining with big data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
[4]. Binbin He, Ying Cui, Jianhua Chen, Pingjing Xie, “A Spatial Data Mining Method for Mineral Resources Potential Assessment,” IEEE 978-1-4244-8351-8/11, 2011 IEEE.
[5]. Tom White, Hadoop: The Definitive Guide, 3rd ed., O’Reilly, 2012.
[6]. Hadoop, http://hadoop.apache.org/
[7]. MRDS, http://tin.er.usgs.gov/mrds/