IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 3, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 701
Improved Frequent Pattern Mining Algorithm using Divide and Conquer
Technique with Current Problem Solutions
Nirav Patel¹, Kiran Amin²
¹PG Student (I. T.), Dept. of Info. Technology
²Head of Department, Department of Computer Science
¹,²Ganpat University, Kherva, Gujarat, India
Abstract— Frequent patterns are patterns such as item sets,
subsequences or substructures that appear in a data set
frequently. A divide and conquer method can be used for
frequent item set mining. Its core advantages are an
extremely simple data structure and processing scheme:
the original dataset is divided into projected databases, and
the frequent patterns are found from them. The Split and
Merge (SaM) algorithm for frequent item set mining uses a
purely horizontal transaction representation and gives very
good results for dense datasets. There are, however, some
problems with this algorithm. We modify the algorithm to
obtain better results and compare it with the original one.
We propose two methods, (1) Method I and (2) Method II, to
solve the problems of the current algorithm. We examine the
performance of SaM and Modified SaM using real datasets and
report results for both dense and sparse datasets.
I. INTRODUCTION
In the last few years the size of databases has increased
rapidly. The term data mining, or knowledge discovery in
databases, has been adopted for a field of research dealing
with the automatic discovery of implicit information or
knowledge within databases. The implicit information within
databases, mainly the interesting association relationships
among sets of objects that lead to association rules, may
disclose useful patterns for decision support, financial
forecasting, marketing policies, even medical diagnosis and
many other applications.
Frequent itemsets play an essential role in many data mining
tasks that try to find interesting patterns in databases, such
as association rules, sequences and clusters, of which the
mining of association rules is one of the most popular
problems. The original motivation for searching association
rules came from the need to analyze so-called supermarket
transaction data, that is, to examine customer behavior in
terms of the purchased products. Association rules describe
how often items are purchased together.
II. FREQUENT ITEMSET MINING
Frequent itemset (or pattern) mining [1,7] is widely studied
in the data mining field because of its broad applications in
mining association rules, correlations, graph patterns,
constraint-based frequent patterns, sequential patterns, and
many other data mining tasks. Efficient algorithms for mining
frequent itemsets are crucial for mining association rules as
well as for many other data mining tasks. The major challenge
in frequent pattern mining is the large number of result
patterns: as the minimum support threshold becomes lower, an
exponentially large number of itemsets is generated.
Therefore, pruning unimportant patterns effectively during the
mining process has become one of the main topics in frequent
pattern mining. Consequently, the main aim is to optimize the
process of finding patterns so that it is efficient and
scalable and detects the important patterns, which can then be
used in various ways.
III. RELATED WORK
A. Apriori
The most popular frequent item set mining algorithm, the
Apriori algorithm, was introduced in [1]. The item sets are
checked in the order of increasing size (breadth-first/
level-wise traversal of the prefix tree). The canonical form
of item sets and the induced prefix tree are used to ensure
that each candidate item set is generated at most once. The
already generated levels are used to execute a priori pruning
of the candidate item sets (using the Apriori property [1,7])
before accessing the transaction database to determine the
support. Transactions are represented as simple arrays of
items (so-called horizontal transaction representation, see
also below). The support of a candidate item set is computed
by checking whether the candidates are subsets of a
transaction or by generating the subsets of a transaction and
finding them among the candidates. For more detail refer to
[10].
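The level-wise scheme described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in the experiments: the function name apriori and the toy database DB are assumptions made for the example, and plain sets replace the canonical form and prefix tree, so duplicate candidates are removed by the set rather than avoided by construction.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise (breadth-first) mining; returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-sets that yield a k-set.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Support counting: check each candidate against each transaction.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy database: one string per transaction, one letter per item.
DB = ["ad", "acde", "bd", "bcd", "bc", "abd", "bde", "bcde", "bc", "abd"]
found = apriori(DB, min_support=3)
for itemset, support in sorted(found.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), support)
```

With a minimum support of 3 the candidate {b, c, d} survives the prune step, because all of its 2-subsets are frequent, but it is rejected after counting, since it occurs in only two transactions.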
B. Eclat
The Eclat [6, 9, 10] algorithm is basically a depth-first
search algorithm using set intersection. It uses a vertical
database layout: instead of explicitly listing all
transactions, each item is stored together with its cover
(also called TID list), and an intersection-based approach is
used to compute the support of an item set. In this way, the
support of an item set X can be computed by simply
intersecting the covers of any two subsets Y, Z ⊆ X such that
Y ∪ Z = X. In other words, when the database is stored in the
vertical layout, the support of a set can be counted much more
easily by intersecting the covers of two of its subsets that
together give the set itself.
Eclat essentially generates the candidate itemsets using only
the join step from Apriori [1, 4], since the itemsets
necessary for the prune step are not available. The items in
the database are reordered in ascending order of support to
reduce the number of candidate itemsets that are generated
and, hence, the number of intersections that need to be
computed and the total size of the covers of all generated
itemsets. Since the algorithm does not fully exploit the
monotonicity property, but generates a candidate item set
based on only two of its subsets, the number of candidate item
sets generated is much larger than in a breadth-first approach
such as Apriori.
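The cover intersection can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the Eclat implementation compared in the experiments: the function name eclat and the toy database DB are made up for the example, and covers are held as Python sets of transaction indices.

```python
def eclat(transactions, min_support):
    """Depth-first mining on a vertical layout; returns {frozenset: support}."""
    # Vertical layout: item -> cover (set of transaction ids, the TID list).
    covers = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            covers.setdefault(item, set()).add(tid)
    # Keep the frequent items, ordered by ascending support.
    items = sorted((i for i in covers if len(covers[i]) >= min_support),
                   key=lambda i: (len(covers[i]), i))
    result = {}

    def extend(prefix, prefix_cover, candidates):
        for pos, item in enumerate(candidates):
            # Support of prefix + item = size of the intersected covers.
            cover = (covers[item] if prefix_cover is None
                     else prefix_cover & covers[item])
            if len(cover) >= min_support:
                itemset = prefix | {item}
                result[itemset] = len(cover)
                # Only later items are joined (no prune step as in Apriori).
                extend(itemset, cover, candidates[pos + 1:])

    extend(frozenset(), None, items)
    return result

# Hypothetical toy database: one string per transaction, one letter per item.
DB = ["ad", "acde", "bd", "bcd", "bc", "abd", "bde", "bcde", "bc", "abd"]
found = eclat(DB, min_support=3)
print(sorted("".join(sorted(s)) for s in found))
```

In this database the cover of a is {0, 1, 5, 9} and the cover of d contains all of these transaction ids, so the support of {a, d} is 4 without any further pass over the transactions.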
C. SaM
The Split and Merge algorithm [3,8] is a simplification of
the already fairly simple RElim (Recursive Elimination)
algorithm [2]. While RElim represents a (conditional)
database by storing one transaction list for each item
(partially vertical representation), the split and merge
algorithm employs only a single transaction list (purely
horizontal representation), stored as an array. This array is
processed with a simple split and merge scheme, which
computes a conditional database and processes this
conditional database recursively. Each array element consists
of an occurrence counter and a pointer to the sorted
transaction (array of contained items). This data structure
is then processed recursively to find the frequent item sets.
The basic operations of the recursive processing follow a
depth-first/divide-and-conquer scheme. In the split step, the
given array is split with respect to the leading item of the
first transaction: all array elements referring to
transactions starting with this item are transferred to a new
array. The new array created in the split step and the rest
of the original array are then combined with a procedure that
is almost identical to one phase of the well-known merge sort
algorithm. The main reason for the merge operation in SaM
[3,8] is to keep the list sorted, so that (1) all
transactions with the same leading item are grouped together
and (2) equal transactions (or transaction suffixes) can be
combined, thus reducing the number of objects to process.
Fig. 1: The example database: (1) original form, (2) item
frequencies, (3) transactions with sorted items, (4)
lexicographically sorted transactions, and (5) the data
structure used
Fig. 2: The basic operations of the Split and Merge
algorithm: split (left) and merge (right).
The steps illustrated in Fig. 1 for a simple example
transaction database are as follows [3,8]:
1) Step 1: The transaction database is shown in its original
form.
2) Step 2: The frequencies of individual items are
determined from this input in order to be able to
discard infrequent items immediately. If we assume a
minimum support of three transactions for our
example, there are no infrequent items, so all items are
kept.
3) Step 3: The (frequent) items in each transaction are
sorted according to their frequency in the transaction
database, since it is well known that processing the items
in the order of increasing frequency usually leads to
the shortest execution times.
4) Step 4: The transactions are sorted lexicographically
into descending order, with item comparisons again
being decided by the item frequencies, although here
the item with the higher frequency precedes the item
with the lower frequency.
5) Step 5: The data structure on which SaM operates is
built by combining equal transactions and setting up an
array, in which each element consists of two fields: an
occurrence counter and a pointer to the sorted
transaction. This data structure is then processed
recursively to find the frequent item sets.
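The five preprocessing steps can be sketched as follows. This is a minimal illustration, not the C implementation used in the experiments: the function name sam_prepare and the toy database DB are assumptions, the array holds tuples instead of pointers, and items with equal frequency are ordered by name, which is a choice made only for this sketch.

```python
from collections import Counter
from itertools import groupby

def sam_prepare(transactions, min_support):
    """Steps 1-5: build the array of (occurrence counter, sorted transaction)."""
    # Step 2: item frequencies; infrequent items are discarded.
    freq = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in freq if freq[i] >= min_support),
                   key=lambda i: (freq[i], i))   # ties by name (assumption)
    # Integer codes: the least frequent item gets the highest code.
    code = {item: len(order) - 1 - pos for pos, item in enumerate(order)}
    decode = {c: item for item, c in code.items()}
    # Step 3: items of each transaction in order of increasing frequency.
    coded = [tuple(sorted((code[i] for i in set(t) if i in code),
                          reverse=True))
             for t in transactions]
    # Step 4: transactions sorted lexicographically in descending order.
    coded = sorted((t for t in coded if t), reverse=True)
    # Step 5: equal transactions are combined into one array element.
    return [(len(list(group)), tuple(decode[c] for c in t))
            for t, group in groupby(coded)]

# Hypothetical toy database: one string per transaction, one letter per item.
DB = ["ad", "acde", "bd", "bcd", "bc", "abd", "bde", "bcde", "bc", "abd"]
array = sam_prepare(DB, min_support=3)
for count, items in array:
    print(count, "".join(items))
```

For this database the ten transactions collapse into eight array elements, because the transactions a b d and c b each occur twice.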
The basic operations of the divide-and-conquer scheme [3,2]
are shown in Fig. 2. In the split step (left part of the
figure) the given array is split w.r.t. the leading item of
the first transaction (item e in our example): all array
elements referring to transactions starting with this item are
transferred to a new array. In this process, the pointer into
the transaction is advanced by one item, so that the common
leading item is removed from all transactions. Obviously,
this new array represents all frequent item sets containing
the split item (provided this item is frequent). Likewise,
the merge operation is shown in the right part of the figure.
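The split and merge operations and the recursion around them can be sketched as below. This is a simplified sketch of the scheme described in [3,8], not the code that was benchmarked: the names split, merge and sam, the integer item codes and the hard-coded array are assumptions for illustration. A higher code stands for a less frequent item, and the array is sorted in descending order.

```python
def split(array):
    """Move all transactions starting with the leading item to a new array."""
    item = array[0][1][0]
    support, projected, rest = 0, [], []
    for count, items in array:
        if items[0] == item:
            support += count
            if len(items) > 1:          # advance past the leading item
                projected.append((count, items[1:]))
        else:
            rest.append((count, items))
    return item, support, projected, rest

def merge(a, b):
    """One merge-sort phase on descending arrays; equal entries are combined."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1] > b[j][1]:
            out.append(a[i]); i += 1
        elif a[i][1] < b[j][1]:
            out.append(b[j]); j += 1
        else:
            out.append((a[i][0] + b[j][0], a[i][1])); i += 1; j += 1
    return out + a[i:] + b[j:]

def sam(array, min_support, prefix=(), found=None):
    """Recursive split and merge; returns {tuple of item codes: support}."""
    found = {} if found is None else found
    while array:
        item, support, projected, rest = split(array)
        array = merge(rest, projected)  # keeps the remaining array sorted
        if support >= min_support:
            itemset = prefix + (item,)
            found[itemset] = support
            sam(projected, min_support, itemset, found)
    return found

# Hypothetical example array with codes d=0, b=1, c=2, a=3, e=4.
NAMES = "dbcae"
ARRAY = [(1, (4, 3, 2, 0)), (1, (4, 2, 1, 0)), (1, (4, 1, 0)),
         (2, (3, 1, 0)), (1, (3, 0)), (1, (2, 1, 0)), (2, (2, 1)),
         (1, (1, 0))]
found = sam(ARRAY, min_support=3)
readable = {"".join(sorted(NAMES[c] for c in k)): v
            for k, v in found.items()}
print(readable)
```

The first split removes the three transactions starting with item e (code 4); merging their suffixes back combines c b d and b d with the equal transactions already present in the rest of the array, which is how the number of array elements shrinks during the recursion.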
IV. PROBLEM WITH CURRENT SAM
Here we focus on frequent item set mining using the divide
and conquer technique in the split and merge algorithm. As
discussed in the example, a split item is selected and the
merge step is then used in finding the frequent item sets. A
problem arises when results are taken. It is critical at the
initial point: it affects the selection of the split item from
the item set and therefore affects the result.
We discuss the problem with an example of such a situation.
Fig. 3: Problems with SaM
Here one example identifies the problem. There are 10
different transactions, as shown in Fig. 3 (left). The
frequency of each item is shown in Fig. 3 (right): e=3, a=3,
c=5, b=8, d=8. Items e and a have the same frequency, so it is
unclear which of them should be selected as the first split
item. This ambiguity arises in the very first step, and with
this type of situation the calculation has to stop at the
initial point. The SaM algorithm gives an affected result when
this type of situation occurs. We identified this problem; a
solution is proposed in Section V.
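The ambiguity can be made visible with a few lines of Python that group the items by frequency. The frequencies are the ones quoted in the text; the function name tied_items is an assumption made for this illustration.

```python
from collections import defaultdict

# Item frequencies quoted in the text for the example of Fig. 3.
FREQ = {"e": 3, "a": 3, "c": 5, "b": 8, "d": 8}

def tied_items(freq):
    """Return the groups of items sharing a frequency, lowest frequency first."""
    groups = defaultdict(list)
    for item, count in freq.items():
        groups[count].append(item)
    return [sorted(items) for count, items in sorted(groups.items())
            if len(items) > 1]

ties = tied_items(FREQ)
print(ties)
```

For these frequencies the check reports the pair a, e and also the pair b, d, so a sort by frequency alone does not determine a unique item order.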
V. MODIFIED MECHANISM
As discussed in the problem identification, when the first two
items have the same frequency, the result is not proper. We
propose one solution for this type of situation. When the
algorithm is used to find frequent item sets over n different
items, we consider the first two items that have the same
frequency count and pass the minimum support. Which of them
is selected depends on the number of transactions it
contains. Suppose E has 3 transactions and A has 4
transactions; then we select the one with fewer transactions,
i.e. E is selected.
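One possible reading of this rule is a secondary sort key, sketched below. The sketch rests on several assumptions: the text does not define how the number of transactions of an item is measured when the frequencies are equal, so the values in TIE_COUNT are simply the figures 3 and 4 quoted above; the function name split_order and the final tie-break by item name are also hypothetical.

```python
def split_order(freq, tie_count, min_support):
    """Order the frequent items for splitting.

    Primary key: ascending frequency. Secondary key: the smaller
    tie_count wins. The last key (item name) is an assumption added
    only to make the order fully deterministic.
    """
    frequent = [i for i in freq if freq[i] >= min_support]
    return sorted(frequent,
                  key=lambda i: (freq[i], tie_count.get(i, 0), i))

FREQ = {"e": 3, "a": 3, "c": 5, "b": 8, "d": 8}   # frequencies from Section IV
TIE_COUNT = {"e": 3, "a": 4}                      # "E has 3, A has 4" from the text
order = split_order(FREQ, TIE_COUNT, min_support=3)
print(order)
```

With these inputs e is placed before a, which matches the selection of E described above; any other definition of the secondary count can be passed in through tie_count without changing the function.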
Fig. 4: Problem Solution
We have to modify the existing algorithm to reduce the total
execution time. The current algorithm uses too much scanning
and sorting, so its execution time is high. We modify the
algorithm in such a way that the result is not affected but
the execution time decreases. The steps of the modified
algorithm are as follows. The first two steps are the same as
in the Split and Merge algorithm. The modification also
solves the problem with the current split and merge algorithm
discussed above.
• After the second step, first assign all items that pass
the minimum support to an array.
• Then, according to the transactions, assign the remaining
items for each item. If no transaction starts with an
item, put the item as it is.
• Remove the (single) least frequent item with all its
transactions.
• Copy and store all transaction items.
• Remove the next least frequent item with all its
transactions.
• Copy and store all transaction items.
• Repeat this until the transaction array is empty.
VI. EXPERIMENTS AND PERFORMANCE
COMPARISON
We present experimental results which show that the modified
split and merge method achieves reasonably good results in
terms of time. We processed three datasets. The algorithm has
been implemented in C, and the platform used is Ubuntu 11.04
(Natty Narwhal, released in April 2011) on a machine with
2 GB of RAM, 8 processors and 20 GB of hard drive space.
A. Dataset Information
Data Set                     Chess                                             Mushroom                                          PUMSB
Available at                 Frequent Itemset Mining Dataset Repository [12]   Frequent Itemset Mining Dataset Repository [13]   Frequent Itemset Mining Dataset Repository [14]
Donated by                   Roberto Bayardo                                   Roberto Bayardo                                   Roberto Bayardo
Total instances              1,18,252                                          1,86,852                                          36,29,404
Total columns                37                                                23                                                74
Total transactions           3196                                              8124                                              49046
Attribute type               Numeric                                           Numeric                                           Numeric
No. of instances processed   All instances                                     All instances                                     All instances
Description (Chess): This data was collected by Roberto Bayardo
from the UCI datasets. In this dataset, the moves of a chess
game are stored as numeric values. The total number of
transactions is 3196. This is a dense dataset. The data set
lists chess end game positions for king vs. king and rook.
Description (Mushroom): This data was collected by Roberto
Bayardo from the UCI datasets. In this dataset numeric values
are stored. The total number of transactions is 8124. This is
a sparse dataset. The data set describes poisonous and edible
mushrooms by different attributes.
Description (PUMSB): This data was collected by Roberto
Bayardo from PUMSB. In this dataset numeric values are stored.
The total number of transactions is 49046. This is a sparse
dataset.
Table. 1: Dataset Information [11]
B. Results
We have taken results on the different datasets for a range
of support thresholds. The algorithms were run in the
environment described above (C implementation, Ubuntu 11.04,
2 GB of RAM, 8 processors and 20 GB of hard drive space). The
results are given in the tables below. We report the average
execution time for the Modified SaM and SaM algorithms
[1, 3, 6]. We also compare our results with the Eclat [3]
algorithm, another algorithm for frequent itemset mining. The
datasets used are Chess, Mushroom and PUMSB [12, 13, 14].
Total time in seconds
Support (%)   Modified SaM   SaM        Eclat
50            2.03           2.05       2.12
55            1.00           1.24       0.98
60            0.45           0.52       0.53
65            0.21           0.27       0.27
70            0.12           0.11       0.11
75            0.06           0.06       0.06
80            0.04           0.03       0.04
AVG           0.558571       0.611429   0.587143
Table. 2: Execution Time of Chess dataset
As shown in Table 2, we have taken results for different
support thresholds on the chess dataset, comparing the total
execution time for supports from 50% to 80%. We compared the
Eclat algorithm with our Modified SaM and the original SaM
algorithm. The execution time decreases as the support
threshold increases. Modified SaM gives the lowest average
execution time (0.56 s, against 0.59 s for Eclat and 0.61 s
for SaM).
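The AVG row of Table 2 is the arithmetic mean of each column over the seven support thresholds. The short Python check below recomputes it from the table values; the names CHESS and column_averages are introduced only for this illustration.

```python
# Execution times from Table 2 (chess dataset), in seconds:
# support (%) -> (Modified SaM, SaM, Eclat)
CHESS = {
    50: (2.03, 2.05, 2.12),
    55: (1.00, 1.24, 0.98),
    60: (0.45, 0.52, 0.53),
    65: (0.21, 0.27, 0.27),
    70: (0.12, 0.11, 0.11),
    75: (0.06, 0.06, 0.06),
    80: (0.04, 0.03, 0.04),
}

def column_averages(table):
    """Mean of each column over all support thresholds, to six decimals."""
    rows = list(table.values())
    return tuple(round(sum(col) / len(rows), 6) for col in zip(*rows))

avg_mod, avg_sam, avg_eclat = column_averages(CHESS)
print(avg_mod, avg_sam, avg_eclat)
```

The recomputed means agree with the AVG row. Because the mean is dominated by the runs at 50% and 55% support, the per-threshold rows remain the more informative comparison.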
Fig. 5: Execution Time of Chess dataset
Fig. 5 shows that the execution time of the algorithms
decreases as the support threshold increases from 50% to 80%
for the chess dataset. On average, SaM and Eclat take more
time than Modified SaM.
Table 3 below shows that, on the Mushroom dataset, the
execution times of the SaM algorithm, Modified SaM and Eclat
are approximately the same for higher support thresholds and
increase slightly as the support decreases.
Total time in seconds
Support (%)   Modified SaM   SaM        Eclat
50            0.05           0.06       0.06
55            0.05           0.05       0.06
60            0.05           0.05       0.05
65            0.04           0.05       0.05
70            0.04           0.04       0.04
75            0.04           0.04       0.04
80            0.04           0.04       0.04
AVG           0.044286       0.047143   0.048571
Table. 3: Execution Time of Mushroom dataset
Fig. 6: Execution Time of Mushroom dataset
Fig. 6 shows that the execution times of SaM and Modified SaM
are close to each other, and that the execution times of SaM,
Modified SaM and Eclat are practically the same for higher
support thresholds. According to the experimental results,
the SaM algorithm performs excellently on dense data sets but
shows certain weaknesses on sparse data sets.
As shown in Table 4, we have taken results for different
support thresholds on the PUMSB dataset, comparing the total
execution time for supports from 60% to 80%. The execution
time decreases as the support threshold increases. Modified
SaM performs better than SaM on this sparse dataset. On
sparse datasets SaM does not perform well because of too much
scanning and filtering. Modified SaM thus gives good results
for both sparse and dense datasets. Eclat shows average
performance on the PUMSB dataset.
Total time in seconds
Support (%)   Modified SaM   SaM        Eclat
60            34.08          34.17      35.24
65            16.03          17.76      15.81
70            5.78           7.16       6.04
75            2.46           2.30       2.61
80            1.27           1.35       1.37
AVG           11.924         12.548     12.214
Table. 4: Execution Time PUMSB dataset
Fig. 7: Execution Time of PUMSB dataset
Fig. 7 shows the execution time of all the algorithms for
different support thresholds on the PUMSB data set. The
execution time decreases as the support threshold increases.
Modified SaM gives good results compared to SaM. At the
lowest support threshold, however, the advantage of Modified
SaM on the PUMSB dataset is small (34.08 s against 34.17 s at
60% support).
VII. CONCLUSION AND FUTURE ENHANCEMENT
In this paper, we studied frequent itemset mining and some of
its basic algorithms, along with one of the better
algorithms, Split and Merge. From this analysis we can say
that SaM cannot work properly in some situations, so we
modified the current algorithm for finding frequent itemsets.
We observed the execution time of frequent pattern mining
algorithms on specific datasets. An in-depth analysis was
done of a few algorithms that made a significant contribution
to improving the efficiency of frequent itemset mining. By
comparing our results with classical frequent item set mining
algorithms such as SaM and Eclat, the strengths and weaknesses
of these algorithms were analyzed. According to the
experimental results, the Modified SaM algorithm performs
excellently on dense data sets as well as on sparse datasets
up to some support limit.
We found different problems in the SaM algorithm; if they are
not solved, the result is affected. We therefore suggest two
different methods for getting better results. According to
the experimental results, the Modified SaM algorithm performs
excellently on the data sets compared to the original SaM and
Eclat. Our algorithm can also be compared with other
classical frequent itemset mining algorithms.
Modified SaM currently works better than the other algorithms
it was compared with, but we plan to develop an algorithm that
is more efficient and faster than the current version of
Modified SaM. Our main aim is to develop Modified SaM in such
a way that it consumes less execution time than the current
version. One idea to make it more effective in terms of
execution time is to reduce scanning and sorting so that less
preprocessing is needed than at present.
A second extension of Modified SaM is to use a taxonomy that
eliminates the items that are not frequent at the beginning,
or to let the user decide which type of patterns he/she
wants, so that time and memory are not wasted.
REFERENCES
[1] Christian Borgelt. Frequent Item Set Mining. Wiley
Interdisciplinary Reviews: Data Mining and Knowledge
Discovery 2(6):437-456. J. Wiley & Sons, Chichester,
United Kingdom, 2012.
[2] C. Borgelt. Keeping Things Simple: Finding Frequent Item
Sets by Recursive Elimination. Proc. Workshop Open
Software for Data Mining (OSDM'05 at KDD'05, Chicago,
IL), 66-70. ACM Press, New York, NY, USA, 2005.
[3] Christian Borgelt and Xiaomeng Wang. (Approximate)
Frequent Item Set Mining Made Simple with a Split and
Merge Algorithm. Springer, 2010.
[4] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and
A.I. Verkamo. Fast Discovery of Association Rules. In
U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R.
Uthurusamy, editors, Advances in Knowledge Discovery and
Data Mining, pages 307-328. MIT Press, 1996.
[5] C.L. Blake and C.J. Merz. UCI Repository of Machine
Learning Databases. Dept. of Information and Computer
Science, University of California at Irvine, CA, USA,
1998.
[6] http://www.ics.uci.edu/~mlearn/MLRepository
[7] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New
Algorithms for Fast Discovery of Association Rules.
Proc. 3rd Int. Conf. on Knowledge Discovery and Data
Mining (KDD'97), 283-296. AAAI Press, Menlo Park, CA,
USA, 1997.
[8] R. Agrawal, T. Imielinski, and A. Swami. Mining
Association Rules between Sets of Items in Large
Databases. Proc. Conf. on Management of Data, 207-216.
ACM Press, New York, NY, USA, 1993.
[9] C. Borgelt. SaM: Simple Algorithms for Frequent Item Set
Mining. IFSA/EUSFLAT 2009 conference, 2009.
[10] J. Han and M. Kamber. Data Mining: Concepts and
Techniques. Morgan Kaufmann, 2000.
[11] Christian Borgelt. Efficient Implementations of Apriori
and Eclat. Workshop of Frequent Item Set Mining
Implementations (FIMI 2003), Melbourne, FL, USA, 2003.
[12] Frequent Itemset Mining Dataset Repository.
(http://fimi.ua.ac.be/data)
[13] Roberto Bayardo. Frequent Itemset Mining Dataset
Repository, Chess Dataset.
(http://fimi.ua.ac.be/data/chess.dat)
[14] Roberto Bayardo. Frequent Itemset Mining Dataset
Repository, Mushroom Dataset.
(http://fimi.ua.ac.be/data/mushroom.dat)
[15] Roberto Bayardo. Frequent Itemset Mining Dataset
Repository, PUMSB Dataset.
(http://fimi.ua.ac.be/data/pumsb.dat)

More Related Content

What's hot (19)

PDF
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
IOSR Journals
 
PPT
Associative Learning
Indrajit Sreemany
 
PDF
Data mining techniques a survey paper
eSAT Publishing House
 
PDF
Data mining techniques
eSAT Journals
 
PDF
An Efficient Approach for Asymmetric Data Classification
AM Publications
 
PPTX
ADS Introduction
NagendraK18
 
PDF
Machine_Learning_Trushita
Trushita Redij
 
PDF
Hybrid Model using Unsupervised Filtering Based on Ant Colony Optimization an...
IRJET Journal
 
PDF
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
PDF
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET Journal
 
PDF
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
IJDKP
 
PDF
An integrated mechanism for feature selection
sai kumar
 
PDF
A Survey on Constellation Based Attribute Selection Method for High Dimension...
IJERA Editor
 
PDF
Analysis on different Data mining Techniques and algorithms used in IOT
IJERA Editor
 
PDF
Survey on semi supervised classification methods and feature selection
eSAT Journals
 
PDF
G046024851
IJERA Editor
 
PDF
[IJET-V1I3P11] Authors : Hemangi Bhalekar, Swati Kumbhar, Hiral Mewada, Prati...
IJET - International Journal of Engineering and Techniques
 
PDF
IRJET- Missing Data Imputation by Evidence Chain
IRJET Journal
 
PDF
A Survey on the Clustering Algorithms in Sales Data Mining
Editor IJCATR
 
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
IOSR Journals
 
Associative Learning
Indrajit Sreemany
 
Data mining techniques a survey paper
eSAT Publishing House
 
Data mining techniques
eSAT Journals
 
An Efficient Approach for Asymmetric Data Classification
AM Publications
 
ADS Introduction
NagendraK18
 
Machine_Learning_Trushita
Trushita Redij
 
Hybrid Model using Unsupervised Filtering Based on Ant Colony Optimization an...
IRJET Journal
 
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET Journal
 
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
IJDKP
 
An integrated mechanism for feature selection
sai kumar
 
A Survey on Constellation Based Attribute Selection Method for High Dimension...
IJERA Editor
 
Analysis on different Data mining Techniques and algorithms used in IOT
IJERA Editor
 
Survey on semi supervised classification methods and feature selection
eSAT Journals
 
G046024851
IJERA Editor
 
[IJET-V1I3P11] Authors : Hemangi Bhalekar, Swati Kumbhar, Hiral Mewada, Prati...
IJET - International Journal of Engineering and Techniques
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET Journal
 
A Survey on the Clustering Algorithms in Sales Data Mining
Editor IJCATR
 

Viewers also liked (20)

PPT
Frequent itemset mining using pattern growth method
Shani729
 
PPTX
Apriori algorithm
Junghoon Kim
 
PPTX
Efficient frequent pattern mining in distributed system
Saurav Kumar
 
PPTX
Temporal Pattern Mining
Prakhar Dhama
 
PDF
REVIEW: Frequent Pattern Mining Techniques
Editor IJMTER
 
PDF
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Seattle DAML meetup
 
PPTX
Frequent Itemset Mining(FIM) on BigData
Raju Gupta
 
PPT
A vertical representation in frequent item set mining
Dr.Manmohan Singh
 
PPT
Survey on Frequent Pattern Mining on Graph Data - Slides
Kasun Gajasinghe
 
PPTX
Major issues in data mining
Slideshare
 
PPTX
Data mining fp growth
Shihab Rahman
 
PPT
Fp growth algorithm
Pradip Kumar
 
PPSX
Frequent itemset mining methods
Prof.Nilesh Magar
 
PPT
The comparative study of apriori and FP-growth algorithm
deepti92pawar
 
PPT
Apriori algorithm
nouraalkhatib
 
PPT
Data mining
Samir Sabry
 
PDF
Lecture13 - Association Rules
Albert Orriols-Puig
 
PDF
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
 
PPT
Data mining slides
smj
 
Frequent itemset mining using pattern growth method
Shani729
 
Apriori algorithm
Junghoon Kim
 
Efficient frequent pattern mining in distributed system
Saurav Kumar
 
Temporal Pattern Mining
Prakhar Dhama
 
REVIEW: Frequent Pattern Mining Techniques
Editor IJMTER
 
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Seattle DAML meetup
 
Frequent Itemset Mining(FIM) on BigData
Raju Gupta
 
A vertical representation in frequent item set mining
Dr.Manmohan Singh
 
Survey on Frequent Pattern Mining on Graph Data - Slides
Kasun Gajasinghe
 
Major issues in data mining
Slideshare
 
Data mining fp growth
Shihab Rahman
 
Fp growth algorithm
Pradip Kumar
 
Frequent itemset mining methods
Prof.Nilesh Magar
 
The comparative study of apriori and FP-growth algorithm
deepti92pawar
 
Apriori algorithm
nouraalkhatib
 
Data mining
Samir Sabry
 
Lecture13 - Association Rules
Albert Orriols-Puig
 
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
 
Data mining slides
smj
 
Ad

Similar to Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique with Current Problem Solutions (20)

PDF
J017114852
IOSR Journals
 
PDF
A classification of methods for frequent pattern mining
IOSR Journals
 
PDF
A Study of Various Projected Data Based Pattern Mining Algorithms
ijsrd.com
 
PDF
Review on: Techniques for Predicting Frequent Items
vivatechijri
 
PDF
An improvised tree algorithm for association rule mining using transaction re...
Editor IJCATR
 
PDF
Frequent Item Set Mining - A Review
ijsrd.com
 
PDF
Efficient Parallel Pruning of Associative Rules with Optimized Search
IOSR Journals
 
PDF
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
IOSR Journals
 
PDF
Efficient Temporal Association Rule Mining
IJMER
 
PDF
Efficient Temporal Association Rule Mining
International Journal of Engineering Inventions www.ijeijournal.com
 
PDF
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
ijsrd.com
 
PDF
Literature Survey of modern frequent item set mining methods
ijsrd.com
 
PDF
Ijcatr04051008
Editor IJCATR
 
PDF
An improvised frequent pattern tree
IJDKP
 
PDF
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
cscpconf
 
PDF
Ijcet 06 06_003
IAEME Publication
 
PDF
D0352630
iosrjournals
 
PDF
BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES
IJDKP
 
PDF
BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES
IJDKP
 
J017114852
IOSR Journals
 
A classification of methods for frequent pattern mining
IOSR Journals
 
A Study of Various Projected Data Based Pattern Mining Algorithms
ijsrd.com
 
Review on: Techniques for Predicting Frequent Items
vivatechijri
 
An improvised tree algorithm for association rule mining using transaction re...
Editor IJCATR
 
Frequent Item Set Mining - A Review
ijsrd.com
 
Efficient Parallel Pruning of Associative Rules with Optimized Search
IOSR Journals
 
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
IOSR Journals
 
Efficient Temporal Association Rule Mining
IJMER
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
ijsrd.com
 
Literature Survey of modern frequent item set mining methods
ijsrd.com
 
Ijcatr04051008
Editor IJCATR
 
An improvised frequent pattern tree
IJDKP
 
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
cscpconf
 
Ijcet 06 06_003
IAEME Publication
 
D0352630
iosrjournals
 
BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES
IJDKP
 
BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES
IJDKP
 
Ad

Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique with Current Problem Solutions

IJSRD - International Journal for Scientific Research & Development | Vol. 1, Issue 3, 2013 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com

Nirav Patel (1), Kiran Amin (2)
(1) PG Student (I. T.), Dept. of Info. Technology
(2) Head of Department, Department of Computer Science
(1, 2) Ganpat University, Kherva, Gujarat, India

Abstract: Frequent patterns are patterns such as item sets, subsequences or substructures that appear frequently in a data set. A divide-and-conquer method is used for frequent item set mining. Its core advantages are an extremely simple data structure and processing scheme: the original dataset is divided into projected databases, and the frequent patterns are found from them. Split and Merge (SaM) uses a purely horizontal transaction representation and gives very good results for dense datasets. Researchers have introduced the split and merge algorithm for frequent item set mining, but there are some problems with this algorithm. We modify the algorithm to obtain better results and compare it with the original one. We suggest different methods to solve the problem with the current algorithm and propose two of them, (1) Method I and (2) Method II. We compare our algorithm with the existing SaM algorithm and examine the performance of SaM and Modified SaM using real datasets. Results are reported for both dense and sparse datasets.

I. INTRODUCTION

In the last few years the size of databases has increased rapidly. The term data mining, or knowledge discovery in databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases.
The implicit information within databases, mainly the interesting association relationships among sets of objects that lead to association rules, may disclose useful patterns for decision support, financial forecasting, marketing policies, medical diagnosis and many other applications. Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, sequences, clusters and many more, of which the mining of association rules is one of the most popular problems. The original motivation for searching for association rules came from the need to analyze so-called supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. Association rules describe how often items are purchased together.

II. FREQUENT ITEMSET MINING

The study of frequent itemset (or pattern) mining [1,7] is well established in the data mining field because of its broad applications in mining association rules, correlations, graph patterns, constraint-based frequent patterns, sequential patterns, and many other data mining tasks. Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. The major challenge in frequent pattern mining is the large number of result patterns: as the minimum support threshold becomes lower, an exponentially large number of itemsets is generated. Pruning unimportant patterns effectively during the mining process has therefore become one of the main topics in frequent pattern mining. Consequently, the main aim is to optimize the process of finding patterns so that it is efficient and scalable and detects the important patterns, which can then be used in various ways.

III. RELATED WORK

A. Apriori

The most popular frequent item set mining algorithm, Apriori, was introduced by [1]. The item sets are checked in the order of increasing size (breadth-first/level-wise traversal of the prefix tree). The canonical form of item sets and the induced prefix tree are used to ensure that each candidate item set is generated at most once. The already generated levels are used to execute a priori pruning of the candidate item sets (using the Apriori property) [1,7] before accessing the transaction database to determine the support. Transactions are represented as simple arrays of items (so-called horizontal transaction representation, see also below). The support of a candidate item set is computed by checking whether it is a subset of a transaction, or by generating the subsets of a transaction and finding them among the candidates. For more detail refer to [10].

B. Eclat

The Eclat algorithm [6, 9, 10] is basically a depth-first search algorithm using set intersection. It uses a vertical database layout, i.e. instead of explicitly listing all transactions, each item is stored together with its cover (also called TID list), and it uses an intersection-based approach to compute the support of an item set. In this way, the support of an item set X can be computed by simply intersecting the covers of any two subsets Y, Z ⊆ X such that Y ∪ Z = X. In other words, when the database is stored in the vertical layout, the support of a set can be counted much more easily by intersecting the covers of two of its subsets that together give the set itself. Eclat essentially generates the candidate itemsets using only the join step from Apriori [1]. Again, all the items in the database are reordered in ascending order of support to reduce the number of candidate itemsets that are generated, and hence to reduce the number of intersections that need to be computed and the total size of the covers of all generated itemsets. Since the algorithm does not fully exploit the monotonicity property, but generates a candidate item set based on only two of its subsets, the number of candidate item sets generated is much larger than in a breadth-first approach such as Apriori. Only the join step can be used because the itemsets necessary for the prune step are not available [4].

C. SaM

The Split and Merge algorithm [3,8] is a simplification of the already fairly simple RElim (Recursive Elimination) algorithm [2]. While RElim represents a (conditional) database by storing one transaction list for each item (partially vertical representation), the split and merge algorithm employs only a single transaction list (purely horizontal representation), stored as an array. This array is processed with a simple split and merge scheme, which computes a conditional database and processes this conditional database recursively. Each array element consists of an occurrence counter and a pointer to the sorted transaction (an array of the contained items). This data structure is then processed recursively to find the frequent item sets. The basic operations of the recursive processing follow a depth-first/divide-and-conquer scheme. In the split step, the given array is split with respect to the leading item of the first transaction: all array elements referring to transactions starting with this item are transferred to a new array. In the merge step, the new array created in the split step and the rest of the original array are combined with a procedure that is almost identical to one phase of the well-known merge sort algorithm.
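The split and merge steps described above can be sketched in a few lines of Python. This is only an illustrative sketch of the original SaM scheme, not the authors' C implementation or a reference implementation: the function names (build_array, merge, sam, mine), the integer item codes and the alphabetical tie-break for items of equal frequency are assumptions made for this example.

```python
from collections import Counter


def build_array(transactions, min_support):
    """Build the sorted array of (count, items) elements that SaM works on."""
    freq = Counter(item for t in transactions for item in set(t))
    # The most frequent item gets code 0, so rarer items get larger codes.
    # Assumption: items with equal frequency are ordered alphabetically.
    frequent = sorted((i for i in freq if freq[i] >= min_support),
                      key=lambda i: (-freq[i], i))
    code = {item: rank for rank, item in enumerate(frequent)}
    counts = Counter()
    for t in transactions:
        # Keep frequent items only; the rarest item (largest code) leads.
        items = tuple(sorted((code[i] for i in set(t) if i in code),
                             reverse=True))
        if items:
            counts[items] += 1  # equal transactions share one element
    array = sorted(((n, items) for items, n in counts.items()),
                   key=lambda e: e[1], reverse=True)
    return array, frequent


def merge(left, right):
    """One merge-sort phase on two descending arrays; equal entries combine."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][1] > right[j][1]:
            out.append(left[i])
            i += 1
        elif left[i][1] < right[j][1]:
            out.append(right[j])
            j += 1
        else:
            out.append((left[i][0] + right[j][0], left[i][1]))
            i += 1
            j += 1
    return out + left[i:] + right[j:]


def sam(array, min_support, prefix=(), found=None):
    """Recursive split and merge; returns {tuple of item codes: support}."""
    if found is None:
        found = {}
    while array:
        item = array[0][1][0]  # leading item of the first transaction
        split, support, k = [], 0, 0
        # Split: move every transaction that starts with `item`.
        while k < len(array) and array[k][1][0] == item:
            count, items = array[k]
            support += count
            if len(items) > 1:
                split.append((count, items[1:]))  # drop the leading item
            k += 1
        # Merge: the rest of the array no longer contains `item`.
        array = merge(split, array[k:])
        if support >= min_support:
            found[prefix + (item,)] = support
            sam(split, min_support, prefix + (item,), found)
    return found


def mine(transactions, min_support):
    """Return {frozenset of items: support} for all frequent item sets."""
    array, frequent = build_array(transactions, min_support)
    coded = sam(array, min_support)
    return {frozenset(frequent[c] for c in codes): n
            for codes, n in coded.items()}


if __name__ == "__main__":
    db = ["ab", "abc", "ac", "bc", "abc"]  # each string is one transaction
    for itemset, support in mine(db, 3).items():
        print(sorted(itemset), support)
```

Because tuples are immutable, the split array can be reused for the recursion after the merge; an array-based C implementation would have to copy it first, since the merge consumes its inputs.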
The main reason for the merge operation in SaM [3,8] is to keep the list sorted, so that (1) all transactions with the same leading item are grouped together and (2) equal transactions (or transaction suffixes) can be combined, thus reducing the number of objects to process.

Fig. 1: The example database: (1) original form, (2) item frequencies, (3) transactions with sorted items, (4) lexicographically sorted transactions, and (5) the used data structure
Fig. 2: The basic operations of the Split and Merge algorithm: split (left) and merge (right).

The steps illustrated in Fig. 1 for a simple example transaction database are as follows [3,8]:
1) Step 1: Shows the transaction database in its original form.
2) Step 2: The frequencies of individual items are determined from this input in order to be able to discard infrequent items immediately. If we assume a minimum support of three transactions for our example, there are no infrequent items, so all items are kept.
3) Step 3: The (frequent) items in each transaction are sorted according to their frequency in the transaction database, since it is well known that processing the items in the order of increasing frequency usually leads to the shortest execution times.
4) Step 4: The transactions are sorted lexicographically into descending order, with item comparisons again being decided by the item frequencies, although here the item with the higher frequency precedes the item with the lower frequency.
5) Step 5: The data structure on which SaM operates is built by combining equal transactions and setting up an array in which each element consists of two fields: an occurrence counter and a pointer to the sorted transaction. This data structure is then processed recursively to find the frequent item sets.

The basic operations of the divide-and-conquer scheme [3,2] are shown in Fig. 2. In the split step (see the left part of the figure) the given array is split w.r.t. the leading item of the first transaction (item e in our example): all array elements referring to transactions starting with this item are transferred to a new array. In this process, the pointer (in)to the transaction is advanced by one item, so that the common leading item is removed from all transactions. Obviously, this new array represents all frequent item sets containing the split item (provided this item is frequent). The merge operation is illustrated likewise in the example (right part of the figure).

IV. PROBLEM WITH CURRENT SAM

Here we focus on frequent item set mining using the divide-and-conquer technique of the split and merge algorithm. As discussed in the example, a split item is selected first and the merge step is then used in finding the frequent item sets. Some problems arise when results are taken. The problem is critical at the initial point: it affects the selection of the split item from the item set and therefore affects the result. We discuss the problem with an example of such a situation.

Fig. 3: Problems with SaM

This example identifies the problem. There are 10 different transactions, as shown in Fig. 3 (left). The frequency of each item is shown in Fig. 3 (right): e=3, a=3, c=5, b=8, d=8. Here e and a have the same frequency, so how should the first split item for the algorithm be selected? Because both frequencies are the same in the first step, it is ambiguous whether to select e or a. In this type of situation the calculation has to stop at the initial point, and the SaM algorithm gives an affected result. We identified this problem and worked on a solution for the SaM algorithm, which is presented next.

V. MODIFIED MECHANISM

As discussed in the problem identification, when the first two items have the same frequency, the result is not proper. We propose one solution for this type of situation. When the algorithm is used to find frequent item sets over n different items, we consider the first two items that have the same frequency count and pass the minimum support.
Among them, the item to select depends on the number of transactions it contains. Suppose E has 3 transactions and A has 4 transactions; then the one with fewer transactions is selected, i.e. E.

Fig. 4: Problem Solution

We also modify the existing algorithm to reduce the total execution time. The current algorithm uses too much scanning and sorting, so its execution time is high. We modify the algorithm in such a way that the result is not affected but the execution time decreases. The steps of the modified algorithm are given below. The first two steps are the same as in the Split and Merge algorithm. The problem with the current split and merge algorithm discussed above is solved by this algorithm.
  • After the second step, first store all items that pass the minimum support in an array.
  • Then, according to the transactions, assign the remaining items to each item. If an item does not start any transaction, keep it as it is.
  • Remove the (single) least frequent item together with all its transactions.
  • Copy and store all transaction items.
  • Remove the next least frequent item together with all its transactions.
  • Copy and store all transaction items.
  • Repeat this until no transactions remain.

VI. EXPERIMENTS AND PERFORMANCE COMPARISON

We present experimental results showing that the modified split and merge method achieves reasonably good results in terms of time. We processed three datasets. The algorithm was implemented in C on Ubuntu 11.04 (Natty Narwhal, released in April 2011), on a machine with 2 GB of RAM, 8 processors and 20 GB of hard drive space.

A. Dataset Information

Data Set | Chess | Mushroom | PUMSB
Available at | Frequent Itemset Mining Dataset Repository [12] | Frequent Itemset Mining Dataset Repository [13] | Frequent Itemset Mining Dataset Repository [14]
Donated by | Roberto Bayardo | Roberto Bayardo | Roberto Bayardo
Total instances | 1,18,252 | 1,86,852 | 36,29,404
Total columns | 37 | 23 | 74
Total transactions | 3196 | 8124 | 49046
Attribute type | Numeric | Numeric | Numeric
No. of instances processed | All instances | All instances | All instances

Description (Chess): This data was collected by Roberto Bayardo from the UCI datasets. The dataset stores the moves of chess games as numeric values and lists chess end game positions for king vs. king and rook. The total number of transactions is 3196. It is a dense dataset.
Description (Mushroom): This data was collected by Roberto Bayardo from the UCI datasets. The dataset stores numeric values and describes poisonous and edible mushrooms by different attributes. The total number of transactions is 8124. It is a sparse dataset.
Description (PUMSB): This data was collected by Roberto Bayardo from PUMSB. The dataset stores numeric values. The total number of transactions is 49046. It is a sparse dataset.

Table 1: Dataset Information [11]

B. Results

We have taken results for the different datasets at several support thresholds, using the C implementation and platform described above. The results are given in the tables below. We report the average execution time of the Modified SaM and SaM algorithms [1, 3, 6] and also compare them with the Eclat algorithm [3], another frequent itemset mining algorithm. The datasets used are Chess, Mushroom and PUMSB [12, 13, 14].

Total time in seconds
Support (%)   MOD        SaM        Eclat
50            2.03       2.05       2.12
55            1.00       1.24       0.98
60            0.45       0.52       0.53
65            0.21       0.27       0.27
70            0.12       0.11       0.11
75            0.06       0.06       0.06
80            0.04       0.03       0.04
AVG           0.558571   0.611429   0.587143
Table 2: Execution Time of Chess dataset

As shown in Table 2, we have taken results for different support thresholds on the chess dataset, comparing supports from 50% to 80% with the total execution time of Eclat, our modified SaM and the original SaM. The execution time decreases as the support threshold increases. Modified SaM gives good results compared to the others. Eclat is the slowest at the lowest support threshold (2.12 s at 50%), although its average time lies between those of Modified SaM and SaM.

Fig. 5: Execution Time of Chess dataset

Fig. 5 shows that the execution time decreases as the support threshold increases from 50% to 80% for the chess dataset. On average, SaM and Eclat take more time than Modified SaM.

Table 3 below shows that, on the Mushroom dataset, the execution times of SaM, Modified SaM and Eclat are approximately the same for higher support thresholds and increase only slightly as the support decreases.

Total time in seconds
Support (%)   MOD        SaM        Eclat
50            0.05       0.06       0.06
55            0.05       0.05       0.06
60            0.05       0.05       0.05
65            0.04       0.05       0.05
70            0.04       0.04       0.04
75            0.04       0.04       0.04
80            0.04       0.04       0.04
AVG           0.044286   0.047143   0.048571
Table 3: Execution Time of Mushroom dataset

Fig. 6: Execution Time of Mushroom dataset

Fig. 6 shows that the execution times of SaM and Modified SaM are close to each other, and that the execution times of SaM, Modified SaM and Eclat are practically the same for higher support thresholds. According to the experimental results, the SaM algorithm performs excellently on dense data sets but shows certain weaknesses on sparse data sets.

As shown in Table 4, we have taken results for different support thresholds on the PUMSB dataset, comparing supports from 60% to 80% with the total execution time. The execution time decreases as the support threshold increases. Modified SaM performs better than SaM on this sparse dataset. On sparse datasets SaM does not perform well because of too much scanning and filtering. Modified SaM thus gives good results for both sparse and dense datasets. Eclat shows average performance on the PUMSB dataset.

Total time in seconds
Support (%)   MOD        SaM        Eclat
60            34.08      34.17      35.24
65            16.03      17.76      15.81
70            5.78       7.16       6.04
75            2.46       2.30       2.61
80            1.27       1.35       1.37
AVG           11.924     12.548     12.214
Table 4: Execution Time of PUMSB dataset

Fig. 7: Execution Time of PUMSB dataset
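The AVG rows of Tables 2 to 4 are plain arithmetic means over the listed support thresholds. The short Python sketch below recomputes them from the tabulated times and also reports which algorithm is fastest at each threshold. The timing values are copied from the tables; the names TIMES, averages and fastest_per_support are hypothetical helpers introduced only for this illustration.

```python
# Execution times in seconds, copied from Tables 2 to 4.
# Each row is (Modified SaM, SaM, Eclat) for one support threshold in percent.
ALGORITHMS = ("MOD", "SaM", "Eclat")

TIMES = {
    "Chess": {
        50: (2.03, 2.05, 2.12), 55: (1.00, 1.24, 0.98),
        60: (0.45, 0.52, 0.53), 65: (0.21, 0.27, 0.27),
        70: (0.12, 0.11, 0.11), 75: (0.06, 0.06, 0.06),
        80: (0.04, 0.03, 0.04),
    },
    "Mushroom": {
        50: (0.05, 0.06, 0.06), 55: (0.05, 0.05, 0.06),
        60: (0.05, 0.05, 0.05), 65: (0.04, 0.05, 0.05),
        70: (0.04, 0.04, 0.04), 75: (0.04, 0.04, 0.04),
        80: (0.04, 0.04, 0.04),
    },
    "PUMSB": {
        60: (34.08, 34.17, 35.24), 65: (16.03, 17.76, 15.81),
        70: (5.78, 7.16, 6.04), 75: (2.46, 2.30, 2.61),
        80: (1.27, 1.35, 1.37),
    },
}


def averages(table):
    """Mean execution time per algorithm over all support thresholds."""
    rows = list(table.values())
    return {name: sum(row[k] for row in rows) / len(rows)
            for k, name in enumerate(ALGORITHMS)}


def fastest_per_support(table):
    """For each support threshold, the algorithm(s) with the smallest time."""
    result = {}
    for support, row in table.items():
        best = min(row)
        result[support] = [name for name, t in zip(ALGORITHMS, row) if t == best]
    return result


if __name__ == "__main__":
    for dataset, table in TIMES.items():
        print(dataset)
        for name, avg in averages(table).items():
            print(f"  average {name}: {avg:.6f} s")
        for support, names in fastest_per_support(table).items():
            print(f"  support {support}%: fastest = {', '.join(names)}")
```

Listing the per-threshold winner next to the averages makes it visible where the averages hide differences, for example thresholds at which all three algorithms tie.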
Fig. 7 shows the execution time of all the algorithms at different support thresholds for the PUMSB dataset. The execution time decreases as the support threshold increases. Modified SaM gives good results compared to SaM. For lower support, our modified SaM does not give good performance on the PUMSB dataset.

VII. CONCLUSION AND FUTURE ENHANCEMENT

In this paper, we studied frequent itemset mining and some of its basic algorithms, along with one of the better algorithms, Split and Merge. From the analysis so far, we can say that SaM does not work in some situations, so we modified the current algorithm for finding the frequent itemsets. We observed frequent pattern mining algorithms and their execution times on specific datasets. In this work, an in-depth analysis was made of a few algorithms that have contributed significantly to improving the efficiency of frequent itemset mining. By comparing our results with classical frequent item set mining algorithms such as SaM and Eclat, the strengths and weaknesses of these algorithms were analyzed. According to the experimental results, the modified SaM algorithm performs excellently on dense data sets, and on sparse datasets up to a certain support limit. We found different problems in the SaM algorithm; if they are not solved, the result is affected. We therefore suggest two different methods for getting better results. According to the experimental results, the modified SaM algorithm performs excellently on the data sets compared to the original SaM and Eclat. Our algorithm can also be compared with other classical frequent itemset mining algorithms.
Modified SaM currently works better than all the other algorithms compared, but we plan to develop an algorithm that is more efficient and faster than the current version of Modified SaM; our main aim is to develop Modified SaM in such a way that it consumes less execution time than the current version. One idea for making it more effective in terms of execution time is to reduce scanning and sorting so that less preprocessing is needed than at present. A second extension of Modified SaM is to use a taxonomy that eliminates items that are not frequent at the beginning of the process, or to let the user decide which type of patterns he/she wants, so that time and memory are not wasted.

REFERENCES

[1] Christian Borgelt. Frequent Item Set Mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(6):437-456. J. Wiley & Sons, Chichester, United Kingdom, 2012.
[2] C. Borgelt. Keeping Things Simple: Finding Frequent Item Sets by Recursive Elimination. Proc. Workshop Open Software for Data Mining (OSDM'05 at KDD'05, Chicago, IL), 66–70. ACM Press, New York, NY, USA, 2005.
[3] Christian Borgelt and Xiaomeng Wang. (Approximate) Frequent Item Set Mining Made Simple with a Split and Merge Algorithm. Springer, 2010.
[4] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast Discovery of Association Rules. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. MIT Press, 1996.
[5] C.L. Blake and C.J. Merz. UCI Repository of Machine Learning Databases. Dept. of Information and Computer Science, University of California at Irvine, CA, USA, 1998.
[6] http://www.ics.uci.edu/˜mlearn/MLRepository
[7] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New Algorithms for Fast Discovery of Association Rules. Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining (KDD'97), 283–296. AAAI Press, Menlo Park, CA, USA, 1997.
[8] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. Proc. Conf. on Management of Data, 207–216. ACM Press, New York, NY, USA, 1993.
[9] C. Borgelt. SaM: Simple Algorithms for Frequent Item Set Mining. IFSA/EUSFLAT 2009 conference, 2009.
[10] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[11] Christian Borgelt. Efficient Implementations of Apriori and Eclat. Workshop of Frequent Item Set Mining Implementations (FIMI 2003), Melbourne, FL, USA, 2003.
[12] Frequent Itemset Mining Dataset Repository. (http://fimi.ua.ac.be/data)
[13] Roberto Bayardo, "Frequent Itemset Mining Dataset Repository, Chess Dataset". (http://fimi.ua.ac.be/data/chess.dat)
[14] Roberto Bayardo, "Frequent Itemset Mining Dataset Repository, Mushroom Dataset". (http://fimi.ua.ac.be/data/mushroom.dat.)
[15] Roberto Bayardo, "Frequent Itemset Mining Dataset Repository, PUMSB Dataset". (http://fimi.ua.ac.be/data/pumsb.dat.)