International Journal on Cybernetics & Informatics (IJCI) Vol. 4, No. 1, February 2015
DOI: 10.5121/ijci.2015.4102
COMPARATIVE ANALYSIS OF ASSOCIATION RULE GENERATION ALGORITHMS IN DATA STREAMS
Dr. S. Vijayarani¹ and Ms. R. Prasannalakshmi²
¹ Department of Computer Science, School of Computer Science & Engineering, Bharathiar University, Coimbatore, Tamilnadu, India.
² Department of Computer Science, School of Computer Science & Engineering, Bharathiar University, Coimbatore, Tamilnadu, India.
ABSTRACT
Data mining extracts useful and previously unknown information from huge databases. Data mining methods are generally designed for knowledge extraction from static databases; the currently available techniques are not appropriate for dynamic databases and have a number of limitations in managing them. A data stream manages dynamic data sets, and it has become one of the essential research domains in data mining. A data stream is fundamentally defined as a continuous and unbounded arrival of data that cannot be stored in full because of the storage capacity it would require. Analysing such data requires new data mining techniques. Data analysis is carried out using clustering, classification, frequent item set mining and association rule generation. Association rule mining is one of the significant research problems in data streams; it helps to find relationships between the data items in transactional databases. This research work concentrates on how traditional algorithms can be used for generating association rules in data streams. The algorithms used in this work are Assoc Outliers, Frequent Item Sets and Supervised Association Rule. The number of rules generated by an algorithm and its execution time are considered as the performance factors. Experimental results show that the Frequent Item Set algorithm performs better than the Assoc Outliers and Supervised Association Rule algorithms. The implementation was carried out in the Tanagra data mining tool.
KEY WORDS
Data Stream, Association Rules, Assoc Outliers, Frequent Item Sets, Supervised Association Rule, Tanagra.
1. INTRODUCTION
A data stream is a continuous arrival of data that is unbounded in nature. Its foremost characteristic is that it handles a huge volume of continuous, and possibly infinite, data [1] [8]. Application areas of data streams include market-basket data analysis, cross-marketing, catalogue design, loss-leader analysis, business organizations (processing credit card transactions), financial markets (stock exchanges), engineering and industrial processes (power supply and manufacturing), security (traffic engineering monitoring) and the web (web logs and webpage click streams). The essential data mining tasks performed on data streams are clustering, classification, association rule generation, query optimization and frequent item set mining [2].
Association rule mining finds frequent patterns, links, relationships and related structures among the data objects in databases and other repositories. It has two important steps: the first is to find the frequent data items, and the second is to generate association rules from these frequent data items [4] [7]. The association rule mining problem is defined as follows. Given a set of items $I = \{I_1, I_2, \ldots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$, an association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ are sets of items called item sets and $X \cap Y = \emptyset$ [5].
Two important measures, support and confidence, are used for association rule generation. The support of an item (or set of items) is the percentage of transactions in which that item (or set of items) occurs. The support ($s$) for an association rule $X \Rightarrow Y$ is the percentage of transactions in the database that contain $X \cup Y$. The confidence or strength ($\alpha$) for an association rule $X \Rightarrow Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain $X$. Confidence measures the strength of the rule, while support measures how frequently the rule occurs in the database [6]. Some of the important association rule mining algorithms are Apriori, FP-tree, FP-growth, dynamic item set counting, ECLAT, DCLAT and RARM.
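The two measures defined above can be illustrated with a minimal Python sketch. The function names and the toy transactions are hypothetical and chosen only for illustration; support is returned as a fraction rather than a percentage.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Support of X u Y divided by support of X, for the rule X => Y."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0
    return support(frozenset(x) | frozenset(y), transactions) / support_x


# Toy transactions (illustrative data, not taken from the connect data set).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support({"bread", "milk"}, transactions)       # 2 of 4 transactions
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 with bread
```

Here the rule bread ⇒ milk holds in two of the three transactions that contain bread, so its confidence is higher than its support.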
This research work mainly focuses on generating association rules from data streams. The continuous arrival of data is divided into partitions called windows, and each window is stored in the form of a database. Association rule generation algorithms are then applied to every partition. In this work, the traditional association rule algorithms Assoc Outliers, Frequent Item Sets and Supervised Association Rule are used for generating association rules in each partition. From this we learn the advantages, drawbacks and limitations of these conventional association rule mining algorithms for generating association rules in data streams [8].
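The windowing step described above can be sketched as follows. This is a minimal illustration, assuming fixed-size, non-overlapping windows; the helper name `fixed_windows` is hypothetical, and the stream here is a plain range of integers standing in for arriving transactions.

```python
from itertools import islice


def fixed_windows(stream, size):
    """Yield consecutive, non-overlapping windows of at most `size` items."""
    if size <= 0:
        raise ValueError("window size must be positive")
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return
        yield window


# Ten arriving records split into windows of four; the last window is shorter.
windows = list(fixed_windows(range(10), 4))
```

Because the generator consumes the stream lazily, each window can be handed to a rule generation algorithm and then discarded, so the stream never has to be stored in full.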
The rest of this paper is organized as follows. Section 2 explains the proposed methodology and the traditional association rule algorithms. Section 3 discusses the experimental results, and Section 4 concludes the paper [16].
2. PROPOSED METHODOLOGY
The system architecture of the proposed work is represented in Figure 1.
Figure 1. System Architecture
2.1 Dataset
The connect data set is used in this work. It is extracted from http://fimi.ua.ac.be/data/connect.dat. It consists of 67,558 instances and 48 attributes. In this work, 1K, 2K, 5K and 10K instances are used. For the data stream setting, we assume that the continuous arrival of data is partitioned into five fixed-size windows, i.e. W1, W2, ..., W5 [17].
Association Rule Generation
In order to generate association rules, three algorithms are used:
Assoc Outliers (Association Outliers).
Frequent Item Set Mining.
Supervised Association Rule.
2.1.1 Association Outliers
An association outlier algorithm is used to build rules from an attribute-value dataset. The important terms used in this algorithm are:
$A_1, A_2, \ldots, A_m$ are attributes.
$D_1, D_2, \ldots, D_m$ are data items.
Let $z^{(i)}$ be the $i$-th occurrence (event) of $z$, and let $z^{(i)}.A_k$ be the value of attribute $A_k$ for event $i$. Then $z^{(i)}$ can be represented as $z^{(i)} = (z_1^{(i)}, z_2^{(i)}, \ldots, z_m^{(i)})$, where $z_k^{(i)} = z^{(i)}.A_k \in D_k$, $k \in \{1, \ldots, m\}$. $Z$ is the set of all events.
Table 1. Pseudo Code for Association Outliers
Step 1- Input: a database DB containing the record set, and a rule set R
Step 2-
1. Initialize i to 0
2. For each transaction t ∈ DB
3.   Set the initial candidate set for transaction t: Ct^0
4.   For (i = 0, R' = R; ; i++)
5.     Repeat until the candidate set stops growing
6.       temp = NULL
7.       For each rule r = X -> Y ∈ R'
8.         If X == Ct^i then
9.           Append Y to temp and delete r from R'
10.      Ct^(i+1) = Ct^i ∪ temp; i++
11.  Score of transaction t = |Ct^i − Ct^0| / |Ct^+|
12. Return NULL
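One way to read Table 1 is as a closure computation: rules are applied to a transaction until no new items can be added, and the transaction is scored by how much it grew. The Python sketch below follows that reading under stated assumptions, none of which is confirmed by the table itself: the initial candidate set is the transaction, a rule fires when its antecedent is contained in the current candidate set (a subset test), and the score is normalized by the size of the final set. It is not Tanagra's implementation, and all names are hypothetical.

```python
def associative_closure(transaction, rules):
    """Grow the transaction by firing rules (X, Y) until nothing new is added.

    Assumption: a rule fires when X is a subset of the current candidate set.
    """
    closure = set(transaction)
    remaining = list(rules)
    grew = True
    while grew:
        grew = False
        added = set()
        unfired = []
        for antecedent, consequent in remaining:
            if antecedent <= closure:
                added |= consequent          # append Y to temp
            else:
                unfired.append((antecedent, consequent))
        remaining = unfired                  # fired rules are deleted from R'
        if not added <= closure:
            closure |= added
            grew = True
    return closure


def outlier_degree(transaction, rules):
    """|closure - t| / |closure|: share of expected items that t is missing."""
    original = set(transaction)
    closure = associative_closure(original, rules)
    if not closure:
        return 0.0
    return len(closure - original) / len(closure)


# Illustrative rules a => b and b => c.
rules = [(frozenset({"a"}), frozenset({"b"})),
         (frozenset({"b"}), frozenset({"c"}))]

degree_missing = outlier_degree({"a"}, rules)             # b and c are missing
degree_complete = outlier_degree({"a", "b", "c"}, rules)  # nothing is missing
```

Under this reading, a transaction that already contains everything the rules imply scores 0, and the score approaches 1 as more implied items are absent.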
2.1.2 Frequent Item Set Algorithm
The definition of a frequent item set is intuitive: a set of items that appears in many baskets is said to be "frequent". To be formal, assume there is a number s, called the support threshold. If I is a set of items, the support for I is the number of baskets of which I is a subset, and I is frequent if its support is at least s. Frequent item sets were originally applied to the analysis of true market baskets. That is, supermarkets and chain stores record the contents of every market basket (physical shopping cart) brought to the register for checkout. Here the "items" are the different products that the store sells, and the "baskets" are the sets of items in a single market basket. A major chain might sell 100,000 different items and collect data about millions of market baskets. By finding frequent item sets, a retailer can learn which items are frequently purchased together [12][16].
Table 2. Pseudo Code for Frequent Item Sets
Pseudo code
Step 1- Ck: candidate item set of size k; Lk: frequent item set of size k
Step 2-
1. L1 = {frequent items};
2. for (k = 1; Lk != ∅; k++) do
3.   Ck+1 = candidates generated from Lk;
4.   for each transaction t in the database do
5.     increment the count of all candidates in Ck+1 that are contained in t;
6.   endfor;
7.   Lk+1 = candidates in Ck+1 with min_support;
8. endfor; return ∪k Lk;
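The level-wise procedure in Table 2 can be sketched in Python as below. This is a minimal, unoptimized illustration rather than the implementation used in Tanagra; `min_support` is assumed to be an absolute transaction count, and the function name and sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent item set: count}; `min_support` is an absolute count."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)

    k = 1
    while level:
        # C(k+1): join pairs from Lk, keep candidates whose k-subsets are all in Lk.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One scan of the database per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        level = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(level)
        k += 1
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=3)
```

In this example every single item and every pair is frequent, while the triple {a, b, c} appears in only two baskets and is dropped. Note that the loop scans the transactions once per level, which is the repeated-scan cost that matters for streams.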
2.1.3 SPV Assoc Rule (Supervised Association Rule)
Association rule mining was originally developed to relate variables that all have the same status. Predictive (supervised) association rules instead explore the associations between items that characterize a dependent attribute, so the algorithm is used in a supervised learning framework. The algorithm itself is not really modified: the search for rules is simply restricted to item sets that include the dependent variable, which reduces the computation time. Two components of Tanagra are devoted to this task: SPV Assoc Rule and SPV Assoc Rule Tree. Compared with the standard approaches, the Tanagra components have an additional feature: the user can specify the class value ("dependent variable = value") to be predicted. This is important when the prior probabilities of the dependent variable's values are very different. The approach was introduced in the context of multivariate characterization of groups of individuals, and its results can be compared with those of the group characterization component [18].
Table 3. Pseudo Code for Supervised Association Rules
Supervised_Association_Rule_APRIORI
Step 1- Input: candidate item sets of size 1 and 2
Step 2-
1. If k = 2 (supervised item sets)
2.   For each frequent item set f ∈ F1 do
3.     Insert f into the candidate set C1
4.   End for
5.   Split C1 into C1_class_label (the class-label items) and C1_other (the other frequent item sets)
6.   For each candidate item set c1 ∈ C1_class_label do {
7.     Generate the item sets of class-label items and non-class-label items:
8.     For each candidate item set c2 ∈ C1_other do {
9.       Form c by combining c1 and c2
10.      Insert the class-label candidate item set c into C2
11.  } }
12.  For each candidate item set c1 ∈ C1_class_label do {
13.    Identify all the class labels in the array C1_class_label that come after c1 (Cpost)
14.    For each candidate item set c2 ∈ Cpost do {
15.      Form c by combining c1 and c2
16.      Insert c into C2
17.  } } Else
18.  For each i1 ∈ Ci {
19.    For each i2 ∈ Fk-1 {
20.      If (the first k-2 items of i1 and i2 are the same) ∧ (the last items of i1 and i2 are different) {
21.        Form c from the first k-1 items of i1 and the last item of i2
22.        Insert c into Ck
23.  } } } }
24. Return Ck
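The central idea of supervised association rules, restricting the search to rules whose consequent is a chosen class value, can be illustrated with a small Python sketch. It enumerates antecedents by brute force up to a hypothetical `max_len` instead of using the candidate generation of Table 3, so it shows the restriction only and is not the Tanagra component. The function name, parameters and sample rows are assumptions made for the example.

```python
from itertools import combinations


def supervised_rules(rows, target, min_support, min_confidence, max_len=2):
    """Find rules `antecedent => target` over (items, label) rows.

    Returns a list of (antecedent, support, confidence) tuples, where support
    and confidence are fractions. Antecedents are enumerated exhaustively.
    """
    n = len(rows)
    items = sorted({item for row_items, _ in rows for item in row_items})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            covered = [label for row_items, label in rows if a <= row_items]
            if not covered:
                continue
            both = sum(1 for label in covered if label == target)
            sup = both / n                # share of rows with antecedent and target
            conf = both / len(covered)    # share of covered rows with the target
            if sup >= min_support and conf >= min_confidence:
                rules.append((a, sup, conf))
    return rules


# Illustrative rows: a set of items plus a class label.
rows = [
    ({"x", "y"}, "win"),
    ({"x"}, "win"),
    ({"y"}, "loss"),
    ({"x", "y"}, "win"),
]
found = supervised_rules(rows, target="win", min_support=0.5, min_confidence=0.8)
```

Only rules that predict the chosen class value are ever scored, which is why the supervised variant explores far fewer item sets than unrestricted rule mining.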
3. EXPERIMENTAL RESULTS
The connect data set is used in this work. It is extracted from http://fimi.ua.ac.be/data/connect.dat. It consists of 67,558 instances and 48 attributes. In this work, 1K, 2K, 5K and 10K instances are used. The continuous arrival of data is partitioned into five windows with a fixed size, i.e. W1, W2, W3, W4, W5 [16]. The number of rules generated and the execution time are considered as the performance factors.

Table 4. Rule Generation for Association Outliers
(Number of rules; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        231        231        328        359
W2        199        57         187        359
W3        234        125        156        421
W4        231        251        312        499
W5        241        297        297        484

Table 5. Execution Time for Association Outliers
(Time in ms; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        190        234        278        240
W2        256        125        121        199
W3        44         74         184        202
W4        220        256        120        240
W5        303        375        109        183

Figure 2. Association Outliers for Rule Generation.
[Bar chart: number of rules per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 55.]
Figure 2 shows the number of association rules generated by the Association Outliers algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows.

Figure 3. Execution Time for Association Outliers.
[Bar chart: time computation per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 55.]
Figure 3 shows the computation time of the Association Outliers algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows.

Table 6. Rule Generation for Frequent Item Set
(Number of rules; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        2220       2240       2290       2375
W2        2234       2250       2210       2241
W3        2315       2311       2342       2386
W4        2936       2913       2918       2954
W5        2940       2932       2948       2979

Table 7. Time Computation for Frequent Item Set
(Time in s; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        0.03       0.02       0.03       0.04
W2        0.01       0.1        0          0.02
W3        0.01       0.01       0.09       0.02
W4        0.02       0          0.01       0.03
W5        0.04       0.01       0.01       0.09

Figure 4. Rule Generation for Frequent Item Set.
[Bar chart: number of rules per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 55.]
Figure 4 shows the number of association rules generated by the frequent item set algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows.

Figure 5. Time Computation for Frequent Item Set.
[Bar chart: time computation per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 55.]
Figure 5 shows the computation time of the frequent item set algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows. The experimental results show that the frequent item set algorithm performs better than the Association Outliers algorithm and the SPV Association Rule algorithm.

Table 8. Rule Generation for SPV Association Rule
(Number of rules; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        216        218        212        220
W2        122        135        131        120
W3        145        156        167        144
W4        202        199        256        193
W5        181        201        210        198

Table 9. Time Computation for SPV Association Rule
(Time in ms; threshold σ = 25, C = 55)
Window    1000 Ds    2000 Ds    5000 Ds    10,000 Ds
W1        62         64         51         53
W2        31         33         31         30
W3        46         47         46         61
W4        46         23         46         44
W5        33         31         33         64

Figure 6. Rule Generations for SPV Association Rule.
[Bar chart: number of rules per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 45.]
Figure 6 shows the number of association rules generated by the Supervised Association Rule algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows.

Figure 7. Time Computation for SPV Association Rule.
[Bar chart: time computation per window (W1 to W5) for 1000, 2000, 5000 and 10,000 Ds; threshold σ = 25, C = 55.]
Figure 7 shows the computation time of the Supervised Association Rule algorithm for the 1K, 2K, 5K and 10K data sets, under the support and confidence thresholds, for the five windows.
4. CONCLUSION
The main objective of this work is to compare traditional association rule mining algorithms for generating association rules in data streams. From the experimental results, it is observed that the frequent item set mining algorithm performs well and produces better results than the Association Outliers and SPV Association Rule mining algorithms. These algorithms scan the database more than once and hence need more execution time. In future, new algorithms are to be developed in order to reduce the number of scans and the execution time.
REFERENCES
[1] Aggarwal, C. (2003). A Framework for Diagnosing Changes in Evolving Data Streams. ACM SIGMOD Conference.
[2] Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. Proc. 20th VLDB Conference, Santiago, Chile, 1994.
[3] A. Savasere, E. Omiecinski, and S.B. Navathe, “An efficient algorithm for mining association rules in large databases,” Intl. Conf. on Very Large Databases, pp. 432–444, 1995.
[4] Charu C. Aggarwal, “Data Stream Models and Algorithms”, data streaming book, Springer, 2009.
[5] Christian Hidber. Online Association Rule Mining. SIGMOD ’99, Philadelphia, PA. ACM 1-58113-084-8/99/05, 1999.
[6] Charanjeet Kaur, “Association Rule Mining using Apriori Algorithm: A Survey”, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), ISSN: 2278 – 1323, Volume 2, Issue 6, June 2013.
[7] Arun K. Pujari, “Data Mining Techniques”.
[8] Charu C. Aggarwal, “Data Streams: An Overview and Scientific Applications”.
[9] Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”.
[10] Frequent item set mining data set repository, http:// fimi.cshelsinki.fi/data/
[11] Han, J., Kamber, M.: “Data Mining Concepts and Techniques”, Morgan Kaufmann Publishers, 2006.
[12] Kamini Nalavade, B.B. Meshram, “Finding Frequent Item sets using Apriori Algorithm to Detect Intrusions in Large Dataset”, International Journal of Computer Applications & Information Technology, Vol. 6, Issue I, June July 2014 (ISSN: 2278-7720).
[13] Jing Guo, Peng Zhang, Jianlong Tan and Li Guo, “Mining frequent patterns across multiple data streams”, 2011.
[14] Nan Jiang and Le Gruenwald, “Research Issues in Data Stream Association Rule Mining”, SIGMOD Record, Vol. 35, No. 1, Mar. 2006.
[15] Rakesh Agrawal, Ramakrishnan Srikant; Fast Algorithms for Mining Association Rules; Int'l Conf. on Very Large Databases; September 1994.
[16] S. Vijayarani et al., “Mining Frequent Item Sets over Data Streams using Éclat Algorithm”, International Conference on Research Trends in Computer Technologies (ICRTCT - 2013), Proceedings published in International Journal of Computer Applications® (IJCA) (0975 – 8887) 27.
[17] Website: Tanagra.software.informer.com.
[18] Website: http://data-mining-tutorials.blogspot.com/2008/11/supervised-association-rules.html.
AUTHORS
Dr. S. Vijayarani, MCA, M.Phil, Ph.D, is working as Assistant Professor in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of research interest are data mining, privacy and security issues in data mining, and data streams. She has published papers in international journals and presented research papers in international and national conferences.
Ms. R. Prasannalakshmi has completed her M.C.A in Computer Applications. She is currently pursuing her M.Phil in Computer Science in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of interest are data streams and privacy preservation in data mining.

More Related Content

PDF
Efficient Temporal Association Rule Mining
PDF
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
PDF
Dy33753757
PDF
Comparative study of frequent item set in data mining
PDF
Output Privacy Protection With Pattern-Based Heuristic Algorithm
PDF
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules
PDF
A statistical data fusion technique in virtual data integration environment
PDF
Cf33497503
Efficient Temporal Association Rule Mining
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
Dy33753757
Comparative study of frequent item set in data mining
Output Privacy Protection With Pattern-Based Heuristic Algorithm
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules
A statistical data fusion technique in virtual data integration environment
Cf33497503

What's hot (14)

PDF
Intelligent Supermarket using Apriori
PDF
Clustering of Big Data Using Different Data-Mining Techniques
PDF
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
PDF
An improved apriori algorithm for association rules
PDF
IRJET- Effecient Support Itemset Mining using Parallel Map Reducing
PDF
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
PDF
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
PDF
REVIEW: Frequent Pattern Mining Techniques
PDF
Mining Fuzzy Association Rules from Web Usage Quantitative Data
PDF
IRJET- Missing Data Imputation by Evidence Chain
PDF
IRJET-Comparative Analysis of Apriori and Apriori with Hashing Algorithm
PDF
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
PPTX
Mining frequent patterns association
PDF
A Trinity Construction for Web Extraction Using Efficient Algorithm
Intelligent Supermarket using Apriori
Clustering of Big Data Using Different Data-Mining Techniques
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An improved apriori algorithm for association rules
IRJET- Effecient Support Itemset Mining using Parallel Map Reducing
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
REVIEW: Frequent Pattern Mining Techniques
Mining Fuzzy Association Rules from Web Usage Quantitative Data
IRJET- Missing Data Imputation by Evidence Chain
IRJET-Comparative Analysis of Apriori and Apriori with Hashing Algorithm
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Mining frequent patterns association
A Trinity Construction for Web Extraction Using Efficient Algorithm
Ad

Viewers also liked (19)

PDF
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
PDF
Techniques for detection of solitary pulmonary nodules in human lung and thei...
PDF
DIGITAL INVESTIGATION USING HASHBASED CARVING
PDF
4515ijci01
PDF
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACH
PDF
INVIVO PATTERN RECOGNITION AND DIGITAL IMAGE ANALYSIS OF SHEAR STRESS DISTRIB...
PDF
A HYBRID K-HARMONIC MEANS WITH ABCCLUSTERING ALGORITHM USING AN OPTIMAL K VAL...
PDF
OPENING RANGE BREAKOUT STOCK TRADING ALGORITHMIC MODEL
PDF
EFFECTIVE BANDWIDTH ANALYSIS OF MIMO BASED MOBILE CLOUD COMPUTING
PDF
GENERATION OF SYNTHETIC POPULATION USING MARKOV CHAIN MONTE CARLO SIMULATION ...
PDF
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
PDF
DG FED MULTILEVEL INVERTER BASED D-STATCOM FOR VARIOUS LOADING CONDITIONS
PDF
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
PDF
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODS
PDF
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDS
PDF
A study of index poisoning in peer topeer
PDF
A SURVEY OF THE S TATE OF THE A RT IN Z IG B EE
PDF
P LACEMENT O F E NERGY A WARE W IRELESS M ESH N ODES F OR E-L EARNING...
PDF
E FFICIENT D ATA R ETRIEVAL F ROM C LOUD S TORAGE U SING D ATA M ININ...
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
Techniques for detection of solitary pulmonary nodules in human lung and thei...
DIGITAL INVESTIGATION USING HASHBASED CARVING
4515ijci01
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACH
INVIVO PATTERN RECOGNITION AND DIGITAL IMAGE ANALYSIS OF SHEAR STRESS DISTRIB...
A HYBRID K-HARMONIC MEANS WITH ABCCLUSTERING ALGORITHM USING AN OPTIMAL K VAL...
OPENING RANGE BREAKOUT STOCK TRADING ALGORITHMIC MODEL
EFFECTIVE BANDWIDTH ANALYSIS OF MIMO BASED MOBILE CLOUD COMPUTING
GENERATION OF SYNTHETIC POPULATION USING MARKOV CHAIN MONTE CARLO SIMULATION ...
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
DG FED MULTILEVEL INVERTER BASED D-STATCOM FOR VARIOUS LOADING CONDITIONS
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODS
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDS
A study of index poisoning in peer topeer
A SURVEY OF THE S TATE OF THE A RT IN Z IG B EE
P LACEMENT O F E NERGY A WARE W IRELESS M ESH N ODES F OR E-L EARNING...
E FFICIENT D ATA R ETRIEVAL F ROM C LOUD S TORAGE U SING D ATA M ININ...
Ad

Similar to Comparative analysis of association rule generation algorithms in data streams (20)

PDF
J0945761
PDF
PDF
An Approach of Improvisation in Efficiency of Apriori Algorithm
PDF
5 parallel implementation 06299286
PDF
B0950814
PDF
Review on: Techniques for Predicting Frequent Items
PDF
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
PDF
Discovering Frequent Patterns with New Mining Procedure
PDF
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
PDF
IRJET- Improving the Performance of Smart Heterogeneous Big Data
PDF
Ej36829834
PDF
Scalable frequent itemset mining using heterogeneous computing par apriori a...
PDF
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
PDF
Volume 2-issue-6-2081-2084
PDF
Volume 2-issue-6-2081-2084
PDF
A literature review of modern association rule mining techniques
PDF
An improvised frequent pattern tree
PDF
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
PDF
An Improved Differential Evolution Algorithm for Data Stream Clustering
J0945761
An Approach of Improvisation in Efficiency of Apriori Algorithm
5 parallel implementation 06299286
B0950814
Review on: Techniques for Predicting Frequent Items
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
Discovering Frequent Patterns with New Mining Procedure
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
IRJET- Improving the Performance of Smart Heterogeneous Big Data
Ej36829834
Scalable frequent itemset mining using heterogeneous computing par apriori a...
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Volume 2-issue-6-2081-2084
Volume 2-issue-6-2081-2084
A literature review of modern association rule mining techniques
An improvised frequent pattern tree
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
An Improved Differential Evolution Algorithm for Data Stream Clustering


Comparative analysis of association rule generation algorithms in data streams

  • 1. International Journal on Cybernetics & Informatics (IJCI) Vol. 4, No. 1, February 2015. DOI: 10.5121/ijci.2015.4102

COMPARATIVE ANALYSIS OF ASSOCIATION RULE GENERATION ALGORITHMS IN DATA STREAMS

Dr. S. Vijayarani (1) and Ms. R. Prasannalakshmi (2)
(1) Department of Computer Science, School of Computer Science & Engineering, Bharathiar University, Coimbatore, Tamilnadu, India.
(2) Department of Computer Science, School of Computer Science & Engineering, Bharathiar University, Coimbatore, Tamilnadu, India.

ABSTRACT
Data mining extracts useful and previously unknown information from huge databases. Generally, data mining methods are applied to static databases for knowledge extraction, whereas the currently available techniques are not appropriate for dynamic databases and have a number of limitations in managing them. A data stream handles dynamic data sets, and it has become one of the essential research domains in data mining. A data stream is fundamentally defined as a continuous and unbounded arrival of data that cannot be stored in full because of the storage capacity it would need. Analysing such data requires new data mining techniques. Data analysis is carried out by using clustering, classification, frequent item set mining and association rule generation. Association rule mining is one of the significant research problems in data streams; it helps to find the relationships between the data items in transactional databases. This research work concentrates on how the traditional algorithms are used for generating association rules in data streams. The algorithms used in this work are Assoc Outliers, Frequent Item Sets and Supervised Association Rule. The number of rules generated by an algorithm and its execution time are considered as the performance factors.
Experimental results show that the Frequent Item Set algorithm is more efficient than the Assoc Outliers and Supervised Association Rule algorithms. This work is implemented in the Tanagra data mining tool.

KEY WORDS
Data Stream, Association Rules, Assoc Outliers, Frequent Item Sets, Supervised Association Rule, Tanagra.

1. INTRODUCTION
A data stream is a continuous arrival of data that is unbounded in nature. The foremost characteristic of a data stream is that it carries a large volume of continuous and possibly infinite data [1] [8]. The application areas of data streams include market-basket data analysis, cross-marketing, catalogue design, loss-leader analysis, business organizations (processing credit card transactions), financial markets (stock exchanges), engineering and industrial processes (power supply and manufacturing), security (traffic engineering monitoring) and the web (web logs and web page click streams). The essential data mining tasks performed on data streams are clustering, classification, association rule generation, query optimization and frequent item set mining [2].
  • 2. Association rules describe the frequent patterns, links, relationships and related structures among the data objects in databases and other repositories. There are two important steps in association rule mining: the first is to find the frequent data items, and the second is to generate association rules from these frequent data items [4] [7].

The association rule mining problem is defined as follows. Given a set of items I = {I_1, I_2, …, I_m} and a database of transactions D = {t_1, t_2, …, t_n}, where t_i = {I_i1, I_i2, …, I_ik} and I_ij ∈ I, an association rule is an implication of the form X ⇒ Y, where X, Y ⊂ I are sets of items called item sets and X ∩ Y = ∅ [5].

Two important measures, support and confidence, are used for association rule generation. The support of an item (or set of items) is the percentage of transactions in which that item (or set of items) occurs. The support (s) of an association rule X ⇒ Y is the percentage of transactions in the database that contain X ∪ Y. The confidence or strength (α) of an association rule X ⇒ Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X. Usually, confidence measures the strength of the rule, while support measures how frequently the rule occurs in the database [6]. Some of the important association rule mining algorithms are Apriori, FP-tree, FP-growth, dynamic item set counting, ECLAT, DCLAT and RARM.

This research work mainly focuses on generating association rules from data streams. The continuous arrival of data is divided into partitions called windows, and each window is stored in the form of a database. Association rule generation algorithms are then applied to every partition to generate the association rules.
In this work, the traditional association rule algorithms, specifically Assoc Outliers, Frequent Item Sets and Supervised Association Rule, are used for generating association rules in each partition. This reveals the advantages, drawbacks and limitations of these conventional association rule mining algorithms for generating association rules in data streams [8]. The remainder of this paper is organized as follows. Section 2 explains the proposed methodology and the traditional association rule algorithms. Section 3 discusses the experimental results, and Section 4 gives the conclusion [16].

2. PROPOSED METHODOLOGY
The system architecture of the proposed work is shown in Figure 1.
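The support and confidence measures defined in the introduction can be made concrete with a minimal sketch. The function names, the toy transactions and the choice to report both measures as percentages are assumptions made for illustration; they are not taken from the paper or from Tanagra.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Percentage of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return 100.0 * hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Percentage of the transactions containing X that also contain Y."""
    x_items = frozenset(x)
    xy_items = x_items | frozenset(y)
    count_x = sum(1 for t in transactions if x_items <= t)
    if count_x == 0:
        return 0.0
    count_xy = sum(1 for t in transactions if xy_items <= t)
    return 100.0 * count_xy / count_x


# Hypothetical toy database of four transactions.
toy_db = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Rule {bread} => {milk}: 2 of 4 transactions contain both items,
# and 2 of the 3 transactions with bread also contain milk.
rule_support = support({"bread", "milk"}, toy_db)
rule_confidence = confidence({"bread"}, {"milk"}, toy_db)
```

A rule would be kept only when both values reach the chosen thresholds, which is how the σ and C values in the experimental section are presumably applied.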
  • 3. Figure 1. System Architecture

2.1 Dataset
The connect data set is used in this work. It is extracted from http://fimi.ua.ac.be/data/connect.dat. It consists of 67,558 instances and 48 attributes. In this work, 1K, 2K and 5K instances are used. For the data stream, we assume that the continuous arrival of data is partitioned into five windows of fixed size, i.e. W1, W2, W3, ..., Wn [17].

Association Rule Generation
Three algorithms are used to generate the association rules: Assoc Outliers (Association Outliers), Frequent Item Set Mining and Supervised Association Rule.

2.1.1 Association Outliers
The association outlier algorithm is used to build rules from an attribute-value dataset. The important terms used in this algorithm are as follows. A_1, A_2, …, A_m are attributes and D_1, D_2, …, D_m are the corresponding sets of data items. Let z^(i) be the i-th event, and let z^(i).A_k be the value of attribute A_k for event i. Then z^(i) can be represented as z^(i) = (z_1^(i), z_2^(i), …, z_m^(i)), where z_k^(i) = z^(i).A_k ∈ D_k, k ∈ {1, …, m}. Z is the set of all events.
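The fixed-size windowing described in Section 2.1 can be sketched as follows. This is only an illustration: the paper does not state the window size or how the records are read, so the function name `fixed_size_windows`, the window size and the toy records are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fixed_size_windows(stream: Iterable[str],
                       window_size: int) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping windows of at most `window_size` records."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, window_size))
        if not window:
            return  # the stream is exhausted
        yield window


# Hypothetical stream of 10 records split into windows of 2 records each,
# which gives five windows W1..W5.
records = (f"record-{i}" for i in range(10))
windows = list(fixed_size_windows(records, window_size=2))
```

Because the generator consumes the stream lazily, each window can be mined and then discarded, which matches the idea that a stream cannot be stored in full.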
  • 4. Table 1. Pseudo Code for Association Outliers
Step 1 - Input: a database DB of records (transactions) and a rule set R
Step 2 -
1. Initialize i to 0 (NULL)
2. For each transaction t ∈ DB
3.   Set the initial candidate set of the transaction, Ct^0
4.   For (i = 0, R' = R; ; i++)
5.     Repeat while the candidate set keeps growing
6.       temp = NULL
7.       For each rule r = X -> Y in R'
8.         If X == Ct^i then
9.           Append Y to temp and delete r from R'
10.      Ct^(i+1) = Ct^i ∪ temp; i++
11.  Value for transaction t = |Ct^i − Ct^0| / |Ct^i|
12. Return NULL

2.1.2 Frequent Item Set Algorithm
The idea behind frequent item set mining is intuitive: a set of items that appears in many baskets (containers) is said to be "frequent". Formally, there is a number s, called the support threshold. If I is a set of items, the support for I is the number of baskets of which I is a subset. Frequent item sets are widely used in supermarkets, and their original purpose was the analysis of true market baskets. That is, supermarkets and chain stores record the contents of every market basket (physical shopping cart) brought to the register for checkout. Here the "items" are the different products that the store sells, and the "baskets" are the sets of items in a single market basket. A major chain might sell 100,000 different items and collect data about millions of market baskets. By finding frequent item sets, a retailer can find out which items are frequently purchased together [12][16].

Table 2. Pseudo Code for Frequent Item Sets
Step 1 -
Ck: candidate item sets of size k
Lk: frequent item sets of size k
Step 2 -
1. L1 = {frequent items};
2. for (k = 1; Lk != ∅; k++) do
3.   Ck+1 = candidates generated from Lk;
4.   for each transaction t in the database do
5.     increment the count of all candidates in Ck+1 that are contained in t;
6.   end for;
7.   Lk+1 = candidates in Ck+1 with min_support;
8. end for;
return ∪k Lk;
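The level-wise loop of Table 2 can be sketched in Python as below. This is a minimal, unoptimized illustration of the Apriori-style procedure, not the Tanagra implementation; the function name `apriori`, the use of an absolute support count and the toy transactions are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]],
            min_support_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent item set with its support count."""
    # L1: count single items and keep the frequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 1
    while current:
        # C(k+1): join frequent k-item sets, then prune any candidate
        # that has an infrequent k-item subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One scan of the database to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # L(k+1): candidates that reach the minimum support.
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy window of five transactions.
toy_window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_sets = apriori(toy_window, min_support_count=3)
```

With a minimum count of 3, the three single items and the three pairs are frequent, while {a, b, c} appears in only two transactions and is dropped. Each pass over `transactions` corresponds to one database scan, which is the cost the conclusion of the paper refers to.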
  • 5. 2.1.3 SPV Assoc Rule (Supervised Association Rule)
The association rule method was originally developed to describe relations between variables that have the same status. Predictive association rules explore the associations between the items that characterize a dependent attribute, so this algorithm is used in a supervised learning framework. The algorithm itself is not really modified: the search for association rules is simply limited to item sets that include the dependent variable, which reduces the computation time. Two Tanagra components are devoted to this task: SPV Assoc Rule and SPV Assoc Rule Tree. Compared with the conventional approaches, the Tanagra components have an additional feature: the user can specify the class value ("dependent variable = value") to be predicted. This is crucial, for instance, when the prior probabilities of the dependent variable values are very different. The components were also designed with the multivariate characterization of groups of individuals in mind, and they can be compared to the group characterization component [18].

Table 3. Pseudo Code for Supervised Association Rules
Supervised_Association_Rule_APRIORI
Step 1 - Input: candidate item sets of size 1 and 2
Step 2 -
1. If k = 2
2.   For each frequent item set f ∈ F1 do
3.     Insert f into the candidate set C1
4.   End for
5.   C1_class_label, C1_other = split of C1 by the class label CL: the class label items go into C1_class_label and the other frequent items into C1_other
6.   For each candidate item set c1 ∈ C1_class_label do {
7.     Generate the item sets made of class label items and non class label items:
8.     For each candidate item set c2 ∈ C1_other do {
9.       Form c from c1 and c2
10.      Insert the class label candidate item set c into C2
11.  } }
12.  For each candidate item set c1 ∈ C1_class_label do {
13.    C_post = all the class labels in the array C1_class_label that come after c1
14.    For each candidate item set c2 ∈ C_post do {
15.      Form c from c1 and c2
16.      Insert c into C2
17.  } }
   Else
18.  For each item set i1 in Ci {
19.    For each item set i2 in the frequent item sets Fk-1 {
20.      If (i1 and i2 share the same first k-2 items) and (the last items of i1 and i2 differ) {
21.        Form c from the first k-1 items of i1 and the last item of i2
22.        Insert c into Ck
23.  } } }
24. Return Ck
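The central idea of Section 2.1.3, keeping only rules whose consequent is a chosen class value, can be illustrated with a small sketch. It uses brute-force enumeration of short antecedents instead of the Apriori-style candidate generation of Table 3, and it is not the Tanagra SPV Assoc Rule component. The function name `supervised_rules`, the `max_len` limit, the thresholds expressed as fractions and the toy rows are all assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Row = Tuple[Set[str], str]                   # (items of the record, class label)
Rule = Tuple[FrozenSet[str], float, float]   # (antecedent, support, confidence)


def supervised_rules(rows: List[Row], target: str, min_support: float,
                     min_confidence: float, max_len: int = 2) -> List[Rule]:
    """Rules of the form antecedent => (class = target) that pass both thresholds."""
    n = len(rows)
    items = sorted({item for record, _ in rows for item in record})
    rules: List[Rule] = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            # Labels of the rows that contain the whole antecedent.
            covered = [label for record, label in rows if a <= record]
            if not covered:
                continue
            hits = sum(1 for label in covered if label == target)
            sup = hits / n                # fraction of rows with antecedent and target
            conf = hits / len(covered)    # fraction of covered rows with the target
            if sup >= min_support and conf >= min_confidence:
                rules.append((a, sup, conf))
    return rules


# Hypothetical labelled records.
toy_rows = [
    ({"x", "y"}, "win"),
    ({"x"}, "win"),
    ({"y"}, "loss"),
    ({"x", "y"}, "win"),
]
win_rules = supervised_rules(toy_rows, target="win",
                             min_support=0.5, min_confidence=0.9)
```

Restricting the consequent to one class value means that item sets without the class item are never turned into rules, which is the source of the reduced computation time mentioned above.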
  • 6. 3. EXPERIMENTAL RESULTS
The connect data set is used in this work. It is extracted from http://fimi.ua.ac.be/data/connect.dat. It consists of 67,558 instances and 48 attributes. In this work, 1K, 2K and 5K instances are used. The continuous arrival of data is partitioned into five windows with a fixed size, i.e. W1, W2, W3, W4, W5 [16]. The number of rules generated and the execution time are considered as the performance factors.

Table 4. Rule Generation for Association Outliers (number of rules, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       231       231       328       359
W2       199       57        187       359
W3       234       125       156       421
W4       231       251       312       499
W5       241       297       297       484

Table 5. Execution Time for Association Outliers (time in ms, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       190       234       278       240
W2       256       125       121       199
W3       44        74        184       202
W4       220       256       120       240
W5       303       375       109       183

Figure 2. Association Outliers for Rule Generation.
Figure 2 shows the number of association rules generated by the association outlier algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows.

  • 7. Table 6. Rule Generation for Frequent Item Set (number of rules, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       2220      2240      2290      2375
W2       2234      2250      2210      2241
W3       2315      2311      2342      2386
W4       2936      2913      2918      2954
W5       2940      2932      2948      2979

Figure 3. Execution time for Association Outliers.
Figure 3 shows the computation time of the association outlier algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows.

Table 7. Time Computation for Frequent Item Set (time in s, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       0.03      0.02      0.03      0.04
W2       0.01      0.1       0         0.02
W3       0.01      0.01      0.09      0.02
W4       0.02      0         0.01      0.03
W5       0.04      0.01      0.01      0.09
  • 8. Figure 4. Rule Generation for Frequent Item Set.
Figure 4 shows the number of association rules generated by the frequent item set algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows.

Table 8. Rule Generation for SPV Association Rule (number of rules, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       216       218       212       220
W2       122       135       131       120
W3       145       156       167       144
W4       202       199       256       193
W5       181       201       210       198

Figure 5. Time Computation for Frequent Item Set.
Figure 5 shows the computation time of the frequent item set algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows. The experimental results of the frequent item set algorithm are better than those of the association outlier algorithm and the SPV association rule algorithm.

  • 9. Table 9. Time Computation for SPV Association Rule (time in ms, threshold σ = 25, C = 55)
Window   1000 Ds   2000 Ds   5000 Ds   10,000 Ds
W1       62        64        51        53
W2       31        33        31        30
W3       46        47        46        61
W4       46        23        46        44
W5       33        31        33        64

Figure 6. Rule Generations for SPV Association Rule.
Figure 6 shows the association rules generated by the Supervised Association Rule algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows.
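The comparison behind the statement that the frequent item set algorithm performs best can be checked with a short calculation over the reported values. The sketch below assumes that each flattened table is read row by row (W1 to W5, four dataset sizes per window) and that each block of values belongs to the algorithm whose unit label and figure axis range it matches; the dictionary names are hypothetical.

```python
from statistics import mean

# Number of rules per window and dataset size, as reported in the tables.
rules = {
    "Association Outliers": [231, 231, 328, 359, 199, 57, 187, 359, 234, 125,
                             156, 421, 231, 251, 312, 499, 241, 297, 297, 484],
    "Frequent Item Set": [2220, 2240, 2290, 2375, 2234, 2250, 2210, 2241, 2315, 2311,
                          2342, 2386, 2936, 2913, 2918, 2954, 2940, 2932, 2948, 2979],
    "SPV Association Rule": [216, 218, 212, 220, 122, 135, 131, 120, 145, 156,
                             167, 144, 202, 199, 256, 193, 181, 201, 210, 198],
}

# Frequent item set times are reported in seconds; convert them to milliseconds.
frequent_time_s = [0.03, 0.02, 0.03, 0.04, 0.01, 0.1, 0, 0.02, 0.01, 0.01,
                   0.09, 0.02, 0.02, 0, 0.01, 0.03, 0.04, 0.01, 0.01, 0.09]
time_ms = {
    "Association Outliers": [190, 234, 278, 240, 256, 125, 121, 199, 44, 74,
                             184, 202, 220, 256, 120, 240, 303, 375, 109, 183],
    "Frequent Item Set": [1000 * t for t in frequent_time_s],
    "SPV Association Rule": [62, 64, 51, 53, 31, 33, 31, 30, 46, 47,
                             46, 61, 46, 23, 46, 44, 33, 31, 33, 64],
}

summary = {
    name: {"total_rules": sum(rules[name]), "mean_time_ms": mean(time_ms[name])}
    for name in rules
}
most_rules = max(summary, key=lambda name: summary[name]["total_rules"])
fastest = min(summary, key=lambda name: summary[name]["mean_time_ms"])
```

Under these assumptions, the same algorithm both generates the most rules and has the lowest mean execution time, which is consistent with the conclusion drawn in the paper.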
  • 10. Figure 7. Time Computation for SPV Association Rule.
Figure 7 shows the computation time of the Supervised Association Rule algorithm for the 1K, 2K, 5K and 10K datasets with the two thresholds, i.e. the support and confidence values, for five windows.

4. CONCLUSION
The main objective of this work is to compare the traditional association rule mining algorithms for generating association rules in data streams. From the experimental results, it is observed that the frequent item set mining algorithm performs well and produces better results than the association outliers and SPV association rule mining algorithms. These algorithms scan the database more than once and hence need more execution time. In future, new algorithms are to be developed in order to reduce the number of scans and the execution time.

REFERENCES
[1] Aggarwal C (2003). A Framework for Diagnosing Changes in Evolving Data Streams. ACM SIGMOD Conference.
[2] Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. Proc. 20th VLDB Conference, Santiago, Chile, 1994.
[3] A. Savasere, E. Omiecinski, and S.B. Navathe, "An efficient algorithm for mining association rules in large databases," Intl. Conf. on Very Large Databases, pp. 432–444, 1995.
[4] Charu C. Aggarwal, "Data Stream Models and Algorithms", data streaming book, Springer, 2009.
[5] Christian Hidber. Online Association Rule Mining. SIGMOD '99, Philadelphia PA. ACM 1-58113-084-8/99/05, 1999.
[6] Charanjeet Kaur, Association Rule Mining using Apriori Algorithm: A Survey, ISSN: 2278 – 1323, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 2, Issue 6, June 2013.
[7] "Data Mining Techniques" by Arun K. Pujari.
[8] "Data Streams: An Overview and Scientific Applications", Charu C. Aggarwal.
[9] "Data Mining: Introductory and Advanced Topics", Margaret H. Dunham.
[10] Frequent item set mining data set repository, http:// fimi.cshelsinki.fi/data/
[11] Han, J., Kamber, M.: "Data Mining Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
[12] Kamini Nalavade, B.B. Meshram, "Finding Frequent Item sets using Apriori Algorithm to Detect Intrusions in Large Dataset", International Journal of Computer Applications & Information Technology, Vol. 6, Issue I, June July 2014 (ISSN: 2278-7720), p. 84.
  • 11. [13] "Mining frequent patterns across multiple data streams", Jing Guo, Peng Zhang, Jianlong Tan and Li Guo, 2011.
[14] Nan Jiang and Le Gruenwald, "Research Issues in Data Stream Association Rule Mining", SIGMOD Record, Vol. 35, No. 1, Mar. 2006.
[15] Rakesh Agrawal, Ramakrishnan Srikant; Fast Algorithms for Mining Association Rules; Int'l Conf. on Very Large Databases; September 1994.
[16] S. Vijayarani et al, "Mining Frequent Item Sets over Data Streams using Éclat Algorithm", International Conference on Research Trends in Computer Technologies (ICRTCT - 2013), Proceedings published in International Journal of Computer Applications® (IJCA) (0975 – 8887) 27.
[17] Website: Tanagra.software.informer.com.
[18] Website: http://data‐mining‐tutorials.blogspot.com/2008/11/supervised‐association‐rules.html.

AUTHORS
Dr. S. Vijayarani, MCA, M.Phil, Ph.D, is working as Assistant Professor in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of research interest are data mining, privacy and security issues in data mining, and data streams. She has published papers in international journals and presented research papers at international and national conferences.

Ms. R. Prasannalakshmi has completed her M.C.A in Computer Applications. She is currently pursuing her M.Phil in Computer Science in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of interest are data streams and privacy preserving in data mining.