Apriori Algorithm: Hash Based and Graph Based Modifications
Agenda: Data Mining, Association Rules, Apriori Algorithm, Hash Based Method, Graph Based Approach, Conclusion and Future Work
Data Mining Data mining is the process of extracting patterns (knowledge) from data. The aim of data mining is to automate the discovery of interesting patterns and trends in a given dataset. It is seen as an increasingly important tool by modern businesses for transforming data into business intelligence, giving an informational advantage. It is currently used in a wide range of profiling practices, scientific discovery, and decision making.
Association Rules Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. For example, a rule found in the sales data of a supermarket might indicate that a customer who buys onions and potatoes together is also likely to buy burgers. Such information can be used as the basis for decisions about marketing activities.
Problem Description Given the item universe I = {A, B, C}, the possible (non-empty) itemsets are: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}.
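For a small item universe like this, the candidate itemsets can be enumerated directly. A minimal sketch using Python's standard library:

```python
from itertools import combinations

items = ["A", "B", "C"]

# All non-empty itemsets over I = {A, B, C}: 2^3 - 1 = 7 in total.
itemsets = [set(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

print(itemsets)
# [{'A'}, {'B'}, {'C'}, {'A','B'}, {'A','C'}, {'B','C'}, {'A','B','C'}]
```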
Support and Confidence
Support (A -> B) = (No. of transactions containing A & B) / (No. of total transactions)
Confidence (A -> B) = (No. of transactions containing A & B) / (No. of transactions containing A)
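Both measures can be computed directly from a list of transactions. A short sketch; the tiny example database here is an illustrative assumption, not from the slides:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A & B) / support(A) for the rule A -> B."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

db = [{"onion", "potato", "burger"},
      {"onion", "potato"},
      {"milk", "bread"},
      {"onion", "potato", "burger", "milk"}]

print(support(db, {"onion", "potato"}))                 # 3/4 = 0.75
print(confidence(db, {"onion", "potato"}, {"burger"}))  # 2/3
```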
Original Apriori Algorithm
L1 = {large 1-itemsets}
for (k = 2; Lk ≠ ∅; k++) do begin
    Ck = apriori-gen(Lk-1)        // new candidates
    for all transactions t Є D do begin
        Ct = subset(Ck, t)        // candidates contained in t
        for all candidates c Є Ct do
            c.count++
    end
    Lk = {c Є Ck | c.count >= minsup}
end
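A minimal runnable Python sketch of the pseudocode above, using sets of frozensets in place of the hash-tree subset function (the example database is the one from the later slide):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all frequent itemsets (frozensets) with count >= minsup."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s for s, c in counts.items() if c >= minsup}
    frequent = set(L)
    k = 2
    while L:
        # apriori-gen: join step, then prune candidates with an
        # infrequent (k-1)-subset
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in L
                             for s in combinations(c, k - 1))}
        # count candidates contained in each transaction
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        L = {c for c, cnt in counts.items() if cnt >= minsup}
        frequent |= L
        k += 1
    return frequent

db = [{"b", "e"},
      {"a", "b", "c", "e", "f", "g"},
      {"b", "c", "e", "f", "g"},
      {"a", "c", "g"}]
result = apriori(db, minsup=3)
# frequent itemsets: {b}, {c}, {e}, {g}, {b,e}, {c,g}
```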
Hash based method
repeat    // for each transaction of the database
{
    D = {set of all possible k-itemsets in the i-th transaction}
    for each element of D
    {
        find a unique integer uniq_int using the hash function for the k-itemset
        increment freq[uniq_int]
    }
    increment trans_pos    // moves the pointer to the next transaction
} until end_of_file
for (freq_ind = 0; freq_ind < length_of_the_array(freq[]); freq_ind++)
{
    if (freq[freq_ind] >= required_support)
        mark the corresponding k-itemset
}
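The loop above can be sketched in Python. Note one assumption: a modular bucket hash is used here rather than the perfect hash discussed in the conclusion, so colliding itemsets share a counter and the marked set may over-count (it is a superset of the truly frequent k-itemsets); the table size is also an assumed parameter:

```python
from itertools import combinations

def hash_count_k_itemsets(transactions, k, minsup, table_size=1024):
    """Hash-based counting sketch: bucket every k-itemset of every
    transaction into a frequency array, then mark the k-itemsets whose
    bucket count reaches minsup."""
    freq = [0] * table_size
    for t in transactions:                       # one pass over the DB
        for itemset in combinations(sorted(t), k):
            freq[hash(itemset) % table_size] += 1
    # second pass: mark k-itemsets whose bucket met the threshold
    marked = set()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            if freq[hash(itemset) % table_size] >= minsup:
                marked.add(frozenset(itemset))
    return marked

db = [{"b", "e"},
      {"a", "b", "c", "e", "f", "g"},
      {"b", "c", "e", "f", "g"},
      {"a", "c", "g"}]
marked = hash_count_k_itemsets(db, k=2, minsup=3)
```

Because a bucket's count is at least the count of any single itemset hashed into it, every truly frequent k-itemset is guaranteed to be marked; false positives from collisions would be filtered in a later pass.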
Graph based approach
Procedure FrequentItemGraph (Tree, F)
{
    scan the DB once to collect the frequent 2-itemsets and their supports, sorted ascending
    add all items in the DB as the header nodes
    for each 2-itemset entry (in top-down order) in freq2list do
        if (first item = item in header node) then
            create a link to the corresponding header node
    i = 3
    for each i-itemset entry in the tree do
        call buildsubtree(F)
}
Procedure buildsubtree (F)
{
    if (first (i-1)-itemset = itemsets in their respective header nodes) then
        create a link to the corresponding header node
    i = i + 1
    repeat buildsubtree(F)
}
Example:
Transaction   Items
T1            b e
T2            a b c e f g
T3            b c e f g
T4            a c g
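Counting pair supports in this example database confirms, for instance, that {b, e} (in T1, T2, T3) and {c, g} (in T2, T3, T4) each appear in three transactions:

```python
from itertools import combinations
from collections import Counter

# The example database from the slide.
db = {"T1": {"b", "e"},
      "T2": {"a", "b", "c", "e", "f", "g"},
      "T3": {"b", "c", "e", "f", "g"},
      "T4": {"a", "c", "g"}}

pair_counts = Counter()
for items in db.values():
    pair_counts.update(combinations(sorted(items), 2))

print(pair_counts[("b", "e")])  # 3
print(pair_counts[("c", "g")])  # 3
print(pair_counts[("a", "c")])  # 2
```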
Conclusion and Future Work In order to continue with the hashing method, we need a perfect hash function h(e1, e2, …, ek). Such a hash function can be obtained as follows:
h(e1, e2, …, ek) = prm(1)^e1 + prm(2)^e2 + … + prm(k)^ek
where prm is the sequence of prime numbers, prm = {2, 3, 5, 7, …}.
Although this hash function guarantees a unique key for every itemset, it requires an impractical amount of memory. For example, consider an original item set X with only 10 items, hashed as T = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The 4-itemset (1, 2, 3, 10) is hashed to the value 282475385, so a large memory space is reserved without being used effectively. Other perfect hash functions, used for hashing strings, are not applicable here, because their input variables are limited to 26, the number of letters in the alphabet, while the number of items in a database can be much larger than this.
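The arithmetic in this example can be checked directly; the list of prime bases below is an assumption sized for itemsets of up to 10 elements:

```python
def perfect_hash(itemset, primes=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)):
    """h(e1, ..., ek) = prm(1)^e1 + ... + prm(k)^ek,
    the hash function proposed in the slides."""
    return sum(p ** e for p, e in zip(primes, itemset))

# 2^1 + 3^2 + 5^3 + 7^10 = 2 + 9 + 125 + 282475249
print(perfect_hash((1, 2, 3, 10)))  # 282475385
```

The value illustrates the memory blow-up: a frequency array indexed by this hash would need hundreds of millions of slots even for a 10-item universe.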
Future work: use hashing techniques to find the frequent 2-itemsets efficiently, in order to reduce the time and memory required to build the graphical structure.
