Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S. A. Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Data Mining and Warehousing (CO314)
Unit –V: Frequent Pattern Analysis
Content
 Market Basket Analysis, frequent itemsets, closed itemsets & association
rules, mining multilevel association rules, constraint-based association rule
mining
 Generating Association Rules from Frequent Itemsets, Apriori Algorithm,
Improving the Efficiency of Apriori, FP Growth Algorithm
 Mining Various Kinds of Association Rules: mining multilevel association
rules, constraint-based association rule mining, metarule-guided mining of
association rules.
What Is Frequent Pattern Analysis?
 Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
 First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
 Motivation: Finding inherent regularities in data
 What products were often purchased together?— Beer and diapers?!
 What are the subsequent purchases after buying a PC?
 What kinds of DNA are sensitive to this new drug?
 Can we automatically classify web documents?
 Applications
 Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why is Frequent Pattern Analysis Important?
 Freq. pattern: An intrinsic and important property of
datasets
 Foundation for many essential data mining tasks
 Association, correlation, and causality analysis
 Sequential, structural (e.g., sub-graph) patterns
 Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
 Classification: discriminative, frequent pattern analysis
 Cluster analysis: frequent pattern-based clustering
 Data warehousing: iceberg cube and cube-gradient
 Semantic data compression: fascicles
 Broad applications
Basic Concepts: Frequent Patterns
Customer
buys diaper
Customer
buys both
Customer
buys beer
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
 itemset: A set of one or more
items
 k-itemset X = {x1, …, xk}
 (absolute) support, or, support
count of X: Frequency or
occurrence of an itemset X
 (relative) support, s, is the
fraction of transactions that
contains X (i.e., the probability
that a transaction contains X)
 An itemset X is frequent if X’s
support is no less than a minsup
threshold
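The support definitions above can be checked directly against the five-transaction table on this slide. A minimal sketch in Python (transactions as sets, itemset containment as subset test):

```python
# The five transactions from the slide's table.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def support_count(itemset, transactions):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Beer", "Diaper"}, transactions))  # 3
print(support({"Beer", "Diaper"}, transactions))        # 0.6
```

With minsup = 50%, {Beer, Diaper} (support 60%) is frequent, while {Milk} (support 40%) is not.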
Basic Concepts: Association Rules
Customer
buys diaper
Customer
buys both
Customer
buys beer
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
 Find all the rules X → Y with
minimum support and confidence
 support, s, probability that a
transaction contains X ∪ Y
 confidence, c, conditional
probability that a transaction
having X also contains Y
Let minsup = 50%, minconf = 50%
Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3,
{Beer, Diaper}:3
 Association rules: (many more!)
 Beer → Diaper (60%, 100%)
 Diaper → Beer (60%, 75%)
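The two rules above can be verified by computing support and confidence directly from the transaction table; a short sketch:

```python
# The five transactions from the slide's table.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def rule_stats(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    n = len(transactions)
    sup_xy = sum(1 for t in transactions if X | Y <= t)  # contains X union Y
    sup_x = sum(1 for t in transactions if X <= t)       # contains X
    return sup_xy / n, sup_xy / sup_x

print(rule_stats({"Beer"}, {"Diaper"}, transactions))   # (0.6, 1.0)
print(rule_stats({"Diaper"}, {"Beer"}, transactions))   # (0.6, 0.75)
```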
The Downward Closure Property and Scalable Mining Methods
 The downward closure property of frequent patterns
 Any subset of a frequent itemset must be frequent
 If {beer, diaper, nuts} is frequent, so is {beer, diaper}
i.e., every transaction having {beer, diaper, nuts} also contains
{beer, diaper}
 Scalable mining methods: Three major approaches
 Apriori (Agrawal & Srikant@VLDB’94)
 Freq. pattern growth (FPgrowth—Han, Pei & Yin @SIGMOD’00)
 Vertical data format approach (Charm—Zaki & Hsiao
@SDM’02)
Apriori: A Candidate Generation & Test Approach
 Apriori pruning principle: If there is any itemset which is infrequent, its
superset should not be generated/tested! (Agrawal & Srikant
@VLDB’94, Mannila, et al. @ KDD’ 94)
 Method:
 Initially, scan DB once to get frequent 1-itemset
 Generate length (k+1) candidate itemsets from length k frequent
itemsets
 Test the candidates against DB
 Terminate when no frequent or candidate set can be generated
The Apriori Algorithm—An Example 1
The Apriori Algorithm—An Example 2
The Apriori Algorithm—An Example 3
A database has five transactions. Let min_sup = 60% and min_conf = 80%.
(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the
efficiency of the two mining processes.
(b) List all the strong association rules with support s and confidence c
The Apriori Algorithm Pseudo-Code
Ck: candidate itemset of size k
Lk: frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that
        are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
Implementation of Apriori
 How to generate candidates?
 Step 1: self-joining Lk
 Step 2: pruning
 Example of Candidate-generation
 L3={abc, abd, acd, ace, bcd}
 Self-joining: L3*L3
 abcd from abc and abd
 acde from acd and ace
 Pruning:
 acde is removed because ade is not in L3
 C4 = {abcd}
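The self-join and pruning steps above can be sketched directly; run on the slide's L3, the code reproduces C4 = {abcd} and prunes acde because its subset ade is not in L3:

```python
from itertools import combinations

def gen_candidates(Lk, k):
    """Self-join Lk with itself, then prune candidates having any
    infrequent k-subset (the Apriori downward closure property)."""
    joined = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
    return {c for c in joined
            if all(frozenset(s) in Lk for s in combinations(c, k))}

L3 = {frozenset(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
C4 = gen_candidates(L3, 3)
print(C4)  # {frozenset({'a', 'b', 'c', 'd'})}
```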
Further Challenges and Improvement in Apriori
 Major computational challenges
 Multiple scans of transaction database
 Huge number of candidates
 Tedious workload of support counting for candidates
 Improving Apriori: general ideas
 Reduce passes of transaction database scans
 Shrink number of candidates
 Facilitate support counting of candidates
Improving the Efficiency of Apriori
 Hash-based technique
 Hashing itemsets into corresponding buckets
 A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck,
for k > 1
 Transaction reduction
 Reducing the number of transactions scanned in future iterations
 A transaction that does not contain any frequent k-itemsets cannot contain any frequent
(k+1)-itemsets.
 Therefore, such a transaction can be marked or removed from further consideration
because subsequent database scans for j-itemsets, where j > k, will not need to
consider such a transaction.
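A minimal sketch of transaction reduction: once the frequent k-itemsets Lk are known, drop every transaction that contains none of them before the next scan. On the slide's table with L2 = {{Beer, Diaper}}, only the first three transactions survive:

```python
def reduce_transactions(transactions, Lk):
    """Keep only transactions containing at least one frequent k-itemset;
    the rest cannot contribute to any frequent (k+1)-itemset."""
    return [t for t in transactions if any(c <= t for c in Lk)]

transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
L2 = [frozenset({"Beer", "Diaper"})]
reduced = reduce_transactions(transactions, L2)
print(len(reduced))  # 3
```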
Improving the Efficiency of Apriori
 Partitioning
 Partitioning the data to find candidate itemsets
 Sampling
 Mining on a subset of the given data
 The basic idea of the sampling approach is to pick a random sample S of the given data
D, and then search for frequent itemsets in S instead of D. In this way, we trade off
some degree of accuracy against efficiency
 Dynamic item set counting
 Adding candidate itemsets at different points during a scan
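The sampling idea above can be sketched as follows (a hypothetical helper: draw a random sample S of D, then mine S, typically with a somewhat lowered minsup to reduce the chance of missing globally frequent itemsets):

```python
import random

def sample_db(D, fraction, seed=42):
    """Pick a random sample S of the transaction database D.
    `fraction` is the share of D to keep; `seed` makes runs repeatable."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(D)))
    return rng.sample(D, k)

# Example: mine a 20% sample instead of the full database.
D = [{"item%d" % i} for i in range(100)]
S = sample_db(D, 0.2)
print(len(S))  # 20
```

Frequent itemsets found in S are then verified against the full D, trading a little accuracy for fewer full scans.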
Pattern-Growth Approach: Mining Frequent Patterns
Without Candidate Generation
 Bottlenecks of the Apriori approach
 Breadth-first (i.e., level-wise) search
 Candidate generation and test
 Often generates a huge number of candidates
 The FP Growth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’ 00)
 Depth-first search
 Avoid explicit candidate generation
 Major philosophy: Grow long patterns from short ones using local
frequent items only
 “abc” is a frequent pattern
 Get all transactions having “abc”, i.e., project DB on abc: DB|abc
 “d” is a local frequent item in DB|abc ⇒ abcd is a frequent pattern
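The projection step can be illustrated with a simplified sketch (this shows the conditional-database idea behind FP-growth, not the FP-tree structure itself): DB|pattern keeps only transactions containing the pattern, with the pattern items removed, so any locally frequent item extends the pattern.

```python
def project(transactions, pattern):
    """Conditional (projected) database DB|pattern: transactions that
    contain `pattern`, with the pattern's own items removed."""
    return [t - pattern for t in transactions if pattern <= t]

transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
db = project(transactions, {"Beer", "Diaper"})
print(db)  # [{'Nuts'}, {'Coffee'}, {'Eggs'}]
```

Any item frequent inside DB|{Beer, Diaper} would extend {Beer, Diaper} to a longer frequent pattern, which is exactly how long patterns grow from short ones.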
Construct FP-tree from a Transaction Database Example 1
Construct FP-tree from a Transaction Database Example 1 cont…
Construct FP-tree from a Transaction Database Example 1 cont…
Construct FP-tree from a Transaction Database Example 1 cont…
Construct FP-tree from a Transaction Database Example 2
Mining Various Kinds of Association Rules: Mining
multilevel association rules
 Pattern Mining in Multilevel, Multidimensional Space
 Multilevel associations involve concepts at different abstraction levels.
 Multidimensional associations involve more than one dimension or
predicate (e.g., rules that relate what a customer buys to his or her
age).
 Quantitative association rules involve numeric attributes that have an
implicit ordering among values (e.g., age).
Mining Multilevel Associations
 For many applications, strong associations discovered at high abstraction
levels, though having high support, often represent common-sense knowledge
 We may want to drill down to find novel patterns at more detailed levels.
 A concept hierarchy defines a sequence of mappings from a set of low-level concepts
to a higher-level, more general concept set
 Data can be generalized by replacing low-level concepts within the data by their
corresponding higher-level concepts, or ancestors, from a concept hierarchy.
 Association rules generated from mining data at multiple abstraction levels are called
multiple-level or multilevel association rules
 Multilevel association rules can be mined efficiently using concept hierarchies under a
support-confidence framework.
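Generalization with a concept hierarchy can be sketched as a simple item-to-ancestor mapping (the hierarchy below is a hypothetical two-level example, not from the slides):

```python
# Hypothetical concept hierarchy: low-level item -> higher-level ancestor.
hierarchy = {
    "2% milk": "milk", "skim milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
}

def generalize(transaction, hierarchy):
    """Replace each low-level item with its ancestor; items with no
    entry in the hierarchy are kept as-is."""
    return {hierarchy.get(item, item) for item in transaction}

print(generalize({"2% milk", "wheat bread"}, hierarchy))  # {'milk', 'bread'}
```

Mining the generalized transactions yields high-level rules (e.g., milk ⇒ bread); drilling down to the raw items yields the more detailed lower-level rules.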
Mining Multilevel Associations
Mining Multilevel Associations: Multilevel
mining with uniform support
Mining Multilevel Associations: Multilevel
mining with reduced support
Constraint-Based Frequent Pattern Mining
 Constraint-based association rule mining aims to develop a systematic method
by which the user can find important associations among items in a database of
transactions.
 The users specify intuition or expectations as constraints to confine the search
space.
 This strategy is known as constraint-based mining. The constraints can include
the following:
 Knowledge type constraints
 Data constraints
 Interestingness constraints
 Rule constraints
Constraint-Based Frequent Pattern Mining
 Knowledge type constraints: These specify the type of knowledge to be
mined, such as association, correlation, classification, or clustering.
 Data constraints: These specify the set of task-relevant data.
 Dimension/level constraints: These specify the desired dimensions (or
attributes) of the data, the abstraction levels, or the level of the concept
hierarchies to be used in mining.
 Interestingness constraints: These specify thresholds on statistical measures of
rule interestingness such as support, confidence, and correlation.
Constraint-Based Frequent Pattern Mining
 Rule constraints: These specify the form of, or conditions on, the rules to be
mined. Such constraints may be expressed as meta rules (rule templates), as
the maximum or minimum number of predicates that can occur in the rule
antecedent or consequent, or as relationships among attributes, attribute
values, and/or aggregates.
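A rule constraint of the kind described above can be sketched as a simple filter (the constraint values here are hypothetical, chosen for illustration): keep only rules whose antecedent has at most a given number of items and whose consequent mentions a required item.

```python
def satisfies(rule, max_antecedent_items, required_in_consequent):
    """Rule-constraint check: bound the antecedent size and require a
    specific item to appear in the consequent."""
    antecedent, consequent = rule
    return (len(antecedent) <= max_antecedent_items
            and required_in_consequent in consequent)

rules = [
    ({"Beer"}, {"Diaper"}),
    ({"Beer", "Nuts", "Eggs"}, {"Diaper"}),
    ({"Diaper"}, {"Beer"}),
]
kept = [r for r in rules
        if satisfies(r, max_antecedent_items=2, required_in_consequent="Diaper")]
print(kept)  # [({'Beer'}, {'Diaper'})]
```

Pushing such constraints into the mining process itself, rather than filtering afterwards, is what makes constraint-based mining efficient.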
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon
Reference
 Han, Jiawei; Kamber, Micheline; Pei, Jian, “Data Mining: Concepts and
Techniques”, Elsevier, ISBN: 9780123814791, 9780123814807.
 https://onlinecourses.nptel.ac.in/noc24_cs22