Sriskandarajah Suhothayan, Kasun Gajasinghe, Isuru Loku Narangoda, Subash Chaturanga
Outline: Introduction; Basic principles; Solution patterns.
Introduction: Graphs can be seen everywhere. In computer science, a graph is viewed as an abstract data structure that represents relationships among data.
Graph based data mining: Graph based data mining finds useful and understandable patterns in graph representations of data. Its main subject area is identifying frequently occurring subgraph patterns.
Approaches: In the recent past, significant work has been done in this subject area to develop algorithms that mine graph data efficiently. This paper discusses several well-known algorithms under the following categories: mathematical graph theory based approaches; greedy search based approaches; the inductive logic programming approach; inductive database based approaches.
Applications: Bioinformatics (mining biochemical structures, finding biologically conserved subnetworks); chemical compound analysis; web browsing pattern analysis; intrusion network analysis; mining communication networks.
Basic Principles: Subgraph categories: general subgraphs, induced subgraphs, connected subgraphs. Subgraph isomorphism problem: decide whether there exists a one-to-one mapping from the vertices of one graph to vertices of another graph that preserves the edges.
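The subgraph isomorphism check described on this slide can be sketched by brute force. This is only an illustration under assumptions not in the slides: graphs are undirected and unlabelled, given as a vertex list plus an edge list, and the function name `is_subgraph_isomorphic` is hypothetical. It tests the general (non-induced) case and tries every injective vertex mapping, so it is only usable on very small graphs.

```python
from itertools import permutations


def is_subgraph_isomorphic(pattern_vertices, pattern_edges, graph_vertices, graph_edges):
    """Return True if the pattern maps one-to-one into the graph so that
    every pattern edge lands on a graph edge (general, non-induced subgraph)."""
    graph_edge_set = {frozenset(edge) for edge in graph_edges}
    pattern_vertices = list(pattern_vertices)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(graph_vertices, len(pattern_vertices)):
        mapping = dict(zip(pattern_vertices, image))
        if all(frozenset((mapping[u], mapping[v])) in graph_edge_set
               for u, v in pattern_edges):
            return True
    return False


# Illustrative graphs (made up for this sketch).
path = (["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
square = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])

print(is_subgraph_isomorphic(*path, *triangle))
print(is_subgraph_isomorphic(*triangle, *square))
```

The number of mappings tried grows factorially with the graph size, which is one way to see why the mining algorithms in the later slides work hard to avoid or reduce these tests.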
Basic Principles: Graph invariants: quantities that characterize the topological structure of a graph, such as the number of vertices and the degree of each vertex (the number of edges connected to the vertex).
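A minimal sketch of the invariants named on this slide, assuming an undirected graph given as a vertex list and an edge list (the function name `graph_invariants` is hypothetical). Two graphs whose invariants differ cannot be isomorphic, so such quantities can serve as a cheap filter before a full isomorphism test; equal invariants do not prove isomorphism.

```python
from collections import Counter


def graph_invariants(vertices, edges):
    """Compute simple invariants: vertex count, edge count, degree sequence."""
    degree = Counter({v: 0 for v in vertices})  # keep isolated vertices at 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "num_vertices": len(vertices),
        "num_edges": len(edges),
        "degree_sequence": sorted(degree.values(), reverse=True),
    }


# Illustrative graphs (made up for this sketch).
path_invariants = graph_invariants(["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle_invariants = graph_invariants([0, 1, 2], [(0, 1), (1, 2), (0, 2)])

print(path_invariants)
print(triangle_invariants)
print(path_invariants == triangle_invariants)
```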
Solution Approaches: Categorization. By completeness: complete search or heuristic search. By handling of the subgraph isomorphism matching problem: direct, or indirect (solves the subgraph similarity problem).
Solution Approaches: Greedy search; inductive logic programming (ILP); inductive database; complete level-wise search; Support Vector Machine (SVM).
Greedy search: The conventional solution. Categorized into depth-first search (DFS) and breadth-first search (BFS). Beam search limits the number of branches kept. The disadvantage: as the search proceeds, it prunes the branches that do not fit within the maximum branch number limit, so some solutions can be missed.
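The pruning behaviour of beam search can be sketched generically. The search tree, its scores, and the names `beam_search`, `expand`, and `score` are all made up for illustration; they are not taken from any system in the slides.

```python
def beam_search(start_states, expand, score, beam_width, steps):
    """Keep only the `beam_width` best states at each step; return the best seen."""
    beam = sorted(start_states, key=score, reverse=True)[:beam_width]
    best = beam[0]
    for _ in range(steps):
        candidates = {child for state in beam for child in expand(state)}
        if not candidates:
            break
        # Pruning step: every branch beyond the beam width is discarded.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


# Toy search tree: branch "B" looks worse at first but leads to the best leaf.
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1"], "A1": [], "B1": []}
value = {"root": 0, "A": 5, "B": 4, "A1": 6, "B1": 10}


def expand(state):
    return children[state]


def score(state):
    return value[state]


narrow = beam_search(["root"], expand, score, beam_width=1, steps=2)
wide = beam_search(["root"], expand, score, beam_width=2, steps=2)
print(narrow, wide)
```

With a beam width of 1 the branch through "B" is pruned after the first step and the search ends at "A1"; with a width of 2 it survives and the better state "B1" is found. This is the disadvantage the slide describes.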
Inductive logic programming (ILP): What is induction? It is the combination of 'abduction' (guessing) to select some hypotheses and 'justification' to seek the hypotheses that justify the observed facts.
Inductive logic programming (ILP): positive examples + negative examples + background knowledge => hypothesis. Background knowledge is used to control the search process (prune some search paths) and to introduce predetermined subgraph patterns. ILP can fall into any of the four categories.
Inductive database: Subgraphs and relations among subgraphs are pre-generated and stored in an inductive database. Advantage: fast operation, as the basic patterns are pre-generated. Disadvantage: a large amount of computation and memory utilization.
Complete level-wise search: It is complete and direct. Here the data are not sets of items but graphs, each a combination of a vertex set V(G) and an edge set E(G) that carries topological information. An extended version of the Apriori algorithm is used.
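The level-wise idea can be sketched under a strong simplifying assumption that is not in the slides: every vertex label is unique, so each graph can be written as a set of labelled edges and subgraph containment reduces to a subset test. Real graph miners must replace the subset test with a subgraph isomorphism check and usually require patterns to be connected; this sketch does neither. The function name and the data are hypothetical.

```python
from itertools import combinations


def levelwise_frequent_edge_sets(graphs, min_support):
    """Apriori-style search: build size-(k+1) patterns from frequent size-k ones.

    graphs: list of sets of edges, each edge a tuple of two vertex labels.
    Returns a dict mapping each frequent edge set to its support.
    """
    def support(pattern):
        return sum(1 for graph in graphs if pattern <= graph)

    all_edges = set().union(*graphs)
    level = {frozenset([e]) for e in all_edges
             if support(frozenset([e])) >= min_support}
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(pattern)
        # Join step: merge two size-k patterns that differ in exactly one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune step (downward closure), then count support of the survivors.
        level = {c for c in candidates
                 if all(c - {e} in level for e in c) and support(c) >= min_support}
    return frequent


# Illustrative dataset (made up for this sketch).
graphs = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("B", "C"), ("C", "D")},
]
frequent = levelwise_frequent_edge_sets(graphs, min_support=2)
for pattern, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), count)
```

In this example the three-edge candidate is discarded in the prune step without counting its support, because one of its two-edge subsets is not frequent.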
Support Vector Machine (SVM): Used for classification and regression analysis. A non-probabilistic binary linear classifier. SVM is a heuristic search and an indirect method in terms of the subgraph isomorphism problem.
Categorization: Mathematical graph theory based approaches; greedy search based approaches; inductive logic programming approach; inductive database based approaches; kernel function based approaches.
Greedy Search Based Approaches: Use heuristics to evaluate the solution. Two major works: SUBDUE and GBI.
Graph Based Induction (GBI): Has two methods, one for chunking and the other for extracting patterns. Can arrive at locally minimal solutions, using pairwise chunking at each step with an opportunistic beam search. Can reconstruct the original graph as and when needed. An advantage of GBI is that it can handle both directed and undirected labelled graphs, even with closed paths, including closed edges. Uses an empirical graph size definition, which limits continuous compression so that the graph never becomes a single vertex. Extracts substructures and constructs a classifier.
SUBDUE: A graph-based relational learning system. Compresses graphs based on the Minimum Description Length (MDL) principle. Does not face high computational complexity (uses a computationally constrained beam search). May miss some optimal subgraphs. Produces a smaller number of highly interesting patterns, rather than a large number of patterns from which the interesting ones must then be identified. Its runtime is much larger than that of gSpan and FSG and is non-linear in the dataset size (because of its implementation of the graph isomorphism test).
Mathematical Approaches: Apriori-based methods: AGM, FSG. Pattern growth methods: gSpan.
Apriori-based Approach, AGM: Used to mine “frequent induced subgraphs”. Works with both directed and undirected graphs. Importantly, this algorithm is not limited to connected graphs; it also supports isolated graphs.
AGM: Breadth-first search. Creates new candidates for level k+1 by joining two graphs at level k. AGM generates new graphs by adding a new node, and then proceeds as per Apriori.
FSG: FSG works better on graph data sets with more edge and vertex labels. It is an optimized version of AGM with added techniques for efficiency. FSG improves efficiency by introducing the Transaction ID (TID) method and efficient candidate subgraph generation algorithms.
FSG: FSG is Apriori-based and therefore uses a level-wise algorithm. It faces two challenges. Candidate generation: generating subgraph candidates of the next size is more complicated and costly. Pruning false positives: the subgraph isomorphism test is an NP-complete problem.
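The Transaction ID (TID) idea mentioned for FSG can be sketched as follows: each frequent pattern keeps the set of IDs of the graphs that contain it, and a new candidate only needs to be tested against graphs in the intersection of its parents' TID lists. The sketch reuses a simplifying assumption that is not in the slides (unique vertex labels, so containment is a subset test); the function names and the data are hypothetical, and in a real miner `contains` would be a subgraph isomorphism test.

```python
def count_support_with_tids(candidate, parent_tid_lists, graphs, contains):
    """Test the candidate only against graphs that contain all of its parents.

    Returns the candidate's own TID list and the number of containment tests run.
    """
    possible = set.intersection(*parent_tid_lists)
    tids = {tid for tid in possible if contains(graphs[tid], candidate)}
    return tids, len(possible)


def contains(graph, pattern):
    # Stand-in for a subgraph isomorphism test under the unique-label assumption.
    return pattern <= graph


# Illustrative dataset (made up for this sketch), keyed by transaction ID.
graphs = {
    0: {("A", "B"), ("B", "C"), ("C", "D")},
    1: {("A", "B"), ("B", "C")},
    2: {("B", "C"), ("C", "D")},
    3: {("C", "D")},
}
tids_ab_bc = {0, 1}  # graphs containing the parent {A-B, B-C}
tids_bc_cd = {0, 2}  # graphs containing the parent {B-C, C-D}
candidate = {("A", "B"), ("B", "C"), ("C", "D")}

tids, tests_run = count_support_with_tids(
    candidate, [tids_ab_bc, tids_bc_cd], graphs, contains)
print(tids, tests_run)
```

Here only one of the four graphs is tested. If the intersection itself is smaller than the minimum support, the candidate can be discarded without running any containment test.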
gSpan: A pattern growth method. Uses depth-first search (DFS) and can be used to find frequent subgraphs one by one, from small to large. Advantages: no candidate generation and no false-positive test; better space saving through DFS.
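The contrast between pattern growth and level-wise search can be sketched with a depth-first miner. This is not gSpan itself: gSpan orders patterns with DFS codes and works on general labelled graphs, whereas this sketch assumes unique vertex labels (so a graph is a set of labelled edges), uses a fixed edge ordering to avoid visiting a pattern twice, and does not require patterns to be connected. All names and data are hypothetical.

```python
def pattern_growth(graphs, min_support):
    """Depth-first pattern growth: extend a frequent pattern by one edge at a
    time, looking only at the graphs that already contain the pattern."""
    edges = sorted(set().union(*graphs))
    frequent = {}

    def grow(pattern, supporting_graphs, start):
        for i in range(start, len(edges)):
            edge = edges[i]
            # Projected dataset: graphs containing the pattern and the new edge.
            extended = [g for g in supporting_graphs if edge in g]
            if len(extended) >= min_support:
                new_pattern = pattern | {edge}
                frequent[new_pattern] = len(extended)
                grow(new_pattern, extended, i + 1)

    grow(frozenset(), list(graphs), 0)
    return frequent


# Illustrative dataset (made up for this sketch).
graphs = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("B", "C"), ("C", "D")},
]
frequent = pattern_growth(graphs, min_support=2)
for pattern, count in frequent.items():
    print(sorted(pattern), count)
```

No candidate set is materialised per level; each pattern is grown and counted in place, and the list of supporting graphs shrinks as the recursion deepens.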
[Figure: a graph dataset with graphs (A), (B), (C) and its frequent patterns (1), (2); minimum support is 2.]
Three further approaches to mining graph based data: the inductive logic programming approach; the inductive database approach; the kernel function based approach.
ILP approach: ILP systems construct a predictive model for a given data set by searching a large space of candidate hypotheses. WARMR, proposed in 1998: a combination of Apriori-like level-wise search and the ILP method, but with high computational complexity. FARMER, proposed in 2001: runs two orders of magnitude faster than WARMR.
Inductive DB approach: Databases that are capable of handling patterns within data, quite different from typical databases. An interactive querying process is used to mine the data in these databases. MolFea is an effort in this area; it mines linear fragments in chemical compounds with better computational efficiency, and it performs a complete search of the paths in graph data.
Kernel Function based approach: The “kernel” function defines the similarity between two graphs. The paper covers two efforts based on this approach, which classify graphs into binary classes using an SVM (Support Vector Machine).
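To make the idea of a graph kernel concrete, here is a deliberately simple one, chosen for illustration and not one of the kernels covered by the paper: each graph is summarised by how often each edge type occurs, and the kernel is the dot product of those count vectors. Graphs are assumed to be lists of edges, each edge a pair of vertex labels; the function name and the data are hypothetical.

```python
from collections import Counter


def edge_label_kernel(graph_a, graph_b):
    """Similarity of two graphs as the dot product of their edge-type counts."""
    counts_a = Counter(tuple(sorted(edge)) for edge in graph_a)
    counts_b = Counter(tuple(sorted(edge)) for edge in graph_b)
    return sum(counts_a[edge_type] * counts_b[edge_type] for edge_type in counts_a)


# Illustrative graphs (made up): edges written as pairs of atom labels.
g1 = [("C", "C"), ("C", "O"), ("C", "C")]
g2 = [("C", "C"), ("N", "C")]
g3 = [("N", "N")]

gram = [[edge_label_kernel(a, b) for b in (g1, g2, g3)] for a in (g1, g2, g3)]
for row in gram:
    print(row)
```

A matrix of pairwise kernel values like `gram` is what a kernel-based classifier consumes; for instance, an SVM implementation that accepts a precomputed kernel could be trained on it together with the binary class labels.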




Survey on Frequent Pattern Mining on Graph Data - Slides
