Summer School
“Achievements and Applications of Contemporary Informatics,
         Mathematics and Physics” (AACIMP 2011)
              August 8-20, 2011, Kiev, Ukraine




            Graph Based Clustering

                                 Erik Kropat

                     University of the Bundeswehr Munich
                      Institute for Theoretical Computer Science,
                        Mathematics and Operations Research
                                Neubiberg, Germany
Real World Networks

• Biological Networks
  −   Gene regulatory networks
  −   Metabolic networks
  −   Neural networks
  −   Food webs

[Figure: food web]

• Technological Networks
  −   Telecommunication networks
  −   Internet
  −   Power grids

[Figure: power grid]
Real World Networks

• Social Networks
  −   Communication networks
  −   Organizational networks
  −   Social media
  −   Online communities

[Figure: social networks]

• Economic Networks
  −   Financial market networks
  −   Trade networks
  −   Collaboration networks

[Figure: economic networks]

Source: Frank Schweitzer et al., “Economic Networks: The New Challenges,” Science 325, no. 5939 (July 24, 2009): 422-425.
Graph Theory

• Graph theory can provide more detailed information
  about the inner structure of the data set in terms of
    −   cliques          (subsets of nodes in which every pair of nodes is connected)
    −   clusters         (highly connected groups of nodes)
    −   centrality       (important nodes, hubs)
    −   outliers         (unimportant nodes)
    −   . . .

• Applications
    − social network analysis
    − diffusion of information
    − spreading of diseases or rumours

⇒    marketing campaigns, viral marketing, social network advertising
Graph-Based Clustering

• A collection of widely used clustering algorithms
  that are based on graph theory.
• They organize the information in large datasets
  so that users can access the required information faster.
Idea

• Objects are represented as nodes in a complete or connected graph.
• Assign a weight to each branch (edge) between two nodes x and y.
  The weight is defined by the distance d(x,y) between the nodes.

[Figure: clustering, showing the distance between clusters and the distance between objects]
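The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the objects are hypothetical 2-D points, d(x,y) is taken to be the Euclidean distance, and the names `complete_graph` and `points` are not from the slides.

```python
import math
from itertools import combinations

def complete_graph(points):
    """Return one weighted branch (weight, x, y) for every pair of objects."""
    edges = []
    for (x, px), (y, py) in combinations(points.items(), 2):
        # Assumption: the distance d(x, y) is the Euclidean distance.
        edges.append((math.dist(px, py), x, y))
    return edges

# Hypothetical objects, chosen only for illustration.
points = {"p": (0.0, 0.0), "q": (3.0, 4.0), "r": (6.0, 8.0)}
edges = complete_graph(points)

for weight, x, y in sorted(edges):
    print(f"d({x},{y}) = {weight}")
```

Any other distance function could be substituted for `math.dist`; the clustering steps that follow only use the resulting weights.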
Idea

[Figure: graph ⇒ minimal spanning tree ⇒ clusters]
Graph Based Clustering

Hierarchical method
(1) Determine a minimal spanning tree (MST)
(2) Delete branches iteratively
    Each new connected component is a cluster.

[Figure: example graph with branch weights 1, 3, 4, 5, 6, 8]
Minimal Spanning Trees
Minimal Spanning Tree

A minimal spanning tree of a connected graph G = (V,E)
is a connected subgraph with minimal weight
that contains all nodes of G and has no cycles.

[Figure: left, graph G = (V, E) with nodes a, b, c, d and branch weights 1, 3, 4, 5, 6, 8; right, its minimal spanning tree]
Minimal spanning trees can be calculated with...

(1) Prim’s algorithm.
(2) Kruskal’s algorithm.

[Figure: graph G with nodes a, b, c, d and branch weights 1, 3, 4, 5, 6, 8]
Example – Prim’s Algorithm

Set VT = {a}, ET = { }.

Choose an edge (x,y) with minimal weight
such that x ∈ VT and y ∉ VT.
⇒ VT = {a,b} and ET = { (a,b) }.

[Figure: graph G before and after adding the edge (a,b)]
Example – Prim’s Algorithm

Choose an edge (x,y) with minimal weight
such that x ∈ VT and y ∉ VT.
⇒ VT = {a,b,d} and ET = { (a,b), (a,d) }.

Choose an edge (x,y) with minimal weight
such that x ∈ VT and y ∉ VT.
⇒ VT = {a,b,c,d} and ET = { (a,b), (a,d), (b,c) }.

[Figure: graph G after adding the edges (a,d) and (b,c)]
Prim’s Algorithm


 INPUT:       Weighted graph G = (V, E), undirected + connected
 OUTPUT:      Minimal spanning tree T = (VT, ET)

 (1) Set VT = {v}, ET = { }, where v is an arbitrary node from V (starting point).
 (2) REPEAT
 (3)   Choose an edge (a,b) with minimal weight, such that a ∈ VT and b ∉ VT.
 (4)   Set VT = VT ∪ {b} and ET = ET ∪ { (a,b) }.
 (5) UNTIL VT = V
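A minimal Python sketch of the pseudocode above, using a binary heap to find the cheapest edge that leaves VT. The weights of (a,b), (a,d) and (b,c) follow the worked example; which node pairs carry the weights 5, 6 and 8 is an assumption read from the figure layout (it does not change the resulting tree). The helper names `build_graph` and `prim` are illustrative.

```python
import heapq

def build_graph(edge_list):
    """Turn (a, b, weight) triples into an undirected adjacency dict."""
    graph = {}
    for a, b, w in edge_list:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph

def prim(graph, start):
    """Return ET as a list of (a, b, weight); graph must be connected."""
    in_tree = {start}                                   # VT
    tree_edges = []                                     # ET
    # Candidate edges (weight, a, b) with a in VT, cheapest first.
    frontier = [(w, start, b) for b, w in graph[start].items()]
    heapq.heapify(frontier)
    while len(in_tree) < len(graph):                    # UNTIL VT = V
        w, a, b = heapq.heappop(frontier)
        if b in in_tree:
            continue                                    # edge no longer leaves VT
        in_tree.add(b)
        tree_edges.append((a, b, w))
        for c, wc in graph[b].items():
            if c not in in_tree:
                heapq.heappush(frontier, (wc, b, c))
    return tree_edges

# (a,c)=6, (c,d)=5 and (b,d)=8 are assumed placements of the remaining weights.
graph = build_graph([("a", "b", 1), ("a", "d", 3), ("b", "c", 4),
                     ("a", "c", 6), ("c", "d", 5), ("b", "d", 8)])
print(prim(graph, "a"))
```

Starting from node a, the sketch adds the edges in the same order as the worked example: (a,b), then (a,d), then (b,c).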
Kruskal’s Algorithm


 INPUT:        Weighted graph G = (V, E), undirected + connected
 OUTPUT:       Minimal spanning tree T = (VT, ET)

 (1) Set VT = V, ET = { }, H = E.
 (2) Initialize a queue to contain all edges in G, using the weights in ascending
     order as keys.
 (3) WHILE H ≠ { }
 (4)       Choose an edge e ∈ H with minimal weight.
 (5)       Set H = H \ {e}.
 (6)       If (VT, ET ∪ {e}) has no cycles, then ET = ET ∪ {e} .
 (7) END
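The pseudocode above can be sketched in Python as follows. The cycle test in step (6) is implemented here with a union-find (disjoint-set) structure, which is a common choice but not something the slides prescribe. As before, the placement of the weights 5, 6 and 8 on the example graph is an assumption read from the figure.

```python
def kruskal(nodes, edges):
    """edges: list of (weight, a, b). Returns ET as a list of (a, b, weight)."""
    parent = {v: v for v in nodes}          # union-find forest over VT = V

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree_edges = []
    for w, a, b in sorted(edges):           # ascending weight, as in step (2)
        ra, rb = find(a), find(b)
        if ra != rb:                        # (VT, ET ∪ {e}) has no cycle
            parent[ra] = rb
            tree_edges.append((a, b, w))
    return tree_edges

# (a,c)=6, (c,d)=5 and (b,d)=8 are assumed placements of the remaining weights.
edges = [(1, "a", "b"), (3, "a", "d"), (4, "b", "c"),
         (5, "c", "d"), (6, "a", "c"), (8, "b", "d")]
print(kruskal("abcd", edges))
```

On this graph the result is the same tree that Prim's algorithm produces in the worked example.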
Branch Deletion
Delete Branches - Different Strategies

(1) Delete the branch with maximum weight.
(2) Delete inconsistent branches.
(3) Delete by analysis of weights.
(1) Delete the branch with maximum weight

• In each step, split one cluster into two
  by deleting the remaining branch with maximum weight.
• Repeat until the given number of clusters is reached.

[Figure: minimal spanning tree with branch weights 2, 2, 2, 4, 2, 3, 6, 2]
Example: Delete the branch with maximum weight

[Figure: minimum spanning tree with branch weights 2, 2, 2, 4, 2, 3, 6, 2]

Ordered weights of branches:   6, 4, 3, 2, 2, 2, 2, 2.
Step 1: Delete branch (weight 6)       ⇒   2 clusters
Step 2: Delete branch (weight 4)       ⇒   3 clusters
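The two deletion steps can be reproduced with a short Python sketch. The shape of the tree in the figure is not recoverable from the text, so the tree below is a hypothetical path that merely carries the same branch weights (6, 4, 3 and five times 2); the names `components` and `cluster_by_max_weight` are illustrative.

```python
def components(nodes, edges):
    """Connected components of the graph (nodes, edges), as sorted lists."""
    adjacent = {v: set() for v in nodes}
    for a, b, _ in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, result = set(), []
    for v in nodes:
        if v in seen:
            continue
        seen.add(v)
        stack, comp = [v], []
        while stack:                        # depth-first search from v
            u = stack.pop()
            comp.append(u)
            for n in adjacent[u] - seen:
                seen.add(n)
                stack.append(n)
        result.append(sorted(comp))
    return result

def cluster_by_max_weight(nodes, tree_edges, k):
    """Delete the k-1 heaviest branches of a spanning tree, giving k clusters."""
    by_weight = sorted(tree_edges, key=lambda e: e[2], reverse=True)
    return components(nodes, by_weight[k - 1:])

# Hypothetical tree: a path with the branch weights from the example.
nodes = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9"]
tree = [("v1", "v2", 2), ("v2", "v3", 2), ("v3", "v4", 2), ("v4", "v5", 4),
        ("v5", "v6", 3), ("v6", "v7", 6), ("v7", "v8", 2), ("v8", "v9", 2)]
print(cluster_by_max_weight(nodes, tree, 3))
```

Because the input is a tree, deleting k-1 branches always leaves exactly k connected components, whatever the tree's shape.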
(2) Delete inconsistent branches

• A branch e is inconsistent if its weight $d_e$
  is (much) larger than a reference value $\bar{d}_e$.
• The reference value $\bar{d}_e$ can be defined by the average weight
  of all branches adjacent to e.

[Figure: branch e with weight 6 and three adjacent branches with weights 1, 2 and 3]

  $\bar{d}_e = \frac{3 + 2 + 1}{3} = 2$

  $d_e = 6 > 2 = \bar{d}_e$
  ⇒ e inconsistent
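The reference value can be computed with a small Python sketch. The node names and the exact arrangement of the three adjacent branches are assumptions (the figure only shows a branch of weight 6 next to branches of weight 1, 2 and 3), and `reference_values` is an illustrative helper name. Two branches count as adjacent when they share a node.

```python
def reference_values(tree_edges):
    """Map each branch (a, b) to the mean weight of its adjacent branches."""
    ref = {}
    for i, (a, b, _) in enumerate(tree_edges):
        adjacent = [w for j, (c, d, w) in enumerate(tree_edges)
                    if j != i and {a, b} & {c, d}]      # shares an endpoint
        ref[(a, b)] = sum(adjacent) / len(adjacent) if adjacent else None
    return ref

# Hypothetical node names; e = (u, v) is the branch with weight 6.
tree = [("x1", "u", 1), ("x2", "u", 2), ("u", "v", 6), ("v", "y", 3)]
ref = reference_values(tree)

for a, b, w in tree:
    flag = "inconsistent" if w > ref[(a, b)] else "consistent"
    print(f"({a},{b}): weight {w}, reference {ref[(a, b)]} -> {flag}")
```

With this arrangement only the branch of weight 6 exceeds its reference value of 2, matching the calculation on the slide.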
(3) Delete by analysis of weights
• Perform an “analysis” of all weights of branches in the MST.
  Determine a threshold S.
• The threshold can be estimated from
  histograms of the weights of branches (= lengths of branches).
• Delete a branch if its weight is higher than the threshold S.

[Figure: two histograms; x-axis: weight of branch (length of branch), y-axis: number of branches; the threshold S is marked on the x-axis]
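As a rough illustration of this strategy, the sketch below prints a text histogram of the branch weights from the earlier example (6, 4, 3, 2, 2, 2, 2, 2) and applies a threshold. The value S = 3.5 is a hypothetical choice made by eye from the histogram; the slides do not give a rule for picking it.

```python
from collections import Counter

# Branch weights of the minimum spanning tree from the earlier example.
weights = [6, 4, 3, 2, 2, 2, 2, 2]

# Histogram: how many branches have each weight.
histogram = Counter(weights)
for w in sorted(histogram):
    print(f"{w:>3} | {'#' * histogram[w]}")

S = 3.5                                     # hypothetical threshold
to_delete = [w for w in weights if w > S]

# Deleting m branches from a tree leaves m + 1 connected components.
clusters = len(to_delete) + 1
print(f"delete weights {to_delete} -> {clusters} clusters")
```

With this threshold the branches of weight 6 and 4 are deleted, which gives the same three clusters as two steps of the maximum-weight strategy.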
Exercise

[Figure: weighted graph with nodes a, b, c, d, e, f, g and branch weights 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 20]

Find a minimal spanning tree and provide a clustering of the graph
by deleting all inconsistent branches.
Example

Set VT = {a}, ET = { }.

Repeat: choose an edge (x,y) with minimal weight
such that x ∈ VT and y ∉ VT.

[Figures: successive steps of the construction on the exercise graph, ending with the minimal spanning tree]
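Example
          For each branch calculate the reference value
              (average weight of adjacent branches)

[Figure: minimal spanning tree on the nodes a to g; each branch is labelled with its weight and, in parentheses, its reference value: 3 (3), 5 (4.5), 1 (3), 4 (4), 6 (3.6), 2 (5)]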
Example
                Delete inconsistent branches
          (weight is larger than the reference value)

[Figure: the spanning tree after deleting the inconsistent branches; the remaining branches form 2 clusters, and one node is marked “Noise?”]
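The whole exercise can be checked with a short end-to-end Python sketch: build the MST, compute the reference values, delete the inconsistent branches and list the connected components. The edge list is an assumption reconstructed from the figure layout; the six tree branches agree with the weights and reference values shown in the example, but the placement of the other six weights is a guess. The MST is built with Kruskal's algorithm and a simple union-find, and all function names are illustrative.

```python
from statistics import mean

def kruskal(nodes, edges):
    """Return the MST of a connected graph as a list of (a, b, weight)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for a, b, w in sorted(edges, key=lambda e: e[2]):
        ra, rb = find(a), find(b)
        if ra != rb:                        # adding (a, b) creates no cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree

def inconsistent_branches(tree):
    """Branches whose weight is larger than the mean of adjacent branches."""
    result = []
    for i, (a, b, w) in enumerate(tree):
        adjacent = [w2 for j, (c, d, w2) in enumerate(tree)
                    if j != i and {a, b} & {c, d}]
        if adjacent and w > mean(adjacent):
            result.append((a, b, w))
    return result

def components(nodes, edges):
    """Connected components as sorted lists of nodes."""
    adjacent = {v: set() for v in nodes}
    for a, b, _ in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, result = set(), []
    for v in nodes:
        if v in seen:
            continue
        seen.add(v)
        stack, comp = [v], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for n in adjacent[u] - seen:
                seen.add(n)
                stack.append(n)
        result.append(sorted(comp))
    return result

# Assumed edge list, read from the layout of the exercise figure.
nodes = "abcdefg"
edges = [("d", "e", 3), ("c", "d", 20), ("d", "g", 5), ("e", "g", 9),
         ("c", "g", 8), ("e", "f", 1), ("b", "c", 4), ("f", "g", 15),
         ("b", "g", 6), ("a", "g", 12), ("a", "f", 10), ("a", "b", 2)]

mst = kruskal(nodes, edges)
bad = inconsistent_branches(mst)
kept = [e for e in mst if e not in bad]
print("MST:", mst)
print("inconsistent:", bad)
print("clusters:", components(nodes, kept))
```

Under these assumptions the branches of weight 5 and 6 are the only ones strictly larger than their reference values. Deleting them leaves two groups of three nodes plus the single node g, which is presumably the node the slide marks as possible noise.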
Summary

• In graph-based clustering, objects are represented as nodes
  in a complete or connected graph.
• The distance between two objects is given by the weight
  of the corresponding branch.
• Hierarchical method
     (1) Determine a minimal spanning tree (MST)
     (2) Delete branches iteratively
• Visualization of information in large datasets.
Literature

• V. Kumar, M. Steinbach, P.-N. Tan
  Introduction to Data Mining.
  Addison Wesley, 2005.

Other work mentioned in the presentation
• J.A. Dunne, R.J. Williams, N.D. Martinez, R.A. Wood, D.H. Erwin
  Compilation and Network Analyses of Cambrian Food Webs.
  PLoS Biol 6(4): e102. doi:10.1371/journal.pbio.0060102

• F. Schweitzer, G. Fagiolo, D. Sornette, F. Vega-Redondo,
  A. Vespignani, D.R. White
  Economic Networks: The New Challenges.
  Science 325, no. 5939 (July 24, 2009): 422-425.
Thank you very much!