Tutorial of
Topological Data Analysis
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
Paper Alert 2016-04-15, Hasegawa lab., Tokyo
The University of Tokyo
Part III - Mapper Algorithm
My TDA (Topological Data Analysis) road
Part I - Basic concepts & applications
Part II - Advanced TDA computation
Part III - Mapper Algorithm
Part IV - Software Roadmap
Part V - Applications in…
Part VI - Applications in…
Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Mapper Algorithm
Basic motivation
Basic idea
Perform clustering at different “scales” and track how the clusters change as the scale varies
Motivation
• Coarser than manifold learning, but still works in nonlinear situations
• Extracts meaningful geometric information about the dataset
• Efficiently computable (for large datasets)
Reference: Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, G. Singh, F. Mémoli, G. E. Carlsson, SPBG, 2007
Morse theory
Basic idea
Describe the topology of a smooth manifold M using level sets of a suitable function h : M -> R
• Recover M by looking at the sublevel sets $h^{-1}((-\infty, t])$ as t scans over the range of h
• The topology of M changes at critical points of h
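A minimal sketch of the sublevel-set idea, assuming a one-dimensional toy "manifold" (a grid on the interval [-1.5, 1.5]) and a double-well function h chosen only for illustration. The number of connected components of the sublevel set changes exactly when t passes a critical value of h (0 at the two minima, 1 at the local maximum). The names `h` and `sublevel_components` are hypothetical.

```python
def h(x):
    """Double-well function: minima at x = -1 and x = 1, local maximum at x = 0."""
    return (x * x - 1.0) ** 2


def sublevel_components(xs, t):
    """Count connected components of {x : h(x) <= t} on an ordered grid.

    A component is a maximal run of consecutive grid points inside the set.
    """
    count = 0
    inside_prev = False
    for x in xs:
        inside = h(x) <= t
        if inside and not inside_prev:
            count += 1  # a new run of points starts here
        inside_prev = inside
    return count


xs = [-1.5 + 0.01 * i for i in range(301)]  # grid on [-1.5, 1.5]
for t in (-0.1, 0.5, 1.2):
    print(t, sublevel_components(xs, t))
```

Below the minimum value the sublevel set is empty, between the two critical values it has two pieces (one per well), and above the local maximum the two pieces merge into one.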
Reeb graphs
• For each t in R, contract each connected component of the level set $f^{-1}(t)$ to a point
• The resulting structure is a graph
Mapper
The Mapper algorithm is a generalization of this procedure (Singh-Mémoli-Carlsson)
Input
✤ Filter (continuous) function f: X -> R
✤ Cover L = {$L_\alpha$} of im(f) by open intervals
Method
✤ Cluster each inverse image $f^{-1}(L_\alpha)$ into its connected components, which gives a connected cover V of X
✤ The Mapper is the nerve of V
• Clusters are vertices
• One k-simplex per (k+1)-fold intersection: $\bigcap_{i=0}^{k} V_i \neq \emptyset$, with $V_0, \dots, V_k \in V$
✤ Color vertices according to the average value of f in the cluster
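The nerve construction above can be sketched in a few lines, assuming the cover is given as a list of Python sets of point labels. The function name `nerve` and the `max_dim` cutoff are illustrative choices, not part of the original slides.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of `cover` up to dimension max_dim.

    cover: list of sets. The index tuple (i0, ..., ik) is a k-simplex
    when the sets cover[i0], ..., cover[ik] share at least one element.
    """
    simplices = []
    for k in range(max_dim + 1):
        for idx in combinations(range(len(cover)), k + 1):
            common = set.intersection(*(cover[i] for i in idx))
            if common:
                simplices.append(idx)
    return simplices


# Three sets overlapping pairwise (but with no common point) plus one isolated set.
cover = [{1, 2, 3}, {3, 4}, {4, 5, 1}, {6}]
print(nerve(cover))
```

In this toy cover the three pairwise overlaps produce three edges but no triangle, so the nerve is a hollow triangle (a loop) plus one isolated vertex.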
Workflow - Illustration
[Figures: Mapper workflow. Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/]
The filter f could also be n-dimensional (values in R^n)
Mapper in practice
Input
- Point cloud P with metric $d_P$
- Compute neighborhood graph G = (P, E)
✤ Filter (continuous) function f: P -> R
✤ Cover L = {$L_\alpha$} of im(f) by open intervals
Method
✤ Cluster each inverse image $f^{-1}(L_\alpha)$ into its connected components in G, which gives a connected cover V
✤ The Mapper is the nerve of V
• Clusters are vertices
• One k-simplex per (k+1)-fold intersection: $\bigcap_{i=0}^{k} V_i \neq \emptyset$, with $V_0, \dots, V_k \in V$ (intersections materialized by data points)
✤ Color vertices according to the average value of f in the cluster
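A minimal end-to-end sketch of the steps above, restricted to vertices and edges (the 1-skeleton of the nerve). It assumes Euclidean points, a δ-neighborhood graph (two points are joined when their distance is at most `delta`), and a hand-picked list of intervals. All function names are hypothetical, and the quadratic-time neighbor search is chosen for brevity, not efficiency.

```python
import math


def connected_components(points, members, delta):
    """Connected components of the delta-neighborhood graph restricted to `members`."""
    remaining = set(members)
    components = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= delta}
            remaining -= near
            component |= near
            stack.extend(near)
        components.append(component)
    return components


def mapper_graph(points, filter_values, intervals, delta):
    """Return (nodes, edges, colors) of a Mapper graph.

    nodes: sets of point indices (one per cluster)
    edges: pairs of node indices whose clusters share a data point
    colors: mean filter value of each cluster
    """
    nodes = []
    for lo, hi in intervals:
        preimage = [i for i, v in enumerate(filter_values) if lo < v < hi]
        nodes.extend(connected_components(points, preimage, delta))
    edges = [(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]]
    colors = [sum(filter_values[i] for i in node) / len(node) for node in nodes]
    return nodes, edges, colors


# Noise-free circle sampled at 40 points, filtered by the x-coordinate.
points = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
filter_values = [p[0] for p in points]
intervals = [(-1.1, -0.3), (-0.5, 0.5), (0.3, 1.1)]
nodes, edges, colors = mapper_graph(points, filter_values, intervals, delta=0.2)
print(len(nodes), len(edges))
```

On this circle the middle interval splits into a top and a bottom arc, so the graph has four nodes joined in a cycle, which reflects the loop in the data.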
[Figures: Mapper in practice. Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/]
Parameters
✤ Filter (continuous) function f: P -> R
✤ Cover L of im(f) by open intervals (range scale)
✤ Neighborhood size δ (geometric scale)
Example: uniform cover L
• Resolution / granularity: r (diameter of intervals)
• Gain: g (percentage of overlap)
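One way to build a uniform cover from the two parameters above, as a sketch under stated assumptions: r is taken to be the interval length and g the fraction of each interval shared with its successor. Other conventions exist (for example, r as a number of intervals). The function name `uniform_cover` is hypothetical, and the intervals are returned as (start, end) pairs whose endpoints the caller may treat as open.

```python
def uniform_cover(lo, hi, r, g):
    """Intervals of length r covering [lo, hi]; consecutive intervals overlap by g * r."""
    if not (r > 0 and 0 <= g < 1):
        raise ValueError("need r > 0 and 0 <= g < 1")
    step = r * (1.0 - g)  # shift between consecutive interval starts
    intervals = []
    start = lo
    while True:
        intervals.append((start, start + r))
        if start + r >= hi:
            break
        start += step
    return intervals


cover = uniform_cover(0.0, 8.0, r=4.0, g=0.25)
print(cover)
```

With r = 4 and g = 25%, consecutive intervals start 3 units apart and share 1 unit, so a larger gain puts more points into the intersections.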
Filter functions
Choice of filter function is essential
• Some kind of density measure
• A score measuring the difference (distance) from some baseline
• An eccentricity measure
Statistics: Mean/Max/Min, Variance, n-Moment, Density, …
Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/t-SNE, SVM Distance, Error/Debugging Info, …
Geometry: Centrality, Curvature, Harmonic Cycles, …
Filter functions
Eccentricity
- How close the point lies to the “center” of the point cloud
Density
- How close the point is to the surrounding points
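Both filters can be sketched directly from pairwise distances. The versions below are assumptions chosen for illustration: eccentricity as the mean distance to all points (small values mean "central") and density as an unnormalized Gaussian kernel sum with a hypothetical `bandwidth` parameter.

```python
import math


def eccentricity(points):
    """Mean distance from each point to all points; small values mean 'central'."""
    n = len(points)
    return [sum(math.dist(p, q) for q in points) / n for p in points]


def gaussian_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate evaluated at each point."""
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
            for p in points]


# Three points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
ecc = eccentricity(points)
dens = gaussian_density(points, bandwidth=0.5)
print(ecc.index(max(ecc)), dens.index(min(dens)))
```

The isolated point has the highest eccentricity and the lowest density, so either filter would separate it from the tight group.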
Mapper in applications
✤ Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013
✤ Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature Communications, 2015
✤ Using Topological Data Analysis for Diagnosis Pulmonary Embolism, Rucco et al., arXiv preprint, 2014
✤ Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways, Yao et al., J. Chemical Physics, 2009
✤ CD8 T-cell reactivity to islet antigens is unique to type 1 while CD4 T-cell reactivity exists in both type 1 and type 2 diabetes, Sarikonda et al., J. Autoimmunity, 2013
✤ Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015
Mapper in practice
1. Clustering
2. Feature selection
Mapper in clustering
(1) Compute the Mapper (select parameters)
(2) Detect interesting topological substructures (“loops”, “flares”); this is not easy (see Tutorial parts 1 + 2)
(3) Use the substructures to cluster the data
Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013
Grouping of House representatives by party
Point: member of the House
f: 1st and 2nd SVD components, r = 120, g = 22%
PCA can show the Republican/Democrat clusters, but TDA gives more information
Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013
Detect new clusters for NBA players
Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015
The TDA used the 62 subjects with the most complete data.
f: 1st and 2nd SVD components, r = 120, g = 14%, equalized
Mapper in feature selection
(1) Compute the Mapper (select parameters)
(2) Detect interesting topological substructures (“loops”, “flares”)
(3) Select the features that best discriminate the data in the substructure: run a Kolmogorov-Smirnov test on each feature restricted to the substructure vs. the same feature on the whole dataset, and select the features with low p-value
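The Kolmogorov-Smirnov step can be sketched with the standard library only. The code below is an illustration under assumptions: it computes the two-sample KS statistic and an asymptotic p-value from the Kolmogorov distribution, which is only a rough approximation for small samples. The toy feature values and the function names are hypothetical. Libraries such as SciPy provide `scipy.stats.ks_2samp` for the same purpose.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value (Kolmogorov distribution)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


whole = list(range(100))           # a feature over the whole dataset
shifted = list(range(80, 100))     # substructure concentrated at high values
spread = list(range(0, 100, 5))    # substructure spread like the whole dataset

d_shifted = ks_statistic(shifted, whole)
d_spread = ks_statistic(spread, whole)
p_shifted = ks_pvalue(d_shifted, len(shifted), len(whole))
p_spread = ks_pvalue(d_spread, len(spread), len(whole))
print(d_shifted, p_shifted, d_spread, p_spread)
```

A feature whose values inside the substructure differ strongly from the whole dataset gets a low p-value and would be selected; a feature distributed like the whole dataset would not.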
Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013
Goal: detect factors that influence survival after therapy in breast cancer patients
Points: breast cancer patients who went through a specific therapy
f: eccentricity, r = 1/30, g = 33%
PCA/single-linkage clustering cannot see this
Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature Communications, 2015
Select Parameters
Parameter r
• Small r -> fine cover (close to Reeb, sensitive to δ)
• Large r -> rough cover (less sensitive to δ)
Parameter g
• g ≈ 1 -> more points inside intersections, less sensitive to δ but far from Reeb
• g ≈ 0 -> controlled Mapper dimension, close to Reeb
Parameter δ
• Large δ -> fewer nodes, clean Mapper but far from Reeb (more straight lines)
• Small δ -> distinct topological structure but lots of nodes (noisy)
Parameter f
• Depends mostly on the dataset: coordinate, density estimation, eccentricity, eigenvector
Select Parameters
Example: P in R^2 sampled from a known distribution
f = density estimator, r = 1/30, g = 20%
δ = percentage of the diameter of X
Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Reference links
• INF563 Topological Data Analysis Course: http://www.enseignement.polytechnique.fr/informatique/INF563/
• AYASDI: http://www.ayasdi.com/
• …
