1. Fuzzy C-Means Clustering
Course Project Presentation
Mahdi Amiri
June 2003
Sharif University of Technology
2. Page 2 of 30 Fuzzy C-Means Clustering
Presentation Outline
Motivation and Goals
Fuzzy C-Means Clustering (FCM)
Possibilistic C-Means Clustering (PCM)
Fuzzy-Possibilistic C-Means (FPCM)
Comparison of FCM, PCM and FPCM
Conclusions and Future Works
3. Page 3 of 30 Fuzzy C-Means Clustering
Motivation and Goals
Sample Applications
Image segmentation
– Medical imaging
• X-ray Computed Tomography (CT)
• Magnetic Resonance Imaging (MRI)
• Positron Emission Tomography (PET)
Image and speech enhancement
Edge detection
Video shot change detection
4. Page 4 of 30 Fuzzy C-Means Clustering
Definition: Search for structure in data
Elements of Numerical Pattern Recognition
– Process Description
• Feature Nomination, Test Data, Design Data
– Feature Analysis
• Preprocessing, Extraction, Selection, …
– Cluster Analysis
• Labeling, Validity, …
– Classifier Design
• Classification, Estimation, Prediction, Control, …
(We are here: Cluster Analysis)
Pattern Recognition
Motivation and Goals
5. Page 5 of 30 Fuzzy C-Means Clustering
Fuzzy Clustering
Useful in Fuzzy Modeling
– Identification of the fuzzy rules needed to
describe a “black box” system, on the basis
of observed vectors of inputs and outputs
History
– FCM: Bezdek, 1981
– PCM: Krishnapuram - Keller, 1993
– FPCM: N. Pal - K. Pal - Bezdek, 1997
Motivation and Goals
Prof. Bezdek
6. Page 6 of 30 Fuzzy C-Means Clustering
Input, Output
Fuzzy C-Means Clustering
Input: Unlabeled data set
$X = \{x_1, x_2, \ldots, x_n\}$, $x_k \in \mathbb{R}^p$
– n is the number of data points in X
– p is the number of features in each vector
Main Output
– A c-partition of X, which is a $c \times n$ matrix U
Common Additional Output
– Set of vectors $V = \{v_1, v_2, \ldots, v_c\} \subset \mathbb{R}^p$
– $v_i$ is called the “cluster center”
7. Page 7 of 30 Fuzzy C-Means Clustering
Sample Illustration
Fuzzy C-Means Clustering
[Figure: a data set X and the resulting U and V; the rows of U are shown as membership functions]
n = 188, c = 4, p = 2
8. Page 8 of 30 Fuzzy C-Means Clustering
Fuzzy C-Means Clustering (FCM), Objective Function
Optimization of an “objective function” or “performance index”
$$\min_{(U,V)} J_m(U,V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m D_{ik}^2$$
– Distance: $D_{ik}^2 = \|x_k - v_i\|_A^2$
– A-norm: $\|x\|_A = \sqrt{\langle x, x \rangle_A} = \sqrt{x^T A x}$
– Degree of fuzzification: $m \ge 1$
– Constraint: $\sum_{i=1}^{c} u_{ik} = 1, \ \forall k$
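As a rough illustration of the objective function on this slide, the sketch below evaluates $J_m$ with NumPy. The function name `fcm_objective` and the array layout (X as n × p, V as c × p, U as c × n) are assumptions made for the example, not part of the slides; passing no matrix A falls back to the identity, i.e. the Euclidean norm.

```python
import numpy as np

def fcm_objective(X, V, U, m=2.0, A=None):
    """J_m(U, V) = sum_i sum_k u_ik^m * ||x_k - v_i||_A^2 (hypothetical helper)."""
    X = np.asarray(X, dtype=float)   # (n, p) data vectors
    V = np.asarray(V, dtype=float)   # (c, p) cluster centers
    U = np.asarray(U, dtype=float)   # (c, n) partition matrix
    if A is None:
        A = np.eye(X.shape[1])       # identity matrix gives the Euclidean norm
    diff = X[None, :, :] - V[:, None, :]                 # (c, n, p) differences x_k - v_i
    D2 = np.einsum("cnp,pq,cnq->cn", diff, A, diff)      # squared A-norm distances
    return float(np.sum(U ** m * D2))

# Two points, two centers sitting on the points, maximally fuzzy memberships.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
V = np.array([[0.0, 0.0], [2.0, 0.0]])
U_fuzzy = np.full((2, 2), 0.5)
U_crisp = np.eye(2)
J_fuzzy = fcm_objective(X, V, U_fuzzy)   # 0.25 * (0 + 4 + 4 + 0) = 2.0
J_crisp = fcm_objective(X, V, U_crisp)   # every point sits on its own center: 0.0
```

For this toy case the crisp partition gives the smaller objective value, as one would expect when each point coincides with a center.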
9. Page 9 of 30 Fuzzy C-Means Clustering
Minimizing Objective Function
Fuzzy C-Means Clustering
Zeroing the gradient of $J_m$ with respect to U
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{D_{ik}}{D_{jk}} \right)^{\frac{2}{m-1}} \right]^{-1}, \ \forall i, k$$
– Written as $U_t = F(V_{t-1})$
Zeroing the gradient of $J_m$ with respect to V
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m}, \ \forall i$$
– Written as $V_t = G(U_t)$
– Note: It is the Center of Gravity
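The two necessary conditions on this slide can be sketched as a pair of NumPy functions. The names `update_memberships` and `update_centers`, the Euclidean distance (A = I), and the small floor `eps` that avoids division by zero when a point coincides with a center are all assumptions of this example; it also assumes m > 1.

```python
import numpy as np

def update_memberships(X, V, m=2.0, eps=1e-12):
    """U = F(V): u_ik = 1 / sum_j (D_ik / D_jk)^(2 / (m - 1))."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (c, n) squared distances
    D2 = np.maximum(D2, eps)          # assumed guard: point exactly on a center
    inv = D2 ** (-1.0 / (m - 1.0))    # equals D_ik^(-2 / (m - 1))
    return inv / inv.sum(axis=0, keepdims=True)

def update_centers(X, U, m=2.0):
    """V = G(U): v_i = sum_k u_ik^m x_k / sum_k u_ik^m (center of gravity)."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
V = np.array([[0.0, 0.0], [4.0, 0.0]])
U = update_memberships(X, V)
# Middle point: squared distances 1 and 9, so memberships 0.9 and 0.1 for m = 2.
V_crisp = update_centers(X, np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```

With a crisp partition the center update reduces to the ordinary mean of each cluster, which is why the slide calls it the center of gravity.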
10. Page 10 of 30 Fuzzy C-Means Clustering
Pick
Fuzzy C-Means Clustering
Initial Choices
– Number of clusters: $1 < c < n$
– Maximum number of iterations: $T$ (Typ.: 100)
– Weighting exponent (fuzziness degree): $m$
• m = 1: crisp
• m = 2: typical
– Termination measure: $E_t = \|V_t - V_{t-1}\|$ (1-norm)
– Termination threshold: $\varepsilon > 0$ (Typ.: 0.01)
11. Page 11 of 30 Fuzzy C-Means Clustering
Guess, Iterate
Fuzzy C-Means Clustering
Guess Initial Cluster Centers
– $V_0 = (v_{1,0}, \ldots, v_{c,0}) \in \mathbb{R}^{cp}$
Alternating Optimization (AO)
– $t = 0$
– REPEAT
– $t = t + 1$
– $U_t = F(V_{t-1})$
– $V_t = G(U_t)$
– UNTIL ($t = T$ or $\|V_t - V_{t-1}\| \le \varepsilon$)
– $(U, V) = (U_t, V_t)$
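The alternating optimization loop on this slide might look roughly like the following in NumPy. This is a minimal sketch, not the author's implementation: the function name `fcm`, the Euclidean distance, the distance floor `1e-12`, and the toy data are assumptions, and it requires m > 1. The termination measure is the matrix 1-norm of the change in V, as in the speaker notes.

```python
import numpy as np

def fcm(X, V0, m=2.0, T=100, eps=0.01):
    """FCM-AO sketch: guess V_0, then alternate U_t = F(V_{t-1}) and V_t = G(U_t)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float)
    for t in range(1, T + 1):
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (c, n)
        D2 = np.maximum(D2, 1e-12)                 # assumed guard against zero distance
        inv = D2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)   # U_t = F(V_{t-1})
        W = U ** m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)   # V_t = G(U_t)
        E = np.abs(V_new - V).sum(axis=0).max()    # 1-norm: largest absolute column sum
        V = V_new
        if E <= eps:                               # UNTIL t = T or E_t <= eps
            break
    return U, V, t

# Hypothetical toy data: two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
U, V, t = fcm(X, V0=[[1.0, 1.0], [8.0, 8.0]])
labels = U.argmax(axis=0)   # hardened labels, one per data point
```

Passing the initial centers explicitly keeps the run deterministic; in practice the initial guess is often drawn at random, which is where the sensitivity to initialization comes from.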
13. Page 13 of 30 Fuzzy C-Means Clustering
Implementation Notes
Fuzzy C-Means Clustering
Process could be shifted one half cycle
– Initialization is done on $U_0$
– Iterates become $U_{t-1} \to V_t \to U_t$
– Termination criterion $\|U_t - U_{t-1}\| \le \varepsilon$
The convergence theory is the same in either case
Initializing and terminating on V is advantageous
– Convenience
– Speed
– Storage
14. Page 14 of 30 Fuzzy C-Means Clustering
Pros and Cons
Fuzzy C-Means Clustering
Advantages
– Unsupervised
– Always converges (to a local minimum or saddle point of $J_m$)
Disadvantages
– Long computational time
– Sensitivity to the initial guess (speed, local minima)
– Sensitivity to noise
• One expects low (or even no) membership degree for outliers (noisy points), but the sum-to-one constraint on each column of U rules this out
15. Page 15 of 30 Fuzzy C-Means Clustering
Optimal Number of Clusters
Fuzzy C-Means Clustering
Performance Index
$$\min_{c} P(c) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \left( \|x_k - v_i\|^2 - \|v_i - \bar{x}\|^2 \right)$$
– Sum of the within fuzzy cluster fluctuations (small value for optimal c): $\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|x_k - v_i\|^2$
– Sum of the between fuzzy cluster fluctuations (big value for optimal c): $\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|v_i - \bar{x}\|^2$
– Average of all feature vectors: $\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$
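A small sketch of the performance index, assuming Euclidean distances and the array layout X (n × p), U (c × n), V (c × p). The helper name `performance_index` and the one-dimensional toy data are made up for illustration; to choose c one would run the clustering for several values of c and keep the one with the smallest P(c).

```python
import numpy as np

def performance_index(X, U, V, m=2.0):
    """P(c) = sum_i sum_k u_ik^m * (||x_k - v_i||^2 - ||v_i - x_bar||^2)."""
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    x_bar = X.mean(axis=0)                                        # average feature vector
    within = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    between = ((V - x_bar) ** 2).sum(axis=1)[:, None]             # (c, 1), broadcast over k
    return float(np.sum(U ** m * (within - between)))

# Four 1-D points in two obvious groups, crisp partition, centers at the group means.
X = np.array([[0.0], [2.0], [10.0], [12.0]])
V = np.array([[1.0], [11.0]])
U = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
P = performance_index(X, U, V)   # 4 points * (1 - 25) = -96.0
```

The value is strongly negative here because the between-cluster term dominates the within-cluster term, which is the situation the index rewards.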
16. Page 16 of 30 Fuzzy C-Means Clustering
Optimal Cluster No. (Example)
Fuzzy C-Means Clustering
[Figure: clustering results for c = 2, c = 3, c = 4 and c = 5]
Performance index for optimal clusters
(is minimum for c = 4)
17. Page 17 of 30 Fuzzy C-Means Clustering
Outliers, Disadvantage of FCM
Possibilistic C-Means Clustering
$x_{12}$ is an outlier but has the same membership degrees as $x_6$
– FCM on $X_{11}$: $u_{1,6} = 0.5$, $u_{2,6} = 0.5$
– FCM on $X_{12}$: $u_{1,6} = 0.5$, $u_{2,6} = 0.5$, $u_{1,12} = 0.5$, $u_{2,12} = 0.5$
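The effect on this slide follows directly from the FCM membership formula: any point that is equidistant from two centers gets membership 0.5 in each, no matter how far away it is. The sketch below shows this with made-up coordinates (they are not the actual $X_{12}$ data), a hypothetical helper `fcm_memberships`, Euclidean distances, and m = 2.

```python
import numpy as np

def fcm_memberships(x, V, m=2.0):
    """FCM memberships of a single point x with respect to the centers V."""
    D2 = ((V - x) ** 2).sum(axis=1)      # squared distance to each center
    inv = D2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum()               # memberships sum to 1

V = np.array([[-3.0, 0.0], [3.0, 0.0]])                 # two assumed cluster centers
u_near = fcm_memberships(np.array([0.0, 0.0]), V)       # point midway between the centers
u_far = fcm_memberships(np.array([0.0, 10.0]), V)       # distant, equidistant outlier
```

Both points receive memberships (0.5, 0.5), so the memberships alone cannot distinguish the outlier from a genuine in-between point.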
18. Page 18 of 30 Fuzzy C-Means Clustering
Possibilistic C-Means Clustering (PCM), Objective Function
Objective function
$$\min_{(T,V)} P_m(T,V;\mathbf{w}) = \sum_{i=1}^{c} \sum_{k=1}^{n} t_{ik}^m D_{ik}^2 + \sum_{i=1}^{c} w_i \sum_{k=1}^{n} (1 - t_{ik})^m$$
Typicality or Possibility: $t_{ik}$
– No constraint like $\sum_{i=1}^{c} u_{ik} = 1, \ \forall k$
Cluster weights: $\mathbf{w} = (w_1, w_2, \ldots, w_c)^T$
19. Page 19 of 30 Fuzzy C-Means Clustering
Terms of Objective Function
Possibilistic C-Means Clustering
First term: $\sum_{i=1}^{c} \sum_{k=1}^{n} t_{ik}^m D_{ik}^2 = J_m$
– Unconstrained optimization of the first term will lead to the trivial solution $t_{ik} = 0, \ \forall i, k$
Second term: $\sum_{i=1}^{c} w_i \sum_{k=1}^{n} (1 - t_{ik})^m$
– The second term acts as a penalty which tries to bring typicality values towards 1.
20. Page 20 of 30 Fuzzy C-Means Clustering
Minimizing Objective Function (OF)
Possibilistic C-Means Clustering
Rows and columns of the OF are independent, so each term can be minimized separately
– ik-th term of OF: $P_m^{ik}(T,V;\mathbf{w}) = t_{ik}^m D_{ik}^2 + w_i (1 - t_{ik})^m$
First order necessary conditions for a minimum
– Typicality values
$$t_{ik} = \frac{1}{1 + \left( \frac{D_{ik}^2}{w_i} \right)^{\frac{1}{m-1}}}, \ \forall i, k$$
– Cluster centers (same as FCM, with $t_{ik}$ in place of $u_{ik}$)
$$v_i = \frac{\sum_{k=1}^{n} t_{ik}^m x_k}{\sum_{k=1}^{n} t_{ik}^m}, \ \forall i$$
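The typicality update can be sketched in a few lines of NumPy. The helper name `pcm_typicality` and the example numbers are assumptions made for illustration; the function takes the squared distances directly and assumes m > 1. Note that each value depends only on its own distance and its own cluster weight.

```python
import numpy as np

def pcm_typicality(D2, w, m=2.0):
    """t_ik = 1 / (1 + (D_ik^2 / w_i)^(1 / (m - 1)))."""
    D2 = np.asarray(D2, dtype=float)            # (c, n) squared distances
    w = np.asarray(w, dtype=float)[:, None]     # (c, 1) cluster weights
    return 1.0 / (1.0 + (D2 / w) ** (1.0 / (m - 1.0)))

# One cluster with weight 4 and three points at squared distances 0, 4 and 12.
T = pcm_typicality([[0.0, 4.0, 12.0]], w=[4.0])
# A point on the center is fully typical (1.0); at D^2 = w_i the typicality is 0.5.
```

In this reading, $w_i$ acts as the squared distance at which typicality drops to one half, so far-away points get small values instead of being forced to share a total of 1.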
21. Page 21 of 30 Fuzzy C-Means Clustering
Alternating Optimization, Again
Possibilistic C-Means Clustering
Similar to FCM-AO algorithm (replace equations of necessary conditions)
Terminal outputs of FCM-AO recommended as a good way to initialize PCM-AO
– Cluster centers: Final cluster centers of FCM-AO
– Weights:
$$w_i = K \frac{\sum_{k=1}^{n} u_{ik}^m D_{ik}^2}{\sum_{k=1}^{n} u_{ik}^m}, \ K > 0$$
• Typ. K = 1
• $w_i$ is proportional to the average within cluster fluctuation
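The weight initialization from a terminal FCM partition can be written as a weighted average of squared distances. The sketch below assumes U and the squared distances D2 are both c × n arrays; the helper name `pcm_weights` and the numbers are hypothetical.

```python
import numpy as np

def pcm_weights(U, D2, m=2.0, K=1.0):
    """w_i = K * sum_k u_ik^m D_ik^2 / sum_k u_ik^m."""
    W = np.asarray(U, dtype=float) ** m
    D2 = np.asarray(D2, dtype=float)
    return K * (W * D2).sum(axis=1) / W.sum(axis=1)

# Crisp toy partition: cluster 1 holds points 1-2, cluster 2 holds points 3-4.
U = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
D2 = np.array([[1.0, 3.0, 50.0, 60.0],
               [40.0, 50.0, 2.0, 6.0]])
w = pcm_weights(U, D2)   # averages of the in-cluster squared distances: [2.0, 4.0]
```

Larger K widens the region in which points count as typical for a cluster, since typicality reaches 0.5 at a squared distance of $w_i$.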
22. Page 22 of 30 Fuzzy C-Means Clustering
Identify Outliers
Possibilistic C-Means Clustering
$x_{12}$ is recognized as an outlier by PCM
– FCM on $X_{12}$: $u_{1,6} = 0.5$, $u_{2,6} = 0.5$, $u_{1,12} = 0.5$, $u_{2,12} = 0.5$
– PCM on $X_{12}$: $t_{1,6} = 0.63$, $t_{2,6} = 0.63$, $t_{1,12} = 0.07$, $t_{2,12} = 0.07$
– Parameters: $m = 2.0$, $w_1 = w_2 = 7.88$
23. Page 23 of 30 Fuzzy C-Means Clustering
Pros and Cons
Possibilistic C-Means Clustering
Advantage
– Can cluster noisy data samples (outliers receive low typicality)
Disadvantage
– Very sensitive to initialization; a good one is needed
– Coincident clusters may result
• Because the columns and rows of the typicality matrix are independent of each other
• Sometimes this could be advantageous (start with a large value of c and get fewer distinct clusters)
24. Page 24 of 30 Fuzzy C-Means Clustering
Idea
Fuzzy-Possibilistic C-Means
$u_{ik}$ is a function of $x_k$ and all c centroids
$t_{ik}$ is a function of $x_k$ and $v_i$ alone
Both are important
– To classify a data point, the cluster centroid has to be closest to the data point → Membership
– For estimating the centroids → Typicality, for alleviating the undesirable effect of outliers
25. Page 25 of 30 Fuzzy C-Means Clustering
Fuzzy-Possibilistic C-Means (FPCM), OF and Constraints
Objective function
$$\min_{(U,T,V)} J_{m,\eta}(U,T,V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik}^m + t_{ik}^\eta) D_{ik}^2$$
Constraints
– Membership: $\sum_{i=1}^{c} u_{ik} = 1, \ \forall k$
– Typicality: $\sum_{k=1}^{n} t_{ik} = 1, \ \forall i$
• Because of this constraint, the typicality of a data point to a cluster is normalized with respect to the distances of all n data points from that cluster (see next slide)
26. Page 26 of 30 Fuzzy C-Means Clustering
Minimizing OF
Fuzzy-Possibilistic C-Means
Membership values
– Same as FCM, but the resulting values may be different
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{D_{ik}}{D_{jk}} \right)^{\frac{2}{m-1}} \right]^{-1}, \ \forall i, k$$
Typicality values
– Depends on all data
$$t_{ik} = \left[ \sum_{j=1}^{n} \left( \frac{D_{ik}}{D_{ij}} \right)^{\frac{2}{\eta-1}} \right]^{-1}, \ \forall i, k$$
– Typical … in the interval [3,5]
Cluster centers
$$v_i = \frac{\sum_{k=1}^{n} (u_{ik}^m + t_{ik}^\eta) x_k}{\sum_{k=1}^{n} (u_{ik}^m + t_{ik}^\eta)}, \ \forall i$$
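One FPCM update step can be sketched as below. This is an illustration under assumptions, not the author's code: the function name `fpcm_step`, the name `eta` for the typicality exponent (its symbol did not survive in the slide text), the Euclidean distance, the distance floor `eps`, and the toy data are all chosen for the example; both exponents are assumed to be greater than 1.

```python
import numpy as np

def fpcm_step(X, V, m=2.0, eta=2.0, eps=1e-12):
    """One FPCM update: memberships U, typicalities T, then new centers."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    D2 = np.maximum(D2, eps)                   # assumed guard against zero distance
    iu = D2 ** (-1.0 / (m - 1.0))
    U = iu / iu.sum(axis=0, keepdims=True)     # normalized over clusters: columns sum to 1
    it = D2 ** (-1.0 / (eta - 1.0))
    T = it / it.sum(axis=1, keepdims=True)     # normalized over data points: rows sum to 1
    W = U ** m + T ** eta
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V_new

# Two groups on the x-axis plus one distant point equidistant from both centers.
X = np.array([[-4.0, 0.0], [-3.0, 0.0], [-2.0, 0.0],
              [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [0.0, 10.0]])
V = np.array([[-3.2, 0.1], [3.2, 0.1]])
U, T, V_new = fpcm_step(X, V)
```

The distant point still gets memberships of 0.5 in each cluster, but its typicality is much smaller than that of the points near a center, so it contributes less to the center update than it would through memberships alone.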
27. Page 27 of 30 Fuzzy C-Means Clustering
FPCM on $X_{12}$
Fuzzy-Possibilistic C-Means
– U values: $u_{1,6} = 0.5$, $u_{2,6} = 0.5$, $u_{1,12} = 0.5$, $u_{2,12} = 0.5$
– T values: $t_{1,6} = 0.023$, $t_{2,6} = 0.023$, $t_{1,12} = 0.002$, $t_{2,12} = 0.002$
– Initial parameters: $m = 2.0$, $\eta = 2.0$, $\varepsilon = 0.00001$
28. Page 28 of 30 Fuzzy C-Means Clustering
IRIS Data Samples
Comparison of FCM, PCM and FPCM
Iris plants database
– 4-dimensional data set containing 50 samples each of three types of IRIS flowers
– n = 150, p = 4, c = 3
– Features
• Sepal length, sepal width, petal length, petal width
– Classes
• Setosa, Versicolor, Virginica
[Figure: Iris setosa, Iris versicolor and Iris virginica, with a petal labeled]
30. Page 30 of 30 Fuzzy C-Means Clustering
Conclusions and Future Works
Err-T-FPCM <= Err-U-FPCM <= Err-FCM
– Could be considered true in general
Mismatch
– The number of iterations required for FPCM is in general not half of that for FCM, contrary to what is mentioned in [PalPB97]; is there a mistake in my implementation?
Comparison of algorithms using other “noisy” data sets
32. Page 32 of 30 Fuzzy C-Means Clustering
References
[Bez81] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, NY, 1981.
[BezKKP99] J. C. Bezdek, J. Keller, R. Krishnapuram and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, 1999.
[KriK93] R. Krishnapuram and J. M. Keller, “A possibilistic approach to clustering,” IEEE Transactions on Fuzzy Systems, Vol. 1, No. 2, pp. 98-110, May 1993.
[PalPB97] N. R. Pal, K. Pal and J. C. Bezdek, “A mixed c-means clustering model,” Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, Vol. 1, pp. 11-21, Jul. 1997.
[YanRP94] J. Yan, M. Ryan and J. Power, Using Fuzzy Logic: Towards Intelligent Systems, Prentice Hall, 1994.
Editor's Notes
#3: Tomograph: Medical instrument which receives X-rays via a special method.
Magnetic Resonance Imager (MRI): Diagnostic technique which uses a magnetic field and radio waves to provide computerized images of internal body tissues.
Positron Emission Tomography (PET): Technique for creating detailed images of bodily tissues by injecting positron-laden material into the body and recording the gamma rays emitted over a period of approximately two hours.
#10: 1-norm(X) = max(sum(abs(X))) (the largest absolute column sum of X)
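The 1-norm in note #10 can be checked against NumPy, whose `numpy.linalg.norm` with `ord=1` on a 2-D array returns the largest absolute column sum. The matrix below is an arbitrary example.

```python
import numpy as np

X = np.array([[1.0, -2.0],
              [-3.0, 4.0]])

by_hand = np.abs(X).sum(axis=0).max()   # column sums of |X| are 4 and 6
by_numpy = np.linalg.norm(X, 1)         # matrix 1-norm
```

Both expressions give 6.0 for this matrix.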