Customer Segmentation using Clustering

0
Confidential. © Stream Intelligence Ltd. All rights reserved.
Introduction to Clustering

1
Agenda
1 Introduction: Business Case
2 Clustering
3 Hierarchical Clustering
4 K-means Clustering

2
Business Case
1

3
Business Case – Predicting Successful Music Production
[Figure: four clusters labeled music A, music B, music C, and music D]
• Target is to appear in Billboard's weekly Top 40
• Cost per single could be up to 300K USD
• Music Intelligence Solution uses clustering to predict whether a song will be
accepted by the market
• Increases the success rate from 1 out of 10 to 8 out of 10

4
Clustering
2

5
Statistical Learning Categorization
Statistical Learning
• Unsupervised Learning: Clustering
• Supervised Learning: Predictive Model

6
Clustering
• Clustering is the process of grouping a set of physical or abstract objects into clusters
(for example, customers or products)
• A cluster is a collection of data objects that are similar to one another within the same
cluster and dissimilar to the objects in other clusters
• Similarity is calculated based on the distance between points
• A common distance measure is Euclidean distance
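As a small illustration of the distance measure, the sketch below computes the Euclidean distance between two made-up points, first by hand and then with base R's dist(). The points a and b are hypothetical values chosen only for this example.

```r
# Two hypothetical observations with two features each
a <- c(1, 2)
b <- c(4, 6)

# Euclidean distance by hand: square root of the summed squared differences
manual <- sqrt(sum((a - b)^2))

# The same distance using base R's dist(), which defaults to Euclidean
builtin <- dist(rbind(a, b))

print(manual)
print(builtin)
```

The smaller the distance, the more similar the two objects are considered to be.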

7
Hierarchical Clustering
3

8
Hierarchical Clustering
• Start with each data point in its own cluster

9
Hierarchical Clustering
• Combine the two nearest clusters (Euclidean distance, centroid linkage)
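The two steps above (start with every point in its own cluster, then repeatedly combine the two nearest clusters) can be sketched with base R's hclust(). The six points are made up for illustration. Squaring the Euclidean distances before centroid linkage is an assumption based on the usual convention for that method, and this is not necessarily what the exercise script does.

```r
# Hypothetical toy data: two well-separated groups of three points
points <- rbind(c(0, 0), c(0, 1), c(1, 0),
                c(10, 10), c(10, 11), c(11, 10))

# Pairwise Euclidean distances, squared for use with centroid linkage
d <- dist(points, method = "euclidean")^2

# Agglomerative clustering: each point starts alone, nearest clusters merge
hc <- hclust(d, method = "centroid")

# Heights at which successive merges happened (the dendrogram's vertical axis)
print(hc$height)

# Cut the tree into two clusters and list each point's cluster label
groups <- cutree(hc, k = 2)
print(groups)
```

Calling plot(hc) on the result draws the dendrogram.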

10
Let's Practice
• The data for this exercise was downloaded from www.movielens.org
• Open “clustering_movie.R”
• The movies in the dataset are categorized as belonging to different genres:
a. Action
b. Comedy
c. Sci-Fi
d. etc.

11
Dendrogram
Heights represent the distance between points/clusters

12
Finding Meaningful Clusters
• How can you see which cluster has the most action movies?
Use this command:
tapply(movies$Action, clusterGroups, mean)
• Exercise: Can you find the characteristics of each cluster?
Hint:
- Add the cluster as one of the variables in the data
- Load the dplyr library
- Use the aggregate and summarise functions
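A minimal sketch of the hint, using only base R so it runs without extra packages. The movies data frame and the clusterGroups vector below are tiny made-up stand-ins for the objects in the exercise script, and the Comedy column is assumed to be a 0/1 genre flag like Action.

```r
# Hypothetical stand-in for the exercise data: 0/1 genre flags per movie
movies <- data.frame(
  Title  = c("M1", "M2", "M3", "M4", "M5", "M6"),
  Action = c(1, 1, 0, 0, 1, 0),
  Comedy = c(0, 0, 1, 1, 0, 1)
)

# Hypothetical cluster labels, e.g. as returned by cutree()
clusterGroups <- c(1, 1, 2, 2, 1, 2)

# Share of action movies in each cluster
action_share <- tapply(movies$Action, clusterGroups, mean)
print(action_share)

# Add the cluster as a variable, then profile every genre per cluster
movies$cluster <- clusterGroups
profile <- aggregate(cbind(Action, Comedy) ~ cluster, data = movies, FUN = mean)
print(profile)
```

With dplyr loaded, the same profile could be produced with group_by() followed by summarise().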

13
Common scenario
Tips:
- Normalize the data so that variables on a large scale (such as Revenue) do not dominate the distance
Movie   Action   Romance   Rating   Revenue (in USD)
A       1        1         5        200
B       0        1         4        150
C       0        0         3        50
D       1        1         4        120
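To see why normalizing matters for the table above, the sketch below compares distances before and after standardizing each column with base R's scale(). Using z-scores (mean 0, standard deviation 1) is one common choice of normalization and is an assumption here, since the slide does not name a method.

```r
# The four movies from the table above
movies <- data.frame(
  Action  = c(1, 0, 0, 1),
  Romance = c(1, 1, 0, 1),
  Rating  = c(5, 4, 3, 4),
  Revenue = c(200, 150, 50, 120),
  row.names = c("A", "B", "C", "D")
)

# Raw distances are driven almost entirely by Revenue
raw_dist <- dist(movies)
print(round(raw_dist, 2))

# Standardize every column to mean 0 and standard deviation 1
scaled <- scale(movies)

# After scaling, every variable contributes on a comparable footing
scaled_dist <- dist(scaled)
print(round(scaled_dist, 2))
```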

14
K-means Clustering
4

15
K-Means Clustering
Objective: High-Level Description
1. Group the data into K clusters by:
a. Determining the K centroids
b. Assigning each data point to the nearest centroid
2. The algorithm works by iterating between these two stages until the cluster assignments converge
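The two stages can be written out by hand in a few lines of R. This is a teaching sketch under stated assumptions, not the implementation behind R's kmeans(): the function name simple_kmeans, the toy points, and the choice of the first two rows as starting centroids are all hypothetical.

```r
# Hand-rolled K-means: alternate assignment and centroid update until stable
simple_kmeans <- function(x, centers, max_iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  assignment <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Stage 1: squared distance from every point to every centroid,
    # then assign each point to its nearest centroid
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d2, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Stage 2: move each centroid to the mean of its assigned points
    for (j in seq_len(k)) {
      members <- assignment == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = assignment, centers = centers, iterations = iter)
}

# Hypothetical toy data: two groups of three points
x <- rbind(c(1, 1), c(1, 2), c(2, 1),
           c(8, 8), c(8, 9), c(9, 8))

# Deliberately start both centroids inside the first group
fit <- simple_kmeans(x, centers = x[c(1, 2), ])
print(fit$cluster)
print(fit$centers)
```

In practice the built-in kmeans() function would normally be used instead; the loop is only meant to make the two stages visible.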

16
K-Means Illustrations
Suppose k=3

17
Iteration = 0
1. Start with random positions of centroids.

18
Iteration = 1
2. Assign each data point to the closest centroid

19
Iteration = 1
3. Move each centroid to the center of its assigned points (recalculating C)

20
Iteration = 3
4. Iterate until the cost is minimal

21
Iteration = 3
4. Iterate until the cost is minimal
What could potentially go wrong?
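One commonly cited answer is that the result depends on where the centroids start: a poor starting position can converge to a local minimum of the cost rather than the best grouping. The sketch below shows this on made-up data by passing explicit starting centroids to R's kmeans(). The points, both sets of starting centroids, and the choice of algorithm = "Lloyd" are assumptions for illustration.

```r
# Hypothetical toy data: three compact groups of four points each
x <- rbind(
  c(0, 0),  c(0, 1),  c(1, 0),  c(1, 1),    # group A
  c(10, 0), c(10, 1), c(11, 0), c(11, 1),   # group B
  c(5, 10), c(5, 11), c(6, 10), c(6, 11)    # group C
)

# Poor start: two centroids inside group A, one between groups B and C
bad_start <- rbind(c(0, 0.5), c(1, 0.5), c(8, 5))

# Good start: one centroid near the middle of each group
good_start <- rbind(c(0.5, 0.5), c(10.5, 0.5), c(5.5, 10.5))

bad  <- kmeans(x, centers = bad_start,  algorithm = "Lloyd")
good <- kmeans(x, centers = good_start, algorithm = "Lloyd")

# Total within-cluster sum of squares: lower means tighter clusters
print(bad$tot.withinss)
print(good$tot.withinss)
```

A common mitigation is to run the algorithm from several random starts and keep the best result, which kmeans() supports through its nstart argument.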

22
Optimum Number of Clusters: Illustration
TSS = total sum of squared errors
K = number of clusters
[Figure: TSS plotted against K, with the optimum number of clusters marked]
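The curve of TSS against K can be computed with a short loop. This sketch assumes that TSS corresponds to the tot.withinss value returned by R's kmeans(), and it uses simulated data with three groups rather than the exercise data. The helper blob() and all parameter values are hypothetical.

```r
set.seed(42)  # fixed seed so the simulated data is reproducible

# Hypothetical helper: n points scattered around a center (cx, cy)
blob <- function(cx, cy, n = 20) cbind(rnorm(n, cx, 0.5), rnorm(n, cy, 0.5))

# Simulated data with three groups
x <- rbind(blob(0, 0), blob(6, 6), blob(0, 6))

# Run K-means for K = 1..6 and record the total within-cluster sum of squares
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

print(data.frame(K = ks, TSS = round(wss, 1)))
```

Plotting wss against ks and looking for the point where the curve flattens gives a candidate for the optimum number of clusters.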

23
Let's Practice
• We will use the credit card profile data (cc-profile.csv)
• Open “segmenting_customer.R”
Exercise:
• What is the optimum number of clusters?
• Describe the characteristics of each segment. Do you think the segments are meaningful?
