A Scalable Collaborative Filtering
Framework based on Co-clustering
Authors: Thomas George and Srujana Merugu, ICDM’05.
Presenter: Rei-Zhe Liu. Date: 2010/10/26.
Outline
• Introduction
• System architecture
• Experiments and results
• Conclusion
Introduction
• We propose a dynamic collaborative filtering approach that supports the entry of new users, items, and ratings by using a hybrid of the incremental and batch versions of the co-clustering algorithm.
• An empirical comparison of our approach with SVD, NNMF, and correlation-based collaborative filtering techniques indicates comparable accuracy with much lower computational effort.
System architecture
Problem definition(1/2)
• The approximate matrix for prediction is given by
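As a sketch of the formula this bullet introduces, assuming the notation of the George and Merugu paper (the symbol names are assumptions here, since none survive in this text), the approximation can be written as:

```latex
\hat{A}_{ij} \;=\; A^{COC}_{gh}
  \;+\; \left(A^{R}_{i} - A^{RC}_{g}\right)
  \;+\; \left(A^{C}_{j} - A^{CC}_{h}\right),
\qquad g = \rho(i),\quad h = \gamma(j)
```

Here \rho and \gamma map user i and item j to their user cluster g and item cluster h; A^{R}_{i} and A^{C}_{j} are the average known ratings of user i and item j; and A^{COC}_{gh}, A^{RC}_{g}, A^{CC}_{h} are the average known ratings of the co-cluster, the user cluster, and the item cluster. In words: start from the co-cluster average, then correct for how far the user and the item deviate from their own cluster averages.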
Problem definition(2/2)
• We can now pose the prediction of unknown ratings as a co-clustering problem: we seek the user and item clusterings that minimize the approximation error with respect to the known ratings of A,
• where the indicator weight W (1 for a known rating, 0 otherwise) ensures that only the known ratings contribute to the loss function.
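The objective above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes a dense ratings matrix `A`, a 0/1 mask `W` of known ratings, and fixed cluster assignments `rho` (users) and `gamma` (items). The function names and the choice to predict from co-cluster, user, and item averages plus a squared-error loss are assumptions made for the example.

```python
import numpy as np


def cocluster_predict(A, W, rho, gamma, k, l):
    """Approximate A from averages over known ratings (hypothetical helper).

    A: m x n float ratings, W: m x n 0/1 mask of known ratings,
    rho: user-cluster index per user, gamma: item-cluster index per item.
    """
    def safe_mean(total, count):
        # Average over known ratings; 0.0 where nothing is known.
        return np.divide(total, count,
                         out=np.zeros_like(total, dtype=float),
                         where=count > 0)

    AW = A * W                      # zero out unknown entries
    R = np.eye(k)[rho]              # m x k one-hot user-cluster membership
    C = np.eye(l)[gamma]            # n x l one-hot item-cluster membership

    user_avg = safe_mean(AW.sum(axis=1), W.sum(axis=1))              # per user
    item_avg = safe_mean(AW.sum(axis=0), W.sum(axis=0))              # per item
    cocl_avg = safe_mean(R.T @ AW @ C, R.T @ W @ C)                  # k x l
    ucl_avg = safe_mean(R.T @ AW.sum(axis=1), R.T @ W.sum(axis=1))   # per user cluster
    icl_avg = safe_mean(C.T @ AW.sum(axis=0), C.T @ W.sum(axis=0))   # per item cluster

    # Co-cluster average, corrected by user and item deviations
    # from their own cluster averages.
    return (cocl_avg[rho][:, gamma]
            + (user_avg - ucl_avg[rho])[:, None]
            + (item_avg - icl_avg[gamma])[None, :])


def masked_loss(A, W, A_hat):
    """Squared error summed over known ratings only (W masks the rest)."""
    return float(np.sum(W * (A - A_hat) ** 2))
```

A full co-clustering run would alternate between updating `rho` and `gamma` to reduce `masked_loss`; that search is left out here.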
Algorithm(1/3)
Algorithm(2/3)
Algorithm(3/3)
System description
• P1 handles prediction and incremental training.
• P2 is responsible for static training.
• During incremental training, P1 also updates the raw ratings.
• P2 performs co-clustering repeatedly by reading A (the current ratings matrix) and updating S (the summary statistics) when done.
• The data objects A and S are each stored in two parts: (a) a stable part and (b) an increment part.
• At the end of each co-clustering run, the two parts are merged to obtain a new set of stable values.
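The stable/increment split can be sketched as follows. This is a simplified illustration under stated assumptions, not the system described on the slide: `SummaryStats` and `TwoPartStore` are hypothetical names, S is reduced to per-co-cluster rating sums and counts, and concurrency between P1 and P2 is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryStats:
    """Hypothetical stand-in for S: rating sums and counts per co-cluster."""
    sums: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def add(self, cocluster, rating):
        self.sums[cocluster] = self.sums.get(cocluster, 0.0) + rating
        self.counts[cocluster] = self.counts.get(cocluster, 0) + 1

    def merge(self, other):
        for key, total in other.sums.items():
            self.sums[key] = self.sums.get(key, 0.0) + total
            self.counts[key] = self.counts.get(key, 0) + other.counts[key]


class TwoPartStore:
    """Keeps a stable part and an increment part, merged after each run."""

    def __init__(self):
        self.stable = SummaryStats()
        self.increment = SummaryStats()

    def record_rating(self, cocluster, rating):
        # Incremental training (P1's role): only the increment part changes.
        self.increment.add(cocluster, rating)

    def average(self, cocluster):
        # Prediction reads both parts so new ratings are visible immediately.
        total = (self.stable.sums.get(cocluster, 0.0)
                 + self.increment.sums.get(cocluster, 0.0))
        count = (self.stable.counts.get(cocluster, 0)
                 + self.increment.counts.get(cocluster, 0))
        return total / count if count else None

    def merge_after_run(self):
        # End of a co-clustering run (P2's role): fold the increment
        # into the stable part and start a fresh increment.
        self.stable.merge(self.increment)
        self.increment = SummaryStats()
```

Keeping new ratings in a separate increment means the batch process can work from a consistent stable snapshot while predictions still reflect recent ratings.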
Experiments and results
Data sets and Exp. Settings(1/2)
• Data set
  - MovieLens: a 943 × 1682 user-by-movie matrix with 100,000 ratings in total, each on a scale from 1 to 5.
• Evaluation methodology
  - Prediction accuracy was measured using the mean absolute error (MAE), which is the average of the absolute values of the errors over all predictions.
  - Static training time was estimated as the CPU time taken by the core training routines (viz. co-clustering and SVD).
  - Prediction time was estimated by averaging the response time over all predictions.
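For reference, the MAE described above can be computed as in this minimal sketch; the function name and the list-based interface are assumptions for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, true ratings `[4, 2, 5]` against predictions `[3.5, 3, 5]` give absolute errors 0.5, 1, and 0, so the MAE is 0.5.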
Data sets and Exp. Settings(2/2)
  - To evaluate prediction accuracy, we created ten random 80-20% train-test splits of the datasets and averaged the results over the splits.
  - We considered two scenarios: (i) static testing, where the known ratings do not change, and (ii) dynamic testing, where the ratings are updated incrementally.
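The split protocol can be sketched as below. This is an illustrative assumption about how such splits might be drawn (independent shuffles with a fixed seed); the slides do not specify the sampling procedure, and `random_splits` is a hypothetical helper.

```python
import random


def random_splits(ratings, n_splits=10, test_fraction=0.2, seed=0):
    """Yield (train, test) pairs from independent random shuffles."""
    rng = random.Random(seed)          # fixed seed for repeatable splits
    n_test = int(round(len(ratings) * test_fraction))
    for _ in range(n_splits):
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]
```

Each split would be used to train on the 80% part and measure the error on the 20% part, with the final figure being the mean over the ten splits.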
• Algorithms
  - We compared the performance of our co-clustering-based approach with SVD [13], NNMF [10], and classic correlation-based collaborative filtering [12].
  - An incremental SVD-based approach [14] using a folding-in technique was also implemented to evaluate prediction accuracy in dynamic scenarios with changing ratings.
Evaluation(1/3)
k = l = SVD rank = NNMF rank = 3
k = l = SVD rank = 3
Evaluation(2/3)
Figure panels: dataset Mov1; dataset MovieLens.
CoC: (m + n + kl - k - l)
NNMF, SVD: (m + n)(k + l)
Evaluation(3/3)
Figure panel: dataset MovieLens.
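To get a feel for the two expressions given for CoC and for NNMF/SVD, they can be evaluated at concrete sizes. The slide text does not say what the expressions count, so this sketch only plugs in numbers; it assumes m = 943 users and n = 1682 movies (the MovieLens 100K dimensions) and k = l = 3 as in the Evaluation(1/3) settings.

```python
def coc_expression(m, n, k, l):
    """Value of (m + n + kl - k - l), the expression listed for CoC."""
    return m + n + k * l - k - l


def factorization_expression(m, n, k, l):
    """Value of (m + n)(k + l), the expression listed for NNMF and SVD."""
    return (m + n) * (k + l)


m, n = 943, 1682   # assumed MovieLens 100K dimensions
k = l = 3          # setting from the Evaluation(1/3) slide

print(coc_expression(m, n, k, l))            # 2628
print(factorization_expression(m, n, k, l))  # 15750
```

Under these assumptions the CoC expression is roughly six times smaller than the NNMF/SVD one.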
Conclusion
• In this paper, we presented a new dynamic collaborative filtering approach based on simultaneous clustering of users and items.
• Empirical results indicate that our approach can provide high-quality predictions at a much lower computational cost than traditional correlation-based and SVD-based approaches.