A scalable collaborative filtering framework based on co-clustering

Authors: Thomas George and Srujana Merugu, in ICDM'05.
Presenter: Rei-Zhe Liu. Date: 2010/10/26.

Outline
• Introduction
• System architecture
• Experiments and results
• Conclusion

Introduction
• We propose a dynamic collaborative filtering
approach that can support the entry of new users,
items and ratings using a hybrid of incremental
and batch versions of the co-clustering algorithm.
• Empirical comparison of our approach with SVD,
NNMF and correlation-based collaborative filtering
techniques indicates comparable accuracy at a
much lower computational effort.

Problem definition (1/2)
• The approximate matrix for prediction is given by
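For reference, the approximation in the George and Merugu paper takes the following form. The notation follows the paper and is offered as a reconstruction rather than the slide's exact typesetting:

```latex
\hat{A}_{ij} = A^{COC}_{gh} + \left(A^{R}_{i} - A^{RC}_{g}\right) + \left(A^{C}_{j} - A^{CC}_{h}\right),
\qquad g = \rho(i), \quad h = \gamma(j)
```

Here \rho and \gamma map users and items to their clusters, A^{R}_{i} and A^{C}_{j} are the average ratings of user i and item j, and A^{COC}_{gh}, A^{RC}_{g} and A^{CC}_{h} are the average ratings of the corresponding co-cluster, user cluster and item cluster.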

Problem definition (2/2)
• We can now pose the prediction of unknown ratings
as a co-clustering problem: we seek the user and item
clustering that minimizes the approximation error
with respect to the known ratings of A.
• In the loss function, a 0/1 weight matrix W
(W_ij = 1 only where the rating A_ij is known) ensures
that only the known ratings contribute.
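A minimal Python sketch of this objective, assuming the prediction rule from the George and Merugu paper (co-cluster average plus a user offset plus an item offset) and a squared-error loss weighted by W. The function names, the dictionary layout of the summary statistics and the 0.0 fallback for empty groups are illustrative assumptions, not taken from the slides:

```python
def mean(values):
    """Average of an iterable; 0.0 for an empty one (a simplifying assumption)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def summary_stats(A, W, rho, gamma, k, l):
    """Averages over known ratings per user, item, user cluster, item cluster, co-cluster."""
    m, n = len(A), len(A[0])
    known = [(r, c) for r in range(m) for c in range(n) if W[r][c]]
    return {
        "user": [mean(A[r][c] for r, c in known if r == i) for i in range(m)],
        "item": [mean(A[r][c] for r, c in known if c == j) for j in range(n)],
        "user_cluster": [mean(A[r][c] for r, c in known if rho[r] == g)
                         for g in range(k)],
        "item_cluster": [mean(A[r][c] for r, c in known if gamma[c] == h)
                         for h in range(l)],
        "coc": [[mean(A[r][c] for r, c in known
                      if rho[r] == g and gamma[c] == h)
                 for h in range(l)] for g in range(k)],
    }


def predict(i, j, rho, gamma, stats):
    """Co-cluster average, corrected by the user's and the item's offsets."""
    g, h = rho[i], gamma[j]
    return (stats["coc"][g][h]
            + (stats["user"][i] - stats["user_cluster"][g])
            + (stats["item"][j] - stats["item_cluster"][h]))


def weighted_squared_error(A, W, rho, gamma, stats):
    """Sum of squared errors over entries with W[i][j] == 1 (known ratings only)."""
    return sum((A[i][j] - predict(i, j, rho, gamma, stats)) ** 2
               for i in range(len(A)) for j in range(len(A[0])) if W[i][j])


# Toy example: 2 users x 2 items, one unknown rating, a single co-cluster.
A = [[5, 3], [4, 0]]
W = [[1, 1], [1, 0]]
rho, gamma = [0, 0], [0, 0]
stats = summary_stats(A, W, rho, gamma, k=1, l=1)
prediction = predict(1, 1, rho, gamma, stats)            # 3.0
loss = weighted_squared_error(A, W, rho, gamma, stats)   # 0.5
```

The sketch only evaluates the loss for a fixed clustering; searching over rho and gamma is the co-clustering step itself and is not shown.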

System description
• P1 handles prediction and incremental training.
• P2 is responsible for static training.
• During incremental training, P1 also updates
the raw ratings.
• P2 performs co-clustering repeatedly by reading
A (the current ratings matrix) and updating
S (the summary statistics) when done.
• Data objects A and S are each stored in two parts:
(a) a stable part and (b) an increment part.
• At the end of each co-clustering run, the two parts
are merged to obtain a new set of stable values.
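The stable/increment split can be sketched as follows. The slides do not say how S is represented, so this example assumes each summary statistic is a mean kept as a (sum, count) pair; the class and method names are hypothetical:

```python
class TwoPartMean:
    """One summary statistic (a mean) kept as a stable part plus an increment part."""

    def __init__(self):
        self.stable_sum, self.stable_count = 0.0, 0
        self.inc_sum, self.inc_count = 0.0, 0

    def add_rating(self, rating):
        """Incremental training path: only the increment part is touched."""
        self.inc_sum += rating
        self.inc_count += 1

    def value(self):
        """Current mean, read from both parts together."""
        count = self.stable_count + self.inc_count
        return (self.stable_sum + self.inc_sum) / count if count else 0.0

    def merge(self):
        """End of a co-clustering run: fold the increment into the stable part."""
        self.stable_sum += self.inc_sum
        self.stable_count += self.inc_count
        self.inc_sum, self.inc_count = 0.0, 0


stat = TwoPartMean()
stat.add_rating(4)
stat.add_rating(2)
before_merge = stat.value()   # 3.0
stat.merge()
after_merge = stat.value()    # 3.0, now held entirely in the stable part
```

Keeping the increment separate lets predictions reflect new ratings immediately while the batch run works from a consistent stable snapshot.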

Data sets and Exp. Settings (1/2)
• Data set
  • MovieLens: a 943 × 1682 user-by-movie matrix with
    100,000 ratings on a scale of 1 to 5.
• Evaluation methodology
  • The prediction accuracy was measured using the mean
    absolute error (MAE), the average of the absolute
    values of the errors over all the predictions.
  • The static training time was estimated in terms of the
    CPU time taken for the core training routines (viz.
    co-clustering and SVD).
  • The prediction time was estimated by averaging the
    response time over all the predictions.
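The MAE described above is straightforward to compute. A short sketch, with a hypothetical function name and made-up ratings:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Absolute errors are 0.5, 0, 2 and 1, so the MAE is 3.5 / 4.
mae = mean_absolute_error([5, 3, 4, 1], [4.5, 3, 2, 2])   # 0.875
```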

Data sets and Exp. Settings (2/2)
• For evaluating the prediction accuracy, we created ten
random 80-20% train-test splits of the datasets and
averaged the results over the splits.
• We considered two scenarios: (i) static testing, where
the known ratings do not change, and (ii) dynamic
testing, where the ratings are updated incrementally.
• Algorithms
  • We compared the performance of our co-clustering-based
    approach with SVD [13], NNMF [10] and classic
    correlation-based collaborative filtering [12].
  • An incremental SVD-based approach [14] using a
    folding-in technique was also implemented in order to
    evaluate the prediction accuracy in dynamic scenarios
    with changing ratings.
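The ten random 80-20 splits described above could be produced along these lines. This is a sketch under assumptions: ratings are held as a list of (user, item, rating) triples, the `evaluate` callback is a hypothetical stand-in for training a model and returning its MAE on the test part, and the seed is arbitrary:

```python
import random


def train_test_splits(ratings, n_splits=10, train_fraction=0.8, seed=0):
    """Yield (train, test) pairs from independent random shuffles of the ratings."""
    rng = random.Random(seed)
    cut = round(len(ratings) * train_fraction)
    for _ in range(n_splits):
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


def average_over_splits(ratings, evaluate, **split_options):
    """Average the score returned by evaluate(train, test) over all splits."""
    scores = [evaluate(train, test)
              for train, test in train_test_splits(ratings, **split_options)]
    return sum(scores) / len(scores)


# Toy data: 100 hypothetical (user, item, rating) triples.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
splits = list(train_test_splits(ratings))
```

Each split is an independent reshuffle, so a rating may land in the test part of several splits; this differs from ten-fold cross-validation.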

Evaluation (1/3)
k = l = SVD rank = NNMF rank = 3
k = l = SVD rank = 3

Evaluation (2/3)
Dataset: Mov1
Dataset: MovieLens
CoC: (m+n+kl-k-l)
NNMF, SVD: (m+n)(k+l)
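To get a feel for the gap between the two expressions, they can be evaluated directly. The slide does not state what they count, so the sketch below only computes them as written; the function names and the dimensions (m = 1000, n = 2000) are hypothetical, with k = l = 3 taken from the evaluation settings:

```python
def coc_expression(m, n, k, l):
    """The slide's CoC expression: m + n + kl - k - l."""
    return m + n + k * l - k - l


def factorization_expression(m, n, k, l):
    """The slide's NNMF/SVD expression: (m + n)(k + l)."""
    return (m + n) * (k + l)


coc_value = coc_expression(1000, 2000, 3, 3)                       # 3003
factorization_value = factorization_expression(1000, 2000, 3, 3)   # 18000
```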

Evaluation (3/3)
Dataset: MovieLens

Conclusion
• In this paper, we presented a new dynamic
collaborative filtering approach based on simultaneous
clustering of users and items.
• Empirical results indicate that our approach can provide
high-quality predictions at a much lower computational
cost than traditional correlation-based and SVD-based
approaches.
