EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA)

                                                 Duc-Hieu Tran
                                             tdh.net [at] gmail.com

                                            Nanyang Technological University


                                                   July 27, 2010




The parameter estimation problem


  Outline



   The parameter estimation problem


   EM algorithm


   Probabilistic Latent Semantic Analysis


   References






  Introduction



   Given the prior probabilities P(ωi ) and the class-conditional densities p(x|ωi )
   =⇒ optimal (Bayes) classifier
           P(ωj |x) ∝ p(x|ωj )P(ωj )
           decide ωi if P(ωi |x) > P(ωj |x), ∀j ≠ i
   In practice, p(x|ωi ) is unknown and must be estimated from the training samples
   (e.g., by assuming p(x|ωi ) ∼ N (µi , Σi )).
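To make the plug-in approach concrete, here is a minimal NumPy sketch (my own illustration, not part of the original slides): class priors and Gaussian class-conditional parameters are estimated from labelled samples, and a new point is assigned to the class maximizing p(x|ωi )P(ωi ). The function and variable names are assumptions of this sketch.

```python
import numpy as np

def fit_gaussian_classifier(X, y):
    """Estimate P(omega_i) and the parameters of p(x|omega_i) ~ N(mu_i, Sigma_i) per class."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    params = {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False)) for c in classes}
    return classes, priors, params

def predict(x, classes, priors, params):
    """Decide omega_i maximizing p(x|omega_i) P(omega_i), i.e. the posterior up to a constant."""
    def log_gauss(v, mu, cov):
        d = v - mu
        # log N(v; mu, cov) up to the additive constant -D/2 log(2*pi)
        return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))
    scores = [log_gauss(x, *params[c]) + np.log(priors[c]) for c in classes]
    return classes[int(np.argmax(scores))]
```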






  Frequentist vs. Bayesian schools


   Frequentist
           parameters – quantities whose values are fixed but unknown.
           the best estimate of their values – the one that maximizes the
           probability of obtaining the observed samples.
   Bayesian
           parameters – random variables having some known prior distribution.
           observation of the samples converts this to a posterior density;
           revising our opinion about the true values of the parameters.






  Examples

           training samples: S = {(x^(1) , y^(1) ), . . . , (x^(m) , y^(m) )}
           frequentist: maximum likelihood

               \max_{\theta} \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)

           Bayesian: P(θ) – prior, e.g., P(θ) ∼ N (0, I)

               P(\theta \mid S) \propto \Big[ \prod_{i=1}^{m} P(y^{(i)} \mid x^{(i)}, \theta) \Big] \, P(\theta)

               \theta_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid S)
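A tiny numeric illustration of the difference (a simplified example of mine, estimating a single Gaussian mean rather than the conditional model above): with likelihood y^(i) ∼ N(θ, 1) and prior θ ∼ N(0, 1), the MAP estimate is the sample mean shrunk towards the prior mean 0.

```python
import numpy as np

y = np.array([1.8, 2.1, 2.4, 1.9])        # observed samples, assumed y^(i) ~ N(theta, 1)
theta_ml = y.mean()                        # maximizes prod_i p(y^(i); theta)
theta_map = y.sum() / (len(y) + 1)         # maximizes [prod_i p(y^(i); theta)] * P(theta), P(theta) ~ N(0, 1)
print(theta_ml, theta_map)                 # 2.05 vs 1.64: MAP is pulled towards the prior mean 0
```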




EM algorithm


  Outline



   The parameter estimation problem


   EM algorithm


   Probabilistic Latent Semantic Analysis


   References






  An estimation problem

           training set of m independent samples: {x^(1) , x^(2) , . . . , x^(m) }
           goal: fit the parameters of a model p(x, z) to the data
           the log-likelihood:

               \ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z} p(x^{(i)}, z; \theta)

           explicitly maximizing ℓ(θ) might be difficult.
           z – latent random variable
           if z^(i) were observed, then maximum likelihood estimation would be easy.
           strategy: repeatedly construct a lower bound on ℓ (E-step) and
           optimize that lower bound (M-step).




  EM algorithm (1)
           digression: Jensen’s inequality.
           f – convex function; E [f (X )] ≥ f (E [X ])
           for each i, Qi – a distribution over z: Σ_z Qi (z) = 1, Qi (z) ≥ 0

               \ell(\theta) = \sum_{i} \log p(x^{(i)}; \theta)
                            = \sum_{i} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)
                            = \sum_{i} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \, \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}        (1)
                            \ge \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \, \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}      (2)

           where (2) follows by applying Jensen’s inequality to the concave function log.

      More detail . . .


  EM algorithm (2)
           for any set of distributions Qi , formula (2) gives a lower bound on ℓ(θ)
           how to choose Qi ?
           strategy: make the inequality hold with equality at our particular
           value of θ.
           require:

               \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c

           c – a constant that does not depend on z^(i)
           choose: Qi (z^(i) ) ∝ p(x^(i) , z^(i) ; θ)
           we know Σ_{z^(i)} Qi (z^(i) ) = 1, so

               Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)



  EM algorithm (3)


           Qi – the posterior distribution of z^(i) given x^(i) and the parameters θ
   EM algorithm: repeat until convergence
           E-step: for each i

               Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)

           M-step:

               \theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

   The algorithm converges, since ℓ(θ^(t) ) ≤ ℓ(θ^(t+1) ).
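As a concrete instance of this recipe (an illustration of mine, not taken from the slides), the sketch below runs EM for a two-component 1-D Gaussian mixture, where z^(i) is the unobserved component label: the E-step computes the responsibilities Qi (z^(i) ) and the M-step re-estimates θ = (π, µ, σ) in closed form.

```python
import numpy as np

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def em_gmm_1d(x, n_iter=50):
    """EM for a mixture of two 1-D Gaussians; z^(i) in {0, 1} is the latent component."""
    pi = 0.5
    mu = np.array([x.min(), x.max()])          # crude but distinct initial means
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: Q_i(z) = p(z | x^(i); theta), the posterior over the latent variable
        p0 = (1 - pi) * norm_pdf(x, mu[0], sigma[0])
        p1 = pi * norm_pdf(x, mu[1], sigma[1])
        r = p1 / (p0 + p1)                     # responsibility of component 1
        # M-step: maximize the lower bound w.r.t. theta = (pi, mu, sigma)
        pi = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        sigma = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                                  np.average((x - mu[1]) ** 2, weights=r)]))
    return pi, mu, sigma
```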





  EM algorithm (4)
   Digression: coordinate ascent algorithm (a toy numeric sketch follows below).

               \max_{\alpha} W(\alpha_1, . . . , \alpha_m)

           loop until convergence:
           for i ∈ {1, . . . , m}:

               \alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, . . . , \hat{\alpha}_i, . . . , \alpha_m)

   EM algorithm as coordinate ascent

               J(Q, \theta) = \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

           ℓ(θ) ≥ J(Q, θ)
           the EM algorithm can be viewed as coordinate ascent on J
           E-step: maximize w.r.t. Q
           M-step: maximize w.r.t. θ
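A toy numeric sketch of plain coordinate ascent (the objective and names here are mine, purely for illustration): each inner step solves the one-dimensional arg max in closed form, and the iterates climb to the unique maximum of a concave W.

```python
import numpy as np

def W(a):
    # a concave toy objective: the Hessian [[-2, -1], [-1, -2]] is negative definite
    return -(a[0] - 1) ** 2 - (a[1] - 3) ** 2 - a[0] * a[1]

a = np.zeros(2)
for _ in range(20):                            # "loop until convergence"
    # coordinate 1: dW/da1 = 0  =>  a1 = 1 - a2/2
    a[0] = 1.0 - a[1] / 2.0
    # coordinate 2: dW/da2 = 0  =>  a2 = 3 - a1/2
    a[1] = 3.0 - a[0] / 2.0
print(a, W(a))                                 # converges to the maximizer of W
```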
Probabilistic Latent Semantic Analysis


  Outline



   The parameter estimation problem


   EM algorithm


   Probabilistic Latent Semantic Analysis


   References






  Probabilistic Latent Semantic Analysis (1)

           set of documents D = {d1 , . . . , dN }
           set of words W = {w1 , . . . , wM }
           set of unobserved classes Z = {z1 , . . . , zK }
           conditional independence assumption:

                                         P(di , wj |zk ) = P(di |zk )P(wj |zk )                                (3)

           so,

               P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)                        (4)

               P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)
      More detail . . .





  Probabilistic Latent Semantic Analysis (2)

           n(di , wj ) – # of occurrences of word wj in doc. di
           Likelihood

               L = \prod_{i=1}^{N} P(d_i) = \prod_{i=1}^{N} \prod_{j=1}^{M} [P(d_i, w_j)]^{n(d_i, w_j)}
                 = \prod_{i=1}^{N} \prod_{j=1}^{M} \Big[ P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \Big]^{n(d_i, w_j)}

           log-likelihood ℓ = log L

               \ell = \sum_{i=1}^{N} \sum_{j=1}^{M} \Big[ n(d_i, w_j) \log P(d_i) + n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \Big]
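A small NumPy sketch of this log-likelihood (the array layout is my own assumption: n is the N×M count matrix, P_d holds P(di ), P_w_z is K×M for P(wj |zk ), P_z_d is N×K for P(zk |di )):

```python
import numpy as np

def plsa_log_likelihood(n, P_d, P_w_z, P_z_d, eps=1e-12):
    """log L = sum_i sum_j n(d_i, w_j) [ log P(d_i) + log sum_k P(w_j|z_k) P(z_k|d_i) ]."""
    P_w_d = P_z_d @ P_w_z                      # (N, M) matrix of sum_k P(z_k|d_i) P(w_j|z_k), eq. (4)
    return np.sum(n * (np.log(P_d + eps)[:, None] + np.log(P_w_d + eps)))
```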





  Probabilistic Latent Semantic Analysis (3)
           maximize w.r.t. P(wj |zk ), P(zk |di )
           ≈ maximize

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)

                 = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} Q_k(z_k) \, \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)}

                 \ge \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} Q_k(z_k) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)}

           choose

               Q_k(z_k) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} = P(z_k \mid d_i, w_j)
              More detail . . .



  Probabilistic Latent Semantic Analysis (4)


           ≈ maximize (w.r.t. P(wj |zk ), P(zk |di ))

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(z_k \mid d_i, w_j)}

           ≈ maximize

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]






  Probabilistic Latent Semantic Analysis (5)
   EM algorithm
           E-step: update

               P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)}

           M-step: maximize w.r.t. P(wj |zk ), P(zk |di )

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]

           subject to

               \sum_{j=1}^{M} P(w_j \mid z_k) = 1,   k ∈ {1, . . . , K}

               \sum_{k=1}^{K} P(z_k \mid d_i) = 1,   i ∈ {1, . . . , N}


  Probabilistic Latent Semantic Analysis (6)


   Solution of the maximization problem in the M-step:

               P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)}

               P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}

   where n(d_i) = \sum_{j=1}^{M} n(d_i, w_j)
      More detail . . .






  Probabilistic Latent Semantic Analysis (7)

   All together
           E-step:

               P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)}

           M-step:

               P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)}

               P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}
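These two updates translate directly into a short NumPy implementation. The sketch below is mine (the array names, random initialisation and dense N×M×K posterior are assumptions), not code from the slides or from Hofmann's paper; it is only meant to mirror the formulas above.

```python
import numpy as np

def plsa_em(n, K, n_iter=100, seed=0, eps=1e-12):
    """pLSA via EM.  n: (N, M) document-word count matrix; returns P(w|z) and P(z|d)."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    P_w_z = rng.random((K, M)); P_w_z /= P_w_z.sum(axis=1, keepdims=True)   # P(w_j|z_k)
    P_z_d = rng.random((N, K)); P_z_d /= P_z_d.sum(axis=1, keepdims=True)   # P(z_k|d_i)
    for _ in range(n_iter):
        # E-step: P(z_k|d_i,w_j) proportional to P(w_j|z_k) P(z_k|d_i), shape (N, M, K)
        post = P_z_d[:, None, :] * P_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + eps
        # M-step
        nw = n[:, :, None] * post                      # n(d_i,w_j) P(z_k|d_i,w_j)
        P_w_z = nw.sum(axis=0).T                       # numerator of P(w_j|z_k), summed over documents
        P_w_z /= P_w_z.sum(axis=1, keepdims=True) + eps
        P_z_d = nw.sum(axis=1)                         # sum_j n(d_i,w_j) P(z_k|d_i,w_j)
        P_z_d /= n.sum(axis=1, keepdims=True) + eps    # divide by n(d_i)
    return P_w_z, P_z_d
```

For a realistically sized vocabulary one would avoid materialising the full (N, M, K) posterior, but the dense version keeps the correspondence with the formulas obvious.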




References


  Outline



   The parameter estimation problem


   EM algorithm


   Probabilistic Latent Semantic Analysis


   References




References




           R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification,
           Wiley-Interscience, 2001.
           T. Hofmann, “Unsupervised learning by probabilistic latent semantic
           analysis,” Machine Learning, vol. 42, 2001, pp. 177–196.
           A. Ng, “Machine Learning (CS229)” course, Stanford University.




Appendix

   Generative model for word/document co-occurrence
       select a document di with probability (w.p.) P(di )
       pick a latent class zk w.p. P(zk |di )
       generate a word wj w.p. P(wj |zk )

               P(d_i, w_j) = \sum_{k=1}^{K} P(d_i, w_j \mid z_k) P(z_k) = \sum_{k=1}^{K} P(w_j \mid z_k) P(d_i \mid z_k) P(z_k)
                           = \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) P(d_i)
                           = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)

               P(d_i, w_j) = P(w_j \mid d_i) P(d_i)

               =⇒ P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)
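A short sampling sketch of this generative process (a hypothetical helper of mine, written only to illustrate the three steps above; P_d, P_z_d and P_w_z are assumed to hold P(d), P(z|d) and P(w|z) as rows of probabilities):

```python
import numpy as np

def sample_pairs(P_d, P_z_d, P_w_z, n_pairs, seed=0):
    """Draw (d_i, w_j) co-occurrence pairs: d ~ P(d), z ~ P(z|d), w ~ P(w|z)."""
    rng = np.random.default_rng(seed)
    N, K = P_z_d.shape
    M = P_w_z.shape[1]
    pairs = []
    for _ in range(n_pairs):
        d = rng.choice(N, p=P_d)           # select a document
        z = rng.choice(K, p=P_z_d[d])      # pick a latent class
        w = rng.choice(M, p=P_w_z[z])      # generate a word
        pairs.append((d, w))
    return pairs
```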





               P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)

           since \sum_{k=1}^{K} P(z_k \mid d_i) = 1, P(wj |di ) is a convex combination of the P(wj |zk )
           ≈ each document is modelled as a mixture of topics












               P(z_k \mid d_i, w_j) = \frac{P(d_i, w_j \mid z_k) P(z_k)}{P(d_i, w_j)}                                        (5)

                                    = \frac{P(w_j \mid z_k) P(d_i \mid z_k) P(z_k)}{P(d_i, w_j)}                             (6)

                                    = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(w_j \mid d_i)}                                (7)

                                    = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)}  (8)

    From (5) to (6) by the conditional independence assumption (3). From (7) to
    (8) by (4).








    Lagrange multipliers τk , ρi

               H = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]
                   + \sum_{k=1}^{K} \tau_k \Big( 1 - \sum_{j=1}^{M} P(w_j \mid z_k) \Big) + \sum_{i=1}^{N} \rho_i \Big( 1 - \sum_{k=1}^{K} P(z_k \mid d_i) \Big)

               \frac{\partial H}{\partial P(w_j \mid z_k)} = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{P(w_j \mid z_k)} - \tau_k = 0

               \frac{\partial H}{\partial P(z_k \mid d_i)} = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{P(z_k \mid d_i)} - \rho_i = 0








    from \sum_{j=1}^{M} P(w_j \mid z_k) = 1:

               \tau_k = \sum_{j=1}^{M} \sum_{i=1}^{N} P(z_k \mid d_i, w_j) n(d_i, w_j)

    from \sum_{k=1}^{K} P(z_k \mid d_i, w_j) = 1:

               \rho_i = n(d_i)

    =⇒ the M-step expressions for P(wj |zk ) and P(zk |di )






   Applying Jensen’s inequality


           f (x) = log(x), a concave function

               f\Big( E_{z^{(i)} \sim Q_i}\Big[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \Big] \Big) \;\ge\; E_{z^{(i)} \sim Q_i}\Big[ f\Big( \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \Big) \Big]
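A quick numeric sanity check of this inequality (arbitrary made-up numbers, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
ratio = rng.random(5) * 10            # stands in for p(x, z; theta)/Q(z) over 5 values of z
Q = rng.random(5); Q /= Q.sum()       # a distribution Q(z)
lhs = np.log(np.sum(Q * ratio))       # log E_Q[ratio]
rhs = np.sum(Q * np.log(ratio))       # E_Q[log ratio]
assert lhs >= rhs                     # Jensen's inequality for the concave log
```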




