Introduction to Theano
Case study of word embedding models
Shashank Gupta
MS, SIEL IIIT-H
NumPy is fast, so why Theano?
NumPy is bound to the CPU
Bound by the Python interpreter (little scope for runtime optimizations such as lazy evaluation)
No symbolic differentiation
The finite difference method exists, but it is prone to numerical errors
SymPy supports symbolic differentiation, but its expression graphs are not optimized the way Theano's are
What’s Theano?
A math expression compiler
Compiles a math expression to optimized C code (CUDA code on the GPU)
Targets the GPU (falls back to the CPU when no GPU is available)
Strongly typed, in contrast to Python’s dynamic type system
Works as a GPU metaprogrammer (an abstraction on top of GPU programming)
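To illustrate what "strongly typed" means for a symbolic graph, here is a minimal pure-Python sketch in which every symbolic variable carries a dtype and a number of dimensions, and building an op with mismatched types fails when the graph is built, before any data is seen. The names `TensorType`, `Var` and `add` are hypothetical, not Theano's API. Real Theano is more permissive (it upcasts dtypes and broadcasts compatible shapes); this toy is deliberately strict to show where the check happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorType:
    """Hypothetical static type: element dtype plus number of dimensions."""
    dtype: str
    ndim: int


@dataclass(frozen=True)
class Var:
    """A symbolic variable: a name and a static type, but no value."""
    name: str
    type: TensorType


def add(a: Var, b: Var) -> Var:
    """Build a symbolic 'a + b', checking types at graph-construction time."""
    if a.type != b.type:
        raise TypeError(f"cannot add {a.type} and {b.type}")
    return Var(f"({a.name} + {b.name})", a.type)


fmatrix = TensorType("float32", 2)
fvector = TensorType("float32", 1)

x = Var("x", fmatrix)
y = Var("y", fmatrix)
z = add(x, y)            # fine: both are float32 matrices
print(z.name, z.type)    # (x + y) TensorType(dtype='float32', ndim=2)

try:
    add(x, Var("v", fvector))    # rejected before any data is seen
except TypeError as err:
    print("TypeError:", err)
```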
Alternatives to Theano
CUDA (PyCUDA is its Python wrapper)
Asynchronous Stochastic Gradient Descent: a smart way to parallelize SGD
Main idea: don’t lock the parameters for each update; let threads update them asynchronously
Could speed up ML algorithms more than matrix-vector parallelization
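The lock-free idea (in the style of Hogwild!) can be sketched with threads that share one parameter and overwrite it without any lock, sometimes working from a stale value. This is a toy under stated assumptions: a hypothetical 1-D least-squares problem y = 3x and made-up worker settings. Because of CPython's global interpreter lock it shows the update pattern only, not a real speedup.

```python
import random
import threading

# Shared model parameter, read and written by all workers without a lock.
params = [0.0]

# Toy, noise-free dataset for y = 3 * x (an assumption for illustration).
data = [(i / 100.0, 3.0 * i / 100.0) for i in range(1, 101)]


def worker(seed: int, steps: int, lr: float) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(data)
        w = params[0]                  # may already be stale when we write back
        grad = 2.0 * (w * x - y) * x   # d/dw of (w * x - y) ** 2
        params[0] = w - lr * grad      # write without holding any lock


threads = [threading.Thread(target=worker, args=(s, 2000, 0.1)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"learned w = {params[0]:.3f}")  # close to 3.0
```

A lost or stale update only costs a little progress here, which is the bet asynchronous SGD makes in exchange for never waiting on a lock.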
Automatic Differentiation
Crude way: the finite difference method
df/dx ≈ (f(x + h) - f(x - h)) / (2h)
Prone to coding bugs
Prone to numerical (truncation and round-off) errors
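A minimal sketch of the central-difference formula used as a gradient check against a hand-derived derivative. The test function f(x) = x³ and the step sizes are assumptions chosen for illustration: a large h gives truncation error (for x³ the estimate is exactly 3x² + h²), while a very small h is dominated by floating-point round-off.

```python
def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) with (f(x + h) - f(x - h)) / (2h).

    Note the parentheses: writing '/ 2 * h' would multiply by h instead
    of dividing by it, a typical coding bug in hand-written checks.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x: float) -> float:
    return x ** 3


def df(x: float) -> float:  # analytic derivative, for comparison
    return 3 * x ** 2


x0 = 2.0
for h in (1e-1, 1e-5, 1e-13):
    approx = central_diff(f, x0, h)
    rel_err = abs(approx - df(x0)) / abs(df(x0))
    print(f"h={h:g}  approx={approx:.10f}  rel_err={rel_err:.2e}")
```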
Symbolic differentiation
Calculates the gradient analytically
For a given math expression, construct a symbolic computation graph
Ref: http://cs231n.github.io/optimization-2/
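To make "calculate the gradient analytically" concrete, here is a minimal pure-Python sketch of symbolic differentiation on an expression tree: `diff` returns a new expression (not a number) built with the sum and product rules, which can then be evaluated. The classes are hypothetical and support only +, * and constants; Theano's own gradient machinery (`theano.tensor.grad`) works on its graph of ops and is far more complete.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Add:
    a: "Expr"
    b: "Expr"


@dataclass(frozen=True)
class Mul:
    a: "Expr"
    b: "Expr"


Expr = Union[Const, Sym, Add, Mul]


def diff(e: Expr, wrt: str) -> Expr:
    """Return the symbolic derivative d(e)/d(wrt) as a new expression."""
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Sym):
        return Const(1.0 if e.name == wrt else 0.0)
    if isinstance(e, Add):   # sum rule
        return Add(diff(e.a, wrt), diff(e.b, wrt))
    if isinstance(e, Mul):   # product rule
        return Add(Mul(diff(e.a, wrt), e.b), Mul(e.a, diff(e.b, wrt)))
    raise TypeError(f"unknown node {e!r}")


def evaluate(e: Expr, env: dict) -> float:
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Sym):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.a, env) + evaluate(e.b, env)
    if isinstance(e, Mul):
        return evaluate(e.a, env) * evaluate(e.b, env)
    raise TypeError(f"unknown node {e!r}")


# f(x, y) = x * x * y + 3 * x
x, y = Sym("x"), Sym("y")
f = Add(Mul(Mul(x, x), y), Mul(Const(3.0), x))
df_dx = diff(f, "x")                           # expression for 2xy + 3 (unsimplified)
print(evaluate(df_dx, {"x": 2.0, "y": 5.0}))   # 23.0
```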
Backprop using symbolic differentiation
Once the graph is built, the forward pass is just the flow of inputs through the graph to compute the output
Backprop goes backwards from the last node, computing local gradients at each node and combining them with the chain rule to obtain the global gradient
Example: (figure)
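The slide's worked example was a figure that is not recoverable here. As a stand-in (an assumption, in the spirit of the cs231n notes cited above), here is backprop by hand on f(x, y, z) = (x + y) * z with input values chosen for illustration: the forward pass computes each node's value, then the backward pass multiplies each node's local gradient by the gradient flowing in from above.

```python
def forward_backward(x: float, y: float, z: float):
    # Forward pass: evaluate each node of the graph q = x + y, f = q * z.
    q = x + y
    f = q * z

    # Backward pass: start from df/df = 1 at the output node.
    df = 1.0
    # Local gradients of f = q * z are df/dq = z and df/dz = q.
    dq = z * df
    dz = q * df
    # Local gradients of q = x + y are 1; chain rule multiplies by dq.
    dx = 1.0 * dq
    dy = 1.0 * dq
    return f, (dx, dy, dz)


value, grads = forward_backward(-2.0, 5.0, -4.0)
print(value, grads)  # -12.0 (-4.0, -4.0, 3.0)
```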
Cont.
Each entity in the computation graph is a symbolic variable (a ‘Variable’ node in the computational graph)
Each operation is an ‘Op’ node (Theano-specific)
An ‘Op’ node takes some inputs and produces some outputs
An ‘Apply’ node applies an Op to specific inputs
A ‘Type’ node holds the type information of a symbolic variable
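A toy version of this structure, to show how the pieces relate: `Variable` nodes carry a `Type`, an `Apply` node records which `Op` was applied to which input variables, and each output variable points back to the `Apply` node that produced it through its `owner`. The class names loosely mirror the Theano concepts on the slide, but this is a hypothetical pure-Python sketch, not Theano's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Type:
    dtype: str


@dataclass(eq=False)
class Variable:
    type: Type
    name: str = ""
    owner: Optional["Apply"] = None   # the Apply node that produced it, if any


class Op:
    """An operation definition; it is not tied to particular inputs."""

    def __init__(self, name: str):
        self.name = name

    def __call__(self, *inputs: Variable) -> Variable:
        out = Variable(inputs[0].type)   # toy rule: output type = first input's type
        out.owner = Apply(self, list(inputs), [out])
        return out


@dataclass(eq=False)
class Apply:
    op: Op
    inputs: List[Variable]
    outputs: List[Variable] = field(default_factory=list)


def show(v: Variable, depth: int = 0) -> None:
    """Walk the graph backwards from a variable, printing its structure."""
    pad = "  " * depth
    if v.owner is None:
        print(f"{pad}Variable {v.name or '?'} : {v.type.dtype} (input)")
        return
    print(f"{pad}Variable {v.name or '?'} : {v.type.dtype} <- Apply({v.owner.op.name})")
    for inp in v.owner.inputs:
        show(inp, depth + 1)


add, mul = Op("add"), Op("mul")
x = Variable(Type("float64"), "x")
y = Variable(Type("float64"), "y")
z = mul(add(x, y), x)
z.name = "z"
show(z)
```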
Cont.
This is an abstract representation of the math expression
Think of it as the intermediate representation (IR) in a standard compiler pipeline
This is how Theano represents the computational graph internally
Cont.
Ref: http://deeplearning.net/software/theano/extending/graphstructures.html
Optimization
Once this ‘intermediate representation’ is generated, Theano optimizes the graph for efficient computation
Think of it as the optimization phase of a compiler
It reorders operations, removes redundant expressions, etc. to produce an ‘equivalent’ graph that gives the same output as the original graph
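Two of the simplest rewrites of this kind, sketched on a hypothetical tuple-based graph: constant folding, which precomputes subexpressions whose inputs are all constants, and merging of duplicate subexpressions, which makes identical subtrees share one node so they are computed once. Theano's optimizer applies many more rewrites; this only illustrates the "same output, cheaper graph" idea.

```python
from typing import Dict, Union

Expr = Union[float, str, tuple]   # a number, an input name, or (op, *args)
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def add(a: Expr, b: Expr) -> Expr:
    return ("add", a, b)


def mul(a: Expr, b: Expr) -> Expr:
    return ("mul", a, b)


def fold_constants(e: Expr) -> Expr:
    """Replace every op whose inputs are all numbers by its computed value."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [fold_constants(a) for a in args]
    if all(isinstance(a, (int, float)) for a in args):
        return OPS[op](*args)
    return (op, *args)


def merge(e: Expr, table: Dict[Expr, Expr]) -> Expr:
    """Make structurally equal subtrees share one node object (hash-consing)."""
    if isinstance(e, tuple):
        op, *args = e
        e = (op, *(merge(a, table) for a in args))
    return table.setdefault(e, e)


def count_nodes(e: Expr, seen=None) -> int:
    """Count distinct op nodes; a shared subtree object is counted once."""
    if seen is None:
        seen = set()
    if not isinstance(e, tuple) or id(e) in seen:
        return 0
    seen.add(id(e))
    return 1 + sum(count_nodes(a, seen) for a in e[1:])


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    if isinstance(e, tuple):
        op, *args = e
        return OPS[op](*(evaluate(a, env) for a in args))
    return env[e] if isinstance(e, str) else e


# (x + y) * (x + y) + 2 * 3: the (x + y) subtree is built twice, 2 * 3 is constant.
g = add(mul(add("x", "y"), add("x", "y")), mul(2.0, 3.0))
opt = merge(fold_constants(g), {})

print(count_nodes(g), count_nodes(opt))       # 5 3
print(opt)  # ('add', ('mul', ('add', 'x', 'y'), ('add', 'x', 'y')), 6.0)
env = {"x": 1.0, "y": 2.0}
print(evaluate(g, env), evaluate(opt, env))   # 15.0 15.0
```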
Shared variables
Symbolic variables that hold pre-defined values
In machine learning, used for model parameters with initial values (e.g. the W and b matrices in a neural network)
When running on a GPU, their values are kept in GPU memory with optimized storage, so they need not be copied from the host on every call
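A minimal pure-Python sketch of the idea: a shared variable is a named container whose value persists between calls, so a training step can read and overwrite the parameters in place instead of passing them in and out each time. `get_value`/`set_value` mirror the method names of Theano's shared variables, but the `SharedVariable` class and `sgd_step` function are hypothetical; in Theano the parameter update would be expressed through the `updates` argument of `theano.function`.

```python
class SharedVariable:
    """Hypothetical stand-in: a named value that persists between calls."""

    def __init__(self, value, name: str):
        self._value = value
        self.name = name

    def get_value(self):
        return self._value

    def set_value(self, value) -> None:
        self._value = value


# Parameters of a 1-D linear model y = W * x + b, with pre-defined initial values.
W = SharedVariable(0.0, "W")
b = SharedVariable(0.0, "b")


def sgd_step(x: float, y: float, lr: float = 0.1) -> float:
    """One SGD step on the squared error; updates W and b in place."""
    w, c = W.get_value(), b.get_value()
    err = w * x + c - y
    W.set_value(w - lr * 2 * err * x)
    b.set_value(c - lr * 2 * err)
    return err * err


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # toy data for y = 2x + 1
for _ in range(500):
    for x, y in data:
        sgd_step(x, y, lr=0.05)
print(round(W.get_value(), 3), round(b.get_value(), 3))  # 2.0 1.0
```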
Theano functions
Compile the ‘abstract’ computation graph to optimized C code
Target the GPU when one is available
The generated C code is highly optimized for numeric computation
A sort of interface between Theano code and the Python code that calls it
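The "compile once, call many times" workflow can be mimicked in pure Python: walk the expression graph once, emit source code for the whole expression, and turn it into an ordinary callable with the built-in `compile` and `exec`. Theano emits and compiles C (or CUDA) code rather than Python; the `compile_graph` helper and the tuple-based graph format below are hypothetical and do no input validation.

```python
import math
from typing import Callable, List, Union

Expr = Union[float, str, tuple]   # a number, an input name, or (op, *args)

TEMPLATES = {
    "add": "({} + {})",
    "mul": "({} * {})",
    "exp": "math.exp({})",
}


def emit(e: Expr) -> str:
    """Generate Python source for an expression graph, bottom-up."""
    if isinstance(e, tuple):
        op, *args = e
        return TEMPLATES[op].format(*(emit(a) for a in args))
    return repr(e) if isinstance(e, (int, float)) else e


def compile_graph(inputs: List[str], output: Expr) -> Callable[..., float]:
    """'Compile' the graph once into a plain Python function."""
    src = f"def f({', '.join(inputs)}):\n    return {emit(output)}\n"
    namespace = {"math": math}
    exec(compile(src, "<generated>", "exec"), namespace)
    fn = namespace["f"]
    fn.source = src   # keep the generated source around for inspection
    return fn


# y = exp(x * w) + b
graph = ("add", ("exp", ("mul", "x", "w")), "b")
f = compile_graph(["x", "w", "b"], graph)
print(f.source)            # the generated function definition
print(f(0.0, 3.0, 1.0))    # exp(0) + 1 = 2.0
```

After the one-time compilation step, each call runs the generated code directly without walking the graph again, which is the same trade-off a Theano function makes.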
Case study: Word embedding models
Model 1: Autoencoder-based word embedding
Ref: http://arxiv.org/pdf/1412.4930v2.pdf
Cont.
U is the embedding matrix which is learned by optimizing this objective
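The objective itself appeared as a figure and is not reproduced here, so the following is only a generic stand-in and not the referenced paper's formulation: a linear autoencoder with tied weights, where a word's count vector x is encoded as h = U x (the embedding) and decoded as Uᵀ h, and U is learned by minimizing the squared reconstruction error ‖Uᵀ U x − x‖². The sizes, data and helper names are hypothetical; the hand-derived gradient is checked against a central finite difference.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """M v"""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def matTvec(M: Matrix, v: Vector) -> Vector:
    """M^T v"""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def loss_and_grad(U: Matrix, x: Vector) -> Tuple[float, Matrix]:
    """Tied-weight reconstruction loss ||U^T U x - x||^2 and its gradient w.r.t. U."""
    h = matvec(U, x)                                  # encode: h = U x
    r = [a - b for a, b in zip(matTvec(U, h), x)]     # residual: U^T h - x
    loss = sum(v * v for v in r)
    Ur = matvec(U, r)
    # dL/dU[i][j] = 2 * (h[i] * r[j] + (U r)[i] * x[j])
    grad = [[2 * (h[i] * r[j] + Ur[i] * x[j]) for j in range(len(x))]
            for i in range(len(U))]
    return loss, grad


rng = random.Random(0)
k, n = 2, 4                                  # embedding size, vocabulary size (toy)
U = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(k)]
x = [1.0, 0.0, 2.0, 1.0]                     # hypothetical co-occurrence counts
loss, grad = loss_and_grad(U, x)

# Check one gradient entry against a central finite difference.
eps = 1e-6
U_plus = [row[:] for row in U]
U_plus[1][2] += eps
U_minus = [row[:] for row in U]
U_minus[1][2] -= eps
numeric = (loss_and_grad(U_plus, x)[0] - loss_and_grad(U_minus, x)[0]) / (2 * eps)
print(abs(numeric - grad[1][2]) < 1e-5)      # True

# One small gradient-descent step lowers the loss.
lr = 1e-3
U_new = [[u - lr * g for u, g in zip(ur, gr)] for ur, gr in zip(U, grad)]
print(loss_and_grad(U_new, x)[0] < loss)     # True
```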
Code
Refer to the GitHub gist:
https://gist.github.com/shashankg7/aec2303803e7b39b150a9f78cb59db09
Only the Theano part is included; I/O and preprocessing are omitted
Model 2: GloVe word embedding
Optimizes a weighted squared loss function
Challenges in a practical implementation in Theano
Main thing to remember: a VECTORIZED implementation
How to handle large matrices
Refer to the GitHub gist:
https://gist.github.com/shashankg7/aec2303803e7b39b150a9f78cb59db09
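For reference, the GloVe objective (Pennington et al., 2014) is J = Σ f(X_ij)(w_iᵀ w̃_j + b_i + b̃_j − log X_ij)² over the co-occurrence matrix X, with f(x) = (x / x_max)^α for x < x_max and 1 otherwise (the paper uses x_max = 100, α = 3/4). Since f(0) = 0, only nonzero entries contribute, so one way to handle a large X is to store it as (row, col, value) triples and compute the loss over those instead of the dense V × V matrix. The pure-Python sketch below shows that gather-by-index form on hypothetical toy data; in Theano or NumPy each step of the loop body would become one array operation over all triples at once (indexing the embedding matrices by the row and column arrays).

```python
import math

# Sparse co-occurrence matrix as parallel (row, col, value) lists: toy data.
rows = [0, 0, 1, 2]
cols = [1, 2, 2, 0]
vals = [10.0, 3.0, 150.0, 1.0]


def weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """GloVe weighting f(x); f(0) = 0, so zero entries never contribute."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(W, W_ctx, b, b_ctx) -> float:
    """Weighted squared loss, summed over the nonzero entries only."""
    total = 0.0
    for i, j, x in zip(rows, cols, vals):
        pred = sum(p * q for p, q in zip(W[i], W_ctx[j])) + b[i] + b_ctx[j]
        total += weight(x) * (pred - math.log(x)) ** 2
    return total


V, d = 3, 2                      # vocabulary size, embedding dimension
W = [[0.0] * d for _ in range(V)]
W_ctx = [[0.0] * d for _ in range(V)]
b = [0.0] * V
b_ctx = [0.0] * V
print(glove_loss(W, W_ctx, b, b_ctx))
```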
Model 3: Skip-gram with negative sampling
Alpha stage: not yet able to figure out how to vectorize the loss function
Refer to the GitHub gist:
https://gist.github.com/shashankg7/aec2303803e7b39b150a9f78cb59db09
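For reference, the skip-gram negative-sampling loss (Mikolov et al., 2013) for one center vector v_c, its true context vector u_o and K sampled negative context vectors u_k is −log σ(u_oᵀ v_c) − Σ_k log σ(−u_kᵀ v_c). Below is a minimal pure-Python sketch of that formula with hypothetical names and values. One possible vectorization (an assumption, not the author's solution) stacks a minibatch so the positive scores are a row-wise dot product of two B × d matrices and the negative scores a batched product of a B × K × d tensor with the B × d center vectors, leaving only elementwise log-sigmoids and a sum.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def sgns_loss(v_c: Vector, u_o: Vector, negatives: List[Vector]) -> float:
    """-log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c) for one (center, context) pair."""
    loss = -log_sigmoid(dot(u_o, v_c))
    for u_k in negatives:
        loss -= log_sigmoid(-dot(u_k, v_c))
    return loss


v_c = [0.5, -0.2]
u_o = [0.4, 0.1]
negs = [[-0.3, 0.2], [0.1, 0.9]]
print(sgns_loss(v_c, u_o, negs))
```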
THANKS !!!
