NLP & Deep Learning for non-experts
Sanghamitra Deb
Staff Data Scientist, Chegg Inc
How to start projects in machine learning?
• Kaggle competitions ---
• Make sure to solve the ML problems for concept development before competing
How to start projects in machine learning?
• Self-guided workshops/projects --- let's say you have data from Zomato
• Restaurant recommendation --- user based, content similarity based.
• Restaurant tags from reviews.
• Sentiment analysis from reviews.
Outline
• What is NLP
• Bag of Words model for sentiment analysis using scikit-learn
• Deep dive into deep learning
• Solve the sentiment analysis problem using Keras
• A short intro to Convolutional Neural Networks (CNNs)
What is Natural
Language Processing?
• Giving structure to unstructured data
• Learning properties of the data that make decision making simple
• Providing concise information to drive the intelligence of different systems.
Why?
• Unstructured data cannot be consumed directly
• Automate simple and complex functionalities
• Inferences from text data become queryable, which can feed regular BU reports
• Understand customers better and take necessary actions for a better experience.
Applications
• Categorization of text
• Building domain-specific knowledge graphs
• Recommendations
• Web --- search
• HR --- people analytics
• Medical --- drug discovery, automated diagnosis
• ………..
What are the underlying tasks?
• Syntactic parsing of sentences --- parsing based on structure
• Part-of-speech tagging
• Semantic parsing --- mapping text directly into a formal query language, e.g. SQL queries for a pre-determined database schema.
• Dialogue state tracking --- chatbots
• Machine Translation
• Language modeling
• Text extraction
• Classification
Text Classification
Offline pipeline (with SME input): Text Pre-processing → Collecting Training Data → Model Building. Online: Model Evaluation with users.

Text Pre-processing
• Reduces noise
• Ensures quality
• Improves overall performance

Collecting Training Data
• Training data collection / examples of the classes that we are trying to model
• Model performance is directly correlated with the quality of the training data

Model Building
• Model selection
• Architecture
• Parameter tuning
Text Data
Data Source -- https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
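The dataset consists of tab-separated sentence/label pairs. A minimal loading sketch (the local file path and the choice of the Yelp file are assumptions, not from the slides):

```python
# Hedged sketch: load one file of the UCI Sentiment Labelled Sentences dataset.
# The path "data/yelp_labelled.txt" is an assumed local location for the download.
import pandas as pd

filepath = "data/yelp_labelled.txt"
df = pd.read_csv(filepath, sep="\t", names=["sentence", "label"])

sentences = df["sentence"].values
labels = df["label"].values
print(df.head())
```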
Model Building: a simple Bag of Words (BOW) model
https://realpython.com/python-keras-text-classification/
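The original slides show code from the linked realpython tutorial; here is a minimal sketch in the same spirit (scikit-learn CountVectorizer plus logistic regression), reusing `sentences` and `labels` from the loading sketch above. The split size and variable names are illustrative.

```python
# Hedged sketch of a bag-of-words baseline: vectorize the text, then fit a
# logistic regression classifier on the sparse document-term matrix.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

sentences_train, sentences_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.25, random_state=1000)

vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)              # build the vocabulary on training text only
X_train = vectorizer.transform(sentences_train)
X_test = vectorizer.transform(sentences_test)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
print("Test accuracy:", classifier.score(X_test, y_test))
```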
Deep Learning
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. --- Yoshua Bengio
A kind of learning where the representations you form have several levels of abstraction, rather than a direct input to output. --- Peter Norvig
When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically, and so this is kind of the popular term that's been adopted in the press. I think of them as deep neural networks generally. --- Andrew Ng
Why now?
• Explosion in labelled data.
• Exponential growth in computation power with cloud computing and availability of GPUs
• Improvements in setting initial conditions and activation functions
Neural Network
Simulate the brain --- get neurons densely interconnected in a computer such that it can learn things, recognize patterns and make decisions?
What is a neuron?
https://www.slideshare.net/tw_dsconf/ss-62245351
[Figure: a neuron takes inputs a1, a2, a3, computes a weighted sum plus a bias, and passes the result through an activation function]
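To make the picture concrete, a toy NumPy version of a single neuron (all numbers are made up):

```python
# A neuron: weighted sum of the inputs plus a bias, passed through an activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = np.array([0.5, -1.0, 2.0])   # inputs a1, a2, a3
w = np.array([0.8, 0.2, -0.5])   # one weight per input
b = 0.1                          # bias

z = np.dot(w, a) + b             # weighted sum
print(sigmoid(z))                # the neuron's output activation
```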
Neural Network
• Each node is a function with input and output vectors
• Every network structure is defined by a set of functions
Output Layer
• Loss is minimized using gradient descent
• Find network parameters such that the loss is minimized
• This is done by taking derivatives of the loss with respect to the parameters.
• Next, the parameters are updated by subtracting the learning rate times the derivative.
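A bare-bones illustration of that update rule on a one-parameter least-squares problem; the data and learning rate are made up:

```python
# Gradient descent on mean squared error for a single slope parameter theta.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                            # data generated with true slope 2

theta = 0.0
learning_rate = 0.01

for step in range(200):
    error = theta * x - y
    grad = 2 * np.mean(error * x)      # derivative of the loss with respect to theta
    theta -= learning_rate * grad      # subtract learning rate times the derivative
print(theta)                           # converges towards 2.0
```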
Commonly used loss functions
Regression Loss Functions
• Mean Squared Error Loss
• Mean Squared Logarithmic Error Loss
• Mean Absolute Error Loss
Binary Classification Loss Functions
• Binary Cross-Entropy
• Hinge Loss
• Squared Hinge Loss
Multi-Class Classification Loss Functions
• Multi-Class Cross-Entropy Loss
• Sparse Multiclass Cross-Entropy Loss
• Kullback-Leibler Divergence Loss
Cost Function --- Cross Entropy
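The original slide shows the cross-entropy formula as a figure; a small NumPy version of binary cross-entropy (labels and predicted probabilities are made up) connects it to the `binary_crossentropy` loss used later with Keras:

```python
# Binary cross-entropy: -mean( y*log(p) + (1-y)*log(1-p) ).
import numpy as np

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])   # predicted probabilities

eps = 1e-12                               # guard against log(0)
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))
print(bce)
```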
Dropout --- avoid overfitting
• Large weights in a neural network are a sign of a more complex network that has overfit the training data.
• Probabilistically dropping out nodes in the network is a simple and effective regularization method.
• A large network, more training, and the use of a weight constraint are suggested when using dropout.
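In Keras, dropout is just another layer; a sketch with illustrative layer sizes (not taken from the slides):

```python
# Each Dropout layer randomly zeroes a fraction of the previous layer's
# outputs during training, which acts as regularization.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),
    Dropout(0.5),                  # drop 50% of activations while training
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
model.summary()
```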
Optimization Techniques
Gradient Descent
Adagrad
RMSprop
Adam
…
Adam Optimization
• Adam = adaptive moment estimation
• The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
• It calculates an exponential moving average of the gradient and the squared gradient; parameters control the decay rates of these moving averages.
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
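In Keras the optimizer is chosen when compiling the model; a sketch reusing the model from the dropout example above (the hyperparameters shown are the Adam defaults):

```python
# beta_1 and beta_2 are the decay rates of the moving averages of the gradient
# and the squared gradient mentioned above.
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss="binary_crossentropy",
              metrics=["accuracy"])
```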
Activation Functions
• Sigmoid / Softmax
• Tanh
• ReLU --- a = max(0, z)
• Leaky ReLU
• Swish (https://arxiv.org/abs/1710.05941v1)
[Figures on the original slides: each activation function and its derivative]
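The functions themselves are one-liners; a NumPy sketch (Swish is shown in its beta = 1 form, x·sigmoid(x), following the linked paper):

```python
# The listed activation functions written out explicitly.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))            # shift for numerical stability
    return e / e.sum()

def relu(z):
    return np.maximum(0.0, z)            # a = max(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z) # small slope for negative inputs

def swish(z):
    return z * sigmoid(z)

z = np.linspace(-3, 3, 7)
print(relu(z), leaky_relu(z), swish(z), np.tanh(z))
```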
Text Classification Reminder!
https://realpython.com/python-keras-text-classification/
Text Classification using a feed-forward NN
https://realpython.com/python-keras-text-classification/
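A sketch of the feed-forward model in the style of the linked tutorial, built on top of the bag-of-words features from the scikit-learn sketch earlier (the hidden-layer size is illustrative):

```python
# One hidden layer on top of the bag-of-words vectors, sigmoid output for
# binary sentiment.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = X_train.shape[1]           # vocabulary size from CountVectorizer

model = Sequential([
    Dense(10, activation="relu", input_dim=input_dim),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```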
Fit & measure accuracy!
plot_history(history)
Clearly overfits the data!
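A sketch of the fit-and-plot step; `plot_history` below is a minimal re-implementation of the helper used in the linked tutorial, not the original, and the epoch/batch settings are illustrative:

```python
# Train and compare training vs. validation accuracy to spot overfitting.
import matplotlib.pyplot as plt

history = model.fit(X_train.toarray(), y_train,        # dense arrays for simplicity
                    epochs=50, batch_size=10, verbose=False,
                    validation_data=(X_test.toarray(), y_test))

def plot_history(history):
    plt.plot(history.history["accuracy"], label="training accuracy")    # "acc" on older Keras
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()

plot_history(history)   # training accuracy keeps climbing while validation accuracy stalls
```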
Can we do better? Word Embeddings
• Words are represented as dense vectors
• These vectors are either learned during the training task by the neural network, or pre-trained, learned from language models
• They encode the semantic meaning of the word.
Text Pre-processing with Keras
Tokenizing → Padding (see the sketch below)
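A sketch of both steps with the Keras preprocessing utilities, reusing the train/test sentence split from earlier (num_words and maxlen are illustrative choices):

```python
# Tokenizing: map each word to an integer index.  Padding: make all sequences
# the same length so they can be batched.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(sentences_train)

X_train_seq = tokenizer.texts_to_sequences(sentences_train)
X_test_seq = tokenizer.texts_to_sequences(sentences_test)

vocab_size = len(tokenizer.word_index) + 1    # +1 because index 0 is reserved for padding

maxlen = 100
X_train_pad = pad_sequences(X_train_seq, padding="post", maxlen=maxlen)
X_test_pad = pad_sequences(X_test_seq, padding="post", maxlen=maxlen)
```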
Start with an Embedding Layer
• The Embedding layer of Keras takes the previously calculated integers and maps them to a dense vector of the embedding.
o Parameters
Ø input_dim: the size of the vocabulary
Ø output_dim: the size of the dense vector
Ø input_length: the length of the sequence
[Figure: embedding vectors for "Hope to see you soon" and "Nice to see you again", before and after training]
https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work
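A sketch showing the three parameters in place (vocab_size and maxlen come from the tokenizing/padding sketch; the embedding dimension of 50 is illustrative):

```python
# The Embedding layer turns each integer index into a dense 50-dimensional vector.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

embedding_dim = 50

model = Sequential()
model.add(Embedding(input_dim=vocab_size,       # the size of the vocabulary
                    output_dim=embedding_dim,   # the size of the dense vector
                    input_length=maxlen))       # the length of the sequence
```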
Add a pooling layer
• MaxPooling1D/AveragePooling1D or a GlobalMaxPooling1D/GlobalAveragePooling1D layer
• A way to downsample (reduce the size of) the incoming feature vectors.
• Global max/average pooling takes the maximum/average of all features, whereas in the other case you have to define the pool size.
Definition of the entire model
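The original slide shows the model definition as code; a possible version in the tutorial's style (layer sizes are illustrative, not copied from the slide):

```python
# Embedding -> global max pooling -> dense layers, for binary sentiment.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalMaxPooling1D, Dense

embedding_dim = 50

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen),
    GlobalMaxPooling1D(),              # one max value per embedding dimension
    Dense(10, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```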
Training
Using pre-trained word embeddings will lead to an accuracy of 0.82. This is a case of transfer learning.
https://realpython.com/python-keras-text-classification
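A sketch of how pre-trained vectors can be wired into the Embedding layer (the GloVe file name is an assumption; `tokenizer` and `vocab_size` come from the earlier preprocessing sketch):

```python
# Build an embedding matrix from pre-trained GloVe vectors and use it to
# initialize the Embedding layer; set trainable=False to freeze it.
import numpy as np
from tensorflow.keras.layers import Embedding

embedding_dim = 50
embedding_matrix = np.zeros((vocab_size, embedding_dim))

with open("glove.6B.50d.txt", encoding="utf-8") as f:   # assumed local GloVe file
    for line in f:
        word, *vector = line.split()
        if word in tokenizer.word_index:
            idx = tokenizer.word_index[word]
            embedding_matrix[idx] = np.array(vector, dtype=np.float32)[:embedding_dim]

embedding_layer = Embedding(vocab_size, embedding_dim,
                            weights=[embedding_matrix],
                            input_length=maxlen,
                            trainable=True)   # fine-tune; use False to keep vectors fixed
```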
Embeddings + Max pooling --- Benefits
• Power of generalization --- embeddings are able to share information across similar features.
• Fewer nodes with zero values.
Convolutional Neural Network
Detect features! Downsample.
What is a CNN?
In a traditional feed-forward neural network we connect each input neuron to each output neuron in the next layer. That's also called a fully connected layer, or affine layer.
• We use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. Each layer applies different filters and combines the results.
• During the training phase, a CNN automatically learns the values of its filters based on the task you want to perform.
Tricky --- dimensions keep changing as we go from one layer to another.
Model definition (embedding_dim = 50, maxlen = 10)
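A sketch of the CNN classifier with those dimensions; the number of filters and the kernel size are illustrative, not read off the slide:

```python
# Embedding -> Conv1D -> global max pooling -> dense layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

embedding_dim = 50
maxlen = 10

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen),
    Conv1D(128, 5, activation="relu"),   # 128 filters over 5-word windows
    GlobalMaxPooling1D(),                # keep the strongest response of each filter
    Dense(10, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```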
Advantages of CNN
• Character-based CNNs
• Have the ability to deal with out-of-vocabulary words. This makes them particularly suitable for user-generated raw text.
• Work for multiple languages.
• Model size is small since the tokens are limited to the number of characters (~70). This makes real-life deployments easier and faster.
• Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership.
Takeaways!
• If you have text data you need to use NLP
• Try a simple bag-of-words model for your data
• Having a high-level understanding of deep learning will help with better judgement in architecture design and choice of parameters.
• Deep learning has the potential to give high performance, but you need a large amount of training data to see the benefits.
Thank You
@sangha_deb
sangha123@gmail.com
Visualization of the architecture
[Figure: input sequence of length 10 with 50-dimensional embeddings → Conv1D → GlobalMaxPool1D → Dense layer → Sigmoid output]
Some helpful courses
https://www.coursera.org/learn/classification-vector-spaces-in-nlp
Appendix
Transfer Learning
Character Based CNNs.
https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf
• Embedding Layer
• Six convolutional layers; three of the convolutional layers are followed by a max-pooling layer
• Two fully connected layers (Dense layers in Keras) with 1024 units each
• Output layer (Dense layer); the number of units depends on the number of classes. In this task it is set to 4.
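A hedged Keras sketch of that architecture; filter counts, kernel sizes, the input length and the placement of the pooling layers follow the linked paper only loosely and should be treated as assumptions:

```python
# Character-level CNN: character embedding, six Conv1D layers (three followed
# by max pooling), two 1024-unit dense layers, and a 4-class softmax output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D, Flatten,
                                     Dense, Dropout)

num_chars = 70         # ~70 characters in the vocabulary
input_length = 1014    # assumed number of characters per document

model = Sequential([
    Embedding(num_chars + 1, 16, input_length=input_length),
    Conv1D(256, 7, activation="relu"), MaxPooling1D(3),
    Conv1D(256, 7, activation="relu"), MaxPooling1D(3),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"), MaxPooling1D(3),
    Flatten(),
    Dense(1024, activation="relu"), Dropout(0.5),
    Dense(1024, activation="relu"), Dropout(0.5),
    Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```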
Pre-processing
Setting Embedding Weights
Model
https://towardsdatascience.com/character-level-cnn-with-keras-50391c3adf33