Natural Language Processing
Yuriy Guts – Jul 09, 2016
Who Is This Guy?
Data Science Team Lead
Sr. Data Scientist
Software Architect, R&D Engineer
I also teach Machine Learning.
What is NLP?
Study of interaction between computers and human languages
NLP = Computer Science + AI + Computational Linguistics
Common NLP Tasks
Easy:
• Chunking
• Part-of-Speech Tagging
• Named Entity Recognition
• Spam Detection
• Thesaurus
Medium:
• Syntactic Parsing
• Word Sense Disambiguation
• Sentiment Analysis
• Topic Modeling
• Information Retrieval
Hard:
• Machine Translation
• Text Generation
• Automatic Summarization
• Question Answering
• Conversational Interfaces
Interdisciplinary Tasks: Speech-to-Text
Interdisciplinary Tasks: Image Captioning
What Makes NLP so Hard?
Ambiguity
Non-Standard Language
Also: neologisms, complex entity names, phrasal verbs/idioms
More Complex Languages Than English
• German: Donaudampfschiffahrtsgesellschaftskapitän (5 “words”)
• Chinese: 50,000 different characters (2-3k to read a newspaper)
• Japanese: 3 writing systems
• Thai: Ambiguous word and sentence boundaries
• Slavic: Different word forms depending on gender, case, tense
Write Traditional “If-Then-Else” Rules?
BIG NOPE!
Leads to very large and complex codebases.
Still fails to capture cases that are trivial for a human.
Better Approach: Machine Learning
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Tom M. Mitchell
Part 1
Essential Machine Learning Background for NLP
Before We Begin: Disclaimer
• This will be a very quick description of ML. By no means exhaustive.
• Only the essential background for what we’ll have in Part 2.
• To fit everything into a small timeframe, I’ll simplify some aspects.
• I encourage you to read ML books or watch videos to dig deeper.
Common ML Tasks
1. Supervised Learning
• Regression
• Classification (Binary or Multi-Class)
2. Unsupervised Learning
• Clustering
• Anomaly Detection
• Latent Variable Models (Dimensionality Reduction, EM, …)
Regression
Predict a continuous dependent variable
based on independent predictors
Linear Regression
After adding polynomial features
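A minimal scikit-learn sketch of this idea (synthetic data and a degree-2 expansion, purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic 1-D data with a quadratic trend plus noise.
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=100)

# Plain linear regression underfits the curve...
linear = LinearRegression().fit(X, y)

# ...while adding polynomial features captures it.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2:", linear.score(X, y))
print("poly   R^2:", poly.score(X, y))
```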
Classification
Assign an observation to some category
from a known discrete list of categories
Logistic Regression
Class A
Class B
(Multi-class extension = Softmax Regression)
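A minimal scikit-learn sketch (synthetic two-class data; the multinomial/softmax variant is a one-argument change, though exact defaults vary by scikit-learn version):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two classes in 2-D feature space.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)

clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))            # hard class labels
print(clf.predict_proba(X[:5])[:, 1])  # P(class B) for each observation

# Multi-class "softmax regression" is the multinomial variant:
softmax_clf = LogisticRegression(multi_class="multinomial", solver="lbfgs")
```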
Neural Networks
and Backpropagation Algorithm
http://playground.tensorflow.org/
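To make backpropagation concrete, here is a tiny NumPy sketch of a one-hidden-layer network learning XOR; a toy setup, not a production implementation:

```python
import numpy as np

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.RandomState(0)
W1, b1 = rng.randn(2, 4), np.zeros(4)
W2, b2 = rng.randn(4, 1), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error gradient layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(3))  # should approach [0, 1, 1, 0]
```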
Clustering
Group objects in such a way
that objects in the same group are similar,
and objects in different groups are not
K-Means Clustering
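A minimal scikit-learn sketch of K-Means on synthetic blobs (all sizes illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three Gaussian blobs; K-Means recovers the groups when K matches.
X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

kmeans = KMeans(n_clusters=3, random_state=7).fit(X)
print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.labels_[:10])      # cluster assignment per object
```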
Evaluation
How do we know if an ML model is good?
What do we do if something goes wrong?
Underfitting & Overfitting
Development & Troubleshooting
• Picking the right metric: MAE, RMSE, AUC, Cross-Entropy, Log-Loss
• Training Set / Validation Set / Test Set split
• Picking hyperparameters against Validation Set
• Regularization to prevent overfitting
• Plotting learning curves to check for underfitting/overfitting
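A minimal sketch of this split-and-tune workflow with scikit-learn (data, model, and the candidate alpha grid are all illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# Hold out a test set first, then carve a validation set from the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Pick the regularization strength (alpha) against the validation set only.
best_alpha = max([0.01, 0.1, 1.0, 10.0],
                 key=lambda a: Ridge(alpha=a).fit(X_train, y_train)
                                             .score(X_val, y_val))

# Touch the test set exactly once, at the very end.
final = Ridge(alpha=best_alpha).fit(X_trainval, y_trainval)
print("test R^2:", final.score(X_test, y_test))
```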
Deep Learning
• Core idea: instead of hand-crafting complex features, use increased computing
capacity and build a deep computation graph that will try to learn feature
representations on its own.
End-to-end learning rather than a cascade of apps.
• Works best with lots of homogeneous, spatially related features
(image pixels, character sequences, audio signal measurements).
Usually works poorly otherwise.
• State-of-the-art and/or superhuman performance on many tasks.
• Typically requires massive amounts of data and training resources.
• But: a very young field. Theories not strongly established, views change.
Example: Convolutional Neural Network
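As a rough illustration of such an architecture, a minimal Keras sketch of a small image classifier (a hypothetical MNIST-sized setup, not the network from the slide):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Convolutions learn local features, pooling downsamples,
# dense layers classify.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```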
Part 2
NLP Challenges And Approaches
“Classical” NLP Pipeline
• Tokenization: break text into sentences and words, lemmatize
• Morphology: part-of-speech (POS) tagging, stemming, NER
• Syntax: constituency/dependency parsing
• Semantics: coreference resolution, word sense disambiguation
• Discourse: task-dependent (sentiment, …)
Often Relies on Language Banks
• WordNet (ontology, semantic similarity tree)
• Penn Treebank (POS, grammar rules)
• PropBank (semantic propositions)
• …Dozens of them!
Tokenization & Stemming
POS/NER Tagging
Parsing (LPCFG)
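A minimal NLTK sketch of the first pipeline stages (assumes the punkt and averaged_perceptron_tagger resources have been fetched with nltk.download):

```python
import nltk
from nltk.stem import PorterStemmer

text = "The quick brown foxes are jumping over the lazy dogs."

# Tokenization: sentences, then words.
sentences = nltk.sent_tokenize(text)
tokens = nltk.word_tokenize(sentences[0])

# Morphology: POS tagging and stemming.
tags = nltk.pos_tag(tokens)   # [('The', 'DT'), ('quick', 'JJ'), ...]
stems = [PorterStemmer().stem(t) for t in tokens]  # 'foxes' -> 'fox'

print(tags)
print(stems)
```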
“Classical” way: Training a NER Tagger
Task: predict whether a word is a PERSON, LOCATION, DATE, or OTHER.
(There can be more NER tags than these; e.g., the MUC-7 task defines 7.)
Features:
1. Current word.
2. Previous and next words (context).
3. POS tags of the current word and nearby words.
4. NER label of the previous word.
5. Word substrings (e.g. ends in “burg”, contains “oxa”, etc.).
6. Word shape (internal capitalization, numerals, dashes, etc.).
7. …on and on and on…
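A hypothetical sketch of such a hand-crafted feature extractor (all feature names invented for illustration):

```python
import re

def word_features(words, pos_tags, prev_label, i):
    """Hand-crafted features for the word at position i (illustrative only)."""
    w = words[i]
    return {
        "word": w.lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<S>",
        "next_word": words[i + 1].lower() if i < len(words) - 1 else "</S>",
        "pos": pos_tags[i],
        "prev_label": prev_label,          # NER label already predicted
        "suffix3": w[-3:],                 # e.g. ends in "urg"
        "is_capitalized": w[:1].isupper(),
        "has_digit": bool(re.search(r"\d", w)),
        "has_dash": "-" in w,
    }

# These dicts would then feed a classifier (e.g. MaxEnt/CRF)
# via one-hot encoding of each feature value.
```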
Feature Representation: Bag of Words
A single word becomes a one-hot vector the size of the dictionary :(
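A minimal sketch of why this representation hurts: every pair of distinct words is equally dissimilar:

```python
import numpy as np

vocab = ["cat", "dog", "pizza"]         # real vocabularies: 10^5..10^6 words
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# "cat" and "dog" are as dissimilar as "cat" and "pizza": dot product is 0.
print(one_hot("cat") @ one_hot("dog"))    # 0.0
print(one_hot("cat") @ one_hot("pizza"))  # 0.0
```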
Problem
• Manually designed features are often over-specified or incomplete,
and take a long time to design and validate.
• Often requires PhD-level knowledge of the domain.
• Researchers have spent literally decades hand-crafting features.
• The bag-of-words model is very high-dimensional and sparse,
and cannot capture semantics or morphology.
Maybe Deep Learning can help?
Deep Learning for NLP
• Core enabling idea: represent words as dense vectors
[0 1 0 0 0 0 0 0 0] → [0.315 0.136 0.831]
• Try to capture semantic and morphological similarity so that the features
for “similar” words are “similar”
(e.g. closer in Euclidean space).
• Natural language is context-dependent: use context for learning.
• Straightforward (but slow) way: build a co-occurrence matrix and SVD it.
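A minimal NumPy sketch of the count-then-SVD route on a toy corpus (window of +/-1 word):

```python
import numpy as np

corpus = [["i", "like", "nlp"], ["i", "like", "dogs"], ["i", "enjoy", "nlp"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 word window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                C[idx[w], idx[sent[j]]] += 1

# Truncated SVD: rows of U * S are dense word vectors.
U, S, _ = np.linalg.svd(C)
word_vectors = U[:, :2] * S[:2]      # keep 2 dimensions
print(dict(zip(vocab, word_vectors.round(2))))
```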
Embedding Methods: Word2Vec
CBoW version: predict the center word from its context.
Skip-gram version: predict the context from the center word.
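A minimal Gensim sketch (toy corpus; note that in Gensim 4.x the size parameter was renamed to vector_size, and attribute names vary slightly across versions):

```python
from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["dog", "chases", "the", "cat"],
]

# Hyperparameters are illustrative; real corpora need millions of tokens.
model = Word2Vec(sentences, size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram

vec = model.wv["king"]                         # dense 50-d vector
print(model.wv.most_similar("king", topn=2))   # nearest words in vector space
```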
Benefits
• Learns features of each word on its own, given a text corpus.
• No heavy preprocessing is required, just a corpus.
• Word vectors can be used as features for lots of supervised
learning applications: POS, NER, chunking, semantic role labeling.
All with pretty much the same network architecture.
• Similarities and linear relationships between word vectors.
• A somewhat more modern representation: GloVe (but it requires more RAM to train).
Linearities
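The classic linearity example, sketched with pretrained vectors (this assumes the gensim.downloader API and its glove-wiki-gigaword-100 dataset, which postdate this talk):

```python
import gensim.downloader as api

# Downloads ~100 MB of pretrained GloVe vectors on first use.
wv = api.load("glove-wiki-gigaword-100")

# vector("king") - vector("man") + vector("woman") is closest to "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected: [('queen', ...)]
```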
Training a NER Tagger: Deep Learning
Same network as before: just replace the output with the NER tag
(or POS tag, chunk boundary, etc.)
Language Modeling
Assign high probabilities to well-formed sentences
(crucial for text generation, speech recognition, machine translation)
“Classical” Way: N-Grams
Problem: doesn’t scale well to bigger N. N = 5 is pretty much the limit.
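To ground the idea, a minimal maximum-likelihood bigram model (no smoothing; real models need e.g. Kneser-Ney, and since the number of possible N-grams grows as |V|^N, N stays small):

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and unigram contexts.
bigrams = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus)

def p(word, prev):
    """MLE bigram probability P(word | prev); zero for unseen pairs."""
    return bigrams[(prev, word)] / context[prev] if context[prev] else 0.0

print(p("cat", "the"))  # 1/4 = 0.25: "the" occurs 4 times, "the cat" once
```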
Deep Learning Way: Recurrent NN (RNN)
Can use past information without restricting the size of the context.
But: in practice, it can’t recall information that appeared long ago
(the vanishing gradient problem).
Long Short-Term Memory Network (LSTM)
Contains gates that control forgetting, adding, updating, and outputting information.
Remarkably strong performance on language tasks compared to a vanilla RNN.
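A minimal Keras sketch of an LSTM next-token model (all sizes illustrative; random stand-in data just to show the shapes):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size, seq_len = 1000, 40            # illustrative sizes

# Read seq_len token ids, predict a distribution over the next token.
model = Sequential([
    Embedding(vocab_size, 64),            # token ids -> dense 64-d vectors
    LSTM(128),                            # gated recurrent state
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Random stand-in data, just to show the expected shapes.
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
```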
Tackling Hard Tasks
Deep Learning enables end-to-end learning for Machine Translation,
Image Captioning, Text Generation, and Summarization:
NLP tasks which are inherently very hard!
RNN for Machine Translation
Hottest Current Research
• Attention Networks
• Dynamic Memory Networks
(see ICML 2016 proceedings)
Tools I Used
• NLTK (Python)
• Gensim (Python)
• Stanford CoreNLP (Java with bindings)
• Apache OpenNLP (Java with bindings)
Deep Learning Frameworks with GPU Support:
• Torch (Torch-RNN) (Lua)
• TensorFlow, Theano, Keras (Python)
NLP Progress for Ukrainian
• Ukrainian lemma dictionary with POS tags
https://github.com/arysin/dict_uk
• Ukrainian lemmatizer plugin for ElasticSearch
https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
• lang-uk project (1M corpus, NER, tokenization, etc.)
https://github.com/lang-uk
Demo 1: Exploring Semantic Properties of ASOIAF (“Game of Thrones”)
Demo 2: Topic Modeling for DOU.UA Comments
GitHub Repos with IPython Notebooks
• https://github.com/YuriyGuts/thrones2vec
• https://github.com/YuriyGuts/dou-topic-modeling
yuriy.guts@gmail.com
linkedin.com/in/yuriyguts
github.com/YuriyGuts