Building NLP solutions
using Python
By Ramu Pulipati,
@botsplash
Introduction to NLP
• Natural Language:
• General purpose communications
• Distinct difference between humans and Animals
• Much harder to interpret than formal language
• Natural Language Processing (NLP) Advancements
• Earlier focus was on Linguistics and Computer Science
• Current evolution is focused on Machine Learning, specifically
Deep Learning and Neural Networks
• Varied degrees of implementation based on use case
Scope of Natural Language Processing
• Read
• Natural Language Understanding (NLU)
• Write
• Natural Language Generation (NLG)
• Speak
• Speech Recognition / Synthesis
NLP Applications
More Applications …
• Email Spam
• Siri / Alexa / Cortana
• Legal Contracts to find Action
clauses
• Health Care Records
• Energy Sector / Utilities /
Inspection Records
• Automated Agents
• Appointment Scheduling
• Auto Email Responses
• Typing Suggestions
• Spelling Check
• Predicting Crops
• Social Media Propaganda
• Press/Earnings releases
• Weather Reports
• Search Engines
• News categorization
• Chatbot
• NY Times Oped author analysis
State of NLP
Source: https://www.slideshare.net/healess/sk-t-academy-lecture-note
Botsplash AI Strategy
Machine
Learning
Natural
Language
Processing
Predictive
Analytics
Routing Intelligence
High Intent Conversion Detection
Trends and Behavior
End Chat, Spam Detection
Content and Sentiment
FAQ, Support, Transaction
Chatbot
Re-engagement
Smart Scheduling
UI Interactions
Focus on solvable/acceptable problems
I’m looking for 30yr mortgage loan in Charlotte, NC
(Named Entity Recognition)
Thanks for your help. Great chatting with you.
(classification)
Lets connect tomorrow. Anytime evening will work for me.
(classification / intent / actionable)
This rate is unacceptable. What can you do?
(sentiment)
Leading NLP Providers
• AWS Comprehend
• Google Cloud NLP
• Microsoft Project Oxford
• IBM Watson
• Aylien
• Cennest Comparison:
https://cognitiveintegratorapp.azurewebsites.net/
Text Processing Roundup
• Normalization
• Text Classification
• Text Similarity
• Text Extraction
• Topic Modeling
• Semantic Search
• Sentiment Analysis
NLP Pipeline
• Classical
follows
traditional ML
strategies
• Deep Learning
requires a lot of
data
Getting started
• Python Installation. Use 3+.
• Data science packages installation. Use “pip install” or Anaconda
• Always use “virtualenv” when setting up environments.
• Start with Jupyter notebooks and convert them to production code.
• Use cloud hosted jupyter notebooks with access to GPU from
floydhub, paperspace, Google, Amazon or Azure
Python packages for NLP
• NLP Focus Packages
• NLTK
• Spacy
• Gensim
• Textblob
• Scikit Learn
• Stanford NLP (java)
• WordNet, SentiWordNet
• FastText / MUSE / Faiss
• Deep Learning Frameworks
• Tensorflow / Keras
• Pytorch
• Other Noteworthy
• Scrapy
• Newspaper
• nlp-architect
NLTK Code Tour
• Tokenization (Dictionary and Regex)
• Stemming
• Lemmatization
• NLP Grammar - Chunking and Chinking
• Entity Recognition
• WikiQuiz
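The first two items of the tour can be sketched without installing anything. This is a minimal illustration using only Python's standard library, not NLTK itself: the token pattern and the suffix list are assumptions chosen for the example, and the stemmer is a crude suffix stripper rather than the Porter algorithm that NLTK provides.

```python
import re

# Regex tokenizer: runs of word characters (keeping internal apostrophes),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def regex_tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


# Toy suffix list, an assumption for illustration; NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Lowercase the word and strip one suffix, keeping at least 3 characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


tokens = regex_tokenize("Lets connect tomorrow. Anytime evening will work for me.")
stems = [naive_stem(token) for token in tokens]
print(tokens)
print(stems)
```

Note that the toy stemmer turns "evening" into "even", which shows why stemming is lossy and why lemmatization, which maps a word to its dictionary form, is listed as a separate step.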
Word Embeddings
• Paper published by Mikolov 2013
Example: Man is to Woman, then King is to _______
• Multi-dimensional space of word representations with proximity
based on similarity of the words (word vectors)
• Algebraic expressions can be applied on Word vectors
• Building Word embedding: Provide lot of data with features to look
• Word2vec is a popular word embedding implemented with a neural
network
• Other implementations such as Glove use co-occurrence matrices
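The "Man is to Woman, then King is to ___" example comes down to vector arithmetic followed by a nearest-neighbour lookup. The sketch below uses hand-made two-dimensional vectors, which are an assumption for illustration only; real embeddings are learned from large corpora and have hundreds of dimensions.

```python
import math

# Hand-made 2-D toy vectors (an assumption, not trained embeddings):
# axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
VECTORS = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' using vector(b) - vector(a) + vector(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(analogy("man", "woman", "king", VECTORS))  # prints "queen"
```

The input words are excluded from the candidates so that the query words themselves cannot be returned as the answer.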
Word2vec paper results
Spacy.io Lightning Tour
• Industrial Strength, Fast
• POS Tagging and Dependency Parsing
• Named Entities, Word embedding and Similarity
• Custom Pipelines
• Visualization
Text classification
• Use cases: Spam, Actionable events, Intents
• For Content based or Request based
classification
• Steps involve Preparing -> Training ->
Prediction
• Feature Extractions
• Bag of Words
• TF-IDF model
• Word Vectors: Averaged, TF-IDF, etc.
• Starspace model
• FastText
• Classification algorithms: Multinomial Naive Bayes or SVM
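The Preparing -> Training -> Prediction flow can be sketched with bag-of-words features and a small multinomial Naive Bayes classifier written in plain Python. This is an illustrative sketch, not the talk's code: the class name, the training sentences and the labels (`end_chat`, `schedule`, `spam`) are all hypothetical, and a real project would more likely use a library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase the text and count each word (word order is discarded)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class MultinomialNaiveBayes:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            bag = bag_of_words(text)
            self.word_counts[label].update(bag)
            self.vocabulary.update(bag)
        return self

    def predict(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word, count in bag.items():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                likelihood = (self.word_counts[label][word] + 1) / (total_words + vocab_size)
                score += count * math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data, invented for this example.
TRAINING = [
    ("thanks for your help", "end_chat"),
    ("great chatting with you", "end_chat"),
    ("lets connect tomorrow evening", "schedule"),
    ("call me tomorrow morning", "schedule"),
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
]

texts, labels = zip(*TRAINING)
model = MultinomialNaiveBayes().fit(texts, labels)
print(model.predict("Can we connect tomorrow?"))
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numeric underflow on longer texts.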
Steps to classifying your data
1. Identify tags to be applied
2. Manually add tags for the
data (possibly in the
application)
3. Build a classification
algorithm
4. Set up your application to
auto classify tags
5. Evaluate silently and then
enable the actions
Sentiment Analysis
• Use case: Reviews, Chat transcripts, etc
• Supervised techniques are effective for a domain
• Packages:
• SentiWordNet
• StanfordNLP
• Spacy Sentiment Analysis (incomplete)
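Lexicon resources such as SentiWordNet assign polarity scores to words. The sketch below shows the general idea with a tiny hand-written lexicon; the words, the scores and the one-word negation rule are assumptions made up for this example and are not taken from SentiWordNet or any other package.

```python
import re

# Hypothetical polarity scores, invented for illustration.
LEXICON = {
    "great": 1.0,
    "thanks": 0.5,
    "good": 0.5,
    "bad": -0.5,
    "terrible": -1.0,
    "unacceptable": -1.0,
}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities; a negation flips the sign of the very next word."""
    score = 0.0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0.0)
        score += -value if negate else value
        negate = False
    return score


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("This rate is unacceptable. What can you do?"))
print(sentiment_label("Thanks for your help. Great chatting with you."))
```

A fixed lexicon like this misses domain-specific vocabulary, which is one reason supervised models trained on in-domain data tend to work better.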
Summarization
• Summarization is hard
• Uses a variety of techniques including text extraction, feature matrix,
TF-IDF, collocation, SVD and other methods
• Implement LSA to under
• Review of implementations:
• Spacy
• TextRank
• Pyteaser
• Textteaser
• Sumy
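As a rough illustration of the extraction-based approach, the sketch below scores each sentence by the average document frequency of its words and keeps the top-scoring sentences in their original order. It is a deliberately simple stand-in, not LSA or TextRank, and the stopword list and sample text are assumptions for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from any library).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for", "it", "on"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    frequencies = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        return sum(frequencies[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


DOCUMENT = (
    "Word embeddings map words to vectors. "
    "Similar words get similar vectors. "
    "The weather was nice today."
)
print(summarize(DOCUMENT))
```

Dividing by the sentence length keeps long sentences from winning purely because they contain more words.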
Code Review / Demo Apps
• Jupyter Notebooks
• NLTK Code Review
• Spacy Code Review
• NLTK Grammar Parsing
• WikiQuiz
• Sequence to Sequence Chatbot
• DeepQA demo
• Topic Modeling Code Review
• Text Similarity – Phrase Matcher API
Follow up Learning
• Websites:
• Allen AI - NLP
• Fast AI
• Maluuba
• Coursera
• Youtube
• Resources
• Sanni Oluwatoyin Yetunde
Google Slides
• Cambridge Data Science
Group presentation
• nlp.fast.ai



Editor's Notes

  • #3: Natural language is ambiguous, whereas formal language is precise. An example of a formal language is a programming language.
  • #8: The botsplash framework encompasses and builds on strong concepts and strategy to augment business processes and achieve the best outcome for businesses and their customers. botsplash is a Software-as-a-Service platform on a B2B2C model. We want the “B” (business) to provide the “C” (consumers of the business) with the best, easy-to-use and reliable technology to reduce costs and increase business transactions, efficiency and customer satisfaction.
  • #12: ML strategies: explore the data and use visualizations; create train and test data; set up the training algorithm and features; train the model; test the result; rinse and repeat until the results are satisfactory.
  • #19: Multinomial Naïve Bayes can predict more than two classes. It is a popular Bayes algorithm that assumes each feature is independent. Support vector machines are supervised algorithms used for classification, regression, and anomaly and outlier detection. For classification algorithms, we focus on the following metrics: accuracy, precision, recall and F1 score.
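The train/test step from note #12 and the metrics from note #19 can be sketched in plain Python. The function names, the default split fraction, the seed and the sample labels are assumptions for illustration; precision, recall and F1 are computed here for one chosen class against all others.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def classification_metrics(true_labels, predicted_labels, positive_label):
    """Accuracy over all classes; precision, recall and F1 for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions, invented for this example.
true_labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

metrics = classification_metrics(true_labels, predicted, positive_label="spam")
print(metrics)

train, test = train_test_split(range(8))
print(len(train), len(test))  # prints "6 2"
```

Fixing the seed makes the split reproducible, which helps when comparing runs during the "rinse and repeat" loop.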