What’s in it for you
What is Data Science?
Data Science vs Business Intelligence
What does a Data Scientist do?
Data Science lifecycle with example
Data Scientist demand
Need for Data Science
The Prerequisites for learning Data Science
Need for Data Science
Need For Data Science
Does the thought of your car driving you home by itself excite you?
Is that even possible?
This is where the ‘Need For Data Science’ comes into the picture. Data Science helps in making better decisions!
You mean it will be able to make decisions like slowing down, stopping by itself, speeding up and all of that?
Exactly! And then let the machine learn iteratively using unsupervised learning!
That’s interesting!
Self-driving cars could root out the more than 2 million deaths caused by car accidents annually.
Need For Data Science
1. Due to a lack of available data, flights are often delayed or cancelled at the last minute.
   "We’re extremely sorry to inform you that your flight has been delayed by 4 hours due to bad weather conditions. We regret the inconvenience caused."
2. Due to improper route planning, customers don’t get a flight for the desired time and duration.
   "We’re extremely sorry to inform you that there are no flights for the time selected. There’s a connecting flight for the same time tomorrow."
3. Incorrect decisions in selecting the right equipment lead to unplanned delays and cancellations.
   "Dear Flyer, we regret to inform you that your flight has been cancelled due to a delay from Airbus on account of engine delivery."
With Data Science, it has become possible to predict such disruptions and reduce the loss for both the airline and the passenger.
Need For Data Science
Using Data Science, we can achieve the following:
• Route planning: whether to schedule direct or connecting flights
• Predictive analytics models can be built to foresee flight delays
• Deciding which class of planes to purchase for better performance
• Promotional offers depending on customer booking patterns
Logistics companies like FedEx are using Data Science models for operational efficiency:
• Discover the best routes to ship
• The best-suited time to deliver
• The best mode of transport
Need For Data Science
So Data Science is mainly needed for:
• Better Decision Making: Whether A or B?
• Predictive Analysis: What will happen next?
• Pattern Discovery: Is there any hidden information in the data?
What is Data Science?
What is Data Science?
Suppose, you have decided to buy furniture online for
your new office
How do you choose the right website?
Want to buy furniture online?
Does the website sell furniture?
  No: Close website
  Yes: Is the rating > 4 out of 5?
    No: Close website
    Yes: Is the discount > 20%?
      No: Close website
      Yes: Purchase product
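The furniture-shopping flow above amounts to a chain of yes/no checks. A minimal Python sketch follows, assuming the checks run in the order "sells furniture, rating, discount". The `Website` record and its field names are hypothetical, invented only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Website:
    # Hypothetical fields, named only for this illustration.
    sells_furniture: bool
    rating: float        # average rating out of 5
    discount_pct: float  # discount offered, in percent


def decide(site: Website) -> str:
    """Walk the decision flow and return the resulting action."""
    if not site.sells_furniture:
        return "Close website"
    if site.rating <= 4:          # rating must be > 4 out of 5
        return "Close website"
    if site.discount_pct <= 20:   # discount must be > 20%
        return "Close website"
    return "Purchase product"


print(decide(Website(True, 4.5, 25)))
print(decide(Website(True, 3.9, 40)))
```

Each early return corresponds to one "Close website" branch, so only a site that passes all three checks reaches the purchase step.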
Data Science can answer a lot of other questions as well!
• Which route should my cab take so that I reach faster?
• Which viewers like the same kind of TV shows?
• Will this refrigerator fail in the next 3 years: Yes or No?
• Who will win the elections?
What is Data Science?
So, Data Science or Data-driven Science is about:
• Asking the right questions and exploring the data
• Modeling the data using various algorithms
• Finally, communicating and visualizing the results
Business Intelligence vs Data Science
Criterion | Business Intelligence | Data Science
Data Source | Structured data, e.g. Data Warehouse | Unstructured data, e.g. web logs
Method | Analytical | Scientific
Skills | Statistics, Visualization | Statistics, Visualization, Machine Learning
Focus | Past and Present Data | Present Data and Future Predictions
Prerequisites For Data Science
The following are the 3 essential traits of a Data Scientist:
• CURIOSITY: Only when you ask questions will you have a better understanding of the business problem
• COMMON SENSE: To identify new ways to solve a business problem and to detect priority problems
• COMMUNICATION SKILLS: A Data Scientist needs to communicate their findings to business teams so that they can act upon the insights
Prerequisites for Data Science
1. MACHINE LEARNING: Machine learning is the backbone of Data Science. It is one of the many approaches that Data Science uses to find a solution to a problem
2. MATHEMATICAL MODELLING: Mathematical models can be extremely helpful for making fast calculations and predictions from what you know about your data
3. STATISTICS: Statistics is foundational to Data Science. It lets you extract knowledge and obtain better results from the data
4. COMPUTER PROGRAMMING: You should know at least one programming language, preferably Python or R, for data modelling
5. DATABASES: The discipline of querying databases teaches you to ask better questions as a Data Scientist
Tools/Skills used in Data Science
Data Analysis
  Skills: R, Python, Statistics
  Tools: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing
  Skills: ETL, SQL, Hadoop, Apache Spark
  Tools: Informatica/Talend, AWS Redshift
Data Visualization
  Skills: R, Python libraries
  Tools: Jupyter, Tableau, Cognos, RAW
Machine Learning
  Skills: Python, Algebra, ML Algorithms, Statistics
  Tools: Spark MLlib, Mahout, Azure ML Studio
What does a Data Scientist do?
Real World → Raw Data → Process and Analyze → Meaningful Data → Useful Insights
Must Know Machine Learning Algorithms
The most basic and important techniques that you should know as a Data Scientist are:
• Regression
• Decision Tree
• Clustering
• Naive Bayes
• Support Vector Machine
Note to instructor: Please say that they can find the videos on specific algorithms in the video description below
Data Science Lifecycle with Example
Concept Study – Life Cycle
CONCEPT STUDY
To understand the problem statement, a thorough study of the business model is required.
Concept Study – Example
What is the Example?
• What is the end goal?
• What is the budget?
• What are the various specifications?
Concept of the task: Predict the price of a 1.35 carat diamond
Get to know the diamond industry and the various terminologies used. Understand the business problem and collect RELEVANT and sufficient data.
Suppose we get the prices of diamonds from different diamond retailers. Now, we want to find out the price of a 1.35 carat diamond.
Data Preparation - Life cycle
Data Preparation
Also known as Data Munging, it is the most important aspect of the Data Science lifecycle: valuable insights depend on it.
• Data Integration: Resolving any conflicts in the data and handling redundancies
• Data Cleaning: Correcting inconsistent data by filling in missing values and smoothing out noisy data
• Data Transformation: It involves normalization, transformation and aggregation of data using ETL methods
• Data Reduction: Using various strategies to reduce the size of the data while yielding the same outcome
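As a rough illustration of two of these steps, the sketch below handles redundancies by dropping duplicate records (integration) and applies min-max normalization (transformation). The records, field names and helper functions are made up for this example. A real pipeline would more likely rely on an ETL tool or a library such as pandas.

```python
def drop_duplicates(records):
    """Data integration: keep the first copy of each redundant record."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def min_max_normalize(values):
    """Data transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical diamond records merged from two retailers.
records = [
    {"carat": 0.5, "price": 4000},
    {"carat": 1.0, "price": 8000},
    {"carat": 1.0, "price": 8000},   # redundant copy
    {"carat": 1.5, "price": 12000},
]
clean = drop_duplicates(records)
scaled_prices = min_max_normalize([r["price"] for r in clean])
print(len(clean), scaled_prices)
```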
Data Preparation - Example
Data preparation: Make the data clean and valuable.
Typical issues in raw data: missing values, improper datatypes, null values.
Ways to handle missing data values:
• If the dataset is huge, we can simply remove the rows with missing data values. It is the quickest way.
• We can predict the missing values, i.e. we use the rest of the data to predict the values.
• We can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python, i.e.
  mean = df.mean(numeric_only=True)
  df = df.fillna(mean)
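The row-removal and mean-substitution options above can be sketched with pandas as follows. The column names and values are hypothetical. `dropna`, `mean` and `fillna` are standard `DataFrame` methods, and `numeric_only=True` is passed so that any non-numeric columns are skipped when the means are computed.

```python
import pandas as pd

# Hypothetical data: None marks a missing value.
df = pd.DataFrame({
    "carat": [0.5, 1.0, None, 1.5],
    "price": [4000.0, None, 9000.0, 12000.0],
})

# Option 1: drop every row that has a missing value.
dropped = df.dropna()

# Option 2: replace missing values with the mean of each column.
means = df.mean(numeric_only=True)
filled = df.fillna(means)

print(dropped)
print(filled)
```

Dropping rows is quick but discards data, so on a small dataset like this one mean substitution keeps all four rows.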
Data Preparation - Example
• Split the data into train data and test data in the ratio of 80:20
• It is generally advised to divide the dataset into two random partitions
Train data (80%)
Test data (20%)
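A minimal sketch of a random 80:20 split using only the Python standard library. The helper `split_train_test` and the sample data are assumptions for illustration, not something the slides prescribe.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed for repeatability
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 10 (carat, price) pairs.
data = [(0.1 * i, 1000 * i) for i in range(1, 11)]
train, test = split_train_test(data)
print(len(train), len(test))
```

Shuffling before splitting is what makes the two partitions random. Without it, sorted data would put all the largest diamonds in one partition.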
Model Planning - Life cycle
Model Planning:
After proper understanding and cleaning of the data in hand, a suitable model is selected.
• This step involves Exploratory Data Analysis (EDA) to understand the relation between variables and to see what the data can tell us
• Key variables are selected
Model Planning - Life cycle
But what is Exploratory Data Analysis?
Definition: Deeper analysis of the dataset to better understand the data.
Goals:
• Know the datatypes and answer questions with the data
• Understand how data is distributed
• Identify outliers
• Identify patterns, if any
Techniques:
• Histogram
• Trend Analysis
[Figure: TREND ANALYSIS chart]
Model Planning - Example
Using various techniques, we can see that the relation between the carat and the price of a diamond is linear in nature.
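Two simple EDA checks can be sketched in plain Python: a histogram of bin counts to see how values are distributed, and the Pearson correlation coefficient as one possible way to check whether carat and price move together linearly. The sample values below are hypothetical and are not taken from the slides.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Pearson correlation; values near +1 or -1 suggest a linear relation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sample of diamonds.
carats = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5, 2.0]
prices = [2400, 4100, 5500, 8200, 9500, 12100, 16000]

print(histogram(carats, 0.5))
print(round(pearson(carats, prices), 3))
```

A coefficient close to 1 supports choosing a linear model, though a scatter plot is still worth checking because correlation alone can hide outliers.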
Train Data vs Test Data
• Train Data (80%) is used to develop the model
• Test Data (20%) is used to validate the model
Once the model is created, feedback from validation is used for improvement.
Various tools used in Model Planning: R, Python, MATLAB, SAS
Model Building - Life cycle
Model Building:
Using various analytical tools and techniques, data is transformed with the goal of ‘discovering’ useful information to build the right model.
Model Building - Example
On analyzing the data, we observe that the output progresses linearly. Hence, we use the Linear Regression algorithm for model building in this case.
[Figure: regression line of price of diamond (Rs. 5,000 to Rs. 15,000) against carat (0.5 to 1.5), with 1.35 carat marked]
Model Building - Example
Linear regression describes the relation between 2 variables, i.e. X and Y.
After the regression line is drawn, we can predict the Y value for an input X value using the following formula: Y = mX + c
m = slope of the line
c = Y-intercept
X is the independent variable
Y is the dependent variable
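To make Y = mX + c concrete, here is a small ordinary least squares sketch in plain Python. The training points are hypothetical and were chosen to lie exactly on price = 8000 × carat − 800, so that the fitted line and the resulting prediction are easy to check by hand. They are not the data behind the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = m*X + c; returns (m, c)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx               # slope of the line
    c = mean_y - m * mean_x     # Y-intercept
    return m, c


# Hypothetical training data: X = carat, Y = price in Rs.
carats = [0.5, 1.0, 1.5, 2.0]
prices = [3200.0, 7200.0, 11200.0, 15200.0]

m, c = fit_line(carats, prices)
predicted = m * 1.35 + c        # predict Y for the input X = 1.35
print(m, c, predicted)
```

With these made-up points the fit gives m = 8000 and c = −800, so a 1.35 carat diamond comes out at about Rs. 10,000.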
Model Building - Example
Using the test data set, the built model is validated for the best accuracy.
[Figure: collected and analysed data (carat, price) feeds model building; test data (carat) goes into the model, the predicted output (price) comes out, and feedback returns to model building]
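One way to validate a fitted line on held-out test data is to compare its predictions with the actual prices. The sketch below uses mean absolute error and R² as example metrics. The slides do not say which accuracy measure is intended, and all data points here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = m*X + c; returns (m, c)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the units of Y."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the model."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical (carat, price) pairs, already split into train and test.
train = [(0.5, 3300), (0.8, 5500), (1.0, 7200), (1.2, 8900), (1.6, 12000), (2.0, 15100)]
test = [(0.7, 4900), (1.5, 11300)]

m, c = fit_line([x for x, _ in train], [y for _, y in train])
actual = [y for _, y in test]
predicted = [m * x + c for x, _ in test]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(round(mae, 1), round(r2, 4))
```

If the error on the test data is much worse than on the train data, that is the feedback signal to revisit the data preparation or the choice of model.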
Model Building - Example
Prediction:
After successful validation of the model, we predict the price of a 1.35 carat diamond.
Thus, using the Simple Linear Regression algorithm, we have built a working model and predicted the price of a 1.35 carat diamond to be Rs. 10,000.
This model is easily built using Python packages like pandas, matplotlib and numpy.
We will study this in detail in the upcoming Data Science Tutorial using Python.
Communication - Life cycle
Communicate results:
Key findings are identified and conveyed to the stakeholders.
The battle is not over yet!
A good Data Scientist should be able to communicate their findings to the business team so that they move easily into the execution phase.
Life cycle of Data Science project
Operationalize:
Final reports, code, and technical documents are delivered by the team.
Summary - Life cycle
1. Concept Study
2. Data Preparation
3. Model Planning
4. Model Building
5. Communicate Results
6. Operationalize
Demand for Data Scientist
Industries with high demand for Data Scientists:
• Technology
• Marketing
• Finance
• Healthcare
• Gaming
Summary
• Need for Data Science
• What is Data Science?
• Prerequisites of Data Science
• Tools used in Data Science
• Lifecycle with example
• Demand for Data Scientist
So what’s your next step?

What Is Data Science? | Introduction to Data Science | Data Science For Beginners | Simplilearn

  • 1. ?
  • 2. What’s in it for you What is Data Science? Data Science vs Business Intelligence What does a Data Scientist do? Data Science lifecycle with example Data Scientist demand Need for Data Science The Prerequisites for learning Data Science
  • 3. Need for Data Science
  • 4. Need For Data Science Does the thought of your car driving you home by itself excite you?
  • 5. Is that even possible ? Need For Data Science Does the thought of your car driving you home by itself excite you?
  • 6. This is where the ‘Need For Data Science’ comes into picture. Data Science helps in making better decisions! Need For Data Science
  • 7. You mean it will be able to take decisions like slowing down, stopping by itself, speeding up and all of that? Need For Data Science This is where the ‘Need For Data Science’ comes into picture. Data Science helps in making better decisions!
  • 8. You mean it will be able to take decisions like slowing down, stopping by itself, speeding up and all of that? Exactly! And then let the machine learn iteratively using unsupervised learning! Need For Data Science This is where the ‘Need For Data Science’ comes into picture. Data Science helps in making better decisions!
  • 9. You mean it will be able to take decisions like slowing down, stopping by itself, speeding up and all of that? That’s interesting! Need For Data Science Exactly! And then let the machine learn iteratively using unsupervised learning! This is where the ‘Need For Data Science’ comes into picture. Data Science helps in making better decisions!
  • 10. Self driving cars will root out more than 2 million deaths caused by car accidents annually. Need For Data Science
  • 11. Due to lack of data available, flights are often delayed or cancelled at the last minute 1 Need For Data Science We’re extremely sorry to inform that your flight has been delayed by 4 hours due to bad weather conditions. Regret the inconvenience caused 2 3
  • 12. 1 Need For Data Science Due to improper route planning, customers don’t get the flight for desired time and duration We’re extremely sorry to inform you that there are no flights for the time selected. There’s a connecting flight for the same time tomorrow. 2 3 Due to lack of data available, flights are often delayed or cancelled at the last minute
  • 13. 2 1 3 Need For Data Science Dear Flyer, We regret to inform you that your flight has been cancelled due to delay from Airbus on account of engine delivery Incorrect decisions in selection of right equipment leads to unplanned delays and cancellations Due to lack of data available, flights are often delayed or cancelled at the last minute Due to improper route planning, customers don’t get the flight for desired time and duration
  • 14. Need For Data Science With Data Science, it has become possible to predict such disruptions and alleviate the loss for both airline and the passenger
  • 15. Need For Data Science Using Data Science, we can achieve the following: Route Planning: Whether to schedule direct or connecting flights Predictive analytics model can be built to foresee flight delays Deciding which class of planes to purchase for better performance Promotional offers depending on customer booking patterns
  • 16. Logistics companies like FedEx are using Data Science models for operational efficiency Discover the best routes to ship The best suited time to deliver The best mode of transport Need For Data Science
  • 17. Need For Data Science So Data Science is mainly needed for: Better Decision Making Whether A or B? Predictive Analysis What will happen next? Pattern Discovery Is there any hidden information in the data?
  • 18. What is Data Science?
  • 19. What is Data Science? Suppose, you have decided to buy furniture online for your new office How do you choose the right website?
  • 20. What is Data Science? Want to buy online furniture? Does website sell furniture ? Yes Rating > 4 out of 5 Yes Purchase Product No Close website No Close website Yes Discount > 20% No Close website
  • 21. Which route should my cab take so that I reach faster? Which viewers like the same kind of TV shows? Will this refrigerator fail in the next 3 years: Yes or No? Who will win the elections? Data Science can answer a lot of other questions as well! What is Data Science?
  • 22. What is Data Science? Finally communicating and visualizing the results Asking the right questions and exploring the data Modeling the data using various algorithms So, Data Science or Data-driven Science is about:
  • 23. Finally communicating and visualizing the results Modeling the data using various algorithms Asking the right questions and exploring the data What is Data Science? So, Data Science or Data-driven Science is about:
  • 24. Finally communicating and visualizing the results Modeling the data using various algorithms Asking the right questions and exploring the data What is Data Science? So, Data Science or Data-driven Science is about:
  • 26. Business Intelligence vs Data Science Structured data e.g. Data Warehouse Unstructured data e.g. web logs Data Source Method Analytical Scientific Skills Statistics, Visualization Statistics, Visualization, Machine Learning Focus Past and Present Data Present Data and Future Predictions Criterion Business Intelligence Data Science
  • 28. Prerequisites for Data Science Only when you ask questions, you will have a better understanding of the business problem CURIOSITY The following are the 3 essential traits of a Data Scientist:
  • 29. Prerequisites for Data Science COMMON SENSE To identify new ways to solve a business problem and to detect priority problems The following are the 3 essential traits of a Data Scientist: CURIOSITY
  • 30. Prerequisites for Data Science COMMUNICATION SKILLSCOMMON SENSE A Data Scientist needs to communicate their findings to business teams to act upon the insights The following are the 3 essential traits of a Data Scientist: CURIOSITY
  • 31. Machine learning is the backbone of Data Science. It is one of the many ways that Data Science uses to find solution to a problem Prerequisites for Data Science 1 MACHINE LEARNING
  • 32. Prerequisites for Data Science Mathematical Models can be extremely helpful to make fast calculations and predictions from what you know about your data 1 2 MACHINE LEARNING MATHEMATICAL MODELLING
  • 33. Prerequisites for Data Science Statistics is foundational to Data Science. It lets you extract knowledge and obtain better results from the data 3 1 2 MACHINE LEARNING STATISTICS MATHEMATICAL MODELLING
  • 34. Prerequisites for Data Science You should know at least one programming language, preferably Python or R for data modelling 4 1 2 3 MACHINE LEARNING STATISTICS COMPUTER PROGRAMMING MATHEMATICAL MODELLING
  • 35. MACHINE LEARNING Prerequisites for Data Science STATISTICS COMPUTER PROGRAMMING The discipline of querying databases teaches you to ask better questions as a Data Scientist 51 2 3 4 MATHEMATICAL MODELLING DATABASES
  • 36. Tools/Skills used in Data Science Skills: R, Python, Statistics Tools: SAS, Jupyter, R studio, MATLAB, Excel, RapidMiner Data Analysis Skills: ETL, SQL,Hadoop, Apache Spark, Tools: Informatica/ Talend, AWS Redshift Data Warehousing Skills: R, Python libraries Tools: Jupyter, Tableau, Cognos, RAW Data Visualization Skills: Python, Algebra, ML Algorithms, Statistics Tools: Spark MLib, Mahout, Azure ML studio Machine Learning
  • 37. What does a Data Scientist do?
  • 38. What does a Data Scientist do? Real World
  • 39. What does a Data Scientist do? Raw Data Real World
  • 40. What does a Data Scientist do? Raw Data Process and Analyze Real World
  • 41. What does a Data Scientist do? Raw Data Process and Analyze Meaningful Data Real World
  • 42. What does a Data Scientist do? Raw Data Process and Analyze Meaningful Data Real World Useful Insights
  • 43. Must Know Machine Learning Algorithms Naive Baiyes Support Vector MachineClustering The most basic and important techniques that you should know as a Data Scientist are Decision TreeRegression Note to instructor: Please say that they can find the videos on specific algorithms in the video description below
  • 44. Data Science Lifecycle with Example
  • 45. Concept Study – Life Cycle: An understanding of the problem statement and a thorough study of the business model are required.
  • 46. Concept Study – Example: What is the example? What is the end goal? What is the budget? What are the various specifications?
  • 47. Concept Study – Example: Concept of the task: predict the price of a 1.35 carat diamond. Get to know the diamond industry and the terminology it uses. Understand the business problem and collect relevant and sufficient data. Suppose we get the prices of diamonds from different diamond retailers. Now we want to find the price of a 1.35 carat diamond.
  • 48. Data Preparation - Life cycle: Also known as Data Munging, data preparation is the most important stage of the Data Science lifecycle if any valuable insights are to emerge.
  • 49. Data Preparation - Life cycle: Data Integration: resolving any conflicts in the data and handling redundancies. Data Cleaning: correcting inconsistent data by filling in missing values and smoothing out noisy data. Data Transformation: normalization, transformation and aggregation of data using ETL methods. Data Reduction: using various strategies to reduce the size of the data while yielding the same outcome.
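As a small illustration of the Data Transformation step, here is a minimal sketch of min-max normalization, one common way to rescale a column to the range 0 to 1. The slide does not say which normalization is meant, so this choice, the function name `min_max_normalize` and the price values are all assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical diamond prices (not taken from the slides).
prices = [4000, 8000, 9500, 12000]
scaled = min_max_normalize(prices)
print(scaled)
```

Putting columns on a common scale like this is mainly useful when several variables with very different units feed the same model.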
  • 50. Data Preparation - Example: Data preparation: make the data clean and valuable. Typical problems to look for: missing values, improper datatypes, null values.
  • 51. Data Preparation - Example: Ways to handle missing data values: If the dataset is huge, we can simply remove the rows with missing data values. It is the quickest way, i.e. we use the rest of the data to predict the values. Alternatively, we can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python, i.e. mean = df.mean() followed by df.fillna(mean).
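The two options on this slide can be sketched with pandas as follows. The DataFrame contents and column names are made up for illustration; only `dropna`, `mean` and `fillna` are the pandas calls the slide alludes to.

```python
import pandas as pd

# Made-up illustration data; these values are not from the slides.
nan = float("nan")
df = pd.DataFrame({
    "carat": [0.5, 0.7, nan, 1.2, 1.5],
    "price": [4000.0, nan, 8000.0, 9500.0, 12000.0],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: replace each missing value with the mean of its own column.
column_means = df.mean()           # missing values are skipped by default
filled = df.fillna(column_means)

print(dropped)
print(filled)
```

Dropping rows is only safe when plenty of data remains afterwards; mean substitution keeps every row but flattens the variation in the filled column.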
  • 52. Data Preparation - Example: Split the data into train data (80%) and test data (20%). It is generally advised to divide the dataset into two random partitions.
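A random 80:20 split can be sketched in plain Python as below. The helper name `train_test_split`, the fixed seed and the sample rows are assumptions for illustration; libraries such as scikit-learn offer an equivalent utility.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator makes the random partition reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (carat, price) rows, not taken from the slides.
rows = [(round(0.1 * i, 1), 1000 * i) for i in range(1, 21)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before splitting matters: if the rows are sorted (for example by carat), taking the last 20% without shuffling would test the model only on the largest diamonds.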
  • 53. Model Planning - Life cycle: After the data in hand has been properly understood and cleaned, a suitable model is selected.
  • 54. Model Planning - Life cycle: This step involves Exploratory Data Analysis (EDA) to understand the relation between variables and to see what the data can tell us. Key variables are selected.
  • 55. Model Planning - Life cycle: But what is Exploratory Data Analysis? Definition: a deeper analysis of the dataset to better understand the data. Goals: know the datatypes and answer questions with the data; understand how the data is distributed; identify outliers; identify patterns, if any.
  • 56. Model Planning - Life cycle: Techniques: Histogram, Trend Analysis. [Chart: Trend Analysis] Using various techniques, we can easily figure out that the relation between the carat and the price of a diamond is linear in nature.
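One quick numerical check of the "linear in nature" observation is the Pearson correlation coefficient, which is close to 1 when two variables rise together along a straight line. The slide itself names histograms and trend analysis, so this statistic is an added illustration, and the carat and price values below are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations (not the data behind the slide's chart).
carats = [0.5, 0.75, 1.0, 1.25, 1.5]
prices = [4100, 5900, 8000, 9800, 12100]

r = pearson_r(carats, prices)
print(f"correlation between carat and price: {r:.4f}")
```

A value near 1 supports trying a linear model, but a scatter plot is still worth a look, since a high correlation can hide curvature or outliers.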
  • 57. Model Planning - Example: Train Data vs Test Data. Train data (80%) is used to develop the model. Test data (20%) is used to validate the model. Feedback from validation is used to improve the model.
  • 59. Model Building - Life cycle: Using various analytical tools and techniques, data is transformed with the goal of ‘discovering’ useful information to build the right model.
  • 60. Model Building - Example: On analyzing the data, we observe that the output progresses linearly. Hence, we use the Linear Regression algorithm for model building in this case. [Chart: price of diamond (Rs.) against carat, with the regression line and 1.35 carat marked]
  • 62. Model Building - Example: Linear regression describes the relation between two variables, X and Y. After the regression line is drawn, we can predict the Y value for an input X value using the following formula: Y = mX + c, where m is the slope of the line, c is the Y intercept, X is the independent variable and Y is the dependent variable.
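The slide does not say how the regression line is drawn; a common choice, assumed here, is ordinary least squares, where m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The sketch below fits m and c on made-up data and then evaluates Y = mX + c for a 1.35 carat diamond. Because the data is hypothetical, the predicted price will not match the figure quoted in the slides.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line Y = m*X + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x):
    """Evaluate Y = m*X + c for a single input X."""
    return m * x + c


# Hypothetical training data: carat (X) and price in Rs. (Y).
carats = [0.5, 0.75, 1.0, 1.25, 1.5]
prices = [4100, 5900, 8000, 9800, 12100]

m, c = fit_line(carats, prices)
price_135 = predict(m, c, 1.35)
print(f"slope m = {m:.1f}, intercept c = {c:.1f}")
print(f"predicted price for 1.35 carat: Rs. {price_135:.0f}")
```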
  • 63. Model Building - Example: Using the test data set, the built model is validated for the best accuracy. [Diagram: collected and analysed data (carat, price), model building, test data (carat), prediction (price), output, feedback]
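Validation on the test set can be sketched as follows. The slide only says the model is "validated for the best accuracy", so the two metrics used here, mean absolute error and R², are assumptions, as are the helper names and all data values.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line Y = m*X + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x


def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the units of Y."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical train and test partitions of (carat, price) data.
train_x = [0.5, 0.75, 1.0, 1.25, 1.5]
train_y = [4100, 5900, 8000, 9800, 12100]
test_x = [0.6, 1.1]
test_y = [4900, 8700]

m, c = fit_line(train_x, train_y)              # learn only from train data
test_pred = [m * x + c for x in test_x]        # predict unseen test inputs

mae = mean_absolute_error(test_y, test_pred)
r2 = r_squared(test_y, test_pred)
print(f"MAE = Rs. {mae:.0f}, R^2 = {r2:.4f}")
```

If the error on the test data is much worse than on the train data, that is the feedback signal to revisit the data preparation or the choice of model.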
  • 64. Model Building - Example: Prediction: After successful validation of the model, we predict the price of a 1.35 carat diamond. [Chart: price of diamond (Rs.) against carat, with the regression line and 1.35 carat marked]
  • 65. Model Building - Example: Prediction: Thus, using the Simple Linear Regression algorithm, we have implemented a successful model and predicted the price of a 1.35 carat diamond to be Rs. 10,000. [Chart: price of diamond (Rs.) against carat, with the regression line and 1.35 carat marked]
  • 66. Model Building - Example: This model is easily built using Python packages like pandas, matplotlib and numpy. We will study this in detail in the upcoming Data Science Tutorial using Python.
  • 67. Communication - Life cycle: Communicate results: Key findings are identified and conveyed to the stakeholders.
  • 68. Communication - Life cycle: The battle is not over yet! A good Data Scientist should be able to communicate their findings to the business team so that they move easily into the execution phase.
  • 69. Life cycle of Data Science project: Operationalize: Final reports, code, and technical documents are delivered by the team.
  • 70. Summary - Life cycle: 1. Concept Study, 2. Data Preparation, 3. Model Planning, 4. Model Building, 5. Communicate Results, 6. Operationalize.
  • 72. Demand for Data Scientists: Industries with high demand for Data Scientists: Marketing, Finance, Healthcare, Gaming, Technology.
  • 73. Summary: Need for Data Science; What is Data Science?; Prerequisites of Data Science; Tools used in Data Science; Lifecycle with example; Demand for Data Scientists.

Editor's Notes

  • #3: Remove title case
  • #23: Data-driven science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms.
  • #36: Good insight into the workings of a DBMS will take you a long way.
  • #39: A Data Scientist collects as much raw data as possible from the real world
  • #53: We can also use
  • #74: Natural language processing to enable it to communicate successfully in English (or some other human language). Knowledge representation to store information provided before or during the interrogation. Automated reasoning to use the stored information to answer questions and to draw new conclusions. Machine learning to adapt to new circumstances and to detect and extrapolate patterns.