Data Science Tutorial | What is Data Science? | Data Science For Beginners | Edureka
DATA SCIENCE CERTIFICATION TRAINING www.edureka.co/data-science
Agenda
1. Need for Data Science
2. Walmart Use Case
3. What is Data Science?
4. Who is a Data Scientist?
5. Data Science – Skill Set
6. Data Science Job Roles
7. Data Life Cycle
8. Introduction to Machine Learning
9. K – Means Use Case
10. K – Means Algorithm
11. Hands - On
12. Data Science Certification
Need For Data Science
Data Sources
Evolution of Technology
• Telephone → Mobile
• Desktop → Cloud
• Car → Smart Car
IoT
Social Media
Other factors
Data Sources
• 347,222 tweets
• 1,736,111 pictures
• 204,000,000 emails
• 300 hours of video uploaded
• 4,166,667 likes & 200,000 photos
Walmart Use Case
Data Analysis At Walmart
Halloween and cookie sales
Data scientists at Walmart found a connection between Halloween and cookie sales.
Data Analysis At Walmart
Hurricane and strawberry pop tarts
Data scientists at Walmart found that sales of strawberry Pop-Tarts increased sevenfold before a hurricane.
Data Analysis At Walmart
Social media and cake pops
Walmart uses social media data to identify trending products so that they can be introduced to Walmart stores across the world.
What Is Data Science?
Data Science is the process of extracting knowledge and insights
from data by using scientific methods.
Scientific methods:
Programming + Statistics + Business
“Torture the data, and it will confess to anything.”
~ Ronald Coase, Nobel laureate in Economics
Who Is A Data Scientist?
Mathematics
Business
Technology
Data Science – Skill Set
• Programming languages
• Statistics
• Machine Learning
• Big Data processing frameworks
• Data wrangling & exploration
• Data visualisation
• Data extraction & processing
Data Science Job Roles
• Data Scientist
• Data Analyst
• Data Architect
• Data Engineer
• Statistician
• Database Administrator
• Business Analyst
• Data & Analytics Manager
Data Science Life Cycle
Data Life Cycle
1. Business requirements
2. Data acquisition
3. Data processing
4. Data exploration
5. Modelling
6. Deployment
Data Life Cycle: Business requirements
• Understand the problem
• Identify central objectives
• Identify variables that need to be predicted
Data Life Cycle: Data acquisition
• What data do I need for my project?
• What are the data sources?
• How can I obtain the data?
• What is the most efficient way to store and access all of it?
Data Life Cycle: Data processing
• Transform data into the desired format
• Data cleaning
  • Missing values
  • Corrupted data
  • Remove unnecessary data
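The cleaning steps listed for this stage can be sketched in a few lines of standard-library Python. The records, field names, and the choice to fill gaps with the mean are assumptions made for illustration; they are not taken from the slides.

```python
# Hypothetical raw records; field names and values are made up for illustration.
raw_records = [
    {"id": "1", "age": "34", "city": "Austin", "notes": "n/a"},
    {"id": "2", "age": "", "city": "Boston", "notes": "call back"},     # missing value
    {"id": "3", "age": "abc", "city": "Chicago", "notes": ""},          # corrupted value
    {"id": "4", "age": "50", "city": "Denver", "notes": "vip"},
]


def parse_age(value):
    """Return the age as an int, or None if it is missing or corrupted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean(records):
    # Transform into the desired format: typed values, and only the fields
    # we need ("notes" is treated as unnecessary data and dropped).
    rows = [
        {"id": int(r["id"]), "age": parse_age(r["age"]), "city": r["city"]}
        for r in records
    ]
    # Fill missing or corrupted ages with the rounded mean of the valid ones.
    valid = [row["age"] for row in rows if row["age"] is not None]
    fill = round(sum(valid) / len(valid))
    for row in rows:
        if row["age"] is None:
            row["age"] = fill
    return rows


cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```

Filling with the mean is only one option; dropping incomplete rows or using the median may suit other datasets better.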
Data Life Cycle: Data exploration
• Understand the patterns in the data
• Retrieve useful insights
• Form hypotheses
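As a small illustration of exploration, the sketch below summarises one variable and measures how strongly two variables move together. The ad-spend and sales figures are hypothetical, and Pearson correlation is just one of many ways to look for a pattern.

```python
from statistics import mean, median, stdev

# Hypothetical data: weekly ad spend and units sold (illustrative values only).
ad_spend = [10, 20, 30, 40, 50]
units_sold = [24, 47, 63, 88, 103]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Basic summary statistics describe the shape of a single variable.
summary = {
    "mean": mean(units_sold),
    "median": median(units_sold),
    "stdev": stdev(units_sold),
}

# A correlation close to 1 suggests a hypothesis worth testing:
# "higher ad spend is associated with higher sales".
r = pearson(ad_spend, units_sold)
print(summary)
print("correlation:", round(r, 3))
```

A strong correlation only motivates a hypothesis; it does not by itself show that one variable causes the other.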
Data Life Cycle: Modelling
• Determine optimal data features for the machine-learning model
• Create a model that predicts the target most accurately
• Evaluate & test the efficiency of the model
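A minimal sketch of the create-then-evaluate part of this stage, assuming a single numeric feature and a straight-line model fitted by ordinary least squares. The dataset, the hold-out split, and the choice of mean absolute error as the metric are all illustrative assumptions; feature selection is not covered here.

```python
# Hypothetical dataset of (feature, target) pairs; values are illustrative.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8),
        (5, 11.1), (6, 13.0), (7, 15.1), (8, 16.9)]

# Simple hold-out split: fit on the first six points, test on the last two.
train, test = data[:6], data[6:]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between the targets and the predictions."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


slope, intercept = fit_line(train)
test_mae = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} test MAE={test_mae:.3f}")
```

Evaluating on points the model never saw during fitting is what makes the error estimate meaningful; a random split is usually preferable to the ordered split used here for brevity.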
Data Life Cycle: Deployment
• Check the deployment environment for dependency issues
• Deploy the model in a pre-production/test environment
• Monitor the performance
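One simple way to check an environment for dependency issues is to confirm that every module the model needs can be found before it is deployed. The sketch below uses `importlib.util.find_spec` from the standard library; the list of required modules is a hypothetical placeholder.

```python
import importlib.util

# Hypothetical list of top-level modules the model needs at run time.
REQUIRED_MODULES = ["json", "math", "statistics"]


def missing_modules(names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All required modules are available.")
```

This only checks that modules exist; version conflicts would need a separate check.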
Introduction To Machine Learning
What Is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that gives machines the ability to learn automatically and improve from experience without being explicitly programmed.
[Figure: Cherry, Apple, Orange ("They look the same!"); Data → Algorithm]
Types Of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
K – Means Use Case
Brain Tumour Detection Using K-Means
Brain tumour segmentation applies the k-means algorithm to detect the extent and shape of a tumour in brain MR images.
K-means clustering is an unsupervised learning algorithm that partitions a dataset into k clusters, with each data point belonging to the cluster with the nearest mean.
K – Means Algorithm
Steps: Initialization, Cluster assignment, Move centroid, Optimization, Convergence
K – Means Algorithm: Initialization
➢ Randomly initialize k points called the cluster centroids. Here, k = 2.
➢ The value of k (the number of clusters) can be determined using the elbow curve.
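A minimal sketch of the initialization step, assuming the starting centroids are drawn from the data points themselves (one common convention; the slides only say "randomly"). The points and the `seed` parameter are illustrative.

```python
import random

# Hypothetical 2-D data points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def init_centroids(data, k, seed=None):
    """Pick k distinct data points at random as the starting centroids."""
    rng = random.Random(seed)  # a seed makes the choice reproducible
    return rng.sample(data, k)


centroids = init_centroids(points, k=2, seed=42)
print(centroids)
```

Because the result depends on the random draw, different seeds can lead to different starting positions.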
K – Means Algorithm: Cluster assignment
➢ Compute the Euclidean distance between each data point and each of the initialized cluster centroids.
➢ Assign each data point to its nearest centroid; with k = 2, this divides the data points into two groups.
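The assignment step can be sketched as follows, using `math.dist` (Python 3.8+) for the Euclidean distance. The points and the two starting centroids are assumed values chosen for illustration.

```python
import math

# Hypothetical data points and assumed starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = [(1.0, 1.0), (8.0, 8.0)]


def assign_clusters(data, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in data
    ]


labels = assign_clusters(points, centroids)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```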
K – Means Algorithm: Move centroid
➢ Compute the mean of the red dots and reposition the red cluster centroid to this mean.
➢ Compute the mean of the green dots and reposition the green cluster centroid to this mean.
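The centroid update amounts to averaging the coordinates of each cluster's members. In this sketch the points and their cluster labels are assumed inputs, with label 0 standing in for the red dots and label 1 for the green dots.

```python
# Hypothetical data points and an assumed cluster assignment.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels = [0, 0, 0, 1, 1, 1]


def update_centroids(data, labels, k):
    """Move each centroid to the coordinate-wise mean of its members."""
    new_centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(data, labels) if label == cluster]
        # This sketch assumes every cluster has at least one member.
        new_centroids.append(
            tuple(sum(coord) / len(members) for coord in zip(*members))
        )
    return new_centroids


centroids = update_centroids(points, labels, k=2)
print(centroids)
```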
K – Means Algorithm: Optimization
➢ Repeat the previous two steps iteratively until the cluster centroids stop changing their positions.
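Putting the steps together, the loop below alternates assignment and centroid update until the centroids no longer move. It is a compact sketch under assumptions: initial centroids are sampled from the data, an empty cluster keeps its previous centroid, and `max_iter` is a safety cap that the slides do not mention.

```python
import math
import random

# Hypothetical 2-D data points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def k_means(data, k, seed=0, max_iter=100):
    """Run k-means and return (centroids, labels)."""
    centroids = random.Random(seed).sample(data, k)  # initialization
    labels = []
    for _ in range(max_iter):
        # Cluster assignment: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in data
        ]
        # Move centroid: mean of each cluster's members.
        updated = []
        for cluster in range(k):
            members = [p for p, lab in zip(data, labels) if lab == cluster]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[cluster])  # keep an empty cluster's centroid
        # Stop once the centroids stop changing their positions.
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

On this small dataset the two groups are far apart, so the loop settles after a handful of iterations.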
K – Means Algorithm: Convergence
➢ Finally, the k-means clustering algorithm converges.
➢ It divides the data points into two clusters, clearly visible in red and green.
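Once the algorithm converges, the quality of a clustering is often summarised by the within-cluster sum of squares (WCSS). Computing it for several values of k gives the numbers behind an elbow curve, which is how the slides suggest choosing k. The sketch below is self-contained and rests on assumptions: the data is hypothetical, the compact `k_means` helper is an illustrative implementation, and the plot itself is left out.

```python
import math
import random

# Hypothetical 2-D data points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def k_means(data, k, seed=0, max_iter=100):
    """Compact k-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(data, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in data]
        updated = []
        for cluster in range(k):
            members = [p for p, lab in zip(data, labels) if lab == cluster]
            updated.append(
                tuple(sum(v) / len(members) for v in zip(*members))
                if members else centroids[cluster]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def wcss(data, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(data, labels))


# WCSS for k = 1..4; the "elbow" is where adding clusters stops helping much.
wcss_by_k = {}
for k in range(1, 5):
    centroids, labels = k_means(points, k)
    wcss_by_k[k] = wcss(points, centroids, labels)

for k, value in wcss_by_k.items():
    print(k, round(value, 3))
```

For this data the drop from k = 1 to k = 2 is large and later gains are small, which is the pattern an elbow curve is meant to reveal.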
K – Means Algorithm
➢ Data Matrix
➢ Distance/dissimilarity Matrix
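The two structures can be illustrated side by side: a data matrix holds one row per object and one column per attribute, while a distance (dissimilarity) matrix holds the pairwise distance between every two objects. The values below are hypothetical, and Euclidean distance is assumed as the dissimilarity measure.

```python
import math

# Data matrix: one row per object, one column per attribute (hypothetical values).
data_matrix = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 6.0],
]

# Distance (dissimilarity) matrix: entry [i][j] is the Euclidean distance
# between object i and object j. It is symmetric with zeros on the diagonal.
distance_matrix = [
    [math.dist(a, b) for b in data_matrix]
    for a in data_matrix
]

for row in distance_matrix:
    print(row)
```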
Hands - On
Data Science Certification
Edureka’s Data Science Certification
• Introduction to Data Science
• Statistical Inference
• Data extraction, wrangling & exploration
• Introduction to Machine Learning
• Classification techniques
• Unsupervised Learning
• Recommender engine
• Text Mining
• Time series
• Deep Learning
WebDriver vs. IDE vs. RC
➢ A data warehouse is like a relational database designed for analytical needs.
➢ It functions on the basis of OLAP (Online Analytical Processing).
➢ It is a central location where consolidated data from multiple locations (databases) is stored.
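To make the idea concrete, the sketch below consolidates rows from two hypothetical source databases into one table and runs an analytical roll-up over it. It uses an in-memory SQLite database purely as a stand-in; SQLite is not a data warehouse, and the table and column names are assumptions.

```python
import sqlite3

# In-memory database standing in for the central store (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source TEXT, region TEXT, amount REAL)")

# Consolidated rows from two hypothetical source databases.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("store_db", "North", 120.0),
        ("store_db", "South", 80.0),
        ("web_db", "North", 200.0),
        ("web_db", "South", 50.0),
    ],
)

# Analytical query: total sales per region across all sources.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)  # → [('North', 320.0), ('South', 130.0)]
```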
