Python For Data Science and Machine Learning

This document provides an overview of Python for data science and machine learning. It discusses fundamentals of data science, the data science workflow, applications of data science, tools used, and hands-on activities. Fundamentals include how data science combines fields like math, computer science, and domain expertise to extract value from data. The workflow involves preparing, analyzing, visualizing, and deploying data. Applications include optimizing campaigns, improving sales/diagnoses, and forecasting. Tools include programming languages like Python and R, libraries for tasks like data engineering, analysis, visualization, and machine learning algorithms. Hands-on activities demonstrate collecting API data, exploring/cleaning datasets, and building supervised/unsupervised machine learning models.

Uploaded by

Kassandra Kay Fabia Mislang


Python for Data Science and Machine Learning
Rod Salvador
Senior Data Scientist
Reed Elsevier Philippines

https://www.extremetech.com/extreme/319005-the-day-i-learned-what-data-science-is
Contents:
• Fundamentals of Data Science
• Data Science Workflow
• Applications of Data Science
• Tools for Data Science
• Hands-on Activities
    • Data Engineering
    • Exploratory Data Analysis (EDA)
    • Data Preprocessing/Cleansing
    • Machine Learning Modeling
    • Data Visualization
• Q&A
Fundamentals of Data Science

Data science combines multiple fields, including mathematics, computer science, and domain expertise to extract value from data.

It encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data to perform advanced data analysis, machine learning, visualization, and deployment [1].
Applications of Data Science

Optimize campaign efforts by analyzing which platforms are heavily used and rarely used by our end users.

Improve sales by creating targeted recommendations for customers based on previous purchases and spending habits.

Determine customer churn by analyzing data from profiles, marketing interactions, sales history, and surveys so sales and marketing can take action to retain them.

Improve events experience by analyzing the sentiment of exhibitors, visitors, and hosted buyers based on open text survey responses.
Applications of Data Science

Improve patient diagnoses by analyzing medical test data and reported symptoms so doctors can diagnose diseases earlier and treat them more effectively.

Improve efficiency by analyzing traffic patterns, weather conditions, and other factors so logistics companies can improve delivery speeds and reduce costs.

Forecast the growth of COVID-19 cases in a particular region, country, continent, etc.

Detect fraud in financial services by recognizing suspicious behaviors and anomalous actions.
Data Science Workflow
Tools for Data Science
Programming Languages

Python

Java

JavaScript

C/C++

SQL
Tools for Data Science
Libraries

Data Engineering: requests, selenium, pyodbc, boto3, json5, beautifulsoup4, awswrangler, etc.

Data Analysis/Cleaning: numpy, pandas, pandas-profiling, scipy, etc.

Machine Learning: scikit-learn, tensorflow, keras, pytorch, TPOT, etc.

Data Visualization: matplotlib, ggplot, d3.js, seaborn, plotly, etc.

NLP: nltk, textblob, twython, huggingface, etc.

Automation: pyautogui, selenium, etc.

Tools for Data Science
Other Tools

IDEs: JupyterLab/Jupyter Notebook via Anaconda Navigator, VS Code, Sublime Text, PyCharm, etc.

Data Sources: Kaggle, Google open datasets, KDnuggets, NASA, etc.

Research: arxiv.org, paperswithcode.com, Google Scholar, etc.

Cloud/Distributed Computing: GCP, Azure, AWS, Databricks, Hadoop, Spark, etc.

Version Control: Git/GitHub, Bitbucket, Subversion, etc.

Deployment: Heroku, Streamlit, Flask, Django, FastAPI, Docker, Kubernetes, Jenkins, etc.
Data Engineering

Data engineering is the practice of designing and building systems for collecting, storing, and analyzing data at scale.

The ultimate goal is to make data accessible so that organizations can use it to evaluate and optimize their performance [2].
Data Engineering

What’s the difference between a data scientist/analyst and a data engineer?

Data scientists and data analysts analyze data sets to glean knowledge and insights.

Data engineers build systems for collecting, validating, and preparing that high-quality data [2].
Data Engineering

DEMO 1: Collect data from an API endpoint using requests library
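
The demo code itself is not part of these slides. As a minimal sketch of collecting data from a JSON API with requests: the `fetch_json` helper and the example endpoint mentioned below are hypothetical, and the optional `session` argument is an assumption added so the function can be exercised without network access.

```python
import requests


def fetch_json(url, params=None, session=None, timeout=10):
    """Send a GET request to `url` and return the decoded JSON body.

    `session` may be any object with a requests-style `get` method
    (for example a `requests.Session`); it defaults to the `requests` module.
    """
    http = session or requests
    response = http.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError on 4xx/5xx responses
    return response.json()
```

With a real service, a call might look like `fetch_json("https://api.example.com/v1/items", params={"limit": 5})`, where the URL and query parameter are placeholders. The returned list or dict can then be passed to `pandas.DataFrame` for analysis.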

Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) is used to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods [3].

It helps determine how to manipulate data sources to get the answers you need, making it easier to:
1. discover patterns
2. spot anomalies
3. test a hypothesis
4. check assumptions
Exploratory Data Analysis

DEMO 2: Perform EDA using pandas profiling and create exploratory visuals using matplotlib
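
The demo notebook is not included in the slides. The sketch below shows the kind of first-pass EDA this step describes, using plain pandas summaries in place of a pandas profiling report, plus two matplotlib plots. The data frame, its column names, and the output file name are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "age": [23, 35, 31, 52, 46, 29, 41, 38],
        "income": [28000, 52000, 47000, 88000, 76000, 39000, 61000, 58000],
        "segment": ["a", "b", "a", "c", "b", "a", "c", "b"],
    }
)

summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
missing = df.isna().sum()  # number of missing values per column
segment_counts = df["segment"].value_counts()  # frequency of each category
correlation = df["age"].corr(df["income"])  # Pearson correlation

# One histogram and one scatter plot as quick exploratory visuals.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(df["age"], bins=5)
ax_hist.set_xlabel("age")
ax_hist.set_ylabel("count")
ax_scatter.scatter(df["age"], df["income"])
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("income")
fig.tight_layout()
fig.savefig("eda_overview.png")
```

The summaries help spot anomalies and check assumptions (ranges, missing values, category balance), while the plots make patterns such as the age and income relationship visible at a glance.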
Data Preprocessing/Cleansing

Data preprocessing is the process of transforming raw data into an understandable format.

The quality of the data should be checked before applying machine learning or data mining algorithms [4].
Data Preprocessing/Cleansing

Characteristics of dirty data:

1. Incomplete data (e.g., missing/null values)

2. Duplicates
3. Inconsistent data (e.g., data types, data versions)
4. Outliers
5. Outdated data
6. Inaccurate data
7. Insecure data
Data Preprocessing/Cleansing

DEMO 3: Cleanse tabular data using numpy and pandas
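
The demo data is not included in the slides. As a minimal sketch, the code below handles four of the dirty-data problems listed earlier (duplicates, inconsistent types, missing values, outliers) on a made-up table. The column names, the median fill, and the 1.5 × IQR outlier rule are illustrative choices, not the presenter's method.

```python
import numpy as np
import pandas as pd

# Made-up raw data: a duplicated row, ages stored as text with one missing,
# one missing spend value, and one extreme spend value.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4, 5],
        "age": ["34", "29", "29", None, "41", "38"],
        "spend": [120.0, 85.5, 85.5, 60.0, np.nan, 9999.0],
    }
)

clean = raw.drop_duplicates().copy()  # duplicates
clean["age"] = pd.to_numeric(clean["age"])  # inconsistent data types
clean["age"] = clean["age"].fillna(clean["age"].median())  # incomplete data
clean["spend"] = clean["spend"].fillna(clean["spend"].median())

# Outliers: keep rows within 1.5 * IQR of the quartiles (a common rule of thumb).
q1, q3 = clean["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
within_range = clean["spend"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[within_range].reset_index(drop=True)
```

Whether to fill, drop, or flag missing values and outliers depends on the data set; the median fill and IQR filter here are only one reasonable default.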

Break!

10 minutes
Introduction to Machine Learning

Machine learning is defined as the ability of a machine to learn from data without being explicitly programmed.

Machine learning is best used for…

• Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
• Complex problems for which there is no good solution at all using a traditional approach.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.
Introduction to Machine Learning

Machine Learning Workflow

Introduction to Machine Learning

Types of Machine Learning

Supervised learning is a type of machine learning that requires both input (features) data and
output (label) data. The goal is to find a mapping between the input and the output data.

https://ai.plainenglish.io/introduction-to-machine-learning-2316e048ade3
Introduction to Machine Learning

Types of Machine Learning

Unsupervised learning is a type of machine learning that only requires input data. The goal is to
find similarities, differences, and patterns in the data.

https://towardsdatascience.com/supervised-vs-unsupervised-learning-in-2-minutes-72dad148f242
Introduction to Machine Learning

Tasks under supervised learning

https://medium.com/big-data-at-berkeley/choosing-fine-tuning-your-machine-learning-model-8c28fc1bd2fc
Introduction to Machine Learning

Tasks under unsupervised learning

https://www.reddit.com/r/datascience/comments/d6buto/kmeans_be_like_mine_mine_mine/
https://towardsdatascience.com/dimensionality-reduction-cheatsheet-15060fee3aa
Introduction to Machine Learning

DEMO 4: Supervised learning using scikit-learn library
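
The demo notebook is not reproduced in the slides. A minimal sketch of supervised learning with scikit-learn follows, assuming the bundled Iris data set and a logistic regression classifier. Both are illustrative choices and not necessarily what the demo used.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # features (inputs) and labels (outputs)

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # learn a mapping from inputs to labels
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit/predict pattern applies to other scikit-learn estimators, so swapping in a different classifier or a regressor mostly means changing the model line and the metric.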

Introduction to Machine Learning

DEMO 5: Unsupervised learning using scikit-learn library
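
The demo notebook is not reproduced in the slides. As a minimal sketch of unsupervised learning with scikit-learn, the code below clusters unlabeled points with k-means. The points and the choice of two clusters are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two visibly separate groups; no labels are given.
points = np.array(
    [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [8.0, 8.2], [8.1, 7.9], [7.8, 8.0]]
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)  # one cluster index per point
print(cluster_ids)
print(kmeans.cluster_centers_)
```

The cluster indices are arbitrary identifiers: what matters is which points share an index, not whether a group is numbered 0 or 1.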

Data Visualization

Data visualization is the graphical representation of information and data.

By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data [5].
Data Visualization

DEMO 6: Data visualization using seaborn and plotly

Q&A
References

1. https://www.oracle.com/ph/data-science/what-is-data-science/
2. https://www.coursera.org/articles/what-does-a-data-engineer-do-and-how-do-i-become-one
3. https://www.ibm.com/cloud/learn/exploratory-data-analysis
4. https://www.analyticsvidhya.com/blog/2021/08/data-preprocessing-in-data-mining-a-hands-on-guide/
5. https://www.tableau.com/learn/articles/data-visualization
Thank you

