Machine Learning with Python
Compiled by : Dr. Kumud Kundu
Outline
● The general concepts of machine learning
● The three types of learning and basic terminology
● The building blocks for successfully designing machine learning systems
● Introduction to the Pandas, Matplotlib and sklearn frameworks
○ For basics of Python refer to (https://www.python.org/) and
○ For basics of NumPy refer to (http://www.numpy.org/).
● Simple Program of Plotting Graphs with Matplotlib.pyplot
● Coding Template of Analyzing and Visualizing Dataframe with Pandas
● Simple Program for supervised learning (prediction modelling) with Linear Regression
● Simple Program for unsupervised learning (clustering) with K-Means
Machine Learning
Machine learning is the application and science of algorithms that make sense of data.
Or
Machine learning uses algorithms that take input data, learn from that data and make
informed decisions.
Or
Machine learning is the design and implementation of programs that improve with experience.
ML: Giving Computers the Ability to Learn from Data
Machine Learning is…
Automating automation
Getting computers to program themselves
Let the data do the work instead!
[Figure: Training Data (past) → model/predictor; Testing Data (future) → model/predictor]
JOURNEY FROM DATA TO PREDICTIONS
“Machine learning is the next Internet”
Traditional Programming Vs. Machine Learning Programming
[Figure: Traditional Programming: Data + Program → Computer → Output. Machine Learning: Data + Output → Computer → Program.]
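To make the contrast concrete, here is a minimal sketch. The Celsius-to-Fahrenheit task, the function name and the sample values are assumptions chosen for illustration and do not come from the slides. In the first half a person writes the rule; in the second half the computer is given data and the matching outputs and recovers the rule as two coefficients.

```python
import numpy as np

# Traditional programming: a human writes the program (the rule).
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: we supply data and the matching outputs,
# and the computer recovers the "program" (here, two coefficients).
celsius = np.array([-40.0, 0.0, 10.0, 37.0, 100.0])   # data
fahrenheit = celsius_to_fahrenheit(celsius)            # known outputs

slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)  # learned rule
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")

# The learned rule can now be applied to an unseen input.
predicted = slope * 25.0 + intercept
print(f"25 C is about {predicted:.1f} F")
```

A straight-line fit is enough here only because the assumed relationship is exactly linear.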
Machine learning is inherently a multi-disciplinary field
It draws on results from :
Artificial intelligence,
Probability
Statistics
Computational complexity theory
Information theory
Philosophy
Psychology
Neurobiology
and other fields.
Most machine learning methods work well because of human-designed representations and input
features.
ML then reduces to optimizing weights to make the best final prediction.
How Do Machines Learn?
Learning is all about discovering the best parameter values (a, b, c, …) that map
input to output.
Or
The main goal of learning is to find out how the output values are calculated from the input
(the relationship between output and input), i.e.
machine learning algorithms are described as learning a target function (f) that
best maps input variables (X) to an output variable (Y): Y = f(X).
The relationship can be linear or non-linear.
The learned parameter values enable the model to output results for new instances based on
previously learned ones.
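As a rough illustration of "discovering the best parameter values", the sketch below learns a and b for a straight line by gradient descent on the mean squared error. The target function y = 2x + 1, the learning rate and the number of steps are all assumptions made for this example.

```python
import numpy as np

# Toy data generated from the (assumed) target function y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

a, b = 0.0, 0.0          # initial guesses for the parameters
learning_rate = 0.05     # step size (assumed; small enough to converge here)

for _ in range(2000):
    error = (a * x + b) - y            # prediction minus expected output
    grad_a = 2.0 * np.mean(error * x)  # d(MSE)/da
    grad_b = 2.0 * np.mean(error)      # d(MSE)/db
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"learned a = {a:.4f}, b = {b:.4f}")

# The learned parameters give an output for a new instance.
new_x = 10.0
print("prediction for x = 10:", a * new_x + b)
```

The loop never sees the formula that generated the data; it only adjusts a and b to reduce the error on the examples.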
Learning a function from data is a difficult problem,
and this is the reason why the field of machine learning and machine
learning algorithms exist.
● Error creeps in when predicting output from real-life input data instances (X),
i.e. Y = f(X) + e
● This error may arise, for example, from not having enough attributes to sufficiently characterize the best
mapping from X to Y.
As an example, a face identification program may recognize Subject 1 as similar to Subject 2 on the basis
of intensity profile, though the expected output is Subject 1 with pose.
[Figure: face images labelled Subject 1, Subject 2, and Subject 1 with pose]
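The error term e in Y = f(X) + e can be seen in a small numeric experiment. In this sketch (synthetic data, with variable and function names invented for illustration) the true output depends on two attributes. A least-squares model that sees both attributes fits almost perfectly, while a model that is missing one attribute is left with an error it cannot remove.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)
x2 = rng.uniform(0, 1, 200)
y = 3 * x1 + 2 * x2          # the true mapping uses both attributes

def fit_and_mse(features, target):
    """Least-squares linear fit; returns the mean squared error on the same data."""
    design = np.column_stack([features, np.ones(len(target))])  # add an intercept column
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ weights
    return float(np.mean(residuals ** 2))

mse_both = fit_and_mse(np.column_stack([x1, x2]), y)   # all attributes available
mse_missing = fit_and_mse(x1.reshape(-1, 1), y)        # x2 is not available

print("MSE with both attributes:", mse_both)
print("MSE with x2 missing:     ", mse_missing)
```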
The following diagram shows a typical workflow for
using machine learning in predictive modeling:
ML Program
● A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
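The definition can be traced in a toy program. Everything in this sketch is a made-up illustration: the task T is to place the integers 0 to 99 into four bands, the performance measure P is accuracy on a fixed test set, and the experience E is the number of labelled examples seen so far. The learner is a one-nearest-neighbour rule written directly in NumPy.

```python
import numpy as np

def true_label(x):
    return x // 25                      # four bands: 0-24, 25-49, 50-74, 75-99

def predict_1nn(train_x, train_y, query_x):
    """Give each query the label of its closest training example."""
    distances = np.abs(query_x[:, None] - train_x[None, :])
    return train_y[np.argmin(distances, axis=1)]

test_x = np.arange(100)                 # fixed test set for measuring P
test_y = true_label(test_x)

experience = np.array([12, 37, 62, 87])  # labelled examples, seen one at a time
accuracies = []
for n in range(1, len(experience) + 1):
    train_x = experience[:n]            # experience E grows with n
    train_y = true_label(train_x)
    predictions = predict_1nn(train_x, train_y, test_x)
    accuracies.append(float(np.mean(predictions == test_y)))

print(accuracies)
```

Accuracy rises each time a new example is added, which is what "learns from experience" means under this definition.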
Python for Machine Learning Program
Why Python?
Python is one of the most popular programming languages for data science, and thanks to its very active developer
and open-source community, a large number of useful libraries such as NumPy and SciPy for scientific
computing and machine learning have been developed.
For machine learning programming tasks, the scikit-learn library, one of the most popular and accessible open-source
machine learning libraries, will be used.
Python on Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations and narrative text.
The core programming languages supported by Jupyter are Julia, Python
and R.
Use it on Google Colab (colab.research.google.com)
or use Jupyter Notebook on Anaconda
● Using the Anaconda Python distribution and package manager
● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an
Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
Key Terms in Machine Learning Program
● Training example: A row in a table representing the dataset, synonymous with an observation, record,
instance, or sample (in most contexts, sample refers to a collection of training examples).
● Training: Model fitting; for parametric models, similar to parameter estimation.
● Feature (x): A column in a data table or data (design) matrix. Synonymous with predictor, variable, input,
attribute, or covariate.
● Target (y): Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth.
● Loss function / Cost function / Error function: A function that measures the deviation of the predicted output from
the expected output.
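These terms map onto code roughly as follows. The dataset, its column names and the naive baseline "model" are hypothetical and serve only to show where training examples, features, the target and a loss function appear.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: each row is one training example.
data = pd.DataFrame({
    "hours_studied": [2.0, 4.0, 6.0, 8.0],    # feature
    "classes_attended": [10, 15, 20, 25],     # feature
    "exam_score": [50.0, 60.0, 70.0, 80.0],   # target
})

X = data[["hours_studied", "classes_attended"]]   # feature columns
y = data["exam_score"]                            # target column

def mean_squared_error(expected, predicted):
    """Loss function: average squared deviation of predictions from targets."""
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((expected - predicted) ** 2))

# A deliberately naive "model": always predict the mean of the targets.
baseline_prediction = np.full(len(y), y.mean())
loss = mean_squared_error(y, baseline_prediction)

print("training examples:", X.shape[0], "features:", X.shape[1])
print("loss of the baseline:", loss)
```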
Import the Libraries into the Jupyter Notebook
● import numpy as np
● import pandas as pd
● import matplotlib.pyplot as plt
Matplotlib: A Plotting Library for Python
● It makes heavy use of NumPy.
● Importing matplotlib:
from matplotlib import pyplot as plt
or
import matplotlib.pyplot as plt
● Examples:
# plotting a bar graph
x = [1, 23, 4, 5, 6, 7]
y = [23, 45, 67, 89, 90, 100]
plt.bar(x, y)
plt.title('Bar Graph')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
# plotting a scatter plot of the same data
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
For subplots (simultaneous plotting)
● matplotlib.pyplot.subplot
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10, 0.01)
plt.subplot(1, 3, 1)  # 1 row, 3 columns, first panel
plt.plot(x, np.sin(x))
plt.subplot(1, 3, 2)  # second panel
plt.plot(x, np.cos(x))
plt.subplot(1, 3, 3)  # third panel
plt.plot(x, np.sin(2 * x))
plt.show()
Pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool.
Pandas in data analysis:
Importing Data
Writing to different formats
Pandas Data Structures
Data Exploration
Data Manipulation
Aggregating Data
Merging Data
DataFrame
● A DataFrame is a two-dimensional, labeled data structure whose columns can hold different data types (heterogeneous data).
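A small sketch of what "heterogeneous" means in practice. The records are invented; the column names follow the ones used later in these slides.

```python
import pandas as pd

# Hypothetical records: each column holds a single type, but the types differ by column.
df = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],   # text
    "nations": ["IND", "AUS", "IND"],               # text
    "rank": [1, 2, 3],                              # integers
    "rating": [890.5, 870.0, 865.25],               # floating point
})

print(df.shape)    # (rows, columns)
print(df.dtypes)   # one data type per column
```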
Reading and Writing into DataFrames
import pandas as pd
● Reading data into a DataFrame using Pandas
○ df = pd.read_csv('File Name')  # from a comma-separated values (CSV) file
○ df=pd.read_csv('C:fdpbatsmen_ratings_all091217.csv')
○ df = pd.read_excel('File Name')
● Writing data from DataFrames to files on the system
df.to_csv('File Name')  # or the destination path along with the file name
df.to_excel('File Name')  # or the destination path along with the file name
To display all the records of the file: display(df)
To check the data type of each column:
types = df.dtypes
print(types)
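A write-then-read round trip can be tried without any existing data file. In this sketch the file name ratings_demo.csv and the two records are hypothetical, and a temporary folder is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Player A", "Player B"], "rank": [1, 2]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ratings_demo.csv")   # hypothetical file name
    df.to_csv(path, index=False)    # index=False keeps the row labels out of the file
    restored = pd.read_csv(path)    # read the same file back into a new DataFrame

print(restored)
print(restored.dtypes)
```

Without index=False, the row labels are written as an extra unnamed column that shows up again when the file is read back.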
Getting preview of Dataframe
● To view top n records of dataframe
○ df.head(5)
● To view bottom n records of dataframe
○ df.tail(5)
● View column names
○ df.columns
● Getting a sub-DataFrame from a DataFrame
○ df['name'], df[['name', 'nations']]
SubDataFrame as per Query
To display the records of India with ranking < 50:
display(df[(df['nations'] == "IND") & (df['rank'] < 50)])
Selecting data columns from dataset with column names:
df[['col1', 'col2']]
With iloc (integer-location) based indexing for selection by position:
df.iloc[:, :-1]  # select all columns except the last one
df.iloc[:, 3:6]  # select all rows of the fourth, fifth and sixth columns (positions 3, 4, 5)
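The query and the position-based selections can be checked on a few rows. The records in this sketch are invented; only the column names nations, rank and name are taken from the slides.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C", "Player D"],
    "nations": ["IND", "AUS", "IND", "IND"],
    "rank": [3, 10, 75, 42],
    "rating": [880, 850, 610, 700],
})

# Boolean query: each condition is wrapped in parentheses and combined with &.
top_indian = df[(df["nations"] == "IND") & (df["rank"] < 50)]

# Selection by name and by integer position.
by_name = df[["name", "rating"]]
all_but_last = df.iloc[:, :-1]   # every column except the last
middle = df.iloc[:, 1:3]         # positions 1 and 2 (the end of a slice is excluded)

print(top_indian)
print(list(all_but_last.columns), list(middle.columns))
```

Positions passed to iloc start at 0, so the fourth column sits at position 3.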
Drop Columns from a DataFrame using the drop() method.
Remove a specific single column.
k.drop(['rate_date'], axis=1)  # axis=1 denotes dropping a column of the dataset
Remove specific multiple columns.
k.drop(['rate_date', 'rating'], axis=1)
Remove columns based on column index.
k.drop(k.columns[[0, 1]], axis=1, inplace=True)
Remove all columns between one specific column and another.
k.drop(k.columns[3:5], axis=1)  # drops the columns at positions 3 and 4
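One detail of drop() that is easy to miss is whether the original DataFrame changes. The sketch below uses made-up rows (the column names rate_date and rating appear in the slides, the values do not) to show both behaviours.

```python
import pandas as pd

k = pd.DataFrame({
    "name": ["Player A", "Player B"],
    "rate_date": ["2017-12-09", "2017-12-09"],   # invented values
    "rating": [880, 850],
    "rank": [1, 2],
})

# drop() returns a new DataFrame by default and leaves k unchanged.
without_date = k.drop(["rate_date"], axis=1)

# With inplace=True the DataFrame itself is modified and None is returned.
copy_of_k = k.copy()
copy_of_k.drop(copy_of_k.columns[[0, 1]], axis=1, inplace=True)

print(list(k.columns))
print(list(without_date.columns))
print(list(copy_of_k.columns))
```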
Code for Data Reading, Data Manipulation using Pandas
# importing the data reading and data manipulation library of Python
import pandas as pd
# upload the data file because it is not present on Google Colab
from google.colab import files
upload = files.upload()
# reading dataset using read_csv function
df = pd.read_csv('rating.csv')
# to display column headers in dataset
df.columns
# to get the number of instances and associated features
df.shape
# to get insights into the data by grouping on one column
df.groupby('nations').size()
# to get a smaller dataset as per the query or subqueries
k = df[(df['nations'] == "IND") & (df['rank'] < 50)]
# to display the smaller subset of data
display(k)
# to drop desired columns from the smaller set of data
k = k.drop(['name', 'rate_date', 'nations'], axis=1)
Scikit-learn (sklearn): Free Machine Learning Library for Python
● It is built on Python numerical and scientific libraries like NumPy and SciPy.
● Model selection is the process of selecting one final machine learning model from among a collection of candidate
machine learning models for a training dataset. It can be applied across different
types of models (e.g. logistic regression, SVM, KNN, etc.).
● Utilities for splitting data and comparing models are imported from sklearn.model_selection.
● Model parameters are the parameters that arise as a result of the fit.
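One common way to carry out model selection is cross-validation, sketched below. The synthetic data and the two candidates (linear regression and a 3-nearest-neighbour regressor) are assumptions for illustration; the slides do not prescribe a particular procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (assumed for illustration): y depends linearly on x.
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

candidates = {
    "linear_regression": LinearRegression(),
    "knn_regression": KNeighborsRegressor(n_neighbors=3),
}

# Score each candidate with 5-fold cross-validation (R^2 is the default score for regressors).
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("selected model:", best_name)

# Model parameters appear only after fitting the selected model.
final_model = candidates[best_name].fit(X, y)
print("coef_:", final_model.coef_, "intercept_:", final_model.intercept_)
```

Here coef_ and intercept_ are model parameters produced by the fit, whereas n_neighbors is a setting chosen before fitting.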
Challenge of ML Program
The challenge of applied machine learning is in choosing
a model among a range of different models for your
problem.
Simple Predictive ML Program using Linear Regression
Model
● SIMPLE_REGRESSION.ipynb On Google Colab
# importing the data reading and data manipulation library of Python
import pandas as pd
# upload the data file because it is not present on Google Colab
from google.colab import files
upload = files.upload()
# reading dataset using read_csv function
df = pd.read_csv('rating.csv.csv')
# for plotting graphs
import matplotlib.pyplot as plt
# dividing the dataset into the feature matrix (X) and the target vector (y)
X = df.iloc[:, :-1].values  # all columns except the last
y = df.iloc[:, -1].values   # last column only
# from the machine learning library of Python (sklearn) import the train_test_split function
from sklearn.model_selection import train_test_split
# X is the feature matrix, y is the target vector
# train_test_split divides both X and y into a training part and a test part
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
X_test.shape
# import the Linear Regression model
from sklearn.linear_model import LinearRegression
# create an instance of the linear regression model
model = LinearRegression()
# find the relationship between input and output with the help of the fit function
model.fit(X_train, y_train)
# use the trained model on the unseen test data, i.e. X_test
y_pred = model.predict(X_test)
Visualizing and Evaluation of results
# visualization of results
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, model.predict(X_train), color='blue')
plt.title('PCM Marks vs Placement_Package (Training set)')
plt.xlabel('PCM Marks')
plt.ylabel('Placement_Package')
plt.show()
# include the numerical calculation Python library numpy
import numpy as np
# importing metrics from sklearn to evaluate the predicted result
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
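The program above needs Google Colab and an uploaded CSV file. For readers without either, the following is a self-contained sketch of the same split, fit, predict and evaluate sequence on synthetic data. The marks and package values, and the slope and noise level used to generate them, are invented for this example.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the uploaded CSV (all values here are made up).
rng = np.random.default_rng(0)
pcm_marks = rng.uniform(40, 100, size=90)
placement_package = 0.1 * pcm_marks + 1.0 + rng.normal(0, 0.5, size=90)

X = pcm_marks.reshape(-1, 1)   # feature matrix must be 2-D: (n_samples, n_features)
y = placement_package          # target vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```

Because the noise was generated with a standard deviation of 0.5, an RMSE near that value is about the best any model can do on this data.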
CLUSTERING : Grouping things together
UNSUPERVISED LEARNING
Cluster Analysis : A method of Unsupervised Learning
● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are
more similar to each other than to those in other groups.
● Clustering analysis helps us gain valuable insights from our data by showing what groups the data points fall into when
we apply a clustering algorithm.
● For example, to survey the academic performance of high school students, the entire population of a particular board can be divided into
different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
K-Means Clustering
● Aims to partition n observations into k clusters in which each observation belongs to the
cluster with the nearest mean, which serves as a prototype of the cluster.
● K-Means falls under the category of centroid-based clustering.
● For a fixed number of features, the running time grows as O(n · k · t), where
•n = number of instances
•k = number of clusters
•t = number of iterations
The K-Means clustering algorithm involves the following steps:
● Choose the number of clusters K.
● Randomly select any K data points as cluster centers, preferably as far as possible from each
other.
○ Calculate the distance between each data point and each cluster center using the given distance function.
○ Assign each data point to the cluster whose center is nearest to it.
○ Re-compute the centers of the newly formed clusters.
○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
● Keep repeating the above four steps until any of the following stopping criteria is met:
○ No change in the centers of the newly formed clusters
○ No change in the data points assigned to each cluster
○ The maximum number of iterations is reached
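The steps above can be sketched directly in NumPy. This is a minimal illustration and not the scikit-learn implementation: the function name k_means, the Euclidean distance and the eight sample points are assumptions, and the starting centers are picked at random without trying to spread them out.

```python
import numpy as np

def k_means(points, initial_centers, max_iter=100):
    """Run the assign/update loop until a stopping criterion is met."""
    centers = np.array(initial_centers, dtype=float)
    labels = None
    for _ in range(max_iter):                       # stop: maximum iterations
        # Distance from every point to every center (Euclidean).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = np.argmin(distances, axis=1)   # nearest center wins
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # stop: memberships unchanged
        labels = new_labels
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:                    # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                                   # stop: centers unchanged
        centers = new_centers
    return labels, centers

# Made-up data: two groups of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])
k = 2
rng = np.random.default_rng(0)
start = points[rng.choice(len(points), size=k, replace=False)]  # random data points as centers
labels, centers = k_means(points, start)
print(labels)
print(centers)
```

The grouping that comes out depends on the starting centers, and an unlucky start can settle on a poor grouping, which is why spreading the initial centers apart is recommended.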
Metric to evaluate the quality of Clusters
● Inertia: the sum of squared distances of all the points within a cluster from the
centroid of that cluster, added up over all clusters.
● It tells us how far the points within a cluster are from their centroid.
● The distance between them should be as low as possible.
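Inertia can be computed by hand for a tiny case. In this sketch the four points and their cluster assignment are assumed, and the squared Euclidean distance is used, which matches how scikit-learn defines its inertia_ attribute.

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])     # assumed cluster assignment

inertia = 0.0
for cluster_id in np.unique(labels):
    members = points[labels == cluster_id]
    centroid = members.mean(axis=0)
    inertia += np.sum((members - centroid) ** 2)   # squared distances to the centroid

print("inertia:", inertia)
```

Each point lies exactly one unit from its cluster's centroid, so the four squared distances add up to 4.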
from sklearn.cluster import KMeans
● With the K-Means++ algorithm, we improve the step where the initial cluster
centroids are picked at random.
● kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)  (i is the number of clusters being tried)
● Use the elbow method to find the optimal number of clusters.
An Elbow Method Algorithm
● The basic idea of the elbow rule is to run the clustering for a series of K values and, for each K, add up the
squared distances between the sample points in each cluster and the centroid of that cluster. This sum of squared
errors (SSE) is used as a performance indicator. Iterate over the K values and calculate the SSE for each.
● Smaller values indicate that each cluster is more compact. The K at which the SSE stops dropping sharply (the "elbow") is chosen.
Clustering Example with K-Means
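A minimal sketch of K-Means with the elbow method in scikit-learn follows. The three groups of points are synthetic and the range of K values is an assumption; n_init=10 is set explicitly so that each K is tried from ten different starts.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: three well-separated groups of 30 points each.
rng = np.random.default_rng(42)
group_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(30, 2)) for c in group_centers])

sse = []
for i in range(1, 7):
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)     # sum of squared distances to the nearest centroid

for i, value in enumerate(sse, start=1):
    print(f"K = {i}: SSE = {value:.1f}")

# Fit the final model with the K suggested by the elbow (3 for this data).
labels = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42).fit_predict(X)
```

Plotting sse against K with plt.plot would show the curve bending at K = 3 for this data, since adding a fourth cluster reduces the SSE only slightly.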
Agglomerative Clustering
● An agglomerative algorithm is a type of hierarchical clustering algorithm in which
each individual element to be clustered starts in its own cluster. These clusters are merged
iteratively until all the elements belong to one cluster.
● Hierarchical clustering is a powerful technique that allows us to build tree structures from
data similarities.
Hierarchical Clustering Example
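A short sketch of agglomerative clustering with scikit-learn is given below. The six points and the choice of Ward linkage are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up data: two tight groups of three points each.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

# Merging starts from six single-point clusters; the labels describe the two that remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print("cluster labels:", labels)
# children_ records which two clusters were joined at each merge step (the tree).
print("merge steps:\n", model.children_)
```

With six points there are five merge steps in the full tree, one row of children_ per merge.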
Applications of Clustering
● Search Engines.
● Spam Detection
● Customer Segmentation
Ad

Recommended

358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
358 33 powerpoint-slides_4-introduction-data-structures_chapter-4
sumitbardhan
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
lucenerevolution
 
Introduction to R
Introduction to R
Samuel Bosch
 
R Programming Tutorial for Beginners - -TIB Academy
R Programming Tutorial for Beginners - -TIB Academy
rajkamaltibacademy
 
R programming by ganesh kavhar
R programming by ganesh kavhar
Savitribai Phule Pune University
 
R programming groundup-basic-section-i
R programming groundup-basic-section-i
Dr. Awase Khirni Syed
 
358 33 powerpoint-slides_3-pointers_chapter-3
358 33 powerpoint-slides_3-pointers_chapter-3
sumitbardhan
 
Data structure
Data structure
Muhammad Farhan
 
Workshop presentation hands on r programming
Workshop presentation hands on r programming
Nimrita Koul
 
R-programming-training-in-mumbai
R-programming-training-in-mumbai
Unmesh Baile
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
Alvaro Gil
 
Python Programming - XII. File Processing
Python Programming - XII. File Processing
Ranel Padon
 
LSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
Korkrid Akepanidtaworn
 
Intellectual technologies
Intellectual technologies
Polad Saruxanov
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
Parinaz Ameri
 
Templates in c++
Templates in c++
ThamizhselviKrishnam
 
264finalppt (1)
264finalppt (1)
Mahima Verma
 
R programming slides
R programming slides
Pankaj Saini
 
Unit 2 linked list
Unit 2 linked list
DrkhanchanaR
 
Machine Learning in R
Machine Learning in R
Alexandros Karatzoglou
 
Primitive data types
Primitive data types
bad_zurbic
 
Introduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing Environment
izahn
 
Getting Started with R
Getting Started with R
Sankhya_Analytics
 
Object Oriented Programming in Matlab
Object Oriented Programming in Matlab
AlbanLevy
 
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
mporhel
 
Unit 2 Principles of Programming Languages
Unit 2 Principles of Programming Languages
Vasavi College of Engg
 
08 class and object
08 class and object
dhrubo kayal
 
R programming Fundamentals
R programming Fundamentals
Ragia Ibrahim
 
Python ml
Python ml
Shubham Sharma
 
De-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekends
DSCUSICT
 

More Related Content

What's hot (20)

Workshop presentation hands on r programming
Workshop presentation hands on r programming
Nimrita Koul
 
R-programming-training-in-mumbai
R-programming-training-in-mumbai
Unmesh Baile
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
Alvaro Gil
 
Python Programming - XII. File Processing
Python Programming - XII. File Processing
Ranel Padon
 
LSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
Korkrid Akepanidtaworn
 
Intellectual technologies
Intellectual technologies
Polad Saruxanov
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
Parinaz Ameri
 
Templates in c++
Templates in c++
ThamizhselviKrishnam
 
264finalppt (1)
264finalppt (1)
Mahima Verma
 
R programming slides
R programming slides
Pankaj Saini
 
Unit 2 linked list
Unit 2 linked list
DrkhanchanaR
 
Machine Learning in R
Machine Learning in R
Alexandros Karatzoglou
 
Primitive data types
Primitive data types
bad_zurbic
 
Introduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing Environment
izahn
 
Getting Started with R
Getting Started with R
Sankhya_Analytics
 
Object Oriented Programming in Matlab
Object Oriented Programming in Matlab
AlbanLevy
 
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
mporhel
 
Unit 2 Principles of Programming Languages
Unit 2 Principles of Programming Languages
Vasavi College of Engg
 
08 class and object
08 class and object
dhrubo kayal
 
R programming Fundamentals
R programming Fundamentals
Ragia Ibrahim
 
Workshop presentation hands on r programming
Workshop presentation hands on r programming
Nimrita Koul
 
R-programming-training-in-mumbai
R-programming-training-in-mumbai
Unmesh Baile
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
Alvaro Gil
 
Python Programming - XII. File Processing
Python Programming - XII. File Processing
Ranel Padon
 
Intellectual technologies
Intellectual technologies
Polad Saruxanov
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
Parinaz Ameri
 
R programming slides
R programming slides
Pankaj Saini
 
Unit 2 linked list
Unit 2 linked list
DrkhanchanaR
 
Primitive data types
Primitive data types
bad_zurbic
 
Introduction to the R Statistical Computing Environment
Introduction to the R Statistical Computing Environment
izahn
 
Object Oriented Programming in Matlab
Object Oriented Programming in Matlab
AlbanLevy
 
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
How to make Robust and Scalable Modeling Workbenches with Sirius - SiriusCon ...
mporhel
 
Unit 2 Principles of Programming Languages
Unit 2 Principles of Programming Languages
Vasavi College of Engg
 
08 class and object
08 class and object
dhrubo kayal
 
R programming Fundamentals
R programming Fundamentals
Ragia Ibrahim
 

Similar to Ml programming with python (20)

Python ml
Python ml
Shubham Sharma
 
De-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekends
DSCUSICT
 
Predicting rainfall with data science in python
Predicting rainfall with data science in python
dhanushthurinjikuppa
 
Machine learning Experiments report
Machine learning Experiments report
AlmkdadAli
 
Basic of python for data analysis
Basic of python for data analysis
Pramod Toraskar
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 
machinelearningwithpythonppt-230605123325-8b1d6277.pptx
machinelearningwithpythonppt-230605123325-8b1d6277.pptx
geethar79
 
Python for Data Science: A Comprehensive Guide
Python for Data Science: A Comprehensive Guide
priyanka rajput
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting Started
Rafey Iqbal Rahman
 
Artificial Intelligence concepts in a Nutshell
Artificial Intelligence concepts in a Nutshell
kannanalagu1
 
Introduction_to_Python.pptx
Introduction_to_Python.pptx
Vinay Chowdary
 
python-programming-3-books-in-ryan-turner_compress.pdf
python-programming-3-books-in-ryan-turner_compress.pdf
Ahmed Attyub
 
Python Machine Learning-Library- Technology.powerpoint
Python Machine Learning-Library- Technology.powerpoint
Suhana58
 
Python for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptx
KuldeepSinghBrar3
 
Jupyter machine learning crash course
Jupyter machine learning crash course
Olga Scrivner
 
Pythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.ppt
drakesean662
 
Data Science.pptx
Data Science.pptx
TrainerAnalogicx
 
Machine Learning Using Python.pptx Machine Learning Using PythonMachine Learn...
Machine Learning Using Python.pptx Machine Learning Using PythonMachine Learn...
satyakarunak
 
Data Science With Python
Data Science With Python
Mosky Liu
 
python-pandas-For-Data-Analysis-Manipulate.pptx
python-pandas-For-Data-Analysis-Manipulate.pptx
PLOKESH8
 
De-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekends
DSCUSICT
 
Predicting rainfall with data science in python
Predicting rainfall with data science in python
dhanushthurinjikuppa
 
Machine learning Experiments report
Machine learning Experiments report
AlmkdadAli
 
Basic of python for data analysis
Basic of python for data analysis
Pramod Toraskar
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 
machinelearningwithpythonppt-230605123325-8b1d6277.pptx
machinelearningwithpythonppt-230605123325-8b1d6277.pptx
geethar79
 
Python for Data Science: A Comprehensive Guide
Python for Data Science: A Comprehensive Guide
priyanka rajput
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting Started
Rafey Iqbal Rahman
 
Artificial Intelligence concepts in a Nutshell
Artificial Intelligence concepts in a Nutshell
kannanalagu1
 
Introduction_to_Python.pptx
Introduction_to_Python.pptx
Vinay Chowdary
 
python-programming-3-books-in-ryan-turner_compress.pdf
python-programming-3-books-in-ryan-turner_compress.pdf
Ahmed Attyub
 
Python Machine Learning-Library- Technology.powerpoint
Python Machine Learning-Library- Technology.powerpoint
Suhana58
 
Python for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptx
KuldeepSinghBrar3
 
Jupyter machine learning crash course
Jupyter machine learning crash course
Olga Scrivner
 
Pythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.ppt
drakesean662
 
Machine Learning Using Python.pptx Machine Learning Using PythonMachine Learn...
Machine Learning Using Python.pptx Machine Learning Using PythonMachine Learn...
satyakarunak
 
Data Science With Python
Data Science With Python
Mosky Liu
 
python-pandas-For-Data-Analysis-Manipulate.pptx
python-pandas-For-Data-Analysis-Manipulate.pptx
PLOKESH8
 
Ad

Recently uploaded (20)

Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
KaryanaTantri21
 
A Visual Introduction to the Prophet Jeremiah
A Visual Introduction to the Prophet Jeremiah
Steve Thomason
 
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
Ronisha Das
 
The Man In The Back – Exceptional Delaware.pdf
The Man In The Back – Exceptional Delaware.pdf
dennisongomezk
 
VCE Literature Section A Exam Response Guide
VCE Literature Section A Exam Response Guide
jpinnuck
 
Great Governors' Send-Off Quiz 2025 Prelims IIT KGP
Great Governors' Send-Off Quiz 2025 Prelims IIT KGP
IIT Kharagpur Quiz Club
 
NSUMD_M1 Library Orientation_June 11, 2025.pptx
NSUMD_M1 Library Orientation_June 11, 2025.pptx
Julie Sarpy
 
Plate Tectonic Boundaries and Continental Drift Theory
Plate Tectonic Boundaries and Continental Drift Theory
Marie
 
English 3 Quarter 1_LEwithLAS_Week 1.pdf
English 3 Quarter 1_LEwithLAS_Week 1.pdf
DeAsisAlyanajaneH
 
Birnagar High School Platinum Jubilee Quiz.pptx
Birnagar High School Platinum Jubilee Quiz.pptx
Sourav Kr Podder
 
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
Mayvel Nadal
 
This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 6-14-2025.pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 6-14-2025.pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
Peer Teaching Observations During School Internship
Peer Teaching Observations During School Internship
AjayaMohanty7
 
Chalukyas of Gujrat, Solanki Dynasty NEP.pptx
Chalukyas of Gujrat, Solanki Dynasty NEP.pptx
Dr. Ravi Shankar Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
Ultimatewinner0342
 
Intellectual Property Right (Jurisprudence).pptx
Intellectual Property Right (Jurisprudence).pptx
Vishal Chanalia
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
HistoPathology Ppt. Arshita Gupta for Diploma
HistoPathology Ppt. Arshita Gupta for Diploma
arshitagupta674
 
Tanja Vujicic - PISA for Schools contact Info
Tanja Vujicic - PISA for Schools contact Info
EduSkills OECD
 
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
KaryanaTantri21
 
A Visual Introduction to the Prophet Jeremiah
A Visual Introduction to the Prophet Jeremiah
Steve Thomason
 
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
Ronisha Das
 
The Man In The Back – Exceptional Delaware.pdf
The Man In The Back – Exceptional Delaware.pdf
dennisongomezk
 
VCE Literature Section A Exam Response Guide
VCE Literature Section A Exam Response Guide
jpinnuck
 
Great Governors' Send-Off Quiz 2025 Prelims IIT KGP
Great Governors' Send-Off Quiz 2025 Prelims IIT KGP
IIT Kharagpur Quiz Club
 
NSUMD_M1 Library Orientation_June 11, 2025.pptx
NSUMD_M1 Library Orientation_June 11, 2025.pptx
Julie Sarpy
 
Plate Tectonic Boundaries and Continental Drift Theory
Plate Tectonic Boundaries and Continental Drift Theory
Marie
 
English 3 Quarter 1_LEwithLAS_Week 1.pdf
English 3 Quarter 1_LEwithLAS_Week 1.pdf
DeAsisAlyanajaneH
 
Birnagar High School Platinum Jubilee Quiz.pptx
Birnagar High School Platinum Jubilee Quiz.pptx
Sourav Kr Podder
 
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
Mayvel Nadal
 
This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
 
Peer Teaching Observations During School Internship
Peer Teaching Observations During School Internship
AjayaMohanty7
 
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
Ultimatewinner0342
 
Intellectual Property Right (Jurisprudence).pptx
Intellectual Property Right (Jurisprudence).pptx
Vishal Chanalia
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
HistoPathology Ppt. Arshita Gupta for Diploma
HistoPathology Ppt. Arshita Gupta for Diploma
arshitagupta674
 
Tanja Vujicic - PISA for Schools contact Info
Tanja Vujicic - PISA for Schools contact Info
EduSkills OECD
 
Ad

Ml programming with python

  • 1. Machine Learning with Python Compiled by : Dr. Kumud Kundu
  • 2. Outline ● The general concepts of machine learning ● The three types of learning and basic terminology ● The building blocks for successfully designing machine learning systems ● Introduction to Pandas, Matlplotlib and sklearn framework ○ For basics of Python refer to (https://www.python.org/) and ○ For basics of NumPy refer to (http://www.numpy.org/). ● Simple Program of Plotting Graphs with Matplotlib.pyplot ● Coding Template of Analyzing and Visualizing Dataframe with Pandas ● Simple Program for supervised learning (prediction modelling) with Linear Regression ● Simple Program for unsupervised learning (clustering) with Kmeans
  • 3. Machine Learning Machine learning, the application and science of algorithms that make sense of data Or Machine Learning uses algorithms that takes input data, learns from data and make informed decisions. Or To design and implement programs that improve with experience
  • 4. ML: Giving Computers the Ability to Learn from Data
  • 5. Machine Learning is… Automating automation Getting computers to program themselves Let the data do the work instead! Training Data model/ predictor past model/ predictor future Testing Data
  • 6. JOURNEY FROM DATA TO PREDICTIONS “Machine learning is the next Internet”
  • 8. Machine learning is inherently a multi-disciplinary field It draws on results from : Artificial intelligence, Probability Statistics Computational complexity theory Information theory Philosophy Psychology Neurobiology and other fields.
  • 9. Most machine learning methods work well because of human-designed representations and input features ML becomes just optimizing weights to best make a final prediction Machine Learning
  • 10. How Machines Learn? Learning is all about discovering the best parameter values (a, b, c, …) that map input to output. Put differently, the main goal of learning is to find out how the output values are calculated from the input, i.e. the relationship between output and input. Machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y): Y = f(X). The relationship can be linear or non-linear. The learned parameter values enable the model to output results for new instances based on previously learned ones.
  • 11. Learning a function from data is a difficult problem, and this is the reason why the field of machine learning and machine learning algorithms exist. ● Error creeps in when predicting output from real-life input data instances (X), i.e. Y = f(X) + e ● This error might arise, for example, from not having enough attributes to sufficiently characterize the best mapping from X to Y. ● As an example, a face identification program may recognize Subject 1 with a pose as similar to Subject 2 on the basis of intensity profile, although the expected output is Subject 1. [Images: Subject 1, Subject 2, Subject 1 with pose]
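To make Y = f(X) + e concrete, here is a minimal sketch that learns the parameters a and b of a linear target function from noisy data. The data is synthetic and the true values (a = 2, b = 1) are assumptions chosen for illustration only; `np.polyfit` is used as one convenient least-squares fitter, not as the only way to learn f.

```python
import numpy as np

# Synthetic data (assumed for illustration): Y = 2*X + 1 plus a small error term e
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
e = rng.normal(0, 0.5, size=X.shape)
Y = 2.0 * X + 1.0 + e

# Learn the parameters a, b of f(X) = a*X + b by least squares
a, b = np.polyfit(X, Y, deg=1)

def f(x_new):
    """Learned target function applied to a new instance."""
    return a * x_new + b

print(f"learned a={a:.2f}, b={b:.2f}")
print("prediction for X=12:", f(12.0))
```

The learned values are close to, but not exactly, the true ones, because the error term e cannot be removed by fitting.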
  • 14. The following diagram shows a typical workflow for using machine learning in predictive modeling:
  • 15. ML Program ● A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
  • 16. Python for Machine Learning Program
  • 17. Why Python? Python is one of the most popular programming languages for data science, and thanks to its very active developer and open source community, a large number of useful libraries such as NumPy and SciPy have been developed for scientific computing and machine learning. For machine learning programming tasks, the scikit-learn library, one of the most popular and accessible open source machine learning libraries, will be used.
  • 18. Python on Jupyter Notebook The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. The core programming languages supported by Jupyter are Julia, Python and R. Use it on Google Colab colab.research.google.com or Use Jupyter notebook on Anaconda ● Using the Anaconda Python distribution and package manager ● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
  • 19. Key Terms in Machine Learning Program ● Training example: A row in a table representing the dataset; synonymous with an observation, record, instance, or sample (in most contexts, sample refers to a collection of training examples). ● Training: Model fitting; for parametric models, similar to parameter estimation. ● Feature (x): A column in a data table or data (design) matrix. Synonymous with predictor, variable, input, attribute, or covariate. ● Target (y): Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth. ● Loss function / Cost function / Error function: A function that measures the deviation of the predicted output from the expected output.
  • 20. Import the Libraries into the Jupyter Notebook ● import numpy as np ● import pandas as pd ● import matplotlib.pyplot as plt
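As an illustration of a loss function, the sketch below computes the mean squared error in plain Python. The target and predicted values are made up for the example; mean squared error is one common choice of loss, not the only one.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared deviation of the predictions from the expected outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical expected outputs and model predictions
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

loss = mean_squared_error(y_true, y_pred)
print(loss)
```

A loss of zero would mean every prediction matches its expected output exactly.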
  • 21. Matplotlib: A Plotting Library for Python ● it makes heavy use of NumPy ● Importing matplotlib : ● from matplotlib import pyplot as plt or ● import matplotlib.pyplot as plt ● Examples: ● # for plotting bar graph ● x=[1,23,4,5,6,7] ● y=[23,45,67,89,90,100] ● plt.bar(x,y) ● plt.title('bar graph') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 22. ● plt.scatter(x,y) ● plt.title('Scatter Plot') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 23. For subplots (simultaneous plotting) ● matplotlib.pyplot.subplot ● import numpy as np ● import matplotlib.pyplot as plt ● x=np.arange(0,10,0.01) ● plt.subplot(1,3,1) ● plt.plot(x,np.sin(x)) ● plt.subplot(1,3,2) ● plt.plot(x,np.cos(x)) ● plt.subplot(1,3,3) ● plt.plot(x,np.sin(2*x)) ● plt.show()
  • 24. Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool. Pandas in data analysis: importing data, writing to different formats, Pandas data structures, data exploration, data manipulation, aggregating data, merging data.
  • 25. DataFrame ● A DataFrame is a two-dimensional, labeled data structure whose columns can hold different data types (heterogeneous data).
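A minimal sketch of a DataFrame with heterogeneous columns follows. The records are invented; the column names echo those used later in the slides ('name', 'nations', 'rank') plus an assumed 'rating' column.

```python
import pandas as pd

# Hypothetical records; values are for illustration only
df = pd.DataFrame({
    "name": ["A", "B", "C"],            # text column
    "nations": ["IND", "AUS", "IND"],   # text column
    "rank": [1, 2, 3],                  # integer column
    "rating": [890.5, 870.0, 865.25],   # floating-point column
})

print(df.dtypes)  # one data type per column
print(df.shape)   # (number of rows, number of columns)
```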
  • 26. Reading and Writing into DataFrames ● import pandas as pd ● Reading data into a DataFrame using Pandas ○ df = pd.read_csv('File Name') # from a comma-separated values (CSV) file ○ df = pd.read_csv('C:fdpbatsmen_ratings_all091217.csv') ○ df = pd.read_excel('File Name') ● Writing data from DataFrames to files on the system ○ df.to_csv('File Name' or 'destination path including file name') ○ df.to_excel('File Name' or 'destination path including file name') ● To display all the records of the file: display(df) ● To inspect the column data types: types = df.dtypes ● print(types)
  • 27. Getting a preview of a DataFrame ● To view the top n records of a DataFrame ○ df.head(5) ● To view the bottom n records of a DataFrame ○ df.tail(5) ● To view the column names ○ df.columns ● Getting a sub-DataFrame from a DataFrame ○ df['name'] , df[['name','nations']]
  • 28. SubDataFrame as per Query ● To display the records of India with rank < 50: display(df[(df['nations'] == "IND") & (df['rank'] < 50)]) ● Selecting data columns from the dataset by column name: df[['col1', 'col2']] ● With iloc (integer-location based indexing) for selection by position: ○ df.iloc[:, :-1] # select all columns except the last one ○ df.iloc[:, 3:6] # select all rows of the fourth, fifth and sixth columns (positions 3, 4, 5)
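The query and position-based selections can be tried on a small in-memory table. This is a sketch on invented data; only the column names 'nations' and 'rank' come from the slide.

```python
import pandas as pd

# Hypothetical ratings table
df = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4"],
    "nations": ["IND", "AUS", "IND", "ENG"],
    "rank": [3, 10, 75, 40],
    "rating": [880, 850, 610, 700],
})

# Boolean mask: both conditions must hold, so each is parenthesised and joined with &
top_ind = df[(df["nations"] == "IND") & (df["rank"] < 50)]

# Position-based selection with iloc (rows first, then columns)
all_but_last = df.iloc[:, :-1]   # every column except the last
middle_cols = df.iloc[:, 1:3]    # columns at positions 1 and 2 (the end of a slice is excluded)

print(top_ind)
print(list(all_but_last.columns))
print(list(middle_cols.columns))
```

Note that iloc positions start at 0 and the end of a slice is exclusive, so `1:3` selects the second and third columns.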
  • 29. Drop Columns from a DataFrame using the drop() method ● Remove a specific single column: k.drop(['rate_date'], axis=1) # axis=1 denotes dropping a column of the dataset ● Remove specific multiple columns: k.drop(['rate_date', 'rating'], axis=1) ● Remove columns based on column index: k.drop(k.columns[[0, 1]], axis=1, inplace=True) ● Remove all columns between one specific column and another: k.drop(k.columns[3:5], axis=1) # drops the columns at positions 3 and 4
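A short sketch of these drop() variants on invented data. Without `inplace=True`, drop() returns a new DataFrame and leaves the original untouched, which is the form used here.

```python
import pandas as pd

# Hypothetical table; only 'rate_date' and 'rating' are column names taken from the slide
k = pd.DataFrame({
    "name": ["P1", "P2"],
    "rate_date": ["2017-12-09", "2017-12-09"],
    "rating": [880, 850],
    "rank": [1, 2],
})

no_date = k.drop(["rate_date"], axis=1)           # drop one column by name
by_position = k.drop(k.columns[[0, 1]], axis=1)   # drop the columns at positions 0 and 1
by_range = k.drop(k.columns[1:3], axis=1)         # drop a contiguous range of columns

print(list(no_date.columns))
print(list(by_position.columns))
print(list(by_range.columns))
```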
  • 30. Code for Data Reading, Data Manipulation using Pandas
    # importing the data reading and data manipulation library of Python
    import pandas as pd
    # import files because the files are not present on Google Colab
    from google.colab import files
    upload = files.upload()
    # reading the dataset using the read_csv function
    df = pd.read_csv('rating.csv')
    # to display the column headers in the dataset
    df.columns
    # to get the number of instances and associated features
    df.shape
    # to get insight into the data by grouping on one column
    df.groupby('nations').size()
    # to get a smaller dataset as per the query or subqueries
    k = df[(df['nations'] == "IND") & (df['rank'] < 50)]
    # to display the smaller subset of data
    display(k)
    # to drop the desired columns from the smaller set of data
    k = k.drop(['name', 'rate_date', 'nations'], axis=1)
  • 31. Scikit-learn (sklearn): Free Machine Learning Library for Python ● It works with Python numerical and scientific libraries like NumPy and SciPy. ● Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset. It can be applied across different types of models (e.g. logistic regression, SVM, KNN, etc.). ● The sklearn.model_selection module provides tools for this, e.g. from sklearn.model_selection import train_test_split ● Model parameters are parameters which arise as a result of the fit.
  • 32. Challenge of ML Program The challenge of applied machine learning is in choosing a model among a range of different models for your problem.
  • 33. Simple Predictive ML Program using Linear Regression Model ● SIMPLE_REGRESSION.ipynb on Google Colab
    # importing the data reading and data manipulation library of Python
    import pandas as pd
    # import files because the files are not present on Google Colab
    from google.colab import files
    upload = files.upload()
    # reading the dataset using the read_csv function
    df = pd.read_csv('rating.csv.csv')
    # for plotting graphs
    import matplotlib.pyplot as plt
    # dividing the dataset into the feature set (X) and the target set (y)
    X = df.iloc[:, :-1].values
    y = df.iloc[:, -1].values
  • 34. # import the train_test_split function from sklearn, the machine learning library of Python
    from sklearn.model_selection import train_test_split
    # X is the feature set
    # y is the target set
    # split with the help of the train_test_split function:
    # X is divided into two parts, train and test
    # y is divided into two parts, train and test
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
    X_test.shape
    # import the Linear Regression model
    from sklearn.linear_model import LinearRegression
    # create an instance of the linear regression model
    model = LinearRegression()
    # find the relationship between input and output with the help of the fit function
    model.fit(X_train, y_train)
    # use the trained model on the unseen test data, i.e. X_test
    y_pred = model.predict(X_test)
  • 35. Visualization and Evaluation of Results
    # visualization of results (assumes X has a single feature column)
    plt.scatter(X_train, y_train, color = 'red')
    plt.plot(X_train, model.predict(X_train), color = 'blue')
    plt.title('PCM Marks vs Placement_Package (Training set)')
    plt.xlabel('PCM Marks')
    plt.ylabel('Placement_Package')
    plt.show()
    # include the numerical calculation library of Python, numpy
    import numpy as np
    # importing metrics from sklearn to evaluate the predicted result
    from sklearn import metrics
    print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
    print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
    print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
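The regression workflow above depends on a CSV file uploaded to Colab. For readers without that file, here is a self-contained sketch of the same steps (split, fit, predict, evaluate) on synthetic data. The single feature, the slope of 0.1, the intercept of 1.0 and the noise level are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: one feature column and one target column
rng = np.random.default_rng(0)
X = rng.uniform(40, 100, size=(90, 1))                   # feature, e.g. marks
y = 0.1 * X[:, 0] + 1.0 + rng.normal(0, 0.3, size=90)    # target with noise

# Hold out one third of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# Fit on the training part, predict on the held-out part
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate the predictions against the true test targets
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MAE:", mae, "RMSE:", rmse)
```

Evaluating on rows the model never saw during fitting is what makes the error figures an estimate of performance on new data.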
  • 36. CLUSTERING : Grouping things together UNSUPERVISED LEARNING
  • 37. Cluster Analysis: A Method of Unsupervised Learning ● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. ● Cluster analysis is used to gain valuable insights from data by seeing what groups the data points fall into when a clustering algorithm is applied. ● For example, to survey the academic performance of high school students, the entire population of a particular board can be divided into different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
  • 38. K-Means Clustering ● Aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. ● K-Means falls under the category of centroid-based clustering. ● Notation: n = number of instances, k = number of clusters, t = number of iterations.
  • 39. The K-Means clustering algorithm involves the following steps: ● Choose the number of clusters K. ● Randomly select any K data points as cluster centers, in such a way that they are as far as possible from each other. ○ Calculate the distance between each data point and each cluster center using the given distance function. ○ Assign each data point to the cluster whose center is nearest to it. ○ Re-compute the centers of the newly formed clusters. ○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster. ● Keep repeating the above four steps until any of the following stopping criteria is met: ○ No change in the centers of the newly formed clusters ○ No change in the data points of the clusters ○ The maximum number of iterations is reached
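The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the one in scikit-learn: it assumes Euclidean distance, uses plain random initialization (it does not try to spread the initial centers apart), and the function name `kmeans` and the test data are made up for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-Means sketch: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points at random as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every point to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center
        labels = dists.argmin(axis=1)
        # Re-compute each center as the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated synthetic blobs of 20 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
print(labels)
```

The loop ends either when the centers stop moving or when `max_iter` is reached, matching two of the stopping criteria listed above.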
  • 40. Metric to evaluate the quality of clusters ● Inertia: the sum of the (squared) distances of all the points within a cluster from the centroid of that cluster, added up over all clusters. ● It tells us how far the points within a cluster are from their centroid. ● This distance should be as low as possible.
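Written as a formula, and assuming the squared Euclidean distance (the convention behind scikit-learn's `inertia_` attribute), inertia for clusters C_1, …, C_k with centroids μ_1, …, μ_k is:

```latex
\text{inertia} \;=\; \sum_{j=1}^{k} \; \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j \;=\; \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

The symbols C_j, μ_j and x_i are notation introduced here for illustration; they do not appear in the slides.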
  • 41. from sklearn.cluster import KMeans ● The k-means++ initialization improves the step where the initial cluster centroids are picked: instead of choosing them purely at random, it tends to spread them out. ● kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42) # i is the number of clusters being tried ● The elbow method is used to find the optimal number of clusters.
  • 42. An Elbow Method Algorithm ● The basic idea of the elbow method is to compute, for a series of K values, the sum of squared distances between the sample points in each cluster and the centroid of that cluster. This sum of squared errors (SSE) is used as a performance indicator. Iterate over the K values and calculate the SSE for each. ● Smaller values indicate tighter clusters; the "elbow", where the SSE stops dropping sharply as K grows, suggests a suitable K.
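A minimal sketch of the elbow loop with scikit-learn follows. The three synthetic blobs and the range of K values are assumptions for illustration; `n_init=10` is set explicitly so the behaviour does not depend on the library version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs of 30 points each, centered at (0,0), (4,4) and (8,8)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in (0, 4, 8)])

# Fit K-Means for K = 1..6 and record the SSE (exposed as inertia_)
sse = []
for i in range(1, 7):
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)

for i, value in enumerate(sse, start=1):
    print(f"K={i}: SSE={value:.1f}")
```

Plotting `sse` against K (for example with `plt.plot(range(1, 7), sse)`) should show a sharp bend near K = 3 for this data, since the drop in SSE is large up to three clusters and small afterwards.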
  • 46. Agglomerative Clustering ● An agglomerative algorithm is a type of hierarchical clustering algorithm in which each individual element to be clustered starts in its own cluster. These clusters are merged iteratively until all the elements belong to one cluster. ● Hierarchical clustering is a powerful technique that allows building tree structures from data similarities.
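As a small illustration, scikit-learn's `AgglomerativeClustering` can merge points bottom-up and stop once a requested number of clusters remains. The six points and the choice of Ward linkage are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six hypothetical points forming two obvious groups
X = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.3],
])

# Each point starts as its own cluster; merging stops when 2 clusters remain
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

Stopping at `n_clusters=2` cuts the merge tree early; letting the merges continue would end with all elements in one cluster, as described above.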
  • 50. Applications of Clustering ● Search Engines ● Spam Detection ● Customer Segmentation