20ACS04 –
PROBLEM SOLVING AND
PROGRAMMING USING
PYTHON
PREPARED BY
Mr. P. NANDAKUMAR
ASSISTANT PROFESSOR,
DEPARTMENT OF INFORMATION TECHNOLOGY,
SVCET.
COURSE CONTENT
UNIT-V INTRODUCTION TO NUMPY, PANDAS,
MATPLOTLIB
Exploratory Data Analysis (EDA), Data Science life cycle,
Descriptive Statistics, Basic tools (plots, graphs and summary
statistics) of EDA, Philosophy of EDA. Data Visualization: Scatter
plot, bar chart, histogram, boxplot, heat maps, etc.
EXPLORATORY DATA ANALYSIS (EDA)
Exploratory Data Analysis (EDA) is an approach that is used to
analyze the data and discover trends, patterns, or check assumptions in
data with the help of statistical summaries and graphical
representations.
Types of EDA
Depending on the number of columns (variables) analyzed at a time, EDA can
be divided into three types:
1. Univariate Analysis
2. Bivariate Analysis
3. Multivariate Analysis
1. Univariate Analysis – We analyze only one variable at a time. This is
the simplest form of analysis, since the data contains only one quantity
that changes. It does not deal with causes or relationships; its main
purpose is to describe the data and find patterns within it.
2. Bivariate Analysis – The data involves two different variables. The
analysis explores the relationship between the two variables, including
possible causes and effects.
3. Multivariate Analysis – When the data involves three or more variables,
the analysis is categorized as multivariate.
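The difference between univariate and bivariate analysis can be sketched in a few lines of Python. This is only an illustration: the `hours` and `scores` lists are made-up sample data, and `pearson` is a hypothetical helper that computes the Pearson correlation coefficient by hand, which is one common way to quantify a relationship between two variables.

```python
import math
import statistics

# Hypothetical sample data (not from the slides): hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 68, 74]

# Univariate analysis: describe one variable on its own.
print("Mean score   =", statistics.mean(scores))
print("Median score =", statistics.median(scores))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Bivariate analysis: measure how two variables move together.
r = pearson(hours, scores)
print("Correlation(hours, scores) =", round(r, 3))
```

A value of `r` close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Multivariate analysis repeats this kind of comparison across three or more variables.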
Depending on the type of analysis, EDA can also be divided into two
categories.
1. Non-graphical Analysis – We analyze the data using statistical
measures such as the mean, median, mode, or skewness.
2. Graphical Analysis – We use charts and other visualizations to reveal
trends and patterns in the data.
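Skewness is the least familiar of the non-graphical measures named above, so a small sketch may help. It assumes the population definition of skewness (third central moment divided by the cube of the population standard deviation); the function name `skewness` and both data sets are hypothetical.

```python
import statistics


def skewness(data):
    """Population skewness: the third central moment divided by the
    population standard deviation cubed. Assumes the data is not constant."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    n = len(data)
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / std ** 3


# Hypothetical data sets (not from the slides).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 10]

print("Skewness (symmetric)    =", skewness(symmetric))
print("Skewness (right-tailed) =", skewness(right_tailed))
```

A skewness near zero indicates a roughly symmetric distribution, and a positive value indicates a longer tail on the right.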
DATA SCIENCE LIFECYCLE
The Data Science Lifecycle revolves around the use of machine learning
and other analytical techniques to produce insights and predictions
from data in order to achieve a business objective.
The complete process includes a number of steps such as data cleaning,
preparation, modelling, and model evaluation. It is a lengthy procedure
and may take several months to complete.
The following are some primary reasons for using data science
technology:
• It helps to convert large quantities of raw and unstructured data
into meaningful insights.
• It can assist in making predictions, such as the outcomes of surveys and
elections.
• It also helps in automating transportation, for example in developing
self-driving cars, which may be the future of transportation.
• Companies are shifting towards data science and adopting this
technology. Amazon, Netflix, and others that deal with large quantities
of data use data science algorithms to provide a better customer
experience.
[Figure: The lifecycle of data science]
DESCRIPTIVE STATISTICS
In descriptive statistics, we describe our data with the help of
representative methods such as charts, graphs, tables, and Excel files.
We summarize the data and present it in a meaningful way so that it can
be easily understood.
It is most often performed on small data sets, and the findings can help
us anticipate future trends.
Types of descriptive statistics:
• Measure of central tendency
• Measure of variability
Measure of central tendency:
It represents the whole set of data by a single value. It gives us the location
of the central point. There are three main measures of central tendency:
1. Mean
2. Mode
3. Median
Mean:
It is the sum of the observations divided by the total number of observations.
It is also called the average.
Mean = (x1 + x2 + ... + xn) / n
where n = number of terms
Python code to find the mean:
import numpy as np
# Sample Data
arr = [5, 6, 11]
# Mean
mean = np.mean(arr)
print("Mean = ", mean)
Mode:
It is the value that has the highest frequency in the given data set. The data
set may have no mode if all data points occur equally often. A data set can
also have more than one mode if two or more values share the highest
frequency.
Python code to find the mode:
from scipy import stats
# Sample data
arr = [1, 2, 2, 3]
# stats.mode returns a result object holding the mode and its count
result = stats.mode(arr)
print("Mode = ", result.mode)
print("Count = ", result.count)
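The cases described above (one mode, several modes, or no mode) can be sketched with the standard library alone. The helper name `find_modes` is hypothetical, and treating "all values equally frequent" as "no mode" follows the definition given here; note that the built-in `statistics.multimode` instead returns every value in that situation.

```python
from collections import Counter


def find_modes(data):
    """Return a sorted list of modes for non-empty data; the list is empty
    when several distinct values all occur equally often (no mode)."""
    counts = Counter(data)
    highest = max(counts.values())
    # Several distinct values sharing one frequency: treat as "no mode".
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(find_modes([1, 2, 2, 3]))      # one mode
print(find_modes([1, 1, 2, 2, 3]))   # two modes
print(find_modes([1, 2, 3, 4]))      # no mode
```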
Median:
It is the middle value of the sorted data set. It splits the data into two
halves. If the number of elements is odd, the center element is the median;
if it is even, the median is the average of the two central elements.
Median = the ((n + 1) / 2)-th term, if n is odd
Median = the average of the (n / 2)-th and (n / 2 + 1)-th terms, if n is even
where n = number of terms
Python code to find the median:
import numpy as np
# sample Data
arr =[1, 2, 3, 4]
# Median
median = np.median(arr)
print("Median = ", median)
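To make the odd and even cases explicit, the same calculation can be written by hand. This is a minimal sketch; `manual_median` is a hypothetical name, and in practice `np.median` or `statistics.median` would normally be used.

```python
def manual_median(data):
    """Median of a non-empty sequence of numbers."""
    ordered = sorted(data)          # the median is defined on sorted data
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single center element.
        return ordered[middle]
    # Even count: the average of the two central elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


print("Median (odd count)  =", manual_median([7, 1, 5]))
print("Median (even count) =", manual_median([4, 1, 3, 2]))
```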
Measure of variability:
A measure of variability describes the spread of the data, that is, how widely
the data is distributed. The most common measures of variability are:
1. Range
2. Variance
3. Standard deviation
Range:
The range is the difference between the largest and smallest data points in
our data set. The bigger the range, the greater the spread of the data, and
vice versa.
Range = Largest data value – Smallest data value
Python code to find the range:
# Sample data
arr = [1, 2, 3, 4, 5]
# Find the maximum
maximum = max(arr)
# Find the minimum
minimum = min(arr)
# Difference between maximum and minimum
data_range = maximum - minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(maximum,
minimum, data_range))
Variance:
It is defined as the average squared deviation from the mean. It is
calculated by finding the difference between every data point and the mean,
squaring the differences, adding them all up, and then dividing by the
number of data points in the data set.
Variance = Σ (xi - μ)² / N
where N = number of terms
μ = mean
Python code to find the variance:
import statistics
# Sample data
arr = [1, 2, 3, 4, 5]
# Population variance (divides by N, matching the definition above);
# statistics.variance would give the sample variance, which divides by N - 1
print("Var = ", statistics.pvariance(arr))
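The steps in the definition can also be followed by hand, which shows where the divisor matters. The definition above divides by the number of data points N (the population variance, `statistics.pvariance`), whereas `statistics.variance` computes the sample variance and divides by N - 1. The variable names below are illustrative only.

```python
import statistics

arr = [1, 2, 3, 4, 5]

# Step 1: the mean.
mean = sum(arr) / len(arr)
# Step 2: squared difference between each data point and the mean.
squared_deviations = [(x - mean) ** 2 for x in arr]
# Step 3: average the squared differences.
population_variance = sum(squared_deviations) / len(arr)        # divide by N
sample_variance = sum(squared_deviations) / (len(arr) - 1)      # divide by N - 1

print("Population variance =", population_variance)
print("Sample variance     =", sample_variance)
```

The two results differ noticeably for small data sets and converge as the number of data points grows.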
Standard Deviation:
It is defined as the square root of the variance. It is calculated by finding
the mean, subtracting the mean from each data point and squaring the result,
adding all the squared values, dividing by the number of terms, and finally
taking the square root.
Standard deviation = √( Σ (xi - μ)² / N )
where N = number of terms
μ = mean
Python code to find the standard deviation:
import statistics
# Sample data
arr = [1, 2, 3, 4, 5]
# Population standard deviation (divides by N, matching the definition above);
# statistics.stdev would give the sample standard deviation (divides by N - 1)
print("Std = ", statistics.pstdev(arr))
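Because the standard deviation is expressed in the same units as the data, it is often used to judge how unusual a single value is. The sketch below flags values that lie far from the mean; the helper name `flag_outliers`, the threshold of 2 standard deviations, and the sample readings are all assumptions made for illustration, not part of the slides.

```python
import statistics


def flag_outliers(data, threshold=2.0):
    """Return values lying more than `threshold` population standard
    deviations from the mean. The default threshold is an assumption."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [x for x in data if abs(x - mean) / std > threshold]


# Hypothetical readings with one unusually large value.
readings = [10, 12, 11, 13, 12, 11, 40]
print("Outliers =", flag_outliers(readings))
```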
BASIC TOOLS OF EDA
TYPES OF EXPLORATORY DATA ANALYSIS:
1. Univariate Non-graphical - This is the simplest form of data analysis, as
it uses just one variable to analyze the data. The standard goal of
univariate non-graphical EDA is to understand the underlying sample
distribution of the data and make observations about the population.
Outlier detection is also part of the analysis.
2. Multivariate Non-graphical - Multivariate non-graphical EDA techniques
are generally used to show the relationship between two or more variables
in the form of either cross-tabulation or statistics.
3. Univariate Graphical - Non-graphical methods are quantitative and
objective, but they cannot give a complete picture of the data; graphical
methods, which involve a degree of subjective analysis, are therefore also
required. Common types of univariate graphics are:
• Histogram
• Stem-and-leaf plots
• Boxplots
• Quantile-normal plots
4. Multivariate Graphical - Multivariate graphical EDA uses graphics to
display relationships between two or more sets of data. The most commonly
used one is a grouped bar plot, with each group representing one level of
one of the variables and each bar within a group representing a level of
the other variable.
Other common types of multivariate graphics are:
• Scatterplot
• Run chart
• Heat map
• Multivariate chart
• Bubble chart
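The cross-tabulation mentioned under multivariate non-graphical EDA can be sketched with the standard library. The survey records below are made-up data, and the variable names are illustrative; the resulting counts are also the numbers a grouped bar plot would draw, with one group per department and one bar per result.

```python
from collections import Counter

# Hypothetical survey records (not from the slides): (department, passed_exam).
records = [
    ("IT", "yes"), ("IT", "no"), ("IT", "yes"),
    ("CSE", "yes"), ("CSE", "yes"), ("CSE", "no"), ("CSE", "no"),
]

# Count how often each (row level, column level) pair occurs.
counts = Counter(records)
row_levels = sorted({dept for dept, _ in records})
col_levels = sorted({result for _, result in records})

# Print the cross-tabulation as a small text table.
print("dept".ljust(6) + "".join(level.rjust(5) for level in col_levels))
for dept in row_levels:
    cells = "".join(str(counts[(dept, level)]).rjust(5) for level in col_levels)
    print(dept.ljust(6) + cells)
```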
TOOLS REQUIRED FOR EXPLORATORY DATA ANALYSIS:
• R: An open-source programming language and free software environment
for statistical computing and graphics, supported by the R Foundation for
Statistical Computing.
• Python: An interpreted, object-oriented programming language with
dynamic semantics. Its high-level, built-in data structures, combined with
dynamic binding, make it very attractive for rapid application development,
as well as for use as a scripting or glue language to connect existing
components together.