Basic Analysis Using Python
SECTION 1
Descriptive Statistics
Summarising your Data
Data Snapshot
Data Description: basic_salary_P3
The data has 41 rows and 7 columns.
First_Name: First Name
Last_Name: Last Name
Grade: Grade
Location: Location
Function: Department
ba: Basic Allowance
ms: Management Supplements
Describing Variable
#Importing Data
import pandas as pd
salary = pd.read_csv('basic_salary_P3.csv')
#Checking the variable features using the describe() method
salary.describe()

                 ba            ms
count     39.000000     37.000000
mean   17209.743590  11939.054054
std     4159.515241   3223.018305
min    10940.000000   2700.000000
25%    13785.000000  10450.000000
50%    16230.000000  12420.000000
75%    19305.000000  14200.000000
max    29080.000000  16970.000000

describe() gives descriptive measures (count of non-missing values, mean, standard deviation, minimum, quartiles and maximum) for the numeric variables. The counts of 39 and 37 are below the 41 rows because missing values are excluded.
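To illustrate why the counts above are smaller than the 41 rows, here is a minimal sketch on a small made-up DataFrame (the values are hypothetical, not taken from basic_salary_P3): describe() counts only non-missing values, and the other statistics are computed on those values alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 5 rows, with missing values in both columns
toy = pd.DataFrame({
    "ba": [12000, 15000, np.nan, 18000, 21000],
    "ms": [9000, np.nan, np.nan, 11000, 13000],
})

summary = toy.describe()
print(summary)

# The count row excludes NaN, so it can be smaller than the number of rows
print(len(toy), summary.loc["count", "ba"], summary.loc["count", "ms"])
```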
Measures of Central Tendency
# Mean
print(salary.ba.mean())
17209.74
mean() gives the mean of the variable.
# Median
print(salary.ba.median())
16230
median() gives the median of the variable.
# Trimmed Mean
from scipy import stats
BasicAll = salary.ba.dropna(axis=0)
trimmed_mean = stats.trim_mean(BasicAll, 0.1)
trimmed_mean
16879
Import stats from scipy. Missing values are removed from ba using dropna().
Here, trim_mean() discards 10% of the observations from each end of the sorted data before computing the mean.
# Mode
print(salary.ba.mode())
NA
mode() returns the most frequent value(s) of the variable as a Series; NA here indicates that ba has no single most frequent value.
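The following sketch, on hypothetical numbers rather than the salary data, shows what trim_mean() does with a proportion of 0.1 (sort the data, drop int(0.1 * n) values from each end, average the rest) and why mode() returns a Series rather than a single number.

```python
import pandas as pd
from scipy import stats

# Hypothetical data with one extreme value (not taken from basic_salary_P3)
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# trim_mean(x, 0.1) drops 10% of the observations from each end of the sorted data
trimmed = stats.trim_mean(values, 0.1)

# The same calculation by hand: sort, cut int(0.1 * n) values from each end, then average
n = len(values)
k = int(0.1 * n)
manual = values.sort_values().iloc[k:n - k].mean()

print(values.mean(), values.median(), trimmed, manual)  # 14.5 5.5 5.5 5.5

# mode() returns a Series, because several values can tie for most frequent
print(pd.Series([3, 1, 3, 2, 2]).mode().tolist())  # [2, 3]
```

The extreme value pulls the ordinary mean up to 14.5, while the median and the 10% trimmed mean both stay at 5.5.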
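Measures of Variation
import statistics
Import the statistics library to use its functions for calculating the standard deviation and variance.
Use the BasicAll object created previously for calculating the standard deviation, variance and coefficient of variation.
# Standard Deviation
statistics.stdev(BasicAll)
4159.515
stdev() gives the (sample) standard deviation of the variable.
# Variance
statistics.variance(BasicAll)
17301567
variance() gives the (sample) variance of the variable, i.e. the square of the standard deviation.
# Coefficient of Variation
stats.variation(BasicAll)
0.23857
variation() from scipy.stats gives the coefficient of variation (standard deviation divided by the mean). By default it uses the population standard deviation (ddof=0), which is why the result is slightly smaller than stdev() divided by the mean (about 0.2417).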
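A short sketch, using a made-up sample, of how these measures relate: the sample variance is the square of the sample standard deviation, and scipy's variation() matches the population standard deviation (ddof=0) divided by the mean.

```python
import math
import statistics

import numpy as np
from scipy import stats

# Hypothetical sample, used only to illustrate the relationships
x = [10.0, 12.0, 15.0, 18.0, 25.0]

sd_sample = statistics.stdev(x)      # divides by n - 1
var_sample = statistics.variance(x)  # equals sd_sample ** 2

# scipy's variation() uses the population standard deviation (ddof=0) by default
cv_default = stats.variation(x)
cv_population = np.std(x) / np.mean(x)      # np.std also defaults to ddof=0
cv_sample = sd_sample / statistics.mean(x)  # n - 1 version, slightly larger

print(var_sample, math.isclose(var_sample, sd_sample ** 2))
print(cv_default, cv_population, cv_sample)
```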
Skewness and Kurtosis
from scipy import stats
Use the scipy package to calculate skewness and kurtosis.
# Skewness
stats.skew(BasicAll, bias=False)
0.9033507
skew() gives the skewness of the variable; the positive value indicates that ba is skewed to the right.
# Kurtosis
stats.kurtosis(BasicAll, bias=False)
0.4996513
kurtosis() gives the kurtosis of the variable. By default it returns excess kurtosis (Fisher's definition), for which a normal distribution scores 0.
bias=False corrects the calculations for statistical bias.
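To make bias=False less of a black box, the sketch below computes the bias-corrected skewness and excess kurtosis from the sample's central moments and compares them with scipy. The formulas are the standard adjusted Fisher-Pearson ones; the data are hypothetical.

```python
import math

import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (not the salary data)
x = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 9.0, 15.0])
n = len(x)

# Central moments about the mean (population form, dividing by n)
d = x - x.mean()
m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()

# Bias-corrected (adjusted Fisher-Pearson) skewness
g1 = m3 / m2**1.5
G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1

# Bias-corrected excess kurtosis (Fisher definition: normal distribution -> 0)
g2 = m4 / m2**2 - 3
G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(G1, stats.skew(x, bias=False))
print(G2, stats.kurtosis(x, bias=False))
```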
SECTION 2
Bivariate Analysis
Data Snapshot
Data Description: job_proficiency_P3
The data has 25 rows and 6 columns.
empno: Employee Number
aptitude: Aptitude Score of the Employee
testofen: Test of English
tech_: Technical Score
g_k_: General Knowledge Score
job_prof: Job Proficiency Score
Scatter Plot
#Importing the Libraries and Data
import pandas as pd
import matplotlib.pyplot as plt
job = pd.read_csv('job_proficiency_P3.csv')
# Plotting Scatter plot
plt.scatter(job.aptitude, job.job_prof)
plt.show()
scatter(): gives a scatter plot of the two variables mentioned.
color=: argument to set the colour of the points.
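Because the job_proficiency_P3 file is not reproduced here, this sketch uses a few made-up scores. It draws the same scatter plot through the object-oriented interface (fig, ax = plt.subplots()), sets the point colour with color= and adds axis labels. The Agg backend is an assumption so that it runs without a display; drop the two matplotlib.use lines in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for job_proficiency_P3: made-up scores, not the real file
job = pd.DataFrame({
    "aptitude": [50, 60, 70, 80, 90],
    "job_prof": [55, 65, 60, 80, 85],
})

# Object-oriented interface: create the figure and axes explicitly
fig, ax = plt.subplots()
ax.scatter(job.aptitude, job.job_prof, color="darkblue")  # color= sets the marker colour
ax.set_xlabel("Aptitude")
ax.set_ylabel("Job Proficiency")
```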
Pearson Correlation Coefficient
# Pearson Correlation Coefficient
import numpy as np
np.corrcoef(job.aptitude, job.job_prof)
array([[ 1.        ,  0.51441069],
       [ 0.51441069,  1.        ]])
corrcoef() gives the Pearson correlation matrix of the two variables mentioned; the off-diagonal entry is their correlation coefficient.
Pearson Correlation Coefficient: 0.5144
There is a positive relationship between aptitude and job proficiency, but it is only of moderate strength.
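The Pearson coefficient can also be computed directly from its definition: the sum of co-deviations divided by the square root of the product of the sums of squared deviations. This sketch on hypothetical data checks the hand calculation against np.corrcoef().

```python
import numpy as np

# Hypothetical paired scores, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson r = sum of co-deviations / sqrt(product of sums of squared deviations)
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# np.corrcoef returns a 2x2 matrix; the off-diagonal element is r
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```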
ScatterPlot with Regression Line
#Importing Library Seaborn
import seaborn as sns
#Scatterplot of job proficiency against aptitude with Regression Line
sns.lmplot(x='aptitude', y='job_prof', data=job); plt.xlabel('Aptitude'); plt.ylabel('Job Proficiency')
sns.lmplot: draws a scatter plot with a fitted linear regression line (and its confidence band)
plt.xlabel: defines the label on the X axis
plt.ylabel: defines the label on the Y axis
[Figure: ScatterPlot with Regression Line]
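By default, lmplot() fits an ordinary least-squares straight line. As a sketch, assuming made-up data rather than the job file, np.polyfit(x, y, 1) recovers the slope and intercept of such a line, which should match the line seaborn draws for the same data.

```python
import numpy as np

# Hypothetical data lying close to the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least-squares fit of a straight line (degree 1): returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")  # y = 1.06 + 1.97 * x
```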
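Scatter Plot Matrix using the seaborn Package
#ScatterPlot Matrix
sns.pairplot(job)
pairplot() draws a scatter plot for every pair of numeric columns, with each variable's distribution on the diagonal.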
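A scatter plot matrix is often read alongside the matching correlation matrix. This sketch uses a small hypothetical version of the job data and drops empno first, since an employee number is an identifier rather than a score; pairplot() can be restricted in the same way with its vars argument, for example sns.pairplot(job, vars=['aptitude', 'job_prof']).

```python
import pandas as pd

# Hypothetical stand-in for job_proficiency_P3 (made-up values)
job = pd.DataFrame({
    "empno": [1, 2, 3, 4, 5],
    "aptitude": [50, 60, 70, 80, 90],
    "testofen": [40, 45, 55, 50, 60],
    "job_prof": [55, 65, 60, 80, 85],
})

# Drop the identifier column: correlating employee numbers is not meaningful
scores = job.drop(columns="empno")

# Pairwise Pearson correlations: a numeric counterpart to the scatter plot matrix
corr = scores.corr()
print(corr.round(3))
```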
SECTION 3
Data Visualisation
Graphs in Python
Data Snapshot
Data Description: telecom_P3
The data has 1000 rows and 10 columns.
CustID: Customer ID
Age: Age
Gender: Gender
PinCode: PinCode
Active: Whether the customer was active in the past 24 weeks or not
Calls: Number of calls made
Minutes: Number of minutes spoken
Amt: Amount charged
AvgTime: Mean time per call
Age_Group: Age group of the customer
Data Visualization
In Python, data visualization is largely built on matplotlib, a multiplatform plotting library built on top of NumPy and designed to work with the wider SciPy stack. It gives the user complete control over a graph and comes with two interfaces: an object-oriented interface and a MATLAB-style, state-based interface (pyplot).
matplotlib is fairly low level and can be cumbersome to use by itself, which is why higher-level tools such as seaborn and pandas' own plotting methods are built on top of its API; other libraries, such as Altair and Bokeh, offer independent alternatives.
We will use the pandas wrapper as a quick tool for visualizing our data and move on to seaborn for higher-level visualizations. Either way, we will essentially be working with matplotlib underneath.
Diagrams
#Importing the Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#Importing Data
telecom_data = pd.read_csv('telecom_P3.csv')
#Aggregate Data
working = telecom_data.groupby('Age_Group')['CustID'].count()
This counts the customers (non-missing CustID values) in each age group.
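A miniature, made-up version of telecom_P3 shows what the aggregation produces: a Series indexed by age group whose values are customer counts. value_counts() on Age_Group gives the same numbers, ordered by frequency rather than by label.

```python
import pandas as pd

# Hypothetical miniature version of telecom_P3 (made-up rows)
telecom_data = pd.DataFrame({
    "CustID": [101, 102, 103, 104, 105, 106],
    "Age_Group": ["18-30", "31-45", "18-30", "46-60", "31-45", "18-30"],
    "Gender": ["F", "M", "M", "F", "F", "M"],
    "Calls": [10, 25, 14, 30, 8, 12],
})

# Number of customers per age group (count of non-missing CustID values)
working = telecom_data.groupby("Age_Group")["CustID"].count()
print(working)

# value_counts() gives the same counts once re-sorted by age-group label
by_value_counts = telecom_data["Age_Group"].value_counts().sort_index()
print(by_value_counts.tolist() == working.tolist())
```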
Simple Bar Chart
#Create a basic bar chart using the plot function
working.plot.bar(title='Simple Bar Chart')
plot(): a convenience method that plots all columns, with labels
bar(): plots a bar chart. The same chart can be produced by passing the argument kind='bar' to plot().
title: a string argument that gives the plot a title
[Figure: Simple Bar Chart]
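As a sketch with hypothetical counts, the accessor form plot.bar() and the keyword form plot(kind='bar') draw the same bars, one per age group. The Agg backend and the two side-by-side axes are assumptions made only to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical customer counts per age group
working = pd.Series({"18-30": 3, "31-45": 2, "46-60": 1}, name="CustID")

fig, (ax1, ax2) = plt.subplots(1, 2)
working.plot.bar(title="Simple Bar Chart", ax=ax1)          # accessor form
working.plot(kind="bar", title="Simple Bar Chart", ax=ax2)  # kind= form

# Both calls draw one bar per age group, with the same heights
heights1 = [p.get_height() for p in ax1.patches]
heights2 = [p.get_height() for p in ax2.patches]
print(heights1, heights2)
```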
Simple Bar Chart
#Customizing your chart using additional arguments (both versions give the same result)
plt.figure(); working.plot.bar(title='Simple Bar Chart', color='red');
plt.xlabel('Age Groups'); plt.ylabel('No. of Customers')
OR
plt.figure(); ax = working.plot.bar(title='Simple Bar Chart', color='red');
ax.set_xlabel('Age Groups'); ax.set_ylabel('No. of Customers')
plt.figure(): creates a new, empty figure for the chart
ax: the matplotlib Axes object containing the actual plot (with data points)
color: an argument to specify the plot colour. Accepts colour names, hex strings (e.g. '#ff0000') and other matplotlib colour codes.
plt.xlabel, ax.set_xlabel: function/method to specify the x label
plt.ylabel, ax.set_ylabel: function/method to specify the y label
[Figure: Simple Bar Chart]
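The object-oriented style can also start from an explicit figure and axes. This sketch (hypothetical counts, Agg backend assumed) creates them with plt.subplots(), passes the axes to pandas with ax=, and sets a hex colour and both axis labels on it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a self-contained run
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical customer counts per age group
working = pd.Series({"18-30": 3, "31-45": 2, "46-60": 1}, name="CustID")

# Explicit figure and axes, then draw into that axes with ax=
fig, ax = plt.subplots(figsize=(6, 4))
working.plot.bar(ax=ax, title="Simple Bar Chart", color="#cc0000")
ax.set_xlabel("Age Groups")
ax.set_ylabel("No. of Customers")
fig.tight_layout()
```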
Stacked Bar Chart
#Stacked Bar Chart
working2 = pd.pivot_table(telecom_data, index=['Age_Group'],
columns=['Gender'], values='CustID', aggfunc='count')
plt.figure(); working2.plot.bar(title='Stacked Bar Chart', stacked=True);
plt.xlabel('Age Groups'); plt.ylabel('No. of Customers')
pivot_table: reshapes the data and aggregates it according to the function specified. Here, we are counting the customers in each combination of age group and gender.
index: the column or array to group by along the x axis (pivot table rows)
columns: the column or array to group by across the pivot table columns; each value (here, each gender) becomes a separate segment of the bar
values: the column to aggregate; passing a single name rather than a list keeps the column labels (and the legend) simple
aggfunc: the function to aggregate by
stacked: returns a stacked chart. Default is False.
[Figure: Stacked Bar Chart]
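The difference between values='CustID' and values=['CustID'] is easiest to see on toy data. In this sketch (made-up rows), a plain column name gives simple F/M columns, while a list gives two-level column labels such as ('CustID', 'F'), which then appear in the chart legend. Combinations with no customers come out as NaN unless fill_value=0 is passed.

```python
import pandas as pd

# Hypothetical miniature version of telecom_P3 (made-up rows)
telecom_data = pd.DataFrame({
    "CustID": [101, 102, 103, 104, 105, 106],
    "Age_Group": ["18-30", "31-45", "18-30", "46-60", "31-45", "18-30"],
    "Gender": ["F", "M", "M", "F", "F", "M"],
})

# values as a single name -> plain Gender columns (F, M)
flat = pd.pivot_table(telecom_data, index="Age_Group", columns="Gender",
                      values="CustID", aggfunc="count")

# values as a list -> two-level columns such as ('CustID', 'F')
nested = pd.pivot_table(telecom_data, index="Age_Group", columns="Gender",
                        values=["CustID"], aggfunc="count")

print(flat)
print(list(nested.columns))
```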
Percentage Bar Chart
#Percentage Bar Chart
working3 = working2.div(working2.sum(axis=1).astype(float), axis=0)
plt.figure(); working3.plot.bar(title='Percentage Bar Chart',
stacked=True); plt.xlabel('Age Groups'); plt.ylabel('Proportion of Customers')
div(): converts the counts to proportions by dividing each row by its row total, so every bar sums to 1 (multiply by 100 for percentages).
ax: the matplotlib Axes object containing the actual plot (with data points)
color: an argument to specify the plot colour. Accepts colour names, hex strings and other matplotlib colour codes.
plt.xlabel, ax.set_xlabel: function/method to specify the x label
plt.ylabel, ax.set_ylabel: function/method to specify the y label
[Figure: Percentage Bar Chart]
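A minimal sketch of the row normalisation on made-up counts: dividing by sum(axis=1), with axis=0 in div() so the divisor lines up with the row index, scales every age-group row to sum to 1.

```python
import pandas as pd

# Hypothetical counts by age group (rows) and gender (columns)
working2 = pd.DataFrame({"F": [1, 1, 3], "M": [2, 1, 1]},
                        index=pd.Index(["18-30", "31-45", "46-60"], name="Age_Group"))

# Divide each row by its row total (axis=0 aligns the divisor with the row index)
working3 = working2.div(working2.sum(axis=1).astype(float), axis=0)
print(working3.round(3))

# Every row now sums to 1; multiply by 100 to express as percentages
print(working3.sum(axis=1).tolist())
```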
Multiple Bar Chart
#Multiple Bar Chart
plt.figure(); working2.plot.bar(title='Multiple Bar Chart');
plt.xlabel('Age Groups'); plt.ylabel('No. of Customers')
This reuses working2, the pivot table created for the stacked bar chart; without stacked=True, the bars for each gender are drawn side by side.
pivot_table: reshapes the data and aggregates it according to the function specified
index: the column or array to group by along the x axis (pivot table rows)
columns: the column or array to group by across the pivot table columns; each value becomes a separate bar within the group
values: the column to aggregate
aggfunc: the function to aggregate by
[Figure: Multiple Bar Chart]
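The grouped chart simply plots working2 without stacking. As a sketch on hypothetical rows, the same table can be built with groupby() and unstack(); fill_value=0 is used in both so that empty combinations appear as zero-height bars instead of NaN.

```python
import pandas as pd

# Hypothetical miniature version of telecom_P3 (made-up rows)
telecom_data = pd.DataFrame({
    "CustID": [101, 102, 103, 104, 105, 106],
    "Age_Group": ["18-30", "31-45", "18-30", "46-60", "31-45", "18-30"],
    "Gender": ["F", "M", "M", "F", "F", "M"],
})

via_pivot = pd.pivot_table(telecom_data, index="Age_Group", columns="Gender",
                           values="CustID", aggfunc="count", fill_value=0)

# Same counts via groupby on both keys, then moving Gender into the columns
via_groupby = (telecom_data.groupby(["Age_Group", "Gender"])["CustID"]
               .count()
               .unstack(fill_value=0))

print(via_pivot)
print(via_pivot.to_numpy().tolist() == via_groupby.to_numpy().tolist())
```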
Pie Chart
#Pie Chart
working.plot.pie(label='Age Groups', colormap='brg')
pie(): creates a pie chart
label: specifies the label to be used
colormap: a string naming the matplotlib colormap from which the slice colours are drawn
[Figure: Pie Chart]
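Pie charts are easier to read with the slice shares printed on them. This sketch, on hypothetical counts with the Agg backend assumed, passes matplotlib's autopct format string through pandas and then checks the shares by hand as count / total.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a self-contained run
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical customer counts per age group
working = pd.Series({"18-30": 3, "31-45": 2, "46-60": 1}, name="CustID")

fig, ax = plt.subplots()
# autopct is passed through to matplotlib's pie() and prints each slice's share
working.plot.pie(ax=ax, colormap="brg", autopct="%1.1f%%")
ax.set_ylabel("Age Groups")

# The printed shares are simply count / total, as percentages
shares = (working / working.sum() * 100).round(1)
print(shares.tolist())  # [50.0, 33.3, 16.7]
```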
Box Plot
#BoxPlot
telecom_data.Calls.plot.box(label='No. Of Calls')
plot.box(): in pandas, draws a box-and-whisker plot
Calls: specifies the vector (column) for which the box plot is drawn
label: provides a user-defined label for the variable
color: can be used to choose the colour of the boxes
[Figure: BoxPlot Chart]
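A box plot summarises a variable by its quartiles, with whiskers reaching the most extreme points within 1.5 times the IQR of the box (matplotlib's default whis=1.5) and anything beyond drawn as an outlier. The hypothetical helper box_stats() below is a sketch of those rules using numpy's default linear percentiles; the data are made up.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers under the default box-plot rule (whis=1.5)."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < low_fence) | (x > high_fence)].tolist(),
    }


# Hypothetical call counts with one extreme value
summary = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```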
Box Plot
#BoxPlot of one variable split by another. Here, we are plotting the number of calls by age group.
telecom_data.boxplot(column='Calls', by='Age_Group', grid=False)
boxplot(): a pandas DataFrame method that draws box plots; unlike plot.box(), it can split the data into groups with the by argument
column: specifies the vector (variable) for which the box plot is drawn
by: specifies the vector (column) by which the distribution is split
label: provides a user-defined label
color: can be used to choose the colour of the boxes
grid: setting grid=False removes the background grid shown in each plot
[Figure: Box Plot]
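The grouped box plot draws one box per age group. As a sketch on made-up data, the quartiles behind each box can be tabulated directly with groupby() and quantile().

```python
import pandas as pd

# Hypothetical miniature version of telecom_P3 (made-up rows)
telecom_data = pd.DataFrame({
    "Age_Group": ["18-30", "18-30", "18-30", "31-45", "31-45", "31-45"],
    "Calls": [10, 14, 12, 25, 8, 30],
})

# One row per age group: the lower quartile, median and upper quartile of each box
by_group = (telecom_data.groupby("Age_Group")["Calls"]
            .quantile([0.25, 0.5, 0.75])
            .unstack())
print(by_group)
```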
Histogram
#Histogram
telecom_data.Calls.hist(bins=12, grid=False)
hist(): in pandas (built on matplotlib), draws a histogram
bins: specifies the number of bins (bars); here the range of Calls is split into 12 equal-width intervals
label: a label for the data, shown if a legend is added; axis labels are set with plt.xlabel() and plt.ylabel()
color: can be used to choose the colour of the bars
[Figure: Histogram]
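Since bins=12 sets the number of bins rather than their width, this numpy sketch on hypothetical values shows the resulting edges: an integer gives 12 equal-width intervals (13 edges) between the minimum and maximum, while an explicit sequence of edges fixes the width. pandas' hist() accepts either form for bins.

```python
import numpy as np

# Hypothetical call counts from 0 to 120
calls = np.arange(0, 121)

# bins=12 -> 12 equal-width intervals spanning min..max, i.e. 13 edges
counts, edges = np.histogram(calls, bins=12)
print(len(counts), len(edges), np.diff(edges)[0])  # 12 13 10.0

# To control the bar width instead, pass explicit edges (here, width 20)
counts20, edges20 = np.histogram(calls, bins=np.arange(0, 121, 20))
print(edges20.tolist())  # [0, 20, 40, 60, 80, 100, 120]
```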
Stem and Leaf Plot
#Stem plot using matplotlib
plt.stem(telecom_data.Calls)
stem(): in matplotlib, draws a stem plot, i.e. a vertical line from a baseline up to each value, plotted against the value's position (index). Despite the name, this is not a stem-and-leaf display.
telecom_data.Calls: specifies the vector (variable) to be plotted
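matplotlib has no stem-and-leaf function, but a basic text display is short to write. The hypothetical helper stem_and_leaf() below is a sketch for non-negative integers such as call counts: each value's last digit is its leaf and the remaining digits form its stem. The input values are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return text lines of a stem-and-leaf display for non-negative integers:
    stem = all digits except the last, leaf = the last digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        groups[stem].append(str(leaf))
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # include empty stems
        lines.append(f"{stem:>3} | {''.join(groups.get(stem, []))}")
    return lines


# Hypothetical call counts
for line in stem_and_leaf([12, 15, 21, 23, 23, 37, 41, 48]):
    print(line)
```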
Heat Map
#Aggregating total calls by age group and gender
agg = pd.pivot_table(telecom_data, index=['Age_Group'],
columns=['Gender'], values='Calls', aggfunc='sum')
# Heat Map
ax = sns.heatmap(agg); ax.set(xlabel='Gender', ylabel='Age Group', title='Heatmap for Number of Calls by Age & Gender'); plt.show()
ax: the Axes object returned by seaborn
heatmap(): seaborn function for creating a heatmap
ax.set: sets text elements (axis labels and title) of the graph
linewidths: adds lines between the cells. Default is zero.
[Figure: Heat Map]
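A variant of the heat map with each cell's total written on it, as a sketch on hypothetical data (Agg backend assumed). annot, fmt and linewidths are seaborn heatmap() parameters; fmt='d' works here because every age group and gender combination is present, so the sums stay integers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a self-contained run
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical miniature version of telecom_P3 (made-up rows)
telecom_data = pd.DataFrame({
    "Age_Group": ["18-30", "18-30", "31-45", "31-45", "46-60", "46-60"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Calls": [10, 14, 25, 8, 30, 12],
})

# Total calls for each age group x gender cell
agg = pd.pivot_table(telecom_data, index="Age_Group", columns="Gender",
                     values="Calls", aggfunc="sum")

fig, ax = plt.subplots()
# annot writes each cell's value; fmt="d" formats it as an integer; linewidths separates cells
sns.heatmap(agg, annot=True, fmt="d", linewidths=0.5, cmap="Blues", ax=ax)
ax.set(xlabel="Gender", ylabel="Age Group",
       title="Heatmap for Number of Calls by Age & Gender")
```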
THANK YOU!