DATA SCIENCE WITH
PYTHON
PANDAS
Enrollment: 2302031030074
D2D BTECH-IT4
Kevin Patel
BATCH-1
OVERVIEW
➢ Series
➢ DataFrame
➢ Pandas for Time Series
➢ Merging, Joining, Concatenate
➢ Importing data
➢ A simple example
> the python commands will be written here
# this is a comment
SET IT UP!
➢ Open a Terminal
➢ Start the notebook server
➢ Open the notebook web page (localhost:8888)
➢ Open ‘tutorial_pandas.ipynb’
$ ipython notebook
# on current installations the command is: jupyter notebook
PANDAS LIBRARY
The Pandas library provides useful functions to:
➢ Represent and manage data structures
➢ Ease the data processing
➢ With built-in functions to manage (Time) Series
It uses numpy, scipy, matplotlib functions
Manual: available as PDF and online
> import pandas as pd
# to import the pandas library
> pd.__version__
# get the version of the library (0.16 when these slides were written)
SERIES: DATA STRUCTURE
➢ Unidimensional data structure
➢ Indexing
· automatic
· manual
· ! index labels are not necessarily unique !
> data = [1,2,3,4,5]
> s = pd.Series(data)
> s
> s.index
> s = pd.Series(data, index = ['a','b','c','d','d'])
> s['d']
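> s.iloc[[4]]
# .iloc selects by position; plain s[[4]] is ambiguous (by position here, by label with an integer index)
# try with: s = pd.Series(data, index = [1,2,3,4,4])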
> s.index = [1,2,3,4,5]
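As a rough sketch of what a non-unique index means in practice (assuming a recent pandas release; the variable names `single`, `dupes` and `last` are only illustrative): a unique label returns one value, a duplicated label returns every matching element, and `.iloc` always selects by position.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'd'])

# A unique label returns a single value
single = s['a']

# A duplicated label returns a Series holding every matching element
dupes = s['d']

# .iloc selects by position, independently of the labels
last = s.iloc[4]

print(s.index.is_unique)
print(dupes.tolist())
print(single, last)
```

Checking `s.index.is_unique` before selecting by label is one way to avoid being surprised by a Series where a scalar was expected.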
SERIES: BASIC OPERATIONS
➢ Mathematically, Series are vectors
➢ Compatible with numpy functions
➢ Some basic functions available as pandas methods
➢ Plotting (based on matplotlib)
> import numpy as np
# import numpy to get some mathematical functions
> random_data = np.random.uniform(size=10)
> s = pd.Series(random_data)
> s+1
# try other mathematical functions: s**2, s*2, np.exp(s), …
> s.apply(np.log)
> s.mean()
# try other built-in functions. Use 'tab' to discover …
> s.plot()
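The vector-like behaviour listed on this slide can be sketched as follows. This is a minimal example, assuming a seeded random generator for reproducibility (the seed and the names `shifted`, `logged_apply`, `logged_ufunc` are not part of the original slides).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)            # seeded generator (assumption, for reproducibility)
s = pd.Series(rng.uniform(size=10))

shifted = s + 1                            # element-wise arithmetic, index preserved
logged_apply = s.apply(np.log)             # function applied element by element
logged_ufunc = np.log(s)                   # numpy function applied to the whole Series

print(type(logged_ufunc).__name__)
print(np.allclose(logged_apply, logged_ufunc))
print(np.isclose(s.mean(), np.mean(s.values)))
```

Calling the numpy function directly on the Series usually gives the same result as `apply` and keeps the index, so it is often the simpler choice.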
DATAFRAME: DATA STRUCTURE
➢ Bidimensional data structure
➢ A dictionary of Series, with shared index
→ each column is a Series
➢ Indexed on both columns and rows (labels are not necessarily unique)
> s1 = pd.Series([1,2,3,4,5], index = list('abcde'))
> data = {'one':s1**s1, 'two':s1+1}
> df = pd.DataFrame(data)
> df.columns
> df.index
# pd.DataFrame(data, index=..., columns=...): assign labels (if none exist), or select existing ones
> s2 = pd.Series([1,2,3,4,10], index = list('edcbh'))
> df['three'] = s2
# try changing the index labels of s2
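What happens when `s2` is assigned as a new column can be sketched as below (assuming a recent pandas release): values are matched by index label, not by position, so labels missing from `s2` give NaN and labels absent from `df` are dropped.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
df = pd.DataFrame({'one': s1 ** s1, 'two': s1 + 1})

# s2 has its labels in a different order, no 'a', and an extra label 'h'
s2 = pd.Series([1, 2, 3, 4, 10], index=list('edcbh'))
df['three'] = s2

print(df)
print(df.loc['e', 'three'])          # the value that s2 stored under 'e'
print(df['three'].isna().sum())      # rows of df with no matching label in s2
```

The column `three` becomes floating point because NaN is needed for row `a`.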
DATAFRAME: ACCESSING VALUES - 1
➢ keep calm
➢ select columns and rows to obtain Series
➢ query function to select rows
> data = np.random.randn(5,2)
> df = pd.DataFrame(data, index = list('abcde'),
columns = ['one','two'])
> col = df.one
> row = df.xs('b')
# type(col) and type(row) are both Series: you already know how to manage them
> df.query('one > 0')
> df.index = [1,2,3,4,5]
> df.query('1 < index < 4')
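As a hedged illustration of what `query` does, the sketch below compares it with the equivalent boolean masks. The deterministic data and the names `positive_q`, `positive_m`, `middle_q`, `middle_m` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Deterministic values instead of random ones, so the selection is predictable
df = pd.DataFrame(np.arange(10).reshape(5, 2) - 4,
                  index=[1, 2, 3, 4, 5], columns=['one', 'two'])

positive_q = df.query('one > 0')            # condition written as a string
positive_m = df[df['one'] > 0]              # equivalent boolean mask

middle_q = df.query('1 < index < 4')        # 'index' refers to the row labels
middle_m = df[(df.index > 1) & (df.index < 4)]

row = df.xs(2)                              # one row, returned as a Series

print(positive_q.equals(positive_m), middle_q.equals(middle_m))
print(type(row).__name__)
```

The string form is compact for interactive work; the mask form is easier to build programmatically.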
DATAFRAME: ACCESSING VALUES - 2
➢ … madness continues
➢ loc access by label:
works on rows, AND on columns
(replaces ix, which was removed in pandas 1.0)
➢ iloc access by position
➢ you can extract Series
➢ ! define a strategy, and be careful with indexes !
> data = np.random.randn(5,2)
> df = pd.DataFrame(data, index = list('abcde'),
columns = ['one','two'])
> df.loc['a']
# try df.loc[['a', 'b'], 'one'], types
> df.iloc[1,1]
# try df.iloc[1:,1], types?
> df.iloc[1:]['one']
# positions for the rows, a label for the column: works as well...
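A minimal sketch of label-based versus position-based access, assuming a current pandas release in which `.loc` is the label-based accessor. The deterministic data and variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2),
                  index=list('abcde'), columns=['one', 'two'])

by_label = df.loc['b', 'two']         # row label 'b', column label 'two'
by_position = df.iloc[1, 1]           # second row, second column

# Label slices include both endpoints; positional slices exclude the stop
label_slice = df.loc['b':'d', 'one']
position_slice = df.iloc[1:3, 0]

print(by_label == by_position)
print(len(label_slice), len(position_slice))
```

The different slicing rules are a common source of off-by-one errors, which is one reason to pick a single access strategy and keep to it.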
DATAFRAME: BASIC OPERATIONS
➢ DataFrames can be considered as matrices
➢ Compatible with numpy functions
➢ Some basic functions available as pandas methods
· axis = 0: column-wise (one result per column, the default)
· axis = 1: row-wise (one result per row)
➢ df.apply() method
➢ Plotting (based on matplotlib)
> df_copy = df
# this is a reference to the same object, not a copy! Use df_copy = df.copy()
> df * df
> np.exp(df)
> df.mean()
# try df.mean(axis = 1)
# try type(df.mean())
> df.apply(np.mean)
> df.plot()
# try df.transpose().plot()
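The `axis` argument and the difference between a reference and a copy can be sketched as follows. The deterministic data and the names `alias` and `snapshot` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2),
                  index=list('abcde'), columns=['one', 'two'])

col_means = df.mean()            # axis=0 (default): one value per column
row_means = df.mean(axis=1)      # axis=1: one value per row

alias = df                       # same object, not a copy
snapshot = df.copy()             # independent copy
df.loc['a', 'one'] = 100         # modify the original

print(col_means.index.tolist(), row_means.index.tolist())
print(alias.loc['a', 'one'], snapshot.loc['a', 'one'])
```

The change made through `df` is visible through `alias` but not through `snapshot`.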
PANDAS FOR TIME SERIES
➢ Widely used in financial data analysis; here we use it for signals
➢ Time series: a Series whose index is a timestamp
➢ Pandas provides dedicated functions for time series
➢ Useful to select a portion of signal (windowing)
· query method: not available on Series → convert to a DataFrame
> times = np.arange(0, 60, 0.5)
> data = np.random.randn(len(times))
> ts = pd.Series(data, index = times)
> ts.plot()
> epoch = ts[(ts.index > 10) & (ts.index <=20)]
# ts.plot()
# epoch.plot()
> ts_df = pd.DataFrame(ts)
> ts_df.query('10 < index <=20')
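The two windowing approaches shown on this slide select the same samples. A small sketch, in which the column name `signal` is an assumption introduced only so that the DataFrame column can be referred to by name:

```python
import numpy as np
import pandas as pd

times = np.arange(0, 60, 0.5)
ts = pd.Series(np.random.randn(len(times)), index=times)

# Boolean mask built directly on the index
epoch_mask = ts[(ts.index > 10) & (ts.index <= 20)]

# Same selection through a DataFrame and query
epoch_query = ts.to_frame('signal').query('10 < index <= 20')['signal']

print(epoch_mask.index.min(), epoch_mask.index.max(), len(epoch_mask))
print(epoch_mask.index.equals(epoch_query.index))
```

With a 0.5 s step, the window from 10 (excluded) to 20 (included) holds 20 samples.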
A FEW NOTES ABOUT TIMESTAMPS
➢ Absolute timestamps VS Relative timestamps
· Absolute timestamp is important for synchronization
➢ Unix timestamps VS date/time representation
· Unix timestamp: reference for signal processing
· 0000000000.000000 = 1970, 1st January, 00:00:00.000000 UTC
· date/time: easier to understand
· unix timestamp: easier to select/manage
➢ Pandas functions to manage Timestamps
> import datetime
> import time
> now_dt = datetime.datetime.now()
# now_dt = time.ctime()
> now_ut = time.time()
# find out how to convert datetime <--> timestamp
> ts.index = ts.index + now_ut
> ts.index = pd.to_datetime(ts.index, unit = 's')
# ts[(ts.index > -write date time here-)]
> ts.plot()
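One possible answer to the "datetime <--> timestamp" exercise, sketched with the standard library and a fixed instant chosen only for illustration. Using a timezone-aware datetime is an assumption made here to avoid depending on the local time zone.

```python
import datetime

import numpy as np
import pandas as pd

# A fixed, timezone-aware instant (chosen only for illustration)
dt = datetime.datetime(2015, 3, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)

# datetime -> Unix timestamp (seconds since 1970-01-01 00:00:00 UTC)
ut = dt.timestamp()

# Unix timestamp -> datetime
back = datetime.datetime.fromtimestamp(ut, tz=datetime.timezone.utc)

# Relative times (seconds) shifted to absolute time, then converted by pandas
times = np.arange(0, 3, 0.5)
index = pd.to_datetime(times + ut, unit='s')

print(ut)
print(back == dt)
print(index[0], index[-1])
```

`pd.to_datetime(..., unit='s')` returns timestamps without time zone information, expressed in UTC.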
MERGE, JOIN, CONCATENATE
➢ Simple operations (concatenate, append)
➢ SQL-like functions (join, merge)
➢ Refer to chapter 17 of Pandas Manual
➢ See also the pandas cookbooks
> df1 = pd.DataFrame(np.random.randn(6, 3),
columns=['A', 'B', 'C'])
> df2 = pd.DataFrame(np.random.randn(6, 3),
columns=['D', 'E', 'F'])
> df3 = df1.copy()
> df = pd.concat([df1, df2])
# DataFrame.append was removed in pandas 2.0: use pd.concat instead
# try df = pd.concat([df1, df3])
# try df = pd.concat([df1, df3], ignore_index = True)
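A short sketch of concatenation and of an SQL-like merge, assuming a current pandas release. The `left` and `right` tables and their `key` column are hypothetical and serve only to show the join.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.random.randn(6, 3), columns=['D', 'E', 'F'])

# Rows stacked; columns missing on one side are filled with NaN
stacked = pd.concat([df1, df2])

# Columns placed side by side, aligned on the index
side_by_side = pd.concat([df1, df2], axis=1)

# Same columns stacked twice, with a fresh 0..n-1 index
renumbered = pd.concat([df1, df1.copy()], ignore_index=True)

# SQL-like inner join on a shared key column (hypothetical tables)
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})
inner = pd.merge(left, right, on='key', how='inner')

print(stacked.shape, side_by_side.shape, renumbered.index.is_unique)
print(inner)
```

Without `ignore_index=True` the stacked result keeps the original row labels, so the index contains duplicates.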
IMPORTING DATA
➢ data_df = pd.read_table(FILE,
sep = ',',
skiprows = 5,
header = 0,
usecols = [0,1,3],
index_col = 0,
nrows=10)
> FILE = '/path/to/sample_datafile.txt'
> data_df = pd.read_table(...)
# header takes a row number (or None), not True
# try header = 0, names = ['col1','col2', 'col3']
# and adjust skiprows
# try nrows=None
> data_df.plot()
> data = pd.read_table(FILE, sep = ',',
skiprows=[0,1,2,3,4,5,7], header=2, index_col=0)
# empirical solution
> data.plot()
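The effect of the `read_table` arguments can be tried without a file on disk. The sketch below reads from an in-memory string; the file content, the column names and the number of comment lines are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical file content: two comment lines, a header line, then the data
text = """# recorded by a hypothetical device
# sampling rate: 2 Hz
time,ch1,ch2,ch3
0.0,1.0,10.0,100.0
0.5,2.0,20.0,200.0
1.0,3.0,30.0,300.0
1.5,4.0,40.0,400.0
"""

data_df = pd.read_table(io.StringIO(text),
                        sep=',',
                        skiprows=2,          # skip the two comment lines
                        header=0,            # first remaining line holds the column names
                        usecols=[0, 1, 3],   # keep time, ch1 and ch3
                        index_col=0,         # use the first kept column as the index
                        nrows=3)             # read only the first three data rows

print(data_df)
```

Inspecting the first lines of the real file in a text editor is usually the quickest way to choose `skiprows` and `header`.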
SIMPLE FEATURE EXTRACTION EXAMPLE
> import pandas as pd
> WINLEN = 1 # length of window
> WINSTEP = 0.5 # shifting step
> data = pd.read_table(..., usecols=[0,1]) # import data
> t_start = data.index[0] # start first window
> t_end = t_start + WINLEN # end first window
> feat_rows = [] # collect one features row per window
> while (t_end < data.index[-1]): # cycle
>     data_curr = data.query(str(t_start)+'<=index<'+str(t_end))
      # extract portion of the signal
>     mean_ = data_curr.mean().iloc[0] # extract mean; why .iloc[0]?
>     sd_ = data_curr.std().iloc[0] # extract standard deviation
>     feat_rows.append(pd.DataFrame({'mean':mean_, 'sd':sd_},
          index=[t_start])) # merge features
>     t_start = t_start + WINSTEP # shift the window
>     t_end = t_start + WINLEN # end of the next window
> feat_df = pd.concat(feat_rows) # build features df
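The windowing loop above can be tried without a data file. The following is a self-contained sketch under stated assumptions: a synthetic sine signal sampled every 0.1 s stands in for the imported data, the column name `signal` is invented, and the rows are collected in a list before the features DataFrame is built.

```python
import numpy as np
import pandas as pd

WINLEN = 1       # length of window (s)
WINSTEP = 0.5    # shifting step (s)

# Synthetic one-column signal sampled every 0.1 s (stands in for the imported file)
times = np.round(np.arange(0, 5, 0.1), 1)
data = pd.DataFrame({'signal': np.sin(times)}, index=times)

rows = []
t_start = data.index[0]
while t_start + WINLEN < data.index[-1]:
    t_end = t_start + WINLEN
    # Samples with t_start <= time < t_end
    window = data.loc[(data.index >= t_start) & (data.index < t_end), 'signal']
    rows.append({'t_start': t_start, 'mean': window.mean(), 'sd': window.std()})
    t_start = t_start + WINSTEP      # shift the window

feat_df = pd.DataFrame(rows).set_index('t_start')
print(feat_df)
```

Because the step is half the window length, consecutive windows overlap by 50%.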


Python Panda Library for python programming.ppt


Editor's Notes

  • #4: In the notebook, press Shift-Tab for info about a function (press it twice for the full help)