Programming for Data Analysis
Week 2
Dr. Ferdin Joe John Joseph
Faculty of Information Technology
Thai – Nichi Institute of Technology, Bangkok
Today’s lesson
• Merging
• Concatenating
• Reshaping
• Laboratory
Merging
• Used in pandas to combine data from two or more sources
• The sources can be in the same format or in different formats
• For example csv with csv, csv with json, json with xml, or any mix of these
• Similar to NumPy array concatenation, but pandas also aligns rows and columns by label
Function Used
concat()
Syntax of concat()
• objs: a sequence or mapping of Series or DataFrame objects. If a dict is passed, its keys are used as the keys argument unless keys is passed explicitly, in which case only the values for those keys are selected. Any None objects are dropped silently unless they are all None, in which case a ValueError is raised.
• axis: {0, 1, …}, default 0. The axis to concatenate along.
• join: {‘inner’, ‘outer’}, default ‘outer’. How to handle the indexes on the other axis (or axes): ‘outer’ takes their union and ‘inner’ takes their intersection.
• ignore_index: boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful when the concatenation axis does not carry meaningful indexing information. Note that the index values on the other axes are still respected in the join.
• keys: sequence, default None. Constructs a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, the sequence should contain tuples.
• levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they are inferred from the keys.
• names: list, default None. Names for the levels in the resulting hierarchical index.
• verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.
• copy: boolean, default True. If False, do not copy data unnecessarily.
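The effect of two of these parameters, ignore_index and verify_integrity, can be sketched as follows. The frames a and b are hypothetical illustration data, not taken from the slides.

```python
import pandas as pd

# Two small illustrative frames; both carry the default row labels 0 and 1.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# Default behaviour: the original row labels are kept, so 0 and 1 appear twice.
kept = pd.concat([a, b])

# ignore_index=True relabels the concatenation axis 0, ..., n - 1.
relabeled = pd.concat([a, b], ignore_index=True)

# verify_integrity=True raises ValueError if the new axis has duplicate labels.
try:
    pd.concat([a, b], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(kept.index.tolist(), relabeled.index.tolist(), duplicate_detected)
```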
Example
• Available Data frames: df1, df2 and df3
Creation of arrays
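A minimal sketch of how the three frames df1, df2 and df3 might be created. The column names, cell values and the helper make_frame are assumptions chosen for illustration; the slides only name the frames.

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


# Three frames with the same columns and non-overlapping row labels.
df1 = make_frame("ABCD", [0, 1, 2, 3])
df2 = make_frame("ABCD", [4, 5, 6, 7])
df3 = make_frame("ABCD", [8, 9, 10, 11])

print(df1)
```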
Frames
Concatenation
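Row-wise concatenation of the three frames can be sketched like this, assuming frames shaped like the hypothetical ones produced by make_frame (an illustrative helper, not part of pandas).

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df2 = make_frame("ABCD", [4, 5, 6, 7])
df3 = make_frame("ABCD", [8, 9, 10, 11])

# axis=0 is the default, so the frames are stacked on top of each other.
result = pd.concat([df1, df2, df3])

print(result.shape)
```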
Concatenation views
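One way to keep each source frame visible inside the concatenated result is the keys parameter, which adds an outer index level. The sketch below assumes that this is the kind of view meant here; the labels "x", "y", "z" and the helper make_frame are illustrative choices.

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df2 = make_frame("ABCD", [4, 5, 6, 7])
df3 = make_frame("ABCD", [8, 9, 10, 11])

# keys become the outermost level of a hierarchical (Multi)Index.
result = pd.concat([df1, df2, df3], keys=["x", "y", "z"])

# Selecting one key returns the rows that came from the matching frame.
second_piece = result.loc["y"]

print(second_piece)
```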
Setting other axes
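Concatenating along the other axis (axis=1) places the frames side by side and aligns them on their row labels. A sketch, assuming a fourth hypothetical frame df4 whose rows only partly overlap with df1:

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df4 = make_frame("BDF", [2, 3, 6, 7])

# axis=1 concatenates columns; rows are matched by label and, with the
# default join="outer", rows missing from one frame are filled with NaN.
result = pd.concat([df1, df4], axis=1)

print(result)
```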
Inner Join
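With join="inner", only the labels present in every frame survive on the non-concatenation axis. A sketch using the same hypothetical df1 and df4:

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df4 = make_frame("BDF", [2, 3, 6, 7])

# Only row labels 2 and 3 exist in both frames, so only they are kept.
result = pd.concat([df1, df4], axis=1, join="inner")

print(result)
```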
Outer Join
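With join="outer" (the default), the union of the labels on the other axis is kept and gaps are filled with NaN. The sketch below stacks rows, so the union is taken over the column names; df1 and df4 are the same hypothetical frames as before.

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df4 = make_frame("BDF", [2, 3, 6, 7])

# Rows are stacked; the result has every column found in either frame.
result = pd.concat([df1, df4], join="outer")

# Column F does not exist in df1, so its four rows hold NaN there.
missing_in_f = int(result["F"].isna().sum())

print(result)
```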
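Append()
• DataFrame.append() was a shorthand alternative to concat()
• It combined DataFrames along the first axis only (axis 0, that is, it added rows)
• It was deprecated in pandas 1.4 and removed in pandas 2.0; use concat() in current versions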
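Because DataFrame.append() is no longer available from pandas 2.0 onward, the sketch below shows the row-wise combination of two frames with pd.concat instead. The frames and the helper make_frame are illustrative assumptions.

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])
df2 = make_frame("ABCD", [4, 5, 6, 7])

# Stacks the rows of df2 below df1, which is what append did.
result = pd.concat([df1, df2])

print(result)
```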
Sort
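The sort parameter of pd.concat controls whether the non-concatenation axis is sorted when the inputs are not already aligned. A sketch with two hypothetical frames whose columns are deliberately out of alphabetical order:

```python
import pandas as pd

# Illustrative frames; the column order is intentionally unsorted.
left = pd.DataFrame({"b": [1], "a": [2]})
right = pd.DataFrame({"c": [3], "a": [4]})

# sort=False (the default) keeps columns in order of first appearance.
unsorted_result = pd.concat([left, right], sort=False)

# sort=True sorts the column labels of the combined result.
sorted_result = pd.concat([left, right], sort=True)

print(unsorted_result.columns.tolist(), sorted_result.columns.tolist())
```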
Append multiple dataframes
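When several frames have to be combined, a common pattern is to collect them in a Python list and call pd.concat once, rather than growing a frame step by step. A sketch under that assumption, with make_frame as a hypothetical helper:

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


# Collect the pieces first (this is list.append, not DataFrame.append).
pieces = []
for start in (0, 4, 8):
    pieces.append(make_frame("ABCD", range(start, start + 4)))

# A single concat call combines all collected frames.
result = pd.concat(pieces)

print(result.shape)
```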
Varying dimension concatenation
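Objects of different dimensions can be mixed: a one-dimensional Series can be concatenated with a two-dimensional DataFrame, in which case the Series name becomes the new column name. A sketch, with s1 and its name "X" as illustrative assumptions:

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])

# A named Series with the same row labels as df1.
s1 = pd.Series(["X0", "X1", "X2", "X3"], name="X")

# axis=1 adds the Series as an extra column called "X".
result = pd.concat([df1, s1], axis=1)

print(result)
```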
Appending rows to a dataframe
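A single row can be added by turning a Series into a one-row DataFrame and concatenating it. This is a sketch of one possible approach; new_row and its values are hypothetical.

```python
import pandas as pd


def make_frame(columns, index):
    """Hypothetical helper: each cell is the column letter plus the row label."""
    return pd.DataFrame(
        {col: [f"{col}{row}" for row in index] for col in columns},
        index=index,
    )


df1 = make_frame("ABCD", [0, 1, 2, 3])

# The Series index holds the column names the values belong to.
new_row = pd.Series(["X0", "X1", "X2", "X3"], index=["A", "B", "C", "D"])

# to_frame().T turns the Series into a one-row DataFrame;
# ignore_index=True relabels the rows 0..4 afterwards.
result = pd.concat([df1, new_row.to_frame().T], ignore_index=True)

print(result)
```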
How it works with csv, json and xml
• Convert each file to a pandas DataFrame object
• Combine the DataFrames with concat() (or append() on pandas versions before 2.0)
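The two steps above can be sketched with in-memory text standing in for real files. The contents of csv_text and json_text are made up for illustration; with real files, a path would be passed to pd.read_csv and pd.read_json instead. pandas also provides pd.read_xml (from version 1.3) for XML sources, which is left out here to keep the sketch dependency-free.

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents with the same two fields in both formats.
csv_text = "id,score\n1,90\n2,85\n"
json_text = '[{"id": 3, "score": 70}, {"id": 4, "score": 65}]'

# Step 1: convert each source to a DataFrame.
from_csv = pd.read_csv(StringIO(csv_text))
from_json = pd.read_json(StringIO(json_text))

# Step 2: combine the frames; ignore_index gives clean row labels 0..3.
combined = pd.concat([from_csv, from_json], ignore_index=True)

print(combined)
```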
DSA 207 - Merging
• Create two arrays A1 and A2 and convert them into pandas DataFrames. Merge the DataFrames and store the result in A2. Display A2 before and after merging.
• Merge the given csv files using pandas and display the first 10 rows and the last 15 rows.