Merge Multiple Files into a Single Data Frame Using R
Yogesh Khandelwal
Problem Description
• The zip file contains 332 comma-separated-value (CSV) files
containing pollution monitoring data for fine particulate
matter (PM) air pollution at 332 locations in the United States.
Each file contains data from a single monitor and the ID
number for each monitor is contained in the file name. For
example, data for monitor 200 is contained in the file
"200.csv".
• Data source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip
Merge Multiple CSV in single data frame using R
Variable Name
Variables in file
• Date: the date of observation in YYYY-MM-DD format
(year-month-day). Data type: factor
• sulfate: the level of sulfate PM in the air on that date
(measured in micrograms per cubic meter). Data type: num
• nitrate: the level of nitrate PM in the air on that date
(measured in micrograms per cubic meter). Data type: num
• Id: location id. Data type: int
Before we start we should know
• Functions in R
• How to merge data files
Functions in R
Functions are created using the function() directive and are
stored as R objects just like anything else. In particular, they are R
objects of class “function”.
f <- function(<arguments>) {
## Do something interesting
}
• Functions in R are “first class objects”, which means that they can
be treated much like any other R object. Importantly:
• Functions can be passed as arguments to other functions.
• Functions can be nested, so that you can define a function
inside of another function.
• The return value of a function is the last expression in the function
body to be evaluated.
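The three properties above can be illustrated with a minimal sketch. The names apply_twice, make_adder, and add_three are made up for this illustration and do not come from the slides.

```r
# A function passed as an argument to another function.
apply_twice <- function(f, x) {
  f(f(x))
}

# A function defined inside another function (nested).
make_adder <- function(n) {
  add_n <- function(x) {
    x + n
  }
  add_n  # last expression evaluated, so this is the return value
}

add_three <- make_adder(3)
class(add_three)            # "function"
apply_twice(add_three, 10)  # 16
```

Here make_adder() returns the inner function itself, and apply_twice() receives it like any other R object.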
Functions (contd.)
• For example:
Function name
Function definition
Function call
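The example shown on the original slide is not part of this text, so the following is only a stand-in that labels the same three parts. The function square_sum is a hypothetical name chosen for illustration.

```r
# Function name: square_sum (hypothetical, for illustration only)
# Function definition:
square_sum <- function(a, b) {
  a^2 + b^2  # last expression, so this value is returned
}

# Function call:
result <- square_sum(3, 4)
print(result)  # 25
```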
Our objective
• How can we merge a number of files into a single data
frame?
• How can we apply the same function to different files
efficiently?
How to merge two different files?
• Several options are available, such as:
1. Use the merge() function
2. Use rbind(), cbind(), etc.
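As a rough sketch of how these options differ, the snippet below uses two small made-up data frames. The column names and values are assumptions for illustration, not taken from the dataset.

```r
# Two small made-up data frames sharing an "id" column.
left  <- data.frame(id = c(1, 2, 3), sulfate = c(3.1, 2.4, 5.0))
right <- data.frame(id = c(2, 3, 4), nitrate = c(0.8, 1.2, 0.5))

# merge() joins on the common column; by default only matching ids are kept.
joined <- merge(left, right, by = "id")

# rbind() stacks rows; both data frames need the same columns.
more_rows <- data.frame(id = c(4, 5), sulfate = c(1.9, 2.2))
stacked <- rbind(left, more_rows)

# cbind() places columns side by side; both inputs need the same number of rows.
side_by_side <- cbind(left, flag = c(TRUE, FALSE, TRUE))
```

For files that all share the same columns, as the monitor files do, stacking rows with rbind() is the natural fit; merge() is for combining tables that describe the same rows with different columns.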
How to merge a number of files into a single
data frame
• Approach 1
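# List every file in the directory, with the directory path prepended
files <- list.files("specdata", full.names = TRUE)
dat <- NULL
# Loop over every file found instead of hard-coding 332
for (i in seq_along(files)) {
  # Read one monitor file and append its rows to the merged data frame
  dat <- rbind(dat, read.csv(files[i]))
}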
• We can then run various commands on the merged data frame as needed, for example:
1. str(dat)
2. head(dat)
3. tail(dat), etc.
Note: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE,
the file names (rather than paths) are returned.
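The effect of full.names can be seen in a small sketch. It builds a throwaway directory under tempdir(); the directory name and the two file names are assumptions made for the illustration.

```r
# Create a throwaway directory with two empty CSV files (illustration only).
demo_dir <- file.path(tempdir(), "full_names_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("001.csv", "002.csv")))

# full.names = FALSE (the default): bare file names.
short_names <- list.files(demo_dir)

# full.names = TRUE: the directory path is prepended, so the result
# can be passed directly to read.csv().
long_names <- list.files(demo_dir, full.names = TRUE)

print(short_names)
print(basename(long_names))
```

Without full.names = TRUE, read.csv() would look for the files in the current working directory rather than inside the data directory.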
How to handle missing values in R?
contd.
• In R, NA is used to represent any value that is 'not available' or 'missing' (in
the statistical sense)
• Missing values play an important role in statistics and data analysis. Often,
missing values must not be ignored, but rather they should be carefully
studied to see if there's an underlying pattern or cause for their
missingness.
• For example:
> x <- c(1, 2, NA, 4)
> y <- c(NA, 2, 3, 1)
> x + y
[1] NA  4 NA  5
• Multiple options are available in R to handle NA values, such as:
• is.na()
• Setting na.rm = TRUE as a function argument
> mean(x)
[1] NA
> mean(x, na.rm = TRUE)
[1] 2.333333
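A short sketch of the is.na() option, using the same vector as above. The variable names missing, n_missing, observed, and mean_observed are chosen here for illustration.

```r
x <- c(1, 2, NA, 4)

# is.na() returns a logical vector marking the missing positions.
missing <- is.na(x)

# Count the missing values and keep only the observed ones.
n_missing <- sum(missing)
observed  <- x[!missing]

# Equivalent to mean(x, na.rm = TRUE).
mean_observed <- mean(observed)
```

Counting the NA values before dropping them is a cheap way to check how much data a summary statistic actually rests on.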
Apply what we learned to our dataset
Function definition
Function call
> pollutantmean('specdata', 'nitrate', 1:10)
[1] 0.7976266
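The function definition from the slide is not reproduced in this text. The following is one possible sketch of a function with the same name and call signature, built from the loop-and-rbind() approach and na.rm = TRUE covered earlier. It assumes the files are named with zero-padded three-digit monitor ids (for example "001.csv" and "200.csv"); that naming and the default id range are assumptions, and this may differ from the author's version.

```r
# Sketch only: not the slide's original definition.
# Assumes file names are zero-padded three-digit ids, e.g. "001.csv".
pollutantmean <- function(directory, pollutant, id = 1:332) {
  dat <- NULL
  for (i in id) {
    # Build the path for one monitor and append its rows.
    path <- file.path(directory, sprintf("%03d.csv", i))
    dat <- rbind(dat, read.csv(path))
  }
  # Average the chosen pollutant column, ignoring missing values.
  mean(dat[[pollutant]], na.rm = TRUE)
}
```

Called as pollutantmean("specdata", "nitrate", 1:10), it reads the first ten monitor files and returns a single mean across all of their non-missing nitrate readings.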
Thank You!!


Editor's Notes

  • #17: lapply() applies a given function to each element in a list, so there are several function calls. do.call() applies a given function to the list as a whole, so there is only one function call.
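
The note above suggests a second way to merge the files, combining lapply() with do.call(). The sketch below is an assumption about how that could look, not the slide's own code. It writes three small made-up CSV files to a temporary directory so that it runs without the real dataset.

```r
# Build a small throwaway directory of CSV files (made-up data).
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  monitor <- data.frame(sulfate = c(i, NA), nitrate = c(i / 10, i / 5), Id = i)
  write.csv(monitor, file.path(demo_dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

files <- list.files(demo_dir, full.names = TRUE)

# lapply() calls read.csv() once per file and returns a list of data frames.
pieces <- lapply(files, read.csv)

# do.call() makes a single call, rbind(pieces[[1]], pieces[[2]], ...).
dat <- do.call(rbind, pieces)

str(dat)
```

Compared with growing a data frame inside a for loop, this reads all files first and binds them in one rbind() call, which avoids copying the accumulated result on every iteration.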