Pandas csv
by Devashish Kumar
CSV Files
WHAT IS A CSV FILE?
• CSV (comma-separated values) files store tabular data as plain text.
• Think of them as greatly simplified spreadsheets (like Excel): only the cell contents are stored, without formatting or formulas.
• Python's built-in csv module (part of the standard library) parses and writes these files.
• The text inside a CSV file is laid out in rows, and each row is divided into columns separated by commas.
• Every line in the file is a row in the spreadsheet, while the commas define and separate the cells.
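To make the row and column layout concrete, here is a minimal sketch that parses a small CSV document held in a string. The names and cities are made-up sample data; io.StringIO simply lets the text behave like an open file.

```python
import csv
import io

# Hypothetical sample data: a header row plus two records.
# The quoted field shows why the csv module is safer than str.split(","):
# a comma inside quotes belongs to the value, not to the layout.
text = 'name,age,city\nAsha,31,Pune\n"Lee, Ben",27,Delhi\n'

rows = list(csv.reader(io.StringIO(text)))
for row in rows:
    print(row)
# ['name', 'age', 'city']
# ['Asha', '31', 'Pune']
# ['Lee, Ben', '27', 'Delhi']
```

Note that every value comes back as a string, including the ages.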
CSV MODULE
• The csv module is useful for working with data exported from
spreadsheets and databases into text files formatted with fields and
records, commonly referred to as comma-separated value (CSV) format
because commas are often used to separate the fields in a record.
• If you want to import or export spreadsheet and database data in CSV format from Python, the built-in csv module is the standard tool (libraries such as pandas can also read and write CSV files).
STEPS
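• First, save the Excel file in CSV format (with the '.csv' extension).
• Second, save the CSV file in the same folder as the Python file. A bare filename such as 'mpg.csv' is looked up in the current working directory, so run the script from that folder or pass the full path.
• Then write the code for reading and writing the CSV file.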
READING A CSV FILE
• There are two common ways to read a CSV file.
• You can use the csv module's reader() function or its DictReader class.
• Using the DictReader class:
• Here we open the CSV file 'mpg.csv' and read it with the DictReader class.
• DictReader returns each row as a dictionary whose keys come from the file's header row.
Here, m[:3] shows the first three rows. Slicing needs a list (for example m = list(csv.DictReader(csvfile))), because a DictReader object itself cannot be sliced.
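A minimal sketch of the DictReader approach follows. Since mpg.csv is not reproduced in the slides, the script writes a few hypothetical rows to a temporary mpg.csv first; the column layout (an unnamed row-number column, then manufacturer and model) is an assumption.

```python
import csv
import os
import tempfile

# Hypothetical rows in the layout assumed for mpg.csv: an unnamed row-number
# column first, then manufacturer, model and fuel-economy columns.
SAMPLE = (
    ",manufacturer,model,cty,hwy\n"
    "1,audi,a4,18,29\n"
    "2,audi,a4,21,29\n"
    "3,audi,a4 quattro,18,26\n"
    "4,chevrolet,malibu,19,27\n"
)

def read_dicts(path):
    """Return every row of a CSV file as a dict keyed by the header row."""
    # newline="" is the documented way to open files for the csv module.
    with open(path, newline="") as csvfile:
        # DictReader is an iterator; list() collects the rows so they can be sliced.
        return list(csv.DictReader(csvfile))

# Write the sample to a temporary mpg.csv so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "mpg.csv")
with open(path, "w", newline="") as f:
    f.write(SAMPLE)

m = read_dicts(path)
print(m[:3])  # the first three rows, each a dict such as {'': '1', 'manufacturer': 'audi', ...}
```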
READING A CSV FILE
• USING THE READER() FUNCTION:
Here, we read the file using csv.reader(), which splits each row at the commas and returns its column values as a list of strings.
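A short sketch of csv.reader() on an in-memory stand-in for a file (the sample data is hypothetical). Unlike DictReader, the plain reader treats the header as an ordinary row, so it is taken off first with next().

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
data = io.StringIO("model,cty\na4,18\nmalibu,19\n")

reader = csv.reader(data)   # csv.reader is a function that returns a reader object
header = next(reader)       # the first row holds the column names
rows = list(reader)         # the remaining rows, each a list of strings

print(header)  # ['model', 'cty']
print(rows)    # [['a4', '18'], ['malibu', '19']]
```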
Writing a CSV File
• The csv module also offers two ways to write a CSV file: the writer() function or the DictWriter class.
• USING THE DictWriter CLASS:
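A minimal sketch of both writers, using hypothetical records and an in-memory buffer (with a real file you would pass open("out.csv", "w", newline="") instead). DictWriter takes dicts and needs the column order via fieldnames; csv.writer takes plain lists.

```python
import csv
import io

# Hypothetical records; the dict keys must match the fieldnames.
records = [
    {"model": "a4", "cty": 18},
    {"model": "malibu", "cty": 19},
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["model", "cty"])
writer.writeheader()       # header line: model,cty
writer.writerows(records)  # one line per dict

# The same output with the plain writer, which takes lists instead of dicts.
out2 = io.StringIO()
plain = csv.writer(out2)
plain.writerow(["model", "cty"])
plain.writerows([["a4", 18], ["malibu", 19]])

print(out.getvalue())
```

Both writers end each line with "\r\n" by default, which is why files should be opened with newline="".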
LOOPING THROUGH ROWS
The for loop assigns each row produced by the reader (a list of column values) to the variable row in turn, and the indented line below it prints that row.
We open the CSV file with open('filename.csv') and then perform the operation on it.
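A sketch of the loop, assuming a small hypothetical filename.csv that the script creates itself so it can run anywhere. The with block closes the file automatically when the loop finishes.

```python
import csv
import os
import tempfile

# Create a small hypothetical filename.csv so the loop has something to read.
path = os.path.join(tempfile.mkdtemp(), "filename.csv")
with open(path, "w", newline="") as f:
    f.write("model,cty\na4,18\nmalibu,19\n")

seen = []
with open(path, newline="") as csvfile:
    for row in csv.reader(csvfile):
        # On each pass, row holds the next line of the file as a list of strings.
        print(row)
        seen.append(row)
```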
LOOPING THROUGH ROWS
In this example, we create an empty list 'model_no'.
• We then append row[2] (the third field of each row) to the list and print the list.
• When run, this code prints a single list.
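A sketch of building model_no, using a hypothetical in-memory stand-in for mpg.csv in which the model sits in the third column (index 2). The header row is skipped so it does not end up in the list.

```python
import csv
import io

# Hypothetical stand-in for mpg.csv: column 0 is a row number, column 2 the model.
sample = io.StringIO(
    ",manufacturer,model,cty\n"
    "1,audi,a4,18\n"
    "2,audi,a4,21\n"
    "3,chevrolet,malibu,19\n"
)

reader = csv.reader(sample)
next(reader)                  # skip the header row
model_no = []                 # start with an empty list
for row in reader:
    model_no.append(row[2])   # third field of each row
print(model_no)  # ['a4', 'a4', 'malibu']
```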
EXTRACTING INFORMATION FROM CSV FILE
• If you want information about a particular column, extract it from each row with row[...]: a column position for csv.reader, or a column name for DictReader.
• Here, this code extracts the values of the 'model' column.
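With DictReader, a column can be pulled out by name. This sketch uses hypothetical sample rows and also converts a numeric column, since the csv module hands back every value as a string.

```python
import csv
import io

# Hypothetical stand-in for mpg.csv.
sample = io.StringIO(
    ",manufacturer,model,cty\n"
    "1,audi,a4,18\n"
    "2,audi,a4,21\n"
    "3,chevrolet,malibu,19\n"
)

rows = list(csv.DictReader(sample))
models = [row["model"] for row in rows]        # select the column by name
city_mpg = [int(row["cty"]) for row in rows]   # values are strings, so convert first

print(models)                         # ['a4', 'a4', 'malibu']
print(sum(city_mpg) / len(city_mpg))  # average city mileage
```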
CONVERTING A LIST TO A SET
Here, the set() function removes duplicate values, so each value is printed only once.
As before, we first import the csv module before working with the CSV file.
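A sketch of removing duplicates with set(), on a hypothetical one-column sample. A set has no order, so the result is sorted for display.

```python
import csv
import io

sample = io.StringIO("model\na4\na4\nmalibu\na4 quattro\nmalibu\n")
models = [row["model"] for row in csv.DictReader(sample)]

unique_models = set(models)             # duplicates are dropped
print(len(models), len(unique_models))  # 5 3
print(sorted(unique_models))            # sets are unordered, so sort for display
```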
PANDAS
• Pandas is an open source Python library for data analysis.
PANDAS DATA STRUCTURES
Pandas introduces two new data structures to Python:
• Series
• DataFrame
SERIES
SERIES
• Series is a one-dimensional labelled array capable of holding any data type.
• A Series is a one-dimensional object similar to an array, list, or column in a table.
• It will assign a labelled index to each item in the Series.
• By default, each item will receive an index label from 0 to N, where N is the length of
the Series minus one.
SERIES
CREATE A SERIES WITH AN ARBITRARY LIST
In the output, the list values are arranged as a Series, each with its assigned index.
The dtype in the output is 'object' because pandas stores strings with the object data type (newer pandas releases may show a dedicated string dtype instead).
You can turn a list into a Series with pd.Series() (note the capital S).
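A minimal sketch of creating a Series from a list (the animal names are sample data). The default index runs from 0 to N - 1, and the dtype follows the values.

```python
import pandas as pd

animals = ["Tiger", "Bear", "Moose"]
s = pd.Series(animals)   # capital S: pd.series does not exist
print(s)                 # default index labels 0, 1, 2 next to the values
print(s.dtype)           # object for strings (newer pandas may report a string dtype)

numbers = pd.Series([1, 2, 3])
print(numbers.dtype)     # an integer dtype, since all values are numbers
```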
SERIES
Alternatively, specify an index when creating the Series.
Here, we pass the labels with index=[...], one label per element of the list, and then print the Series.
The Series constructor can convert a dictionary as well, using the keys of the dictionary as its index.
Here, the Series constructor uses the dictionary keys as the index and the dictionary values as the data.
SERIES EXAMPLE
If you want to output the index of the Series, use its index attribute (for example, s.index).
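The sketch below builds the same Series twice with hypothetical data: once with an explicit index=[...] and once from a dictionary, then reads the labels back through the index attribute.

```python
import pandas as pd

# An explicit index passed with index=[...].
sports = pd.Series(["Bhutan", "Scotland", "Japan"], index=["Archery", "Golf", "Sumo"])

# The same Series built from a dictionary: the keys become the index.
from_dict = pd.Series({"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"})

print(from_dict.index)           # the labels Archery, Golf, Sumo
print(from_dict["Golf"])         # Scotland
print(sports.equals(from_dict))  # True
```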
SERIES EXAMPLE
If one of the elements in a Series of strings is None, the output shows None (with the object dtype; newer string dtypes may show NaN instead).
If one of the elements is None and all the other elements are numeric, the output shows it as NaN (not a number), and the Series gets a floating-point dtype.
• NaN is not the same as the None keyword.
• In NumPy, we use isnan() to check whether a value is NaN.
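A short sketch of the None/NaN difference. The key point it demonstrates is that NaN never compares equal to anything, even itself, which is why np.isnan() (or pandas' isnull()) is needed to test for it.

```python
import numpy as np
import pandas as pd

words = pd.Series(["Tiger", "Bear", None])
print(words)    # object dtype keeps None (newer string dtypes may show NaN instead)

nums = pd.Series([1, 2, None])
print(nums)     # None became NaN, and the dtype became float64
print(np.isnan(nums.iloc[2]))  # True

print(np.nan == np.nan)  # False: NaN is never equal to anything, even itself
print(None == None)      # True
```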
QUERYING A SERIES
We can query a Series using two indexers, written with square brackets rather than called like functions:
• loc[]: used when we query by index label.
• iloc[]: used when we query by numeric position.
When you want to query a particular element of a Series by its numeric position, use iloc[].
When you want to query a particular element of a Series by its label, use loc[].
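A sketch of loc[] versus iloc[] on hypothetical data. The second Series uses integer labels, which is where the distinction matters most: loc[20] means the label 20, while iloc[0] means the first position.

```python
import pandas as pd

sports = pd.Series(["Bhutan", "Scotland", "Japan"], index=["Archery", "Golf", "Sumo"])
print(sports.iloc[2])      # Japan: third position
print(sports.loc["Golf"])  # Scotland: label "Golf"

# With integer labels the difference matters: loc uses labels, iloc positions.
codes = pd.Series(["a", "b", "c"], index=[10, 20, 30])
print(codes.loc[20])   # b
print(codes.iloc[0])   # a
```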
DATAFRAME
DATAFRAME
• A DataFrame is a tabular data structure comprised of rows and columns.
• A DataFrame can be thought of as a group of Series objects (its columns) that share a common row index.
• The Pandas data frame consists of three main components: the data, the index, and
the columns.
DATAFRAME EXAMPLE
head() displays the first five rows of the dataset (by default).
Here, the pd.DataFrame() constructor combines several Series objects into a single two-dimensional table.
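A sketch of framing several Series into a DataFrame. The purchase records and the 'shop' labels are hypothetical; each Series becomes a row, and its keys become the column names.

```python
import pandas as pd

# Hypothetical purchase records, one Series per row.
purchase_1 = pd.Series({"Name": "Chris", "Item Purchased": "Dog Food", "Cost": 22.50})
purchase_2 = pd.Series({"Name": "Kevyn", "Item Purchased": "Kitty Litter", "Cost": 2.50})
purchase_3 = pd.Series({"Name": "Vinod", "Item Purchased": "Bird Seed", "Cost": 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                  index=["shop 1", "shop 1", "shop 2"])
print(df.head())  # first five rows (here all three)
print(df.shape)   # (3, 3)
```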
EXTRACTING VALUES FROM DATAFRAME
To extract rows by label, use the loc[] indexer.
In this code, we look up the customer stored under the 'shop 2' index label.
To extract only a particular column for that label, pass two values to df.loc[]: the row label and the column name.
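A sketch of label-based lookup with hypothetical data. Because 'shop 1' appears twice in the index, asking for it returns two values rather than one.

```python
import pandas as pd

# Hypothetical data with a repeated row label.
df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 1", "shop 2"],
)

print(df.loc["shop 2"])          # the row labelled 'shop 2'
print(df.loc["shop 2", "Name"])  # Vinod: row label plus column name
print(df.loc["shop 1", "Cost"])  # 'shop 1' appears twice, so two values come back
```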
EXTRACTING VALUES FROM DATAFRAME
Here, a 'place' column is added to the DataFrame. Any column can be added the same way, by assigning values to a new column name.
To display two or more columns for all index labels, pass a list of column names. Here, only the cost and student columns are shown, for all rows.
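A sketch of adding a column and selecting several columns, with hypothetical column names and values.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 1", "shop 2"],
)

# Assigning to a new column name adds that column (hypothetical values).
df["place"] = ["Pune", "Delhi", "Mumbai"]

# A list of column names selects several columns for every row.
subset = df.loc[:, ["Name", "Cost"]]  # df[["Name", "Cost"]] gives the same result
print(subset)
```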
RENAME A COLUMN NAME
To rename a column, we use the syntax df.rename(columns={'old name': 'new name'}), which returns a new DataFrame unless inplace=True is passed.
The dictionary key is the existing column name to rename.
The dictionary value is the new column name.
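A minimal sketch of rename() with hypothetical columns, showing that the default call leaves the original DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame({"Cost": [22.5, 2.5], "Name": ["Chris", "Kevyn"]})

# Keys are the current column names, values the new names.
renamed = df.rename(columns={"Cost": "Price"})

print(list(renamed.columns))  # ['Price', 'Name']
print(list(df.columns))       # ['Cost', 'Name']: the original is unchanged
```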
INPLACE
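• For methods that accept an inplace parameter, inplace=False (the default) returns a new object and leaves the underlying data unchanged.
• With inplace=True, the original object is modified directly and the method returns None, so nothing is printed.
• That empty output is a hint that the operation happened in place.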
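The same rename with inplace=True, as a sketch on hypothetical data: the call returns None, and the change shows up in the original DataFrame instead.

```python
import pandas as pd

df = pd.DataFrame({"Cost": [22.5, 2.5], "Name": ["Chris", "Kevyn"]})

result = df.rename(columns={"Cost": "Price"}, inplace=True)
print(result)            # None: nothing useful is returned
print(list(df.columns))  # ['Price', 'Name']: df itself was changed
```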
DROP
To drop a column, we use the drop() function, which removes the given column (or row) labels.
Here, inplace=True modifies the DataFrame directly, so nothing is printed.
• axis=1 is used to drop columns.
• axis=0 (the default) is used to drop rows.
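A sketch of drop() along both axes, on hypothetical data. Dropping a row label removes every row with that label, so both 'shop 1' rows go.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0],
     "place": ["Pune", "Delhi", "Mumbai"]},
    index=["shop 1", "shop 1", "shop 2"],
)

no_place = df.drop("place", axis=1)   # axis=1: drop a column
no_shop1 = df.drop("shop 1", axis=0)  # axis=0: drop rows by index label

df.drop("place", axis=1, inplace=True)  # modifies df and returns None
print(list(df.columns))  # ['Name', 'Cost']
```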
QUERYING A DATAFRAME
Here, we apply the condition cost > 20 to the DataFrame column; it returns True or False for each row, depending on whether that row satisfies the condition.
where() takes this Boolean mask, applies it to the DataFrame and returns a new DataFrame of the same shape, in which the values that fail the condition are replaced with NaN.
Here, count() counts the non-missing cost values in the result, i.e. the number of rows that satisfy the condition.
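A sketch of Boolean masking with where() and count(), on hypothetical data with a Cost column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

mask = df["Cost"] > 20          # Boolean Series: True where the condition holds
print(mask.tolist())            # [True, False, False]

only_expensive = df.where(mask) # same shape; rows that fail become NaN
print(only_expensive.shape)     # (3, 2)
print(only_expensive["Cost"].count())  # 1: count() skips NaN values
```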
FILTER THE ROWS WITH NaN VALUE
The dropna() function removes the rows that contain NaN (not a number) values.
We can also filter the rows in a single step by indexing the DataFrame with the Boolean condition itself, which drops the non-matching rows directly.
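Both filtering routes side by side, as a sketch on hypothetical data: where() followed by dropna(), and direct Boolean indexing. They keep the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

# Two steps: where() blanks out failing rows, dropna() removes them.
two_steps = df.where(df["Cost"] > 20).dropna()

# One step: index the DataFrame with the Boolean condition directly.
one_step = df[df["Cost"] > 20]

print(two_steps["Name"].tolist(), one_step["Name"].tolist())  # ['Chris'] ['Chris']
```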
QUERYING DATAFRAME USING LOGICAL OPERATION
Here, the & (and) operator combines two conditions, and the output contains only the rows that satisfy both.
Here, the | (or) operator combines two conditions, and the output contains the rows that satisfy either one.
Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.
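A sketch of combining conditions with & and |, on hypothetical data. Each comparison is in its own parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

both = df[(df["Cost"] > 3) & (df["Cost"] < 10)]            # must satisfy both
either = df[(df["Cost"] > 20) | (df["Name"] == "Kevyn")]   # satisfies at least one

print(both["Name"].tolist())    # ['Vinod']
print(either["Name"].tolist())  # ['Chris', 'Kevyn']
```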
USE THIS DATA FOR INDEXING A DATAFRAME
INDEXING A DATAFRAME
The index attribute (df.index) displays the index, i.e. the row labels of the DataFrame.
set_index() makes an existing column the index of the DataFrame.
reset_index() undoes set_index(): it moves the index back into a regular column and restores the default integer index.
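A sketch of the round trip between a column and the index, using hypothetical data.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]})
print(df.index)                      # default integer labels 0, 1, 2

by_name = df.set_index("Name")       # the Name column becomes the index
print(by_name.loc["Kevyn", "Cost"])  # 2.5

restored = by_name.reset_index()     # Name moves back to a column; index is 0, 1, 2 again
print(list(restored.columns))        # ['Name', 'Cost']
```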
HANDLE MISSING VALUES IN PANDAS
• isnull() returns True for each value that is missing and False otherwise.
• tail() displays the last five rows of the data.
HANDLE MISSING VALUES IN PANDAS
notnull() returns True for each value that is not missing, and False when the value is missing.
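A sketch of isnull(), tail() and notnull() on a hypothetical CSV held in memory. read_csv() turns blank fields into NaN, and summing the isnull() result counts the missing values per column.

```python
import io

import pandas as pd

# Hypothetical CSV text with blank fields.
csv_text = (
    "time,user,video\n"
    "1,a,intro\n"
    "2,b,\n"
    "3,,intro\n"
    "4,c,outro\n"
    "5,d,\n"
    "6,e,intro\n"
)
df = pd.read_csv(io.StringIO(csv_text))

print(df.isnull())        # True where a value is missing
print(df.isnull().sum())  # missing values per column
print(df.tail())          # the last five rows

watched = df[df["video"].notnull()]  # keep rows whose video is present
print(len(watched))       # 4
```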
HANDLE MISSING VALUES IN PANDAS
fillna() fills the missing values in the DataFrame (read from the CSV file) with a given value; the file itself is not changed. Here, the string 'Various' is used to fill the missing values.
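A sketch of fillna() on the same kind of hypothetical data. It returns a new DataFrame, so the original still has its missing values.

```python
import io

import pandas as pd

csv_text = "time,user,video\n1,a,intro\n2,b,\n3,,intro\n4,c,outro\n5,d,\n6,e,intro\n"
df = pd.read_csv(io.StringIO(csv_text))

filled = df.fillna("Various")     # every NaN becomes the string 'Various'
print(filled)
print(df.isnull().sum().sum())    # 3: the original DataFrame is unchanged
```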
GROUPBY
GROUPBY
• The groupby() function is used whenever you want to analyse pandas data by some category.
Census.csv is a CSV file.
In this line of code, we find the mean of the BIRTHS2012 column for each value of the CTYNAME column.
GROUPBY EXAMPLE
In this code, we find the mean of the BIRTHS2012 column for the county name 'Ada county'.
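A sketch of both groupby lookups. Census.csv is not reproduced here, so the rows below are hypothetical, with only the column names STNAME, CTYNAME and BIRTHS2012 taken from the slides. Same-named counties in different states are averaged together when grouping by CTYNAME alone.

```python
import pandas as pd

# Hypothetical stand-in for Census.csv (with the real file: pd.read_csv("Census.csv")).
census = pd.DataFrame({
    "STNAME": ["Idaho", "Colorado", "Idaho", "Ohio"],
    "CTYNAME": ["Ada County", "Adams County", "Adams County", "Allen County"],
    "BIRTHS2012": [5000, 300, 40, 1300],
})

# Mean of BIRTHS2012 for each county name.
means = census.groupby("CTYNAME")["BIRTHS2012"].mean()
print(means)

# One county only: look it up in the result, or filter first and then average.
print(means.loc["Ada County"])
print(census[census["CTYNAME"] == "Ada County"]["BIRTHS2012"].mean())
```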
GROUPBY EXAMPLE
In this line of code, we calculate the mean of every numeric column for each CTYNAME.
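A sketch of averaging several columns per group, on hypothetical census-style rows (BIRTHS2013 is an assumed extra column). Selecting the numeric columns explicitly keeps text columns such as STNAME out of the mean, which recent pandas versions would otherwise reject.

```python
import pandas as pd

census = pd.DataFrame({
    "STNAME": ["Idaho", "Colorado", "Idaho", "Ohio"],
    "CTYNAME": ["Ada County", "Adams County", "Adams County", "Allen County"],
    "BIRTHS2012": [5000, 300, 40, 1300],
    "BIRTHS2013": [5100, 290, 44, 1280],
})

per_county = census.groupby("CTYNAME")[["BIRTHS2012", "BIRTHS2013"]].mean()
print(per_county)  # one row per county, one column per numeric field
```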
AGG() Function
• The agg() function lets you apply several aggregation functions at once.
In this line of code, agg() computes the count, min, max and mean of the values.
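A sketch of agg() with a list of function names, on hypothetical census-style rows. Each name becomes a column of the result.

```python
import pandas as pd

census = pd.DataFrame({
    "CTYNAME": ["Ada County", "Adams County", "Adams County", "Allen County"],
    "BIRTHS2012": [5000, 300, 40, 1300],
})

stats = census.groupby("CTYNAME")["BIRTHS2012"].agg(["count", "min", "max", "mean"])
print(stats)
```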
SCALES
NOMINAL SCALES EXAMPLE
.astype() converts a Series or DataFrame from one data type to another.
ORDINAL SCALES EXAMPLE
To arrange the resulting data in ordered form, set the ordered attribute (ordered=True), so that sorting and comparisons follow the category order.
SCALES EXAMPLE
Here, the dtype returned is object.
Here, the dtype returned is category, because we changed the dtype from object to category using astype().
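A sketch of nominal and ordinal categories, using hypothetical letter grades. The ordered CategoricalDtype is what lets comparisons and sorting follow the given order (C lowest, A highest).

```python
import pandas as pd

grades = pd.Series(["B", "A", "C", "A", "B"])  # hypothetical letter grades
print(grades.dtype)  # object (a string dtype in newer pandas)

# Nominal: categories with no order.
nominal = grades.astype("category")
print(nominal.dtype)  # category

# Ordinal: an ordered CategoricalDtype lets comparisons follow the given order.
order = pd.CategoricalDtype(categories=["C", "B", "A"], ordered=True)
ordinal = grades.astype(order)
print((ordinal > "B").tolist())        # [False, True, False, True, False]
print(ordinal.sort_values().tolist())  # ['C', 'B', 'B', 'A', 'A']
```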
PIVOT TABLE
PIVOT TABLE
• A pivot gives a better representation in which the columns are the unique values of a variable and an index (for example, of dates) identifies individual observations.
• To reshape the data into this form, use the pivot() function.
PIVOT TABLE
Here, with pivot_table(), we pass a list to aggfunc=[...] to apply several aggregate operations at once.
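A sketch of pivot() and pivot_table() on hypothetical data. pivot() only reshapes, so each (index, column) pair must be unique; pivot_table() aggregates repeated pairs, and a list passed to aggfunc produces one block of columns per function.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) observation.
long = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
# pivot(): dates become the index, each unique variable becomes a column.
wide = long.pivot(index="date", columns="variable", values="value")
print(wide)

# pivot_table(): repeated (store, item) pairs are aggregated by each function.
sales = pd.DataFrame({
    "store": ["X", "X", "Y", "Y"],
    "item": ["pen", "pen", "pen", "ink"],
    "qty": [1, 3, 5, 2],
})
table = sales.pivot_table(values="qty", index="store", columns="item",
                          aggfunc=["sum", "mean"])
print(table)
```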
DATE FUNCTIONALITY IN PANDAS
• Timestamp: represents a single point in time.
• Period: represents a single time span.
DATE FUNCTIONALITY IN PANDAS
DatetimeIndex: an index made up of Timestamp values.
PeriodIndex: an index made up of Period values.
Here, ('abc') supplies the values, and each one is assigned a Timestamp as its index label.
Here, ('abc') supplies the values, and each one is assigned a Period as its index label.
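A sketch of Timestamp, Period and the two index types they produce. The dates are hypothetical; pandas infers a DatetimeIndex or PeriodIndex when the index labels are all Timestamps or all Periods.

```python
import pandas as pd

t = pd.Timestamp("2016-09-01 10:05")  # a single point in time
p = pd.Period("2016-01")              # a whole month
print(p.start_time, p.end_time)       # start and end of January 2016

# Timestamps as index labels form a DatetimeIndex.
t1 = pd.Series(list("abc"), index=[pd.Timestamp("2016-09-01"),
                                   pd.Timestamp("2016-09-02"),
                                   pd.Timestamp("2016-09-03")])
# Periods as index labels form a PeriodIndex.
t2 = pd.Series(list("abc"), index=[pd.Period("2016-09"),
                                   pd.Period("2016-10"),
                                   pd.Period("2016-11")])
print(type(t1.index).__name__, type(t2.index).__name__)  # DatetimeIndex PeriodIndex
```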
CONVERTING TO DATETIME
To convert values into datetime format, use pd.to_datetime().
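A sketch of pd.to_datetime() with hypothetical date strings. Keeping one consistent format avoids parsing ambiguity; dayfirst=True decides how a string like 04/07/2012 is read.

```python
import pandas as pd

# Strings in one consistent format convert cleanly to a DatetimeIndex.
dates = pd.to_datetime(["2013-06-02", "2014-08-29", "2015-06-26"])
print(dates)

# dayfirst=True reads 04/07/2012 as 4 July 2012 rather than 7 April.
day_first = pd.to_datetime("04/07/2012", dayfirst=True)
print(day_first)  # 2012-07-04 00:00:00
```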
TIMEDELTAS
• TIMEDELTAS: differences in time.
Here, we find the difference between two timestamps, which gives a Timedelta.
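A sketch of timedeltas with hypothetical dates: subtracting two Timestamps gives a Timedelta, and adding a Timedelta to a Timestamp shifts it.

```python
import pandas as pd

diff = pd.Timestamp("2016-09-03") - pd.Timestamp("2016-09-01")
print(diff)   # 2 days 00:00:00

# A Timedelta can also be added to a Timestamp.
later = pd.Timestamp("2016-09-02 08:10") + pd.Timedelta(days=12, hours=3)
print(later)  # 2016-09-14 11:10:00
```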
MERGING DATAFRAMES
MERGING DATAFRAMES
Use this dataset to merge the
dataframes
OUTER JOIN
The merge() function is used to merge two DataFrames.
INNER JOIN
LEFT JOIN
RIGHT JOIN
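The four join types can be sketched with pd.merge(). The staff and student tables are hypothetical, and joining on a shared Name column is an assumption. An outer join keeps every name from both tables, an inner join only the names present in both, a left join every name from the left table, and a right join every name from the right table; missing values are filled with NaN.

```python
import pandas as pd

# Hypothetical tables that share a Name column.
staff = pd.DataFrame({"Name": ["Kelly", "Sally", "James"],
                      "Role": ["Director of HR", "Course liaison", "Grader"]})
student = pd.DataFrame({"Name": ["James", "Mike", "Sally"],
                        "School": ["Business", "Law", "Engineering"]})

outer = pd.merge(staff, student, how="outer", on="Name")  # union of names
inner = pd.merge(staff, student, how="inner", on="Name")  # names in both tables
left = pd.merge(staff, student, how="left", on="Name")    # every staff name
right = pd.merge(staff, student, how="right", on="Name")  # every student name

for label, result in [("outer", outer), ("inner", inner), ("left", left), ("right", right)]:
    print(label, result["Name"].tolist())
```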
MERGING DATAFRAMES
