
Load CSV Data for ML Projects in Python
Loading data correctly is one of the most important, and often one of the most challenging, steps in building a machine learning project. CSV (comma-separated values) is one of the most common formats for machine learning data. It is a simple plain-text format used to store tabular data.
The following are the three most common approaches for loading CSV data for machine learning projects in Python:
Using Python Standard Library
To load CSV data files, the Python standard library provides the built-in csv module. Its csv.reader() function reads a file row by row, returning each row as a list of strings.
Example
In this example, we load the iris flower data set from a CSV file:
# Importing the csv module
import csv
# To convert the data into a NumPy array, import the numpy module:
import numpy as np

# Providing the full path of the CSV data file stored in our local directory:
datafile_path = r"c:/Users/ Desktop/iris.csv"

# Reading data using the csv.reader() function:
with open(datafile_path, 'r') as f:
    reader = csv.reader(f, delimiter=',')
    data_headers = next(reader)
    data = list(reader)
    data = np.array(data).astype(float)

# Printing the names of the data headers and the first 5 lines of the data file:
print(data_headers)
print(data[:5])
Output
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
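The same approach can be tried without a file on disk. The following is a minimal sketch, assuming an in-memory sample built from the five rows shown in the output above; the helper name load_csv_as_array and the SAMPLE_CSV string are hypothetical and introduced only for illustration.

```python
import csv
import io

import numpy as np

# In-memory stand-in for the CSV file (assumption: header row plus
# the five data rows shown in the output above).
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
"""


def load_csv_as_array(file_obj):
    """Hypothetical helper: return (header names, float NumPy array)."""
    reader = csv.reader(file_obj, delimiter=",")
    headers = next(reader)                  # first row holds the column names
    rows = [row for row in reader if row]   # skip any blank lines
    return headers, np.array(rows).astype(float)


headers, data = load_csv_as_array(io.StringIO(SAMPLE_CSV))
print(headers)
print(data.shape)
```

Because csv.reader() accepts any iterable of text lines, the same helper should also work with a file object returned by open(). Note that the conversion to float only succeeds when every column is numeric.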
Using Pandas
Another approach is the pandas.read_csv() function. It returns a pandas.DataFrame, which can be used immediately for summarizing and plotting the data.
Example
In this example, we load the Pima Indians Diabetes data set from a CSV file:
# Importing the read_csv function from Pandas
from pandas import read_csv

# Providing the full path of the CSV data file stored in our local directory:
datafile_path = r"C:/Users/Leekha/Desktop/pima-indians-diabetes.csv"

# Providing header names and reading data using the read_csv() function:
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(datafile_path, names=headernames)

# Printing the number of rows and columns in the file and the first 5 lines of the data file:
print(data.shape)
print(data[:5])
Output
(768, 9)
   preg  plas  pres  skin  test  mass   pedi  age  class
0     6   148    72    35     0  33.6  0.627   50      1
1     1    85    66    29     0  26.6  0.351   31      0
2     8   183    64     0     0  23.3  0.672   32      1
3     1    89    66    23    94  28.1  0.167   21      0
4     0   137    40    35   168  43.1  2.288   33      1
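Once the data is in a DataFrame, a common next step in a machine learning project is to separate the input features from the target column. The following is a minimal sketch, assuming that the last column ('class') is the target and using an in-memory sample built from the five rows shown in the output above; the SAMPLE_CSV string and the names X and y are introduced only for illustration.

```python
import io

from pandas import read_csv

# In-memory stand-in for the CSV file (assumption: the five rows
# shown in the output above, with no header row).
SAMPLE_CSV = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
"""

headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# read_csv() accepts a file-like object as well as a path.
data = read_csv(io.StringIO(SAMPLE_CSV), names=headernames)

# Assumption: 'class' is the target; all other columns are features.
X = data.drop(columns='class').to_numpy()
y = data['class'].to_numpy()

print(X.shape, y.shape)
```

Passing names=headernames tells read_csv() that the file has no header row, so the first line is read as data rather than as column names.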