In this article, we will learn How to create a list of Files, Folders, and Sub Folders and then export them to Excel using Python. We will create a list of names and paths using a few folder traversing methods explained below and store them in an Excel sheet by either using openpyxl or pandas module.
Input: The following image represents the structure of the directory.
NOTE:SF2 is an empty directoryTraversing files, folder, and subfolders
The following Functions are methods for traversing folders and stores the name and path of files/folders in lists.
Method 1: Using append_path_name(path, name_list, path_list, glob)
An important function that is used in the following folder traversing functions. The purpose of the function is to check if the given path is of Windows or Linux os as the path separator is different and appends the names and paths of files or folders to name_list and path_list respectively.
Note: Windows uses "\" and Linux uses "/" as path separator, since python treats "\" as an invalid character, we need to use "\\" instead of "\" in the path.
Approach:
- The function will first find if the path contains "\\" using :
# Returns the count if it finds
# any "\\" in the given path.
path.find("\\")
Note: If it returns any number greater than zero, it means the current os is Windows and the first block of code will be executed or else the second block of code will be executed representing Linux os.
- We will split the path according to the current os and store it in a temporary list.
# Windows
temp = path.split("\\")
# Linux
temp = path.split("/")
- We will append the name and path of files or folders to the name_list and path_list respectively.
# temp[-1] gets the last value present in
# the temporary list that represents
# the file or folder name.
name_list.append(temp[-1])
path_list.append(path)
- If the glob variable is True, the parent path will be joined with a regex expression that is required for recursive traversing in glob.iglob() method.
# Windows
path = os.path.join(path, "**\\*")
# Linux
path = os.path.join(path, "**/*")
Example:
Python3
import os
# This function splits the path by checking
# if it is a windows os or linux os path and
# appends the name and path of directory (and
# files only for glob function).
def append_path_name(path, name_list, path_list, glob):
# Checks if it is a windows path or linux
# path
if path.find("\\") > 0:
# Splits the windows path and stores the
# list in a temp list and appends the last
# value of temp_list in name_list as it
# represents the name of file/ folder and
# also appends the path to path_list.
temp = path.split("\\")
name_list.append(temp[-1])
path_list.append(path)
# If this function is called under
# find_using_glob then we return modified
# path so that iglob can recursively
# traverse the folders.
if glob == True:
path = os.path.join(path, "**\\*")
return path, name_list, path_list
else:
# Same explanation as above but the path splitting
# is based on Linux
temp = path.split("/")
name_list.append(temp[-1])
path_list.append(path)
if glob == True:
path = os.path.join(path, "**/*")
return path, name_list, path_list
return name_list, path_list
name_list, path_list = append_path_name("/content/sample_data", [], [], False)
print(name_list)
print(path_list)
Output:
['sample_data', 'anscombe.json', 'california_housing_train.csv', 'F2', 'SF2', 'california_housing_test.csv',
'.ipynb_checkpoints', '.ipynb_checkpoints', 'F1', 'mnist_test.csv', 'README.md', '.ipynb_checkpoints', 'SF1',
'mnist_train_small.csv']
['/content/sample_data', '/content/sample_data/anscombe.json',Â
'/content/sample_data/california_housing_train.csv', '/content/sample_data/F2',Â
'/content/sample_data/F2/SF2', '/content/sample_data/F2/SF2/california_housing_test.csv',Â
'/content/sample_data/F2/.ipynb_checkpoints', '/content/sample_data/.ipynb_checkpoints',Â
'/content/sample_data/F1', '/content/sample_data/F1/mnist_test.csv', '/content/sample_data/F1/README.md',Â
'/content/sample_data/F1/.ipynb_checkpoints', '/content/sample_data/F1/SF1',Â
'/content/sample_data/F1/SF1/mnist_train_small.csv']
Method 2: Using find_using_os_walk(path, name_list, path_list)
This method generates the file names in a directory tree by walking the tree either top-down or bottom-up in the given path.
Syntax : os.walk(path)
Approach:
1. Initiate a for loop using os.walk(path) method, it generates a tuple containing the path of the current directory in root and the file list in files.
for root, _, files in os.walk(path):
2. Call the append_path_name function to store the names and paths of directories bypassing the current directory path.
name_list, path_list = append_path_name(
root, name_list, path_list, False)
3. Iterate the files and store the names and paths of the files found inside a folder.
# Joins the folder path and the
# file name to generate file path
file_path = os.path.join(root, file_name)
# Appends file name and file path to
# name_list and path_list respectively.
name_list.append(file_name)
path_list.append(file_path)
Example:
Python3
import os
# This Function uses os.walk method to traverse folders
# recursively and appends the name and path of file/
# folders in name_list and path_list respectively.
def find_using_os_walk(path, name_list, path_list):
for root, _, files in os.walk(path):
# Function returns modified name_list and
# path_list.
name_list, path_list = append_path_name(
root, name_list, path_list, False)
for file_name in files:
file_path = os.path.join(root, file_name)
# Appends file name and file path to
# name_list and path_list respectively.
name_list.append(file_name)
path_list.append(file_path)
return name_list, path_list
name_list, path_list = find_using_os_walk("/content/sample_data", [], [])
print(name_list)
print(path_list)
Output:
['sample_data', 'anscombe.json', 'california_housing_train.csv', 'F2', 'SF2', 'california_housing_test.csv',
'.ipynb_checkpoints', '.ipynb_checkpoints', 'F1', 'mnist_test.csv', 'README.md', '.ipynb_checkpoints', 'SF1',
'mnist_train_small.csv']
['/content/sample_data', '/content/sample_data/anscombe.json',Â
'/content/sample_data/california_housing_train.csv', '/content/sample_data/F2',Â
'/content/sample_data/F2/SF2', '/content/sample_data/F2/SF2/california_housing_test.csv',Â
'/content/sample_data/F2/.ipynb_checkpoints', '/content/sample_data/.ipynb_checkpoints',Â
'/content/sample_data/F1', '/content/sample_data/F1/mnist_test.csv', '/content/sample_data/F1/README.md',Â
'/content/sample_data/F1/.ipynb_checkpoints', '/content/sample_data/F1/SF1',Â
'/content/sample_data/F1/SF1/mnist_train_small.csv']
Method 3: Using find_using_scandir(path, name_list, path_list)
This Function returns an iterator of os.DirEntry objects corresponding to the entries in the directory given by path.
Syntax : os.scandir(path)
Approach:
1. Call the append_path_name function to store the names and paths of directories by passing the current directory path.
name_list, path_list = append_path_name(
path, name_list, path_list, False)
2. Initiate a for loop using os.scandir(path) method that returns an object containing the current name and path of the file/folder.
for curr_path_obj in os.scandir(path):
3. If the current path is a directory then the function calls itself to recursively traverse the folders and store the folder names and paths from step 1.
if curr_path_obj.is_dir() == True:
file_path = curr_path_obj.path
find_using_scandir(file_path, name_list, path_list)
4. Else the file names and paths are stored in name_list and path_list respectively.
file_name = curr_path_obj.name
file_path = curr_path_obj.path
name_list.append(file_name)
path_list.append(file_path)
Example:
Python3
import os
# This Function uses os.scandir method to traverse
# folders recursively and appends the name and path of
# file/folders in name_list and path_list respectively.
def find_using_scandir(path, name_list, path_list):
# Function returns modified name_list and path_list.
name_list, path_list = append_path_name(
path, name_list, path_list, False)
for curr_path_obj in os.scandir(path):
# If the current path is a directory then the
# function calls itself with the directory path
# and goes on until a file is found.
if curr_path_obj.is_dir() == True:
file_path = curr_path_obj.path
find_using_scandir(file_path, name_list, path_list)
else:
# Appends file name and file path to
# name_list and path_list respectively.
file_name = curr_path_obj.name
file_path = curr_path_obj.path
name_list.append(file_name)
path_list.append(file_path)
return name_list, path_list
name_list, path_list = find_using_scandir("/content/sample_data", [], [])
print(name_list)
print(path_list)
Output:
['sample_data', 'anscombe.json', 'california_housing_train.csv', 'F2', 'SF2', 'california_housing_test.csv',
'.ipynb_checkpoints', '.ipynb_checkpoints', 'F1', 'mnist_test.csv', 'README.md', '.ipynb_checkpoints', 'SF1',
'mnist_train_small.csv']
['/content/sample_data', '/content/sample_data/anscombe.json',Â
'/content/sample_data/california_housing_train.csv', '/content/sample_data/F2',Â
'/content/sample_data/F2/SF2', '/content/sample_data/F2/SF2/california_housing_test.csv',Â
'/content/sample_data/F2/.ipynb_checkpoints', '/content/sample_data/.ipynb_checkpoints',Â
'/content/sample_data/F1', '/content/sample_data/F1/mnist_test.csv', '/content/sample_data/F1/README.md',Â
'/content/sample_data/F1/.ipynb_checkpoints', '/content/sample_data/F1/SF1',Â
'/content/sample_data/F1/SF1/mnist_train_small.csv']
Method 4: Using find_using_listdir(path, name_list, path_list)
This Function Gets the list of all files and directories in the given path.
Syntax : os.listdir(path)
Approach:
1. Call the append_path_name function to store the names and paths of directories by passing the current directory path.
name_list, path_list = append_path_name(
path, name_list, path_list, False)
2. Initiate a for loop using os.listdir(path) method that returns a list of files and folder names present in the current path.
for curr_name in os.listdir(path):
3. Join the name of folder or file with the current path.
curr_path = os.path.join(path, curr_name)
4. If the current path is a directory then the function calls itself to recursively traverse the folders and store the folder names and paths from step 1.
if os.path.isdir(curr_path) == True:
find_using_listdir(curr_path, name_list, path_list)
5. Else the file names and paths are stored in name_list and path_list respectively.
name_list.append(curr_name)
path_list.append(curr_path)
Code for the above-mentioned function:
Python3
import os
# This Function uses os.listdir method to traverse
# folders recursively and appends the name and path of
# file/folders in name_list and path_list respectively.
def find_using_listdir(path, name_list, path_list):
# Function appends folder name and folder path to
# name_list and path_list respectively.
name_list, path_list = append_path_name(path,
name_list, path_list, False)
for curr_name in os.listdir(path):
curr_path = os.path.join(path, curr_name)
# Checks if the current path is a directory.
if os.path.isdir(curr_path) == True:
# If the current path is a directory then the
# function calls itself with the directory path
# and goes on until a file is found
find_using_listdir(curr_path, name_list, path_list)
else:
# Appends file name and file path to
# name_list and path_list respectively.
name_list.append(curr_name)
path_list.append(curr_path)
return name_list, path_list
name_list, path_list = find_using_listdir("/content/sample_data", [], [])
print(name_list)
print(path_list)
Output:
['sample_data', 'anscombe.json', 'california_housing_train.csv', 'F2', 'SF2', 'california_housing_test.csv',
'.ipynb_checkpoints', '.ipynb_checkpoints', 'F1', 'mnist_test.csv', 'README.md', '.ipynb_checkpoints', 'SF1',
'mnist_train_small.csv']
['/content/sample_data', '/content/sample_data/anscombe.json',Â
'/content/sample_data/california_housing_train.csv', '/content/sample_data/F2',Â
'/content/sample_data/F2/SF2', '/content/sample_data/F2/SF2/california_housing_test.csv',Â
'/content/sample_data/F2/.ipynb_checkpoints', '/content/sample_data/.ipynb_checkpoints',Â
'/content/sample_data/F1', '/content/sample_data/F1/mnist_test.csv', '/content/sample_data/F1/README.md',Â
'/content/sample_data/F1/.ipynb_checkpoints', '/content/sample_data/F1/SF1',Â
'/content/sample_data/F1/SF1/mnist_train_small.csv']
Method 5: Using find_using_glob(path, name_list, path_list)
This Function returns an iterator that yields the same values as glob() without actually storing them all simultaneously.
Syntax : glob.iglob(path, recursive=True)
Approach:
1. Call the append_path_name function to store the name and path of parent directory and also return the modified path required for glob method since the last parameter is True.
path, name_list, path_list = append_path_name(
path, name_list, path_list, True)
2. Initiate a for loop using glob.iglob(path, recursive=True) method that recursively traverses the folders and returns the current path of file/folder.
for curr_path in glob.iglob(path, recursive=True):
3. Call the append_path_name function to store the names and paths of files/folders by passing the current path.
name_list, path_list = append_path_name(
curr_path, name_list, path_list, False)
Code for the above-mentioned function:
Python3
import glob
# This Function uses glob.iglob method to traverse
# folders recursively and appends the name and path of
# file/folders in name_list and path_list respectively.
def find_using_glob(path, name_list, path_list):
# Appends the Parent Directory name and path
# and modifies the parent path so that iglob
# can traverse recursively.
path, name_list, path_list = append_path_name(
path, name_list, path_list, True)
# glob.iglob with recursive set to True will
# get all the file/folder paths.
for curr_path in glob.iglob(path, recursive=True):
# Appends file/folder name and path to
# name_list and path_list respectively.
name_list, path_list = append_path_name(
curr_path, name_list, path_list, False)
return name_list, path_list
name_list, path_list = find_using_glob("/content/sample_data", [], [])
print(name_list)
print(path_list)
Output:
['sample_data', 'anscombe.json', 'california_housing_train.csv', 'F2', 'SF2', 'california_housing_test.csv',
'.ipynb_checkpoints', '.ipynb_checkpoints', 'F1', 'mnist_test.csv', 'README.md', '.ipynb_checkpoints', 'SF1',
'mnist_train_small.csv']
['/content/sample_data', '/content/sample_data/anscombe.json',Â
'/content/sample_data/california_housing_train.csv', '/content/sample_data/F2',Â
'/content/sample_data/F2/SF2', '/content/sample_data/F2/SF2/california_housing_test.csv',Â
'/content/sample_data/F2/.ipynb_checkpoints', '/content/sample_data/.ipynb_checkpoints',Â
'/content/sample_data/F1', '/content/sample_data/F1/mnist_test.csv', '/content/sample_data/F1/README.md',Â
'/content/sample_data/F1/.ipynb_checkpoints', '/content/sample_data/F1/SF1',Â
'/content/sample_data/F1/SF1/mnist_train_small.csv']
Storing data in Excel SheetÂ
Method 1: Using openpyxl
This module is used to read/write data to excel. It has a wide range of features but here we will use it to just create and write data to excel. You need to install openpyxl via pip in your system.
pip install openpyxl
Approach:
1. Import the required modules
# imports workbook from openpyxl module
from openpyxl import Workbook
2. Create a workbook object
work_book = Workbook()
3. Get the workbook active sheet object and initiate the following variables with 0.
row, col1_width, col2_width = 0, 0, 0
work_sheet = work_book.active
4. Iterate the rows to the maximum length of name_list as these many entries will be written to the excel sheet
while row <= len(name_list):
5. Get the cell objects of column 1 and column 2 of the same row and store the values of name_list and path_list to the respective cells.
name = work_sheet.cell(row=row+1, column=1)
path = work_sheet.cell(row=row+1, column=2)
# This block will execute only once and
# add the Heading of column 1 and column 2
if row == 0:
name.value = "Name"
path.value = "Path"
row += 1
continue
# Storing the values from name_list and path_list
# to the specified cell objects
name.value = name_list[row-1]
path.value = path_list[row-1]
6. (Optional) Adjusting the width of cells in Excel sheet using openpyxl's column dimensions.
col1_width = max(col1_width, len(name_list[row-1]))
col2_width = max(col2_width, len(path_list[row-1]))
work_sheet.column_dimensions["A"].width = col1_width
work_sheet.column_dimensions["B"].width = col2_width
7. Save the workbook with a file name after the iteration is over with a filename.
work_book.save(filename="Final.xlsx")
Example:
Python3
# Function will create an excel file and
# write the file/ folder names and their
# path using openpyxl
def create_excel_using_openpyxl(name_list, path_list,
path):
# Creates a workbook object and gets an
# active sheet
work_book = Workbook()
work_sheet = work_book.active
# Writing the data in excel sheet
row, col1_width, col2_width = 0, 0, 0
while row <= len(name_list):
name = work_sheet.cell(row=row+1, column=1)
path = work_sheet.cell(row=row+1, column=2)
# Writing the Heading i.e Name and Path
if row == 0:
name.value = "Name"
path.value = "Path"
row += 1
continue
# Writing the data from specified lists to columns
name.value = name_list[row-1]
path.value = path_list[row-1]
# Adjusting width of Column in excel sheet
col1_width = max(col1_width, len(name_list[row-1]))
col2_width = max(col2_width, len(path_list[row-1]))
work_sheet.column_dimensions["A"].width = col1_width
work_sheet.column_dimensions["B"].width = col2_width
row += 1
# Saving the workbook
work_book.save(filename="Final.xlsx")
create_excel_using_openpyxl(name_list, path_list, path)
Output:
Method 2: Using pandas
1. Create a frame (a dictionary) with the keys as 'Name' and 'Path' and values as name_list and path_list respectively:
frame = {'Name': name_list,
'Path': path_list
}
2. Before exporting we need to create a dataframe called df_data with columns as Name and Path.
df_data = pd.DataFrame(frame)
3. Export the data to excel using the following code:
df_data.to_excel('Final.xlsx', index=False)
Code for the above-mentioned function:
Python3
# Function will create a data frame using pandas and
# write File/Folder, and their path to excel file.
def create_excel_using_pandas_dataframe(name_list,
path_list, path):
# Default Frame (a dictionary) is created with
# File/Folder names and their path with the given lists
frame = {'Name': name_list,
'Path': path_list
}
# Creates the dataframe using pandas with the given
# dictionary
df_data = pd.DataFrame(frame)
# Creates and saves the data to an excel file
df_data.to_excel('Final.xlsx', index=False)
create_excel_using_pandas_dataframe(name_list,
path_list, path)
Output:
Similar Reads
How to Merge all excel files in a folder using Python?
In this article, we will see how to combine all Excel files present in a folder into a single file. Module used: The python libraries used are: Pandas: Pandas is a python library developed for a python programming language for manipulating data and analyzing the data. It is widely used in Data Scien
3 min read
How to List all Files and Directories in FTP Server using Python?
FTP ( File Transfer Protocol ) is set of rules that computer follows to transfer files across computer network. It is TCP/IP based protocol. FTP lets clients share files. FTP is less secure because of files are shared as plain text without any encryption across the network. It is possible using pyt
2 min read
Get list of files and folders in Google Drive storage using Python
In this article, we are going to have a look at how can we get a list of files (or folders) stored in our Google Drive cloud storage using Google Drive API in Python. It is a REST API that allows you to leverage Google Drive storage from within your app or program. So, let's create a simple Python s
4 min read
How to move list of folders with subfolders using Python ?
Sometimes we need to move an entire directory or maybe there is a list of such directories say A along with its sub-content, files, and subfolders to another destination directory B. While this can be done manually by 'cutting' and 'pasting' but what if there are hundreds or thousands of directories
4 min read
Python - Move all files from subfolders to main folder
This article will discuss how to move all files from the subfolder to the main folder using Python. The approach is simple it is similar to moving files from one folder to another using Python, except here the main folder or parent folder of the subfolder is passed as the destination. Modules UsedOS
3 min read
How to convert PDF file to Excel file using Python?
In this article, we will see how to convert a PDF to Excel or CSV File Using Python. It can be done with various methods, here are we are going to use some methods. Method 1: Using pdftables_api Here will use the pdftables_api Module for converting the PDF file into any other format. It's a simple
2 min read
Remove all empty files within a folder and subfolders in Python
In this article, we will see how to remove all empty files within a folder and its subfolders in Python. We occasionally produce some empty files that are not needed. Here, we will use the below functions/methods to remove all empty files within a folder and its subfolder imported from the OS module
4 min read
How to Find the First Empty Row of an Excel File in Python
When working with data in Excel files, one common task is to find the first empty row where new data can be added. Whether you're automating a data entry process or performing data analysis, Python offers several methods to accomplish this task efficiently. In this article, we'll explore different a
2 min read
How to create a duplicate file of an existing file using Python?
In this article, we will discuss how to create a duplicate of the existing file in Python. Below are the source and destination folders, before creating the duplicate file in the destination folder. After a duplicate file has been created in the destination folder, it looks like the image below. For
5 min read
Create and Write on Excel File using xlsxwriter Module - Python
XlsxWriter The xlsxwriter module in Python is used to create Excel .xlsx files and write data into them. It supports:Writing text, numbers, and formulasCreating multiple worksheetsFormatting, charts, images, filters, and conditional formattingInstallationBefore using xlsxwriter, we need to install i
2 min read