Append Pandas DataFrames Using for Loop

Last Updated : 20 Dec, 2024

When working with large datasets, we often need to combine several DataFrames into a single DataFrame. The usual approach is to collect the DataFrames in a list, typically inside a for loop, and then call pd.concat() once on that list. Let us consider an example:

Python
import pandas as pd
import numpy as np

# Create some example DataFrames
dataframes = [pd.DataFrame(np.random.rand(10, 5)) for _ in range(100)]

# Efficient way: collect in a list and concatenate once
combined_df = pd.concat(dataframes, ignore_index=True)

# Display the result
print(combined_df)

Output:

[Screenshot of the printed output]

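Here we generate 100 DataFrames, each with 10 rows and 5 columns, and store them in a list. A single call to concat() then stacks them into one DataFrame of 1000 rows, and ignore_index=True renumbers the rows from 0 to 999. Calling concat() once on the whole list is much more memory efficient than concatenating inside a loop, because no intermediate combined DataFrame is built on each iteration.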
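The list in the example above is built with a comprehension. The same pattern can be sketched with an explicit for loop, assuming each iteration produces one chunk of data. The helper `make_chunk` is hypothetical and only stands in for whatever produces a DataFrame in practice (reading a file, querying a source, and so on):

```python
import numpy as np
import pandas as pd


def make_chunk(seed: int) -> pd.DataFrame:
    """Hypothetical stand-in for loading one piece of data."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((10, 5)))


# Collect each chunk in a plain Python list inside the loop
chunks = []
for i in range(100):
    chunks.append(make_chunk(i))

# Concatenate once, after the loop has finished
combined_df = pd.concat(chunks, ignore_index=True)

print(combined_df.shape)  # (1000, 5)
```

Appending to a Python list only stores a reference to each DataFrame, so the loop itself copies no data. The rows are copied once, in the final concat() call.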

Let us consider another example. Here a list comprehension builds a list of 10 DataFrames, and concat() then combines them all.

Python
import pandas as pd

# Example DataFrames (Creating 10 DataFrames with simple values)
dfs = [pd.DataFrame({'A': [i, i+1], 'B': [i+2, i+3]}) for i in range(0, 10)]

# Concatenate all DataFrames in the list (each keeps its own 0/1 row labels)
result = pd.concat(dfs, ignore_index=False)

print(result)

Output:

[Screenshot of the printed output]

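From the output we can see that the DataFrames have been stacked one on top of the other. Because ignore_index=False, each DataFrame keeps its original row labels, so the labels 0 and 1 repeat in the result. This technique suits large datasets because it does not build a new combined DataFrame on each iteration, which makes it much more memory efficient.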
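To make the effect of the ignore_index parameter concrete, here is a small sketch that concatenates three of these DataFrames both ways and prints the resulting row labels. The variable names `kept` and `renumbered` are chosen only for illustration:

```python
import pandas as pd

dfs = [pd.DataFrame({'A': [i, i + 1], 'B': [i + 2, i + 3]}) for i in range(3)]

# ignore_index=False keeps each frame's own labels, so 0 and 1 repeat
kept = pd.concat(dfs, ignore_index=False)
print(kept.index.tolist())        # [0, 1, 0, 1, 0, 1]

# ignore_index=True discards them and builds a fresh 0..n-1 index
renumbered = pd.concat(dfs, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Repeated labels are harmless for printing, but label-based selection such as `kept.loc[0]` then returns several rows, so ignore_index=True is usually the safer choice when the original labels carry no meaning.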

Appending dataframes but with different columns

There can be scenarios where we need to append DataFrames that have different column names. In that case we can preprocess the columns first and then append the DataFrames using a for loop and the concat method.

Let us consider a scenario. Here we have three DataFrames, each with different column names. We first collect all the column names, then use reindex() inside the for loop so that every DataFrame has all the columns, and append each one to a list. Finally, we use concat() to concatenate all the DataFrames.

Python
import pandas as pd

# Creating 3 DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

# List of DataFrames
dfs = [df1, df2, df3]

# List to store DataFrames for concatenation
df_list = []

# Get all columns across the DataFrames (sorted for a stable column order)
all_columns = sorted(set().union(*(df.columns for df in dfs)))

# For loop to append DataFrames, reindexing them to the same column set
for df in dfs:
    df = df.reindex(columns=all_columns)  # Reindex with all columns
    df_list.append(df)

# Concatenate all DataFrames
result = pd.concat(df_list, ignore_index=True)

print(result)

Output:

[Screenshot of the printed output]

From the output we can see that where a DataFrame does not have a particular column, the corresponding rows are filled with NaN.
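As a point of comparison, pd.concat() also aligns columns on its own. With its default join='outer' it takes the union of the columns and fills the gaps with NaN, while join='inner' keeps only the columns common to every DataFrame. The following is a minimal sketch using the same three DataFrames; the names `outer` and `inner` are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

# Default join='outer': union of columns, gaps filled with NaN
outer = pd.concat([df1, df2, df3], ignore_index=True)
print(outer.columns.tolist())  # ['A', 'B', 'C', 'D']

# join='inner': keep only the columns shared by every frame
inner = pd.concat([df1, df2, df3], ignore_index=True, join='inner')
print(inner.columns.tolist())  # ['A']
```

Reindexing in the loop is still useful when a fixed column set or column order is wanted, for example when some expected columns may be missing from every input.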

Append Pandas DataFrames Using for Loop - Examples

Example 1: Suppose we have a list of DataFrames. We iterate over the list and, on each iteration, use the concat method to add the current DataFrame to the running result.

Python
import pandas as pd

# Create sample DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# List of DataFrames to concatenate
dfs = [df1, df2]

# Initialize an empty DataFrame to concatenate into
result = pd.DataFrame()

# For loop to concatenate DataFrames
for df in dfs:
    result = pd.concat([result, df], ignore_index=True, sort=False)

print(result)

Output:

[Screenshot of the printed output]

From the output we can see that all the columns are present in the final DataFrame. Values that do not exist for a particular column are assigned NaN. This method is suitable only for small datasets, since concat() creates a new DataFrame on every iteration and consumes much more memory. For larger data, we can instead use reindex() to preprocess the DataFrames, collect them in a list, and call concat() once.
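The cost of concatenating inside the loop can be illustrated by counting rows. The sketch below is not a benchmark. It assumes that the size of each newly built result is a rough proxy for the copying work, and the counter names are made up for this illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.random((10, 5))) for _ in range(100)]

# Pattern 1: concat inside the loop. Each call builds a brand-new frame
# holding every row seen so far, so earlier rows are copied again and again.
result = frames[0]
rows_copied_in_loop = 0
for df in frames[1:]:
    result = pd.concat([result, df], ignore_index=True)
    rows_copied_in_loop += len(result)

# Pattern 2: one concat at the end writes each row exactly once.
combined = pd.concat(frames, ignore_index=True)
rows_copied_once = len(combined)

print(rows_copied_in_loop)  # 50490
print(rows_copied_once)     # 1000
```

Both patterns produce the same 1000-row result, but the loop version writes roughly fifty times as many rows along the way, and the gap grows quadratically with the number of DataFrames.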

Example 2: Here we have three DataFrames. We iterate over them and append each one to a list. Lastly, we use concat() to combine all the DataFrames in the list.

Python
import pandas as pd

# Create sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Append DataFrames to a list
df_list = []
for df in (df1, df2, df3):
    df_list.append(df)

# Concatenate all DataFrames in the list
result = pd.concat(df_list, ignore_index=True)

print(result)

Output:

[Screenshot of the printed output]

So here we have appended all the DataFrames to a list using the list's append() method and then used concat() to combine them.

