
Compare Two Csv Files Using Python

Last Updated : 30 Apr, 2025

We are given two CSV files and our task is to compare them and report their differences in Python. In this article, we will look at some commonly used methods for comparing two CSV files and printing the differences.

file1.csv contains

Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago

file2.csv contains

Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco

Using compare()

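The compare() method in pandas compares two DataFrames and returns only the differences. Both DataFrames must have the same row and column labels; rows are matched by index label, not by content. The result keeps only the rows and columns where values differ, which makes it a good fit for structured data whose rows line up.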


Python
import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Compare DataFrames
res = df1.compare(df2)
print(res)

Output

[Figure: Using compare()]

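Explanation: The code reads file1.csv and file2.csv into two DataFrames, df1 and df2, and then calls compare() to find the cells that differ. For the sample files, the first row (John) is identical and is left out, while the second and third rows are reported with the df1 value under self and the df2 value under other.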
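As a minimal sketch of the behaviour described above, the sample data can be inlined with io.StringIO so that the snippet runs without files on disk. The names FILE1, FILE2, diff and changed_rows are illustrative and not part of the original example.

```python
import io

import pandas as pd

# Sample data from the article, inlined so no files are needed.
FILE1 = """Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
"""
FILE2 = """Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
"""

df1 = pd.read_csv(io.StringIO(FILE1))
df2 = pd.read_csv(io.StringIO(FILE2))

# compare() matches rows by index label: row 1 of df1 against row 1 of df2.
diff = df1.compare(df2)

# Row 0 (John) is identical in both frames, so only rows 1 and 2 remain.
changed_rows = list(diff.index)
print(changed_rows)

# Columns form a two-level index: (column name, "self" or "other").
print(diff[("Name", "self")].tolist())   # values from df1
print(diff[("Name", "other")].tolist())  # values from df2
```

Because the comparison is positional, Michael in file1.csv is compared with Emma in file2.csv rather than with Michael's own row. If the two files can have different row counts or row order, compare() raises a ValueError or reports shifted rows, and a line-based method may be more suitable.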

Using set operations

This method reads all lines of each file and stores them in a set. The set difference (a - b) then quickly identifies lines that are present in one file but not in the other. Because sets are unordered and hold each value once, line order and duplicate lines are not taken into account.

Python
with open('file1.csv') as f1, open('file2.csv') as f2:
    a = set(f1.readlines())
    b = set(f2.readlines())

print(a - b)  # lines only in file1.csv
print(b - a)  # lines only in file2.csv

Output

[Figure: Using set operations]

Explanation: The code opens file1.csv and file2.csv, reads all lines of each and stores them in the sets a and b. The difference a - b contains the lines present in file1.csv but not in file2.csv; the reverse difference, b - a, contains the lines present only in file2.csv.
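One detail worth noting: readlines() keeps the trailing newline of each line, so a final line without a newline will not match an otherwise identical line that has one. The sketch below, using a hypothetical helper named line_differences, strips line endings before building the sets and returns both directions as sorted lists so the result is reproducible.

```python
def line_differences(lines_a, lines_b):
    """Return (only_in_a, only_in_b) as sorted lists of lines."""
    # Strip line endings so "x\n" and "x" count as the same line.
    a = {line.rstrip("\r\n") for line in lines_a}
    b = {line.rstrip("\r\n") for line in lines_b}
    return sorted(a - b), sorted(b - a)


# Sample data from the article; the last line of file2 has no newline.
file1 = [
    "Name,Age,City\n",
    "John,25,New York\n",
    "Emily,30,Los Angeles\n",
    "Michael,40,Chicago\n",
]
file2 = [
    "Name,Age,City\n",
    "John,25,New York\n",
    "Michael,45,Chicago\n",
    "Emma,35,San Francisco",
]

only_in_1, only_in_2 = line_differences(file1, file2)
print(only_in_1)
print(only_in_2)
```

With real files, the same helper can be called on open file objects, for example line_differences(f1, f2) inside a with block, since iterating over a file yields its lines.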

Using difflib

Python’s difflib module provides detailed differences between files, similar to Unix's diff command. It can generate unified or context diffs showing what was added, removed, or changed.

Python
import difflib

with open('file1.csv') as f1, open('file2.csv') as f2:
    d = difflib.unified_diff(
        f1.readlines(),
        f2.readlines(),
        fromfile='file1.csv',
        tofile='file2.csv',
    )
    for line in d:
        print(line, end='')

Output

[Figure: Using difflib]

Explanation: It opens file1.csv and file2.csv, reads their contents, and uses difflib.unified_diff() to generate a line-by-line comparison. The output shows added, removed or changed lines between the two files in a unified diff format.
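If the diff needs to be processed rather than printed, the lines yielded by unified_diff() can be sorted by their prefix. The following is a minimal sketch under that assumption; the helper name changed_lines is hypothetical, and the sample rows are inlined so the snippet runs without files.

```python
import difflib
from itertools import islice


def changed_lines(a_lines, b_lines):
    """Return (removed, added) lines according to a unified diff."""
    diff = difflib.unified_diff(
        a_lines, b_lines, fromfile="file1.csv", tofile="file2.csv", lineterm=""
    )
    removed, added = [], []
    # The first two yielded lines are the '---' / '+++' file headers; skip them.
    for line in islice(diff, 2, None):
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
        # Lines starting with '@@' (hunk headers) or ' ' (context) are ignored.
    return removed, added


file1 = [
    "Name,Age,City",
    "John,25,New York",
    "Emily,30,Los Angeles",
    "Michael,40,Chicago",
]
file2 = [
    "Name,Age,City",
    "John,25,New York",
    "Michael,45,Chicago",
    "Emma,35,San Francisco",
]

removed, added = changed_lines(file1, file2)
print(removed)
print(added)
```

Unlike the set-based approach, difflib preserves line order, so a row that merely moved to a different position also shows up as a removal plus an addition.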

