How to eliminate repeated lines in a Python function?


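In this article, we will discuss how to remove repeated lines from a Python function. We first remove duplicate lines from the file that contains the function, and then look at ways to refactor a function so that it no longer repeats similar statements. If the file containing the program is small and only has a few lines, we can remove repeated lines manually. However, when dealing with huge files, we need a Python program to do so.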

Using the File Handling Methods

Python has built-in methods for creating, opening, and closing files, which makes handling files easier. Using the methods, we can also perform several file actions, such as reading, writing, and appending data (while files are open).

To remove duplicate lines from a text file or a Python file that contains a function, we use file handling methods in Python. If the file is referred to by its name only, it must be in the same directory as the ".py" file that contains the Python program. Otherwise, we need to pass its full path.

Algorithm

The following is an approach to eliminate repeated lines in a Python function -

First, open the input file that contains the function (or text with duplicate lines) in read-only mode.
Open the output file in write mode to hold the result.
Read the input file line by line and compute a hash value for each line. Storing the hash value instead of the entire line can take less memory when the lines are long or the file is large.
Use the "not in" operator to check whether the hash value is already in the set of hash values seen so far.
If the hash value is not in the set, write the line to the output file and add the hash value to the set. Otherwise, skip the line.
When all lines have been processed, the output file contains every distinct line from the input file exactly once, in the original order.
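
The steps above can be sketched on an in-memory list of lines, without touching any files. The helper name `unique_lines` and the sample data are assumptions made for illustration and are not part of the original program -

```python
import hashlib

def unique_lines(lines):
   """Return the lines in their original order, without repeats (hypothetical helper)."""
   seen_hashes = set()   # hash values of the lines kept so far
   result = []
   for line in lines:
      # hash the line without its trailing whitespace and newline
      digest = hashlib.md5(line.rstrip().encode("utf-8")).hexdigest()
      if digest not in seen_hashes:
         seen_hashes.add(digest)
         result.append(line)
   return result

# sample data, assumed for illustration
sample = ["a = 1\n", "a = 1\n", "b = 2\n", "a = 1\n"]
print(unique_lines(sample))
```

Because the trailing whitespace is stripped before hashing, two lines that differ only in trailing spaces are treated as duplicates in this sketch.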

Here, the input file, i.e., "File.txt", contains the following data -

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.

Example: Remove Duplicates Using Hash

In the following example, we remove duplicate lines from a file by computing an MD5 hash for each line. A line is written to the output file only if its hash has not been seen before -

import hashlib

# paths of the input and output files
# (raw strings, so that the backslashes are not read as escape sequences)
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# holds the hash values of the lines that were already seen
lines_present = set()

# open the input file in read mode and the output file in write mode;
# the "with" statement closes both files when the block ends
with open(InFile, "r") as The_Input_File, open(OutFile, "w") as The_Output_File:
   for l in The_Input_File:
      # remove trailing blank spaces and the newline before hashing,
      # then use the hashlib library to compute the hash value of the line
      hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
      if hash_value not in lines_present:
         The_Output_File.write(l)
         lines_present.add(hash_value)

Output

The following output shows that all the repeated lines from the input file are eliminated -

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

Example: Remove Duplicates Using Set Comparison

The following is another example to eliminate repeated lines in a Python function by storing each line in a set and checking for repetition. Only lines that are not present in the set are written to the output file -

# paths of the input and output files
# (raw strings, so that the backslashes are not read as escape sequences)
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# holds the lines that were already seen
lines_present = set()

# open the input file in read mode and the output file in write mode;
# the "with" statement closes both files when the block ends
with open(InFile, "r") as The_Input_File, open(OutFile, "w") as The_Output_File:
   for l in The_Input_File:
      # remove trailing blank spaces and the newline before comparing
      stripped_line = l.rstrip()
      if stripped_line not in lines_present:
         The_Output_File.write(l)
         lines_present.add(stripped_line)

Output

We can see in the following output that all the repeated lines from the input file are eliminated in the output file, which contains the unique data as shown below -

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
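
The file paths used above will not exist on most machines. As a minimal sketch that can be run anywhere, the same idea can be wrapped in a function and tried on a temporary file. The function name `remove_duplicate_lines` and the temporary file contents are assumptions made for illustration -

```python
import os
import tempfile

def remove_duplicate_lines(in_path, out_path):
   """Copy in_path to out_path, keeping only the first copy of each line."""
   seen = set()   # lines (without trailing whitespace) written so far
   with open(in_path, "r") as src, open(out_path, "w") as dst:
      for line in src:
         key = line.rstrip()
         if key not in seen:
            dst.write(line)
            seen.add(key)

# try the function on a throwaway file in a temporary directory
with tempfile.TemporaryDirectory() as folder:
   in_path = os.path.join(folder, "File.txt")
   out_path = os.path.join(folder, "pre.txt")
   with open(in_path, "w") as f:
      f.write("Skip the line.\nSkip the line.\neliminate repeated lines.\n")
   remove_duplicate_lines(in_path, out_path)
   with open(out_path, "r") as f:
      result = f.read()

print(result)
```

Passing the paths as arguments keeps the function independent of any particular directory layout.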

Using Loops to Remove Repetition

If you are repeating similar operations multiple times, consider using a loop instead. This helps to remove hard-coded repetition.

Example

Here, the same print statement is written multiple times. We can replace it using a loop -

# Repetitive code
def greet():
   print("Hello")
   print("Hello")
   print("Hello")
greet()

We get the output as shown below -

Hello
Hello
Hello

Now let us improve it using a for loop -

# Improved version using loop
def greet():
   for _ in range(3):
      print("Hello")
greet()

The output obtained is as follows -

Hello
Hello
Hello
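
Once the repetition lives in a loop, the number of repetitions can become a parameter instead of a hard-coded value. The following is a small sketch of that idea; the `times` parameter is an assumption added for illustration -

```python
def greet(times=3):
   # repeat the same statement "times" times
   for _ in range(times):
      print("Hello")

greet(2)   # prints "Hello" twice
```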

Using Helper Functions

When a block of code is repeated with slight variations, you can move it into a separate helper function and call it multiple times.

Example

Here, the same logic for formatting names is repeated. Instead, we can refactor it into a function -

# Repetitive code
def process_names():
   print("Hello, " + "Rohan".title())
   print("Hello, " + "Geet".title())
   print("Hello, " + "Nancy".title())

process_names()

Following is the output obtained -

Hello, Rohan
Hello, Geet
Hello, Nancy

Instead of repeating the same name-formatting logic multiple times, we can refactor the code into a reusable helper function as shown below -

def greet(name):
   print("Hello, " + name.title())

def process_names():
   for person in ["Rohan", "Geet", "Nancy"]:
      greet(person)

process_names()

The result produced is as follows -

Hello, Rohan
Hello, Geet
Hello, Nancy
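
A helper function is often easier to reuse when it returns a value instead of printing it, because the caller can then decide what to do with the result. The following sketch is one possible variation of the code above; the name `format_greeting` and the `names` parameter are assumptions made for illustration -

```python
def format_greeting(name):
   # the formatting logic lives in one place
   return "Hello, " + name.title()

def process_names(names):
   # build one greeting per name
   return [format_greeting(person) for person in names]

for line in process_names(["rohan", "geet", "nancy"]):
   print(line)
```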

Using Data Structures

You can use lists, dictionaries, or other data structures to hold values and loop over them, instead of repeating similar code for each value.

Example

In this example, we are writing repeated lines to process items -

# Repetitive code
def show_prices():
   print("Price of apple: $1")
   print("Price of banana: $0.5")
   print("Price of cherry: $2")

show_prices()

The output obtained is -

Price of apple: $1
Price of banana: $0.5
Price of cherry: $2

Instead of writing repeated lines to process items, use a list as shown in the following example -

def show_prices():
   prices = [("apple", 1), ("banana", 0.5), ("cherry", 2)]
   for fruit, price in prices:
      print(f"Price of {fruit}: ${price}")

show_prices()

Following is the output of the above code -

Price of apple: $1
Price of banana: $0.5
Price of cherry: $2

We can refactor the code using a dictionary as shown below -

def show_prices():
   prices = {
      "apple": 1,
      "banana": 0.5,
      "cherry": 2
   }
   for fruit, price in prices.items():
      print(f"Price of {fruit}: ${price}")

show_prices()

We can see the output as follows -

Price of apple: $1
Price of banana: $0.5
Price of cherry: $2
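
One benefit of keeping the values in a data structure is that adding an item changes only the data, not the code that processes it. The following sketch illustrates this; the function name `price_lines` and the extra "mango" entry are assumptions made for illustration -

```python
def price_lines(prices):
   # one formatted line per (fruit, price) pair
   return [f"Price of {fruit}: ${price}" for fruit, price in prices.items()]

prices = {"apple": 1, "banana": 0.5, "cherry": 2}
prices["mango"] = 3   # hypothetical extra item: only the data changes

print("\n".join(price_lines(prices)))
```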

Using List Comprehension

If you are creating lists with repeated patterns, list comprehensions can help to reduce repetition and make your code shorter.

Example

Below is a basic example where we manually append the squares of numbers 1 to 4 -

# Repetitive way
squares = []
squares.append(1**2)
squares.append(2**2)
squares.append(3**2)
squares.append(4**2)
print(squares)

The output is -

[1, 4, 9, 16]

We can write the same logic in a cleaner way using list comprehension as shown below -

# Cleaner version
squares = [x**2 for x in range(1, 5)]
print(squares)

Following is the output of the above code -

[1, 4, 9, 16]
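
A list comprehension can also carry a condition, which replaces a loop that contains an if statement. As a small sketch (the variable name `even_squares` is chosen for illustration), the following keeps only the squares of the even numbers from 1 to 8 -

```python
# squares of the even numbers between 1 and 8
even_squares = [x**2 for x in range(1, 9) if x % 2 == 0]
print(even_squares)
```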

Sarika Singh

Updated on: 2025-05-09T15:25:12+05:30
