
Eliminate Repeated Lines in a Python Function
In this article, we will discuss how to delete repeated lines in a Python function. If the file containing the program is small and has only a few lines, we can remove the repeated lines manually. For large files, however, it is more practical to let a Python program do it.
Using the File Handling Methods
Python has a built-in open() function and file object methods such as read(), write(), and close(), which make handling files easier. With them, we can create, open, and close files and, while a file is open, read, write, or append data.
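As a minimal sketch of these operations, the following code writes a file, appends to it, and reads it back line by line. The file name demo.txt and the function name file_modes_demo are hypothetical, and the file is created inside a temporary directory so that nothing else on disk is touched.

```python
import os
import tempfile

def file_modes_demo():
    # Work inside a temporary directory; "demo.txt" is a hypothetical file name
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.txt")

        # "w" creates the file (or empties an existing one) and writes to it
        with open(path, "w") as f:
            f.write("first line\n")

        # "a" appends to the end of the existing file
        with open(path, "a") as f:
            f.write("second line\n")

        # "r" opens the file for reading; iterating over it yields one line at a time
        with open(path, "r") as f:
            return [line.rstrip("\n") for line in f]

print(file_modes_demo())
```

The with statement closes each file automatically, even if an error occurs, so close() does not have to be called by hand.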
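To remove duplicate lines from a text file, or from a Python file that contains a function, we use these file handling methods. If the program refers to the file by a relative name such as "File.txt", the file must be in the current working directory (usually the directory the script is run from); otherwise, give the full path to the file, as the examples below do.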
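To illustrate how a relative file name is resolved, this sketch uses os.path.abspath, which joins a relative name with the current working directory. File.txt is the input file name used in this article; the file does not need to exist for the path to be computed.

```python
import os

# A relative name such as "File.txt" is resolved against the current working directory
relative_path = "File.txt"
absolute_path = os.path.abspath(relative_path)
print(absolute_path)

# The resolved path is absolute, so it no longer depends on where the script is run from
print(os.path.isabs(absolute_path))  # True
```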
Algorithm
The following is an approach to eliminate repeated lines in a Python function -
- First, open the input file that contains the function (or the text with duplicate lines) in read-only mode.
- Open the output file in write mode to hold the result.
- Read the input file line by line. For each line, compute its hash value and use the "not in" operator to check whether that hash is already in a set of hashes seen so far.
- If the hash value is not in the set, write the current line to the output file and add the hash value to the set. Storing a fixed-size hash instead of the full line can save memory when the lines are long.
- If the hash value is already in the set, skip that line.
- When every line has been processed, the output file contains each line of the input file exactly once, in its original order.
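The steps above can be sketched as a small function that works on any iterable of lines, which makes it easy to try without creating files. The name remove_repeated_lines is hypothetical; like the hash-based example that follows, it hashes each line with MD5 after stripping trailing whitespace.

```python
import hashlib
from typing import Iterable, List

def remove_repeated_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines in their original order, keeping only the first occurrence of each."""
    seen_hashes = set()   # hash values of the lines already kept
    unique_lines = []
    for line in lines:
        # Hash the line without trailing spaces/newline, so "a\n" and "a" count as the same line
        digest = hashlib.md5(line.rstrip().encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            # First time this line appears: keep it and remember its hash
            unique_lines.append(line)
            seen_hashes.add(digest)
        # Otherwise the line is a repeat and is skipped
    return unique_lines

sample = [
    "Welcome to TutorialsPoint.\n",
    "Welcome to TutorialsPoint.\n",
    "Python programming language in this file.\n",
    "eliminate repeated lines.\n",
    "eliminate repeated lines.\n",
    "Skip the line.\n",
]
print("".join(remove_repeated_lines(sample)), end="")
```

Because a file object is itself an iterable of lines, the same function could be called as remove_repeated_lines(open_file) and its result passed to writelines().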
Here, the input file, File.txt, contains the following data -
Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.
Example: Remove Duplicates Using Hash
In the following example, we remove duplicate lines from a file by computing an MD5 hash for each line (after stripping trailing whitespace). A line is written to the output file only if its hash has not been seen before -
import hashlib

# paths of the input and output files (raw strings, so "\U" is not read as an escape sequence)
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# hash values of the lines that have already been seen
lines_present = set()

# open the input file in read mode and the output file in write mode;
# the with statement closes both files automatically
with open(InFile, "r") as in_file, open(OutFile, "w") as The_Output_File:
    for l in in_file:
        # remove trailing spaces and the newline, then compute the MD5 hash of the line
        hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
        # write the line only if its hash has not been seen before
        if hash_value not in lines_present:
            The_Output_File.write(l)
            lines_present.add(hash_value)
Output
In the following output (the contents of pre.txt), you can observe that all the repeated lines from the input file have been eliminated -
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
Example: Remove Duplicates Using Set Comparison
The following is another example that eliminates repeated lines by storing each line itself in a set and checking for repetition. Only lines that are not already in the set are written to the output file -
# paths of the input and output files (raw strings, so "\U" is not read as an escape sequence)
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# set holding the lines that have already been seen
lines_present = set()

# open the input file in read mode and the output file in write mode;
# the with statement closes both files automatically
with open(InFile, "r") as in_file, open(OutFile, "w") as The_Output_File:
    for l in in_file:
        # compare lines without trailing spaces and the newline character
        line = l.rstrip()
        # write the line only if it is not already in the set
        if line not in lines_present:
            The_Output_File.write(l)
            lines_present.add(line)
Output
The following output shows that all the repeated lines from the input file have been eliminated, so the output file contains only unique lines -
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
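Another option, not used in the original examples, is dict.fromkeys: dictionary keys are unique and keep their insertion order (guaranteed since Python 3.7), so building a dictionary from the lines drops repeats while preserving the order of first appearance. This is a minimal sketch; the function name unique_lines_in_order is hypothetical, and it compares whole lines without their line endings.

```python
def unique_lines_in_order(text: str) -> str:
    """Return text with repeated lines removed, keeping the first occurrence of each."""
    # splitlines() removes the line endings, so a last line without a newline still matches
    lines = text.splitlines()
    # dict keys are unique and keep insertion order (Python 3.7+)
    unique = dict.fromkeys(lines)
    if not unique:
        return ""
    # put one newline back after every kept line
    return "\n".join(unique) + "\n"

data = "Welcome\nWelcome\nPython\nSkip\n"
print(unique_lines_in_order(data), end="")
```

To apply it to a file, read the whole input with in_file.read() and write the returned string to the output file. This reads the entire file into memory, so the line-by-line approach above remains the safer choice for very large files.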
Using Loops to Remove Repetition
If you are repeating similar operations multiple times, consider using a loop instead. This helps to remove hard-coded repetition.
Example
Here, the same print statement is written multiple times. We can replace it using a loop -
# Repetitive code
def greet():
    print("Hello")
    print("Hello")
    print("Hello")

greet()
We get the output as shown below -
Hello
Hello
Hello
Now let us improve it using a for loop -
# Improved version using loop
def greet():
    for _ in range(3):
        print("Hello")

greet()
The output obtained is as follows -
Hello
Hello
Hello
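If the number of repetitions may change, it can be passed as a parameter instead of being fixed inside the loop. The parameter name times and its default value of 3 are illustrative choices, not part of the original example.

```python
def greet(times=3):
    # Repeat the greeting as many times as requested
    for _ in range(times):
        print("Hello")

greet()   # prints "Hello" three times
greet(2)  # prints "Hello" two times
```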
Using Helper Functions
When a block of code is repeated with slight variations, you can move it into a separate helper function and call it multiple times.
Example
Here, the same name-formatting logic is repeated for every name -
# Repetitive code
def process_names():
    print("Hello, " + "Rohan".title())
    print("Hello, " + "Geet".title())
    print("Hello, " + "Nancy".title())

process_names()
Following is the output obtained -
Hello, Rohan
Hello, Geet
Hello, Nancy
Instead of repeating the same name-formatting logic multiple times, we can refactor the code into a reusable helper function as shown below -
def greet(name):
    print("Hello, " + name.title())

def process_names():
    for person in ["Rohan", "Geet", "Nancy"]:
        greet(person)

process_names()
The result produced is as follows -
Hello, Rohan
Hello, Geet
Hello, Nancy
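When the repeated lines differ in more than one value, each difference can become a parameter of the helper. In this sketch, the greeting parameter, its default value "Hello", and the sample greetings are assumptions added for illustration.

```python
def greet(name, greeting="Hello"):
    # Only the parts that vary between calls are parameters
    return greeting + ", " + name.title()

def process_names():
    people = [("rohan", "Hello"), ("geet", "Hi"), ("nancy", "Welcome")]
    for name, greeting in people:
        print(greet(name, greeting))

process_names()
```

Here the helper returns the message instead of printing it, so the caller decides whether to print it, store it, or test it.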
Using Data Structures
You can use lists, dictionaries, or other data structures to hold values and loop over them, instead of repeating similar code for each value.
Example
In this example, we are writing repeated lines to process items -
# Repetitive code
def show_prices():
    print("Price of apple: $1")
    print("Price of banana: $0.5")
    print("Price of cherry: $2")

show_prices()
The output obtained is -
Price of apple: $1
Price of banana: $0.5
Price of cherry: $2
Instead of writing repeated lines to process items, use a list as shown in the following example -
def show_prices():
    prices = [("apple", 1), ("banana", 0.5), ("cherry", 2)]
    for fruit, price in prices:
        print(f"Price of {fruit}: ${price}")

show_prices()
Following is the output of the above code -
Price of apple: $1
Price of banana: $0.5
Price of cherry: $2
We can refactor the code using a dictionary as shown below -
def show_prices():
    prices = {
        "apple": 1,
        "banana": 0.5,
        "cherry": 2
    }
    for fruit, price in prices.items():
        print(f"Price of {fruit}: ${price}")

show_prices()
We can see the output as follows -
Price of apple: $1
Price of banana: $0.5
Price of cherry: $2
Using List Comprehension
If you are creating lists with repeated patterns, list comprehensions can help to reduce repetition and make your code shorter.
Example
Below is a basic example where we manually append the squares of numbers 1 to 4 -
# Repetitive way
squares = []
squares.append(1**2)
squares.append(2**2)
squares.append(3**2)
squares.append(4**2)
print(squares)
The output is -
[1, 4, 9, 16]
We can write the same logic in a cleaner way using list comprehension as shown below -
# Cleaner version
squares = [x**2 for x in range(1, 5)]
print(squares)
Following is the output of the above code -
[1, 4, 9, 16]
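A list comprehension can also include a condition, which replaces a loop that appends only some of the values. The range 1 to 10 and the even-number filter below are illustrative choices.

```python
# Loop version: append the squares of the even numbers from 1 to 10
even_squares = []
for x in range(1, 11):
    if x % 2 == 0:
        even_squares.append(x**2)

# Equivalent list comprehension with an if clause
even_squares_lc = [x**2 for x in range(1, 11) if x % 2 == 0]

print(even_squares_lc)  # [4, 16, 36, 64, 100]
```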