Remove URLs from string in Python
Last Updated: 24 Jan, 2024
A regular expression (regex) is a sequence of characters that defines a search pattern in text. To remove URLs from a string in Python, you can use regular expressions through the re module, or parse candidate words with the urllib.parse module. Both modules are part of the Python standard library, so nothing needs to be installed. In this article, we will see how we can remove URLs from a string in Python with each approach.
Python Remove URLs from a String
Below are the ways by which we can remove URLs from a string in Python:
- Using the re.sub() function
- Using the re.findall() function
- Using the re.search() function
- Using the urllib.parse module
Python Remove URLs from String Using re.sub() function
In this example, the code defines a function 'remove_urls' to find URLs in text and replace them with a placeholder [URL REMOVED], using regular expressions for pattern matching and the re.sub() method for substitution.
Python3
import re

def remove_urls(text, replacement_text="[URL REMOVED]"):
    # Define a regex pattern to match URLs
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    # Use the sub() method to replace URLs with the specified replacement text
    text_without_urls = url_pattern.sub(replacement_text, text)
    return text_without_urls

# Example:
input_text = "Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/"
output_text = remove_urls(input_text)
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text)
Output:
Original Text:
Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/
Text with URLs Removed:
Visit on GeeksforGeeks Website: [URL REMOVED]
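One thing to watch for with the \S+ pattern is that it also consumes punctuation that directly follows a URL, such as a full stop at the end of a sentence. The sketch below is one possible way to handle that, not part of the original example: it passes a function to sub() instead of a string and puts the trimmed punctuation back. The names remove_urls_keep_punctuation and TRAILING_PUNCTUATION are hypothetical, and the set of characters treated as punctuation is an assumption you may need to adjust for your data.

```python
import re

# Same pattern as in the re.sub() example.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

# Assumption: these characters are sentence punctuation when they end a match.
TRAILING_PUNCTUATION = '.,;:!?)]}\'"'

def remove_urls_keep_punctuation(text, replacement_text="[URL REMOVED]"):
    def replace(match):
        url = match.group(0)
        stripped = url.rstrip(TRAILING_PUNCTUATION)
        # Put back whatever punctuation was trimmed from the end of the match.
        return replacement_text + url[len(stripped):]
    return URL_PATTERN.sub(replace, text)

print(remove_urls_keep_punctuation("Read the docs at https://www.geeksforgeeks.org/python/."))
# Read the docs at [URL REMOVED].
```

A URL that legitimately ends in one of these characters would lose it under this approach, so treat the character set as a trade-off rather than a rule.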
Remove URLs from String Using re.findall() function
In this example, the function 'remove_urls_findall' collects every URL in the given text with the re.findall() method and then replaces each one with the replacement text "[URL REMOVED]" using str.replace().
Python3
import re

def remove_urls_findall(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    urls = url_pattern.findall(text)
    for url in urls:
        text = text.replace(url, replacement_text)
    return text

# Example:
input_text = "Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/"
output_text_findall = remove_urls_findall(input_text)
print("\nUsing re.findall():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_findall)
Output:
Using re.findall():
Original Text:
Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/
Text with URLs Removed:
Check out the latest Python tutorials on GeeksforGeeks: [URL REMOVED]
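Because the findall() approach replaces plain substrings, the order of replacement can matter: if one URL in the text is a prefix of another, replacing the shorter one first leaves the tail of the longer one behind. The following is a minimal sketch of one way around this, assuming it is acceptable to process the URLs longest-first. The function name remove_urls_findall_safe is hypothetical and not part of the original example.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls_findall_safe(text, replacement_text="[URL REMOVED]"):
    # set() drops duplicate URLs; sorting longest-first means a URL that is a
    # prefix of another one cannot partially overwrite the longer URL.
    urls = sorted(set(URL_PATTERN.findall(text)), key=len, reverse=True)
    for url in urls:
        text = text.replace(url, replacement_text)
    return text

sample = "Home: https://example.com and docs: https://example.com/docs"
print(remove_urls_findall_safe(sample))
# Home: [URL REMOVED] and docs: [URL REMOVED]
```

With the replacements applied in the order findall() returns them, the same input would end in "[URL REMOVED]/docs".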
Remove URLs from String in Python Using re.search() function
In this example, the function 'remove_urls_search' calls re.search() in a loop. Each iteration finds the first remaining URL in the text and splices the replacement text "[URL REMOVED]" in its place, and the loop stops when no match is left.
Python3
import re

def remove_urls_search(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    while True:
        match = url_pattern.search(text)
        if not match:
            break
        text = text[:match.start()] + replacement_text + text[match.end():]
    return text

# Example:
input_text = "Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks"
output_text_search = remove_urls_search(input_text)
print("\nUsing re.search():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_search)
Output:
Using re.search():
Original Text:
Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks
Text with URLs Removed:
Visit our website at [URL REMOVED] for more information. Follow us on Twitter: @geeksforgeeks
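The loop above restarts the search from the beginning of the text after every replacement, so it would never terminate if the replacement text itself matched the URL pattern. As a hedged alternative, the sketch below uses the optional start position accepted by the search() method of a compiled pattern, so each character is scanned once and the replacement text is never searched. The function name remove_urls_search_from is hypothetical.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls_search_from(text, replacement_text="[URL REMOVED]"):
    pieces = []
    position = 0
    while True:
        # Resume searching just after the previous match.
        match = URL_PATTERN.search(text, position)
        if match is None:
            break
        pieces.append(text[position:match.start()])
        pieces.append(replacement_text)
        position = match.end()
    # Keep whatever follows the last URL.
    pieces.append(text[position:])
    return "".join(pieces)

print(remove_urls_search_from("Visit https://geeksforgeeks.org/ today", "<www.removed>"))
# Visit <www.removed> today
```

Collecting the pieces in a list and joining them once also avoids rebuilding the whole string on every match.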
Remove URLs from String Using urllib.parse
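In this example, the function 'remove_urls_urllib' splits the text on whitespace, parses each word with urlparse() from urllib.parse, and replaces any word that has both a scheme and a network location with the replacement text "[URL REMOVED]". Because the words are rejoined with single spaces, the original spacing is not preserved, and URLs written without a scheme (such as www.example.com) are not detected.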
Python3
# Using urllib.parse
from urllib.parse import urlparse

def remove_urls_urllib(text, replacement_text="[URL REMOVED]"):
    words = text.split()
    for i, word in enumerate(words):
        parsed_url = urlparse(word)
        if parsed_url.scheme and parsed_url.netloc:
            words[i] = replacement_text
    return ' '.join(words)

# Example:
input_text = "Check out the GeeksforGeeks website at https://www.geeksforgeeks.org/ for programming tutorials."
output_text_urllib = remove_urls_urllib(input_text)
print("Using urllib.parse:")
print("Text with URLs Removed:")
print(output_text_urllib)
Output:
Using urllib.parse:
Text with URLs Removed:
Check out the GeeksforGeeks website at [URL REMOVED] for programming tutorials.
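If the text contains tabs, newlines, or repeated spaces that should survive, the split-and-join step can be swapped for re.split() with a capturing group, which keeps the whitespace runs as separate tokens. This is only a sketch under that assumption, and the function name remove_urls_urllib_keep_spacing is hypothetical. It also skips tokens that urlparse() rejects with a ValueError, which can happen for malformed bracketed hosts.

```python
import re
from urllib.parse import urlparse

def remove_urls_urllib_keep_spacing(text, replacement_text="[URL REMOVED]"):
    # The capturing group makes re.split() return the whitespace runs too,
    # so joining the tokens back together restores the original spacing.
    tokens = re.split(r'(\s+)', text)
    for i, token in enumerate(tokens):
        try:
            parsed = urlparse(token)
        except ValueError:
            # Leave tokens that cannot be parsed as URLs untouched.
            continue
        if parsed.scheme and parsed.netloc:
            tokens[i] = replacement_text
    return "".join(tokens)

print(remove_urls_urllib_keep_spacing("Docs:\thttps://www.geeksforgeeks.org/\nNext  line"))
```

As with the version above, a URL is only recognised when it is a whole whitespace-delimited token with an explicit scheme.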