
Python | Sorting URL on basis of Top Level Domain

Last Updated : 11 May, 2020
Given a list of URLs, the task is to sort the list by top-level domain. A top-level domain (TLD) is a domain at the highest level of the hierarchical Domain Name System of the Internet, for example org, com, or edu. Sorting by TLD is mostly useful when scraping pages and ordering the collected URLs by their top-level domain, and it makes a handy snippet for such projects.
Input :
url = ["https://www.isb.edu", "www.google.com", 
"http://cyware.com", "https://www.gst.in", 
"https://www.coursera.org", "https://www.create.net", 
"https://www.ontariocolleges.ca"]

Output :
['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu', 
'https://www.gst.in', 'https://www.create.net',
 'https://www.coursera.org']

Explanation:
The TLDs of the URLs in the output list are in sorted order:
['.ca','.com','.com','.edu','.in','.net','.org']

Below are some ways to do the above task.

Method 1: Using sorted

Split each URL on '.' and pass the last piece (the TLD) to sorted() as the sort key.
#Python code to sort the URL in the list based on the top-level domain.

#Url list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
 "https://www.gst.in", "https://www.coursera.org",
 "https://www.create.net", "https://www.ontariocolleges.ca"]

#Function that returns the TLD (the text after the last dot) of a URL
def tld(url):
    return url.split('.')[-1]

#Using sorted and calling function
Output = sorted(Input,key=tld)

#Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)
Output :
Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']
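
Splitting the whole string on '.' only works when the URL ends with the host name. A URL with a path, such as one ending in "/index.html", would yield "html" instead of the TLD. The following is a minimal sketch of a more defensive key function, assuming the standard-library urllib.parse module is acceptable. The name tld_of and the example.* URLs are illustrative and not part of the original article.

```python
from urllib.parse import urlsplit

def tld_of(url):
    """Return the last label of the URL's host name (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme or a leading '//' is
    # present, so prefix bare entries such as 'www.example.com/a.b'.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # hostname drops any port and is lowercased; it is None when no host is found.
    host = urlsplit(url).hostname or ""
    # Ignore a trailing dot and keep only the text after the last remaining dot.
    return host.rstrip(".").rsplit(".", 1)[-1]

# Illustrative URLs with a path, a missing scheme, and a port
urls = ["https://www.example.org/index.html",
        "www.example.com/a.b",
        "http://example.edu:8080/x"]

sorted_urls = sorted(urls, key=tld_of)
print(sorted_urls)
```

This sketch still treats the last label as the TLD, so a host such as "example.co.uk" is keyed on "uk" only.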
Method 2: Using lambda

The same key can be written inline as a lambda, which keeps the sort to a single line and avoids defining a separate function.
#Python code to sort the URL in the list based on the top-level domain.

#Url list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
"https://www.gst.in", "https://www.coursera.org",
"https://www.create.net", "https://www.ontariocolleges.ca"]

#Using lambda and sorted 
Output = sorted(Input,key=lambda x: x.split('.')[-1])

#Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)
Output :
Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']
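
The lambda key is not tied to sorted(). As a small sketch, the same key can be passed to list.sort() to sort in place, and reverse=True gives descending TLD order. The list name urls and the example.* entries are placeholders chosen for illustration.

```python
# Illustrative list; the entries are placeholders
urls = ["www.example.org", "www.example.com", "www.example.edu"]

# Sort the list in place by the text after the last dot
urls.sort(key=lambda u: u.split('.')[-1])
ascending = list(urls)
print(ascending)

# Same key, descending TLD order
urls.sort(key=lambda u: u.split('.')[-1], reverse=True)
descending = list(urls)
print(descending)
```

Unlike sorted(), list.sort() returns None and modifies the list itself, so the original order is lost unless it is copied first.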
Method 3: Using reversed

Split each URL on '.', reverse the resulting pieces, and sort on that list. The TLD is compared first, and the remaining pieces break ties between URLs that share a TLD.
#Python code to sort the URL in the list based on the top-level domain.

#Url list initialization
Input = ["https://www.isb.edu", "www.google.com", "http://cyware.com",
"https://www.gst.in", "https://www.coursera.org",
"https://www.create.net", "https://www.ontariocolleges.ca"]

#Key function: the dot-separated pieces in reverse order, so the TLD comes first
def internal(string):
    return list(reversed(string.split('.')))

#Using sorted and calling internal for reversed
Output = sorted(Input, key=internal)

#Printing output
print("Initial list is :")
print(Input)
print("Sorted list according to TLD is :")
print(Output)
Output :
Initial list is :

['https://www.isb.edu', 'www.google.com', 'http://cyware.com',
 'https://www.gst.in', 'https://www.coursera.org', 
'https://www.create.net', 'https://www.ontariocolleges.ca']

Sorted list according to TLD is :

['https://www.ontariocolleges.ca', 'www.google.com', 
'http://cyware.com', 'https://www.isb.edu',
 'https://www.gst.in', 'https://www.create.net', 'https://www.coursera.org']
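
To see how the reversed key differs from a TLD-only key, the sketch below prints the keys themselves and compares the two orderings. The function name reversed_labels and the short host names are made up for illustration.

```python
def reversed_labels(url):
    # Dot-separated pieces in reverse order, e.g. 'www.a.com' -> ['com', 'a', 'www']
    return url.split('.')[::-1]

# Illustrative host names; two of them share the 'com' TLD
urls = ["www.b.com", "www.a.org", "www.a.com"]

keys = [reversed_labels(u) for u in urls]
print(keys)

# TLD-only key: ties keep their input order because Python's sort is stable
by_tld = sorted(urls, key=lambda u: u.split('.')[-1])
print(by_tld)

# Reversed key: ties on the TLD are broken by the next piece
by_labels = sorted(urls, key=reversed_labels)
print(by_labels)
```

With the TLD-only key, "www.b.com" stays ahead of "www.a.com" because it came first in the input. With the reversed key, "www.a.com" moves ahead because "a" sorts before "b".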