Implementing News Parser using Template Method Design Pattern in Python
Last Updated: 25 Oct, 2020
When defining algorithms, programmers often overlook the value of factoring out the steps that different algorithms share. Instead, they write each algorithm from start to end and repeat the same steps in every one. This practice leads to code duplication and makes maintenance harder: even for a small logic change, the programmer has to update the code in several places.
A common example is authentication through social network accounts. The overall process is similar across providers but differs slightly at the implementation level. If you write the algorithm for each provider from start to end without separating the common steps, you duplicate code and make it harder to maintain.
The Template Method design pattern is a behavioral design pattern that addresses this kind of duplication. The shared steps are implemented once in an abstract class, and the algorithms derived from that class reuse them. The abstract class also has a template method that fixes the order in which the steps are called for every derived algorithm. Let's look into the benefits of the pattern.
- It allows a class to control and expose its parts
- It provides great extensibility
- It avoids code duplication
- It eases code maintenance
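As a rough illustration of the idea, here is a minimal sketch based on the social login example mentioned earlier. All names in it (SocialLogin, ProviderALogin, ProviderBLogin and their methods) are hypothetical, and the provider-specific steps are stand-ins rather than real API calls.

```python
from abc import ABC, abstractmethod


class SocialLogin(ABC):
    """Hypothetical base class: the shared login flow lives here."""

    def authenticate(self, username):
        # Template method: fixes the order of the steps.
        token = self.request_token(username)   # provider-specific step
        profile = self.fetch_profile(token)    # provider-specific step
        return self.build_session(profile)     # shared step

    @abstractmethod
    def request_token(self, username):
        """Must be defined by each provider."""

    @abstractmethod
    def fetch_profile(self, token):
        """Must be defined by each provider."""

    def build_session(self, profile):
        # Shared step, written once for every provider.
        return {'user': profile['name'], 'provider': profile['provider']}


class ProviderALogin(SocialLogin):
    def request_token(self, username):
        return 'a-token-' + username  # stand-in for a real API call

    def fetch_profile(self, token):
        return {'name': token.replace('a-token-', ''), 'provider': 'A'}


class ProviderBLogin(SocialLogin):
    def request_token(self, username):
        return username[::-1]  # stand-in: a differently shaped token

    def fetch_profile(self, token):
        return {'name': token[::-1], 'provider': 'B'}


sessions = [login.authenticate('alice')
            for login in (ProviderALogin(), ProviderBLogin())]
print(sessions)
```

Both providers go through the same authenticate() flow; only the two provider-specific steps differ, and build_session() exists in one place.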
News Parser Implementation
Let's implement a news parser to get the latest news from different sites. Here, we consider an RSS feed and an Atom feed as the news sources. Both feed formats are XML-based, with a few differences in XML structure. See the RSS and Atom specifications for the exact structure of each.
Here, our template design pattern consists of two concrete classes, YahooNewsParser and GoogleNewsParser, both derived from an abstract class called AbstractNewsParser. This abstract class contains the template method, print_latest_news(), which calls the primitive operation methods. Some primitive operations are common to both feeds and are defined in the abstract class itself; the others differ per feed and are redefined in the respective concrete classes.
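To make the structural differences concrete, the sketch below parses two tiny hand-written samples with xml.dom.minidom. The sample documents, the example.com URLs and the text_of() helper are made up for illustration; real feeds contain many more elements.

```python
from xml.dom.minidom import parseString

# Hand-written RSS 2.0 sample: entries are <item>, the link is element text.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <item>
    <title>RSS headline</title>
    <link>https://example.com/rss-story</link>
    <guid>rss-1</guid>
    <pubDate>Sun, 25 Oct 2020 10:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

# Hand-written Atom sample: entries are <entry>, the link is an href attribute.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Atom headline</title>
    <link href="https://example.com/atom-story"/>
    <id>atom-1</id>
    <updated>2020-10-25T10:00:00Z</updated>
  </entry>
</feed>"""


def text_of(node, tag):
    """Return the text of the first <tag> descendant, or None if absent or empty."""
    elements = node.getElementsByTagName(tag)
    if elements and elements[0].firstChild is not None:
        return elements[0].firstChild.nodeValue
    return None


rss_item = parseString(RSS_SAMPLE).getElementsByTagName('item')[0]
atom_entry = parseString(ATOM_SAMPLE).getElementsByTagName('entry')[0]

rss_link = text_of(rss_item, 'link')
atom_link = atom_entry.getElementsByTagName('link')[0].getAttribute('href')

print(text_of(rss_item, 'title'), rss_link, text_of(rss_item, 'pubDate'))
print(text_of(atom_entry, 'title'), atom_link, text_of(atom_entry, 'updated'))
```

The same information (title, link, identifier, date) is present in both formats, but under different element names and, for the Atom link, in an attribute rather than in element text.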
Figure: NewsParser diagram
From the above diagram, it is clear that the get_url() and parse_content() primitive operations are redefined in the respective concrete classes, because the URL and the XML structure differ from feed to feed. The other primitive operations, get_raw_content() and content_crop(), are common and are defined in the abstract class itself. The template method, print_latest_news(), is responsible for calling these primitive operations. Let's get into the code implementation.
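Before the full listing, here is a small sketch of how the abc module enforces this split. It uses a trimmed-down copy of AbstractNewsParser; IncompleteParser and the example.com URL are hypothetical and exist only to trigger the error.

```python
import abc


class AbstractNewsParser(metaclass=abc.ABCMeta):
    """Trimmed-down copy of the base class, for illustration only."""

    @abc.abstractmethod
    def get_url(self):
        """Feed-specific: must be overridden."""

    @abc.abstractmethod
    def parse_content(self, content):
        """Feed-specific: must be overridden."""

    def content_crop(self, parsed_content, max_items=3):
        # Common step, inherited unchanged by every concrete parser.
        return parsed_content[:max_items]


class IncompleteParser(AbstractNewsParser):
    # Hypothetical subclass that forgets to define parse_content().
    def get_url(self):
        return 'https://example.com/feed'


try:
    IncompleteParser()
    error = None
except TypeError as exc:
    # The exact wording of the message depends on the Python version.
    error = str(exc)

print(error)
```

Because get_url() and parse_content() are marked with @abc.abstractmethod, a subclass that leaves either one undefined cannot be instantiated, while content_crop() is available to every subclass without being redefined.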
Python3
import abc
import urllib.request
from xml.dom.minidom import parseString


class AbstractNewsParser(object, metaclass=abc.ABCMeta):
    def __init__(self):
        # Restrict creating abstract class instance
        if self.__class__ is AbstractNewsParser:
            raise TypeError('Abstract class cannot be instantiated')

    def print_latest_news(self):
        """ A template method, prints the 3 latest news items for every
        news website """
        url = self.get_url()
        raw_content = self.get_raw_content(url)
        content = self.parse_content(raw_content)
        cropped = self.content_crop(content)
        for item in cropped:
            print('Title: ', item['title'])
            print('Content: ', item['content'])
            print('Link: ', item['link'])
            print('Published: ', item['published'])
            print('Id: ', item['id'])

    @abc.abstractmethod
    def get_url(self):
        pass

    def get_raw_content(self, url):
        # Common step: download the raw feed
        return urllib.request.urlopen(url).read()

    @abc.abstractmethod
    def parse_content(self, content):
        pass

    def content_crop(self, parsed_content, max_items=3):
        # Common step: keep only the first max_items entries
        return parsed_content[:max_items]


class YahooNewsParser(AbstractNewsParser):
    def get_url(self):
        return 'https://news.yahoo.com/rss/'

    def parse_content(self, raw_content):
        yahoo_parsed_content = []
        dom = parseString(raw_content)
        for node in dom.getElementsByTagName('item'):
            yahoo_parsed_item = {}
            try:
                yahoo_parsed_item['title'] = node.getElementsByTagName('title')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                yahoo_parsed_item['title'] = None
            try:
                yahoo_parsed_item['content'] = node.getElementsByTagName('description')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                yahoo_parsed_item['content'] = None
            try:
                yahoo_parsed_item['link'] = node.getElementsByTagName('link')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                yahoo_parsed_item['link'] = None
            try:
                yahoo_parsed_item['id'] = node.getElementsByTagName('guid')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                yahoo_parsed_item['id'] = None
            try:
                yahoo_parsed_item['published'] = node.getElementsByTagName('pubDate')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                yahoo_parsed_item['published'] = None
            yahoo_parsed_content.append(yahoo_parsed_item)
        return yahoo_parsed_content


class GoogleNewsParser(AbstractNewsParser):
    def get_url(self):
        return 'https://news.google.com/atom'

    def parse_content(self, raw_content):
        google_parsed_content = []
        dom = parseString(raw_content)
        for node in dom.getElementsByTagName('entry'):
            google_parsed_item = {}
            try:
                google_parsed_item['title'] = node.getElementsByTagName('title')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                google_parsed_item['title'] = None
            try:
                google_parsed_item['content'] = node.getElementsByTagName('content')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                google_parsed_item['content'] = None
            try:
                # In Atom, the URL is the href attribute of the <link> element
                google_parsed_item['link'] = node.getElementsByTagName('link')[0].\
                    getAttribute('href')
            except IndexError:
                google_parsed_item['link'] = None
            try:
                google_parsed_item['id'] = node.getElementsByTagName('id')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                google_parsed_item['id'] = None
            try:
                # Atom entries carry their timestamp in <updated>
                google_parsed_item['published'] = node.getElementsByTagName('updated')[0].\
                    childNodes[0].nodeValue
            except IndexError:
                google_parsed_item['published'] = None
            google_parsed_content.append(google_parsed_item)
        return google_parsed_content


class NewsParser(object):
    def get_latest_news(self):
        # print_latest_news() prints the items itself and returns None,
        # so print the heading first and then call it.
        yahoo = YahooNewsParser()
        print('Yahoo:')
        yahoo.print_latest_news()
        print()
        print()
        google = GoogleNewsParser()
        print('Google:')
        google.print_latest_news()


if __name__ == '__main__':
    newsParser = NewsParser()
    newsParser.get_latest_news()
Output
Figure: Yahoo News Parser output
Figure: Google News Parser output
The Template Method design pattern is a good fit when several algorithms share the same overall behavior but differ in how individual steps are implemented. It fixes the structure of the algorithm in one place, so that derived classes can redefine individual steps without changing that structure.
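As a sketch of that last point, the example below adds a new source by redefining steps only. It assumes a trimmed variant of AbstractNewsParser whose template method, here called latest_news(), returns the items instead of printing them. CannedFeedParser, CANNED_FEED and the example.com URLs are hypothetical, and get_raw_content() is overridden so that nothing is fetched from the network.

```python
import abc
import urllib.request
from xml.dom.minidom import parseString


class AbstractNewsParser(metaclass=abc.ABCMeta):
    """Trimmed variant of the base class; returns items instead of printing."""

    def latest_news(self):
        # Template method: same step order as print_latest_news().
        url = self.get_url()
        raw_content = self.get_raw_content(url)
        content = self.parse_content(raw_content)
        return self.content_crop(content)

    @abc.abstractmethod
    def get_url(self):
        """Feed-specific step."""

    def get_raw_content(self, url):
        # Common step: download the raw feed.
        return urllib.request.urlopen(url).read()

    @abc.abstractmethod
    def parse_content(self, content):
        """Feed-specific step."""

    def content_crop(self, parsed_content, max_items=3):
        # Common step: keep only the first max_items entries.
        return parsed_content[:max_items]


# Hypothetical inline feed, used in place of a real download.
CANNED_FEED = """<rss version="2.0"><channel>
<item><title>First</title><link>https://example.com/1</link></item>
<item><title>Second</title><link>https://example.com/2</link></item>
<item><title>Third</title><link>https://example.com/3</link></item>
<item><title>Fourth</title><link>https://example.com/4</link></item>
</channel></rss>"""


class CannedFeedParser(AbstractNewsParser):
    def get_url(self):
        return 'https://example.com/feed'  # placeholder, never fetched

    def get_raw_content(self, url):
        # Redefined step: return the canned feed instead of downloading.
        return CANNED_FEED

    def parse_content(self, raw_content):
        dom = parseString(raw_content)
        items = []
        for node in dom.getElementsByTagName('item'):
            items.append({
                'title': node.getElementsByTagName('title')[0].firstChild.nodeValue,
                'link': node.getElementsByTagName('link')[0].firstChild.nodeValue,
            })
        return items


news = CannedFeedParser().latest_news()
for item in news:
    print(item['title'], item['link'])
```

The new class never touches latest_news() or content_crop(); the order of the steps and the three-item limit come from the base class. Overriding get_raw_content() in this way is also a convenient way to exercise a parser in tests without network access.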