Generating Word Cloud in Python
Last Updated: 03 Jun, 2025
A word cloud is an image made up of words in which the size of each word shows how frequently it appears in the dataset, i.e., the more often a word is used, the bigger it appears in the cloud. Word clouds help us identify the most common and important words in a text. In this article, we will look at what a word cloud is and how to generate one using Python.
For example, if we analyze customer reviews of a movie, words like "good", "bad" or "average" might appear larger if they are mentioned many times. Word clouds are useful because they:
- Quickly show the most common words in text data
- Help us understand what people are talking about in large text files
- Present text data in a visually appealing way
- Allow easy identification of important words
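The core idea, word size driven by frequency, can be sketched in plain Python: count how often each word appears and scale every count relative to the most frequent word. This is a simplified illustration, not the wordcloud library's layout algorithm; the library roughly normalizes counts by the largest one in a similar way, but the final font sizes also depend on parameters such as `relative_scaling` and on available space. The function name and the sample sentence are hypothetical.

```python
from collections import Counter

def relative_frequencies(text):
    """Count words and scale each count by the most common word's count."""
    counts = Counter(text.lower().split())
    top = max(counts.values())
    # The most frequent word gets 1.0; a word cloud would draw it largest
    return {word: count / top for word, count in counts.items()}

sample = "good movie good acting bad ending good music"  # hypothetical review text
freqs = relative_frequencies(sample)
print(sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

Here "good" appears three times, so it would be drawn roughly three times as prominently as words that appear once.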
Implementing Word Cloud in Python
We will use the IMDB dataset, which contains 50,000 movie reviews in CSV format. We will focus on the review column, which contains the text of the movie reviews. You can download it from here. Below is the step-by-step implementation:
Step 1: Loading the Dataset
Let's load the dataset using pandas.
Python
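import pandas as pd
# Load the IMDB reviews CSV (adjust the path to where you saved the file)
df = pd.read_csv('/content/IMDB-Dataset.csv')
# Show the first five rows
print(df.head())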
Output:
[Image: Dataset]
Step 2: Understanding the Dataset
Before cleaning the text, let's understand the dataset. It contains two columns:
- review: the movie review text
- sentiment: whether the review is positive or negative
We are only interested in the review column. Let's check the column names and some sample text.
Python
print(df.columns)
print(df['review'][0])
Output:
Index(['review', 'sentiment'], dtype='object')
One of the ....your darker side.
The review column contains detailed text reviews of movies. Our goal is to extract the most frequent words from these reviews.
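Before cleaning, it can help to confirm that the review column has no missing values (which `astype(str)` in the next step would otherwise turn into literal strings) and to see how the sentiment labels are distributed. A minimal sketch; the tiny DataFrame is hypothetical and stands in for the `df` loaded in Step 1, and with the real data you would call the same methods on `df`.

```python
import pandas as pd

# Hypothetical stand-in for the IMDB DataFrame loaded in Step 1
df = pd.DataFrame({
    "review": ["Great film, loved it.", "Boring and too long.", None],
    "sentiment": ["positive", "negative", "positive"],
})

# Number of reviews with no text
missing = df["review"].isna().sum()

# How many reviews carry each sentiment label
label_counts = df["sentiment"].value_counts()

# Average review length in words, ignoring missing reviews
avg_words = df["review"].dropna().str.split().str.len().mean()

print(missing)
print(label_counts)
print(avg_words)
```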
Step 3: Cleaning the Text Data
Before generating the word cloud, we need to clean the text data. This involves:
1. Removing punctuation
2. Converting text to lowercase
3. Removing stopwords, i.e., common words like "the", "is" and "and"
- re.sub(): removes every character that is not a letter or whitespace, i.e., punctuation and digits
- STOPWORDS: a built-in set of common English stopwords provided by the wordcloud library
Python
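import re
from wordcloud import STOPWORDS
# Combine all reviews into one long string
text = ' '.join(df['review'].astype(str).tolist())
# Keep only letters and whitespace (drops punctuation and digits)
text = re.sub(r'[^A-Za-z\s]', '', text)
# Lowercase so "Movie" and "movie" count as the same word
text = text.lower()
# Drop common words that carry little meaning
stopwords = set(STOPWORDS)
text = ' '.join(word for word in text.split() if word not in stopwords)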
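Two cases can slip through the cleaning above. If the reviews contain HTML line-break tags such as `<br />`, the regex strips only the symbols and leaves the letters `br` behind. And stopword lists often store contractions with the apostrophe (e.g. "don't"), so once apostrophes are removed the word becomes "dont" and no longer matches. The sketch below is one way to handle both, assuming these cases occur in your copy of the data; the function name `clean_reviews`, the sample reviews and the small stopword set are hypothetical, and in practice you would call `clean_reviews(df['review'], set(STOPWORDS))`.

```python
import re

def clean_reviews(reviews, stopwords):
    """Join reviews, strip HTML tags and non-letters, lowercase, drop stopwords."""
    text = ' '.join(str(r) for r in reviews)
    text = re.sub(r'<[^>]+>', ' ', text)   # replace HTML tags like <br /> with a space
    text = text.lower()                     # lowercase before matching stopwords
    # Also match stopwords written without apostrophes ("don't" -> "dont")
    stops = {w.lower() for w in stopwords}
    stops |= {w.replace("'", "") for w in stops}
    text = re.sub(r'[^a-z\s]', '', text)    # keep only letters and whitespace
    return ' '.join(w for w in text.split() if w not in stops)

# Hypothetical sample data and stopword set
sample = ["I don't like it.<br /><br />The plot is weak!", "Great acting, 10/10"]
demo_stopwords = {"i", "don't", "it", "the", "is"}
print(clean_reviews(sample, demo_stopwords))
```

Removing tags before the character filter matters: doing it afterwards would be impossible because the `<`, `/` and `>` that identify a tag are already gone.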
Step 4: Generating the Word Cloud
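Now that our text is clean, let's generate the word cloud.
- WordCloud(): creates the word cloud object; its generate() method counts the words in the text and lays them out in an image, which matplotlib then displays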
Python
from wordcloud import WordCloud
import matplotlib.pyplot as plt
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("IMDB Movie Reviews Word Cloud")
plt.show()
Output:
[Image: Word Cloud]
Step 5: Customizing the Word Cloud
We can customize the word cloud with different options like:
1. Maximum number of words
2. Color scheme
3. Shape of the cloud
- max_words: limits the maximum number of words shown
- colormap: sets the matplotlib colormap used to color the words
Python
wordcloud = WordCloud(width=800, height=400, background_color='white', max_words=100, colormap='coolwarm').generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Customized IMDB Movie Reviews Word Cloud")
plt.show()
Output:

Real-life Applications of Word Cloud
- Sentiment Analysis: Imagine we have hundreds of customer reviews. By creating two word clouds, one from positive reviews (with words like "great" and "friendly") and another from negative reviews (with words like "late" and "broken"), we can easily see what customers like or dislike.
- Social Media Analysis: By collecting hashtags and keywords from social media posts, word clouds can visually highlight which topics are being talked about the most.
- Real-Time Data: In live customer chats or support systems, word clouds can quickly surface common issues like "delivery delay" or "payment error", which helps teams respond faster.
By combining word clouds with NLP techniques we can see patterns, understand customer needs and make smarter data-driven decisions.