Generating Word Cloud in R Programming
Last Updated: 07 May, 2024
A word cloud is a data visualization technique for representing text data in which the size of each word indicates its frequency or importance. It highlights the most significant terms in a body of text. Word clouds are widely used for analyzing data from social networking websites.
Why Word Cloud?
Reasons to present text data as a word cloud:
- Word clouds add simplicity and clarity. The most frequently used keywords stand out.
- Word clouds are an effective communication tool. They are easy to understand, easy to share, and impactful.
- Word clouds are more visually engaging than tabular data.
Implementation in R
Here are the steps to create a word cloud in R.
Step 1: Create a Text File
Copy and paste the text into a plain text file (e.g., file.txt) and save the file.
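If you would rather create the file from R than from a text editor, a minimal sketch follows. The sample sentences and the name `sample_path` are placeholders invented for illustration, and the file is written to a temporary directory so nothing in your working directory is overwritten.

```r
# Hypothetical sample text; replace it with your own content.
sample_text <- c(
  "Word clouds show frequent words in larger type.",
  "Frequent words stand out; rare words stay small.",
  "R makes it easy to build word clouds from plain text."
)

# Write one vector element per line to a plain text file.
sample_path <- file.path(tempdir(), "file.txt")
writeLines(sample_text, sample_path)

# Confirm the file exists and count the lines it holds.
file.exists(sample_path)
length(readLines(sample_path))
```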
Step 2: Install and Load the Required Packages
# install the required packages
install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
# load the packages
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
Step 3: Text Mining
- Load the Text:
The text is loaded using the Corpus() function from the text mining (tm) package. A corpus is a collection of documents.
- Start by importing the text file created in Step 1:
To import a file saved locally on your computer, run the following R code. You will be asked to choose the text file interactively.
text = readLines(file.choose())
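Because file.choose() opens a dialog, it is awkward in scripts that run without user interaction. A sketch of an alternative, assuming the path is known in advance (the `text_path` variable and the demo file are hypothetical and exist only to keep the example self-contained):

```r
# Create a small demo file so the example runs on its own.
text_path <- file.path(tempdir(), "demo_input.txt")
writeLines(c("First line of text.", "", "Second line of text."), text_path)

# Read the file by path; warn = FALSE silences the warning about a
# missing final newline.
text <- readLines(text_path, warn = FALSE)

# Drop empty lines so they do not become empty documents later.
text <- text[nzchar(trimws(text))]
text
```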
Load the data as a corpus:
# VectorSource() function
# creates a corpus of
# character vectors
docs = Corpus(VectorSource(text))
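To see what the corpus holds, it can help to build one from a couple of made-up lines and look inside. This is only a sketch: it assumes the tm package is installed, and `sample_lines` and `docs_demo` are names chosen for illustration.

```r
# Assumes the tm package is installed; the sample lines are made up.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  sample_lines <- c("Text mining with R", "Word clouds in R")
  docs_demo <- Corpus(VectorSource(sample_lines))

  # Each element of the character vector becomes one document.
  n_docs <- length(docs_demo)
  first_doc <- as.character(docs_demo[[1]])
  print(n_docs)
  print(first_doc)
}
```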
Text transformation:
Transformation is performed using the tm_map() function, for example to replace special characters such as "@", "#" and "/" with spaces.
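# toSpace() replaces every match of a pattern with a space
toSpace = content_transformer(
  function(x, pattern) gsub(pattern, " ", x))
# Chain the calls on docs1 so each replacement builds on the previous one
docs1 = tm_map(docs, toSpace, "/")
docs1 = tm_map(docs1, toSpace, "@")
docs1 = tm_map(docs1, toSpace, "#")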
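The core of toSpace() is a plain gsub() call, so its effect can be checked on an ordinary string before it is applied to a corpus. The sketch below uses an invented sample string and a single character class instead of three separate patterns; the same class could also be passed as the pattern to a toSpace()-style transformer.

```r
# Invented sample string containing the three special characters.
x <- "email@example.com #rstats and/or text"

# A character class replaces "/", "@" and "#" in a single pass.
cleaned <- gsub("[/@#]", " ", x)
cleaned
```

Each special character becomes one space, so runs of spaces can appear; stripping white space afterwards tidies them up.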
- Cleaning the Text:
The tm_map() function is also used to convert the text to lower case, remove numbers (with removeNumbers), remove common stopwords, and strip unnecessary white space.
# Convert the text to lower case
docs1 = tm_map(docs1, content_transformer(tolower))
# Remove numbers
docs1 = tm_map(docs1, removeNumbers)
# Remove common English stopwords
docs1 = tm_map(docs1, removeWords, stopwords("english"))
# Collapse extra white space
docs1 = tm_map(docs1, stripWhitespace)
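To make the effect of each cleaning step concrete, here is a base R sketch that applies equivalent operations to a single invented sentence. It is not how tm implements these transformations, and the two-word `stop_words` list is a hypothetical stand-in for a real stopword list.

```r
x <- "The 3 Quick   Brown foxes AND the 2 lazy dogs"

# Convert to lower case.
x <- tolower(x)

# Remove numbers.
x <- gsub("[[:digit:]]+", "", x)

# Remove stopwords (hypothetical short list, matched as whole words).
stop_words <- c("the", "and")
pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
x <- gsub(pattern, "", x, perl = TRUE)

# Collapse runs of white space and trim the ends.
x <- trimws(gsub("[[:space:]]+", " ", x))
x
```

Lowercasing comes before stopword removal so that "The" and "AND" match the lower-case list.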
Step 4: Build a term-document Matrix
A term-document matrix is a table containing the frequency of each word. Row names are words (terms) and column names are documents. The TermDocumentMatrix() function from the tm package can be used as follows.
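# Use the cleaned corpus (docs1), not the raw one
dtm = TermDocumentMatrix(docs1)
m = as.matrix(dtm)
# Total frequency of each term across all documents, highest first
v = sort(rowSums(m), decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
head(d, 10)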
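The frequency table that results from these steps can be approximated in base R, which may help when checking the numbers by hand. The sketch below uses two invented lines and a simple split on non-letters; it keeps every token, whereas tm applies its own tokenizer and defaults (for example, TermDocumentMatrix() by default drops very short terms), so counts can differ.

```r
# Invented sample lines standing in for the cleaned text.
sample_lines <- c("word clouds show words", "clouds of words")

# Split on anything that is not a lower-case letter and drop empty tokens.
tokens <- unlist(strsplit(tolower(sample_lines), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]

# Count each token and sort from most to least frequent.
freq_sorted <- sort(table(tokens), decreasing = TRUE)
freq_table <- data.frame(
  word = names(freq_sorted),
  freq = as.integer(freq_sorted),
  stringsAsFactors = FALSE
)
head(freq_table, 10)
```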
Step 5: Generate the Word Cloud
The importance of words can be illustrated as a word cloud as follows.
wordcloud(words = d$word,
          freq = d$freq,
          min.freq = 1,
          max.words = 200,
          random.order = FALSE,
          rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
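Word placement involves randomness, so two runs can produce different layouts. A sketch of one way to get a repeatable plot and save it to a file, assuming the wordcloud and RColorBrewer packages are installed; `d_demo` is an invented frequency table standing in for `d`, and `out_file` is a hypothetical output path.

```r
# Invented frequency table standing in for the data frame d.
d_demo <- data.frame(
  word = c("data", "text", "cloud", "word", "mining"),
  freq = c(12, 9, 7, 5, 3)
)
out_file <- file.path(tempdir(), "wordcloud.pdf")

if (requireNamespace("wordcloud", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Fix the random seed so the layout is the same on every run.
  set.seed(1234)

  # Draw into a PDF file instead of the on-screen device.
  pdf(out_file, width = 6, height = 6)
  wordcloud::wordcloud(words = d_demo$word,
                       freq = d_demo$freq,
                       min.freq = 1,
                       random.order = FALSE,
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
  invisible(dev.off())
}
```

A PDF device is used here because it is available in a standard R installation; other graphics devices can be substituted.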
The complete code for the word cloud in R is given below.
# R program to illustrate
# Generating word cloud
# Install the required packages
install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
# Load the packages
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
# To choose the text file
text = readLines(file.choose())
# VectorSource() function
# creates a corpus of
# character vectors
docs = Corpus(VectorSource(text))
# Text transformation
toSpace = content_transformer(
  function(x, pattern) gsub(pattern, " ", x))
# Chain the calls on docs1 so each replacement builds on the previous one
docs1 = tm_map(docs, toSpace, "/")
docs1 = tm_map(docs1, toSpace, "@")
docs1 = tm_map(docs1, toSpace, "#")
# Cleaning the Text
docs1 = tm_map(docs1, content_transformer(tolower))
docs1 = tm_map(docs1, removeNumbers)
docs1 = tm_map(docs1, removeWords, stopwords("english"))
docs1 = tm_map(docs1, stripWhitespace)
# Build a term-document matrix from the cleaned corpus
dtm = TermDocumentMatrix(docs1)
m = as.matrix(dtm)
v = sort(rowSums(m),
decreasing = TRUE)
d = data.frame(word = names(v),
freq = v)
head(d, 10)
# Generate the Word cloud
wordcloud(words = d$word,
freq = d$freq,
min.freq = 1,
max.words = 200,
random.order = FALSE,
rot.per = 0.35,
colors = brewer.pal(8, "Dark2"))
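Output: the word cloud is drawn on the active R graphics device, with the most frequent words shown largest.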
Advantages of Word Clouds
- They help in analyzing customer and employee feedback.
- They help in identifying new SEO keywords to target.
- Word clouds are effective visualization tools. They present text data in a simple and clear format.
- Word clouds are useful communication tools. They are handy for anyone who wants to communicate a basic insight.
Drawbacks of Word Clouds
- Word clouds are not suitable for every situation.
- The data should be prepared with its context in mind, because a word cloud shows words in isolation.
- Word clouds typically fail to give the actionable insights needed to improve and grow a business.