Natural Language Generation with R

Natural Language Generation (NLG) is a subfield of Artificial Intelligence (AI) that focuses on creating human-like text based on data or structured information. It’s the process that powers chatbots, automated news articles, and other systems that need to generate text automatically. In this article, we’ll explore how you can implement basic NLG techniques in R Programming Language.

What is Natural Language Generation?

Natural Language Generation is the process of turning structured data into natural language text. This means taking numbers, symbols, or any form of data and converting it into sentences and paragraphs that humans can easily read and understand. NLG can be used for various tasks like writing reports, generating product descriptions, summarizing information, and even creating conversational responses for chatbots.

Why Use NLG?

NLG is important because it allows us to automate the process of writing. Instead of manually creating reports or summaries, NLG can automatically generate these texts from data, saving time and effort. This is particularly useful in industries that handle large amounts of data and need to produce regular reports, such as finance, healthcare, and customer service.

How Does NLG Work?

NLG techniques range from simple template filling to neural networks that learn language patterns from data:
  • Templates: Templates are predefined structures for sentences or paragraphs where specific parts can be filled with data. For example, a template might be, "The temperature today is [temperature] degrees," where [temperature] is replaced with actual data.
  • String Interpolation: This involves inserting variables directly into a string to build a sentence such as, "Today’s sales were [sales] units." (a short R sketch of both of these approaches follows this list).
  • Recurrent Neural Networks (RNNs): RNNs are a type of neural network designed to recognize patterns in sequences of data, like text. They are particularly useful in NLG because they can learn from large text datasets and generate coherent sentences and paragraphs.
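
To make the first two techniques concrete, here is a minimal base-R sketch of template filling and string interpolation; the temperature and sales values are made-up placeholders.

R
# Hypothetical data values used only for illustration
temperature <- 23.5
sales <- 1200

# Template: a fixed sentence skeleton with a slot filled by sprintf()
template <- "The temperature today is %.1f degrees."
cat(sprintf(template, temperature), "\n")

# String interpolation: build the sentence directly from a variable
cat(paste0("Today's sales were ", sales, " units."), "\n")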

R Packages for NLG

R offers several packages for NLG that help automate the creation of text, such as:

  • text Package: This package provides tools for text analysis and generation. It uses various statistical methods and machine learning techniques for text processing and generation.
  • rnn Package: It implements Recurrent Neural Networks (RNNs), which are suitable for tasks that involve sequential data like text. RNNs can generate text by predicting the next word or character based on the previous ones.
  • keras Package: It provides an R interface to Keras, a high-level neural networks API. Keras is particularly useful for building and training complex neural networks, including RNNs and LSTMs (Long Short-Term Memory networks), which are advanced types of RNNs designed to remember longer sequences of data.
  • tensorflow Package: This package provides an R interface to TensorFlow, an open-source library for dataflow and differentiable programming. It is used in conjunction with Keras for deep learning tasks, including NLG.

Now let's implement Natural Language Generation step by step in the R Programming Language.

Step 1: Install and Load the necessary packages

First we install and load the required packages.

R
install.packages("keras")
install.packages("tensorflow")
library(keras)
library(tensorflow)
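
Note that the keras R package is an interface to a Python backend. On a machine without TensorFlow already set up, you typically also need a one-time backend installation; assuming the standard keras workflow, this looks like:

R
# One-time setup: install the underlying Python Keras/TensorFlow backend
# (skip this if TensorFlow is already available on your system)
library(keras)
install_keras()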

Step 2: Prepare Your Data

For this example, we will use a short text snippet: the opening lines of Charles Dickens' A Tale of Two Cities. You can replace it with any text data you have.

R
# Sample text data
text_data <- "It was the best of times, it was the worst of times, it was the age of wisdom, 
              it was the age of foolishness..."

# Convert the text to lowercase and remove special characters
text_data <- tolower(text_data)
text_data <- gsub("[^a-z ]", "", text_data)
# Collapse the runs of spaces left over from line breaks and punctuation
text_data <- gsub(" +", " ", text_data)

# Split the text into characters
chars <- unlist(strsplit(text_data, NULL))

# Create a character index mapping
chars_unique <- unique(chars)
char_index <- 1:length(chars_unique)
names(char_index) <- chars_unique

# Convert characters to integers
text_indices <- unlist(lapply(chars, function(x) char_index[x]))

# Set parameters for text generation
maxlen <- 40  # Length of input sequences
step <- 3     # Step size for moving the input window
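
As a quick, optional sanity check, you can inspect the character-to-index mapping and the encoded corpus before moving on.

R
# Inspect the vocabulary and the integer-encoded text
print(char_index)        # named vector: each character's integer index
length(text_indices)     # total number of encoded characters
head(text_indices, 20)   # the first 20 characters as integers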

Step 3: Prepare the Input and Output Data

We need to create input sequences and their corresponding target characters.

R
# Initialize empty lists for storing input-output pairs
input_sequences <- list()
target_chars <- list()

# Loop through the text indices to create input-output pairs
for (i in seq(1, length(text_indices) - maxlen, by = step)) {
  input_sequences[[length(input_sequences) + 1]] <- text_indices[i:(i + maxlen - 1)]
  target_chars[[length(target_chars) + 1]] <- text_indices[i + maxlen]
}

# Convert lists to arrays for use in the neural network
X <- array(0, dim = c(length(input_sequences), maxlen, length(chars_unique)))
y <- array(0, dim = c(length(input_sequences), length(chars_unique)))

for (i in 1:length(input_sequences)) {
  seq_indices <- input_sequences[[i]]
  # One-hot encode each character; t() puts time steps in rows so the
  # (maxlen, vocabulary) slice of X is filled in the right orientation
  X[i, , ] <- t(sapply(seq_indices, function(x) as.numeric(x == char_index)))
  y[i, target_chars[[i]]] <- 1
}
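
Before training, it is worth checking that the one-hot arrays have the expected shapes; this quick, optional check uses only the objects created above.

R
# Verify the tensor shapes: X is (sequences, maxlen, vocabulary),
# y is (sequences, vocabulary), and every target row is one-hot
dim(X)
dim(y)
all(rowSums(y) == 1)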

Step 4: Build and Compile the LSTM Model

We’ll now create an LSTM model using Keras.

R
# Initialize the sequential model
model <- keras_model_sequential() %>%
  # Add an LSTM layer with 128 units
  layer_lstm(units = 128, input_shape = c(maxlen, length(chars_unique))) %>%
  # Add a dense output layer with softmax activation for character prediction
  layer_dense(units = length(chars_unique), activation = 'softmax')

# Compile the model with categorical cross-entropy loss and Adam optimizer
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(),
  metrics = 'accuracy'
)

# Print the model summary
summary(model)

Output:

Model: "sequential"
________________________________________________________________________
 Layer (type)                    Output Shape                    Param #
========================================================================
 lstm (LSTM)                     (None, 128)                       74752
 dense (Dense)                   (None, 17)                         2193
========================================================================
Total params: 76945 (300.57 KB)
Trainable params: 76945 (300.57 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________
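
You can verify the parameter counts by hand: an LSTM layer has four weight matrices (for the input, forget, cell, and output gates), giving 4 * (units * (input_dim + units) + units) parameters, and a dense layer has units_in * units_out + units_out. With 128 units and the 17-character vocabulary shown in this run, the counts below reproduce the summary.

R
units <- 128
vocab <- 17  # vocabulary size from this particular run
lstm_params  <- 4 * (units * (vocab + units) + units)  # 74752
dense_params <- units * vocab + vocab                  # 2193
lstm_params + dense_params                             # 76945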

Step 5: Train the Model

Train the LSTM model on the text data. Training on a small text corpus might not give meaningful results, but this is just for demonstration purposes.

R
# Fit the model to the data
model %>% fit(
  X, y,
  batch_size = 128,
  epochs = 60,
  verbose = 2
)

Output:

Epoch 1/60

1/1 - 3s - loss: 2.8316 - accuracy: 0.1364 - 3s/epoch - 3s/step
Epoch 2/60
1/1 - 0s - loss: 2.8154 - accuracy: 0.1818 - 62ms/epoch - 62ms/step
Epoch 3/60
1/1 - 0s - loss: 2.7990 - accuracy: 0.2273 - 31ms/epoch - 31ms/step
Epoch 4/60
1/1 - 0s - loss: 2.7818 - accuracy: 0.2273 - 29ms/epoch - 29ms/step
Epoch 5/60
...
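
With such a tiny corpus, the network simply memorizes the text. On a real corpus, a fixed 60 epochs may be too many or too few; a common refinement (an optional sketch, not part of the original example) is to stop training once the loss stops improving, using Keras' early-stopping callback:

R
# Optional: re-run training, stopping once the loss plateaus
model %>% fit(
  X, y,
  batch_size = 128,
  epochs = 60,
  verbose = 2,
  callbacks = list(callback_early_stopping(monitor = "loss", patience = 5))
)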

Step 6: Generate Text

Once the model is trained, we can generate new text based on the learned patterns.

R
# Function to sample a character index from the prediction probabilities
sample_from_prob <- function(preds, temperature = 1.0) {
  # Rescale the log-probabilities by the temperature
  preds <- log(preds) / temperature
  # Convert back to a normalized probability distribution
  exp_preds <- exp(preds)
  preds <- exp_preds / sum(exp_preds)
  # Draw one sample from the distribution and return its index
  rmultinom(1, 1, preds) %>% which.max()
}

# Seed text for generation
seed_text <- "it was the "
seed_chars <- unlist(strsplit(tolower(seed_text), NULL))

# Convert seed text to indices
seed_indices <- unlist(lapply(seed_chars, function(x) char_index[x]))

# Generate text
generated_text <- seed_text

for (i in 1:200) {  # Generate 200 characters
  x_pred <- array(0, dim = c(1, maxlen, length(chars_unique)))
  
  # One-hot encode the current seed; time steps beyond the seed stay zero-padded
  for (t in 1:length(seed_indices)) {
    x_pred[1, t, seed_indices[t]] <- 1
  }
  
  # Predict the next-character distribution and sample from it
  preds <- model %>% predict(x_pred, verbose = 0)
  next_index <- sample_from_prob(preds[1, ], temperature = 0.5)
  next_char <- names(char_index)[next_index]
  
  # Append the character and slide the seed window forward by one
  generated_text <- paste0(generated_text, next_char)
  seed_indices <- c(seed_indices[-1], next_index)
}

cat(generated_text)

Output:

it was the aee of foolishness of foolishness
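
The temperature argument of sample_from_prob() controls how adventurous the sampling is: values below 1 sharpen the distribution toward high-probability characters (safer, more repetitive text), while values above 1 flatten it (more varied but noisier text). As a quick illustration, you can resample the next character at several temperatures from the final preds left over from the loop above:

R
# Sample the next character at different temperatures
# (preds remains from the last iteration of the generation loop)
for (temp in c(0.2, 0.5, 1.0, 1.5)) {
  idx <- sample_from_prob(preds[1, ], temperature = temp)
  cat(sprintf("temperature %.1f -> next char: '%s'\n",
              temp, names(char_index)[idx]))
}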

Conclusion

Creating text from data in R is a powerful way to turn raw numbers or facts into easy-to-read sentences and paragraphs. Whether you're producing simple templated reports or training neural models, R gives you everything you need to start generating natural language. By using deep learning packages like keras and tensorflow for sequence models, or lighter template-based approaches, along with R's ability to handle and modify data, you can create tailored text that makes your data more engaging and understandable.

