
FastText Working and Implementation

Last Updated : 28 Jul, 2025

Word embeddings have become an important part of modern natural language processing, but traditional approaches like Word2Vec struggle with out-of-vocabulary words and morphologically rich languages. FastText addresses these limitations through a subword-based approach that captures semantic meaning at the character level while maintaining computational efficiency.

Understanding FastText Architecture

FastText extends the Skip-gram and CBOW models by representing words as bags of character n-grams rather than atomic units. This fundamental shift allows the model to generate embeddings for previously unseen words and capture morphological relationships between related terms.

The Subword Approach

Traditional word embedding models treat each word as an indivisible token. FastText breaks words into character n-grams, enabling it to understand word structure and meaning at a granular level.

Consider the word "running":

  • 3-grams: <ru, run, unn, nni, nin, ing, ng>
  • 4-grams: <run, runn, unni, nnin, ning, ing>
  • 5-grams: <runn, runni, unnin, nning, ning>

The angle brackets indicate word boundaries, helping the model distinguish between subwords that appear at different positions.
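A minimal sketch of this subword extraction in plain Python (mirroring the idea, not FastText's internal implementation) looks like this:

Python
def char_ngrams(word, min_n=3, max_n=6):
    token = f"<{word}>"  # add boundary markers
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

print(char_ngrams("running", 3, 3))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>']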

Hierarchical Softmax Optimization

For computational efficiency, FastText can use hierarchical softmax in place of the standard softmax. Rather than computing probabilities across the entire vocabulary at every prediction, it organizes the vocabulary into a binary tree in which each leaf represents a word and each internal node holds the parameters of a binary decision, so a word's probability becomes the product of the decisions taken along its path from the root (a toy sketch of this computation follows the list below).

Key advantages of hierarchical softmax:

  • Reduces time complexity from O(V) to O(log V) where V is vocabulary size
  • Uses Huffman coding to optimize frequent word access
  • Maintains prediction accuracy while significantly improving training speed
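To see why the tree pays off, note that a word's probability reduces to a handful of binary left/right decisions along its path. The toy sketch below (using random NumPy vectors, not FastText's actual parameters) computes such a path probability:

Python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def path_probability(context_vector, path):
    # path: list of (node_vector, direction) pairs,
    # direction is +1 for a left branch and -1 for a right branch
    prob = 1.0
    for node_vector, direction in path:
        prob *= sigmoid(direction * np.dot(node_vector, context_vector))
    return prob

rng = np.random.default_rng(0)
context = rng.normal(size=50)
# A word sitting 3 levels deep needs only 3 evaluations, even with a
# million-word vocabulary (a balanced tree needs about log2(1e6) ~ 20)
path = [(rng.normal(size=50), d) for d in (+1, -1, +1)]
print(f"p(word | context) = {path_probability(context, path):.6f}")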

Step-by-Step Implementation

Step 1: Installing and Importing FastText

Install FastText using pip (pip install fasttext) and import the required libraries:

Python
import fasttext
import os

Note: Use numpy==1.24.4 for compatibility with FastText

Step 2: Creating Training Data

  • Prepares example sentences related to royalty, exercise and reading.
  • Writes each sentence in lowercase into a text file for FastText training.
Python
def create_sample_data():
    # Sample sentences for training
    sentences = [
        "The king rules the kingdom",
        "The queen helps the king",
        "Running is good exercise", 
        "The runner runs fast",
        "Walking is healthy activity",
        "The walker walks slowly",
        "Reading books is fun",
        "The reader reads daily"
    ]
    
    # Save to text file (one sentence per line)
    with open('training_data.txt', 'w') as f:
        for sentence in sentences:
            f.write(sentence.lower() + '\n')  # Convert to lowercase
    
    print("Training data created in 'training_data.txt'")

create_sample_data()

Output:

Training data created in 'training_data.txt'

Step 3: Training a Basic FastText Model

  • Trains a skipgram model using FastText on the created text file.
  • Saves the trained word vector model to a .bin file.
Python
def train_simple_model():
    # Train skipgram model (predicts context from target word)
    model = fasttext.train_unsupervised(
        'training_data.txt',    # Input file
        model='skipgram',       
        dim=50,                 # Embedding dimension
        epoch=10,               # Number of training iterations
        minCount=1,             # Minimum word frequency
        minn=3,                 # Minimum character n-gram length
        maxn=6                  # Maximum character n-gram length
    )
    
    model.save_model('word_vectors.bin')
    print("Model trained and saved as 'word_vectors.bin'")
    return model


model = train_simple_model()

Output:

Model trained and saved as 'word_vectors.bin'
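The saved file can be reloaded later without retraining, using fasttext.load_model:

Python
# Reload the saved vectors instead of retraining
loaded_model = fasttext.load_model('word_vectors.bin')
print(loaded_model.get_dimension())  # 50, the dim value chosen above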

Step 4: Getting Word Vectors

  • Retrieves vector representations of words using the trained model.
  • Shows vector values for two words that appear in the training corpus; the same call also works for out-of-vocabulary (OOV) words, as demonstrated in the sketch after the output.
Python
def get_word_embeddings(model):
    # Vector for a word seen during training
    king_vector = model.get_word_vector('king')
    print(f"Vector for 'king': {king_vector[:5]}...")
    print(f"Vector shape: {king_vector.shape}")
    
    # 'kingdom' also appears in the training data; the same call builds
    # vectors from character n-grams for unseen words as well
    kingdom_vector = model.get_word_vector('kingdom')
    print(f"Vector for 'kingdom': {kingdom_vector[:5]}...")
    
    return king_vector, kingdom_vector

king_vec, kingdom_vec = get_word_embeddings(model)

Output:

Vector for 'king': [-0.0001826 -0.00033079 0.0004302 0.00088911 -0.00164602]...
Vector shape: (50,)
Vector for 'kingdom': [ 0.00122273 0.00092931 -0.00018005 -0.00013839 -0.00051276]...
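Both words above actually appear in the training file. To watch the subword mechanism handle a genuinely unseen word, query something outside the vocabulary ('kingship' here is just an assumed example) and inspect the character n-grams it decomposes into:

Python
# 'kingship' is assumed to be absent from the tiny training corpus
print('kingship' in model.words)       # expected: False
oov_vector = model.get_word_vector('kingship')
print(oov_vector.shape)                # (50,), a vector is still produced

# The vector is built from these character n-grams
subwords, indices = model.get_subwords('kingship')
print(subwords)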

Step 5: Finding Similar Words

  • Uses the model to find top-k words most similar to a given query word.
  • Displays similar words along with their similarity scores.
Python
def find_similar_words(model, word, k=3):
    print(f"\nWords similar to '{word}':")
    try:
        neighbors = model.get_nearest_neighbors(word, k)
        for i, (similarity, similar_word) in enumerate(neighbors, 1):
            print(f"{i}. {similar_word}: {similarity:.4f}")
    except Exception as e:
        print(f"Error: {e}")

find_similar_words(model, 'king')
find_similar_words(model, 'running')

Output:

Words similar to 'king':
1. walks: 0.2693
2. running: 0.1971
3. queen: 0.1912

Words similar to 'running':
1. runner: 0.4778
2. the: 0.3344
3. runs: 0.2653
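You can also compare two related word vectors directly with cosine similarity (computed here with NumPy rather than a FastText API). With such a tiny corpus the exact numbers are noisy, so treat this as a demonstration of the mechanism rather than a quality benchmark:

Python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

running_vec = model.get_word_vector('running')
runner_vec = model.get_word_vector('runner')
print(f"cosine(running, runner) = {cosine_similarity(running_vec, runner_vec):.4f}")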

Step 6: Text Classification Implementation

  • Creates labeled movie review data with __label__ prefixes for classification.
  • Stores the data in movie_reviews.txt.
Python
def create_classification_data():
    reviews = [
        ("This movie is amazing and fun", "positive"),
        ("Great acting and story", "positive"), 
        ("Excellent film with good plot", "positive"),
        ("Wonderful cinematography", "positive"),
        ("Terrible movie very boring", "negative"),
        ("Bad acting and poor story", "negative"),
        ("Worst film ever made", "negative"),
        ("Boring and predictable plot", "negative")
    ]
    
    with open('movie_reviews.txt', 'w') as f:
        for text, label in reviews:
            f.write(f"__label__{label} {text.lower()}\n")
    
    print("Classification data created in 'movie_reviews.txt'")

create_classification_data()

Output:

Classification data created in 'movie_reviews.txt'
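Each line of movie_reviews.txt now follows FastText's supervised format: a __label__<class> prefix followed by the lowercased text, for example:

__label__positive this movie is amazing and fun
__label__negative terrible movie very boring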

Step 7: Training Text Classifier

  • Trains a FastText supervised model for sentiment classification.
  • Saves the trained model to a file named text_classifier.bin.
Python
def train_text_classifier():
    classifier = fasttext.train_supervised(
        'movie_reviews.txt',    # Labeled training file
        epoch=25,               # Number of passes over the data
        lr=0.1,                 # Learning rate
        wordNgrams=2,           # Include word bigrams as features
        verbose=2               # Print training progress
    )
    
    classifier.save_model('text_classifier.bin')
    print("Classifier trained and saved")
    return classifier

classifier = train_text_classifier()

Output:

Classifier trained and saved
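Before using the classifier, it can be sanity-checked with the built-in test method, which returns the number of examples along with precision and recall at k. Evaluating on the training file itself (as below) only confirms that the model fits the toy data; in practice you would hold out a separate validation file:

Python
# test() returns (number of samples, precision@1, recall@1)
n_samples, precision, recall = classifier.test('movie_reviews.txt')
print(f"Samples: {n_samples}, Precision: {precision:.2f}, Recall: {recall:.2f}")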

Step 8: Making Predictions

Python
def test_classifier(classifier):
    test_sentences = [
        "This is a fantastic movie",
        "Boring and terrible film", 
        "Great story and acting",
        "Worst movie I have seen"
    ]
    
    print("\nClassification Results:")
    print("-" * 40)
    
    for sentence in test_sentences:
        labels, probabilities = classifier.predict(sentence, k=1)
        predicted_label = labels[0].replace('__label__', '')
        confidence = probabilities[0]
        print(f"Text: '{sentence}'")
        print(f"Prediction: {predicted_label} (confidence: {confidence:.4f})\n")

test_classifier(classifier)

Output:

The classifier prints each test sentence together with its predicted sentiment label (positive or negative) and the associated confidence score.
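If you want the probability of every class rather than just the top prediction, predict accepts a k argument; with two labels, k=2 returns both scores, which is useful for thresholding borderline reviews:

Python
# Request the top 2 labels and their probabilities for one sentence
labels, probabilities = classifier.predict("great story and acting", k=2)
print(labels, probabilities)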

Edge Cases

  • Character encoding issues: FastText requires consistent UTF-8 encoding across training and inference data. Mixed encodings can lead to inconsistent subword generation (one way to keep encodings consistent is shown in the snippet after this list).
  • Optimal n-gram range: The choice of minimum and maximum n-gram lengths depends on the target language. For English, 3-6 character n-grams typically work well, while morphologically rich languages may benefit from longer ranges.
  • Training data quality: FastText is sensitive to preprocessing decisions. Inconsistent tokenization or normalization can degrade model quality, particularly for subword-based features.
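A minimal way to keep encodings consistent is to pass encoding='utf-8' explicitly whenever training files are written or read, as in this sketch (example_corpus.txt is just a placeholder filename):

Python
# Write training text with an explicit, consistent encoding
with open('example_corpus.txt', 'w', encoding='utf-8') as f:
    f.write("the king rules the kingdom\n")

# Read it back the same way before any further preprocessing
with open('example_corpus.txt', 'r', encoding='utf-8') as f:
    print(f.read())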

Practical Applications

FastText excels in scenarios that require robust handling of morphological variations and out-of-vocabulary words. It's particularly effective for:

  • Multilingual applications where training data may be limited for some languages
  • Domain-specific text with specialized vocabulary not found in general corpora
  • Real-time systems requiring fast inference and low memory overhead
  • Text classification tasks where subword information provides discriminative features

The library's combination of efficiency and linguistic sophistication makes it a valuable tool for production NLP systems, especially when dealing with diverse or evolving vocabularies where traditional word-level approaches fall short.

Advantages and Limitations

Key Advantages

  • OOV handling: Generates embeddings for unseen words through subword information
  • Morphological awareness: Captures relationships between word variants (run, running, runner)
  • Computational efficiency: Fast training and inference through hierarchical softmax
  • Language flexibility: Works well with morphologically rich languages

Limitations

  • Memory overhead: Requires more storage than traditional embeddings due to subword information
  • Hyperparameter sensitivity: N-gram range (minn, maxn) significantly affects performance
  • Limited semantic depth: May not capture complex semantic relationships as well as transformer-based models
