
Frequent Pattern Growth Algorithm

Last Updated : 27 May, 2025

The FP-Growth (Frequent Pattern Growth) algorithm efficiently mines frequent itemsets from large transactional datasets. Unlike the Apriori algorithm, which suffers from high computational cost due to candidate generation and repeated database scans, FP-Growth avoids these inefficiencies by compressing the data into an FP-Tree (Frequent Pattern Tree) and extracting patterns directly from it.

How FP-Growth Works

Here's how it works in simple terms:

  1. Data Compression: First, FP-Growth compresses the dataset into a smaller structure called the Frequent Pattern Tree (FP-Tree). This tree stores information about itemsets (collections of items) and their frequencies without needing to generate candidate sets the way Apriori does.
  2. Mining the Tree: The algorithm then examines this tree to identify patterns that appear frequently, based on a minimum support threshold. It does this by breaking the tree down into smaller "conditional" trees for each item, making the process more efficient.
  3. Generating Patterns: Once the tree is built and analyzed, the algorithm generates the frequent patterns (itemsets) and the rules that describe relationships between items.

Imagine you’re organizing a party and want to know popular food combinations without asking every guest repeatedly.

  1. List the food items each guest brought (the transactions).
  2. Count the items and remove infrequent ones (filter by minimum support).
  3. Group items in order of popularity and create a tree where paths represent common combinations.
  4. Instead of repeatedly asking guests, you explore this tree to discover patterns. For example, you might find that pizza and pasta often come together, or that cake and pasta are also a common pair.

This is exactly how FP-Growth finds frequent patterns efficiently.
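The party analogy can be made concrete with a few lines of Python. This is plain brute-force pair counting rather than the FP-Tree itself, and the guest list is made up purely for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical party data: the set of foods each guest brought.
guests = [
    {"pizza", "pasta", "cake"},
    {"pizza", "pasta"},
    {"cake", "pasta"},
    {"pizza", "salad"},
]
min_support = 2

# Steps 1-2: count items and keep only the popular ones.
item_counts = Counter(food for g in guests for food in g)
frequent_items = {i for i, c in item_counts.items() if c >= min_support}

# Steps 3-4: count how often pairs of popular items appear together.
pair_counts = Counter(
    pair
    for g in guests
    for pair in combinations(sorted(frequent_items & g), 2)
)
popular_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(popular_pairs)  # pasta+pizza and cake+pasta each appear twice
```

FP-Growth reaches the same answer without enumerating every pair, which is what makes it scale to many items.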

Working of FP-Growth Algorithm

Let's jump into the usage of the FP-Growth Algorithm and how it works with real-life data. Consider the following data:

| Transaction ID | Items |
| --- | --- |
| T1 | {E,K,M,N,O,Y} |
| T2 | {D,E,K,N,O,Y} |
| T3 | {A,E,K,M} |
| T4 | {C,K,M,U,Y} |
| T5 | {C,E,I,K,O,O} |

The above-given data is a hypothetical dataset of transactions, with each letter representing an item. First, the frequency of each individual item is computed:

| Item | Frequency |
| --- | --- |
| A | 1 |
| C | 2 |
| D | 1 |
| E | 4 |
| I | 1 |
| K | 5 |
| M | 3 |
| N | 2 |
| O | 4 |
| U | 1 |
| Y | 3 |

Let the minimum support be 3. A Frequent Pattern set is built containing every item whose frequency is greater than or equal to the minimum support, sorted in descending order of frequency. After inserting the relevant items, the set L looks like this:

L = {K : 5, E : 4, O : 4, M : 3, Y : 3}

(Any fixed global order works, as long as every transaction is sorted by that same order; the worked example below keeps M before O, i.e. the order K, E, M, O, Y.)
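As a sketch in Python, the counting and filtering step looks like this (using the transactions from the table above; note that strict descending order puts O before M, whereas this article's worked example keeps M before O):

```python
from collections import Counter

# Transactions from the table above (T4 includes C and U,
# matching the frequency table).
transactions = [
    ["E", "K", "M", "N", "O", "Y"],
    ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"],
    ["C", "K", "M", "U", "Y"],
    ["C", "E", "I", "K", "O", "O"],
]
min_support = 3

# Every occurrence is counted, so the duplicate O in T5 contributes twice.
counts = Counter(item for t in transactions for item in t)
frequent = {i: c for i, c in counts.items() if c >= min_support}

# Sort by descending frequency; any fixed tie-break works as long as
# the same global order is applied to every transaction.
L = sorted(frequent, key=lambda i: -frequent[i])
print(frequent)  # {'E': 4, 'K': 5, 'M': 3, 'O': 4, 'Y': 3}
print(L)         # ['K', 'E', 'O', 'M', 'Y']
```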

Now the respective Ordered-Item set is built for each transaction. This is done by iterating over the Frequent Pattern set and checking whether the current item is contained in the transaction in question; if it is, the item is appended to the Ordered-Item set for that transaction. The following table is built for all the transactions:

| Transaction ID | Items | Ordered-Item Set |
| --- | --- | --- |
| T1 | {E,K,M,N,O,Y} | {K,E,M,O,Y} |
| T2 | {D,E,K,N,O,Y} | {K,E,O,Y} |
| T3 | {A,E,K,M} | {K,E,M} |
| T4 | {C,K,M,U,Y} | {K,M,Y} |
| T5 | {C,E,I,K,O,O} | {K,E,O} |
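Given the fixed global item order, building an Ordered-Item set is a simple filter. A minimal sketch:

```python
# The article's fixed global item order (most frequent first).
ORDER = ["K", "E", "M", "O", "Y"]

def ordered_item_set(transaction):
    """Keep only frequent items, arranged in the fixed global order."""
    return [item for item in ORDER if item in transaction]

print(ordered_item_set({"E", "K", "M", "N", "O", "Y"}))  # ['K', 'E', 'M', 'O', 'Y']
print(ordered_item_set({"C", "K", "M", "U", "Y"}))       # ['K', 'M', 'Y']
```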

Now all the Ordered-Item sets are inserted into a tree data structure, the FP-Tree.

a) Inserting the set {K, E, M, O, Y}
All the items are linked one after the other in their order of occurrence in the set, and the support count of each newly created node is initialised to 1. To insert {K, E, M, O, Y} we traverse the tree from the root: if a node for an item already exists on the current path, its support count is increased; if it does not, a new node is created for that item and linked to the previous item's node.

Inserting the set {K, E, M, O, Y}

b) Inserting the set {K, E, O, Y}
For the elements K and E, the existing nodes are reused and their support counts are increased by 1. On inserting O, there is no direct link between E and O, so a new node for the item O is initialised with a support count of 1 and linked as a child of E. On inserting Y, a new node for the item Y is likewise initialised with a support count of 1 and linked under the new O node.

Inserting the set {K, E, O, Y}

c) Inserting the set {K, E, M}
Here the path for K, E and M already exists, so the support count of each element is simply increased by 1.

Inserting the set {K, E, M}

d) Inserting the set {K, M, Y}
Similar to step b), first the support count of K is increased, then new nodes for M and Y are initialised with support count 1 and linked under K as a new branch.

Inserting the set {K, M, Y}


e) Inserting the set {K, E, O}
Here the support counts of the respective existing nodes are simply increased. Note that it is the O node created in step b) whose support count is increased.

Inserting the set {K, E, O}
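The insertion steps a) through e) can be sketched with a minimal node class. The FPNode name and layout here are illustrative; a production FP-Tree would also maintain node-links per item to speed up mining:

```python
class FPNode:
    """One node of the FP-Tree: an item, its support count and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

    def insert(self, items):
        """Insert one ordered-item list, reusing existing child nodes."""
        if not items:
            return
        head, rest = items[0], items[1:]
        if head not in self.children:
            self.children[head] = FPNode(head, parent=self)
        child = self.children[head]
        child.count += 1
        child.insert(rest)

# Insert the five Ordered-Item sets from the table above.
root = FPNode(None)
for t in [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
          ["K", "M", "Y"], ["K", "E", "O"]]:
    root.insert(t)

k = root.children["K"]
print(k.count)                              # 5
print(k.children["E"].count)                # 4
print(k.children["M"].count)                # 1 (the K -> M -> Y branch from T4)
print(k.children["E"].children["O"].count)  # 2
```

The counts match the figures: K sits on every path, E on four of them, and T4 contributes the separate K -> M -> Y branch.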

The Conditional Pattern Base for each item is the set of prefix paths in the FP-Tree that lead to that item, together with the count of each path. The items in the table below are arranged from least frequent to most frequent:

| Item | Conditional Pattern Base |
| --- | --- |
| Y | {{K,E,M,O : 1}, {K,E,O : 1}, {K,M : 1}} |
| O | {{K,E,M : 1}, {K,E : 2}} |
| M | {{K,E : 2}, {K : 1}} |
| E | {{K : 4}} |
| K | { } |
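A conditional pattern base can equivalently be read off the ordered transactions, since identical prefixes merge into a single FP-Tree path. A sketch under that assumption:

```python
from collections import Counter

# The Ordered-Item sets from the earlier table.
ordered = [
    ["K", "E", "M", "O", "Y"],
    ["K", "E", "O", "Y"],
    ["K", "E", "M"],
    ["K", "M", "Y"],
    ["K", "E", "O"],
]

def conditional_pattern_base(item):
    """Prefix paths preceding `item`, with the count of each path."""
    base = Counter()
    for t in ordered:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)

print(conditional_pattern_base("O"))
# {('K', 'E', 'M'): 1, ('K', 'E'): 2}
```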

Now for each item the Conditional Frequent Pattern Tree is built. It is obtained by taking the set of items common to all paths in that item's Conditional Pattern Base and computing each one's support count as the sum of the counts of all those paths:

| Item | Conditional Frequent Pattern Tree |
| --- | --- |
| Y | {K : 3} |
| O | {K,E : 3} |
| M | {K : 3} |
| E | {K : 4} |
| K | { } |
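Deriving a conditional tree from a conditional pattern base can be sketched as follows. In this example each conditional tree is a single chain, so a flat dict of supports is enough:

```python
def conditional_fp_tree(base, min_support=3):
    """Items common to all prefix paths, each with the summed path count."""
    if not base:
        return {}
    paths = list(base)
    common = set(paths[0])
    for p in paths[1:]:
        common &= set(p)
    total = sum(base.values())  # common items appear on every path
    return {item: total for item in common} if total >= min_support else {}

# Conditional pattern base of O from the table above.
base_O = {("K", "E", "M"): 1, ("K", "E"): 2}
print(conditional_fp_tree(base_O) == {"K": 3, "E": 3})  # True
```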

From the Conditional Frequent Pattern Tree, the frequent patterns are generated by pairing the items of each Conditional Frequent Pattern Tree with the corresponding item, as given in the table below:

| Item | Frequent Patterns Generated |
| --- | --- |
| Y | {K,Y : 3} |
| O | {K,O : 3}, {E,O : 3}, {K,E,O : 3} |
| M | {K,M : 3} |
| E | {K,E : 4} |

For each row, two types of association rules can be inferred. For example, the first row contains the pattern {K, Y}, so the rules K -> Y and Y -> K can be inferred. To determine which rule is valid, the confidence of both rules is calculated, and the one with confidence greater than or equal to the minimum confidence value is retained.
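Rule selection works directly on support counts; confidence(A -> B) = support(A ∪ B) / support(A). The min_confidence threshold below is hypothetical, chosen only to illustrate the filter:

```python
# Supports from the tables above: K appears in 5 transactions,
# Y in 3, and {K, Y} together in 3.
support = {("K",): 5, ("Y",): 3, ("K", "Y"): 3}

# confidence(A -> B) = support(A union B) / support(A)
conf_k_y = support[("K", "Y")] / support[("K",)]  # 3/5 = 0.6
conf_y_k = support[("K", "Y")] / support[("Y",)]  # 3/3 = 1.0

min_confidence = 0.8  # hypothetical threshold for illustration
rules = [("K -> Y", conf_k_y), ("Y -> K", conf_y_k)]
kept = [r for r, c in rules if c >= min_confidence]
print(kept)  # ['Y -> K']
```

Here Y -> K survives because every transaction containing Y also contains K, while K -> Y holds in only 3 of K's 5 transactions.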

The Frequent Pattern Growth (FP-Growth) algorithm improves upon the Apriori algorithm by eliminating candidate generation and repeated database scans. By compressing the data into an FP-Tree and mining ordered-item sets from it, FP-Growth finds frequent itemsets faster and scales better to large datasets, making it a useful tool for data mining.
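Putting the steps together, here is a compact, illustrative pattern-growth miner. It recurses on prefix paths directly instead of an explicit FP-Tree with node links, which yields the same patterns on this small example; the function names are my own, not a standard API:

```python
from collections import Counter

def fpgrowth(transactions, min_support):
    """Mine all frequent itemsets (pattern-growth on prefix paths)."""
    # Item frequencies (duplicates counted, as in the frequency table).
    counts = Counter(i for t in transactions for i in t)
    order = [i for i, c in counts.most_common() if c >= min_support]
    # Each transaction reduced to its frequent items in the global order;
    # identical ordered transactions merge, like shared FP-Tree paths.
    paths = Counter(tuple(i for i in order if i in t) for t in transactions)
    out = {}
    _mine(paths, (), min_support, out)
    return out

def _mine(paths, suffix, min_support, out):
    # Support of each item across the current prefix paths. Pattern
    # supports count transactions, so the duplicate O in T5 counts once.
    item_counts = Counter()
    for path, c in paths.items():
        for i in path:
            item_counts[i] += c
    for item, support in item_counts.items():
        if support < min_support:
            continue
        out[frozenset((item,) + suffix)] = support
        # Conditional pattern base for `item`: prefixes that precede it.
        cond = Counter()
        for path, c in paths.items():
            if item in path:
                prefix = path[: path.index(item)]
                if prefix:
                    cond[prefix] += c
        _mine(cond, (item,) + suffix, min_support, out)

transactions = [
    ["E", "K", "M", "N", "O", "Y"], ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"], ["C", "K", "M", "U", "Y"], ["C", "E", "I", "K", "O", "O"],
]
patterns = fpgrowth(transactions, 3)
print(patterns[frozenset({"K", "E", "O"})])  # 3
```

The result contains exactly the patterns from the table above: {K,Y : 3}, {K,O : 3}, {E,O : 3}, {K,E,O : 3}, {K,M : 3} and {K,E : 4}, plus the frequent single items.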

