Huffman Codes
Encoding messages
 Encode a message composed of a string of
characters
 Codes used by computer systems
 ASCII
• defines 128 characters using 7 bits each
• usually stored in one 8-bit byte per character;
8-bit extensions can encode 256 characters
 Unicode
• 16 bits per character in its original fixed-width
form (UCS-2)
• 16 bits can encode 65536 characters
• includes all characters encoded by ASCII
 ASCII and UCS-2 are fixed-length codes
 all characters represented by same number of bits
Problems
 Suppose that we want to encode a message
constructed from the symbols A, B, C, D, and E
using a fixed-length code
 How many bits are required to encode each
symbol?
 at least 3 bits are required
 2 bits are not enough (can only encode four
symbols)
 How many bits are required to encode the
message DEAACAAAAABA?
 there are twelve symbols, each requires 3 bits
 12*3 = 36 bits are required
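The arithmetic above can be checked with a short sketch. The helper name `fixed_length_bits` is an assumption introduced here for illustration; it is not part of the slides.

```python
def fixed_length_bits(num_symbols: int) -> int:
    """Smallest code width that gives every symbol a distinct fixed-length code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, with no floating point.
    return max(1, (num_symbols - 1).bit_length())

message = "DEAACAAAAABA"
alphabet = "ABCDE"

bits_per_symbol = fixed_length_bits(len(alphabet))
total_bits = bits_per_symbol * len(message)
print(bits_per_symbol, total_bits)  # 3 36
```

With four symbols the same helper returns 2, which matches the remark that 2 bits can only encode four symbols.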
Drawbacks of fixed-length codes
 Wasted space
 Unicode uses twice as much space as ASCII
• inefficient for plain-text messages containing
only ASCII characters
 Same number of bits used to represent all characters
 ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’
 Potential solution: use variable-length codes
 variable number of bits to represent characters
when frequency of occurrence is known
 short codes for characters that occur frequently
Advantages of variable-length codes
 Short codes can be given to characters that occur
frequently
 on average, the encoded message is shorter than
with a fixed-length code
 Potential problem: how do we know where one
character's code ends and the next begins?
• not a problem if number of bits is fixed!
A = 00
B = 01
C = 10
D = 11
0010110111001111111111
A C D B D A D D D D D
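A minimal sketch of why fixed-length codes need no separators: the decoder simply cuts the bit string into equal-width chunks. The function name `decode_fixed` is an assumption made for this example.

```python
def decode_fixed(bits: str, table: dict[str, str], width: int) -> str:
    """Decode a fixed-length code by slicing the input into width-sized chunks."""
    if len(bits) % width != 0:
        raise ValueError("bit string length is not a multiple of the code width")
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(table[chunk] for chunk in chunks)

# The 2-bit code from the slide, keyed by codeword.
codes = {"00": "A", "01": "B", "10": "C", "11": "D"}
print(decode_fixed("0010110111001111111111", codes, 2))  # ACDBDADDDDD
```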
Prefix property
 A code has the prefix property if no character's code
is a prefix (the start) of another character's code
 Example:
 000 is not a prefix of 11, 01, 001, or 10
 11 is not a prefix of 000, 01, 001, or 10 …
Symbol Code
P 000
Q 11
R 01
S 001
T 10
01001101100010
R S T Q P T
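With the prefix property, a decoder can read bits left to right and emit a symbol as soon as the bits collected so far match a codeword. The sketch below illustrates this with the table above; `decode_prefix` is a hypothetical helper name, not something defined in the slides.

```python
def decode_prefix(bits: str, code: dict[str, str]) -> str:
    """Greedy left-to-right decoding; correct only for prefix-free codes."""
    by_codeword = {codeword: symbol for symbol, codeword in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in by_codeword:      # a full codeword has been read
            decoded.append(by_codeword[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(decoded)

code = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
print(decode_prefix("01001101100010", code))  # RSTQPT
```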
Code without prefix property
 The following code does not have the prefix property
 The pattern 1110 can be decoded as QQQP, QQS, QTP,
TQP, or TS
Symbol Code
P 0
Q 1
R 01
S 10
T 11
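The ambiguity can be made concrete by listing every way to split a bit string into codewords. This is an illustrative sketch; `all_decodings` is an assumed name, and the recursive search is only practical for short inputs.

```python
def all_decodings(bits: str, code: dict[str, str]) -> list[str]:
    """Return every symbol sequence whose codewords concatenate to bits."""
    if not bits:
        return [""]
    results = []
    for symbol, codeword in code.items():
        if bits.startswith(codeword):
            rest = bits[len(codeword):]
            results += [symbol + tail for tail in all_decodings(rest, code)]
    return results

code = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}
print(sorted(all_decodings("1110", code)))
# ['QQQP', 'QQS', 'QTP', 'TQP', 'TS']
```

A code with the prefix property would yield at most one decoding for any input.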
Problem
 Design a variable-length prefix-free code such that
the message DEAACAAAAABA can be encoded
using 22 bits
 Possible solution:
 A occurs eight times while B, C, D, and E each
occur once
 represent A with a one bit code, say 0
• remaining codes cannot start with 0
 represent B with the two bit code 10
• remaining codes cannot start with 0 or 10
 represent C with 110
 represent D with 1110
 represent E with 11110
Encoded message
Symbol Code
A 0
B 10
C 110
D 1110
E 11110
DEAACAAAAABA
1110111100011000000100 22 bits
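As a quick check of this table, the sketch below verifies that the code is prefix-free and reproduces the 22-bit encoding. The helper names `is_prefix_free` and `encode` are assumptions made for illustration.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # In sorted order, a codeword that prefixes any other word also
    # prefixes the word right after it, so adjacent pairs are enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)

code = {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "11110"}
encoded = encode("DEAACAAAAABA", code)
print(is_prefix_free(code), encoded, len(encoded))
# True 1110111100011000000100 22
```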
Another possible code
Symbol Code
A 0
B 100
C 101
D 1101
E 1111
DEAACAAAAABA
1101111100101000001000 22 bits
Better code
Symbol Code
A 0
B 100
C 101
D 110
E 111
DEAACAAAAABA
11011100101000001000 20 bits
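The three candidate codes can be compared without writing out the bit strings, since the encoded length is the sum of frequency times code length over all symbols. The sketch below assumes a helper named `encoded_length`, which is not from the slides.

```python
from collections import Counter

def encoded_length(message: str, code: dict[str, str]) -> int:
    """Total bits needed: sum of (symbol frequency x codeword length)."""
    freq = Counter(message)
    return sum(count * len(code[symbol]) for symbol, count in freq.items())

message = "DEAACAAAAABA"
candidates = {
    "first":  {"A": "0", "B": "10",  "C": "110", "D": "1110", "E": "11110"},
    "second": {"A": "0", "B": "100", "C": "101", "D": "1101", "E": "1111"},
    "better": {"A": "0", "B": "100", "C": "101", "D": "110",  "E": "111"},
}
for name, code in candidates.items():
    print(name, encoded_length(message, code))
# first 22
# second 22
# better 20
```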
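What code to use?
 Question: Is there a variable-length code that makes
the most efficient use of space?
Answer: Yes! For known symbol frequencies, Huffman coding
gives the shortest encoded message of any prefix code
that uses a whole number of bits per symbol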
Huffman coding tree
 Binary tree
 each leaf contains a symbol (character)
 label edge from node to left child with 0
 label edge from node to right child with 1
 Code for any symbol obtained by following path from
root to the leaf containing symbol
 Code has prefix property
 a leaf node cannot appear on the path to another leaf
 note: a fixed-length code corresponds to a tree whose
leaves are all at the same depth, so it also has the
prefix property
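Reading codes off a tree can be sketched as a recursive walk that appends 0 for a left edge and 1 for a right edge. The nested-tuple tree representation and the name `codes_from_tree` are assumptions chosen for brevity; the example tree corresponds to the 20-bit code for A to E shown earlier.

```python
def codes_from_tree(node, prefix: str = "") -> dict[str, str]:
    """Leaf = symbol string; internal node = (left_subtree, right_subtree)."""
    if isinstance(node, str):
        # A tree with a single leaf still needs one bit per symbol.
        return {node: prefix or "0"}
    left, right = node
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes

tree = ("A", (("B", "C"), ("D", "E")))
print(codes_from_tree(tree))
# {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '111'}
```

Because symbols live only in leaves, no codeword produced this way can be the start of another.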
Building a Huffman tree
 Find the frequency of each symbol occurring in the
message
 Begin with a forest of single-node trees
 each contains a symbol and its frequency
 Repeat
 select the two trees with the smallest frequencies at
their roots
 produce a new binary tree with the selected trees
as children and store the sum of their frequencies
in the root
 Stop when there is one tree
 this is the Huffman coding tree
Example
 Build the Huffman coding tree for the message
This is his message
 Character frequencies
 Begin with forest of single trees
Symbol     A  G  M  T  E  H  _  I  S
Frequency  1  1  1  1  2  2  3  3  5
(_ denotes the space character)
[Figure: initial forest of nine single-node trees, one per symbol]
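The frequency count for this example can be reproduced with a few lines. This sketch assumes, as the slide's table suggests, that letter case is ignored and the space is written as `_`.

```python
from collections import Counter

message = "This is his message"
# Assumption: fold to upper case and show spaces as "_" to match the slide.
freq = Counter(message.upper().replace(" ", "_"))

for symbol, count in sorted(freq.items(), key=lambda item: (item[1], item[0])):
    print(symbol, count)
print("total", sum(freq.values()))
```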
Step 1
Merge the two smallest trees, A (1) and G (1), into a new tree with root 2.
Forest roots: [A G] 2, M 1, T 1, E 2, H 2, _ 3, I 3, S 5
Step 2
Merge M (1) and T (1) into a new tree with root 2.
Forest roots: [A G] 2, [M T] 2, E 2, H 2, _ 3, I 3, S 5
Step 3
Merge E (2) and H (2) into a new tree with root 4.
Forest roots: [A G] 2, [M T] 2, [E H] 4, _ 3, I 3, S 5
Step 4
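Merge the tree containing A and G (2) with the tree containing M and T (2) into a new tree with root 4.
Forest roots: [A G M T] 4, [E H] 4, _ 3, I 3, S 5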
Step 5
Merge _ (3) and I (3) into a new tree with root 6.
Forest roots: [A G M T] 4, [E H] 4, [_ I] 6, S 5
Step 6
Merge the two trees with root 4 (A G M T and E H) into a new tree with root 8.
Forest roots: [A G M T E H] 8, [_ I] 6, S 5
Step 7
Merge S (5) with the tree containing _ and I (6) into a new tree with root 11.
Forest roots: [A G M T E H] 8, [_ I S] 11
Step 8
Merge the two remaining trees (8 and 11) into a single tree with root 19.
Only one tree is left, so this is the Huffman coding tree.
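The merging procedure can be sketched with a priority queue. This is a minimal illustration under stated assumptions, not the deck's own implementation: the name `huffman_code`, the nested-tuple tree, and the tie-breaking counter are all choices made here. Equal frequencies may be merged in a different order than on the slides, so individual codewords can differ, but the total encoded length comes out the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(freq: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code for a non-empty table of single-character symbols."""
    tiebreak = count()  # unique sequence number so trees are never compared
    heap = [(f, next(tiebreak), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # smallest root frequency
        f2, _, right = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix or "0"
    walk(tree, "")
    return codes

message = "THIS_IS_HIS_MESSAGE"
freq = Counter(message)
codes = huffman_code(freq)
total_bits = sum(freq[s] * len(codes[s]) for s in freq)
print(total_bits)  # 56
```

The heap keeps the two smallest roots cheap to find, which is the only operation the procedure repeats.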
Label edges
Left edges are labelled 0 and right edges 1:
19
  0: 8
    0: 4
      0: 2
        0: A (1)
        1: G (1)
      1: 2
        0: M (1)
        1: T (1)
    1: 4
      0: E (2)
      1: H (2)
  1: 11
    0: 6
      0: _ (3)
      1: I (3)
    1: S (5)
Huffman code & encoded message
S 11
E 010
H 011
_ 100
I 101
A 0000
G 0001
M 0010
T 0011
This is his message
00110111011110010111100011101111000010010111100000001010
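Decoding can follow the tree directly: start at the root, take one edge per bit, and emit a symbol on reaching a leaf. The sketch below rebuilds the tree from the code table as nested dictionaries; `build_tree` and `decode_with_tree` are hypothetical names introduced for this example.

```python
def build_tree(code: dict[str, str]) -> dict:
    """Rebuild the code tree: inner nodes are dicts keyed by '0'/'1', leaves are symbols."""
    root: dict = {}
    for symbol, codeword in code.items():
        node = root
        for bit in codeword[:-1]:
            node = node.setdefault(bit, {})
        node[codeword[-1]] = symbol
    return root

def decode_with_tree(bits: str, root: dict) -> str:
    decoded, node = [], root
    for bit in bits:
        node = node[bit]              # follow the edge labelled with this bit
        if isinstance(node, str):     # reached a leaf
            decoded.append(node)
            node = root
    if node is not root:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(decoded)

code = {"S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
        "A": "0000", "G": "0001", "M": "0010", "T": "0011"}
bits = "00110111011110010111100011101111000010010111100000001010"
print(decode_with_tree(bits, build_tree(code)))  # THIS_IS_HIS_MESSAGE
print(len(bits))  # 56
```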
Average Code Length
The average code length of a Huffman code is the frequency-weighted
mean of the code lengths:
Average Code Length = ∑ ( frequency × code length ) / ∑ ( frequency )
Message: This is his message
Symbol  Frequency (F)  Code  Code Length (CL)  Total (F*CL)
S       5              11    2                 10
E       2              010   3                 6
H       2              011   3                 6
_       3              100   3                 9
I       3              101   3                 9
A       1              0000  4                 4
G       1              0001  4                 4
M       1              0010  4                 4
T       1              0011  4                 4
Total   19                                     56
Average = 56/19
= 2.94737 bits per symbol
Length of the encoded string: 56 bits (the F*CL total), compared
with 19 × 8 = 152 bits at 8 bits per character
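The average can be recomputed from the frequencies and codes in a few lines. The comparison with a fixed-length code over the same nine symbols is an extra step added here for context; it does not appear in the slides.

```python
# Symbol -> (frequency, codeword), copied from the table.
table = {
    "S": (5, "11"),   "E": (2, "010"),  "H": (2, "011"),
    "_": (3, "100"),  "I": (3, "101"),  "A": (1, "0000"),
    "G": (1, "0001"), "M": (1, "0010"), "T": (1, "0011"),
}

total_freq = sum(f for f, _ in table.values())
total_bits = sum(f * len(codeword) for f, codeword in table.values())
average = total_bits / total_freq
print(total_freq, total_bits, round(average, 5))  # 19 56 2.94737

# Assumption for comparison: a fixed-length code over the same 9 symbols.
fixed_width = (len(table) - 1).bit_length()
print(fixed_width * total_freq)  # 76
```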

