CSE 40373/60373: Multimedia Systems
Lossless Compression
Compression: the process of coding that effectively reduces the total number of
bits needed to represent certain information.
If the compression and decompression processes
induce no information loss, then the compression
scheme is lossless; otherwise, it is lossy.
Compression ratio = B0 / B1, where B0 is the number of bits before compression and
B1 the number of bits after. For example, compressing 80,000 bits down to 20,000
bits gives a compression ratio of 4.
Shannon’s theory
The entropy η of an information source with alphabet S = {s1, s2, . . . , sn} is

    η = Σ_{i=1}^{n} pi log2(1/pi) = − Σ_{i=1}^{n} pi log2 pi        (7.3)

where pi is the probability that symbol si will occur in S.
Compression is not possible for case (a) of the figure (not reproduced here)
because its entropy is 8, so 8 bits are needed per value.
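As an illustration (not part of the original slides), a few lines of Python compute η for a given distribution; the distributions below are made up for the example.

    import math

    def entropy(probs):
        """Shannon entropy in bits: eta = -sum(p_i * log2(p_i))."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A uniform distribution over 256 values has entropy 8, so 8 bits per value
    # are needed and no lossless compression is possible.
    print(entropy([1 / 256] * 256))                 # 8.0
    # A skewed distribution has lower entropy, leaving room for compression.
    print(entropy([0.5, 0.25, 0.125, 0.125]))       # 1.75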
Run length coding
Memoryless source: an information source that is independently distributed, i.e.,
the value of the current symbol does not depend on the values of previously
appearing symbols.
Rationale for RLC: if the information source has the property that symbols tend to
form continuous groups (runs), then each symbol and the length of its run can be
coded together, as sketched below.
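A minimal run-length coding sketch in Python (my own illustration; real RLC formats pack the runs into bytes or bits rather than tuples):

    def rle_encode(seq):
        """Collapse a sequence into (symbol, run_length) pairs."""
        runs = []
        for s in seq:
            if runs and runs[-1][0] == s:
                runs[-1] = (s, runs[-1][1] + 1)   # extend the current run
            else:
                runs.append((s, 1))               # start a new run
        return runs

    def rle_decode(runs):
        return "".join(s * n for s, n in runs)

    print(rle_encode("AAAABBBCCD"))   # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
    print(rle_decode(rle_encode("AAAABBBCCD")) == "AAAABBBCCD")   # True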
Variable length codes
Different code lengths for different symbols
 Use occurrence frequency to choose the lengths
 An example: frequency count of the symbols in "HELLO":

Symbol   H   E   L   O
Count    1   1   2   1
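For instance (illustration only), Python's collections.Counter produces exactly this frequency count:

    from collections import Counter

    print(Counter("HELLO"))   # L occurs twice; H, E, and O once each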
Huffman Coding
Initialization: Put all symbols on a list sorted according
to their frequency counts
Repeat until the list has only one symbol left:
 From the list pick two symbols with the lowest frequency
counts. Form a Huffman sub-tree that has these two symbols
as child nodes and create a parent node
 Assign the sum of the children’s frequency counts to the parent
and insert it into the list such that the order is maintained
 Delete the children from the list
Assign a codeword for each leaf based on the path
from the root.
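A compact sketch of this procedure, using a heap as the sorted list (illustrative code, not from the slides; the helper name huffman_table and the tie-breaking rule are my own choices, so the exact codewords may differ from Fig. 7.5 while the code lengths remain optimal):

    import heapq
    from collections import Counter

    def huffman_table(counts):
        """Build a Huffman code table from a {symbol: count} mapping."""
        # Heap entries are (count, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
        heap = [(c, i, sym) for i, (sym, c) in enumerate(sorted(counts.items()))]
        heapq.heapify(heap)
        tick = len(heap)
        while len(heap) > 1:
            c1, _, t1 = heapq.heappop(heap)                    # two lowest-count nodes...
            c2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (c1 + c2, tick, (t1, t2)))    # ...become children of a new parent
            tick += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"                    # single-symbol edge case
        walk(heap[0][2], "")
        return codes

    print(huffman_table(Counter("HELLO")))
    # -> {'E': '00', 'H': '01', 'O': '10', 'L': '11'} with this tie-breaking: 10 bits in
    #    total for "HELLO", the same optimal total as the textbook tree that gives L a
    #    1-bit code.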
 Fig. 7.5: Coding Tree for “HELLO” using the Huffman Algorithm.
Huffman Coding (cont’d)
New symbols P1, P2, and P3 are created to refer to the parent nodes in the
Huffman coding tree.
 After initialization: L H E O
 After iteration (a): L P1 H
 After iteration (b): L P2
 After iteration (c): P3
Properties of Huffman Coding
1. Unique Prefix Property: No Huffman code is a
prefix of any other Huffman code - precludes any
ambiguity in decoding
2. Optimality: minimum redundancy code - proved
optimal for a given data model (i.e., a given,
accurate, probability distribution):
 The two least frequent symbols will have the same length
for their Huffman codes, differing only at the last bit
 Symbols that occur more frequently will have shorter
Huffman codes than symbols that occur less frequently
 The average code length for an information source S is
strictly less than η + 1
Adaptive Huffman Coding
Extended Huffman coding (covered in the book) groups symbols together
Adaptive Huffman: statistics are gathered and
updated dynamically as the data stream arrives
ENCODER
-------
Initial_code();
while not EOF
{
    get(c);
    encode(c);
    update_tree(c);
}

DECODER
-------
Initial_code();
while not EOF
{
    decode(c);
    output(c);
    update_tree(c);
}
Adaptive Huffman Coding (Cont’d)
Initial_code assigns each symbol an initially agreed-upon code, without any
prior knowledge of the frequency counts.
update_tree constructs an Adaptive Huffman tree.
It basically does two things:
 increments the frequency counts for the symbols
(including any new ones)
 updates the configuration of the tree.
The encoder and decoder must use exactly the
same initial_code and update_tree routines
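A deliberately simplified sketch of this encoder/decoder symmetry (my own illustration: it reuses the huffman_table() helper from the Huffman Coding sketch above, assumes a fixed alphabet with initial counts of 1 so no NEW/escape codes are needed, and simply rebuilds the whole table from the running counts after every symbol instead of performing the efficient incremental tree updates described on the next slide):

    def adaptive_encode(text, alphabet):
        counts = {s: 1 for s in sorted(alphabet)}    # initial_code(): agreed-upon starting counts
        bits = []
        for c in text:
            bits.append(huffman_table(counts)[c])    # encode(c) with the current table
            counts[c] += 1                           # update the statistics (update_tree)
        return "".join(bits)

    def adaptive_decode(bits, alphabet):
        counts = {s: 1 for s in sorted(alphabet)}
        inverse = {v: k for k, v in huffman_table(counts).items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inverse:                       # the code is prefix-free, so the first match wins
                sym = inverse[buf]
                out.append(sym)
                counts[sym] += 1                     # identical update keeps the decoder in sync
                inverse = {v: k for k, v in huffman_table(counts).items()}
                buf = ""
        return "".join(out)

    msg = "AADCCDD"
    enc = adaptive_encode(msg, "ABCD")
    print(enc, adaptive_decode(enc, "ABCD") == msg)  # round-trips successfully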
Notes on Adaptive Huffman Tree Updating
The tree must always maintain its sibling property,
i.e., all nodes (internal and leaf) are arranged in the
order of increasing counts
If the sibling property is about to be violated, a
swap procedure is invoked to update the tree by
rearranging the nodes
When a swap is necessary, the farthest node with count N is swapped with the
node whose count has just been increased to N + 1
 Fig. 7.6: Node Swapping for Updating an Adaptive Huffman Tree
Another Example: Adaptive Huffman
Coding for AADCCDD
Table 7.4 Sequence of symbols and codes sent to
the decoder
• It is important to emphasize that the
code for a particular symbol changes
during the adaptive Huffman coding
process. For example, after
AADCCDD, when the character D
overtakes A as the most frequent
symbol, its code changes from 101 to 0
Symbol   NEW   A       A   NEW   D       NEW   C       C     D     D
Code     0     00001   1   0     00100   00    00011   001   101   101
7.5 Dictionary-based Coding
LZW uses fixed-length code words to represent
variable-length strings of symbols/characters that
commonly occur together, e.g., words in English
text
 LZW encoder and decoder build up the same dictionary
dynamically while receiving the data
LZW places longer and longer repeated entries into
a dictionary, and then emits the code for an
element, rather than the string itself, if the element
has already been placed in the dictionary
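A sketch of LZW compression matching the example on the next slide (illustrative code; the three-symbol initial dictionary and the unbounded dictionary growth are simplifications):

    def lzw_encode(text, alphabet=("A", "B", "C")):
        """LZW encoding; the initial dictionary maps single symbols to codes 1, 2, 3, ..."""
        dictionary = {s: i + 1 for i, s in enumerate(alphabet)}
        s, out = "", []
        for c in text:
            if s + c in dictionary:
                s = s + c                                  # keep extending the current string
            else:
                out.append(dictionary[s])                  # emit the code for the longest known prefix
                dictionary[s + c] = len(dictionary) + 1    # add the new string to the dictionary
                s = c
        out.append(dictionary[s])                          # flush the final string
        return out

    print(lzw_encode("ABABBABCABABBA"))   # [1, 2, 4, 5, 2, 3, 4, 6, 1]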
LZW compression for string
“ABABBABCABABBA”
The output codes are: 1 2 4 5 2 3 4 6 1. Instead of
sending 14 characters, only 9 codes need to be
sent (compression ratio = 14/9 = 1.56).
S      C     Output   Code   String
                      1      A
                      2      B
                      3      C
A      B     1        4      AB
B      A     2        5      BA
A      B
AB     B     4        6      ABB
B      A
BA     B     5        7      BAB
B      C     2        8      BC
C      A     3        9      CA
A      B
AB     A     4        10     ABA
A      B
AB     B
ABB    A     6        11     ABBA
A      EOF   1
LZW decompression (1 2 4 5 2 3 4 6 1)
S      K     Entry/output   Code   String
                            1      A
                            2      B
                            3      C
NIL    1     A
A      2     B              4      AB
B      4     AB             5      BA
AB     5     BA             6      ABB
BA     2     B              7      BAB
B      3     C              8      BC
C      4     AB             9      CA
AB     6     ABB            10     ABA
ABB    1     A              11     ABBA
A      EOF

Decoded output: ABABBABCABABBA
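The matching decoder sketch (again illustrative); it rebuilds the same dictionary one step behind the encoder:

    def lzw_decode(codes, alphabet=("A", "B", "C")):
        """LZW decoding for codes produced by lzw_encode() above."""
        dictionary = {i + 1: s for i, s in enumerate(alphabet)}
        s = dictionary[codes[0]]
        out = [s]
        for k in codes[1:]:
            entry = dictionary[k] if k in dictionary else s + s[0]   # code-not-yet-known corner case
            out.append(entry)
            dictionary[len(dictionary) + 1] = s + entry[0]           # the same entry the encoder added
            s = entry
        return "".join(out)

    print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1]))   # ABABBABCABABBA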
LZW Coding (cont’d)
In real applications, the code length l is kept in the range [l0, lmax]. The
dictionary initially has a size of 2^l0. When it fills up, the code length is
increased by 1; this repeats until l = lmax.
When lmax is reached and the dictionary is full, it must either be flushed (as
in Unix compress) or have its LRU (least recently used) entries removed.
7.6 Arithmetic Coding
Arithmetic coding is a more modern coding method that usually outperforms
Huffman coding.
Huffman coding assigns each symbol a codeword with an integral bit length;
arithmetic coding can treat the whole message as one unit, as sketched below.
 More details in the book
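A toy floating-point sketch of the idea (my own illustration; the probability model is made up, and practical arithmetic coders use integer arithmetic with renormalization and an explicit bit stream rather than a single Python float):

    def build_ranges(probs):
        """Cumulative interval [lo, hi) for each symbol."""
        ranges, cum = {}, 0.0
        for s, p in probs.items():
            ranges[s] = (cum, cum + p)
            cum += p
        return ranges

    def arith_encode(msg, probs):
        low, high = 0.0, 1.0
        for s in msg:
            lo, hi = build_ranges(probs)[s]
            low, high = low + (high - low) * lo, low + (high - low) * hi   # narrow the interval
        return (low + high) / 2            # any number inside the final interval encodes msg

    def arith_decode(x, n, probs):
        out = []
        for _ in range(n):
            for s, (lo, hi) in build_ranges(probs).items():
                if lo <= x < hi:
                    out.append(s)
                    x = (x - lo) / (hi - lo)   # rescale and decode the next symbol
                    break
        return "".join(out)

    probs = {"A": 0.5, "B": 0.25, "C": 0.25}
    x = arith_encode("ABCAA", probs)
    print(x, arith_decode(x, 5, probs))        # 0.34765625 ABCAA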
7.7 Lossless Image Compression
Due to spatial redundancy in a normal image I, the difference image d will have
a narrower histogram and hence a smaller entropy, as illustrated below.
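A small NumPy illustration of this effect (my own example image, a smooth horizontal gradient; the predictor here is simply the pixel to the left):

    import numpy as np

    def difference_image(img):
        """d[:, 0] = I[:, 0]; d[:, x] = I[:, x] - I[:, x-1] (left-neighbour predictor)."""
        d = img.astype(np.int16).copy()
        d[:, 1:] -= img.astype(np.int16)[:, :-1]
        return d

    def entropy_bits(values):
        _, counts = np.unique(values, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    # Neighbouring pixels of the gradient are similar, so the difference image
    # has a much narrower histogram and far lower entropy.
    img = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
    print(entropy_bits(img), entropy_bits(difference_image(img)))   # 8.0 vs. about 0.04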
Lossless JPEG
A special case of JPEG image compression
The predictive method:
 1. Forming a differential prediction: a predictor combines the values of up to
three neighboring pixels (A = left of the current pixel X, B = above, C = above-left)
as the predicted value for X, using one of the predictors below.
 Predictor   Prediction
   P1        A
   P2        B
   P3        C
   P4        A + B – C
   P5        A + (B – C) / 2
   P6        B + (A – C) / 2
   P7        (A + B) / 2
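A direct transcription of the predictor table into Python (illustrative; integer arithmetic is assumed, and the values of A, B, C, X below are made-up samples):

    def predict(A, B, C, mode):
        """The seven lossless-JPEG predictors P1..P7 from the table above."""
        return {
            1: A,
            2: B,
            3: C,
            4: A + B - C,
            5: A + (B - C) // 2,
            6: B + (A - C) // 2,
            7: (A + B) // 2,
        }[mode]

    # A = left, B = above, C = above-left neighbour of the current pixel X.
    A, B, C, X = 100, 104, 102, 103
    for mode in range(1, 8):
        print("P%d residual:" % mode, X - predict(A, B, C, mode))   # the residual is what gets entropy-coded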
2. Encoding: The encoder compares the
prediction with the actual pixel value at the position
‘X’ and encodes the difference using Huffman
coding
Performance: generally poor; even the best cases give compression ratios of only
about 2-3.

Compression ratios on four test images:

Compression program       Lena   Football   F-18   Flowers
Lossless JPEG             1.45   1.54       2.29   1.26
Optimal lossless JPEG     1.49   1.67       2.71   1.33
compress (LZW)            0.86   1.24       2.21   0.87
gzip (LZ77)               1.08   1.36       3.10   1.05
gzip -9 (optimal LZ77)    1.08   1.36       3.13   1.05
pack (Huffman coding)     1.02   1.12       1.19   1.00