HARAMAYA UNIVERSITY
HARAMAYA INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING

Course Coordinator: Dr. Mulugeta Atlabachew (Asst. Professor), Guest Lecturer
Introduction to Source Coding
Shannon’s 1st Source Coding Theorem
 Shannon showed that:
“To reliably store the information generated by some random source X, you need, on average, no more and no less than H(X) bits for each outcome.”
 If I toss a die 1,000,000 times and record the value from each trial: 1, 3, 4, 6, 2, 5, 2, 4, 5, 2, 4, 5, 6, 1, …
 In principle, I need 3 bits to store each outcome, since 3 bits can represent up to 8 values. So I need 3,000,000 bits to store the information.
 Using an ASCII representation, a computer needs 8 bits = 1 byte to store each outcome
 The resulting file then has size 8,000,000 bits
 In fact, you only need H(X) = log2 6 ≈ 2.585 bits to store each outcome, since the die has 6 equally likely outcomes.
 So, the file can be compressed to yield size
2.585x1,000,000=2,585,000 bits
 Optimal Compression Ratio is:
2,585,000 / 8,000,000 = 0.3231 = 32.31%
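As a quick sanity check, these numbers can be reproduced with a few lines of Python (H(X) = log2 6 for a fair six-sided die):

import math

n = 1_000_000                     # number of recorded die tosses
H = math.log2(6)                  # entropy of a fair die, about 2.585 bits/outcome

ascii_bits = 8 * n                # 8 bits (1 ASCII byte) per outcome
optimal_bits = H * n              # Shannon's lower bound on total storage

print(f"optimal size = {optimal_bits:,.0f} bits")              # about 2,585,000
print(f"compression ratio = {optimal_bits / ascii_bits:.4f}")  # 0.3231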
Types of Coding
 Source Coding - Code data to more efficiently represent the
information
 Reduces “size” of data
 Analog - Encode analog source data into a binary format
 Digital - Reduce the “size” of digital source data
 Channel Coding - Code data for transmission over a noisy
communication channel
 Increases “size” of data
 Digital - add redundancy to identify and correct errors
 Analog - represent digital values by analog signals
Types of Source Coding
 Lossless coding (entropy coding)
 Data can be decoded to form exactly the same bits
 Used in “zip”
 Can only achieve moderate compression (e.g. 2:1 - 3:1) for natural images
 Can be important in certain applications such as medical imaging
 Lossy source coding
 Decompressed image is visually similar, but has been changed
 Used in “JPEG” and “MPEG”
 Can achieve much greater compression (e.g. 20:1 -40:1) for natural images
 Uses entropy coding
Lossless Coding
 Lossless compression allows the original data to be
perfectly reconstructed from the compressed data.
 By operation of the pigeonhole principle, no lossless
compression algorithm can efficiently compress all possible
data.
 Lossless data compression is used in many applications. For example,
 It is used in the ZIP file format and in the GNU tool gzip.
 It is also used as a component within lossy data compression technologies
(e.g. lossless mid/side joint stereo preprocessing by MP3 encoders and
other lossy audio encoders).
 Typical candidates for lossless compression include executable programs, text documents, and source code.
 Some image file formats, like PNG or GIF, use only lossless compression,
while others like TIFF and MNG may use either lossless or lossy methods.
 Lossless audio formats are most often used for archiving or production
purposes, while smaller lossy audio files are typically used on portable
players and in other cases where storage space is limited or exact
replication of the audio is unnecessary.
 Most lossless compression programs do two things in
sequence:
 the first step generates a statistical model for the input
data, and
 the second step uses this model to map input data to bit
sequences in such a way that "probable" (e.g. frequently
encountered) data will produce shorter output than
"improbable" data.
 The primary encoding algorithms used to produce bit
sequences are Huffman coding (also used by the deflate
algorithm) and arithmetic coding.
 Arithmetic coding achieves compression rates close to the
best possible for a particular statistical model, which is
given by the information entropy, whereas Huffman
compression is simpler and faster but produces poor
results for models that deal with symbol probabilities close
to 1.
Adaptive models
 Adaptive models dynamically update the model as the data is
compressed.
 Both the encoder and decoder begin with a trivial model, yielding poor
compression of initial data, but as they learn more about the data,
performance improves.
 Most popular types of compression used in practice now use adaptive
coders.
Shannon's Source Coding Theorem
 Assume a set of symbols (26 English letters and some additional
symbols such as space, period, etc.) is to be transmitted through the
communication channel.
These symbols can be treated as independent samples of a random
variable X with probability P(X) and entropy H(X) = −Σ P(x) log2 P(x)
 The length of the code for a symbol x with probability P(x) can be its
surprise, log2(1/P(x))
 Let L be the average number of bits to encode the N symbols. Shannon
proved that the minimum L satisfies H(X) ≤ L < H(X) + 1
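A quick numerical check of this bound, using a hypothetical four-symbol distribution and rounding each surprise up to a whole number of bits:

import math

P = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}       # hypothetical P(X)

H = -sum(p * math.log2(p) for p in P.values())          # entropy H(X)
lengths = {x: math.ceil(math.log2(1 / p)) for x, p in P.items()}   # surprises
L = sum(P[x] * lengths[x] for x in P)                   # average bits/symbol

print(H, L)        # 1.75 1.75 -- the bound H(X) <= L < H(X) + 1 holds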
Huffman Coding
 A Huffman code is a particular type of optimal prefix code that is
commonly used for lossless data compression.
 Optimum prefix code developed by D. Huffman in a class assignment
 The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol.
 The algorithm derives this table from the estimated probability or
frequency of occurrence (weight) for each possible value of the
source symbol.
 Huffman coding is not always optimal among all compression methods
 It is replaced with arithmetic coding or asymmetric numeral systems when a better compression ratio is required
 Two Requirements for Optimum Prefix Codes
 Symbols that occur more frequently get codewords no longer than those of less frequent symbols
 The two least likely symbols have codewords of the same length that differ only in the last bit
 These two requirements lead to a simple way of building a binary tree
describing an optimum prefix code - THE Huffman Code.
 Build it from bottom up, starting with the two least likely symbols
 The external nodes correspond to the symbols
 The internal nodes correspond to “super symbols” in a “reduced” alphabet
Huffman Code-Design Steps
1. Label each node with one of the source symbol probabilities
2. Merge the nodes labeled by the two smallest probabilities into a parent
node
3. Label the parent node with the sum of the two children’s probabilities
 This parent node is now considered to be a “super symbol” (it replaces its two
children symbols) in a reduced alphabet
4. Among the elements in the reduced alphabet, merge the two with the smallest probabilities.
 If there is more than one such pair, choose the pair that has the “lowest order
super symbol” (this assures the minimum-variance Huffman code)
5. Label the parent node with the sum of the two children probabilities.
6. Repeat steps 4 & 5 until only a single super symbol remains
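A minimal Python sketch of these steps using a heap; equal probabilities are tie-broken by insertion order here, which is one simple stand-in for the “lowest order super symbol” rule:

import heapq
import itertools

def huffman_code(probs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    order = itertools.count()
    heap = [(p, next(order), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)      # the two least likely (super) symbols
        p2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}   # codewords differ in one bit
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (p1 + p2, next(order), merged))  # new "super symbol"
    return heap[0][2]

print(huffman_code({"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}))
# a, b, c get 2-bit codewords; d and e get 3 bits (average 2.2 bits/symbol)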
Huffman Code-Examples
(figure-only slides: worked Huffman tree constructions)

Minimum Variance-Huffman Code
(figure-only slide)
Extended Huffman Coding
(figure-only slides)

Performance of Extended Huffman Coding
(figure-only slides)

ADAPTIVE HUFFMAN CODING
 Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding.
 It permits building the code as the symbols are being transmitted, with no initial knowledge of the source distribution, which allows one-pass encoding and adaptation to changing conditions in the data.
 The benefit of the one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since a single loss can ruin the whole code.
 One pass
 During the pass calculate the frequencies
 Update the Huffman tree accordingly
 Coder – new Huffman tree computed after transmitting the
symbol
 Decoder – new Huffman tree computed after receiving the
symbol
 Symbol set and their initial codes must be known ahead of
time.
 Need an NYT (not-yet-transmitted) symbol to indicate that a new leaf is needed in the tree.
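The FGK/Vitter tree-update rules are too long for a slide-sized sketch, but the adapt-as-you-go idea can be illustrated with a toy coder that rebuilds the Huffman table from running counts after every symbol (reusing the huffman_code sketch above); the decoder stays in sync by running the same loop:

def adaptive_encode(data, alphabet):
    counts = {s: 1 for s in alphabet}    # trivial initial model, known to both sides
    bits = []
    for sym in data:
        total = sum(counts.values())
        table = huffman_code({s: c / total for s, c in counts.items()})
        bits.append(table[sym])          # encode with the current table...
        counts[sym] += 1                 # ...then update the model
    return "".join(bits)

print(adaptive_encode("aaabaaacaa", "abc"))

Real adaptive Huffman coders update the tree incrementally instead of rebuilding it, and use the NYT node to add leaves for symbols not yet seen.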
(figure-only slides: step-by-step adaptive Huffman coding example)

Huffman Coding Vs Arithmetic Coding
Huffman Coding
 Replacing an input symbol with a codeword
 Need a probability distribution
 Hard to adapt to changing statistics
 Need to store the codeword table
 Minimum codeword length is 1 bit
Arithmetic Coding
 Replace the entire input with a single floating-point number
 Does not need the probability distribution in advance
 Adaptive coding is very easy
 No need to keep and send codeword table
 Fractional codeword length
Arithmetic Coding
 Recall table look-up decoding of Huffman code
 N: alphabet size
 L: max codeword length
 Divide [0, 2^L] into N intervals
 One interval for one symbol
 Interval size is roughly proportional to symbol probability
 Arithmetic coding applies this idea recursively
 Normalizes the range [0, 2^L] to [0, 1)
 Map an input sequence (multiple symbols) to a unique tag
in [0, 1)
 Disjoint and complete partition of the range [0, 1)
 Each interval corresponds to one symbol
 Interval size is proportional to symbol probability
 The first symbol restricts the tag position to be in one of
the intervals
 The reduced interval is partitioned recursively as more
symbols are processed.
(figure-only slide)

Arithmetic Coding-Example
 Symbol set and prob: a (0.8), b(0.02), c(0.18)
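A minimal floating-point sketch of the recursive interval subdivision for these probabilities; any tag inside the returned interval identifies the whole sequence:

probs = [("a", 0.8), ("b", 0.02), ("c", 0.18)]     # (symbol, probability)

def encode_interval(seq):
    low, high = 0.0, 1.0
    for sym in seq:
        width = high - low
        cum = 0.0
        for s, p in probs:          # locate sym's slice of the current interval
            if s == sym:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    return low, high

print(encode_interval("aca"))       # about (0.656, 0.7712)

A practical coder avoids this direct floating-point form, whose precision runs out after a few symbols, by rescaling integer intervals as it goes.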
(figure-only slides: worked encoding example for this symbol set)

Arithmetic Decoding-Example
(figure-only slides, including a simplified floating-point variant)

Binary Arithmetic Decoding
 Arithmetic coding is slow in general:
 To decode a symbol, we need a series of decisions and multiplications.
 The complexity is greatly reduced if we have only two symbols: 0 and 1.
 Only two intervals: [0, x), [x, 1)
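A sketch of that reduced loop, assuming a known tag, a fixed probability x = P(0), and floating-point arithmetic; each symbol costs one multiplication and one comparison:

def decode_binary(tag, n_bits, x=0.6):
    low, high = 0.0, 1.0
    out = []
    for _ in range(n_bits):
        split = low + x * (high - low)   # boundary between the 0- and 1-intervals
        if tag < split:                  # tag in [low, split): the next bit is 0
            out.append(0)
            high = split
        else:                            # tag in [split, high): the next bit is 1
            out.append(1)
            low = split
    return out

print(decode_binary(0.47, 4))            # [0, 1, 0, 1]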
Reading Assignment
 Investigate the latest developments in source coding technologies.