Chapter 8
Information Theory and Coding
What is information?
• You are planning to go to Biratnagar during your summer
vacation. You call your friend in Biratnagar to ask about the
weather there. Assume that you could have received any of the
following messages about the summer weather in Biratnagar:
– It is sunny and hot
– It is cold
– Snow is falling.
• Which statement contains the most information?
• The information content of a message is closely related
to the recipient's prior knowledge of the event and the
level of uncertainty the message resolves for the
recipient of the message.
• The amount of information received from the knowledge
of occurrence of an event is related to the probability or
the likelihood of occurrence of the event.
• A message about an event that is less likely to occur contains
more information.
• Let m1, m2, …, mq be the q possible messages
emitted by a source with probabilities of occurrence P1, P2,
…, Pq such that
P1 + P2 + … + Pq = 1
• Let I(mk) be the amount of information contained in the k-th
message. Then for I(mk) to represent the information content of
message mk, intuition says the following conditions should
be met:
A) I(mk) > I(mj), if Pk < Pj
B) I(mk) → 0 as Pk → 1
C) I(mk) → ∞ as Pk → 0
D) I(mk) ≥ 0 when 0 ≤ Pk ≤ 1
E) I(mk and mj) ≡ I(mk mj) = I(mk) + I(mj) for independent
messages mk and mj
• To satisfy all the conditions mentioned above, we can relate
I(mk) and Pk in the following manner (logarithm base 2 when
information is measured in bits):
I(mk) = log2(1/Pk) = -log2(Pk)
• If two binary digits 1 and 0 occur with equal probability and
are correctly detected at the receiving end, then the
information content in each digit is 1 bit.
I(0 or 1) = -log2(1/2) = 1 bit
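The relation is easy to check numerically; a minimal Python sketch (the probabilities below are illustrative):

import math

def self_information(p):
    # I(m) = -log2(p): information content, in bits, of a message with probability p
    return -math.log2(p)

print(self_information(0.5))    # 1.0 bit: the equiprobable binary digit above
print(self_information(0.01))   # ~6.64 bits: a rarer message carries more information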
Entropy
• Entropy: the average information content of a
sequence of symbols emitted by a source.
H = Σ (i = 1 to n) Pi log2(1/Pi) = -Σ (i = 1 to n) Pi log2(Pi) bits/symbol
[Figure: Entropy H (bits) versus probability p for a binary source; H peaks at 1.0 bit at p = 0.5 and falls to 0 at p = 0 and p = 1.]
If the symbol rate is Rs symbols/sec, then the information rate is
Rinf = Rs × H bits/sec
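Both formulas can be sketched in a few lines of Python (the symbol rate below is an assumed example):

import math

def entropy(probs):
    # H = -sum(Pi * log2(Pi)), in bits per symbol; terms with Pi = 0 contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

H = entropy([0.5, 0.5])   # equiprobable binary source -> 1.0 bit/symbol
Rs = 1000                 # assumed symbol rate, symbols/sec
print(H, Rs * H)          # information rate Rinf = Rs * H = 1000 bits/sec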
Example:
Average Information Content in English Language
Calculate the average information in bits/character in English
assuming each letter is equally likely
H = -Σ (i = 1 to 26) (1/26) log2(1/26) = log2(26) ≈ 4.7 bits/char
Real-world English
But English characters do not appear with equal frequency in
real-world English text; the probabilities of each
character are assumed to be:
• Pr{a}=Pr{e}=Pr{o}=Pr{t}=0.10
• Pr{h}=Pr{i}=Pr{n}=Pr{r}=Pr{s}=0.07
• Pr{c}=Pr{d}=Pr{f}=Pr{l}=Pr{m}=Pr{p}=Pr{u}=Pr{y}=0.02
• Pr{b}=Pr{g}=Pr{j}=Pr{k}=Pr{q}=Pr{v}=Pr{w}=Pr{x}=Pr{z}=0.01
H = -[4(0.1 log2 0.1) + 5(0.07 log2 0.07) + 8(0.02 log2 0.02) + 9(0.01 log2 0.01)]
  = 4.17 bits/char
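The 4.17 bits/char figure can be verified numerically; a sketch using the letter probabilities above:

import math

probs = [0.10] * 4 + [0.07] * 5 + [0.02] * 8 + [0.01] * 9   # 26 letters, sum = 1
H = -sum(p * math.log2(p) for p in probs)
print(round(H, 2))   # 4.17 bits/char, versus 4.7 when all letters are equally likely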
Source Coding
Source Coding – eliminate redundancy in the data; send the same
information in fewer bits
Channel Coding – Detect/Correct errors in signaling and improve
BER
Source Coding
• Goal is to find an efficient description of information
sources
– Reduce required bandwidth
– Reduce memory to store
• Memoryless – if symbols from the source are independent,
one symbol does not depend on the next
• Memory – elements of a sequence depend on one
another, e.g. UNIVERSIT_?: a 10-tuple carries less
information than 10 independent letters because the letters are dependent
• This means that it’s more efficient to code information
with memory as groups of symbols
H(X)memory < H(X)no memory
Desirable Properties
• Length
– Fixed Length – ASCII
– Variable Length – Morse Code, JPEG
• Uniquely Decodable – allows the user to invert the
mapping to recover the original symbols
• Prefix-Free – No codeword can be a prefix of any
other codeword
• Average Code Length (ni is code length of ith
symbol)
n̄ = Σi ni P(Xi)
Uniquely Decodable and Prefix Free Codes
• Uniquely decodable?
– Not Code 1: a and b share the codeword 00
– With Code 3, does a received "10111" mean
'babbb' or 'bacb'? So not
Code 3; Code 6 is ambiguous in the same way (e.g. "11")
• Prefix-Free
– Not Code 4:
the codeword '1' (a) is a prefix of '10' (b) and '100' (c)
• Avg Code Length (see the sketch after the code table below)
– Code 2: n̄ = 2
– Code 5: n̄ = 0.73(1) + 0.25(2) + 0.02(2) = 1.27
Xi     P(Xi)
a      0.73
b      0.25
c      0.02

Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
a        00       00       0        1        1        1
b        00       01       1        10       00       01
c        11       10       11       100      01       11
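These properties are mechanical to check; a small sketch that tests prefix-freeness and computes the average code length n̄ for a few of the codes in the table:

P = {"a": 0.73, "b": 0.25, "c": 0.02}
codes = {
    "Code 2": {"a": "00", "b": "01", "c": "10"},
    "Code 4": {"a": "1",  "b": "10", "c": "100"},
    "Code 5": {"a": "1",  "b": "00", "c": "01"},
}

def is_prefix_free(code):
    # no codeword may be a prefix of any other codeword
    words = list(code.values())
    return not any(u != w and w.startswith(u) for u in words for w in words)

for name, code in codes.items():
    n_bar = sum(P[s] * len(w) for s, w in code.items())
    print(name, is_prefix_free(code), round(n_bar, 2))
# Code 4 fails: '1' is a prefix of '10' and '100'; Code 5 is prefix-free with n̄ = 1.27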
Huffman Code
• Characteristics of Huffman Codes:
– Prefix-free, variable length code that can achieve
the shortest average code length for an alphabet
– Most frequent symbols have short codes
• Procedure
– List all symbols and their probabilities in descending
order
– Merge the two branches with the lowest probabilities and
combine their probabilities
– Repeat until one branch is left
Huffman Code Example
Symbols and probabilities: a 0.4, b 0.2, c 0.1, d 0.1, e 0.1, f 0.1
Merge steps (combine the two lowest probabilities at each stage):
0.1 (e) + 0.1 (f) → 0.2
0.1 (c) + 0.1 (d) → 0.2
0.2 (b) + 0.2 (e, f) → 0.4
0.4 (a) + 0.2 (c, d) → 0.6
0.6 + 0.4 → 1.0
Assigning 1 and 0 to the two branches of each merge gives:
The Code:
a 11
b 00
c 101
d 100
e 011
f 010
n̄ = 2.4 bits/symbol
Compression ratio: 3.0/2.4 = 1.25
(3.0 bits/symbol is the fixed-length baseline for six symbols)
Entropy: 2.32 bits/symbol
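The procedure above translates directly into code; a sketch using a heap to repeatedly merge the two lowest-probability branches (tie-breaking may produce different, but equally short, codewords):

import heapq

def huffman(probs):
    # heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}
code = huffman(probs)
print(code, sum(p * len(code[s]) for s, p in probs.items()))   # n̄ = 2.4 bits/symbol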
Example:
• Consider a random variable X ∈ {a, b, c}
with associated probabilities as listed
in the table
• Calculate the entropy of this symbol
set
• Find the Huffman Code for this symbol
set
• Find the compression ratio and
efficiency of this code
Xi     P(Xi)
a      0.73
b      0.25
c      0.02
Error detection and correction codes
• Whenever bits flow from one point to another, they are subject to unpredictable
changes because of interference. This interference can change the shape of the
signal.
• In a single-bit error, a 0 is changed to a 1 or a 1 to a 0. In a burst error, multiple
bits are changed. For example, a 1/100 s burst of impulse noise on a transmission
with a data rate of 1200 bps might change all or some of the 12 bits of information.
• For reliable communication, errors must be detected and corrected.
• Correcting errors is more difficult than detecting them. In error detection,
we are looking only to see whether any error has occurred.
• In error correction, we need to know the exact number of corrupted bits
and, more importantly, their location in the message, and then correct them.
• Some of the popular detection methods are:
– Parity checking
– Checksum error detection
– Cyclic redundancy check (CRC)
• Some of the popular correction methods are:
– Block codes
– Convolutional Codes
Single-bit error
This kind of error can happen in parallel transmission.
Example:
If data is sent at 1 Mbps, then each bit lasts only
1/1,000,000 sec, or 1 μs.
For a single-bit error to occur, the noise must have a
duration of only 1 μs, which is very rare.
Burst error
• A burst error does not necessarily mean that
the errors occur in consecutive bits; the length
of the burst is measured from the first corrupted
bit to the last corrupted bit. Some bits in between
may not have been corrupted.
Burst errors are most likely to happen in serial
transmission.
Example:
If data is sent at 1 kbps, a noise burst of
1/100 sec can affect 10 bits (1/100 × 1000).
If the same data is sent at 1 Mbps, a noise burst of
1/100 sec can affect 10,000 bits (1/100 × 10^6).
Error Detection
• Error detection means to decide whether the received
data is correct or not without having a copy of the
original message.
• Error detection uses the concept of redundancy,
which means adding extra bits for detecting errors at the
destination.
• Parity Checking
– An additional bit, called a parity bit, is added to
each data word.
– Even Parity
– Odd Parity
– It can detect only single-bit errors (more generally,
any odd number of bit errors).
– It cannot reveal the location of the erroneous bit
(a small sketch follows the examples below).
Even parity:  P = 0, data word = 1001011
Odd parity:   P = 1, data word = 1001011
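A minimal sketch of parity generation for the data word shown above:

def parity_bit(data, even=True):
    # even parity makes the total number of 1s even; odd parity makes it odd
    ones = data.count("1")
    return str(ones % 2) if even else str((ones + 1) % 2)

word = "1001011"                       # four 1s
print(parity_bit(word, even=True))     # '0' -> transmitted word 0 1001011
print(parity_bit(word, even=False))    # '1' -> transmitted word 1 1001011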
Checksum Error Detection
The parity-check method is not effective at detecting burst
errors.
A checksum is transmitted along with every block of
data bytes.
An eight-bit accumulator is used to add the 8-bit bytes of the
data to find the checksum byte.
Carries out of the MSB are ignored while computing
the checksum byte.
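A sketch of this byte-wise checksum (the data block below is an assumed example):

def checksum_byte(block):
    # add the bytes in an 8-bit accumulator; carries out of the MSB are dropped
    total = 0
    for b in block:
        total = (total + b) & 0xFF
    return total

block = [0x25, 0x62, 0x3F, 0x52]   # assumed example bytes
print(hex(checksum_byte(block)))    # 0x18: the sum 0x118 with the carry dropped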
At the sender
The unit is divided into k sections, each of n
bits.
All sections are added together using one’s
complement to get the sum.
The sum is complemented and becomes the
checksum.
The checksum is sent with the data.
At the receiver
The unit is divided into k sections, each of n
bits.
All sections are added together using one's
complement to get the sum.
The sum is complemented.
If the result is zero, the data are accepted;
otherwise, they are rejected (a sketch of both ends follows).
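A sketch of the sender/receiver procedure with 16-bit (n = 16) sections (the data values are assumed examples):

def ones_complement_sum(sections, n=16):
    mask = (1 << n) - 1
    total = 0
    for s in sections:
        total += s
        total = (total & mask) + (total >> n)   # wrap carries back in
    return total

def make_checksum(sections, n=16):
    # sender: complement of the one's-complement sum
    return ~ones_complement_sum(sections, n) & ((1 << n) - 1)

data = [0x4500, 0x0030, 0x4422]   # assumed example sections
cksum = make_checksum(data)
# receiver: re-add everything including the checksum, then complement the sum
print(~ones_complement_sum(data + [cksum]) & 0xFFFF)   # 0 -> data accepted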
Linear Block Code
• A linear block code is an error-correcting code for which any
linear combination of code-words is also a code-word.
• In a linear block code, the exclusive OR of any two valid
code-words creates another valid code-word.
• Linear codes are used in forward error correction and are
applied in methods for transmitting symbols on a
communications channel so that, if errors occur in the
communication, some errors can be corrected or detected by
the recipient of a message block.
• Linear codes allow for more efficient encoding and decoding
algorithms than other codes.
• A desirable property for a linear block code is
a systematic structure of the code words, as
shown in the figure,
– where a code word is divided into two parts:
• The message part consists of k information digits
• The redundant checking part consists of n − k parity-check
bits (a sketch of systematic encoding follows).
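A sketch of systematic encoding for a (7, 4) code, where the first k = 4 digits are the message and the last n − k = 3 are parity checks; the generator matrix G = [I | P] below is one common choice for the (7, 4) Hamming code:

# generator matrix G = [I4 | P]: identity (message part) then parity part
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    # codeword c = m * G (mod 2): message digits followed by parity-check bits
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

print(encode([1, 0, 1, 1]))   # [1, 0, 1, 1, 0, 1, 0]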
Cyclic Code
• A cyclic code is a block code in which a circular shift
of any code-word gives another code-word.
• They are error-correcting codes with algebraic
properties that are convenient for efficient error
detection and correction.
• They can be efficiently implemented using simple shift
registers.
• Definition: A code C is cyclic if
(i) C is a linear code;
(ii) any cyclic shift of a code-word is also a code-word,
i.e. whenever a0 … an−1 ∈ C, then also an−1 a0 … an−2 ∈ C.
Code C = {000, 101, 011, 110} is cyclic.
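Both parts of the definition are easy to verify for this C; a small sketch:

C = {"000", "101", "011", "110"}

def cyclic_shift(w):
    # a0 a1 ... an-1  ->  an-1 a0 ... an-2
    return w[-1] + w[:-1]

linear = all(format(int(x, 2) ^ int(y, 2), "03b") in C for x in C for y in C)
closed = all(cyclic_shift(w) in C for w in C)
print(linear, closed)   # True True -> C is a cyclic code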
Cyclic Redundancy Check
• This is a type of polynomial code in which a bit
string is represented in the form of polynomials
with coefficients of 0 and 1 only.
• For CRC code, the sender and receiver must agree
upon a generator polynomial G(x).
• A code word can be generated for a given data
word (message) polynomial M(x) with the help of
long division.
• CRC is based on binary division.
• A sequence of redundant bits, called the CRC or CRC
remainder, is appended at the end of a data unit,
such as a byte.
• The resulting data unit, after adding the CRC remainder,
becomes exactly divisible by another predetermined
binary number.
• At the receiver, this data unit is divided by the same
binary number.
• There is no error if this division does not yield any
remainder, but a non-zero remainder indicates
presence of errors in the received data unit.
• Such an erroneous data unit is then rejected.
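A bitwise long-division sketch of CRC generation and checking; the generator polynomial below (x^3 + x + 1, bits 1011) and the message are assumed examples:

def mod2_div(bits, gen):
    # mod-2 (XOR) long division; returns the remainder
    bits = bits[:]
    for i in range(len(bits) - len(gen) + 1):
        if bits[i]:
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return bits[-(len(gen) - 1):]

gen = [1, 0, 1, 1]                    # generator polynomial x^3 + x + 1
msg = [1, 0, 0, 1, 0, 0]
crc = mod2_div(msg + [0, 0, 0], gen)  # append n - k = 3 zeros, keep the remainder
codeword = msg + crc                  # now exactly divisible by the generator
print(crc)                            # [1, 0, 1]
print(mod2_div(codeword, gen))        # [0, 0, 0] -> no error, data accepted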
[Figure: CRC generation at the sender and checking at the receiver]
Hamming code and Hamming distance
• The Hamming weight of a code (code vector) is defined as
the number of non-zero components in the code.
• E.g. C = 101100, HW = 3
• One of the central concepts in coding for error control
is the Hamming distance.
• The Hamming distance between two words (of the
same size) is the number of differences between the
corresponding bits.
• We denote the Hamming distance between two words X and Y
as d(X, Y).
• The Hamming distance can easily be found if we
apply the XOR operation on the two words and count
the number of 1s in the result.
• Note that the Hamming distance is greater than or
equal to zero (zero only when the words are identical).
• The Hamming distance between 000 and 011 is 2
because 000 XOR 011 is 011 (two 1s).
• The Hamming distance d(10101, 11110) is 3
because 10101 XOR 11110 is 01011 (three 1s).
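A one-line XOR-and-count implementation, checked against both examples:

def hamming_distance(x, y):
    # XOR the words, then count the 1s in the result
    return bin(int(x, 2) ^ int(y, 2)).count("1")

print(hamming_distance("000", "011"))       # 2
print(hamming_distance("10101", "11110"))   # 3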
Minimum Hamming Distance
• The minimum Hamming distance (dmin) is the smallest
Hamming distance between all possible pairs in a set
of words.
• E.g. C = {001, 110, 111, 000}, dmin = 1
• E.g. consider the 3-bit code words with dmin = 2:
C = {000, 101, 110, 011}
• If these code words are used, then any single error in
any code word can easily be detected; but if code
words with dmin = 3 are used, e.g. {000, 111}, then we can
detect and correct any single error in each code word.
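A sketch that computes dmin over all pairs for the example sets above:

from itertools import combinations

def d_min(code):
    # smallest Hamming distance among all pairs of code words
    return min(bin(int(x, 2) ^ int(y, 2)).count("1")
               for x, y in combinations(code, 2))

print(d_min(["001", "110", "111", "000"]))   # 1
print(d_min(["000", "101", "110", "011"]))   # 2 -> detects any single error
print(d_min(["000", "111"]))                 # 3 -> corrects any single error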
Convolutional Code
• In telecommunication, a convolutional code is a type of
error-correcting code that generates parity symbols via the
sliding application of a Boolean polynomial function to a
data stream.
• The sliding application represents the 'convolution' of the
encoder over the data, which gives rise to the term
'convolutional coding'.
• Convolutional codes are often characterized by the base
code rate and the depth (or memory) of the encoder
[n, k, K].
• The base code rate is typically given as k/n, where k is the
number of input bits and n is the number of output bits per
encoding step. The depth is called the "constraint length" K,
which reflects the number of memory elements in the encoder.
• Convolutional codes are used extensively to achieve
reliable data transfer in numerous applications, such as
digital video, radio, mobile communications and satellite
communications.
• To convolutionally encode data, start with K memory
registers, each holding one input bit. Unless otherwise
specified, all memory registers start with a value of 0.
• The encoder has n modulo-2 adders (a modulo-2 adder
can be implemented with a single Boolean XOR gate) and
n generator polynomials, one for each adder.
• An input bit m1 is fed into the leftmost register. Using the
generator polynomials and the existing values in the
remaining registers, the encoder outputs n symbols.
• These symbols may be transmitted depending on the
desired code rate. Now bit shift all register values to the
right (m1 moves to m0, m0 moves to m−1) and wait for the
next input bit.
• If there are no remaining input bits, the encoder continues
shifting until all registers have returned to the zero state
(flush bit termination).
• Example: a rate-1/3 encoder with constraint
length K = 3. The generator polynomials are G1 = (1,1,1), G2
= (0,1,1), and G3 = (1,0,1). The output bits are therefore
calculated (modulo 2) as follows:
n1 = m1 + m0 + m−1
n2 = m0 + m−1
n3 = m1 + m−1
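A sketch of this rate-1/3 encoder, with the registers shifting as described above (flush bits included so the registers return to zero):

def conv_encode(bits):
    m0, m_1 = 0, 0                 # memory registers start at zero
    out = []
    for m1 in bits + [0, 0]:       # two flush bits terminate the encoding
        out += [m1 ^ m0 ^ m_1,     # n1 = m1 + m0 + m-1  (G1 = 1,1,1)
                m0 ^ m_1,          # n2 = m0 + m-1       (G2 = 0,1,1)
                m1 ^ m_1]          # n3 = m1 + m-1       (G3 = 1,0,1)
        m0, m_1 = m1, m0           # shift: m1 -> m0, m0 -> m-1
    return out

print(conv_encode([1, 0, 1]))      # 3 output bits per input bit (rate 1/3)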
Editor's Notes
• #7: This shows how we may be able to use fewer bits on average to transmit a signal by encoding frequently seen symbols with fewer bits. This is exploited in source coding.