Chapter 8
Information Theory and Coding
What is information?
• You are planning to go to Biratnagar during your summer
vacation. You call your friend in Biratnagar to ask about the
weather there. Assume that you could have received any of the
following messages about the summer weather in Biratnagar:
– It is sunny and hot
– It is cold
– Snow is falling.
• Which statement contains the most information?
• The information content of a message is closely related
to the recipient's prior knowledge of the event and the
level of uncertainty the message resolves for the
recipient of the message.
• The amount of information received from the knowledge
of occurrence of an event is related to the probability or
the likelihood of occurrence of the event.
• A message about an event that is less likely to occur contains
more information.
• Let m1, m2, …, mq be the q possible messages
emitted by a source with probabilities of occurrence P1, P2,
…, Pq such that
P1 + P2 + … + Pq = 1
• Let I(mk) be the amount of information contained in the k-th
message. Then for I(mk) to represent the information content of
message mk, intuition says the following conditions should
be met:
A) I(mk) > I(mj), if Pk < Pj
B) I(mk) → 0 as Pk → 1
C) I(mk) → ∞ as Pk → 0
D) I(mk) ≥ 0 when 0 ≤ Pk ≤ 1
E) I(mk and mj) ≡ I(mk mj) = I(mk) + I(mj) for independent
messages mk and mj
• To satisfy all the conditions mentioned above, we can relate
I(mk) and Pk in the following manner (logarithm base 2 when
information is measured in bits):
I(mk) = log2(1/Pk) = -log2(Pk)
• If two binary digits 1 and 0 occur with equal probability and
are correctly detected at the receiving end, then the
information content in each digit is 1 bit.
I(0 or 1) = -log2(1/2) = 1 bit
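The relation is easy to check numerically; a minimal Python sketch (the probabilities below are illustrative):

import math

def self_information(p):
    # I(m) = -log2(p): information content, in bits, of a message with probability p
    return -math.log2(p)

print(self_information(0.5))    # 1.0 bit: the equiprobable binary digit above
print(self_information(0.01))   # ~6.64 bits: a rarer message carries more information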
Entropy
• Entropy: the average information content of a
sequence of symbols emitted by a source.
H = Σ (i = 1 to n) Pi log2(1/Pi) = -Σ (i = 1 to n) Pi log2(Pi) bits/symbol
[Figure: Entropy H (bits) versus probability p for a binary source; H peaks at 1.0 bit at p = 0.5 and falls to 0 at p = 0 and p = 1.]
If the symbol rate is Rs symbols/sec, then the information rate is
Rinf = Rs × H bits/sec
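Both formulas can be sketched in a few lines of Python (the symbol rate below is an assumed example):

import math

def entropy(probs):
    # H = -sum(Pi * log2(Pi)), in bits per symbol; terms with Pi = 0 contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

H = entropy([0.5, 0.5])   # equiprobable binary source -> 1.0 bit/symbol
Rs = 1000                 # assumed symbol rate, symbols/sec
print(H, Rs * H)          # information rate Rinf = Rs * H = 1000 bits/sec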
Example:
Average Information Content in English Language
Calculate the average information in bits/character in English
assuming each letter is equally likely
H = -Σ (i = 1 to 26) (1/26) log2(1/26) = log2(26) ≈ 4.7 bits/char
Real-world English
But English characters do not appear with equal frequency in
real-world English text; the probabilities of each
character are assumed to be:
• Pr{a}=Pr{e}=Pr{o}=Pr{t}=0.10
• Pr{h}=Pr{i}=Pr{n}=Pr{r}=Pr{s}=0.07
• Pr{c}=Pr{d}=Pr{f}=Pr{l}=Pr{m}=Pr{p}=Pr{u}=Pr{y}=0.02
• Pr{b}=Pr{g}=Pr{j}=Pr{k}=Pr{q}=Pr{v}=Pr{w}=Pr{x}=Pr{z}=0.01
H = -[4(0.1 log2 0.1) + 5(0.07 log2 0.07) + 8(0.02 log2 0.02) + 9(0.01 log2 0.01)]
  = 4.17 bits/char
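The 4.17 bits/char figure can be verified numerically; a sketch using the letter probabilities above:

import math

probs = [0.10] * 4 + [0.07] * 5 + [0.02] * 8 + [0.01] * 9   # 26 letters, sum = 1
H = -sum(p * math.log2(p) for p in probs)
print(round(H, 2))   # 4.17 bits/char, versus 4.7 when all letters are equally likely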
Source Coding
Source Coding – eliminate redundancy in the data; send the same
information in fewer bits
Channel Coding – Detect/Correct errors in signaling and improve
BER
Source Coding
• Goal is to find an efficient description of information
sources
– Reduce required bandwidth
– Reduce memory to store
• Memoryless – if symbols from the source are independent,
one symbol does not depend on the next
• Memory – elements of a sequence depend on one
another, e.g. UNIVERSIT_?: a 10-tuple carries less
information than 10 independent letters because the letters are dependent
• This means that it’s more efficient to code information
with memory as groups of symbols
H(X)memory < H(X)no memory
Desirable Properties
• Length
– Fixed Length – ASCII
– Variable Length – Morse Code, JPEG
• Uniquely Decodable – allows the user to invert the
mapping to recover the original symbols
• Prefix-Free – No codeword can be a prefix of any
other codeword
• Average Code Length (ni is code length of ith
symbol)
n̄ = Σi ni P(Xi)
Uniquely Decodable and Prefix Free Codes
• Uniquely decodable?
– Not Code 1: a and b share the codeword 00
– With Code 3, does a received "10111" mean
'babbb' or 'bacb'? So not
Code 3; Code 6 is ambiguous in the same way (e.g. "11")
• Prefix-Free
– Not Code 4:
the codeword '1' (a) is a prefix of '10' (b) and '100' (c)
• Avg Code Length (see the sketch after the code table below)
– Code 2: n̄ = 2
– Code 5: n̄ = 0.73(1) + 0.25(2) + 0.02(2) = 1.27
Xi     P(Xi)
a      0.73
b      0.25
c      0.02

Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
a        00       00       0        1        1        1
b        00       01       1        10       00       01
c        11       10       11       100      01       11
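These properties are mechanical to check; a small sketch that tests prefix-freeness and computes the average code length n̄ for a few of the codes in the table:

P = {"a": 0.73, "b": 0.25, "c": 0.02}
codes = {
    "Code 2": {"a": "00", "b": "01", "c": "10"},
    "Code 4": {"a": "1",  "b": "10", "c": "100"},
    "Code 5": {"a": "1",  "b": "00", "c": "01"},
}

def is_prefix_free(code):
    # no codeword may be a prefix of any other codeword
    words = list(code.values())
    return not any(u != w and w.startswith(u) for u in words for w in words)

for name, code in codes.items():
    n_bar = sum(P[s] * len(w) for s, w in code.items())
    print(name, is_prefix_free(code), round(n_bar, 2))
# Code 4 fails: '1' is a prefix of '10' and '100'; Code 5 is prefix-free with n̄ = 1.27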
Huffman Code
• Characteristics of Huffman Codes:
– Prefix-free, variable length code that can achieve
the shortest average code length for an alphabet
– Most frequent symbols have short codes
• Procedure
– List all symbols and their probabilities in descending
order
– Merge the two branches with the lowest probabilities and
combine their probabilities
– Repeat until one branch is left
Huffman Code Example
Symbols and probabilities: a 0.4, b 0.2, c 0.1, d 0.1, e 0.1, f 0.1
Merge steps (combine the two lowest probabilities at each stage):
0.1 (e) + 0.1 (f) → 0.2
0.1 (c) + 0.1 (d) → 0.2
0.2 (b) + 0.2 (e, f) → 0.4
0.4 (a) + 0.2 (c, d) → 0.6
0.6 + 0.4 → 1.0
Assigning 1 and 0 to the two branches of each merge gives:
The Code:
a 11
b 00
c 101
d 100
e 011
f 010
n̄ = 2.4 bits/symbol
Compression ratio: 3.0/2.4 = 1.25
(3.0 bits/symbol is the fixed-length baseline for six symbols)
Entropy: 2.32 bits/symbol
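The procedure above translates directly into code; a sketch using a heap to repeatedly merge the two lowest-probability branches (tie-breaking may produce different, but equally short, codewords):

import heapq

def huffman(probs):
    # heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}
code = huffman(probs)
print(code, sum(p * len(code[s]) for s, p in probs.items()))   # n̄ = 2.4 bits/symbol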
Example:
• Consider a random variable X ∈ {a, b, c}
with associated probabilities as listed
in the table
• Calculate the entropy of this symbol
set
• Find the Huffman Code for this symbol
set
• Find the compression ratio and
efficiency of this code
Xi     P(Xi)
a      0.73
b      0.25
c      0.02
Error detection and correction codes
• Whenever bits flow from one point to another, they are subject to unpredictable
changes because of interference. This interference can change the shape of the
signal.
• In a single-bit error, a 0 is changed to a 1 or a 1 to a 0. In a burst error, multiple
bits are changed. For example, a 1/100 s burst of impulse noise on a transmission
with a data rate of 1200 bps might change all or some of the 12 bits of information.
• For reliable communication, errors must be detected and corrected.
• Correcting errors is more difficult than detecting them. In error detection,
we are looking only to see whether any error has occurred.
• In error correction, we need to know the exact number of corrupted bits
and, more importantly, their location in the message, and then correct them.
• Some of the popular detection methods are:
– Parity checking
– Checksum error detection
– Cyclic redundancy check (CRC)
• Some of the popular correction methods are:
– Block codes
– Convolutional Codes
Single-bit error
This kind of error can happen in parallel transmission.
Example:
If data is sent at 1 Mbps, then each bit lasts only
1/1,000,000 sec, or 1 μs.
For a single-bit error to occur, the noise must have a
duration of only 1 μs, which is very rare.
Burst error
• A burst error does not necessarily mean that
the errors occur in consecutive bits; the length
of the burst is measured from the first corrupted
bit to the last corrupted bit. Some bits in between
may not have been corrupted.
Burst errors are most likely to happen in serial
transmission.
Example:
If data is sent at 1 kbps, a noise burst of
1/100 sec can affect 10 bits (1/100 × 1000).
If the same data is sent at 1 Mbps, a noise burst of
1/100 sec can affect 10,000 bits (1/100 × 10^6).
Error Detection
• Error detection means to decide whether the received
data is correct or not without having a copy of the
original message.
• Error detection uses the concept of redundancy,
which means adding extra bits for detecting errors at the
destination.
• Parity Checking
– An additional bit, called a parity bit, is added to
each data word.
– Even Parity
– Odd Parity
– It can detect only single-bit errors (more generally,
any odd number of bit errors).
– It cannot reveal the location of the erroneous bit
(a small sketch follows the examples below).
Even parity:  P = 0, data word = 1001011
Odd parity:   P = 1, data word = 1001011
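A minimal sketch of parity generation for the data word shown above:

def parity_bit(data, even=True):
    # even parity makes the total number of 1s even; odd parity makes it odd
    ones = data.count("1")
    return str(ones % 2) if even else str((ones + 1) % 2)

word = "1001011"                       # four 1s
print(parity_bit(word, even=True))     # '0' -> transmitted word 0 1001011
print(parity_bit(word, even=False))    # '1' -> transmitted word 1 1001011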
Checksum Error Detection
The parity-check method is not effective at detecting burst
errors.
A checksum is transmitted along with every block of
data bytes.
An eight-bit accumulator is used to add the 8-bit bytes of the
data to find the checksum byte.
Carries out of the MSB are ignored while computing
the checksum byte.
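A sketch of this byte-wise checksum (the data block below is an assumed example):

def checksum_byte(block):
    # add the bytes in an 8-bit accumulator; carries out of the MSB are dropped
    total = 0
    for b in block:
        total = (total + b) & 0xFF
    return total

block = [0x25, 0x62, 0x3F, 0x52]   # assumed example bytes
print(hex(checksum_byte(block)))    # 0x18: the sum 0x118 with the carry dropped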
At the sender
The unit is divided into k sections, each of n
bits.
All sections are added together using one’s
complement to get the sum.
The sum is complemented and becomes the
checksum.
The checksum is sent with the data.
At the receiver
The unit is divided into k sections, each of n
bits.
All sections are added together using one's
complement to get the sum.
The sum is complemented.
If the result is zero, the data are accepted;
otherwise, they are rejected (a sketch of both ends follows).
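A sketch of the sender/receiver procedure with 16-bit (n = 16) sections (the data values are assumed examples):

def ones_complement_sum(sections, n=16):
    mask = (1 << n) - 1
    total = 0
    for s in sections:
        total += s
        total = (total & mask) + (total >> n)   # wrap carries back in
    return total

def make_checksum(sections, n=16):
    # sender: complement of the one's-complement sum
    return ~ones_complement_sum(sections, n) & ((1 << n) - 1)

data = [0x4500, 0x0030, 0x4422]   # assumed example sections
cksum = make_checksum(data)
# receiver: re-add everything including the checksum, then complement the sum
print(~ones_complement_sum(data + [cksum]) & 0xFFFF)   # 0 -> data accepted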
Linear Block Code
• A linear block code is an error-correcting code for which any
linear combination of code-words is also a code-word.
• In a linear block code, the exclusive OR of any two valid
code-words creates another valid code-word.
• Linear codes are used in forward error correction and are
applied in methods for transmitting symbols on a
communications channel so that, if errors occur in the
communication, some errors can be corrected or detected by
the recipient of a message block.
• Linear codes allow for more efficient encoding and decoding
algorithms than other codes.
• A desirable property for a linear block code is
a systematic structure of the code words, as
shown in the figure,
– where a code word is divided into two parts:
• The message part consists of k information digits
• The redundant checking part consists of n − k parity-check
bits (a sketch of systematic encoding follows).
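A sketch of systematic encoding for a (7, 4) code, where the first k = 4 digits are the message and the last n − k = 3 are parity checks; the generator matrix G = [I | P] below is one common choice for the (7, 4) Hamming code:

# generator matrix G = [I4 | P]: identity (message part) then parity part
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    # codeword c = m * G (mod 2): message digits followed by parity-check bits
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

print(encode([1, 0, 1, 1]))   # [1, 0, 1, 1, 0, 1, 0]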
Cyclic Code
• A cyclic code is a block code in which a circular shift
of any code-word gives another code-word.
• They are error-correcting codes with algebraic
properties that are convenient for efficient error
detection and correction.
• They can be efficiently implemented using simple shift
registers.
• Definition: A code C is cyclic if
(i) C is a linear code;
(ii) any cyclic shift of a code-word is also a code-word,
i.e. whenever a0 … an−1 ∈ C, then also an−1 a0 … an−2 ∈ C.
Code C = {000, 101, 011, 110} is cyclic.
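Both parts of the definition are easy to verify for this C; a small sketch:

C = {"000", "101", "011", "110"}

def cyclic_shift(w):
    # a0 a1 ... an-1  ->  an-1 a0 ... an-2
    return w[-1] + w[:-1]

linear = all(format(int(x, 2) ^ int(y, 2), "03b") in C for x in C for y in C)
closed = all(cyclic_shift(w) in C for w in C)
print(linear, closed)   # True True -> C is a cyclic code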
Cyclic Redundancy Check
• This is a type of polynomial code in which a bit
string is represented in the form of polynomials
with coefficients of 0 and 1 only.
• For CRC code, the sender and receiver must agree
upon a generator polynomial G(x).
• A code word can be generated for a given data
word (message) polynomial M(x) with the help of
long division.
• CRC is based on binary division.
• A sequence of redundant bits, called the CRC or CRC
remainder, is appended at the end of a data unit,
such as a byte.
• The resulting data unit, after adding the CRC remainder,
becomes exactly divisible by another predetermined
binary number.
• At the receiver, this data unit is divided by the same
binary number.
• There is no error if this division does not yield any
remainder, but a non-zero remainder indicates
presence of errors in the received data unit.
• Such an erroneous data unit is then rejected.
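A bitwise long-division sketch of CRC generation and checking; the generator polynomial below (x^3 + x + 1, bits 1011) and the message are assumed examples:

def mod2_div(bits, gen):
    # mod-2 (XOR) long division; returns the remainder
    bits = bits[:]
    for i in range(len(bits) - len(gen) + 1):
        if bits[i]:
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return bits[-(len(gen) - 1):]

gen = [1, 0, 1, 1]                    # generator polynomial x^3 + x + 1
msg = [1, 0, 0, 1, 0, 0]
crc = mod2_div(msg + [0, 0, 0], gen)  # append n - k = 3 zeros, keep the remainder
codeword = msg + crc                  # now exactly divisible by the generator
print(crc)                            # [1, 0, 1]
print(mod2_div(codeword, gen))        # [0, 0, 0] -> no error, data accepted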
[Figure: CRC generation at the sender and checking at the receiver]
Hamming code and Hamming distance
• The Hamming weight of a code (code vector) is defined as
the number of non-zero components in the code.
• E.g. C = 101100, HW = 3
• One of the central concepts in coding for error control
is the Hamming distance.
• The Hamming distance between two words (of the
same size) is the number of differences between the
corresponding bits.
• We denote the Hamming distance between two words X and Y
as d(X, Y).
• The Hamming distance can easily be found if we
apply the XOR operation on the two words and count
the number of 1s in the result.
• Note that the Hamming distance is greater than or
equal to zero (zero only when the words are identical).
• The Hamming distance between 000 and 011 is 2
because 000 XOR 011 is 011 (two 1s).
• The Hamming distance d(10101, 11110) is 3
because 10101 XOR 11110 is 01011 (three 1s).
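A one-line XOR-and-count implementation, checked against both examples:

def hamming_distance(x, y):
    # XOR the words, then count the 1s in the result
    return bin(int(x, 2) ^ int(y, 2)).count("1")

print(hamming_distance("000", "011"))       # 2
print(hamming_distance("10101", "11110"))   # 3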
Minimum Hamming Distance
• The minimum Hamming distance (dmin) is the smallest
Hamming distance between all possible pairs in a set
of words.
• E.g. C = {001, 110, 111, 000}, dmin = 1
• E.g. consider the 3-bit code words with dmin = 2:
C = {000, 101, 110, 011}
• If these code words are used, then any single error in
any code word can easily be detected; but if code
words with dmin = 3 are used, e.g. {000, 111}, then we can
detect and correct any single error in each code word.
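A sketch that computes dmin over all pairs for the example sets above:

from itertools import combinations

def d_min(code):
    # smallest Hamming distance among all pairs of code words
    return min(bin(int(x, 2) ^ int(y, 2)).count("1")
               for x, y in combinations(code, 2))

print(d_min(["001", "110", "111", "000"]))   # 1
print(d_min(["000", "101", "110", "011"]))   # 2 -> detects any single error
print(d_min(["000", "111"]))                 # 3 -> corrects any single error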
Convolutional Code
• In telecommunication, a convolutional code is a type of
error-correcting code that generates parity symbols via the
sliding application of a Boolean polynomial function to a
data stream.
• The sliding application represents the 'convolution' of the
encoder over the data, which gives rise to the term
'convolutional coding'.
• Convolutional codes are often characterized by the base
code rate and the depth (or memory) of the encoder
[n, k, K].
• The base code rate is typically given as k/n, where k is the
number of input bits and n is the number of output bits per
encoding step. The depth is called the "constraint length" K,
which reflects the number of memory elements in the encoder.
• Convolutional codes are used extensively to achieve
reliable data transfer in numerous applications, such as
digital video, radio, mobile communications and satellite
communications.
• To convolutionally encode data, start with K memory
registers, each holding one input bit. Unless otherwise
specified, all memory registers start with a value of 0.
• The encoder has n modulo-2 adders (a modulo-2 adder
can be implemented with a single Boolean XOR gate) and
n generator polynomials, one for each adder.
• An input bit m1 is fed into the leftmost register. Using the
generator polynomials and the existing values in the
remaining registers, the encoder outputs n symbols.
• These symbols may be transmitted depending on the
desired code rate. Now bit shift all register values to the
right (m1 moves to m0, m0 moves to m−1) and wait for the
next input bit.
• If there are no remaining input bits, the encoder continues
shifting until all registers have returned to the zero state
(flush bit termination).
• Example: a rate-1/3 encoder with constraint
length K = 3. The generator polynomials are G1 = (1,1,1), G2
= (0,1,1), and G3 = (1,0,1). The output bits are therefore
calculated (modulo 2) as follows:
n1 = m1 + m0 + m−1
n2 = m0 + m−1
n3 = m1 + m−1
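A sketch of this rate-1/3 encoder, with the registers shifting as described above (flush bits included so the registers return to zero):

def conv_encode(bits):
    m0, m_1 = 0, 0                 # memory registers start at zero
    out = []
    for m1 in bits + [0, 0]:       # two flush bits terminate the encoding
        out += [m1 ^ m0 ^ m_1,     # n1 = m1 + m0 + m-1  (G1 = 1,1,1)
                m0 ^ m_1,          # n2 = m0 + m-1       (G2 = 0,1,1)
                m1 ^ m_1]          # n3 = m1 + m-1       (G3 = 1,0,1)
        m0, m_1 = m1, m0           # shift: m1 -> m0, m0 -> m-1
    return out

print(conv_encode([1, 0, 1]))      # 3 output bits per input bit (rate 1/3)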
Editor's Notes
• #7: This shows how we may be able to use fewer bits on average to transmit a signal by encoding frequently seen symbols with fewer bits. This is exploited in source coding.