Data Compression - Text Compression - Run Length Encoding
Manish T I
Run Length Encoding
• If a data item d occurs n consecutive times in the
input stream, replace the n occurrences with the
single pair nd.
• The n consecutive occurrences of a data item are
called a run length of n, and this approach to
data compression is called run-length encoding
or RLE.
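The basic idea can be sketched in a few lines of Python. This is only an illustration, not code from the slides: the function names rle_pairs and rle_expand are made up here, and each run is kept as a (count, item) tuple rather than written to an output stream.

```python
from itertools import groupby


def rle_pairs(data: str) -> list[tuple[int, str]]:
    """Collapse each run of identical items into a (count, item) pair."""
    return [(len(list(group)), item) for item, group in groupby(data)]


def rle_expand(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the original string from (count, item) pairs."""
    return "".join(item * count for count, item in pairs)


pairs = rle_pairs("aaaabbbcd")
print(pairs)  # [(4, 'a'), (3, 'b'), (1, 'c'), (1, 'd')]
```

Note that single characters such as "c" and "d" grow from one symbol to a pair, which is why practical variants only replace longer runs.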
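In text, each run is written as three characters: an
escape character (here “@”), the repetition count, and
the repeated character. We therefore adopt the
convention that only three or more repetitions of the
same character are replaced with a repetition factor.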
The main problems with this method are the
following:
• In plain English text there are not many repetitions.
• There are many “doubles” but a “triple” is rare.
• The most repetitive character is the space.
• Dashes or asterisks may sometimes also repeat.
• In mathematical texts, digits may repeat.
• Example paragraph (many doubles, no triples):
The abbott from Abruzzi accedes to the demands of all
abbesses from Narragansett and Abbevilles from Abyssinia.
He will accommodate them, abbreviate his sabbatical, and
be an accomplished accessory.
• The character “@” may be part of the text in the input
stream, in which case a different escape character must
be chosen.
• Sometimes the input stream may contain every possible
character in the alphabet, so no character is free to
serve as the escape character.
• Example: an object file, the result of compiling a
program. Such a file contains machine instructions and can
be considered a string of bytes that may have any values.
• Since the repetition count is written on the output
stream as a byte, it is limited to counts of up to 255.
• This limitation can be softened somewhat when we
realize that the existence of a repetition count means
that there is a repetition (at least three identical
consecutive characters).
• We may adopt the convention that a repeat count of 0
means three repeat characters, which implies that a
repeat count of 255 means a run of 258 identical
characters.
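The escape and count conventions above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's code: the escape byte is taken to be “@”, the input is assumed not to contain it, and the names ESCAPE, MIN_RUN, MAX_RUN, rle_encode, and rle_decode are chosen here for the example.

```python
ESCAPE = ord("@")        # assumed escape byte; must not occur in the input
MIN_RUN = 3              # shorter runs are copied unchanged
MAX_RUN = MIN_RUN + 255  # a count byte of 0..255 stands for runs of 3..258


def rle_encode(data: bytes) -> bytes:
    """Replace each run of 3..258 identical bytes with escape, count, byte."""
    if ESCAPE in data:
        raise ValueError("input contains the escape byte; choose another one")
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure the run starting at position i, capped at MAX_RUN.
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out += bytes([ESCAPE, run - MIN_RUN, data[i]])
        else:
            out += data[i:i + run]
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand escape, count, byte triples back into runs."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == ESCAPE:
            count, item = packed[i + 1], packed[i + 2]
            out += bytes([item]) * (count + MIN_RUN)
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

For example, rle_encode(b"aabbbbc") leaves the double "aa" alone and writes the run of four b's as the escape byte, a count byte of 1, and "b". A run longer than 258 is simply split into several triples.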
• The MNP class 5 method was used for data
compression in old modems.
• It was developed by Microcom, Inc., a
maker of modems (MNP stands for Microcom
Network Protocol), and it uses a combination
of run-length and adaptive frequency
encoding.
Performance
We assume that the string to be compressed has N
characters and contains M repetitions of average
length L each. Each of the M repetitions is
replaced by 3 characters (escape, count, and data).
Size of the compressed string = N − M × L + M × 3
= N − M(L − 3)
Compression factor = N / [N − M(L − 3)]
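A small worked example may make the formula concrete. The sketch below takes N as the number of characters in the uncompressed string; the function names and the sample figures (N = 1000, M = 10, L = 4) are chosen here purely for illustration.

```python
def rle_compressed_size(n: int, m: int, avg_len: float) -> float:
    """Size after replacing M runs of average length L by 3 characters each."""
    return n - m * (avg_len - 3)


def rle_compression_factor(n: int, m: int, avg_len: float) -> float:
    """Uncompressed size divided by compressed size."""
    return n / rle_compressed_size(n, m, avg_len)


# 1000 characters containing 10 runs of average length 4:
print(rle_compressed_size(1000, 10, 4))                 # 990
print(round(rle_compression_factor(1000, 10, 4), 4))    # 1.0101
```

With runs of average length exactly 3 the factor is 1, since each run is replaced by three characters and nothing is gained.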
Digram Encoding
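• A variant of run length encoding for text is digram
encoding.
• This method is suitable for cases where the data
to be compressed consists only of certain
characters, e.g., just letters, digits, and
punctuation. Frequently occurring pairs of characters
(digrams) are then replaced by single characters that
cannot occur in the data.
• Good results can be obtained if the data can be
analyzed beforehand.
• In English text, for example, the letters “E”, “T”,
and “A” and the digram “TH” occur often.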
Pattern Substitution
Pattern substitution is suited to compressing computer
programs, where certain words, such as for, repeat, and
print, occur often.
Each such word is replaced with a control character
or, if there are many such words, with an escape
character followed by a code character.
Assuming that code “a” is assigned to the word
print, the text “m:print,b,a;” will be compressed to
“m:@a,b,a;”.
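The example above can be reproduced with a short sketch. The code table below is hypothetical apart from the assignment of “a” to print, and the sketch assumes that the escape character “@” does not occur in the input and that only whole words are replaced.

```python
import re

ESCAPE = "@"  # assumed not to occur in the input text
CODES = {"print": "a", "for": "b", "repeat": "c"}  # hypothetical code table
WORDS = {code: word for word, code in CODES.items()}

# Match any listed word, but only as a whole word.
_WORD_RE = re.compile(r"\b(" + "|".join(map(re.escape, CODES)) + r")\b")
# Match the escape character followed by one code character.
_CODE_RE = re.compile(re.escape(ESCAPE) + "(.)", re.DOTALL)


def substitute(text: str) -> str:
    """Replace each listed word with the escape character plus its code."""
    if ESCAPE in text:
        raise ValueError("input contains the escape character")
    return _WORD_RE.sub(lambda m: ESCAPE + CODES[m.group(1)], text)


def restore(packed: str) -> str:
    """Expand every escape-plus-code pair back into its word."""
    return _CODE_RE.sub(lambda m: WORDS[m.group(1)], packed)


print(substitute("m:print,b,a;"))  # m:@a,b,a;
```

Because the code character always follows the escape, the literal “a” at the end of the example is left alone and cannot be confused with the code for print.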
Relative Encoding [Differencing]
• Relative encoding suits data whose successive values
differ little, such as readings from a temperature
sensor. Successive temperatures normally do not differ
by much, so the sensor needs to send only the
first temperature, followed by differences.
The sequence of temperatures 70, 71, 72.5, 73.1, ...
can be compressed to 70, 1, 1.5, 0.6, ....
The differences are small and can be expressed in
fewer bits.
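Differencing itself is easy to sketch. The function names below are chosen for this example, and Decimal is used only so that the temperature differences come out exactly as written on the slide rather than with binary floating-point rounding.

```python
from decimal import Decimal
from itertools import accumulate


def delta_encode(values: list) -> list:
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list) -> list:
    """Running sums of the deltas restore the original values."""
    return list(accumulate(deltas))


temps = [Decimal("70"), Decimal("71"), Decimal("72.5"), Decimal("73.1")]
print([str(d) for d in delta_encode(temps)])  # ['70', '1', '1.5', '0.6']
```

The saving comes afterwards, when the small differences are written with fewer bits than the full readings; this sketch only computes them.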
The sequence 110, 115, 121, 119, 200, 202, ...
can be compressed to 110, 5, 6, −2, 200, 2, ....
Here 200 is sent as an actual value because it differs
too much from 119 to be sent as a small difference.
The decompressor now needs to distinguish between a
difference and an actual value.
The compressor can create an extra bit (a flag) for each
number sent, accumulate those bits, and send
them to the decompressor from time to time, as part
of the transmission.
Assuming that each difference is sent as a byte, the
compressor should follow (or precede) a group of 8 bytes
with a byte consisting of their 8 flags.
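The flag scheme can be sketched as below. The slides do not say how large a difference may be before the actual value is sent instead, so the threshold max_diff is an assumption of this sketch, as are the function names and the choice to put the first flag in the most significant bit of each flag byte.

```python
def encode_with_flags(values: list[int], max_diff: int = 10):
    """Return (numbers, flags): flag 1 marks a difference, 0 an actual value."""
    numbers, flags = [], []
    previous = None
    for value in values:
        diff = None if previous is None else value - previous
        if diff is not None and abs(diff) <= max_diff:
            numbers.append(diff)
            flags.append(1)
        else:
            numbers.append(value)
            flags.append(0)
        previous = value
    return numbers, flags


def pack_flags(flags: list[int]) -> list[int]:
    """Pack flags 8 per byte, first flag in the most significant bit."""
    packed = []
    for start in range(0, len(flags), 8):
        group = flags[start:start + 8]
        group = group + [0] * (8 - len(group))  # pad the last group with zeros
        byte = 0
        for bit in group:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed


def decode_with_flags(numbers: list[int], flags: list[int]) -> list[int]:
    """Rebuild the values: add differences to the last value, copy actual values."""
    values = []
    for number, flag in zip(numbers, flags):
        values.append(values[-1] + number if flag else number)
    return values


numbers, flags = encode_with_flags([110, 115, 121, 119, 200, 202])
print(numbers)  # [110, 5, 6, -2, 200, 2]
print(flags)    # [0, 1, 1, 1, 0, 1]
```

With the assumed threshold of 10, the jump from 119 to 200 is sent as the actual value 200, matching the sequence on the slide.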
Another practical way to send differences mixed with actual
values is to send pairs of bytes. Each pair is either an actual
16-bit measurement (positive or negative) or two 8-bit
signed differences.
Thus actual measurements can be between 0 and ±32K and
differences can be between 0 and ±127 (the range of an
8-bit signed value).
For each pair, the compressor creates a flag: 0 if the pair is
an actual value, 1 if it is a pair of differences.
After 16 pairs are sent, the compressor sends the 16 flags.
• The sequence of measurements 110, 115, 121,
119, 200, 202, . . . is sent as (110), (5, 6), (−2,−1),
(200), (2, . . .), where each pair of parentheses
indicates a pair of bytes.
• The −1 has value 11111111 (binary), which is
ignored by the decompressor (it indicates that
there is only one difference in this pair).
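The pairing of differences can be sketched as follows. This is an illustration under several assumptions rather than the exact scheme: pairs are kept as Python tuples instead of packed bytes, the 16 flags are returned as a plain list, the threshold max_diff is hypothetical, and a genuine difference of −1 is never placed in the second slot so that it cannot be mistaken for the filler.

```python
FILLER = -1  # convention from the slides: a second difference of -1 is ignored


def encode_pairs(values: list[int], max_diff: int = 10):
    """Return (pairs, flags): flag 0 = actual value, 1 = pair of differences."""
    pairs, flags = [], []
    pending = []  # differences waiting to be grouped two at a time
    previous = None

    def flush():
        # Emit a lone pending difference, padded with the filler.
        if pending:
            pairs.append((pending[0], FILLER))
            flags.append(1)
            pending.clear()

    for value in values:
        diff = None if previous is None else value - previous
        if diff is not None and abs(diff) <= max_diff:
            if pending and diff == FILLER:
                flush()  # keep a genuine -1 out of the second slot
            pending.append(diff)
            if len(pending) == 2:
                pairs.append(tuple(pending))
                flags.append(1)
                pending.clear()
        else:
            flush()
            pairs.append((value,))  # stands for one 16-bit actual value
            flags.append(0)
        previous = value
    flush()
    return pairs, flags


def decode_pairs(pairs, flags) -> list[int]:
    """Rebuild the measurements, skipping filler entries."""
    values = []
    for pair, flag in zip(pairs, flags):
        if flag == 0:
            values.append(pair[0])
        else:
            first, second = pair
            values.append(values[-1] + first)
            if second != FILLER:
                values.append(values[-1] + second)
    return values


pairs, flags = encode_pairs([110, 115, 121, 119, 200, 202])
print(pairs)  # [(110,), (5, 6), (-2, -1), (200,), (2, -1)]
```

Because the input here ends after 202, the last difference is also padded with the filler, whereas the slide shows the sequence continuing.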
Reference:
Data Compression: The Complete Reference, David
Salomon, Springer Science & Business Media, 2004
For any queries contact:
Web: www.iprg.co.in
E-mail: manishti2004@gmail.com
Facebook: @ImageProcessingResearchGroup
