Dynamic Time Warping (DTW)
Dynamic time warping (DTW) is a well-known technique for finding an optimal alignment
between two given (time-dependent) sequences under certain restrictions (Fig. 01). Intuitively,
the sequences are warped in a nonlinear fashion to match each other. Originally, DTW was
used to compare different speech patterns in automatic speech recognition. In fields such as
data mining and information retrieval, DTW has been successfully applied to automatically
cope with the time deformations and different speeds associated with time-dependent data.
Fig-01 shows two speech signals aligned with DTW.
Fig-01: Two Speech Signals Aligned using DTW
Fig. 02: Time alignment of two time-dependent sequences. Aligned points are indicated by
the arrows
The distance between two points x = [x_1, x_2, ..., x_n] and y = [y_1, y_2, ..., y_n]
in an n-dimensional space can be computed via the Euclidean distance:
dist(x, y) = ‖x − y‖ = √((x_1 − y_1)² + (x_2 − y_2)² + ⋯ + (x_n − y_n)²)
However, if the length of x differs from that of y, we cannot use the above formula to
compute the distance. Instead, we need a more flexible method that can find the best mapping
from elements in x to those in y in order to compute the distance.
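As a quick illustration (a minimal NumPy sketch; the vector values are invented for the
example), the Euclidean distance applies directly only when the two sequences have equal
length:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])        # length 3
    y = np.array([1.5, 2.5, 2.0])        # length 3
    print(np.linalg.norm(x - y))         # Euclidean distance, as in the formula above

    z = np.array([1.0, 2.0, 3.0, 4.0])   # length 4: x - z would raise a shape error,
                                         # so a mapping between elements is needed instead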
The goal of dynamic time warping (DTW for short) is to find the best mapping with the
minimum distance by the use of dynamic programming (DP). The method is called "time
warping" since both x and y are usually time-series vectors and we need to compress or
expand them in time in order to find the best mapping. We shall give the formula for DTW
in this section.
Let t and r be two vectors of lengths m and n, respectively. The goal of DTW is to find a
mapping path {(p_1,q_1), (p_2,q_2), ..., (p_k,q_k)} such that the distance on this mapping
path, Σ_{i=1}^{k} |t(p_i) − r(q_i)|, is minimized, subject to the following constraints:
• Boundary conditions: (p_1,q_1) = (1,1) and (p_k,q_k) = (m,n). This is a typical example
of an "anchored beginning" and "anchored end".
• Local constraint: for any given node (i,j) in the path, the possible fan-in nodes are
restricted to (i−1,j), (i,j−1), and (i−1,j−1). This local constraint guarantees that the
mapping path is monotonically non-decreasing in both of its arguments. Moreover, for any
given element in t, we can find at least one corresponding element in r, and vice versa.
How can we find the optimum mapping path in DTW? An obvious choice is forward DP,
which can be summarized in the following three steps:
1. Optimum-value function: Define D(i,j) as the DTW distance between t(1:i) and r(1:j),
with the mapping path starting from (1,1) to (i,j).
2. Recursion: D(i,j) = |t(i) − r(j)| + min{D(i−1,j), D(i,j−1), D(i−1,j−1)},
with the initial condition D(1,1) = |t(1) − r(1)|.
3. Final answer: D(m,n).
In practice, we need to construct a matrix D of dimensions m×n first and fill in the value of
D(1,1) by using the initial condition. Then by using the recursive formula, we fill the whole
matrix one element at a time, by following a column-by-column or row-by-row order. The
final answer will be available as D(m,n), with a computational complexity of O(mn).
If we want to know the optimum mapping path in addition to the minimum distance, we may
keep the optimum fan-in of each node. Then, at the end of the DP, we can quickly backtrack
to find the optimum mapping path between the two input vectors.
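To make the three steps concrete, here is a minimal NumPy sketch of the forward DP together
with the back-tracking just described (the function and variable names are ours, for
illustration only; 0-based Python indices D[0,0]...D[m-1,n-1] stand in for the 1-based
D(1,1)...D(m,n) above):

    import numpy as np

    def dtw(t, r):
        """Forward-DP DTW: minimum distance and the optimum mapping path."""
        m, n = len(t), len(r)
        D = np.full((m, n), np.inf)
        D[0, 0] = abs(t[0] - r[0])              # initial condition D(1,1)
        for i in range(m):
            for j in range(n):
                if i == 0 and j == 0:
                    continue
                # fan-in restricted to (i-1,j), (i,j-1), (i-1,j-1)
                best = min(D[i - 1, j] if i > 0 else np.inf,
                           D[i, j - 1] if j > 0 else np.inf,
                           D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
                D[i, j] = abs(t[i] - r[j]) + best
        # back-track from (m,n) to (1,1) by re-checking each node's optimum fan-in
        path, i, j = [(m - 1, n - 1)], m - 1, n - 1
        while (i, j) != (0, 0):
            fan_in = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            i, j = min((c for c in fan_in if c[0] >= 0 and c[1] >= 0),
                       key=lambda c: D[c])
            path.append((i, j))
        return D[m - 1, n - 1], path[::-1]

    dist, path = dtw([0, 1, 2, 3, 2, 0], [0, 2, 3, 1, 0])
    print(dist, path)   # the matrix fill costs O(mn), as noted above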
We can also have the backward DP for DTW, as follows:
1. Optimum-value function: Define D(i,j) as the DTW distance between t(i:m) and r(j:n),
with the mapping path from (i,j) to (m,n).
2. Recursion: D(i,j) = |t(i) − r(j)| + min{D(i+1,j), D(i,j+1), D(i+1,j+1)},
with the initial condition D(m,n) = |t(m) − r(n)|.
3. Final answer: D(1,1).
The answer obtained by the backward DP should be the same as that obtained by the forward
DP.
Another commonly used local path constraint is to restrict the fan-in to the 27°, 45°, and
63° directions only, as sketched below.
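Assuming these slopes correspond to the predecessor cells (i−1, j−2), (i−1, j−1), and
(i−2, j−1) (arctan ½ ≈ 27°, arctan 1 = 45°, arctan 2 ≈ 63°; weighted variants that also
charge the skipped cell exist), the only change to the dtw() sketch above is the fan-in
computation:

    # drop-in replacement for the fan-in computation inside dtw() above
    best = min(D[i - 1, j - 2] if i > 0 and j > 1 else np.inf,  # 27° step
               D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # 45° step
               D[i - 2, j - 1] if i > 1 and j > 0 else np.inf)  # 63° step
    D[i, j] = abs(t[i] - r[j]) + best

(The back-tracking candidates would have to be changed accordingly.)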
Advantages
• Works well for a small number of templates (<20)
• Language independent
• Speaker specific
• Easy to train (the end user controls it)
Disadvantages
• Limited number of templates
• Needs actual training examples
Applications
• Spoken word recognition
Due to different speaking rates, non-linear fluctuations occur in a speech pattern along
the time axis, and these need to be eliminated. Given any two speech patterns, we can get
rid of their timing differences by warping the time axis of one so that maximum
coincidence with the other is attained.
• Correlation power analysis
Unstable clocks are used to defeat naive power analysis. Several techniques are used to
counter this defense, one of which is dynamic time warping.
Vector Quantization
Quantization is the process of mapping an infinite set of scalar or vector quantities onto
a finite set of scalar or vector quantities. Quantization has applications in signal
processing, speech processing, and image processing. In speech coding, quantization is
required to reduce the number of bits used to represent a sample of a speech signal. When
fewer bits are used to represent a sample, the bit-rate, complexity, and memory requirements
are reduced. However, quantization results in a loss of quality of the speech signal, which
is undesirable, so a compromise must be made between the reduction in bit-rate and the
quality of the speech signal.
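As a toy illustration of this bit-rate versus quality trade-off (a minimal sketch; the sine
frame is an invented stand-in for real speech), a uniform scalar quantizer with b bits per
sample:

    import numpy as np

    def uniform_quantize(x, bits, lo=-1.0, hi=1.0):
        """Quantize each sample independently to 2**bits uniform levels."""
        levels = 2 ** bits
        step = (hi - lo) / levels
        idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
        return lo + (idx + 0.5) * step           # mid-point reconstruction

    x = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 100))   # stand-in for a speech frame
    for b in (2, 4, 8):
        mse = np.mean((x - uniform_quantize(x, b)) ** 2)
        print(f"{b} bits/sample -> mean squared error {mse:.2e}")

Fewer bits per sample give a lower bit-rate but a larger quantization error.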
Two types of quantization techniques exist: scalar quantization and vector quantization.
"Scalar quantization deals with the quantization of samples on a sample-by-sample basis",
while "vector quantization deals with quantizing the samples in groups called vectors".
Vector quantization increases the optimality of a quantizer at the cost of increased
computational complexity and memory requirements.
Shannon's theory states that "quantizing a vector is more effective than quantizing
individual scalar values in terms of spectral distortion". According to Shannon, the chosen
dimension of a vector greatly affects the performance of quantization. Vectors of larger
dimension produce better quality than vectors of smaller dimension; with vectors of smaller
dimension, the transparency of the quantization is poor at a given bit-rate. This is because
with vectors of smaller dimension the correlation that exists between the samples is lost,
and scalar quantization itself destroys the correlation between successive samples, so the
quality of the quantized speech signal is degraded. Therefore, quantizing correlated data
requires techniques that preserve the correlation between the samples, which is what the
vector quantization (VQ) technique achieves. Vector quantization is a generalization of
scalar quantization. Vectors of larger dimension produce transparency in quantization at a
given bit-rate. In vector quantization the data is quantized in the form of contiguous
blocks called vectors rather than as individual samples. Later, with the development of
better coding techniques, it became possible to achieve transparency in quantization even
for vectors of smaller dimension. In this work, quantization is performed on vectors of
full length and on vectors of smaller dimension for a given bit-rate.
Fig-03 : Two Dimensional Vector Quantizer
An example of a two-dimensional vector quantizer is shown in Fig-03. The two-dimensional
region shown in Fig-03 is partitioned into a number of small hexagonal Voronoi regions. The
hexagonal regions defined by the red borders are called the encoding regions. The green dots
represent the vectors to be quantized, which fall into different hexagonal regions, and the
blue circles represent the codewords (centroids). The vectors (green dots) falling in a
particular hexagonal region are best represented by the codeword (blue circle) lying in that
region.
The vector quantization technique has become a great tool with the development of
non-variational design algorithms such as the Linde-Buzo-Gray (LBG) algorithm. On the other
hand, besides spectral distortion, a vector quantizer has its own limitations, such as the
computational complexity and memory required for searching and storing the codebooks. For
applications requiring higher bit-rates, the computational complexity and memory
requirements increase exponentially. The block diagram of a vector quantizer is shown in
Fig-04.
Fig-04 : Block diagram of a Vector Quantizer (the input vector s_k is buffered, the vector
quantizer matches it against the codebook C of codewords C_i, and the selected index i is
output)
Let s = [s_1, s_2, ..., s_N]^T be an N-dimensional vector with real-valued samples s_k,
1 ≤ k ≤ N. The superscript T denotes the transpose of the vector. In vector quantization, a
real-valued N-dimensional input vector s is matched against the real-valued N-dimensional
codewords of a codebook of size L = 2^b, where b is the number of bits. The codeword that
best matches the input vector, i.e., with the lowest distortion, is selected, and the input
vector is replaced by it. The codebook consists of a finite set of codewords
C = {C_i}, 1 ≤ i ≤ L, where C_i = [C_1i, C_2i, ..., C_Ni]^T; here C is the codebook, L is
the length of the codebook, and C_i denotes the i-th codeword in the codebook.
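The following minimal sketch shows both the codebook search just described and a naive
LBG-style training loop for building the codebook (the names, the 2-D Gaussian training
data, and the codebook size L = 8 are invented for the example; a full LBG implementation
would use codeword splitting and a distortion-based stopping rule):

    import numpy as np

    def encode(s, codebook):
        """Full search: index i of the codeword C_i closest to input vector s."""
        return int(np.argmin(np.sum((codebook - s) ** 2, axis=1)))

    def lbg(train, L, iters=20, seed=0):
        """Naive LBG/k-means-style training of an L-codeword codebook."""
        rng = np.random.default_rng(seed)
        codebook = train[rng.choice(len(train), L, replace=False)].copy()
        for _ in range(iters):
            labels = np.array([encode(v, codebook) for v in train])
            for i in range(L):
                cell = train[labels == i]
                if len(cell):                # keep the old codeword if its cell is empty
                    codebook[i] = cell.mean(axis=0)
        return codebook

    train = np.random.default_rng(1).normal(size=(500, 2))   # invented 2-D training vectors
    cb = lbg(train, L=8)                                     # L = 2**b with b = 3 bits
    print(encode(np.array([0.3, -0.7]), cb))                 # the index i sent to the decoder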
Advantages
Advantages of vector quantization compared to scalar quantization:
• It can utilize the memory of the source.
• The distortion at a given rate will always be lower when the number of dimensions is
increased, even for a memoryless source.
Disadvantages
Disadvantages of vector quantization compared to scalar quantization:
• Both the storage space and the time needed to perform the quantization grow exponentially
with the number of dimensions. Since there is no structure to the codebook (in the general
case), we have to compare each signal vector with every reconstruction vector in the
codebook to find the closest one.
Applications
A few classical examples of applications include:
• Medical image storage (e.g., magnetic resonance imaging): e.g., magnetic resonance image
compression using scalar-vector quantization, or compression of skin tumor images;
• Satellite image storage and transmission (e.g., remote sensing): e.g., a vector
quantization-based coding scheme for television transmission via satellite;
• Transmission of audio signals through old, noisy mobile radio communication channels:
e.g., a study of vector quantization for noisy channels; see also competitive learning
algorithms for robust vector quantization for more examples in transmission applications.
More recent applications integrate VQ into several machine learning tasks, such as:
• Speaker identification: e.g., a discriminative training algorithm for VQ-based speaker
identification;
• Image steganography: e.g., a high-capacity image hiding scheme based on vector
quantization, or steganography using an overlapping codebook partition.
Linear predictive coding (LPC)
Linear predictive coding (LPC) is a method for signal-source modeling in speech signal
processing. It is often used by linguists as a formant extraction tool, and it has wide
application in other areas. LPC analysis is usually most appropriate for modeling vowels,
which are periodic (except nasalized vowels). LPC is based on the source-filter model of
the speech signal.
Envelope Calculation
The LPC method is quite close to the FFT. The envelope is calculated from a number of
formants or poles specified by the user.
• The formants are estimated by removing their effects from the speech signal and
estimating the intensity and frequency of the remaining buzz. The removal process is
called inverse filtering, and the remaining signal is called the residue.
• The speech signal (the source) is synthesized from the buzz parameters and the residue.
The source is run through the filter (the formants), resulting in speech.
• The process is iterated several times a second, working in "frames". A rate of 30 to 50
frames per second yields intelligible speech. (A minimal analysis sketch follows this
list.)
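Here is a minimal sketch of the analysis side, using the autocorrelation method with a
Levinson-Durbin recursion (the order, frame length, and synthetic frame are invented for
the example; real speech frames would normally be windowed first):

    import numpy as np

    def lpc(frame, order):
        """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = np.zeros(order + 1)
        a[0], err = 1.0, r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
            a[1:i] += k * a[i - 1:0:-1]
            a[i] = k
            err *= 1.0 - k * k
        return a, err        # inverse filter A(z); err is the residual energy

    # synthetic voiced-like frame: a decaying impulse train as a stand-in for speech
    frame = np.zeros(320)
    frame[::80] = 1.0
    frame = np.convolve(frame, 0.95 ** np.arange(50))[:320]
    a, err = lpc(frame, order=10)
    print(a, err)

The roots of A(z) give the pole (formant) frequencies, and filtering the frame with A(z)
yields the residue described above.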
Advantages
Its main advantage comes from the reference to a simplified vocal-tract model and the
analogy of a source-filter model with the speech production system. It is a useful method
for encoding speech at a low bit rate.
Limitations
LPC performance is limited by the method itself and by the local characteristics of the
signal.
• The harmonic spectrum sub-samples the spectral envelope, which produces spectral
aliasing. These problems are especially manifest in voiced and high-pitched signals,
affecting the first harmonics of the signal, which largely determine the perceived speech
quality and formant dynamics.
• A correct all-pole model for the signal spectrum can hardly be obtained.
• The desired spectral information, the spectral envelope, is not represented: the fit
stays too close to the original spectrum. LPC follows the curve of the spectrum down to
the residual noise level in the gap between two harmonics, or between partials spaced too
far apart. This is not the spectral information we want to model, since we are interested
in fitting the spectral envelope as closely as possible, not the original spectrum. The
spectral envelope should be a smooth function passing through the prominent peaks of the
spectrum, not following the "valleys" formed between the harmonic peaks.
Fig-05 : Comparing several envelope estimation methods
Applications
1. LPC, a statistical method for predicting future values of a waveform on the basis of its
past values, is often used to obtain a spectral envelope.
2. LPC differs from formant tracking in that:
• the waveform remains in the time domain; resonances are described by the coefficients of
an all-pole filter;
• altering resonances is difficult, since editing IIR filter coefficients can result in an
unstable filter;
• analysis may be applied to a wide range of sounds.
3. LPC is often used to determine the filter in a source-filter model of speech, which:
• characterizes the response of the vocal tract;
• reconstitutes the speech waveform when driven by the correct source.