Abstract: One of the major building blocks of an image
data compression system is the discrete cosine transform
(DCT), which can be computed using specialized fast
algorithms. This paper presents a method to implement
DCT-based compression using the Lee algorithm. Rapid
prototyping on an FPGA platform of the Spartan-3E family is
used to validate the operation of the described DCT system.
The system offers significant advantages: portability, rapid
time to market, and the ability to change the DCT parameters
continuously in real time.
I. INTRODUCTION
A large number of image data compression techniques are
available, each adapted to a specific type of application,
such as compact disc, videoconference, videophone, and
multimedia systems. In all of these applications the
transmission line bandwidth determines the compression
standard to be used [1]. Among these techniques is one based
on a frequency transform called the Discrete Cosine
Transform (DCT). This transform has characteristics that
allow efficient image compression. Image and video
compressors and decompressors are implemented in both
software and hardware. Hardware implementations, however,
are especially important for the realization of highly
parallel algorithms and can achieve much higher throughput
than software solutions. The DCT has benefited greatly from
its implementation in VLSI circuits, which keep getting
faster and are now capable of executing a DCT in real time.
In this paper, we present a methodology to implement the
DCT. The proposed system could be useful in many other
applications that require image data compression. Section II
gives a general description of the image/video codec system
that requires the data compression. Section III covers the
DCT hardware design. Section IV contains the implementation
of the DCT on an FPGA and its experimental results. Finally,
the conclusion is given in Section V.
II. CODEC IMAGE VIDEO
Figure 1 shows the subsystems of a CoDec
(Compressor/Decompressor) video system [2, 3]. It consists
of a compressor and a decompressor. The compressor is
made up of an image pre-processor, a discrete cosine
transform (DCT), a vector quantizer (QV), and a variable
length coder (VLC). The compressed image is then
transmitted over a wired or wireless channel to the
decompressor. The decompressor consists of an inverse
variable length coder (IVLC), an inverse discrete cosine
transform (IDCT), an inverse quantizer (IQV), and a
post-processor, as shown in Figure 1.
Manuscript received April 30, 2009.
A. Kassem and M. Hamad are with the Electrical and Computer,
Communication Engineering Department, Notre Dame University,
P.O.BOX 72, Zouk Mikael, Lebanon, e-mail: mhamad or
akassem@ndu.edu.lb.
Fig. 1. CoDec image video system.
The DCT block receives an NxN image matrix, which is
divided into smaller image blocks (4x4, 8x8, 16x16, ...),
and each block is transformed from the spatial domain to the
frequency domain. The DCT decomposes the signal into spatial
frequency components called DCT coefficients [2]. The
lower-frequency DCT coefficients appear toward the first row
and first column of the DCT matrix, and the higher-frequency
coefficients toward the last row and last column.
Quantization is used to discard visually insignificant data
without introducing noticeable artifacts in the image. After
quantization, the majority of the DCT coefficients are equal
to zero [4, 5]. Run-length coding (RLC) and variable length
coding (VLC) then map the quantized coefficients to code
words, whose values and lengths are retrieved from
predefined lookup tables. The decompressor block
reconstructs the image from the compressed data using the
inverse process.
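The quantization and run-length steps described above can be sketched as follows. This is a simplified illustration, not the codec's actual tables: the uniform step size of 16 and the (value, run length) pair format are assumptions, and a real coder would typically add zigzag scanning and lookup-table based variable length codes. The input values are the last row of Matrix 2 in Section IV.

```python
def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to the nearest integer."""
    return [round(c / step) for c in coeffs]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


# Last row of Matrix 2; the step size 16 is a hypothetical choice.
row = [13.2943, -20.1, 0.54, 8.287, 0.1484, 3.332, -7.513, 3.68]
levels = quantize(row, 16)
print(levels)                     # [1, -1, 0, 1, 0, 0, 0, 0]
print(run_length_encode(levels))  # [(1, 1), (-1, 1), (0, 1), (1, 1), (0, 4)]
```

Half of the quantized values are already zero for this row, and the trailing zeros collapse into a single pair.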
III. DCT HARDWARE DESIGN
A. Theory of DCT for Hardware Implementation
Equation (1) shows the 1-D Discrete Cosine Transform
(DCT) [6]:
$$F(u) = \frac{2}{N}\, C(u) \sum_{x=0}^{N-1} f(x) \cos\!\left(\frac{(2x+1)u\pi}{2N}\right), \qquad (1)$$
where F(u): coefficient value in the transform domain,
f(x): coefficient value in the pixel domain,
x: spatial coordinate in the pixel domain,
u: coordinate in the transform domain,
Image Compression on FPGA using DCT
A. Kassem, Member, IEEE, M. Hamad, Member, IEEE, and E. Haidamous
ACTEA 2009, July 15-17, 2009, Zouk Mosbeh, Lebanon
978-1-4244-3834-1/09/$25.00 © 2009 IEEE
$C(u) = 1/\sqrt{2}$ for u=0, otherwise 1.
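As a reference point, equation (1) can be evaluated directly. The sketch below assumes the 2/N scaling and C(0) = 1/sqrt(2) as read from the definitions above; the function name `dct_1d` is only illustrative. It needs N multiplications per output, N*N in total, which is the cost a fast algorithm tries to reduce.

```python
import math


def dct_1d(f):
    """Direct evaluation of equation (1) for a sequence f of length N."""
    n = len(f)
    out = []
    for u in range(n):
        c_u = 1 / math.sqrt(2) if u == 0 else 1.0  # C(u), assumed 1/sqrt(2) at u = 0
        acc = sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n))
        out.append(2.0 / n * c_u * acc)
    return out


# A constant input has only a DC term: F(0) = (2/N) * (1/sqrt(2)) * N * value.
coeffs = dct_1d([10.0] * 8)
print(coeffs[0])
```

For a constant input of 10, F(0) equals 10*sqrt(2) and all other coefficients vanish up to floating-point error.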
The DCT is a frequency transform closely related to the
discrete Fourier transform (DFT): it uses only real cosine
basis functions and corresponds, up to scaling and a phase
factor, to the DFT of a symmetrically extended input
sequence. Equation (2) shows the forward transformation that
generates the two-dimensional discrete cosine transform
(2D-DCT) of the original NxN image block:
$$F(u,v) = \frac{4}{N^2}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i,j) \cos\!\left(\frac{(2i+1)u\pi}{2N}\right) \cos\!\left(\frac{(2j+1)v\pi}{2N}\right), \qquad (2)$$
where F(u,v): coefficient values in the transform domain,
f(i,j): coefficient values in the pixel domain,
i,j: spatial coordinates in the pixel domain,
u,v: coordinates in the transform domain,
$C(u) = 1/\sqrt{2}$ for u=0, otherwise 1.
$C(v) = 1/\sqrt{2}$ for v=0, otherwise 1.
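Equation (2) can be checked numerically with a direct four-loop evaluation. This is a minimal sketch under the same assumptions as above (4/N^2 scaling, C(0) = 1/sqrt(2)); the helper names and the ramp test block are hypothetical. The direct form costs on the order of N^4 operations, which is why it is not used in the hardware.

```python
import math


def c(k):
    """Normalization factor C(k), assumed 1/sqrt(2) for k = 0 and 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_2d_direct(block):
    """Direct evaluation of equation (2) for an N x N block."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    acc += (block[i][j]
                            * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = 4.0 / n ** 2 * c(u) * c(v) * acc
    return out


# Hypothetical test block: every row is the ramp 0..7, so the block does not
# vary with i and only the u = 0 row of the result is non-zero.
block = [[float(j) for j in range(8)] for _ in range(8)]
coeffs = dct_2d_direct(block)
print(coeffs[0][0])
```

With this scaling the DC term is the pixel sum divided by 32 for N=8, here 224/32 = 7.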
The separability property of the DCT has the advantage
that F(u, v) can be computed in two successive steps: 1-D
DCTs are applied to the rows of an image block and then to
its columns, which yields the 2D-DCT as shown in Figure 2.
Using this property, equation (2) becomes equation (3):
$$F(u,v) = \frac{2}{N}\, C(u) \sum_{i=0}^{N-1} \left\{ \frac{2}{N}\, C(v) \sum_{j=0}^{N-1} f(i,j) \cos\!\left(\frac{(2j+1)v\pi}{2N}\right) \right\} \cos\!\left(\frac{(2i+1)u\pi}{2N}\right), \qquad (3)$$
Fig. 2. Computation of 2-D DCT using separability property
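The row-column decomposition of equation (3) can be sketched with two passes of a 1-D DCT and a transpose in between. The function names are illustrative, and the 1-D routine assumes the 2/N scaling with C(0) = 1/sqrt(2). The two ramp blocks are hypothetical test inputs chosen so that the orientation of the result is easy to check.

```python
import math


def dct_1d(f):
    """1-D DCT of equation (1), with C(0) = 1/sqrt(2) assumed."""
    n = len(f)
    out = []
    for u in range(n):
        c_u = 1 / math.sqrt(2) if u == 0 else 1.0
        acc = sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n))
        out.append(2.0 / n * c_u * acc)
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_separable(block):
    """2D-DCT as two 1-D passes: inner sum over j, then outer sum over i."""
    row_pass = [dct_1d(row) for row in block]                # row_pass[i][v]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]  # col_pass[v][u]
    return transpose(col_pass)                               # result[u][v]


horizontal_ramp = [[float(j) for j in range(8)] for _ in range(8)]
vertical_ramp = transpose(horizontal_ramp)
F_h = dct_2d_separable(horizontal_ramp)
F_v = dct_2d_separable(vertical_ramp)
print(F_h[0][0], F_v[0][0])
```

A block that varies only along j produces non-zero coefficients only for u = 0, and transposing the input transposes the output. For an 8x8 block this takes 16 one-dimensional transforms instead of the quadruple sum of equation (2).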
To reduce the computation of the 1D-DCT in equation (1), a
LEE graph is used [7]. The LEE graph can be obtained by
rewriting equation (1) as follows:
$$F(m) = \frac{2}{N} \sum_{n=0}^{N-1} \dot{F}(n)\, C_{2N}^{(2n+1)m}, \qquad m = 0, 1, \ldots, N-1, \qquad (4)$$
where $C_k^i = \cos(i\pi/k)$
and $\dot{F}(n) = C(n)\,F(n)$.
This graph, shown in Figure 3 for N=8, requires
$\frac{3N}{2}\log_2(N) - N + 1$ real additions
and $\frac{N}{2}\log_2(N)$ real multiplications.
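The operation counts above can be illustrated with a recursive sketch in the spirit of Lee's decomposition [7]. It is not the paper's fixed N=8 flow graph: the recursion, the function names, and the omission of the (2/N) C(u) output scaling are choices made for this illustration. Each stage splits the input into sums (even outputs) and scaled differences (odd outputs). The divisions by 2*cos(...) use constants, so in hardware they would become multiplications by precomputed values, one per butterfly.

```python
import math


def lee_dct(x):
    """Unscaled DCT, X[k] = sum_n x[n] cos((2n+1) k pi / (2N)), N a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    alpha = [x[i] + x[n - 1 - i] for i in range(half)]
    beta = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((2 * i + 1) * math.pi / (2 * n)))
            for i in range(half)]
    a = lee_dct(alpha)   # half-size transform giving the even-indexed outputs
    b = lee_dct(beta)    # half-size transform combined into the odd-indexed outputs
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = a[k]
        out[2 * k + 1] = b[k] + (b[k + 1] if k + 1 < half else 0.0)
    return out


def direct_dct(x):
    """Reference: the same unscaled sum evaluated directly."""
    n = len(x)
    return [sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
            for k in range(n)]


def lee_additions(n):
    return 3 * n // 2 * (n.bit_length() - 1) - n + 1


def lee_multiplications(n):
    return n // 2 * (n.bit_length() - 1)


sample = [106.0, 109.0, 97.0, 101.0, 104.0, 109.0, 91.0, 57.0]
fast = lee_dct(sample)
slow = direct_dct(sample)
print(lee_additions(8), lee_multiplications(8))  # 29 12
```

For N=8 the formulas give 29 additions and 12 multiplications, compared with 64 multiplications for the direct sum.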
B. DCT Hardware Implementation
The block diagram of the hardware implementation of the
2D-DCT using the LEE graph is shown in Figure 4. The
operative part calculates the 1D-DCT on the rows and the
1D-DCT on the columns, alternately, to obtain the 2D-DCT
for N=8.
Fig. 3. LEE graph to compute the 1D-DCT for N=8
The 1D-DCT algorithm is first applied to the rows of the
data block, and the results are stored in memory. The
algorithm is then applied to the columns, which yields the
final 2D-DCT results. While the 1D-DCT operates on the
columns, the next data sequences enter the system, thus
creating a pipelined architecture [8]. Each output transform
sequence is rounded using 11-bit adder and comparator
subsystems.
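The rounding stage can be modelled in software as a round-and-clamp operation. The paper does not state the number format, so the signed two's-complement range and the round-half-up rule below are assumptions made for this sketch, and the function name is hypothetical. The adder corresponds to adding 0.5 before truncation, and the comparators correspond to the range check.

```python
import math


def round_to_signed_bits(value, bits=11):
    """Round to the nearest integer and clamp to a signed range of `bits` bits."""
    lo = -(1 << (bits - 1))                 # -1024 for 11 bits
    hi = (1 << (bits - 1)) - 1              # +1023 for 11 bits
    rounded = int(math.floor(value + 0.5))  # adder: add one half, then truncate
    return max(lo, min(hi, rounded))        # comparators: saturate at the limits


print(round_to_signed_bits(597.079))   # 597
print(round_to_signed_bits(-17.287))   # -17
print(round_to_signed_bits(2000.0))    # 1023
```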
IV. RESULTS
The 2D-DCT can be implemented on an FPGA through Xilinx
System Generator using hardware models, and the result can
serve as a building block for various image processing
systems. System Generator works with standard Simulink
models; the “Gateway In” and “Gateway Out” blocks define the
boundary of the FPGA design.
The input image is read in MATLAB and converted into a
matrix representation. The image is then decomposed into
8x8 blocks. Figure 5 shows an 8x8 block obtained from a
256x256 grayscale image.
Fig.5. Image test divided into 8x8 blocks used for DCT
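The decomposition into 8x8 blocks can be sketched as below. The paper performs this step in MATLAB; this Python version is only an illustration, and the synthetic gradient image and the raster ordering of the blocks are assumptions.

```python
def split_into_blocks(image, size=8):
    """Split a 2-D list of pixel rows into size x size blocks, in raster order."""
    height, width = len(image), len(image[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


# Synthetic 256x256 gradient standing in for the test image.
image = [[(r + c) % 256 for c in range(256)] for r in range(256)]
blocks = split_into_blocks(image)
print(len(blocks))  # 1024, i.e. 32 x 32 blocks
```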
The corresponding matrix representation of the 8x8 block,
obtained with MATLAB, is shown in Matrix 1.
Matrix 1. Pixel levels of the 8x8 block using MATLAB.
106 109 97 101 104 109 91 57
83 77 89 71 70 67 62 54
66 60 62 61 59 57 58 59
58 62 62 61 55 53 55 58
59 62 57 63 59 56 81 76
72 94 60 85 58 47 64 73
78 115 77 96 93 73 94 80
87 109 115 67 85 93 82 73
The 2D-DCT matrix of this block calculated by
MATLAB is represented by matrix 2.
Matrix 2. 2D-DCT result using MATLAB.
597.079 35.59 -1.172 -5.127 -15.25 -4.32 -19.81 -1.6
-17.287 8.613 -14.64 20.567 6.254 14.25 17.41 2.37
99.4702 29.92 -18.89 18.946 -24.64 -9.7 6.752 6.47
30.9677 0.635 -1.171 2.7918 10.074 5.313 -24.98 -15.4
18.5024 -6.76 -4.47 11.6 -15.75 0.051 14.03 7.99
19.6004 0.68 -10.38 6.8812 5.0155 -3.58 -17.08 -10.4
-5.4927 6.033 3.503 1.6981 1.2775 0.91 5.144 -6.26
13.2943 -20.1 0.54 8.287 0.1484 3.332 -7.513 3.68
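Matrix 2 can be approximately reproduced from the pixel block. The sketch below assumes the orthonormal scaling used by MATLAB's dct2 (1/sqrt(N) for index 0 and sqrt(2/N) otherwise), which differs from the 4/N^2 scaling of equation (2) by a constant factor. The block values are the rows of Matrix 1 read top to bottom.

```python
import math

# Matrix 1, rows top to bottom.
BLOCK = [
    [106, 109, 97, 101, 104, 109, 91, 57],
    [83, 77, 89, 71, 70, 67, 62, 54],
    [66, 60, 62, 61, 59, 57, 58, 59],
    [58, 62, 62, 61, 55, 53, 55, 58],
    [59, 62, 57, 63, 59, 56, 81, 76],
    [72, 94, 60, 85, 58, 47, 64, 73],
    [78, 115, 77, 96, 93, 73, 94, 80],
    [87, 109, 115, 67, 85, 93, 82, 73],
]


def dct2_orthonormal(block):
    """Row-column 2D-DCT with orthonormal scaling (assumed to match MATLAB dct2)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    def dct_1d(f):
        return [scale(u) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                               for x in range(n))
                for u in range(n)]

    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]


F = dct2_orthonormal(BLOCK)
for row in F[:3]:
    print([round(value, 2) for value in row])
```

The first-column and first-row terms agree with Matrix 2 to about two decimals (for example F[2][0] is close to 99.47 and F[0][1] to 35.59). The DC term comes out as 597.0 rather than 597.079, a small discrepancy whose origin is not clear from the paper.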
The bitstream file of the design (2D-DCT) was downloaded
onto a Spartan-3E Starter FPGA board connected to a laptop
through the USB port. The matrix obtained from the hardware,
running at a clock frequency of 50 MHz, is presented in
Matrix 3. The hardware implementation of the 2D-DCT occupies
only 14% of the total number of slices and 10% of the total
LUTs. However, inherent limitations in the interface to the
FPGA limited the overall performance of the design.
Fig. 4. Block diagram of the hardware implementation of the 2D-DCT using LEE graph
The maximum percentage error is found to be 7.86%
(approximately 8%) on the grayscale pixel range [0, 255].
This error is not noticeable when motion images are
processed, because the Human Visual System (HVS) is less
sensitive to errors in high-frequency coefficients than to
errors in lower-frequency coefficients; the higher-frequency
components can therefore be more coarsely quantized, as done
by the quantization matrix.
Matrix 3. 2D-DCT result using LEE graph implemented in Hardware.
597.1 35.6 -1.2 -5.1 -15.3 -4.3 -19.8 -1.6
-17.3 8.6 -14.6 20.6 6.3 14.3 17.4 2.4
99.5 29.9 -18.9 18.9 -24.6 -9.7 6.8 6.5
31.0 0.6 -1.2 2.8 10.1 5.3 -25.0 -15.4
18.5 -6.8 -4.5 11.6 -15.8 0.1 14.0 8.0
19.6 0.7 -10.4 6.9 5.0 -3.6 -17.1 -10.4
-5.5 6.0 3.5 1.7 1.3 0.9 5.1 -6.3
13.3 -20.1 0.5 8.3 0.2 3.3 -7.5 3.7
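The agreement between the MATLAB result (Matrix 2) and the hardware result (Matrix 3) can be checked coefficient by coefficient. The sketch below only computes absolute differences; it does not attempt to reproduce the 7.86% figure, because the paper does not state how that percentage is defined.

```python
MATLAB = [
    [597.079, 35.59, -1.172, -5.127, -15.25, -4.32, -19.81, -1.6],
    [-17.287, 8.613, -14.64, 20.567, 6.254, 14.25, 17.41, 2.37],
    [99.4702, 29.92, -18.89, 18.946, -24.64, -9.7, 6.752, 6.47],
    [30.9677, 0.635, -1.171, 2.7918, 10.074, 5.313, -24.98, -15.4],
    [18.5024, -6.76, -4.47, 11.6, -15.75, 0.051, 14.03, 7.99],
    [19.6004, 0.68, -10.38, 6.8812, 5.0155, -3.58, -17.08, -10.4],
    [-5.4927, 6.033, 3.503, 1.6981, 1.2775, 0.91, 5.144, -6.26],
    [13.2943, -20.1, 0.54, 8.287, 0.1484, 3.332, -7.513, 3.68],
]
HARDWARE = [
    [597.1, 35.6, -1.2, -5.1, -15.3, -4.3, -19.8, -1.6],
    [-17.3, 8.6, -14.6, 20.6, 6.3, 14.3, 17.4, 2.4],
    [99.5, 29.9, -18.9, 18.9, -24.6, -9.7, 6.8, 6.5],
    [31.0, 0.6, -1.2, 2.8, 10.1, 5.3, -25.0, -15.4],
    [18.5, -6.8, -4.5, 11.6, -15.8, 0.1, 14.0, 8.0],
    [19.6, 0.7, -10.4, 6.9, 5.0, -3.6, -17.1, -10.4],
    [-5.5, 6.0, 3.5, 1.7, 1.3, 0.9, 5.1, -6.3],
    [13.3, -20.1, 0.5, 8.3, 0.2, 3.3, -7.5, 3.7],
]

# Absolute difference for each of the 64 coefficients.
diffs = [abs(m - h)
         for m_row, h_row in zip(MATLAB, HARDWARE)
         for m, h in zip(m_row, h_row)]
max_abs = max(diffs)
print(round(max_abs, 4))
```

Every hardware value lies within about 0.05 of the MATLAB value, which is what one would expect if the hardware outputs are reported to one decimal place.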
The 2D-DCT architecture shows good performance when it is
applied to an image of 32x32 sub-images, where each
sub-image is composed of 8x8 pixels, as shown in Table 1.
The timing diagram of this implementation is depicted in
Figure 6, where
Ni: Number of images;
Ns: Number of sub-images;
Nl: Number of Lines of the sub-image;
Nc: Number of Columns of the sub-image;
Tc: T-cycle = 1 clock cycle;
Tci: Time to initialize the 1st sub-image = Nl * Tc;
Ts: Time to initialize the next sub-image = Nc * Tc;
Tsi: Time required to compute one sub-image;
Tti: Total time required to compute one image = Ns * Tsi;
Fig.6. Timing diagram of the implemented 2D-DCT
Blocks 1 and 2 of Figure 4 work as a pipeline once they are
loaded with the sub-image pixels. In addition, the
initialization time to load the 1st sub-image can be
neglected.
Table 1: Performance of the 2D-DCT implemented in hardware.
Ni   Ns      Nl*Tc (us)   Nc*Tc (us)   Tsi (us)   Tti (ms)
1    1024    0.16         0.16         0.16       0.16
12   12288   0.16         0.16         0.16       1.97
25   25600   0.16         0.16         0.16       4.1
30   30720   0.16         0.16         0.16       4.92
As shown in Table 1, the 2D-DCT of 30 images takes about
5 ms, which is about 0.5% of one second. About 99.5% of the
time is therefore left to the other blocks, such as the QV,
the VLC, and the decompressor block, to produce a real-time
motion image (30 images/second).
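The Tti column of Table 1 follows from the 50 MHz clock and the definitions above. The sketch below takes Tsi = Nl * Tc, as the identical table columns suggest, and neglects the initial load time; the constant and function names are illustrative.

```python
CLOCK_HZ = 50e6
T_C = 1.0 / CLOCK_HZ               # one clock cycle, 20 ns
N_L = 8                            # lines per sub-image
SUB_IMAGES_PER_IMAGE = 32 * 32     # 1024 sub-images of 8x8 pixels per 256x256 image


def total_time_ms(num_images):
    """Tti in milliseconds: Ns * Tsi with Tsi = Nl * Tc (0.16 us)."""
    t_si = N_L * T_C
    return num_images * SUB_IMAGES_PER_IMAGE * t_si * 1e3


for n_i in (1, 12, 25, 30):
    print(n_i, n_i * SUB_IMAGES_PER_IMAGE, round(total_time_ms(n_i), 2))

# Share of a one-second budget used by the 2D-DCT at 30 images per second.
share_percent = total_time_ms(30) / 1000.0 * 100.0
print(round(share_percent, 2))  # 0.49
```

At 30 images per second the transform uses about 4.92 ms of each second, a little under 0.5% of the available time.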
V. CONCLUSION
We have presented the implementation of the 2D-DCT
algorithm using the LEE graph combined with a pipelined
architecture. This implementation was realized on a Xilinx
XC3S500E FPGA on a Spartan-3E Starter board, clocked at
50 MHz. The use of a reprogrammable device permits
continuing parametric changes of the DCT in real time. The
Xilinx System Generator, embedded in MATLAB Simulink, was
used to program the model and test it on the FPGA board
using the hardware co-simulation tools. Finally, the maximum
error between the results before and after the hardware
implementation, about 8% on the grayscale pixel range, was
found to be small.
ACKNOWLEDGMENT
The authors would like to acknowledge the financial
support of the Xilinx Corporation for the use of its CAD
tools.
REFERENCES
[1] A. K. Jain, “Fundamentals of Digital Image Processing”, Prentice-Hall,
1st Ed., 1989.
[2] I. E. G. Richardson, “Video Codec Design: Developing Image and
Video Compression Systems”, Wiley & Sons, 1st Ed., 2002.
[3] A. Puri, “Video Coding using the MPEG-1 Compression Standard”,
Society for Information Display Digest of Technical Papers,
pp. 123-126, 1992.
[4] L. V. Agostini, I. S. Silva, S. Bampi, “Pipelined fast 2D DCT
architecture for JPEG image Compression”, The 14th Symposium on
Integrated Circuits and Systems Design, pp. 226-231, Sept. 2001.
[5] A. B. Watson, “DCT Quantization Matrices Visually Optimized for
Individual Images”, Proceedings of SPIE, pp. 202-216, 1993.
[6] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine
Transform”, IEEE Trans. on Comp., Vol. C-23, pp. 90-93, Jan. 1974.
[7] B. G. Lee, “A new algorithm to compute the discrete cosine
transform”, IEEE Trans. on Acoust., Speech, and Signal Process.,
vol. ASSP-32, no. 6, pp. 1243-1245, Dec. 1984.
[8] A. Kassem et al., “Simulation and Implementation of DCT for Image
Processing Applications”, Second LAAS International Conference on
Computer Simulation, 1997.