IOSR Journal of VLSI and Signal Processing (IOSR-JVSP)
Volume 7, Issue 1, Ver. I (Jan. - Feb. 2017), PP 01-07
e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197
www.iosrjournals.org
DOI: 10.9790/4200-0701010107
Optimized linear spatial filters implemented in FPGA
Ivan Kanev, Petya Pavlova
(Department CST, Technical University Sofia - Plovdiv branch, Bulgaria)
Abstract: Linear spatial filters (LSF) are used to filter digital images for blurring, noise reduction, detail enhancement, etc. The main obstacle to implementing LSF is the large number of operations their computation requires. This paper describes an approach to optimizing LSF through parallel algorithms and their hardware implementation on an FPGA. A model and an algorithm based on partial sums for calculating the filtered pixels are presented, and criteria for comparing the different types of linear filters are defined. A schematic diagram of an FPGA-based DSP operational block is shown; VHDL is used for the hardware design. The partial-sum and non-partial-sum filtering methods are compared. The methods employing partial sums are found to reduce the number of operations by a factor close to the size of the window (3, 5, 7, ...). These FPGA-based LSF are suitable for applications using threshold detection, edge detection, or image detail enhancement.
Keywords: DSP, FPGA, Linear Spatial Filtering, Partial Sum, VHDL
I. Introduction
The concept of filtering originates in the use of the Fourier transform for signal processing in the frequency domain. Digital filtering of an image, by contrast, uses procedures applied directly to the pixels of the image, and the term spatial filtering is used to distinguish these procedures from filtering in the frequency domain. In spatial filtering, the value of the filtered pixel is calculated by applying the cross-correlation or convolution operator to the pixel neighbourhood [1]. Smoothing spatial filters are used for blurring and noise reduction. Blurring is used in image preprocessing, for example to remove small details from the image or to bridge small gaps in lines or curves (Fig. 1). Noise reduction is used to improve image quality. Spatial filters are classified as linear or nonlinear: if the operations performed on the neighbourhood pixels are linear, the filter is linear; otherwise it is nonlinear [2]. This article deals only with linear spatial filters (LSF), because they underlie important applications in digital image filtering such as threshold detection and edge detection.
Fig. 1. Images filtered with an LSF: (a) image from the Hubble Space Telescope; (b) the input image filtered with a 15 × 15 mask; (c) result of thresholding [1].
A major problem in implementing LSF is that calculating each filtered pixel requires multiple summation, multiplication, and division operations. The number of these operations grows in proportion to the image size and reaches hundreds of millions. For this reason, LSF are difficult to apply on general-purpose processors (GPP) or DSP processors [3]. The limitations inherent in GPP and DSP processors can be overcome by implementing LSF on a single chip (System on a Chip, SoC) based on Field Programmable Gate Arrays (FPGAs) [4]. An FPGA is capable of parallel processing and is therefore a good platform for image processing [5]. Bailey [6] analyzes various pipelined and parallel processing schemes for LSF implemented on FPGA. Such schemes can be developed in two directions: new algorithms for calculating the filtered pixel, and reducing the number of operational units. Vasilev et al. [7] propose an algorithm that uses partial sums to reduce the computation time. Khan [8] and Malik [9] recommend building FPGA-based LSF from DSP blocks that use multiply-accumulate (MAC) units. To reduce the number of registers used to store the weight coefficients, a ROM-based finite-state machine (FSM) can be employed [10].
This paper discusses an approach to optimizing LSF using parallel algorithms based on partial sums, and their hardware implementation on FPGA.
The paper is organized as follows. Section II presents a parallel algorithm and a DSP operational block. The results are analyzed in Section III. Section IV concludes the paper.
II. Parallel Algorithm And DSP Operational Block For Calculating LSF Using Partial Sums
Filtering a pixel $p(x, y)$ in an image of size $M \times N$ produces a new pixel $g(x, y)$ whose coordinates equal those of the centre of the neighbourhood and whose value is the result of the filtering operation [1]. The effect of the filtering is determined by the mask of weight coefficients $w(a, b)$. The cross-correlation operator $G(x, y)$, used to calculate the filtered pixel with a linear spatial filter and defined in a window of size $S \times S$, is given by

$$G(x, y) = \sum_{a=-k}^{k} \sum_{b=-k}^{k} w(a, b) \cdot p(x + a, y + b), \tag{1}$$
where

$$x = 0, 1, 2, \ldots, M - 1; \quad y = 0, 1, 2, \ldots, N - 1; \tag{2}$$

$$k = \frac{S - 1}{2}; \tag{3}$$

$$S \text{ is an odd number, and } S \ge 3. \tag{4}$$
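As a point of reference, Eq. (1) can be evaluated directly, with one full window per output pixel. The following is a minimal Python sketch, not the authors' implementation (which is written in VHDL). The function name `filter_direct`, the `p[x][y]` list layout, and the restriction to pixels whose window lies entirely inside the image are assumptions made for illustration.

```python
def filter_direct(p, w):
    """Evaluate G(x, y) of Eq. (1) for every pixel whose window fits in the image.

    p -- image stored as p[x][y], with M values of x and N values of y (assumed layout)
    w -- S x S mask, stored so that w[a + k][b + k] holds w(a, b)
    """
    S = len(w)
    k = (S - 1) // 2                      # Eq. (3)
    M, N = len(p), len(p[0])
    G = {}
    for x in range(k, M - k):
        for y in range(k, N - k):
            G[(x, y)] = sum(
                w[a + k][b + k] * p[x + a][y + b]
                for a in range(-k, k + 1)
                for b in range(-k, k + 1)
            )
    return G


# 3 x 3 Gaussian mask whose weights sum to 16 (the values listed in Table I)
gauss = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
flat = [[10] * 4 for _ in range(5)]       # constant test image, M = 5, N = 4
G = filter_direct(flat, gauss)
```

Every output value costs S² multiply-accumulate operations in this form, which is the cost the partial-sum scheme of this section is designed to reduce.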
The normalized value of the filtered pixel $g(x, y)$ can be calculated by two methods.
Method #1: dividing $G(x, y)$ by the normalization factor $NF(a, b)$:

$$g(x, y) = G(x, y) / NF(a, b), \tag{5}$$

where

$$NF(a, b) = \sum_{a=-k}^{k} \sum_{b=-k}^{k} w(a, b). \tag{6}$$

Method #2: multiplying $G(x, y)$ by the reciprocal of the normalization factor, $NF^{-1}(a, b)$:

$$g(x, y) = G(x, y) \cdot NF^{-1}(a, b). \tag{7}$$
Although both methods give the same result, their implementations in an FPGA-based LSF differ substantially. With Method #1 the normalization is performed by a divider synthesized from the FPGA logic elements, whereas with Method #2 it is performed by the FPGA built-in multipliers. Method #2 is therefore preferable: it does not consume logic elements and the normalization is performed faster [11].
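The difference between the two methods can be illustrated in integer arithmetic. The sketch below is an assumption-laden illustration rather than the authors' design: the fixed-point width `FRAC_BITS`, the function names, and the round-half-up rule are all hypothetical choices. It shows that, for NF = 16, multiplying by a stored reciprocal gives the same result as dividing.

```python
FRAC_BITS = 16  # hypothetical number of fractional bits for the stored reciprocal


def normalize_divide(G, nf):
    """Method #1: integer division with round-half-up (needs a divider)."""
    return (G + nf // 2) // nf


def normalize_reciprocal(G, nf, frac_bits=FRAC_BITS):
    """Method #2: multiply by a precomputed fixed-point reciprocal of NF."""
    recip = round((1 << frac_bits) / nf)   # constant, computed once per mask
    half = 1 << (frac_bits - 1)            # rounding term of one half
    return (G * recip + half) >> frac_bits


# For 8-bit pixels and NF = 16, G ranges from 0 to 16 * 255
same = all(normalize_divide(G, 16) == normalize_reciprocal(G, 16)
           for G in range(16 * 255 + 1))
```

Because 16 is a power of two, its reciprocal is represented exactly here. For other normalization factors the stored reciprocal is only approximate, so the two methods can differ in the last bit unless enough fractional bits are kept.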
1. Algorithm for parallel computation of the filtered pixels by using partial sums
The algorithm for the parallel calculation of $g(x, y)$ using partial sums is based on the fact that consecutive filter windows overlap, so that several consecutive windows contain the same pixels. This can be exploited by algorithms in which, in parallel with the calculation of the current values of $G(x, y)$ and $g(x, y)$, partial sums $PS1(x, y)$, $PS2(x, y)$, ... are calculated from pixels and weight coefficients for use in subsequent iterations.
Fig. 2. Conceptual model of the method for parallel calculation of LSF using partial sums: Step #1; x, y = 1
Fig. 2 shows a conceptual model of the method for parallel calculation of the LSF using partial sums. The model assumes that the pixels of the input image $p(x, y)$ arrive at the filter input in succession, column by column of the window in which the filtering is carried out:
p(x,y) = p(0,0), p(0,1), .., p(3,0), p(3,1), p(3,2), p(4,0), p(4,1), p(4,2), ..
The input pixels are multiplied by the weight coefficients of the Gaussian kernels PS weights, PS1 weights, and PS2 weights, and the products are summed in three accumulators (Σ). For a window of size S = 3 (4), four parallel processes are executed in three successive steps, calculating $G(x, y)$, $g(x, y)$, $PS1(x, y)$, and $PS2(x, y)$. The final results are stored in the accumulator output registers Gq, PS1q, and PS2q. The normalized value of the filtered pixel $g(x, y)$ is rounded using the term $r$, $r \in [0, 1]$, to improve the accuracy of the calculation.
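The arrival order of the pixels described above can be written as a small generator. This is only an illustrative sketch: the name `pixel_stream` and its parameters are hypothetical, and it assumes the window is centred on row y, so that each column contributes S consecutive pixels.

```python
def pixel_stream(M, y, S=3):
    """Yield (x, y) coordinates in the order the filter input receives them."""
    k = (S - 1) // 2
    for x in range(M):                 # columns arrive left to right
        for dy in range(-k, k + 1):    # S pixels of the column, top to bottom
            yield (x, y + dy)


order = list(pixel_stream(M=5, y=1))
```

For y = 1 the stream starts with p(0,0), p(0,1), p(0,2), p(1,0), which matches the sequence listed in the text.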
Start of the algorithm
y = 0
Step #1. This step is performed for x = 1. Three parallel processes calculate $G(x, y)$, the final value of $PS1(x, y)$, and the initial value of $PS2(x, y)$, and the current values of the registers Gq, PS1q, and PS2q are set. $S^2$ operations are required.
x = 1; y = y + 1, moving the window to the start of the next row
Process #1:

$$G(x, y) = \sum_{a=-1}^{1} \sum_{b=-1}^{1} w(a + 1, b + 1) \cdot p(x + a, y + b); \tag{8}$$

$$Gq = G(x, y). \tag{9}$$

Process #2:

$$PS1(x, y) = \sum_{a=-1}^{0} \sum_{b=-1}^{1} w(a + 1, b + 1) \cdot p(x + a + 1, y + b); \tag{10}$$

$$PS1q = PS1(x, y). \tag{11}$$

Process #3:

$$PS2(x, y) = \sum_{b=-1}^{1} w(0, b + 1) \cdot p(x + 1, y + b). \tag{12}$$
Step #2. This step is performed for all x, x > 1. Four parallel processes calculate the current values of $g(x, y)$ and $G(x, y)$, the initial value of $PS1(x, y)$, and the final value of $PS2(x, y)$. $S$ operations are required.
Process #4:

$$g(x, y) = Gq \cdot NF^{-1} + r. \tag{13}$$

x = x + 1, moving the window one position to the right
Process #1:

$$G(x, y) = PS1q + \sum_{b=-1}^{1} w(2, b + 1) \cdot p(x + 1, y + b); \tag{14}$$

$$Gq = G(x, y). \tag{15}$$

Process #2:

$$PS1(x, y) = \sum_{b=-1}^{1} w(0, b + 1) \cdot p(x + 1, y + b). \tag{16}$$

Process #3:

$$PS2(x, y) = PS2(x - 1, y) + \sum_{b=-1}^{1} w(1, b + 1) \cdot p(x + 1, y + b); \tag{17}$$

$$PS2q = PS2(x, y). \tag{18}$$
Step #3. This step is performed for all x, x > 1. Four parallel processes calculate the current values of $g(x, y)$ and $G(x, y)$, the final value of $PS1(x, y)$, and the initial value of $PS2(x, y)$. $S$ operations are required.
Process #4:

$$g(x, y) = Gq \cdot NF^{-1} + r. \tag{19}$$

x = x + 1, moving the window one position to the right
Process #1:

$$G(x, y) = PS2q + \sum_{b=-1}^{1} w(2, b + 1) \cdot p(x + 1, y + b); \tag{20}$$

$$Gq = G(x, y). \tag{21}$$

Process #2:

$$PS1(x, y) = PS1(x - 1, y) + \sum_{b=-1}^{1} w(1, b + 1) \cdot p(x + 1, y + b); \tag{22}$$

$$PS1q = PS1(x, y). \tag{23}$$

Process #3:

$$PS2(x, y) = \sum_{b=-1}^{1} w(0, b + 1) \cdot p(x + 1, y + b). \tag{24}$$
Operations for managing the cycles:
if $x < M - 2$ then Step #2
else if $y < N - 1$ then Step #1
else end of the algorithm
The timing of the steps and processes in the execution of the algorithm is shown in Fig. 3.
Fig. 3. Generalized timing model
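The three steps can be checked in software. The sketch below is a sequential Python model for k = 1, not the authors' VHDL design: the function names, the `p[x][y]` layout, and the handling of interior pixels only are assumptions. The mask column used by each partial sum follows the per-step weights of Table I (first column when a partial sum starts, middle column when it is completed, last column when G is finished), and the result is compared with a computation that evaluates every window from scratch.

```python
def col_sum(p, w, x, y, a):
    """Weighted sum of image column x around row y, using mask column a."""
    return sum(w[a][b + 1] * p[x][y + b] for b in (-1, 0, 1))


def filter_partial_sums(p, w):
    """k = 1 partial-sum filtering; returns {(x, y): G(x, y)} for interior pixels."""
    M, N = len(p), len(p[0])
    G = {}
    for y in range(1, N - 1):
        # Step #1 (x = 1): full window, S * S products
        x = 1
        G[(x, y)] = sum(col_sum(p, w, x + a, y, a + 1) for a in (-1, 0, 1))
        ps1 = col_sum(p, w, x, y, 0) + col_sum(p, w, x + 1, y, 1)  # final PS1
        ps2 = col_sum(p, w, x + 1, y, 0)                            # initial PS2
        use_ps1 = True          # Step #2 consumes PS1, Step #3 consumes PS2
        while x < M - 2:
            x += 1              # move the window one position to the right
            new_col = x + 1     # the only column that is new to the window
            if use_ps1:         # Step #2
                G[(x, y)] = ps1 + col_sum(p, w, new_col, y, 2)
                ps2 += col_sum(p, w, new_col, y, 1)    # PS2 becomes final
                ps1 = col_sum(p, w, new_col, y, 0)     # PS1 restarts
            else:               # Step #3
                G[(x, y)] = ps2 + col_sum(p, w, new_col, y, 2)
                ps1 += col_sum(p, w, new_col, y, 1)    # PS1 becomes final
                ps2 = col_sum(p, w, new_col, y, 0)     # PS2 restarts
            use_ps1 = not use_ps1
    return G


def filter_full(p, w):
    """Reference: every window computed from scratch (S * S products each)."""
    M, N = len(p), len(p[0])
    return {(x, y): sum(col_sum(p, w, x + a, y, a + 1) for a in (-1, 0, 1))
            for x in range(1, M - 1) for y in range(1, N - 1)}


image = [[(7 * x + 13 * y) % 256 for y in range(6)] for x in range(9)]
mask = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # deliberately asymmetric test mask
```

In this software model the three column sums of Steps #2 and #3 are evaluated one after another. In the hardware described in the paper they run in parallel on the same incoming pixels, so each step after Step #1 takes the time of S operations.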
The effect of the partial-sum algorithm can be assessed by comparing it with the algorithm that does not use partial sums. For a given k, let $Wn$ denote the number of windows required to filter an input image of size $M \times N$, and let $S^2$ and $S$ be the numbers of operations per window without and with partial sums, respectively. Then

$$Wn = (M - k)(N - k). \tag{25}$$

The numbers of operations in the algorithm without partial sums, $O_C$ (8), and in the algorithm with partial sums, $O_P$ (8), (14), (20), are given by

$$O_C = S^2 \cdot Wn + 1; \tag{26}$$

$$O_P = S^2 (N - k) + S \cdot Wn. \tag{27}$$

Then the ratio

$$F_R = O_C / O_P \tag{28}$$

can be used as a criterion for the maximum factor by which the algorithms using partial sums reduce the number of operations compared with the algorithms without partial sums.
The real effect of the reduction in the number of operations can be determined from the ratio $T_R$:

$$T_R = T_C / T_P, \tag{29}$$

where

$$T_C = tclk1 \cdot O_C; \tag{30}$$

$$T_P = tclk2 \cdot O_P; \tag{31}$$

tclk1 (32) is the period of clk (Fig. 3) in the algorithms without partial sums, and tclk2 (33) is the period of clk in the algorithms using partial sums.
In contrast to $F_R$, the ratio $T_R$ takes into account the clock period for which a positive slack is obtained. This period is determined by timing analysis of the hardware implementation of the filters.
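The criteria can be evaluated numerically. The sketch below codes relations (25)-(31) as they are read here, with $Wn = (M - k)(N - k)$, $O_C = S^2 \cdot Wn + 1$ and $O_P = S^2 (N - k) + S \cdot Wn$. The function name and the dictionary keys are hypothetical, and the clock periods are the 5 ns and 6 ns values reported in Section III. With these inputs the sketch reproduces the 640 × 480 and 1024 × 768 columns of Table IV.

```python
def operation_counts(M, N, k, tclk1_ns, tclk2_ns):
    """Operation counts and reduction ratios for an M x N image and window 2k + 1."""
    S = 2 * k + 1                        # window size, from Eq. (3)
    Wn = (M - k) * (N - k)               # Eq. (25)
    O_C = S * S * Wn + 1                 # Eq. (26), without partial sums
    O_P = S * S * (N - k) + S * Wn       # Eq. (27), with partial sums
    T_C = tclk1_ns * O_C * 1e-6          # Eq. (30), converted from ns to ms
    T_P = tclk2_ns * O_P * 1e-6          # Eq. (31), converted from ns to ms
    return {"O_C": O_C, "O_P": O_P, "F_R": O_C / O_P,
            "T_C_ms": T_C, "T_P_ms": T_P, "T_R": T_C / T_P}


vga = operation_counts(640, 480, k=1, tclk1_ns=5, tclk2_ns=6)
xga = operation_counts(1024, 768, k=1, tclk1_ns=5, tclk2_ns=6)
```

Since $T_R = F_R \cdot tclk1 / tclk2$, a slower clock for the partial-sum design scales the achievable gain by the ratio of the two periods.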
2. FPGA-based DSP operational block
To implement the algorithm, an FPGA-based DSP operational block was developed (Fig. 4). It employs a combined scheme in which parallel processes are combined with pipelines.
Fig. 4. Structural diagram of FPGA-based DSP operational block: k = 1.
The DSP operational block consists of four sub-blocks:
1. WAVR DSP BLOCK. This block calculates the normalized weighted value of the filtered pixel $g(x, y)$ (13), (19). A four-stage pipeline is used to implement its operations.
2. PS1 (2) MAC DSP BLOCK. When k = 1, two identical multiply-accumulate (MAC) DSP blocks are used to calculate the partial sums PS1(x,y) (10), (16), (22) and PS2(x,y) (12), (17), (24). The output registers PS1q and PS2q are loaded with the current values of PS1(x,y) and PS2(x,y) according to equations (11), (23) and (18). A two-stage pipeline is used to implement the operations in these blocks.
3. CONTROL BLOCK. A finite-state machine built on the FPGA's embedded ROM (ROM FSM) manages and synchronizes the processes in the DSP block. This FSM generates three groups of signals:
3.1. Weight coefficients
Table I. Weight coefficients: k = 1; NF = 16

| Weights / Step | Step #1 | Step #2 | Step #3 |
|---|---|---|---|
| PS weights | 1 2 1 2 4 2 1 2 1 | 1 2 1 | 1 2 1 |
| PS1 weights | NA 1 2 1 2 4 2 | 1 2 1 | 2 4 2 |
| PS2 weights | NA 1 2 1 | 2 4 2 | 1 2 1 |

3.2. Control signals. The signals WAVR Control, PS1 Control, and PS2 Control manage the registers and multiplexers in the individual DSP blocks.
3.3. Next Address. These signals determine the next state of the ROM FSM.
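The weight sequences of Table I can be derived from the columns of the 3 × 3 mask. The following Python sketch is only an illustration of that mapping: the function name and the dictionary layout are hypothetical, the NA entries of the table are omitted, and the actual ROM contents of the authors' FSM are not given in the paper.

```python
def weight_schedule(w):
    """Per-step weight sequences for k = 1, one list per accumulator and step.

    w -- 3 x 3 mask stored as w[column][row]
    """
    c0, c1, c2 = (list(col) for col in w)
    return {
        "PS":  {1: c0 + c1 + c2, 2: c2, 3: c2},   # full window, then last column only
        "PS1": {1: c0 + c1,      2: c0, 3: c1},   # restarted in Step #2, completed in Step #3
        "PS2": {1: c0,           2: c1, 3: c0},   # completed in Step #2, restarted in Step #3
    }


gauss = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]         # NF = 16
schedule = weight_schedule(gauss)
```

For the Gaussian mask the first and last columns are identical, so a partial sum that is being restarted and the sum that completes G read the same three values, 1 2 1, from the ROM.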
The interface of the DSP operational block is implemented with the following input and output ports:
• Start filtering (SF): input port. Setting this signal high starts the LSF filtering operations.
• Input pixel p(x,y): input port. The pixels of the input image are set at this port consecutively, column by column of the window in which the filtering is carried out.
• Reciprocal of the normalization factor $NF^{-1}$: input port. The constant $NF^{-1}$ (6), (7) is set at this port before SF is activated and remains unchanged until all the image pixels have been filtered.
• g(x,y): output port. The current value of the filtered pixel (7) is set at this port.
The FPGA-based DSP operational block for parallel calculation of LSF using partial sums (Fig. 4) can be extended to other values of k (3) by instantiating a new pair of PS MAC DSP blocks in the project for each new value of k.
III. Result And Discussion
For the purposes of this study, VHDL is used for the hardware design. The low-cost, low-resource Cyclone V FPGA family from Altera is employed. Table II shows the resources used by the FPGA-based hardware implementation of an LSF.
Table II. Resource utilization summary (Altera Cyclone VE 5CEBA4F17C6 device)

| Resource | Available | Without partial sums: used | Without partial sums: utilization | With partial sums: used | With partial sums: utilization |
|---|---|---|---|---|---|
| Logic utilization (in ALMs) | 18480 | 14 | 0.0756 % | 36 | 0.1948 % |
| Total registers | - | 41 | - | 86 | - |
| Total block memory bits | 3153920 | 1792 | 0.0568 % | 1984 | 0.0629 % |
| Total (Mult) DSP blocks | 66 | 2 | 3 % | 4 | 6 % |
The resources used with and without partial sums are compared. The relatively small share of adaptive logic modules (ALMs) used is due to the use of the built-in FPGA DSP multiplier blocks and to combining the control signals of the operational block and the weight coefficients in a ROM-based FSM. A static timing analysis was performed between the related registers of the DSP operational block. The minimum slack values calculated for the different processes are shown in Table III.
Table III. Register path minimum slack summary

| Process # | Register path | Clock Delay1 | Clock Delay2 | Data Delay | Slack |
|---|---|---|---|---|---|
| 1 | Gxy_Data to Gq | 3.264 | 2.962 | 3.807 | 1.888 |
| 2 | PS1weight to PS1q | 3.262 | 2.652 | 4.627 | 0.81 |
| 3 | PS2weight to PS2q | 3.252 | 2.664 | 4.529 | 0.93 |
| 4 | Gq to g | 3.263 | 3.211 | 6.602 | 10.85 |
The slack is found to reach critical values on the paths from the PS1 (2) weight registers to PS1 (2) q (Fig. 5). The slack can be improved by increasing the number of pipeline stages in the PS1 (2) MAC DSP blocks.
Fig. 5. Timing analysis of the register-to-register path: PS1weight to PS1q.
Table IV shows the dependence of $O_C$ (26), $O_P$ (27), $T_C$ (30), $T_P$ (31), $F_R$ (28), and $T_R$ (29) on the size of the image, M × N.
Table IV. Dependence of $O_C$, $O_P$, $T_C$, $T_P$, $F_R$, and $T_R$ on M × N

| M × N | 640×480 | 1024×768 | 1600×1200 |
|---|---|---|---|
| $O_C$ | 2754730 | 7061770 | 16103529 |
| $O_P$ | 922554 | 2360826 | 5378634 |
| $T_C$ [ms] | 13.7736 | 35.3088 | 80.5176 |
| $T_P$ [ms] | 5.5353 | 14.1650 | 32.2718 |
| $F_R$ | 2.9860 | 2.9912 | 2.9940 |
| $T_R$ | 2.4883 | 2.4927 | 2.4950 |
The studies are carried out for k = 1 (3), tclk1 = 5 ns (32), and tclk2 = 6 ns (33). Although for the selected FPGA family a positive slack is reached only for tclk2 > tclk1, the ratio $T_R$ still reaches up to 83 % of $F_R$. The studies also show that $F_R$ increases with k, with $F_R \approx 2k + 1$. The upward trend of $F_R$ is shown in Fig. 6.
(Plot of $F_R$ versus k: $F_R$ = 2.99, 4.96, 6.92, 8.87, 10.81 for k = 1, 2, 3, 4, 5.)
Fig. 6. Dependence of $F_R$ on the window parameter k: M × N = 640 × 480
Increasing the size of the window increases the number of parallel processes, which in turn increases $F_R$.
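The trend in Fig. 6 can be recomputed from the operation counts. The sketch below assumes the relations (25)-(28) in the form $Wn = (M - k)(N - k)$, $O_C = S^2 \cdot Wn + 1$, $O_P = S^2 (N - k) + S \cdot Wn$, and $F_R = O_C / O_P$; the function name is hypothetical. For a 640 × 480 image its results agree with the values plotted in Fig. 6 to two decimal places.

```python
def reduction_factor(M, N, k):
    """F_R = O_C / O_P for an M x N image and a window of size S = 2k + 1."""
    S = 2 * k + 1
    Wn = (M - k) * (N - k)
    O_C = S * S * Wn + 1
    O_P = S * S * (N - k) + S * Wn
    return O_C / O_P


factors = {k: reduction_factor(640, 480, k) for k in range(1, 6)}
for k, f in factors.items():
    print(f"k={k}  S={2 * k + 1}  F_R={f:.2f}")
```

Dividing numerator and denominator by $S \cdot Wn$ gives $F_R \approx S / (1 + S / (M - k))$, so for images much wider than the window the factor approaches S = 2k + 1 from below.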
IV. Conclusion
This paper discussed an approach to optimizing LSF using parallel algorithms based on partial sums and their hardware implementation on FPGA. An FPGA-based DSP operational block for calculating the LSF was presented, together with an option for extending this block by instantiating new MAC DSP blocks within the module. The FPGA resources used with and without partial sums were compared, and the hardware implementation of the partial-sum algorithms was found to require only a small share of the available resources. A static timing analysis between the registers of the DSP operational block was carried out, and the critical values of the slack were determined. The reduction in the number of operations achieved by the hardware implementation of the algorithm using partial sums was studied.
References
[1] Gonzalez R., Woods R., Digital Image Processing, 3rd Edition, Prentice Hall, 2012.
[2] Meyer-Baese U., Digital Signal Processing with Field Programmable Gate Arrays (FPGA), 4th Edition, Springer, 2014.
[3] Jackson L., Digital Filters and Signal Processing, with MATLAB Exercises, 3rd Edition, Springer, 2010.
[4] Salunkhe A., Bombale U., Optimized Implementation of Edge Preserving Color Guided Filter for Video on FPGA, IOSR Journal of VLSI and Signal Processing, e-ISSN: 2319 – 4200, Volume 5, Issue 6, Ver. I (Nov. - Dec. 2015), pp. 27-33.
[5] Bailey D., Design for Embedded Image Processing on FPGAs, 1st Edition, John Wiley & Sons (Asia) Pte Ltd., 2011.
[6] Dimitrios B. et al., An FPGA-based hardware implementation of configurable pixel-level color image fusion, IEEE Trans. Geosci. Remote Sens., 50(2), 2012, pp. 362–373.
[7] Vasilev N., Bosakova-Ardenska A., A Parallel Conveyer Algorithm for the Recursive Method of Scanning Mask for Primary Images Processing, CompSysTech'2006, Veliko Tarnovo, Bulgaria, ISBN-13: 978-954-9641-46-2, pp. IIIA.19-1 – IIIA.19-7, 2006.
[8] Khan S., Digital Design of Signal Processing Systems: A Practical Approach, 1st Edition, John Wiley & Sons, Ltd., 2011.
[9] Malik S. et al., Implementation of MAC unit using booth multiplier & ripple carry adder, International Journal of Applied Engineering Research, Vol. 7, No. 11, 2012.
[10] Rawski M., Selvaraj H., Luba T., An application of functional decomposition in ROM-based FSM implementation in FPGA devices, Journal of Systems Architecture, vol. 51, p. 424, 2005.
[11] Kanev I., Comparison of Two Methods for Computing FPGA-based Weighted Average Linear Spatial Filters, CompSysTech'15, ACM International Conference Proceeding Series, Vol. 1008, ACM Inc., N.Y., USA, pp. 276-283, 2015.

More Related Content

PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
GENERIC APPROACH FOR VISIBLE WATERMARKING
PPT
05 histogram processing DIP
PDF
PPTX
Lect 03 - first portion
PPTX
Histogram based Enhancement
PDF
改进的固定点图像复原算法_英文_阎雪飞
PPTX
Image Restoration And Reconstruction
Welcome to International Journal of Engineering Research and Development (IJERD)
GENERIC APPROACH FOR VISIBLE WATERMARKING
05 histogram processing DIP
Lect 03 - first portion
Histogram based Enhancement
改进的固定点图像复原算法_英文_阎雪飞
Image Restoration And Reconstruction

What's hot (19)

PDF
Hough Transform: Serial and Parallel Implementations
PDF
Lec07 aggregation-and-retrieval-system
PPTX
Tennis video shot classification based on support vector
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
PPTX
Contrast limited adaptive histogram equalization
PDF
Mask R-CNN
PDF
Digital Image Processing: Image Enhancement in the Frequency Domain
PDF
Histogram Operation in Image Processing
PPTX
LAPLACE TRANSFORM SUITABILITY FOR IMAGE PROCESSING
PDF
A Comparative Study of Histogram Equalization Based Image Enhancement Techniq...
PDF
第13回 配信講義 計算科学技術特論A(2021)
PPTX
Histogram Equalization
PPT
Histogram equalization
PDF
A DIGITAL COLOR IMAGE WATERMARKING SYSTEM USING BLIND SOURCE SEPARATION
PDF
Lecture 3 image sampling and quantization
PPTX
Log Transformation in Image Processing with Example
PPTX
Parallel implementation of geodesic distance transform with application in su...
PDF
Lec12 review-part-i
Hough Transform: Serial and Parallel Implementations
Lec07 aggregation-and-retrieval-system
Tennis video shot classification based on support vector
Welcome to International Journal of Engineering Research and Development (IJERD)
Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)
Contrast limited adaptive histogram equalization
Mask R-CNN
Digital Image Processing: Image Enhancement in the Frequency Domain
Histogram Operation in Image Processing
LAPLACE TRANSFORM SUITABILITY FOR IMAGE PROCESSING
A Comparative Study of Histogram Equalization Based Image Enhancement Techniq...
第13回 配信講義 計算科学技術特論A(2021)
Histogram Equalization
Histogram equalization
A DIGITAL COLOR IMAGE WATERMARKING SYSTEM USING BLIND SOURCE SEPARATION
Lecture 3 image sampling and quantization
Log Transformation in Image Processing with Example
Parallel implementation of geodesic distance transform with application in su...
Lec12 review-part-i
Ad

Similar to Optimized linear spatial filters implemented in FPGA (20)

PDF
F0255046056
PDF
Intra Frame Coding in H.264 to Obtain Consistent PSNR and Reduce Bit Rate for...
PDF
Two-Channel Quadrature Mirror Filter Bank Design using FIR Polyphase Component
PDF
Parallel implementation of geodesic distance transform with application in su...
PDF
E046012533
PDF
International Journal of Engineering Inventions (IJEI)
PDF
Neighbour Local Variability for Multi-Focus Images Fusion
PDF
SHORT LISTING LIKELY IMAGES USING PROPOSED MODIFIED-SIFT TOGETHER WITH CONVEN...
PDF
SHORT LISTING LIKELY IMAGES USING PROPOSED MODIFIED-SIFT TOGETHER WITH CONVEN...
PDF
RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...
PDF
New Approach of Preprocessing For Numeral Recognition
PDF
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
PDF
Improving initial generations in pso algorithm for transportation network des...
PDF
Neighbour Local Variability for Multi-Focus Images Fusion
PDF
Neighbour Local Variability for Multi-Focus Images Fusion
PDF
IRJET- A Novel Adaptive Sub-Band Filter Design with BD-VSS using Particle Swa...
PDF
COUPLED FPGA/ASIC IMPLEMENTATION OF ELLIPTIC CURVE CRYPTO-PROCESSOR
PDF
E0212730
PDF
Improved Characters Feature Extraction and Matching Algorithm Based on SIFT
PDF
Performance analysis of transformation and bogdonov chaotic substitution base...
F0255046056
Intra Frame Coding in H.264 to Obtain Consistent PSNR and Reduce Bit Rate for...
Two-Channel Quadrature Mirror Filter Bank Design using FIR Polyphase Component
Parallel implementation of geodesic distance transform with application in su...
E046012533
International Journal of Engineering Inventions (IJEI)
Neighbour Local Variability for Multi-Focus Images Fusion
SHORT LISTING LIKELY IMAGES USING PROPOSED MODIFIED-SIFT TOGETHER WITH CONVEN...
SHORT LISTING LIKELY IMAGES USING PROPOSED MODIFIED-SIFT TOGETHER WITH CONVEN...
RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...
New Approach of Preprocessing For Numeral Recognition
Automated Information Retrieval Model Using FP Growth Based Fuzzy Particle Sw...
Improving initial generations in pso algorithm for transportation network des...
Neighbour Local Variability for Multi-Focus Images Fusion
Neighbour Local Variability for Multi-Focus Images Fusion
IRJET- A Novel Adaptive Sub-Band Filter Design with BD-VSS using Particle Swa...
COUPLED FPGA/ASIC IMPLEMENTATION OF ELLIPTIC CURVE CRYPTO-PROCESSOR
E0212730
Improved Characters Feature Extraction and Matching Algorithm Based on SIFT
Performance analysis of transformation and bogdonov chaotic substitution base...
Ad

More from IOSRJVSP (20)

PDF
PCIe BUS: A State-of-the-Art-Review
PDF
Implementation of Area & Power Optimized VLSI Circuits Using Logic Techniques
PDF
Design of Low Voltage D-Flip Flop Using MOS Current Mode Logic (MCML) For Hig...
PDF
Measuring the Effects of Rational 7th and 8th Order Distortion Model in the R...
PDF
Analyzing the Impact of Sleep Transistor on SRAM
PDF
Robust Fault Tolerance in Content Addressable Memory Interface
PDF
Color Particle Filter Tracking using Frame Segmentation based on JND Color an...
PDF
Design and FPGA Implementation of AMBA APB Bridge with Clock Skew Minimizatio...
PDF
Simultaneous Data Path and Clock Path Engineering Change Order for Efficient ...
PDF
Design And Implementation Of Arithmetic Logic Unit Using Modified Quasi Stati...
PDF
P-Wave Related Disease Detection Using DWT
PDF
Design and Synthesis of Multiplexer based Universal Shift Register using Reve...
PDF
Design and Implementation of a Low Noise Amplifier for Ultra Wideband Applica...
PDF
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
PDF
IC Layout Design of 4-bit Magnitude Comparator using Electric VLSI Design System
PDF
Design & Implementation of Subthreshold Memory Cell design based on the prima...
PDF
Retinal Macular Edema Detection Using Optical Coherence Tomography Images
PDF
Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
PDF
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
PDF
Performance Analysis of the Sigmoid and Fibonacci Activation Functions in NGA...
PCIe BUS: A State-of-the-Art-Review
Implementation of Area & Power Optimized VLSI Circuits Using Logic Techniques
Design of Low Voltage D-Flip Flop Using MOS Current Mode Logic (MCML) For Hig...
Measuring the Effects of Rational 7th and 8th Order Distortion Model in the R...
Analyzing the Impact of Sleep Transistor on SRAM
Robust Fault Tolerance in Content Addressable Memory Interface
Color Particle Filter Tracking using Frame Segmentation based on JND Color an...
Design and FPGA Implementation of AMBA APB Bridge with Clock Skew Minimizatio...
Simultaneous Data Path and Clock Path Engineering Change Order for Efficient ...
Design And Implementation Of Arithmetic Logic Unit Using Modified Quasi Stati...
P-Wave Related Disease Detection Using DWT
Design and Synthesis of Multiplexer based Universal Shift Register using Reve...
Design and Implementation of a Low Noise Amplifier for Ultra Wideband Applica...
ElectroencephalographySignalClassification based on Sub-Band Common Spatial P...
IC Layout Design of 4-bit Magnitude Comparator using Electric VLSI Design System
Design & Implementation of Subthreshold Memory Cell design based on the prima...
Retinal Macular Edema Detection Using Optical Coherence Tomography Images
Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...
Performance Analysis of the Sigmoid and Fibonacci Activation Functions in NGA...

Recently uploaded (20)

PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
additive manufacturing of ss316l using mig welding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
composite construction of structures.pdf
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
DOCX
573137875-Attendance-Management-System-original
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
OOP with Java - Java Introduction (Basics)
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
additive manufacturing of ss316l using mig welding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
composite construction of structures.pdf
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Lecture Notes Electrical Wiring System Components
Model Code of Practice - Construction Work - 21102022 .pdf
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Operating System & Kernel Study Guide-1 - converted.pdf
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
573137875-Attendance-Management-System-original
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
UNIT 4 Total Quality Management .pptx
Lesson 3_Tessellation.pptx finite Mathematics
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Structs to JSON How Go Powers REST APIs.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
bas. eng. economics group 4 presentation 1.pptx
OOP with Java - Java Introduction (Basics)

Optimized linear spatial filters implemented in FPGA

  • 1. IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue 1, Ver. I (Jan. - Feb. 2017), PP 01-07 e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197 www.iosrjournals.org DOI: 10.9790/4200-0701010107 www.iosrjournals.org 1 | Page Optimized linear spatial filters implemented in FPGA Ivan Kanev, Petya Pavlova (Department CST, Technical University Sofia - Plovdiv branch, Bulgaria) Abstract: Linear spatial filters (LSF) are used for filtering of digital images with the purpose of blurring, noise reduction, detail enhancement etc. The realization of LSF confronts the capital problem of a lot of operations needed for their computation. In this paper, described is an approach for optimizing of LSF by utilizing parallel algorithms and their hardware implementation on FPGA. A model and an algorithm based on partial sums and aimed at calculating the filtered pixels are presented. Defined are criteria for comparing of the different types of linear filters. A schematic diagram of an FPGA-based DSP operational block is shown. VHDL is utilized for the hardware design. Conducted are studies focused on comparing the partial sums and the non-partial sums based methods of filtering. It is ascertained that the methods employing partial sums reduce the number of operations to the size of the window (3, 5, 7,...) . These FPGA-based LSF are suitable for applications using threshold detecting, edge detection or image detail enhancement. Keywords: DSP, FPGA, Linear Spatial Filtering, Partial Sum, VHDL I. Introduction The concept of filtering is related to the use of Fourier transform for signal processing in the frequency domain. The digital filtering of an image uses procedures which are applied directly on the pixels of the image. In this case, the term spatial filtering is employed to distinguish these processes from filtering in the frequency domain. 
In spatial filtering the value of the pixel being filtered is calculated by the cross-correlation or convolution operators applied to the pixel neighbourhood [1]. Smoothing spatial filters are used for blurring and noise reduction in an image. Blurring is used in image preprocessing, for example to remove small details from the image or to bridge small gaps in lines or curves (Fig. 1). Noise reduction is used to improve image quality. Spatial filters are classified as linear and nonlinear. If the operations performed on the neighbourhood pixels are linear, the filter is linear; otherwise it is nonlinear [2]. This article deals only with linear spatial filters (LSF), because they are the basis for important applications in digital image filtering such as threshold detection, edge detection, etc.

Fig. 1. Images filtered with LSF: (a) an image from the Hubble Space Telescope; (b) the input image filtered with a mask of dimensions 15 × 15; (c) result of thresholding [1].

A major problem in the implementation of LSF is that calculating the filtered pixels requires many summation, multiplication and division operations. The number of these operations increases in proportion to the image size and reaches hundreds of millions. On that basis, LSF are difficult to apply on general-purpose microprocessors (GPP) or DSP processors [3]. The limitations inherent in GPP and DSP processors can be overcome by implementing LSF on a single chip (System on a Chip (SoC)) based on Field Programmable Gate Arrays (FPGAs) [4]. An FPGA is capable of parallel processing and is therefore a good platform for image processing [5]. Bailey [6] analyzes various pipeline and parallel processing schemes for LSF implemented on FPGA. The development of such schemes can proceed in two directions: new algorithms for calculating the pixel being filtered, and reducing the number of operating units.
Vasilev et al. [7] propose an algorithm that reduces the calculation time by using partial sums. Khan [8] and Malik [9] recommend building FPGA-based LSF from DSP blocks that use multiply-accumulate (MAC) operations. To reduce the number of registers used to store the weight coefficients, a ROM-based finite-state machine (FSM) can be employed [10].
This paper discusses an approach for optimizing LSF using parallel algorithms based on partial sums, together with their hardware implementation on FPGA. The paper is organized as follows. Section II presents a parallel algorithm and a DSP operational block. The results are analyzed in Section III. Section IV concludes the paper.

II. Parallel Algorithm And DSP Operational Block For Calculating LSF Using Partial Sums
The filtering of a pixel $p(x,y)$ in an image of size $M \times N$ results in a new pixel $g(x,y)$ whose coordinates equal those of the center of the neighbourhood and whose value is the result of the filtering operation [1]. The effect of filtering is determined by the mask of weight coefficients $w(a,b)$. The cross-correlation operator $G(x,y)$ used to calculate the filtered pixel with a linear spatial filter, defined in a window of size $S \times S$, is given by the following relation:

$$G(x,y) = \sum_{a=-k}^{k} \sum_{b=-k}^{k} w(a,b)\, p(x+a,\, y+b), \tag{1}$$

where

$$x = 0, 1, 2, \dots, M-1; \quad y = 0, 1, 2, \dots, N-1; \tag{2}$$

$$k = \frac{S-1}{2}; \tag{3}$$

$$S \text{ is an odd number, and } S \ge 3. \tag{4}$$

The normalized value of the filtered pixel $g(x,y)$ can be calculated by two methods.

Method #1: dividing $G(x,y)$ by the normalizing factor $NF(a,b)$,

$$g(x,y) = G(x,y) / NF(a,b), \tag{5}$$

where

$$NF(a,b) = \sum_{a=-k}^{k} \sum_{b=-k}^{k} w(a,b). \tag{6}$$

Method #2: multiplying $G(x,y)$ by the reciprocal of the normalizing factor, $NF^{-1}(a,b)$,

$$g(x,y) = G(x,y) \cdot NF^{-1}(a,b). \tag{7}$$

Although both methods give the same results, their implementation in FPGA-based LSF differs substantially. With Method #1 the normalization is implemented by a divider synthesized from the FPGA logic elements; with Method #2 it is implemented by the FPGA built-in multipliers. In this regard, Method #2 is preferable because it does not use logic elements and the normalization is performed more quickly [11].

1. Algorithm for parallel computation of the filtered pixels by using partial sums
The algorithm for the parallel calculation of $g(x,y)$ using partial sums is based on the fact that several consecutive windows include pixels with common local coordinates. This can be used as a basis for algorithms in which, in parallel with the calculation of the current values of $G(x,y)$ and $g(x,y)$, partial sums $PS1(x,y)$, $PS2(x,y)$, ... are calculated from pixels and weight coefficients that will be needed in subsequent iterations.

Fig. 2. Conceptual model of a method for parallel calculation of LSF using partial sums: Step #1; x, y = 1
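As a software reference for Eqs. (1) and (5)-(7), the following minimal Python sketch evaluates the cross-correlation directly and normalizes it by both methods. The function names, the `p[x][y]` indexing, the test image and the decision to skip pixels whose window does not fit inside the image are assumptions made for illustration; the 3 × 3 kernel with NF = 16 is the one listed in Table I.

```python
# Illustrative sketch only: names and border handling are assumptions.
GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]  # 3 x 3 kernel with NF = 16


def cross_correlate(p, w):
    """Eq. (1): G(x, y) for every pixel whose S x S window fits in the image.

    p is indexed as p[x][y]; w is indexed as w[a + k][b + k].
    """
    S = len(w)
    k = (S - 1) // 2                      # Eq. (3)
    M, N = len(p), len(p[0])
    G = {}
    for x in range(k, M - k):
        for y in range(k, N - k):
            G[(x, y)] = sum(w[a + k][b + k] * p[x + a][y + b]
                            for a in range(-k, k + 1)
                            for b in range(-k, k + 1))
    return G


def normalise(G, w):
    """Return g(x, y) by Method #1 (division) and Method #2 (reciprocal)."""
    nf = sum(sum(row) for row in w)       # Eq. (6)
    inv_nf = 1.0 / nf
    by_division = {xy: value / nf for xy, value in G.items()}        # Eq. (5)
    by_reciprocal = {xy: value * inv_nf for xy, value in G.items()}  # Eq. (7)
    return by_division, by_reciprocal


image = [[0, 0, 0, 0],
         [0, 16, 32, 0],
         [0, 0, 0, 0]]
G = cross_correlate(image, GAUSS_3X3)
g_div, g_rec = normalise(G, GAUSS_3X3)
print(G)       # {(1, 1): 128, (1, 2): 160}
print(g_rec)   # {(1, 1): 8.0, (1, 2): 10.0}
```

In software the two normalization methods differ only by floating-point rounding (here not at all, since 1/16 is exact in binary); the preference for Method #2 stated above concerns FPGA resources, not numerical results.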
Fig. 2 shows a conceptual model of a method for parallel calculation of the LSF using partial sums. In the proposed model it is assumed that the pixels of the input image $p(x,y)$ arrive in succession at the input of the filter, ordered by the columns of the window in which the filtering is carried out:

p(x,y) = p(0,0), p(0,1), .., p(3,0), p(3,1), p(3,2), p(4,0), p(4,1), p(4,2), ..

The pixels of the input image are multiplied by the weight coefficients of the Gaussian kernels PS weights, PS1 weights and PS2 weights, and summed in three accumulators (Σ). In a window of size $S = 3$ (4), four parallel processes are realized in three successive steps, through which $G(x,y)$, $g(x,y)$, $PS1(x,y)$ and $PS2(x,y)$ are calculated. The final values of the results are stored in the output registers Gq, PS1q and PS2q of the accumulators. The normalized value of the filtered pixel $g(x,y)$ is rounded using the factor $r$, which takes values between 0 and 1, to improve the accuracy of the calculations.

Start of the algorithm: $y = 0$.

Step #1. This step is performed for $x = 1$. Three parallel processes are realized, in which $G(x,y)$, the final value of $PS1(x,y)$ and the initial value of $PS2(x,y)$ are calculated, and the registers Gq, PS1q and PS2q are set to their current values. The calculations require $S^2$ operations.

$x = 1;\ y = y + 1$, moving the window one position to the right.

Process #1:
$$G(x,y) = \sum_{a=-1}^{1} \sum_{b=-1}^{1} w(a+1,\, b+1)\, p(x+a,\, y+b); \tag{8}$$
$$Gq = G(x,y). \tag{9}$$

Process #2:
$$PS1(x,y) = \sum_{a=-1}^{0} \sum_{b=-1}^{1} w(a+1,\, b+1)\, p(x+a+1,\, y+b); \tag{10}$$
$$PS1q = PS1(x,y). \tag{11}$$

Process #3:
$$PS2(x,y) = \sum_{b=-1}^{1} w(0,\, b+1)\, p(x+1,\, y+b). \tag{12}$$

Step #2. This step is performed after Step #1, as the window advances along $x$. Four parallel processes are implemented to calculate the current values of $g(x,y)$ and $G(x,y)$, the initial value of $PS1(x,y)$ and the final value of $PS2(x,y)$. The calculations require $S$ operations.
Process #4:
$$g(x,y) = Gq \cdot NF^{-1} + r. \tag{13}$$

$x = x + 1$, moving the window one position to the right.

Process #1:
$$G(x,y) = PS1q + \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b); \tag{14}$$
$$Gq = G(x,y). \tag{15}$$

Process #2:
$$PS1(x,y) = \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b). \tag{16}$$

Process #3:
$$PS2(x,y) = PS2(x-1,\, y) + \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b); \tag{17}$$
$$PS2q = PS2(x,y). \tag{18}$$

The weight values applied by each accumulator in each step are listed in Table I.

Step #3. This step is performed after Step #2, as the window advances along $x$. Four parallel processes are implemented, through which the current values of $g(x,y)$ and $G(x,y)$, the final value of $PS1(x,y)$ and the initial value of $PS2(x,y)$ are calculated. The calculations require $S$ operations.

Process #4:
$$g(x,y) = Gq \cdot NF^{-1} + r. \tag{19}$$

$x = x + 1$, moving the window one position to the right.

Process #1:
$$G(x,y) = PS2q + \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b); \tag{20}$$
$$Gq = G(x,y). \tag{21}$$
Process #2:
$$PS1(x,y) = PS1(x-1,\, y) + \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b); \tag{22}$$
$$PS1q = PS1(x,y). \tag{23}$$

Process #3:
$$PS2(x,y) = \sum_{b=-1}^{1} w(2,\, b+1)\, p(x+1,\, y+b). \tag{24}$$

Operations for managing the cycles:
if $x < M - 2$ then Step #2
else if $y < N - 1$ then Step #1
else end of the algorithm

The timing of the steps and processes during the execution of the algorithm is shown in Fig. 3.

Fig. 3. Generalized timing model

The effect of applying the algorithm with partial sums can be assessed by comparing it with the algorithm without partial sums. For a given $k$, let $W_n$ denote the number of windows required to filter an input image of size $M \times N$, and let $S$ be the window size, so that a single window costs $S^2$ operations without partial sums and $S$ operations with them. Then

$$W_n = (M - k)(N - k). \tag{25}$$

The number of operations in the algorithm without partial sums, $O_C$ (8), and in the algorithm using partial sums, $O_P$ (8), (14), (20), is defined by the following expressions:

$$O_C = S^2 \cdot W_n + 1; \tag{26}$$
$$O_P = S^2 (N - k) + S \cdot W_n. \tag{27}$$

The relation

$$F_R = O_C / O_P \tag{28}$$

can then be used as a criterion for the maximum factor by which the algorithms using partial sums reduce the number of operations compared with the algorithms without partial sums. The real effect of the reduction can be determined from the relation $T_R$:

$$T_R = T_C / T_P, \tag{29}$$

where

$$T_C = tclk1 \cdot O_C; \tag{30}$$
$$T_P = tclk2 \cdot O_P; \tag{31}$$

$$tclk1 \tag{32}$$

is the period of clk (Fig. 3) in the algorithms without partial sums, and

$$tclk2 \tag{33}$$

is the period of clk in the algorithms using partial sums.

In contrast to $F_R$, the relation $T_R$ takes into account the clk period for which a positive Slack is obtained. This period is determined by timing analysis of the hardware realization of the filters.
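The three steps above can be sketched in software for one row of windows with $S = 3$. This is a minimal Python model under stated assumptions, not the authors' VHDL design: the function names are hypothetical, `cols[x]` holds the three pixels of column $x$ for rows $y-1, y, y+1$, `w[j]` is weight column $j$ of a single kernel, and the weight column used for each partial sum (column 0 for an initial value, column 1 for a final value) follows one reading of Table I. Normalization (Process #4) is left out.

```python
def dot(weights, column):
    """One S-term multiply-accumulate: sum over b of w(j, b) * p(x, y + b)."""
    return sum(wi * pi for wi, pi in zip(weights, column))


def direct_row(cols, w):
    """Reference: full 3 x 3 window sum for x = 1 .. len(cols) - 2."""
    return [sum(dot(w[j], cols[x - 1 + j]) for j in range(3))
            for x in range(1, len(cols) - 1)]


def filter_row_partial_sums(cols, w):
    """G(x, y) for x = 1 .. len(cols) - 2 using partial sums PS1 and PS2.

    Assumes len(cols) >= 3 and a 3 x 3 kernel (k = 1).
    """
    # Step #1: full window at x = 1 (S*S operations) plus two partial sums.
    G = [dot(w[0], cols[0]) + dot(w[1], cols[1]) + dot(w[2], cols[2])]
    ps1 = dot(w[0], cols[1]) + dot(w[1], cols[2])  # final value, window x = 2
    ps2 = dot(w[0], cols[2])                       # initial value, window x = 3
    step2 = True
    for x in range(2, len(cols) - 1):
        new = cols[x + 1]                          # newly arrived column p(x + 1, .)
        if step2:                                  # Step #2
            G.append(ps1 + dot(w[2], new))         # S operations per accumulator
            ps1 = dot(w[0], new)                   # initial value of PS1
            ps2 = ps2 + dot(w[1], new)             # final value of PS2
        else:                                      # Step #3
            G.append(ps2 + dot(w[2], new))
            ps1 = ps1 + dot(w[1], new)             # final value of PS1
            ps2 = dot(w[0], new)                   # initial value of PS2
        step2 = not step2
    return G


# An asymmetric kernel makes a wrong weight column show up as a mismatch.
W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cols = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 2], [3, 1, 4], [0, 5, 2], [6, 6, 1]]
print(filter_row_partial_sums(cols, W) == direct_row(cols, W))  # True
```

In the sketch the three `dot` calls of a step run one after another; in the hardware described here they correspond to accumulators working in parallel, which is why each window after the first costs $S$ rather than $S^2$ operations.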
2. FPGA-based DSP operational block
To realize the algorithm, an FPGA-based DSP operational block is developed (Fig. 4). It employs a combinational scheme in which parallel processes are combined with pipelines.

Fig. 4. Structural diagram of the FPGA-based DSP operational block: k = 1.

The DSP operational block consists of four sub-blocks:
1. WAVR DSP BLOCK. This block calculates the normalized weighted value of the filtered pixel $g(x,y)$ (13), (19). A four-stage pipeline is used to realize the operations in this block.
2. PS1 (2) MAC DSP BLOCK. When k = 1, two identical multiply-accumulate (MAC) DSP blocks are used to calculate the partial sums PS1(x,y) (10), (16), (22) and PS2(x,y) (12), (17), (24). The output registers PS1q and PS2q are set to the current values of PS1(x,y) and PS2(x,y) in accordance with equations (11), (23) and (18). A two-stage pipeline is used to realize the operations in these blocks.
3. CONTROL BLOCK. A finite state machine synthesized in the built-in FPGA ROM memory (ROM FSM) manages and synchronizes the processes in the DSP block. This FSM generates three groups of signals:
3.1. Weight coefficients (Table I).
3.2. Control signals. The signals WAVR Control, PS1 Control and PS2 Control are used to manage the registers and multiplexers in the individual DSP blocks.
3.3. Next Address. These signals determine the next state of the ROM FSM.

Table I. Weight coefficients: k = 1; NF = 16
Weights        Step #1                  Step #2   Step #3
PS weights     1 2 1 / 2 4 2 / 1 2 1    1 2 1     1 2 1
PS1 weights    NA / 1 2 1 / 2 4 2       1 2 1     2 4 2
PS2 weights    NA / 1 2 1               2 4 2     1 2 1

The interface of the DSP operational block is realized with the following input-output ports:
- Start filtering (SF): input port. When this signal is set high, the LSF filtering operations start.
- Input pixel p(x,y): input port. Pixels of the input image are set consecutively at this port, ordered by the columns of the window in which the filtering is carried out.
- Reciprocal of the normalization factor $NF^{-1}$: input port. The constant $NF^{-1}$ (6), (7) is set at this port before SF is activated and remains unchanged until all the image pixels are filtered.
- g(x,y): output port. The current value of the filtered pixel is set at this port (7).

The FPGA-based DSP operational block for parallel calculation of LSF using partial sums (Fig. 4) can be extended to other values of $k$ (3) by instantiating a new pair of PS MAC DSP blocks in the project for each new value of $k$.

III. Result And Discussion
For the purposes of this study, VHDL is used for the hardware design. The low-cost, low-resource FPGA family Cyclone V of Altera is employed. Table II shows the resources utilized in the FPGA-based hardware implementation of an LSF.

Table II. Resource utilization summary (Altera Cyclone VE 5CEBA4F17C6 Device)
Resource utilization           Available   Without partial sum      With partial sum
                                           Used    Utilization      Used    Utilization
Logic utilization (in ALMs)    18480       14      0.0756 %         36      0.1948 %
Total registers                -           41      -                86      -
Total block memory bits        3153920     1792    0.0568 %         1984    0.0629 %
Total (Mult) DSP Blocks        66          2       3 %              4       6 %

The utilized resources with and without partial sums are compared. The relatively small share of adaptive logic modules (ALMs) used is due to the use of the built-in FPGA DSP multiplier blocks and to combining, in a ROM-based FSM, the signals for managing the operational block with the weight coefficients. A static timing analysis between the related registers of the DSP operational block was carried out. The minimum values of the Slack calculated in the different processes are shown in Table III.

Table III. Registers Path minimum slack summary
Process #   Registers Path       Clock Delay1   Clock Delay2   Data Delay   Slack
1           Gxy_Data to Gq       3.264          2.962          3.807        1.888
2           PS1weight to PS1q    3.262          2.652          4.627        0.81
3           PS2weight to PS2q    3.252          2.664          4.529        0.93
4           Gq to g              3.263          3.211          6.602        10.85

It is ascertained that the Slack reaches critical values on the paths from the registers PS1 (2) weight to PS1 (2) q (Fig. 5). The Slack can be improved by increasing the number of pipeline stages with which the PS1 (2) MAC DSP blocks are realized.

Fig. 5. Timing analysis of the register-to-register path: PS1weight to PS1q.

Table IV shows the dependence of $O_C$ (26), $O_P$ (27), $T_C$ (30), $T_P$ (31), $F_R$ (28) and $T_R$ (29) on the size of the image, M × N.

Table IV. Dependence of O_C, O_P, T_C, T_P, F_R and T_R on M × N
M × N       640×480     1024×768    1600×1200
O_C         2754730     7061770     16103529
O_P         922554      2360826     5378634
T_C [ms]    13.7736     35.3088     80.5176
T_P [ms]    5.5353      14.1650     32.2718
F_R         2.9860      2.9912      2.9940
T_R         2.4883      2.4927      2.4950
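The operation counts behind Table IV can be checked with a short calculation. The Python sketch below assumes that Eqs. (25)-(27) read $W_n = (M-k)(N-k)$, $O_C = S^2 W_n + 1$ and $O_P = S^2 (N-k) + S\,W_n$, a reading that reproduces the 640 × 480 and 1024 × 768 columns of the table, and uses the clock periods tclk1 = 5 ns and tclk2 = 6 ns of the study. The function names are hypothetical.

```python
def operation_counts(M, N, k):
    """Return (O_C, O_P) under the assumed reading of Eqs. (25)-(27)."""
    S = 2 * k + 1                      # window size, from Eq. (3)
    Wn = (M - k) * (N - k)             # number of windows, Eq. (25)
    O_C = S * S * Wn + 1               # without partial sums, Eq. (26)
    O_P = S * S * (N - k) + S * Wn     # with partial sums, Eq. (27)
    return O_C, O_P


def ratios(M, N, k, tclk1_ns=5, tclk2_ns=6):
    """Return (F_R, T_R) from Eqs. (28)-(31); clock periods in nanoseconds."""
    O_C, O_P = operation_counts(M, N, k)
    T_C = tclk1_ns * O_C               # Eq. (30), in ns
    T_P = tclk2_ns * O_P               # Eq. (31), in ns
    return O_C / O_P, T_C / T_P


print(operation_counts(640, 480, 1))                 # (2754730, 922554)
print([round(v, 4) for v in ratios(640, 480, 1)])    # [2.986, 2.4883]
print([round(ratios(640, 480, k)[0], 2) for k in range(1, 6)])
# [2.99, 4.96, 6.92, 8.87, 10.81]
```

Because $O_P$ is dominated by the term $S\,W_n$ for large images, $F_R$ stays just below $S = 2k + 1$, and with the clock periods above $T_R$ is $5/6$ of $F_R$.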
The studies are carried out for $k = 1$ (3), tclk1 = 5 ns (32) and tclk2 = 6 ns (33). Although for the selected FPGA family a positive Slack is reached only for tclk2 > tclk1, the relation $T_R$ reaches up to 83% of $F_R$. The studies show that the value of $F_R$ increases with $k$ and approaches $2k + 1$. The upward trend of $F_R$ is shown in Fig. 6.

Fig. 6. Dependence of $F_R$ on the window parameter $k$: M × N = 640 × 480 (plotted values: $F_R$ = 2.99, 4.96, 6.92, 8.87, 10.81 for $k$ = 1, 2, 3, 4, 5).

Increasing the size of the window increases the number of parallel processes, and this results in a higher $F_R$.

IV. Conclusion
This paper discusses an approach for optimizing LSF using parallel algorithms based on partial sums and their hardware implementation in FPGA. An FPGA-based DSP operational block for calculating the LSF is presented. The block can be extended by instantiating new MAC DSP blocks within the module. The FPGA resources utilized with and without partial sums are compared. It is ascertained that the hardware implementation of the algorithms based on partial sums requires a minimum number of resources. A static timing analysis among the registers of the DSP operational block is carried out, and the critical values of the Slack are ascertained. The reduction in the number of operations in the hardware implementation of the algorithm using partial sums is studied.

References
[1] Gonzalez R., Woods R., Digital Image Processing, third edition, Prentice Hall, 2012.
[2] Uwe Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays (FPGA), 4th Edition, Springer, 2014.
[3] Jackson L., Digital Filters and Signal Processing, with MATLAB Exercises, 3rd Edition, Springer, 2010.
[4] Salunkhe A., Bombale U., Optimized Implementation of Edge Preserving Color Guided Filter for Video on FPGA, IOSR Journal of VLSI and Signal Processing, e-ISSN: 2319 – 4200, Volume 5, Issue 6, Ver. I (Nov. - Dec. 2015), PP 27-33.
[5] Bailey D., Design for Embedded Image Processing on FPGAs, First Edition, John Wiley & Sons (Asia) Pte Ltd., 2011.
[6] Dimitrios B. et al., An FPGA-based hardware implementation of configurable pixel-level color image fusion, IEEE Trans. Geosci. Remote Sens., 50(2), 2012, 362–373.
[7] Vasilev N., Bosakova-Ardenska A., A Parallel Conveyer Algorithm for the Recursive Method of Scanning Mask for Primary Images Processing, CompSysTech’2006, Veliko Tarnovo, Bulgaria, ISBN-13: 978-954-9641-46-2, pp. IIIA.19-1 – IIIA.19-7, 2006.
[8] Khan S., Digital Design of Signal Processing Systems: A Practical Approach, First Edition, John Wiley & Sons, Ltd., 2011.
[9] Malik S. et al., Implementation of MAC unit using booth multiplier & ripple carry adder, International Journal of Applied Engineering Research, Vol. 7, No. 11, 2012.
[10] M. Rawski, H. Selvaraj, and T. Luba, An application of functional decomposition in ROM-based FSM implementation in FPGA devices, Journal of Systems Architecture, vol. 51, p. 424, 2005.
[11] Kanev I., Comparison of Two Methods for Computing FPGA-based Weighted Average Linear Spatial Filters, CompSysTech’15, ACM International Conference Proceeding Series, Vol. 1008, ACM Inc., N.Y., USA, pages 276-283, 2015.