Design of Processing Element (PE3) for Implementing Pipeline FFT Processor

International Journal on Cybernetics & Informatics (IJCI) Vol. 5, No. 4, August 2016
DOI: 10.5121/ijci.2016.5435
DESIGN OF PROCESSING ELEMENT (PE3) FOR
IMPLEMENTING PIPELINE FFT PROCESSOR
Mary Roseline Thota, Mounika Dandamudi and R. Ramana Reddy
Department of ECE, MVGR College of Engineering (A), Vizianagaram.
ABSTRACT
Multiplexing is a method by which multiple analog message signals or digital data streams are combined
into one signal over a shared medium. Communication systems use several multiplexing schemes. To
achieve higher data rates, Orthogonal Frequency Division Multiplexing (OFDM) is used because of its high
spectral efficiency. OFDM has become a serious alternative for modern digital signal processing methods
based on the Fast Fourier Transform (FFT). In communication applications, the problems with orthogonal
subcarriers can be addressed with the FFT. An 8-bit processing element (PE3), used in the implementation
of a pipeline FFT processor, is designed and presented in this paper. Simulations are carried out using
Mentor Graphics tools in 130 nm technology.
KEYWORDS:
Multiplexing, OFDM, FFT processor, Mentor Graphics tools.
1. INTRODUCTION
In discrete signal processing and telecommunications, the Discrete Fourier Transform (DFT) is
essential. Cooley and Tukey [1] proposed the FFT to overcome its intensive computation, efficiently
reducing the time complexity from O(N^2) to O(N log2 N), where N denotes the FFT size. The FFT
has applications in OFDM-based systems such as WiMAX, LTE, DSL, and DAB/DVB. FFT processors
developed for hardware implementation are classified as memory-based and pipeline-based
architectures [2-4]. The memory-based architecture (single Processing Element (PE) approach)
consists of a principal processing element and multiple memory units. It consumes less power and
needs less hardware than the pipeline architecture, but it has disadvantages such as low
throughput, long latency, and the inability to be parallelized. The pipeline architecture overcomes
these disadvantages of the memory-based style with an acceptable hardware overhead.
Single-path Delay Feedback (SDF) and Multiple-path Delay Commutator (MDC) pipelines are the two
widely used design styles in pipeline FFT processors. The SDF pipeline FFT [2-5] requires less
memory, is easy to design, utilizes less than 50% of the multiplication computation, and its
control unit is used in portable devices. In view of these advantages, the radix-2 SDF pipeline
architecture is considered for implementing the FFT processor. Three processing elements are used
in the architecture of the proposed FFT processor [1]. In this paper, the design of the 8-bit
processing element (PE3) is implemented.
2. FFT ALGORITHM
The DFT $X_k$ of an N-point discrete-time signal $x_n$ is defined by:
$$X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}, \quad 0 \le k \le N-1 \tag{1}$$
where $W_N^{nk} = e^{-j 2\pi nk/N}$ is the twiddle factor.
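As a reference point for equation (1), the following is a minimal Python sketch of the direct DFT. The function names `twiddle` and `dft` are illustrative choices and do not come from the paper; the sketch only shows why the direct form costs on the order of N^2 complex multiplications.

```python
import cmath

def twiddle(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Direct N-point DFT following equation (1); needs N*N complex multiplications."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]

# A unit impulse has a flat spectrum: every X_k is 1.
X = dft([1, 0, 0, 0, 0, 0, 0, 0])
```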
The direct implementation of the DFT is difficult to realize because it requires more hardware.
The FFT was therefore developed to reduce the hardware cost and speed up the computation.
Using Decimation-in-Time (DIT) or Decimation-in-Frequency (DIF) decomposition, the FFT
decomposes an input signal sequence to construct a Signal-Flow Graph (SFG) that can be computed
efficiently. DIF decomposition is employed here, as it matches the operation of the SDF pipeline
architecture. A radix-2 DIF FFT SFG for N=8 is presented in Figure 1.
Figure 1. Radix-2 Decimation-In-Frequency Fast Fourier Transform Signal Flow Graph for N=8.
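The DIF decomposition behind the signal-flow graph can be sketched in a few lines of Python. This is a recursive software model under the assumption that the input length is a power of two. It is not the pipelined hardware described in this paper, and the name `fft_dif` is illustrative.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Butterfly: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    upper = [x[n] + x[n + half] for n in range(half)]
    lower = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
             for n in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(upper)
    out[1::2] = fft_dif(lower)
    return out
```

For N=8 the recursion depth is three, matching the three butterfly stages of the signal-flow graph.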
FFT computation uses a complex multiplication scheme [6-11]. As a result, the hardware cost
increases because of the ROM and the complex multipliers.
The DIF FFT is suitable for hardware implementation as it has a regular SFG and requires fewer
complex multipliers, resulting in a smaller chip area. For example, an input signal multiplied
by $W_8^1$ in Figure 1 can be expressed as:
$$(x + jy)\,W_8^1 = \frac{\sqrt{2}}{2}\left[(x + y) + j(y - x)\right] \tag{2}$$
where (x + jy) denotes a complex discrete-time signal.
Similarly, the complex multiplication by $W_8^3$ is given by
$$(x + jy)\,W_8^3 = \frac{\sqrt{2}}{2}\left[(y - x) - j(x + y)\right] \tag{3}$$
Equations (2) and (3) both ease hardware implementation, since each needs only additions,
subtractions and one scaling by a constant.
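Equations (2) and (3) can be checked numerically with a short Python sketch. The function names are illustrative. Each function uses only an addition, a subtraction and a scaling by sqrt(2)/2 on the real and imaginary parts, which is the property the text relies on.

```python
import cmath
import math

SQRT2_OVER_2 = math.sqrt(2) / 2

def mul_w8_1(x, y):
    """(x + jy) * W_8^1 per equation (2); returns (real, imag)."""
    return SQRT2_OVER_2 * (x + y), SQRT2_OVER_2 * (y - x)

def mul_w8_3(x, y):
    """(x + jy) * W_8^3 per equation (3); returns (real, imag)."""
    return SQRT2_OVER_2 * (y - x), -SQRT2_OVER_2 * (x + y)

def reference(x, y, k):
    """Full complex multiplication by W_8^k, used only for comparison."""
    return complex(x, y) * cmath.exp(-2j * cmath.pi * k / 8)
```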
From the symmetry property of the twiddle factors, the complex multiplications can be one of the
following three operation types:
Type 1:
$$W_N^k (x + jy) = W_N^{k - N/4}\,(y - jx), \quad \frac{N}{4} < k < \frac{N}{2} \tag{4}$$
Type 2:
$$W_N^k (x + jy) = -W_N^{k - N/2}\,(x + jy), \quad \frac{N}{2} < k < \frac{3N}{4} \tag{5}$$
Type 3:
$$W_N^k (x + jy) = -W_N^{k - 3N/4}\,(y - jx), \quad \frac{3N}{4} < k < N \tag{6}$$
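The three operation types in equations (4) to (6) can be sketched as a Python helper that folds any twiddle exponent into the first quarter of the unit circle. The name `reduce_twiddle` is hypothetical. The sketch also applies the identities at the boundary values of k, where they still hold even though the ranges in the equations are strict, and it assumes N is a multiple of 4.

```python
import cmath

def twiddle(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def reduce_twiddle(k, N, x, y):
    """Return (r, xr, yr) with W_N^k (x + jy) == W_N^r (xr + j*yr), 0 <= r < N/4."""
    quadrant, r = divmod(k % N, N // 4)
    if quadrant == 0:
        return r, x, y        # exponent already in the first quarter
    if quadrant == 1:
        return r, y, -x       # type 1, equation (4): input becomes (y - jx)
    if quadrant == 2:
        return r, -x, -y      # type 2, equation (5): input becomes -(x + jy)
    return r, -y, x           # type 3, equation (6): input becomes -(y - jx)
```

Only swaps and sign changes are applied to the input, so a table of the first N/4 twiddle factors is enough in this model.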
Any twiddle factor can be obtained by combining the twiddle-factor primary elements (equations
(4-6)). The three operation types are used to find the twiddle factor required to reduce the size of
the ROM. Additional operation types are given below:
Type 4:
$$W_N^k (x + jy) = \left[ W_N^{N/4 - k}\,(y + jx) \right]^{*}, \quad 1 \le k < \frac{N}{4} \tag{7}$$
Type 5:
$$W_N^k (x + jy) = -j \left[ W_N^{N/2 - k}\,(y + jx) \right]^{*}, \quad \frac{N}{4} < k < \frac{N}{2} \tag{8}$$
where * indicates the complex conjugate. Using the five operation types reduces the complex
multiplications after the third butterfly stage, so the twiddle-factor ROM table can be shrunk
significantly.
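The conjugate-based operation types can be checked numerically in the same way. The sketch below assumes the forms of equations (7) and (8) reconstructed from the symmetry of the twiddle factors, with type 4 using the exponent N/4 - k and type 5 using N/2 - k. The function names are illustrative.

```python
import cmath

def twiddle(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def type4(k, N, x, y):
    """W_N^k (x + jy) computed as conj(W_N^(N/4 - k) * (y + jx))."""
    return (twiddle(N, N // 4 - k) * complex(y, x)).conjugate()

def type5(k, N, x, y):
    """W_N^k (x + jy) computed as -j * conj(W_N^(N/2 - k) * (y + jx))."""
    return -1j * (twiddle(N, N // 2 - k) * complex(y, x)).conjugate()
```

For N=16 and k=3, type 4 replaces the exponent 3 by 1, so the product can reuse a table entry from the first eighth of the circle.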
3. ARCHITECTURE OF FFT
A radix-2 8-point pipeline FFT processor is presented in Figure 2. The architecture of the pipeline FFT
processor contains three processing elements, namely PE3, PE2 and PE1, a complex constant multiplier and
delay-line buffers. To remove the twiddle-factor ROM, a reconfigurable complex constant multiplier is
used, which reduces the chip area and the power consumption of the FFT processor.

Figure 2. Radix-2 8-point pipeline FFT processor.
PROCESSING ELEMENTS
The three processing elements PE3, PE2, and PE1 of the radix-2 pipeline FFT processor are
presented in Figures 3 to 5, respectively. The processing elements process each stage of the
butterfly presented in Figure 1. The PE3 stage implements a simple radix-2 butterfly and
functions as the sub module for the PE2 and PE1 stages.
In Figure 3, Iin and Iout denote the real parts, and Qin and Qout the imaginary parts, of the
input and output data, respectively. Similarly, DL_Iin and DL_Iout stand for the real parts, and
DL_Qin and DL_Qout for the imaginary parts, of the input and output of the DL buffers,
respectively. The multiplication by -j or 1 is required for the PE2 stage. Multiplication by -1
in Figure 4 can be done in practice by taking the 2's complement of the input value.
Compared to the PE2 stage, calculations in the PE1 stage are more complex, as it computes the
multiplications by -j, $W_N^{N/8}$, and $W_N^{3N/8}$, respectively. Since
$W_N^{3N/8} = -j\,W_N^{N/8}$, either the multiplication by $W_N^{N/8}$ followed by
multiplication by -j, or the reverse of the previous calculation, can be done. The cascaded
calculations along with multiplexers are used in the PE1 stage calculations and form low-cost
hardware by saving a bit-parallel multiplier for computing $W_N^{3N/8}$.
Figure 3. Architecture of PE3.
Figure 4. Architecture of PE2.
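The trivial multiplications used by the PE2 and PE1 stages can be sketched in Python as follows. This is a behavioural illustration, not the authors' circuit. The function names and the default 8-bit word width are assumptions, and the cascade relies on the identity W_N^(3N/8) = -j * W_N^(N/8).

```python
import math

def negate(v, width=8):
    """Multiply by -1 via 2's complement: invert the bits and add one."""
    mask = (1 << width) - 1
    return (~v + 1) & mask

def mul_neg_j(x, y):
    """(x + jy) * (-j) = y - jx: swap the parts, negate the new imaginary part."""
    return y, -x

def mul_w_n8(x, y):
    """(x + jy) * W_N^(N/8), with W_N^(N/8) = (sqrt(2)/2) * (1 - j)."""
    s = math.sqrt(2) / 2
    return s * (x + y), s * (y - x)

def mul_w_3n8(x, y):
    """Cascade: W_N^(3N/8) = -j * W_N^(N/8), so no second multiplier is needed."""
    return mul_neg_j(*mul_w_n8(x, y))
```

In this model `mul_w_3n8` reuses `mul_w_n8` and adds only a swap and a sign change, which mirrors the cascaded arrangement described for PE1.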

Figure 5. Architecture of PE1.
4. PROCESSING ELEMENT (PE3)
PE3 is the main component in the FFT processor as it serves as the sub module for the PE2 and PE1
stages. It processes the stage P=3 of the radix-2 8-point DIF FFT butterfly structure in Figure 1.
The hardware implementation of PE3 employs a ten-transistor adder and a multiplexer. The 1-bit and
8-bit PE3 elements are presented in Figures 6 and 7, respectively.
Figure 6. Schematic of 1-bit PE3.
Figure 7. Schematic of 8-bit PE3.

5. RESULTS
The PE3 element is simulated with the ELDO software in Mentor Graphics. The simulated waveforms of
the 1-bit and 8-bit PE3 are shown in Figure 8 and Figures 9-10, respectively.
Figure 8. 1-bit PE3 simulated waveforms.
The PE3 element processes the stage P=3 of the radix-2 DIF FFT. It takes the input data (Iin) and
the delay output (DL_Iout) as inputs, and gives the output data (Iout) and the delay input to the
next buffer (DL_Iin), based on the selection line S0 of the multiplexer.
When S0 = 0: DL_Iin = Iin (9)
             Iout = DL_Iout (10)
When S0 = 1: DL_Iin = DL_Iout - Iin (11)
             Iout = DL_Iout + Iin (12)
From Figure 8,
When S0 = 0, the inputs are Iin = 1010 and DL_Iout = 0001; the outputs are DL_Iin = 1010 and Iout = 0001.
When S0 = 1, the inputs are Iin = 1000 and DL_Iout = 1011; the outputs are DL_Iin = 0011 and Iout = 0011.
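The behaviour in equations (9) to (12) can be modelled with a small Python sketch. This is a behavioural illustration only, not the transistor-level design. The function name `pe3`, the `width` parameter, and the assumption that results wrap around to the word width (the carry or borrow is dropped) are not stated in the paper. With 4-bit words the model reproduces the values quoted from Figure 8.

```python
def pe3(i_in, dl_iout, s0, width=4):
    """Behavioural model of PE3 (real part only). Returns (DL_Iin, Iout)."""
    mask = (1 << width) - 1          # assumed: results wrap to `width` bits
    if s0 == 0:
        # Equations (9) and (10): load the delay line, pass the delayed value on.
        return i_in & mask, dl_iout & mask
    # Equations (11) and (12): butterfly difference and sum.
    return (dl_iout - i_in) & mask, (dl_iout + i_in) & mask

# Values quoted from Figure 8.
bypass = pe3(0b1010, 0b0001, s0=0)     # (0b1010, 0b0001)
butterfly = pe3(0b1000, 0b1011, s0=1)  # (0b0011, 0b0011)
```

Reading the quoted waveform values as four successive 1-bit samples gives the same result, since a 1-bit sum and a 1-bit difference both reduce to an exclusive OR.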
Figure 9. Input waveforms of 8-bit PE3.

Figure 10. Output waveforms of 8-bit PE3.
The power dissipation (from EZwave) of the 1-bit PE3 is 0.5517 mW and that of the 8-bit PE3 is
0.9237 mW.

6. CONCLUSIONS
The pipelined FFT architecture contains three processing elements: PE1, PE2 and PE3. PE3 is the
most important element, as it serves as a sub module of the other two processing elements, PE2 and
PE1. PE3 (1-bit and 8-bit) is implemented using Mentor Graphics tools and the power dissipation
is observed. To implement the proposed pipelined FFT architecture, PE2 and PE1 are yet to be
designed.
REFERENCES
[1] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,”
Math. Comput., Vol. 19, pp. 297-301, Apr. 1965.
[2] S.-Y. Peng, K.-T. Shr, C.-M. Chen, and Y.-H. Huang, “Energy-efficient 128∼2048/1536-point
FFT processor with resource block mapping for 3GPP-LTE system,” in Proc. Int. Conf. Green
Circuits Syst., Jun. 2010.
[3] Nilesh Chide, Shreyas Deshmukh, and P. B. Borole, “Implementation of OFDM System using IFFT
and FFT,” International Journal of Engineering Research and Applications (IJERA), Vol. 3, Issue 1,
pp. 2009-2014, Jan.-Feb. 2013.
[4] Taewon Hwang, Chenyang Yang, Gang Wu, Shaoqian Li, and Geoffrey Ye Li, “OFDM and Its
Wireless Applications: A Survey,” IEEE Transactions on Vehicular Technology,
Vol. 58, no. 4, May 2009.
[5] Lokesh C. and Nataraj K., “Implementation of an OFDM FFT Kernel for WiMAX,” International
Journal of Computational Engineering Research, Vol. 2, Issue 8, Dec. 2012.
[6] Chua-Chin Wang, Jian-Ming Huang, and Hsian-Chang Cheng, “A 2K/8K mode small-area FFT
processor for OFDM demodulation of DVB-T receivers,” IEEE Transactions on Consumer
Electronics, Vol. 51, no. 1, pp. 28-32, Feb. 2005.
[7] C. P. Hung, S. G. Chen, and K. L. Chen, “Design of an efficient variable-length FFT processor,”
Proceedings of the 2004 International Symposium on Circuits and Systems, Vol. 2, pp. 23-26, May
2004.
[8] Koushik Maharatna, Eckhard Grass, and Ulrich Jagdhold, “A 64-Point Fourier transform chip for
high-speed wireless LAN application using OFDM,” IEEE Journal of Solid-State Circuits, Vol. 39,
no. 3, pp. 484-493, Mar. 2004.
[9] Yu-Wei Lin and Chen-Yi Lee, “Design of an FFT/IFFT Processor for MIMO OFDM Systems,” IEEE
Transactions on Circuits and Systems I, Vol. 54, no. 4, Apr. 2007.
[10] Hsii-Fu Lo, Ming-Der Shieh, and Chien-Ming Wu, “Design of an efficient FFT processor for DAB
system,” IEEE International Symposium on Circuits and Systems, Vol. 4, May 2001.
[11] P. Divakara Varma and R. Ramana Reddy, “A novel 1-bit full adder design using DCVSL XOR /
XNOR gate and Pass transistor Multiplexers,” International Journal of Innovative Technology and
Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol. 2, Issue 4, pp. 142-146, Mar. 2013.

AUTHORS
Mary Roseline Thota received the B.Tech. degree in ECE from GVP College of Engineering
for Women in 2014. Currently pursuing M.Tech (VLSI) at MVGR College of Engineering. Research
interests include VLSI design methodologies and low power VLSI design.
Mounika Dandamudi received the B.Tech. degree in ECE from Chirala Engineering College in
2014. Currently pursuing M.Tech (VLSI) at MVGR College of Engineering. Research interests
include VLSI design methodologies and low power VLSI design.
Dr. R. Ramana Reddy did AMIE in ECE from The Institution of Engineers (India) in 2000,
M.Tech (I&CS) from JNTU College of Engineering, Kakinada in 2002, MBA (HRM &
Marketing) from Andhra University in 2007 and Ph.D in Antennas in 2008 from Andhra
University. He is presently working as Professor & Head, Dept. of ECE in MVGR College
of Engineering, Vizianagaram. He is Coordinator, Center of Excellence – Embedded Systems, and
Head, National Instruments LabVIEW academy established in the Department of ECE, MVGR College
of Engineering. He has been convener of several national level conferences and workshops and has
published about 70 technical papers in National/International Journals / Conferences. He is a
member of IETE, IEEE, ISTE, SEMCE (I), IE, and ISOI. His research interests include Phased Array
Antennas, Slotted Waveguide Junctions, EMI/EMC, VLSI and Embedded Systems.
