

(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

Organized by

Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

# Low Power Consuming FFT Pipelined Processor

# A.Soundrakumar<sup>1</sup>

ME-VLSI Design, Shreenivasa Engineering College, B.Pallipatti, Dharmapuri, Tamilnadu, India<sup>1</sup>

**Abstract:** Mobile WiMAX (Worldwide Interoperability for Microwave Access) or 802.16e standard was ratified by the IEEE in late 2005 as a potential to emerge as a real viable competitor to existing 3G technologies. Mobile WiMAX uses an OFDMA<sup>TM</sup> technology called 1K-FFT. Orthogonal Frequency-Division Multiple Access (OFDMA) is a multi-user version of the popular Orthogonal frequency-division multiplexing (OFDM) digital modulation scheme. In the widely used OFDM systems, the FFT and IFFT pairs are used to modulate and demodulate the data constellation on the subcarriers. This paper presents a high level implementation of a high performance FFT for OFDM Modulator and Demodulator. The design has been coded in Verilog and targeted into Xilinx Spartan3 FPGAs. Radix-22 Algorithm is proposed and used for the OFDM communication system. This algorithm has the same multiplicative complexity as the radix-4 algorithm, but retains the butterfly structure of radix-2 algorithm.

Key Words: Radix 2<sup>2</sup> algorithm, Fast Fourier Transform, Orthogonal Frequency Division Multiplexing, Mobile WiMAX.

## I. INTRODUCTION

Discrete Fourier transform (DFT) is a very important technique in modern digital signal processing (DSP) and telecommunications, especially for applications in orthogonal frequency demodulation multiplexing (OFDM) systems, such as IEEE 802.11a/g, Worldwide Interoperability for Microwave Access (WiMax), Long Term Evolution (LTE), and Digital Video Broadcasting—Terrestrial (DVB-T).

For hardware implementation, various FFT/IFFT processors have been proposed. These implementations can be mainly classified into memory-based and pipeline architecture styles. Memory-based architecture is widely adopted to design an FFT/IFFT processor, also known as the single processing element (PE) approach. This deign style is usually composed of a main PE and several memory units, thus the hardware cost and the power consumption are both lower than the other architecture style. However, this kind of architecture style has long latency, low throughput, and cannot be parallelized.

On the other hand, the pipeline architecture style can get rid off the disadvantages of the foregoing style, at the cost of an acceptable hardware overhead. Generally, the pipeline FFT/IFFT processors have two popular design types. One uses single-path delay feedback (SDF) pipeline architecture and the other uses multiple-path delay commutator (MDC) pipeline architecture. The single-path delay feedback (SDF) pipeline FFT/IFFT is good in its requiring less memory space (about N-1 delay elements) and its multiplication computation utilization being less than 50%, as well as its control unit being easy to design. Such implementations are advantageous to low-power design, especially for applications in portable DSP devices. Based on these reasons, the SDF pipeline FFT/IFFT is adopted in our work.

However, the FFT/IFFT computation often needs to multiply input signals with different twiddle factors for an outcome, which results in higher hardware cost because a large size of ROM is needed to store the wanted twiddle factors. Therefore, to throw off these ROM's for area-efficient consideration.

The complex multipliers used in the processor are realized with shift-and-add operations. Hence, the processor uses only a two-input digital multiplier and does not need any ROM for internal storage of coefficients. However, low speed and higher hardware cost caused by the proposed complex multiplier are the pay-off.

The modulation and demodulation of OFDM based communication systems can be efficiently implemented with an FFT/IFFT, which has made the FFT/IFFT valuable for those communication systems. The OFDM based communication systems have high performance requirement in both throughput and power consumption. This performance requirement



(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

#### Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

## Organized by

# Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

necessitates an application-specific integrated circuit (ASIC) solution for FFT/IFFT implementation. This thesis addresses the problem of designing efficient application-specific FFT/IFFT processors for OFDM based wide-band communication systems.

#### II. EXISTING SYSTEM

Memory-based architecture is widely adopted to design an FFT processor, also known as the single processing element (PE) approach. This deign style is usually composed of a main PE and several memory units, thus the hardware cost and the power consumption are both lower than the other architecture style.

The pipeline FFT processors have two popular design types.

- 1. Single-path delay feedback (SDF) pipeline architecture,
- 2. Multiple-path delay commutator (MDC) pipeline architecture.

The single-path delay feedback (SDF) pipeline FFT is good in its requiring less memory space (about N-1 delay elements) and its multiplication computation utilization being less than 50%, as well as its control unit being easy to design. The FFT computation often needs to multiply input signals with different twiddle factors for an outcome, which results in higher hardware cost because a large size of ROM is needed to store the wanted twiddle factors.

- a. DISADVANTAGES
- Long latency
- Low throughput

## III. PROPOSED SYSTEM

We propose efficient radix-2 pipeline architecture with low power consumption for the FFT/IFFT processor. Our proposed architecture includes a reconfigurable complex constant multiplier and bit-parallel complex multipliers instead of using ROM's to store twiddle factors, which is suited for the power-of-2 radix style of FFT/IFFT processors.

The complex multipliers used in the processor are realized with shift-and-add operations. So the processor uses only a two-input digital multiplier and does not need any ROM for internal storage of coefficients.

#### **3.1 ADVANTAGES**

- Low speed and higher hardware cost caused by the proposed complex multiplier are the pay-off.
- A smart structure for ROM-size reduction to produce twiddle factors as well as to compact the chip area.
- The pipeline architecture style can get rid of the disadvantages of the foregoing style, at the cost of an acceptable hardware overhead.

• Generally, Such implementations are advantageous to low-power design, especially for applications in portable DSP devices. Based on these reasons, the SDF pipeline FFT is adopted in our work.

#### 3.2MODULE DESCRIPTION

The system architecture consists of 4 modules. The modules are

- 1. Bit Parallel Multiplier
- 2. Complex multiplier.
- 3. Reconfigurable complex constant multiplier.
- 4. Three different types of Process elements (PE)

## BIT PARALLEL MULTIPLIER

The multiplication by 1/ Sqrt(2) can employ a bit parallel multiplier to replace the word length multiplier and square root evaluation for chip area reduction. The bit-parallel operation in terms of power of 2 is given by

Output=in\*
$$\sqrt{2}/2=in*(2^{-1}+2^{-3}+2^{-4}+2^{-6}+2^{-8}+2^{-14})$$



(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

## Organized by

# Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

This above equation will produce poor precision due to truncation error. So the above equation can be rewritten for to achieve hardware cost.

Output=in\* $\sqrt{2}/2=in*[1+(1+2^{-2})(2^{-6}-2^{-2})]$ 

The structure of the bit parallel multiplication given by



Fig.1: Circuit diagram of the bit-parallel multiplication by 1/ sqrt (2)

The resulting circuit uses three additions and three barrel shift operations. The realization of complex multiplication by  $w_N^{N/8}$  using a radix-2 butterfly structure with its both outputs commonly multiplied by 1/ sqrt(2). This bit parallel multiplier used in multiplication at process element (PE1) stage of the FFT/IFFT processor.

# BIT PARALLEL MULTIPLIER CONDITIONS

We need not to use bit-parallel multipliers to replace the word length one for two reasons. One is on the operation rate. If bit-parallel multipliers are used, the clock rate is decreased due to the many cascaded adders. The other reason is the introduction of high wiring complexity because many bit-parallel multipliers are required to be switched for performing multiplication operations with different twiddle factors.

#### COMPLEX MULTIPLIER

Complex multiplier can be constructed with the bit parallel multiplier in order to produce the twiddle factors for FFT/IFFT operations. Complex multiplier consists of circuit switch and radix 2 architecture and bit parallel multiplier. The architecture of constant multiplier is given by



Fig.2: Complex Multiplier



(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

## Organized by

# Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

This circuit is responsible for the computation of multiplication by a twiddle factor W <sup>i</sup><sub>64.</sub> The word length multiplier used in Fig. adopts a low-error fixed width modified Booth multiplier for hardware cost reduction. The coefficient values i1-i8 and q1-q8 are listed in Table 1, which can be used to synthesize the entire twiddle factors required in our proposed 64-point FFT processor.

#### RECONFIGURABLE COMPLEX CONSTANT MULTIPLIERS

A reconfigurable low-complexity complex constant multiplier for computing W i 64 is proposed. This structure of this complex multiplier also adopts a cascaded scheme to achieve low-cost hardware. Here, the meaning of two input signals (Iin and Iout) and two output signals (Qin and Qout) are the same as the signals in the PE1 stage. The structure of Reconfigurable complex constant multiplication is given by



Fig.3: Reconfigurable complex constant multiplier

#### **3.3 PROCESS ELEMENTS**

Based on the radix-2 FFT algorithm, the three types of processing elements (PE3, PE2, and PE1) used in our design are illustrated respectively. The functions of these three PE types correspond to each of the butterfly stages as shown in Fig.

#### PE3 STAGE

The PE3 stage is used to implement a simple radix-2 butterfly structure only, and serves as the sub modules of the PE2 and PE1 stages. In the figure, lin and lout are the real parts of the input and output data, respectively.

Oin and Oout denote the image parts of the input and output data, respectively. Similarly, DL Iin and DL Iout stand for the real parts of input and output of the DL buffers, and DL Qin and DL Qout are for the image parts, respectively.



Fig.4: PE3 Stage circuit



(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

## Organized by

Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

#### PE2 STAGE

The PE2 stage is required to compute the multiplication by -j or 1. Note that the multiplication by -1 in Fig is practically to take the 2's complement of its input value.



#### PE1 STAGE

The PE1 stage calculation is more complex than the PE2 stage, which is responsible for computing the multiplications by -j,  $W^{N/8}{}_N$ , and  $W^{3N/8}{}_N$  respectively. Since  $W^{3N/8}{}_N = -j W^{N/8}{}_N$  it can be given by the multiplication by  $W^{N/8}{}_N$  first and then the multiplication by -j or the reverse of the previous calculation. Hence, the designed hardware utilizes this kind of cascaded calculation and multiplexers to realize all the necessary calculations of the PE1 stage. This manner can also save a bit-parallel multiplier for computing  $W^{3N/8}{}_N$ , which further forms a low-cost hardware



Fig.6: PE1 Stage



(An ISO 3297: 2007 Certified Organization)

Vol.2, Special Issue 1, March 2014

Proceedings of International Conference On Global Innovations In Computing Technology (ICGICT'14)

## Organized by

Department of CSE, JayShriram Group of Institutions, Tirupur, Tamilnadu, India on 6<sup>th</sup> & 7<sup>th</sup> March 2014

#### **3.4 SYSTEM ARCHITECTURE**

The proposed architecture is composed of three different types of processing elements (PEs), a complex constant multiplier, delay-line (DL) buffers (as shown by a rectangle with a number inside), and some extra processing units for computing IFFT.





Here, the conjugate for extra processing units is easy to implement, which only takes the 2's complement of the imaginary part of a complex value. The divided-by-64 module can be substituted with a barrel shifter. In addition, for a complex constant multiplier in Fig., we propose a novel reconfigurable complex constant multiplier to eliminate the twiddle-factor ROM. This new multiplication structure thus becomes the key component in reducing the chip area and power consumption of our proposed FFT/IFFT processor.

## IV. CONCLUSION

We have proposed a memory based recursive FFTdesign which has much less gate counts, lower powerconsumption and higher speed. The proposed architecturehas three main advantages (1) fewer butterfly iteration toreduce power consumption, (2) pipeline of radix-22butterfly to speed up clock frequency, (3) even distribution femory access to make utilization efficiency in SRAM ports. In summary, the speed performance of our design easilysatisfies most application requirements of mobile WiMAX802.16e, which uses OFDMA modulated wirelesscommunication system. Our design also occupies lesserarea, hence lower cost and power consumption.

#### REFERENCES

[8] KoushikMaharatna, Eckhard Grass, and Ulrich Jagdhold, "A 64-Point fourier transform chip for high-speed wireless LAN application using OFDM," IEEE Journal of Solid-State Circuits, vol. 39, no. 3, pp. 484- 493, Mar. 2004.

<sup>[1]</sup> IEEE Std 802.11a, 1999, "Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-Speed Physical Layer in the 5 GHz band."

<sup>[2]</sup> IEEE 802.16, IEEE Standard for Air Interface for Fixed Broadband Wireless Access Systems, the Institute of Electrical and Electronics Engineers, Inc., June 2004.

<sup>[3] 3</sup>GPP LTE, "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation" 3GPP TS 36.211 v8.5.0, 2008-12.

<sup>[4]</sup> ETSI, "Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television," ETSI EN 300 744 v1.4.1, 2001.

 <sup>[5]</sup> J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
[6] S. He and M. Torkelson, "Designing Pipeline FFT Processor for OFDM (de)Modulation," in Proc. URSI Int. Symp. Signals, Systems, and Electronics, vol. 29, Oct.1998, pp. 257-262.

<sup>[7]</sup> H.L. Groginsky and G.A. Works, "A pipeline fast Fourier transform," IEEE Transactions on Computers, vol. C-19, no. 11, pp. 1015-1019, Nov. 1970.