

International Journal of Innovative Research in Science, Engineering and Technology

Volume 3, Special Issue 3, March 2014

2014 International Conference on Innovations in Engineering (ICIET'14) On 21<sup>st</sup> & 22<sup>nd</sup> March Organized by

K.L.N. College of Engineering, Madurai, Tamil Nadu, India

# Synthesis and Implementation of 3D IIR Filter As a Processing Element of Systolic Array Architecture

Shantha Selva Kumari R<sup>#1</sup>, Vishnu Priya M<sup>#2</sup>

<sup>#1</sup> Department Of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi, India.

<sup>#2</sup> Department Of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi, India.

ABSTRACT—A parallel-processing systolic array architecture is designed for a real-time VLSI spatio-temporal 3D Infinite Impulse Response (IIR) frequency planar filter to achieve a high throughput of one frame per clock cycle (OFPCC). To reduce circuit complexity, the architecture is based on the differential-form transfer function of the 3D IIR frequency planar filter. The 3D Look Ahead (LA) form of the transfer function, along with retiming techniques, is used to maximize the speed of the architecture. The array architecture is intended for real-time implementation of 3D IIR frequency planar filters operating at radio-frequency frame rates. The 3D IIR frequency planar filter acts as a building block for 3D IIR digital filters having beam- and cone-shaped passbands, which are required for smart antenna array beamforming applications. The proposed 7×7 systolic array architecture of the 3D IIR frequency planar filter is synthesized and implemented on a Virtex-5 xc5vlx50tff1136-1 FPGA device, and it achieves a maximum operating frequency of 109.016 MHz.

**KEYWORDS**— Systolic array, Multidimensional signal processing, Frequency Planar filter, 3D LA Techniques, Retiming Techniques.

## I. INTRODUCTION

A systolic array is a specialized form of parallel computing architecture. Here, the 3D first-order IIR frequency planar digital filter is the processing cell of the systolic array architecture, which is designed for a real-time high throughput of OFPCC. First-order 3-D frequency planar digital filters have well-known applications as building blocks for 3-D broadband sensor-array beamformers [1], [2], [9]. The 3-D region of support (ROS) of the spectrum of an ideal 3-D broadband plane wave lies on a line through the frequency origin $\omega \equiv (\omega_1, \omega_2, \omega_3) = (0, 0, 0)$, and the direction of this line in $\omega$ is equal to the direction of arrival (DOA) of the plane wave in the spatio-temporal domain [3], [4], [5]. The architecture is proposed for a 2-D spatial array of antennas.

The signal from each antenna is amplified by a low-noise amplifier, and the desired signal is then selected by a low-pass filter. The analog signal is converted into a digital signal using an analog-to-digital converter (ADC), and the digital signal is given as input to a processing element. In this way, each processing element receives its input from one antenna. The architecture is based on the differential form and has lower circuit complexity than the direct-form architecture [6]. The speed is maximized by using the 3-D LA form of the transfer function along with retiming techniques.

## II. DESIGN OF 3D IIR FREQUENCY PLANAR FILTER

The required z-domain input-output transfer function [11], [12], [6] of the 3-D IIR frequency-planar filter is given by

$$H(z) = \frac{\sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} z_{1}^{-i} z_{2}^{-j} z_{3}^{-k}}{1 + \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{\substack{k=0 \\ i+j+k \neq 0}}^{1} b_{ijk} z_{1}^{-i} z_{2}^{-j} z_{3}^{-k}}$$
(1)

where $b_{ijk} = \frac{(R+(-1)^{i}L_1+(-1)^{j}L_2+(-1)^{k}L_3)}{(R+L_1+L_2+L_3)}$ are the direct-form feedback coefficients.

An algebraic decomposition of (1) is required that yields an architecture having the desired high throughput of OFPCC. For this purpose, first-order 1-D differentiators are used, having the 1-D z-domain transfer functions [13]

$$Y'_{k}(z_{k}) = \frac{z_{k}^{-1}}{1 + z_{k}^{-1}}Y(z), k = 1, 2, 3$$
(2)

where $z_1^{-1}$ and $z_2^{-1}$ are the horizontal and vertical spatial delay operators, respectively, and $z_3^{-1}$ is the temporal delay operator. Equation (1) is decomposed in the differentiator form [7], [8],

$$H(z) \equiv \frac{Y(z)}{X(z)} = \frac{1}{1 - \sum_{k=1}^{3} \alpha_k \frac{z_k^{-1}}{1 + z_k^{-1}}}$$
(3)

## **Copyright to IJIRSET**

## www.ijirset.com

657

#### M.R. Thansekhar and N. Balaji (Eds.): ICIET'14


where the feedback coefficients $\alpha_k$ are given by $\alpha_k = \frac{2L_k}{R+L_1+L_2+L_3}$, with $R>0$, $L_k \ge 0$, $k=1,2,3$.
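The relation between the direct-form coefficients $b_{ijk}$ of (1) and the differentiator-form coefficients $\alpha_k$ of (3) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the values of $R$ and $L_k$ are illustrative assumptions, and the constant term of the denominator of (1) is taken to be $b_{000} = 1$.

```python
import cmath
from itertools import product

def alphas(R, L):
    """Feedback coefficients alpha_k = 2*L_k / (R + L1 + L2 + L3), as in (3)."""
    total = R + sum(L)
    return [2.0 * Lk / total for Lk in L]

def direct_form_b(R, L):
    """Direct-form coefficients b_ijk of (1), keyed by (i, j, k) in {0, 1}^3."""
    total = R + sum(L)
    return {
        idx: (R + sum((-1) ** e * Lk for e, Lk in zip(idx, L))) / total
        for idx in product((0, 1), repeat=3)
    }

def H_direct(zinv, b):
    """Evaluate (1); the constant denominator term is the b_000 = 1 entry."""
    num = sum(zinv[0] ** i * zinv[1] ** j * zinv[2] ** k for i, j, k in b)
    den = sum(c * zinv[0] ** i * zinv[1] ** j * zinv[2] ** k
              for (i, j, k), c in b.items())
    return num / den

def H_differentiator(zinv, a):
    """Evaluate (3) from the three first-order differentiator terms."""
    return 1.0 / (1.0 - sum(ak * zi / (1.0 + zi) for ak, zi in zip(a, zinv)))

# Illustrative (assumed) element values; any R > 0, L_k >= 0 would do.
R, L = 1.0, (0.5, 0.3, 0.8)
a = alphas(R, L)
b = direct_form_b(R, L)

# z_k^{-1} sampled on the unit circles at arbitrary test frequencies.
zinv = [cmath.exp(-1j * w) for w in (0.4, -1.1, 2.0)]
err = abs(H_direct(zinv, b) - H_differentiator(zinv, a))
print("alpha:", a)
print("b_000:", b[(0, 0, 0)])
print("|H_direct - H_differentiator|:", err)
```

Under these assumptions the two forms agree to rounding error, which reflects that (3) is an algebraic rearrangement of (1) rather than an approximation.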

### A. Optimizing the Speed by Using LA and Retiming Techniques

The direct-form denominator of the input-output transfer function (1) is non-separable for these frequency planar filters. It is well known that such 3D polynomials cannot generally be factored. Therefore, the 3D pole surface of the transfer function of the 3D frequency planar filter cannot be found by factorization. As a result, 1D LA techniques are not directly applicable to the non-separable multidimensional transfer function.

By decomposing the transfer function of the 3D first-order filter using differentiator operators, LA speed optimization can be applied in the direction of the temporal recursion only.

This approach is not applicable to higher-order or higher-dimensional transfer functions.

Cross-multiplying and simplifying (3), and using (2), we obtain

$$Y(z) = \frac{X(z) + \alpha_1 Y_1'(z_1) + \alpha_2 Y_2'(z_2)}{1 - \alpha_3 \frac{z_3^{-1}}{1 + z_3^{-1}}}$$
(4)


where $X(z)$ is the z-transform of the 3D input and $Y(z)$ is the z-transform of the 3D output.

By further simplifying (4), we obtain the following decomposition, as required for speed maximization using LA and pipelining:

$$Y(z) = \left(\frac{X(z) + \sum_{k=1}^{2} \alpha_k Y'_k(z_k)}{1 + (1 - \alpha_3) z_3^{-1}}\right) (1 + z_3^{-1})$$
(5)
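The step from (4) to (5) only requires putting the temporal term of the denominator of (4) over a common denominator. Written out:

$$1 - \alpha_3 \frac{z_3^{-1}}{1 + z_3^{-1}} = \frac{1 + z_3^{-1} - \alpha_3 z_3^{-1}}{1 + z_3^{-1}} = \frac{1 + (1 - \alpha_3) z_3^{-1}}{1 + z_3^{-1}}$$

Dividing the numerator of (4) by this expression moves the factor $(1 + z_3^{-1})$ into the numerator and leaves $1 + (1 - \alpha_3) z_3^{-1}$ in the denominator, which is (5).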

Rearranging (5) and using (2) gives the proposed 3-D LA form

$$Y(z) = \left(X(z) + \sum_{k=1}^{2} \alpha_k \frac{z_k^{-1}}{1 + z_k^{-1}} Y(z)\right) T(z_3)$$
(6)

where $T(z_3)$ is the subfilter transfer function.



Fig. 1. Alternative implementations of the subfilter: (i) without LA speed optimization; (ii) with first-order LA speed optimization; (iii) with first-order LA along with retiming for speed optimization.

$$T(z_3) \equiv \frac{(1+z_3^{-1})}{(1+\beta z_3^{-1})} \equiv \frac{X_O(z_3)}{X_I(z_3)}$$
(7)

where $\beta = 1 - \alpha_3$, $X_I(z_3)$ is the z-transform of the input to the subfilter $T(z_3)$, and $X_O(z_3)$ is the z-transform of its output.

The denominator of (7) depends on $z_3$ only and is therefore separable, which allows us to employ the conventional time-domain LA method.

The transfer function of the subfilter $T(z_3)$ can be implemented with different numbers of adder, multiplier, and subtractor circuits, resulting in different critical path delays. A stable first-order recursive filter may be fully pipelined by adding delays inside the feedback path, employing the known method of LA speed optimization [10], [6]. LA speed optimization of order $K$ in a first-order recursive filter adds $K$ delays in the feedback path, which reduces the critical path delay to approximately $T_{CPD}^o/(K+1)$, where $T_{CPD}^o$ is the critical path delay without LA speed optimization [10], [6].

The subfilter $T(z_3)$ shown in Fig. 1(i) has no LA speed optimization. Its critical path delay is $T_{CPD}^o \approx T_{MUL} + T_{A/S}$, where $T_{MUL}$ and $T_{A/S}$ correspond to the logic delays of a parallel multiplier and an adder/subtractor, respectively. The maximum clock frequency is therefore $F_{CLK}^o = 1/T_{CPD}^o$ Hz. The first-order LA form is obtained by multiplying both the numerator and denominator of (7) by $(1 - \beta z_3^{-1})$, leading to

$$T(z_3) = \frac{(1 - \beta z_3^{-1})}{(1 - \beta^2 z_3^{-2})} (1 + z_3^{-1})$$
(8)
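As a numerical check on (8), the two realizations of $T(z_3)$ can be run side by side as time-domain recursions. This is a minimal behavioural sketch in floating point, not a model of the hardware in Fig. 1: the function names, the value of $\beta$, and the random test input are assumptions made for illustration.

```python
import random

def subfilter_direct(x, beta):
    """T(z3) as in (7): y[n] = x[n] + x[n-1] - beta*y[n-1]."""
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = xn + x1 - beta * y1
        y.append(yn)
        x1, y1 = xn, yn
    return y

def subfilter_lookahead(x, beta):
    """First-order LA form (8):
    y[n] = x[n] + (1-beta)*x[n-1] - beta*x[n-2] + beta^2*y[n-2].
    The feedback now reaches back two samples instead of one."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    b2 = beta * beta
    for xn in x:
        yn = xn + (1.0 - beta) * x1 - beta * x2 + b2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

random.seed(0)
beta = 0.6                                   # assumed value with |beta| < 1
x = [random.uniform(-1.0, 1.0) for _ in range(200)]
max_diff = max(abs(u - v) for u, v in
               zip(subfilter_direct(x, beta), subfilter_lookahead(x, beta)))
print("max |direct - look-ahead| =", max_diff)
```

The outputs agree to rounding error, while the feedback term of the LA form depends on $y(n-2)$ rather than $y(n-1)$. That extra delay in the loop is what allows the multiply-add to be pipelined.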

The alternative circuit realizing the transfer function (8) is shown in Fig. 1(ii). The critical path delay of this circuit is reduced to $T_{CPD}^1 \approx T_{CPD}^o/2$, and the circuit is therefore capable of approximately twice the throughput, with $F_{CLK}^1 \approx 2F_{CLK}^o$.

Retiming is a transformation technique that changes the locations of the delay elements in a circuit without affecting its input/output characteristics [10], [15]. Retiming along with first-order LA is used to increase the speed of the circuit by reducing the critical path delay. Changing the positions of the delays in the first-order LA realization reduces the critical path of the circuit, as shown in Fig. 1(iii).

### B. The Differential Form of 3D IIR Frequency Planar Filter

Taking the spatial 2-D inverse z-transform of (6) under zero initial conditions gives the mixed-domain equation in the 3-D variables $(n_1, n_2, z_3)$:

$$Y(n_1, n_2, z_3) = \left(X(n_1, n_2, z_3) + \sum_{K=1}^{2} \alpha_K Y_K'(n_1, n_2, z_3)\right) T(z_3)$$
(9)



Fig. 2. The differential form of the 3D IIR frequency planar filter.

where

$$Y'_{1}(n_{1}, n_{2}, z_{3}) = Y(n_{1} - 1, n_{2}, z_{3}) - Y'_{1}(n_{1} - 1, n_{2}, z_{3})$$
(10)

$$Y'_{2}(n_{1}, n_{2}, z_{3}) = Y(n_{1}, n_{2} - 1, z_{3}) - Y'_{2}(n_{1}, n_{2} - 1, z_{3})$$
(11)

$X(n_1, n_2, z_3)$ is the input in the mixed domain, and $Y(n_1, n_2, z_3)$ is the output in the mixed domain.

Converting (6) fully to the space-time domain gives the differential form as the following difference equations:

$$y(n) = x(n) + \sum_{k=1}^{3} \alpha_k y'_k(n)$$
(12)

where $n \equiv (n_1, n_2, n_3)$, $x(n)$ is the synchronously sampled 3-D input signal, $y(n)$ is the synchronously sampled 3-D output signal, and

$$y'_1(n) = y(n_1 - 1, n_2, n_3) - y'_1(n_1 - 1, n_2, n_3)$$
(13)

$$y'_{2}(n) = y(n_{1}, n_{2} - 1, n_{3}) - y'_{2}(n_{1}, n_{2} - 1, n_{3})$$
(14)

$$y'_{3}(n) = y(n_{1}, n_{2}, n_{3} - 1) - y'_{3}(n_{1}, n_{2}, n_{3} - 1)$$
(15)

The differential form (12) of the 3D IIR spatio-temporal filter is shown in Fig. 2. Implementing the filter in differential form reduces the circuit complexity compared with the direct form [1], [6].

## III. SYSTOLIC ARRAY ARCHITECTURE

A systolic array architecture is a network of processing elements arranged in a regular manner [10]. Here, the processing element is a 3D IIR frequency planar filter. The design of the systolic array of the 3D IIR filter is shown in Fig. 3. The architecture is designed for a throughput of OFPCC, where a frame refers to the $N_1 \times N_2$ set of data samples obtained at each time sample. The architecture consists of an interconnected array of identical synchronous parallel processing core modules (PPCMs). The 3D IIR frequency planar filter architecture is the building block of the cone-shaped filter or beam filter for antenna beamforming.

Fig. 3. Systolic array architecture of 3-D IIR frequency planar filter

Each PPCM essentially implements (1) in real time for a particular spatial location. The straightforward direct-form I implementation of (1) inside each PPCM is described in [14], [6]. However, for high-speed implementations, the direct-form realization is not the best choice because it leads to high VLSI resource consumption, low computational throughput due to high complexity, and higher CPDs. Here, the circuit complexity is reduced by employing an alternative differential-form realization inside the PPCMs. The throughput limitations in [14] are much improved in the proposed architecture by employing a novel 3D LA speed optimization method.

## IV. IMPLEMENTATION RESULTS AND DISCUSSION

The 3D IIR frequency planar filter (3) is designed and simulated using MATLAB, and its frequency response is shown in Fig. 4.
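A frequency response of this kind can be approximated by evaluating (3) on the unit circles $z_k = e^{j\omega_k}$. The sketch below uses Python rather than the MATLAB script used for Fig. 4, which is not given here. The element values $R = L_1 = L_2 = L_3 = 1$, the grid, and the function names are illustrative assumptions.

```python
import cmath
import math

def H3(w, R, L):
    """Evaluate (3) at z_k = exp(j*w_k).
    w must avoid w_k = +/-pi, where 1 + z_k^{-1} = 0."""
    total = R + sum(L)
    acc = 0j
    for wk, Lk in zip(w, L):
        zinv = cmath.exp(-1j * wk)
        acc += (2.0 * Lk / total) * zinv / (1.0 + zinv)
    return 1.0 / (1.0 - acc)

def magnitude_slice(R, L, w3, n=9):
    """|H| normalised by its DC value on an n x n grid of (w1, w2) at fixed w3."""
    grid = [-0.9 * math.pi + 1.8 * math.pi * i / (n - 1) for i in range(n)]
    dc = abs(H3((0.0, 0.0, 0.0), R, L))
    mag = [[abs(H3((w1, w2, w3), R, L)) / dc for w2 in grid] for w1 in grid]
    return grid, mag

R, L = 1.0, (1.0, 1.0, 1.0)          # assumed element values
grid, mag = magnitude_slice(R, L, w3=0.0)

dc = abs(H3((0.0, 0.0, 0.0), R, L))
# With equal L_k, (w1, w2, w3) = (0.5, -0.5, 0) lies on the passband plane,
# while (0.5, 0.5, 0) lies off it.
on_plane = abs(H3((0.5, -0.5, 0.0), R, L)) / dc
off_plane = abs(H3((0.5, 0.5, 0.0), R, L)) / dc
print("normalised gain on plane:", on_plane, " off plane:", off_plane)
```

For these values the normalised gain stays at unity on a plane through the frequency origin and falls off away from it, which is the frequency-planar passband behaviour. Plotting `mag` as a surface over `grid` gives one slice of the 3D response.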

Fig. 4. Frequency response of the 3D IIR frequency planar filter.

The systolic array of the 3D frequency planar filter is designed with the 3D IIR frequency planar filter based on the differential form of the transfer function. The critical path delay of the subfilter $T(z_3)$ is reduced by the first-order LA and retiming techniques. The design was developed in ModelSim 6.4a and synthesized using PlanAhead 13.4. The systolic array architecture was implemented on the Xilinx Virtex-5 xc5vlx50tff1136-1 device.
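A software reference model of the differential form (12) to (15) can be useful for checking simulation output. The sketch below is such a model under stated assumptions: it is a sequential floating-point recursion with zero initial conditions, it does not model the timing or word lengths of the FPGA design, and the function name, array layout `x[n3][n1][n2]`, and coefficient values are hypothetical.

```python
def simulate_differential_form(x, alpha):
    """Apply (12)-(15) to x[n3][n1][n2] with zero initial conditions.
    alpha = (alpha_1, alpha_2, alpha_3). Returns y with the same shape."""
    N3, N1, N2 = len(x), len(x[0]), len(x[0][0])

    def zeros():
        return [[[0.0] * N2 for _ in range(N1)] for _ in range(N3)]

    y = zeros()
    d = [zeros(), zeros(), zeros()]            # y'_1, y'_2, y'_3
    # Unit delays in (n3, n1, n2) index order for k = 1, 2, 3.
    shifts = ((0, 1, 0), (0, 0, 1), (1, 0, 0))

    for n3 in range(N3):                       # one frame per outer iteration
        for n1 in range(N1):
            for n2 in range(N2):
                acc = x[n3][n1][n2]
                for k, (s3, s1, s2) in enumerate(shifts):
                    p3, p1, p2 = n3 - s3, n1 - s1, n2 - s2
                    if p3 >= 0 and p1 >= 0 and p2 >= 0:
                        # (13)-(15): y'_k(n) = y(n - e_k) - y'_k(n - e_k)
                        d[k][n3][n1][n2] = y[p3][p1][p2] - d[k][p3][p1][p2]
                    acc += alpha[k] * d[k][n3][n1][n2]
                y[n3][n1][n2] = acc            # (12)
    return y

# Unit impulse on a 3-frame, 3x3 array with assumed coefficients.
alpha = (0.5, 0.25, 0.75)
x = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
x[0][0][0] = 1.0
h = simulate_differential_form(x, alpha)
print(h[0][0][0], h[0][1][0], h[0][0][1], h[1][0][0])
```

Because each $y'_k(n)$ depends only on samples delayed by one step in its own dimension, the recursion contains no delay-free loop and can be evaluated in raster order.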

The subfilter, with its critical path delay reduced by LA along with retiming, is implemented; its RTL diagram and FPGA Editor view are shown in Fig. 5.

The differential form of the 3D frequency planar filter is implemented, and its RTL diagram and FPGA Editor view are shown in Fig. 6.

Table I, the design summary, shows the resources used for the subfilter $T(z_3)$, the differential form of the 3D IIR frequency planar filter, and the 7×7 systolic array architecture. From Table I, it can be inferred that the speed of the subfilter $T(z_3)$ is optimized by using LA along with retiming, which gives the shortest critical path. This subfilter design is used as a building block for the systolic array architecture.

TABLE I. DESIGN SUMMARY FOR SYSTOLIC ARRAY ARCHITECTURE

| Hardware Module                                               | Multipliers | Adders / Subtractors | Registers | Total On-Chip Power (mW) | Critical Path Delay (ns) |
|---------------------------------------------------------------|-------------|----------------------|-----------|--------------------------|--------------------------|
| Without LA Subfilter $T(z_3)$                                 | 1           | 4                    | 2         | 444                      | 5.000                    |
| First Order LA Form Subfilter $T(z_3)$                        | 2           | 6                    | 4         | 488                      | 3.191                    |
| Retiming Along With First Order LA Form Subfilter $T(z_3)$    | 2           | 6                    | 4         | 488                      | 2.542                    |
| Differential Form 3D IIR Frequency Planar Filter (PPCM)       | 4           | 14                   | 10        | 527                      | 4.939                    |
| 7×7 Systolic Array Architecture                               | 196         | 686                  | 1078      | 826                      | 9.173                    |
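The critical path delays in Table I can be converted to maximum clock frequencies with the relation $F_{CLK} = 1/T_{CPD}$ from Section II-A. The short sketch below does this conversion. The delay values are taken from Table I, while the dictionary keys and function name are only illustrative.

```python
# Critical path delays from Table I, in nanoseconds.
cpd_ns = {
    "Without LA subfilter": 5.000,
    "First-order LA subfilter": 3.191,
    "Retimed first-order LA subfilter": 2.542,
    "Differential-form PPCM": 4.939,
    "7x7 systolic array": 9.173,
}

def max_clock_mhz(t_cpd_ns):
    """F_CLK = 1 / T_CPD, with T_CPD in ns and the result in MHz."""
    return 1000.0 / t_cpd_ns

fmax = {name: max_clock_mhz(t) for name, t in cpd_ns.items()}

# Speed-up of each subfilter variant relative to the version without LA.
base = cpd_ns["Without LA subfilter"]
speedup = {name: base / t for name, t in cpd_ns.items() if "subfilter" in name}

for name, f in fmax.items():
    print(f"{name}: {f:.3f} MHz")
```

The 9.173 ns critical path of the 7×7 array corresponds to the reported maximum operating frequency of about 109.016 MHz. The measured first-order LA speed-up is below the ideal factor of two from $T_{CPD}^o/(K+1)$, and retiming brings it close to that factor.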



Fig. 5. RTL & FPGA Editor of First order LA along with retiming for speed optimization of the subfilter.



Fig. 6. RTL & FPGA Editor of PPCM

## V. CONCLUSION

By implementing the architecture in the differential form, the circuit complexity is reduced compared with the direct-form architecture. A 3D LA technique is used to optimize the speed. The first-order LA technique along with retiming gives the best result for the subfilter transfer function $T(z_3)$. Thus, the proposed systolic array architecture of the 3D frequency planar filter achieves the OFPCC throughput required for real-time VLSI applications.

The $7 \times 7$ array of the 3-D IIR frequency planar filter is implemented on a Xilinx Virtex-5 xc5vlx50tff1136-1 device. The maximum clock frequency of the design is about 109.016 MHz. The 3D IIR frequency planar filter is the building block for 3D IIR beam or cone filter banks, which are required for smart antenna beamforming applications. The architecture can therefore be extended to 3D IIR beam and cone filter VLSI DPP implementations at OFPCC throughput for high-speed applications.

## REFERENCES

- [1] L. T. Bruton, "A 3-D polyphase-DFT cone filter bank for broad band plane wave filtering," in Proc. IEEE Int. Symp. Circuits Syst., Vancouver, BC, Canada, 2004, vol. 3, pp. 181–184.
- [2] B. Kuenzle and L. T. Bruton, "3-D IIR filtering using decimated DFT polyphase filter bank structures," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 2, pp. 394–408, Feb. 2006.
- [3] L. T. Bruton, "Three-dimensional cone filter banks," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 2, pp. 208–216, Feb. 2003.
- [4] A. C. Tan and H. Sun, "Structurally passive synthesis of three-dimensional recursive cone filter," in Proc. 32nd Midwest Symp. Circuits Syst., Aug. 1989, vol. 2, pp. 1119–1122.
- [5] M. Bolle, "A closed-form design method for 3-D recursive cone filters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1994, vol. 6, pp. 141–144.
- [6] H. L. P. Arjuna Madanayake and Len T. Bruton, "A Systolic-Array Architecture for First-Order 3-D IIR Frequency-Planar Filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1546–1559, Jul. 2008.
- [7] Y. Zhang and L. T. Bruton, "Differentiator-type three-dimensional recursive ladder filters having frequency-planar- or frequency-beam-shaped passbands," IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 3, pp. 297–305, Sep. 1992.
- [8] L. T. Bruton and N. R. Bartley, "Three-dimensional image processing using the concept of network resonance," IEEE Trans. Circuits Syst., vol. CAS-32, no. 7, pp. 664–672, Jul. 1985.
- [9] Rimesh M. Joshi, Arjuna Madanayake, Jithra Adikari and Len T. Bruton, "Synthesis and Array Processor Realization of a 2-D IIR Beam Filter for Wireless Applications," IEEE Trans. VLSI Sys, II, Reg. Papers, vol. 20, no. 12, pp. 2241–2254, Dec. 2012.
- [10] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
- [11] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990.
- [12] H. Schroeder and H. Blume, One- and Multidimensional Signal Processing—Algorithms and Applications in Image Processing. New York: Wiley, 2000.
- [13] R. K. Bertschmann, N. R. Bartley, and L. T. Bruton, "A 3-D integrator-differentiator double-loop (IDD) filter for raster-scan video processing," in Proc. IEEE Int. Symp. Circuits Syst., May 1995, vol. 1, pp. 470–473.
- [14] A. Madanayake and L. Bruton, "A high performance distributed-parallel-processor architecture for 3-D IIR digital filters," in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1457–1460.
- [15] Xue-Yang Zhu, Twan Basten, Marc Geilen, and Sander Stuijk, "Efficient Retiming of Multirate DSP Algorithms," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, pp. 831–844, Jun. 2012.