

(An ISO 3297: 2007 Certified Organization) Vol. 3, Special Issue 5, December 2014

# **Design of ALU with LFSR Using Clock Gating**

Anuja Aravind<sup>1</sup>, Raseena K.A<sup>2</sup>

Department of Electronics and Communication, Ilahia College of Engineering and Technology, Muvattupuzha, Ernakulam, India<sup>1,2</sup>

**ABSTRACT:** This paper proposes a method to reduce the power consumption of a 32 bit linear feedback shift register (LFSR). The proposed scheme is based on the data driven clock gating approach and, depending on the technological characteristics of the employed gates, it can offer improved power reduction compared to the traditional gated design approach. Dynamic power is reduced by effectively managing the use of the clock signal for the flip flops in the LFSR circuit, which is achieved by grouping the available flip flops. The dynamic power consumed by unnecessary switching of the clock gating circuitry is eliminated by a modified approach to the design of that circuitry. The resulting LFSR can be used in the design of an arithmetic and logic unit (ALU), with the designed LFSR as one of its input registers, and hence reduces the power consumption of the overall system. Further power is saved by applying power saving modes to the ALU circuit when its inputs are not subject to change.

KEYWORDS: DDCG, LFSR, toggling, low power

### I. INTRODUCTION

The system's clock signal accounts for about 50% of the total dynamic power consumption in consumer electronics, and many techniques have been devised to reduce clock power. While [1] uses multiple supply voltages to reduce clock tree power, [2] focuses on interconnect power, i.e., the energy dissipated by the switching of interconnection capacitances. Grouping the FFs present in a sequential circuit is an efficient method of reducing clock power. FF grouping in [3] is mainly driven by the physical proximity of individual FFs, while grouping for data driven clock gating combines toggling similarity with physical position considerations.

The principle of data driven clock gating, as explained in [4] and [5], can be effectively implemented in linearly structured circuits such as linear feedback shift registers. This work is an extension of [6], in which an individual clock disabling circuit is provided for each flip flop. An LFSR is a sequential shift register with combinational logic that causes it to cycle pseudo-randomly through a sequence of binary values. Linear feedback shift registers have many uses in digital system design.

In this paper a 32 bit LFSR is presented in which the 32 flip flops are organized into groups based on their toggling probability, and a separate clock is given to each group, so the master clock need not be supplied all the time. The additional power consumed by unnecessary switching of the controlling device is eliminated by means of a new clock gated approach, as explained in [7]. The LFSR so designed is used in the design of a 32 bit ALU circuit, and the power results are compared for each design stage.

### II. DATA DRIVEN CLOCK GATING

When a logic unit is clocked, the sequential elements in the logic unit receive the clock signal regardless of whether or not they will toggle in the next cycle. In clock gating, the clock signals are ANDed with explicitly predefined enabling signals. Clock gating can be employed at all levels, including system architecture, logic design, and gates. Fig. 1 shows how an FF can find out whether or not its clock can be disabled in the next cycle. The XOR gate shown in Fig. 1(a) compares the FF's current output with the present data input, which will appear at the output in the next cycle. The clock driver shown in Fig. 1(a) is then replaced by a 2-way AND gate, called the *clock gater*, in Fig. 1(b).






Fig. 1. Enabling of the clock signal and joining of k enabling signals into one gating signal

Additional power reduction can be achieved by lowering the number of clock gaters. Several FFs can be driven by a common gater if they are known to toggle simultaneously most of the time, which achieves almost the same power reduction with fewer gaters. Fig. 1(c) shows how to join k clk\_en signals generated by distinct FFs into one gating signal, where k is the number of FFs chosen to form the group.

The grouping reduces the number of individual clock gaters at the expense of an OR gate and a negative edge triggered latch, which is required to avoid glitches on the enable signal. The combination of a latch and an AND gate is commonly used by commercial tools and is called an integrated clock gate (ICG). The hardware saving increases with k, but the number of disabled clock pulses decreases. Hence, for the scheme proposed in Fig. 1(c) to be beneficial, the clock enabling signals of the grouped FFs should be highly correlated. Fig. 2 shows the data-driven gating proposed in [4]. An FF finds out that its clock can be disabled in the next clock cycle by XORing its output with the present data input, which will appear at its output in the next clock cycle. The outputs of the k XOR gates are ORed in order to generate a joint gating signal for the k FFs, which is then connected to the integrated clock gate (ICG) unit.

In the physical implementation of the DDCG circuit, the XOR gate is integrated into the FF, while the OR gate, the AND gates and the latch are integrated into the clock gater. There are two distinct clock signals: clk\_g is the ordinary gated signal driving the registers, while clk drives the latches of the clock gaters.
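The enable generation described above can be sketched behaviourally in Python. This is only an illustrative model, not the authors' VHDL: the function names `group_clock_enable` and `gated_clock` are hypothetical, and the glitch-filtering latch is reduced to an already-latched enable value.

```python
from typing import Sequence


def group_clock_enable(d_inputs: Sequence[int], q_outputs: Sequence[int]) -> int:
    """Return 1 if at least one FF in the group will toggle on the next edge."""
    # Per-FF XOR: 1 when the data input differs from the current output.
    toggles = [d ^ q for d, q in zip(d_inputs, q_outputs)]
    # OR of the k XOR outputs forms the joint enable for the whole group.
    return int(any(toggles))


def gated_clock(clk: int, latched_enable: int) -> int:
    """AND gate of the clock gater: clk_g = clk AND (latched) enable."""
    return clk & latched_enable


# A group of k = 4 FFs whose inputs all equal their outputs needs no clock pulse.
enable = group_clock_enable([0, 1, 0, 1], [0, 1, 0, 1])
clk_g = gated_clock(1, enable)
```

With k FFs sharing one gater, a single toggling FF is enough to clock the whole group, which is why the grouped enables should be highly correlated.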

### III. DESIGN OF 32 BIT LFSR

#### A. Traditional LFSR

A linear feedback shift register is a sequential shift register with combinational logic that causes it to cycle pseudo-randomly through a sequence of binary values. Its input bit is a linear function of its previous state, and it is built from an array of FFs with linear feedback performed by several XOR gates [6]. LFSRs are described by an nth-order polynomial:

$$P_n(x)=x^{n}+b_{n-1}x^{n-1}+\dots+b_1x+1$$

where the binary coefficients $b_i$ define the characteristic polynomial, on which the properties of the generator depend.
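As a minimal sketch of how the characteristic polynomial determines the generated sequence, the following Python model steps a shift register whose feedback is the XOR of the tapped bits. The function names, the mapping of exponents to bit positions, and the small 4-bit polynomial $x^4+x^3+1$ are assumptions chosen for illustration; the paper does not list the taps of its 32 bit design.

```python
from typing import Sequence


def lfsr_step(state: int, n: int, taps: Sequence[int]) -> int:
    """Advance an n-bit LFSR by one clock.

    `taps` lists the exponents of the polynomial terms with coefficient 1,
    including n itself (the constant term 1 is implicit).
    """
    feedback = 0
    for t in taps:
        # Exponent t is mapped to bit position n - t (an assumed convention).
        feedback ^= (state >> (n - t)) & 1
    # Shift right by one and insert the feedback bit at the top.
    return (state >> 1) | (feedback << (n - 1))


def lfsr_period(seed: int, n: int, taps: Sequence[int]) -> int:
    """Count the clocks needed for the register to return to `seed`."""
    state = lfsr_step(seed, n, taps)
    count = 1
    while state != seed:
        state = lfsr_step(state, n, taps)
        count += 1
    return count


# x^4 + x^3 + 1 is primitive, so a non-zero seed visits all 2^4 - 1 states.
period = lfsr_period(seed=0b0001, n=4, taps=(4, 3))  # 15
```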

LFSRs are well known to offer high-speed bit generation and very good statistical properties. Their main drawback is high power consumption, which is given by

$$P_{TR}=nP_{FF}+tP_{XOR}$$

where n is the register's length (i.e., the order of the generator) and t is the number of inner taps (i.e., the number of terms of the characteristic polynomial other than $x^n$ and 1). The terms $P_{FF}$ and $P_{XOR}$ are the dynamic power consumption of a D flip flop and of an XOR gate, respectively. Both terms are proportional to $V_{dd}^2 f_{ck}$, where $V_{dd}$ is the supply voltage and $f_{ck}$ the clock frequency.

The clock path toggles at every clock cycle, thus dissipating a significant amount of power, especially at high clock rates. Conversely, the power consumption of the D-path and the XOR gates depends on the switching activity at the inner nodes.

#### B. LFSR with Data Driven Clock Gating

Here, a linear feedback shift register that generates a 32 bit pseudo-random sequence is designed to consume less power. The traditional circuit needs the same clock for all 32 flip-flops, so it consumes considerable power for clocking and its dynamic power dissipation is high. Our objective is to reduce the dynamic power dissipation by controlling the clocks using the data driven clock gating method. The design gates the clock by checking whether or not the output of each flip-flop is about to toggle.

In the data driven clock gating approach, flip-flops are organised into groups based on their toggling probability and a separate clock is given to each group, so the master clock need not be supplied all the time. In the design of the 32 bit LFSR circuit, four FFs are combined to form a single group, which means k=4. We therefore have eight groups (8 × 4 = 32).

Fig. 3 shows a 32 bit LFSR using the data driven clock gating technique. The LFSR uses a gated clock for each of the 8 groups, so that each clock drives four FFs simultaneously.
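The grouping of 32 FFs into eight groups of k=4 can be explored with a short behavioural sketch that counts how many group clock pulses are actually delivered. Everything specific here is an assumption made for illustration: the function names, the seed, and the tap set (32, 22, 2, 1), which is a commonly tabulated maximal-length choice and not necessarily the one used in this design. The resulting count depends on these choices.

```python
from typing import Sequence, Tuple


def lfsr_step(state: int, n: int, taps: Sequence[int]) -> int:
    """One clock of an n-bit LFSR; taps are polynomial exponents including n."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (n - t)) & 1
    return (state >> 1) | (feedback << (n - 1))


def count_group_clock_pulses(
    seed: int,
    n: int = 32,
    k: int = 4,
    taps: Sequence[int] = (32, 22, 2, 1),  # assumed taps, not from the paper
    cycles: int = 1000,
) -> Tuple[int, int]:
    """Return (gated pulses, ungated pulses) summed over all n // k groups."""
    groups = n // k
    group_mask = (1 << k) - 1
    state = seed
    gated = 0
    for _ in range(cycles):
        nxt = state if False else lfsr_step(state, n, taps)
        # Bitwise XOR of next and current state is the per-FF toggle signal.
        diff = state ^ nxt
        for g in range(groups):
            # A group is clocked only if at least one of its k FFs toggles.
            if (diff >> (g * k)) & group_mask:
                gated += 1
        state = nxt
    return gated, cycles * groups


gated, ungated = count_group_clock_pulses(seed=0xACE1ACE1, cycles=200)
saved_fraction = 1 - gated / ungated
```

Changing `k` in this sketch shows the trade-off noted in Section II: larger groups need fewer gaters but disable fewer clock pulses.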



Fig. 2. Practical data-driven clock gating. The latch and gater (AND gate) overheads are amortized over k FFs.



Fig.3. LFSR circuit with data driven clock gating

#### C. LFSR with New Clock Gated Approach




In this section a new approach for saving clock power is discussed. The new gated clock generation circuit, which uses a negative latch, is shown in Fig. 4. In this circuit the enable changes from one negative edge to the next negative edge, and the target is also negative edge triggered. The FF's state-changing delay is different, but the output is correct, which solves the problem that persists in data driven clock gating.

The circuit in Fig. 4 saves power because the controlling device's clock is OFF both when the target device's clock is ON and when the target device's clock is OFF. In this way more power is saved by avoiding unnecessary switching on the clock net. To understand the operation of the circuit, consider Fig. 4, in which an input signal named 'En' is applied to the latch. When En turns to '1' while GEN is still '0', the XNOR gate produces x='0', which goes to the first clock generation logic, the one that generates the clock for the controlling device (the latch). This first logic is an OR gate with x at one input and the global clock at the other. It generates a clock pulse that drives the controlling latch when x turns to '0'. In the next clock pulse GEN turns to '1'. The second clock generation logic is an AND gate with GEN and the global clock at its inputs, so when GEN goes to '1' it generates the clock pulses that go to the target device. Since GEN is now '1', the XNOR gate produces x='1', and the OR gate therefore holds CClk constantly HIGH until En turns to '0'. In this way GClk keeps running while CClk stays at a constant '1', which means that the latch holds its state without any switching. Fig. 5 shows the newly designed LFSR.
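The gate-level behaviour described above can be checked with a small Python model. It is a behavioural simplification under stated assumptions: the signal names En, GEN, x, CClk and GClk follow the text, the function name `gated_clock_step` is hypothetical, and the negative latch is modelled as transparent while CClk is low.

```python
from typing import Tuple


def gated_clock_step(en: int, gen: int, clk: int) -> Tuple[int, int, int]:
    """Evaluate one phase of the global clock.

    Returns (new GEN, CClk seen by the latch, GClk seen by the target).
    """
    x = 1 - (en ^ gen)   # XNOR: 1 when En and GEN agree
    cclk = x | clk       # OR gate: clock for the controlling latch
    if cclk == 0:
        # Assumed negative latch: transparent while its clock is low.
        gen = en
    gclk = gen & clk     # AND gate: clock for the target device
    return gen, cclk, gclk


# En rises while GEN is 0: the latch is clocked once in the low clock phase,
# after which CClk stays high and only GClk keeps toggling.
gen = 0
trace = []
for en, clk in [(1, 1), (1, 0), (1, 1), (1, 0), (1, 1)]:
    gen, cclk, gclk = gated_clock_step(en, gen, clk)
    trace.append((gen, cclk, gclk))
```

In the trace, CClk drops low only in the single phase where En and GEN differ, which matches the claim that the controlling latch is not clocked while the target clock is running.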



Fig.4. Generation of Gated clock using the new approach

The proposed design of the 32-bit LFSR has been validated through simulations run in Xilinx 14.1, and the LFSR was implemented using Xilinx ISE. Xilinx provides automation tools for designing and implementing logic and hardware on a single chip for faster prototyping, so it is widely used to implement digital logic on FPGAs. The power consumption of the designed LFSR is compared with that of an ordinary LFSR and a DDCG LFSR in Table II.



Fig. 5.LFSR circuit using the new approach of clock gating




### IV. DESIGN OF 32 BIT ALU

This section describes the design of a 32-bit arithmetic logic unit (*ALU*). An ALU is the brawn of the computer: the device that performs arithmetic operations such as addition and subtraction and logical operations such as AND, NOR and OR. This section illustrates an ALU that carries out such operations on the inputs applied to each of the 1-bit ALUs used to construct it. Because the input word is 32 bits wide, we need a 32-bit-wide ALU, so 32 1-bit ALUs are connected to create the desired ALU. Table I shows the operations performed by the designed ALU.

To analyse the effect of the new clock gated approach used in the LFSR, the newly designed circuit is connected as an input to the 32 bit ALU circuit. Although the power reduction is not considerable, some power is saved in the ALU. For further reduction, separate power saving modes may be applied in the ALU circuitry itself, independently of the LFSR.

Additional power reduction can be obtained by using power saving modes for the ALU, whose state is driven to either 'Sleep' or 'Active' depending on whether its result can change in the next clock cycle. If the input applied to the ALU (other than the 32 bit word generated by the LFSR) remains the same during the next clock cycle, the ALU retains its result, since it is not subject to any change, and the state of the ALU is 'Sleep'. The operation to be performed is determined by the word on the select lines. If another operation is chosen for the next clock cycle, the ALU enters the 'Active' state, since the select inputs have changed.
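One way to read the Sleep/Active behaviour is as a wrapper that re-evaluates the ALU only when the monitored inputs change. The sketch below is an interpretation, not the authors' implementation: the class name `PowerSavingALU`, its attributes, and the choice to monitor the non-LFSR operand together with the select word are all assumptions drawn from the paragraph above.

```python
from typing import Callable, Optional, Tuple


class PowerSavingALU:
    """Holds the previous result while the monitored inputs are unchanged."""

    def __init__(self, compute: Callable[[int, int, int], int]) -> None:
        self._compute = compute
        self._last_monitored: Optional[Tuple[int, int]] = None
        self._result = 0
        self.state = "Sleep"
        self.evaluations = 0  # counts how often the logic actually switches

    def clock(self, lfsr_word: int, operand_b: int, select: int) -> int:
        # Monitored inputs: everything except the LFSR word (assumption
        # taken from the text; the LFSR operand is not compared).
        monitored = (operand_b, select)
        if monitored == self._last_monitored:
            self.state = "Sleep"          # retain the stored result
        else:
            self.state = "Active"         # inputs changed: evaluate again
            self._result = self._compute(lfsr_word, operand_b, select)
            self._last_monitored = monitored
            self.evaluations += 1
        return self._result


# Hypothetical two-operation ALU used only to exercise the wrapper.
alu = PowerSavingALU(lambda a, b, s: (a & b) if s else (a | b))
r1 = alu.clock(0b1100, 0b1010, 1)   # first cycle: Active
r2 = alu.clock(0b1111, 0b1010, 1)   # same operand and select: Sleep
r3 = alu.clock(0b1111, 0b1010, 0)   # select changed: Active
```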

Fig. 6 shows the designed ALU. Fig. 6(a) illustrates the 1 bit ALU, which is replicated to make the 32 bit ALU shown in Fig. 6(b). A schematic of the 32 bit ALU is shown in Fig. 6(c).

| S0 | S1 | S2 | Operation  |
|----|----|----|------------|
| 0  | 0  | 0  | transfer a |
| 0  | 0  | 1  | not a      |
| 0  | 1  | 0  | a or b     |
| 0  | 1  | 1  | a nor b    |
| 1  | 0  | 0  | a xor b    |
| 1  | 0  | 1  | a and b    |
| 1  | 1  | 0  | not b      |
| 1  | 1  | 1  | transfer b |

#### TABLE I ALU OPERATIONS
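The operations in Table I can be expressed as a compact behavioural model. This is a sketch in Python rather than the VHDL used in the paper; the function name `alu` and the treatment of S0 as the most significant select bit are assumptions based on the column order of the table.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def alu(a: int, b: int, s0: int, s1: int, s2: int) -> int:
    """Return the 32-bit result selected by (S0, S1, S2) as in Table I."""
    select = (s0 << 2) | (s1 << 1) | s2
    operations = {
        0b000: a,          # transfer a
        0b001: ~a,         # not a
        0b010: a | b,      # a or b
        0b011: ~(a | b),   # a nor b
        0b100: a ^ b,      # a xor b
        0b101: a & b,      # a and b
        0b110: ~b,         # not b
        0b111: b,          # transfer b
    }
    return operations[select] & MASK32


result = alu(0xF0F0F0F0, 0x0FF00FF0, 1, 0, 0)  # a xor b
```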

### V. RESULTS

#### A. Simulation of LFSR

Fig. 7(a) shows the waveforms of the LFSR obtained during simulation. Unlike the traditional design, it shows reduced clock utilization. Flip flops one to four are fed by child clock 1, flip flops five to eight by clock 2, and so on, so a total of eight clocks are required. Some of these clocks are idle at certain points during the overall simulation period, depending on the toggling of the individual flip flops. Masking the clock signals in this way eliminates the need to supply the master clock all the time.

#### B. Simulation of ALU

Fig. 7(b) shows the simulation waveforms of the 32 bit ALU with the LFSR connected as one of its inputs. The simulations were run after replacing the ICG in the clock gating circuitry with the new clock generation circuit and applying the power saving modes to the ALU. Table III shows the comparison of the ALU with the ordinary LFSR and with the DDCG LFSR.




Table IV shows the comparison of the ALU with DDCG LFSR and that with modified LFSR. Table V shows the power results of ALU with power saving modes applied compared to the modified LFSR.



Fig. 7. Simulation waveforms of (a) the LFSR and (b) the ALU

#### C. Power consumption of LFSR

TABLE II

POWER CONSUMPTION OF VARIOUS LFSRS

| Power consumption (mW) | F=50 MHz<br>LFSR with ordinary clock gating | F=50 MHz<br>DDCG LFSR | F=50 MHz<br>LFSR with new gated approach | F=100 MHz<br>LFSR with ordinary clock gating | F=100 MHz<br>DDCG LFSR | F=100 MHz<br>LFSR with new gated approach | F=1000 MHz<br>LFSR with ordinary clock gating | F=1000 MHz<br>DDCG LFSR | F=1000 MHz<br>LFSR with new gated approach |
|---|---|---|---|---|---|---|---|---|---|
| Clock power   | 1.27 | 0.38 | 0.33 | 2.54 | 0.76 | 0.66 | 25.41 | 7.59 | 6.64 |
| Dynamic power | 7    | 12   | 14   | 27   | 25   | 28   | 1894  | 452  | 278  |
| Total power   | 50   | 54   | 56   | 69   | 68   | 70   | 1942  | 496  | 321  |
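The relative savings implied by the LFSR power table can be worked out directly from its 1000 MHz total-power entries (1942, 496 and 321 mW). The helper name `percent_reduction` and the dictionary keys are hypothetical; only the three power values come from the table.

```python
def percent_reduction(baseline_mw: float, improved_mw: float) -> float:
    """Percentage by which `improved_mw` is lower than `baseline_mw`."""
    return 100.0 * (baseline_mw - improved_mw) / baseline_mw


# Total power at F = 1000 MHz, in mW, taken from the LFSR power table.
total_power_1000mhz = {
    "ordinary": 1942,   # LFSR with ordinary clock gating
    "ddcg": 496,        # DDCG LFSR
    "new_gated": 321,   # LFSR with new gated approach
}

ddcg_vs_ordinary = percent_reduction(
    total_power_1000mhz["ordinary"], total_power_1000mhz["ddcg"]
)  # about 74.5 %
new_vs_ordinary = percent_reduction(
    total_power_1000mhz["ordinary"], total_power_1000mhz["new_gated"]
)  # about 83.5 %
new_vs_ddcg = percent_reduction(
    total_power_1000mhz["ddcg"], total_power_1000mhz["new_gated"]
)  # about 35.3 %
```

The same helper applied to the 50 MHz and 100 MHz columns shows that the total-power benefit of gating appears mainly at the highest frequency, where dynamic power dominates the total.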

#### D. Power comparison of ALU

TABLE III

POWER CONSUMPTION OF ALU USING ORDINARY CLOCK GATED LFSR AND DDCG LFSR

| Power (mW) | F=50 MHz<br>ALU with ordinary clock gated LFSR | F=50 MHz<br>ALU with DDCG LFSR | F=100 MHz<br>ALU with ordinary clock gated LFSR | F=100 MHz<br>ALU with DDCG LFSR | F=1000 MHz<br>ALU with ordinary clock gated LFSR | F=1000 MHz<br>ALU with DDCG LFSR |
|---|---|---|---|---|---|---|
| Clock domain power | 0.70 | 0.35 | 1.39 | 0.69 | 13.91 | 6.93 |
| Dynamic power      | 19   | 14   | 38   | 29   | 381   | 289  |
| Quiescent power    | 42   | 42   | 42   | 42   | 43    | 43   |
| Total power        | 61   | 57   | 81   | 71   | 424   | 332  |

TABLE IV

POWER CONSUMPTION OF ALU USING DDCG LFSR AND MODIFIED LFSR

| Power consumption (mW) | F=50 MHz<br>ALU with DDCG LFSR | F=50 MHz<br>ALU with modified LFSR | F=100 MHz<br>ALU with DDCG LFSR | F=100 MHz<br>ALU with modified LFSR | F=1 GHz<br>ALU with DDCG LFSR | F=1 GHz<br>ALU with modified LFSR | F=2 GHz<br>ALU with DDCG LFSR | F=2 GHz<br>ALU with modified LFSR |
|---|---|---|---|---|---|---|---|---|
| Clock domain power | 0.35  | 0.30  | 0.69  | 0.59  | 6.93   | 5.95   | 13.85 | 11.34 |
| Dynamic power      | 14.47 | 14.14 | 28.95 | 28.88 | 289.39 | 288.10 | 572   | 561   |
| Quiescent power    | 42.42 | 42.42 | 42.45 | 42.45 | 43.09  | 43.09  | 44    | 44    |
| Total power        | 56.89 | 56.86 |       | 71.33 | 332.39 | 331.19 | 616   | 604   |

TABLE V

POWER CONSUMPTION OF ALU USING MODIFIED LFSR AND ALU WITH POWER SAVING MODES APPLIED

| Power consumption (mW) | F=50 MHz<br>ALU with modified LFSR | F=50 MHz<br>ALU with power saving modes applied | F=100 MHz<br>ALU with modified LFSR | F=100 MHz<br>ALU with power saving modes applied | F=1000 MHz<br>ALU with modified LFSR | F=1000 MHz<br>ALU with power saving modes applied |
|---|---|---|---|---|---|---|
| Dynamic power | 14 | 5  | 29 | 11 | 288 | 109 |
| Total power   | 57 | 48 | 71 | 53 | 332 | 152 |

### **VI. CONCLUSION**

This paper studied the problem of grouping FFs for joint clocking by a common gater to yield maximal dynamic power savings in a 32 bit LFSR circuit. To evaluate the power reduction obtained by applying DDCG to the LFSR, we compared the power consumption of a 32 bit LFSR with traditional clock gating against that of a 32 bit LFSR with DDCG, for the same input vector and the same clock cycles. The VHDL code of the former LFSR was simulated in Xilinx 14.2 ISE Navigator; the VHDL code of the latter was then simulated and synthesized, and the power was obtained using Xilinx XPower Analyzer. The dynamic power consumption of the DDCG LFSR can be reduced further by eliminating unnecessary switching of the latch in the clock gater circuitry. The proposed LFSR is found to be more power efficient than both an ordinary LFSR and a DDCG LFSR. The LFSR realized using the data driven clock gating approach is used as an input register for a 32 bit ALU, providing a 32 bit word as one of its operands. Additional power reduction for the ALU is obtained by applying power saving modes to it. The designed ALU draws less power than an ALU with an ordinary LFSR or a DDCG LFSR at its input. As an extension of this work, the proposed LFSR can also be applied to other circuits, such as pseudo random number generators, to obtain better performance results. Dynamic power could also be reduced by lowering the clock capacitance instead of focusing on the clock frequency as done in this paper, since power consumption has a linear relationship with clock capacitance as well.




#### REFERENCES

[1] M. Igarashi, K. Usami, K. Nogami, F. Minami, Y. Kawasaki, T. Aoki, M. Takano, C. Misuno, T. Ishikawa, M. Kanazawa, S. Sonoda, M. Ichida, N. Hatanaka, "A Low-Power Design Method using Multiple Supply Voltages", ISLPED-97: ACM/IEEE International Symposium on Low-Power Electronics and Design, pp. 36-41, Monterey, CA, August 1997.

[2] H. Zhang, J. Rabaey, "Low-Swing Interconnect Interface Circuits", ISLPED-98: ACM/IEEE International Symposium on Low-Power Electronics and Design, pp. 161-166, Monterey, CA, August 1998.

[3] Y.-T. Chang, C.-C. Hsu, M. P.-H. Lin, Y.-W. Tsai, and S.-F. Chen, "Post-placement power optimization with multi-bit flip-flops," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2010, pp. 218–223.

[4] S. Wimer and I. Koren, "Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating", IEEE Trans. Very Large Scale Integr. (VLSI) Syst.

[5] S. Wimer and I. Koren, "The Optimal fan-out of clock network for power minimization by adaptive gating", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1772–1780, Oct. 2012.

[6] W. Aloisi and R. Mita, "Gated-clock design of linear-feedback shift registers," IEEE Trans. Circuits Syst., II, Brief Papers, vol. 55, no. 5, pp. 546–550, Jun. 2008.

[7] Jagrit Kathuria, M. Ayoubkhan, Arti Noor. "A Review of Clock Gating Techniques", MIT International Journal of Electronics and Communication Engineering Vol. 1 No. 2 Aug 2011 pp 106-114.