Keywords
Double edge triggering, low power, implicit clock pulse, clock branch sharing, flip-flop.
INTRODUCTION
The clock system, which consists of the clock distribution network and the timing elements (flip-flops and latches), is one of the most power-consuming components in a VLSI system [1]. It accounts for approximately 30% to 60% of the total power dissipated in a system. Reducing the power consumed by flip-flops therefore has a significant impact on total power consumption. In common digital VLSI circuits, the sources of power dissipation are switching power ($P_{switching}$), short-circuit power ($P_{shortcircuit}$), static power ($P_{static}$), and leakage power ($P_{leakage}$) [4]. The following equation relates the total power consumption ($P_{tot}$) to these four components.
$$P_{tot} = P_{switching} + P_{shortcircuit} + P_{static} + P_{leakage} \qquad (1)$$
The two main ways to reduce this power consumption are voltage scaling and double edge triggering. Voltage scaling is the most effective, since switching power is proportional to the square of the supply voltage: $P = C_L V_{dd}^2 f_{clk}$, where $C_L$ is the load capacitance, $V_{dd}$ is the supply voltage, and $f_{clk}$ is the clock frequency [7]. However, supply voltage scaling is accompanied by threshold voltage scaling, which can cause leakage to increase exponentially. Double-edge triggered clocking, on the other hand, can save up to half of the power in the clock distribution network, which lowers total power consumption. In double edge triggering, a flip-flop responds to both the positive (0 to 1) and the negative (1 to 0) clock edge, so the clock frequency can be halved for the same data throughput. This paper adopts the second method, double edge triggering, to implement a flip-flop based on the clock branch sharing implicit pulse (CBS_ip) scheme and compares it with existing double edge triggered flip-flops.
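The scaling behaviour of the switching-power formula above can be checked numerically. The following is a minimal sketch; the capacitance, supply voltage, and frequency values are hypothetical and chosen only for illustration, not taken from the simulations in this paper.

```python
def dynamic_power(c_load, vdd, f_clk):
    """Switching power P = C_L * Vdd^2 * f_clk, in watts."""
    return c_load * vdd ** 2 * f_clk


# Hypothetical operating point (illustrative values only).
C_CLK = 10e-12   # switched clock capacitance in farads
VDD = 1.8        # supply voltage in volts
F_SET = 200e6    # single-edge clock frequency in hertz

p_set = dynamic_power(C_CLK, VDD, F_SET)
# Double edge triggering: same throughput at half the clock frequency.
p_det = dynamic_power(C_CLK, VDD, F_SET / 2)
# Voltage scaling: a 10% lower supply at the original frequency.
p_low_v = dynamic_power(C_CLK, 0.9 * VDD, F_SET)

print(f"DET / SET clock power ratio:   {p_det / p_set:.2f}")
print(f"0.9*Vdd / Vdd power ratio:     {p_low_v / p_set:.2f}")
```

Halving the frequency halves the switching power only if the switched capacitance stays the same, which is why the clock load of a double edge triggered flip-flop matters.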
TECHNIQUES FOR IMPLEMENTING DOUBLE EDGE TRIGGERED FLIP-FLOPS
Most double edge triggered flip-flops (DEFFs) are derived from single edge triggered flip-flop (SEFF) designs. The main SEFF types are the traditional master-slave FF, the sense amplifier based FF, and the pulse triggered FF. The first two have two stages and are characterized by a positive setup time, which causes large D-Q delays. The pulse triggered FF, in contrast, reduces the two stages to a single stage and is characterized by the soft edge property. Pulsed latches have fewer clocked transistors and hence lower power consumption [3]. Pulse triggered flip-flops are classified into two types: explicit pulsed FFs (ep-FF) and implicit pulsed FFs (ip-FF).
This paper analyzes the various categories of SEFFs and DEFFs in terms of their clock pulse generating schemes and their data latch schemes. A DEFF design generally uses more clocked transistors than an SEFF design. However, it should not increase the clock load too much: a DEFF design should aim to save energy both in the clock distribution network (by halving the frequency) and in the flip-flops themselves. It is preferable to reduce the circuit's clock load by minimizing the number of clocked transistors. Furthermore, from equation (1), circuits with reduced switching activity are preferable. Low swing capability is also helpful, since it further reduces the voltage on the clock distribution network. Because voltage scaling reduces power efficiently, flip-flops suited to cluster voltage scaling (CVS) systems are also preferred. The techniques for implementing double edge triggered flip-flops are the conventional master-slave scheme, the explicit pulse triggered scheme, and the implicit pulse triggered scheme. The implicit pulse triggered schemes, in turn, include the symmetric pulse generator (SPGFF) scheme, the conditional precharge (DECPFF) scheme, and the clock branch sharing implicit pulse (CBS_ip) scheme [1].
REVIEW OF DETFF
When the single edge triggered (SET) clocking strategy is replaced by the double edge triggered (DET) strategy, the performance of the DEFF must be comparable to that of the original SEFF (Fig. 1) in order to exploit the power savings from the halved clock frequency. If the clock load of a DEFF is much larger than that of the SEFF, the power saved by reducing the clock frequency may be cancelled by the increase in switched capacitance. The clock load is therefore a crucial performance parameter of a DEFF. This section reviews state-of-the-art DEFFs and examines the characteristics that affect their performance and energy consumption.
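The break-even point implied above can be made explicit with a short derivation. It is only a sketch: it considers clock switching power alone, assumes the same supply voltage and data throughput for both designs, and introduces $C_{SET}$ and $C_{DET}$ (symbols not used elsewhere in this paper) for the total switched clock capacitance of the SET and DET systems, with $f$ the SET clock frequency.

$$P_{clk,SET} = C_{SET} V_{dd}^2 f, \qquad P_{clk,DET} = C_{DET} V_{dd}^2 \frac{f}{2}$$

$$\frac{P_{clk,DET}}{P_{clk,SET}} = \frac{C_{DET}}{2\,C_{SET}} < 1 \iff C_{DET} < 2\,C_{SET}$$

Under these assumptions, the DET scheme saves clock power only while its switched clock capacitance stays below twice that of the SET scheme.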
A. Latch-MUX
1) Transmission-Gate Latch-MUX:
The transmission-gate latch-MUX (TGLM) [6] is the dual-edge counterpart of the single-edge transmission-gate master-slave latch (SE-TGMS) [6]. It requires two complementary clock phases and is the straightforward implementation of the latch-MUX structure using transmission-gate (TG) latches. The clock load of the TGLM is large, since each of Clk and CKD drives twice as many large transistors as in the TGMS latch. Thus, even though the TGLM offers a good energy-delay tradeoff, its large clock load may offset the benefit of operating at a reduced clock frequency.
2) C2MOS Latch-MUX:
The C2MOS latch-MUX (C2MOSLM) [6], Fig. 2(a), is the dual-edge version of the C2MOS master-slave latch. The latch used in the C2MOSLM is the conventional clocked CMOS latch. The multiplexer consists of two clocked CMOS inverters, high-Z-wired at the output, and a buffer inverter. While CLK=0, the forward path of the transparent latch (M1-M2), the feedback path of the opaque latch (M9-M10), and the multiplexer path (M7-M8) are ON. Similarly, while CLK=1, the forward path of the transparent latch (M11-M12), the feedback path of the opaque latch (M3-M4), and the path of the multiplexer – are ON. The C2MOSLM exploits this property of the latch-MUX structure to share clocked transistors. In Fig. 2(a), a single pair of clocked transistors (M14/M15 or M13/M16) serves the forward path of one latch, the feedback path of the other latch, and the multiplexer at the same time. This transistor sharing greatly reduces clock load and power consumption without compromising performance.
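The latch-MUX principle described above can be sketched as a purely behavioral model: two level-sensitive latches, one transparent in each clock phase, and a multiplexer that always selects the latch that is currently opaque. The class and variable names are hypothetical, and the model ignores transistor-level details such as the shared clocked transistors and all delays.

```python
class LatchMuxDEFF:
    """Behavioral latch-MUX double edge triggered flip-flop (no timing)."""

    def __init__(self):
        self.latch_low = 0   # transparent while clk == 0
        self.latch_high = 0  # transparent while clk == 1

    def step(self, clk, d):
        """Apply one (clk, d) sample and return the output Q."""
        if clk == 0:
            self.latch_low = d    # follows D during the low phase
        else:
            self.latch_high = d   # follows D during the high phase
        # The multiplexer passes the opaque latch, which holds the value
        # of D sampled just before the most recent clock edge.
        return self.latch_high if clk == 0 else self.latch_low


ff = LatchMuxDEFF()
samples = [(0, 1), (0, 1), (1, 0), (1, 0), (0, 1), (0, 0), (1, 1), (1, 1)]
outputs = [ff.step(clk, d) for clk, d in samples]
print(outputs)  # [0, 0, 1, 1, 0, 0, 0, 0]
```

In this trace the output changes only at the samples where the clock toggles, on rising and falling edges alike, and takes the value D had immediately before that edge.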
B. Pulsed Latches
The DE-TGPL [6], Fig. 3, is the DET counterpart of the single-edge transmission-gate pulsed latch (TGPL). It consists of a clock pulse generator and a TG latch. The clock pulse generator creates a short pulse after each clock edge. During each pulse, the TG latch becomes transparent and captures the input data. At all other times, the latch is opaque and the output cannot change. The pulse after each clock edge is obtained by applying the XOR/XNOR function to the input clock and the delayed clock. The delayed clock is obtained using an odd number of inverters, so the clock and the delayed clock are at the same logic level only for a short time after each edge of the clock. In Fig. 3, a pass-transistor XOR gate produces a short negative pulse at node CP and a short positive pulse at node CN after each clock edge.
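The pulse generation described above can be illustrated with a small discrete-time model. This is a sketch under simplifying assumptions: the inverter chain is modeled as a fixed delay of `delay` samples followed by an inversion, gates switch instantly, and the function name and waveform are hypothetical.

```python
def pulse_after_each_edge(clk, delay):
    """Return the positive pulse train CN for a sampled clock waveform.

    The delayed clock comes from an odd number of inverters, so it is an
    inverted copy of clk shifted by `delay` samples. CN is the XNOR of clk
    and that delayed clock: high only while both are at the same level.
    """
    pulses = []
    for t in range(len(clk)):
        past = clk[t - delay] if t >= delay else clk[0]
        delayed_inverted = 1 - past
        pulses.append(1 if clk[t] == delayed_inverted else 0)
    return pulses


clk = [0] * 4 + [1] * 6 + [0] * 6      # one rising and one falling edge
cn = pulse_after_each_edge(clk, delay=2)
cp = [1 - x for x in cn]               # complementary negative pulse

print(cn)  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

The pulse width equals the modeled delay, and one pulse follows the rising edge and one follows the falling edge, which is what makes the latch double edge triggered.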
The main advantage of the DE-TGPL over the DET latch-MUXs is its speed [6]. An indication of its timing overhead is that the D-to-Q path traverses only a single TG and an inverter. However, the timing is somewhat degraded by the delay needed to generate the inverted clock pulse CN from CP, which causes an asymmetry between the low-to-high and high-to-low setup and hold times. The power consumption of the DE-TGPL is dominated by clock activity, due to the high switching activity of the clock pulse generator. In addition, both pass gates in the XOR gate are simultaneously open for a short time whenever the clock switches, so contention occurs at each clock edge and increases overall power consumption. The clock load of the DE-TGPL consists of the input load of an unbuffered pass-transistor XOR gate. It is therefore considerably larger than the clock load of the corresponding SET TGPL, which consists of one static NAND gate and one inverter [6].
The other two advanced DET flip-flops described below allow the clock frequency to be reduced while keeping timing overhead and clock load comparable to those of the conventional SETSE.
2) DET Conditional Precharge Flip-Flop
One approach for obtaining a DET flip-flop from a transparency-window-based SET flip-flop is to generate a transparency window after each clock edge. The simplest way to do this is to generate the XOR of the clock and the delayed clock. Another method, logically equivalent but simpler to implement, generates a signal CKD that switches low on the rising edge of the delayed clock CK2 and switches high on the falling edge of the clock Clk. The logical AND of CKD and Clk gives the transparency window after the rising edge of the clock. Similarly, the logical AND of CKD and the four-inverter-delayed clock CK4 gives the transparency window after the falling edge of the clock. Applying this method to the SET CPFF [6] yields the DE-CPFF shown in Fig. 4. The internal node S’ evaluates (discharges) during these transparency windows if the input D=1. Outside the transparency windows, the path from node S’ to ground through transistors M1, M2, and M3 is OFF, and either M7 or the series pair M8, M9 is ON. Thus, node S’ takes the value D NAND Q.
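The window generation described above can be sketched in a discrete-time model. Two assumptions are made here that the text does not state: CKD is modeled as the NAND of Clk and CK2, which matches the described behavior (low after CK2 rises, high after Clk falls), and CK2 and CK4 are modeled as Clk delayed by two and four samples with zero gate delay elsewhere. The helper names are hypothetical.

```python
def delayed(signal, n):
    """Copy of `signal` shifted right by n samples (initial value held)."""
    return [signal[max(t - n, 0)] for t in range(len(signal))]


def transparency_windows(clk):
    """Return (window after rising edge, window after falling edge)."""
    ck2 = delayed(clk, 2)   # clock after two inverter delays
    ck4 = delayed(clk, 4)   # clock after four inverter delays
    # Assumed model of CKD: NAND(Clk, CK2).
    ckd = [1 - (c & c2) for c, c2 in zip(clk, ck2)]
    win_rise = [d & c for d, c in zip(ckd, clk)]    # CKD AND Clk
    win_fall = [d & c4 for d, c4 in zip(ckd, ck4)]  # CKD AND CK4
    return win_rise, win_fall


clk = [0] * 6 + [1] * 8 + [0] * 8   # rising edge at t=6, falling at t=14
win_rise, win_fall = transparency_windows(clk)
window = [r | f for r, f in zip(win_rise, win_fall)]

print([t for t, w in enumerate(window) if w])  # [6, 7, 14, 15, 16, 17]
```

In this idealized model the window after the rising edge lasts two delay units and the window after the falling edge lasts four; in a real circuit the widths depend on the actual gate delays.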
During the transparency windows, node S’ is conditionally evaluated based on the previous level of the output Q. If Q was low in the previous clock half-cycle, node S’ was precharged high. In the transparency window, node S’ switches low if D is high (either the path M1-M3-M4 or the path M2-M3-M4 is ON). As a result, Q switches high via transistor M16. If the input D is low, node S’ remains high and Q remains low. If Q was high in the previous clock half-cycle, node S’ took the value of the inverted input D (M4, M5, and M6). When a transparency window arrives, a high level on S’ causes Q to switch low (paths M11-M13-M14 and M12-M13-M14). A low level on S’ has no effect on Q, as it was already high. Once node S’ is low, it can return to the high level only if the input is low. In other words, node S’ does not go through a precharge-evaluate sequence in every clock cycle, so the internal power that would be spent on the redundant precharge in the case D=Q=1 is saved. Consequently, this flip-flop has the conditional precharge feature and statistically reduces power consumption for low input activity.
DOUBLE EDGE CLOCK BRANCH SHARING IMPLICIT PULSED FLIP-FLOPS (CBS_IP)
The conventional DEFFs duplicate the area and the load on the inputs [1]. Explicit pulsed DEFFs use external clock pulse generators, which increase power. In addition, explicit pulsed DEFFs cannot work with dynamic logic. SPGFF uses implicit pulsing; however, it has four internal redundant switching nodes. DECPFF, unlike SPGFF, eliminates the redundant switching activity, but its number of clocked transistors reaches 21 and its clock branch duplicating structure is complex.
To implement double-edge clock triggering efficiently in an implicit pulsed environment, and to overcome the large clock load of previous implicit pulsed flip-flops, a novel clock branch sharing topology is used. The sharing concept is similar to that of the single transistor clocked FF and of another clock branch sharing flip-flop. Its advantage is that it reduces the number of transistors required to implement the clocking branch of the double-edge triggered implicit pulsed flip-flop; without sharing, the number of clocked transistors would be much larger. Because clocked transistors consume a large amount of power, reducing their number is an efficient way to decrease power. The CBS_ip uses the pseudo-nMOS logic resulting from the conditional discharge technique.
SETFF AND DETFF SIMULATION RESULTS AND PERFORMANCE COMPARISON
The simulation results are obtained with the PSPICE simulation tool (version 9.2) for a 0.18 µm CMOS technology [2] at room temperature. Each design is simulated at the schematic level in PSPICE A/D, and the results are verified through transient analysis.
First, simulation results are obtained for the SEFF [Fig. 1(b)], C2MOS latch-MUX [Fig. 2(b)], DE-CPFF [Fig. 5], and CBS_ip DEFF [Fig. 6(b)] designs. The power consumption of each design is extracted from its output file and shown in Table 1, where the designs are compared in terms of number of transistors, clock frequency, and power consumption; the comparison is plotted in Fig. 8. Second, 2-bit serial-in serial-out (SISO) shift registers are built from the C2MOS latch-MUX [Fig. 2(c)] and the CBS_ip DEFF [Fig. 7], and their outputs are verified. Their performance is compared in terms of number of transistors and power consumption in Table 2. The results show that the CBS_ip DEFF design consumes less power than the C2MOS latch-MUX design, as illustrated in Fig. 9.
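The expected logical behavior of the 2-bit SISO shift register can be sketched with a behavioral model before any circuit-level simulation. This is not the PSPICE setup used above; the class and function names are hypothetical, and the model captures only the edge-triggered data movement, not timing or power.

```python
class DoubleEdgeFF:
    """Ideal double edge triggered flip-flop: captures D on every clock toggle."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if clk != self._prev_clk:   # rising or falling edge
            self.q = d
        self._prev_clk = clk
        return self.q


def siso_shift(bits, stages=2):
    """Shift `bits` through a chain of DEFFs, one bit per clock edge."""
    ffs = [DoubleEdgeFF() for _ in range(stages)]
    clk = 0
    serial_out = []
    for b in bits:
        clk ^= 1  # every toggle of the clock is an active edge
        # Sample all stage inputs before updating, as real flip-flops do.
        inputs = [b] + [ff.q for ff in ffs[:-1]]
        for ff, d in zip(ffs, inputs):
            ff.tick(clk, d)
        serial_out.append(ffs[-1].q)
    return serial_out


print(siso_shift([1, 0, 1, 1, 0, 0]))  # [0, 1, 0, 1, 1, 0]
```

Six bits are shifted using six clock edges, that is, three clock cycles, whereas a single edge triggered register would need six cycles for the same data.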
CONCLUSION
This paper surveys the implementation of a low-power shift register using double edge triggered flip-flops and compares various existing designs. The flip-flops in the proposed shift register are designed using clock branch sharing. The existing double edge triggered flip-flops considered are the transmission-gate latch-MUX, the C2MOS latch-MUX, the dual-edge transmission-gate pulsed latch (DE-TGPL), the DE-CPFF, and the CBS_ip DEFF. The simulation results in Table 2 show that the proposed shift register built with CBS_ip consumes less power than the other state-of-the-art double-edge triggered flip-flop designs. Since the proposed design has fewer clocked transistors and the lowest power, it is suitable for high-performance and low-power environments.
Tables at a glance
Table 1
Figures at a glance
Figure 1a, Figure 1b, Figure 2a, Figure 2b, Figure 2c, Figure 3, Figure 4, Figure 5, Figure 6a, Figure 6b, Figure 7, Figure 8, Figure 9
References
- [1] Peiyi Zhao, Jason McNeely, Pradeep Golconda, Magdy A. Bayoumi, Robert A. Barcenas, and Weidong Kuang, “Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 3, pp. 338–345, March 2007.
- [2] David A. Hodges, Horace G. Jackson, and Resve A. Saleh, “Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology,” McGraw-Hill, 2nd edition, 2003.
- [3] Weste N and Harris D, “CMOS VLSI Design,” 3rd edition, Addison Wesley, 2007.
- [4] Kim C C and Kang S, “A low-swing clock double edge-triggered flip-flop,” IEEE J. Solid-State Circuits, vol. 37, no. 5, pp. 648–652, May 2002.
- [5] Chung W and Sachdev M, “A comparative analysis of low power low-voltage dual-edge-triggered flip-flops,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 6, pp. 913–918, Dec. 2002.
- [6] Nikola Nedovic and Vojin G. Oklobdzija, “Dual-edge triggered storage elements and clocking strategy for low-power systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 5, pp. 577–590, May 2005.