A Novel Clock Distribution Technology – Multisource Clock Tree System (MCTS)

Nikhil Patel

M.Tech Student [VLSI & EMBEDDED SYSTEM], Department of Electronics & Communication,
U.V.Patel College of Engineering, Ganpat University, Kherva, Mehsana, Gujarat, India.

ABSTRACT: The heart of any chip is its clock, which works inside the integrated circuit much as the heart does in the human body. Virtually every digital circuit relies on a highly accurate, highly stable, cyclical clock to control all data movement and processing performed by the various functional blocks on the chip. Clock network design is a critical task in the design of high-performance circuits because both the performance and the functionality of the circuit depend directly on the characteristics of the clock network. A Multisource Clock Tree System (MCTS) is a hybrid that combines the best aspects of a conventional clock tree and a clock mesh.

Keywords: Conventional Clock Tree, Clock mesh, Multisource Clock Tree System (MCTS), Mesh Fabric, On-chip variation (OCV)

I. INTRODUCTION

A Multisource Clock Tree System (MCTS) is a novel clock distribution technology that fills the gap between the conventional clock tree and the clock mesh. Clock mesh delivers the best possible clock frequency, skew, and OCV results, whereas a conventional clock tree delivers the lowest power consumption and the easiest design flow.

A multisource clock tree is a hybrid clock structure that combines the best aspects of the conventional clock tree and the clock mesh. It offers lower skew and better on-chip variation (OCV) performance than a conventional clock tree. It also offers lower clock-tree area and power and a shorter, easier flow than a clock-mesh implementation. A renewed emphasis on high-frequency clock design has heightened interest in the multisource clock-tree system (MCTS).

There are currently three methods of clock distribution for large, high-performance designs:

1) Conventional clock tree
2) Clock mesh
3) Multisource Clock Tree System (MCTS)

II. CONVENTIONAL CLOCK TREE

IC designers have traditionally used clock trees to distribute the clock signal evenly inside a chip. An external oscillator tuned to the desired operating frequency feeds a pin on the chip. There the clock is buffered into many “branches” that connect to all the “leaves” of the clock tree.

The physical realization of the clock tree network constrains the chip design and directly determines how well clock skew and jitter can be controlled. Clock skew is the difference in the arrival times of the clock signal at the various “leaves” (clock loads) on the branches of the clock tree. Ideally there would be no clock skew, but in practice it is a major design headache. Engineers meticulously place clock buffers and route clock signals across the entire chip. They then exhaustively simulate the timing of all clock signals over all temperatures, voltages, and process variations. Clock tree buffer and interconnect design and layout demand excruciating attention to detail so that clock skew is minimized and balanced across all branches of the clock tree. Clock jitter is a critical specification for timing components because excessive jitter can compromise system performance. Jitter consumes a portion of the available clock period, so 5 to 10% of a given clock cycle is “off limits”.
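
The skew and jitter budget described above can be made concrete with a short Python sketch. It assumes a simple additive model: the skew and a jitter allowance (a fraction of the period) are both subtracted from the clock period. The function names and all numbers are hypothetical illustrations, not values from any real design.

```python
from typing import Sequence


def clock_skew_ps(arrivals_ps: Sequence[float]) -> float:
    """Skew: spread between the earliest and latest clock arrival at the leaves."""
    return max(arrivals_ps) - min(arrivals_ps)


def usable_period_ps(freq_mhz: float, skew_ps: float, jitter_fraction: float) -> float:
    """Clock period left for logic after reserving skew and a jitter allowance.

    Assumes a simple additive budget: period - skew - jitter_fraction * period.
    """
    period_ps = 1e6 / freq_mhz            # e.g. 1000 MHz -> 1000 ps
    jitter_ps = jitter_fraction * period_ps
    return period_ps - skew_ps - jitter_ps


# Hypothetical leaf arrival times (ps) for a 1 GHz clock with a 5% jitter allowance.
arrivals = [412.0, 398.0, 405.0, 420.0]
skew = clock_skew_ps(arrivals)
print(skew)                                   # 22.0
print(usable_period_ps(1000.0, skew, 0.05))   # 928.0
```

With a 5% jitter allowance, 50 ps of the 1000 ps period is lost before any skew is counted.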

Copyright to IJARIEEIE www.ijareeie.com 2234
Further, the clock network affects overall chip power and area because of the large number of buffers and repeaters inserted during clock tree synthesis.

In a clock tree, all signals radiate out in branch-like fashion from the clock source (the tree root). This is why clock skew exists in clock trees. Each branch is made up of a unique chain of buffers and intermediate loads. It therefore exhibits delays and timing characteristics that may have little in common with those of the other branches.
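
To show why unique buffer chains produce skew, the following Python sketch models a tiny clock tree. Each node records its parent and the delay of the stage that drives it. A leaf's latency is then the sum of the stage delays along its own path. The tree shape, node names and delay values are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical tree description: node -> (parent, delay of the stage driving this node in ps).
# The root has no parent.
Tree = Dict[str, Tuple[Optional[str], float]]


def latency_ps(tree: Tree, node: str) -> float:
    """Insertion delay from the root to `node`: sum of stage delays along its unique path."""
    total = 0.0
    current: Optional[str] = node
    while current is not None:
        parent, delay = tree[current]
        total += delay
        current = parent
    return total


def tree_skew_ps(tree: Tree, leaves: Iterable[str]) -> float:
    latencies = [latency_ps(tree, leaf) for leaf in leaves]
    return max(latencies) - min(latencies)


tree: Tree = {
    "root": (None, 0.0),
    "bufA": ("root", 50.0),   # branch A
    "bufB": ("root", 55.0),   # branch B: a different buffer chain
    "ff1": ("bufA", 30.0),
    "ff2": ("bufA", 32.0),
    "ff3": ("bufB", 40.0),
}
print({ff: latency_ps(tree, ff) for ff in ("ff1", "ff2", "ff3")})
# {'ff1': 80.0, 'ff2': 82.0, 'ff3': 95.0}
print(tree_skew_ps(tree, ["ff1", "ff2", "ff3"]))  # 15.0
```

Leaves on the same branch (ff1, ff2) differ by only 2 ps. The leaf on the other branch (ff3) accounts for most of the 15 ps skew.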

III. CLOCK MESH

For many years, designers of ultra-high-performance microprocessors have used clock meshes instead of clock trees to improve performance. A clock mesh is a uniform metal grid laid across the chip to distribute all of the clock signals. Local connections to clock loads are available via a metal strap at every horizontal row or vertical column. The mesh also differs from a clock tree in the way it is driven by the incoming clock signal.

A clock mesh is driven by multiple drivers spread out across an interconnected "grid" of metal wires. The outputs of all the drivers are thus shorted together by the metal grid, ensuring that the entire clock mesh oscillates as a single entity instead of as multiple, independent branches as in a clock tree. The result is that clock skew is virtually eliminated using clock meshes and higher performance can be achieved.

As manufacturing processes shrink, the electrical characteristics of transistors vary substantially across the chip. For example, small variations can give clock buffers on the same chip slightly different electrical properties. These include variations in the number of dopant atoms in a diffused source or drain, in the thickness of gate oxide layers, or in the critical dimensions of transistors. Because such differences can cause clock skew through OCV, the clock branches require additional guard-banding to ensure proper operation.

Clock meshes are largely immune to OCV because the mesh itself is composed of nothing but metal wires. With no transistors in the mesh, there are no transistor characteristics to vary. Clock meshes therefore improve manufacturing yield. Higher yield means lower prices along with higher performance, and IC designers have long considered switching to clock meshes for exactly these reasons. However, clock meshes consume more power. The large metal grid presents a large capacitance that the clock buffers must drive, so the buffers consume a tremendous amount of power.
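
The power penalty follows from the standard dynamic-power relation for a clock net, P = C·V²·f. The activity factor is 1 because the clock charges and discharges its full capacitance once per cycle. The Python sketch below applies that relation. The capacitance values are hypothetical and chosen only to show the scaling. The model ignores short-circuit current between the shorted mesh drivers and leakage.

```python
def clock_power_w(cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Dynamic power of a clock net: P = C * V^2 * f.

    A clock net charges and discharges its full capacitance once per cycle,
    so the activity factor is 1. Short-circuit and leakage power are ignored.
    """
    return cap_f * vdd_v ** 2 * freq_hz


# Hypothetical switched capacitances, chosen only to illustrate the scaling.
tree_cap_f = 0.5e-9    # conventional clock tree
mesh_cap_f = 1.5e-9    # clock mesh: large metal grid plus its drivers
for name, cap in (("tree", tree_cap_f), ("mesh", mesh_cap_f)):
    print(name, round(clock_power_w(cap, 1.0, 1e9), 3), "W")
# tree 0.5 W
# mesh 1.5 W
```

At fixed voltage and frequency, clock power scales linearly with the capacitance the buffers must switch. Tripling the capacitance therefore triples the power.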

Clock meshes offer higher performance, reduced design-engineering requirements, and OCV tolerance. Because of their higher power consumption, however, they have not replaced clock trees in clock distribution networks.

IV. MULTISOURCE CLOCK TREE SYSTEM (MCTS)

A multisource clock tree system is a hybrid that combines the best aspects of a conventional clock tree and a pure clock mesh. It offers lower clock skew and better on-chip variation (OCV) performance than a conventional clock tree. It also offers lower clock-tree power and area and a shorter, easier flow than a pure clock-mesh implementation.

Multisource CTS represents a novel clock-distribution technology that fills the methodology gap between conventional Clock Tree and clock mesh. Whereas clock mesh delivers the best possible clock frequency, skew, and OCV results, and whereas conventional Clock Tree delivers the lowest power consumption and the easiest flow, multisource CTS offers a compromise between the two methods while favoring the OCV tolerant nature of pure clock mesh.

Benefits of Multisource CTS:

- Higher performance and lower skew than Conventional Clock Tree.
- Better OCV Tolerance than Conventional Clock Tree.
- Better multi-corner performance than Conventional Clock Tree.
- Less power consumption than pure clock mesh.
- Greater tolerance than pure clock mesh for irregular designs with high macro density.
- Faster and easier flow than pure clock mesh.
- Deeper clock-gating levels, enabling more complex power plans.

The many performance and flexibility benefits of multisource CTS make it a strong candidate for a broad set of design types. Several high-clock-frequency designs are being started and taped out with multisource CTS.

A multisource CTS design comprises three distinct structures:

A. Pre-mesh Clock Tree
B. Multisource Mesh Fabric
C. Moderately sized clock trees

A. PRE-MESH CLOCK TREE

Each buffer in the pre-mesh tree drives four other buffers because the pre-mesh topology is implemented using H-tree placement and routing. An H-tree structure provides a uniform, scalable, and predictable means of distributing the root clock over a large area. In addition, H-trees exhibit excellent corner-to-corner variation tolerance because of their balanced structure.
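
The balanced structure of an H-tree can be checked with a small geometric sketch in Python. It places a buffer at the centre of each H and recursively drives the four tips with an H of half the size. It then confirms two properties: every internal buffer has a fanout of four, and every leaf is reached through the same routed wire length. The class and function names and the dimensions are hypothetical. Equal wire length implies equal delay only if the buffers and loads at each level are identical, which this sketch assumes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HNode:
    """A buffer location in the pre-mesh H-tree."""
    x: float
    y: float
    children: List["HNode"] = field(default_factory=list)


def build_h_tree(x: float, y: float, half: float, levels: int) -> HNode:
    """Place a buffer at (x, y) that drives the four tips of an H of half-size `half`.

    Each level halves the H, so the tips of the last level form a uniform grid.
    """
    node = HNode(x, y)
    if levels == 0:
        return node
    for dx in (-half, half):
        for dy in (-half, half):
            node.children.append(build_h_tree(x + dx, y + dy, half / 2, levels - 1))
    return node


def leaf_path_lengths(node: HNode, acc: float = 0.0) -> List[float]:
    """Routed wire length from the root to every leaf (H segments are Manhattan)."""
    if not node.children:
        return [acc]
    lengths: List[float] = []
    for child in node.children:
        wire = abs(child.x - node.x) + abs(child.y - node.y)
        lengths.extend(leaf_path_lengths(child, acc + wire))
    return lengths


def fanouts(node: HNode) -> List[int]:
    """Fanout of every non-leaf buffer."""
    if not node.children:
        return []
    out = [len(node.children)]
    for child in node.children:
        out.extend(fanouts(child))
    return out


root = build_h_tree(0.0, 0.0, half=4.0, levels=3)
lengths = leaf_path_lengths(root)
print(len(lengths), set(lengths), set(fanouts(root)))  # 64 {14.0} {4}
```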

B. MULTISOURCE MESH FABRIC

The multisource mesh fabric resembles a power/ground or clock mesh fabric but is one to two orders of magnitude less dense. The coarse fabric smooths out any remaining clock arrival-time differences among the multiple H-tree buffers that directly drive it. As a result, the skew measured at the mesh plane is effectively zero.
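
The smoothing effect can be illustrated with a first-order Python model of one mesh spine. Each spine node is driven by one H-tree buffer through a driver conductance and is tied to its neighbours by a wire conductance. The model treats arrival times like node voltages in a purely resistive network, with capacitance ignored. Under that simplifying assumption, each node settles to a conductance-weighted average of its driver and its neighbours. The function name, conductances and arrival times are hypothetical. This is an illustration of the averaging principle, not a substitute for circuit simulation.

```python
from typing import List


def smoothed_arrivals(driver_ps: List[float], g_driver: float, g_wire: float,
                      iters: int = 2000) -> List[float]:
    """First-order estimate of arrival times along one mesh spine.

    Node i is driven through conductance g_driver by a buffer whose output
    arrives at driver_ps[i], and is tied to its neighbours by wire conductance
    g_wire. Treating arrival times like node voltages in this resistive network
    (capacitance ignored), each node settles to a conductance-weighted average
    of its driver and its neighbours; Jacobi iteration solves that system.
    """
    n = len(driver_ps)
    t = list(driver_ps)
    for _ in range(iters):
        new_t = []
        for i in range(n):
            num = g_driver * driver_ps[i]
            den = g_driver
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    num += g_wire * t[j]
                    den += g_wire
            new_t.append(num / den)
        t = new_t
    return t


drivers = [100.0, 110.0, 95.0, 105.0]   # hypothetical H-tree buffer arrivals (ps)
spine = smoothed_arrivals(drivers, g_driver=1.0, g_wire=50.0)
print(max(drivers) - min(drivers))          # 15.0
print(round(max(spine) - min(spine), 2))    # 0.1
```

In this model, a strong (low-resistance) fabric relative to the drivers shrinks a 15 ps driver spread to about 0.1 ps. The average arrival time stays the same.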

The fabric is also the lowest part of the multisource CTS topology that is shared by every sink in the design. Its conceptual position along the Z-axis determines the OCV tolerance of a multisource CTS design. The higher the mesh sits in the topology, the more the design behaves like conventional CTS and the lower its OCV tolerance. Conversely, the further down the mesh is pushed in the structure, the more the design behaves like a clock mesh and, as a result, the better its OCV tolerance.

C. MODERATELY SIZED CLOCK TREES

The multiple clock trees attached to the coarse mesh give the technology its name. As mentioned, designers can target a given level of OCV performance by choosing the depth of the clock trees. In clock mesh, the guideline is to restrict the buffer and clock-gating depth to one or, at most, two levels. Multisource CTS generally uses three to nine levels of buffers or clock gating. If more levels of clock gating become necessary, conventional CTS may be the natural choice.
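
The depth guidelines above can be encoded as a simple rule-of-thumb helper. The function name is hypothetical. The thresholds are the guidelines stated in the text, not hard limits of any tool. A real methodology choice also weighs power, floorplan complexity and flow effort.

```python
def suggest_clock_style(levels_below_distribution: int) -> str:
    """Map buffer/clock-gating depth below the shared distribution to a style.

    Encodes the rules of thumb in the text: clock mesh keeps one or at most two
    levels, multisource CTS generally uses three to nine, and deeper clock
    gating points toward conventional CTS.
    """
    if levels_below_distribution < 1:
        raise ValueError("depth must be at least one level")
    if levels_below_distribution <= 2:
        return "clock mesh"
    if levels_below_distribution <= 9:
        return "multisource CTS"
    return "conventional CTS"


for depth in (1, 2, 3, 9, 12):
    print(depth, suggest_clock_style(depth))
```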

The synthesis and optimization of the multiple clock trees leverage conventional CTS methods. One benefit of multisource CTS is that designers can take a “divide and conquer” approach. In this scenario, the root-to-mesh portion is timed with circuit simulation, and then the multiple clock trees are timed with standalone signoff timing engines or with the timer embedded in the place-and-route tool.

Mesh Fabric pitch determination

In a multisource CTS implementation, the intersections of the horizontal and vertical mesh spines become potential tap-point locations. For this style, the mesh pitch must be determined with the tap-point locations in mind.

OCV latency is used to establish the tap-point density. The goal is to find the minimum topology that meets the design’s OCV latency target, knowing that OCV latency decreases as the number of tap points increases. Timing analysis of the mesh fabric then validates the tap-point locations, and thus the mesh-fabric design.
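
The search for the minimum topology can be sketched in Python as a scan over square tap-point grids. The scan stops at the first grid whose OCV latency meets the target. This relies only on OCV latency decreasing as tap points are added. In a real flow, the latency for each candidate would come from timing analysis. Here, a hypothetical linear model stands in for it. The model makes OCV latency proportional to the worst-case sink-to-tap distance, and the die size and ps-per-µm coefficient are invented for illustration.

```python
from typing import Callable


def min_tap_grid(ocv_latency_ps: Callable[[int], float], target_ps: float,
                 max_n: int = 64) -> int:
    """Smallest n such that an n x n tap-point grid meets the OCV latency target.

    Relies on OCV latency decreasing as the number of tap points grows.
    """
    for n in range(1, max_n + 1):
        if ocv_latency_ps(n) <= target_ps:
            return n
    raise ValueError("no grid up to max_n meets the OCV latency target")


def toy_ocv_latency_ps(n: int, die_um: float = 4000.0, ps_per_um: float = 0.05) -> float:
    """Hypothetical model: OCV latency proportional to the worst-case sink-to-tap
    distance. With taps at the centres of an n x n array of cells, that Manhattan
    distance is one pitch (die_um / n)."""
    return ps_per_um * die_um / n


n = min_tap_grid(toy_ocv_latency_ps, target_ps=30.0)
print(n, n * n)  # 7 49
```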

Tap-point Determination and Sink Assignment

After the number and location of the tap points are determined, the sinks in each local area are attached to their tap point. Although multisource CTS tap-point clustering is based on geography, it is also influenced by the design’s hierarchy.
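
A minimal Python sketch of sink assignment attaches each sink to the tap point with the lowest cost. The cost is the Manhattan distance, reflecting geography. An optional penalty, added when the sink's hierarchy label differs from the tap's, lets hierarchy sway the choice. The penalty scheme, labels, coordinates and function name are hypothetical. Production tools use their own clustering heuristics.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def assign_sinks(sinks: Dict[str, Point], taps: Dict[str, Point],
                 sink_block: Optional[Dict[str, str]] = None,
                 tap_block: Optional[Dict[str, str]] = None,
                 penalty_um: float = 0.0) -> Dict[str, List[str]]:
    """Attach each sink to the tap point with the lowest cost.

    Cost is Manhattan distance (geography). If a sink and a tap carry different
    hierarchy labels, a penalty is added so that hierarchy can sway the choice.
    """
    sink_block = sink_block or {}
    tap_block = tap_block or {}
    clusters: Dict[str, List[str]] = {tap: [] for tap in taps}
    for sink, (sx, sy) in sinks.items():
        block = sink_block.get(sink)

        def cost(tap: str) -> float:
            tx, ty = taps[tap]
            d = abs(sx - tx) + abs(sy - ty)
            if block is not None and tap_block.get(tap) not in (None, block):
                d += penalty_um
            return d

        clusters[min(taps, key=cost)].append(sink)
    return clusters


taps = {"T0": (0.0, 0.0), "T1": (100.0, 0.0)}
sinks = {"a": (10.0, 5.0), "b": (90.0, 0.0), "c": (45.0, 0.0)}
print(assign_sinks(sinks, taps))
# {'T0': ['a', 'c'], 'T1': ['b']}
print(assign_sinks(sinks, taps, sink_block={"c": "cpu"},
                   tap_block={"T0": "gpu", "T1": "cpu"}, penalty_um=20.0))
# {'T0': ['a'], 'T1': ['b', 'c']}
```

Sink c is geographically closer to T0. Once it is labelled as belonging to the same hierarchy as T1, the penalty moves it to T1.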

Synthesize the multisource Trees

Once the sinks are associated with their tap points, the clock trees are synthesized using conventional CTS methods. The process starts by placing buffers at the tap points. The input pin of each such buffer attaches to the mesh fabric. Its output is the local clock root for one of the multiple clock trees below the mesh.

Subsequently, clock trees are compiled and optimized for skew. The multiple clock trees are balanced during compilation or as a post-processing step, per the designer’s preferred practice. After clock-tree synthesis and optimization, the clocks are routed and the design is ready for signoff timing analysis.
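
Balancing as a post-processing step can be sketched in Python as padding each faster tree with delay until it matches the slowest tree. The optional step models a hypothetical delay-cell granularity. Real delay cells come in discrete sizes, so rounding to the nearest step leaves some residual skew. The tree names, latencies and step size are illustrative assumptions.

```python
from typing import Dict, Tuple


def balance_trees(latency_ps: Dict[str, float],
                  step_ps: float = 0.0) -> Tuple[Dict[str, float], float]:
    """Pad faster trees with delay so every tree matches the slowest one.

    If step_ps > 0, padding is rounded to the nearest multiple of a hypothetical
    delay-cell step, which leaves some residual skew.
    Returns (padding per tree, residual skew after padding).
    """
    target = max(latency_ps.values())
    padding: Dict[str, float] = {}
    for name, lat in latency_ps.items():
        pad = target - lat
        if step_ps > 0:
            pad = round(pad / step_ps) * step_ps
        padding[name] = pad
    balanced = [lat + padding[name] for name, lat in latency_ps.items()]
    return padding, max(balanced) - min(balanced)


trees = {"T0": 180.0, "T1": 195.0, "T2": 172.0}   # hypothetical tree latencies (ps)
print(balance_trees(trees))               # ({'T0': 15.0, 'T1': 0.0, 'T2': 23.0}, 0.0)
print(balance_trees(trees, step_ps=5.0))  # ({'T0': 15.0, 'T1': 0.0, 'T2': 25.0}, 2.0)
```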

Timing Analysis

While static timing tools alone aren’t well-suited to analyse parallel drive networks, the combination of Spice-accurate simulation with a static timing engine provides a seamless, signal-integrity-aware timing approach. Some design teams can isolate circuit-simulation activities away from pure digital design by timing the root to mesh path separately from the multiple clock trees. An ideal clock applied to the multiple clock trees enables timing and optimization in an all-digital place-and-route environment. It is always recommended, however, to perform final sign-off timing with a full circuit-simulation-based timing analysis run.
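
The divide-and-conquer timing described above can be sketched in Python. Each sink's latency is the root-to-tap arrival plus the tap-to-sink latency of its clock tree. The root-to-tap arrival would come from circuit simulation, and the tap-to-sink latency from the digital timer. Setting every tap arrival to zero reproduces the ideal-clock view used inside the place-and-route environment. The function names and all values are hypothetical. The values stand in for simulation and STA results rather than reproducing any tool's output.

```python
from typing import Dict


def full_latency_ps(mesh_arrival_ps: Dict[str, float],
                    tree_latency_ps: Dict[str, float],
                    sink_tap: Dict[str, str]) -> Dict[str, float]:
    """Sink latency = root-to-tap arrival (circuit simulation) + tap-to-sink latency (STA)."""
    return {sink: mesh_arrival_ps[sink_tap[sink]] + tree_latency_ps[sink]
            for sink in tree_latency_ps}


def skew(latencies: Dict[str, float]) -> float:
    return max(latencies.values()) - min(latencies.values())


sink_tap = {"a": "T0", "b": "T1", "c": "T0"}
tree_lat = {"a": 80.0, "b": 78.0, "c": 85.0}   # from the digital timer
ideal = {"T0": 0.0, "T1": 0.0}                  # ideal clock at every tap
simulated = {"T0": 250.0, "T1": 251.0}          # stand-in for simulated tap arrivals
print(skew(full_latency_ps(ideal, tree_lat, sink_tap)))      # 7.0
print(skew(full_latency_ps(simulated, tree_lat, sink_tap)))  # 6.0
```

The small tap-to-tap difference from the mesh changes the final skew. This is one reason final sign-off still uses the full, simulation-based analysis.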

V. KEY DIFFERENCES BETWEEN CONVENTIONAL CLOCK TREE, CLOCK MESH AND MULTISOURCE CLOCK TREE SYSTEM

There are four key differences between Conventional Clock Tree, Multisource Clock Tree System and Clock mesh: shared path, mesh fabric, design complexity, and timing analysis.

A. SHARED PATH:

As shown in the figure, a conventional clock tree has unlimited depth for both buffer and clock-gating levels. Most sinks in the design share very little of their path back to the clock root.

A multisource clock tree system typically has three to nine levels of clock gating and buffers. The multiple clock trees sit at the bottom of the mesh-grid structure, and all of the structure above the mesh forms a shared path back to the clock root.

A clock mesh has an extremely shallow logic depth below the mesh, usually just a single buffer or clock-gating cell directly driving the sinks. It therefore has a large shared path from the root to the mesh.

The respective logic depths are inversely related to the amount of path shared between the sinks and the clock root. Path sharing reduces the impact of on-chip variation (OCV) on the design. When two sinks share the same clock path back to the root, any process variation in that shared path affects both flip-flops equally, and all timing assumptions are preserved. In the absence of path sharing, the clock margin must be increased by a derating factor. This accounts for the possibility that either or both flip-flops experience a process-variation effect.
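
The benefit of a shared path can be quantified with the setup check that static timing tools perform under OCV derating. The launch clock path is derated late and the capture clock path early. A physical segment common to both paths cannot be slow and fast at the same time. The resulting pessimism, (late − early) × shared latency, can therefore be credited back. Static timing tools call this common path (clock reconvergence) pessimism removal, or CRPR. The Python sketch below is a simplified model. It does not derate the data path, and the derate values and path delays are hypothetical.

```python
def setup_slack_ps(common_ps: float, launch_only_ps: float, capture_only_ps: float,
                   data_ps: float, period_ps: float, setup_ps: float,
                   late: float = 1.05, early: float = 0.95,
                   credit_shared: bool = True) -> float:
    """Setup slack with OCV derates on the launch and capture clock paths.

    The launch clock path is derated late and the capture clock path early
    (the data path is left underated in this simplified model). Without credit,
    the shared segment is treated as both slow and fast at once; with credit,
    that impossible pessimism, (late - early) * common_ps, is removed.
    """
    launch_clock = late * (common_ps + launch_only_ps)
    capture_clock = early * (common_ps + capture_only_ps)
    slack = (capture_clock + period_ps - setup_ps) - (launch_clock + data_ps)
    if credit_shared:
        slack += (late - early) * common_ps
    return slack


args = dict(common_ps=300.0, launch_only_ps=50.0, capture_only_ps=60.0,
            data_ps=700.0, period_ps=1000.0, setup_ps=20.0)
print(round(setup_slack_ps(**args, credit_shared=False), 1))  # 254.5
print(round(setup_slack_ps(**args), 1))                       # 284.5
```

The credit grows with the shared latency. The deeper the mesh sits in the topology, the larger the shared portion and the less derating margin is wasted.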

B. MESH FABRIC:

Multisource CTS Mesh Fabric is one to two orders of magnitude less dense than the clock mesh fabric. The coarse pitch of the multisource CTS mesh fabric has the benefit of using considerably less power than the extremely fine pitch of a clock-mesh fabric.
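
The power advantage of a coarse pitch can be estimated from wire length alone. To first order, wire capacitance, and hence clock power, scales with total strap length. The Python sketch below computes the total strap length of a full grid at two pitches. The die dimensions and pitches are hypothetical. Driver and via capacitance are ignored.

```python
def mesh_wire_length_um(width_um: float, height_um: float, pitch_um: float) -> float:
    """Total length of a full grid of horizontal and vertical straps.

    Assumes straps at every multiple of the pitch from 0 up to the die edge,
    each spanning the full width or height of the die.
    """
    n_vertical = int(width_um // pitch_um) + 1
    n_horizontal = int(height_um // pitch_um) + 1
    return n_vertical * height_um + n_horizontal * width_um


fine = mesh_wire_length_um(2000.0, 2000.0, pitch_um=20.0)     # dense clock-mesh pitch
coarse = mesh_wire_length_um(2000.0, 2000.0, pitch_um=200.0)  # 10x coarser fabric
print(fine, coarse, round(fine / coarse, 1))  # 404000.0 44000.0 9.2
```

A tenfold coarser pitch cuts the strap length, and with it the switched wire capacitance, by roughly an order of magnitude.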

In Clock Mesh, the dense fabric defines relatively small bins that contain cluster or sub-cluster amounts of logic. These structures of buffers or clock gates and the sinks they drive are often called twigs.

In multisource CTS, multiple clock trees attach to the coarse mesh fabric at locations called tap points. The tap points are the clock-tree roots, and the sinks assigned to each tap point define the boundary of its clock tree.

C. DESIGN COMPLEXITY:

Conventional CTS is the most accommodating approach for dealing with design complexity.

Clock mesh is the most rigid of the three approaches. An ideal clock mesh design has no RAMs, ROMs, or other hard blocks. Indeed, it is a flat sea of gates.

Multisource CTS falls between conventional CTS and clock mesh with respect to its handling of design complexity. The depth of the multisource clock trees tolerates most clock-gating plans well, and the smaller pre-mesh H-Tree means fewer drivers to account for amid RAMs and hard blocks in the floorplan.

D. TIMING ANALYSIS:

In Conventional Clock Tree, we perform timing analysis using signoff static timing engines and similar timing engines embedded within place and route tools.

In clock mesh and multisource CTS, the mesh fabric is timed using circuit simulation. The standard practice is for automation within the place-and-route tool to launch the simulation run. It then annotates the timing values onto the design for subsequent static timing reports and analyses.

VI. CONCLUSION

As new technology nodes enable increasingly larger and more feature-rich designs, the choice of clock-distribution methodology becomes ever more important. Conventional Clock Tree, which has traditionally been the default choice for all designs, may no longer be the optimal choice when an extremely high clock frequency is required.

Thus, it is a good idea to broaden the clock-distribution skill set to include clock mesh and multisource CTS technologies. Experience with these methodologies enables designers to make the optimal design choice given the design goals: clock frequency, OCV tolerance, power consumption, flow ease, and time-to-market pressure.

Multisource CTS presents a viable hybrid approach for designers seeking the best of conventional clock tree and clock mesh. It provides better high-clock-frequency performance and OCV tolerance than a conventional clock tree. It is also more tolerant of complex floorplans than clock mesh, allowing greater flexibility in clock-gating depth.

The multisource CTS design flow is easier to use than the clock-mesh flow, yet maintains many of the benefits of clock mesh. Adoption of multisource CTS continues to grow, as design teams seek the easiest path to very high-frequency clock design.

ACKNOWLEDGEMENT

I would like to thank Mr. Kashyap Kansara and Mr. Krishnachaitanya Challa for providing the facilities needed to carry out this work. I am also very thankful to all my friends for the thoughtful and stimulating discussions we had, which prompted me to think beyond the obvious.

REFERENCES