

# **Fast Flexible FPGA-Tuned Networks-on-Chip**

Deepan Raj.B<sup>1</sup>, T.V.P.Sundararajan<sup>2</sup>, K.Shoukath Ali<sup>3</sup>

PG Scholar<sup>1</sup>, Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India

Professor<sup>2</sup>, Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India

Assistant Professor<sup>3</sup>, Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India

**Abstract :** This work designs 4x4 Mesh, Fat Tree16, Ring16 and Double Ring networks using CONNECT-based NoCs. CONNECT embodies a set of FPGA-motivated design principles that uniquely influence key NoC design decisions, such as topology, flit width, router pipeline depth and flow control. The flexibility, lightweight nature and high performance of CONNECT-based NoCs make them ideal candidates for use in FPGA-based research studies. In this project, these networks are evaluated on different FPGA families using the Xilinx ISE software to identify the most efficient family. For the 4x4 mesh configuration, comparing the synthesis results across FPGA families shows that the logic resource cost can be reduced by the choice of family. To demonstrate CONNECT's flexibility and extensive design space coverage, several different CONNECT networks are synthesized.

### Keywords : CONNECT, NoC, FPGA, Xilinx

## I. INTRODUCTION

Network-on-Chip (NoC) is an emerging paradigm for communication within large VLSI systems implemented on a single silicon chip; it applies a layered-stack approach to the design of on-chip intercore communication. In a NoC system, modules such as processor cores, memories and specialized IP blocks exchange data using a network as a public transportation sub-system for the information traffic, with routing decisions made at the switches (routers). A NoC is similar to a modern telecommunications network, using digital bit-packets switched over multiplexed links. Although packet switching is sometimes claimed to be a necessity for a NoC, several NoC proposals use circuit-switching techniques. This router-based definition is usually interpreted to mean that a single shared bus, a single crossbar switch or a point-to-point network is not a NoC, while practically all other topologies are. The distinction is somewhat confusing, since all of the above are networks (they enable communication between two or more devices), yet they are not considered networks-on-chip. This paper is organised as follows: Section II introduces NoC terminology, Section III describes the CONNECT router architecture, Section IV presents the flit data width analysis, and Section V gives the conclusion and future work.

## II. NoC TERMINOLOGIES

#### A. Packets

Packets are the basic logical unit of transmission at the endpoints of a network.

#### B. Flits

When traversing a network, packets, especially large ones, are broken into flits (flow control digits), which are the basic unit of resource allocation and flow control within the network. Some NoCs require special additional header or tail flits to carry control information and to mark the beginning and end of a packet [10].
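The packet-to-flit split can be illustrated with a minimal Python sketch. It assumes the packet is available as an integer of `packet_bits` bits and that the first and last data flits are themselves tagged as head and tail. Some NoCs instead add the extra control flits mentioned above. `Flit`, `packetize` and `depacketize` are hypothetical names and do not reflect CONNECT's actual flit format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str     # "head", "body", "tail", or "single" for a one-flit packet
    payload: int  # the slice of packet bits carried by this flit


def packetize(packet: int, packet_bits: int, flit_bits: int) -> List[Flit]:
    """Break a packet (an integer of packet_bits bits) into flits of flit_bits bits."""
    n = max(1, -(-packet_bits // flit_bits))  # ceiling division: number of flits
    mask = (1 << flit_bits) - 1
    flits = []
    for i in range(n):
        chunk = (packet >> (i * flit_bits)) & mask  # least-significant slice first
        if n == 1:
            kind = "single"
        elif i == 0:
            kind = "head"
        elif i == n - 1:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, chunk))
    return flits


def depacketize(flits: List[Flit], flit_bits: int) -> int:
    """Reassemble the packet at the destination endpoint."""
    value = 0
    for i, flit in enumerate(flits):
        value |= flit.payload << (i * flit_bits)
    return value


flits = packetize(0x123456789ABC, packet_bits=48, flit_bits=16)
print([(f.kind, hex(f.payload)) for f in flits])
# [('head', '0x9abc'), ('body', '0x5678'), ('tail', '0x1234')]
```

In this model, a wider flit width produces fewer flits per packet. This is one reason the flit width studied in Section IV matters.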

#### C. Virtual Channels

A channel corresponds to a path between two points in a network. NoCs often employ a technique called virtual channels (VCs) to provide the abstraction of multiple logical channels over one underlying physical channel. Routers implement VCs by keeping separate, non-interfering flit buffers for different VCs and by time-multiplexing the switches and links among them. Thus, the number of implemented VCs has a large impact on the buffer requirements of a NoC. Employing VCs can also help in implementing protocols that require traffic isolation between different message classes (e.g. to prevent deadlock).
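The following rough Python model shows why the VC count drives buffer cost. It is a sketch that assumes every input port holds a separate FIFO of `depth` flits per VC. The router parameters (5 ports, 8-flit buffers, 32-bit flits) are hypothetical and are not taken from the networks in this paper.

```python
def vc_buffer_bits(ports: int, vcs: int, depth: int, flit_bits: int) -> int:
    """Input-buffer storage (bits) of one router under a simple model.

    Assumes each input port keeps a separate FIFO of `depth` flits for every
    VC, so storage grows linearly with the number of VCs.
    """
    return ports * vcs * depth * flit_bits


# Hypothetical router: 5 ports (4 neighbours + local), 8-flit buffers, 32-bit flits.
for vcs in (1, 2, 4):
    print(vcs, "VC(s):", vc_buffer_bits(ports=5, vcs=vcs, depth=8, flit_bits=32), "bits")
```

Under this model, doubling the VC count from 2 to 4 doubles the buffer storage from 2560 to 5120 bits. On an FPGA, these buffers are typically mapped to LUT-based or block RAM.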

#### D. Flow Control

In lossless networks, a router can only send a flit to a downstream receiving router if it is known that the downstream router's buffer has space to receive the flit. Flow control refers to the protocol for managing and negotiating the available buffer space between routers. Because of physical separation and the speed of router operation, the sending router cannot always have immediate, up-to-date knowledge of the buffer status at the receiving router. In credit-based flow control, the sending router tracks credits from its downstream receiving routers. At any moment, the number of accumulated credits indicates the guaranteed available space in the downstream router's buffer (equal to or less than what is actually available, due to the delay in receiving credits). Flow control is typically performed on a per-VC basis.

#### E. Input-Output Allocation

Allocation refers to the process or algorithm of matching a router's input requests with the available router outputs. Different allocators offer different trade-offs in terms of hardware cost, speed and matching efficiency [10]. Separable allocators are a class of allocators that are popular in NoCs. They perform matching in two independent steps, which sacrifices matching efficiency for speed and low hardware cost.
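A separable input-first allocator with round-robin arbiters can be sketched in Python as follows. Round-robin arbitration is an assumed choice; real allocators use other arbiters too. For simplicity, the stage-one priority pointers advance on every pick, even when that pick loses in stage two. All names here are hypothetical.

```python
from typing import Dict, List


class RoundRobinArbiter:
    """Grants one active request per call and rotates priority past the winner."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # so index 0 has highest priority initially

    def arbitrate(self, requests: List[bool]) -> int:
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return -1  # no active request


def separable_input_first(req: List[List[bool]],
                          in_arbs: List[RoundRobinArbiter],
                          out_arbs: List[RoundRobinArbiter]) -> Dict[int, int]:
    """req[i][o] is True when input i requests output o; returns {input: output}."""
    n_in, n_out = len(req), len(req[0])
    # Step 1: every input independently picks one of the outputs it requests.
    picks = [in_arbs[i].arbitrate(req[i]) for i in range(n_in)]
    # Step 2: every output independently picks one input among those that picked it.
    grants: Dict[int, int] = {}
    for o in range(n_out):
        winner = out_arbs[o].arbitrate([picks[i] == o for i in range(n_in)])
        if winner >= 0:
            grants[winner] = o
    return grants


in_arbs = [RoundRobinArbiter(2) for _ in range(2)]   # one per input, over 2 outputs
out_arbs = [RoundRobinArbiter(2) for _ in range(2)]  # one per output, over 2 inputs
req = [[True, True],   # input 0 can use output 0 or output 1
       [True, False]]  # input 1 can only use output 0
print(separable_input_first(req, in_arbs, out_arbs))  # {0: 0}
```

Input 1 stays unmatched even though the matching {0: 1, 1: 0} would serve both inputs. This is the matching efficiency a separable allocator gives up, because neither step sees the other's decisions.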



## **III. CONNECT ROUTER ARCHITECTURE**

Fig. 1 CONNECT Router Architecture

Driven by the special characteristics of FPGAs, CONNECT uses a simple router architecture as the basic building block for composing networks. The CONNECT router is implemented in Bluespec System Verilog (BSV) [7]. CONNECT routers are highly configurable; among other parameters, they support a variable number of input and output ports, a variable number of virtual channels (VCs), variable flit width, variable flit buffer depth, two flow control mechanisms, flexible user-specified routing and four allocation algorithms.
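Several of the parameters listed above (VC count, flit buffer depth and flow control) interact through the credit mechanism described in Section II.D. The following Python sketch tracks credits per VC on the sending side. It assumes one credit per free downstream flit slot. `CreditCounter` and its methods are hypothetical names, not CONNECT's BSV interfaces.

```python
from typing import Dict


class CreditCounter:
    """Sender-side credit state for the VCs of one output link (sketch)."""

    def __init__(self, num_vcs: int, buffer_depth: int):
        self.depth = buffer_depth
        # Initially every downstream slot is free: one credit per slot per VC.
        self.credits: Dict[int, int] = {vc: buffer_depth for vc in range(num_vcs)}

    def can_send(self, vc: int) -> bool:
        return self.credits[vc] > 0

    def on_send(self, vc: int) -> None:
        if not self.can_send(vc):
            raise RuntimeError(f"VC {vc}: no credits, flit must wait")
        self.credits[vc] -= 1  # the flit will occupy one downstream slot

    def on_credit_return(self, vc: int) -> None:
        if self.credits[vc] >= self.depth:
            raise RuntimeError(f"VC {vc}: more credits than buffer slots")
        self.credits[vc] += 1  # downstream router dequeued a flit from this VC


link = CreditCounter(num_vcs=2, buffer_depth=2)
link.on_send(0)
link.on_send(0)
print(link.can_send(0), link.can_send(1))  # False True: VC 0 stalls, VC 1 is unaffected
link.on_credit_return(0)                   # a credit arrives after some link delay
print(link.can_send(0))                    # True
```

Because credits arrive only after a delay, the counter can never exceed the true free space. This matches the "guaranteed available buffer space" described in Section II.D.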

## IV. FLIT DATA WIDTH ANALYSIS

#### A. Introduction

In this section, 4x4 Mesh, Fat Tree16, Ring16 and Double Ring networks are designed using CONNECT with flit data widths of 32, 64, 128 and 256 bits. To evaluate the CONNECT NoC architecture and highlight its flexibility and extensive design space coverage, the CONNECT networks are examined in terms of FPGA resource usage (LUTs) and frequency estimates from the synthesis reports for different FPGA families (Virtex 4, Virtex 5 and Virtex 6) in Xilinx ISE version 14.2.

#### B. 4x4 Mesh Network

Fig. 2 shows a 4x4 Mesh network consisting of 16 routers with a flit data width of 32 bits and 4 virtual channels. The flit data width determines the data transmission rate of the routers. Each router is connected only to its immediate neighbouring routers, so the mesh is a partial (not fully connected) topology. The interior routers 5, 6, 9 and 10 each connect to the maximum of four neighbouring routers. The routing function determines the path a packet takes from source to destination. The network is quite reliable, as there is often more than one path between a source and a destination.
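The mesh connectivity and one possible routing function can be sketched in Python. The sketch assumes routers are numbered 0 to 15 row by row, which makes 5, 6, 9 and 10 the interior routers as stated above. Dimension-order (XY) routing is used here only as a common deterministic example. It is not necessarily the routing function configured for the CONNECT network in this paper.

```python
N = 4  # 4x4 mesh; routers numbered 0..15 row by row (assumed numbering)


def neighbours(r: int, n: int = N) -> list:
    """Routers directly linked to router r in an n x n mesh."""
    x, y = r % n, r // n
    result = []
    if x > 0:
        result.append(r - 1)      # west
    if x < n - 1:
        result.append(r + 1)      # east
    if y > 0:
        result.append(r - n)      # north
    if y < n - 1:
        result.append(r + n)      # south
    return result


def xy_route(src: int, dst: int, n: int = N) -> list:
    """Dimension-order (XY) routing: travel along X first, then along Y."""
    x, y = src % n, src // n
    dx, dy = dst % n, dst // n
    path = [src]
    while x != dx:
        x += 1 if dx > x else -1
        path.append(y * n + x)
    while y != dy:
        y += 1 if dy > y else -1
        path.append(y * n + x)
    return path


print([r for r in range(N * N) if len(neighbours(r)) == 4])  # [5, 6, 9, 10]
print(xy_route(0, 15))  # [0, 1, 2, 3, 7, 11, 15]
```

XY routing always selects one of the several available paths. An adaptive routing function could use the alternative paths that make the mesh reliable.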



Fig. 2 4x4 Mesh Network



TABLE I

4x4 Mesh - LUTs and Frequency Comparison in Virtex 6 FPGA Family

| Flit data width (Virtex 6) | Slice LUT (%) | Frequency (MHz) |
|----------------------------|---------------|-----------------|
| 32 bits                    | 4             | 100.1           |
| 128 bits                   | 7             | 101.7           |
| 256 bits                   | 1.1           | 121.7           |

Table I compares the synthesis results of the 4x4 Mesh network for 32-, 128- and 256-bit flit data widths in the Virtex 6 FPGA family. LUT utilization increases when the network is synthesized with wider flits such as 128 and 256 bits. The corresponding frequency estimates are 100.1, 101.7 and 121.7 MHz. When each flit width is compared across the FPGA families, the Virtex 6 family is the most efficient in LUT usage.
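Since the flit width sets the data transmission rate, the Table I figures can be turned into a rough peak link bandwidth. The following Python sketch assumes each link carries one flit per clock cycle at the reported frequency estimate. This is an idealized upper bound, not a measured throughput.

```python
# Flit width (bits) -> Virtex 6 frequency estimate (MHz), as reported in Table I.
TABLE_I_FREQ_MHZ = {32: 100.1, 128: 101.7, 256: 121.7}


def peak_link_bandwidth_gbps(flit_bits: int, freq_mhz: float) -> float:
    """Peak bandwidth of one link, assuming it carries one flit per clock cycle."""
    return flit_bits * freq_mhz / 1000.0  # bits x MHz = Mbit/s; divide by 1000 for Gbit/s


for width, freq in TABLE_I_FREQ_MHZ.items():
    print(f"{width:>3}-bit flits at {freq} MHz: "
          f"{peak_link_bandwidth_gbps(width, freq):.2f} Gbit/s")
```

Under this assumption, moving from 32-bit to 256-bit flits raises the ideal per-link bandwidth from about 3.2 to about 31.2 Gbit/s.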

#### C. Fat Tree16 Network



Fig. 3 Fat Tree16 Network

Fig. 3 shows a Fat Tree16 network consisting of 20 routers with 2 virtual channels and flit data widths of 32, 64 and 128 bits. The flit width determines the data transmission rate of the routers. The network connects 16 nodes (N0 to N15), which are the resources. The main characteristic of a fat tree is that links connecting nodes at different levels may have different bandwidths depending on their utilization, and the complexity of the nodes grows closer to the root. The fat tree is a recursively scalable and easily partitionable network.

TABLE II

Fat Tree16 - LUTs and Frequency Comparison in Virtex 6 FPGA Family

| Flit data width (Virtex 6) | Slice LUT (%) | Frequency (MHz) |
|----------------------------|---------------|-----------------|
| 32 bits                    | 1             | 151.9           |
| 64 bits                    | 2             | 152.2           |
| 128 bits                   | 3             | 152.6           |

Table II compares the synthesis results of the Fat Tree16 network for 32-, 64- and 128-bit flit data widths in the Virtex 6 FPGA family. LUT utilization increases by about one percentage point at each step, and the frequency increases slightly, when the network is synthesized with wider flits such as 64 and 128 bits. When each flit width is compared across the FPGA families, the Virtex 6 family is the most efficient in LUT usage.

#### D. Ring16 Network



Fig. 4 Ring16 Network

Fig. 4 shows a Ring16 network consisting of 16 routers with 4 virtual channels and flit data widths of 32, 64 and 128 bits. The flit width determines the data transmission rate of the routers. Routers R0 to R15 are connected in a circle and serve 16 nodes (N0 to N15). A ring is a standard circular topology in which each router is connected directly to exactly two other routers, forming a circular pathway. When traffic flows in one direction, the ring provides only one path between any two routers.
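Hop counts on the unidirectional ring can be computed with a short Python sketch. It assumes traffic flows R0 to R1 to ... to R15 and back to R0. The direction is an assumption for illustration.

```python
N = 16  # routers R0..R15


def ring_hops(src: int, dst: int, n: int = N) -> int:
    """Hops from src to dst on a unidirectional ring (R0 -> R1 -> ... -> R15 -> R0)."""
    return (dst - src) % n


def ring_path(src: int, dst: int, n: int = N) -> list:
    """The single path the ring offers from src to dst, as router indices."""
    return [(src + k) % n for k in range(ring_hops(src, dst, n) + 1)]


pairs = [(s, d) for s in range(N) for d in range(N) if s != d]
avg_hops = sum(ring_hops(s, d) for s, d in pairs) / len(pairs)
print(ring_path(14, 2))  # [14, 15, 0, 1, 2]
print(avg_hops)          # 8.0
```

Because there is only one path, a packet from R1 back to R0 must travel 15 hops around the ring. This is the main latency cost of the ring's simplicity.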



TABLE III

Ring16 - LUTs and Frequency Comparison in Virtex 6 FPGA Family

| Flit data width (Virtex 6) | Slice LUT (%) | Frequency (MHz) |
|----------------------------|---------------|-----------------|
| 32 bits                    | 1             | 158.8           |
| 64 bits                    | 1             | 175.9           |
| 128 bits                   | 3             | 177.4           |

Table III shows that the Virtex 6 FPGA family is efficient for the Ring16 network with 32-, 64- and 128-bit flit data widths. The frequency varies considerably with flit width (from 158.8 MHz to 177.4 MHz). LUT usage, in contrast, increases only slightly when the network is designed with wider flits.

#### E. Double Ring Network



Fig. 5 Double Ring network

Fig. 5 shows a Double Ring network consisting of 16 routers with 4 virtual channels and flit data widths of 32, 64 and 128 bits. The flit width determines the data transmission rate of the routers. Instead of the single ring used in a ring topology, the network uses two concentric rings that connect every node. The secondary ring in a dual-ring topology is redundant: it serves as a backup in case the primary ring fails. In this configuration, data moves in opposite directions around the two rings. Each ring is independent of the other until the primary ring fails, at which point the two rings are connected to continue the flow of data traffic.
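The backup behaviour described above can be sketched in Python. The sketch assumes the primary ring runs R0 to R1 to ... and the secondary ring runs in the opposite direction. `double_ring_path` and its `primary_ok` flag are hypothetical names used only for illustration.

```python
N = 16  # routers R0..R15


def double_ring_path(src: int, dst: int, primary_ok: bool = True, n: int = N) -> list:
    """Path on a double ring whose two rings run in opposite directions.

    Follows the backup scheme described above: traffic uses the primary ring
    (assumed to run R0 -> R1 -> ...) while it works, and switches to the
    secondary ring (R0 -> R15 -> ...) only when the primary ring has failed.
    """
    step = 1 if primary_ok else -1
    hops = (dst - src) % n if primary_ok else (src - dst) % n
    return [(src + step * k) % n for k in range(hops + 1)]


print(double_ring_path(0, 3))                              # [0, 1, 2, 3]
print(len(double_ring_path(0, 3, primary_ok=False)) - 1)   # 13 hops on the secondary ring
```

If both rings instead carried traffic at the same time, a router could pick the shorter direction, giving min((dst - src) % n, (src - dst) % n) hops. That variant differs from the backup-only scheme described in the text.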

TABLE IV

Double Ring - LUTs and Frequency Comparison in Virtex 6 FPGA Family

| Flit data width (Virtex 6) | Slice LUT (%) | Frequency (MHz) |
|----------------------------|---------------|-----------------|
| 32 bits                    | 1             | 158.8           |
| 64 bits                    | 2             | 158.8           |
| 128 bits                   | 2             | 153.8           |

Table IV compares the synthesis results of the Double Ring network for 32-, 64- and 128-bit flit data widths in the Virtex 6 FPGA family. LUT utilization increases by about one percentage point, and the frequency decreases, when the network is synthesized with wider flits such as 64 and 128 bits. When each flit width is compared across the FPGA families, the Virtex 6 family is the most efficient in LUT usage.

## **V. CONCLUSION**

Network topologies such as 4x4 Mesh, Fat Tree16, Ring16 and Double Ring were designed using CONNECT-based NoCs with flit data widths of 32, 64, 128 and 256 bits. Each network was synthesized with Xilinx ISE 14.2, and the synthesis results were compared across the Virtex 4, Virtex 5 and Virtex 6 FPGA families. For all of these networks and flit widths, the synthesis reports show lower LUT usage with the Virtex 6 family. This indicates that Virtex 6 is the most efficient of the compared families in reducing area and FPGA cost. Future work includes designing further networks, simulating them, and analysing their latency and network performance.

### REFERENCES

[1] C. Hilton and B. Nelson. "A Flexible Circuit-Switched NoC for FPGA-based Systems". In International Conference on Field Programmable Logic and Applications, 2005.

[2] G. Schelle and D. Grunwald. "Exploring FPGA Network on Chip Implementations Across Various Application and Network Loads". In International Conference on Field Programmable Logic and Applications, 2008.

[3] M. Shelburne, C. Patterson, P. Athanas, M. Jones, B. Martin and R. Fong. "MetaWire: Using FPGA Configuration Circuitry to Emulate a Network-on-Chip". In International Conference on Field Programmable Logic and Applications, 2008.

[4] D. Wang, N. Jerger and J. Steffan. "A Programmable Architecture for NoC Simulation on FPGAs". In Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), 2011.

[5] I. Kuon and J. Rose. "Measuring the Gap Between FPGAs and ASICs". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2007, Vol. 26, pp. 203–215.

[6] M. Saldana, L. Shannon, J. S. Yue, S. Bian, J. Craig and P. Chow. "Routability of Network Topologies in FPGAs". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2007, Vol. 15, pp. 948–951.

[7] M. K. Papamichael and J. C. Hoe. "CONNECT: Re-Examining Conventional Wisdom for Designing NoCs in the Context of FPGAs". In ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2012.

[8] J. Lee and L. Shannon. "The Effect of Node Size, Heterogeneity, and Network Size on FPGA based NoCs". In International Conference on Field-Programmable Technology (FPT), 2009.

[9] M. Shelburne, C. Patterson, P. Athanas, M. Jones, B. Martin and R. Fong. "MetaWire: Using FPGA Configuration Circuitry to Emulate a Network-on-Chip". In International Conference on Field Programmable Logic and Applications (FPL), 2008.

### BIOGRAPHY



**Deepan Raj.B** obtained his B.E. degree in ECE from SNS College of Technology, Coimbatore in 2011. He is a PG Scholar of the 2013 batch in Applied Electronics at Bannari Amman Institute of Technology, Sathyamangalam. His areas of interest are VLSI Design and Computer Networks.



**T.V.P.Sundararajan** obtained his B.E. degree in ECE from Kongu Engineering College, Erode in 1993. He obtained his M.E. degree in Applied Electronics from Government College of Technology, Coimbatore in 1999. He completed his Ph.D. on "Investigations On The Performance Of Security Enhancement Schemes For Routing Protocols In Mobile Ad Hoc Networks". He has 13 years of teaching experience. His areas of interest are Communication and Networking.

**K.Shoukath Ali** obtained his B.E. degree from C. Abdul Hakeem College of Engg & Tech, Vellore in 2008. He obtained his M.E. degree from PSG College of Technology, Coimbatore in 2010. His areas of interest are Communication and VLSI Design.