Fine-Grain Redundancy Techniques for High-Reliable SRAM FPGAs in Space Environment: A Brief Survey

T.Srinivas Reddy; J.Santosh; J.Prabhakar

Assistant Professor, Department of ECE, MREC, Hyderabad, India

Visit for more related articles at International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Abstract

SRAM-based reprogrammable FPGAs, which combine high flexibility with high performance, have become increasingly important in space applications. As technology advances and device feature sizes shrink into the nanometre range, FPGAs used in the space environment become more susceptible to radiation. Radiation can cause Single Event Upsets (SEUs), which are soft, non-destructive errors. They can appear as transient pulses in logic or support circuitry, or as bit flips in memory cells or registers; when they hit SRAM configuration cells, they change the function of logic elements within the FPGA. Various methodologies have been proposed in the literature to reduce these effects and mask them so that the device operates correctly. In this paper we survey the redundancy techniques that have been proposed to improve the reliability of systems designed with SRAM FPGAs.

Keywords

SEE, SEU, TMR, CGTMR, FGTMR.

INTRODUCTION

Over the past few years there has been a strong tendency to replace hardened electronics for space with commercial-off-the-shelf (COTS) components fabricated in mainstream commercial CMOS technology [1]. A Field Programmable Gate Array (FPGA) is a reprogrammable semiconductor logic device. Reprogrammability makes it possible to realize a logic function after the manufacturing process. With satellite lifetimes now extending far beyond 10 years, much longer than the validity of previous standards, reprogrammability of the system becomes a stringent requirement. Because few software-only solutions are possible, FPGAs are the only viable solution. A strong expectation is that scaled technologies should inherently be more radiation tolerant. Various methodologies have been proposed in the literature to make an FPGA fault tolerant. Of these, redundancy techniques play a vital role and, in conjunction with reconfiguration techniques, drastically improve the reliability of the system in the space environment.

Section II reviews FPGA technologies and focuses on the suitability of SRAM-based FPGAs for space applications. Radiation effects on SRAM FPGAs are discussed in Section III. Section IV covers the mitigation of system errors through common test and mitigation techniques. Section V then focuses on SEU mitigation through the redundancy techniques proposed in the literature. The change from the coarse-grain to the fine-grain approach is discussed in Section VI, and the paper concludes with Section VII.

FPGA TECHNOLOGIES

Field Programmable Gate Arrays (FPGAs), with their in-field reprogrammability, low non-recurring engineering (NRE) costs, and relatively short design cycle, are becoming a key component of digital systems. Recently there has been great interest in using FPGAs within spacecraft, with mixed success. FPGAs are classified by the process technology used to build the memory cells that program the device. Of these, antifuse and SRAM-based FPGAs are widely used for implementing digital systems.

Antifuse FPGA:

Antifuse cells are non-volatile and one-time programmable (OTP). Instead of breaking a metal connection by passing current through it (as in fuse technology), a link is grown to make a connection. Antifuse-based devices are well suited to high-reliability applications because of their unlimited data-retention time. However, the device requires large programming transistors, and because it is one-time programmable, design changes are not possible once it has been programmed.

SRAM-based FPGA:

SRAM programming technology uses static memory (SRAM) cells for programming. In SRAM-based devices, static memory cells such as the one shown in Fig.1 provide the configurability of the FPGA. SRAM cells are used both in the interconnect and in implementing logic functions. In the interconnect, SRAM cells drive the select lines of multiplexers in order to connect logic components in the appropriate way. To implement logic functions, SRAM-based FPGAs use Look-Up Tables (LUTs).
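The way a LUT realizes a logic function from stored SRAM bits can be sketched as follows. This is only an illustrative model, not a description of any particular device: the function name `lut_eval`, the bit ordering, and the two example tables are assumptions introduced here.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a k-input LUT: the inputs form an address into the stored bits.

    config_bits: list of 2**k stored SRAM bits (assumed ordering: the first
                 input is the most significant address bit).
    inputs:      tuple of k input values (0 or 1).
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return config_bits[index]


# Hypothetical 2-input configurations: the same hardware becomes a different
# gate simply by loading different SRAM contents.
AND_LUT = [0, 0, 0, 1]   # output is 1 only for inputs (1, 1)
XOR_LUT = [0, 1, 1, 0]   # output is 1 when the inputs differ

and_table = [lut_eval(AND_LUT, (a, b)) for a in (0, 1) for b in (0, 1)]
xor_table = [lut_eval(XOR_LUT, (a, b)) for a in (0, 1) for b in (0, 1)]
```

In this model the logic function is entirely determined by the stored bits, which is why the configuration memory is the critical resource in an SRAM-based FPGA.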

SRAM-based FPGAs are widely used in different applications because of their reprogrammability. The SRAM bits are loaded with a user-defined configuration on power-up, and this can be done an indefinite number of times. Unlike other programming technologies, SRAM-based cells use standard CMOS processing technology. As a result, the latest available CMOS technology can be applied to SRAM-based FPGAs, which therefore benefit from increased integration, higher speeds and lower dynamic power consumption.

Advantages of SRAM FPGAs

• SRAM-based FPGAs can be created using a standard CMOS process which means they are available at the forefront of each new technology node.

• Antifuse-based FPGAs require extra processing steps during the manufacturing process. Therefore, SRAM-based FPGAs are more suitable for reconfigurable computing applications, where the device function is dynamically changed according to the need of the design.

CHALLENGES FOR FPGAs IN SPACE ENVIRONMENT

The space environment [2] differs from the terrestrial one. It contains electrons, protons and heavy ions, which frequently affect the operation of electronic devices in space. The radiation environment and its effects on electronic systems [3], [4] are shown in Fig.2.

The flow shown in Fig.2 runs from the radiation environment (top) to the effects that these interactions lead to (bottom). The bottom layer shows the three effect categories: total ionizing dose, displacement damage, and single event effects.

Total Ionizing Dose

The term total ionizing dose (TID) refers to the dose deposited in the electronics through ionization effects only. All types of electronics are susceptible to ionization, but the charge generated inside the semiconductor material can quickly be collected and removed without ill effect. In general, TID effects are mitigated through proper use of shielding materials.

Displacement Damage

The second category of cumulative radiation effects is displacement damage. If sufficiently high energy is transferred to an atom, it can overcome the binding energy that holds the atom in the crystalline lattice of the material. When this occurs, the atom is "displaced" from its normal position to some other end location. Unless the end location is an exact duplicate of the former position, the regular order of the crystalline lattice is disturbed. In general, displacement effects are also mitigated through proper use of shielding materials.

Single Event Effects (SEE)

Single Event Effects (SEEs) are caused by a single energetic particle striking an electronic circuit. They cause transient errors and can take many forms. The three main subclasses of SEEs are Single Event Upsets (SEUs), Single Event Functional Interrupts (SEFIs), and Single Event Transients (SETs).

Single Event Upsets (SEUs) are soft, non-destructive errors. They normally appear as transient pulses in logic or support circuitry, or as bit flips in memory cells or registers. Hard errors, which are potentially destructive, can also occur: a Single Event Latchup (SEL) results in a high operating current, above device specifications, and must be cleared by a power reset. Voltage pulses induced by impacting ions on combinational circuitry in a device are known as Single Event Transients (SETs).

The main concern for FPGAs is the effect of radiation on SRAM cells. If an ion strike of sufficient energy occurs near one of the transistors of a cell, the bit value stored in the cell can flip. The major effect of such an upset is a bit flip in the configuration bitstream, which changes the functionality of the SRAM FPGA.
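The effect of a single configuration bit flip can be illustrated with a small model of a 2-input LUT. The helper names (`lut_eval`, `inject_seu`) and the choice of which bit is upset are hypothetical and serve only to show how one flipped SRAM bit silently turns one logic function into another.

```python
from itertools import product


def lut_eval(config_bits, inputs):
    """Look up the output of a LUT; inputs[0] is the most significant address bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return config_bits[index]


def inject_seu(config_bits, position):
    """Return a copy of the configuration with one bit flipped (a modelled SEU)."""
    upset = list(config_bits)
    upset[position] ^= 1
    return upset


golden = [0, 0, 0, 1]            # intended function: a AND b
faulty = inject_seu(golden, 1)   # assumed upset location: entry for inputs (0, 1)

# Input combinations for which the upset LUT now gives the wrong answer.
mismatches = [ab for ab in product((0, 1), repeat=2)
              if lut_eval(golden, ab) != lut_eval(faulty, ab)]
```

After the modelled upset the table reads `[0, 1, 0, 1]`, so the LUT simply passes through its second input instead of computing AND. The error persists until the configuration bit is rewritten, which is what distinguishes a configuration upset from a transient in user logic.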

SEU MITIGATION TECHNIQUES

Various techniques are employed to reduce the effects of radiation on SRAM FPGAs. These mitigation techniques are broadly classified into two categories: reconfiguration-based techniques and redundancy-based techniques.

Reconfiguration-based Techniques

Reconfiguration, also known as scrubbing, periodically writes the original configuration to the configuration memory so that single event upsets (SEUs) due to radiation are corrected. Scrubbing techniques are classified into two types: blind scrubbing and read-back scrubbing.

Blind-Scrubbing

Blind scrubbing is periodic scrubbing in which the configuration memory is rewritten irrespective of whether any upset has occurred in the system. Because system operation is periodically stopped for reconfiguration, system performance may degrade if the minimum scrubbing cycle duration is not achieved.

Read-back Scrubbing

Read-back Scrubbing makes use of a fault-detector which refreshes the configuration memory contents only when a sensitive bit upset or a certain number of non-sensitive upsets are detected. It involves both reading and reloading the configuration memory contents.
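The difference between the two scrubbing styles can be sketched in a few lines. The model below is a simplification under stated assumptions: configuration memory is a list of byte-string frames, a CRC-32 per frame stands in for the fault detector, and every mismatching frame is rewritten immediately (the distinction between sensitive and non-sensitive bits is not modelled). The function and variable names are illustrative and do not correspond to any vendor interface.

```python
import zlib


def blind_scrub(config_memory, golden):
    """Rewrite every frame from the golden copy, whether or not it was upset."""
    for i, frame in enumerate(golden):
        config_memory[i] = frame
    return len(golden)                      # number of frames written


def readback_scrub(config_memory, golden, golden_crcs):
    """Read each frame back, compare checksums, and rewrite only mismatches."""
    repaired = []
    for i, frame in enumerate(config_memory):
        if zlib.crc32(frame) != golden_crcs[i]:
            config_memory[i] = golden[i]    # reload just the corrupted frame
            repaired.append(i)
    return repaired


# Hypothetical three-frame configuration with one upset bit in frame 1.
golden = [bytes([0xA5] * 4), bytes([0x3C] * 4), bytes([0xFF] * 4)]
golden_crcs = [zlib.crc32(f) for f in golden]
memory = list(golden)
memory[1] = bytes([0x3C, 0x3D, 0x3C, 0x3C])

repaired_frames = readback_scrub(memory, golden, golden_crcs)
frames_written_blind = blind_scrub(memory, golden)
```

In this sketch read-back scrubbing writes one frame, while blind scrubbing writes all three regardless of their state; the price of the former is the extra read and comparison on every pass.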

Limitations of Reconfiguration-based Techniques

For critical applications, relying on reconfiguration techniques alone to cope with system failures may not be a viable choice [5]. Scrubbing reloads the configuration memory, which temporarily halts system operation, and such an interruption can be fatal for critical applications. In addition to scrubbing, redundancy-based techniques, which detect, mask and correct errors, are therefore essential to cope with high single event upset rates [10]. The focus of this paper is thus on redundancy-based techniques, which keep the system operating reliably even in the presence of SEUs.

REDUNDANCY-BASED TECHNIQUES

Redundancy techniques are widely used to build reliable systems that continue to operate satisfactorily in the presence of faults in their components. Redundancy can take many forms, such as hardware, software, time, and spatial redundancy. However, hardware redundancy, i.e. the use of additional hardware components, is used most extensively for mitigating SEU effects in SRAM FPGAs.

TRIPLE MODULAR REDUNDANCY (TMR)[6]:

TMR is a widely used redundancy technique for mitigating SEUs in digital circuits. A TMR scheme uses three identical logic blocks performing the same task; the outputs of the three blocks are compared by a majority voter, which generates the final output, as shown in Fig.3.
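The voting step can be illustrated with a bitwise 2-of-3 majority function. This is a minimal behavioural sketch, not the circuit of Fig.3: the replica functions `good` and `upset` are hypothetical stand-ins for a correct logic block and one whose output has a flipped bit, and the voter itself is assumed to be fault-free.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three integer outputs."""
    return (a & b) | (b & c) | (a & c)


def tmr_output(replicas, x):
    """Feed the same input to three replicas of a block and vote on the results."""
    r0, r1, r2 = (replica(x) for replica in replicas)
    return majority(r0, r1, r2)


def good(x):
    """Hypothetical fault-free logic block: an 8-bit incrementer."""
    return (x + 1) & 0xFF


def upset(x):
    """The same block with one output bit flipped, modelling an SEU."""
    return good(x) ^ 0x10


single_fault = tmr_output((good, upset, good), 7)    # fault is masked
double_fault = tmr_output((good, upset, upset), 7)   # fault wins the vote
```

With one faulty replica the voted output equals the fault-free value `good(7)`; with two identically faulty replicas the voter follows the wrong majority, which is the basic limit of TMR.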

TMR is generally used in ASICs to protect memory elements from radiation effects. Similarly, FPGA configuration memory is protected using this technique, since otherwise any modification due to SEUs may affect the entire FPGA device. For ground-based complex systems, TMR may be able to mask and correct single failures. However, when an SEU hits the voter, the voter may no longer function correctly; hence TMR is implemented with a triplicated voter.

As device technology advances and feature sizes scale down into the nanometre range, upset rates in SRAM FPGAs increase, as shown in Table 1. At high upset rates, simple replication of the entire system, i.e. modular redundancy, may not be sufficient for reliable system performance.

COARSE-GRAIN VS FINE-GRAIN

The granularity of a fault-tolerance mechanism determines how the system is divided into modules for the purpose of applying the technique. Conventional TMR is referred to as Coarse-Grain TMR (CGTMR). CGTMR approaches fault tolerance at the level of the whole design and does not deal with its fine-grain parts. In other words, the design is triplicated as a whole, and when an upset occurs it is not clear exactly where in the design it has struck. This is a particular obstacle when more than one copy is affected, which is often the case with multiple SETs. A coarse-grain view of redundancy might therefore not be sufficient for scenarios with high rates of SETs, SEUs and multiple-bit upsets (MBUs).

Fine-Grain Redundancy

If the system is able to detect failures and upsets locally, it can cope with high failure rates. This leads to a more localized view of how fault-tolerance methods are applied, categorized as Fine-Grain Redundancy Techniques.

The basic advantage of fine-grain TMR (FGTMR) is that, in a homogeneous architecture such as an FPGA, TMR can be applied to fine-grain homogeneous parts of the design, such as LUTs or CLBs, instead of to the whole design. The level of granularity can be chosen based on the requirements of the system.
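The contrast between the two granularities can be made concrete with a toy two-stage design. The sketch below rests on several assumptions introduced only for illustration: the design is a chain of stages, a fault flips the least significant bit of one stage output in one copy, and all voters are fault-free. It is not the FGTMR implementation of [8].

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (b & c) | (a & c)


def run_cgtmr(stages, x, faults):
    """Coarse grain: triplicate the whole chain and vote once at the end."""
    outs = []
    for copy in range(3):
        v = x
        for s, stage in enumerate(stages):
            v = stage(v)
            if (copy, s) in faults:      # modelled upset in this copy and stage
                v ^= 1
        outs.append(v)
    return majority(*outs)


def run_fgtmr(stages, x, faults):
    """Fine grain: triplicate each stage and vote after every stage."""
    v = x
    for s, stage in enumerate(stages):
        outs = []
        for copy in range(3):
            r = stage(v)
            if (copy, s) in faults:
                r ^= 1
            outs.append(r)
        v = majority(*outs)              # error is masked before it can spread
    return v


# Hypothetical two-stage design and two upsets hitting different copies.
stages = [lambda v: (v + 3) & 0xFF, lambda v: v ^ 0x0F]
faults = {(0, 0), (1, 1)}

golden = run_cgtmr(stages, 5, set())
coarse = run_cgtmr(stages, 5, faults)
fine = run_fgtmr(stages, 5, faults)
```

With upsets in two different copies, the coarse-grain version ends up with two wrong copies and votes for the wrong value, while the fine-grain version masks each upset at the stage where it occurs and still produces the fault-free result.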

Reliability-Probability Model for Granularity[8]

A simple reliability-probability model has been used to analyse the effect of fine granularity on the TMR technique. The result, shown in Fig.4, indicates that the reliability of the different fine-grain partitionings is better than that of the coarse-grain one.
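The trend can be reproduced with the classical TMR reliability expression R_TMR = 3R^2 - 2R^3 from [6]. The sketch below is not the model of [8] behind Fig.4; it assumes perfect voters, independent failures, and a design with simplex reliability R that splits into n equally reliable parts, each protected by its own TMR. The function names and the example value R = 0.9 are illustrative.

```python
def tmr_reliability(r):
    """Probability that at least 2 of 3 independent copies work (perfect voter)."""
    return 3 * r**2 - 2 * r**3


def partitioned_tmr_reliability(r_system, n):
    """Reliability when the design is split into n equal parts, each with TMR.

    Assumes each part has reliability r_system ** (1 / n), so that the
    unprotected series system has reliability r_system.
    """
    r_part = r_system ** (1.0 / n)
    return tmr_reliability(r_part) ** n


R = 0.9                                              # assumed simplex reliability
coarse = partitioned_tmr_reliability(R, 1)           # one TMR around everything
by_grain = {n: partitioned_tmr_reliability(R, n) for n in (1, 2, 5, 10, 100)}
```

Under these assumptions the reliability grows monotonically with n, from 0.972 for coarse-grain TMR at R = 0.9 towards 1 for very fine partitioning. Real designs fall short of this because each added voter is itself vulnerable and costs area, which is why the level of granularity has to be chosen against the requirements of the system.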

CONCLUSION

This paper presented the different mitigation techniques employed for reliable operation of SRAM FPGAs in the space environment. It indicates that scrubbing cannot be the only method used to mitigate SEUs; a combination of reconfiguration and redundancy techniques is the preferred way to deal with SEU effects in SRAM FPGAs. The reliability of SRAM FPGAs can be improved still further by changing the granularity of redundancy from coarse grain to fine grain.

References

ROOSTA, R.: A Comparison of Radiation-Hard and Radiation-Tolerant FPGAs for Space Applications. NASA Electronic Parts and Packaging Program, NASA and Jet Propulsion Laboratory, December 2004.

RUHL, K.: An introduction to Space Weather. Computer Scientists, March 2010.

MAURER, R. H., M. E. FRAEMAN, M. N. M. and D. R. ROTH: Harsh Environments: Space Radiation Environment, Effects, and Mitigation. Johns Hopkins APL Technical Digest, 28(1), 2008.

NASA/GSFC Radiation Effects and Analysis home page.

BERG, M., C. POIVEY, D. PETRICK, D. ESPINOSA, A. LESEA, K. LABEL, M. FRIENDLICH, H. KIM and A. PHAN: Effectiveness of Internal Versus External SEU Scrubbing Mitigation Strategies in a Xilinx FPGA: Design, Test, and Analysis. Nuclear Science, IEEE Transactions on, 55(4):2259-2266, Aug. 2008.

LYONS, R. E. and W. VANDERKULK: The Use of Triple-Modular Redundancy to Improve Computer Reliability. IBM Journal of Research and Development, 6(2):200-209, April 1962.

YU, A. J. and G. G. LEMIEUX: FPGA Defect Tolerance: Impact of Granularity.

NIKNAHAD, M., O. SANDER and J. BECKER: FGTMR - Fine Grain Redundancy Method for Reconfigurable Architectures under high Failure Rates. NASNIT 2011.