SRAM-based reprogrammable FPGAs, which combine high flexibility with high performance, have become increasingly important for space applications. As technology advances and device feature sizes shrink into the nanometre range, FPGAs used in the space environment become more susceptible to radiation. Radiation can cause Single Event Upsets (SEUs), which are non-destructive soft errors. They can appear as transient pulses in logic or support circuitry, or as bit flips in memory cells or registers; in SRAM cells, such bit flips change the function of logic elements within the FPGA. Various methodologies have been proposed in the literature to reduce these effects and mask them so that the device operates correctly. In this paper we study the redundancy techniques that have been proposed to improve the reliability of systems designed with SRAM FPGAs.
Keywords |
SEE, SEU, TMR, CGTMR, FGTMR. |
INTRODUCTION |
Over the past few years there has been a strong tendency to replace hardened electronics for space with
commercial off-the-shelf (COTS) components fabricated in mainstream commercial CMOS technology [1]. A Field
Programmable Gate Array (FPGA) is a semiconductor logic device that can be reprogrammed, which makes it
possible to realize a logic function after the manufacturing process. With satellite lifetimes now extending far
beyond 10 years, much longer than the validity of previous standards, reprogrammability of the system becomes a
stringent requirement. With few software solutions possible, FPGAs are the only practical option. A strong
expectation is that scaled technologies should inherently be more radiation tolerant. To make an FPGA fault
tolerant, various methodologies have been proposed in the literature. Of these, redundancy techniques play a vital
role and, in conjunction with reconfiguration techniques, drastically improve the reliability of the system in the
space environment. |
Section II of the paper explores the various FPGA technologies and focuses on the suitability of SRAM-based
FPGAs for space applications. Radiation effects on SRAM FPGAs are discussed in Section III. Section IV covers the
mitigation of system errors by common test and mitigation techniques. Section V then focuses on SEU mitigation
through the redundancy techniques proposed in the literature. The change from a coarse-grain to a fine-grain
approach is discussed in Section VI, and the paper concludes with Section VII. |
FPGA TECHNOLOGIES |
Field Programmable Gate Arrays (FPGAs), with in-field reprogrammability, low non-recurring engineering (NRE)
costs, and relatively short design cycles, are becoming a key component of digital systems. Recently, there has been
great interest in using FPGAs within spacecraft, with mixed success. FPGAs are classified by the process
technology used to build the memory cells that program the device. Of these, antifuse and SRAM-based
FPGAs are widely used for the implementation of digital systems. |
Antifuse FPGA: |
Antifuse cells are non-volatile and one-time programmable (OTP). Instead of breaking a metal connection by
passing current through it (as in fuse technology), a link is grown to make a connection. Antifuse-based devices are
very good for high-reliability applications because of their unlimited data-retention time. However, they need large
programming transistors on the device and, because a formed link is permanent, design changes are not possible
after programming and the FPGA cannot be reprogrammed. |
SRAM-based FPGA: |
SRAM programming technology uses static memory (SRAM) cells for programming. In SRAM-based devices,
static memory cells, such as the one shown in Fig. 1, provide configurability for FPGAs. SRAM cells are used both in
the interconnect and in implementing logic functions. In the interconnect, SRAM cells drive the select lines of
multiplexers in order to connect logic components in the appropriate way. To implement logic functions, SRAM-based
FPGAs use Look-Up Tables (LUTs). |
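To make the role of the configuration bits concrete, the following is a minimal Python sketch of a LUT. It is an illustration only and is not taken from any vendor documentation: the class name `Lut` and its method `evaluate` are hypothetical. The stored bits play the role of the SRAM cells, and the inputs act as the select lines of a multiplexer that picks one stored bit.

```python
from typing import List, Sequence


class Lut:
    """Hypothetical model of a k-input look-up table (LUT).

    The 2**k configuration bits stand in for SRAM cells; the inputs
    form an address that selects one of them, like a multiplexer.
    """

    def __init__(self, config_bits: Sequence[int]) -> None:
        n = len(config_bits)
        # A k-input LUT needs exactly 2**k configuration bits.
        if n < 2 or n & (n - 1):
            raise ValueError("number of configuration bits must be a power of two")
        self.bits: List[int] = [b & 1 for b in config_bits]
        self.k = n.bit_length() - 1  # number of inputs

    def evaluate(self, *inputs: int) -> int:
        """Return the stored bit addressed by the input values."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = 0
        for value in inputs:  # first input is the most significant address bit
            address = (address << 1) | (value & 1)
        return self.bits[address]


# Truth tables listed for inputs (a, b) = 00, 01, 10, 11.
and_gate = Lut([0, 0, 0, 1])
xor_gate = Lut([0, 1, 1, 0])
```

The same structure implements any 2-input function simply by loading different configuration bits, which is why the logic function of an SRAM-based FPGA depends entirely on the contents of its configuration memory.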
|
SRAM-based FPGAs are widely used in different applications due to their reprogrammability. The SRAM bits are
loaded with a user-defined configuration at power-up, and this can be done an indefinite number of times.
Unlike other programming technologies, SRAM-based cells use standard CMOS processing technology. As a
result, the latest available CMOS technology can be applied to SRAM-based FPGAs, which therefore benefit from
increased integration, higher speeds and lower dynamic power consumption. |
Advantages of SRAM FPGAs |
• SRAM-based FPGAs can be created using a standard CMOS process, which means they are available at the
forefront of each new technology node. |
• Antifuse-based FPGAs, by contrast, require extra processing steps during the manufacturing process.
Therefore, SRAM-based FPGAs are more suitable for reconfigurable computing applications, where the device
function is changed dynamically according to the needs of the design. |
CHALLENGES FOR FPGAs IN SPACE ENVIRONMENT |
The space environment [2] differs from the terrestrial one. It contains electrons, protons and heavy
ions, which frequently affect the operation of electronic devices in space. The radiation environment and its effects on
electronic systems [3], [4] are shown in Fig. 2. |
|
The flow in Fig. 2 runs from the radiation environment (top) to the effects that these interactions lead to
(bottom). The bottom layer shows the three effect categories: total ionizing dose, displacement damage, and single
event effects. |
Total Ionizing Dose |
The term total ionizing dose (TID) refers to the dose deposited in the electronics through ionization effects only. All
types of electronics are susceptible to ionization, but the charge generated inside the semiconductor material can
quickly be collected and removed without ill effect. In general, TID effects are mitigated through proper use of
shielding materials. |
Displacement Damage |
The second class of cumulative radiation effects is displacement damage. If sufficiently high energy is
supplied to an atom, it can overcome the energy binding the atom in the crystalline lattice of the material. When this
occurs, the atom is "displaced" from its normal position to some other end location. Unless the end location is an
exact duplicate of the former position, the regular order of the crystalline lattice is disturbed. In general,
displacement effects are also mitigated through proper use of shielding materials. |
Single Event Effects (SEE) |
Single Event Effects (SEEs) are caused by a single energetic particle striking an electronic circuit. They can
cause transient errors and can take many forms. There are three main subclasses of SEEs: Single Event
Upsets (SEUs), Single Event Functional Interrupts (SEFIs), and Single Event Transients (SETs). |
Single Event Upsets (SEUs) are soft errors and are non-destructive. They normally appear as transient pulses in
logic or support circuitry, or as bit flips in memory cells or registers. Several types of hard, potentially destructive
errors can also appear; for example, Single Event Latchup (SEL) results in a high operating current, above device
specifications, and must be cleared by a power reset. When impacting ions induce voltage pulses on combinational
circuitry in a device, the effects are known as Single Event Transients (SETs). |
The main concern for FPGAs is the effect of radiation on SRAM cells. If an ion strike of sufficient energy occurs near
one of the transistors of a cell, the bit value stored in the cell can change, or flip. The major effect of such an upset
is the flipping of bits in the configuration bit-stream, which changes the functionality of the SRAM FPGA. |
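The effect of a configuration upset can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not a description of a real device: a 2-input LUT is represented as a list of four configuration bits, and the helper names `lut_eval`, `flip_bit` and `truth_table` are hypothetical.

```python
from typing import List


def lut_eval(config: List[int], a: int, b: int) -> int:
    """Evaluate a 2-input LUT: inputs (a, b) address one configuration bit."""
    return config[(a << 1) | b]


def flip_bit(config: List[int], index: int) -> List[int]:
    """Return a copy of the configuration with one bit inverted (a modelled SEU)."""
    upset = list(config)
    upset[index] ^= 1
    return upset


def truth_table(config: List[int]) -> List[int]:
    """Outputs for inputs (a, b) = 00, 01, 10, 11."""
    return [lut_eval(config, a, b) for a in (0, 1) for b in (0, 1)]


and_config = [0, 0, 0, 1]               # intended function: a AND b
upset_config = flip_bit(and_config, 1)  # modelled upset in the bit for a=0, b=1
```

After the single modelled upset, the LUT no longer computes `a AND b`: for every input combination it now simply returns `b`. The circuit keeps running, but with a different logic function, which is what makes configuration upsets so harmful.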
SEU MITIGATION TECHNIQUES |
Various techniques are employed to reduce the effects of radiation on SRAM FPGAs. These mitigation
techniques are broadly classified into two categories: reconfiguration-based techniques and redundancy-based techniques.
Reconfiguration-based Techniques
Reconfiguration, also known as scrubbing, periodically rewrites the original configuration to the memory so that
single event upsets (SEUs) due to radiation are corrected. Scrubbing techniques are classified into two types: blind
scrubbing and read-back scrubbing. |
Blind-Scrubbing |
Blind scrubbing is periodic scrubbing in which the configuration memory is rewritten irrespective of whether
any upsets have occurred in the system. Because system operation is periodically stopped for reconfiguration,
system performance may degrade if the minimum scrubbing cycle duration is not achieved. |
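As an illustration of the idea, the following Python sketch simulates blind scrubbing on a toy configuration memory. It is a sketch under assumptions, not a model of any real configuration interface: the memory is a list of integer frames, a "golden" copy holds the original configuration, and the function names `blind_scrub` and `run` are hypothetical.

```python
from typing import Dict, List, Tuple


def blind_scrub(config_memory: List[int], golden: List[int]) -> int:
    """Rewrite every frame from the golden copy without checking for upsets.

    Returns the number of frames written, which is always the full memory size.
    """
    for i, frame in enumerate(golden):
        config_memory[i] = frame
    return len(golden)


def run(cycles: int, scrub_period: int,
        upsets: Dict[int, Tuple[int, int]], golden: List[int]) -> List[int]:
    """Simulate `cycles` ticks; `upsets` maps tick -> (frame index, bit position)."""
    memory = list(golden)
    for tick in range(1, cycles + 1):
        if tick in upsets:
            frame, bit = upsets[tick]
            memory[frame] ^= 1 << bit  # modelled SEU: invert one stored bit
        if tick % scrub_period == 0:
            blind_scrub(memory, golden)  # rewrite regardless of upsets
    return memory
```

In this model an upset survives only until the next scrub, so the scrub period bounds how long a corrupted configuration can stay active, at the cost of rewriting the whole memory even when nothing is wrong.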
Read-back Scrubbing |
Read-back scrubbing makes use of a fault detector that refreshes the configuration memory contents only when a
sensitive-bit upset or a certain number of non-sensitive upsets is detected. It involves both reading back and
reloading the configuration memory contents. |
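The decision rule described above can be sketched in Python as follows. This is a simplified illustration: detection is modelled by comparing the read-back data with a golden copy (real detectors may rely on checksums or error-correcting codes instead), and the names `find_upsets`, `readback_scrub`, `sensitive` and `threshold` are hypothetical.

```python
from typing import List, Set, Tuple

BitPos = Tuple[int, int]  # (frame index, bit position)


def find_upsets(memory: List[int], golden: List[int], width: int = 8) -> List[BitPos]:
    """Read back every frame and list the bit positions that differ from golden."""
    upsets = []
    for frame, (actual, expected) in enumerate(zip(memory, golden)):
        diff = actual ^ expected
        for bit in range(width):
            if (diff >> bit) & 1:
                upsets.append((frame, bit))
    return upsets


def readback_scrub(memory: List[int], golden: List[int],
                   sensitive: Set[BitPos], threshold: int) -> bool:
    """Reload the configuration only if a sensitive bit is upset or at least
    `threshold` non-sensitive upsets have accumulated.

    Returns True when a reload took place.
    """
    upsets = find_upsets(memory, golden)
    sensitive_hit = any(u in sensitive for u in upsets)
    non_sensitive = sum(1 for u in upsets if u not in sensitive)
    if sensitive_hit or non_sensitive >= threshold:
        memory[:] = golden  # reload the whole configuration in place
        return True
    return False
```

Compared with blind scrubbing, this approach pays for a read-back pass on every check but rewrites the memory only when the detected upsets justify it.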
Limitations of Reconfiguration-based Techniques |
For critical applications, relying on reconfiguration techniques alone to cope with system failures may not be a
viable choice [5]. Scrubbing, that is, reloading the configuration memory, temporarily halts system operation, which
can be fatal for critical applications. In addition to scrubbing, redundancy-based techniques, which detect, mask
and correct errors, are essential to cope with high single event upset rates [10]. The focus of the paper is therefore
on redundancy-based techniques, which support reliable system operation even in the presence of SEUs. |
REDUNDANCY-BASED TECHNIQUES |
Redundancy techniques are widely used to build reliable systems that continue to operate satisfactorily in the
presence of faults in their components. Redundancy can take many forms, such as hardware, software, time, and
spatial redundancy. Hardware redundancy techniques, which use additional hardware components, are used
extensively for mitigating SEU effects in SRAM FPGAs. |
TRIPLE MODULAR REDUNDANCY (TMR) [6]: |
TMR is a widely used redundancy technique for mitigating SEUs in digital circuits. A TMR scheme uses three
identical logic blocks performing the same task; the outputs of the blocks are compared by a majority voter, which
generates the final output, as shown in Fig. 3. |
|
TMR is generally used in ASICs to protect memory elements from radiation effects. Similarly, FPGA
configuration memory is protected using this technique, since otherwise any modification due to SEUs may affect
the entire FPGA device. For ground-based complex systems, TMR might be able to mask and correct single failures.
However, when an SEU hits the voter, the voter may no longer function correctly; hence TMR is implemented with a
triplicated voter. |
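The voting scheme can be sketched in Python as follows. This is a behavioural illustration only, not a hardware description: faults are modelled as XOR masks applied to a copy's output, and the names `majority`, `tmr` and `tmr_triplicated_voter` are hypothetical.

```python
from typing import Callable, Tuple

Triple = Tuple[int, int, int]


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def tmr(module: Callable[[int], int], x: int,
        fault_mask: Triple = (0, 0, 0)) -> int:
    """Run three copies of `module` and vote once.

    fault_mask[i] is XORed onto the output of copy i to model an upset.
    """
    outputs = [module(x) ^ m for m in fault_mask]
    return majority(*outputs)


def tmr_triplicated_voter(module: Callable[[int], int], x: int,
                          fault_mask: Triple = (0, 0, 0),
                          voter_fault: Triple = (0, 0, 0)) -> Triple:
    """Each of three voters sees all three copies; returns the three voter outputs.

    voter_fault[i] is XORed onto the output of voter i to model an upset voter.
    """
    outputs = [module(x) ^ m for m in fault_mask]
    v0, v1, v2 = (majority(*outputs) ^ v for v in voter_fault)
    return (v0, v1, v2)


def increment(x: int) -> int:
    """Example module under protection (an arbitrary 8-bit function)."""
    return (x + 1) & 0xFF
```

With a single voter, an upset in the voter itself corrupts the result directly. With three voters, the three voter outputs can be voted on again by the next triplicated stage, so a single faulty voter is masked in the same way as a single faulty copy. Note that two copies failing on the same bit still defeat the scheme.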
With advances in device technology and device dimensions shrinking into the nanometre range, upset rates in SRAM
FPGAs are increasing, as shown in Table 1. For high upset rates, simple replication of the entire system, i.e. modular
redundancy, may not be sufficient for reliable system performance. |
|
COARSE-GRAIN VS FINE-GRAIN |
The granularity of a fault-tolerance mechanism determines how the system is divided into modules for the purpose of
applying the technique.
Conventional TMR is referred to as Coarse-Grain TMR (CGTMR). CGTMR approaches fault tolerance at the level of
the whole design and does not deal with its fine-grain parts. In other words, the design is triplicated, and when an
upset occurs, it is not clear exactly where in the design it has taken effect. This is especially an obstacle when more
than one copy is affected, which is often the case with multiple SETs. A coarse-grain view of redundancy might not be
sufficient for scenarios with high rates of SETs, SEUs and multiple-bit upsets (MBUs). |
Fine-Grain Redundancy |
If the system is able to detect failures and upsets locally, it can deal with high failure rates. This
leads to a more localized view of the application of fault-tolerance methods, categorized as Fine-Grain
Redundancy Techniques. |
The basic advantage of Fine-Grain TMR (FGTMR) is that, in a homogeneous architecture such as an FPGA, TMR can be
applied to fine-grain homogeneous parts of the design, such as LUTs and CLBs, instead of to the whole design. The
level of granularity can be chosen according to the requirements of the system. |
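The difference between the two granularities can be illustrated with a small Python sketch. It is a behavioural model under stated assumptions, not the method of any cited work: the design is a hypothetical two-stage pipeline, voters are assumed ideal, faults are XOR masks on the output of one stage in one copy, and the names `coarse_grain_tmr` and `fine_grain_tmr` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[int], int]
Faults = Dict[Tuple[int, int], int]  # (copy, stage index) -> XOR mask modelling an upset


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def coarse_grain_tmr(stages: List[Stage], x: int, faults: Faults) -> int:
    """Triplicate the whole pipeline and vote once at the final output."""
    results = []
    for copy in range(3):
        value = x
        for s, stage in enumerate(stages):
            value = stage(value) ^ faults.get((copy, s), 0)
        results.append(value)
    return majority(*results)


def fine_grain_tmr(stages: List[Stage], x: int, faults: Faults) -> int:
    """Triplicate each stage and vote after every stage (ideal voters assumed)."""
    value = x
    for s, stage in enumerate(stages):
        outputs = [stage(value) ^ faults.get((copy, s), 0) for copy in range(3)]
        value = majority(*outputs)
    return value


# Hypothetical 8-bit pipeline: add 3, then double.
pipeline: List[Stage] = [lambda v: (v + 3) & 0xFF, lambda v: (v * 2) & 0xFF]

# Two upsets in different copies and different stages.
two_upsets: Faults = {(0, 0): 0x08, (1, 1): 0x10}
```

For input 5 the fault-free result is 16. With the two upsets above, two of the three coarse-grain copies deliver a wrong final value and the single vote fails, whereas the fine-grain version masks each upset at the stage where it occurs. Fine-grain voting still fails if two copies of the same stage are upset on the same bit.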
Reliability-Probability Model for Granularity[8] |
A simple reliability-probability model has been used to analyse the effect of fine granularity on the TMR technique.
The result, shown in Fig. 4, indicates that the reliability of the fine-grain configurations is higher than that of the
coarse-grain one. |
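The trend can be reproduced with the textbook TMR reliability expression. The Python sketch below is not necessarily the model used in [8]; it assumes independent failures, ideal (fault-free) voters, and a system split into equal partitions, each with reliability R^(1/n) so that the unprotected system reliability stays R. The function names are hypothetical.

```python
def tmr_reliability(r: float) -> float:
    """Reliability of a triplicated module with an ideal voter.

    The module works if at least two of three copies work:
    r**3 + 3 * r**2 * (1 - r) = 3 * r**2 - 2 * r**3.
    """
    return 3 * r**2 - 2 * r**3


def partitioned_tmr_reliability(system_r: float, partitions: int) -> float:
    """Reliability when the system is split into `partitions` equal parts,
    each protected by its own TMR with an ideal voter.

    Assumption: each part has reliability system_r ** (1 / partitions).
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    module_r = system_r ** (1.0 / partitions)
    return tmr_reliability(module_r) ** partitions


# partitions = 1 corresponds to coarse-grain TMR of the whole design.
coarse = partitioned_tmr_reliability(0.9, 1)
fine = partitioned_tmr_reliability(0.9, 10)
```

Under these assumptions, `fine` is larger than `coarse`, in line with the trend reported for Fig. 4. The sketch ignores voter failures and voter area, both of which grow with the number of partitions, so it overstates the benefit of very fine granularities.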
|
CONCLUSION |
The paper presented the different mitigation techniques employed for reliable operation of SRAM FPGAs in the space
environment. It indicates that scrubbing cannot be the only method used to mitigate SEUs; rather, a combination
of reconfiguration and redundancy techniques is the preferred way to deal with SEU effects in SRAM FPGAs. The
reliability of SRAM FPGAs can be improved still further by changing the granularity from coarse-grain to fine-grain. |
References |
- ROOSTA, R.: A Comparison of Radiation-Hard and Radiation-Tolerant FPGAs for Space Applications. NASA Electronic Parts and Packaging Program, NASA and Jet Propulsion Laboratory, December 2004.
- RUHL, K.: An introduction to Space Weather. Computer Scientists, March 2010.
- MAURER, R. H., M. E. FRAEMAN, M. N. M. and D. R. ROTH: Harsh Environments: Space Radiation Environment, Effects, and Mitigation. Johns Hopkins APL Technical Digest, 28(1), 2008.
- NASA/GSFC Radiation Effects and Analysis home page.
- BERG, M., C. POIVEY, D. PETRICK, D. ESPINOSA, A. LESEA, K. LABEL, M. FRIENDLICH, H. KIM and A. PHAN: Effectiveness of Internal Versus External SEU Scrubbing Mitigation Strategies in a Xilinx FPGA: Design, Test, and Analysis. IEEE Transactions on Nuclear Science, 55(4):2259-2266, August 2008.
- LYONS, R. E. and W. VANDERKULK: The Use of Triple-Modular Redundancy to Improve Computer Reliability. IBM Journal of Research and Development, 6(2):200-209, April 1962.
- YU, A. J. and G. G. LEMIEUX: FPGA Defect Tolerance: Impact of Granularity.
- NIKNAHAD, M., O. SANDER and J. BECKER: FGTMR - Fine Grain Redundancy Method for Reconfigurable Architectures under High Failure Rates. NASNIT 2011.
|