ABSTRACT
Due to the shrinking of feature size and reduction in supply voltages, nanoscale circuits have become more susceptible to radiation-induced transient faults. In this paper, we present a symbolic framework based on BDDs and ADDs that enables analysis of combinational circuit reliability from different aspects: output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns. This is demonstrated by the set of experimental results, which show that the mean output error susceptibility can vary from less than 0.1%, for large circuits and small glitches, to about 30% for very small circuits and large enough glitches. The results obtained with the proposed symbolic framework are within 7% average error and up to 5000X speedup when compared to HSPICE detailed circuit simulation. The framework can be used for selective gate sizing targeting radiation hardening, which is done only for gates with error impact exceeding a certain threshold. Using such a technique, soft error rate (SER) can be reduced by 23-67% for various threshold values, when applied to a subset of ISCAS’85 and mcnc’91 benchmarks.

Categories and Subject Descriptors: B.8.1 Reliability, Testing, and Fault-Tolerance

General Terms: Reliability

Keywords: SER, reliability, symbolic techniques

1. INTRODUCTION
For the last few decades, the main factors driving the design of digital systems have been cost, performance, and, more recently, power consumption. However, with technology scaling, reliable operation of digital systems is being severely challenged, thus pointing to the use of fault-tolerance-driven design methodologies [1]. Due to reduction in device feature size and supply voltage, the sensitivity to radiation induced transient faults of digital systems increases dramatically [2]. When a radiation event causes a charge generation large enough to flip the output of a gate, a single-event transient (SET) is generated, which, if propagated and latched into a memory element, will lead to a single event upset (SEU) or a soft error. Soft errors are measured by the soft error rate (SER) in FITs (failure-in-time), which is defined as one failure in 10^9 hours.

Traditionally, soft errors have been a much greater concern in memories than in combinational logic, due to three factors that prevented logic from becoming more susceptible to soft errors [1]:
• logical masking – to be latched, a SET needs to propagate on a sensitized path from the location where it originates to a latch;
• electrical masking – due to the electrical properties of the gates the glitch is passing through, it can be attenuated or even completely masked before it reaches the latch;
• latching-window masking – the glitch is latched only if it reaches the latch and satisfies the setup and hold time conditions.

In this work, we estimate the likelihood that a transient fault will lead to a soft error. Our main goal is to allow for symbolic modeling and efficient estimation of the susceptibility of a combinational logic circuit to soft errors. We further use this framework to reduce the cost of radiation hardening techniques by selectively resizing the gates that have the largest impact on circuit error.

The rest of this paper is organized as follows. In Section 2 we outline the contribution of our work. In Section 3 we give an overview of related work. Section 4 describes the assumptions and the notations we use in the rest of the paper. Section 5 presents in more detail the mathematical model that lies behind our framework. In Section 6, we describe our symbolic modeling methodology, while in Section 7 we describe a practical method for determining circuit susceptibility to soft errors and how this can be applied to harden the circuit. In Section 8, we report experimental results for a set of common benchmarks. Finally, with Section 9 we conclude our work and provide some directions for future work.

2. PAPER CONTRIBUTION
In order to estimate the probability of errors in combinational logic, our symbolic tool uses Binary Decision Diagrams (BDDs) and Algebraic Decision Diagrams (ADDs). BDDs [3] provide an efficient and canonical representation for Boolean functions. ADDs are presented in [4] as a class of symbolic models and associated algorithms applicable not only to arithmetic, but also to many algebraic structures. In comparison to [1,5-7], where the impact of logical, electrical and latching-window masking is evaluated separately and then merged into the final reliability measure, our approach provides a unified treatment of these three factors, while including their joint dependency on input patterns and circuit topology. In our work, by using BDDs and ADDs, the information about the masking factors is implicitly generated inside the decision diagrams, and therefore allows for efficient concurrent computation of output error susceptibility due to hits on various internal nodes.

The unified treatment of three masking factors is important due to the following:
• logical masking depends on inputs and circuit topology since, for different input vectors, different paths in the circuit are sensitized;
• electrical masking (glitch attenuation) depends on the gates through which glitch propagates, and thus depends on logical masking;
• the probability of latching the glitch depends on the glitch size at the output, which is a function of:
  a) the initial size of the glitch and the attenuation on the sensitized paths;
  b) the size and relative arrival time of reconvergent glitches, which affects the amplitude/duration of the resulting glitch.

Treating these three factors as independent is incorrect, since they all depend on the inputs and on the sensitized paths from the hit gate to the outputs.

To illustrate these claims, we show an example in Figure 1. We consider separately the effect of logical masking, on one hand, and the effect of electrical and latching-window masking, on the other hand, for the ISCAS'85 benchmark C17. Our framework allows for joint, unified modeling of logical, electrical and latching-window masking (UM column), as well as separate logical masking only (LM column) or electrical and latching-window masking only (ELWM column). In the case of C17, for gate G2 there are two paths that lead to output 7, and the separate computation of different masking factors leads to overestimation of the probability of error, as shown in the table in Figure 1. The results in Figure 1 are presented for three different input vector probability distributions that exercise the two paths from G2 to output 7 with different probabilities. The last column in the table includes the results of our model when all three masking factors are considered together. As can be seen from these results, multiplying the values in column LM by the values in column ELWM overestimates the probability of error (column LM+ELWM) by as much as 100%.

![Figure 1. Example circuit C17 and results for separate and unified treatment of masking factors.](image)
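The effect discussed above can be reproduced with a toy calculation. The following Python sketch uses purely hypothetical numbers (they are not the values of Figure 1) and assumes that, for each input vector, we know whether a path is sensitized and how likely a propagated glitch is to be latched. It only illustrates why averaging the factors separately and then multiplying them can differ from combining them per input vector.

```python
# Toy illustration: separate vs. unified treatment of masking factors.
# All numbers are hypothetical and are not taken from Figure 1.

# Probability of each input vector (assumed distribution).
p_vec = [0.25, 0.25, 0.25, 0.25]
# 1 if the vector sensitizes a path from the hit gate to the output.
sensitized = [1, 1, 0, 0]
# Probability that the glitch survives attenuation and is latched,
# computed for each vector as if its path were sensitized.
latched = [0.1, 0.1, 0.8, 0.8]

# Separate treatment: average each factor on its own, then multiply.
lm = sum(p * s for p, s in zip(p_vec, sensitized))
elwm = sum(p * l for p, l in zip(p_vec, latched))
separate = lm * elwm

# Unified treatment: combine the factors per input vector, then average.
unified = sum(p * s * l for p, s, l in zip(p_vec, sensitized, latched))

print(f"LM = {lm:.3f}, ELWM = {elwm:.3f}")
print(f"LM x ELWM = {separate:.3f}, unified = {unified:.3f}")
```

In this toy case the vectors that sensitize a path are also the ones with strong attenuation, so the product of the separate averages is larger than the unified value.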

To this end, the contributions of this work consist of:

- Fast and accurate estimation of propagated glitch duration and amplitude when compared to HSPICE detailed circuit simulation (up to 5000X speedup with 7% average error);
- ADD- and BDD-based symbolic circuit reliability modeling that unifies logical, electrical, and latching window masking treatment and their interdependencies due to various input patterns;
- Characterization of output error susceptibility and internal gate error impact under various input distributions;
- Radiation hardening based on selective gate resizing which shows SER reduction of up to 67% with 17% area overhead.

3. RELATED WORK

Intensive research has been done so far in the area of analysis and modeling of the effect of transient faults in logic circuits [5-12]. One obvious approach is to inject the fault into the given node of the circuit and simulate the circuit for different input vectors in order to find whether the fault propagates [11-12]. However, this approach becomes intractable for larger circuits and larger number of inputs.

In [5-7], the authors separate the analysis of the three masking factors and include different heuristics to speed up the evaluation of the soft error susceptibility of logic circuits. Recently, several symbolic models have been developed to estimate the susceptibility of logic circuits to soft errors. Work by Krishnaswamy et al. in [8] uses probabilistic transfer matrices to represent gate functionality. However, their work focuses only on logical masking for given gate output error probabilities, without considering electrical and latching-window masking. In [9], the authors give a mathematical model to estimate electrical masking when a transient fault propagates through the gate, but do not model logical and latching-window masking. The authors of [10] use BDDs to represent sensitized path information, as well as upset events. However, their approach appears to rely on explicit enumeration of BDDs corresponding to all input conditions and assumes simple superposition of reconvergent glitches, without considering their possible mutual masking. Furthermore, since it doesn’t rely on using ADDs, the approach in [10] cannot model arbitrary input distributions which can be handled with ADDs via Dynamic Markov Models [13].

The approaches in [1,5-7,10] belong to one of the following groups:

1. Logical and electrical/latching-window masking are considered independent across various input streams, thus corresponding to the case when their probabilities are multiplied (as in Figure 1, column LM+ELWM) [1,5];
2. Latching probabilities are determined and summed up for all sensitized paths, irrespective of their possible overlap or masking of SETs propagated on reconvergent paths [6,7,10].

Approaches falling under (1) above fail in both cases, while those under (2) fail in the case when reconvergent glitches exist. This stems from the fact that reconvergent glitches can cancel one another, or the resulting glitch(es) can be smaller or larger than the original ones.

4. ASSUMPTIONS AND NOTATIONS

We show in Figure 2 an example of a target circuit, including the combinational logic, as well as its input and output latches. To this end, the purpose of our work is to estimate the probability that a pulse or glitch, occurring due to some transient physical phenomenon at an internal gate G of the circuit, will result in an error at output F. In our framework, we capture all gate-output combinations, i.e., we determine the probability of a soft error at any output due to a fault originating at any internal gate. Figure 3 shows the propagation of the glitch, that is, the shape at the output of gate G where it occurs (a), at the input and output of a gate G’ on the sensitized path between gate G and latched output F (b), and at the output F (c).

![Figure 2. A target combinational circuit.](image)

At the output of gate G, the glitch has an initial duration $d_{init}$ and initial amplitude $a_{init}$. The duration at the output of the gate is always measured at the switching threshold voltage ($V_{th}$) [14] of the downstream gate. Therefore, according to Figure 3:

$$d_{init} = t_2 - t_1 \quad (1)$$

At the input of gate G’, the glitch has amplitude $a_{in}$ and duration $d_{in}$, and at its output it has amplitude $a_{out}$ and duration $d_{out}$. Durations $d_{in}$ and $d_{out}$ are in this case measured at the switching threshold voltage of gate G’. However, for all output neighbors of gate G’, $d_{out}$ will be recomputed according to their switching thresholds. Propagation delay of gate G’ is $t_{prop}$. To find out if the glitch propagates through gate G’, and to compute the new amplitude and duration, we use the methodology from [9]. Finally, at the latched output F, the glitch has amplitude $A$ and duration $D$. Switching threshold voltage of the latch, at which $D$ is measured, is $V_{\text{S latch}}$. Since there is a delay from gate $G$ to output $F$ (denoted by $T_2$), the time when the glitch at output $F$ becomes larger than $V_{\text{S latch}}$ is $t'_1$, and the time when it becomes lower than $V_{\text{S latch}}$ is $t'_2$:

$$T_2 = t'_1 - t_1 \quad (2)$$

$$D = t'_2 - t'_1 \quad (3)$$

The duration $D$, as well as the amplitude $A$, can have different values at output $F$, depending on the various sensitized paths from $G$ to $F$. For a given initial glitch, the set of different values of duration $D$ at output $F$ for various sensitized paths is denoted by $\{D_i\}$. The delay $T_2$ depends on the sensitized path (i.e., on the gate delays on that path) from gate $G$ to output $F$, while the delay from input latches to gate $G$ ($T_1$) depends on the path from inputs to gate $G$. When computing latching-window masking, we assume the worst case delay $T_1$ for which the latching window probability is maximized, as described next.

Since we are interested in the propagation of a glitch in the time interval between two rising edges of the clock signal, we can take $[0, T_{\text{clk}}]$ as the interval of observation. For a signal to be latched, it needs to be stable during the setup time $t_{\text{setup}}$, before the rising edge of the clock and hold time $t_{\text{hold}}$ after the rising edge of the clock. In other words, it needs to be stable inside interval $[T_{\text{clk}} - t_{\text{setup}}, T_{\text{clk}} + t_{\text{hold}}]$.

5. MATHEMATICAL MODEL

This section describes the conditions that are needed for a transient glitch at the output of an internal gate to be propagated to the output and latched, such that a soft error is registered. We detail the interdependency between logical, electrical, and latching-window masking, and describe their joint model.

5.1. Necessary conditions

If a radiation event results in a glitch at the output of gate $G$, in order to be latched at the output $F$, the following (latching) condition needs to be satisfied:

- the glitch has to appear at the output $F$ on time to be latched (i.e., it satisfies the setup time and hold time conditions).

This condition implies two conditions for the size of the glitch at the output:

- the amplitude of the glitch at output $F$ must be larger than the switching threshold of the latch (if the correct output value is “0”) or smaller than the switching threshold (if the correct output value is “1”);
- the duration of the glitch at output $F$ has to be larger than the sum of setup and hold time of the latch.

As mentioned in Section 4, the switching threshold of the latch at the output $F$ is $V_{\text{S latch}}$. To satisfy the latching condition, the time $t'_1$ at which the glitch at output $F$ reaches $V_{\text{S latch}}$ must satisfy:

$$t'_1 \leq T_{\text{clk}} - t_{\text{setup}} \quad (4)$$

In addition, the time $t'_2$ at which the glitch at output $F$ falls back below $V_{\text{S latch}}$ must satisfy:

$$t'_2 \geq T_{\text{clk}} + t_{\text{hold}} \quad (5)$$

with duration $D$ of the glitch at output $F$ given by equation (3). Thus, we can write the condition for the time when glitch needs to occur at gate $G$ to be latched at output $F$, as:

$$t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D, T_{\text{clk}} - t_{\text{setup}} - T_2] \quad (6)$$

It is important to note here that, even if $t_1$ does not satisfy this condition, there is a non-zero probability of a metastable state, thus latching the wrong value. However, since this probability is of the order of $10^{-5}$ for current technology [14], its contribution is negligible for all practical purposes when compared to output error rates.
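As an illustration of condition (6), the following Python sketch computes the window of glitch start times $t_1$ for which the glitch is latched. The function names and the 40ps gate-to-output delay are hypothetical; the clock period and the setup and hold times are the values used in Section 8. Metastability is ignored, as in the text.

```python
def latching_window(t_clk, t_setup, t_hold, t2_delay, duration):
    """Bounds (lo, hi) on the glitch start time t1 from condition (6).

    Returns None when the glitch is too short to cover the setup and
    hold times, in which case no start time leads to latching.
    """
    lo = t_clk + t_hold - t2_delay - duration
    hi = t_clk - t_setup - t2_delay
    return (lo, hi) if lo <= hi else None


def is_latched(t1, t_clk, t_setup, t_hold, t2_delay, duration):
    """True if a glitch starting at time t1 at gate G satisfies (6)."""
    window = latching_window(t_clk, t_setup, t_hold, t2_delay, duration)
    return window is not None and window[0] <= t1 <= window[1]


# All times in ps. T_clk = 250 and setup = hold = 10 as in Section 8;
# the delay of 40 and the durations are assumed example values.
window = latching_window(250.0, 10.0, 10.0, 40.0, 60.0)
too_short = latching_window(250.0, 10.0, 10.0, 40.0, 15.0)
print(window, too_short)
```

The width of the window equals $D - (t_{\text{setup}} + t_{\text{hold}})$, which is why a glitch shorter than the sum of setup and hold time can never be latched.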

More formally, one can define the three events, $A$, $D$, and $T$, that occur when the previously described conditions are satisfied, as:

$$A: A > V_{\text{S latch}} \quad (\text{when correct output value is “0”})$$

$$D: D > t_{\text{setup}} + t_{\text{hold}}$$

$$T: t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D, T_{\text{clk}} - t_{\text{setup}} - T_2]$$

As seen in Figure 3, $D$ occurs only if $A$ occurs: the duration is different from zero only if the amplitude of the glitch at the output $F$ is larger than the switching threshold $V_{\text{S latch}}$. Hence:

$$D \subseteq A \quad (7)$$

and thus:

$$A \cap D = D \quad (8)$$

Therefore, the probability that a glitch originating at gate $G$ is latched at the output $F$ can be written as:

$$P(A \cap D \cap T) = P(T \cap D) = P(t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D, T_{\text{clk}} - t_{\text{setup}} - T_2] \cap D > t_{\text{setup}} + t_{\text{hold}}) =$$

$$\sum_{D_i > t_{\text{setup}} + t_{\text{hold}}} P(t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D_i, T_{\text{clk}} - t_{\text{setup}} - T_2] \cap D = D_i) =$$

$$\sum_{D_i > t_{\text{setup}} + t_{\text{hold}}} P(t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D_i, T_{\text{clk}} - t_{\text{setup}} - T_2] \mid D = D_i) \cdot P(D = D_i) \quad (9)$$

where $\{D_i\}$ is the set of possible glitch durations, along various sensitized paths.

As in [9], we assume that $t_1$ is uniformly distributed in the interval $(T_1, T_1 + T_{\text{clk}} - d_{\text{init}})$, i.e., only the interval while the output of gate $G$ is stable is considered. Thus, in the worst case when, for a given glitch duration $D_i$, the interval $[T_{\text{clk}} + t_{\text{hold}} - T_2 - D_i, T_{\text{clk}} - t_{\text{setup}} - T_2]$ lies inside it, the probability of event $T$ at the output is:

$$P(t_1 \in [T_{\text{clk}} + t_{\text{hold}} - T_2 - D_i, T_{\text{clk}} - t_{\text{setup}} - T_2] \mid D = D_i) = \frac{D_i - (t_{\text{setup}} + t_{\text{hold}})}{T_{\text{clk}} - d_{\text{init}}} \quad (10)$$
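The computation above can be sketched in a few lines of Python. The sketch assumes, following the uniform distribution of $t_1$ stated in the text, that the latching probability for a given duration is the width of the latching window divided by the length $T_{\text{clk}} - d_{\text{init}}$ of the interval in which the glitch can start. The function names and the example distribution of output durations are hypothetical.

```python
def latch_probability(duration, t_setup, t_hold, t_clk, d_init):
    """Probability that a glitch of the given output duration is latched.

    Assumes t1 is uniform over an interval of length t_clk - d_init and
    that the latching window lies entirely inside it (worst case).
    """
    window = duration - (t_setup + t_hold)
    if window <= 0:
        return 0.0  # glitch too short to cover setup plus hold time
    return min(1.0, window / (t_clk - d_init))


def output_error_probability(duration_probs, t_setup, t_hold, t_clk, d_init):
    """Sum over output durations D_i of P(latched | D_i) * P(D = D_i)."""
    return sum(
        prob * latch_probability(d, t_setup, t_hold, t_clk, d_init)
        for d, prob in duration_probs.items()
    )


# Hypothetical distribution of output glitch durations (ps -> probability);
# duration 0 stands for a logically or electrically masked glitch.
duration_probs = {0.0: 0.7, 60.0: 0.2, 100.0: 0.1}
p_error = output_error_probability(duration_probs, 10.0, 10.0, 250.0, 100.0)
print(f"{p_error:.4f}")
```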

5.2. The attenuation model

From the previous equations we can see that, to determine the probability that a glitch originating at the output of a gate $G$ is latched at output $F$, it is necessary to find the possible values of the glitch duration, $\{D_i\}$, and to determine the probabilities associated with those values. Another issue is finding the correct values for the glitch amplitude at output $F$. To find these values, we use the method proposed in [9], which was shown to have an average accuracy of 90% when compared to HSPICE.

6. THE SYMBOLIC MODELING FRAMEWORK

To find the probability that a glitch originating at a gate $G$ is latched at output $F$ (as described in Section 5.1), we need to find the possible values for the duration and amplitude of a glitch at the output $F$. To determine the probability of having a glitch of duration $D_i$ at that output, we use BDDs and ADDs. Our algorithm is shown in Figure 4.

ADDs are created starting with the first node in topological order. Duration and amplitude ADD are the same, except for the values stored in the terminal nodes. Terminal node “0” represents combinations of inputs that logically mask the glitch, and all cases when the glitch becomes too short or too attenuated to be propagated, i.e., all cases when the glitch is electrically masked. The values on the other terminal nodes will depend on the paths through which the glitch propagates.

The initial ADD for each gate is built for the glitch originating at that gate. It consists of only one terminal node for all possible input patterns – initial duration or amplitude value. Those ADDs are passed to all fanout gates, which use them for creating new ADDs based on their own attenuation model. Since the glitch propagates only if it is on a sensitized path, we also need to create sensitization BDDs that store information on sensitized paths. Starting with the first node in the list sorted in topological order, we create ADDs and BDDs at each node (memory requirements are kept minimal as they are destroyed as soon as they are not needed). Moreover, some of the current ADDs become “0” due to masking effects, so those ADDs are also removed. When the final node in the circuit is reached, only the ADDs for outputs are needed. Each of these ADDs represents a gate-output pair, where the gate is the one where the glitch appears and the output is the one for which we determine the error susceptibility. The terminal nodes for these ADDs represent the final duration or amplitude of the glitch at the output. In addition to them, we also keep track of the propagated delays from the originating gate to the current gate. These delays are computed in parallel with creating ADDs and used when glitches from reconvergent paths are merged.
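To make the notion of a sensitized path concrete, the following Python sketch computes the sensitization condition for the path G1→G5 of C17 by explicit enumeration of input vectors. This is not the BDD-based construction used in the framework; it only shows the Boolean function that such a BDD would encode. The sketch assumes the standard ISCAS'85 c17 netlist of two-input NAND gates, with G2 and G3 on the side input of G5, and the input names `x2`, `x3`, `x6` are hypothetical labels.

```python
from itertools import product


def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def g3_output(x2, x3, x6):
    """Side input of G5: G3 = NAND(x2, G2), with G2 = NAND(x3, x6)."""
    g2 = nand(x3, x6)
    return nand(x2, g2)


def sensitization_probability(p_one):
    """P(path G1 -> G5 is sensitized) for independent inputs.

    A glitch on G1 passes through the NAND gate G5 only when the other
    input of G5 (the output of G3) holds the non-controlling value 1.
    p_one maps each input name to its probability of being 1.
    """
    total = 0.0
    for x2, x3, x6 in product((0, 1), repeat=3):
        weight = 1.0
        for name, bit in (("x2", x2), ("x3", x3), ("x6", x6)):
            weight *= p_one[name] if bit else 1.0 - p_one[name]
        if g3_output(x2, x3, x6) == 1:
            total += weight
    return total


p_sens = sensitization_probability({"x2": 0.5, "x3": 0.5, "x6": 0.5})
print(p_sens)
```

A symbolic representation avoids this exponential enumeration by sharing sub-functions across input vectors.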

\[
\text{createAllADDs} \{ \\
\quad \text{set technology parameters; } \\
\quad \text{parse input netlist; } \\
\quad \text{for each gate in gate_node_list } \\
\quad \quad \text{build neighbors list; } \\
\quad \text{sort gates topologically; } \\
\quad \text{for each gate in sorted_gate_node_list } \\
\quad \quad \text{create output BDD; } \\
\quad \quad \text{find reconvergent paths; } \\
\quad \quad \text{merge ADDs; } \\
\quad \quad \text{create sensitization BDDs; } \\
\quad \quad \text{create duration and amplitude ADDs; } \\
\quad \quad \text{remove zero ADDs; } \\
\quad \quad \text{pass all ADDs to output neighbors;} \\
\}
\]

**Figure 4.** The algorithm for creating ADDs, computing output error probabilities and gate resizing.

![Sensitization BDDs for paths G2→G3→G5 and G1→G5 from circuit C17](image)

![Duration ADDs for the propagation of glitch originating at gate G2, G3 and G1](image)

To show how our method works, Figure 5 presents ADDs that are built on paths G1→G5 and G2→G3→G5 from benchmark C17 (Figure 1). Figure 5a shows sensitization BDDs for paths G1→G5 and G2→G3→G5, while Figure 5b represents initial and propagated duration ADDs for glitches originating at gate G2 (2 steps) and gates G1 and G3 (one step for each). From the ADDs we can determine output glitch duration (\(D_i\)) probabilities, via a bottom-up ADD traversal using associated input probability distributions. As it can be seen from Figure 4, the algorithm for creating ADDs is linear in the number of gates and number of inputs, while the algorithm for computing probabilities is linear in the number of gates and number of outputs.
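The probability computation on a duration ADD can be sketched as follows. This is a minimal Python illustration with a hand-built decision diagram, not the data structure of an actual decision diagram package; the `Node` class, the function name, and the example diagram are hypothetical, and the inputs are assumed to be statistically independent. The recursion is memoized, so each shared node is evaluated once and results are combined from the terminals upward.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """Internal ADD node: follow `high` if `var` is 1, `low` if it is 0."""
    var: str
    low: "ADD"
    high: "ADD"


ADD = Union[Node, float]  # a terminal is a plain number (glitch duration)


def terminal_distribution(add, p_one, cache=None):
    """Map each terminal value to the probability of reaching it.

    p_one maps a variable name to P(variable = 1).
    """
    if cache is None:
        cache = {}
    if not isinstance(add, Node):
        return {float(add): 1.0}
    key = id(add)
    if key in cache:
        return cache[key]
    dist = {}
    branches = ((add.low, 1.0 - p_one[add.var]), (add.high, p_one[add.var]))
    for child, weight in branches:
        for value, prob in terminal_distribution(child, p_one, cache).items():
            dist[value] = dist.get(value, 0.0) + weight * prob
    cache[key] = dist
    return dist


# Hypothetical duration ADD: input a = 0 masks the glitch (terminal 0);
# otherwise input b selects between two propagated durations (in ps).
duration_add = Node("a", 0.0, Node("b", 60.0, 80.0))
dist = terminal_distribution(duration_add, {"a": 0.5, "b": 0.5})
print(dist)
```

The resulting dictionary plays the role of the probabilities $P(D = D_i)$ used in equation (9).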

When creating ADDs, we also address the problem of glitches arriving on reconvergent paths. For example, in the case of benchmark C17, we can see that the output of gate G2 goes to gates G3 and G4, and that the outputs of these gates (G3 and G4) are inputs to gate G6. Thus, a glitch occurring at the output of the gate G2 can propagate through two paths (through gates G3 and G4) to gate G6. In this case, depending on the values on the circuit inputs, different superpositions of the two glitches arriving to the inputs of the gate G6 can occur. Therefore, when building ADDs for duration and amplitude, we consider all possible combinations of controlling and non-controlling gate inputs to compute the correct values for the output glitch duration and amplitude. Possible cases include a single (shorter or longer) glitch or multiple glitches, depending on: (i) whether the input values are controlling or not, and (ii) whether merged glitches mask each other or not.
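The topological ordering and the detection of reconvergent paths can be illustrated on C17. The Python sketch below assumes a gate-level fanin map reconstructed from the description in the text (G2 feeds G3 and G4, which both feed G6; G1 and G3 feed G5); the helper `count_paths` is hypothetical. A gate reached by more than one path from the hit gate is a point where glitches may have to be merged.

```python
from graphlib import TopologicalSorter

# Gate-to-gate fanin of C17 (primary inputs omitted), as assumed above.
fanin = {
    "G1": [],
    "G2": [],
    "G3": ["G2"],
    "G4": ["G2"],
    "G5": ["G1", "G3"],
    "G6": ["G3", "G4"],
}

# Each gate appears after all of its fanin gates.
order = list(TopologicalSorter(fanin).static_order())


def count_paths(source, fanin, order):
    """Number of distinct paths from `source` to every gate."""
    paths = {gate: 0 for gate in order}
    paths[source] = 1
    for gate in order:
        if gate != source:
            paths[gate] = sum(paths[f] for f in fanin[gate])
    return paths


paths_from_g2 = count_paths("G2", fanin, order)
reconvergent = [g for g in order if paths_from_g2[g] > 1]
print(order, reconvergent)
```

For a glitch originating at G2, only G6 is reported, which matches the two paths through G3 and G4 described above.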

**7. PRACTICAL CONSIDERATIONS**

When all ADDs for a given circuit are built, the error susceptibility for each output due to an error at the output of any gate in the circuit can be computed. We use equation (9) to compute these probabilities. For all pairs (output \(F_j\), gate \(G_i\)) we build ADDs representing the duration and amplitude of a glitch starting at the output of gate \(G_i\) and propagating to output \(F_j\). For multiple input distributions, characterized by different probabilities for input vectors, we compute the probability that the glitch duration \(D\) at the output is \(D_i\) and the corresponding latching probability for this specific duration value as in equation (10).

To analyze error susceptibility of a given combinational logic circuit, we assume a discrete set of test glitches of different initial duration \(d_{init}\) and amplitude \(a_{init}\) and we use a mix of random and biased input probability distributions.

**7.1. Mean Error Susceptibility, Mean Error Impact**

We analyze each circuit from two aspects: reliability of its outputs when faults occur inside the circuit and influence of individual gate errors on outputs.

For each output \(F_j\), \(d_{init}\) and \(a_{init}\) we find mean error susceptibility (MES) as the probability of output \(F_j\) failing due to errors at internal gates:

\[
\text{MES}(F_j^{d_{init}, a_{init}}) = \frac{\sum_{k=1}^{n_f} \sum_{i=1}^{n_G} P(F_j \mid G_i, f_k, d_{init}, a_{init})}{n_G \cdot n_f} \quad (11)
\]

where \(n_G\) is the cardinality of the set of internal gates of the circuit, \(\{G_i\}\), and \(n_f\) is the cardinality of the set of probability distributions, \(\{f_k\}\), associated to the input vector stream.

For each gate \(G_i\), \(d_{init}\) and \(a_{init}\) we find minimum, maximum and median error impact over all outputs \(F_j\) that are affected by a glitch occurring at the output of gate \(G_i\). Mean error impact (MEI) for gate \(G_i\) is defined as:

\[
\text{MEI}(G_i^{d_{init}, a_{init}}) = \frac{\sum_{k=1}^{n_f} \sum_{j=1}^{n_F} P(F_j \mid G_i, f_k, d_{init}, a_{init})}{n_F \cdot n_f} \quad (12)
\]

where \(n_F\) is the cardinality of the set of primary outputs of the circuit, \(\{F_j\}\), and \(n_f\) is the number of input probability distributions \(\{f_k\}\). Similarly, we can find minimum, maximum and median error impact across all outputs and all input probability distributions. For each input probability distribution used, we also find the number of gates that do not affect any of the outputs.
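The two measures can be sketched in Python as plain averages over a table of conditional error probabilities, for one fixed initial glitch. The sketch assumes that "mean" denotes an arithmetic mean over gates (or outputs) and over input probability distributions; the table layout, the function names, and all numbers are hypothetical.

```python
def mean_error_susceptibility(p, output):
    """Average of P(output fails | gate, distribution) over gates and
    input distributions. p[k][i][j] is the probability that output j
    fails given a glitch at gate i under input distribution k."""
    values = [p_k[i][output] for p_k in p for i in range(len(p_k))]
    return sum(values) / len(values)


def mean_error_impact(p, gate):
    """Average of P(output fails | gate, distribution) over outputs and
    input distributions, for a glitch originating at the given gate."""
    values = [p_out for p_k in p for p_out in p_k[gate]]
    return sum(values) / len(values)


# Hypothetical table: 2 input distributions x 3 gates x 2 outputs.
p = [
    [[0.0, 0.2], [0.1, 0.3], [0.4, 0.0]],
    [[0.0, 0.4], [0.3, 0.1], [0.2, 0.0]],
]
mes_out0 = mean_error_susceptibility(p, output=0)
mei_gate1 = mean_error_impact(p, gate=1)
print(f"MES(F0) = {mes_out0:.4f}, MEI(G1) = {mei_gate1:.4f}")
```

Minimum, maximum and median impact follow from the same list of values by replacing the average with the corresponding statistic.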

**7.2. Relationship with SER**

Our framework computes MES for all outputs of the circuit and for a discrete set of pairs \((d,a)\) of initial glitch durations and amplitudes, while the surface defined by all allowed pairs \((d,a)\) is continuous. To this end, we partition this surface into a grid with increments \(\Delta d\) and \(\Delta a\) for \(d\) and \(a\), respectively. We assume that MES is constant within each sub-surface. Without loss of generality, we assume a uniform distribution of pairs \((d,a)\) along the surface \(S = (d_{\text{max}} - d_{\text{min}}) \cdot (a_{\text{max}} - a_{\text{min}})\) of all allowed pairs, such that:

\[
P(d_l - \Delta d \leq d \leq d_l, \; a_r - \Delta a \leq a \leq a_r) = \frac{\Delta d \cdot \Delta a}{S} \quad (13)
\]

where:

\[
d_l = d_{\text{min}} + l \cdot \Delta d \quad \text{and} \quad d_{\text{max}} = d_{\text{min}} + n \cdot \Delta d
\]

\[
a_r = a_{\text{min}} + r \cdot \Delta a \quad \text{and} \quad a_{\text{max}} = a_{\text{min}} + m \cdot \Delta a
\]

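Therefore, we can find the probability of output \(F_j\) failing due to glitches at internal nodes as a double weighted sum:

\[
P(F_j) = \frac{\Delta d \cdot \Delta a}{S} \sum_{l=1}^{n} \sum_{r=1}^{m} \text{MES}(F_j^{d_l, a_r}) \quad (14)
\]

where \(\text{MES}(F_j^{d_l, a_r})\) is the mean error susceptibility of output \(F_j\) for the initial glitch \((d_l, a_r)\). The soft error rate for output \(F_j\) is then:

\[
SER(F_j) = P(F_j) \cdot R_{PH} \cdot R_{\text{eff}} \cdot A_{\text{circ}} \quad (15)
\]

where \(R_{PH}\) is the particle hit rate per unit of area, \(R_{\text{eff}}\) is the fraction of particle hits that result in charge generation, and \(A_{\text{circ}}\) is the total silicon area of the circuit.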
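The weighted sum can be sketched in Python as follows. The grid bounds and increments are the ones reported in Section 8, while the MES values, the silicon area, and the function name are hypothetical. The last step assumes that the soft error rate is obtained by scaling the output failure probability with the particle hit rate per unit of area, the factor \(R_{\text{eff}}\), and the circuit area; this product form is an assumption of the sketch.

```python
def output_failure_probability(mes_grid, delta_d, delta_a,
                               d_min, d_max, a_min, a_max):
    """Weighted sum of MES over the (duration, amplitude) grid.

    mes_grid[l][r] holds the MES for the grid cell with index (l, r);
    each cell carries the same weight delta_d * delta_a / S.
    """
    surface = (d_max - d_min) * (a_max - a_min)
    cell_weight = delta_d * delta_a / surface
    return cell_weight * sum(sum(row) for row in mes_grid)


# 4 duration cells (45..125 ps, step 20 ps) x 2 amplitude cells
# (0.8..1.0 V, step 0.1 V); the MES values themselves are made up.
mes_grid = [[0.01, 0.02], [0.03, 0.04], [0.05, 0.06], [0.07, 0.08]]
p_fail = output_failure_probability(mes_grid, 20.0, 0.1,
                                    45.0, 125.0, 0.8, 1.0)

# Assumed scaling to an error rate (failures per second).
r_ph = 56.5        # particle hit rate, m^-2 s^-1 (value from Section 8)
r_eff = 2.2e-5     # value from Section 8
area_m2 = 1.0e-9   # hypothetical circuit area in m^2
ser = p_fail * r_ph * r_eff * area_m2
print(p_fail, ser)
```

With a uniform grid the cell weight reduces to one over the number of cells, so \(P(F_j)\) is simply the average MES over the grid.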

### 7.3. Gate resizing for radiation hardening

When the gate width-length ratio \((W/L)\) is changed, the impact that radiation has on that gate is affected. In other words, if this ratio is larger, more charge needs to be generated by a radiation event to produce a glitch of a magnitude larger than the switching threshold of that gate. The voltage \(V_{\text{out}}\) at the output of the gate can be found by solving the following differential equation [15]:

\[
C_{\text{total}} \frac{dV_{\text{out}}}{dt} = I_{\text{in}}(t) - \left( \frac{W}{L} \right) I_{\text{D}}(V_{\text{out}}) \quad (16)
\]

where \(C_{\text{total}}\) is the total capacitance at the output of the gate hit by radiation, \(I_{\text{in}}(t)\) is the current pulse that resulted from the collection of charge induced by radiation (modeled as in [15]), and \(I_{\text{D}}(V_{\text{out}})\) is the effective drain current that drives the output of the gate. From HSPICE simulation, we determine the new size for the gate such that a glitch that occurs at its output, due to the charge collection, is not large enough to flip its state (that is, the resulting \(V_{\text{out}}\) is smaller than the switching threshold of the gate). We show in Figure 4 the proposed *resize* algorithm, which selects as candidates for resizing the gates that have a mean error impact (MEI), as in (12), larger than a certain threshold. As shown in the algorithm *resize*, resizing gates with large MEI reduces it to values lower than a given threshold, with a beneficial effect on \(SER\). The algorithm can also be adapted to select gates for resizing based on median or maximum error impact, instead of mean values.
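The selection step of the resizing procedure can be sketched as a simple threshold filter. The Python fragment below is only an illustration of that step: the function name and the MEI values are hypothetical, and the actual choice of the new gate size, which the text obtains from HSPICE simulation, is not modeled.

```python
def select_gates_for_resizing(mei, threshold):
    """Return the gates whose mean error impact exceeds the threshold,
    ordered from largest to smallest impact."""
    candidates = [gate for gate, value in mei.items() if value > threshold]
    return sorted(candidates, key=lambda gate: mei[gate], reverse=True)


# Hypothetical MEI values per gate.
mei = {"G1": 0.05, "G2": 0.31, "G3": 0.22, "G4": 0.18}
to_resize = select_gates_for_resizing(mei, threshold=0.2)
print(to_resize)
```

Replacing the `mei` dictionary with median or maximum error impact values gives the variants mentioned in the text.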

### 8. EXPERIMENTAL RESULTS

In this section, we show the results of our symbolic framework for nine combinational circuits, given different glitch durations and different sets of input probabilities. The technology used is 70nm, Berkeley Predictive Technology Model [16]. The clock cycle period \((T_{\text{clk}})\) used is 250ps, and setup \((t_{\text{setup}})\) and hold \((t_{\text{hold}})\) times for the latches are assumed to be 10ps each. \(V_{\text{dd}}\) is assumed to be 1V, and for simplicity, all switching threshold voltages, gate threshold \(V_{\text{th}}\), and latch threshold \(V_{\text{th latch}}\) are assumed to be \(V_{\text{dd}}/2\). The delay of an inverter in the given technology is determined by simulating a ring oscillator in HSPICE and found to be 6.5ps. The delays for other gates are found by using logical and electrical effort methodology [17]. The benchmark circuits are chosen from ISCAS’85 and mcnc’91 suites. Our symbolic modeling framework is implemented in C++, and run on a 3GHz Pentium 4 workstation running Linux.

#### 8.1. Comparison with HSPICE

The first set of results compares glitch durations and delays obtained using our symbolic framework at the outputs of circuits C17 and *circ*, with results from HSPICE simulations for several initial glitch durations ranging from 30ps to 120ps, assuming exhaustive input sets and considering all gate-output pairs. We find the relative error of our model for a given initial glitch size as:

\[
\text{relative error} = \frac{\sum_{k=1}^{n_V} \sum_{i=1}^{n_G} \sum_{j=1}^{n_F} \left| D^{\text{symbolic}}_{ijk} - D^{\text{HSPICE}}_{ijk} \right| / D^{\text{HSPICE}}_{ijk}}{n_{G} \cdot n_{F} \cdot n_{V}} \quad (17)
\]

where \(n_{G}\) and \(n_{F}\) are the numbers of internal gates and primary outputs, as in (11) and (12), \(n_{V}\) is the number of input vectors, and \(D^{\text{symbolic}}_{ijk}\) and \(D^{\text{HSPICE}}_{ijk}\) are the durations of the glitch for input vector \(k\) and the gate-output pair \((G_{i}, F_{j})\), found by our symbolic framework and HSPICE, respectively. Note that this error includes a node-by-node analysis and not just a lumped \(SER\) comparison. The relative error of our framework is presented in Figure 6. As can be seen, the error stemming from the approximate gate delay model and the attenuation model we are using ranges from less than 5% to about 20% in one instance (40ps glitch duration for C17), while averaging 7% overall for an effective 3900X average speedup (up to 5000X in some cases).
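The error metric can be sketched in Python as a mean of per-entry relative deviations. The dictionary layout, the function name, and the duration values are hypothetical; entries for which the reference duration is zero are skipped here, which is an assumption of the sketch since the relative error is undefined for them.

```python
def mean_relative_error(symbolic, reference):
    """Mean of |D_symbolic - D_reference| / D_reference.

    Both arguments map (gate, output, input_vector) to a glitch
    duration in ps; `reference` plays the role of the HSPICE results.
    """
    terms = [
        abs(symbolic[key] - ref) / ref
        for key, ref in reference.items()
        if ref > 0  # assumption: skip fully masked reference glitches
    ]
    return sum(terms) / len(terms)


# Hypothetical durations for one gate-output pair and two input vectors.
reference = {("G1", "F1", 0): 50.0, ("G1", "F1", 1): 80.0}
symbolic = {("G1", "F1", 0): 55.0, ("G1", "F1", 1): 76.0}
err = mean_relative_error(symbolic, reference)
print(f"{100 * err:.1f}%")
```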

#### 8.2. \(MES\), \(MEI\) and \(SER\) results

The results for one small benchmark 5xp1 (116 gates, 7 inputs) and one larger benchmark, C1908 (174 gates, 36 inputs) are presented in Figure 7 (top four charts). We divide the interval [0,1] of possible error impact into ten subintervals. For each benchmark, each error impact interval, and various input probability distributions, we show the number of gates that have minimum, maximum, mean or median error impact in those intervals. We present this dependence in the case of two different initial glitch durations (50ps and 125ps). As can be seen, for small glitches (50ps), all error impact values are in the range from 0 to 0.4. The gates that influence outputs are just the output gates, and their fanin gates. However, a 125ps long glitch is less affected by electrical masking. Since the glitch is very long even at the output, there is a considerable number of gates that will almost certainly have an impact on output error, as seen in the middle two charts.

In Figure 7, we also present the impact of gate resizing on \(MEI\) of these two circuits for the case of a long glitch (125ps – bottom two charts). As can be seen from the bottom two charts, gate resizing for radiation hardening moves all curves toward the left, such that all gates have \(MEI\) (mean curve) smaller than the given threshold (0.2 for 5xp1 and 0.01 for C1908). Max and median curves might still exceed the given threshold as the target for reduction was the *mean* error impact.

In Figure 8, we present average bit soft error rates (original column) for the same set of benchmark circuits as was used in [18] to report the \(MES\) values as well as the associated run times and memory requirements. The allowed interval for the initial duration of the glitch is assumed to be \((d_{\text{min}},d_{\text{max}}) = (45,125)\)ps, while initial amplitude is in the range \((a_{\text{min}},a_{\text{max}}) = (0.8,1)\)V. \(MES\), \(P(F_j)\) and \(SER\) for each output are found using equations (11), (14) and (15), respectively. Since for glitches smaller than 45ps all benchmark circuits (except for a few that have a very small number of gates) have output error induced mostly by output gates and their fanin gates, we use this duration as the lower bound of our interval. Similarly, as already explained, for glitches longer than 125ps, all benchmarks propagate almost all the glitches, and thus we use this as an upper bound. MES for each output is found within these allowed intervals at incremental steps \(\Delta d = 20\)ps and \(\Delta a = 0.1\)V. The particle hit rate \(R_{PH}\) used is 56.5 m\(^{-2}\)s\(^{-1}\), \(R_{\text{eff}}\) is \(2.2 \cdot 10^{-5}\), and the total silicon area found for each benchmark circuit is derived as a function of gate count.

![Figure 7. Error impact (with impact computed as in (12)) for a small benchmark (5xp1 – left charts) and a large benchmark (C1908 – right charts) for short and long glitches, without gate resizing (top four charts) and for the long glitch with gate resizing (bottom two charts).](image)

![Figure 8. Average bit SER for several benchmarks without and with gate sizing for several MEI thresholds.](image)

**9. CONCLUSION**

In this paper, we presented a symbolic modeling methodology and associated framework for efficient estimation of the soft error susceptibility of a combinational logic circuit. We have demonstrated the efficiency of our framework by comparing it to HSPICE detailed circuit simulation and applying it on a subset of ISCAS’85 and mcnc’91 benchmarks of various complexities. The framework allows for the analysis of reliability of combinational circuits from various aspects: output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of the circuit reliability on glitch duration, amplitude, and input patterns. We have also shown that, by using the information obtained from the framework, we can resize the gates that have largest impact on circuit reliability, such that their impact is decreased and SER is improved with minimal area overhead.

**10. REFERENCES**


