# Mitigation of NBTI Induced Performance Degradation in On-Chip Digital LDOs

Longfei Wang\* S. Karen Khatamifard† Ulya R. Karpuzcu† Selçuk Köse\*

\*Department of Electrical Engineering, University of South Florida, Tampa, FL, USA

†Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA

longfei@mail.usf.edu {khatami,ukarpuzc}@umn.edu kose@usf.edu

Abstract—On-chip digital low-dropout voltage regulators (LDOs) have recently gained impetus and drawn significant attention for integration within both mobile devices and microprocessors. Although the benefits of easy integration and fast response speed surpass analog LDOs and other voltage regulator types, NBTI induced performance degradation is typically overlooked. The conventional bi-directional shift register based controller can even exacerbate the degradation, which has been demonstrated theoretically and through practical applications. In this paper, a novel uni-directional shift register is proposed to evenly distribute the electrical stress and mitigate the NBTI effects under arbitrary load conditions with nearly no extra power and area overhead. The benefits of the proposed design as well as reliability aware design considerations are explored and highlighted through simulation of an IBM POWER8 like processor under several benchmark applications. It is demonstrated that the proposed NBTI-aware design can achieve up to 43.2% performance improvement as compared to a conventional one.

#### I. INTRODUCTION

With ubiquitous applications of on-chip voltage regulation [1] within modern microprocessors, Internet of Things (IoT), wireless energy harvesting, and applications such as aerospace engineering, the reliable operation and lifetime of on-chip voltage regulators have become one of the most significant and challenging design considerations. Within those applications, large variations in the load current, voltage, and temperature can occur. These variations may speed up the aging process of the devices under stress and further deteriorate the performance and lifetime of on-chip voltage regulators. As those regulators are already deployed in the field, replacement of them can be costly or even impossible. The conflicting need of harsh environment applications and highly reliable designs necessitates reliability evaluations at design stage as well as reliability enhancement techniques.

The major transistor aging mechanisms include bias temperature instability (BTI), hot carrier injection (HCI), time dependent dielectric breakdown (TDDB), and electromigration (EM), among which BTI is the dominant reliability concern for nanometer integrated circuits design [2-4]. BTI can induce threshold voltage increase and consequent circuit level performance degradation. Positive BTI (PBTI) induces aging of nMOS transistors while negative BTI (NBTI) causes aging of pMOS transistors [3]. The impact of BTI aging mechanism is a strong function of temperature, electrical stress, and time.

On the other hand, as an essential part of large scale integrated circuits, on-chip voltage regulators need to be active most of the time to provide the required power to the load circuit. The load current and temperature can vary a lot especially for microprocessor applications [5]. All of these variations partially contribute to different aging mechanisms of on-chip voltage regulators, which should be considered to avoid overdesign for a targeted lifetime.

Several studies have been performed regarding the reliability issues in nanometer CMOS designs [6-8]. There is, however, only a limited amount of work on the reliability of on-chip voltage regulators. The effect of device aging on the electromagnetic interference (EMI) immunity level of low-dropout regulators (LDOs) is characterized in [9]. A method of distributing the aging stress by rotating the phase to shed at light load is proposed in [10] to enhance the light load efficiency for multiphase buck converters. The reliability of metal wires connected to on-chip voltage regulators is investigated in [11]. Nonetheless, quantitative analysis of aging effects on on-chip voltage regulators considering load current characteristics and temperature variations, as well as efficient reliability enhancement techniques under arbitrary load conditions, have not yet been investigated.

As compared to other voltage regulator types, the emerging digital LDO (DLDO) has gained impetus due to the design simplicity, easiness for integration, high power density, and fast response [12], [13]. DLDOs have demonstrated major advantages in modern processors including the recent IBM POWER8 processor [14]. More importantly, as compared to the analog LDOs, DLDO can provide certain advantages for low-power and low-voltage IoT applications due to its capability for low supply voltage operations [15]. However, as pMOS is used as the power transistor for DLDOs, NBTI induced degradations largely affect important performance metrics such as the maximum output current capability  $I_{max}$ , load response time  $T_R$ , and magnitude of the droop  $\Delta V$  as defined in [16]. It is therefore imperative to investigate aging mitigation techniques for DLDOs to achieve reliable operation of critical systems.

The main contributions of this paper are threefold. First, it is theoretically demonstrated that NBTI induced threshold voltage  $V_{th}$  degradations deteriorate DLDO performance metrics including  $I_{max}$ ,  $T_R$ , and  $\Delta V$ , making NBTI-aware DLDO designs necessary. Second, a novel uni-directional shift register (uDSR) is proposed to mitigate the NBTI induced DLDO performance degradation under arbitrary load conditions without degrading the performance. Third, possible mitigation strategies of DLDO performance degradation using the proposed technique are evaluated and reliability-aware design considerations are explored within practical applications.

Fig. 1. Schematic of conventional DLDO.

The rest of this paper is organized as follows. Background information regarding conventional DLDO regulator and NBTI is introduced in Section II. NBTI induced DLDO performance degradation including  $I_{max}$ ,  $T_R$ , and  $\Delta V$  is demonstrated theoretically in Section III. The proposed uDSR based NBTI-aware DLDO is described in Section IV. Evaluation of the benefits of the proposed NBTI-aware DLDO through simulation of an IBM POWER8 like processor is provided in Section V. Concluding remarks are offered in Section VI.

#### II. BACKGROUND

#### A. Conventional DLDO regulator

The schematic of a conventional DLDO [12] is shown in Fig. 1. The DLDO is composed of N parallel pMOS transistors  $M_i$  (i = 1, ..., N) connected between the input voltage  $V_{in}$  and output voltage  $V_{out}$ , and a feedback control loop implemented with a clocked comparator and a digital controller. The value of  $V_{out}$  and the reference voltage  $V_{ref}$  are compared through the comparator at the rising edge of the clock signal clk. More (fewer)  $M_i$  are turned on through the digital controller output signals  $Q_i$  (i = 1, ..., N) if  $V_{out} < V_{ref}$ ,  $V_{cmp} = H$  ( $V_{out} > V_{ref}$ ,  $V_{cmp} = L$ ). A bi-directional shift register (bDSR), as shown in Fig. 2a, is conventionally implemented for the digital controller to turn on (off) power transistors  $M_1$  to  $M_m$  ( $M_{m+1}$  to  $M_N$ ) with the value of m decided by the load current  $I_{out}$ . At a certain step k+1,  $M_{m+1}$  ( $M_m$ ) is turned on (off) if  $V_{cmp} = H$  ( $V_{cmp} = L$ ) and bDSR shifts right (left) as demonstrated in Fig. 2b.

The DLDO needs to be able to supply the maximum possible load current  $I_{max}$ . It is, however, demonstrated that, within most practical applications, including but not limited to smart phones [10] and chip multiprocessors [17], less than the average power is consumed most of the time. The application environment of the DLDO together with the conventional activation scheme of  $M_i$  leads to the heavy use of  $M_1$  to  $M_m$  and less or even no use of  $M_{m+1}$  to  $M_N$ . This scheme can therefore introduce serious degradation to  $M_1$  to  $M_m$  due to NBTI. The subsequent DLDO performance deteriorations are discussed in Sections II-B and III.



Fig. 2. Digital controller for conventional DLDO. (a) Bi-directional shift register. (b) Operation of bi-directional shift register.

#### B. Negative bias temperature instability

NBTI can introduce significant  $V_{th}$  degradations to pMOS transistors due to negatively applied gate to source voltage  $V_{gs}$ . The increase in  $|V_{th}|$  due to NBTI is considered to be related to the generation of interface traps at the Si/SiO<sub>2</sub> interface when there is a gate voltage [18].  $|V_{th}|$  increases when electrical stress is applied and partially recovers when stress is removed. This process is commonly explained using a reaction-diffusion (R-D) model [18]. The  $V_{th}$  degradation can be estimated during each stress and recovery phase using a cycle-to-cycle model and can also be evaluated using a long-term reliability model [3], [7], [19]. As the long-term reliability evaluation is the focus of this work, the analytical model for long-term worst case threshold voltage degradation  $\Delta V_{th}$  estimation in [3] is adopted in this work as

$$\Delta V_{th} = K_{lt} \sqrt{C_{ox}(|V_{gs}| - |V_{th}|)}\, e^{-\frac{E_a}{kT}} (\alpha t)^{\frac{1}{6}} \tag{1}$$

where  $C_{ox}$ , k, T,  $\alpha$ , and t are, respectively, the oxide capacitance, Boltzmann constant, temperature, the fraction of time (activity factor) when the device is under stress, and operation time.  $K_{lt}$  and  $E_a$  are the fitting parameters to match the model with the experimental data [3]. Note that NBTI recovery phase is already included in the model.
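As a rough illustration of how (1) scales, the following Python sketch evaluates the model and compares two activity factors and two temperatures. The function name `delta_vth` and all numeric parameter values (`k_lt`, `e_a`, `c_ox`, `v_th`) are placeholders assumed for illustration only; they are not the PTM fitting parameters, so only the ratios that are independent of those parameters should be read as meaningful.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def delta_vth(t, temp_k, alpha, v_gs, v_th, c_ox, k_lt, e_a):
    """Long-term worst-case threshold voltage shift following Eq. (1)."""
    overdrive = abs(v_gs) - abs(v_th)
    thermal = math.exp(-e_a / (BOLTZMANN_EV_PER_K * temp_k))
    return k_lt * math.sqrt(c_ox * overdrive) * thermal * (alpha * t) ** (1.0 / 6.0)


# Placeholder values (assumptions), NOT the fitting parameters from [3] or [19].
PLACEHOLDER = dict(v_gs=0.9, v_th=0.3, c_ox=1.0, k_lt=1.0, e_a=0.1)
FIVE_YEARS_S = 5 * 365 * 24 * 3600

always_on = delta_vth(FIVE_YEARS_S, 363.15, 1.0, **PLACEHOLDER)    # alpha = 1, 90 C
quarter_on = delta_vth(FIVE_YEARS_S, 363.15, 0.25, **PLACEHOLDER)  # alpha = 0.25, 90 C
cooler = delta_vth(FIVE_YEARS_S, 335.15, 1.0, **PLACEHOLDER)       # alpha = 1, 62 C

# The alpha ratio depends only on the 1/6 power law, not on the placeholders.
print(f"shift ratio, alpha 0.25 vs 1.0: {quarter_on / always_on:.3f}")
# The temperature ratio depends on the assumed e_a and is indicative only.
print(f"shift ratio, 62 C vs 90 C:      {cooler / always_on:.3f}")
```

Because of the 1/6 exponent, reducing the activity factor from 1 to 0.25 lowers the predicted shift by a factor of $0.25^{1/6} \approx 0.79$, regardless of the fitting parameters.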

#### III. NBTI INDUCED PERFORMANCE DEGRADATION

 $I_{max}$ ,  $T_R$ , and  $\Delta V$  are among the most important design parameters for DLDOs. The effect of NBTI induced degradations on these important performance metrics is examined in this section.

#### A. Maximum current supply capability

Fig. 3. Percentage  $I_{pMOS}$  degradation of conventional bDSR based DLDO.

Without NBTI induced degradations,  $I_{max} = N I_{pMOS}$ , where  $I_{pMOS}$  is the maximum output current of a single pMOS stage. For DLDO,  $|V_{gs}|$  in (1) is equal to  $V_{in}$  when  $M_i$  is active. The pMOS transistor  $M_i$  operates in linear region when turned on and the on-resistance  $R_{on}$  of a single pMOS stage can be approximated as [3]

$$R_{on} \approx [(W/L)\mu_p C_{ox}(V_{in} - |V_{th}|)]^{-1} \tag{2}$$

where W, L,  $\mu_p$ , and  $C_{ox}$  are, respectively, the width, length, mobility, and oxide capacitance of  $M_i$ .  $I_{pMOS}$  can thus be expressed as

$$I_{pMOS} = \frac{V_{sd}}{R_{on}} = (V_{in} - V_{out})(W/L)\mu_p C_{ox}(V_{in} - |V_{th}|) \tag{3}$$

where  $V_{sd}$  is the source drain voltage of  $M_i$ . NBTI induced degradation factor  $DF_i$  for  $M_i$  can be defined as

$$DF_{i} = \frac{I_{pMOS_{i}}^{deg}}{I_{pMOS}} = \frac{V_{in} - |V_{th}| - \Delta V_{th_{i}}}{V_{in} - |V_{th}|} \tag{4}$$

where  $\Delta V_{th_i}$  and  $I_{pMOS_i}^{deg}$  are, respectively, NBTI induced  $V_{th}$  degradation and the degraded  $I_{pMOS}$  for  $M_i$ . Degraded  $I_{max}$  can be expressed as

$$I_{max}^{deg} = I_{pMOS} \sum_{i=1}^{N} DF_i. \tag{5}$$

As an example, the percentage  $I_{pMOS}$  degradation  $1-DF_i$  for small values of i, considering  $M_i$  is active most of the time, is shown in Fig. 3 as a function of time under different temperatures. A 32 nm metal gate, high-k strained-Si CMOS technology from the PTM model library [19] is utilized. A nominal supply voltage  $V_{in}=0.9~\mathrm{V}$  is used. PTM is adopted for simulation as it is widely used for BTI studies due to the availability of fitting parameter values in the  $\Delta V_{th}$  degradation model [2], [3], [6-8]. As shown in Fig. 3, NBTI can induce significant  $I_{pMOS}$  degradations, especially at high temperatures. Degraded  $I_{pMOS}$  can further lead to reduced  $I_{max}$  and lower output voltage regulation capability under high load current. Moreover, as discussed in Sections III-B and III-C, degraded  $I_{pMOS}$  also exacerbates  $T_R$  and  $\Delta V$ , necessitating reliability enhancement techniques.
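The relations in (4) and (5) can be sketched numerically as follows. N = 256,  $I_{pMOS} = 2$  mA, and  $V_{in} = 1.1$  V are taken from the design example in Section V; the threshold voltage of 0.3 V, the 80 mV shift, and the assumption that exactly half of the stages are heavily stressed are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def degradation_factor(v_in, v_th, delta_vth):
    """DF_i from Eq. (4): ratio of degraded to fresh single-stage current."""
    headroom = v_in - abs(v_th)
    return (headroom - delta_vth) / headroom


def degraded_i_max(i_pmos, factors):
    """I_max^deg from Eq. (5): fresh stage current times the sum of all DF_i."""
    return i_pmos * sum(factors)


N_STAGES = 256     # power transistors per DLDO (Section V)
I_PMOS_A = 2e-3    # fresh single-stage current in A (Section V)
V_IN = 1.1         # input voltage in V (Section V)
V_TH = 0.3         # assumed |V_th| in V, hypothetical
SHIFT_V = 0.08     # assumed NBTI shift of the heavily used stages, hypothetical

stressed = degradation_factor(V_IN, V_TH, SHIFT_V)  # heavily used M_i
fresh = degradation_factor(V_IN, V_TH, 0.0)         # rarely used M_i

# bDSR-like picture: half of the stages stressed, the other half untouched.
factors = [stressed] * 128 + [fresh] * 128
i_max_fresh = N_STAGES * I_PMOS_A
i_max_deg = degraded_i_max(I_PMOS_A, factors)

print(f"DF of stressed stages: {stressed:.3f}")
print(f"I_max fresh: {i_max_fresh:.4f} A, degraded: {i_max_deg:.4f} A")
```

Under these assumed numbers the stressed stages lose 10% of their current, and the total  $I_{max}$  drops by 5% because only half of the array is affected.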

#### B. Load response time

Load response time  $T_R$  measures how fast the feedback loop responds to a step load.  $T_R$  can be estimated as [20]

$$T_R = RC\ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS}f_{clk}RC}\right) \tag{6}$$

where R, C,  $f_{clk}$ , and  $\Delta i_{load}$  are, respectively, the average DLDO output resistance before and after  $\Delta i_{load}$ , load capacitance, clock frequency, and amplitude of the load change. Considering NBTI effect, degraded  $T_R$  can be expressed as

$$T_R^{deg} = RC\ln\left(1 + \frac{\Delta i_{load}}{DF \cdot I_{pMOS}f_{clk}RC}\right). \tag{7}$$

As 0 < DF < 1, the argument of the logarithm in (7) is larger than that in (6) and thus  $T_R < T_R^{deg}$ , i.e., NBTI induced degradation slows down DLDO response.

#### C. Magnitude of the droop

Magnitude of the droop  $\Delta V$  reflects the  $V_{out}$  noise profile under transient response and can be estimated as [20]

$$\Delta V = R\Delta i_{load} - I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS} f_{clk} RC}\right). \tag{8}$$

Considering NBTI effect, degraded  $\Delta V$  can be expressed as

$$\Delta V_{deg} = R\Delta i_{load} - DF \cdot I_{pMOS} f_{clk} R^2 C \ln\left(1 + \frac{\Delta i_{load}}{DF \cdot I_{pMOS} f_{clk} RC}\right). \tag{9}$$

Let  $A = \Delta i_{load}/(I_{pMOS}f_{clk}RC)$ , A > 0. Under 0 < DF < 1, the following holds because  $DF\ln(1 + A/DF)$  increases monotonically with DF

$$1 + A > \left(1 + \frac{A}{DF}\right)^{DF} \tag{10}$$

thus

$$I_{pMOS}f_{clk}R^{2}C\ln\left(1 + \frac{\Delta i_{load}}{I_{pMOS}f_{clk}RC}\right) > DF \cdot I_{pMOS}f_{clk}R^{2}C\ln\left(1 + \frac{\Delta i_{load}}{DF \cdot I_{pMOS}f_{clk}RC}\right) \tag{11}$$

and  $\Delta V < \Delta V_{deg}$ , which means NBTI can degrade the transient voltage noise profile.
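The trends derived in (6) to (11) can be checked numerically with a short sketch.  $f_{clk} = 10$  MHz, C = 15 nF, and  $I_{pMOS} = 2$  mA follow the design example in Section V, whereas the output resistance R = 1 Ω and the load step of 0.1 A are assumptions made only for this illustration. The helper names are hypothetical.

```python
import math


def load_response_time(r, c, f_clk, di_load, i_pmos, df=1.0):
    """T_R from Eq. (6); df < 1 gives the degraded value of Eq. (7)."""
    rc = r * c
    return rc * math.log(1.0 + di_load / (df * i_pmos * f_clk * rc))


def droop(r, c, f_clk, di_load, i_pmos, df=1.0):
    """Droop magnitude from Eq. (8); df < 1 gives the degraded value of Eq. (9)."""
    rc = r * c
    ramp = df * i_pmos * f_clk  # output current added per unit time by the loop
    return r * di_load - ramp * r * rc * math.log(1.0 + di_load / (ramp * rc))


# f_clk, C, and I_pMOS as in Section V; R and the load step are assumed values.
SETUP = dict(r=1.0, c=15e-9, f_clk=10e6, di_load=0.1, i_pmos=2e-3)

t_r_fresh = load_response_time(**SETUP)
dv_fresh = droop(**SETUP)

for df in (0.9, 0.8, 0.5):
    t_r_deg = load_response_time(df=df, **SETUP)
    dv_deg = droop(df=df, **SETUP)
    print(f"DF={df}: T_R grows by {100 * (t_r_deg / t_r_fresh - 1):.2f}%, "
          f"droop grows by {100 * (dv_deg / dv_fresh - 1):.2f}%")
```

For every DF below 1 both the response time and the droop come out larger than the fresh values, in line with (7) and (11); the exact percentages depend on the assumed R and load step.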

Furthermore, it is worth noting that, as seen from (5), (6), and (8), NBTI induced DLDO performance degradations are mainly due to the degradation of the power transistors  $M_i$  (i = 1, ..., N) rather than the control loop. Thus, mitigation of power transistor degradations should be taken as a priority.

Power transistors  $M_i$  with smaller values of i are more heavily used than those with larger values of i in a conventional bDSR based DLDO. As studied in [17], load current variation per processor clock cycle is small most of the time. It is thus reasonable to assume that the newly activated/deactivated power stages have a similar level of  $I_{pMOS}$  degradation. As below average power is mostly consumed, conventional bDSR based DLDOs experience worst case  $T_R$  and  $\Delta V$  degradations since the worst degraded  $M_i$ s are utilized most of the time.

#### IV. NBTI-AWARE DLDO VOLTAGE REGULATOR

To mitigate NBTI induced DLDO performance degradations, distributing the electrical stress among all available power transistors as evenly as possible under arbitrary load current conditions is essential. Reliability is not considered in conventional bDSR based DLDO designs, and therefore too much stress is exerted on a small portion of  $M_i$ s. A novel uDSR is thus proposed in this work to evenly distribute the electrical stress among all of the  $M_i$ s to realize an NBTI-aware DLDO voltage regulator and enhance reliability.

Fig. 4. Proposed uni-directional shift register for NBTI-aware DLDO.

| Q <sub>1</sub>                                       | Q <sub>2</sub> | Q <sub>3</sub> | Q <sub>4</sub> | Q <sub>5</sub> | Q <sub>6</sub> |  |       | Q <sub>N-1</sub> | Q <sub>N</sub> |
|------------------------------------------------------|----------------|----------------|----------------|----------------|----------------|--|-------|------------------|----------------|
| (1) Initialize: all M <sub>i</sub> turned off        |                |                |                |                |                |  |       |                  |                |
| 1                                                    | 1              | 1              | 1              | 1              | 1              |  |       | 1                | 1              |
| (2) Step k                                           |                |                |                |                |                |  |       |                  |                |
|                                                      | 1              | 0              | 0              | 1              | 1              |  |       | 1                | 1              |
| (3-a) Step k+1 if V <sub>cmp</sub> =H: Shift right → |                |                |                |                |                |  |       |                  |                |
|                                                      | 1              | 0              | 0              | 0              | 1              |  | • • • | 1                | 1              |
| (3-b) Step k+1 if V <sub>cmp</sub> =L: Shift right → |                |                |                |                |                |  |       |                  |                |
| • • •                                                | 1              | 1              | 0              | 1              | 1              |  |       | 1                | 1              |

Fig. 5. Operation of the proposed uni-directional shift register.

The schematic and operation of the proposed uDSR are shown, respectively, in Figs. 4 and 5. The elementary D flip-flop (DFF) and multiplexer within bDSR, as shown in Fig. 2a, are replaced with T flip-flop (TFF) and simple logic gates within the proposed uDSR, respectively. The rest of the DLDO including parallel power transistors and clocked comparator remains unchanged. The idea is to balance the utilization of each available  $M_i$  under all load current conditions. To achieve this objective, control signals  $Q_{i-1}$  and  $Q_i$  for two adjacent power transistors  $M_{i-1}$  and  $M_i$ , respectively, are XORed to determine if  $M_{i-1}$  and  $M_i$  are at the boundary of active and inactive power transistor portions. Normally, there are two such boundaries if at least one power transistor is active, as shown in Fig. 5.  $Q_{i-1}$  and the output of the comparator  $V_{cmp}$  are thus XORed to decide which power transistor at the boundaries needs to be turned on/off at the rising edge of the clock signal. The inactive (active) power transistor at the right (left) boundary is turned on (off) if  $V_{cmp}$  is logic high (low). A uni-directional shift register is realized through this activation/deactivation scheme, as demonstrated in Fig. 5.  $Q_{i-1}$  for the first stage is  $Q_N$  from the last stage and thus a loop is formed. Considering the initialization step when all  $M_i$ s are off and the full load current condition when all  $M_i$ s are on, additional control signals are inserted as  $T_b$  and  $T_c$  in the first stage, to avoid inaction under these two situations, where  $T_b = Q_1 \cdot Q_2 \cdots Q_N \cdot V_{cmp}$  and  $T_c = \overline{Q_1 + Q_2 + \cdots + Q_N + V_{cmp}}$ . The logic functions for  $T_b$  and  $T_c$  can be implemented with n-input AND/NOR gates [21].

Considering the similar area of DFF and TFF, the proposed uDSR only induces  $\sim 3.8\%$  area overhead per control stage compared to bDSR. The total area overhead is thus  $\sim 2.6\%$  of a single DLDO area designed with  $\mu A$  current supply capability [12]. As few extra transistors are added per control stage and the bDSR only consumes a few  $\mu W$  power [12], the uDSR induced power overhead is also negligible. With larger  $I_{pMOS}$  for higher load current rating, both the area and power overhead can be significantly less.

TABLE I TECHNOLOGY AND ARCHITECTURE PARAMETERS

| Parameter       | Value                                      |
|-----------------|--------------------------------------------|
| Technology node | 22 nm                                      |
| Frequency       | 4.0 GHz                                    |
| TDP             | 150 W                                      |
| Area            | 441 $mm^2$                                 |
| Vdd             | 1.03 V                                     |
| # cores         | 8                                          |
| Issue width     | 8                                          |
| Register files  | 64 architectured FRF, 32 architectured IRF |
| L1-I cache      | 32KB, 8-way, 64B, LRU, 1-cycle hit         |
| L1-D cache      | 64KB, 8-way, 64B, LRU, 1-cycle hit         |
| L2 cache        | 512KB, 8-way, 128B, LRU, 11-cycle hit      |
| L3 cache        | 64MB, 8-way, 128B, LRU, 30-cycle hit       |

Under transient load current conditions, if  $V_{out} < V_{ref}$  $(V_{out} > V_{ref})$  due to increased (decreased) load current, inactive (active) power transistors at the right (left) boundary are gradually turned on (off) to supply the required output current and regulate  $V_{out}$ . Under steady state conditions, the number of active power transistors changes dynamically due to limit cycle oscillations as shown in [12] in order to supply the required current. Newly activated (deactivated) power transistors always occur at the right (left) boundary, leading to the right shift of the active power transistors all the time. Thus, regardless of the load current conditions, electrical stress can always be evenly distributed among all of the available power transistors. Furthermore, as compared to conventional bDSR based DLDO, the number of activated/deactivated power transistors per clock cycle remains the same and thus the DLDO performance is not negatively affected.
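The stress balancing described above can be illustrated with a purely behavioral Python sketch that compares the per-transistor activity factor of the two activation schemes. It does not model the TFF-based circuit of Fig. 4; the function `activity_factors`, the 16-stage array, the load profile, and the comparator stand-in (logic high whenever fewer stages are active than the load requires) are all assumptions made for illustration.

```python
def activity_factors(n, targets, unidirectional):
    """Fraction of clock cycles each of the n power transistors spends turned on.

    targets[k] is the number of stages the load needs at cycle k (assumed input).
    The active stages form a window of `count` devices starting at index `left`.
    """
    left, count = 0, 0
    on_cycles = [0] * n
    for target in targets:
        v_cmp_high = count < target      # stand-in for V_out < V_ref
        if v_cmp_high and count < n:
            count += 1                   # turn on the stage at the right boundary
        elif not v_cmp_high and count > 0:
            count -= 1                   # turn off one boundary stage
            if unidirectional:
                left = (left + 1) % n    # uDSR: the left boundary stage is released
            # bDSR: the right boundary stage is released, `left` stays at 0
        for j in range(count):
            on_cycles[(left + j) % n] += 1
    return [cycles / len(targets) for cycles in on_cycles]


# Assumed load profile: mostly light load with one burst of heavier load.
N_STAGES = 16
PROFILE = [4] * 2000 + [10] * 400 + [4] * 2000

bdsr = activity_factors(N_STAGES, PROFILE, unidirectional=False)
udsr = activity_factors(N_STAGES, PROFILE, unidirectional=True)

print(f"bDSR activity factor: max {max(bdsr):.3f}, min {min(bdsr):.3f}")
print(f"uDSR activity factor: max {max(udsr):.3f}, min {min(udsr):.3f}")
```

Both schemes switch one stage per clock cycle and keep the same number of stages active, so the total on-time is identical; only its distribution differs. In the bidirectional case the first stages stay on almost permanently, while the rotating window spreads the same on-time over all stages, which lowers the worst-case activity factor  $\alpha$  that enters (1).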

Unlike the rotating phase-shedding scheme for multiphase buck converters implemented in [10], which only mitigates aging effects at light load conditions with only one active phase, the proposed uDSR is effective under all load current conditions. Moreover, as the proposed uDSR is a generalized method to determine which parallel power stage needs to be turned on/off, it can also be tailored for reliability enhancement within multiphase buck or switched capacitor voltage regulators with phase-shedding functionality.

#### V. EVALUATION

To evaluate the benefits of the proposed uDSR based DLDO architecture in terms of reliability enhancement and to provide design insights for a targeted lifetime, an IBM POWER8 like microprocessor [14] simulation platform is constructed.

#### A. Simulation framework

1) IBM POWER8 Like Microprocessor: The IBM POWER8 microprocessor [14] is one of the state-of-the-art server-class processors and is thus representative for evaluation of the proposed NBTI-aware DLDO scheme. The corresponding technology and architecture parameters listed in Table I are adopted from [5]. The IBM POWER8 like microprocessor, as shown in Fig. 6, includes a load store unit (LSU), an execution unit (EXU), an instruction fetch unit (IFU), an instruction scheduling unit (ISU), an L1 data cache inside LSU, an L1 instruction cache inside IFU, and a private L2. All benchmarks are from SPLASH2x [22] and cover a wide range of representative application domains. Analysis is restricted to the region-of-interest of the benchmarks and eight threads are involved in the simulations. The load characteristics of different functional blocks, as shown in Fig. 6, under all experimented benchmarks are summarized in Table II.

Fig. 6. A schematic diagram demonstrating the floor plan of one core within IBM POWER8 like microprocessor chip.

2) DLDO Design Specifications: Distributed microregulators are implemented in IBM POWER8 microprocessor [23]. In this simulation example, a switch array of 256 pMOS transistors, which is typical in DLDO designs [12], is implemented in each micro-regulator. Two different DLDO designs with bDSR and uDSR controls are implemented using 32 nm PTM CMOS technology where  $V_{in}=1.1~\mathrm{V}$  and  $V_{out}=1~\mathrm{V}$  as in [23].  $I_{pMOS}=2~\mathrm{mA}$  and  $I_{max}=512$  mA are used in the simulations, leading to 7, 24, 3, 10, and 5 micro-regulators (DLDOs) in, respectively, IFU, LSU, ISU, EXU, and L2 blocks to be able to supply the maximum load current across all benchmarks in each block. Load current of each block is assumed to be supplied by micro-regulators within that block, which is reasonable due to the principle of spatial locality [24] regarding current distribution. Each micro-regulator within a certain block is assumed to provide equal current due to the availability of current balancing scheme implemented within IBM POWER8 microprocessor [16].  $f_{clk} = 10$  MHz and C = 15 nF are used for each DLDO to achieve smaller than 10% Vdd transient voltage noise [5] most of the time. The total output capacitance of 735 nF is comparable to 750 nF used in [23].
3) Evaluation of NBTI Induced Performance Degradation: Equations (1), (3), (6), and (8) are leveraged for evaluation of NBTI induced performance degradation. A typical temperature profile [5], [25] of 90°C, 69°C, 67°C, 63°C, and 62°C for, respectively, LSU, EXU, IFU, ISU, and L2 is adopted for evaluations. The activity factors for both DLDO designs under different benchmarks and functional blocks are estimated through simulations in Cadence Virtuoso. The worst case  $I_{pMOS}$  degradations are used for evaluations of both designs, which is reasonable due to load characteristics of typical applications [17] and the consequent heavy use of a portion of  $M_i$ s in conventional DLDOs.
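The micro-regulator counts and the total output capacitance quoted in the DLDO design specifications follow directly from the maximum load currents in Table II. The sketch below reproduces that arithmetic; the variable names are illustrative, and rounding up to the next whole regulator is assumed as the sizing rule.

```python
import math

# Maximum load current per functional block in A (Table II).
MAX_LOAD_A = {"IFU": 3.245, "LSU": 12.092, "ISU": 1.356, "EXU": 5.056, "L2": 2.195}

N_STAGES = 256        # pMOS transistors per micro-regulator
I_PMOS_A = 2e-3       # current of a single pMOS stage in A
C_PER_DLDO_F = 15e-9  # output capacitance per DLDO in F

i_max_a = N_STAGES * I_PMOS_A  # 0.512 A per micro-regulator

# Assumed sizing rule: enough regulators to cover the peak current of the block.
regulators = {block: math.ceil(i / i_max_a) for block, i in MAX_LOAD_A.items()}
total_capacitance_f = sum(regulators.values()) * C_PER_DLDO_F

print(regulators)
print(f"total output capacitance: {total_capacitance_f * 1e9:.0f} nF")
```

This yields 7, 24, 3, 10, and 5 micro-regulators for IFU, LSU, ISU, EXU, and L2, i.e., 49 DLDOs and 735 nF in total, matching the values stated above.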

#### B. Simulation results

1) Performance Degradation within Conventional DLDO: Conventional DLDO performance degradation for different functional blocks for a five-year time frame is summarized in Table III. These degradations apply to all the experimented benchmarks as the worst case  $I_{pMOS}$  degradation is considered. As shown in Table III, NBTI can induce serious  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$  degradations for all functional blocks.  $I_{pMOS}$  degradation can lead to the deterioration of DLDO  $V_{out}$  regulation capability and possible  $V_{out}$  drop under large load current conditions. Larger than  $10\%~V_{out}$  drop can lead to voltage emergencies and potential execution errors for microprocessors. Similarly,  $T_R$  and  $\Delta V$  degradations can, respectively, increase the duration and frequency of voltage emergencies, which can slow down microprocessor executions as further actions may need to be taken to remedy the errors. Moreover, for a longer targeted lifetime of more than five years, the degradations are expected to be more disastrous as  $I_{pMOS}$  degradations are even worse, as seen from Fig. 3, which may not be tolerable for critical applications where replacement of the devices can be costly or even impossible.

TABLE II LOAD CHARACTERISTICS OF DIFFERENT FUNCTIONAL BLOCKS WITHIN ONE CORE OF AN IBM POWER8 LIKE MICROPROCESSOR CHIP UNDER ALL EXPERIMENTED BENCHMARKS

|                    | IFU   | LSU    | ISU   | EXU   | L2    |
|--------------------|-------|--------|-------|-------|-------|
| Min $I_{load}$ (A) | 0.091 | 0.172  | 0.125 | 0.251 | 0.178 |
| Max $I_{load}$ (A) | 3.245 | 12.092 | 1.356 | 5.056 | 2.195 |
| Avg $I_{load}$ (A) | 1.138 | 0.908  | 0.201 | 1.294 | 1.719 |

TABLE III CONVENTIONAL DLDO PERFORMANCE DEGRADATION FOR DIFFERENT FUNCTIONAL BLOCKS UNDER ALL EXPERIMENTED BENCHMARKS FOR A FIVE-YEAR TIME FRAME

|                          | IFU  | LSU  | ISU  | EXU  | L2   |
|--------------------------|------|------|------|------|------|
| % $I_{pMOS}$ degradation | 16.2 | 21.4 | 15.3 | 16.6 | 15.1 |
| % $T_R$ degradation      | 9.4  | 12.9 | 8.9  | 9.7  | 8.7  |
| % $\Delta V$ degradation | 6.4  | 8.7  | 6.1  | 6.6  | 6.0  |

2) Mitigation with Proposed NBTI-Aware DLDO: Simulation results for all benchmarks are summarized in Figs. 7, 8, and 9 regarding, respectively,  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$  degradation mitigation of the proposed NBTI-aware DLDO as compared to the conventional DLDO design for a five-year time frame. Up to 39.6%, 43.2%, and 42% performance improvement is achieved for, respectively,  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$ . The highest performance improvement is obtained for LSU with the highest operation temperature. Even at the lowest operation temperature within L2, degradation mitigations of up to 15.1%, 16.4%, and 15.9% are achieved for, respectively,  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$ .
3) Discussions: For high temperature applications and applications with a high maximum to average current ratio, such as the LSU block, NBTI can induce greater performance degradations as summarized in Tables II and III. The benefits of the proposed NBTI-aware DLDO scheme are also more pronounced for such applications, as shown in Figs. 7, 8, and 9 for the LSU portions. For applications where the average current is close to the maximum current, such as the L2 block, the performance degradation mitigations using the proposed NBTI-aware DLDO are less significant but still beneficial as compared to the conventional design considering the negligible extra power and area overhead induced by the proposed design.

DLDO performance degradations can vary under different load characteristics and temperature. It is thus essential to examine these degradations in early design stage with the applied reliability enhancement techniques. Extra design margins, such as increased number of  $M_i$  and/or output capacitance, should be adopted adaptively according to the aging speed of different functional blocks and benchmark applications instead of utilizing a uniform margin to avoid potential overdesign.

Fig. 7. Percentage  $I_{pMOS}$  degradation mitigation of the proposed NBTI-aware DLDO as compared to the conventional DLDO design for different functional blocks under all experimented benchmarks.

Fig. 8. Percentage  $T_R$  degradation mitigation of the proposed NBTI-aware DLDO as compared to the conventional DLDO design for different functional blocks under all experimented benchmarks.

#### VI. CONCLUSION

The DLDO regulators can experience serious NBTI induced performance degradations including  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$ . These degradations are typically overlooked in the design of DLDOs and can deteriorate the regulation capability, response speed, and transient voltage noise profile. A novel uni-directional shift register is proposed in this paper to evenly distribute the electrical stress among different power transistors to mitigate NBTI induced performance degradation with nearly no extra power and area overhead under arbitrary load conditions. Through practical simulations of an IBM POWER8 like microprocessor and benchmark evaluations, it is demonstrated that up to 39.6%, 43.2%, and 42% degradation mitigation can be achieved for, respectively,  $I_{pMOS}$ ,  $T_R$ , and  $\Delta V$  with the proposed technique. Simulation results also highlight the necessity of adaptive design margins to avoid overdesign.

#### ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation CAREER Award under Grant CCF-1350451, in part by the National Science Foundation Award under Grant CCF-1421988, and in part by the Cisco Research Award.

Fig. 9. Percentage  $\Delta V$  degradation mitigation of the proposed NBTI-aware DLDO as compared to the conventional DLDO design for different functional blocks under all experimented benchmarks.

#### REFERENCES

- [1] I. Vaisband et al., On-Chip Power Delivery and Management, Fourth Edition. Springer, 2016.
- [2] M. M. Mahmoud, N. Soin, and H. A. H. Fahmy, "Design Framework to Overcome Aging Degradation of the 16 nm VLSI Technology Circuits," *IEEE TCAD*, vol. 33, no. 5, pp. 691-703, May 2014.
- [3] D. Rossi et al., "Reliable Power Gating with NBTI Aging Benefits," IEEE TVLSI, vol. 24, no. 8, pp. 2735-2744, Aug. 2016.
- [4] T. Chan, J. Sartori, P. Gupta, and R. Kumar, "On the Efficacy of NBTI Mitigation Techniques," DATE, pp. 1-6, Mar. 2011.
- [5] S. K. Khatamifard et al., "ThermoGater: Thermally-Aware On-Chip Voltage Regulation," ISCA, pp. 120-132, 2017.
- [6] K. Wu, I. Lin, Y. Wang, and S. Yang, "BTI-Aware Sleep Transistor Sizing Algorithm for Reliable Power Gating Designs," *IEEE TCAD*, vol. 33, no. 10, pp. 1591-1595, Oct. 2014.
- [7] I. Agbo et al., "Integral Impact of BTI, PVT Variation, and Workload on SRAM Sense Amplifier," *IEEE TVLSI*, vol. 25, no. 4, pp. 1444-1454, Apr. 2017.
- [8] J. Fang and S. S. Sapatnekar, "The Impact of BTI Variations on Timing in Digital Logic Circuits," *IEEE TDMR*, vol. 13, no. 1, pp. 277-286, Jan. 2013.
- [9] J. Wu et al., "Characterization of Changes in LDO Susceptibility After Electrical Stress," *IEEE TEMC*, vol. 55, no. 5, pp. 883-890, Oct. 2013.
- [10] Y. Ahn, I. Jeon, and J. Roh, "A Multiphase Buck Converter with a Rotating Phase-Shedding Scheme for Efficient Light-Load Control," *IEEE JSSC*, vol. 49, no. 11, pp. 2673-2683, Nov. 2014.
- [11] L. Wang et al., "Efficiency, Stability, and Reliability Implications of Unbalanced Current Sharing among Distributed On-Chip Voltage Regulators," *IEEE TVLSI*, vol. 25, no. 11, pp. 3019-3032, Nov. 2017.
- [12] Y. Okuma et al., "0.5-V input digital LDO with 98.7% current efficiency and 2.7-μA quiescent current in 65 nm CMOS," CICC, pp. 1-4, 2010.
- [13] S. Köse, "Regulator-Gating: Adaptive Management of On-Chip Voltage Regulators," GLSVLSI, pp. 105-110, 2014.
- [14] E. J. Fluhr et al., "POWER8: A 12-Core Server-Class Processor in 22nm SOC with 7.6Tb/s Off-Chip Bandwidth," ISSCC, pp. 96-97, 2014.
- [15] M. Alioto, "Enabling the Internet of Things from Integrated Circuits to Integrated Systems," Springer, 2016.
- [16] J. F. Bulzacchelli et al., "Dual-Loop System of Distributed Microregulators with High DC Accuracy, Load Response Time Below 500 ps, and 85-mV Dropout Voltage," IEEE JSSC, vol. 47, no. 4, pp. 863-874, 2012.
- [17] D. Pathak, H. Homayoun, and I. Savidis, "Smart Grid on Chip: Work Load-Balanced On-Chip Power Delivery," *IEEE TVLSI*, vol. 25, no. 9, pp. 2538-2551, Sept. 2017.
- [18] M. A. Alam and S. Mahapatra, "A Comprehensive Model of PMOS NBTI Degradation," *Microelec. Reli.*, vol. 45, no. 1, pp. 71-81, 2005.
- [19] Y. Cao, "Predictive Technology Model for Robust Nanoelectronic Design," Springer, 2011.
- [20] S. Leitner, P. West, C. Lu, and H. Wang, "Digital LDO Modeling for Early Design Space Exploration," *IEEE SOCC*, pp. 7-12, 2016.
- [21] M. Alioto and G. Palumbo, "NAND/NOR Adiabatic Gates: Power Consumption Evaluation and Comparison Versus the Fan-In," *IEEE TCAS-I*, vol. 49, no. 9, pp. 1253-1262, Sept. 2002.
- [22] S. C. Woo et al., "The SPLASH-2 Programs: Characterization and Methodological Considerations," ISCA, pp. 24-36, 1995.
- [23] Z. Toprak-Deniz et al., "Distributed System of Digitally Controlled Microregulators Enabling Per-Core DVFS for the POWER8 Microprocessor," ISSCC, pp. 98-99, 2014.
- [24] S. Köse and E. G. Friedman, "Efficient Algorithm for Fast IR Drop Analysis Exploiting Locality," *Integr., VLSI J.*, vol. 45, pp. 149-161, 2012.
- [25] S. Köse, "Thermal Implications of On-Chip Voltage Regulation: Upcoming Challenges and Possible Solutions," DAC, pp. 1-6, 2014.