# Efficiency, Stability, and Reliability Implications of Unbalanced Current Sharing among Distributed On-Chip Voltage Regulators

Longfei Wang, S. Karen Khatamifard, Orhun Aras Uzun, Ulya R. Karpuzcu, and Selçuk Köse, Member, IEEE

Abstract-Power delivery networks with distributed on-chip voltage regulators are an effective way to achieve fast, localized voltage regulation in modern microprocessors. Without careful consideration of the interactions among the distributed voltage regulators and the power grid, however, unbalanced current sharing among those regulators may lead to efficiency degradation, stability and reliability issues, and even regulator malfunctions. This paper is a first attempt to investigate the efficiency, stability, and reliability implications of unbalanced current sharing among distributed on-chip voltage regulators. The benefits of balanced current sharing are demonstrated with concrete examples, showing the necessity of an appropriate current balancing scheme. An adaptive reference voltage control method and corresponding control algorithms, designed specifically for distributed on-chip voltage regulators, are proposed to balance the current sharing among regulators at different locations. The proposed techniques successfully balance the current sharing among distributed voltage regulators and can be applied to different regulator types. Simulation results based on practical microprocessor setups confirm the efficiency, stability, and reliability implications.

*Index Terms*-Power delivery network (PDN), distributed on-chip voltage regulator, current sharing, power efficiency, stability, reliability.

#### I. INTRODUCTION

**E**FFICIENT, stable, and reliable operation of power delivery networks (PDNs) is crucial to sustaining the high-performance and low-power design targets of modern large-scale integrated circuits (ICs). The thermal design power (TDP) of microprocessors has increased over generations and can exceed 100W [1], and the peak power of a microprocessor can be 1.5 times the TDP rating [2]. Even small power conversion efficiency degradations within such power-hungry ICs lead to substantial power loss and, in turn, higher heat dissipation. Meanwhile, the complexity and large component count of these PDNs raise serious stability and reliability concerns.

Voltage regulators (VRs) are an essential part of PDNs. Commonly used types, including buck, switched capacitor (SC), and low-dropout (LDO) regulators, have moved from off-chip placements to on-chip implementations to save board area and to enable efficient, fast, and secure localized voltage regulation [3]-[5]. Distributed on-chip voltage regulation, where multiple on-chip VRs are connected in parallel and distributed across the power grid to supply current across the whole die, has recently become an emerging research field [6]–[13]. Previous work has mainly focused on improving the efficiency of stand-alone VRs [3] and of the PDN as a whole [14]. The implications of the complex interactions among on-chip VRs and the power grid have, however, typically been overlooked. Despite the appealing benefits of distributed on-chip voltage regulation, these interactions may lead to significant efficiency, stability, and reliability issues. Among them, unbalanced current sharing, if not carefully controlled, can negate the benefits of previously proposed efficiency enhancement techniques or even shorten the lifetime of the chip.

S. K. Khatamifard and U. R. Karpuzcu are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: khata006@umn.edu; ukarpuzc@umn.edu).

The unbalanced current sharing problem has been widely studied in the conventional power electronics field for multiphase interleaved buck regulators [15], [16]. Little attention has, however, been paid to this problem in the microelectronics field for distributed on-chip voltage regulation, and, to the best of the authors' knowledge, the efficiency, stability, and reliability implications of unbalanced current sharing within distributed on-chip PDNs have not yet been investigated.

Voltage regulators within distributed on-chip PDNs are connected to a passive mesh network [1] that supplies the required current to the load circuits. Several factors may lead to unbalanced current sharing in distributed on-chip power delivery systems with multiple parallel VRs. These include component value mismatches and control loop mismatches, which are also common causes of unbalanced current in conventional centralized multiphase regulators [15], [16]. Specific to distributed on-chip PDNs, the power grid parasitic impedance between the VRs and the load circuits, although quite small, may vary significantly depending on the placement of the VRs and the load circuits. Therefore, even with perfectly matched components and control loops among the distributed on-chip VRs, variations in the power grid resistance between individual VRs and load circuits may cause non-negligible mismatch and severe current sharing problems.

The contribution of this paper is threefold. First, the unbalanced current sharing problem is presented with extensive simulations in both Cadence Virtuoso and VoltSpot [18], and the power efficiency, stability, and reliability implications of unbalanced current sharing within distributed on-chip PDNs are investigated. Theoretical derivations and simulation results show that unbalanced current sharing can adversely affect these important design concerns, which necessitates an efficient current balancing scheme. Second, an adaptive reference voltage control mechanism is proposed as the current balancing scheme for distributed on-chip VRs, dynamically modulating the reference voltage of each individual VR. Circuit implementations of the proposed control algorithm are analyzed, and preliminary simulations are performed to verify its effectiveness. Finally, an IBM POWER8-like [17] microprocessor simulation platform is constructed in VoltSpot [18] to study the implications of the unbalanced current sharing problem in practical applications. Extensive simulations based on several benchmarks confirm the benefits of balanced current sharing. Although the analyses assume a homogeneous PDN with buck regulators, the proposed technique can, without loss of generality, be applied to heterogeneous PDNs that house different regulator types.

This work is supported in part by the National Science Foundation CA-REER Award under Grant CCF-1350451, in part by the National Science Foundation Award under Grant CCF-1421988, and in part by the Cisco Research Award.

L. Wang, O. A. Uzun, and S. Köse are with the Department of Electrical Engineering, University of South Florida, Tampa, FL, 33620 USA (e-mail: longfei@mail.usf.edu; orhunuzun@mail.usf.edu; kose@usf.edu).

Fig. 1. On-chip power delivery network with distributed voltage regulators.

#### II. THE UNBALANCED CURRENT SHARING PROBLEM

An on-chip PDN model with distributed VRs is shown in Fig. 1. The inputs of the distributed VRs are connected to a global power grid, which is connected to the package through dedicated C4 pads. The outputs of the distributed on-chip VRs deliver the required current at the target voltage level to the local power grid that feeds the load circuits. The global ground distribution provides the ground plane for the load circuits and is connected to the package through dedicated GND C4 pads. The global and local power grids and the global ground distribution are composed of orthogonal metal lines connected with vias [1]. To a first-order approximation, these power grids can be modeled as a resistive mesh in which the effective resistance between any two nodes depends on the distance between them [19], [20]. When the distributed VRs have only local voltage regulation loops, a mismatch in their effective resistances may cause unbalanced current sharing among the VRs and may even cause VR malfunctions.
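As a rough illustration of why an effective resistance mismatch alone unbalances the currents, the sketch below assumes a hypothetical star topology: each VR ideally regulates its own output node to the same voltage and reaches a single load node only through its own effective grid resistance $R_i$. Under that assumption the VRs behave like parallel resistors, so VR $i$ supplies $I_L G_i / \sum_j G_j$ with $G_i = 1/R_i$, regardless of the regulated voltage. The function names and resistance values are illustrative only; a real mesh requires a full nodal analysis.

```python
def star_current_split(load_current, r_eff):
    """Per-VR currents for ideally regulated VRs feeding one load node.

    Hypothetical star model: every VR holds the same output voltage and
    connects to the load node only through its own effective resistance,
    so the load current divides in proportion to the conductances 1/R_i.
    """
    conductances = [1.0 / r for r in r_eff]
    total_g = sum(conductances)
    return [load_current * g / total_g for g in conductances]


def current_sharing_ratios(currents):
    """CSR_i = I_i / sum of all I_j, matching the definition in (6)."""
    total = sum(currents)
    return [i / total for i in currents]


# Two identical VRs; VR1 sees twice the grid resistance of VR2 (values made up).
currents = star_current_split(1.0, [2e-3, 1e-3])  # 1 A load, 2 mOhm vs 1 mOhm
print([round(i, 4) for i in currents])
print([round(c, 4) for c in current_sharing_ratios(currents)])
```

Under this simplified model, a 2:1 resistance ratio yields a 1:2 current split of a 1A load, an imbalance of the same order as the one shown in Fig. 2 (a), even though the two regulators are identical.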



Fig. 2. Unbalanced current sharing between two identical distributed on-chip buck regulators. (a) Inductor currents of two identical regulators supplying a total load current of 1A. (b) A zoomed view of the inductor current profiles at steady state. (c) Inductor currents of two identical regulators supplying a total load current of 2A; one inductor current saturates because a single regulator can supply at most 1.27A. (d) A zoomed view of the inductor current profiles showing the saturation of one inductor current.

To demonstrate the unbalanced current sharing problem, two sets of simulations are performed. First, two identical buck regulators providing localized voltage regulation are designed in the IBM 130nm CMOS process and simulated in Cadence Virtuoso. The input and output voltages of each buck regulator are 3.3V and 1V, respectively. The switching frequency is 140MHz with a 5nH inductor. The peak-to-peak inductor current ripple is about 1A, and the load regulation is 0.02%/A. Each regulator can supply a maximum load current of 1.27A. The on-chip power grid is designed as a resistive mesh using the design parameters of the respective metal layers in [21]. Second, a buck regulator model is extracted and included in VoltSpot [18] for PDN simulations with a large number of on-chip VRs. An IBM POWER8-like processor with 96 identical distributed regulators is used in these simulations. The detailed VoltSpot simulation setup is explained in Section VII. Simulation results demonstrating the unbalanced current sharing problem in both Cadence Virtuoso and VoltSpot are summarized in this section.

#### A. Large current variations

The load current supplied by a buck regulator is the average value of its inductor current. The inductor currents of the two regulators for a total load current of 1A are shown in Fig. 2 (a) and (b). Due to the difference in the effective resistance seen by the two regulators, they have different average inductor currents of 328.7mA and 671.3mA, respectively. With unbalanced current sharing, one regulator thus supplies more than twice the output current of the other, and a larger effective resistance mismatch would make the difference even larger.

The output currents of the 96 identical distributed on-chip VRs within an IBM POWER8-like microprocessor chip for application $lu\_ncb$ are shown in Fig. 3. The detailed simulation setup is explained in Section VII. In this simulation, the 96 on-chip VRs are evenly distributed across the chip. As can be seen from Fig. 3, large current variations occur among these on-chip VRs. The highest current supplied by one VR approaches 2.5A, while the lowest is around 0.5A, a 5x difference between the highest and lowest on-chip VR currents.

Fig. 3. Unbalanced current sharing among 96 identical distributed on-chip VRs within an IBM POWER8-like microprocessor.

#### B. Voltage regulator malfunctions

For the same two buck regulators at the same physical locations on the power grid as in Section II-A, the inductor current distribution for a higher total load current of 2A is shown in Fig. 2 (c) and (d). As can be seen from these figures, the difference between the two inductor currents gradually grows, and at steady state one inductor current saturates at a constant value. In the saturated regulator, the pull-up pMOS transistor is always on, leading to 100% duty cycle operation and malfunction of the VR. If the total load current were shared equally between the two regulators, this malfunction would be avoided, since each VR would then supply 1A, which is below its maximum current capability of 1.27A.

Note that, for Fig. 3, the on-chip VR model is included in VoltSpot for current distribution simulations and no limit is set on the maximum current that an individual VR can provide. If the output current capability of a VR were designed to be 1.5A, more than ten on-chip VRs would reach this saturation point in this simulation, leading to chip-wide VR malfunctions. Since most VRs implement over-current protection, such malfunctions can be avoided. An overloaded VR can, however, still suffer an output voltage drop [22], which is not acceptable. Furthermore, when one VR supplies 5x the current of another, the large current density can create local hotspots in the VR and even destroy the VR and nearby functional blocks.

With unbalanced current sharing, each on-chip VR needs to be designed for the worst-case scenario, that is, to supply the highest possible current with high efficiency. The power MOSFETs must then be larger than in a design targeting the total load current divided by N for N distributed VRs, which may introduce extra power and area overhead, since the power MOSFETs can occupy a large percentage of the total VR area.



Fig. 4. Conventional buck regulator, SC regulator, and LDO efficiency curves.

#### III. EFFICIENCY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Power conversion efficiency curves for the conventional buck, SC, and LDO regulators are shown in Fig. 4. Consider two identical distributed on-chip buck or SC regulators, each optimized at $I_o/2$ for a total load current of $I_o$. With balanced current sharing, each regulator operates at its optimum design point, providing maximum efficiency. With unbalanced current sharing, one regulator provides a lower current $I_1$ while the other provides a higher current $I_2$. As can be seen from Fig. 4, any deviation of the load current from the optimum point leads to an unavoidable efficiency loss. For LDOs, the efficiency is given by

$$\eta_{LDO} = \frac{I_o V_o}{(I_o + I_q) V_i},\tag{1}$$

where $I_o$ is the output current of the LDO, $V_i$ and $V_o$ are its input and output voltages, and $I_q$ is the quiescent current. With balanced current sharing, each LDO provides $I_o/2$ and the total efficiency is $\frac{(I_o/2)V_o}{(I_o/2+I_q)V_i} = \frac{I_oV_o}{(I_o + 2I_q)V_i}$. With unbalanced current sharing, one LDO provides $I_1$ and the other provides $I_2$, with $I_1 + I_2 = I_o$. Since the quiescent current is nearly constant with respect to the load current [23], the total efficiency is $\frac{(I_1+I_2)V_o}{(I_1+I_2+2I_q)V_i}$, which is the same as in the balanced case. Theoretically, therefore, unbalanced current sharing causes no significant efficiency degradation for LDOs. The larger currents induced by unbalanced current sharing do, however, adversely affect reliability, as discussed in Section V.
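The LDO argument can be checked numerically. The minimal sketch below (function name and operating point are hypothetical) evaluates the total efficiency of a bank of parallel LDOs from (1), assuming each LDO draws the same load-independent quiescent current; balanced and unbalanced splits of the same total current then give the same efficiency.

```python
def ldo_bank_efficiency(output_currents, v_out, v_in, i_q):
    """Total efficiency of parallel LDOs, extending (1) to several LDOs.

    Each LDO draws its load current plus a quiescent current i_q from v_in;
    i_q is assumed independent of the load current.
    """
    p_out = sum(output_currents) * v_out
    p_in = sum(i + i_q for i in output_currents) * v_in
    return p_out / p_in


# Hypothetical operating point: 1.0 V out from 1.2 V in, 1 mA quiescent each.
balanced = ldo_bank_efficiency([0.5, 0.5], 1.0, 1.2, 1e-3)
unbalanced = ldo_bank_efficiency([0.2, 0.8], 1.0, 1.2, 1e-3)
print(balanced, unbalanced)  # equal up to floating-point rounding
```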

Buck regulators are the focus throughout the paper; however, the proposed techniques can also be tailored for SC and LDO regulators. The regulator loss model and optimum efficiency are discussed in Section III-A. The extra power loss and efficiency degradation induced by unbalanced current sharing for the general case of N identical distributed on-chip regulators are explored theoretically in Section III-B.

#### A. Regulator loss model and efficiency

The simplified schematic of a synchronous buck regulator is shown in Fig. 5. It is composed of high-side (Q1) and low-side (Q2) power MOSFETs for synchronous rectification, an LC filter with parasitic resistances $R_{DCR}$ (the inductor DC resistance) and $R_{ESR}$ (the capacitor equivalent series resistance), and a feedback control path.



Fig. 5. Simplified schematic of synchronous buck regulator.

The simplified power loss model in [25] is extended to include the conduction loss of the capacitor ESR ($P_{ESR}$) for the power loss analysis of synchronous buck regulators:

$$P_{loss} = R_{eq} \cdot i_{rms}^2 + P_{ESR} + A \cdot f \tag{2}$$

where $R_{eq}$ is the regulator equivalent resistance, $i_{rms}$ is the inductor RMS current, A is the switching power loss factor, and f is the regulator switching frequency. Detailed power loss analysis and expressions for $R_{eq}$, $P_{ESR}$, and A can be found in [24], [25].

Power conversion efficiency can be written as

$$\eta = \frac{P_{out}}{P_{out} + P_{loss}}.$$
(3)

Since $P_{ESR}$ is independent of the regulator output current $I_o$ and, in continuous conduction mode (CCM) with a triangular inductor ripple, $i_{rms}^2 = I_o^2 + I_{p-p}^2/12$, setting $\partial \eta / \partial I_o = 0$ yields the maximum efficiency for CCM operation [25]

$$\eta_{max} = \frac{1}{1 + 2\frac{R_{eq}}{V_o} \cdot I_{o\_opt}} \tag{4}$$

at the optimum load current of

$$I_{o\_opt} = \sqrt{\frac{A \cdot f + P_{ESR}}{R_{eq}} + \frac{1}{12}I_{p-p}^2}$$
(5)

where  $V_o$  and  $I_{p-p}$  are, respectively, the regulator output voltage and inductor peak to peak current.
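The closed-form optimum in (4) and (5) can be cross-checked against a brute-force sweep of (2) and (3). The sketch below assumes CCM with a triangular inductor ripple (so $i_{rms}^2 = I_o^2 + I_{p-p}^2/12$) and a ripple amplitude independent of $I_o$; all parameter values and function names are hypothetical and are not taken from [3], [24], or [25].

```python
import math


def buck_loss(i_o, r_eq, p_esr, a_sw, f_sw, i_pp):
    """P_loss from (2) in CCM, with i_rms^2 = I_o^2 + I_pp^2 / 12."""
    i_rms_sq = i_o ** 2 + i_pp ** 2 / 12.0
    return r_eq * i_rms_sq + p_esr + a_sw * f_sw


def buck_efficiency(i_o, v_o, **params):
    """Efficiency from (3)."""
    p_out = v_o * i_o
    return p_out / (p_out + buck_loss(i_o, **params))


def optimum_point(v_o, r_eq, p_esr, a_sw, f_sw, i_pp):
    """Optimum load current (5) and peak efficiency (4)."""
    i_opt = math.sqrt((a_sw * f_sw + p_esr) / r_eq + i_pp ** 2 / 12.0)
    eta_max = 1.0 / (1.0 + 2.0 * r_eq / v_o * i_opt)
    return i_opt, eta_max


# Hypothetical parameters (not from [3]): V_o = 1 V, f = 140 MHz.
params = dict(r_eq=0.2, p_esr=1e-3, a_sw=1e-10, f_sw=140e6, i_pp=0.3)
i_opt, eta_max = optimum_point(1.0, **params)

# A brute-force sweep of (3) from 0.1 mA to 2 A should peak at the same current.
sweep = [k * 1e-4 for k in range(1, 20000)]
best = max(sweep, key=lambda i: buck_efficiency(i, 1.0, **params))
print(round(i_opt, 4), round(best, 4), round(eta_max, 4))
```

At $I_{o\_opt}$ the current-dependent conduction loss equals the fixed loss, so the total loss is $2R_{eq}I_{o\_opt}^2$, which is exactly where the factor of 2 in (4) comes from.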

#### B. Efficiency degradation of distributed regulators with unbalanced current sharing

Consider two identical buck regulators that together supply a total load current $I_o$, each optimized at $I_o/2$. With unbalanced current sharing, regulators 1 and 2 supply load currents $I_1$ and $I_2$, respectively. The current sharing ratios (CSRs) of the two regulators are

$$CSR_1 = \frac{I_1}{I_o}, CSR_2 = \frac{I_2}{I_o}.$$
 (6)

According to (2), for CCM operation, the ripple, ESR, and switching loss terms are identical in the balanced and unbalanced cases, so the extra power loss induced by unbalanced current sharing of two regulators, relative to the balanced case, is

$$P_{loss\_2}^{ex} = R_{eq} \cdot I_o^2 \cdot (CSR_1^2 + CSR_2^2 - \frac{1}{2})$$
(7)

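Since $CSR_1 + CSR_2 = 1$, it follows that $CSR_1^2 + CSR_2^2 \ge 1/2$, with equality only at $CSR_1 = CSR_2 = 1/2$. Hence $P_{loss\_2}^{ex} = 0$ if and only if $CSR_1 = CSR_2 = 1/2$, and $P_{loss\_2}^{ex} > 0$ otherwise, which means that unbalanced current sharing always leads to extra power loss.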

Efficiency degradation due to unbalanced current sharing can be written as

$$\eta_{deg_2} = \eta_{max}|_{I_{o\_opt} = \frac{I_o}{2}} - \frac{V_o}{\frac{V_o}{\eta_{max}|_{I_{o\_opt} = \frac{I_o}{2}}} + R_{eq} \cdot I_o \cdot (CSR_1^2 + CSR_2^2 - \frac{1}{2})}, \quad (8)$$

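where $\eta_{max}|_{I_{o\_opt} = I_o/2}$ is the maximum efficiency at the optimum load current of $I_o/2$, which is also the total efficiency under balanced current sharing. The second term of (8) is the total efficiency under unbalanced current sharing: the balanced-case input power $I_oV_o/\eta_{max}|_{I_{o\_opt} = I_o/2}$ is increased by (7), and both numerator and denominator are divided by $I_o$. Note that $\eta_{deg_2} = 0$ for balanced current sharing.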

Equations (7) and (8) can be generalized to N identical distributed on-chip VRs, each optimized at $I_o/N$ for a total load current of $I_o$, as follows.

The extra power loss induced by unbalanced current sharing with  $CSR_i$  for the  $i^{th}$  regulator is

$$P_{loss\_N}^{ex} = R_{eq} \cdot I_o^2 \cdot (\sum_{i=1}^N CSR_i^2 - \frac{1}{N}).$$
(9)

The total efficiency degradation induced by unbalanced current sharing is

$$\eta_{deg_N} = \eta_{max}|_{I_{o\_opt} = \frac{I_o}{N}} - \frac{V_o}{\frac{V_o}{\eta_{max}|_{I_{o\_opt} = \frac{I_o}{N}}} + R_{eq} \cdot I_o \cdot (\sum_{i=1}^{N} CSR_i^2 - \frac{1}{N})}.$$
(10)

Note that (9) and (10) apply over a wide range of load currents. Phase shedding [27], [28] for conventional multiphase converters and converter gating [5] for distributed on-chip VRs are well-developed techniques for enhancing light-load efficiency and achieving high efficiency over a wide load range: the number of active VRs $N_{active}$ can be changed dynamically so that, with balanced current sharing, each regulator operates at its optimal efficiency point under various load conditions. Thus, (9) and (10) remain valid for extra power loss and efficiency degradation calculations over a wide load range.

As an example, using the design parameters of the fully integrated buck regulator in [3], the extra power loss and efficiency degradation are evaluated for two and three distributed buck regulators with different CSR values. Each regulator is optimized at 225mA, and the total load currents are 450mA and 675mA for the two and three regulator cases, respectively. As can be seen from Fig. 6, as the CSR moves away from the balanced current sharing point ($CSR_1 = 0.5$ for the two regulator case and $CSR_1 = CSR_2 = 1/3$ for the three regulator case), the additional power loss and efficiency degradation increase rapidly. Moreover, the worst-case extra power loss and efficiency degradation for the three regulator case are higher than for the two regulator case. Although the change is difficult to visualize for more than three regulators, the worst-case extra power loss and efficiency degradation increase further with more regulators and larger output current. This indicates that significant attention should be paid to guaranteeing proper current sharing among the distributed on-chip VRs that are widely used in high-performance microprocessors.

Fig. 6. Unbalanced current sharing induced extra power loss and efficiency degradation as a function of  $CSR_i$  for N identical distributed on-chip VRs. (a) Extra power loss, N=2. (b) Efficiency degradation, N=2. (c) Extra power loss, N=3. (d) Efficiency degradation, N=3.

#### IV. STABILITY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Stable operation of the stand-alone on-chip VR as well as the whole PDN is the basis for every other performance metric. Oscillations can arise from an unstable internal feedback loop of a single VR or from interactions among different VRs. If not properly addressed, instability can adversely affect important design aspects, including line and load regulation, rendering other performance-enhancing techniques useless.

Stability implications of unbalanced current sharing are explored for both individual on-chip VRs and the PDN as a whole in this section. To evaluate the effects of unbalanced current sharing on individual on-chip VRs, the state-space averaging method [29] is applied to obtain the various important transfer functions of closed loop synchronous buck regulators while considering parasitic impedances. For the stability of the whole PDN, the implications of unbalanced current sharing can be examined by analyzing the Y-parameter model of the individual on-chip VRs based on the recently proposed hybrid stability framework for PDNs [8].

#### A. Stability of individual on-chip voltage regulators

The state-space representation and g-parameters of a conventional voltage-mode controlled buck regulator with diode rectification have been explored in [30]. For the synchronous buck regulator operating in CCM, as shown in Fig. 5, the open-loop g-parameter set can be written as

$$\begin{bmatrix} Y_{i\_o} & T_{oi\_o} \\ G_{io\_o} & -Z_{o\_o} \end{bmatrix} = \frac{\begin{bmatrix} \frac{D^2s}{L} & \frac{D(1+sR_{ESR}C)}{LC} \\ \frac{D(1+sR_{ESR}C)}{LC} & -\frac{(R_E+sL)(1+sR_{ESR}C)}{LC} \end{bmatrix}}{s^2 + s\frac{R_E+R_{ESR}}{L} + \frac{1}{LC}}$$
(11)

$$\begin{bmatrix} G_{ci} \\ G_{co} \end{bmatrix} = \frac{\begin{bmatrix} \frac{sDU_E}{L} \\ \frac{U_E(1+sR_{ESR}C)}{LC} \end{bmatrix}}{s^2 + s\frac{R_E + R_{ESR}}{L} + \frac{1}{LC}} + \begin{bmatrix} I_o \\ 0 \end{bmatrix}$$
(12)

where

$$R_E = R_{DCR} + R_{on\ hs}D + R_{on\ ls}(1-D)$$
(13)

$$U_E = V_i + (R_{on\ ls} - R_{on\ hs})I_o.$$
 (14)

 $Y_{i\_o}$ ,  $T_{oi\_o}$ ,  $G_{io\_o}$ ,  $Z_{o\_o}$ ,  $G_{ci}$ ,  $G_{co}$ , D are, respectively, the open loop input admittance, the output to input current transfer function, the input to output voltage transfer function, the output impedance, the control to input current transfer function, the control to output voltage transfer function, and the duty cycle of the buck regulator.

The line and load regulation capabilities of a buck regulator can be examined by analyzing the closed-loop input-to-output voltage transfer function $G_{io\_c}$ and the closed-loop output impedance $Z_{o\_c}$, respectively. For stable line and load regulation, all poles of the corresponding transfer function must lie in the left half of the s-plane. The closed-loop g-parameters can be obtained from the open-loop g-parameters using the relationships in [30]. Assuming Type III compensation [31], the characteristic equation of $G_{io\_c}$ and $Z_{o\_c}$, i.e., $1 + G_aG_{cc}G_{se}G_{co} = 0$ with $G_{co}$ from (12) multiplied through by $LC$, is

$$CLs^{2} + (CG_{a}G_{cc}G_{se}U_{E}R_{ESR} + CR_{ESR} + CR_{E})s + G_{a}G_{cc}G_{se}U_{E} + 1 = 0$$
(15)

where $G_{se}$, $G_{cc}$, and $G_a$ are, respectively, the output voltage sensing gain, the transfer function of the error amplifier (EA) and compensator, and the PWM generator gain. Typically, $G_{se}$ and $G_a$ are constant. Since $U_E$ in (14), and hence some coefficients of (15), depend on $I_o$, the roots of (15) change as $I_o$ changes. For N identical distributed on-chip VRs with unbalanced current sharing, some of the parallel VRs supply more current while others supply less, which moves the system poles. As stability is determined by the right-half-plane (RHP) poles, we define a CSR- and N-dependent function S(CSR, N) as

$$S(CSR, N) = \begin{cases} \max_{i=1,\dots,n} \{Re(p_i)\}, & \max_{i=1,\dots,n} \{Re(p_i)\} < 0\\ \min_{i=1,\dots,j} \{Re(p_i^+)\}, & otherwise \end{cases}$$
(16)

where $n$ is the total number of system poles, $j$ is the number of RHP (or zero) poles, $p_i$ ($i = 1, \dots, n$) is the $i^{th}$ system pole, and $p_i^+$ ($i = 1, \dots, j$) is the $i^{th}$ RHP (or zero) pole. $|S(CSR, N)|$ indicates either how close the system is to instability (when $\max_{i=1,\dots,n} \{Re(p_i)\} < 0$) or how far it has moved beyond the marginally stable point (otherwise). The system is stable if $S(CSR, N) < 0$ and unstable otherwise.
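The definition of S in (16), and the way $I_o$ enters (15) through $U_E$, can be sketched as follows. As a loudly stated simplification, the sketch lumps $G_aG_{cc}G_{se}$ into a single constant gain `k_loop` (a hypothetical parameter), so the Type III compensator dynamics are not modeled and (15) reduces to a quadratic. With all coefficients positive, such a quadratic always has both roots in the left half-plane, so this sketch only illustrates the mechanics of evaluating S; it does not reproduce Fig. 7.

```python
import cmath


def stability_metric(poles):
    """S from (16).

    Returns the largest real part when every pole lies in the open left
    half-plane; otherwise returns the smallest real part among the
    right-half-plane (or zero) poles.
    """
    max_re = max(p.real for p in poles)
    if max_re < 0:
        return max_re
    return min(p.real for p in poles if p.real >= 0)


def poles_of_char_eq(i_o, l_ind, c_cap, r_e, r_esr, v_i, r_on_hs, r_on_ls, k_loop):
    """Roots of the characteristic equation (15), with U_E from (14).

    ASSUMPTION: k_loop lumps G_a * G_cc * G_se into one constant gain,
    so the Type III compensator dynamics are not modeled here.
    """
    u_e = v_i + (r_on_ls - r_on_hs) * i_o   # (14): depends on the VR current
    g = k_loop * u_e                         # G_a * G_cc * G_se * U_E
    a2 = c_cap * l_ind
    a1 = c_cap * g * r_esr + c_cap * r_esr + c_cap * r_e
    a0 = g + 1.0
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]


# Hypothetical values: 5 nH / 10 nF filter, 100 mOhm R_E, 10 mOhm ESR,
# 3.3 V input, 50 / 30 mOhm high-/low-side on-resistance, k_loop = 0.5.
for i_o in (0.25, 0.5, 1.0):
    poles = poles_of_char_eq(i_o, 5e-9, 10e-9, 0.1, 0.01, 3.3, 0.05, 0.03, 0.5)
    print(i_o, stability_metric(poles))
```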

Using design parameters similar to those in [3], $S(CSR_i, N)$ for the $i^{th}$ VR among N identical distributed on-chip VRs is plotted as a function of $CSR_i$ and N in Fig. 7. It can be seen from Fig. 7 that, for a fixed N, $S(CSR_i, N)$ increases as $CSR_i$ increases. Note that, although all $CSR_i$ values are plotted in Fig. 7 even for large N for completeness, the inductor current of an individual VR can saturate because of its maximum current supply capability, at which point the CCM model is no longer valid. The output voltage can then drop [22] for large values of N and $CSR_i$, for example N = 80 and $CSR_i = 0.5$. Moreover, as N becomes large, $S(CSR_i, N)$ moves from the stable region toward the unstable region as $CSR_i$ increases, indicating the negative effects of unbalanced current sharing on the stability and proper operation of an individual VR.

Fig. 7. Stability of individual on-chip VR as a function of  $CSR_i$  and N.

#### B. Stability of the power delivery network

A sufficient condition for checking PDN stability is proposed in [8] based on the hybrid stability framework. For LTI systems, the condition can be evaluated in two complementary ways, through passivity evaluation or through system gain evaluation, and satisfying either one guarantees the stability of the PDN. Stability checking with the system gain condition requires a Z-parameter model of the passive subnetwork. Since this model varies with the application and design requirements, it is difficult to use the gain condition to evaluate the general effects of unbalanced current sharing on PDN stability. The passivity evaluation, however, does shed light on this point.

The synchronous buck regulator system is approximated as a linear continuous-time time-invariant system through the state-space averaging method [32]. Thus, the passivity criterion [8] can be applied, which is given by

$$\lambda_{min}(j\omega_k) = \min_{i=1,\dots,N; j=1,2} \{\lambda_j(\boldsymbol{Y}_i(j\omega_k) + \boldsymbol{Y}_i^H(j\omega_k))\}$$
(17)

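where $\lambda_{min}(j\omega_k)$ is the minimum eigenvalue of $\boldsymbol{Y}_i(j\omega_k) + \boldsymbol{Y}_i^H(j\omega_k)$ over all N VRs at frequency $\omega_k$, $\boldsymbol{Y}_i$ is the Y-parameter matrix of the $i^{th}$ VR, and H denotes the complex conjugate transpose. The passivity condition is met for the VRs if $\lambda_{min}(j\omega_k) \ge 0$.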
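For a 2x2 Y-parameter matrix, the quantity inside (17) is the smallest eigenvalue of the Hermitian matrix $\boldsymbol{Y}_i + \boldsymbol{Y}_i^H$, which has a closed form. The sketch below computes it per VR and takes the minimum over all VRs at one frequency point. The function names are hypothetical, and the matrices in the usage example are placeholders; in practice each $\boldsymbol{Y}_i(j\omega_k)$ would be derived from the closed-loop g-parameters at the corresponding $CSR_i$.

```python
import math


def min_eig_hermitian_part(y):
    """Smallest eigenvalue of Y + Y^H for a 2x2 complex matrix y (nested lists)."""
    # M = Y + Y^H is Hermitian: real diagonal, conjugate off-diagonal entries.
    m11 = 2.0 * y[0][0].real
    m22 = 2.0 * y[1][1].real
    m12 = y[0][1] + y[1][0].conjugate()
    # Eigenvalues of [[a, b], [b*, d]] are (a+d)/2 +/- sqrt(((a-d)/2)^2 + |b|^2).
    mean = 0.5 * (m11 + m22)
    radius = math.sqrt((0.5 * (m11 - m22)) ** 2 + abs(m12) ** 2)
    return mean - radius


def lambda_min(y_list):
    """(17): minimum over all VRs of the per-VR minimum eigenvalue."""
    return min(min_eig_hermitian_part(y) for y in y_list)


# Placeholder Y matrices at one frequency point (not from the paper).
y_vrs = [
    [[0.5 + 0.1j, 0.05], [0.05, 0.2 - 0.3j]],
    [[-0.1 + 0.2j, 0.02j], [0.0, 0.3]],
]
lam = lambda_min(y_vrs)
print(lam, "passive" if lam >= 0 else "passivity condition violated")
```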

The Y-parameter model of the $i^{th}$ VR can be obtained from the closed-loop g-parameters. Note that the Y-parameter model is a function of the individual VR output current $I_o$; it is therefore affected by unbalanced current sharing, and so is $\lambda_{min}(j\omega_k)$. Using the same design parameters as in Section IV-A, $\lambda_{min}^i(j\omega_k)$ is examined for the $i^{th}$ VR under different $CSR_i$ and N values in Fig. 8, where

$$\lambda_{\min}^{i}(j\omega_{k}) = \min_{j=1,2} \{\lambda_{j}(\boldsymbol{Y}_{i}(j\omega_{k}) + \boldsymbol{Y}_{i}^{H}(j\omega_{k}))\}.$$
 (18)



Fig. 8.  $\lambda_{min}^{i}(j\omega_k)$  as a function of  $f_k$  under different values of  $CSR_i$  and N.  $\lambda_{min}^{i}(j\omega_k)$  shifts rightwards as  $N \cdot CSR_i$  increases, demonstrating the adverse effects of unbalanced current sharing on VR passivity.

$\lambda_{min}^{i}(j\omega_{k})$ remains negative for $f_{k} < 10$ MHz and positive for $f_{k} > 100$ MHz. As the current $I_{o}$ supplied by the $i^{th}$ VR, normalized to its design value (i.e., $N \cdot CSR_{i}$), increases, $\lambda_{min}^{i}(j\omega_{k})$ shifts rightwards, rendering

$$\begin{aligned}\lambda_{min}(j\omega_k)|_{\omega_k \le \omega_{k0}} &= \min_{i=1,\dots,N} \{\lambda^i_{min}(j\omega_k)\} \\ &= \lambda^i_{min}(j\omega_k)|_{CSR_i = CSR_{max}}\end{aligned}$$
(19)

where

$$\lambda_{\min}(j\omega_{k0}) = 0, \ CSR_{\max} = \max_{i=1,\dots,N} CSR_i.$$
(20)

For example, at $f_k = 45$ MHz (i.e., $\omega_k = 9\pi \cdot 10^7$ rad/s), $\lambda_{min}(j\omega_k) > 0$ with balanced current sharing for any N, so the passivity condition is satisfied. With unbalanced current sharing (e.g., N = 20 and $CSR_i = 0.1$, so that $N \cdot CSR_i = 2$), however, $\lambda_{min}(j\omega_k)|_{\omega_k=9\pi\cdot 10^7} < 0$, which pushes the originally passive point into the potentially unstable region and indicates the adverse effects of unbalanced current sharing on the stability of the whole PDN.

#### V. RELIABILITY IMPLICATIONS OF UNBALANCED CURRENT SHARING

Electromigration (EM)-induced wear-out dictates the lifetime of each component of the PDN. EM results in gradual mass transport in metal conductors along the direction of an applied electric field, which in turn may cause open or short circuits. The metal wires in the PDN are particularly vulnerable to EM because they carry unidirectional currents [33], and such constant stress accelerates EM failures. EM degradation grows with the current density *J*.

Black's equation [34] captures the mean time to failure (MTTF) due to EM:

$$MTTF = AJ^{-n}exp(Ea/kT) \tag{21}$$

where A is a constant that depends on the geometry, Ea is the EM activation energy, k is Boltzmann's constant, n is a material-specific constant, and T is the temperature.

Fig. 9. MTTF as a function of  $CSR_i$ .

Following [18], Black's equation can be adjusted to account for current crowding and Joule heating as

$$MTTF = A(cJ)^{-n} exp[Q/k(T + \Delta T)]$$
(22)

where both Q and c are material-specific constants.

Consider N identical distributed on-chip VRs, each optimized for a load current of $I_o/N$, where $I_o$ is the total load current. Since, for a given $I_o$, J in the metal wire at the output of the $i^{th}$ regulator is proportional to $CSR_i$, the MTTF of that wire can be expressed in terms of $CSR_i$ as

$$MTTF_{i} = A'(cCSR_{i})^{-n}exp[Q/k(T + \Delta T)]$$
(23)

where A' is a constant that depends on the geometry and  $I_o$ .

For the same example from [3], with total load currents of 450mA and 675mA for the two and three regulator cases, respectively, Fig. 9 shows how $MTTF_i$ of the $i^{th}$ regulator changes with unbalanced current sharing per (23), with n = 1.8, Q = 0.8 eV, c = 10, and $\Delta T = 40^{\circ}$C [35]. Differences in CSR result in notable differences in MTTF. The MTTF at CSR = 0.5 (0.33), which corresponds to perfect load balance, is 5 years at 65°C for the two (three) regulator case. In the two regulator case, both regulators have this same MTTF of 5 years at CSR = 0.5. If one regulator's CSR exceeds 0.5, its MTTF quickly drops below 5 years, while the other regulator's CSR falls below 0.5 and its MTTF rises above 5 years. One of the regulators would therefore fail much earlier than the other. Better load balance (i.e., CSR = 0.5 for the two regulator case) mitigates this adverse effect on reliability. Fig. 9 reveals a similar trend for the three regulator case.
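Because $A'$, $c$, $Q$, $T$, and $\Delta T$ are held fixed in (23), the MTTF scales as $CSR_i^{-n}$, so any point on Fig. 9 can be related to the balanced reference point. The sketch below assumes exactly that (in particular, it ignores any growth of the Joule-heating term $\Delta T$ with current) and uses the 5-year balanced MTTF and n = 1.8 quoted above; the function name is hypothetical.

```python
def mttf_relative(csr, csr_ref, mttf_ref, n=1.8):
    """MTTF at sharing ratio csr, scaled from a reference point via (23).

    With A', c, Q, T and Delta T held fixed, (23) gives MTTF ~ CSR^(-n),
    so MTTF(csr) = mttf_ref * (csr / csr_ref) ** (-n).
    """
    return mttf_ref * (csr / csr_ref) ** (-n)


# Two-regulator case: 5 years at the balanced point CSR = 0.5.
for csr in (0.4, 0.5, 0.6, 0.7):
    print(csr, round(mttf_relative(csr, 0.5, 5.0), 2))

# Three-regulator case: 5 years at CSR = 1/3.
print(round(mttf_relative(0.4, 1 / 3, 5.0), 2))
```

Under these assumptions, raising one regulator's CSR from 0.5 to 0.6 shortens its MTTF from 5 years to about 3.6 years.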

#### VI. ADAPTIVE REFERENCE VOLTAGE CONTROL

The implications of unbalanced current sharing on the power efficiency, stability, reliability, and overall functionality of the chip are demonstrated above. Balanced current sharing is beneficial for maintaining overall PDN performance. In this section, an adaptive reference voltage control method designed specifically for distributed on-chip VRs is proposed to balance the current sharing. The proposed technique scales to different numbers of distributed on-chip VRs and can be used with different types of VRs. The control algorithm is explained, and the circuit implementation and simulations are presented to verify the effectiveness of the proposed technique. Practical concerns are also addressed.



Fig. 10. Simplified model of two identical distributed on-chip VRs with power grid effective resistances.

#### A. Adaptive $V_{ref}$ control mechanism

Consider two identical distributed VRs connected to the same power grid. The simplified model is shown in Fig. 10, with the power grid effective resistance included between any two connection nodes within the grid. With a large number of VDD C4 pads, the input voltage  $V_i$  of the VRs can be considered ideal and constant. For a steady state analysis with multiple VRs, suppose that  $V_{o1} = V_{o2}$ ; then  $I_3 = 0$  and  $R_{eff3}$  can be removed as an open circuit. In that case, balanced current sharing ( $I_1 = I_2$ ) requires  $R_{eff1}$  and  $R_{eff2}$  to be equal. In practice, however, because the VRs are at different locations with respect to the load,  $R_{eff1}$  and  $R_{eff2}$  are hardly ever equal, which means that a difference between  $V_{o1}$  and  $V_{o2}$  is unavoidable if  $I_1 = I_2$  is to be achieved. Because the effective resistances  $R_{eff1}$ ,  $R_{eff2}$ , and  $R_{eff3}$  are very small, balanced current sharing can be achieved with quite small differences between  $V_{o1}$  and  $V_{o2}$ , which have a negligible effect on the regulated output voltage  $V_o$ .
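To make this argument concrete, the sketch below solves one hypothetical reading of Fig. 10:  $R_{eff1}$  between VR1 and the load node,  $R_{eff2}$  between VR2 and the load node,  $R_{eff3}$  between the two VR outputs, both VR outputs treated as ideal voltage sources, and a constant load current  $I_L$ . Nodal analysis at the load node then gives the output-voltage difference needed for  $I_1 = I_2$ , namely  $V_{o2} - V_{o1} = I_L (R_{eff2} - R_{eff1}) R_{eff3} / [2(R_{eff1} + R_{eff2} + R_{eff3})]$ . The topology reading and all numerical values are illustrative assumptions.

```python
def balancing_offset(i_load, r1, r2, r3):
    """V_o2 - V_o1 that makes I1 = I2 in the assumed Fig. 10 topology.

    r1: R_eff1 (VR1 to load node), r2: R_eff2 (VR2 to load node),
    r3: R_eff3 (between the two VR outputs). Obtained from nodal analysis.
    """
    return i_load * (r2 - r1) * r3 / (2.0 * (r1 + r2 + r3))


def vr_currents(v1, v2, i_load, r1, r2, r3):
    """Currents sourced by VR1 and VR2 (ideal sources v1, v2; load sinks i_load)."""
    v_load = (v1 / r1 + v2 / r2 - i_load) / (1.0 / r1 + 1.0 / r2)  # KCL at load
    i3 = (v1 - v2) / r3                  # VR1 -> VR2 through R_eff3
    i1 = (v1 - v_load) / r1 + i3
    i2 = (v2 - v_load) / r2 - i3
    return i1, i2


# Illustrative values: 10, 20, and 15 mOhm, 2 A total load, V_o1 = 1 V.
R1, R2, R3, I_L = 10e-3, 20e-3, 15e-3, 2.0
print("equal outputs:", vr_currents(1.0, 1.0, I_L, R1, R2, R3))
d = balancing_offset(I_L, R1, R2, R3)
print(f"offset {d * 1e3:.2f} mV ->", vr_currents(1.0, 1.0 + d, I_L, R1, R2, R3))
```

With equal outputs, the VR closer to the load supplies about 1.33 A and the other about 0.67 A. Raising  $V_{o2}$  by only about 3.3 mV splits the load evenly, which illustrates why small output-voltage differences suffice.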

Based on the above analysis, an adaptive reference voltage  $V_{ref}$  control mechanism tailored specifically for distributed on-chip VRs is proposed. A system level block diagram of the proposed adaptive  $V_{ref}$  control method is illustrated in Fig. 11, and the  $V_{ref}$  control algorithm for N identical distributed on-chip VRs is presented in Fig. 12. The proposed adaptive  $V_{ref}$  control block consists of an average current sensor within each VR, two N-input comparators (N comparators) [36] that determine the maximum and minimum currents, a current\_mismatch decision block, and a  $V_{ref}$  control logic. In each iteration, the average current of each VR for that cycle is obtained through the average current sensor and represented by the corresponding output voltage  $V_{sense_i}$  (i = 1, ..., N). The maximum and minimum values of  $V_{sense_i}$  (i = 1, ..., N) are identified by the N comparators [36]. The difference between the maximum and minimum currents is compared with a *current\_mismatch* value by the *current\_mismatch* decision block. The processed outputs of the N comparators and the *current\_mismatch* decision block serve as control signals for the  $V_{ref}$  control logic, which generates multi-level  $V_{ref}$  through the switch network and resistor string. A mismatch between the maximum and minimum average inductor currents indicates unbalanced current sharing. If the mismatch is larger than the threshold *current\_mismatch*, the proposed  $V_{ref}$  control algorithm is triggered and the corresponding reference voltages are adjusted.

The current\_mismatch threshold is provided as an option to adjust the desired accuracy of current matching among the VRs and to eliminate constant toggling at steady state, when all VR output currents are close to each other. If the optimal load current ( $I_{o\_opt}$  in (5)) that a single VR can supply is in the range of several hundred mA, a mismatch of a few mA can be regarded as balanced current sharing. A threshold of 30 mA is used in the simulations. A threshold that is too small can cause the reference voltages to toggle at steady state.

Fig. 11. System level block diagram of the proposed  $V_{ref}$  control method and multi-level  $V_{ref}$  generator for N identical distributed on-chip VRs.

By increasing (decreasing)  $V_{ref}$  of an individual on-chip VR, the output current supplied by that VR will increase (decrease).  $V_{ref\_max}$  and  $V_{ref\_min}$  in Fig. 12 denote the reference voltages for the on-chip VRs with the maximum and minimum average inductor current, respectively. Once the difference between the maximum and minimum average inductor current values is greater than  $current\_mismatch$ ,  $V_{ref\_max}$  is decreased by a voltage step to decrease the output current supplied by the VR which provides the maximum output current.  $V_{ref\_min}$  is increased by a voltage step to increase the output current supplied by the VR which provides the minimum output current. The reference voltages of other VRs remain unchanged.

Note that the  $V_{ref}$  control loop waits n clock cycles before changing  $V_{ref}$  again. This allows the VR's voltage regulation feedback loop to settle before  $V_{ref}$  is changed in the next step. Making the reference control loop slower than the VR's voltage regulation feedback loop improves the stability of the overall system.
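The adjustment rule of Fig. 12 can be exercised on a toy steady-state model. Everything in this sketch except the 1mV step, the 30mA threshold, and the 675mA total load is hypothetical. Each VR is modeled as an ideal source of  $2V_{ref}$  (so that  $V_{ref} = 0.5V$  gives 1V, as in the simulations) behind a lumped resistance to a shared load node. The resistances 0.10, 0.15, and 0.20 Ω are invented to stand in for location-dependent grid and VR output impedance. Each loop pass assumes that the VR feedback loops have settled, which is the purpose of the n-cycle wait.

```python
def star_currents(v_ref, r, i_load, gain=2.0):
    """Steady-state VR output currents in a hypothetical star-connected grid.

    VR k is an ideal source gain * v_ref[k] behind resistance r[k] to one
    shared load node that sinks i_load (gain 2 maps 0.5 V to 1 V).
    """
    g_total = sum(1.0 / rk for rk in r)
    v_out = [gain * v for v in v_ref]
    v_load = (sum(v / rk for v, rk in zip(v_out, r)) - i_load) / g_total
    return [(v - v_load) / rk for v, rk in zip(v_out, r)]


def balance(v_ref, r, i_load, step=1e-3, mismatch=30e-3, max_iter=1000):
    """Apply the max-down / min-up V_ref update until the spread is small."""
    v_ref = list(v_ref)
    for it in range(max_iter):
        # One pass = one update after the VR loops have settled (n-cycle wait).
        cur = star_currents(v_ref, r, i_load)
        k_max = max(range(len(cur)), key=cur.__getitem__)  # ties: smaller index
        k_min = min(range(len(cur)), key=cur.__getitem__)
        if cur[k_max] - cur[k_min] <= mismatch:
            return v_ref, cur, it
        v_ref[k_max] -= step   # VR supplying the most current: lower V_ref
        v_ref[k_min] += step   # VR supplying the least current: raise V_ref
    raise RuntimeError("V_ref loop did not converge")


v_ref, currents, iters = balance([0.5, 0.5, 0.5], [0.10, 0.15, 0.20], 0.675)
print(iters, [round(i * 1e3, 1) for i in currents], [round(v, 3) for v in v_ref])
```

With these numbers the current spread falls from about 156 mA to about 17 mA after five updates, leaving reference voltages of about 0.495, 0.500, and 0.505 V; the VR closest to the load ends up with the lowest reference. Because each step shifts the currents by less than the threshold, the loop stops instead of toggling.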

As compared to [6], the proposed method does not rely on equalizing duty cycles to balance the current sharing, and thus can be applied to most regulator types that need a reference voltage to operate. Furthermore, as the reference voltage of each VR is adjusted individually with respect to an initial reference voltage, the power noise on the local power grids is less affected by localized load fluctuations.

#### B. Adaptive $V_{ref}$ control implementation

Circuit level implementation of the proposed adaptive  $V_{ref}$  control method is analyzed in this section. Although a buck regulator is adopted for demonstration, the proposed  $V_{ref}$  control method can be applied to other regulator types by adopting an appropriate current sensor for that regulator type, since the proposed method is a general way of modulating  $V_{ref}$  to balance the current.

Fig. 12. Flowchart of the proposed adaptive  $V_{ref}$  control algorithm.

1) Average current sensor: The schematic of the average current sensor [37] is shown in Fig. 13. When the sampling clock  $\phi$  becomes high, the drain voltages of the power MOSFET and the sense MOSFET are equalized by the operational amplifier. The inductor current through the power MOSFET is mirrored to the sense MOSFET, and a voltage  $V_{sense}$  proportional to the inductor current is generated at the output.  $V_{sense}$  is held when  $\phi$  becomes low. By replacing the ramp signal in Fig. 5 with the symmetrical triangular waveform shown in Fig. 11, a clock signal  $\phi'$  can be generated to sample the instantaneous inductor current in the middle of the inductor energizing or de-energizing phase, which corresponds to the average inductor current [37]. Because n clock cycles are skipped before the next average inductor current sample is taken, the frequency  $f_{\phi}$  of the actual sampling clock  $\phi$  must be  $f_{\phi'}/(n+1)$ .
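The mid-phase sampling idea can be checked numerically. For an ideal triangular inductor current in continuous conduction, the value at the midpoint of either the rising or the falling ramp equals the cycle average. The sketch below assumes a hypothetical operating point (1 A average, 0.4 A ripple, duty cycle 0.4, 100 MHz switching) and also applies the  $f_{\phi} = f_{\phi'}/(n+1)$  relation with a hypothetical n = 4.

```python
def inductor_current(t, i_avg, ripple, duty, period):
    """Ideal triangular CCM inductor current (hypothetical waveform)."""
    t = t % period
    t_on = duty * period
    i_valley = i_avg - ripple / 2.0
    if t < t_on:                                            # energizing phase
        return i_valley + ripple * t / t_on
    return i_valley + ripple * (1.0 - (t - t_on) / (period - t_on))  # de-energizing


# Hypothetical operating point: 1 A average, 0.4 A ripple, D = 0.4, 100 MHz.
I_AVG, RIPPLE, DUTY, PERIOD = 1.0, 0.4, 0.4, 10e-9
t_on = DUTY * PERIOD

# Sample in the middle of each phase, as the phi' clock does.
mid_on = inductor_current(0.5 * t_on, I_AVG, RIPPLE, DUTY, PERIOD)
mid_off = inductor_current(t_on + 0.5 * (PERIOD - t_on), I_AVG, RIPPLE, DUTY, PERIOD)

# Brute-force cycle average for comparison.
steps = 10000
avg = sum(inductor_current(k * PERIOD / steps, I_AVG, RIPPLE, DUTY, PERIOD)
          for k in range(steps)) / steps

# Skipping n cycles between samples lowers the effective sampling clock.
n = 4
f_phi_prime = 1.0 / PERIOD
f_phi = f_phi_prime / (n + 1)
print(mid_on, mid_off, round(avg, 6), f_phi)
```

Both mid-phase samples match the brute-force average regardless of the duty cycle, which is why a single sample per period suffices in this idealized case; ripple asymmetry or discontinuous conduction would break the equality.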

2) N comparator: The schematic of the N comparator [36] for maximum and minimum current decision is shown in Fig. 14. The outputs  $V_{sense_i}$  (i = 1, ..., N) of the average current sensors serve as the inputs of the N comparator. In the N comparator for maximum current decision, the tail current provided by transistor  $M_{tail}$  is divided equally among the branches when the same voltage is applied to all inputs. The  $M_i$  (i = 1, ..., N) devices are biased and sized appropriately  $\left(\left(\frac{W}{L}\right)_{M_{tail}}=N\left(\frac{W}{L}\right)_{M_i}\right)$  to reflect this distribution. The input voltage  $V_{sense_i}$  determines the portion of the tail current that flows through each branch. Since the branch currents must sum to the tail current provided by  $M_{tail}$ , the branch with the highest input voltage receives the largest portion of the tail current. The branch currents are then mirrored, and a high-resistance output node is formed using the  $M_i$  (i = 1, ..., N) devices. Since the  $M_i$  (i = 1, ..., N) devices are biased for 1/N of the tail current, the output becomes logic high when a branch receives more than 1/N of the tail current, which is true for the branch with the highest input voltage, and logic low when a branch receives less than 1/N of the tail current. The high-resistance node provides high gain at the output, but further cascading may be needed to provide rail-to-rail outputs. In the simulations, an input voltage difference of less than 1mV can be distinguished by cascading three stages. When the input voltages are very close to each other, this comparator may give incorrect outputs in which more than one current is flagged as the minimum or maximum. To handle this case, the outputs of the N comparator  $V_{max_i}$  and  $V_{min_i}$  (i = 1, ..., N) are processed by digital logic to generate  $V'_{max_i}$  and  $V'_{min_i}$  (i = 1, ..., N), which control the *current\_mismatch* decision block and the  $V_{ref}$  control logic shown in Fig. 11. If more than one maximum or minimum current is flagged, the digital logic simply selects the VR with the smaller *i* as the one that supplies the maximum or minimum current. The N comparator for minimum current decision can be implemented as the complement of the N comparator for maximum current decision shown in Fig. 14.

Fig. 13. Schematic of the average current sensor.

Fig. 14. Schematic of the analog N comparator for maximum and minimum current decision.
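The tie-breaking rule behaves like a priority selector: only the lowest-index asserted output survives, i.e.,  $V'_i = V_i \wedge \neg(V_1 \vee \dots \vee V_{i-1})$ . The behavioral sketch below uses a function name of our own; it is not the paper's logic netlist.

```python
def lowest_index_one_hot(flags):
    """Return a one-hot copy of `flags` keeping only the lowest asserted index.

    flags: 0/1 comparator outputs (V_max_i or V_min_i), possibly multi-hot
    when several inputs are nearly equal.
    out_i = f_i AND NOT (f_1 OR ... OR f_(i-1)).
    """
    out, seen = [], False
    for f in flags:
        out.append(1 if (f and not seen) else 0)
        seen = seen or bool(f)
    return out


print(lowest_index_one_hot([0, 1, 1, 0]))  # [0, 1, 0, 0]
print(lowest_index_one_hot([0, 0, 1, 0]))  # [0, 0, 1, 0]
```

An all-zero input stays all-zero in this sketch. How the hardware treats a case with no asserted comparator output is not specified in the text.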

3) current\_mismatch decision: The schematic of the current\_mismatch decision block is shown in Fig. 15. The processed outputs of the N comparator  $V'_{max_i}$  and  $V'_{min_i}$  (i = 1, ..., N) drive 2N transmission gates (TG) as selection signals for the maximum and minimum values of  $V_{sense_i}$  (i = 1, ..., N). The maximum and minimum values of  $V_{sense_i}$  serve as the inputs  $V_{max}$  and  $V_{min}$ , respectively, of the current\_mismatch comparator, which generates the enable signal EN for the subsequent  $V_{ref}$  control logic. An intentional input transistor size mismatch is introduced in the *current\_mismatch* comparator, with a larger transistor connected to  $V_{min}$  than to  $V_{max}$ , to create the offset voltage  $V_{offset}$  that corresponds to the current\_mismatch value. The EN signal is active only when  $V_{max} - V_{min} > V_{offset}$ . Because the current\_mismatch value need not be accurate as long as it is larger than  $\Delta(\Delta I)$ , as will be discussed next, process variations in practical circuit implementations have a negligible impact on the circuit function.

Fig. 15. Schematic of the current\_mismatch decision block.
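A behavioral sketch of this block follows. The one-hot selections act as the transmission gates, and the offset comparator asserts EN only when  $V_{max} - V_{min} > V_{offset}$ . The sensor transresistance `K_SENSE` (1 V/A) and the example currents are hypothetical; only the 30 mA threshold comes from the text.

```python
def mismatch_enable(v_sense, max_sel, min_sel, v_offset):
    """Behavioral model of the current_mismatch decision block (Fig. 15).

    v_sense: sensor outputs V_sense_i (V); max_sel / min_sel: one-hot
    V'_max_i / V'_min_i vectors that drive the transmission gates.
    """
    v_max = v_sense[max_sel.index(1)]   # TG selected by V'_max_i
    v_min = v_sense[min_sel.index(1)]   # TG selected by V'_min_i
    return v_max - v_min > v_offset     # EN: offset comparator output


K_SENSE = 1.0                 # V/A, hypothetical sensor transresistance
V_OFFSET = K_SENSE * 30e-3    # offset matching a 30 mA current_mismatch

v_sense = [K_SENSE * i for i in (0.250, 0.310, 0.220)]   # hypothetical currents (A)
print(mismatch_enable(v_sense, [0, 1, 0], [0, 0, 1], V_OFFSET))  # True
```

Because EN only gates whether a  $V_{ref}$  step is taken, a moderately inaccurate  $V_{offset}$  shifts the final residual mismatch slightly but does not stop convergence, provided it stays above  $\Delta(\Delta I)$ .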

4) Multi-level  $V_{ref}$  generation: The proposed multi-level  $V_{ref}$  generator is composed of a  $V_{ref}$  control logic, a bandgap voltage reference, and a simple resistor string DAC, as shown in Fig. 11. Two resistors with large resistance  $R_b$  are placed at the top and bottom of the string, and a few resistors with smaller resistance  $R_s$  are connected in the middle to generate the desired  $V_{refs}$ .  $V'_{max_i}$ ,  $V'_{min_i}$  (i = 1, ..., N), EN, and a clock signal, which is a delayed version of  $\phi$ , are given to the  $V_{ref}$  control logic. This logic determines how the reference voltage of each VR should change according to the algorithm in Fig. 12, and it can be implemented entirely in Verilog and synthesized.

The reference voltage generation requires an analog implementation, such as a resistor string DAC. The voltage step that achieves the desired current\_mismatch value is the LSB of the DAC. The goal of the adaptive  $V_{ref}$  control method is to achieve  $\Delta I = I_{max} - I_{min} <$  *current\_mismatch*. If  $\Delta I = \Delta I_0$  without  $V_{ref}$  control and one voltage step changes  $\Delta I$  by  $\Delta(\Delta I)$ , the number of DAC bits ( $N_{DAC}$ ) required for balanced current sharing can be estimated as  $N_{DAC} > \log_2(\Delta I_0 / \Delta(\Delta I))$ . A 7-bit DAC with a voltage step of 1mV is used to achieve a 30mA current\_mismatch value in the simulations. When a large number of VRs and a high-resolution DAC are needed, a charge pump can be used for each phase after the  $V_{ref}$  control logic to implement the DAC and avoid possible routing problems induced by the resistor string.
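The bit-count estimate and the resistor string can both be sketched in a few lines. The values  $\Delta I_0 = 600$ mA and  $\Delta(\Delta I) = 5$ mA, the 1.2 V bandgap, and all resistor values are hypothetical. The top and bottom resistors are deliberately unequal so that the middle tap lands at 0.5 V; with two equal  $R_b$  resistors, the string would be centered at half the bandgap voltage instead.

```python
def dac_bits(delta_i0, delta_delta_i):
    """Smallest integer N_DAC satisfying N_DAC > log2(delta_i0 / delta_delta_i)."""
    ratio = delta_i0 / delta_delta_i
    n = 0
    while 2 ** n <= ratio:   # 2**n > ratio  <=>  n > log2(ratio)
        n += 1
    return n


def string_taps(v_bg, r_top, r_bottom, r_s, segments):
    """Tap voltages of a resistor-string DAC: r_top, segments * r_s, r_bottom.

    Tap k sits k unit resistors above the bottom resistor (k = 0..segments).
    """
    i_string = v_bg / (r_top + r_bottom + segments * r_s)
    return [i_string * (r_bottom + k * r_s) for k in range(segments + 1)]


# Hypothetical design point: Delta I_0 = 600 mA, Delta(Delta I) = 5 mA per step.
bits = dac_bits(600.0, 5.0)                 # -> 7
# Hypothetical string: 1.2 V bandgap, 10 uA string current, 1 mV per tap,
# top/bottom resistors chosen so tap 64 sits at 0.5 V.
taps = string_taps(1.2, 63.6e3, 43.6e3, 100.0, 2 ** bits)
print(bits, round(taps[64], 6), round(taps[1] - taps[0], 6))
```

The 128 taps then span roughly 0.436 V to 0.564 V in 1 mV steps around the nominal 0.5 V reference.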

#### C. Simulation verifications

To demonstrate the effectiveness of the proposed control method, cases with two and three identical distributed on-chip VRs are simulated. The power grid parameters are provided in Section VII. Simulation results with a constant DC load current are shown in Fig. 16 and Fig. 17 for the two and three VR cases, respectively. In the simulations, an ideal  $V_{ref} = 0.5V$  is used to realize a 1V output voltage, and a  $V_{ref}$  step of 1mV is used. The proposed adaptive  $V_{ref}$  control method begins to operate at  $5\mu s$ . As can be seen from Fig. 16 (a), (c) and Fig. 17 (a), (c), for stand-alone VRs operating without proper  $V_{ref}$  control, large inductor current variations occur among the VRs. After the proposed  $V_{ref}$  control mechanism is applied, as seen in Fig. 16 (a), (b) and Fig. 17 (a), (b), the unbalanced currents converge quickly to balanced ones in both the two and three VR cases. Also, as can be seen from Fig. 16 (d) and Fig. 17 (d), only small variations of the reference voltages are needed to match the inductor currents closely, while proper operation of the VRs is maintained. Simulation results with a fast-changing sinusoidal load current and a step load current are shown in Fig. 18. In the simulations, the frequency of the sinusoidal wave is ten times the VR switching frequency. As can be seen from Fig. 18, the proposed  $V_{ref}$  control method works well under changing load currents.

Fig. 16. Simulation results with and without the proposed adaptive  $V_{ref}$  control scheme for two identical distributed on-chip VRs. (a) Inductor currents before and after the proposed  $V_{ref}$  control is applied. (b) A zoomed view of balanced current sharing showing the effectiveness of the proposed  $V_{ref}$  control method. (c) A zoomed view of unbalanced current sharing without the proposed  $V_{ref}$  control. (d)  $V_{refs}$  signal change showing the operation of the proposed  $V_{ref}$  control method.

Fig. 17. Simulation results with and without the proposed adaptive  $V_{ref}$  control scheme for three identical distributed on-chip VRs. (a) Inductor currents before and after the proposed  $V_{ref}$  control is applied. (b) A zoomed view of balanced current sharing showing the effectiveness of the proposed  $V_{ref}$  control method. (c) A zoomed view of unbalanced current sharing without the proposed  $V_{ref}$  control. (d)  $V_{refs}$  signal change showing the operation of the proposed  $V_{ref}$  control method.



Fig. 18. Simulation results with sinusoidal and step load current for three identical distributed on-chip VRs. (a) Sinusoidal load current applied at  $2\mu s$ . (b) Step load current waveform applied. (c) Balanced inductor currents under sinusoidal current load. (d) Balanced inductor currents under step current load. (e) A zoomed view of balanced inductor currents near the rising edge of the step current load. (f) A zoomed view of balanced inductor currents near the falling edge of the step current load.

#### D. Practical concerns

In practical implementations of the  $V_{ref}$  control method, the distribution wires introduce parasitic impedances between each generated reference voltage and the corresponding error amplifier, and these impedances can differ among VRs. There can also be mismatches in VR components and control loops. To evaluate the proposed method under these effects, simulations are performed with wire resistances and capacitances as well as VR component and loop delay mismatches. A 1mm distribution wire is assumed in the simulations; based on the IBM 130nm process, its parasitic resistance and capacitance are around  $70\Omega$  and 230fF, respectively. A 10% mismatch among the VRs is introduced in the distribution wire impedance, L, C,  $R_{DCR}$ ,  $R_{ESR}$ , and the Q1 and Q2 sizes, and a 5ns control loop delay difference is introduced among the phases. The simulation results for three phases are shown in Fig. 19. As can be seen from the simulation results, the proposed method remains effective under these mismatches.
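As a quick sanity check using only the 70Ω and 230fF values above, the reference-wire time constant can be compared with the 5ns loop delay difference. The sketch treats the wire either as a lumped RC or as a distributed line with Elmore delay RC/2; these are standard first-order approximations that the paper does not state. It suggests that the wire's own delay is small compared with the loop-delay mismatch already included in the simulations.

```python
R_WIRE = 70.0               # ohm, 1 mm wire (value quoted in the text)
C_WIRE = 230e-15            # farad (value quoted in the text)
LOOP_DELAY_MISMATCH = 5e-9  # s, control-loop delay difference in the text

tau_lumped = R_WIRE * C_WIRE           # single-pole RC estimate
tau_elmore = 0.5 * R_WIRE * C_WIRE     # Elmore delay of a distributed RC line

print(f"lumped RC: {tau_lumped * 1e12:.2f} ps")
print(f"Elmore (distributed): {tau_elmore * 1e12:.2f} ps")
print(f"loop-delay mismatch / lumped RC: {LOOP_DELAY_MISMATCH / tau_lumped:.0f}x")
```

The lumped estimate is about 16 ps, more than two orders of magnitude below 5 ns. This estimate ignores the static mismatch in wire impedance, which the simulations model separately.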

#### VII. CASE STUDY: IBM POWER8-LIKE MICROPROCESSOR

**Benchmarks:** All the benchmarks used in the simulations are from SPLASH2x [38]. These benchmarks represent typical application domains and features. Eight threads are used in the simulations, and the analysis is limited to the region of interest of each benchmark.



Fig. 19. Simulation results with and without the proposed adaptive  $V_{ref}$  control scheme for three distributed on-chip VRs under distribution wire and VR mismatches. (a) Inductor currents before and after the proposed  $V_{ref}$  control is applied. (b) A zoomed view of balanced current sharing showing the effectiveness of the proposed  $V_{ref}$  control method under distribution wire and VR mismatches. (c) A zoomed view of unbalanced current sharing without the proposed  $V_{ref}$  control. (d)  $V_{refs}$  signal change showing the operation of the proposed  $V_{ref}$  control method.

**Architecture:** An IBM POWER8-like [17] processor is modeled to quantitatively characterize unbalanced current sharing effects. The technology and architecture parameters of the processor are summarized in Table I. The schematic of a core is shown in Fig. 20a; each core contains a private L2, an instruction scheduling unit (ISU), an execution unit (EXU), a load store unit (LSU), and an instruction fetch unit (IFU). The L1 data cache is part of the LSU, while the L1 instruction cache resides inside the IFU. Fig. 20b illustrates the whole chip floor plan, which contains 8 cores, 96 identical on-chip regulators (shown as small squares), the network-on-chip (NOC), and the memory controller (MC).

**Simulation framework:** Dynamic power traces are collected by integrating the MR2 [39] version of McPAT [40] into the SNIPER6.0 [41] micro-architectural simulator. The static power of each unit is then calculated from its temperature and area, using the equation from [42] to capture the temperature dependence of static power. The static power of the whole chip is calibrated so that it accounts for less than 30% of the total chip power at 80°C. Hotspot6.0 [43] is used to find the transient temperature across the chip. Because the transient temperature (an output of Hotspot) is used to calculate the static power (an input to Hotspot), we iteratively run Hotspot and update the static power values until they converge. Default Hotspot parameters are used. VoltSpot [18] is deployed to capture the current distribution among VRs at different locations, and the method from [18] is followed to generate cycle-accurate power traces. For each application, 200 equally spaced samples of 2K cycles each are obtained; the first 1K cycles of each sample are used for warm-up and the rest for analysis. A power trace sampling interval of 4 clock cycles is used.
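The Hotspot and static-power iteration is a fixed-point problem: temperature depends on total power, and static power depends on temperature. The sketch below reproduces only that structure, with toy models in place of the tools. It assumes a hypothetical single lumped thermal node ( $T = T_{amb} + R_{th}P$ ) and a hypothetical exponential leakage law; neither is the HotLeakage equation of [42] nor the Hotspot model, and all numbers are invented.

```python
import math

# Hypothetical toy models (NOT the HotLeakage equation of [42] or Hotspot).
T_AMB = 45.0        # degC, ambient / heat-sink temperature
R_TH = 0.25         # K/W, lumped chip-to-ambient thermal resistance
P_DYN = 100.0       # W, dynamic power from the architectural simulator
P_STATIC_80 = 30.0  # W, static power at 80 degC
BETA = 0.02         # 1/K, exponential leakage-temperature coefficient


def static_power(temp_c):
    """Leakage grows exponentially with temperature (toy model)."""
    return P_STATIC_80 * math.exp(BETA * (temp_c - 80.0))


def temperature(p_total):
    """Steady-state temperature of a single lumped thermal node."""
    return T_AMB + R_TH * p_total


def converge(temp_c=80.0, tol=1e-6, max_iter=100):
    """Alternate thermal and leakage updates until the temperature settles."""
    for it in range(1, max_iter + 1):
        new_temp = temperature(P_DYN + static_power(temp_c))
        if abs(new_temp - temp_c) < tol:
            return new_temp, static_power(new_temp), it
        temp_c = new_temp
    raise RuntimeError("leakage-temperature iteration did not converge")


t_final, p_static, iters = converge()
print(f"T = {t_final:.2f} degC, P_static = {p_static:.2f} W after {iters} iterations")
```

The iteration converges quickly here because the loop gain  $R_{th} \cdot dP_{static}/dT$  is well below one. A much larger loop gain would indicate thermal runaway rather than a converged operating point.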

**Power grid and voltage regulator properties:** In the VoltSpot configuration, the on-chip power grid is designed as a resistive mesh using metal width, pitch, and thickness parameters similar to those in [21] for the global, intermediate, and local PDN layers. The unit power grid resistance is around  $8m\Omega$  and the total power grid size is 345 by 345. The effective resistance between any two nodes can be estimated using the equations in [19], [20].

TABLE I

Technology and architecture parameters.

| Parameter | Value |
|---|---|
| *Technology parameters* | |
| Technology node | 22nm |
| Frequency | 4.0GHz |
| TDP | 150W |
| Area | $441mm^2$ |
| Vdd | 1.03V |
| *Architecture parameters* | |
| # cores | 8 |
| Issue width | 8 |
| Architected registers | 64 FRF, 32 IRF |
| L1-I cache | 32KB, 8-way, 64B, LRU, 1-cycle hit |
| L1-D cache | 64KB, 8-way, 64B, LRU, 1-cycle hit |
| L2 cache | 512KB, 8-way, 128B, LRU, 11-cycle hit |
| L3 cache | 64MB, 8-way, 128B, LRU, 30-cycle hit |

Fig. 20. Chiplet simplified floorplan.
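The text relies on the closed-form estimates of [19], [20]. As a swapped-in numerical alternative (not those equations), the effective resistance between two nodes can be computed directly: inject 1 A at node a, ground node b, solve the nodal equations, and read the voltage at a. The sketch uses conjugate gradient in plain Python on a hypothetical 31 by 31 single-layer mesh with the 8mΩ unit resistance quoted above. This is far smaller and simpler than the 345 by 345 multi-layer grid, but it shows that the effective resistance grows roughly logarithmically, not linearly, with distance.

```python
def mesh_effective_resistance(n, r_unit, a, b, tol=1e-16, max_iter=20000):
    """R_eff between nodes a and b of an n x n single-layer resistive mesh.

    Injects 1 A at node a with node b grounded and solves the nodal
    equations by conjugate gradient; R_eff is then the voltage at node a.
    """
    g = 1.0 / r_unit
    nodes = [(row, col) for row in range(n) for col in range(n)]
    index = {rc: k for k, rc in enumerate(nodes)}
    nbrs = [[index[(row + dr, col + dc)]
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if (row + dr, col + dc) in index]
            for row, col in nodes]
    src, gnd = index[a], index[b]

    def apply_conductance(x):
        # Nodal conductance matrix times x; node b is held at 0 V.
        y = [g * sum(x[k] - x[m] for m in nbrs[k]) for k in range(len(x))]
        y[gnd] = 0.0
        return y

    x = [0.0] * len(nodes)
    res = [0.0] * len(nodes)
    res[src] = 1.0                     # 1 A injected at node a
    p = res[:]
    rs = 1.0
    for _ in range(max_iter):
        ap = apply_conductance(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        res = [ri - alpha * api for ri, api in zip(res, ap)]
        rs_new = sum(ri * ri for ri in res)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(res, p)]
        rs = rs_new
    return x[src]                      # V_a / 1 A = R_eff(a, b)


R_UNIT = 8e-3   # ohm, unit grid resistance quoted in the text
CENTER = (15, 15)
results = {}
for d in (1, 5, 10):
    results[d] = mesh_effective_resistance(31, R_UNIT, CENTER, (15, 15 + d))
    print(f"distance {d:2d}: R_eff = {results[d] / R_UNIT:.2f} x R_unit")
```

Adjacent nodes come out near  $0.5R_{unit}$ , while nodes ten segments apart are still only slightly above  $R_{unit}$ , consistent with the observation that the effective resistance stays within a few unit resistances even over large distances.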

LDOs used in the IBM POWER8 microprocessor and FIVRs used in the Intel Haswell microprocessor are two state-of-the-art on-chip power delivery solutions. It is demonstrated in [13] that an FIVR-based power delivery scheme is more advantageous with a large number of cores due to its high efficiency over a wide range of conversion ratios. The growing momentum and benefits of distributed on-chip voltage regulation, together with the advantages of FIVR, motivate us to investigate distributed buck regulators in the simulation setups.

96 identical on-chip VRs, each with an area of  $0.04mm^2$ , are distributed across the chip in the simulations as shown in Fig. 20b. The optimal placement of LDOs to meet the IR-drop constraint was first investigated in [44]. To avoid biasing our analysis, we mimic the algorithm proposed in [45], where a voltage-noise-minimizing technique is used to determine the locations of the C4 pads across several benchmarks. We use this algorithm to determine the on-chip VR locations that minimize voltage noise. Since the resulting maximum voltage noise decreases by less than 0.4% with the optimal placement as compared to a uniform distribution, we adopt a uniform placement of the VRs to simplify the analysis. These on-chip regulators are calibrated to match the conversion efficiency of the FIVR design in Intel's Haswell processor [28], as it is one of the most efficient regulators in industry. The efficiency curves in [28] are used for calibration, and each VR provides around 1A of load current at an optimum efficiency of about 90%. The calibrated efficiency curve is shown in Fig. 21. The on-chip VR is modeled as an ideal supply voltage in series with an RLC network in the VoltSpot [18] simulations. Simpler RL and RC based models have previously been used to model VRs in [1], [46] and in [47], respectively. The proposed adaptive  $V_{ref}$  control method can be applied to balance the current sharing.

Fig. 21. Calibrated efficiency curve for the on-chip voltage regulator.

Fig. 22. Power saving and regulator power loss saving with balanced current sharing for different applications.

Simulation results showing the power saving and regulator power loss saving with balanced current sharing for different applications are shown in Fig. 22. Power savings of up to 1W and VR power loss savings of up to 8% are observed. Note that balancing the current may lead to extra power losses on the power-grid resistors. A net power saving is nevertheless obtained because the power saving induced by balanced current sharing can be much larger than the extra power loss on the power grid resistors. In the general case of N distributed VRs supplying a total load current  $NI_{o\_opt}$ , as  $CSR_i$  (i = 1, ..., N) of the  $i^{th}$  VR moves further from the balanced current sharing point, balancing may introduce more loss on the power-grid parasitic resistors; however, the power saving induced by balanced current sharing also increases, as can be seen from Fig. 6 and (9). With a large number of VRs deployed, distributed load currents are supplied by adjacent VRs, which effectively reduces the distance that VR output currents travel to balance one another. Furthermore, the effective resistance between two nodes on the power grid does not increase linearly with distance [19], [20]; even over quite large distances, the effective resistance can be only a few times the unit power-grid resistance. All these factors contribute to the power savings seen in Fig. 22. More importantly, with balanced current sharing, VR malfunctions can be avoided and stability and reliability are enhanced.
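Why balancing reduces regulator loss can be illustrated with any loss model that is convex in output current: for a fixed total current, an equal split minimizes the sum. The sketch assumes a hypothetical per-VR loss made of a fixed part plus an  $I^2R$  conduction term. The values are chosen so that efficiency peaks at about 1 A and about 90%, in line with the calibration above, but the model is not the calibrated curve of Fig. 21 or (9). Grid resistor losses are ignored here.

```python
def vr_loss(i_out, p_fixed=0.05, r_cond=0.05):
    """Hypothetical per-VR loss (W): fixed part + conduction loss r_cond * I^2."""
    return p_fixed + r_cond * i_out ** 2


def efficiency(i_out, v_out=1.0):
    """Conversion efficiency at a given output current (1 V output assumed)."""
    p_out = v_out * i_out
    return p_out / (p_out + vr_loss(i_out))


def total_vr_loss(currents):
    return sum(vr_loss(i) for i in currents)


balanced = total_vr_loss([1.0, 1.0])     # CSR = 0.5 / 0.5
unbalanced = total_vr_loss([1.4, 0.6])   # CSR = 0.7 / 0.3
print(f"peak efficiency at 1 A: {efficiency(1.0):.3f}")
print(f"VR loss balanced: {balanced * 1e3:.1f} mW, unbalanced: {unbalanced * 1e3:.1f} mW")
```

In this toy model a 70/30 split raises the combined loss of two regulators from 200 mW to 216 mW. Whether balancing yields a net saving on a real chip also depends on the extra grid loss, which is why the full VoltSpot simulation is needed.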

#### VIII. CONCLUSIONS

Efficiency, stability, and reliability implications of unbalanced current sharing among distributed on-chip voltage regulators are investigated in this paper both theoretically and through extensive simulations. A current balancing scheme that can be applied to most regulator types is proposed. A simple relationship between the output current of an individual voltage regulator and its corresponding  $V_{ref}$  is identified for balanced current sharing, and an adaptive  $V_{ref}$  control method based on this relationship is proposed. The proposed method generates and modulates the  $V_{ref}$  of each regulator to balance the output currents. The implementation of the method is analyzed, and simulations are presented to verify its effectiveness. Regulator power loss savings of up to 8%, enhanced system stability, and several years of MTTF improvement are demonstrated through practical case studies.

#### REFERENCES

- [1] I. Vaisband et al., On-Chip Power Delivery and Management, Fourth Edition. Springer, 2016.
- [2] J. L. Hennessy and D. A. Patterson, Computer Architecture, Fifth Edition: A Quantitative Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2011.
- [3] C. Huang and P. K. T. Mok, "An 84.7% Efficiency 100-MHz Package Bondwire-Based Fully Integrated Buck Converter With Precise DCM Operation and Enhanced Light-Load Efficiency," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 11, pp. 2595-2607, Nov. 2013.
- [4] S. S. Chong and P. K. Chan, "A 0.9-uA Quiescent Current Output-Capacitorless LDO Regulator with Adaptive Power Transistors in 65-nm CMOS," *IEEE TCAS-I*, vol. 60, no. 4, pp. 1072-1081, April 2013.
- [5] O. A. Uzun and S. Köse, "Converter-Gating: A Power Efficient and Secure On-Chip Power Delivery System," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol. 2, no. 2, pp. 169-179, 2014.
- [6] J. F. Bulzacchelli *et al.*, "Dual-Loop System of Distributed Microregulators with High DC Accuracy, Load Response Time Below 500 ps, and 85-mV Dropout Voltage," *IEEE JSSC*, vol. 47, no. 4, pp. 863-874, 2013.
- [7] I. Vaisband and E. G. Friedman, "Stability of Distributed Power Delivery Systems with Multiple Parallel On-Chip LDO Regulators," *IEEE Transactions on Power Electronics*, vol. 31, no. 8, pp. 5625-5634, 2016.
- [8] S. Lai, B. Yan, and P. Li, "Localized Stability Checking and Design of IC Power Delivery with Distributed Voltage Regulators," *IEEE TCAD*, vol. 32, no. 9, pp. 1321-1334, 2013.
- [9] Z. Zeng, X. Ye, Z. Feng, and P. Li, "Tradeoff Analysis and Optimization of Power Delivery Networks with On-Chip Voltage Regulation," *IEEE/ACM Design Automation Conference*, June 2010, pp. 831-836.
- [10] S. Köse and E. G. Friedman, "Distributed On-Chip Power Delivery," *IEEE JETCAS*, vol. 2, no. 4, pp. 704-713, December 2012.
- [11] S. Köse et al., "Active Filter-Based Hybrid On-Chip DC-DC Converter for Point-of-Load Voltage Regulation," *IEEE TVLSI*, vol. 21, no. 4, pp. 680-691, April 2012.
- [12] S. Köse and E. G. Friedman, "On-Chip Point-of-Load Voltage Regulator for Distributed Power Supplies," in *Proc. ACM Great Lakes Symp. VLSI*, May 2010, pp. 377-380.
- [13] A. Paul *et al.*, "System-Level Power Analysis of a Multicore Multipower Domain Processor with On-chip Voltage Regulators," *IEEE TVLSI*, vol. 24, no. 12, pp. 3468-3476, 2016.
- [14] W. Lee, Y. Wang, and M. Pedram, "Optimizing a Reconfigurable Power Distribution Network in a Multicore Platform," *IEEE TCAD*, vol. 34, no. 7, pp. 1110-1123, July 2015.
- [15] J. A. Abu-Qahouq, "Analysis and Design of N-Phase Current-Sharing Autotuning Controller," *IEEE TPEL*, vol. 25, no. 6, pp. 1641-1651, 2010.
- [16] G. Eirea and S. R. Sanders, "Phase Current Unbalance Estimation in Multi-Phase Buck Converters," *IEEE PESC*, June 2006, pp. 1-6.
- [17] E. Fluhr et al., "POWER8: A 12-Core Server-Class Processor in 22nm SOI with 7.6Tb/s Off-Chip Bandwidth," ISSCC, Feb. 2014, pp. 96-97.
- [18] R. Zhang et al., "Architecture Implications of Pads as a Scarce Resource," ISCA, June 2014, pp. 373-384.

- [19] S. Köse and E. G. Friedman, "Efficient Algorithm for Fast IR Drop Analysis Exploiting Locality," *Integration, the VLSI Journal*, vol. 45, no. 2, pp. 149-161, 2012.
- [20] S. Köse and E. G. Friedman, "Effective Resistance of a Two Layer Mesh," *IEEE TCAS-II*, vol. 58, no. 11, pp. 739-743, 2011.
- [21] K. Mistry et al., "A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging," *IEEE International Electron Devices Meeting*, Dec. 2007, pp. 247-250.
- [22] F. Ma, W. Chen, and J. Wu, "A Monolithic Current-Mode Buck Converter with Advanced Control and Protection Circuits," *IEEE Transactions on Power Electronics*, vol. 22, no. 5, pp. 1836-1846, Sept. 2007.
- [23] TI Application Report SLVA079, October 1999, http://www.ti.com/lit/an/slva079/slva079.pdf
- [24] X. Zhou, T. G. Wang, and F. C. Lee, "Optimizing Design for Low Voltage DC-DC Converters," *IEEE APEC*, Feb 1997, pp. 612-616.
- [25] M. Gildersleeve, H. P. Forghani-zadeh, and G. A. Rincon-Mora, "A Comprehensive Power Analysis and a Highly Efficient, Mode-Hopping DC-DC Converter," *IEEE AP-ASIC*, 2002, pp. 153-156.
- [26] Power MOSFET Electrical Characteristics. [Online]. Available: https://toshiba.semicon-storage.com/info/docget.jsp?did=13415
- [27] P. Zumel, C. Fernández, A. de Castro, and O. García, "Efficiency Improvement in Multiphase Converter by Changing Dynamically the Number of Phases," *IEEE PESC*, 2006, pp. 1-6.
- [28] E. A. Burton *et al.*, "FIVR Fully Integrated Voltage Regulators on 4th Generation Intel Core SoCs," *IEEE APEC*, 2014, pp. 432-439.
- [29] R. Middlebrook and S. Cuk, "A General Unified Approach to Modelling Switching-converter Power Stages," *IEEE PESC*, 1976, pp. 18-34.
- [30] M. Hankaniemi, "Dynamical Profile of Switched-Mode Converter-Fact or Fiction?" Ph.D. dissertation, Tampere University of Technology, Tampere, Finland, 2007.
- [31] C. P. Basso, Switch-Mode Power Supplies SPICE Simulations and Practical Designs. New York, NY, USA: McGraw-Hill, Inc., 2008.
- [32] B. Johansson, "DC-DC Converters Dynamic Model Design and Experimental Verification," Ph.D. dissertation, Lund University, Lund, Sweden, 2004.
- [33] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, "Physics-based Electromigration Assessment for Power Grid Networks," *Design Automation Conference*, 2014, pp. 1-6.
- [34] J. R. Black, "Electromigration: A brief survey and some recent results," *IEEE Transactions on Electron Devices*, vol. 16, no. 4, pp. 338-347, 1969.
- [35] R. Zhang et al., "Tolerating the Consequences of Multiple EM-Induced C4 Bump Failures," *IEEE TVLSI*, vol. 24, no. 6, pp. 2335-2344, 2016.
- [36] J. G. Delgado-Frias and W. R. Moore, VLSI for Neural Networks and Artificial Intelligence. Springer US, 1994.
- [37] V. R. H. Lorentz et al., "Lossless Average Inductor Current Sensor for CMOS Integrated DC-DC Converters Operating at High Frequencies," *Analog Integr. Circ. and Signal Proc.*, vol. 62, no. 3, pp. 333-344, 2009.
- [38] C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," Tech. Rep. TR-811-08, Princeton University, Jan. 2008.
- [39] S. L. Xi et al., "Quantifying Sources of Error in McPAT and Potential Impacts on Architectural Studies," *IEEE HPCA*, 2015, pp. 577-589.
- [40] S. Li *et al.*, "McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures," *International Symposium on Microarchitecture*, 2009, pp. 469-480.
- [41] T. E. Carlson, W. Heirman, and L. Eeckhout, "Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-core Simulation," *International Conference for High Performance Computing, Networking, Storage and Analysis*, 2011, pp. 1-12.
- [42] Y. Zhang *et al.*, "HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects," Univ. of Virginia Tech. Report CS-2003-05, 2003, pp. 1-15.
- [43] K. Skadron *et al.*, "Temperature-Aware Microarchitecture: Modeling and Implementation," *ACM TACO*, vol. 1, no. 1, pp. 94-125, March 2004.
- [44] T. Yu and M. D. F. Wong, "Efficient Simulation-based Optimization of Power Grid with On-Chip Voltage Regulator," *ASP-DAC*, 2014, pp. 531-536.
- [45] K. Wang, B. H. Meyer, R. Zhang, M. Stan, and K. Skadron, "Walking Pads: Managing C4 Placement for Transient Voltage Noise Minimization," *ACM/EDAC/IEEE Design Automation Conference (DAC)*, 2014, pp. 1-6.
- [46] Altera Application Note AN 574. [Online]. Available: https://www.altera.com/content/dam/alterawww/global/en\_US/pdfs/literature/an/an574.pdf
- [47] R. Zhang *et al.*, "Transient Voltage Noise in Charge-Recycled Power Delivery Networks for Many-Layer 3D-IC," *ISLPED*, 2015, pp. 1-6.

**Longfei Wang** received the B.S. degree in electronic information engineering from Southwest Jiaotong University, Chengdu, China, in 2010, and the M.S. degree in electrical engineering from Texas Tech University, Lubbock, TX, USA, in 2013. He is currently working toward the Ph.D. degree in electrical engineering at the University of South Florida, Tampa, FL, USA.

His research interests include on-chip voltage regulation and power management.

**S. Karen Khatamifard** received the B.S. degree in electrical engineering from the Sharif University of Technology (SUT), Tehran, Iran, in 2013. He is currently a Ph.D. candidate at the University of Minnesota, Twin Cities, USA. He joined the ALTAI group at the University of Minnesota in 2013. His main research interest is computer architecture.

**Orhun Aras Uzun** received the B.S. degree in electronics engineering from Istanbul Technical University, Istanbul, Turkey, in 2012. He is currently a graduate student at the University of South Florida, Tampa, FL, USA.

His research interests include on-chip voltage converters and analog/mixed signal circuit design.

**Ulya R. Karpuzcu** is an Assistant Professor of Electrical and Computer Engineering at the University of Minnesota, Twin Cities. She holds M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. Her research interests span the impact of technology on computing systems, energy-efficient computer architecture, hardware reliability, and approximate computing. She is a Fulbright fellow and a recipient of the NSF CAREER Award.

**Selçuk Köse** (S'10-M'12) received the B.S. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from the University of Rochester, Rochester, New York, in 2008 and 2012, respectively.

He is currently an Assistant Professor with the Department of Electrical Engineering, University of South Florida, Tampa, Florida. He was previously with TUBITAK, Intel Corporation, and Freescale Semiconductor. His current research interests include integrated voltage regulation, 3-D integration, hardware security, and green computing. Prof. Köse is an Associate Editor of the Journal of Circuits, Systems, and Computers and of the Microelectronics Journal. He has served on the Technical Program and Organization Committees of various conferences. He is a recipient of the NSF CAREER Award, the Cisco Research Award, the USF College of Engineering Outstanding Junior Researcher Award, and the USF Outstanding Faculty Award.