Spin-Based Computing: Device Concepts, Current Status, and a Case Study on a High-Performance Microprocessor

This paper provides a review of various spintronic devices being considered for post-CMOS computing, followed by a case study of all spin logic (ASL) technology on a realistic microprocessor.

By Jongyeon Kim, Ayan Paul, Paul A. Crowell, Steven J. Koester, Senior Member IEEE, Sachin S. Sapatnekar, Fellow IEEE, Jian-Ping Wang, and Chris H. Kim

ABSTRACT As the end draws near for Moore’s law, the search for low-power alternatives to complementary metal–oxide–semiconductor (CMOS) technology is intensifying. Among the various post-CMOS candidates, spintronic devices have gained special attention for their potential to overcome the power and performance limitations of CMOS. In particular, all spin logic (ASL) technology, which performs Boolean operations and transfers the output in the spin domain, has been proposed for enabling new capabilities—such as high density, low device count, and nonvolatility—that were previously impossible with CMOS technology. In this paper, first we provide an overview of the history and the current status of the various spintronic devices being pursued by the research community. Then, we describe how spin-based components are integrated into a computing system and the advantages that result. We use a hypothetical spintronic-based Intel Core i7 as a test vehicle to compare the system-level power requirements of ASL- and CMOS-based systems, taking into consideration the unique demands of spin-based interconnects. We conclude with a brief analysis of current limitations and future directions of spintronic research.

KEYWORDS All spin logic (ASL); interconnect; logic; post-complementary metal–oxide–semiconductor (CMOS); power consumption; spintronics

I. INTRODUCTION

Complementary metal–oxide–semiconductor (CMOS) scaling, otherwise known as Moore’s law, has transformed the way we create, process, communicate, and store information in the digital age [1]–[4]. As we approach the physical limits of CMOS technology, however, it has become increasingly difficult to manage power dissipation [5]–[7]. The urgent need for low-power alternatives has led to a flurry of research activity on novel post-CMOS device technologies [8], [9]. Among the various post-CMOS candidates, spintronic devices have gained special attention for their potential to overcome the power and performance limitations of CMOS [10]–[12]. From a computing perspective, spintronic devices potentially offer unique features, such as zero static power, instant wakeup, reduced device count, and lower switching energy, that are difficult to achieve using CMOS technology. Another intriguing feature of spintronic devices is that they could augment existing Boolean computing methods by enabling entirely new classes of architectures such as processor-in-memory, logic-in-memory, and analog/neuromorphic computing [13]–[15].

Traditional spintronics research has been mainly limited to the materials and single device level, so the actual benefits of spintronics at the system level have been only superficially explored [16], [17]. The main aim of this paper is to clearly describe spintronic technology by exploring the power and performance tradeoffs at the system level using a spintronic-based Intel Core i7 processor as the test vehicle. We chose an all spin logic (ASL) device as the technology platform for this case study, although a similar methodology could be applied to other spintronic devices [18].

To provide a historical perspective, this paper first gives an overview of the various milestones in spintronics research. We then introduce the working principles and development status of various spintronic devices targeted for logic and memory applications. Next, we describe our benchmarking methodology, which includes a simple method for estimating device count and switching energy. We also address the signal attenuation issue in spin-based interconnects and present guidelines for assessing and optimizing total interconnect power. Finally, we compare the power consumption of an ASL-based processor with that of its CMOS counterpart for various device parameters and operating scenarios (e.g., all cores active, one core active). We believe that the fundamental principles and perspectives described in this study will help guide future spintronic device research and pave the way for more rapid deployment of spintronic technology.

II. SPINTRONIC DEVICE OVERVIEW

A. Historical Advances

Fig. 1 shows the key milestones in spintronics research. The tunnel magnetoresistance (TMR) effect was first predicted in 1975, opening up the possibility that electrons tunneling through a thin insulator could be controlled by manipulating the relative magnetization of two adjacent ferromagnetic layers, which, in turn, produces two distinct states of electrical resistance [19]. In 1988, a related spin-valve effect called giant magnetoresistance (GMR) was discovered in a multilayer structure composed of ferromagnets separated by a metallic spacer layer [20]. The main difference between TMR and GMR is that TMR relies on an insulating tunnel barrier between the ferromagnets, while GMR uses a metallic spacer. In general, TMR yields a larger resistance change between the parallel and antiparallel states (i.e., a higher magnetoresistance ratio), while GMR offers a lower stack resistance.

Demonstration of both GMR and TMR at room temperature led to rapid deployment of these concepts to commercial data storage products such as hard disk drives (HDD) and random access memory (RAM) devices [21]–[25]. In 1996, Slonczewski at IBM predicted that the magnetization of a free layer can be toggled using spin-polarized current rather than an external magnetic field. This effect, commonly referred to as spin transfer torque (STT), has since been experimentally verified and proven to lower energy consumption and simplify the memory cell design in comparison to field-based switching [26]. Fig. 2 illustrates STT-based switching in a magnetic tunnel junction (MTJ), a device composed of two ferromagnetic layers, a free layer and a fixed layer, separated by an ultrathin tunneling barrier [27]. When electrons enter
from the bottom fixed-layer terminal (as shown on the right-hand side of Fig. 2), those with spin aligned to the fixed-layer magnetization preferentially tunnel through the barrier and exert spin torque on the free layer. Once switching is complete, the magnetizations of the two layers are parallel to each other, resulting in a low-resistance state. When electrons enter from the top free-layer terminal, those with spin opposite to the fixed-layer magnetization are reflected back into the free layer, switching the relative magnetization to the antiparallel state. The difference in tunneling current between the parallel state (low resistance) and the antiparallel state (high resistance) is used to encode binary data. Typically, the fixed layer is pinned by a single antiferromagnetic layer or by a trilayer forming a synthetic antiferromagnet (SAF), so that it does not rotate or switch during operation [28].

By 2000, experimental research on STT-based magnetization switching had led to the actual demonstration of STT at room temperature, validating the predictions made by theorists [29], [30]. With the advent of new tunnel barriers such as MgO, STT–MTJ devices have now become mature enough to be considered for commercial magnetoresistive random access memory (MRAM) products [31], [32]. Recent trends in STT–MTJ research focus on reducing the switching energy using novel perpendicular anisotropy materials, voltage-assisted switching, and the spin Hall effect (SHE) [33]–[36]. Further details on each of these phenomena will be presented in Section II-C.

Exploiting magnetism for logic computation is a topic of growing interest. The key difference between spintronic devices for memory and for logic is that the latter requires not only data storage but also data transfer over longer distances by means of spin. In 1985, researchers proposed that pure spin current can be generated by nonlocal electrical spin injection in a metallic lateral spin-valve (LSV) structure [37]. In the 2000s, LSV switching by spin accumulation and transport was demonstrated at room temperature [38], [39]. Recently, materials with long spin diffusion lengths, such as semiconductors and single-layer graphene, have been studied to extend spin interconnect lengths [40]–[42].

B. Spintronics for Logic

The main attraction of spintronic devices for logic applications is their nonvolatility, which could give computing systems zero static power and instant on–off features. The use of magnetic components to enhance the capability of conventional CMOS is also an active and fertile area of research. In this section, we introduce key spin-based logic devices that are being actively pursued by the materials and device communities.

As Fig. 3(a) shows, an ASL device consists of input and output magnets connected by a channel medium (typically copper or graphene). It utilizes spin injection, spin diffusion, and STT switching in an LSV structure to perform a logic operation [18]. Fig. 3(b) shows the LSV device structure and the measured spin signal $\Delta V/I$ for a metallic channel used to demonstrate spin-current-induced magnetization switching [39]. Here, spin-polarized electrons injected into and diffused through the channel give rise to a difference in the electrochemical potential at the output detector between its antiparallel and parallel states. The spin torque transferred by these spin-polarized electrons can then toggle the output magnetization. An ASL device stores information in the magnetization direction of its magnets and communicates using pure spin current, hence the name. Section III discusses this operating principle in greater detail. Since the STT switching current scales with the magnet dimensions, ASL is generally considered a good post-CMOS candidate from a scaling perspective [43].

Domain wall logic (DWL) stores information in the position of a single domain wall (DW) [44]. As shown in Fig. 4(a), a DW is the interface between different magnetic domains and can be shifted along a magnetic wire using spin-polarized current injected from either end of the wire. This DW motion can be used to implement logic, as shown in Fig. 4(b). The magnetic wire acts as a free layer, forming an MTJ with a ferromagnet placed in the middle of the DW wire. When a voltage is applied between the input and CLK terminals, the corresponding spin-polarized current moves the DW along the free layer; applying the voltage in the reverse direction moves the DW in the opposite direction. The position of the DW represents the binary state, which can be read out by applying a voltage between the input and output terminals or between the output and CLK terminals, depending on the timing sequence of the signals.

Nanomagnet logic (NML) uses magnetization direction as the state variable and processes information through magnetic dipole interaction between neighboring nanomagnets [45], [46]. First, an NML-based circuit requires an initializing magnetic field to align the magnetization of a nanomagnet chain along the hard axis (a metastable state). When the field is removed, each nanomagnet relaxes into a stable state along its easy axis, with a direction set by the input magnetization. The output magnetization is determined by the majority function realized through the superposition of incoming dipole fields.
Fig. 5(a) and (b) show a quasi-stable state initialized with a magnetic field and the final stable state after the field is removed, respectively. Despite the benefits of nanoscale dimensions, scaling will be a challenge for NML because the required initializing magnetic field increases as the magnets are scaled down [47], [48].

A spin field-effect transistor (spin-FET) is a novel device that combines an ordinary metal–oxide–semiconductor FET (MOSFET) structure with an MTJ [49], [50]. As shown in Fig. 6, a ferromagnet contact is placed on the source side while an MTJ is placed on the drain of the MOSFET. The MTJ on the drain side stores information via spin-polarized current. Then, the stored information is detected by the output current of the transistor depending on the relative magnetization orientation between the source and the drain [51], [52]. The reconfigurable nature of spin-FET coupled with the high integration density of CMOS makes this technology attractive for field-programmable gate array (FPGA) applications.

C. Spintronics for Memory

Spin transfer torque MRAM (STT–MRAM) has been drawing a great deal of attention because it has the potential to combine the speed of SRAM, the density of DRAM, and the nonvolatility of Flash, all while providing good scalability, excellent endurance, and CMOS compatibility [53]. STT–MRAM can improve the access latency of last-level caches (e.g., > 64 MB) by reducing the global interconnect delay, a critical performance bottleneck in SRAM-based L3 or L4 caches [54], [55]. STT–MTJs have been successfully integrated into advanced CMOS processes and are generally accepted as the most viable storage element for post-CMOS memories [56]–[60]. As shown in Fig. 7, an STT–MRAM bit cell consists of an MTJ and an access transistor. The MTJ stores information in its relative magnetization, and STT switching causes the magnetization reversal. A write operation is accomplished by reversing the voltage polarities of the bit line (BL) and source line (SL), while a read operation is accomplished by sensing the resistance difference between the reference and accessed cells using a small read current bias.
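To make the read operation concrete, the sketch below computes the TMR ratio $(R_{AP} - R_P)/R_P$ and decides the stored bit by comparing the accessed cell against a reference placed midway between the two resistance states. The resistance values, the midpoint reference, and the mapping of the antiparallel state to "1" are all illustrative assumptions, not parameters from this paper; the helper names `tmr_ratio` and `read_bit` are hypothetical.

```python
# Minimal sketch of an STT-MRAM read decision. Resistances and the midpoint
# reference are illustrative assumptions, not device data from this paper.

def tmr_ratio(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def read_bit(r_cell: float, r_p: float, r_ap: float) -> int:
    """Compare the accessed cell with a reference midway between R_P and R_AP.

    Returns 1 for the antiparallel (high-resistance) state and 0 for the
    parallel (low-resistance) state; this encoding is an assumed convention.
    A real sense amplifier compares currents or voltages, not resistances.
    """
    r_ref = 0.5 * (r_p + r_ap)  # reference cell resistance
    return 1 if r_cell > r_ref else 0


R_P, R_AP = 2_000.0, 5_000.0  # ohms, hypothetical device values
print(f"TMR ratio: {tmr_ratio(R_P, R_AP):.0%}")  # TMR ratio: 150%
print(read_bit(R_P, R_P, R_AP), read_bit(R_AP, R_P, R_AP))  # 0 1
```

A larger TMR ratio widens the gap between each state and the reference, which is why a higher magnetoresistance ratio eases sensing with a small read current.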

One of the key directions of STT–MRAM research has been reducing the switching current for a given level of nonvolatility. To address this challenge, perpendicular anisotropy MTJs based on high crystal anisotropy materials have been experimentally demonstrated [61]. Another approach is to take advantage of new switching mechanisms such as voltage-controlled magnetic anisotropy (VCMA) and SHE. VCMA-based switching is being considered as a successor to conventional STT because the interfacial anisotropy in a CoFeB/MgO junction can be lowered when a voltage is applied to the MTJ [62], [63].
Fig. 8 depicts the switching sequence for VCMA. A free layer with uniaxial anisotropy has two energetically equivalent states (i.e., parallel and antiparallel states) separated by an energy barrier $E_b$. In traditional STT switching, the barrier height between the two states remains unchanged, so a large spin-polarized current must be injected for the magnetization to overcome the $E_b$ barrier and settle in the other state. VCMA-based switching, on the other hand, can raise or lower the barrier height depending on the mode of operation. For example, in retention mode, no voltage is applied to the MTJ, ensuring a high $E_b$ and hence good nonvolatility. During switching, however, the voltage applied to the MTJ lowers $E_b$ and thus reduces the switching energy. When the voltage is removed after switching, $E_b$ is restored to its original height. This switching method can therefore enable energy-efficient MRAM without compromising nonvolatility. Note that the applied voltage alone cannot switch the magnetization, so an additional bias in the form of an external magnetic field or spin-polarized current is needed to complete the switching.
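Why a high $E_b$ means good nonvolatility can be illustrated with the standard Néel–Arrhenius estimate of the mean thermal reversal time, $\tau = \tau_0 \exp(E_b / k_B T)$. The sketch below assumes an attempt time $\tau_0 = 1$ ns and two illustrative stability factors $\Delta = E_b/k_BT$ (60 while idle, 20 while the write voltage is applied); these numbers are assumptions for illustration and are not taken from this paper.

```python
import math

TAU0_S = 1e-9                          # assumed attempt time, ~1 ns
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_s(delta: float) -> float:
    """Mean time to thermally activated reversal: tau0 * exp(E_b / k_B T).

    delta is the thermal stability factor E_b / (k_B T).
    """
    return TAU0_S * math.exp(delta)


# Illustrative stability factors (assumptions): a high barrier while idle,
# a lowered barrier while the VCMA write voltage is applied.
for label, delta in [("idle, no voltage", 60.0), ("write voltage applied", 20.0)]:
    tau = retention_time_s(delta)
    print(f"{label:22s} delta = {delta:4.1f}  tau = {tau:.3g} s"
          f" = {tau / SECONDS_PER_YEAR:.3g} years")
```

With these assumed values, the idle state holds data for billions of years, while the lowered barrier during the write pulse corresponds to a reversal time well under a second; the exponential sensitivity to $E_b$ is what lets a modest voltage-induced barrier reduction ease switching.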

Low-energy STT–MTJ switching can also be based on the giant spin Hall effect (GSHE), which generates large spin currents transverse to the charge current direction in specific spin Hall metals (such as Pt, $\beta$-Ta, $\beta$-W, and others) [64]. Fig. 9 illustrates the generation of pure spin current by GSHE, along with the structure of a spin Hall torque (SHT) MRAM cell. SHT–MRAM requires three terminals to separate the read path from the bidirectional write path. Although this three-terminal device potentially incurs an area penalty, it offers several advantages over the traditional 1T–MTJ STT–MRAM, including 1) a spin injection efficiency ($I_{\text{spin}}/I_{\text{charge}}$) higher than 100% with optimal spin Hall metal dimensions, which enables a significantly lower switching current without impacting nonvolatility; and 2) separate read and write paths, allowing for longer device lifetime and disturb-free read operations. This is because only the small read current flows through the tunnel oxide, while the write current flows through the spin Hall metal itself [65], [66].
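The claim that $I_{\text{spin}}/I_{\text{charge}}$ can exceed 100% follows from geometry. A commonly used first-order estimate (assumed here, not derived in this paper) is $I_{\text{spin}}/I_{\text{charge}} = \theta_{SH}\,(A_{\text{MTJ}}/A_{\text{SHM}})\,[1 - \operatorname{sech}(t/\lambda_{sf})]$, where $\theta_{SH}$ is the spin Hall angle, $t$ the spin Hall metal (SHM) thickness, and $\lambda_{sf}$ its spin diffusion length. The sketch assumes the SHM strip is as wide as the free layer, so the area ratio becomes free-layer length over thickness; the parameter values are hypothetical.

```python
import math


def she_efficiency(theta_sh: float, free_len_nm: float,
                   shm_thickness_nm: float, lambda_sf_nm: float) -> float:
    """First-order estimate of I_spin / I_charge for a spin Hall metal strip.

    Assumes the SHM strip is as wide as the MTJ free layer, so the area ratio
    A_MTJ / A_SHM reduces to free_len / shm_thickness.
    """
    geometry_gain = free_len_nm / shm_thickness_nm
    thickness_factor = 1.0 - 1.0 / math.cosh(shm_thickness_nm / lambda_sf_nm)
    return theta_sh * geometry_gain * thickness_factor


# Illustrative parameters only (assumed, not taken from this paper).
THETA_SH = 0.3        # spin Hall angle
L_FREE_NM = 100.0     # free-layer length along the current direction
LAMBDA_SF_NM = 1.5    # spin diffusion length in the SHM

eff = she_efficiency(THETA_SH, L_FREE_NM, 3.0, LAMBDA_SF_NM)
print(f"efficiency at t = 3 nm: {eff:.0%}")

# Sweep the SHM thickness: too thin loses spin current, too thick spreads
# the charge current over a larger cross-section.
thicknesses = [0.5 + 0.05 * i for i in range(100)]   # 0.5 nm to 5.45 nm
best_t = max(thicknesses,
             key=lambda t: she_efficiency(THETA_SH, L_FREE_NM, t, LAMBDA_SF_NM))
print(f"best thickness on this grid: {best_t:.2f} nm")
```

Under this model the optimum thickness sits near $1.5\,\lambda_{sf}$, which is one way to read the phrase "optimal spin Hall metal dimensions" above.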

Proposals have also been made to use the position of a DW for memory applications [53], [67]. A typical three-terminal DW memory employs two fixed layers in an antiparallel configuration for spin injection, which enables bidirectional DW motion along the free layer to encode binary information [68]. Depending on the position of the DW, the MTJ takes one of two relative magnetization orientations, which translates into either a low or a high current during the read operation. Since the read and write current paths are separate, high-speed operation with improved reliability is possible [69]. A DW logic bit-cell configuration and its basic operations are shown in Fig. 10.

D. Spintronics for Special Functions

Precessional motion and physical randomness in spintronic devices may offer new ways to design special functional blocks. For example, the steady-state magnetization precession induced by the spin torque effect can be used in a spin oscillator to generate a microwave signal [70]. The main advantages of a spin oscillator over a CMOS-based voltage-controlled oscillator (VCO) are its compact size, large frequency tuning range, and good scalability. Fig. 11(a) shows the working principle, which is based on both the STT and TMR effects. When a charge current is applied to the MTJ, the spin torque cancels out the damping torque and excites the free-layer magnetization into steady-state oscillation. The oscillation frequency can be tuned by the amount of charge current applied to the MTJ. As shown in Fig. 11(b), the oscillating magnetization of the free layer relative to that of the fixed layer induces a change in resistance, generating a time-varying output voltage [71]. Spin oscillators are being explored as an alternative to conventional ring-oscillator-based VCOs or LC–VCOs [72] and may enable new capabilities such as high-density parallel signal demodulators and inter/intrachip wireless communication.

![Spin-based oscillator](image)
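The conversion from precession to an output voltage can be sketched as $V(t) = I_{\text{bias}}\,R(\theta(t))$, where $\theta$ is the angle between the free- and fixed-layer magnetizations. This sketch assumes that the MTJ conductance varies linearly with $\cos\theta$ between the parallel and antiparallel values, and that $\theta$ rotates uniformly at the oscillation frequency; the bias current, frequency, and resistances are hypothetical, and the model says nothing about how the frequency depends on current.

```python
import math


def mtj_resistance(theta_rad: float, r_p: float, r_ap: float) -> float:
    """MTJ resistance at relative angle theta between free and fixed layers.

    Assumes the conductance varies linearly with cos(theta), interpolating
    between G_P = 1/R_P at theta = 0 and G_AP = 1/R_AP at theta = pi.
    """
    g_p, g_ap = 1.0 / r_p, 1.0 / r_ap
    g = 0.5 * (g_p + g_ap) + 0.5 * (g_p - g_ap) * math.cos(theta_rad)
    return 1.0 / g


def output_voltage(t_s: float, i_bias_a: float, freq_hz: float,
                   r_p: float, r_ap: float) -> float:
    """V(t) = I_bias * R(theta(t)), assuming theta rotates uniformly at freq_hz."""
    theta = 2.0 * math.pi * freq_hz * t_s
    return i_bias_a * mtj_resistance(theta, r_p, r_ap)


I_BIAS = 100e-6                 # A, hypothetical bias current
FREQ = 5e9                      # Hz, hypothetical oscillation frequency
R_P, R_AP = 2_000.0, 5_000.0    # ohms, hypothetical

samples = [output_voltage(k / (20 * FREQ), I_BIAS, FREQ, R_P, R_AP)
           for k in range(20)]  # 20 samples over one period
print(f"V swings from {min(samples):.3f} V to {max(samples):.3f} V")
```

The output swings between roughly $I_{\text{bias}}R_P$ and $I_{\text{bias}}R_{AP}$ once per precession period, so a larger TMR ratio directly increases the microwave output amplitude in this picture.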

The random thermal fluctuation present in a nanomagnet can be amplified to generate random bits [73]. Fig. 12(a) shows the operation sequence for collecting physical random bits from a single MTJ. First, a negative reset current ($I_{\text{reset}}$) initializes the MTJ to the antiparallel state, assuming a bottom-pinned MTJ structure. Next, a perturbation current ($I_{\text{perturb}}$, an intermediate write current) drives the magnetization toward a neutral state; when this bias is turned off, the magnetization settles randomly into one of the two stable states according to the thermal noise in the device. Finally, the MTJ state is read out using a read bias current ($I_{\text{read}}$) and a sensing circuit. Energy diagrams for each step are presented in Fig. 12(b).
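The reset / perturb / read sequence can be sketched as a small state machine. This is a toy model under stated assumptions: the physics of the perturbation step is collapsed into one hypothetical parameter, `p_switch` (the probability that the free layer ends in the parallel state once the perturbation is removed), Python's pseudorandom generator stands in for thermal noise, and reading the antiparallel state as "1" is an arbitrary convention. The class and function names are illustrative only.

```python
import random

AP, P = "antiparallel", "parallel"


class StochasticMTJ:
    """Toy model of the reset / perturb / read sequence for one MTJ."""

    def __init__(self, p_switch: float, rng: random.Random):
        self.p_switch = p_switch  # assumed switching probability after perturbation
        self.rng = rng            # stands in for thermal noise in the device
        self.state = P

    def reset(self) -> None:
        """Negative I_reset: force the antiparallel state."""
        self.state = AP

    def perturb(self) -> None:
        """Apply I_perturb, then remove it: the free layer relaxes at random."""
        if self.rng.random() < self.p_switch:
            self.state = P

    def read(self) -> int:
        """Small I_read: high resistance (AP) -> 1, low resistance (P) -> 0."""
        return 1 if self.state == AP else 0


def random_bits(n: int, p_switch: float = 0.5, seed=None) -> list:
    mtj = StochasticMTJ(p_switch, random.Random(seed))
    bits = []
    for _ in range(n):
        mtj.reset()
        mtj.perturb()
        bits.append(mtj.read())
    return bits


bits = random_bits(10_000, p_switch=0.5, seed=1)
print(f"fraction of ones: {sum(bits) / len(bits):.3f}")
```

In a physical device, `p_switch` would be set by the amplitude of $I_{\text{perturb}}$, which is what makes the bias of the generator tunable.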

Fig. 12. Spin-based random number generator [73]. (a) Operation sequence for collecting physical random bits from a single MTJ. (b) Working principle with energy diagram and corresponding magnetization orientation.

### Table 1: Summary of Key Spintronic Devices

<table>
<thead>
<tr>
<th>Device name</th>
<th>State variable</th>
<th>Operating principle</th>
<th>Key Features</th>
<th>Status/Maturity</th>
</tr>
</thead>
<tbody>
<tr>
<td>All spin logic</td>
<td>Absolute magnetization</td>
<td>Non-local spin transport, STT switching</td>
<td>High scalability, Low switching energy</td>
<td>Nonlocal switching verified</td>
</tr>
<tr>
<td>Domain wall logic</td>
<td>Domain wall position</td>
<td>Current-driven DW motion</td>
<td>Output evaluation by voltage clock</td>
<td>Concept only</td>
</tr>
<tr>
<td>Nanomagnet logic</td>
<td>Absolute magnetization</td>
<td>Magnetic dipole interaction</td>
<td>Energy overhead for initializing field</td>
<td>Single logic gate demonstrated</td>
</tr>
<tr>
<td>Spin-FET</td>
<td>Relative magnetization</td>
<td>Magnetization-dependent output current</td>
<td>High integration FPGA with spin-MOSFET</td>
<td>Junction structure demonstrated</td>
</tr>
<tr>
<td>STT-MRAM</td>
<td>Relative magnetization</td>
<td>Write: STT, Read: TMR</td>
<td>Highly scalable universal memory</td>
<td>Product prototyping stage</td>
</tr>
<tr>
<td>VCMA-MRAM</td>
<td>Relative magnetization</td>
<td>Write: Energy barrier control by voltage, Read: TMR</td>
<td>Requires additional stimulus to set switching direction</td>
<td>MTJ switching demonstrated</td>
</tr>
<tr>
<td>SHE-MRAM</td>
<td>Relative magnetization</td>
<td>Write: GSHE-induced STT, Read: TMR</td>
<td>SHE efficiency depends on SHM dimensions</td>
<td>MTJ switching demonstrated</td>
</tr>
<tr>
<td>DW-MRAM</td>
<td>Domain wall position</td>
<td>Write: DW motion, Read: TMR</td>
<td>Separate paths for read and write</td>
<td>Low-density array demonstrated</td>
</tr>
<tr>
<td>Spin oscillator</td>
<td>Relative magnetization</td>
<td>Resistance change by spin precession</td>
<td>Compact size, Wide frequency range</td>
<td>Single device tested</td>
</tr>
<tr>
<td>Random number generator</td>
<td>Relative magnetization</td>
<td>Randomness of MTJ switching probability</td>
<td>Compact size, tunability</td>
<td>Concept only</td>
</tr>
</tbody>
</table>
Table 1 summarizes the post-CMOS spintronic devices reviewed in this section.

III. ASL COMPONENTS

The power and performance evaluation of ASL-based computing systems is of particular interest because of ASL's unique features, such as nonvolatility, high density, low device count per gate, and good scalability. This section provides an overview of all-spin components, starting from individual devices and logic gates and moving up to functional blocks and processor systems.

A. ASL Device Basics

A conceptual diagram of an ASL-based inverter utilizing the LSV structure is shown in Fig. 13. Although ASL devices come in several different forms [for example, the injector current can be a clock pulse or a constant direct current (dc) supply, and the interface between the nanomagnet and the channel can be either a direct contact or a magnetic tunnel junction depending on the material type], they all share the same basic components: input and output nanomagnets to store digital information, a channel to transfer spin information to the next stage, an isolation layer to separate devices, and an interface between the nanomagnet and channel for injecting spin-polarized electrons.

Input and output nanomagnets have two possible magnetization states (represented by left- and right-pointing arrows in Fig. 13) and are connected through a channel. The input current \( I_{\text{supply}} \) provided by a supply voltage pulse \( V_{\text{supply}} \) passes through the input magnet, injecting spin-polarized electrons at the channel entrance. The resulting nonequilibrium spin accumulation drives spin diffusion along the channel in the form of a spin current \( I_{\text{spin}} \), which transfers only spin angular momentum without net charge flow.

Note that a positive \( V_{\text{supply}} \) produces an \( I_{\text{spin}} \) polarized opposite to the magnetization of the input magnet, whereas a negative \( V_{\text{supply}} \) produces an \( I_{\text{spin}} \) polarized along it. \( I_{\text{spin}} \) then propagates through the channel, exerting spin torque on the output magnet. Once \( I_{\text{spin}} \) exceeds a certain switching threshold, the output magnet switches to the direction set by \( I_{\text{spin}} \). Thus, depending on the polarity of $V_{\text{supply}}$, the simple ASL device shown in Fig. 13 yields either an INVERT function (positive $V_{\text{supply}}$) or a COPY function (negative $V_{\text{supply}}$). One key requirement for proper operation is to ensure that spin information flows from the input toward the output while flow in the opposite direction is blocked. This directionality can be achieved by placing the GND node closer to the input terminal than to the output terminal, as shown in Fig. 13 [74]. It has been shown that a large $I_{\text{spin}}$ generated at the input can diffuse toward the output, while spin injection in the opposite direction is greatly reduced.

![Fig. 13. (Top) Conceptual diagram of an ASL-based inverter. Net spin polarization (i.e., the difference between majority and minority spins) is shown. The desired properties for all subcomponents are listed. (Bottom) Waveforms illustrating the operating principle.](image-url)
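The INVERT/COPY behavior and the switching threshold described above can be captured in a toy model with magnetizations encoded as +1 and -1. The linear relation between $|V_{\text{supply}}|$ and spin current magnitude (`gain`) and the threshold value are hypothetical simplifications; the sketch only encodes the polarity rule and the nonvolatile hold below threshold.

```python
def asl_stage(m_in: int, m_out: int, v_supply: float,
              gain: float = 1.0, i_threshold: float = 0.5) -> int:
    """Return the new output magnetization (+1 or -1) of a single ASL device.

    Toy model: spin current magnitude = gain * |v_supply| (hypothetical linear
    relation). Positive v_supply polarizes the spin current opposite to m_in
    (INVERT); negative v_supply polarizes it along m_in (COPY). Below
    i_threshold, the nonvolatile output magnet keeps its previous state.
    """
    direction = -m_in if v_supply > 0 else m_in
    i_spin = gain * abs(v_supply)
    return direction if i_spin >= i_threshold else m_out


for m in (+1, -1):
    inv = asl_stage(m, m_out=+1, v_supply=+1.0)
    copy = asl_stage(m, m_out=+1, v_supply=-1.0)
    print(f"input {m:+d}: INVERT -> {inv:+d}, COPY -> {copy:+d}")
```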

Another important point is that spin information can propagate only over a characteristic distance known as the spin diffusion length; the spin signal decays with distance, and well beyond this length spin transfer becomes negligible. It is therefore critical to use a channel material with a long spin diffusion length to ensure low-power, high-speed spin transport. Section V discusses this issue in greater detail.
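To see why the diffusion length matters for power, a common first-order picture (assumed here) is that the spin current decays exponentially along the channel, $I_{\text{spin}}(L) = I_{\text{spin}}(0)\,e^{-L/\lambda_{sf}}$. Inverting this gives the injected spin current needed for the output to still reach its switching threshold. The two diffusion lengths below are hypothetical values chosen only for contrast.

```python
import math


def spin_current_at(length_nm: float, i_spin0: float, lambda_sf_nm: float) -> float:
    """Spin current remaining after diffusing length_nm along the channel.

    First-order model: I_spin(L) = I_spin(0) * exp(-L / lambda_sf).
    """
    return i_spin0 * math.exp(-length_nm / lambda_sf_nm)


def required_injection(length_nm: float, i_threshold: float,
                       lambda_sf_nm: float) -> float:
    """Injected spin current needed so that i_threshold still reaches the output."""
    return i_threshold * math.exp(length_nm / lambda_sf_nm)


# Hypothetical spin diffusion lengths, for illustration only.
for name, lam in [("short-lambda channel", 300.0), ("long-lambda channel", 2000.0)]:
    factor = required_injection(1000.0, 1.0, lam)
    print(f"{name}: inject {factor:.1f}x the switching threshold over 1 um")
```

Because the required injection grows exponentially with $L/\lambda_{sf}$, a longer diffusion length pays off disproportionately for longer interconnects.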

B. ASL Gate Implementation

Fig. 14 shows an ASL device with a positive $V_{\text{supply}}$ implementing various Boolean operations. Note that the same configuration results in different Boolean logic functions for a negative $V_{\text{supply}}$. Without loss of generality, we choose to construct gates using a positive $V_{\text{supply}}$. We now describe each type of Boolean logic gate in more detail.

As shown in Fig. 14(a), an inverter can be implemented using a single spin device comprising two magnets and a channel. A buffer (or COPY operation) can be implemented by adding another magnet at the output of the inverter, so that the second and third magnets constitute another inverter, as shown in Fig. 14(b). To implement multiple-input gates, spin devices rely on the majority function (or, for a positive $V_{\text{supply}}$, the inverse of the majority function), in which the output value depends on whether the majority of the inputs are in the “0” or the “1” state. For example, a NAND gate based on majority logic is depicted in Fig. 14(c). Magnets with a fixed spin polarity, known as fixed magnets (denoted as “F”), may be used to achieve the desired Boolean function at the output. The magnetization of the output magnet is determined by the superposition of spin-polarized signals from all input and fixed magnets. Note that an AND gate can be implemented simply by adding one magnet at the output node of a NAND gate.
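The role of the fixed magnets can be checked with a Boolean sketch: with a positive $V_{\text{supply}}$, the output is the inverted majority of the inputs plus the fixed magnets. Setting $N-1$ fixed magnets to "0" turns an $N$-input gate into NAND, and flipping them to "1" turns it into NOR, consistent with the reconfiguration by fixed-magnet direction shown in Fig. 14(d). The mapping of logic "1" to one magnetization direction and the function names are assumptions of this sketch, which ignores analog effects such as unequal spin-signal strengths.

```python
from itertools import product


def majority(bits) -> int:
    """1 if more than half of an odd number of inputs are 1."""
    assert len(bits) % 2 == 1, "majority needs an odd number of inputs"
    return 1 if sum(bits) > len(bits) // 2 else 0


def asl_gate(inputs, fixed) -> int:
    """Toy ASL gate with positive V_supply: output = NOT majority(inputs + fixed)."""
    return 1 - majority(list(inputs) + list(fixed))


def nand(*inputs) -> int:
    """N inputs plus (N - 1) fixed magnets held at logic 0."""
    return asl_gate(inputs, [0] * (len(inputs) - 1))


def nor(*inputs) -> int:
    """Same structure, with the fixed magnets flipped to logic 1."""
    return asl_gate(inputs, [1] * (len(inputs) - 1))


for a, b in product((0, 1), repeat=2):
    print(a, b, "NAND:", nand(a, b), "NOR:", nor(a, b))
```

With a single input and no fixed magnets, `asl_gate([a], [])` reduces to the inverter of Fig. 14(a).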

Another interesting feature of all-spin gates is that they can be easily reconfigured (e.g., NAND to NOR, NOR to NAND) by switching the magnetization direction of the fixed magnets, as shown in Fig. 14(d). Generally speaking, an $N$-input gate can be constructed using $N$ input magnets, $N - 1$ fixed magnets, and one output magnet. These basic ASL gates are summarized in Fig. 15(a), and truth tables for multiple-input gates are shown in Fig. 15(b). In a cascaded spin logic implementation, the output magnet of one gate also serves as an input magnet of the next gate, so the duplicate magnet can be removed without affecting the logic function (see Fig. 16). Gates connected to the primary inputs still require one input magnet for each input signal, but all subsequent gates in the cascaded structure can be implemented with only fixed magnets and an output magnet. Therefore, the total number of ASL devices required to implement an entire logic block can be calculated as follows:

$$\text{total device count} = (\# \text{ of primary input magnets}) + \sum_{\text{all gates}} (\# \text{ of fixed and output magnets}). \quad (1)$$

Table 2 compares the device counts of CMOS gates, individual spin gates, and cascaded spin gates. The number of devices for the cascaded ASL configuration can be calculated by subtracting the number of input magnets from the individual ASL gate’s total device count. Interestingly, for every gate listed, the cascaded ASL configuration requires exactly half as many devices as the CMOS implementation. This ratio carries over to typical logic blocks, where the number of magnets connected to the primary inputs is small compared to the total device count (including input, output, and fixed magnets). Consequently, large combinational logic blocks can be implemented using mostly fixed and output magnets. This device count estimation method assumes a drop-in replacement scenario in which each CMOS gate is replaced by an equivalent ASL gate. However, the ASL implementation could be made even more efficient if the circuit block were resynthesized to take advantage of the inherent majority function of ASL [43], [75].
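Equation (1) and the relationships read off Table 2 can be checked with a short script. The per-gate counts are copied from Table 2; the per-gate input count is derived here (not listed in the table), and the logic block (gate mix and 64 primary inputs) is entirely hypothetical.

```python
# Per-gate counts: (CMOS transistors, individual ASL magnets,
# cascaded ASL magnets, number of gate inputs). The first three columns
# are from Table 2; the input count is derived for this sketch.
TABLE2 = {
    "INV":   (2, 2, 1, 1),
    "BUF":   (4, 3, 2, 1),
    "NAND2": (4, 4, 2, 2),
    "NOR2":  (4, 4, 2, 2),
    "AND2":  (6, 5, 3, 2),
    "OR2":   (6, 5, 3, 2),
    "NAND3": (6, 6, 3, 3),
    "NOR3":  (6, 6, 3, 3),
    "AND3":  (8, 7, 4, 3),
    "OR3":   (8, 7, 4, 3),
}


def cmos_transistors(gate_counts: dict) -> int:
    return sum(TABLE2[g][0] * n for g, n in gate_counts.items())


def asl_devices(gate_counts: dict, n_primary_inputs: int) -> int:
    """Eq. (1): primary input magnets plus each gate's non-input magnets."""
    return n_primary_inputs + sum(TABLE2[g][2] * n for g, n in gate_counts.items())


# Hypothetical logic block: the gate mix and primary-input count are made up.
block = {"NAND2": 1000, "NOR2": 500, "INV": 200}
cmos = cmos_transistors(block)
asl = asl_devices(block, n_primary_inputs=64)
print(cmos, asl, f"{asl / cmos:.2f}")  # 6400 3264 0.51
```

The ratio approaches 0.5 as the block grows relative to its number of primary inputs, which is the condition stated above.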

C. ASL Pipeline Implementation

Fig. 17 shows how the inherent nonvolatility of spin technology can be leveraged to efficiently implement sequential logic elements such as latches and flip-flops [76]. This is achieved by serially connecting ASL devices while carefully controlling the CLK and $V_{\text{const}}$ signals. In Fig. 17(a), a level-sensitive positive latch is implemented using a pair of magnets. The first magnet, controlled by CLK, behaves like a switch, while the second magnet, biased by a constant $V_{\text{const}}$, acts as a storage device. When CLK is high, the latch is transparent, and the pair of magnets transfers the spin signal from input to output. When CLK is low, spin signal propagation through the first magnet is disabled, so the output retains its original state. This construction of the ASL latch closely resembles that of a conventional CMOS latch. Cascading two latches and operating them in a master–slave fashion yields an edge-triggered ASL flip-flop, as illustrated in Fig. 17(b). The device count for the ASL flip-flop is 4, while a CMOS flip-flop typically requires 20 or more transistors. As such, the design of sequential elements can be drastically simplified with spin technology, resulting in considerable savings in area and power.
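The behavioral difference between the latch and the flip-flop can be sketched at the logic level. The sketch assumes the classic master–slave arrangement (master transparent while CLK is low, slave transparent while CLK is high), which yields a positive-edge-triggered flip-flop; this polarity choice is our assumption rather than a detail stated for Fig. 17(b), and the class names are hypothetical.

```python
class AslLatch:
    """Level-sensitive latch: a clocked switch magnet feeding a storage magnet."""

    def __init__(self, state: int = 0):
        self.state = state  # value held by the nonvolatile storage magnet

    def step(self, d: int, enable: bool) -> int:
        if enable:
            self.state = d  # transparent: spin signal reaches the storage magnet
        return self.state   # opaque: storage magnet retains its value


class AslFlipFlop:
    """Two latches in master-slave fashion (assumed polarity: master on CLK low)."""

    def __init__(self):
        self.master = AslLatch()
        self.slave = AslLatch()

    def step(self, d: int, clk: bool) -> int:
        m = self.master.step(d, enable=not clk)
        return self.slave.step(m, enable=clk)


ff = AslFlipFlop()
trace = [(1, False), (1, True), (0, True), (0, False), (0, True)]
print([ff.step(d, clk) for d, clk in trace])  # [0, 1, 1, 1, 0]
```

The output changes only on the low-to-high transitions of CLK; the change of `d` while CLK stays high (third step) is ignored, which is the edge-triggered behavior.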

The inherent nonvolatility of ASL devices also opens up the possibility of removing sequential elements from the circuit. In a conventional CMOS pipeline, sequential elements are inserted between pipeline stages and clocked synchronously, each requiring its own supply voltage and clock connections [as shown in Fig. 18(a)]. In contrast, ASL uses a single input terminal for both the supply voltage and CLK. By properly controlling the CLK applied at the input node, data propagation can be controlled without explicit sequential elements. As illustrated in Fig. 18(b), a nonoverlapping dual-phase clock applied to alternate stages of an ASL pipeline enables sequential operation, since data propagation occurs only while a stage's CLK is enabled. For instance, when CLK2 is low, the first magnet of each ASL pipeline stage [denoted as “B” in Fig. 18(b)] stores the final outcome from the previous pipeline stage. When CLK2 goes high, magnet “B” launches the data to the following stage.
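The dual-phase scheme can be simulated at the level of data movement. In this toy model, even-indexed stages fire on CLK1 and odd-indexed stages on CLK2; a stage copies its predecessor's value only while its own phase is high, and otherwise its nonvolatile magnet simply holds its value, so no separate latch appears anywhere. Each stage passes data through unchanged (standing in for a logic gate), and the function name is hypothetical.

```python
def run_asl_pipeline(inputs, n_stages):
    """Toy ASL pipeline driven by two nonoverlapping clock phases.

    Returns (phase, value) pairs observed at the last stage whenever it fires.
    """
    stages = [None] * n_stages               # None means "no data yet"
    last_on_clk1 = (n_stages - 1) % 2 == 0
    outputs = []
    for phase in range(2 * len(inputs) + n_stages):
        clk1_high = phase % 2 == 0           # CLK1 and CLK2 are never high together
        for i in range(n_stages):
            if (i % 2 == 0) != clk1_high:
                continue                     # this stage's clock is low: hold state
            if i == 0:
                k = phase // 2               # one new input per CLK1 pulse
                stages[0] = inputs[k] if k < len(inputs) else None
            else:
                # The predecessor fires on the other phase, so it is stable now.
                stages[i] = stages[i - 1]
        if last_on_clk1 == clk1_high and stages[-1] is not None:
            outputs.append((phase, stages[-1]))
    return outputs


print(run_asl_pipeline(["a", "b", "c"], n_stages=4))
# [(3, 'a'), (5, 'b'), (7, 'c')]
```

Data items emerge in order, one per full clock period (two phases), regardless of pipeline depth; adding stages adds latency but no sequential elements.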

Applying this dual-phase clocking to every other logic gate enables an ultradeep pipeline that increases system throughput, as shown in Fig. 18(c). Deeper pipelining in CMOS usually suffers from large power consumption in the sequential elements, since the number of sequential elements has an exponential dependency on pipeline depth [77]. In an ASL-based pipeline, however, no sequential elements are present, so the power overhead of realizing an ultradeep pipeline becomes negligible.

#### D. Device Count Comparison

In this section, we compare the device count between ASL and CMOS using Intel’s Core i7 processor as the target system. The specifications are listed in Table 3 [78]. We consider a processor built with 32-nm high-k metal-gate CMOS technology.

Our initial focus is on gate-level power and performance, so for the time being we assume that, in the spin-based design, the global interconnects between subblocks are charge based, not spin based. Furthermore, we assume no spin attenuation in the local interconnects, which removes the need for local ASL buffers. In reality, spin current cannot travel over long distances (e.g., several micrometers), so numerous ASL buffers are needed to amplify the attenuated spin signal. This key issue is discussed in depth in Section V.

Table 2 Device Count Comparison Between CMOS, Individual ASL, and Cascaded ASL Gates

<table>
<thead>
<tr>
<th>Function</th>
<th>CMOS</th>
<th>Individual ASL gate</th>
<th>Cascaded ASL gate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>2</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>Buffer</td>
<td>4</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>2-input NAND</td>
<td>4</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input NOR</td>
<td>4</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input AND</td>
<td>6</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>2-input OR</td>
<td>6</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>3-input NAND</td>
<td>6</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input NOR</td>
<td>6</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input AND</td>
<td>8</td>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td>3-input OR</td>
<td>8</td>
<td>7</td>
<td>4</td>
</tr>
</tbody>
</table>

Fig. 17. Implementation of ASL-based sequencing elements. (a) Level-sensitive positive latch. (b) Edge-triggered flip-flop. Clocked magnets control the spin signal propagation.

As described in Section III-C, the total device count for a given ASL block is the sum of the number of fixed and output magnets for the ASL gates and the number of primary inputs for that block. The device count for ASL gates was shown to be roughly half that of CMOS (Table 2). Intel’s Core i7 processor consists of roughly 1 billion CMOS transistors, of which approximately 0.46 billion are used for SRAM caches and the remaining 0.54 billion for random logic. Based on (1), an ASL implementation of the logic part can therefore be estimated to require $0.54/2 = 0.27$ billion devices. For a more accurate estimate, we need to check whether the number of input magnets is indeed negligible compared to the total device count. To estimate the number of input magnets, we use a well-established empirical relationship known as Rent’s rule. According to this rule, the relationship between the number of input/output (I/O) terminals of a logic block ($T$) and the number of gates in the logic block ($N$) is given as [79]

$$T = k \cdot N^p \qquad (2)$$

where $k$ is the average number of terminals per gate and $p$ is the Rent exponent, which reflects the connectivity of the gates ($0 < p < 1$). $N$, the total number of gates in a logic block, can be roughly estimated using a known $k$ value [80]. Since $k$ is approximately equal to 2, the equivalent logic gate for this $k$ value can be assumed to be an inverter. Since an inverter has two transistors and the Core i7 processor contains approximately 1 billion transistors, the total number of equivalent logic gates in the processor is $N = 1 \text{ billion}/2 = 0.5$ billion. With these $k$, $p$, and $N$ values, Rent’s rule gives a total of 2830 pins for the ASL-based Core i7 processor, which is negligible compared to the number of devices used for the ASL gates. This confirms that the random logic portion of a spin-based Core i7 chip can be implemented with only 0.27 billion devices.
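The device-count arithmetic and Rent’s rule can be checked with a short Python sketch. The counts are copied from Table 2 and the text; note that for the cascaded ASL gates the ratio to CMOS is exactly one half for every entry. The text does not state the Rent exponent $p$ behind the 2830-pin figure, so the sketch back-solves the value implied by $k = 2$ and $N = 0.5$ billion; that implied exponent is our inference, not a parameter reported by the authors. Function names are hypothetical.

```python
import math

# Table 2: (CMOS, cascaded ASL) device counts per gate type.
TABLE2 = {
    "Inverter": (2, 1), "Buffer": (4, 2),
    "2-input NAND": (4, 2), "2-input NOR": (4, 2),
    "2-input AND": (6, 3), "2-input OR": (6, 3),
    "3-input NAND": (6, 3), "3-input NOR": (6, 3),
    "3-input AND": (8, 4), "3-input OR": (8, 4),
}
ratios = {gate: asl / cmos for gate, (cmos, asl) in TABLE2.items()}

LOGIC_TRANSISTORS = 0.54e9                 # random-logic transistors in the Core i7
asl_logic_devices = LOGIC_TRANSISTORS / 2  # halved, as for cascaded ASL gates


def rent_terminals(k: float, n_gates: float, p: float) -> float:
    """Rent's rule, Eq. (2): T = k * N**p."""
    return k * n_gates ** p


def implied_rent_exponent(t: float, k: float, n_gates: float) -> float:
    """Solve T = k * N**p for p."""
    return math.log(t / k) / math.log(n_gates)


K = 2.0        # inverter-equivalent gate, from the text
N = 1e9 / 2    # 1 billion transistors / 2 per inverter
p_implied = implied_rent_exponent(2830, K, N)
print(set(ratios.values()))
print(f"{asl_logic_devices:.3g} ASL devices, implied p = {p_implied:.3f}")
print(f"pins per device: {2830 / asl_logic_devices:.1e}")
```

The implied exponent comes out near 0.36, and the pins amount to roughly $10^{-5}$ of the device count, which is why they can be neglected.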

### IV. METHODOLOGY FOR ESTIMATING ASL POWER DISSIPATION

In this section, we present a methodology to estimate the switching energy of ASL gates considering design space options under process constraints and specific system level requirements.

#### A. Strategy for Switching Energy Calculation

Fig. 19 shows an overview of our strategy for estimating the switching energy of a single ASL gate. The switching energy can be expressed as $E = V_{\text{supply}} \cdot Q_{\text{total}}$, where $Q_{\text{total}} = I_{c,\text{critical}} \cdot t_{\text{sw}}$ is the total amount of charge applied at the input magnet of the ASL gate to switch the state of the output magnet. Here, $I_{c,\text{critical}}$ is the critical charge current for a given switching time $t_{\text{sw}}$. Only a fraction of $I_{c,\text{critical}}$, known as the critical spin current ($I_{s,\text{critical}}$), is responsible for switching; this fraction is known as the spin injection ratio and is denoted by $I_s/I_c$. Therefore, $I_{c,\text{critical}} = I_{s,\text{critical}} \cdot (I_c/I_s)$, and the switching energy can be expressed as $E = V_{\text{supply}} \cdot I_{s,\text{critical}} \cdot (I_c/I_s) \cdot t_{\text{sw}}$. This final equation suggests that the switching energy of an ASL gate can be reduced either by increasing the spin injection ratio $I_s/I_c$ or by lowering $I_{s,\text{critical}}$. The $I_{s,\text{critical}}$ required for successful switching of the output magnet is estimated by a physical simulation framework based on a Landau–Lifshitz–Gilbert (LLG) solver. The inputs to the LLG solver are functions of the material and the dimensions of the magnets. These dimensions and material parameters are, in turn, determined by the degree of nonvolatility required of the system. The spin injection ratio ($I_s/I_c$) is a device parameter that represents the spin transport capability of the lateral spin valve (LSV) structure, which is governed by the materials and dimensions of the magnet and channel. More details on how each of these parameters can be optimized for minimum chip power are given in Sections IV-B–IV-E.

Fig. 18. Construction of an ASL-based pipeline. (a) Conventional CMOS pipeline. (b) Pipeline architecture can be implemented in ASL without any sequencing elements by simply employing nonoverlapping dual-phase clocks. (c) Example of an ultradeep pipeline with one logic gate per pipeline stage.
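The switching-energy relation of Section IV-A can be written as a small Python function to show the two levers it exposes. The operating point below (10 mV, 50 µA, a ratio of 0.2, 2 ns) is hypothetical and chosen only so the arithmetic is easy to follow; it is not a result from this paper.

```python
def switching_energy(v_supply: float, i_s_critical: float,
                     injection_ratio: float, t_sw: float) -> float:
    """E = V_supply * I_c,critical * t_sw, with I_c,critical = I_s,critical / (I_s/I_c).

    Units: volts, amperes, dimensionless ratio, seconds -> joules.
    """
    if not 0.0 < injection_ratio <= 1.0:
        raise ValueError("spin injection ratio I_s/I_c must lie in (0, 1]")
    i_c_critical = i_s_critical / injection_ratio  # charge current at the input magnet
    q_total = i_c_critical * t_sw                  # charge delivered per switching event
    return v_supply * q_total


# Hypothetical operating point, used only to illustrate the two levers.
base = dict(v_supply=10e-3, i_s_critical=50e-6, injection_ratio=0.2, t_sw=2e-9)
e0 = switching_energy(**base)
e_better_ratio = switching_energy(**{**base, "injection_ratio": 0.4})  # double I_s/I_c
e_lower_is = switching_energy(**{**base, "i_s_critical": 25e-6})      # halve I_s,critical
print(e0, e_better_ratio, e_lower_is)
```

Doubling the spin injection ratio and halving the critical spin current have the same effect: each halves the energy (5 fJ to 2.5 fJ at this point), because $E$ is proportional to $I_{s,\text{critical}}/(I_s/I_c)$.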

#### B. Thermal Stability Requirements

In this section, we discuss how to determine the thermal stability factor in the context of a realistic microprocessor system.

Thermal stability \( (\Delta = E_b/k_B T) \) is a measure of how much energy is required to flip the magnetization direction under thermal fluctuation, where \( E_b \) is the energy barrier between the two states, \( k_B \) is the Boltzmann constant, and \( T \) is the absolute temperature. To realize a practical nonvolatile system, the thermal stability of each magnet must be high enough to prevent thermally assisted magnetization reversal during the lifetime of the data (e.g., ten years for storage data or one clock cycle for computation data). On the other hand, the thermal stability of a magnet should be minimized for low switching energy. To satisfy these two conflicting requirements, the thermal stability must be determined from the nonvolatility and switching energy requirements at the system level. To this end, we present a systematic methodology for calculating the optimal thermal stability value in this section.

Our derivation starts from the equation describing the thermal switching probability of a magnet [81]

\[
P(t) = 1 - \exp\left(-\frac{t}{\tau}\right). \qquad (3)
\]

Here, \( \tau \) is the relaxation time defined by the Néel–Arrhenius equation

\[
\tau = \tau_0 \exp\left(\frac{E_b}{k_B T}\right) \qquad (4)
\]

where \( \tau_0 \) is the attempt cycle time (typically of the order of 1 ns). Combining (3) and (4) for all devices on the chip gives the probability that the entire chip fails [82]

\[
F_{\text{chip}} = 1 - \exp\left\{-m \frac{t}{\tau_0} \exp\left(-\frac{E_b}{k_B T}\right)\right\} \qquad (5)
\]

where \( m \) is the total number of devices in the system and \( t \) is the retention time period. Fig. 20 plots the thermal stability required for an ASL Core i7 with a ten-year data retention time as a function of chip failure rate at room temperature (300 K), using the device count of 0.27 billion estimated in Section III. A thermal stability greater than $69k_BT$ is needed to guarantee a chip failure rate lower than 0.01% (or 1 FIT). Here, FIT stands for failures in time; 1 FIT is equivalent to one failure in $10^9$ device hours of operation.
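Inverting the chip-failure expression gives the required thermal stability in closed form, $\Delta = \ln\!\left[m t / \left(\tau_0 \cdot (-\ln(1 - F_{\text{chip}}))\right)\right]$. The Python sketch below evaluates it with the values quoted above ($m = 0.27$ billion, $\tau_0 = 1$ ns, $F_{\text{chip}} = 0.01\%$). The second call uses a single 40-ns clock cycle (25 MHz, the operating frequency adopted in Section VI) to show how far $\Delta$ can drop when retention is only needed for one cycle; the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def required_delta(m: float, t: float, f_chip: float, tau0: float = 1e-9) -> float:
    """Smallest Delta = E_b / (k_B T) keeping the chip failure probability <= f_chip.

    Inverts F = 1 - exp(-m * (t / tau0) * exp(-Delta)).
    """
    return math.log(m * t / (tau0 * -math.log1p(-f_chip)))


m = 0.27e9                       # ASL device count from Section III
ten_years = 10 * SECONDS_PER_YEAR
print(f"{required_delta(m, ten_years, 1e-4):.1f}")  # ten-year retention
print(f"{required_delta(m, 40e-9, 1e-4):.1f}")      # one 25-MHz clock cycle
```

The ten-year case evaluates to roughly 69, consistent with Fig. 20. `math.log1p` is used so that $\ln(1 - F)$ stays accurate for very small failure probabilities.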

#### C. Magnet Dimensions for Ensuring Nonvolatility

We have already seen that the degree of nonvolatility is determined by the system-level thermal stability criterion, which, in turn, sets the value of $E_b$ required of the magnetic material. $E_b$ can be expressed as

$$E_b = K_u V = H_k M_s V/2 \qquad (6)$$

where $K_u$ is the uniaxial magnetic anisotropy energy density and $V$ is the volume of the magnet. $H_k$ is the magnetic anisotropy field, which determines the energetically preferred magnetization direction (often referred to as the “easy axis”). $M_s$ is the saturation magnetization, i.e., the magnetization reached when all domains are aligned. Depending on the orientation of the easy axis, magnetic anisotropy can be classified into the following two categories: in-plane magnetic anisotropy (IMA) and perpendicular magnetic anisotropy (PMA). The easy axis of IMA lies in the $x$–$y$ plane of the magnet, while that of PMA is perpendicular to the $x$–$y$ plane. Fig. 21(a) and (b) show the dynamic spin motion during switching for IMA and PMA, respectively.

For IMA, thermal stability is primarily determined by shape anisotropy. The surface poles of a magnet produce not only an outward field, but also a counter field inside the magnet. This counter field acts against the magnetization, thereby demagnetizing the magnet, which is why it is also known as the demagnetizing field $H_d$. $H_d$ depends on the geometry of the magnet and becomes weaker in the direction with the longer dimension. This is why the magnetization inside a ferromagnet aligns in the elongated direction, giving rise to shape-induced magnetic anisotropy. Mathematically, $H_d$ can be described as

$$H_d = -4\pi N_d M_s \qquad (7)$$

where $N_d$ is the demagnetizing factor along the direction of interest. For IMA, the shape anisotropy field $H_{k,\text{shape}}$ is set by the difference between the in-plane demagnetizing factors $N_{dx}$ and $N_{dy}$ and is governed by the aspect ratio of the magnet as follows:

$$H_{k,\text{shape}} = 4\pi(N_{dx} - N_{dy})M_s. \qquad (8)$$

Finally, the $\Delta$ of the IMA can be expressed as

$$\Delta_{\text{IMA}} = \frac{K_u V}{k_B T} = \frac{H_{k,\text{shape}} M_s V}{2k_B T} = \frac{2\pi(N_{dx} - N_{dy})M_s^2 V}{k_B T}. \qquad (9)$$

Fig. 20. Thermal stability required for an ASL Core i7 with 0.27 billion devices to meet a ten-year retention time requirement. Here, we assume a retention error of 1 FIT (= one failure in $10^9$ device hours of operation).

In terms of spin motion, as shown in Fig. 21(a), the IMA magnet follows a trajectory with only limited excursion in the $z$ direction. This indicates that IMA has to overcome a large $H_{dz}$ field that attempts to keep the magnetization within the $x$–$y$ plane, which translates into a large switching current.

As an alternative to IMA, PMA has recently been investigated extensively to achieve low-current switching while maintaining the same degree of thermal stability. As shown in Fig. 21(b), $H_{dz}$ assists the magnetization switching by partially canceling out the perpendicular anisotropy field ($H_{k,\perp}$), resulting in a lower switching current. However, $H_{k,\perp}$ must be larger than $H_{dz}$ in order to maintain the perpendicular orientation of the magnetization [59]. This can be achieved by using either the high crystal anisotropy of $L1_0$-phase alloys (e.g., FePt, CoPt, and FePd) or the interface anisotropy of a thin CoFeB layer [84]–[86]. The effective perpendicular anisotropy field ($H_{k,\text{eff}}$) is determined by the difference between $H_{k,\perp}$ and $H_{dz}$ as follows:

$$H_{k,\text{eff}} = H_{k,\perp} - H_{dz} = 2K_u/M_s - 4\pi N_{dz}M_s. \qquad (10)$$

The resultant $\Delta$ of the PMA can be expressed as

$$\Delta_{\text{PMA}} = \frac{K_{\perp,\text{eff}} V}{k_B T} = \frac{H_{k,\text{eff}} M_s V}{2k_B T} = \frac{(K_{\perp} - 2\pi N_{dz}M_s^2)V}{k_B T}. \qquad (11)$$

Note that the $\Delta$ of PMA is also affected by magnet dimensions due to $N_{dz}$. Therefore, the thermal stability requirement for both IMA and PMA can be met by adjusting the magnet dimensions according to (9) and (11).
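Equations (8)–(11) can be evaluated directly, as in the Python sketch below. Because the equations are written in Gaussian (CGS) units, the sketch uses erg, emu/cm³, and Oe; in these units $K_u = 3.15 \times 10^6$ J/m³ corresponds to $3.15 \times 10^7$ erg/cm³. The demagnetizing factors must be supplied by the caller. The sample magnet (5 nm × 5 nm × 3 nm, $M_s = 1000$ emu/cm³, $N_{dz} = 0.4$) is hypothetical and is not the Table 4 design.

```python
import math

K_B_CGS = 1.380649e-16  # Boltzmann constant in erg/K


def delta_ima(n_dx: float, n_dy: float, m_s: float, volume_cm3: float,
              temperature: float = 300.0) -> float:
    """Eq. (9): thermal stability of an in-plane magnet from shape anisotropy (CGS)."""
    h_k_shape = 4 * math.pi * (n_dx - n_dy) * m_s  # Eq. (8), in Oe
    return h_k_shape * m_s * volume_cm3 / (2 * K_B_CGS * temperature)


def delta_pma(k_perp: float, n_dz: float, m_s: float, volume_cm3: float,
              temperature: float = 300.0) -> float:
    """Eq. (11): perpendicular anisotropy reduced by the demagnetizing term (CGS)."""
    k_eff = k_perp - 2 * math.pi * n_dz * m_s ** 2  # effective anisotropy, erg/cm^3
    return k_eff * volume_cm3 / (K_B_CGS * temperature)


# Hypothetical 5 nm x 5 nm x 3 nm PMA magnet; n_dz = 0.4 is an assumed value.
volume = (5e-7) ** 2 * 3e-7  # cm^3
print(delta_pma(k_perp=3.15e7, n_dz=0.4, m_s=1000.0, volume_cm3=volume))
```

Raising $N_{dz}$ (a thinner, flatter magnet) lowers $\Delta_{\text{PMA}}$, which is why the thickness must be adjusted together with $K_u$ to reach the 69$k_BT$ target.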

In this work, we consider a crystal anisotropy-based PMA magnet that utilizes a high $K_u$ (previously noted as $K_{\perp}$ for PMA) of specific materials for enhancing thermal stability. Note that interface anisotropy-based PMA requires further reduction in damping and a stronger interface anisotropy in order to be a viable contender in scaled technologies (e.g., 5 nm). Target parameters for the PMA magnet are shown in Table 4. The width and the length of the magnet have been fixed as per the technology node (i.e., 5 nm by 5 nm). The thickness of the magnet is set as one spin diffusion length of the magnet material since a magnet thinner than its spin diffusion length will behave as a leaky polarizer causing an incomplete spin polarization and partial relaxation in the input and output magnets [87]. Based on these magnet dimensions and the given $M_s$ value, the required $K_u$ of the magnet was calculated to be $3.15 \times 10^6$ J/m$^3$ for a thermal stability of 69$k_B T$ using (11).

#### D. Critical Spin Current for Magnet Switching

Macrospin simulation based on the LLG equation can predict the critical spin current ($I_{s,\text{critical}}$) required for the output magnet to switch. Material parameters, magnet dimensions, temperature, and physical constants are first given as input parameters. The material parameters include $M_s$, $\alpha$, and $P$. $\alpha$ is the damping factor, which determines how fast the magnetization returns to the easy axis. $P$ is the polarization factor, which is estimated from the difference in the spin-dependent density of states (DOS). The material parameter values used in this work are listed in Table 4, while the required magnet dimensions were estimated in Section IV-C. Assuming that a macrospin model holds for a nanosized ferromagnet, the dynamic spin motion of the output magnet can be modeled as a time-varying magnetization vector. At equilibrium temperature, thermal fluctuation induces a randomly distributed initial angle between the magnetization vector ($\overrightarrow{M}$) and the easy axis. Note that, for switching pulses shorter than ~3 ns, spin precession dominates magnetization switching, and thus the initial position of the magnetization vector can be used to account for the switching probability profile [88]. In this work, we use an initial angle of 1.5°, which has been confirmed by Zhao et al. to guarantee reliable switching [89].

Table 4 Device Parameters of PMA-Based ASL for a Ten-Year Retention Time at 5-nm Technology Node

When $V_{\text{supply}}$ is turned on, a spin current with a density of $J_s$ is generated by the input magnet, travels through the channel, and exerts a spin torque on the output magnet. Here, the polarization direction of the spins is set by the magnetization of the input magnet, $\overrightarrow{M}_{\text{in}} = [0, 0, 1]$, assuming that the easy axis is in the $z$ direction. This spin torque attempts to flip $\overrightarrow{M}$ of the output magnet against the effective field $\overrightarrow{H}_{k,\perp}$, which combines the perpendicular anisotropy field and the demagnetizing field and can be written as the following time-varying vector:

$$\overrightarrow{H}_{k,\perp}(t) = \left[0,\ 0,\ \frac{2K_u}{M_s} M_z(t)\right] - 4\pi M_s \left[N_{dx} M_x(t),\ N_{dy} M_y(t),\ N_{dz} M_z(t)\right]. \qquad (12)$$

The dynamics of $\overrightarrow{M}(t)$ is described by the LLG equation as follows:

$$\frac{1 + \alpha^2}{\gamma} \frac{d\overrightarrow{M}}{dt} = -\overrightarrow{M} \times \overrightarrow{H}_{k,\perp} - \alpha \, \overrightarrow{M} \times (\overrightarrow{M} \times \overrightarrow{H}_{k,\perp}) + \frac{\hbar J_s}{2 e M_s t_m} \, \overrightarrow{M} \times (\overrightarrow{M} \times \overrightarrow{M}_{\text{in}}) \qquad (13)$$

where $\gamma$ is the gyromagnetic ratio, $\hbar$ is the reduced Planck constant, $e$ is the electron charge, $t_m$ is the thickness of the magnet, $\overrightarrow{M}_{\text{in}}$ is the magnetization direction of the input magnet, and $M_x$, $M_y$, and $M_z$ are the components of the normalized magnetization vector $\overrightarrow{M}$. For a $J_s$ exceeding the critical value, the precession is reinforced until the magnetization vector finally switches to the other energetically stable state. For a fan-out (FO) of 4 and a switching time of 2 ns, the critical spin current $I_{s,\text{critical}}$ for output magnet switching is 51 $\mu$A, which is used to estimate the critical charge current $I_{c,\text{critical}}$ in Section IV-E.
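A minimal macrospin integrator for the LLG equation with spin torque can be written in plain Python with a fourth-order Runge–Kutta step. This is a sketch under stated assumptions, not the authors' solver: the torque prefactor $\hbar J_s/(2 e M_s t_m)$ is replaced by a single field-like amplitude `a_j` in Oe, `hk_perp` plays the role of $2K_u/M_s$, and all numeric parameters ($\alpha = 0.01$, $2K_u/M_s = 20$ kOe, $M_s = 1000$ emu/cm³, the demagnetizing factors, `a_j`) are hypothetical. Only the 1.5° initial angle and the input direction $[0, 0, 1]$ come from the text. The sign of the torque term is kept as written in (13); with that sign, a torque polarized along $+z$ destabilizes an output magnet that starts near $+z$.

```python
import math

GAMMA = 1.76e7  # gyromagnetic ratio in rad/(s*Oe)


def cross(a, b):
    """Cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def h_eff(m, hk_perp, four_pi_ms, nd):
    """Eq. (12): perpendicular anisotropy field minus demagnetizing field, in Oe."""
    return (-four_pi_ms * nd[0] * m[0],
            -four_pi_ms * nd[1] * m[1],
            (hk_perp - four_pi_ms * nd[2]) * m[2])


def dm_dt(m, alpha, a_j, p, hk_perp, four_pi_ms, nd):
    """Eq. (13) solved for dm/dt; a_j (Oe) stands in for hbar*J_s/(2*e*M_s*t_m)."""
    h = h_eff(m, hk_perp, four_pi_ms, nd)
    m_x_h = cross(m, h)
    damping = cross(m, m_x_h)
    torque = cross(m, cross(m, p))
    pref = GAMMA / (1 + alpha ** 2)
    return tuple(pref * (-m_x_h[i] - alpha * damping[i] + a_j * torque[i])
                 for i in range(3))


def simulate(a_j, t_end=1e-9, dt=2e-13, alpha=0.01, theta0_deg=1.5,
             hk_perp=2.0e4, four_pi_ms=4 * math.pi * 1000.0, nd=(0.3, 0.3, 0.4)):
    """RK4 macrospin integration; returns m_z of the output magnet at t_end."""
    th = math.radians(theta0_deg)
    m = (math.sin(th), 0.0, math.cos(th))  # output magnet tilted 1.5 deg from +z
    p = (0.0, 0.0, 1.0)                     # input-magnet direction [0, 0, 1]
    args = (alpha, a_j, p, hk_perp, four_pi_ms, nd)
    for _ in range(int(round(t_end / dt))):
        k1 = dm_dt(m, *args)
        k2 = dm_dt(tuple(m[i] + 0.5 * dt * k1[i] for i in range(3)), *args)
        k3 = dm_dt(tuple(m[i] + 0.5 * dt * k2[i] for i in range(3)), *args)
        k4 = dm_dt(tuple(m[i] + dt * k3[i] for i in range(3)), *args)
        m = tuple(m[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)      # renormalize so |m| stays 1
    return m[2]


mz_free = simulate(a_j=0.0)       # no spin current: damping pulls m back to +z
mz_driven = simulate(a_j=2000.0)  # strong (hypothetical) torque: m reverses
print(mz_free, mz_driven)
```

A critical current could be located by bisecting on `a_j` between a value that leaves $m_z$ near $+1$ and one that drives it toward $-1$ within the switching window; the explicit loop over tuples keeps the sketch dependency-free at the cost of speed.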

#### E. Spin Injection Ratio of ASL Gate

The switching energy of an ASL device is primarily a function of the spin injection ratio ($I_s/I_c$). The spin signal ($\Delta V/I$) is proportional to the spin accumulation in the channel and can be derived analytically from the following spin diffusion model [90]:

$$\frac{\Delta V}{I} = \frac{P^2 R_{s,m}^2}{2R_{s,m} \exp \left( \frac{L_{ch}}{\lambda_{ch}} \right) + R_{s,ch} \sinh \left( \frac{L_{ch}}{\lambda_{ch}} \right)}. \quad (14)$$

Here, $P$ is the spin polarization factor, $\lambda$ is the spin diffusion length, and $L$ is the channel length; the subscripts $m$ and $ch$ denote the magnet and the channel, respectively. $R_s$ is the spin resistance and can be expressed as

$$R_s = \frac{2\rho \lambda}{[(1 - P^2)S]} \quad (15)$$

where $\rho$ is the resistivity and $S$ is the effective cross-sectional area. If the spin current $I_s$ generated by the charge current $I_c$ is sufficiently large, the transfer of spin angular momentum causes the magnetization of the detector magnet to reverse. When $I_s$ is completely relaxed in the injector magnet, $I_s$ flowing into the detector can be expressed as [91]

$$I_s = \frac{\Delta V}{I} \frac{I_c}{R_{s,m}}. \quad (16)$$

Finally, dividing (16) by $I_c$ and substituting (14), the spin injection ratio can be derived as

$$\frac{I_s}{I_c} = \frac{1}{R_{s,m}} \frac{\Delta V}{I} = \frac{P^2 R_{s,m}}{2R_{s,m} \exp \left( \frac{L_{ch}}{\lambda_{ch}} \right) + R_{s,ch} \sinh \left( \frac{L_{ch}}{\lambda_{ch}} \right)}. \quad (17)$$

As can be seen in (17), the spin injection ratio depends strongly on the material parameters as well as the device geometry.
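Equations (14), (15), and (17) translate directly into a few Python functions, sketched below in SI units. The material values (resistivities of $2 \times 10^{-7}$ and $2 \times 10^{-8}$ Ω·m, a 5-nm magnet spin diffusion length, $P = 0.5$, 5 nm × 5 nm cross sections) are hypothetical placeholders rather than the Table 4 parameters, and the channel is assumed nonmagnetic ($P = 0$ in (15)).

```python
import math


def spin_resistance(rho: float, lam: float, area: float, pol: float = 0.0) -> float:
    """Eq. (15): R_s = 2 * rho * lambda / ((1 - P^2) * S), SI units -> ohms."""
    return 2 * rho * lam / ((1 - pol ** 2) * area)


def spin_signal(pol: float, r_s_m: float, r_s_ch: float,
                l_ch: float, lam_ch: float) -> float:
    """Eq. (14): Delta V / I in ohms."""
    x = l_ch / lam_ch
    return pol ** 2 * r_s_m ** 2 / (2 * r_s_m * math.exp(x) + r_s_ch * math.sinh(x))


def injection_ratio(pol: float, r_s_m: float, r_s_ch: float,
                    l_ch: float, lam_ch: float) -> float:
    """Eq. (17): I_s / I_c = (Delta V / I) / R_s,m."""
    return spin_signal(pol, r_s_m, r_s_ch, l_ch, lam_ch) / r_s_m


# Hypothetical materials with 5 nm x 5 nm magnet and channel cross sections.
POL = 0.5
r_m = spin_resistance(rho=2e-7, lam=5e-9, area=25e-18, pol=POL)  # magnet
r_ch = spin_resistance(rho=2e-8, lam=400e-9, area=25e-18)        # nonmagnetic channel
for l_nm in (10, 100, 400, 1000):
    print(l_nm, f"{injection_ratio(POL, r_m, r_ch, l_nm * 1e-9, 400e-9):.3f}")
```

Since the denominator of (17) is at least $2R_{s,m}$, the ratio can never exceed $P^2/2$, which makes the polarization factor a hard ceiling on efficiency regardless of how short the channel is.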

Using this analytical model, we can predict the spin injection ratio for ASL gates of varying dimensions. The dimensions of the magnet are estimated based on the thermal stability requirement for a chip failure rate of 1 FIT, as described in Section IV-B. The local channel length is assumed to be 10 nm, which reflects the minimum spacing between the two magnets and is short enough that no additional buffers are necessary. The channel thickness is then optimized for a high spin injection ratio. Note that a thinner channel reduces the resistance of the input current path (i.e., the magnet and channel stack on the input side), but a narrow channel results in a large spin signal loss due to spin scattering. Based on the device dimensions and material parameters listed in Table 4, the $\Delta V/I$ and the spin injection ratio of the PMA-based ASL are estimated as 8 $\Omega$ and 22.1% at room temperature, respectively. Finally, the critical charge current applied to the input magnet can be estimated as $I_{c,\text{critical}} = I_{s,\text{critical}} \cdot (I_c/I_s)$. The minimum value of $V_{\text{supply}}$ is also calculated based on the resistance of the input current path. For a switching time of 2 ns, the switching energy of a single ASL gate with FO = 4 is estimated as 3.5 fJ using $E = V_{\text{supply}} \cdot I_{s,\text{critical}} \cdot (I_c/I_s) \cdot t_{\text{sw}}$.
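Plugging the numbers quoted in Sections IV-D and IV-E into the current relation gives the charge that must be delivered per switching event. The supply voltage for this case is not stated explicitly in the text, so the Python sketch leaves it as an input to a hypothetical `energy` helper rather than guessing a value.

```python
I_S_CRITICAL = 51e-6     # critical spin current for FO = 4 (Section IV-D), A
INJECTION_RATIO = 0.221  # I_s / I_c of the PMA-based ASL gate (Section IV-E)
T_SW = 2e-9              # switching time, s

i_c_critical = I_S_CRITICAL / INJECTION_RATIO  # I_c,critical = I_s,critical * (I_c / I_s)
q_total = i_c_critical * T_SW                   # charge per switching event, C


def energy(v_supply: float) -> float:
    """E = V_supply * I_c,critical * t_sw for a caller-supplied supply voltage (V)."""
    return v_supply * q_total


print(f"I_c,critical = {i_c_critical * 1e6:.1f} uA")
print(f"Q_total      = {q_total * 1e15:.0f} fC")
print(f"E per mV of supply = {energy(1e-3) * 1e15:.2f} fJ")
```

The input magnet must therefore carry roughly 231 µA, about 4.5 times the spin current that actually reaches the output magnet, and each millivolt of supply voltage costs about 0.46 fJ per switching event.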

### V. ASL INTERCONNECT CONSIDERATIONS

One critical issue pertaining to spin-based interconnects is that once spin current enters the channel, it attenuates quickly. Specifically, spin signals have an $\exp(-d/\lambda)$ dependency on interconnect distance $d$, where $\lambda$ is the material-specific spin diffusion length. Fig. 22 shows a steep decrease in spin injection ratio for a copper channel ($\lambda_{Cu} = 400$ nm). As such, an all-spin interconnect scheme requires a large number of ASL buffers to transfer the spin signal over long distances, which degrades system performance and leads to a prohibitively high power overhead. This section analyzes this power overhead and explores practical solutions for mitigating it.
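The simple $\exp(-d/\lambda)$ dependency makes the attenuation easy to quantify. The Python sketch below uses the copper value quoted above ($\lambda_{Cu} = 400$ nm); it captures only this first-order decay, not the full geometry dependence of (14).

```python
import math

LAMBDA_CU = 400e-9  # spin diffusion length of Cu quoted in the text, m


def spin_fraction(d: float, lam: float = LAMBDA_CU) -> float:
    """Fraction of the spin signal remaining after distance d, using exp(-d / lambda)."""
    return math.exp(-d / lam)


def max_distance(fraction: float, lam: float = LAMBDA_CU) -> float:
    """Distance at which the signal has decayed to `fraction` of its initial value."""
    return lam * math.log(1.0 / fraction)


for d in (100e-9, 400e-9, 1e-6, 5e-6):
    print(f"{d * 1e9:6.0f} nm -> {spin_fraction(d):.3g}")
```

After 1 µm of copper only about 8% of the spin signal survives, and after 5 µm only a few parts per million, which is why long spin-based wires need repeated ASL buffers.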

#### A. Power Overhead of Spin-Based Interconnect

In order to measure the overhead of a spin-based interconnect in the ASL Core i7, it is necessary to count the number of ASL buffers needed. The interconnect density function derived from Rent’s rule is used to model the statistical distribution of wire lengths in a random logic block [80]:

Region I: $1 \leq l < \sqrt{N}$

$$i(l) = \frac{\alpha k}{2} \Gamma \left( \frac{l^3}{3} - 2 \sqrt{N} l^2 + 2N l \right) l^{2p-4} \qquad (18)$$

Region II: $\sqrt{N} \leq l \leq 2 \sqrt{N}$

$$i(l) = \frac{\alpha k}{6} \Gamma \left( 2 \sqrt{N} - l \right)^3 l^{2p-4} \qquad (19)$$

Here, $l$ is the interconnect length normalized to the gate pitch and $\alpha$ is defined as

$$\alpha = \frac{FO}{FO + 1} \qquad (20)$$

where FO is the average fanout of a logic gate. $k$ and $N$ were defined earlier as the average number of terminals per gate and the total number of gates in the processor, respectively. The $\Gamma$ parameter used in (18) and (19) is the normalization factor. We assume $k = 3.2$ and $p = 0.6$, as suggested in [92] for typical logic blocks. The number of gates $N$ can be estimated as we did in Section III. Since $k$ is approximately 3 (i.e., a three-terminal gate), we can assume that the representative logic gate is a two-input NAND gate composed of four CMOS transistors. From the specification that the logic part of a single core has 135 million transistors (0.54 billion transistors for logic/4 cores = 135 million), we can then calculate that the number of its equivalent logic gates is 33.8 million (i.e., $N = 135$ million transistors for logic of 1 core/4 transistors for an equivalent gate = 33.8 million). With an ASL gate pitch of 10 nm and an average FO of 4, the wire length distribution for the random logic portion of the Core i7 processor can be plotted as shown in Fig. 23(a). The ASL buffer distribution $buffer\_count(l)$ gives the expected number of ASL buffers for a wire with a length of $l$ and is simply expressed as

$$\mathit{buffer\_count}(l) = \text{quotient}(l, L_{ch}) \cdot i(l) \qquad (21)$$

where $L_{ch}$ is the buffer channel length and $\text{quotient}(l, L_{ch})$ is the number of buffers for a wire of length $l$ (with $l$ converted from gate pitches to the same length unit as $L_{ch}$). Fig. 23(b) displays the cumulative distribution of the ASL buffer count for a single processor core as a function of spin channel length.
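The buffer-counting procedure can be sketched in Python. Two assumptions should be stated up front. First, the sketch uses the Davis et al. form of the wire-length distribution, in which both regions carry the factor $l^{2p-4}$ and Region II uses $(2\sqrt{N} - l)^3$; this form is continuous at $l = \sqrt{N}$. Second, instead of a closed-form $\Gamma$, it fixes $\Gamma$ numerically so that the discrete counts sum to the total interconnect count $\alpha k N (1 - N^{p-1})$. Lengths are handled in integer nanometers so that the quotient in (21) is exact. The parameters ($k = 3.2$, $p = 0.6$, FO = 4, 10-nm pitch, $N = 33.8$ million) follow the text, but the resulting counts need not reproduce Fig. 23 exactly.

```python
import math


def davis_density(l, n, k, p, fo):
    """i(l) / Gamma from Eqs. (18)-(20); l is the wire length in gate pitches."""
    alpha = fo / (fo + 1.0)                      # Eq. (20)
    root_n = math.sqrt(n)
    if 1 <= l < root_n:                          # Region I, Eq. (18)
        poly = l ** 3 / 3 - 2 * root_n * l ** 2 + 2 * n * l
        return alpha * k / 2 * poly * l ** (2 * p - 4)
    if root_n <= l < 2 * root_n:                 # Region II, Eq. (19)
        return alpha * k / 6 * (2 * root_n - l) ** 3 * l ** (2 * p - 4)
    return 0.0


def wire_length_distribution(n, k=3.2, p=0.6, fo=4.0):
    """Expected wire count at each integer length l = 1, 2, ..., int(2*sqrt(N)).

    Gamma is fixed numerically so the counts sum to alpha*k*N*(1 - N**(p - 1)).
    """
    alpha = fo / (fo + 1.0)
    raw = [davis_density(l, n, k, p, fo) for l in range(1, int(2 * math.sqrt(n)) + 1)]
    gamma = alpha * k * n * (1 - n ** (p - 1)) / sum(raw)
    return [gamma * r for r in raw]


def buffer_count(dist, pitch_nm, l_ch_nm):
    """Eq. (21) summed over l: quotient(l * pitch, L_ch) * i(l), lengths in integer nm."""
    return sum((l * pitch_nm // l_ch_nm) * i_l for l, i_l in enumerate(dist, start=1))


N_CORE = 33.8e6  # equivalent gates in one core (Section V-A)
dist = wire_length_distribution(N_CORE)
counts = {l_ch: buffer_count(dist, 10, l_ch) for l_ch in (50, 150, 500)}
for l_ch, c in counts.items():
    print(f"L_ch = {l_ch} nm: {c:.3g} buffers")
```

Longer buffer spacing monotonically reduces the buffer count, which is one side of the tradeoff discussed next; the other side is the larger input current each buffer needs.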

#### B. Optimization of Spin-Based Interconnects

Spin channel length directly impacts interconnect power. For longer channel lengths, the total number of ASL buffers is reduced, but each buffer requires a higher input current to compensate for the loss in spin current. Due to these two conflicting effects, an optimum spin channel length exists where the interconnect power is minimized. Fig. 24(a) shows the dependency of buffer count and critical charge current \(I_{c\text{, critical}}\) on the spin channel length indicating that the optimal spin channel length for Cu is 150 nm. However, as estimated in Fig. 23, the corresponding ASL buffer count is about 67 million/core, which is comparable to the total number of devices in a single core. This simple back-of-the-envelope analysis reveals that interconnect power is a critical issue that warrants further investigation. Detailed analysis for calculating interconnect power is presented in Section VI-C.

Novel channel materials with longer spin lifetimes are being explored to overcome the loss in spin current and help realize the full potential of ASL devices. As described in Fig. 24(b), a longer spin diffusion length translates into a longer optimal channel length, thereby reducing the number of buffers and, eventually, the total power consumption. With a spin diffusion length of 2 \(\mu\)m at room temperature, single-layer graphene (SLG) is the leading candidate among materials that show exceptional spin transport characteristics [41]. However, for efficient spin current injection, graphene-based spin-valve devices require a tunnel barrier such as MgO because of the large impedance mismatch between SLG and the ferromagnet.

### VI. SYSTEM-LEVEL POWER COMPARISON

The power comparison between ASL and CMOS is presented in this section. We will again use Intel’s Core i7 as a test vehicle, and we will consider various combinations of device parameters and power reduction schemes. This comparison study suggests how ASL-based systems need to be optimized so that they can compete better with CMOS-based systems in terms of active power consumption.

#### A. Power Calculation Parameters for ASL-Based Processors

We used the following simple equation to estimate the logic and interconnect power of ASL: 

\[
\text{Power} = \text{switching energy of a single device} \times \text{clock frequency} \times \text{device count}.
\]
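Dimensionally, the power of a block is the energy per switching event multiplied by the number of switching events per second, i.e., energy × frequency × device count × activity factor, with an activity factor of 1 for ASL (Section VI-B). The Python sketch below combines numbers quoted in the text, the 3.5-fJ FO4 gate energy from Section IV-E, 25 MHz, and 0.27 billion logic devices, purely as an illustration; it ignores interconnect buffers, and Table 6 uses different device parameters, so it is not a reproduction of the Table 6 totals.

```python
def block_power(energy_per_switch: float, frequency: float, device_count: float,
                activity: float = 1.0) -> float:
    """Power = E_switch * f * device count * activity factor (ASL: activity = 1)."""
    return energy_per_switch * frequency * device_count * activity


# Logic gates only (no interconnect buffers), using numbers quoted in the text.
p_logic = block_power(3.5e-15, 25e6, 0.27e9)
print(f"{p_logic:.1f} W")
```

This evaluates to about 23.6 W for the logic gates alone, before any interconnect buffers are counted.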

Our parameters were as follows.

- We assumed that each ASL gate in the pipeline stage is sequentially pulsed by the clock to reduce power consumption. The overall approach for estimating switching energy was described in Section IV.
- We estimated switching energies for the logic devices as well as the interconnect buffers, while considering different spin injection ratios and supply voltage requirements. The optimal distance between two interconnect buffers for minimum power was calculated based on the methodology shown in Section V.
- We chose 25 MHz as the operating frequency for power comparison since frequencies higher than this would make the comparison meaningless due to the extremely high ASL power. Although we did not perform a full energy-delay optimization for CMOS, the supply voltage was reduced to account for the lower operating frequency.
- We used an industrial 32-nm process design kit for the schematic design and HSPICE simulation of CMOS gates. Assuming 20 logic gates in a single pipeline stage, the switching time of an ASL device is the 40-ns clock period at 25 MHz divided by the number of gates, i.e., 40 ns/20 = 2 ns [93].
- We calculated the number of ASL interconnect buffers required on the basis of the channel material and optimal buffering interval. As explained in Section III, the device count for the logic portion of the processor can be cut in half using ASL.

- We considered various power management schemes in order to assess the advantage of ASL while considering both static and active power (i.e., varying activity levels of the processor cores). Modern microprocessors such as Intel’s Core i7 are capable of adjusting the voltage and frequency, gating off clocks, and shutting down cores altogether, depending on the computation demand [94]. According to Intel’s Core i7 datasheet, the C0 state represents the highest power consumption mode (i.e., all four cores are switching), while the C1 state is the clock gating mode, which draws static leakage power only. The C6 state represents a power gating mode, which achieves the lowest static power consumption [95].

#### B. Activity Factor Between ASL and CMOS

Our analysis shows that the zero standby leakage power of ASL is offset by the high switching energy caused by the low spin injection ratio and the large number of interconnect buffers. Another critical obstacle that has been largely overlooked is the 100% activity factor associated with any spin-based logic scheme. As can be seen in Table 5, the output of a CMOS gate switches only when the input changes. In other words, if the input remains constant, CMOS logic gates do not consume any dynamic power. Note that CMOS gates in complex blocks typically have an activity factor lower than 10%. On the other hand, ASL and other spin-based logic families have to evaluate every cycle regardless of the input data, which is equivalent to an activity factor of 100%. As shown in Table 5, ASL consumes $I_c$ whenever the clocked $V_{supply}$ is on. This is an inherent drawback of most spin-based devices that may have to be addressed with the help of auxiliary CMOS circuits.
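The contrast in Table 5 can be illustrated with a toy Python model of a single inverter, counting the clock cycles in which each technology draws current. It is a simplification of our own: the CMOS inverter is charged only on cycles where its input (and hence output) toggles, while the ASL inverter is charged on every clocked cycle. The random input stream is synthetic.

```python
import random


def cmos_activity(inputs):
    """Fraction of clock cycles in which a CMOS inverter output toggles (dynamic power)."""
    toggles = sum(1 for prev, cur in zip(inputs, inputs[1:]) if prev != cur)
    return toggles / (len(inputs) - 1)


def asl_activity(inputs):
    """Fraction of cycles in which an ASL inverter draws I_c: every clocked cycle."""
    draws = sum(1 for _prev, _cur in zip(inputs, inputs[1:]))  # data-independent
    return draws / (len(inputs) - 1)


constant = [1] * 1000
random.seed(0)
noisy = [random.randint(0, 1) for _ in range(1000)]
print(cmos_activity(constant), asl_activity(constant))
print(round(cmos_activity(noisy), 2), asl_activity(noisy))
```

For a constant input the CMOS activity is 0 while ASL stays at 1.0; even for uncorrelated random data, CMOS switches on only about half the cycles, and in real logic blocks the figure is typically below 10%, as noted above.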

#### C. Power Comparison: ASL Versus CMOS

Table 6 presents the power comparison between CMOS-based and various ASL-based Core i7 implementations.

We estimate the power consumption of future ASL technologies assuming improvement in the magnet and channel properties, and a minimum feature size of 5 nm. That is, the minimum magnet width and the minimum gate-to-gate distance are both 5 nm. Ideally, the comparison between CMOS and ASL should be done at the same technology node (i.e., 5-nm CMOS versus 5-nm ASL). But the supply voltage, transistor parameters, threshold voltage, and operating frequency for 5-nm CMOS are largely unknown at this point. As a compromise, we chose to compare 5-nm ASL to 32-nm CMOS, hoping that this will give readers at least a sense of how the power consumption of ASL compares to that of today’s microprocessors.

In order to mitigate the high power consumption and limited performance of spin-based interconnects, we also considered a hybrid spin-charge interconnect scheme in which interconnects longer than a certain length (e.g., 5 μm) are replaced with charge-based interconnects. The minimum wire length for switching to charge will depend on the conversion overhead as well as the performance and power benefits. The total number of interconnect buffers was estimated based on the specific type of channel material and interconnect scheme (e.g., spin only or hybrid).

Another possible method for reducing ASL power is to deliberately lower the thermal stability to the point of guaranteeing nonvolatility for just a single clock cycle. This can be achieved by either shrinking the volume of the magnet or switching to a lower-$K_u$ material. Fig. 25(a) plots the minimum thermal stability required for an ASL-based Core i7 processor to satisfy a chip failure rate of $0.01\%$ for different target retention times. Results in Fig. 25(b) show the clear tradeoff between retention time and power consumption for different spin diffusion lengths, polarization factors, and interconnect schemes.

Table 5 Activity Factor Comparison Between CMOS and ASL. CMOS Gates Only Consume Power When the Input Signal Switches, While ASL Gates Consume Power Every Cycle Irrespective of the Input Pattern

![Image of CMOS and ASL inverters and their activity patterns](image-url)

Material and device parameters of ASL to meet a system requirement of ten years of retention and 25 MHz of operating frequency are listed separately for the core devices and interconnect buffers in Table 6. Total system power for ASL is estimated for different power down modes (i.e., C0, C1, and C6) in order to evaluate the power saving benefits under different active to static power ratios.

The power consumption values are listed in the bottom part of Table 6 for different operating modes, but we also provide a bar chart version of the same data in Fig. 26, showing the logic power and interconnect buffer power, separately. For the C0 state (i.e., where all four cores are actively switching), ASL with $\lambda = 1 \mu m$ consumes excessively high active power compared to its CMOS counterpart, as shown in Fig. 26. The interconnect power can be reduced by employing a longer spin diffusion channel material ($\lambda = 5 \mu m$) and a hybrid interconnect scheme for long distances.

Note that the impact of a longer spin diffusion channel on logic power is negligible since the interconnect length between local ASL gates is too short to benefit from the longer spin diffusion. Meanwhile, a material with a high polarization factor ($P = 0.8$–$0.9$) is considered to enhance the spin injection ratio, resulting in significant power savings in both the logic and interconnect circuits.

Table 6 ASL Versus CMOS Power Comparison Under an Operating Frequency of 25 MHz. (C0: All Four Cores Active; C1: One Core Active While Three Cores Are Clock Gated; C6: One Core Active While Three Cores Are Power Gated)

<table>
<thead>
<tr>
<th>Parameter</th>
<th>32nm CMOS</th>
<th>ASL, $\lambda=1\mu m$</th>
<th>ASL, $\lambda=5\mu m$</th>
<th>Hybrid interconnect (&gt;5$\mu m$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology node</td>
<td>32nm</td>
<td>10nm</td>
<td>10nm</td>
<td>10nm</td>
</tr>
<tr>
<td>Device count</td>
<td>540 million</td>
<td>299 million</td>
<td>280 million</td>
<td>280 million</td>
</tr>
<tr>
<td>Activity factor</td>
<td>5%</td>
<td>100%</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>Channel</td>
<td>-</td>
<td>1$\mu m$</td>
<td>5$\mu m$</td>
<td>5$\mu m$</td>
</tr>
<tr>
<td>$P_{RF}$</td>
<td>-</td>
<td>0.5</td>
<td>0.5</td>
<td>0.8</td>
</tr>
<tr>
<td>Retention time/$K_b$</td>
<td>-</td>
<td>10year/3.15</td>
<td>10year/3.15</td>
<td>10year/3.15</td>
</tr>
<tr>
<td>Critical $I_s$ (FO4)</td>
<td>-</td>
<td>51$\mu A$</td>
<td>51$\mu A$</td>
<td>51$\mu A$</td>
</tr>
<tr>
<td>$L_s/L_b$ Core/Buffer</td>
<td>24.3%/11.2%</td>
<td>24.5%/8.2%</td>
<td>24.5%/8.2%</td>
<td>39.2%/12.6%</td>
</tr>
<tr>
<td>$V_{supply}$ Core/Buffer</td>
<td>12.8mV/27.8mV</td>
<td>12.7mV/37.8mV</td>
<td>12.7mV/37.8mV</td>
<td>5.5mV/17.1mV</td>
</tr>
<tr>
<td>Power (25MHz)</td>
<td>C0 Active/Static/Total</td>
<td>0.05/3.7/0.00379W</td>
<td>91.2/0.00/91.2W</td>
<td>69.3/0.00/69.3W</td>
</tr>
<tr>
<td></td>
<td>C1 Active/Static/Total</td>
<td>0.01/3.7/0.00317W</td>
<td>22.8/0.00/22.8W</td>
<td>17.3/0.00/17.3W</td>
</tr>
<tr>
<td></td>
<td>C6 Active/Static/Total</td>
<td>0.01/1.0/0.010W</td>
<td>22.8/0.00/22.8W</td>
<td>17.3/0.00/17.3W</td>
</tr>
</tbody>
</table>

* $\lambda$: spin diffusion length, $L$: channel length, $P$: spin polarization, $K_b$: crystal anisotropy ($10^3$ J/m$^3$), $I_s$: spin current, $I_c$: charge current

Fig. 25. Tradeoff between ASL retention time and switching power. (a) Thermal stability versus retention time (0.01% chip failure rate assumed). (b) Power consumption versus retention time for various ASL devices.

Finally, we show another future scenario in which the retention time is traded off (down to $1~\mu\mathrm{s}$), which eventually makes the ASL power comparable to that of CMOS. The bottom part of Fig. 26 shows the power consumption for the C1 operating mode, in which only a single core is active while the other three cores are clock gated and hence dissipate leakage power. ASL is expected to compare more favorably in this operating mode, because leakage from the three idle cores makes up a larger share of the CMOS power, whereas ASL dissipates no static power. Indeed, our estimation results show that ASL can achieve a power level comparable to CMOS even without sacrificing retention time or requiring a very high polarization factor (e.g., 0.9).
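The operating-mode comparison can be reproduced with a simple per-core model. This is our simplification, not necessarily the paper's exact accounting: active cores draw active plus static power, clock-gated cores still leak their static power, and power-gated cores are assumed to draw nothing. Because ASL static power is zero, its C1 and C6 totals both collapse to one quarter of the C0 value, matching the ASL entries in Table 6 (e.g., 91.2 W / 4 = 22.8 W). The CMOS-like core values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CorePower:
    active_w: float  # switching power of one active core
    static_w: float  # leakage of one powered (active or clock-gated) core

def chip_power(core: CorePower, n_cores: int, n_active: int, idle_mode: str) -> float:
    """Total chip power for a C-state-like configuration.

    idle_mode: "clock_gated" (idle cores still leak) or
    "power_gated" (idle cores assumed to draw zero power).
    """
    if idle_mode not in ("clock_gated", "power_gated"):
        raise ValueError(f"unknown idle mode: {idle_mode}")
    n_idle = n_cores - n_active
    total = n_active * (core.active_w + core.static_w)
    if idle_mode == "clock_gated":
        total += n_idle * core.static_w
    return total

modes = [("C0", 4, "clock_gated"), ("C1", 1, "clock_gated"), ("C6", 1, "power_gated")]
asl = CorePower(active_w=91.2 / 4, static_w=0.0)   # ASL, lambda = 1 um (Table 6)
leaky = CorePower(active_w=10.0, static_w=1.0)     # hypothetical CMOS-like core
for name, n_active, mode in modes:
    print(name, round(chip_power(asl, 4, n_active, mode), 1),
          round(chip_power(leaky, 4, n_active, mode), 1))
```

For the leaky core, C1 stays above one quarter of C0 because three clock-gated cores keep leaking, and only power gating (C6) removes that overhead; for ASL the three idle cores cost nothing in either mode.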

VII. LIMITATIONS OF THIS WORK AND FUTURE DIRECTIONS

Because of the limited experimental data available and the speculative nature of this type of research, our benchmarking study had to rely on many assumptions and workarounds. Here, we summarize some of the known limitations of this work that could be addressed in future studies.

- The power estimation focuses on only the logic portion of the processor. Memory power needs to be addressed separately.
- We assume a 5-nm technology for ASL, which is beyond the limit of today’s lithography tools. Recently developed gas phase synthesis methods could enhance the patterning resolution by direct placement of nanoparticles [96].
- Variations in magnet dimensions in extremely scaled technologies will have a significant impact on the thermal stability and critical switching current of spin-based devices. Further studies are necessary to assess device performance in the presence of dimensional variability and material imperfections.
- The physical parameters used in this work assume room temperature. However, the worst-case operating temperature of many integrated circuits (ICs) is higher [97], which may result in higher magnet resistivity, lower spin polarization, and a shorter spin diffusion length [90].
- We assume that each ASL gate receives a pulsed clock that is delayed from one logic stage to the next. By doing so, we can also assume that static power is consumed only during the short computation period.
- Our power estimation is based on the device counts for the core and interconnect circuits and their individual energy dissipation. For clarity and focus, the power and area overheads associated with clocking the ASL gates were not considered. Several studies currently underway are developing techniques to reduce clocking power by using clocking transistors with reduced voltage headroom that are shared among multiple ASL devices.
- A tunneling barrier may be required for good impedance matching between a metallic magnet and a graphene channel. This may necessitate a higher voltage that could result in a higher overall energy consumption for ASL.
- In the hybrid interconnect scheme, we assume that the overhead for spin-to-charge and charge-to-spin conversion is negligible compared to the buffer power overhead for long wires. Additional work will be required for an accurate estimation of the spin-to-charge and charge-to-spin conversion overhead.
- Our device count methodology for an ASL system is based on a drop-in replacement scenario; that is, each CMOS gate is substituted with an equivalent ASL gate. However, certain logic functions may be able to exploit the inherent majority function of ASL, which could open up new design methodologies for ASL [43], [75].
- The standby power of ASL circuits was assumed to be negligible, on the premise that ultralow-leakage power gates are used or that the external power supply is completely shut down.
- Nonvolatile devices allow the chip to instantly power-down and power-up without incurring any backup overhead. This is a key benefit of ASL that has not been accounted for in this work.
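The pulsed-clock assumption in the list above implies that an ASL gate draws supply current only during a short evaluation window in each cycle, so its average power is scaled by the duty cycle $t_{pulse} / T_{clk}$. The sketch below applies this to the 25 MHz operating point ($T_{clk} = 40$ ns); the supply voltage, current, and pulse width are hypothetical placeholders, not values from this study.

```python
def average_gate_power_w(v_supply_v, i_supply_a, pulse_s, freq_hz):
    """Average power of a gate that is powered only during one clock pulse per cycle."""
    period_s = 1.0 / freq_hz
    if not 0.0 < pulse_s <= period_s:
        raise ValueError("pulse must fit inside one clock period")
    duty = pulse_s / period_s
    return v_supply_v * i_supply_a * duty

# Hypothetical values for illustration only
V_SUPPLY = 0.02     # 20 mV
I_SUPPLY = 200e-6   # 200 uA
PULSE = 1e-9        # 1 ns evaluation window
FREQ = 25e6         # 25 MHz operating frequency used in this study

p_avg = average_gate_power_w(V_SUPPLY, I_SUPPLY, PULSE, FREQ)
print(f"duty cycle = {PULSE * FREQ:.3f}, average power = {p_avg * 1e9:.1f} nW per gate")
```

With a 1 ns pulse in a 40 ns period, the duty cycle is 2.5%, so the gate's average power is only a fortieth of its instantaneous $V \cdot I$ product; shortening the pulse or sharing clocking transistors reduces it further.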

VIII. CONCLUSION

The first part of this paper provides a general overview of spintronic technology, including historical advances, working principles of key spin-based devices, and recent breakthroughs demonstrated by the research community. The second part of the paper highlights the key advantages of ASL-based computing, such as zero static power, lower device count, and lower supply voltage, by presenting a case study of ASL technology on a hypothetical Intel Core i7 processor.
Technical barriers associated with spin devices, such as low spin injection, limited spin diffusion length, and an intrinsically high activity factor, are also discussed extensively. It is our sincere hope that this paper will provide the general engineering community with a clearer picture of spintronic computing technology from a system-level perspective.

Acknowledgment

The authors would like to thank Dr. B. Behin-Aein at Global Foundries for technical feedback and encouragement, and the reviewers for their help in improving the quality and presentation of this paper.

REFERENCES


ABOUT THE AUTHORS

Jongyeon Kim received the B.S. and M.S. degrees in electrical engineering from Yonsei University, Seoul, Korea, in 2005 and 2008, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering, in Prof. C. H. Kim’s group, at the University of Minnesota, Minneapolis, MN, USA. In 2008, he began working for the Semiconductor R&D Center, Samsung Electronics, Hwasung, Korea. For three years, he was engaged in research and development of FG/TANOS/3D nano flash memories, especially focusing on the cell reliability characterization. He joined the Department of Electrical and Computer Engineering, University of Minnesota, in 2011. From June 2014 to August 2014, he interned at Globalfoundries, Santa Clara, CA, USA, working on magnetic tunnel junction (MTJ) modeling and spin transfer torque magnetoresistive random access memory (STT-MRAM) circuit design. His research interests include device-circuit codesign for spintronic systems and nanoelectronic/spintronic compact modeling. He is an author or coauthor of over ten journal and conference papers.

Mr. Kim was the recipient of the Samsung Corporation Scholarship in 2007 and of the best student presentation award at the 2014 Center for Spintronic Materials, Novel Interface and Architectures (C-SPIN) annual review.

Ayan Paul received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Jadavpur, Kolkata, India, in 2005 and the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2008. He is currently working toward the Ph.D. degree in electrical engineering at the University of Minnesota, Minneapolis, MN, USA. His Ph.D. research is focused on novel device and circuit design for spintronic systems and nanoelectronic/spintronic compact modeling. He is an author or coauthor of over 130 journal and conference papers. He holds 61 U.S. patents.

He worked at PricewaterhouseCoopers, India, as a Consultant in 2005–2006, and at Atrenta, India, as a Corporate Applications Engineer in 2006–2007. He has worked on transistor leakage modeling and spin transfer torque magnetoresistive random access memory (STT-MRAM) scaling analysis.

Paul A. Crowell received the Ph.D. degree in physics from Cornell University, Ithaca, NY, USA, in 1994. He is a Professor of Physics in the School of Physics and Astronomy, University of Minnesota, Minneapolis, MN, USA. His employment history includes serving as a Postdoctoral Associate at CNRS, Grenoble, France, from 1993 to 1995, and at the University of California Santa Barbara, Santa Barbara, CA, USA, from 1995 to 1997. He joined the University of Minnesota in 1997. He is an expert on spin dynamics and transport in metals and semiconductors and has written over 70 refereed publications. He pioneered the development of time-resolved Kerr microscopy as a tool for spin wave spectroscopy. His work with C. Palmström on hybrid ferromagnet-semiconductor heterostructures has included the first demonstration of spin injection into semiconductors from a Heusler alloy and a perpendicular anisotropy material. In extending this work to device physics, they have demonstrated the first completely functional FM/SC/FM lateral spin valve and the first completely electrical measurement of the direct spin Hall effect in n-GaAs.

Dr. Crowell has been a Fellow of the American Physical Society since 2008.

Steven J. Koester (Senior Member, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of California Santa Barbara, Santa Barbara, CA, USA, in 1995. He is a Professor in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. From 1997 to 2010, he was a Research Staff Member at the IBM T. J. Watson Research Center and made significant contributions to the development of strained Si and SiGe MOSFETs, Ge photodetectors and III–V MOSFETs. From 2006 to 2010, he served as Manager of Exploratory Technology at IBM Research, where his team investigated advanced devices and integration concepts for use in future generations of microprocessor technology. Since joining the University of Minnesota in 2010, his research has focused on novel device concepts, with an emphasis on graphene and 2-D materials. Currently, he collaborates with J.-P. Wang, C. H. Kim, and P. A. Crowell to investigate spin-based logic using graphene interconnects. He has edited seven volumes, and authored or coauthored four book chapters and over 170 journal publications and conference presentations. He holds 61 U.S. patents.

Dr. Koester is an Associate Editor for IEEE ELECTRON DEVICE LETTERS.

Sachin S. Sapatnekar (Fellow, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 1992. He holds the Henle Chair in the Department of Electrical and Computer Engineering and a Distinguished McKnight University Professorship at the University of Minnesota, Minneapolis, MN, USA. His research interests are in the area of design automation of integrated systems.

Dr. Sapatnekar received the Semiconductor Research Corporation Technical Excellence Award in 2003 and the Semiconductor Industry Association University Researcher Award in 2013. He served as General Chair of the ACM/EDAC/IEEE Design Automation Conference (DAC) in 2010 and as Editor-in-Chief of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN from 2010 to 2013. His publications have received seven best paper/poster awards at various conferences and an International Conference on Computer-Aided Design (ICCAD) Ten-Year Retrospective Most Influential Paper Award.

Jian-Ping Wang received the Ph.D. degree in physics from Chinese Academy of Sciences (CAS), Beijing, China, in 1995. He is the Distinguished McKnight University Professor of Electrical and Computer Engineering, and a member of the graduate faculty in Physics and Chemical Engineering at the University of Minnesota, Minneapolis, MN, USA. He is a leader in experimental applied magnetics research and spintronic devices and is the Director of the Center for Spintronic Materials, Novel Interface and Architectures (C-SPIN) and the Associate Director of the Center for Micromagnetics and Information Technologies. He established and managed the Magnetic Media and Materials program at the Data Storage Institute, Singapore, as the Founding Program Manager, from 1998 to 2002. He has authored or coauthored more than 200 publications in peer-reviewed top journals and conferences and holds 30 patents. His current research programs focus on searching, fabricating, and fundamentally understanding new nanomagnetic and spintronic materials and devices. He was the first to demonstrate the perpendicular spin transfer torque device, and he has been performing pioneering research on magnetic-tunnel-junction-based logic devices.

Dr. Wang received the Information Storage Industry Consortium (INSIC) technical award in 2006 for his pioneering work in exchange coupled composite magnetic media.

Chris H. Kim received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2004. He is an Associate Professor in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. His group has expertise in digital, mixed-signal, and memory circuit design in silicon and beyond-silicon (such as flexible electronics and spintronics) technologies. He has extensive experience in both CMOS circuit design and novel circuit architectures, and has made significant contributions in the area of reliability monitor designs, embedded memory design, time-based mixed-signal circuits, and power delivery techniques.

Prof. Kim is the recipient of a National Science Foundation (NSF) CAREER Award, a McKnight Foundation Land-Grant Professorship, a 3M Non-Tenured Faculty Award, Design Automation Conference/International Solid-State Circuits Conference (DAC/ISSCC) Student Design Contest Awards, IBM Faculty Partnership Awards, and an IEEE Circuits and Systems Society Outstanding Young Author Award.