Published in Chip Design Magazine

Hot Chips? ... Not!

Efficient power management in the 90-nanometer foundry reference flow

On-chip leakage power consumption increases as threshold voltage decreases. As a result, the low threshold voltages associated with 90-nanometer process geometries demand the use of a power-aware design flow. Just as previous technology nodes required tradeoffs between delay and area, the designer must now balance delay with power.

Without a power-aware design flow, 90-nanometer system-on-chip (SoC) leakage power would reach unacceptable levels, even when the chip is in standby mode. Fortunately, a good power management flow, coupled with flexible library options, allows designers to meet both power goals and timing requirements. Such flows are often important in meeting leakage power goals at larger technology nodes (particularly 130 nanometers), but at 90 nanometers, leakage power has become a critical issue for all designs.

This article provides an overview of the power issues at 90 nanometers and describes a practical design flow that addresses these issues: a flow developed for the 90-nanometer process (CMOS 9SF) at IBM Microelectronics. Working with the library models and process parameters defined by IBM, a development team from Synopsys Professional Services created the RTL-to-GDSII foundry reference flow that integrates threshold voltage scaling for effective leakage power management.

The design flow described here uses threshold voltage scaling to meet both timing and leakage-power goals. The flow also includes a vectorless switching activity annotation method that enables analysis of power consumption without simulation vectors.

Figure 1: The dynamic and static (leakage) currents associated with a CMOS device. The main current of concern in limiting leakage power consumption is the subthreshold current.

Because power optimization has become such an integral part of the flow, it must be paired with accurate and efficient power analysis. The total power consumed by a chip equals dynamic power plus static power. Dynamic power is the power consumed in switching logic states, and static power (leakage power) is consumed while transistors are not switching. Although CMOS transistors have some reverse-biased diode leakage from drain to substrate, the larger portion of leakage power is due to the subthreshold current that flows between source and drain when a transistor is turned off. (see Figure 1)

The subthreshold leakage current is problematic because it increases as transistor threshold voltages (Vth) decrease. The move to the 130-nanometer technology node led to a significant rise in leakage power consumption. At 90 nanometers, leakage power can represent as much as 50 percent of the total power consumed by a chip, depending on the design. In addition, high leakage power can exponentially increase reliability-related failures, even in standby mode. (see Figure 2)

Figure 2: Leakage power consumption has grown exponentially in very deep submicron CMOS chips. A leakage-management methodology is particularly crucial at 90 nanometers and below.

As CMOS technologies scale down, the main approach for reducing power has been to scale down the supply voltage (VDD). Voltage scaling is a good technique for controlling dynamic power because of the quadratic effect of voltage on power consumption. However, simply reducing the power supply slows the circuit, because switching delay grows with load capacitance and with the ratio Vth/VDD. To maintain sufficient drive strength for fast switching, Vth must decrease in proportion to VDD, and it is this lower threshold voltage that makes leakage power grow as process geometries shrink.
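To make the tradeoff concrete, the widely used alpha-power-law delay model (a textbook approximation, not a formula taken from the IBM flow) captures both effects:

    t_d \propto \frac{C_L \, V_{DD}}{(V_{DD} - V_{th})^{\alpha}}, \qquad 1 \le \alpha \le 2

Scaling VDD down with Vth held fixed shrinks the (VDD - Vth) term and slows the gate; lowering Vth restores the speed, but at the cost of the subthreshold leakage described above.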

Note that reducing the threshold voltage of a particular CMOS transistor is necessary only to allow that transistor to switch at the fastest possible speed. If the transistor’s highest speed is not needed to meet timing goals, its Vth can be increased to reduce leakage power consumption. This concept makes it possible for a power-aware design flow to balance timing requirements with leakage power goals.

Multiple-threshold libraries

Balancing timing and leakage power requires the use of multiple libraries whose cells operate at different threshold voltages. Usually two libraries suffice: one with high Vth and one with low Vth.

In the case of the IBM CMOS 9SF process, these libraries are referred to as RVT for regular Vth and LVT for low Vth. IBM has also characterized a third library (eMPU) for this process, one that has an even lower Vth. This library targets applications such as processors that must push performance to the limit. The initial foundry reference flow for leakage management uses only the RVT and LVT libraries. However, the same flow can readily encompass the eMPU library in the future. (see Figure 3)

Figure 3: Comparison of normalized standby power and delay values between IBM's CMOS 9SF process RVT and LVT libraries.

Data in Figure 3 compares the normalized standby power values for the RVT and LVT libraries. Note the exponential relationship between these values. In general, every 90 mV reduction in Vth increases leakage by an order of magnitude. The figure also shows the performance advantage of the low-Vth library. The 9SF process supports this performance with copper interconnect on as many as nine layers plus one aluminum/copper layer. The process also provides options such as low- and high-Vth FETs, as well as triple-well structures.
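The order-of-magnitude-per-90-mV behavior follows from the standard subthreshold current model (again a textbook expression, quoted here only to explain the figure):

    I_{sub} \propto 10^{-V_{th}/S}

where S is the subthreshold swing. With S on the order of 90 mV per decade, each 90 mV reduction in Vth multiplies the leakage current by ten, which is the exponential relationship visible in Figure 3.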

Optimization meets timing goals by using the low-Vth cells on critical timing paths and high-Vth cells on non-critical paths. The low- and high-Vth cells have the same footprint for equivalent functions, so synthesis and layout tools can swap the cells as needed to meet timing requirements on a particular path without changing the layout. Maintaining the same footprint requires careful library design because the low-Vth cells have a different well implant to create their lower threshold voltage. If this implant extended to the edges of the cell, it could overlap the edge of an adjacent high-Vth cell. The cells are therefore designed with a small buffer space around the edges, so that RVT and LVT cells can be placed side by side without problems.

Problems can occur if cells of the same Vth type are placed with a small space between them and a filler cell of the opposite Vth type is used to fill the gap. This mismatched filler creates a gap in the implant regions that violates design rules. The design flow prevents this problem from occurring by performing intelligent filler cell insertion.

The leakage management reference flow

The purpose of a foundry reference flow is to ensure that foundry users have a smooth path from chip design to production. As with the 130-nanometer process, the 90-nanometer reference flow is being validated in silicon using a test chip that incorporates technology from several sources, including IBM (Burlington, VT), Synopsys, Inc. (Mountain View, CA), and ARM Ltd. (Cambridge, U.K.).

The reference flow for the new CMOS 9SF process carries over most of the flow that Synopsys developed for IBM's previous-generation 130-nanometer process (CMOS 8SFG) and is based on the Synopsys Galaxy design and Discovery verification platforms (Design Compiler for synthesis, Physical Compiler for unified synthesis and placement, and Astro for physical implementation). The key addition for 90 nanometers is the leakage-management reference flow. (see Figure 4)

Figure 4: The reference flow for the IBM 90-nanometer process minimizes leakage power consumption. By using low-threshold-voltage/high-leakage cells only when needed, the flow minimizes leakage while maintaining performance.

The part of the flow that deals with leakage power management begins with Design Compiler’s initial synthesis of the circuit using the RVT library. Physical Compiler then performs additional optimizations using both the RVT and LVT libraries. In these optimizations, only cells in critical paths are replaced by LVT cells, as needed, to meet timing. Physical Compiler makes these replacement choices using delay calculations based on physical placement. The foundry reference flow includes scripts that automate these synthesis steps.
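In dc_shell-style Tcl, the two-pass structure might be sketched as follows. The library file names and constraint value are invented for illustration, and the actual reference-flow scripts are more elaborate; compile, physopt, and set_max_leakage_power are standard Synopsys commands, but their exact use here is an assumption:

    # Pass 1: initial synthesis with the regular-Vth library only
    set target_library "cmos9sf_rvt.db"                  ;# assumed file name
    compile

    # Pass 2: optimize with both libraries so that only critical
    # paths pick up LVT cells (run in Physical Compiler)
    set target_library "cmos9sf_rvt.db cmos9sf_lvt.db"   ;# assumed file names
    set_max_leakage_power 0.72    ;# constraint selection is discussed below
    physopt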

This portion of the flow relies on a reasonable leakage constraint. Setting the constraint to zero is not recommended, because the synthesis tools will spend many CPU cycles trying to drive the leakage current all the way to zero with little or no improvement past a certain point. At the same time, the constraint must be set low enough that the synthesis tool makes a worthwhile effort to use low-leakage cells where timing slack allows.

One method for selecting a suitable constraint value is to run a power analysis with Power Compiler (report_power command) before Physical Compiler’s optimizations. Compare the reported leakage power against the target budget specified by the design requirements. Subtract about 10 percent from the budget and use that value as the maximum leakage constraint. For example, if the power analysis reports a leakage value of 1.0 watts and the budget is 0.8 watts, set the maximum leakage constraint to 0.72 or 0.70 watts for most efficient optimization.
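In script form, that rule of thumb amounts to the following sketch (report_power is the Power Compiler command named above; the example assumes watts as the power unit):

    report_power                     ;# suppose this reports 1.0 W of leakage
    set leakage_budget 0.8           ;# target from the design spec, in watts
    # Back off about 10 percent from the budget: 0.8 * 0.9 = 0.72 W
    set_max_leakage_power [expr {$leakage_budget * 0.9}]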

After synthesis, Astro performs more leakage-related steps as part of routing. Based on the more accurate delay information available at this stage, Astro swaps RVT cells back in for some LVT cells if the former can meet timing requirements.

Astro also inserts suitable filler cells next to the RVT or LVT cells, when needed. Typically, filler cells are used to fill any spaces between regular library cells to avoid planarity problems and provide electrical continuity for power and ground. Because the RVT cells have a different diffusion layer over them, an RVT filler cell needs to be placed between RVT cells, and an LVT filler cell needs to be placed between LVT cells. A script that comes with the foundry reference flow applies a multi-Vth-aware strategy, so that Astro inserts the correct filler cells.

While the use of multiple-Vth libraries can limit leakage power consumption, other power-related issues have also become critical across a wide range of designs today. Leakage and dynamic power combine to increase a chip’s overall power consumption, which in turn reduces battery life in mobile applications and often raises heat dissipation to unacceptable levels. Additionally, a designer needs to analyze a chip’s power distribution network to prevent electromigration and IR drop problems.

Analyzing a design to deal with these issues requires values for switching activity, because dynamic power dominates once leakage power is under control. A device's dynamic power is P = C·V²·F, where C is the load capacitance, V is the voltage swing, and F is the rate of logic-state transitions (the switching activity).
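As an arbitrary worked example (the numbers are illustrative, not from the reference flow): switching a total load of 1 nF through a 1.0 V swing at 100 million transitions per second dissipates

    P = 10^{-9}\,\mathrm{F} \times (1.0\,\mathrm{V})^2 \times 10^{8}\,\mathrm{s}^{-1} = 0.1\,\mathrm{W}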

Switching activity is usually derived from simulation data, but this source can be problematic for many reasons. For example, the design may be too large to simulate at the gate level, where the most accurate switching data is available. Furthermore, an incomplete netlist may cause inaccurate timing that makes gate-level simulations impossible. The designer might also lack a testbench and/or test cases that force maximum power utilization.

In fact, simulation test cases rarely generate the worst-case switching activity needed for power analysis. Designers can never be sure whether they've covered the cases that cause the highest power consumption. That uncertainty parallels the problem of using simulation to verify timing, where designers can never be sure they've covered all the timing paths. As a result, static timing analysis (STA) is now the predominant timing approach, because it offers complete coverage.

Power analysis coverage

To achieve a similar degree of "coverage" for power analysis, the design flow for IBM's 90-nanometer process includes vectorless power analysis. Using this technique, the designer annotates worst-case switching values on all of a design's ports and registers, the architecturally stable parts of the design. Designers can add these annotations manually (via Power Compiler), although it's quite easy to create a TCL script that automates the task.
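A minimal sketch of such a script follows. set_switching_activity, all_inputs, and all_registers are standard commands in the Synopsys tools, but the numbers are placeholders for the designer's worst-case assumptions, and the interpretation of the toggle rate (per clock cycle versus per time unit) depends on the tool setup:

    # Annotate assumed worst-case switching on the architecturally
    # stable points of the design: primary inputs and register outputs.
    # The 0.5/0.25 values below are placeholders, not recommendations.
    set_switching_activity -static_probability 0.5 -toggle_rate 0.25 [all_inputs]
    set_switching_activity -static_probability 0.5 -toggle_rate 0.25 \
        [all_registers -output_pins]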

These switching values then need to be propagated throughout the design. Power Compiler performs this task statistically and heuristically through all the logic cones, so that all nets are automatically annotated. The tool then writes a net switching report, and a Perl script processes the report data into a sequence of Astro net-switching-activity annotation commands. As Astro executes these commands, all nets are annotated with the appropriate switching data.

Following these steps in the flow, designers can perform any type of power-related analysis based on the switching data. This method requires no simulations and thus avoids all the limitations of simulation-based power analysis. Most importantly, the method provides complete switching activity annotation, just as STA offers complete timing coverage.

Avoiding the shotgun approach

It’s possible to assign switching activity to nets in Astro using wildcards for matching netnames. However, all nets would then have the same switching values. This shotgun-style approach usually results in grossly over-pessimistic and unrealistic estimates of switching activity. The vectorless approach implemented in the IBM flow allows designers to specify pessimistic values for port and register switching, while also allowing the tool (Power Compiler) to make more realistic estimates for the remaining nets. With power becoming so critical in SoC design, the days of over-designing for power are over.

Design flows for advanced technology nodes such as the IBM 90-nanometer process require accurate power analysis and power-aware methods that manage both dynamic and leakage power consumption. Threshold voltage scaling is a highly automated technique for dramatically reducing leakage power consumption while maintaining high design performance. This technique, in combination with the many available methods for analyzing and managing dynamic power, successfully meets a wide variety of design goals for both battery-powered and non-battery-powered applications. These advanced strategies for power management are crucial in the drive toward increasingly sophisticated and power-efficient products.

Lance Pickup is a senior engineer in the Foundry Design Enablement group at IBM Microelectronics, where he has developed ASIC and RF/mixed-signal design methodologies. He is also involved with enabling IP used in COT flows.

Scott Tyson is a principal consultant for Synopsys, Inc. providing methodology consulting and design services with Synopsys Professional Services. Previously, Tyson spent 11 years at IBM in personal computer and workstation development, including architecture, system, and ASIC design.

