Part of the Chip Design Magazine Network

Design For Reliability

By Arvind Shanmugavel
Faster processors, lower power targets, and shrinking technology nodes have increased the complexity of integrated circuit (IC) reliability analysis. With the 20nm node becoming mainstream, IC design teams are quickly re-tooling their analysis methodologies to simulate and capture various reliability failure mechanisms. Electromigration (EM) analysis, thermal analysis, and electrostatic discharge (ESD) analysis are among the critical reliability checks that need to be performed on an IC.

Shrinking Design Cycles and Margins
IC design teams face constant pressure to deliver products that are cheaper and better than those of previous generations. This has forced design teams to re-architect their designs, adding new functionality and adopting aggressive scaling through technology migration to keep up with market demands. The time-to-market needs of the mobile industry require IC design teams to compress design cycles. These shorter design cycles, coupled with the complexity of new process node requirements, demand earlier verification. Reliability verification is no longer performed toward the end of a project as a sign-off metric; it is performed from an early design stage to capture potential reliability issues that could cause schedule slips.

There are several challenges on the reliability front when migrating to the 20nm node. Electromigration verification on interconnects and ESD verification are among the most challenging. Margins for electromigration and ESD at the 20nm node are smaller than ever. Minimum-width rules for metals are seldom used in library cells due to electromigration and self-heat requirements. The ESD operating window, which lies between the device breakdown voltage and the device operating voltage, is also at its smallest at the 20nm node. To meet ESD standards, it is critical to use a simulation-driven approach to model Human Body Model (HBM), Machine Model (MM), and Charged Device Model (CDM) events and to ensure precise placement of clamps and diodes. Current density checks are also mandatory to ensure that ESD events do not cause metal burnout.

Application-Aware Reliability
Designing for worst-case reliability is no longer acceptable with today’s compressed chip design cycles. ICs need to be optimally designed for reliability and not overdesigned for a worst-case scenario. The operating condition of the SoC needs to be well understood before performing electromigration analysis. For example, if an application processor is running a particular vector set, such as video encoding or audio playback, one needs to estimate the cumulative electromigration impact based on all these vectors through the lifetime of the device.
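The cumulative, application-aware view described above can be sketched as a simple duty-cycle-weighted current calculation. The mode names, currents, and EM limit below are illustrative assumptions for the sketch, not values from any real process design kit or the tools discussed in this article.

```python
# Hedged sketch: lifetime-weighted EM stress from application modes.
# All mode names, currents, and the EM limit are assumed values.

modes = {
    # mode: (fraction of device lifetime, average wire current in mA)
    "video_encode": (0.10, 1.8),
    "audio_playback": (0.25, 0.6),
    "idle": (0.65, 0.1),
}

def lifetime_avg_current(modes):
    """Duty-cycle-weighted average current over the device lifetime."""
    return sum(frac * i_avg for frac, i_avg in modes.values())

EM_LIMIT_MA = 1.0  # assumed average-current EM limit for this wire

i_eff = lifetime_avg_current(modes)
print(f"effective current: {i_eff:.3f} mA, "
      f"{'PASS' if i_eff <= EM_LIMIT_MA else 'FAIL'}")
```

Note that the wire passes against its lifetime-weighted current even though one mode (video encode) exceeds the limit on its own; that is the overdesign a worst-case-only methodology would impose.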

Thermal reliability is another area that should be application-aware and end-use-aware. For example, the same processor can be used in a handheld device or in a standalone media server, which have different end uses and applications. The Thermal Design Power (TDP) of the processor for its end use needs to be well understood before performing thermal simulations. One also needs to understand the cooling method, such as forced or passive cooling, to design for thermal reliability.

Reliability Scaling Trends for Libraries
Standard-cell library design and reuse has been one of the cornerstones of SoC design methodologies for decades. Moving to the 20nm node brings its own set of challenges for library design. Typically, a single set of standard-cell libraries is designed for a particular process node and reused for a variety of end products. However, with current scaling trends in electromigration, a cell designed to operate at one frequency can no longer reliably operate at a much higher frequency. For example, a sample 8X clock buffer operating at 500 MHz can pass signal EM checks using minimum-width routes on metal1 at the 20nm node. However, the same 8X buffer operating at 1 GHz will fail signal EM checks on those minimum-width routes. Overdesigning for the worst-case operating frequency typically incurs an area penalty and can also affect design schedules.
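The frequency dependence in the buffer example follows from the average switching current of a net scaling linearly with frequency (I = C x V x f x activity). The sketch below illustrates how the same route can pass at 500 MHz and fail at 1 GHz; the capacitance, voltage, width, and EM limit are assumed values chosen for illustration, not foundry data.

```python
# Hedged sketch: the same 8X buffer route passing signal EM at 500 MHz
# but failing at 1 GHz. All numeric values are illustrative assumptions.

C_LOAD_F = 50e-15        # assumed load capacitance of the 8X buffer: 50 fF
VDD = 0.9                # assumed supply voltage, volts
J_LIMIT_MA_PER_UM = 1.0  # assumed average-current EM limit per um of width
MIN_WIDTH_UM = 0.032     # assumed minimum metal1 width, um

def avg_switching_current_ma(freq_hz, activity=1.0):
    """Average current drawn by a net toggling at freq_hz, in mA."""
    return C_LOAD_F * VDD * freq_hz * activity * 1e3

for freq in (500e6, 1e9):
    i_avg = avg_switching_current_ma(freq)
    limit = J_LIMIT_MA_PER_UM * MIN_WIDTH_UM
    status = "PASS" if i_avg <= limit else "FAIL"
    print(f"{freq/1e6:.0f} MHz: {i_avg:.4f} mA vs {limit:.4f} mA -> {status}")
```

Doubling the clock frequency doubles the average current on the net, which is why a library characterized at one frequency cannot simply be reused at a much higher one.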

Thermal Impact on Reliability
With passive cooling being mainstream in mobile products, computation of the thermal characteristics of the system and its ICs is critical. System integrators need to understand the true thermal boundaries of the system, whereas IC designers need to understand the on-die thermal characteristics for better reliability analysis. Historically, system designers would use a single heat source to model the behavior of a chip within a system. However, understanding the spatial thermal characteristics of the IC in a mobile chassis is critical to the success of a passive cooling scheme. System-centric platforms such as IcePak from Ansys provide the capability for system-level thermal analysis of the board and chassis. IC-centric platforms such as Sentinel-TI from Ansys provide the capability to analyze the thermal behavior of the IC within the context of its package boundary constraints.

Electromigration limits have an exponential dependency on the operating temperature of the die. Small variations in operating temperature can therefore cause large variations in predicted EM failures. It is important to use an accurate thermal profile of the die during EM analysis. Die-centric platforms such as RedHawk and Totem from Ansys provide the ability to extract an on-die thermal profile and analyze power and signal EM with its impact included.
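The exponential temperature dependence noted above is commonly captured by Black's equation, MTTF = A x J^-n x exp(Ea / (kB x T)). The sketch below uses textbook-style values for the prefactor, current exponent, and activation energy to show how a modest temperature rise sharply reduces predicted lifetime; these parameters are assumptions, not characterized foundry data.

```python
import math

# Hedged sketch of Black's equation for EM lifetime. The prefactor (a),
# current exponent (n), and activation energy (ea_ev) are assumed,
# textbook-style values.

K_B = 8.617e-5  # Boltzmann constant, eV/K

def mttf(j_ma_um2, t_kelvin, a=1.0, n=2.0, ea_ev=0.9):
    """Black's equation: MTTF = A * J^-n * exp(Ea / (kB * T))."""
    return a * j_ma_um2 ** (-n) * math.exp(ea_ev / (K_B * t_kelvin))

# With these parameters, a 10 C rise at ~100 C roughly halves the lifetime.
ratio = mttf(1.0, 383.15) / mttf(1.0, 373.15)
print(f"MTTF ratio (110 C vs 100 C): {ratio:.2f}")
```

This is why a single worst-case die temperature is a blunt instrument: a spatially accurate thermal profile shifts local EM limits by large factors.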

The Future of Reliability Verification
Reliability verification techniques have changed drastically from one process node to another. As we scale into the sub-10nm range, complex multi-physics simulation capabilities will need to be developed for reliability analysis. Statistical failure models for electromigration will need to be developed for various operating scenarios and end-use applications. 3D-ICs will unleash a new level of verification complexity for thermal reliability. Multi-physics platforms that can simulate complex failure mechanisms and apply statistical models will form the next wave of reliability verification capabilities.

—Arvind Shanmugavel is director of application engineering at Apache Design Inc.
