Published in June / July 2005 issue of Chip Design Magazine
Manage Complexity in Nanometer SoC Designs
To handle complex designs, tools must address a variety of scaling, optimization, automation, and reuse issues.
The rapid increase in design complexity has become a serious limiting factor in nanometer system-on-a-chip (SoC) designs. This sharp increase is driven by two factors. One is the exponential rise in the number of devices integrated on a single chip. The other is the set of new issues associated with technology scaling, such as interconnect, noise, power, and thermal limitations. As a result, the gap between silicon capacity and design productivity is widening. A few years ago, a study by SEMATECH showed that the level of on-chip integration (expressed as the number of transistors per chip) increases at a compound annual growth rate of approximately 58%. Yet design productivity (measured as the number of transistors per person-month) grows at a compound annual rate of only 21%. To manage the exponential increase in design complexity, efforts must concentrate on the following four areas: development of more scalable optimization engines, more efficient solutions to various scaling-related problems, a higher degree of design automation, and design reuse.
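To get a feel for how quickly the two growth rates diverge, the short sketch below compounds them against each other. The 58% and 21% figures come from the study quoted above; the function names are illustrative only, and the calculation assumes both rates stay constant, which real data would not.

```python
import math

# Annual compound growth rates quoted in the text (SEMATECH study).
CAPACITY_GROWTH = 0.58      # transistors per chip
PRODUCTIVITY_GROWTH = 0.21  # transistors per person-month

# Each year, capacity outgrows productivity by this factor.
YEARLY_GAP_FACTOR = (1 + CAPACITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)


def gap_ratio(years: float) -> float:
    """Factor by which capacity has outgrown productivity after `years`."""
    return YEARLY_GAP_FACTOR ** years


def years_to_double_gap() -> float:
    """Years needed for the capacity/productivity gap to double."""
    return math.log(2) / math.log(YEARLY_GAP_FACTOR)


if __name__ == "__main__":
    print(f"gap doubles every {years_to_double_gap():.1f} years")
    print(f"gap after 10 years: {gap_ratio(10):.1f}x")
```

Under these assumptions the gap doubles roughly every two and a half years, so the staffing or tool productivity needed to fill a chip grows by more than an order of magnitude in a decade.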
Scalable Optimization Engines
Most IC design problems can be formulated as multi-objective constrained optimization problems. The optimality and scalability of the optimization engine significantly affect the design methodology and quality. For example, one has to partition a large design into smaller pieces for logic synthesis because most logic-synthesis engines have capacity and scalability limitations. Two years ago, our studies showed that existing circuit-placement tools were surprisingly far from optimal [1]. We used a set of carefully constructed circuits, “placement examples with known optima” (PEKO), that match many characteristics of industrial circuits. On these cases, the total wirelengths produced by the then-leading placement tools from both industry and academia were 70% to 150% longer than the optimal wirelengths. Although recent placement research (e.g., mPL2 [2]) has narrowed this gap considerably, it still reveals a great opportunity for improvement in design-automation tools. Note that a 30% wirelength reduction is equivalent to the benefit of one generation of process-technology scaling or the introduction of copper interconnects, both of which required multi-billion-dollar investments.
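The idea of measuring a placer against a known optimum can be illustrated on a toy instance. The sketch below is not the PEKO generator or any tool from [1]. It assumes half-perimeter wirelength as the metric (a common estimate, though the text above does not name one), and every identifier in it is hypothetical.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[float, float]]  # cell name -> (x, y) location
Netlist = List[List[str]]                   # each net lists the cells it connects


def hpwl(nets: Netlist, placement: Placement) -> float:
    """Total half-perimeter wirelength: sum of each net's bounding-box half perimeter."""
    total = 0.0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def excess_over_optimal(wirelength: float, optimal: float) -> float:
    """Fraction by which a wirelength exceeds the known optimum (0.7 means 70% longer)."""
    return (wirelength - optimal) / optimal


# Toy instance: a chain a-b-c-d on unit-spaced slots in one row.
# Each two-pin net needs at least length 1, so the in-order row is optimal.
NETS: Netlist = [["a", "b"], ["b", "c"], ["c", "d"]]
OPTIMAL: Placement = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
SCRAMBLED: Placement = {"a": (0, 0), "c": (1, 0), "b": (2, 0), "d": (3, 0)}

if __name__ == "__main__":
    best = hpwl(NETS, OPTIMAL)
    found = hpwl(NETS, SCRAMBLED)
    print(f"optimal={best}, found={found}, excess={excess_over_optimal(found, best):.0%}")
```

Because the optimum is known by construction, the quality of any placement of this instance can be reported as a percentage over optimal, which is the kind of figure quoted above.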
At UCLA, we’re currently using the multilevel method as a general framework for developing scalable algorithms for IC design. Multilevel methods have been studied extensively over the last 20 years as a means of accelerating numerical algorithms for partial differential equations. More recently, multilevel techniques have been applied successfully to circuit partitioning, which led to the widely known hMetis package for hypergraph partitioning [3]. Our recent efforts to apply multilevel techniques to placement and routing also have been very successful. Those efforts led to the highly scalable placement engine mPL [2] and the routing package MARS [4].
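The multilevel idea (coarsen the problem, solve the small version, then project the solution back and refine it at each level) can be sketched on graph bisection. This is a minimal illustration and not the algorithm used in hMetis, mPL, or MARS. It works on plain graphs rather than hypergraphs, and the heavy-edge matching, greedy initial split, and greedy refinement are simple stand-ins chosen for brevity. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, Dict[int, int]]  # node -> {neighbor: edge weight}, symmetric


def cut_size(g: Graph, side: Dict[int, int]) -> int:
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for u in g for v, w in g[u].items() if u < v and side[u] != side[v])


def coarsen(g: Graph, weight: Dict[int, int]) -> Tuple[Graph, Dict[int, int], Dict[int, int]]:
    """Heavy-edge matching: merge each unmatched node with its heaviest unmatched neighbor."""
    parent: Dict[int, int] = {}
    next_id = 0
    for u in sorted(g):
        if u in parent:
            continue
        candidates = [(w, v) for v, w in g[u].items() if v not in parent and v != u]
        parent[u] = next_id
        if candidates:
            parent[max(candidates)[1]] = next_id
        next_id += 1
    coarse: Graph = {c: {} for c in range(next_id)}
    coarse_weight = {c: 0 for c in range(next_id)}
    for u in g:
        coarse_weight[parent[u]] += weight[u]
        for v, w in g[u].items():
            cu, cv = parent[u], parent[v]
            if cu != cv:  # edges inside a merged pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, coarse_weight, parent


def initial_partition(weight: Dict[int, int]) -> Dict[int, int]:
    """Greedy split: heaviest nodes first, each to the currently lighter side."""
    side: Dict[int, int] = {}
    load = [0, 0]
    for u in sorted(weight, key=lambda n: -weight[n]):
        s = 0 if load[0] <= load[1] else 1
        side[u] = s
        load[s] += weight[u]
    return side


def refine(g: Graph, weight: Dict[int, int], side: Dict[int, int], max_imbalance: int) -> Dict[int, int]:
    """Move a node across the cut whenever that lowers the cut and keeps the sides balanced."""
    load = [0, 0]
    for u in g:
        load[side[u]] += weight[u]
    improved = True
    while improved:  # terminates: every move strictly reduces the integer cut
        improved = False
        for u in sorted(g):
            s = side[u]
            gain = sum(w if side[v] != s else -w for v, w in g[u].items())
            new_diff = abs((load[s] - weight[u]) - (load[1 - s] + weight[u]))
            if gain > 0 and new_diff <= max_imbalance:
                side[u] = 1 - s
                load[s] -= weight[u]
                load[1 - s] += weight[u]
                improved = True
    return side


def multilevel_bisect(g: Graph, coarsest_size: int = 4, max_imbalance: int = 2) -> Dict[int, int]:
    """Coarsen until the graph is small, partition it, then uncoarsen and refine level by level."""
    weight = {u: 1 for u in g}
    levels: List[Tuple[Graph, Dict[int, int], Dict[int, int]]] = []
    while len(g) > coarsest_size:
        coarse, coarse_weight, parent = coarsen(g, weight)
        if len(coarse) == len(g):  # nothing left to merge
            break
        levels.append((g, weight, parent))
        g, weight = coarse, coarse_weight
    side = refine(g, weight, initial_partition(weight), max_imbalance)
    for fine_g, fine_weight, parent in reversed(levels):
        side = {u: side[parent[u]] for u in fine_g}  # project coarse solution down
        side = refine(fine_g, fine_weight, side, max_imbalance)
    return side


def two_cliques() -> Graph:
    """Example graph: two 4-node cliques joined by the single edge 3-4."""
    g: Graph = {u: {} for u in range(8)}
    for group in (range(0, 4), range(4, 8)):
        for u in group:
            for v in group:
                if u != v:
                    g[u][v] = 1
    g[3][4] = g[4][3] = 1
    return g
```

Calling `multilevel_bisect(two_cliques())` separates the two cliques. The scalability argument is that most of the work happens on small coarse problems, while refinement at each finer level only needs local moves.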
Solutions To Scaling-Related Problems
As we move into nanometer technologies, several scaling-related issues become very important. They include interconnect-bottleneck, noise-sensitivity, power, and thermal limitations. These problems have significantly complicated the design process. To efficiently solve these problems, progress is needed in several directions.
One direction is to develop novel predictable and robust synthesis techniques. In order to reduce the uncertainty of interconnect delays, for example, gain-based-synthesis and physical-synthesis approaches have been proposed and used successfully in existing synthesis tools. To tolerate interconnect latency, techniques for multi-cycle on-chip communication also have been proposed and have shown promising results. These techniques are based on the regularly distributed register (RDR) architecture [5] and latency-insensitive designs [6]. They greatly reduce the difficulty in handling interconnect uncertainty in high-level designs.
Another approach is to develop integrated modeling, analysis, and synthesis capabilities. For noise and power control, for example, interconnect capacitance needs to be modeled and estimated throughout the synthesis process (i.e., during logic optimization, technology mapping, placement, and routing). The models and analysis tools used at different stages should be consistent and have increasing accuracy as more physical information is available. In addition, various optimization operations need to be applied throughout the design process--from the logic domain to the physical domain and guided by the analysis results of increasing accuracy. These operations include netlist remapping, driver sizing, buffer insertion, and wire spacing for noise and power optimization. A unified tool (single binary) with integrated synthesis, physical-design, and analysis capabilities provides a promising solution.
Higher Degree Of Design Automation
As process technologies continue to shrink, our efficiency in managing complex designs is determined by the degree of design automation. That degree of automation, in turn, is determined by the degree of tool integration and the abstraction level. Given the multiple, highly related design concerns in nanometer technologies like delay, noise, power, and thermal constraints, it is no longer efficient and effective for the designers--COTS customers, in particular--to integrate “best-of-class” point tools for modeling, analysis, synthesis, and optimization. The EDA vendors need to provide highly integrated solutions in the implementation and verification spaces. A single implementation tool from RTL to GDSII is one example. In Figure 1, the single-execution binary embodies an extensive set of modeling, analysis, synthesis, and physical-design capabilities.

Figure 1: This illustration shows one example of a unified IC-implementation solution.
A higher degree of design automation also can be achieved by raising the design-abstraction level. Electronic-system-level (ESL) design automation has been identified by Dataquest [7] as the next productivity boost for the semiconductor industry. Some recent success has been witnessed in ESL simulation. If it lacks robust and efficient behavioral synthesis, however, the transition to ESL design won’t be as well accepted as the one to RTL in the 1990s. This technology, which also is known as high-level synthesis, automatically compiles functional and/or algorithmic descriptions into optimized hardware architectures. Although behavioral synthesis has been a topic of research for almost two decades, it has never really caught on among chip designers. We believe that the previous failures are due to the following reasons:
Given the rapid increase in design complexity and the availability of robust RTL-to-GDSII flows, it is time to revisit behavioral synthesis--this time with full consideration of the physical reality. Recent research on combining behavioral synthesis with physical planning on the RDR architecture [5] is a good example of this direction.
Design Reuse
Another technique to address the design-complexity challenge is design reuse. In his keynote speech at DAC 2000, Dr. Theo Claasen, Philips Semiconductors’ Chief Technology Officer, classified design reuse into four levels: cell reuse, IP reuse, architecture reuse, and silicon reuse. Cell reuse and IP reuse are now well-accepted practices in the industry. But designers may not be as conscious of architecture reuse. A good example of architecture reuse is the evolution of the x86 architecture through multiple generations of Intel processors. Another example is the Xtensa architecture platform from Tensilica. The recently advocated platform-based design methodology [8][9][10] goes one step further along the line of architecture reuse. It should bring significant productivity gains.
The ultimate form of reuse is silicon reuse, in which the same silicon chip can be used for multiple applications. This type of reuse is achieved through the extensive use of on-chip programmable logic and microprocessors. Such programmable chips can be classified as general-purpose programmable chips or application-specific or domain-specific programmable chips. General-purpose programmable platforms are offered through FPGA vendors, such as Actel, Altera, and Xilinx. They provide FPGAs with embedded processors (hard or soft), embedded memories, and a large amount of programmable logic.
On the other hand, domain-specific platforms include a number of domain-specific customized blocks in addition to general-purpose processors and programmable logic. Such platforms are attractive to many applications. They present a rich potential for flexibility, cost, performance, and design-time tradeoff. For example, one possible application is to use such a platform to quickly design an application-specific instruction-set processor (ASIP). Here, the embedded processors can support an extensible instruction set and the programmable logic can implement application-specific instructions. The ASIP compilation flow in [11], as shown in Figure 2, can take a general application specified in C language. It can then identify application-specific instructions, synthesize them in programmable logic, and generate transformed programs for the ASIP. Encouraging speed-ups were reported based on the Altera Stratix platform [11]. Commercial ASIP chips are being offered by some startup companies, such as Stretch, Inc.

Figure 2: Here is the ASIP compilation flow in [11].
In addition to complete silicon reuse, partial reuse also is possible. Here, several layers of chips (including the device and low-level metal layers) are pre-fabricated. A few metal layers are available for customization. Examples include the well-known gate array and the structured ASIC, which also has been introduced recently (by Altera, ChipX, eASIC, Faraday, Flextronics, LSI Logic, and NEC). A typical structured ASIC consists of hard-coded functions, such as memory and microprocessors, and customizable logic gates. It uses a subset of metal/via layers for customization. The design complexity for structured ASICs is reduced because characterization, layout, and optimization for pre-fabricated layers only need to be done once for performance, density, and yield optimization. Structured ASICs also provide a good way to lower the NRE cost and reduce turnaround time. In general, ASICs provide the highest density, performance, and power efficiency but the highest NRE cost. FPGAs provide the lowest density, performance, and power efficiency due to the extensive use of programmable logic and programmable interconnects. But they have the lowest NRE cost. The structured ASICs lie in between. They may have FPGA-like logic cells, but all interconnections have direct metal connections without going through programmable switches.
Table 1 compares the costs of a typical 1-million-gate design in a 0.13-µm process for FPGA, structured-ASIC, and cell-based-ASIC implementations under different volume assumptions. One should consider all three silicon-implementation platforms for the best density, performance, and cost tradeoff. The IC implementation system from Magma allows the designer to target the same design to multiple silicon platforms, including FPGAs, structured ASICs, and cell-based IC designs.

Table 1: Comparison of FPGA, cell-based ASIC, and structured-ASIC development costs
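The volume dependence behind such a comparison can be sketched with a simple model in which total cost is a one-time NRE charge plus a per-unit cost times volume. The cost figures in the sketch are hypothetical placeholders in arbitrary units, chosen only to respect the ordering described in the text (FPGAs with the lowest NRE, cell-based ASICs with the highest). They are not the values from Table 1, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Platform:
    name: str
    nre_cost: float   # one-time non-recurring engineering cost
    unit_cost: float  # recurring cost per chip

    def total_cost(self, volume: int) -> float:
        return self.nre_cost + self.unit_cost * volume


def cheapest(platforms: List[Platform], volume: int) -> Platform:
    """Platform with the lowest total cost at the given production volume."""
    return min(platforms, key=lambda p: p.total_cost(volume))


def break_even_volume(low_nre: Platform, high_nre: Platform) -> float:
    """Volume above which the high-NRE, low-unit-cost platform becomes cheaper."""
    return (high_nre.nre_cost - low_nre.nre_cost) / (low_nre.unit_cost - high_nre.unit_cost)


# HYPOTHETICAL numbers in arbitrary cost units, for illustration only.
FPGA = Platform("FPGA", nre_cost=0, unit_cost=100)
STRUCTURED_ASIC = Platform("structured ASIC", nre_cost=100_000, unit_cost=20)
CELL_BASED_ASIC = Platform("cell-based ASIC", nre_cost=1_000_000, unit_cost=5)
PLATFORMS = [FPGA, STRUCTURED_ASIC, CELL_BASED_ASIC]

if __name__ == "__main__":
    for volume in (1_000, 10_000, 100_000):
        print(volume, cheapest(PLATFORMS, volume).name)
```

With placeholder inputs of this shape, the cheapest platform shifts from FPGA to structured ASIC to cell-based ASIC as volume grows, which is why all three are worth evaluating for a given design.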
Design complexity is becoming a serious limiting factor in nanometer SoC designs. We identified four key areas for design-complexity management: the development of more scalable optimization engines, more efficient solutions to various scaling-related problems, a higher degree of design automation, and design reuse. The collective efforts of the EDA industry and the research community in these areas will help designers cope with the rapid design-complexity increase in the nanometer era.
Dr. Jason Cong is a Professor and Co-Director of the VLSI CAD Laboratory in the Computer Science Department of the University of California, Los Angeles. He is an Associate Editor of ACM Trans. on the Design Automation of Electronic Systems and an IEEE Fellow. Cong was the Founder and President of Aplus Design Technologies, Inc. until it was acquired by Magma Design Automation in 2003. Currently, he serves as the Chief Technologist Advisor of Magma. Cong received his B.S. in Computer Science from Peking University in 1985. He got his M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign in 1987 and 1990, respectively.
References:
[1] Chang, C.C., Cong, J., and Xie, M., “Optimality and Scalability Study of Existing Placement Algorithms,” Asia Pacific Design Automation Conference, Jan. 2003.
[2] Chan, T., Cong, J., Shinnerl, J., and Sze, K., “An Enhanced Multilevel Algorithm for Circuit Placement,” International Conference on Computer-Aided Design, Nov. 2003.
[3] Karypis, G., et al., “Multilevel Hypergraph Partitioning: Application in VLSI Domain,” Design Automation Conference, June 1997.
[4] Cong, J., Xie, M., and Zhang, Y., “An Enhanced Multilevel Routing System,” International Conference on Computer-Aided Design, Nov. 2002.
[5] Cong, J., Fan, Y., Han, G., Yang, X., and Zhang, Z., “Architecture and Synthesis for On-Chip Multicycle Communication,” IEEE Trans. on CAD, Vol. 23, April 2004.
[6] Carloni, L., McMillan, K., and Sangiovanni-Vincentelli, A., “Latency Insensitive Protocols,” International Conference on Computer-Aided Verification, July 1999.
[7] Smith, G., and Nadamuni, D., “2003 ESL Landscape,” ID Number: SEMC-WW-DP-0259, Dataquest, April 2003.
[8] Goering, R., “Platform-based Design: A Choice, Not a Panacea,” EE Times, Sept. 2002.
[9] Sangiovanni-Vincentelli, A., and Martin, G., “Platform-Based Design and Software Design Methodology for Embedded Systems,” IEEE Design and Test of Computers, Vol. 18, No. 6, Nov.-Dec. 2001.
[10] Sangiovanni-Vincentelli, A., et al., “Benefits and Challenges for Platform-Based Design,” Design Automation Conference, June 2004.
[11] Cong, J., Fan, Y., Han, G., and Zhang, Z., “Application-Specific Instruction Generation for Configurable Processor Architectures,” Field-Programmable Gate Arrays, Feb. 2004.
[12] Snyder, C., “Structured ASICs Offer Application Adaptability,” www.synplicity.com/literature/pdf/v1-1_adaptability1.pdf, SemiView, Dec. 2003.