Published in October / November 2008 issue of Chip Design Magazine
There's Finally an ESL Tool for Hardware Designers
By moving to the electronic system level, it's possible to reduce power consumption by up to 75%.
Power consumption has always been the bane of semiconductor design for portable devices, which are particularly sensitive to it. Now it has become a burgeoning concern for almost all systems: the cost and design challenges of complex cooling systems, rising energy prices, and the trend toward “green,” energy-efficient design are making power a first-order problem for all system designers, whether or not portability is a factor. But how can designers achieve a major reduction in power consumption?
Most hardware is designed at the register transfer level (RTL), where visibility into the power consumed by a design is murky at best. RTL developers generally have little information available about how the data moving through their designs will affect power. Nor do they have the time to experiment with different RTL architectures to find one that yields the required performance with the lowest power consumption.
Although RTL power tools can reduce some power consumption, their scope is limited: about 80% to 90% of the power budget is set in stone once the architecture is locked in at the RTL. To make a major reduction, engineers must begin with an architecture that’s designed from the start to minimize leakage power and switching activity. Unfortunately, creating a low-power architecture isn’t an intuitive task when starting at the RTL.
Here is where the electronic system level (ESL) comes in. By modeling a design’s function at an algorithmic level using ANSI C/C++ or SystemC, designers can simulate the function—at high speed in software—before the microarchitecture is considered. A power-optimizing, high-level synthesis tool can then import the design and automatically produce a power-optimized architecture that minimizes leakage and dynamic power consumption. It can then write out the design in RTL (see Figure 1). Benchmarks of this approach have shown up to a 75% reduction of overall power consumption when compared to RTL designed by hand.
How can a high-level synthesis tool optimize for power well enough to achieve such a compelling result? It turns out that a design’s power consumption depends as much on the switching activity of typical datasets moving through the design as on the microarchitecture itself. It is therefore crucial that the high-level synthesis engine optimize based on a power analysis of real data moving through the system. This analysis can be done as a one-time step at the system level: instrumented C code running in an ESL testbench captures the activity data, which is then passed to the synthesis engine.
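To make the capture of activity data concrete, here is a minimal Python sketch standing in for instrumented C code. It assumes switching activity is measured as the number of bit toggles (the Hamming distance between successive values) on a fixed-width signal; the ActivityProbe class and the toy FIR filter model are hypothetical illustrations, not part of any particular tool.

```python
class ActivityProbe:
    """Hypothetical probe: counts bit toggles on one fixed-width variable."""

    def __init__(self, name: str, width: int) -> None:
        self.name = name
        self.width = width
        self.mask = (1 << width) - 1
        self.prev = None       # last recorded value (None before the first write)
        self.toggles = 0       # total number of bit flips observed
        self.transitions = 0   # number of value-to-value transitions observed

    def record(self, value: int) -> int:
        """Record a write, truncated to `width` bits (two's complement)."""
        value &= self.mask
        if self.prev is not None:
            self.toggles += bin(self.prev ^ value).count("1")
            self.transitions += 1
        self.prev = value
        return value

    def toggle_rate(self) -> float:
        """Average fraction of bits that flip per transition (0.0 to 1.0)."""
        if self.transitions == 0:
            return 0.0
        return self.toggles / (self.transitions * self.width)


def fir_model(samples, coeffs, probe):
    """Toy algorithmic model: an FIR filter whose (truncated) output is probed."""
    out = []
    for n in range(len(samples)):
        acc = sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        out.append(probe.record(acc))
    return out
```

Running fir_model over a representative sample stream and reading probe.toggle_rate() yields a per-signal activity figure of the kind a synthesis engine could consume; a real flow would probe many variables and export the results in a tool-specific format.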

Unlike timing simulation, which is mostly concerned with corner-case behavior, power analysis is most accurate when it is based on a representative vector set of the real data expected to move through the system. This is another reason why power analysis is best done at the system level: systems modeled with hardware and software running together naturally provide realistic, representative datasets. As a result, the power-optimized RTL designs that are produced are tuned to the activity they will actually see when executing the target software and data.
Once the system-level power simulation has been run, the activity data can be fed to the high-level synthesis engine. The synthesis engine’s job is to turn untimed, unscheduled C-based algorithms into optimized RTL architectures. It schedules operations into clock cycles, allocates the hardware resources needed to perform them, and binds each operation to a specific resource, inserting multiplexers and glue logic where a resource is shared.
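Scheduling, allocation, and binding can be illustrated with a textbook resource-constrained list scheduler. The sketch below is a simplified stand-in, not the algorithm of any specific tool: it assumes every operation takes one clock cycle, the dependency graph is acyclic, and at least one unit of every operation type is allocated. The operation names and data structures are hypothetical.

```python
from collections import defaultdict

def list_schedule(ops, deps, resources):
    """
    Resource-constrained list scheduling (simplified textbook version).

    ops:       dict op name -> operation type, e.g. {"m1": "mul"}
    deps:      dict op name -> ops whose results it needs
    resources: dict operation type -> number of allocated units
    Assumes one-cycle operations, an acyclic graph, and >= 1 unit per type.
    Returns (cycle_of, binding); binding maps op -> (type, unit index).
    """
    cycle_of, binding = {}, {}
    remaining = set(ops)
    cycle = 0
    while remaining:
        used = defaultdict(int)  # units of each type already busy this cycle
        # An op is ready once all its predecessors finished in earlier cycles.
        ready = sorted(
            op for op in remaining
            if all(p in cycle_of and cycle_of[p] < cycle for p in deps.get(op, ()))
        )
        for op in ready:
            kind = ops[op]
            if used[kind] < resources[kind]:      # respect the allocation limit
                binding[op] = (kind, used[kind])  # bind to the next free unit
                used[kind] += 1
                cycle_of[op] = cycle              # schedule in this cycle
        remaining -= set(cycle_of)
        cycle += 1
    return cycle_of, binding
```

For y = (a*b + c*d) + e with one multiplier, both multiplications are bound to multiplier unit 0 in consecutive cycles, so that unit needs input multiplexers. Allocating a second multiplier removes the sharing and shortens the schedule from four cycles to three, at the cost of area.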
In a power-optimizing, high-level synthesis tool, operations and variables are bound to hardware resources in a way that minimizes switching activity at the inputs and outputs of those resources. Consider, for example, a case in which the inputs of one multiplication are at logic ‘1’ most of the time and the inputs of a second multiplication are almost always at logic ‘0.’ If both multiplications are mapped onto one shared multiplier, its input bits flip nearly every time it switches from one operation to the other, resulting in a lot of bit toggles.
In terms of dynamic power consumption, a better solution would be to use two multipliers, or to choose a binding that pairs each multiplication with other available multiplication operations whose operand values are similar. Multiplexer power can also be reduced in cycles when the multiplexer’s output value isn’t used, by letting the input data pass through, which yields the lowest switching activity.
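The multiplier example can be quantified with a toy toggle count. The sketch assumes 8-bit operand buses, two multiplications that execute in alternating cycles, and that an idle unit simply holds its last input; the function names and operand values are made up purely for illustration.

```python
def toggles(seq, width):
    """Total bit flips on a bus that carries the values in seq, in order."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(seq, seq[1:]))

def input_toggles(a_ops, b_ops, width, shared):
    """
    a_ops, b_ops: operand values presented to multiplications A and B,
    which execute in alternating cycles (A, B, A, B, ...).
    shared=True:  both bound to one multiplier, whose input sees A, B, A, B, ...
    shared=False: one multiplier each; each input holds its value while idle.
    """
    if shared:
        interleaved = [v for pair in zip(a_ops, b_ops) for v in pair]
        return toggles(interleaved, width)
    return toggles(a_ops, width) + toggles(b_ops, width)
```

With a = [0xFF, 0xFE, 0xFF, 0xFF] and b = [0x00, 0x01, 0x00, 0x00], the shared multiplier's input bus toggles 54 times, while the two dedicated multipliers together toggle only 4 times. That difference is the effect a power-aware binder weighs against the area of a second multiplier.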
A key ingredient of a power-optimizing, high-level synthesis tool is a technology library that’s characterized not only for timing and area, but also for power. The power characterization of RTL components, such as multipliers or registers, must take into account both switching activity and supply voltage. The resulting component power models report power consumption, split into leakage and dynamic power, as a function of input/output activity, timing and area constraints, and supply voltage.
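A characterized component could be represented roughly as follows. This is a hypothetical sketch: the dynamic term uses the standard CMOS switching-power relation P = α·C·V²·f, with α the toggle activity, while the way leakage scales with supply voltage is a crude assumption chosen for illustration. Real libraries tabulate both from characterization data, and the timing and area dependence mentioned above is omitted here.

```python
from dataclasses import dataclass

@dataclass
class ComponentPowerModel:
    """Hypothetical power characterization of one library component."""
    c_eff: float       # effective switched capacitance per full toggle [F]
    i_leak_nom: float  # leakage current at the nominal supply [A]
    v_nom: float       # nominal supply voltage [V]

    def dynamic_power(self, activity: float, vdd: float, f_clk: float) -> float:
        """CMOS switching power: alpha * C_eff * Vdd^2 * f [W]."""
        return activity * self.c_eff * vdd ** 2 * f_clk

    def leakage_power(self, vdd: float) -> float:
        """Leakage power [W]; assumes leakage current scales linearly with Vdd."""
        return vdd * self.i_leak_nom * (vdd / self.v_nom)

    def total_power(self, activity: float, vdd: float, f_clk: float) -> float:
        """Leakage plus dynamic power at the given activity, supply, and clock."""
        return self.dynamic_power(activity, vdd, f_clk) + self.leakage_power(vdd)
```

Because the activity term comes straight from the system-level simulation, halving a component's toggle rate halves its dynamic power in this model, and lowering Vdd from 1.0 V to 0.8 V cuts it to 64%.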
Using the power library and the activity data from the power simulation, the synthesis engine can automatically produce an initial low-power implementation. This automatic, push-button optimization is usually performed once, up front, without setting constraints. The user can then view a report of the leakage and dynamic power consumption, along with the clock frequency, latency, and area estimates for the resulting architecture.
Now comes the fun part, where the engineer gets to explore the design possibilities. With an initial run and results to compare against, the user can set constraints on the design and re-run the synthesis step iteratively (see Figure 2). This step is referred to as exploring the “design space” by interactively trading off power against timing and area. The process is highly enlightening, as some tradeoffs may not be obvious without such exploration.
For example, most engineers would naturally assume that using faster adders in a function would increase the design’s power consumption. Faster adders are, in fact, usually bigger and use more power. With faster adders, however, the design’s function often can be completed in fewer clock cycles. Fewer intermediate results then need to be held in registers between cycles, which means fewer registers and therefore less switching activity. In addition, faster adders sometimes can be shared among multiple operations, reducing the number of adders needed.
While the adders themselves may consume more power, the overall power consumption is reduced, thanks to lower switching activity in the registers and possibly to resource sharing as well. Such tradeoffs can be explored quickly within the synthesis tool by changing constraints and re-running the synthesis engine. This fast iteration loop operates at the system level and produces a series of candidate scheduled architectures; no RTL needs to be written out until the final architecture has been selected.
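The faster-adder tradeoff can be reproduced with a deliberately simple model. Everything here is an assumption for illustration: a chain of dependent additions, as many chained additions per cycle as fit in the clock period, one register bank written at each cycle boundary, and made-up delay and energy figures in arbitrary units. The names AdderOption, evaluate, and explore are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class AdderOption:
    name: str
    delay_ns: float       # delay of one addition (hypothetical)
    energy_per_op: float  # energy per addition, arbitrary units (hypothetical)

def evaluate(opt, n_ops, t_clk_ns, width, e_reg_bit):
    """Toy model for a chain of n_ops dependent additions.
    Returns (latency in cycles, register bits, energy) or None if infeasible."""
    per_cycle = math.floor(t_clk_ns / opt.delay_ns)  # chained adds that fit in a cycle
    if per_cycle == 0:
        return None                                  # one add does not fit the clock
    latency = math.ceil(n_ops / per_cycle)
    reg_bits = (latency - 1) * width  # intermediate result stored at each cycle boundary
    energy = n_ops * opt.energy_per_op + reg_bits * e_reg_bit
    return latency, reg_bits, energy

def explore(options, n_ops, t_clk_ns, width, e_reg_bit, max_latency):
    """Lowest-energy option meeting the latency constraint: (energy, name, latency)."""
    candidates = []
    for opt in options:
        result = evaluate(opt, n_ops, t_clk_ns, width, e_reg_bit)
        if result is not None and result[0] <= max_latency:
            latency, _, energy = result
            candidates.append((energy, opt.name, latency))
    return min(candidates) if candidates else None
```

With eight additions, a 2.0 ns clock, 16-bit data, a register cost of 0.1 per bit, a slower adder (1.0 ns, energy 1.0 per add), and a faster one (0.5 ns, energy 1.3 per add), the slower adder needs four cycles and 48 register bits (total about 12.8), while the faster adder needs two cycles and 16 register bits (about 12.0). explore therefore picks the faster adder despite its higher energy per operation; tightening max_latency to 1 leaves no feasible option, which is the kind of result a constraint change reveals.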
As another example, an engineer might want to assign different supply voltages to different parts of the design hierarchy, effectively creating voltage islands. A voltage island can have a constant or variable supply voltage, and when power gating is applied, its supply can also be switched off. A power-optimizing, high-level synthesis tool can automatically analyze the impact of voltage islands on performance and power consumption, and the designer can run several what-if analyses to find the optimal supply voltage assignment for the performance target. Once that assignment has been found, the high-level synthesis tool outputs the voltage islands, together with the additional information that downstream tools require (such as level-shifter rules), in Si2’s Common Power Format (CPF) or Accellera’s Unified Power Format (UPF).
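A what-if voltage assignment can be sketched as a search over a few candidate supply voltages per island. The delay model is the alpha-power law with an assumed threshold voltage of 0.3 V and exponent 1.3; the island names, delays, and candidate voltages are hypothetical, and the level-shifter check only flags connections that cross voltage domains rather than emitting actual CPF or UPF rules.

```python
def delay_scale(v, v_nom, v_t=0.3, alpha=1.3):
    """Alpha-power-law delay ratio d(v) / d(v_nom); v_t and alpha are assumed."""
    if min(v, v_nom) <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def gate_delay(x):
        return x / (x - v_t) ** alpha

    return gate_delay(v) / gate_delay(v_nom)

def assign_voltages(islands, candidates, v_nom=1.0):
    """Pick, per island, the lowest candidate voltage that still meets timing.
    islands: name -> (critical-path delay at v_nom [ns], required time [ns])."""
    plan = {}
    for name, (d_nom, t_req) in islands.items():
        ok = [v for v in sorted(candidates) if d_nom * delay_scale(v, v_nom) <= t_req]
        if not ok:
            raise ValueError(f"island {name!r} cannot meet timing")
        plan[name] = ok[0]
    return plan

def level_shifter_crossings(plan, nets):
    """Connections (src island, dst island) that cross voltage domains."""
    return [(a, b) for a, b in nets if plan[a] != plan[b]]

def relative_dynamic_power(plan, v_nom=1.0):
    """Dynamic power vs. nominal, assuming fixed activity, capacitance, and clock."""
    return {name: (v / v_nom) ** 2 for name, v in plan.items()}
```

For example, with islands dsp (4.4 ns critical path), ctrl (2.0 ns), and io (4.9 ns), a 5.0 ns required time, and candidates of 0.8, 0.9, 1.0, and 1.1 V, the sketch assigns 0.9 V, 0.8 V, and 1.0 V respectively, and any connection between ctrl and dsp or between dsp and io would need a level shifter.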
After iterating through the synthesis engine and selecting the architecture with the best balance of power, timing, and area, the engineer can produce an RTL design. Two types of RTL can be output: structural or functional. Structural RTL contains statements that instantiate specific components, such as adders, from the module library. Its advantage is that the downstream RTL synthesis tool will use exactly the components chosen by the power-optimizing, high-level synthesis tool, so the results correlate closely with the estimates predicted at the ESL. A possible disadvantage is that structural RTL may bypass some of the operator optimizations that RTL synthesis would otherwise apply.
A functional RTL description uses Verilog operators, such as + and *, for arithmetic operations. Functional RTL lets downstream RTL synthesis tools implement the most efficient operators for the RTL context, drawing on a component library or module synthesis tool. The result is sometimes better than what is produced from structural RTL, but it may diverge somewhat from the power, timing, and area estimates reported at the ESL. The choice between the predictability of structural RTL and the optimization potential of functional RTL is left to the engineer.
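The difference between the two output styles can be shown with a small RTL writer. This is a hypothetical sketch: the module name add_top, the library cell lib_add_cla, and its W parameter are invented for illustration and do not correspond to a real module library.

```python
def emit_adder_rtl(width: int, style: str, cell: str = "lib_add_cla") -> str:
    """
    Emit a Verilog module that adds two operands, in one of two styles.
    style="structural": instantiates a library cell (the cell name and its W
                        parameter are hypothetical).
    style="functional": uses the Verilog '+' operator and leaves the choice of
                        adder implementation to downstream RTL synthesis.
    """
    header = (
        f"module add_top (input  [{width - 1}:0] a, b,\n"
        f"                output [{width - 1}:0] y);\n"
    )
    if style == "structural":
        body = f"  {cell} #(.W({width})) u_add (.A(a), .B(b), .Y(y));\n"
    elif style == "functional":
        body = "  assign y = a + b;\n"
    else:
        raise ValueError("style must be 'structural' or 'functional'")
    return header + body + "endmodule\n"
```

emit_adder_rtl(8, "functional") produces a module whose body is the single line assign y = a + b;, leaving the adder architecture open, while the structural variant pins the implementation to the instantiated cell that the power estimates were based on.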
As mentioned earlier, RTL and gate-level power-optimization tools can reduce power further, even for a design that has been given a low-power architecture at the ESL. Such tools can optimize clock trees and insert additional clock-gating logic. They also can reduce leakage power through threshold-voltage assignment and other optimizations on the gate-level design. To pass the ESL estimates along to these tools as constraints, the high-level synthesis tool generates a constraint file in CPF or UPF and outputs it along with the Verilog RTL file.
An RTL testbench is also produced automatically, along with the Verilog RTL design and the CPF/UPF constraint file. It is generated from the results of the power simulation executed at the ESL, whose activity data originally came from the ESL testbench. As a result, the RTL testbench is automatically correlated with the ESL testbench, but it is expressed in the context of the synthesized RTL design.
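One simple way to carry ESL-captured vectors into an RTL testbench is a hex file that the Verilog system task $readmemh can load into a memory array. The sketch assumes one fixed-width value per line and two's-complement truncation of negative values; the function names are hypothetical.

```python
def to_readmemh_lines(vectors, width):
    """Format values as fixed-width hex words, one per line, for $readmemh.
    Negative values are truncated to `width` bits (two's complement)."""
    digits = (width + 3) // 4  # hex digits needed for `width` bits
    mask = (1 << width) - 1
    return [f"{v & mask:0{digits}x}" for v in vectors]

def write_readmemh(path, vectors, width):
    """Write the formatted vectors to a text file (the path is up to the caller)."""
    with open(path, "w") as fh:
        fh.write("\n".join(to_readmemh_lines(vectors, width)) + "\n")
```

Writing both the stimulus and the ESL model's outputs this way lets the RTL testbench replay the same data through the synthesized design and compare its responses against the expected values.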
Finally, documentation is important, especially for designs that were automatically generated. An RTL microarchitecture specification is therefore generated for every synthesized design. This document lists all of the components and includes reports that describe the RTL design, giving engineers a handy reference for what the design contains without having to read the RTL code directly.
By enabling power optimization and architectural exploration, power-optimizing high-level synthesis becomes the first ESL tool to finally deliver value to the hardware designer. System architects have used ESL tools for their ability to model the performance and integration issues of complete systems. Historically, however, these toolsets have created additional work for the hardware engineer, who is tasked with converting high-level models into compliant architectures that are efficient in terms of power, timing, and area. Now an ESL tool of this type enables hardware developers to produce designs with better quality of results, in particular implementations with up to 75% lower power consumption than hand-designed RTL. Clearly, it’s time for hardware designers to start taking a look at ESL.
Dr.-Ing. Lars Kruse, a patent-holding technologist, serves as vice president of engineering at ChipVision. There, he is responsible for driving the technology development of the company’s low-power design-optimization EDA solutions. He holds master’s and PhD degrees in technical computer science from the University of Oldenburg, Germany.
Craig Cochran is vice president of marketing and business development at ChipVision. He holds a bachelor of science degree cum laude in electrical engineering from the Georgia Institute of Technology. Cochran also has business credentials from the Stanford Graduate School of Business and the American Management Association, among others.