Published in the Winter 2011 issue of Chip Design Magazine
Power consumption is now one of the major constraints in chip design, with The International Technology Roadmap for Semiconductors identifying it as one of the top three overall design challenges for the last five years. For many designs in the past, power was measured only after the chip and board were fabricated and exposed to real working conditions. Or it was measured when the final application software was integrated and ported onto the final hardware. In other cases, optimizing for power was left for the back end of the design process, employing techniques like clock gating on the gate-level netlist or place-and-route optimizations. As power requirements continued to evolve, some power-optimization tasks moved up to the register transfer level (RTL) through the design of power-management units and RTL simulation of power effects, such as power-domain shutoff.
While moving power optimization from the back end to the RTL helps to reduce the power consumed by the final device, the effect is still quite limited, for three reasons. First, according to a study published by LSI, power can be reduced by only 20% at the RTL, 10% at the gate level, and 5% at the layout level. Second, making architectural changes that reduce power significantly isn’t practical during the latter phases of the design cycle, given the redesign effort involved and tight schedule constraints. Third, correctly predicting power consumption under real operating conditions is severely hampered by slow RTL and gate-level simulation speeds.
By moving power optimization to the ESL, all of these limitations are eradicated. At the ESL, engineers can reduce power by up to 80% (according to the LSI study). ESL models simulate several orders of magnitude faster than RTL and gate descriptions, providing highly accurate power results under real operating conditions. Perhaps more importantly, ESL is used early in the design cycle. It therefore enables engineers to act upon early results, making architectural changes and exploring the best tradeoffs between power and performance well before the RTL implementation.
KEY ESL POWER COMPONENTS
However, ESL solutions must possess specific characteristics to deliver on the potential to reduce power by 80% over traditional design methods. A truly effective architectural power-optimization solution has four essential ingredients:
MODELING POWER AT THE ESL
ESL implies abstraction above the RTL. Modeling abstracted functionality and abstracted timing using languages like C/C++ and SystemC is quite common. Yet it’s less common (and may even be perceived as impossible by some) to model power at the ESL, given the lack of details at this early stage of the design cycle.
As illustrated in Figure 1, a power model must cover both static and dynamic power. Because dynamic power must account for design activity, it cannot be captured by static values in a spreadsheet. Furthermore, power models must account for the process technology being used and the voltage and clock frequency applied. In particular, they must correctly model the effects of real-time data flowing through a device under the control of application software. Only power models that are “reactive” in nature to such data can properly model power.
Figure 1: Power modeling must include both static and dynamic power.
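The split between static and dynamic power can be sketched with the standard first-order CMOS relations: switching power scales with activity, capacitance, the square of the supply voltage, and clock frequency, while leakage power is the supply voltage times the leakage current. The names and numeric values below are illustrative assumptions and are not taken from any tool or from the LSI study.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    voltage_v: float     # supply voltage in volts
    frequency_hz: float  # clock frequency in hertz


def dynamic_power_w(activity, capacitance_f, op):
    """Switching power: activity * C * V^2 * f."""
    return activity * capacitance_f * op.voltage_v ** 2 * op.frequency_hz


def static_power_w(leakage_current_a, op):
    """Leakage power: V * I_leak (I_leak depends on the process technology)."""
    return op.voltage_v * leakage_current_a


def total_power_w(activity, capacitance_f, leakage_current_a, op):
    return (dynamic_power_w(activity, capacitance_f, op)
            + static_power_w(leakage_current_a, op))


# Hypothetical block: 1 nF switched capacitance, 10 mA leakage, 1.0 V at 500 MHz.
nominal = OperatingPoint(voltage_v=1.0, frequency_hz=500e6)
idle = total_power_w(0.0, 1e-9, 0.01, nominal)  # no activity: leakage only
busy = total_power_w(0.2, 1e-9, 0.01, nominal)  # 20% switching activity
```

The activity argument is what makes the model "reactive": a spreadsheet can hold the leakage term, but the switching term changes with the traffic the block actually sees.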
Transaction-level models (TLMs) capture abstracted design functionality and timing into loosely timed (LT) or approximately timed (AT) representations. LT modeling should present the model behavior correctly by maintaining correct event ordering. In contrast, AT modeling captures when a transaction actually began and ended. Yet it doesn’t attempt to model what happens on a cycle-by-cycle basis, as modeled at the RTL. TLM AT models are ideal for capturing the power usage of a corresponding RTL block. They’re reactive to incoming traffic, simulate fast under software control, and combine the power model with the timing information already contained in the model. TLMs can be easily assembled to form a representation of an entire design (from a single system-on-a-chip [SoC] to a board) as well as a complete electronic system.
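As a rough illustration of why AT models suit power capture, the sketch below records only when each transaction began and ended, plus an energy figure attributed to it, and derives average power over a window. The `Transaction` record and its fields are hypothetical and do not represent the TLM2.0 payload or any vendor API.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    kind: str         # e.g. "read" or "write"
    start_ns: float   # time the transaction began
    end_ns: float     # time the transaction ended
    energy_nj: float  # energy attributed to this transaction


def average_power_mw(transactions, window_ns):
    """Total energy over window length; nJ / ns = W, so scale by 1000 for mW."""
    total_energy_nj = sum(t.energy_nj for t in transactions)
    return total_energy_nj / window_ns * 1000.0


# Two transactions inside a 100 ns window; nothing is modeled cycle by cycle.
trace = [
    Transaction("write", 0.0, 40.0, 2.0),
    Transaction("read", 60.0, 80.0, 1.0),
]
avg_mw = average_power_mw(trace, window_ns=100.0)
busy_ns = sum(t.end_ns - t.start_ns for t in trace)
```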
EARLY POWER ESTIMATION
Providing an early but sufficiently accurate estimation of power is critical. Accuracy is derived from precisely modeling all power effects at the TLM. But it’s largely dependent on simulating the design under the same operating conditions to which the design will be exposed in the consumer’s hands. When processors run in the “Giga-CPS” (billions of cycles per second) range, one second of a real-time scenario requires billions of simulation cycles, which is impractical at RTL and gate-level simulation speeds. Thus, simulation closer to the “Mega-CPS” range is required to simulate the software execution.
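The arithmetic behind this requirement is easy to work through. The sketch below assumes a 1-GHz target processor and, purely for illustration, an RTL simulator running at 100 cycles per second; that RTL rate is an assumption, not a measured figure.

```python
def wall_clock_seconds(real_time_s, target_clock_hz, sim_cycles_per_s):
    """Host time needed to simulate `real_time_s` of target execution."""
    cycles = real_time_s * target_clock_hz
    return cycles / sim_cycles_per_s


# One second of real time on a 1-GHz ("Giga-CPS") processor.
rtl_s = wall_clock_seconds(1.0, 1e9, 1e2)  # assumed 100-CPS RTL simulation
tlm_s = wall_clock_seconds(1.0, 1e9, 1e6)  # "Mega-CPS" transaction-level simulation

rtl_days = rtl_s / 86400.0  # roughly 116 days under the assumed RTL rate
tlm_minutes = tlm_s / 60.0  # under 17 minutes
```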
Power estimation reveals the initial gap between the desired and actual power, which the designer will have to close. This “gap analysis” must be presented early enough so that significant architectural changes can take place, such as shifting functionality from software to a hardware-based implementation and exploring other architecture-level techniques like voltage and frequency scaling.
QUICKLY EVALUATING TRADEOFFS
When power is initially estimated and the gap between desired and actual power is quantified, the next step is an iterative process that entails applying a change quickly, simulating the modified design, and evaluating the impact of the change on the power and performance profile graphs. With the high configurability and complex topology of today’s designs, being able to change the design topology, configuration, and timing and power of a model without modifying its internal functionality is critical.
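One way to picture this iteration is as a loop over candidate configurations, each simulated and then checked against a performance deadline and a power budget. In the sketch below, `evaluate` is a stand-in for a real simulation run and uses a first-order power formula; every coefficient, candidate, and limit is a hypothetical value chosen for illustration.

```python
def evaluate(candidate, workload_cycles=2_000_000):
    """Stand-in for one simulation run: returns (runtime in s, power in W)."""
    voltage_v, frequency_hz = candidate
    runtime_s = workload_cycles / frequency_hz
    # First-order model: switching power plus leakage (assumed coefficients).
    power_w = 0.2 * 1e-9 * voltage_v ** 2 * frequency_hz + 0.01 * voltage_v
    return runtime_s, power_w


def explore(candidates, deadline_s, budget_w):
    """Return the lowest-power candidate that meets both limits, or None."""
    feasible = []
    for candidate in candidates:
        runtime_s, power_w = evaluate(candidate)
        if runtime_s <= deadline_s and power_w <= budget_w:
            feasible.append((power_w, candidate))
    return min(feasible)[1] if feasible else None


# (voltage in V, frequency in Hz) operating points to try.
candidates = [(1.2, 800e6), (1.0, 500e6), (0.9, 400e6), (0.8, 200e6)]
best = explore(candidates, deadline_s=0.006, budget_w=0.12)
```

Here the fastest point exceeds the power budget and the slowest misses the deadline, so the loop settles on 0.9 V at 400 MHz. The value of fast simulation is that each pass through `evaluate` is cheap enough to repeat many times.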
To reap the huge power-savings potential at the architecture level, the ESL power solution must leverage these four components while providing sufficient automation to make them productive and easy to use. Simulate-Analyze-Change iterations must be quick enough to allow engineers to find the best performance-versus-power tradeoff. The Vista™ suite of tools and methodologies from Mentor Graphics promises to overcome these barriers by delivering the key ESL power components, automation, and fast simulation.
SCALABLE TRANSACTION-LEVEL MODELS
At the heart of the Vista ESL solution is a scalable transaction-level modeling methodology for writing architectural models in compliance with the Open SystemC Initiative TLM2.0 standard. Scalable TLMs retain separate models for the abstracted functionality, abstracted timing/power layer, and bus-protocol communication interface (see Figure 2). This approach allows users to change the interface attributes without changing the functional behavior of the model. It also allows the model’s timing and power attributes to be added or modified without changing its core functionality or the bus-protocol interface. This separation provides tremendous value to users. For example, exploring a different bus protocol or exploring the impact of frequency reduction, voltage scaling, and even different bursting techniques for a transaction doesn’t require modifying the functional portion of the TLM.
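The separation can be illustrated with a minimal sketch in which the functional core and the timing/power layer are distinct objects, so one can be swapped without touching the other. The class names are hypothetical and do not reflect the Vista or TLM2.0 APIs; the bus-protocol interface layer is omitted for brevity.

```python
class FirFunction:
    """Functional core: pure behavior, with no notion of time or power."""

    def __init__(self, taps):
        self.taps = list(taps)

    def process(self, samples):
        n = len(self.taps)
        return [
            sum(self.taps[k] * samples[i - k] for k in range(n) if i - k >= 0)
            for i in range(len(samples))
        ]


class TimingPowerLayer:
    """Timing/power layer: annotates a transaction and never touches the data."""

    def __init__(self, ns_per_sample, nj_per_sample):
        self.ns_per_sample = ns_per_sample
        self.nj_per_sample = nj_per_sample

    def annotate(self, sample_count):
        return sample_count * self.ns_per_sample, sample_count * self.nj_per_sample


class ScalableModel:
    """Combines a functional core with an interchangeable timing/power layer."""

    def __init__(self, function, timing_power):
        self.function = function
        self.timing_power = timing_power

    def transact(self, samples):
        result = self.function.process(samples)
        delay_ns, energy_nj = self.timing_power.annotate(len(samples))
        return result, delay_ns, energy_nj


samples = [1, 2, 3]
baseline = ScalableModel(FirFunction([1, 1]), TimingPowerLayer(2.0, 0.5))
# Same function at a slower, lower-energy operating point (assumed values).
scaled = ScalableModel(FirFunction([1, 1]), TimingPowerLayer(4.0, 0.25))
```

Both models return identical filter output. Only the annotated delay and energy differ, which is the point of keeping the layers apart.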
Timing and power policies enable designers to define key timing and power attributes for the scalable TLM timing/power layer. Changes to a design can be easily applied by modifying existing timing and power policies or changing design connectivity. Power policies allow users to define, in tables, the static, clock-tree, and dynamic power associated with each transaction type. A timing or power policy can capture a simple value or a complex formula that accounts for design attributes, such as clock frequency, voltage, process technology, and design state. Figure 3 illustrates the use of equations to capture power policies.
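A power-policy table of this kind might be sketched as a mapping in which each entry is either a constant or a formula of voltage and clock frequency. The table layout, entry names, and coefficients below are assumptions made for illustration; they do not reproduce the tool's actual policy format.

```python
# Each entry is a constant in mW or a formula of (voltage in V, clock in MHz).
POWER_POLICIES = {
    "static": 0.5,
    "clock_tree": lambda v, mhz: 0.002 * v ** 2 * mhz,
    "read": lambda v, mhz: 0.004 * v ** 2 * mhz,
    "write": lambda v, mhz: 0.006 * v ** 2 * mhz,
}


def policy_mw(name, voltage_v, clock_mhz):
    """Evaluate one policy entry, whether it is a value or a formula."""
    entry = POWER_POLICIES[name]
    return entry(voltage_v, clock_mhz) if callable(entry) else entry


def power_during_mw(transaction, voltage_v, clock_mhz):
    """Power while a transaction is in flight: static + clock tree + dynamic."""
    names = ("static", "clock_tree", transaction)
    return sum(policy_mw(name, voltage_v, clock_mhz) for name in names)


read_mw = power_during_mw("read", voltage_v=1.0, clock_mhz=100.0)
write_mw = power_during_mw("write", voltage_v=1.0, clock_mhz=100.0)
```

Because the policies live in a table rather than in the model's behavior, lowering the voltage or clock only changes the arguments passed in.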
Figure 2: Scalable TLMs retain separate models for the abstracted functionality, abstracted timing/power layer, and bus-protocol communication interface.
Similarly, all communication attributes, such as bus protocols, ports, registers, and variables, are defined in a tabular entry format. The timing/power attributes can be compiled into the timing/power-layer SystemC model and the communication attributes into a TLM2.0-compliant TLM interface layer. The entire model generation of these TLM layers and their interaction with the functional core is then automated.
Assembly allows all of the TLMs to be connected in an initial configuration that represents the topology of the device architecture. As complex architectures may contain over 50 components and hundreds of interconnects, quickly incorporating changes in the topology is critical. Topology changes during exploration can be easily applied and visualized using the Vista Block Diagram editor, which automatically produces the revised, simulatable model.
At the ESL, a design can be quickly simulated and power results may be compared against prior results. Simulation is the only means by which accurate power for the entire design can be estimated correctly. Software running on embedded cores and programmable devices generates traffic at various sizes and rates. This traffic propagates through the design buses to peripherals and memories, causing dynamic power consumption due to switching activity. Power-management devices can be tuned to control voltage and frequency and turn power domains on and off during various device operation modes, thereby impacting both dynamic and static power. Simulation in AT mode, which is required for capturing performance and power data over time, is slower than LT-mode simulation. Yet it’s still orders of magnitude faster than RTL and gate-level simulation, allowing the capture of power data for all representative use-model scenarios. Furthermore, switching off timing/power on specific components during simulation allows the designer to concentrate on the hotspot power analysis of other key components.
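The effect of power-domain control on both power components can be sketched with a small mode table and a schedule of operating modes. The modes, power figures, and schedules below are hypothetical values chosen to keep the arithmetic easy to follow.

```python
# Assumed per-mode power for one domain, in mW.
MODE_POWER_MW = {
    "active": {"static": 2.0, "dynamic": 10.0},
    "idle": {"static": 2.0, "dynamic": 0.0},  # no switching, but still leaking
    "off": {"static": 0.0, "dynamic": 0.0},   # domain shut off
}


def energy_uj(schedule):
    """Energy for a list of (mode, duration in ms); mW * ms = microjoules."""
    return sum(sum(MODE_POWER_MW[mode].values()) * ms for mode, ms in schedule)


# Same 10 ms of work in a 100 ms window, with and without domain shutoff.
always_on = [("active", 10.0), ("idle", 90.0)]
with_shutoff = [("active", 10.0), ("off", 90.0)]

saved_uj = energy_uj(always_on) - energy_uj(with_shutoff)
```

In this example the idle period contributes only static power, so shutting the domain off is what removes it. In a real design the schedule comes from the software and traffic being simulated rather than from a fixed list.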
Figure 3: Here, equations are used to capture power policies.
Early power estimation is provided by extracting the large volume of performance and power results captured during simulation into easy-to-analyze graphs. Designers can choose any instance in the design to evaluate how much power it consumes. ESL tools also can measure mean and peak power values, provide power profiles for various system and software scenarios, and support hotspot analysis. Hotspot analysis reveals the design areas that are consuming power well above target levels. Power profiles can show the distribution of power over time subject to model activity under system-level scenarios or the software routine running on an embedded processor core. Figure 4 illustrates a power-consumption graph of a FIR filter running three real-time scenarios. The voltage/frequency scaling power-optimization technique is applied to the last one.
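Mean power, peak power, and hotspot detection reduce to simple operations on per-instance power traces. The sketch below assumes each trace is a list of power samples in mW and that a hotspot is any instance whose mean power exceeds its target; the instance names, samples, and targets are invented for illustration.

```python
from statistics import mean


def summarize(trace_mw):
    """Mean and peak power for one instance's sampled power trace."""
    return {"mean_mw": mean(trace_mw), "peak_mw": max(trace_mw)}


def hotspots(traces, targets_mw):
    """Names of instances whose mean power exceeds their target level."""
    return sorted(
        name for name, trace in traces.items() if mean(trace) > targets_mw[name]
    )


# Hypothetical per-instance traces captured during one simulated scenario.
traces = {
    "cpu": [40.0, 60.0, 80.0],
    "dma": [5.0, 5.0, 20.0],
    "fir": [30.0, 30.0, 30.0],
}
targets = {"cpu": 50.0, "dma": 15.0, "fir": 35.0}

dma_summary = summarize(traces["dma"])
over_target = hotspots(traces, targets)
```

Note that the DMA instance peaks above its target but stays below it on average, so whether it counts as a hotspot depends on which metric the budget is written against.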
ESL/HLS POWER OPTIMIZATION
One of the best opportunities for optimizing power is presented by integrating high-level synthesis (HLS) into the low-power ESL flow. Using HLS, various RTL implementations can be derived from a single TLM functional description. Power results from the block implementation can be annotated back to the architecture for further optimization. The better HLS tools automate prevailing low-power design techniques including clock gating, memory-access optimization, resource sharing, and multiple clock domains. These tools also can apply different technology and architectural constraints to the same functional model during synthesis, resulting in different RTL implementations with different power characteristics. This allows designers to update the power policies based on the actual RTL implementation generated by the HLS tool. In this way, architectural and HLS tools combine to give the most complete flow for optimizing power at the ESL.
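Annotating implementation results back to the architecture amounts to replacing early policy estimates with values measured on the generated RTL. The sketch below shows one possible shape for that update; the function, policy names, and numbers are hypothetical and do not describe how any particular HLS tool exports its results.

```python
def back_annotate(policies_mw, measured_mw):
    """Return a new policy table where measured values override estimates.

    Entries that the architecture model does not define are ignored.
    """
    updated = dict(policies_mw)
    for name, value in measured_mw.items():
        if name in updated:
            updated[name] = value
    return updated


# Early architectural estimates versus results from one RTL implementation.
estimates = {"static": 0.5, "read": 0.4, "write": 0.6}
rtl_results = {"read": 0.35, "write": 0.7, "debug": 0.1}

refined = back_annotate(estimates, rtl_results)
```

Returning a new table rather than editing the old one keeps the original estimates available, so results from different RTL implementations can be compared against the same starting point.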
Figure 4: Shown is the power profile of a FIR filter design under real-time scenarios.
However, ESL power design doesn’t end at this stage. Power-aware downstream tools are needed to simulate, validate, and further optimize power at the RTL synthesis, place-and-route, and layout phases. At all of these levels, the ESL power model of the entire design and each of its sub-components can be used as a timing/power driving budget for these downstream tools. The ESL platform can be used for additional exploration and power optimization as the design power model is refined.
Figure 5: Shown is the complete power-optimization ESL flow.
An ESL power solution provides the largest opportunity to reduce power. However, this requires modeling timing and power in the TLM and optimizing the architecture by simulating real device operating scenarios under application software control. An ESL power solution must allow for modeling all power attributes. It also has to produce a power model that’s available before the architecture topology is nailed down and any new RTL is designed. The power model must be at a high enough abstraction level to quickly simulate real scenarios under software application control. It should be scalable, separating timing/power from the functional specification to allow the quick exploration of design architectural attributes. It also should be capable of capturing timing and power data efficiently without intruding into the design-behavior source code. The automatic generation of timing/power models and the assembled ESL platform code significantly increases ESL design productivity. Integration with HLS tools in a tight power-optimization loop provides immediate feedback on user power-budget assumptions. Finally, the ESL power model can be used as a reference-executable specification of the design power model throughout all of the back-end design-refinement stages including post-silicon.
Shabtay Matalon is ESL market development manager for Mentor Graphics’ Design Creation and Synthesis Division. He received a BS in electrical engineering from the Technion - Israel Institute of Technology (Haifa, Israel). He has been active in system-level design and verification tools and methodologies for over 20 years.