Posts Tagged ‘Calypto’

Pathfinding For Power And Heat

Friday, March 2nd, 2012

By Ed Sperling
There are many ways to measure power and heat in an IC, and each one of them adds tremendous value to a design. But there are still holes, and those holes are just beginning to get filled.

Power and heat have emerged as two of the most persistent problems in advanced designs, and there is no single or simple way to tackle either of them. Nevertheless, there is at least progress on this front.

“Power is a side of complexity that has many, many dimensions,” said Aart de Geus, chairman and CEO of Synopsys. “We have multiple power domains and we now have states between on and off. How do you deal with that with ones and zeros?”

At the highest level, high-level synthesis can be used to provide generalizations about whether one processor versus another, or one piece of IP versus another will save power. The challenge there is to link those HLS models with other models to make them useful. This has been an ongoing challenge for startups such as Calypto and Forte Design Systems, as well as Synopsys and Cadence. (Mentor Graphics spun off its Catapult C platform to Calypto last year.)

At the lowest level, starting with RTL and even down to the gate, measurements are extremely accurate and useful. The problem is that once RTL code is written, it’s more difficult to change. Providing that kind of information early, and in context, has been a major challenge. Apache Design has created an RTL Power model, for example, as well as an RTL power flow and a chip-package-system model and flow to extract that information early enough to include it in the RTL.

The big missing piece, however, has been even earlier in the design process. What happens, for example, if a processor from one vendor is substituted for a processor from another vendor? Or what if signal traffic is routed one way in a design versus another? These are important tradeoffs at the architectural level, and there has been only scattered progress in this area. That’s partly because most of the complex thermal and power modeling for advanced designs is still being done with spreadsheets rather than with automation tools.

Docea Power jumped into the market this week with what should be an interesting first step. Its new AceThermalModeler software is aimed at architectural-level exploration and analysis for heat and power. The focus is on early system floorplanning or partitioning, system packaging, integration architectures and power management policies. It’s a certainty there will be other entrants into this space over the next year or two. All of the major EDA companies and their customers have been talking about the need for this kind of technology since designs reached 40nm.

Thermal map. Source: Docea Power


But Docea CEO Ghislain Kaiser said the spreadsheets literally have run out of room at advanced nodes. They cannot handle any more data. What’s needed now is a way of raising the level of abstraction with accuracy, and he says there is an opportunity between the complex algorithmic approaches used for signoff and the packaging data sheets that are too far from reality. It remains to be seen just how quickly this market will ramp up as a result of that, because the next challenge will be to integrate this kind of information—all of it, from the high level to the pathfinding architectural models—into existing flows. That includes companies designing chips, as well as the ESL flows that are created by the Big Three EDA vendors, and the modeling standards groups such as OSCI, which developed TLM 2.0.

All of this will take time, of course. Standards groups move cautiously and large companies don’t make rapid changes to flows that work. Still, more analysis that can be integrated throughout the design process is clearly needed.

Step Away From the Spreadsheet

Thursday, February 9th, 2012

By Ann Steffora Mutschler
Engineers today spend more than a quarter of their time trying to meet power specifications.

A survey of more than 700 engineers by Calypto illustrates just how important and time-consuming power management is today for engineering teams. As consumer devices grow ever more complex, the need to deal with, analyze and optimize power at not just the RTL but at the system level is the next challenge, even if the path to reach that goal is not yet clear.

The opportunities for optimizing a design for power efficiency are greatest at the architectural level of abstraction. The further a design moves downstream the less effective optimization techniques become, noted Yossi Veller, chief scientist for ESL at Mentor Graphics, in a white paper he co-authored for ARM’s IQ Magazine. “Power optimization must begin with architectural analysis, exploration, and optimization of power and timing at the electronic system level (ESL). According to a study by LSI Logic, techniques available at the RTL synthesis phase have the ability to reduce power by 20%; those at the gate level offer a 10% reduction; while those at the layout level can reduce power by only 5%. Waiting until the RTL to begin optimizing for power is a wasted opportunity because power usage can be reduced by 80% at the ESL.”

Fig. 1: The ability to optimize power at the architectural level far exceeds that at lower levels of abstraction.

“Traditional power optimization tools are really working at the lower levels of abstraction,” explained William Ruby, senior director of RTL power product engineering at Apache Design. “If you look at synthesis, if you look at physical design, there are some automated techniques that are available in those tools. But those are in a category of additional refinement-type steps. Once you have the design architecture nailed down, then you can add in some optimizations based on those tools and you can get some additional incremental power savings, but the part that is missing is enabling the true design-for-power efficiency. If you look at modern chip architectures, they are extremely complex and the RTL descriptions of these architectures are even more complex such that RTL in some cases is no longer seen as a viable architectural description language. You want to be able to describe the architecture of the design in a high level of abstraction.”

With this description comes the requirement to be able to analyze power. Today, this is done by synthesizing the design from a high-level description such as C++ down to RTL, and then an RTL power analysis tool can function and give feedback into the architectural domain. But what needs to accompany this synthesis-loop-back type of flow, and give some indication of what the power numbers are, is more intelligence in those high-level tools. They need to point out inefficiencies in a design at both the RTL and architectural levels.

Chris Rowen, CTO and co-founder of Tensilica, sees two big challenges for power analysis tools. “One, it is very, very difficult to isolate where the real problem is. It only makes sense to really measure power at the level when you have really synthesized the logic and laid it out and you actually know what the physical design looks like, because the physical design has a huge impact on what the power dissipation of the circuit is.”

By the time it has gone through synthesis and place and route, you have really very little visibility into what was the original logic being questioned. “It all goes into the Cuisinart and all you get is this amorphous mush of gates at the end. So if someone asks you, ‘How much power is being dissipated in my multiplier versus in my divider versus in my register file,’ I don’t know anymore because I have to process them all together in order to get good physical results. But then it all has been aggressively remapped into other logic forms and I can’t isolate the power easily. So you have to work in rather indirect ways to figure out whether the power was being dissipated in one function versus another.”

A second problem, he said, involves system-level tracking of different scenarios. “It is extremely difficult to reach your power goal if you say, ‘Let me use the worst case assumption about each subsystem. I’m going to assume that every piece of my baseband is on, and every piece of my Layer 2 and Layer 3 protocol stack is on, and my image processor is on, and my apps processor is running full out, and all of my RF subsystems are running,’ because of course you’d exceed your power budget by a factor of two or three. Instead people recognize they’re not all on at the same time, the system doesn’t work that way. When you are doing one thing, then you’re typically not doing something else. Therefore, you only have to look at the particular combination of subsystems that is on at that time. However, the software guys have really poor tools to correlate what’s going on in the higher-level operating modes to what’s going on in terms of actual power dissipation in different subsystems. They are completely shooting in the dark where they do not have anything like the kind of accuracy for the modeling of these things.”
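Rowen's contrast between a worst-case sum and a per-scenario budget can be sketched in a few lines. This is a minimal illustration only: the subsystem names follow his example, but the milliwatt figures, the operating modes and the function names are all assumptions made up for the sketch, not data from any real design.

```python
# Hypothetical peak power per subsystem, in milliwatts (illustrative numbers only).
PEAK_MW = {
    "baseband": 400,
    "protocol_stack": 150,
    "image_processor": 350,
    "apps_processor": 600,
    "rf": 300,
}

# Hypothetical operating modes: the set of subsystems that are on together.
SCENARIOS = {
    "voice_call": {"baseband", "protocol_stack", "rf"},
    "camera": {"image_processor", "apps_processor"},
    "web_browsing": {"apps_processor", "baseband", "protocol_stack", "rf"},
}


def worst_case_mw(peak):
    """Pessimistic estimate: assume every subsystem is on at once."""
    return sum(peak.values())


def scenario_mw(peak, active):
    """Power for one operating mode: only the active subsystems count."""
    return sum(peak[name] for name in active)


def peak_over_scenarios(peak, scenarios):
    """The budget that matters: the most demanding realistic mode."""
    return max(scenario_mw(peak, active) for active in scenarios.values())


if __name__ == "__main__":
    print("worst case:", worst_case_mw(PEAK_MW), "mW")
    for name, active in SCENARIOS.items():
        print(name, scenario_mw(PEAK_MW, active), "mW")
    print("peak over scenarios:", peak_over_scenarios(PEAK_MW, SCENARIOS), "mW")
```

The hard part Rowen points to is not this arithmetic but knowing which subsystems are really active in each software mode, which is exactly the information the scenario table here simply assumes.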

As a step towards true system-level power analysis, engineering teams are gradually figuring out that they need to build approximate models of power in addition to simulation environments that are fast enough to run realistic scenarios and to capture real activity. “Ironically getting power information is more than anything else probably a function of getting fast enough simulation, because only if you can run realistic size scenarios will you really gain interesting information,” he said.

This has become one of the big drivers of ESL, which until recently has been relatively slow to catch on. But complexity at advanced nodes, including power considerations, has significantly boosted its appeal.

“What the user would like to have at the very early stages, when he has a TLM model of the design, is at least a relative assessment of what architecture decisions will impact the energy in which direction,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “He will also want to know how the software impacts all of that. From a technology perspective, TLM models allow you to do that so it’s fairly straightforward to annotate power-related data into TLM models,” he asserted.

Annotating models with data, just like annotating performance, is a challenge and can be approached in three ways:

First, he said, “You can start with your assumptions, with your power budget. TLM models and virtual prototypes allow you to then execute your assumptions so you have in your power envelope/power budget. You say, ‘These tasks should take that much power, I know that from past experience,’ and then you execute your virtual platform with those annotated, estimated data or budgeted data. And you get dynamic results depending on what tasks the software ends up calling, how long a cell phone is used for which task in a day, and so forth.”

Second, annotate back from when you have RTL. “At the RTL level you have these switching formats that you can derive from the RTL to get a good idea about the activity,” Schirrmeister continued.

And third, it can be dealt with at the silicon level by taking previous designs, measuring power information and annotating back into TLM models.
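The three annotation approaches above share one mechanism: attach a power number to each task, then let the usage scenario decide how the numbers add up. A minimal sketch follows, assuming hypothetical task names, budget figures and a made-up daily usage profile; it is not how any particular TLM or virtual-prototyping tool represents power.

```python
# Budgeted average power per task, in milliwatts (assumed starting values).
TASK_BUDGET_MW = {"idle": 5.0, "voice_call": 800.0, "video_playback": 1200.0}

# Hypothetical usage profile: minutes per day spent in each task (sums to 1440).
USAGE_MIN_PER_DAY = {"idle": 1320, "voice_call": 60, "video_playback": 60}


def daily_energy_mwh(power_mw, usage_min):
    """Energy per day in milliwatt-hours: power times time, summed over tasks."""
    return sum(power_mw[task] * minutes / 60.0 for task, minutes in usage_min.items())


def back_annotate(power_mw, refined_mw):
    """Return a copy of the annotations with refined values replacing budgets.

    The refined values stand in for numbers derived later from RTL activity
    or from measurements on previous silicon.
    """
    updated = dict(power_mw)
    updated.update(refined_mw)
    return updated


if __name__ == "__main__":
    budgeted = daily_energy_mwh(TASK_BUDGET_MW, USAGE_MIN_PER_DAY)
    # Assumed refinement: the voice call turns out cheaper than budgeted.
    refined = back_annotate(TASK_BUDGET_MW, {"voice_call": 650.0})
    print("budgeted:", budgeted, "mWh/day")
    print("refined: ", daily_energy_mwh(refined, USAGE_MIN_PER_DAY), "mWh/day")
```

The same scenario can be rerun unchanged as the annotations move from budgets to RTL-derived or silicon-derived values, which is the point of keeping the usage profile separate from the power numbers.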

Design engineers are undoubtedly looking for analysis and optimization at the system level so they can do power analysis and power estimation before RTL is available and before they can do gate-level simulations. But are they truly ready to adopt it?

Achim Nohl, technical marketing manager for Synopsys’ solutions group, pointed out that today, power analysis starts with gate-level simulation. “If you talk to a hardware engineer and tell him, ‘We are going to employ virtual prototyping and high-level models to do power analysis,’ he will certainly look at you a little strange because he thinks, ‘I’m doing all those back-end optimizations and all those specific things to optimize power. How will you ever be able to reflect that in a virtual prototype simulation?’ But that’s not the point. For virtual prototyping, the granularity of a system is very much different. You’re not looking at just the memory controller. You’re looking at the CPU with the memory controller, the buses, the interconnect, the peripherals and how all those things are orchestrated to find out where the different hot spots are and what is the best way to program all those pieces. What is the best scheduling technique? That is the concern at that level.”

When a new chip is architected today, estimates are done to determine whether the chip is feasible at all from a power perspective, he said. “Today, people are using spreadsheets in order to do this analysis, and this can only be a worst case analysis because they don’t know the dynamics and can’t reflect the dynamics of the system in those spreadsheets.”

While the pure architectural level tools don’t exist yet, many users are likely content with high-level synthesis tools for the time being. Apache’s Ruby believes they are good in their own respects but they are not actually meant to give architectural guidance; they are just meant to synthesize the design above the RTL.

One final thought for nervous system architects: The architectural tools of the near future will not replace the actual architect unless they become truly artificial intelligence, which is not likely to happen any time soon, Ruby concluded.

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Friday, November 18th, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: What’s missing from our tools arsenal? If you have enough experience you get a feel for what works, but how does the average engineer get there?
Martin: You need to build some tools that can actually do high-level system bookkeeping. That’s really what it’s all about. When you have a power estimation or energy estimation, it produces traces of activity that can feed into an overall system model. Various IP models could all be calibrated to send out that information. People could run scenarios and estimates that way. It still seems to be missing. I recall talking to people about this two or three years ago.
Kulkarni: The industry seems to be doing a lot of optimization techniques in ESL, but not a lot of analysis techniques. We have been doing that for several years at the RT level. A lot of us can do RTL power optimization. But from an analysis standpoint there are very few who can do it. It takes so long to be accurate about how to stimulate RTL without having a synthesis engine underneath. ESL synthesis experts still don’t have the analytic engine for ESL. That’s a missing piece along with ESL power models.
McCloud: We have very good point tools out there today. We have good HLS tools doing very complex, high quality hardware accelerators. We have some pretty good power optimization, power analysis and power integrity tools. What’s missing is a way of productizing the integration of all those tools. It’s standardizing UPF and CPF and getting that propagated through the tools. And then what’s missing on the TLM side is better standardizing around annotating the power and getting the right fidelity to the TLM models. That’s one of the primary purposes of the TLM model: to be able to run real software on it and get accurate estimates. What we need to do is put these good tools together.
Meyer: And we need to recognize that it’s not just an issue of hardware vs. software. There’s also the issue of multiple cores. People have the choice to have some stuff run in a high-speed core and other stuff run on a lower-speed core. Those types of decisions don’t come easily. There’s a fair amount of effort you need to put together a prototype so you get an idea of what the power is if you have two cores vs. one core and an accelerator. There’s a lot of modeling that has to be done to come up with an answer there, even with very early estimates.
Cline: In the case of people with 30 years’ experience who have to trade off between one core, two cores or four cores and the custom logic that goes around it, there are very few of them. You won’t sell a lot of tools into that market. Most of the people with 30 years of experience are grinding RTL every day or writing software. For EDA vendors, you get paid for optimization with an extra zero vs. what you get paid for analysis. It’s the way the world works. If you have a tool that squeezes out an extra 10% at the end of the process, you get paid for that, especially if you’re putting out a fire where you don’t meet timing or something else that’s critical. What’s the time value of a week over the course of a project? At the beginning of a design a week isn’t worth anything. At the end of the project the value of a week is huge. If you sell a product into the last week of a project and it squeezes out another 5% or 10%, you’re a hero.
Martin: On the other hand, the decisions you made in those first few weeks may cost you downstream. There are a lot of design teams out there where it seems there is very little accumulated experience and they’re confronted with these problems and very rapidly trying to do design. Stepping back from the tools and the analysis, you have to ask, ‘What is the overall design methodology?’ Do you have that experience taught to design teams to make choices about how many cores, what kinds of cores, what kind of hardware blocks, and do they all fit together? That architectural expertise has been gained by years of experience.
Kulkarni: It’s almost as if power is where timing was 15 years ago in terms of the knowledge base.

LPE: Except that the people who know timing now need to know power, as well, right?
Kulkarni: Yes. It’s not just mobile or cloud computing. Everything is focused on power, from disk drives to memory. Customers that have 65 watts per chip want to move to 60 watts. They want to move from 5 milliwatts to a few milliwatts. The knowledge of power is limited, though, in terms of power management, power optimization, as well as certain decisions that have an impact downstream. How do we, as an industry, get power analysis and power decisions to be pervasive? Even from my own company we have not given a complete recipe for how to do RTL to GDSII design. It’s in many people’s heads. The end customer also changes their mind on the fly, but at least in 80% of the design, if we can all produce a recipe book then we all benefit. UPF and CPF have started that work already. But when you go to a customer, typically they have some used IP and some new IP. UPF/CPF may apply to the brand new IP but not the old IP, so mix and match flows are another challenge. How do you make sure the previous design worked and certain parts of circuits work in the new design?

LPE: What you’re talking about is flexibility in modeling, right?
Kulkarni: That’s correct.

LPE: HLS has been around for more than a decade and it still isn’t mainstream. Will power force a change in perception?
Cline: It’s certainly going to be a factor. But what drives it is the ability to get your job done in the right amount of time.
Meyer: Yes, it’s time to market.
Cline: It’s also time to results. You have to hit some sort of metric for your results. I was just in Japan and the concern there is how they’re going to get $5 per chip. To do that they need twice the number of features and speed. So pick your favorite HDTV company. If it’s not Vizio, then Vizio is undercutting them by 50%. How do they get their chips to where they’re competitive?
McCloud: I wouldn’t underplay the potential significance of power becoming a critical factor for people adopting HLS. We’re just now scratching the surface in terms of what we can do in HLS. Memories are consuming 60% of the power in a typical HDTV. There’s a whole slew of memory optimization we can do around the way we slice the memory, around memory-enabled gating, light-sleep mode, deep-sleep mode. Those are things that are perfectly suited for an HLS tool, which has a detailed understanding of the state in the design of the data path. The tools already have sequential and combinational clock gating. But as we start to go past 45nm, it’s not just the battery life. It’s also thermal and power integrity issues that become critical. That’s when HLS will really become close to a requirement.
Martin: Any technology that lets you explore the design space for different alternatives, whether it’s HLS or configurable processors, if you’re keen on performance you can examine more alternatives more quickly. If it’s power and it really does let you explore that space, that’s also key. If it’s just within a few percentage points of what you do in RTL by hand, that’s not going to drive that market. It has to offer a wide margin.
Cline: It’s also what’s called ‘change and check.’ You can change something very quickly and check what the results are. We see a lot of engineers that need to change a number of things and then check them, and then change and check more. You can’t do that in RTL. It gives them a whole other set of options.
McCloud: One of the things we really need to do is shrink-wrap the methodology. Then the smaller companies can pick it up and run with it. Right now it takes a big company to put that investment behind it.
Cline: I agree, except that the very small companies use it because they can’t get the funding to go build a $50 million chip. The middle of the curve are the guys who can’t move just yet.
Kulkarni: The ideal solution for designers would be HLS for optimization of power, timing and area, and then quickly checking against the RTL power analysis. The reason is that at the RTL level you can capture physical effects. That becomes a linchpin from the physical world to the ESL world. But you also have to have good power models.

LPE: Don’t you also need more standardization?
Meyer: You certainly need to be sure that you’re not double counting. A lot of times you’re modeling more than just the software, and you’re trying to estimate the power there and you’re starting to include what’s in the memory and the cache. And then you start looking at the cost somewhere else, and you’re adding the cost in again. You have to have a way to say, when you aggregate it, how do you make sure it’s not double-counted.
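Meyer's double-counting concern can be illustrated with a small aggregation sketch. It assumes that every power report is tagged with the component it covers, and it uses a simple "first report wins" policy; both the tagging scheme and the policy are assumptions chosen for the example, as are the model names and numbers.

```python
def aggregate_power(reports):
    """Sum power over components, counting each component only once.

    reports: iterable of (model_name, component, milliwatts) tuples.
    Returns (total_mw, duplicates), where duplicates lists the
    (model_name, component) pairs that were skipped because an earlier
    model had already accounted for that component.
    """
    counted = {}
    duplicates = []
    for model, component, mw in reports:
        if component in counted:
            duplicates.append((model, component))
            continue
        counted[component] = mw
    return sum(counted.values()), duplicates


# Hypothetical reports: the software model already includes the cache,
# and the memory model reports the same cache again.
REPORTS = [
    ("software_model", "cpu", 300),
    ("software_model", "l2_cache", 80),
    ("memory_model", "l2_cache", 80),
    ("memory_model", "dram", 120),
]

if __name__ == "__main__":
    naive = sum(mw for _, _, mw in REPORTS)
    total, dups = aggregate_power(REPORTS)
    print("naive sum:", naive, "mW")
    print("deduplicated:", total, "mW, skipped:", dups)
```

A real flow would need an agreed way to name components across models, which is where the standardization question raised above comes in.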

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Friday, November 11th, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Does the approach to using tools have to change for low-power?
McCloud: Way too often, when people get to the block level, they start saying, ‘I’m going to use my functional tests.’ That’s completely wrong. That’s one of the big advantages of doing things at the higher level. You’ve got things that are closer to the real application running. That’s an incredible difference.

LPE: But how do you integrate that concept with different use models? Two different users may use the device for completely different purposes and in different ways.
Cline: You can go through the user profiles pretty easily. If Apple claims its new phone will have eight hours of talk time, that probably doesn’t include me not talking on the phone at all. I’m just going to surf and get e-mails and do other things. They understand one profile, and then there’s probably a mixed profile for the average user. You can conceptualize that at the system level when you’re doing your design, figuring out if the phone is running for eight hours straight here’s what’s probably going to happen on your next project. But you need to get down to something measurable, which right now is the RTL level, and even that is questionable in some situations. But it’s a tough problem.
McCloud: Today, one of the more common applications of high-level synthesis in the cell phone has to do with the image signal-processing blocks. A large number of these ISPs (everything from the sensor to the image correction that’s occurring to the final JPEG decoding) are done with high-level synthesis. The reason is that having a dedicated hardware accelerator is going to produce the lowest power needed for doing image signal processing. It’s a very specific function. You take a picture, you do image correction and processing, you compress it, store it and you’re done. You shut the hardware down.

LPE: High-level synthesis isn’t normally associated with power estimation. Is this a new use?
Cline: It’s been there to some extent. Power estimation comes in a number of different forms. Let’s say you’re laying down a clock cycle and designing it in high-level synthesis, and the tool is doing the work to fill it up. It knows that a multiplier is going in. It can do quick analysis on a multiplier with a reasonable stimulus to figure out whether this multiplier has a better power profile than another one, as long as they both meet your performance goals. You can trade off the area with them, too. One of the things we can do is go through this process of scheduling the clock cycles, filling it up with all the different functional units that are going to go in there, and determine at a later stage in the synthesis process which are the lowest-power multipliers. Can you swap those in and still meet all your timing budgets? If you blow timing then your tool is useless. You have to meet timing, and afterward you minimize area and power and let the user make tradeoffs where they want to. It’s been there for a long time with various levels of maturity. What you’re going to see is that will continue to mature, but you have limited benefits at that level. You can make 5% and 10% improvements at that level. You can’t make 50% improvements.
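The multiplier swap Cline describes, meet timing first and then minimize power and area, can be sketched as a simple selection rule. This is an illustration under stated assumptions, not the algorithm of any HLS tool: the `Multiplier` record, the candidate names and all delay, power and area figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Multiplier:
    """Hypothetical characterization of one multiplier implementation."""
    name: str
    delay_ns: float
    power_mw: float
    area_um2: float


def pick_lowest_power(candidates, clock_period_ns):
    """Pick the lowest-power unit that still fits in the clock period.

    Timing is a hard constraint; power is minimized next, with area as
    the tie-breaker. Returns None if no candidate meets timing.
    """
    feasible = [m for m in candidates if m.delay_ns <= clock_period_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda m: (m.power_mw, m.area_um2))


# Illustrative library: faster units cost more power and area.
CANDIDATES = [
    Multiplier("fast", delay_ns=2.0, power_mw=5.0, area_um2=900.0),
    Multiplier("medium", delay_ns=3.5, power_mw=3.2, area_um2=700.0),
    Multiplier("slow", delay_ns=6.0, power_mw=2.0, area_um2=500.0),
]

if __name__ == "__main__":
    for period in (1.0, 4.0, 10.0):
        choice = pick_lowest_power(CANDIDATES, period)
        print(period, "ns ->", choice.name if choice else "no unit meets timing")
```

Because the choice is confined to swapping one functional unit for another, the gains are bounded by the spread within the library, consistent with the 5% to 10% range mentioned above.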
McCloud: It’s evolving. Power has been in HLS for a while. It’s being used to design applications for lower power. The first stages of this centered around simple exploration. You take a JPEG and your requirement is to compress that picture in 500 milliseconds. You can do that with a 25MHz clock, a 50MHz clock or a 100MHz clock. Each one of those has very different power tradeoffs. That kind of capability has existed in HLS for years.

LPE: But you’re coming at it from the standpoint of clock speed compared with power first.
Martin: That’s only true if you have a fixed-instruction-set processor. When you design your own instruction set you’re in a whole new ballgame, which is one reason we’ve been supporting high-level analysis and estimation in the design flow for a number of years. The power talk is interesting because of the confusion people have between power and energy. In mobile devices, assuming you have driven toward a peak-power level that is sufficient for low-cost packaging you’re targeting in your device, it’s all about energy. Energy is the issue here, and there are many ways to fit into a particular energy budget, but so often people express themselves using metrics such as milliwatts per megahertz. What does that mean in terms of the overall energy? It depends on what a milliwatt does. If you target one type of instruction set where you can do more in one cycle than in another, that reflects in the total energy consumption.
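Martin's question about what a milliwatt per megahertz means for total energy has a short arithmetic answer: 1 mW/MHz is 1 nJ per clock cycle, so the energy for a task is that figure multiplied by the number of cycles the task takes. The sketch below compares two hypothetical processors; the names, ratings and cycle counts are made up for illustration.

```python
def task_energy_uj(mw_per_mhz, cycles):
    """Energy for one task in microjoules.

    1 mW/MHz = 1e-3 W / 1e6 Hz = 1e-9 J per cycle (1 nJ per cycle),
    so energy in nJ is mw_per_mhz * cycles; divide by 1000 for uJ.
    """
    return mw_per_mhz * cycles / 1000.0


# Hypothetical comparison for the same task.
# The generic core looks better on mW/MHz but needs more cycles;
# the tailored core does more work per cycle.
GENERIC = {"mw_per_mhz": 0.10, "cycles": 1_000_000}
TAILORED = {"mw_per_mhz": 0.15, "cycles": 400_000}

if __name__ == "__main__":
    print("generic: ", task_energy_uj(**GENERIC), "uJ")
    print("tailored:", task_energy_uj(**TAILORED), "uJ")
```

With these assumed numbers the core with the worse mW/MHz rating finishes the task on less energy, which is the distinction between power and energy that the metric hides.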
McCloud: For me it all boils down to battery life. That’s what matters. There’s a second component that is centered around power and integrity. The reason it’s so important is that when you reach 45nm we’re starting to reach a technology inflection point where you cannot scale the supply voltage any more to help reduce the power. At about 45nm or lower, the power density goes non-linear. This is going to create huge problems around thermal and supply integrity. You’re going to start getting a Vdd dropout, you’ll have hot spots in your chip, and the chip will burn up. It’s not just about battery life. It’s how we’re going to be able to take advantage of these technologies in the future and be able to produce these chips in a way that power density doesn’t go through the ceiling.
Kulkarni: Specifically what we’re looking at is how do we get the high-level power support budgets versus the power consumption, which is insatiable demand of all functionality and multiple modes of operation. How do we make sure that the power grid we’ve designed will work? We’ve been watching that stimulus carefully. But what happens to that stimulus out of millions of clock cycles? There are things in the context of dynamic voltage, voltage route, and the package, and the PCB, and the system. You have a band of inaccuracy first, and then you look at the energy models and what happens over time. How do you capture those when you are switching between a lot of domains and there is a lot of switching activity? How do you model that accurately? And how do you model the physical effects at a higher level of abstraction so that your inaccuracy band gets narrower and narrower? We especially see that below 28nm, where there are huge transients causing voltage droop. Either your grid will collapse if you overdesign, or if you underdesign you will have electromigration problems with power energy and heat all coming together.

LPE: What you’re talking about here is a hierarchical flow with two-way communication, right?
Kulkarni: Yes. And the reason we have not done too much power synthesis at the ESL level in the past is that when you make a transaction-level model, how do you go inside that? Power creates a different level of challenge. The industry really needs to address how to create these high-level models that will tune into what happens down at the chip level. And then you have to connect the front end to the back end and get to the details of power in both directions.

LPE: How do we fix software to make systems more efficient?
Meyer: You’re presupposing that you have software at the time of the design. That’s one of the biggest challenges. That’s where virtual platforms become an important part, running real software on the system. That’s one of the real challenges at the system level—to have something you can run in software early enough to influence the hardware decisions that you’re making.

LPE: Do we need an understanding of the software and how it’s going to function at a very high level, though?
Meyer: For some cases, if you could characterize how much the software is using each of the blocks and be able to understand system performance without detailed modeling, that would help you understand your power budget and do a better estimate. But we really haven’t spent much time working at that level yet.
Martin: We spend a lot of our time working at that level. Letting our customers build configured and tailored processors is extremely important. Sometimes you can do an interesting job by taking referenced or standards-based software, and if you have a very good compiler for an application-specific processor it may be able to do things like automatically vectorize and infer the use of some fairly sophisticated instructions. That has a limit, though. To really figure out what an optimal algorithm implementation would do on a particular instruction set you may have to get into more manual optimization. But some of the early work can be done with an envelope if you have a good targeting compiler.
Cline: A lot of our customers have that exact issue. With cell phones they may look at their next platform and say, ‘This time it’s going to have real-time video on it. Can my ARM processor run this real-time video algorithm at the same time it’s doing a network connection to beam everything up while it’s also downloading e-mail?’ And they may not be ready to buy the next ARM processor. So if they put this into custom gates, what is the cost in terms of area and power and what is the speed? They do that initial analysis using high-level synthesis and figure out what the tradeoff is. A lot of times they can buy a bigger processor and take on more royalty cost or power issues. So in some cases they have the software already, or at least they know what it’s going to look like. In other cases when you build a bigger system you may not have that.
Kulkarni: One of our customers who was designing a digital TV asked us whether they can profile the software for power. It’s an interesting question. With digital TV your eyes are pretty much looking at an oval field picture, so can you do power reduction on the black pixels on the edges? That’s pixel-by-pixel power reduction. That’s a great challenge for all of us. It’s not just mobile applications. It’s also digital TV, streaming video, heads-up displays for military applications, and so on.
Cline: Those guys don’t care about battery life, which makes it very interesting.
Kulkarni: But they still want to reduce the power.
McCloud: It’s all about the packaging cost.

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Thursday, November 3rd, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Do we need to redefine what constitutes a system when we’re talking about power?
Meyer: It definitely is part of a bigger picture. The problem is that there is very little support up at those levels. You’re managing budgets and it’s a crude mechanism you need there. As we go forward things like virtual platforms and power—or maybe more on the energy level where you’re looking at where power is spent between hardware and software—and making decisions on how things should be implemented makes a lot of sense.
Martin: We’ve supported power modeling in our application-specific processor design flow for a number of years. The kind of power modeling you get can be quite accurate for the instruction set. It’s reasonable to allow you to make energy tradeoffs with the instructions you have. You can then run that set of instructions and predict what the energy consumption is going to be, and then you can do experiments around that. We feel like we’ve been a plug without a socket for years, at least for determining the right instruction set.
Kulkarni: In terms of how we see the world, it’s more than absolute accuracy and prediction. It’s really relative accuracy and relative analysis of what-if scenarios. That’s where the whole world is going in terms of macro architecture analysis. But the Holy Grail is ESL synthesis with RTL power analysis. That’s the ideal flow—you capture some of the physical effects in the RTL world but do a lot of what-if tradeoffs, hardware-software co-simulation, DVFS, looking at various power scenarios, and then validate that through a real hardware description language because that’s where all the realization will occur. But you can do all the relative accuracy of power up at the ESL level. That’s the picture we see from our end customers.

LPE: So it’s big-picture power estimation?
Kulkarni: Power estimation is still questionable at the moment. But getting traces to drive the RTL power analysis is a much better approach, which we have used in a mobile application. We took a 2.5 million-gate design and applied SystemC-level simulation, then did ESL with a partner, and then looking at the IP and DSP models there were multiple cores. We took the transaction-level trace and combined it with RTL analysis to essentially emulate what would happen on a six-second call on a cell phone. We could analyze that in four hours using this flow vs. three months using pure RTL analysis and RTL power estimations. So a combination of ESL synthesis plus RTL analysis that captures a realistic stimulus and the physical effects can reduce that band of accuracy from RTL to gate to final signoff.
Cline: Today that’s a typical flow. There isn’t any real SystemC analysis—or good ones, anyway. But as far as optimization and estimation go, these are two separate worlds. The first one is that somebody has to get the power right on a macro level. You need some way to model those larger blocks and get your power budgets right at the block level, which may have only 10 blocks in your design. From there you need to go to power estimation. You go through synthesis, go to RTL estimation, and then loop the information back into your system level. There has to be some sort of modeling at the higher level, with more parameters than just the performance numbers. There has to be some other quick estimation of area using a synthesis engine and quick estimation of power using a combination of synthesis and techniques through application-specific processors or RTL.
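The trace-driven approach Kulkarni describes can be illustrated with a minimal sketch. It assumes each transaction type has a fixed energy cost obtained from a one-time RTL characterization; the transaction names, energy values, and the `average_power_mw` helper are all hypothetical and are not taken from any vendor's flow.

```python
from collections import Counter

# Hypothetical per-transaction energy costs in microjoules, assumed to come
# from a one-time RTL power characterization of each block.
ENERGY_PER_TRANSACTION_UJ = {
    "bus_read": 1.5,
    "bus_write": 2.0,
    "dsp_mac_burst": 12.0,
    "modem_frame": 40.0,
}


def average_power_mw(trace, duration_s, energy_table=ENERGY_PER_TRANSACTION_UJ):
    """Estimate average power (mW) for a transaction-level trace.

    `trace` is an iterable of transaction names recorded over `duration_s`
    seconds of simulated activity.
    """
    counts = Counter(trace)
    total_uj = sum(energy_table[name] * n for name, n in counts.items())
    # microjoules per second is microwatts; divide by 1000 for milliwatts.
    return total_uj / duration_s / 1000.0


# Illustrative trace standing in for a six-second call scenario.
trace = (
    ["bus_read"] * 4000
    + ["bus_write"] * 1000
    + ["dsp_mac_burst"] * 500
    + ["modem_frame"] * 3
)
power_mw = average_power_mw(trace, duration_s=6.0)
print(f"Estimated average power: {power_mw:.3f} mW")
```

The point of the sketch is only that a realistic stimulus is captured once at the transaction level and reused, so what-if runs reduce to cheap bookkeeping rather than full RTL simulation.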

LPE: How accurate does the initial high-level power analysis or estimation have to be?
Martin: Our experiments with our processor estimation have been that for the RISC core you can be plus or minus 3%. For other things it can be more like plus or minus 20%. The key is to make macro tradeoffs. It’s not whether this is 5% better than that one. It’s whether this takes you in a different space than that one, and for that 20% or 15% is an adequate coarse-grained analysis. People always want to verify at RTL and maybe down to place and route that the decisions they made at the high level are being validated at the implementation level.
Kulkarni: What we find is a band of inaccuracy, as opposed to absolute numbers. About 30% is adequate as you go through RTL, RTL synthesis, P&R, layout, and final grid signoff. If you keep that band and narrow it down consistently, you get a true power budgeting solution at the system level. That includes hardware-software tradeoffs, and RTL to synthesis all the way to grid design and package. That way the ESL designer working on the next-generation smart phone is not completely off the mark in terms of final cost, power and SI budgets.
McCloud: From a high-level synthesis perspective, accuracy needs to be quite high. At the end of the day you’re doing hardware architectural exploration between different frequencies and technology. If you are off 20% or 30% when you’re trying to make your design selection it’s significant. It does come down to the accuracy of the up-front power estimation tool when you get closer to the hardware you’re trying to create. If you’re talking about something before that in the TLM platform space, at that level when you’re making decisions about whether to move something into software or keep it in hardware, accuracies in the range of 30% or 40% are sufficient. But if you’re creating a hardware accelerator, you need to be within 10% or 20%.
Meyer: If you’re consistently overestimating by 20% and you know that’s happening, that’s much more acceptable. But if you’re plus 20% in one area and minus 20% over here, then your confidence disappears very quickly.
Martin: You have to be monotonic. If the estimator says A will be greater than B, then by the time you get detailed analysis A had better be something greater than B—even if the actual numbers aren’t the same.
McCloud: And that’s the problem. If the estimations in high-level synthesis are off by 20% or 30%, you run a high degree of risk that your relative comparison between one solution and another solution is not the same relativity when you go down to RTL synthesis at the gate level. If you think solution A is 5 milliwatts and another solution is 7 milliwatts, when you go to actually implement that and run it through power estimation tools at the gate level, you might find that comparison is incorrect. That’s why I believe you need a relative level of accuracy.
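The distinction drawn above, between a consistent bias and errors that flip sign, can be checked mechanically. The following is a minimal sketch, assuming early estimates and gate-level reference numbers are available for the same candidate solutions; the function names and the milliwatt values are hypothetical.

```python
from itertools import combinations


def ranking_preserved(estimated, reference):
    """Return True if every pairwise ordering in `estimated` agrees with
    `reference` (the monotonicity property described above).

    Ties in either data set are not treated as violations.
    """
    for a, b in combinations(estimated, 2):
        est_diff = estimated[a] - estimated[b]
        ref_diff = reference[a] - reference[b]
        if est_diff * ref_diff < 0:  # opposite signs: the ordering flipped
            return False
    return True


def relative_errors(estimated, reference):
    """Signed relative error of each estimate against its reference value."""
    return {k: (estimated[k] - reference[k]) / reference[k] for k in estimated}


# Hypothetical gate-level reference power in mW for three candidates.
reference = {"A": 5.0, "B": 7.0, "C": 10.0}

# Consistently 20% high: absolute numbers are wrong, but ordering holds.
consistent = {"A": 6.0, "B": 8.4, "C": 12.0}

# Plus 20% on A, minus 20% on B: the estimate now ranks B below A.
mixed = {"A": 6.0, "B": 5.6, "C": 10.0}

print(ranking_preserved(consistent, reference))  # True
print(ranking_preserved(mixed, reference))       # False
```

In this sketch both estimate sets stay within a 20% band, yet only the consistently biased one leads to the same design choice as the reference numbers.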

LPE: Is it harder to get an accurate assessment as we start adding in multiple power islands, voltage rails, and stacked die packages?
McCloud: It might get a little bit easier. Of course you need to be able to architect your high-level synthesis tool to be able to take into consideration that you’ve got islands, but in some respects you’re localizing the power estimation to a particular region of your design. When you’re talking about power gating of an entire hierarchical block, that’s actually a benefit when you localize it to a specific area.
Cline: The problem that high-level synthesis tools will have in the future, if there’s not a closer correlation to the back end, is having a disconnect at 20nm or 1nm.
Meyer: And having some way of passing down what you think is a good implementation for this piece of it. There has to be something to express, ‘I expect this to be a high Vt and this to be a low Vt.’

LPE: Does it make it harder to pick which processors and which IP and which interconnects you’re going to use because you are running at such a high level?
McCloud: The further you get away from the silicon the greater the impact you can have on power. At the gate level a power expert can save the design 10% to 15%. If you get up to the TLM decisions with software and hardware, you can achieve huge power savings. Maybe you only have 30% accuracy, but the decisions you make can have a bigger impact.
Martin: That’s where configurable processors can open up a whole new area. You can get a 10-to-1 improvement in performance in terms of the instructions for deep dataplane applications. You can get a 3-to-1 improvement in energy consumption.
Kulkarni: You are doing so many tradeoffs at ESL that you are purposely making assumptions based on how long it will take. If you add more accuracy to ESL what that means is you are really adding synthesis. That will explode the runtime. The designs are getting so complex that for the next tablet there will be 1 billion gates. That’s an unheard-of number for mobile applications. To do a what-if analysis for that kind of application it’s critical that it gets done quickly. In GPS vs. the iPod, for example, it’s critical to determine what kind of stimulus can be provided for ESL. That’s the missing link for root power analysis. It’s better to optimize timing and area, but it gets more difficult to optimize for power. The more analysis you do, the more synthesis you put under the hood, which will expand the runtime.
McCloud: I don’t think you can do any reasonable level of power estimation without some realistic switching activity represented. Otherwise your estimates will be way off.
Martin: That just emphasizes the use scenarios. We run into design teams sometimes that don’t seem to fully understand the system constraints they’re operating under. They can’t identify whether they want to operate at a 200MHz operating point and a 40LP process, vs. a 500MHz operating point. But if you target close to the edge of a process the results you get in terms of energy consumption and peak power consumption can be way off. People really need to understand the usage scenarios and the constraints that system architectures put on operating points and operating constraints. Sometimes that’s missing.

Using High-Level Synthesis To Manage Power

Wednesday, November 2nd, 2011

Low-Power Engineering talks with Apache Design’s Vic Kulkarni, Tensilica’s Grant Martin, Cadence’s Mike Meyer, Calypto’s Shawn McCloud and Forte Design’s Brett Cline about the need for a higher level of abstraction to optimize power in ICs.

Considerations For Choosing The Right Low-Power Tools

Thursday, October 15th, 2009

By Cheryl Ajluni
Regardless of what you are designing these days, one fact holds true: Your design is only as good as the design tools you use.

Gone are the days when a design could be done on the back of a napkin. Today, engineers require a complex ecosystem of interworking tools to guide them through the complex design flow. This is especially true when it comes to low-power design, as its complexity now permeates every aspect of the design flow, creating challenges that threaten to derail design closure at every turn. Here, automated design tools can play a key role in speeding the design process, selecting an optimal low-power architecture and ensuring design closure.

The problem, of course, is low-power or “power-aware” design tools and flows are still in their infancy—a fact that poses a bit of a dilemma for designers. Not only do they need to figure out what type of power management and low-power design techniques to employ, but they must also determine which tool vendors support those techniques. Then they have to evaluate the possible tool options and make a selection. This can be a stressful and time-consuming process, especially when you consider the decision is critical to the success of any design project and, for that matter, to a company’s overall success and vitality. While there are no hard and fast rules for selecting the right tool, or the right vendor, there are a number of considerations—over and above a tool’s verified functionality—that engineers can use to help simplify their decision. Those considerations include:

  • Cost. A tool’s actual cost and its available pricing options are important considerations when evaluating a design tool. Of course, a tool’s true cost is also impacted by its learning curve and overall reliability—both of which can affect downtime—and therefore must also be considered prior to making a tool purchase.
  • Speed. While it may not always seem like a key consideration, how fast a tool operates can directly impact the designer’s time-to-market schedule as well as overall design costs and therefore should not be overlooked. Was it designed for multicore processors, or simply updated to take advantage of them?
  • Support for Industry Standards. Using a tool built to emerging low-power industry standards, such as the Common Power Format (Cadence and Magma) or the Unified Power Format (Synopsys, Mentor and Magma), ensures that it will interoperate with a range of other design tools and flows. It is also smart to select a tool that can be used within industry-accepted reference flows, such as the power-aware reference flows recommended by the Low-Power Coalition (LPC) of Si2 for CPF or by Accellera for UPF.
  • Ease of Use. Is the design tool easy to use? Does it require special training or low-power design expertise? Does it make you more efficient or productive? Does it support multi-language user interfaces for globally dispersed design team members, and are the user interfaces familiar? Is it easy to deploy, administer and maintain? Does it integrate well with other low-power design tools and design flows? All of these factors should be carefully considered during a tool’s evaluation.
  • Flexibility. Is the tool flexible enough to accommodate changes in technology, and can it adapt to changing business conditions, an especially critical question given the current state of the global economy? Can it support the needs of a globally dispersed design team with features like revision control and policy control for IP management?
  • Customer Support. How responsive a tool vendor is to your support needs can be vitally important to the success or failure of your low-power design. Does the vendor provide quality documentation, training when needed or on-site technical support? Does the vendor have proven expertise in low-power design? Such expertise may prove invaluable if you find yourself facing a difficult low-power design problem.
  • Vendor Credibility. Don’t forget to verify the tool vendor’s reputation with other designers. If they have had trouble with the vendor, then chances are good that you will, too.
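One way to apply the considerations above is a simple weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-to-5 ratings, and the tool names are hypothetical placeholders, and a real evaluation would set them from the project's own priorities.

```python
# Hypothetical weights for the seven considerations listed above.
CRITERIA_WEIGHTS = {
    "cost": 0.20,
    "speed": 0.15,
    "standards": 0.15,
    "ease_of_use": 0.15,
    "flexibility": 0.10,
    "support": 0.15,
    "credibility": 0.10,
}


def weighted_score(ratings, weights=CRITERIA_WEIGHTS):
    """Combine per-criterion ratings (for example 1 to 5) into one score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Normalize so the weights need not sum exactly to 1.
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


# Placeholder ratings for two fictional candidate tools.
candidates = {
    "tool_x": {"cost": 3, "speed": 5, "standards": 4, "ease_of_use": 3,
               "flexibility": 4, "support": 2, "credibility": 4},
    "tool_y": {"cost": 4, "speed": 3, "standards": 4, "ease_of_use": 4,
               "flexibility": 3, "support": 5, "credibility": 4},
}

scores = {name: weighted_score(r) for name, r in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

A matrix like this does not replace hands-on evaluation, but it makes the tradeoffs between criteria explicit and easy to revisit when priorities change.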

Design Tool Options
Despite the fact that low-power design tools and flows are still relatively new, there are a number of options to choose from. A sampling of these tools includes the following:

  • Catapult C Synthesis and SpyGlass-Power, from Mentor Graphics and Atrenta, respectively. SpyGlass-Power is an RTL power estimation and reduction tool that is used to automate multi-level clock gating. Catapult is a high-level synthesis tool that offers a fast path to verified RTL from pure C++. New low-power optimizations enable the tool to thoroughly analyze a design to determine gateable clocks and build the appropriate logic. An interface now exists between these two tools that allows RTL output from Catapult to be handed off to SpyGlass-Power. Static and dynamic power estimates from SpyGlass-Power can then be fed back into Catapult C.
  • Eclypse Low Power Solution from Synopsys. Eclypse is an integrated flow of tools, intellectual property and methodologies that allows designers to include everything from MTCMOS power gating and multiple voltages to dynamic voltage and frequency scaling. The goal is to dramatically simplify design and the increasingly complex verification portion of that design. Eclypse also includes clock gating, low-power clock tree synthesis and leakage power recovery. As you might expect, it includes UPF support, as well as support for the Low-Power Methodology Manual created by Synopsys and ARM.
  • Cadence Low-Power Solution from Cadence Design Systems. Cadence’s Low-Power Solution is a CPF-enabled design-to-signoff methodology that makes it easy to incorporate low-power design techniques in advanced SoCs. It includes tools like the InCyte Chip Estimator for chip planning; Encounter RTL Compiler for logic synthesis; Encounter Conformal Low Power for structural, functional and equivalence checking; the Encounter Digital Implementation System for physical implementation; the Encounter Power System for power rail analysis; and Incisive Formal Verifier for formal property checking (Figure 1).

Figure 1. The Encounter Power System solution accelerates power optimization and signoff with a unified timing and power database. It can be used by front-end logic designers seeking high-quality early power and rail analysis, as well as by back-end physical designers looking for comprehensive signoff analysis and silicon-correlation.

  • PowerPro CG and PowerPro MG, from Calypto Design Systems (www.calypto.com). The PowerPro CG tool reduces power by implementing sequential clock gating logic in the non-memory portions of an RTL design. PowerPro MG is a memory gating tool that automatically generates power-optimized RTL by taking advantage of the low-power modes available in on-chip memories. It works with PowerPro CG to produce the lowest power design possible.
  • Talus Implementation System, from Magma. The Talus implementation system provides a fully integrated RTL-to-GDSII flow for high-performance, high-complexity, low-power nanometer designs. Talus Design and Talus Vortex are key tools in the system. Talus Design is a full-chip synthesis environment, while Talus Vortex is a physical design environment. Another tool, Talus Power Pro, works in conjunction with Talus Design and Talus Vortex to enable optimal power management throughout the flow.
  • PowerArtist-XP and PowerTheater, from Sequence Design (now part of Apache). PowerArtist-XP is an RTL Design For Power (DFP) platform that features fully integrated advanced analysis and automatic reduction (Figure 2). Using it, designers can achieve power savings of 10 to 60 percent or more. PowerTheater is a solution for RTL power analysis.

Figure 2. PowerArtist-XP enables designers to make intelligent design decisions that maximize power savings while minimizing design impact.

The Bottom Line
While designing for low power remains a difficult and complex challenge these days, appropriate use of low-power (power-aware) design tools can help simplify the process. Such tools will only become better and easier to use with time. Of course, selecting the right tool or tools is absolutely critical to a successful low-power design, perhaps just as critical as determining which low-power design and power management techniques to implement. While there is no set criterion to follow when making this decision, the considerations outlined above can serve as a guide in helping to make your decision that much easier.