Posts Tagged ‘Tensilica’


Experts At The Table: Making Software More Energy-Efficient

Thursday, January 12th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: Software is causing as many problems in low-power designs as anything else. How do we fix that?
Neifert: Software is creating problems everywhere because software is driving everything. Customers are using software for more and more stuff, including software-driven verification and architectural analysis. It’s only natural to take traditional back-end tasks such as power and move those forward to enable that analysis to be done earlier in the process and see how the software will impact the system. As software increasingly dominates power usage, because the processor and IP guys are getting smarter and smarter about turning things off when they’re not being used, you can’t just blindly use some verification vectors. You need to really see how it’s being used by the software. That requires virtual prototypes in conjunction with power tools.
Hardee: The software is controlling everything, but in recent years we’ve seen an explosion of applications doing all kinds of things you want to do on mobile devices. One thing we see people struggling with is the best way of optimizing for power for different applications, which have different needs. If you’re watching video, you have a frame rate to deal with. Rather than shut down, you’ll probably want to optimize frame lengths or run as slow as possible and make sure memory transactions are being served from cache. That’s completely different when you’re on the Web, where you want to run as fast as possible to get a good image and shut down. What you’re actually doing with a device will require very different power strategies.
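Hardee's contrast between video (work paced by a frame rate) and Web browsing (finish fast, then shut down) can be made concrete with a toy energy model. The sketch below assumes dynamic power scales as C·V²·f, that leakage is paid only while the core is awake, and that the core sleeps for the rest of each frame once its work is done. Every constant, the `OperatingPoint` values and the `energy_joules` function are hypothetical, not data for any real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float  # clock frequency
    volts: float    # supply voltage assumed to be needed at that frequency


def energy_joules(cycles: float, op: OperatingPoint, window_s: float,
                  cap_farads: float = 1e-9, leak_w: float = 0.05,
                  sleep_w: float = 0.005) -> float:
    """Energy to finish `cycles` of work inside a fixed window, then sleep.

    All default constants are hypothetical placeholders for illustration.
    """
    busy_s = cycles / op.freq_hz
    if busy_s > window_s:
        raise ValueError("work does not fit in the window at this frequency")
    dynamic_w = cap_farads * op.volts ** 2 * op.freq_hz  # P_dyn = C * V^2 * f
    active_j = (dynamic_w + leak_w) * busy_s              # awake: dynamic + leakage
    sleep_j = sleep_w * (window_s - busy_s)               # asleep for the remainder
    return active_j + sleep_j


# One video frame's worth of work (hypothetical) at 30 frames per second.
FRAME_CYCLES = 10e6
FRAME_WINDOW_S = 1 / 30
FAST = OperatingPoint(freq_hz=1.0e9, volts=1.1)
SLOW = OperatingPoint(freq_hz=0.4e9, volts=0.8)

if __name__ == "__main__":
    for name, op in (("fast, then sleep", FAST), ("slow and steady", SLOW)):
        print(name, energy_joules(FRAME_CYCLES, op, FRAME_WINDOW_S))
```

With these made-up numbers, running slower at a lower voltage wins for the 30 frame/s workload, but raising `leak_w` enough flips the result toward racing to idle, which is why the right strategy depends on the application and the process.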

LPE: So how do you analyze that?
Hardee: You have to run a lot of cycles in the various system modes to be able to model it. That’s where it really gets very disconnected from the test vectors in logic simulation. You can’t use those anymore. You have to have a set of vectors that is representative of the various application modes you have on the device. That’s a big change for a lot of people.
Kaiser: If you’re going to have software guys doing power optimization, you need to get them some kind of metric to measure or estimate the power. If you go to a software engineer today and ask them to optimize power, how do they know if they’re doing better or worse? Giving them a power meter helps. Giving them a power meter that self-calibrates helps a lot more because the software guy really doesn’t know what calibration is for A-to-Ds. He sees negative current and figures he must be doing something wrong because he’s charging the battery. First, give these engineers something to measure. Second, when you’re doing a product you have to identify all these different use cases. Internet surfing is one. Video playback is another. Software teams need to optimize individual use cases. But to do that, you have to figure out how much of each you’re going to be doing. A cell phone is sitting in idle 99% of the time, but 99% of the energy is not used on standby. So you have to look at the different use cases and figure out which use case you really want to optimize.
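Kaiser's point that a phone idles 99% of the time without spending 99% of its energy there is a weighted sum: each use case contributes its time fraction times its average power. The following sketch ranks use cases by their share of energy. The profile numbers are invented for illustration, and `energy_share` is a hypothetical helper.

```python
def energy_share(profile: dict) -> dict:
    """Fraction of total energy spent in each use case.

    `profile` maps a use-case name to (fraction_of_time, average_power_w).
    The time fractions must sum to 1.
    """
    if abs(sum(t for t, _ in profile.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    energy = {name: t * p for name, (t, p) in profile.items()}
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}


# Hypothetical daily profile: mostly idle, with short bursts of heavy use.
PHONE_PROFILE = {
    "idle": (0.99, 0.01),
    "video playback": (0.005, 1.0),
    "web browsing": (0.005, 1.5),
}

if __name__ == "__main__":
    shares = energy_share(PHONE_PROFILE)
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {share:6.1%}")
```

With these assumed numbers idle still accounts for a little under half of the energy, while the two short active use cases together account for more, so both standby power and the heaviest active case are candidates for optimization.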
Kulkarni: What we’re really concerned with is energy consumption over time. Instantaneous power, dynamic power and static power are well known, but energy consumption over time is where software enters the picture and turns it into system-driven power consumption analysis. If you look at power times time, it’s the clock frequency and the duration of the cycle. That’s where most of the applications are causing all these headaches. Why do a GPS application, YouTube and music have such different energy profiles? We can control instantaneous power and dynamic power quite well at RTL and below. Static power can be controlled quite well. But testbenches that look at the overall functional verification are not relevant anymore because you need to look at the states, too. And depending on which application is used, the energy will be different. We need to instrument that with respect to the power states. How you create the right test vector set or testbenches becomes a real challenge. Then there is the question of average power, and of average voltage versus average current, over time. That’s where the issue of co-simulation comes in. Running the complete simulation on a virtual platform becomes more interesting. At the moment, instrumenting the software is not possible.
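Kulkarni's distinction between instantaneous power and energy over time amounts to integrating power across the sequence of power states an application drives. A minimal sketch, assuming a piecewise-constant power per state; the state names, wattages and the `trace_energy` function are hypothetical.

```python
def trace_energy(trace, state_power_w, end_s):
    """Integrate power over a piecewise-constant power-state trace.

    trace: list of (start_time_s, state), sorted by time; each state holds
        until the next entry, and the last one holds until end_s.
    state_power_w: maps each state name to its power in watts.
    Returns (energy_j, average_power_w) over the traced interval.
    """
    if not trace:
        return 0.0, 0.0
    energy_j = 0.0
    boundaries = trace[1:] + [(end_s, None)]
    for (t0, state), (t1, _) in zip(trace, boundaries):
        if t1 < t0:
            raise ValueError("trace must be sorted and end after its last entry")
        energy_j += state_power_w[state] * (t1 - t0)  # E = P * dt per segment
    duration_s = end_s - trace[0][0]
    return energy_j, (energy_j / duration_s if duration_s > 0 else 0.0)


# Hypothetical per-state power for one subsystem.
POWER_W = {"active": 0.8, "idle": 0.05, "sleep": 0.001}

if __name__ == "__main__":
    trace = [(0.0, "active"), (2.0, "idle"), (5.0, "sleep")]
    print(trace_energy(trace, POWER_W, end_s=10.0))
```

Two applications with the same peak power can land far apart under this measure, depending on how long each holds the high-power states.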
Rowen: This is a serious problem. One problem is that the software guy has a simplified view of all the clever things the hardware guy did to make it possible to reduce the power. The hardware guys are obsessed with power these days. Not all of what they come up with is good or practical, but they are at least thinking about it. They have power modes, techniques for certain operations or instructions to meet power requirements. That may be pretty far removed from what the software guy, who’s on the front lines of getting a product out the door, has visibility into. It’s made worse by the fact that most of the tools people have work well in a simulation environment, but often what happens at the end of the day is you have a programmer, a prototype board and an ammeter. It’s a very crude picture of what’s going on. The poor software guy is trying to figure out what he can do overnight that will move the ammeter.

LPE: We have no standard gauge for software. It varies by application, middleware and operating system as well as by the usage from one person to the next. How do we deal with this?
Hardee: If you look at the need for system vectors that reflect the application you’re running, that becomes a problem when you’re trying to provide a power meter for the software engineers. That works well if you can run actual software against the target device, which is the only way you can run it fast and get accurate readings. You need accurate vectors and you need accurate characterization to get any sense of power or energy. The difference between power and energy is that you need to know over time what the system is doing and model that correctly. Virtual platforms have some potential to help, but they’re problematic. If you’ve got the kind of virtual platform that runs fast enough to make a software engineer happy, you’ll be modeling that at an untimed level. At the untimed level the virtual platform is instruction accurate, so it’s getting its timing and instruction cycles from the processor. Think about what’s going on with software and the choices for processing things: can you do it from cache, or do you have to go out and fetch it from some other level of memory? Those have a huge impact on the number of clock cycles, rather than instruction cycles, that it takes to perform those tasks. So the point is that you need a lot of timing accuracy before you can get any kind of energy accuracy. That’s difficult to build into a virtual platform.
Kaiser: You don’t need the actual numbers. You just need to know if it’s getting better or worse. You can give the software team a relative number. Second, you can start doing estimations. With an MP3 you want to know what your cache-miss ratio is.
Hardee: You need to start to measure the things that you know drive high energy usage, as opposed to measuring the energy usage itself. When you have 400% or 500% difference between cache and memory, it’s hard to put different algorithms in the right order. You don’t even have the relative accuracy you’re looking for.
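Hardee's argument that an instruction-accurate model can misorder algorithms once cache misses are priced in, and Kaiser's suggestion to watch the cache-miss ratio as a relative proxy, can both be shown with a crude cycle estimate. This sketch assumes a fixed base CPI and a fixed miss penalty; the two algorithm profiles, their numbers and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    instructions: int   # what an instruction-accurate virtual platform reports
    memory_refs: int    # loads and stores issued
    miss_ratio: float   # fraction of memory_refs that miss the cache


def estimated_cycles(p: Profile, base_cpi: float = 1.0,
                     miss_penalty: int = 100) -> float:
    """Base CPI plus a fixed stall per cache miss (both assumed values)."""
    return p.instructions * base_cpi + p.memory_refs * p.miss_ratio * miss_penalty


def ranking(profiles, key):
    """Names ordered from cheapest to most expensive under `key`."""
    return [p.name for p in sorted(profiles, key=key)]


CANDIDATES = [
    Profile("compact_loop", instructions=1_000_000, memory_refs=300_000, miss_ratio=0.10),
    Profile("blocked_loop", instructions=1_500_000, memory_refs=400_000, miss_ratio=0.01),
]

if __name__ == "__main__":
    print("by instructions:", ranking(CANDIDATES, key=lambda p: p.instructions))
    print("by est. cycles: ", ranking(CANDIDATES, key=estimated_cycles))
```

The blocked version executes 50% more instructions but misses ten times less often, so under these assumptions the order flips; where the crossover falls depends entirely on the assumed miss penalty.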
Kaiser: So are you looking at platform-to-platform comparisons? I’m thinking you take the platform and get the software guy to make it as best as it can be.
Hardee: You’re coming at it from the standpoint of post-hardware. How does the software guy optimize his programs?
Kaiser: Or even if you don’t have the physical hardware yet.
Hardee: I’m approaching it from the view of the system architect designing a new system. How do they know they’re going to meet the power spec? If you’re rendering graphics versus video, you have to be running the right algorithm on the right core. There are multiple choices, and you have to figure this out even before you measure things relatively, let alone absolutely.
Kaiser: That is a system architect challenge, not a software challenge. The most the system guy can do is identify the best possible scenario. The software guy may or may not come close. Sometimes it may not even be possible.
Hardee: And you can really mess up the software. The system architect does have to make an assumption that what he’s building into the system architecture will be used efficiently by the software guys.
Rowen: You really want someone who has a deep understanding of what instructions to use, what compiler flags and power modes should be used, and what is the realistic scenario that will contribute to the worst case. In general, things are better when more things are programmable. The worst thing is where the controls are inside some obscure, hard-wired function unit. We had a big customer recently that had trouble meeting power goals. The fact that they were using programmable audio made it a lot easier to come up with another way to buffer the data and initiate the applications.
Neifert: When you have the chip, at least you have an ammeter sitting there. Before you have the chip, the software guy is in the dark. He often doesn’t have any indication of what’s going on in the system from a power perspective because there isn’t much there to tell him that.

Making Software Better

Wednesday, January 11th, 2012

Low-Power Engineering talks about what will make software more energy-efficient with Pete Hardee, marketing director at Cadence; Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior VP and General Manager of Apache Design, and Bill Neifert, CTO of Carbon Design.


Put Low-Power, Low-Overhead, High-Fidelity Digital Sound In Your Next ASIC Or SOC

Thursday, December 1st, 2011

High-quality audio creates an immersive experience that excites buyers and spurs the purchase of consumer products such as home theater and PC sound systems, flat-panel televisions, handheld and console video games, portable music and video players, and mobile telephone handsets. As a result, digital audio has rocketed to the top of the critical features list for all sorts of products over the past several years. At the same time, the number of digital audio codecs (coders and decoders) and audio-enhancement programs has exploded and most consumer products must support multiple codecs and offer a broad range of audio-enhancement features.


Energy Vs. Power: Energy, Power Optimization Is A System-Level Challenge

Thursday, December 1st, 2011

By Ann Steffora Mutschler
Power issues today, whether they are related to low power in a smart phone or highly efficient power for data center applications, are so pervasive that they touch the entire design team—and must be carefully prioritized at the system or architectural level.

As discussed in Part 1 of this series, energy and power are different entities and must be understood distinctly from each other. After that, engineering teams can apply design techniques to optimize one or the other in a system. That was addressed in Part 2 of this series, including how the issues differ when optimizing for either power or energy efficiency.

For most engineering teams, the easiest place to start looking at either energy or power efficiency and prioritizing is on the hardware design side, because that is most familiar.
According to Chris Rowen, founder and CTO of Tensilica, engineering teams start the process of prioritizing by understanding what their constraints are. “Are they really interested in maximizing battery life or are they really, for example, working to stay within a thermal envelope? That will determine a lot of what they choose to do.”

Second, and centrally important, is something increasingly well recognized by people designing battery-operated devices: they look at scenarios. “They say ‘I want to be able to play a full length video on a battery charge,’ or ‘I want to be able to listen for 100 hours to MP3s,’ or ‘I want to have this much talk time,’” he said. “Those all use, quite often, very different subsets of the hardware and run very different algorithms so you can’t lump them all together and say, ‘My chip dissipates 3 watts,’ or even, ‘My chip dissipates .3 mW per megahertz,’ because those are too coarse of a characterization. You have to say, when doing this task (decoding video, decoding audio, running the voice stack and wireless protocol), you have to characterize each of those things individually and therefore go through a fairly complete assessment.”
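Rowen's scenario-by-scenario view can be expressed as a simple check of battery life against each scenario's own target. The battery size, per-scenario average power, targets and the `check_scenarios` function below are all hypothetical; in practice each scenario's average power would come from a detailed per-scenario assessment.

```python
BATTERY_WH = 5.0  # hypothetical battery, roughly 1350 mAh at 3.7 V

# scenario -> (average power in watts, target hours); all values hypothetical
SCENARIOS = {
    "video playback": (0.9, 4.0),
    "mp3 playback": (0.04, 100.0),
    "talk time": (0.7, 8.0),
}


def check_scenarios(scenarios, battery_wh=BATTERY_WH):
    """Return {scenario: (battery_hours, meets_target)} for each scenario."""
    result = {}
    for name, (avg_power_w, target_h) in scenarios.items():
        hours = battery_wh / avg_power_w  # runtime = stored energy / average power
        result[name] = (hours, hours >= target_h)
    return result


if __name__ == "__main__":
    for name, (hours, ok) in check_scenarios(SCENARIOS).items():
        print(f"{name:15s} {hours:7.1f} h  {'meets' if ok else 'misses'} target")
```

A single "the chip dissipates X watts" figure would hide that, with these numbers, the device meets its video and MP3 targets but misses its talk-time target.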

The assessment needs to determine which sections of the chip are active and what the flows are at the chip interface including the activity in the memories, and how much the power amplifiers are turned on. “All of those things will play a central role in assessing what the power is for the different scenarios. Anybody in the cell phone business or anybody who has battery life as a central concern is working on different scenarios and needs fairly detailed information,” Rowen noted.

These issues must be dealt with in both hardware and software, with a broad suite of techniques. From bottom to top, the first determinant is process technology—what the transistors are like, how leaky the process technology is, what the fundamental drive versus power is for the transistor family in question.

“Moore’s Law silicon scaling has helped enormously because we’ve been able to scale down voltage and scale down capacitance and scale up speed, so power and energy characteristics have been just terrific,” Rowen said. “That of course has slowed down. In particular, the leakage characteristics of these process technologies—that is, how much power is dissipated even when the transistors are off—becomes a lot worse. People have to work pretty hard on that question, and it has led to much more interest in, for example, power gating in these designs. When your scenario says that when you’re not using a particular subsystem within a chip that you not only stop doing work on it, you not only turn off all the clocks to it, but you actually remove the power to it, so it really does dissipate no power. It does mean that you have to plan ahead a little bit because restoring power to a subsystem typically takes more cycles than just turning the clocks back on. In the past you might have gone into standby. Now you go into hibernation.”
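Rowen's warning that restoring power takes longer than restarting clocks implies a break-even idle time: power gating only pays off when the leakage saved during the idle period exceeds the energy spent powering down and restoring state, and only when the slower wake-up is tolerable. A minimal sketch; the subsystem figures and both function names are hypothetical.

```python
def breakeven_idle_s(leak_w: float, gated_w: float, transition_j: float) -> float:
    """Idle time beyond which power gating beats clock gating on energy.

    leak_w: power while clock-gated (leakage keeps flowing).
    gated_w: residual power while power-gated.
    transition_j: energy to power down, then restore power and state.
    """
    saved_w = leak_w - gated_w
    if saved_w <= 0:
        return float("inf")  # gating never pays off
    return transition_j / saved_w


def choose_idle_mode(expected_idle_s, leak_w, gated_w, transition_j,
                     wake_latency_s, max_wake_latency_s):
    """Pick 'power_gate' or 'clock_gate' for an upcoming idle period."""
    if wake_latency_s > max_wake_latency_s:
        return "clock_gate"  # the slower wake-up would violate a response requirement
    if expected_idle_s > breakeven_idle_s(leak_w, gated_w, transition_j):
        return "power_gate"
    return "clock_gate"


# Hypothetical subsystem figures.
SUBSYSTEM = dict(leak_w=0.02, gated_w=0.001, transition_j=0.0019,
                 wake_latency_s=0.002)

if __name__ == "__main__":
    for idle in (0.05, 0.5):
        print(idle, choose_idle_mode(idle, max_wake_latency_s=0.01, **SUBSYSTEM))
```

The latency check reflects the need to "plan ahead": hibernation only helps if the system knows an idle period is coming and can afford the longer wake-up.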

After process technology, the next level up in gross terms would be the logic design, the microarchitecture, and what’s going on cycle-by-cycle in some block. In the case of processors, they are particularly well understood in terms of their power characteristics and they do represent a significant fraction of the power. Processors and memory together tend to represent a lot of the power in many chips. Here, it is a function of how to get the work done with the smallest number of gates and shortest wires in the design.

Following logic design, the key tradeoffs at the next level up—architecture—include the sequence of operations and how algorithms will be mapped into the cycle-by-cycle computation.

“You definitely want architectures that can cope with a multitude of things,” noted Pete Hardee, director of solutions marketing at Cadence. “We are starting to see those kinds of differences in the applications that we have to run. The macro architecture decisions that this drives are typically that a multiprocessing architecture is needed, and often it’s heterogeneous multiprocessing. I need different kinds of processing engines to be able to cope with those two different things. So I’ve got a range of stuff and I may well have more processors in the platform than I need, but that’s okay because I can shut them down when they’re not needed so they are not burning energy. I need a range of processors, a heterogeneous multiprocessor macro architecture, which enables me to cope efficiently with these different processing tasks I have to do. That multi-processing architecture gives me some different configurations where I can process really fast or I can use parallelism to do that or I can take my time in different circumstances: do it all on one processing engine and take a little longer.”
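Hardee's heterogeneous platform, where a task can run fast on one engine or "take my time" on another, can be sketched as picking the lowest-energy engine that still meets a deadline. The engine names, throughputs and powers below are hypothetical, and the sketch assumes unused engines are fully shut down and ignores splitting work in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    ops_per_s: float  # sustained throughput on this particular task (assumed)
    active_w: float   # power while running the task (assumed)


def best_engine(engines, work_ops, deadline_s):
    """Lowest-energy engine that finishes `work_ops` within `deadline_s`.

    Engines that are not chosen are assumed to be powered down, so only
    the chosen engine's active energy is counted.
    Returns (engine_name, energy_j).
    """
    feasible = []
    for e in engines:
        run_s = work_ops / e.ops_per_s
        if run_s <= deadline_s:
            feasible.append((e.active_w * run_s, e.name))
    if not feasible:
        raise ValueError("no engine meets the deadline")
    energy_j, name = min(feasible)
    return name, energy_j


# Hypothetical heterogeneous platform.
PLATFORM = [
    Engine("big_cpu", ops_per_s=2e9, active_w=1.5),
    Engine("little_cpu", ops_per_s=5e8, active_w=0.2),
    Engine("dsp", ops_per_s=1e9, active_w=0.3),
]

if __name__ == "__main__":
    print(best_engine(PLATFORM, work_ops=1e8, deadline_s=0.5))   # relaxed deadline
    print(best_engine(PLATFORM, work_ops=1e8, deadline_s=0.08))  # tight deadline
```

Under these assumptions the relaxed deadline lets a slower, more efficient engine win, while the tight deadline forces the fast core, which is why the platform carries more processors than any one task needs.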

This comes to bear on the memory architecture, as well, in terms of caching, he said. “Can I predict what algorithm I’m going to need next? Or can I predict what data I need next, and where is that coming from? This affects decisions like preemption and also the caching algorithm that I need. So those are very much processing and memory architecture decisions that affect how efficiently the software is going to run in those different cases.”

While the hardware side of the system is challenging, the software aspect of today’s complex systems is equally so.

Mark Mitchell, director of embedded tools at Mentor Graphics, related an example. “I was talking to one of our internal engineering teams. They had a customer that had tracked a problem down in a satellite. Because of a software problem on the satellite they were using the memory hierarchy inefficiently and that was generating so much heat it was causing faults on the satellite.”

This is where the importance of system-level design and integration becomes obvious, particularly when things go wrong. “There was a hardware platform that was within specs, and with this sort of thing, on the ground in a controlled lab environment you would have been just fine,” Mitchell said. “Out there in space you’ve got a physical effect that was no good. The problem was that because of the way the system was configured, the cache wasn’t being used efficiently. As a result, the main processor had to go out to get data from the external RAM a lot more often than it should have. So the program is working, everything is operating correctly, but you’re generating all this traffic. One of the interesting things is that memory uses up a lot of power, so by getting the data from the external memory all of a sudden the power consumption and heat levels on the thing were going up significantly.”
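Mitchell's satellite example, a functionally correct program whose poor cache use drives up external-memory traffic, can be illustrated by pricing memory accesses for two hit rates. The per-access energies and the `memory_power_w` function below are hypothetical; real values depend on the process, cache size and memory interface.

```python
# Hypothetical per-access energies in picojoules.
CACHE_ACCESS_PJ = 10.0
DRAM_ACCESS_PJ = 600.0


def memory_power_w(accesses_per_s: float, hit_rate: float) -> float:
    """Average memory-subsystem power for an access rate and cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    misses_per_s = accesses_per_s * (1.0 - hit_rate)
    # Every access probes the cache; each miss also goes out to external DRAM.
    pj_per_s = accesses_per_s * CACHE_ACCESS_PJ + misses_per_s * DRAM_ACCESS_PJ
    return pj_per_s * 1e-12


if __name__ == "__main__":
    rate = 1e9  # same program, same number of memory accesses per second
    for hit in (0.98, 0.80):
        print(f"hit rate {hit:.0%}: {memory_power_w(rate, hit):.3f} W")
```

The program's results are identical in both cases; only the hit rate changes, yet with these assumed costs the memory power rises roughly sixfold.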

Knowing how to prevent future problems by anticipating them from the very start might sound like a good idea. But complexity can make it much harder to practice.

“I’m not sure if anybody had ever said to the team designing this particular piece of hardware, ‘Hey, here’s what your thermal limits are: this is the maximum temperature you can reach.’ It’s possible no one even thought to ask that question,” Mitchell observed. “If you do get a requirement like that and push it down (and to me the interesting thing is how it gets into the software), pushing it down to the software guys is really hard because software engineers don’t think about heat. They think about ones and zeros.”

He believes there are two problems here. One is adding some level of awareness among software engineers that they have to think about these things. The second involves ways of getting visibility into the system, because you can’t see where the heat is coming from. Why is it hotter than you expect? These aren’t easy things to get intuition around.

To solve the satellite problem, Mentor’s Vista group in Israel built a virtual prototype of the board for the customer so they could get visibility into the system that isn’t possible on a satellite that’s actually flying. “Even on a ground model you would have a hard time because some of these things that you want to look at like the cache transactions aren’t exposed from real running hardware. You can’t see what’s going on, either in software or even with a physical probe connected to the device. So they were building software models that they could run that could communicate more information,” he said.

Still, at the end of the day the question of whether it is better to optimize for power and energy efficiency in software or hardware is not easily answered.

“The questions get very complicated and they really are application dependent. You can’t get to the right answer without understanding the application, the demands on the system, and the overall performance requirements of the system, both timing performance and power performance, to know what the right way is to do it,” said Jon McDonald, technical marketing engineer in Mentor’s design creation synthesis group. “A lot of people pick software just because they think software is simpler: ‘It’s easier; I can change it.’ But it’s generally going to take longer to do it in software, and depending on what else the processor is doing, it may actually take more power to do it in software.”

To be sure, the white board, block diagram and spreadsheet system no longer works. Engineering teams today need a dynamic execution model that can run software with hardware, get some quantitative feedback on the performance and power requirements of the system that get to the energy of the system and make some decisions about the architecture before going into implementation.

“It’s better to do it before you’ve made the decision between software and hardware. It’s better to do it at the system architecture level with abstract representations of the functions so that you can model things before you decide if an algorithm is hardware or software. At the transaction level I can take an algorithm, a C or C++ function, and I can compile that to a target processor or I can wrap that with a transaction-level interface. I don’t have to change that function at all and I can create a model that represents that function and accurately predicts the power and performance of that function running on a particular ISS or running as a hardware accelerator interacting with the rest of the system. By doing the analysis at that level, it’s not a hardware problem. It’s not a software problem. It’s a system problem,” McDonald concluded.
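McDonald describes wrapping an unchanged C or C++ function with a transaction-level interface and annotating it with power and performance for each candidate implementation. The sketch below mimics that idea in Python rather than in SystemC/TLM-2.0: the algorithm (`fir`, a hypothetical stand-in) is left untouched, and a wrapper charges a per-call time and power taken from a hypothetical `TargetModel`. A real transaction-level model would derive timing from an ISS or an accelerator model instead of a fixed per-call cost.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TargetModel:
    """Hypothetical cost annotation for one implementation choice."""
    name: str
    seconds_per_call: float
    watts_while_busy: float


@dataclass
class AnnotatedFunction:
    """Wraps an unchanged function and accumulates estimated time and energy."""
    func: Callable
    target: TargetModel
    busy_s: float = 0.0
    energy_j: float = 0.0

    def __call__(self, *args, **kwargs):
        result = self.func(*args, **kwargs)  # functional behavior is untouched
        self.busy_s += self.target.seconds_per_call
        self.energy_j += self.target.seconds_per_call * self.target.watts_while_busy
        return result


def fir(samples, taps):
    """The algorithm itself: a plain FIR filter that knows nothing about targets."""
    return [sum(t * samples[i - k] for k, t in enumerate(taps) if i - k >= 0)
            for i in range(len(samples))]


if __name__ == "__main__":
    as_software = AnnotatedFunction(fir, TargetModel("cpu_iss", 2e-4, 0.5))
    as_accelerator = AnnotatedFunction(fir, TargetModel("hw_accel", 1e-5, 0.1))
    for _ in range(100):
        assert as_software([1, 2, 3], [1, 1]) == as_accelerator([1, 2, 3], [1, 1])
    print(as_software.energy_j, as_accelerator.energy_j)
```

Both wrappers return identical results; only the annotation differs, so the hardware/software decision can be explored before committing the function to either.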

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Friday, November 18th, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: What’s missing from our tools arsenal? If you have enough experience you get a feel for what works, but how does the average engineer get there?
Martin: You need to build some tools that can actually do high-level system bookkeeping. That’s really what it’s all about. When you have a power estimation or energy estimation, it produces traces of activity that can feed into an overall system model. Various IP models could all be calibrated to send out that information. People could run scenarios and estimates that way. It still seems to be missing. I recall talking to people about this two or three years ago.
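Martin's "high-level system bookkeeping" suggests IP models that emit activity traces, which a system model then prices with calibrated per-event energies. A minimal sketch; the IP names, event names, calibration values and the `bookkeep` function are all hypothetical, and real calibration data would come from lower-level characterization of each IP.

```python
from collections import defaultdict

# Hypothetical calibration: energy per event, in nanojoules, for each IP block.
CALIBRATION_NJ = {
    "cpu": {"instr": 0.1, "cache_miss": 2.0},
    "dsp": {"mac": 0.02},
    "dram": {"read": 5.0, "write": 6.0},
}


def bookkeep(trace, calibration=CALIBRATION_NJ, window_s=1e-3):
    """Aggregate an activity trace into energy per IP and per time window.

    trace: iterable of (time_s, ip, event, count) records emitted by IP models.
    Returns (per_ip_nj, per_window_nj); per_window_nj maps a window index
    to the energy spent in that window.
    """
    per_ip = defaultdict(float)
    per_window = defaultdict(float)
    for time_s, ip, event, count in trace:
        try:
            nj = calibration[ip][event] * count
        except KeyError:
            raise KeyError(f"no calibration for {ip}/{event}") from None
        per_ip[ip] += nj
        per_window[int(time_s // window_s)] += nj
    return dict(per_ip), dict(per_window)


if __name__ == "__main__":
    trace = [
        (0.0001, "cpu", "instr", 1000),
        (0.0002, "cpu", "cache_miss", 10),
        (0.0015, "dsp", "mac", 5000),
        (0.0016, "dram", "read", 4),
    ]
    print(bookkeep(trace))
```

Raising an error for an uncalibrated event, rather than silently counting it as zero, keeps gaps in the IP models visible.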
Kulkarni: The industry seems to be doing a lot of optimization techniques in ESL, but not a lot of analysis techniques. We have been doing that for several years at the RT level. A lot of us can do RTL power optimization. But from an analysis standpoint there are very few who can do it. It takes so long to be accurate about how to stimulate RTL without having a synthesis engine underneath. ESL synthesis experts still don’t have the analytic engine for ESL. That’s a missing piece along with ESL power models.
McCloud: We have very good point tools out there today. We have good HLS tools doing very complex, high-quality hardware accelerators. We have some pretty good power optimization, power analysis and power integrity tools. What’s missing is a way of productizing the integration of all those tools. It’s standardizing UPF and CPF and getting that propagated through the tools. And then what’s missing on the TLM side is better standardization around annotating the power and getting the right fidelity to the TLM models. That’s one of the primary purposes of the TLM model: to be able to run real software on it and get accurate estimates. What we need to do is put these good tools together.
Meyer: And we need to recognize that it’s not just an issue of hardware vs. software. There’s also the issue of multiple cores. People have the choice to have some stuff run in a high-speed core and other stuff run on a lower-speed core. Those types of decisions don’t come easily. There’s a fair amount of effort you need to put together a prototype so you get an idea of what the power is if you have two cores vs. one core and an accelerator. There’s a lot of modeling that has to be done to come up with an answer there, even with very early estimates.
Cline: As for people with 30 years’ experience who have to trade off between one core, two cores or four cores and the custom logic that goes around it, there are very few of them. You won’t sell a lot of tools into that market. Most of the people with 30 years of experience are grinding RTL every day or writing software. For EDA vendors, you get paid for optimization with an extra zero vs. what you get paid for analysis. It’s the way the world works. If you have a tool that squeezes out an extra 10% at the end of the process, you get paid for that, especially if you’re putting out a fire where you don’t meet timing or something else that’s critical. What’s the time value of a week over the course of a project? At the beginning of a design a week isn’t worth anything. At the end of the project the value of a week is huge. If you sell a product into the last week of a project and it squeezes out another 5% or 10%, you’re a hero.
Martin: On the other hand, the decisions you made in those first few weeks may cost you downstream. There are a lot of design teams out there where it seems there is very little accumulated experience and they’re confronted with these problems and very rapidly trying to do design. Stepping back from the tools and the analysis, you have to ask, ‘What is the overall design methodology?’ Do you have that experience taught to design teams to make choices about how many cores, what kinds of cores, what kind of hardware blocks, and do they all fit together? That architectural expertise has been gained by years of experience.
Kulkarni: It’s almost as if power is where timing was 15 years ago in terms of the knowledge base.

LPE: Except that the people who know timing now need to know power, as well, right?
Kulkarni: Yes. It’s not just mobile or cloud computing. Everything is focused on power, from disk drives to memory. Customers that have 65 watts per chip want to move to 60 watts. They want to move from 5 milliwatts to a few milliwatts. The knowledge of power is limited, though, in terms of power management, power optimization, as well as certain decisions that have an impact downstream. How do we, as an industry, get power analysis and power decisions to be pervasive? Even my own company has not given a complete recipe for how to do RTL to GDSII design. It’s in many people’s heads. The end customer also changes their mind on the fly, but at least in 80% of the design, if we can all produce a recipe book then we all benefit. UPF and CPF have started that work already. But when you go to a customer, typically they have some reused IP and some new IP. UPF/CPF may apply to the brand new IP but not the old IP, so mix-and-match flows are another challenge. How do you make sure the previous design worked and certain parts of circuits work in the new design?

LPE: What you’re talking about is flexibility in modeling, right?
Kulkarni: That’s correct.

LPE: HLS has been around for more than a decade and it still isn’t mainstream. Will power force a change in perception?
Cline: It’s certainly going to be a factor. But what drives it is the ability to get your job done in the right amount of time.
Meyer: Yes, it’s time to market.
Cline: It’s also time to results. You have to hit some sort of metric for your results. I was just in Japan and the concern there is how they’re going to get $5 per chip. To do that they need twice the number of features and speed. So pick your favorite HDTV company. If it’s not Vizio, then Vizio is undercutting them by 50%. How do they get their chips to where they’re competitive?
McCloud: I wouldn’t underplay the potential significance of power becoming a critical factor for people adopting HLS. We’re just now scratching the surface in terms of what we can do in HLS. Memories are consuming 60% of the power in a typical HDTV. There’s a whole slew of memory optimization we can do around the way we slice the memory, around memory-enabled gating, light-sleep mode, deep-sleep mode. Those are things that are perfectly suited for an HLS tool, which has a detailed understanding of the state in the design of the data path. The tools already have sequential and combinational clock gating. But as we start to go past 45nm, it’s not just the battery life. It’s also thermal and power integrity issues that become critical. That’s when HLS will really become close to a requirement.
Martin: Any technology that lets you explore the design space for different alternatives, whether it’s HLS or configurable processors, if you’re keen on performance you can examine more alternatives more quickly. If it’s power and it really does let you explore that space, that’s also key. If it’s just within a few percentage points of what you do in RTL by hand, that’s not going to drive that market. It has to offer a wide margin.
Cline: It’s also what’s called ‘change and check.’ You can change something very quickly and check what the results are. We see a lot of engineers that need to change a number of things and then check them, and then change and check more. You can’t do that in RTL. It gives them a whole other set of options.
McCloud: One of the things we really need to do is shrink-wrap the methodology. Then the smaller companies can pick it up and run with it. Right now it takes a big company to put that investment behind it.
Cline: I agree, except that the very small companies use it because they can’t get the funding to go build a $50 million chip. The middle of the curve are the guys who can’t move just yet.
Kulkarni: The ideal solution for designers would be HLS for optimization of power, timing and area, and then quickly checking against the RTL power analysis. The reason is that at the RTL level you can capture physical effects. That becomes a linchpin from the physical world to the ESL world. But you also have to have good power models.

LPE: Don’t you also need more standardization?
Meyer: You certainly need to be sure that you’re not double counting. A lot of times you’re modeling more than just the software, and you’re trying to estimate the power there and you’re starting to include what’s in the memory and the cache. And then you start looking at the cost somewhere else, and you’re adding the cost in again. You have to have a way to say, when you aggregate it, how do you make sure it’s not double-counted.
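Meyer's double-counting concern can be handled by tagging every energy contribution with the physical component it charges and letting each component be counted only once. The sketch below uses a hypothetical first-reporter-wins policy and hypothetical model and component names; a real flow might instead keep the more detailed model's number.

```python
def aggregate(contributions):
    """Sum energy contributions, charging each physical component only once.

    contributions: list of (source_model, component, energy_j).
    Returns (total_energy_j, duplicates), where each duplicate is
    (rejected_source, component, accepted_source) so it can be reviewed.
    """
    owner = {}
    energy = {}
    duplicates = []
    for source, component, energy_j in contributions:
        if component in owner:
            duplicates.append((source, component, owner[component]))
            continue  # already charged by another model; do not add it again
        owner[component] = source
        energy[component] = energy_j
    return sum(energy.values()), duplicates


# Hypothetical case: the software model already includes L2 cache energy,
# and the memory model reports the same cache again.
CONTRIBUTIONS = [
    ("sw_model", "cpu_core", 0.5),
    ("sw_model", "l2_cache", 0.2),
    ("memory_model", "l2_cache", 0.25),
    ("memory_model", "dram", 0.3),
]

if __name__ == "__main__":
    print(aggregate(CONTRIBUTIONS))
```

A naive sum of these contributions would be 1.25 J; keying on the component keeps the total at 1.0 J and surfaces the overlap instead of hiding it.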

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Friday, November 11th, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Does the approach to using tools have to change for low-power?
McCloud: Way too often, when people get to the block level, they start saying, ‘I’m going to use my functional tests.’ That’s completely wrong. That’s one of the big advantages of doing things at the higher level. You’ve got things that are closer to the real application running. That’s an incredible difference.

LPE: But how do you integrate that concept with different use models? Two different users may use the device for completely different purposes and in different ways.
Cline: You can go through the user profiles pretty easily. If Apple claims its new phone will have eight hours of talk time, that probably doesn’t describe someone like me who doesn’t talk on the phone at all. I’m just going to surf and get e-mails and do other things. They understand one profile, and then there’s probably a mixed profile for the average user. You can conceptualize that at the system level when you’re doing your design, figuring out that if the phone is running for eight hours straight, here’s what’s probably going to happen on your next project. But you need to get down to something measurable, which right now is the RTL level, and even that is questionable in some situations. It’s a tough problem.
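Cline’s single-profile versus mixed-profile distinction comes down to a time-weighted average. The sketch below assumes a hypothetical phone with three modes (talk, browse, standby) and made-up per-mode power draws and battery capacity. The numbers are chosen so that the talk-only profile yields the rated eight hours. Runtime is then the stored energy divided by the average power.

```python
from typing import Dict


def average_power_mw(profile: Dict[str, float], mode_power_mw: Dict[str, float]) -> float:
    """Time-weighted average power for a usage profile.

    `profile` maps each mode to the fraction of time spent in it; the
    fractions must sum to 1.
    """
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile fractions must sum to 1")
    return sum(fraction * mode_power_mw[mode] for mode, fraction in profile.items())


def battery_life_hours(capacity_mwh: float, avg_power_mw: float) -> float:
    """A battery stores energy, so runtime is stored energy / average power."""
    return capacity_mwh / avg_power_mw


# Hypothetical per-mode power draw and battery capacity.
mode_power_mw = {"talk": 625.0, "browse": 1200.0, "standby": 20.0}
capacity_mwh = 5000.0

talk_only = {"talk": 1.0}
mixed_user = {"talk": 0.1, "browse": 0.2, "standby": 0.7}

for label, profile in (("talk only", talk_only), ("mixed user", mixed_user)):
    hours = battery_life_hours(capacity_mwh, average_power_mw(profile, mode_power_mw))
    print(f"{label}: {hours:.1f} h")
```

With these invented numbers the mixed user gets close to twice the rated talk time. A heavier browsing share would pull that figure down quickly, which is why the profile matters as much as the per-mode power.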
McCloud: Today, the cell phone is one of the more common applications for high-level synthesis, in the image signal-processing blocks. A large number of these ISPs, everything from the sensor to the image correction to the final JPEG encoding, are done with high-level synthesis. The reason is that a dedicated hardware accelerator is going to produce the lowest power for image signal processing. It’s a very specific function. You take a picture, you do image correction and processing, you compress it, store it and you’re done. You shut the hardware down.

LPE: High-level synthesis isn’t normally associated with power estimation. Is this a new use?
Cline: It’s been there to some extent. Power estimation comes in a number of different forms. Let’s say you’re laying down a clock cycle and designing it in high-level synthesis, and the tool is doing the work to fill it up. It knows that a multiplier is going in. It can do quick analysis on a multiplier with a reasonable stimulus to figure out whether this multiplier has a better power profile than another one, as long as they both meet your performance goals. You can trade off the area with them, too. One of the things we can do is go through this process of scheduling the clock cycles, filling them up with all the different functional units that are going to go in there, and determine at a later stage in the synthesis process which are the lowest-power multipliers. Can you swap those in and still meet all your timing budgets? If you blow timing then your tool is useless. You have to meet timing, and afterward you minimize area and power and let the user make tradeoffs where they want to. It’s been there for a long time with various levels of maturity. What you’re going to see is that will continue to mature, but you have limited benefits at that level. You can make 5% to 10% improvements at that level. You can’t make 50% improvements.
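The swap Cline describes treats timing as a hard constraint and power and area as costs to minimize afterward. The selection step can be sketched as follows. The multiplier variants, their delays, areas and power numbers are hypothetical stand-ins for characterized library data. The sketch also makes a simplifying assumption: each unit must fit within a fixed per-cycle delay budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MultiplierVariant:
    # Hypothetical library entries; real numbers would come from characterization.
    name: str
    delay_ns: float
    area_um2: float
    power_mw: float


def pick_lowest_power(variants: List[MultiplierVariant],
                      timing_budget_ns: float) -> Optional[MultiplierVariant]:
    """Return the lowest-power variant that still fits the timing budget.

    Timing is a hard filter; among feasible variants, power is minimized first
    and area breaks ties. Returns None if nothing fits.
    """
    feasible = [v for v in variants if v.delay_ns <= timing_budget_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda v: (v.power_mw, v.area_um2))


library = [
    MultiplierVariant("fast_booth", delay_ns=1.2, area_um2=900.0, power_mw=2.4),
    MultiplierVariant("array_lp", delay_ns=2.1, area_um2=700.0, power_mw=1.5),
    MultiplierVariant("serial_tiny", delay_ns=4.0, area_um2=300.0, power_mw=0.9),
]

print(pick_lowest_power(library, timing_budget_ns=2.5).name)  # array_lp
print(pick_lowest_power(library, timing_budget_ns=1.0))       # None
```

Filtering on timing before ranking on power reflects the rule in the quote: a lower-power unit that blows timing is never a valid choice.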
McCloud: It’s evolving. Power has been in HLS for a while. It’s being used to design applications for lower power. The first stages of this centered around simple exploration. You take a JPEG and your requirement is to compress that picture in 500 milliseconds. You can do that with a 25MHz clock, a 50MHz clock or a 100MHz clock. Each one of those has very different power tradeoffs. That kind of capability has existed in HLS for years.
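The 500-millisecond deadline fixes the number of cycles available at each clock frequency. The amount of work per picture then determines how much hardware parallelism each option needs. In the sketch below, the 100-million-operation workload is a hypothetical figure chosen only to make the arithmetic visible.

```python
import math

DEADLINE_S = 0.5             # compress one picture in 500 ms (from the text)
OPS_PER_IMAGE = 100_000_000  # hypothetical workload size


def cycle_budget(deadline_s: float, clock_hz: float) -> int:
    """Clock cycles available before the deadline."""
    return round(deadline_s * clock_hz)


def ops_per_cycle_needed(ops: int, cycles: int) -> int:
    """Operations the datapath must complete each cycle to meet the deadline."""
    return math.ceil(ops / cycles)


for mhz in (25, 50, 100):
    cycles = cycle_budget(DEADLINE_S, mhz * 1_000_000)
    print(f"{mhz} MHz: {cycles:,} cycles, {ops_per_cycle_needed(OPS_PER_IMAGE, cycles)} ops/cycle")
```

The 25MHz design needs four times the per-cycle throughput of the 100MHz design. That means more parallel hardware running at a lower clock, and this is the area-versus-frequency tradeoff HLS lets you explore.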

LPE: But you’re coming at it from the standpoint of clock speed compared with power first.
Martin: That’s only true if you have a fixed-instruction-set processor. When you design your own instruction set you’re in a whole new ballgame, which is one reason we’ve been supporting high-level analysis and estimation in the design flow for a number of years. The power talk is interesting because of the confusion people have between power and energy. In mobile devices, assuming you have driven toward a peak-power level that is sufficient for the low-cost packaging you’re targeting in your device, it’s all about energy. Energy is the issue here, and there are many ways to fit into a particular energy budget, but so often people express themselves using metrics such as milliwatts per megahertz. What does that mean in terms of the overall energy? It depends on how much work gets done in each cycle. If you target one type of instruction set where you can do more in one cycle than in another, that reflects in the total energy consumption.
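Milliwatts per megahertz is simply energy per clock cycle. One mW/MHz equals 1 nJ per cycle, so total task energy is that figure multiplied by the number of cycles the task takes. The comparison below uses two hypothetical cores running the same task. Core B has the worse mW/MHz rating but, by assumption, needs far fewer cycles thanks to richer instructions.

```python
def task_energy_uj(mw_per_mhz: float, cycles: int) -> float:
    """Energy for one task, in microjoules.

    mW/MHz is energy per clock cycle: 1 mW / 1 MHz = 1e-3 J/s / 1e6 cycles/s
    = 1e-9 J, i.e. 1 nJ per cycle.
    """
    nj_per_cycle = mw_per_mhz
    return nj_per_cycle * cycles / 1000.0  # nJ -> uJ


# Hypothetical cores running the same task.
core_a = {"mw_per_mhz": 0.05, "cycles": 2_000_000}  # lean core, more cycles
core_b = {"mw_per_mhz": 0.08, "cycles": 800_000}    # richer instructions, fewer cycles

energy_a = task_energy_uj(**core_a)
energy_b = task_energy_uj(**core_b)
print(f"A: {energy_a:.0f} uJ, B: {energy_b:.0f} uJ")
```

With these numbers, the core that looks worse on mW/MHz (B, 64 µJ) uses less energy for the job than A (100 µJ). This is the gap Martin points to between the headline metric and overall energy.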
McCloud: For me it all boils down to battery life. That’s what matters. There’s a second component that is centered around power integrity. The reason it’s so important is that at 45nm we’re starting to reach a technology inflection point where you cannot scale the supply voltage any more to help reduce the power. At about 45nm or lower, the power density goes nonlinear. This is going to create huge problems around thermal and supply integrity. You’re going to start getting a Vdd dropout, you’ll have hot spots in your chip, and the chip will burn up. It’s not just about battery life. It’s how we’re going to be able to take advantage of these technologies in the future and be able to produce these chips in a way that power density doesn’t go through the ceiling.
Kulkarni: Specifically, what we’re looking at is how to reconcile high-level power budgets with power consumption, which is driven by an insatiable demand for functionality and multiple modes of operation. How do we make sure that the power grid we’ve designed will work? We’ve been watching that stimulus carefully. But what happens to that stimulus over millions of clock cycles? There are things in the context of dynamic voltage, voltage route, and the package, and the PCB, and the system. You have a band of inaccuracy first, and then you look at the energy models and what happens over time. How do you capture those when you are switching between a lot of domains and there is a lot of switching activity? How do you model that accurately? And how do you model the physical effects at a higher level of abstraction so that your inaccuracy band gets narrower and narrower? We especially see that below 28nm, where there are huge transients causing voltage droop. Either your grid will collapse if you overdesign, or if you underdesign you will have electromigration problems, with power, energy and heat all coming together.
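A common first-order way to see why transients dominate supply droop is to add the resistive IR drop to the inductive L·di/dt term. The sketch below uses that textbook approximation with hypothetical grid and package values. Real grid signoff relies on full grid, package and board simulation rather than a single formula.

```python
def supply_droop_v(current_a: float, grid_resistance_ohm: float,
                   inductance_h: float, di_dt_a_per_s: float) -> float:
    """First-order supply droop: resistive IR drop plus inductive L * di/dt noise."""
    ir_drop = current_a * grid_resistance_ohm
    inductive_noise = inductance_h * di_dt_a_per_s
    return ir_drop + inductive_noise


# Hypothetical values: 2 A through 10 mOhm of grid, plus a 1 A/ns current step
# through 0.1 nH of package/grid inductance.
droop = supply_droop_v(current_a=2.0, grid_resistance_ohm=0.010,
                       inductance_h=0.1e-9, di_dt_a_per_s=1e9)
vdd = 0.9
print(f"droop = {droop * 1000:.0f} mV ({droop / vdd:.1%} of a {vdd} V supply)")
```

In this made-up case the fast current step contributes 100 mV and the static IR drop only 20 mV. That illustrates why switching between domains and modes, not average current, drives the droop problem.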

LPE: What you’re talking about here is a hierarchical flow with two-way communication, right?
Kulkarni: Yes. And the reason we have not done too much power synthesis at the ESL level in the past is that when you make a transaction-level model, how do you go inside that? Power creates a different level of challenge. The industry really needs to address how to create these high-level models that will tune into what happens down at the chip level. And then you have to connect the front end to the back end and get to the details of power in both directions.

LPE: How do we fix software to make systems more efficient?
Meyer: You’re presupposing that you have software at the time of the design. That’s one of the biggest challenges. That’s where virtual platforms become an important part, running real software on the system. That’s one of the real challenges at the system level—to have something you can run in software early enough to influence the hardware decisions that you’re making.

LPE: Do we need an understanding of the software and how it’s going to function at a very high level, though?
Meyer: For some cases, if you could characterize how much the software is using each of the blocks and be able to understand system performance without detailed modeling, that would help you understand your power budget and do a better estimate. But we really haven’t spent much time working at that level yet.
Martin: We spend a lot of our time working at that level. Letting our customers build configured and tailored processors is extremely important. Sometimes you can do an interesting job by taking reference or standards-based software, and if you have a very good compiler for an application-specific processor it may be able to do things like automatically vectorize and infer the use of some fairly sophisticated instructions. That has a limit, though. To really figure out what an optimal algorithm implementation would do on a particular instruction set you may have to get into more manual optimization. But some of the early work can be done with an envelope if you have a good targeting compiler.
Cline: A lot of our customers have that exact issue. With cell phones they may look at their next platform and say, ‘This time it’s going to have real-time video on it. Can my ARM processor run this real-time video algorithm at the same time it’s doing a network connection to beam everything up while it’s also downloading e-mail?’ And they may not be ready to buy the next ARM processor. So if they put this into custom gates, what is the cost in terms of area and power and what is the speed? They do that initial analysis using high-level synthesis and figure out what the tradeoff is. A lot of times they can buy a bigger processor and take on more royalty cost or power issues. So in some cases they have the software already, or at least they know what it’s going to look like. In other cases when you build a bigger system you may not have that.
Kulkarni: One of our customers who was designing a digital TV asked us whether they can profile the software for power. It’s an interesting question. With digital TV your eyes are pretty much looking at an oval field picture, so can you do power reduction on the black pixels on the edges? That’s pixel-by-pixel power reduction. That’s a great challenge for all of us. It’s not just mobile applications. It’s also digital TV, streaming video, heads-up displays for military applications, and so on.
Cline: Those guys don’t care about battery life, which makes it very interesting.
Kulkarni: But they still want to reduce the power.
McCloud: It’s all about the packaging cost.

Experts At The Table: Managing Power At Higher Levels Of Abstraction

Thursday, November 3rd, 2011

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Do we need to redefine what constitutes a system when we’re talking about power?
Meyer: It definitely is part of a bigger picture. The problem is that there is very little support up at those levels. You’re managing budgets and it’s a crude mechanism you need there. As we go forward things like virtual platforms and power—or maybe more on the energy level where you’re looking at where power is spent between hardware and software—and making decisions on how things should be implemented makes a lot of sense.
Martin: We’ve supported power modeling in our application-specific processor design flow for a number of years. The kind of power modeling you get can be quite accurate for the instruction set. It’s accurate enough to let you make energy tradeoffs with the instructions you have. You can then run that set of instructions and predict what the energy consumption is going to be, and then you can do experiments around that. We feel like we’ve been a plug without a socket for years, at least for determining the right instruction set.
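An instruction-level energy model of the kind Martin describes can be sketched as a per-instruction energy table applied to an executed instruction trace. Everything below is hypothetical and is not Tensilica’s model: the picojoule values, the instruction names and the 4-wide "vload"/"vmac" custom instructions. The sketch only shows how such a table supports experiments with a tailored instruction set.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical per-instruction energy table (picojoules). A real table would come
# from characterizing the configured processor.
ENERGY_PJ: Dict[str, float] = {
    "load": 12.0, "alu": 4.0, "branch": 5.0,
    "vload": 20.0, "vmac": 16.0,  # hypothetical custom 4-wide instructions
}


def trace_energy_pj(trace: Iterable[str], table: Dict[str, float] = ENERGY_PJ) -> float:
    """Sum per-instruction energy over an executed instruction trace."""
    counts = Counter(trace)
    missing = set(counts) - set(table)
    if missing:
        raise KeyError(f"no energy entry for {sorted(missing)}")
    return sum(table[op] * n for op, n in counts.items())


# Multiply-accumulate over 8 elements.
baseline = ["load", "load", "alu", "alu", "branch"] * 8  # scalar: mul + add per element
custom = ["vload", "vload", "vmac", "branch"] * 2        # 4 elements per iteration

print(trace_energy_pj(baseline), trace_energy_pj(custom))
```

Once the table exists, comparing candidate instruction sets is just a matter of replaying traces. With these invented numbers, the custom instructions cut the kernel from 296 pJ to 122 pJ.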
Kulkarni: In terms of how we see the world, it’s more than absolute accuracy and prediction. It’s really relative accuracy and relative analysis of what-if scenarios. That’s where the whole world is going in terms of macro architecture analysis. But the Holy Grail is ESL synthesis with RTL power analysis. That’s the ideal flow—you capture some of the physical effects in the RTL world but do a lot of what-if tradeoffs, hardware-software co-simulation, DVFS, looking at various power scenarios, and then validate that through a real hardware description language because that’s where all the realization will occur. But you can do all the relative accuracy of power up at the ESL level. That’s the picture we see from our end customers.

LPE: So it’s big-picture power estimation?
Kulkarni: Power estimation is still questionable at the moment. But getting traces to drive the RTL power analysis is a much better approach, which we have used in a mobile application. We took a 2.5 million-gate design and applied SystemC-level simulation, then did ESL with a partner; the IP and DSP models involved multiple cores. We took the transaction-level trace and combined it with RTL analysis to essentially emulate what would happen on a six-second call on a cell phone. We could analyze that in four hours using this flow vs. three months using pure RTL analysis and RTL power estimation. So a combination of ESL synthesis plus RTL analysis that captures a realistic stimulus and the physical effects can narrow that band of inaccuracy from RTL to gate to final signoff.
Cline: Today that’s a typical flow. There isn’t any real SystemC analysis—or good ones, anyway. But as far as optimization and estimation go, these are two separate worlds. The first one is that somebody has to get the power right on a macro level. You need some way to model those larger blocks and get your power budgets right at the block level, which may have only 10 blocks in your design. From there you need to go to power estimation. You go through synthesis, go to RTL estimation, and then loop the information back into your system level. There has to be some sort of modeling at the higher level, with more parameters than just the performance numbers. There has to be some other quick estimation of area using a synthesis engine and quick estimation of power using a combination of synthesis and techniques through application-specific processors or RTL.

LPE: How accurate do the initial high-level power analysis or estimation have to be?
Martin: Our experiments with our processor estimation have been that for the RISC core you can be plus or minus 3%. For other things it can be more like plus or minus 20%. The key is to make macro tradeoffs. It’s not whether this is 5% better than that one. It’s whether this takes you in a different space than that one, and for that 20% or 15% is an adequate coarse-grained analysis. People always want to verify at RTL and maybe down to place and route that the decisions they made at the high level are being validated at the implementation level.
Kulkarni: What we find is a band of inaccuracy, as opposed to absolute numbers. About 30% is adequate as you go through RTL, RTL synthesis, P&R, layout, and final grid signoff. If you keep that band and narrow it down consistently, you get a true power budgeting solution at the system level. That includes hardware-software tradeoffs, and RTL to synthesis all the way to grid design and package. That way the ESL designer working on the next-generation smart phone is not completely off the mark in terms of final cost, power and SI budgets.
McCloud: From a high-level synthesis perspective, accuracy needs to be quite high. At the end of the day you’re doing hardware architectural exploration between different frequencies and technology. If you are off 20% or 30% when you’re trying to make your design selection it’s significant. It does come down to the accuracy of the up-front power estimation tool when you get closer to the hardware you’re trying to create. If you’re talking about something before that in the TLM platform space, at that level when you’re making decisions about whether to move something into software or keep it in hardware, accuracies in the range of 30% or 40% are sufficient. But if you’re creating a hardware accelerator, you need to be within 10% or 20%.
Meyer: If you’re consistently overestimating by 20% and you know that’s happening, that’s much more acceptable. But if you’re plus 20% in one area and minus 20% over here, then your confidence disappears very quickly.
Martin: You have to be monotonic. If the estimator says A will be greater than B, then by the time you get detailed analysis A had better be something greater than B—even if the actual numbers aren’t the same.
McCloud: And that’s the problem. If the estimations in high-level synthesis are off by 20% or 30%, you run a high degree of risk that the relative comparison between one solution and another does not hold when you go down to RTL synthesis at the gate level. If you think solution A is 5 milliwatts and another solution is 7 milliwatts, when you go to actually implement them and run them through power estimation tools at the gate level, you might find that the comparison no longer holds. That’s why I believe you need a relative level of accuracy.
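McCloud’s 5 mW versus 7 mW example can be checked by asking whether the two error bands overlap. If they do, the estimator alone cannot guarantee which solution is lower. The sketch assumes a symmetric relative error on each estimate, with errors free to go in opposite directions.

```python
from typing import Tuple


def error_band(estimate_mw: float, rel_error: float) -> Tuple[float, float]:
    """Range the true value could fall in if the estimate is off by +/- rel_error."""
    return estimate_mw * (1.0 - rel_error), estimate_mw * (1.0 + rel_error)


def ranking_is_safe(a_mw: float, b_mw: float, rel_error: float) -> bool:
    """True only if the two bands do not overlap, so the ordering cannot flip."""
    a_lo, a_hi = error_band(a_mw, rel_error)
    b_lo, b_hi = error_band(b_mw, rel_error)
    return a_hi < b_lo or b_hi < a_lo


for err in (0.10, 0.20, 0.30):
    print(f"+/-{err:.0%}: ranking safe = {ranking_is_safe(5.0, 7.0, err)}")
```

At ±10% the bands are separate. At ±20% solution A could be as high as 6 mW while B could be as low as 5.6 mW, so the ranking can flip. If the error is a consistent bias instead, as in Meyer’s overestimate-by-20% case, both bands shift together and the ordering survives. That is the monotonicity Martin asks for.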

LPE: Is it harder to get an accurate assessment as we start adding in multiple power islands, voltage rails, and stacked die packages?
McCloud: It might get a little bit easier. Of course you need to be able to architect your high-level synthesis tool to be able to take into consideration that you’ve got islands, but in some respects you’re localizing the power estimation to a particular region of your design. When you’re talking about power gating of an entire hierarchical block, that’s actually a benefit when you localize it to a specific area.
Cline: The problem that high-level synthesis tools will have in the future, if there’s not a closer correlation to the back end, is having a disconnect at 20nm or 1nm.
Meyer: And having some way of passing down what you think is a good implementation for this piece of it. There has to be something to express, ‘I expect this to be a high Vt and this to be a low Vt.’

LPE: Does it make it harder to pick which processors and which IP and which interconnects you’re going to use because you are running at such a high level?
McCloud: The further you get away from the silicon, the greater the impact you can have on power. At the gate level a power expert can save the design 10% to 15%. If you get up to the TLM decisions with software and hardware, you can achieve huge power savings. Maybe you only have 30% accuracy, but the decisions you make can have a bigger impact.
Martin: That’s where configurable processors can open up a whole new area. You can get a 10-to-1 improvement in performance in terms of the instructions for deep dataplane applications. You can get a 3-to-1 improvement in energy consumption.
Kulkarni: You are doing so many tradeoffs at ESL that you are purposely making assumptions based on how long it will take. If you add more accuracy to ESL, what that means is you are really adding synthesis. That will explode the runtime. The designs are getting so complex that the next tablet will have 1 billion gates. That’s an unheard-of number for mobile applications. To do a what-if analysis for that kind of application, it’s critical that it gets done quickly. In GPS vs. the iPod, for example, it’s critical to determine what kind of stimulus can be provided for ESL. That’s the missing link for RTL power analysis. It’s easier to optimize timing and area, but it gets more difficult to optimize for power. The more analysis you do, the more synthesis you put under the hood, which will expand the runtime.
McCloud: I don’t think you can do any reasonable level of power estimation without some realistic switching activity represented. Otherwise your estimates will be way off.
Martin: That just emphasizes the use scenarios. We run into design teams sometimes that don’t seem to fully understand the system constraints they’re operating under. They can’t identify whether they want to operate at a 200MHz operating point in a 40LP process or at a 500MHz operating point. But if you target close to the edge of a process, the results you get in terms of energy consumption and peak power consumption can be way off. People really need to understand the usage scenarios and the constraints that system architectures put on operating points. Sometimes that’s missing.
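McCloud’s warning about unrealistic switching activity follows from the standard first-order CMOS dynamic power estimate, P = α·C·V²·f. Here α, the activity factor, is exactly what the stimulus determines. The block parameters and both activity values below are hypothetical. Conventions also differ on whether a factor of ½ is folded into α.

```python
def dynamic_power_w(activity: float, switched_cap_f: float,
                    vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = alpha * C * Vdd^2 * f."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity factor must be between 0 and 1")
    return activity * switched_cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance at 1.0 V and 500 MHz.
for label, alpha in (("realistic use case", 0.1), ("high-toggle test vectors", 0.4)):
    p = dynamic_power_w(alpha, 1e-9, 1.0, 500e6)
    print(f"{label}: {p * 1000:.0f} mW")
```

The same block gives 50 mW or 200 mW depending only on the assumed activity. So the stimulus, not the model, can decide whether an estimate is useful.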

Five Important Changes That Will Affect Power

Thursday, November 3rd, 2011

By Ed Sperling
So far most of the energy savings in SoCs have been achieved using two main approaches—turning off most of the chip most of the time, and changing the materials used to insulate against current leakage.

Over the next few years, changes to designs will be more radical, encompass more pieces of a bigger system, and they will be orders of magnitude more effective. From a market standpoint, there is little choice. Computing increasingly is going mobile, and time between charges is a competitive edge. The caveat is that increased battery life has to come with a corresponding increase in functionality. Everything that could be done with a plug now will have to be done without one.

That means rethinking everything from the hardware design to the usage model to the software that runs on those platforms. And it means getting chips out the door at least as quickly, if not more quickly. Here are five trends and approaches that collectively, and sometimes individually, will have a big impact on energy efficiency, power consumption and leakage:

1. Rethinking the basics. Some of the biggest advances in efficiency will come from optimizing existing technology. There is more to turn off, more pieces to improve, and there are more ways of doing it better.

Consider something as basic as the clock, for example. The big focus has been maximizing frequency for nearly five decades. There are even concurrent clocks to make that happen. But having them always on and always running at the same frequency means they use a lot more energy than necessary.

“Design has always centered around the clock being the heartbeat of the system,” said Chi-Ping Su, senior vice president of R&D for Cadence’s Silicon Realization Group. “So people always assume the clock will be on. What we have found, working with ARM and the processor type of design, is that the clock consumes an extremely large percentage of the power. Timing and frequency are based on the clock. So you build a tree to be the ideal clock and you do everything based on that. When we started looking at it, we started asking why clocks need to be balanced at all.”

So how much energy can be saved? Su contends the amount is up to 30% of clock-tree power and up to 50% of dynamic power for the entire system.

He’s not alone in touting these kinds of numbers. Most SoC tools developers believe that dealing with energy/power/leakage at or before RTL can mean significant savings for the overall design.

“All the low-hanging fruit is still available to chip designers,” said Vic Kulkarni, senior vice president and general manager at Apache Design. “We find that even advanced designers are more concerned with meeting functionality and identifying power bugs. What they forget is the relationship between data, clock, reset and enable, the four signals in an SoC.”
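One piece of low-hanging fruit that follows from the clock/enable relationship is clock gating. If a register bank’s enable is rarely asserted, its clock can be shut off on the idle cycles. The sketch below is a rough screening pass over hypothetical register banks. It assumes gated cycles burn no clock power in the bank and charges a made-up 5% overhead for the gating cell.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegisterBank:
    name: str
    clock_power_mw: float  # clock power of the bank when it is never gated
    enable_duty: float     # fraction of cycles in which its enable is asserted


def gating_savings_mw(bank: RegisterBank, overhead_fraction: float = 0.05) -> float:
    """Rough clock power saved by gating a bank's clock with its enable.

    Assumes the bank burns no clock power on gated cycles and charges a
    hypothetical gating-cell overhead as a fraction of ungated clock power.
    """
    saved = bank.clock_power_mw * (1.0 - bank.enable_duty)
    overhead = bank.clock_power_mw * overhead_fraction
    return max(0.0, saved - overhead)


def gating_candidates(banks: List[RegisterBank]) -> List[str]:
    """Names of banks worth gating, largest savings first."""
    worthwhile = [b for b in banks if gating_savings_mw(b) > 0.0]
    return [b.name for b in sorted(worthwhile, key=gating_savings_mw, reverse=True)]


banks = [
    RegisterBank("dma_config", clock_power_mw=4.0, enable_duty=0.10),
    RegisterBank("alu_pipeline", clock_power_mw=6.0, enable_duty=0.97),
    RegisterBank("uart_fifo", clock_power_mw=1.5, enable_duty=0.02),
]
print(gating_candidates(banks))  # ['dma_config', 'uart_fifo']
```

The nearly always-enabled pipeline is skipped because the gating overhead would exceed what it saves. Realistic enable activity, not just the netlist, decides where gating pays off.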

2. Reducing distance and resistance. Over the next two years the SoC industry will undergo a radical shift that will continue for years to come. Rather than plotting Moore’s Law linearly, transistors will be placed in three dimensions.

Driven partly by re-use, partly by time-to-market pressures and partly by physical limitations, 2.5D and 3D stacking will have an enormous effect on energy consumption and power. By stacking memory and other components on top of logic, the distance a signal must travel can be shortened significantly, along with the energy necessary to drive that signal.

“Moore’s Law is not a law,” said Wally Rhines, chairman and CEO of Mentor Graphics. “But the easiest way to reduce the cost of a transistor for the last 40 years has been shrinking feature sizes and growing wafer sizes. We are coming into an era where it will be more cost effective to stack die than to shrink feature sizes. We will hit it with memory before logic, but as with all new technologies we will adopt it before it is cost effective because of unique capabilities.”

Whether it’s done with an interposer, package-on-package, or flip-chip bumped die, Rhines said there is a 70% decrease in power dissipation if the memory can be put on top of a processor.

And that’s just for starters. By adding more processors that are sized for a particular function and tying that to just the right amount of memory, rather than a whole memory chip or block, far less power is needed. Companies such as Tensilica and ARM have been making this case for some time. With stacked die, their arguments are likely to receive far more attention.

3. New materials and structures. Calling a material “new” is something of a misnomer in SoC design. Most of the techniques that we consider revolutionary have been around for decades, but they haven’t been developed to the point where they are cost effective, from both a yield and a materials standpoint.

Through-silicon vias (TSVs), for example, have been talked about since the late 1950s, and interposers in 2.5D packages are simply a collection of TSVs on a single die. But there are still issues to be worked out. Shang-Yi Chiang, senior vice president of R&D at TSMC, said questions remain about how to integrate a substrate with an interposer, and how to debug it at different phases of development so it can be tested.

“There are a lot of parasitics to deal with in 2.5D,” Chiang said. “And with 3D we need time to make sure we can calibrate it.”

The other kind of 3D (structures such as FinFETs, tunnel FETs and nanowires) has been on the drawing board since the 1990s. All of these structures can lower leakage by letting the gate control the channel from multiple sides. FinFETs are planned in volume for 14nm by both GlobalFoundries and TSMC, while Intel may begin using them as early as 22nm.

These structures hold the promise of radically reducing leakage of both static and dynamic power in all modes of operation, at least initially.

“The problem is these are a one-off thing,” said Mike Muller, chief technology officer at ARM. “FinFETs do reduce leakage, but once you’ve done that you’ve still got three impossible things to do before breakfast. Those kinds of steps are part of the solution.”

Muller said combining those with stacking techniques will go even further. “It opens the door to completely different die-to-die memory interfaces which allow you to build more efficient systems than when you go off the chip, down the serial interface to a separately packaged die. It changes the memory bandwidth, and this is just a computer at the end of the day so memory is one of the fundamentals for performance. Stacking allows you to change that.”

4. Lowering the voltage. One of the benefits of 3D structures such as FinFETs and stacking of die is that they make it easier to lower the voltage in certain parts of the chip. The reason is that DRAM may need a higher minimum voltage than logic or I/O just to remain functional. By separating those functions into different die, issues such as state retention and leakage can be confined and dealt with independently, the so-called divide-and-conquer approach.

So how low can the voltage go? Several years ago, researchers at IBM said the minimum voltage for an SoC would be at least 0.7 volts. It now appears it can be as low as 0.1 or 0.2 volts, and research is under way to lower it even further.

“You can get down to 0.3 or 0.2 volts without any problems,” Qi Wang, technical marketing group director at Cadence, said during a recent roundtable. “If you keep the aspect ratio of the depth and the height of a FinFET then you can guarantee the performance, but you do have other physical effects. Nothing is free. But the voltage can go much lower than what the textbooks say.”
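Dynamic switching energy scales roughly with C × V², so the voltages quoted in this section translate into large energy ratios. The sketch below is a first-order estimate only. It holds capacitance constant and ignores leakage and the drop in achievable frequency near threshold, which are the kind of "other physical effects" Wang warns about.

```python
def relative_dynamic_energy(v_new: float, v_ref: float) -> float:
    """Dynamic energy per switching event scales with C * V^2 (C held constant)."""
    return (v_new / v_ref) ** 2


V_REF = 0.7  # the earlier "minimum" supply cited in the text
for v in (0.5, 0.3, 0.2):
    print(f"{v} V: {relative_dynamic_energy(v, V_REF):.1%} of the dynamic energy at {V_REF} V")
```

Going from 0.7 V to 0.2 V cuts switching energy per operation to roughly 8% under these assumptions. Because circuits also slow down sharply at such voltages, the energy per task can fall by less than that.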

5. Fixing software. Software is the last piece of the puzzle to fix, and it’s been one of the hardest for a number of reasons.

First of all, software takes longer to create and perfect than hardware. This is evident in all the bug fixes and updates. All three of the top EDA players are involved in this effort. Synopsys is working on software prototyping to allow software to be written even before the hardware is ready. Mentor has been involved in simplifying the creation of RTOSes and embedded software. And Cadence has shifted its design approach so that software and hardware can be done far more concurrently.

But getting software out on time is only a first step. The next step is to make software function more efficiently, an approach that dates back to the RISC vs. CISC wars of the 1990s. Reduced instruction set computing was more efficient than complex instruction set computing, which boosted performance. By taking that approach one step further, it also can reduce the amount of energy consumed by a particular task, and be used to manage the overall power in a system much more efficiently.

Work on symmetric multiprocessing continues, as well. How far that will go is anyone’s guess, but we now seem to be facing a limit on the number of cores that most applications can use effectively. Talk about unlimited numbers of cores has given way to limited numbers of cores and unlimited numbers of processors spread throughout a system, most of which are off most of the time.
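Amdahl’s law is a standard model that helps explain the limit on useful cores, although the article itself does not invoke it. Any serial portion of a workload caps the speedup no matter how many cores are added. The 90% parallel fraction below is a hypothetical workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload that is 90% parallelizable.
for n in (2, 4, 8, 16, 64):
    print(f"{n:>2} cores: {amdahl_speedup(0.9, n):.2f}x")
# However many cores are added, speedup stays below 1 / (1 - 0.9) = 10x.
```

Past a handful of cores, each additional core buys little performance but still costs power when it is on. That fits the shift toward many specialized processors that stay off most of the time.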

Taken together, all five of these trends will have a huge effect on efficiency, power and leakage. And now that battery life is a competitive issue, it also is likely to be used by vendors and seen as a value add instead of an unnecessary engineering cost—or worse, a nuisance.

Using High-Level Synthesis To Manage Power

Wednesday, November 2nd, 2011

Low-Power Engineering talks with Apache Design’s Vic Kulkarni, Tensilica’s Grant Martin, Cadence’s Mike Meyer, Calypto’s Shawn McCloud and Forte Design’s Brett Cline about the need for a higher level of abstraction to optimize power in ICs.


Energy Vs. Power

Thursday, October 6th, 2011

By Ann Steffora Mutschler
The terms power and energy are used almost interchangeably these days, but understanding and clearly articulating how to optimize embedded designs for maximum energy and power efficiency can make a big difference in a design.

At a physics level, energy = power × time, and power is the rate at which energy is used. When the focus is specifically power, the concern is the rate at which energy is released, so much per second. Typically that is associated with things like supplying current into a chip and pulling heat out of a chip.

“You may not be able to sustain more than a certain amount of power dissipation, either instantaneously or over a short period of time,” said Chris Rowen, CTO of Tensilica. “It’s related most often to the thermal envelope within which you are operating.”

Energy, on the other hand, is the total consumed over the course of some job. Most importantly here, he reminded, batteries don’t store power, batteries store energy. There is a finite supply of energy in a battery, so in mobile devices the first consideration is often energy. For example, you want someone to be able to get their job done, which might be to go all day with their phone on the charge that they have in their battery. You may not care quite so much about the exact rate at which that energy is used up, but you do care about completing the task at hand.

The difference between energy and power is obvious when there may be two approaches to doing some kind of computation. Power may be dissipated at a high rate, the work performed very quickly and then shut down. Alternatively, that computation may be stretched out over a longer period of time, dissipating less power at any instant, but consuming energy over a longer period.

“You have to actually look at the product of the two or the area under the curve and say how much power times how much time, that’s what gave me energy. This shows up particularly when people are looking at the interaction of hardware and software because software has a lot of control over algorithms. The algorithms often will determine how much energy is required to get something done,” Rowen explained.
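Rowen’s “area under the curve” can be computed directly from a sampled power trace. The sketch below applies the trapezoidal rule to samples at increasing timestamps; the trace values are hypothetical, and the function name `energy_from_trace` is illustrative rather than taken from any particular tool:

```python
def energy_from_trace(times_s, power_w):
    """Trapezoidal-rule estimate of energy (joules) from power samples (watts)
    taken at strictly increasing timestamps (seconds)."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need matching lists with at least two samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j

# Hypothetical trace: a burst of activity followed by a low-power idle tail.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
power = [0.1, 1.5, 1.5, 0.1, 0.1, 0.1]

e = energy_from_trace(times, power)
peak = max(power)
avg = e / (times[-1] - times[0])
print(f"peak {peak:.2f} W, average {avg:.2f} W, energy {e:.3f} J")
```

For this invented trace the peak is 1.5 W, but the average over the half-second window is only 0.66 W and the energy is 0.33 J. Peak power speaks to the thermal envelope; the area is what drains the battery.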

If the engineering team has a good understanding of the power dissipated by one set of instructions versus another, they might be able to determine that, for a particular processor in a specific circumstance, they are better off dissipating more power by running an algorithm that takes less time and consumes less energy. Or they may determine that for a different algorithm they are best off choosing the lowest-energy instructions, even though it takes many more instructions to get the job done, because the result is less total energy.
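A rough way to explore that tradeoff is to weight each instruction by an energy cost. In the sketch below, the instruction names, per-instruction energies, single-cycle timing and 100 MHz clock are all hypothetical; real numbers would have to come from characterizing the actual processor:

```python
# Hypothetical per-instruction energy costs (picojoules) for an imaginary core.
ENERGY_PJ = {"vec_mac": 60.0, "mul": 12.0, "add": 6.0, "load": 18.0}
CLOCK_HZ = 100e6  # assumption: every instruction completes in one cycle

def profile(instr_counts):
    """Return (energy_J, time_s, avg_power_W) for a mix of instruction counts."""
    energy_j = sum(ENERGY_PJ[op] * n for op, n in instr_counts.items()) * 1e-12
    cycles = sum(instr_counts.values())
    time_s = cycles / CLOCK_HZ
    return energy_j, time_s, energy_j / time_s

# Two hypothetical instruction mixes that compute the same 1,000-element result.
vectorized = {"vec_mac": 250, "load": 250}
scalar = {"mul": 1000, "add": 1000, "load": 1000}

for name, mix in (("vectorized", vectorized), ("scalar", scalar)):
    e, t, p = profile(mix)
    print(f"{name}: {e * 1e9:.1f} nJ in {t * 1e6:.1f} us at {p * 1e3:.2f} mW")
```

Under these assumptions the vectorized mix draws about 3.9 mW against 1.2 mW for the scalar one, but it finishes in 5 µs instead of 30 µs and uses 19.5 nJ instead of 36 nJ: higher power, lower energy. Different per-instruction costs could just as easily favor the longer sequence of cheaper instructions, which is the other case described above.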

“There is a wide range of non-obvious tradeoffs that people might make to look at the interaction between what algorithms, what software they might run and what the characteristics of hardware are. In some cases the hardware which has higher power dissipation has lower energy,” Rowen added.

When it comes to battery life in mobile devices, energy efficiency is at the top of everyone’s list.

“Ultimately we want to do the most operations per electron that shoots through the power grid, and that’s what is really going to give us longer run times for standby, active, whatever it is,” observed Cary Chin, director of technical marketing for Synopsys’ low power solutions. “Power efficiency is important as well, because if you assume that energy usage or power is relatively constant, then power and energy are kind of equivalent. It’s a technicality, but when you look at complex devices the level of power is certainly not constant. It starts to change a lot. This is the point at which we should really start to make the distinction, because the assumption of constant power is no longer true in these devices. And as we go forward, the assumption that when my phone is on, it’s just on, is no longer going to be true. In fact, it will never be all on; there will always be pieces that are off.”

That concept of on and off being relative is an important element in design. Another consideration is how much energy a design uses to get a particular task done.

“Customers think a lot about how much power the design is using and they are thinking about instantaneous power consumption,” said Jon McDonald, technical marketing engineer in Mentor Graphics’ design creation synthesis group. “But to really look at what they are trying to do, they are not trying to optimize the power generally. They really are trying to optimize the energy the system is consuming to get the job done.”

In talking with an engineering team recently, McDonald learned that it had good estimates of how much power the system uses in any given state, but no clear idea of how much energy the system uses while processing the work it needs to do, because the team didn’t know how long the system stays in each state. For example, he said, if there is a lot of contention on the bus while the system is processing data, and the system stays in a given state 20% or 30% longer because it is waiting for resources, it suddenly uses 20% or 30% more energy. The power in that state hasn’t changed, but that power figure doesn’t mean anything until you know how long the system spends in that state.
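McDonald’s point reduces to a per-state sum: energy is each state’s power multiplied by the time spent in it. The sketch below assumes two hypothetical states, invented power estimates and a fixed 10-second window, and applies a 25% stretch (between the 20% and 30% figures he cites) to the processing state:

```python
# Hypothetical per-state power estimates (watts). The team in the example
# had numbers like these; what it lacked was the residency times.
STATE_POWER_W = {"processing": 0.8, "idle": 0.05}

def energy_by_state(residency_s):
    """Per-state energy in joules: state power times time spent in that state."""
    return {s: STATE_POWER_W[s] * t for s, t in residency_s.items()}

# Hypothetical residency (seconds) over a fixed 10-second window.
baseline = {"processing": 2.0, "idle": 8.0}
# Assumed 25% stretch of the processing state due to bus contention;
# the power drawn while processing does not change.
contended = {"processing": 2.5, "idle": 7.5}

base_e = energy_by_state(baseline)
cont_e = energy_by_state(contended)
ratio = cont_e["processing"] / base_e["processing"]
print(f"processing energy: {base_e['processing']:.2f} J -> "
      f"{cont_e['processing']:.2f} J ({(ratio - 1) * 100:.0f}% more)")
print(f"total energy: {sum(base_e.values()):.3f} J -> {sum(cont_e.values()):.3f} J")
```

Processing energy rises by exactly the stretch factor, from 1.60 J to 2.00 J, even though the power estimate never changed. In this assumed fixed window the total rises somewhat less, from 2.000 J to 2.375 J, because the extra processing time comes out of time that would otherwise be spent idle.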

“You need a good understanding of not just the power that the system is consuming but the timing, the performance of the system, how long it takes in any given state to do the job that it’s trying to do,” McDonald said.

Coming next month: Optimizing for power and energy efficiency and the differences between them.
