Posts Tagged ‘Cadence’

Step Away From the Spreadsheet

Thursday, February 9th, 2012

By Ann Steffora Mutschler
Engineers today spend more than a quarter of their time trying to meet power specifications.

A survey of more than 700 engineers by Calypto illustrates just how important and time-consuming power management is today for engineering teams. As consumer devices grow ever more complex, the need to analyze and optimize power not just at the RTL but at the system level is the next challenge, even if the path to reach that goal is not yet clear.

The opportunities for optimizing a design for power efficiency are greatest at the architectural level of abstraction. The further a design moves downstream the less effective optimization techniques become, noted Yossi Veller, chief scientist for ESL at Mentor Graphics, in a white paper he co-authored for ARM’s IQ Magazine. “Power optimization must begin with architectural analysis, exploration, and optimization of power and timing at the electronic system level (ESL). According to a study by LSI Logic, techniques available at the RTL synthesis phase have the ability to reduce power by 20%; those at the gate level offer a 10% reduction; while those at the layout level can reduce power by only 5%. Waiting until the RTL to begin optimizing for power is a wasted opportunity because power usage can be reduced by 80% at the ESL.”

Fig. 1: The ability to optimize power at the architectural level far exceeds that at lower levels of abstraction.

“Traditional power optimization tools are really working at the lower levels of abstraction,” explained William Ruby, senior director of RTL power product engineering at Apache Design. “If you look at synthesis, if you look at physical design, there are some automated techniques that are available in those tools. But those are in a category of additional refinement-type steps. Once you have the design architecture nailed down, then you can add in some optimizations based on those tools and you can get some additional incremental power savings, but the part that is missing is enabling the true design-for-power efficiency. If you look at modern chip architectures, they are extremely complex and the RTL descriptions of these architectures are even more complex such that RTL in some cases is no longer seen as a viable architectural description language. You want to be able to describe the architecture of the design in a high level of abstraction.”

With this description comes the requirement to be able to analyze power. Today, this is done by synthesizing the design from a high-level description such as C++ down to RTL, at which point an RTL power analysis tool can run and give feedback into the architectural domain. But what needs to accompany this synthesis-loop-back type of flow, and give some indication of what the power numbers are, is more intelligence in the high-level tools themselves. They need to point out inefficiencies in a design at both the RTL and architectural levels.

Chris Rowen, CTO and co-founder of Tensilica, sees two big challenges for power analysis tools. “One, it is very, very difficult to isolate where the real problem is. It only makes sense to really measure power at the level when you have really synthesized the logic and laid it out and you actually know what the physical design looks like, because the physical design has a huge impact on what the power dissipation of the circuit is.”

By the time a design has gone through synthesis and place and route, there is very little visibility into the original logic in question. “It all goes into the Cuisinart and all you get is this amorphous mush of gates at the end. So if someone asks you, ‘How much power is being dissipated in my multiplier versus in my divider versus in my register file,’ I don’t know anymore because I have to process them all together in order to get good physical results. But then it all has been aggressively remapped into other logic forms and I can’t isolate the power easily. So you have to work in rather indirect ways to figure out whether the power was being dissipated in one function versus another.”

A second problem, he said, involves system-level tracking of different scenarios. “It is extremely difficult to reach your power goal if you say, ‘Let me use the worst case assumption about each subsystem. I’m going to assume that every piece of my baseband is on, and every piece of my Layer 2 and Layer 3 protocol stack is on, and my image processor is on, and my apps processor is running full out, and all of my RF subsystems are running,’ because of course you’d exceed your power budget by a factor of two or three. Instead people recognize they’re not all on at the same time, the system doesn’t work that way. When you are doing one thing, then you’re typically not doing something else. Therefore, you only have to look at the particular combination of subsystems that is on at that time. However, the software guys have really poor tools to correlate what’s going on in the higher-level operating modes to what’s going on in terms of actual power dissipation in different subsystems. They are completely shooting in the dark where they do not have anything like the kind of accuracy for the modeling of these things.”
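Rowen's contrast between an all-on worst case and scenario-based budgeting can be illustrated with a small sketch. The subsystem names follow his examples, but every power figure and scenario below is a hypothetical placeholder, not measured data.

```python
# Hypothetical peak power per subsystem, in milliwatts (illustrative only).
PEAK_MW = {
    "baseband": 400,
    "protocol_stack": 150,
    "image_processor": 350,
    "apps_processor": 600,
    "rf": 300,
}

# Hypothetical operating scenarios: which subsystems are active together.
SCENARIOS = {
    "voice_call": {"baseband", "protocol_stack", "rf"},
    "video_record": {"image_processor", "apps_processor"},
    "web_browse": {"baseband", "protocol_stack", "rf", "apps_processor"},
}


def worst_case_all_on(peak_mw):
    """Sum every subsystem's peak, as if all were active at once."""
    return sum(peak_mw.values())


def scenario_power(peak_mw, scenarios):
    """Peak power of each scenario, counting only its active subsystems."""
    return {
        name: sum(peak_mw[sub] for sub in active)
        for name, active in scenarios.items()
    }


all_on = worst_case_all_on(PEAK_MW)
per_scenario = scenario_power(PEAK_MW, SCENARIOS)
worst_scenario = max(per_scenario, key=per_scenario.get)

print(f"all-on worst case: {all_on} mW")
for name, mw in per_scenario.items():
    print(f"{name}: {mw} mW")
print(f"worst realistic scenario: {worst_scenario} ({per_scenario[worst_scenario]} mW)")
```

With these made-up numbers the all-on figure is 1800 mW, while the most demanding scenario needs 1450 mW. The design target becomes the largest scenario rather than the sum of all peaks, which is only as trustworthy as the scenario list itself.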

As a step towards true system-level power analysis, engineering teams are gradually figuring out that they need to build approximate models of power in addition to simulation environments that are fast enough to run realistic scenarios and to capture real activity. “Ironically getting power information is more than anything else probably a function of getting fast enough simulation, because only if you can run realistic size scenarios will you really gain interesting information,” he said.

This has become one of the big drivers of ESL, which until recently has been relatively slow to catch on. But complexity at advanced nodes, including power considerations, has significantly boosted its appeal.

“What the user would like to have at the very early stages, when he has a TLM model of the design, is at least a relative assessment of what architecture decisions will impact the energy in which direction,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “He will also want to know how the software impacts all of that. From a technology perspective, TLM models allow you to do that so it’s fairly straightforward to annotate power-related data into TLM models,” he asserted.

Annotating models with power data, just like annotating performance, is a challenge, and it can be approached in three ways:

First, he said, “You can start with your assumptions, with your power budget. TLM models and virtual prototypes allow you to then execute your assumptions so you have in your power envelope/power budget. You say, ‘These tasks should take that much power, I know that from past experience,’ and then you execute your virtual platform with those annotated, estimated data or budgeted data. And you get dynamic results depending on what tasks the software ends up calling, how long a cell phone is used for which task in a day, and so forth.”

Second, annotate back from when you have RTL. “At the RTL level you have these switching formats that you can derive from the RTL to get a good idea about the activity,” Schirrmeister continued.

And third, it can be dealt with at the silicon level by taking previous designs, measuring power information and annotating back into TLM models.
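The first approach, executing a usage scenario against budgeted per-task power, can be sketched in a few lines. This is a minimal illustration in plain Python rather than a real TLM model or virtual platform. The task names, budgets and daily trace are all hypothetical.

```python
# Budgeted power per task in milliwatts (hypothetical values, standing in for
# estimates from past experience, RTL back-annotation or silicon measurement).
TASK_BUDGET_MW = {
    "standby": 5,
    "voice_call": 800,
    "video_playback": 1200,
    "web_browse": 1000,
}

# A hypothetical day of use: (task, duration in seconds).
DAILY_TRACE = [
    ("standby", 20 * 3600),
    ("voice_call", 1 * 3600),
    ("video_playback", 1 * 3600),
    ("web_browse", 2 * 3600),
]


def energy_by_task_mwh(trace, budget_mw):
    """Accumulate energy per task, in milliwatt-hours, from a usage trace."""
    totals = {}
    for task, seconds in trace:
        totals[task] = totals.get(task, 0.0) + budget_mw[task] * seconds / 3600.0
    return totals


energy = energy_by_task_mwh(DAILY_TRACE, TASK_BUDGET_MW)
total_mwh = sum(energy.values())

for task, mwh in energy.items():
    print(f"{task}: {mwh:.0f} mWh ({100 * mwh / total_mwh:.1f}% of the day)")
print(f"total: {total_mwh:.0f} mWh")
```

The same loop works for all three sources of annotation data. Only the origin of the per-task numbers changes, and the result depends on which tasks the software actually ends up running and for how long.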

Design engineers are undoubtedly looking for analysis and optimization at the system level so they can do power analysis and power estimation before RTL is available and before they can do gate-level simulations. But are they truly ready to adopt it?

Achim Nohl, technical marketing manager for Synopsys’ solutions group, pointed out that today, power analysis starts with gate-level simulation. “If you talk to a hardware engineer and tell him, ‘We are going to employ virtual prototyping and high-level models to do power analysis,’ he will certainly look at you a little strange because he thinks, ‘I’m doing all those back-end optimizations and all those specific things to optimize power. How will you ever be able to reflect that in a virtual prototype simulation?’ But that’s not the point. For virtual prototyping, the granularity of a system is very much different. You’re not looking at just the memory controller. You’re looking at the CPU with the memory controller, the buses, the interconnect, the peripherals and how all those things are orchestrated to find out where the different hot spots are and what is the best way to program all those pieces. What is the best scheduling technique? That is the concern at that level.”

When a new chip is architected today, estimates are done to determine whether the chip is feasible at all from a power perspective, he said. “Today, people are using spreadsheets in order to do this analysis, and this can only be a worst case analysis because they don’t know the dynamics and can’t reflect the dynamics of the system in those spreadsheets.”

While the pure architectural level tools don’t exist yet, many users are likely content with high-level synthesis tools for the time being. Apache’s Ruby believes they are good in their own respects but they are not actually meant to give architectural guidance; they are just meant to synthesize the design above the RTL.

One final thought for nervous system architects: The architectural tools of the near future will not replace the actual architect unless they become truly artificial intelligence, which is not likely to happen any time soon, Ruby concluded.

Margin Of Error

Thursday, February 9th, 2012

By Ed Sperling
Adding extra circuits and silicon area to a chip has always been frowned upon by chipmakers. Extra silicon means extra money, and for most chips the least expensive is always the better choice. But at advanced process nodes, margin also can slow performance, increase power consumption, and make it harder to achieve timing closure.

The obvious solution is to reduce margin throughout the design, but the reality is that margin budgets for a complex SoC will never go down. The best that design teams can hope for, in fact, is to keep margin constant from node to node and across stacked configurations. While this will require constant vigilance on the part of architects, it also will increase challenges from the conceptual stages of the design all the way to achieving acceptable yields in manufacturing.

What can’t be fixed
In some cases excess margin is out of reach of design teams. With more and more third-party IP now included in designs—and as much as 90% of the design now a combination of third-party and re-used IP—it’s difficult to even get a firm handle on the amount of guard-banding being done. So far, this hasn’t been a problem because most of the industry still isn’t producing 28nm chips in volume.

“Right now it’s only really a worry for the ‘star-IP,’ because if my USB controller is a bit bigger and more power hungry than it might be, it is still peanuts compared with the overall platform figures,” said one architect at a large chip company, who spoke on condition that he not be named. “Even the sum of the power of all the little things doesn’t approach the star-IP. And here’s a thing about the star-IP: It may be big and power-hungry, but there’s still a case for it. Some IP has a well-defined job to do and has to get that job done as efficiently as possible. But with star-IP, it’s mainly ‘faster is better.’ So sure, your Web browser would be more power- and area-efficient on a Cortex-A8 than a Cortex-A9, but I bet you’d rather buy the A9-based tablet.”

Those kinds of choices, as well as time-to-market pressures where IP can be re-used quickly, make guard-banding almost inevitable. What’s surprising is not that it still exists, but that it has remained relatively constant given the explosion in the number of components on an SoC.

Where margin matters most
But margin still causes signal propagation issues because there is more silicon and more wires that signals need to be driven through. That, in turn, leads to the need for wider buses.

“When you guard band you need to ratchet up the intended operating frequencies and increase the clock frequency,” said Neil Hand, group marketing director for Cadence’s SoC Realization Group. “All challenges are made worse. In some parts of the design there is no impact. If you have a low-speed peripheral you probably don’t need to worry about it. But with something like high-performance PCI Express, gen 3, you have fast protocols and huge pipes and margin becomes a critical issue. You have a hard time meeting closure even with no margin. Margin makes it worse.”

He said the key is not so much reducing the percentage of guard banding. The rate has been relatively constant, with about 20% margin at 65nm and 90nm, and at least 15% at 28nm and 20nm.

“With that number there’s a lot more slack,” he noted. “You need to know where the slack is and where it’s going to impact the design. Where you do have room to move it may drive different IP use. There may be better IP externally.”

He’s not alone in that view. In fact, all of the Big Three EDA vendors are counting on the need to trim margin to boost their IP sales over internally developed IP blocks.

“There are a lot of challenges working with 28/20nm because of the variability in processes,” said Navraj Nandra, senior director of marketing in Synopsys’ Analog and Mixed Signal IP Solutions Group. “Reducing margin makes a difference for getting performance out of analog. You also want to be competitive in price-performance-area. The question is how much margin you can accept in IP to meet those goals but not compromise on yield or variability.”

This becomes a difficult engineering tradeoff, however. Do you design IP for a specific chip, or do you add enough margin to allow it to easily plug into other designs? For commercial IP, the answer is clearly versatility, but there is a cost to that flexibility.

“You can’t be competitive and have slop in the design, but you can’t build something so competitive that it will only work for one design,” Nandra said. “It’s like a drag car where you run it for a half mile and then you have to replace the engine, the tires, and add more nitrous oxide. You can do the same for super high-performance chips for one temperature range and one process, but it’s useless for anything else. The goal is to build in enough circuit techniques with just enough margin not to risk performance problems if there is variability in the process.”

Manufacturability
Process variability has become particularly troublesome at advanced nodes. Coupled with double patterning at 20nm, and the likelihood of triple patterning at 14nm, margin takes on entirely new dimensions.

“We’re trying to characterize process corners and design around a nominal target,” said Jean-Marie Brunet, director of product marketing for model-based DFM and place and route integration at Mentor Graphics. “Third-party integration is a real challenge. Fill used to be a simple process where you insert it at every layer. But you don’t know what is in the IP these days, so fill has to be re-done. That doesn’t help with the integrity of the IP.”

He said that for most IP, there usually is guard-banding on the periphery of the IP to deal with fill. That impacts timing, area and performance.

“This is really an issue for the big chip companies that do 300 to 400 tapeouts a year, not for the microprocessor houses that can take their time to eliminate margin. The problem is there is no magic bullet for everyone else. And when we get into double patterning, this is really going to be an issue because you’re overlaying two masks, and any shift of the overlay will have a dramatic impact on the chip.”

The future
While pressure to reduce guard banding will continue, there is at least some hope for dealing with the problem more effectively. One involves new materials, such as graphene and silicon on insulator, which help reduce power, and new structures such as finFETs and carbon nanotube FETs, which minimize the effects of leakage and thereby make up for some of the power drawn by the extra margin.

A second approach is better tools. Knowing what the variability is in a process allows engineers to design in a minimum amount of margin. Building more accurate models can help, particularly in conjunction with analysis tools for exploring one IP block versus another.

And finally, stacked die will alleviate at least some concerns because portions such as analog can be developed at older nodes where they make more sense, rather than trying to fit everything into the latest process node.

Experts At The Table: Making Software More Energy-Efficient

Friday, January 27th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How much of the battery drain on a smart phone is caused by the hardware, how much is caused by the software, and how much is caused by bad reception?
Kaiser: Software controls a lot of it. Bad hardware that does not allow you to turn something off is one cause. But that doesn’t happen as often as bad software. If the hardware has one clock that turns everything off then you have a problem because whenever you want to use one little block you have to turn on five. But with software you have to give engineers feedback and tell them what knobs to turn. Ideally, you even give them an algorithm for how to tweak those knobs. We tried to do this with Nucleus. The drivers automatically manage their own power for WiFi or anything else. If no one opens the driver it won’t burn power. If you can lower power, don’t worry about the rest of the OS. Just minimize dynamically. You can set up limits for the driver. Then the application guy just needs to be able to allow the device to turn on. You need to give people simple metrics like CPU utilization. And if you give metrics on how much power your CPU is using while idle and how much it’s using when it’s busy, you can tell how much your CPU is using. Then, if you lower the frequency to half and the CPU is twice as busy, it’s actually burning more power. The compiler needs to do the job.
Rowen: The compiler can do a good job of the lower level things, but the choice of algorithms and which states you’re going to transition among is way beyond what the compiler has any access to. I recently saw a study of the number of states that a cell phone goes through. Something like 38 messages had to go back and forth between the software running on the phone and what was going on in the base station that were basically a negotiation as the phone entered a cell. There are some very tough and complex tradeoffs to make about whether you want to save power at one level by doing fewer transactions or you want to be aggressive and get the negotiation done as quickly as possible because it allows you to get into the lower power state as quickly as possible. There are some non-obvious tradeoffs at work at the system level because you have to determine if the phone is in a low-power or high-power state. They’re not things that you’re going to work out between Microsoft and Nokia. It’s going to be between Nokia and AT&T.
Kaiser: Does it matter? How often do you associate with a particular cell station? It affects standby time, but standby time is already pretty long. Does it really matter if you optimize that case, or do you care about other cases? How much of your battery went into this handshake?
Rowen: With the scenarios I’ve seen it could matter a lot.
Hardee: If you change the data arrival rate to those processes that are rendering Web pages, it’s a big difference. You could be running your graphics processors continually just because you have a slow data arrival rate, as opposed to processing everything and shutting down. It would be difficult for the software guys to optimize for those cases. What they can optimize for is how predictable stuff is. Can you do predictive scheduling? That changes what the application is doing. Those decisions are set pretty low down in the software stack, but what’s available to use and how effectively it can be used is another thing the software engineer has to think about.
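Kaiser's point about idle power, busy power and utilization can be made concrete with a time-weighted average. This is a minimal sketch under stated assumptions: the function name and all operating-point numbers are hypothetical, and it assumes the supply voltage does not drop with the clock, so busy power falls by less than half when the frequency is halved.

```python
def average_power_mw(utilization, busy_mw, idle_mw):
    """Time-weighted average of busy and idle power for a utilization in 0..1."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * busy_mw + (1.0 - utilization) * idle_mw


# Hypothetical operating points. At half the clock the same work keeps the
# CPU busy twice as long, and busy power is assumed to drop only to 600 mW
# because voltage is unchanged and part of the power is static.
full_speed = average_power_mw(utilization=0.30, busy_mw=1000, idle_mw=50)
half_speed = average_power_mw(utilization=0.60, busy_mw=600, idle_mw=50)

print(f"full speed: {full_speed:.0f} mW")
print(f"half speed: {half_speed:.0f} mW")
```

With these made-up numbers the half-speed point averages about 380 mW against about 335 mW at full speed, which is the effect Kaiser describes. A platform that also lowers voltage with frequency could see the opposite result, which is why the idle and busy figures have to come from the real hardware.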

LPE: How much of this information is making its way between hardware and software teams?
Kulkarni: That’s where virtual platforms come in. A co-simulation platform is a better description. But the marriage of the software with the hardware and how we capture that in instrumentation then can be driven toward a meter, which may be RTL power, a hardware description. But it all has to convert into power analysis at the end of the day. The feedback can be given to the system designer and the software designer, but all those things are missing. What Carbon is doing is an important step toward that. You can do the power analysis and get that feedback. We have to look at the application over time, and the feedback has to be in real time. In one of our customer applications for digital TV, they asked us if your eyes are looking at the oval in the middle of the screen can you turn off the power at the edges. They’re looking at pixel-by-pixel power control. This is real-time feedback of hardware and software applications.
Kaiser: You can re-encode movies based upon brightness. If it’s pretty dark, you can show it with much lower backlight. The backlight can vary and the screen looks the same. And it can vary by region. That’s beyond the scope of hardware. It’s algorithms.
Kulkarni: This customer is looking for software energy-reducing concepts. They want to know where their software is consuming more power.
Kaiser: They want the drivers. And if you’re going to be varying the CPU, then you also need to provide the compiler.
Rowen: Depending on what level in the system you’re talking about, the hardware has always provided the software. We’re doing a lot of advanced baseband design. The next thing after the industry specification that you do is make it happen in 150 milliwatts at 300 Mbits per second. That drives all the subsequent design, including the choice of algorithms, the processors, the allocation of memory and the interconnect. They’re all driven within a power budget. Everyone working at layer one knows the power. This very tight hardware-software co-design is very established there. It starts to loosen up as you go up, in part because you’re aggregating these much more complex systems together.
Neifert: That’s where it’s missing. The power is really a system context. Five or six years ago I started getting inquiries from leading-edge customers. A couple years later it was leading-edge research groups. About two years ago it made it out of research, and now about 30% or 40% of our customers are doing this in some way. It’s of great importance now.
Hardee: We all tend to gravitate toward the simulation model or the virtual platform’s ability to do power estimation. That’s not actually the low-hanging fruit, though. The thing that can be done relatively simply is system integration testing of power management software. Can you switch the mains on and off? Is it idle when you think it’s idle? That’s a lot lower-hanging fruit in a SystemC TLM 2.0 modeling environment than in power estimation. For power estimation, we have a ways to go even in the activity formats used. You have to use averaging formats over defined windows. These all apply at the signal level. How do we bring them up to the TLM 2.0 level to make them run faster? That can be an issue. There are circumstances where you can say you have an AXI protocol and 64 bits, and you can do the math to get from signal level to architectural level. But then you look at all the architectural differences that start to become nuances in that model, like whether you’re doing split transactions and how bus transactions are being pipelined. Is that being correctly modeled in the platform? There’s a lot of complication. Even to get relative accuracy you will need to model this.
Rowen: We’ve gone up halfway between this signal and toggle level and TLM. Processors are nicely defined. What we’ve done is to automatically derive instruction-execution-level energy models so we can, as part of the initial instruction set characterization, come up with a pretty good energy model per execution. It’s still data independent, but there’s a summary number. The simulator knows how to count things like memory references. Then the whole processor plus memory subsystem has very accurate relative and kind of accurate absolute energy at a level that runs at the full speed of a fast simulator, not at RTL speed. Therefore you can start to make that a building block within a transaction-level approach. That’s one of the pieces of raising energy in abstraction and getting past the toggle.
Neifert: You start doing toggles and you slow everything down. You may use the toggles as an instrument for calibration, and then you go back and put that in and say, when I do this I take this much power per cycle. Then you can start aggregating some of those numbers to at least get a relative figure.
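The instruction-level approach Rowen and Neifert describe, where per-event energy numbers are calibrated once and then multiplied by counts from a fast simulator, can be sketched as follows. This is not Tensilica's or Carbon's actual model. The event classes, picojoule values and counts are hypothetical, and the model is data independent, as in Rowen's description.

```python
# Hypothetical energy per event in picojoules, standing in for values obtained
# from a one-time calibration against a slower toggle-level power run.
ENERGY_PJ = {
    "alu": 10,
    "multiply": 40,
    "load_store": 25,
    "branch": 12,
    "mem_ref": 120,
}


def total_energy_pj(counts, energy_pj):
    """Multiply each event count by its calibrated energy and sum the results."""
    return sum(energy_pj[event] * n for event, n in counts.items())


def relative_energy(baseline, candidate, energy_pj):
    """Energy of candidate divided by baseline (below 1 means candidate is cheaper)."""
    return total_energy_pj(candidate, energy_pj) / total_energy_pj(baseline, energy_pj)


# Hypothetical event counts reported by a fast simulator for two code variants.
variant_a = {"alu": 1000, "multiply": 200, "load_store": 400, "branch": 100, "mem_ref": 400}
variant_b = {"alu": 1500, "multiply": 200, "load_store": 200, "branch": 100, "mem_ref": 200}

energy_a = total_energy_pj(variant_a, ENERGY_PJ)
energy_b = total_energy_pj(variant_b, ENERGY_PJ)
ratio = relative_energy(variant_a, variant_b, ENERGY_PJ)

print(f"variant A: {energy_a} pJ")
print(f"variant B: {energy_b} pJ")
print(f"B relative to A: {ratio:.2f}")
```

Because the counting happens at simulator speed, the comparison is cheap to repeat. The absolute numbers are only as good as the calibration, but the ratio between two variants is the kind of relative figure the panel says is useful early on.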

Experts At The Table: Making Software More Energy-Efficient

Friday, January 20th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How do you get around the fact that there isn’t enough information available to the software team?
Kaiser: You could profile.
Neifert: You can profile to a point. A high-level virtual platform is too abstracted. It doesn’t have the concept of cycles in there. You need a level of accuracy with sufficient instrumentation. Often it’s not just doing any one single task. It’s what happens when these tasks intersect. What happens when you’re watching a video on your phone, talking to another person and someone calls on another line? It’s power drain happening at the same time, and you don’t necessarily test that from a system perspective.

LPE: Where are we starting to see the most energy consumed by software? Is it at the embedded level or further up the stack?
Kaiser: It can be anywhere.
Rowen: I don’t know how you can really separate power dissipation of software from hardware. You have to look at different subsystems. What’s the baseband subsystem doing versus the imaging subsystem versus the audio subsystem versus the graphics subsystem? Each one of those is going to be some compound of hardware and software issues. You can look at the independent worst-case scenario for each of them. Your first job is to make sure you have good characterization of what’s going on in each subsystem. Then you get to these interesting interactions. If you’re playing back a video you know you’re not doing maximum download on your wireless connection. Or when you’re recording a video, you know that something else is not happening. People have been forced to move from simple subsystem-by-subsystem worst-case analysis to looking at the whole interaction. It’s largely because they can get to a smaller worst-case number than if they didn’t consider scenarios.
Kulkarni: We found that with worst-case scenarios it’s easier to manage the power and to do hardware-software co-simulation, but within the subsystem itself there are so many different modes of operation that co-simulation gets even more interesting. It’s one level of power or energy reduction if you shut off the subsystem, but below that level how do you optimize that subsystem? You’re running software applications, which have to be co-simulated with the hardware. Relative accuracy becomes critical, although not necessarily the absolute accuracy. So how do you generate the testbench? How do you create power patterns? Selecting the critical energy-consuming patterns becomes a challenge. It’s one thing to model or create instrumentation for your software application, but you need a meaningful set of vectors for power consumption. Most of the functional testbenches are useless from a power consumption point of view. Looking at finite-state machines gets to be more and more critical. From the software application, how do you translate that into finite state machines that are control registers, which will then be translated into RTL? And then software is managing all of that. With one mobile phone application we worked with three different vendors for IP models, SystemC, an OSCI simulator and then a five-minute talk time. It would have taken about three months if the customer had not created higher-level models and energy-consuming signals out of that whole environment running together.

LPE: So the hardware guys are worried about power, but the software guys aren’t even thinking about it. How do we change that? Is putting up an ammeter enough?
Neifert: The ammeter is certainly a start. It’s a lot better than what they have today on the software side. We always have talked about concurrent engineering, and more and more processes get applied to that. This is just the next application. The first key is to provide a mechanism, then leverage it across everything and make sure that mechanism is as accurate as possible. But even a relative number is essential. Does this setting take 20% more? Give engineers good tools and they’ll figure out how to apply them.
Rowen: It’s the same as with a video game. If you give someone real-time feedback on the effect of what they’re doing, and they know what the green zone is doing versus the red zone, they’re amazingly effective at getting the needle down into the green zone and keeping it there.
Hardee: It’s not just optimization, especially at the higher levels of the stack. It’s more a case of, ‘The power consumption in this model is worse than the previous model. Something is wrong with the software. Fix it.’ And then the software guy goes off and finds the routine that is polling the modem way more often than is necessary and preventing it from going into sleep mode. It’s those gross errors that are being found when something goes wrong. Are we using the right capability, the right parallelism, the right pipelining or the right architectural facet of the platform to run the right piece of software? Those decisions are usually way down the stack. That’s something the operating system and the drivers have to understand for the various system calls that are going on. When you get into those lower levels of software, you need an accurate model of the platform that can start to tell you the energy usage you’ll get with those various selections to optimize further down the stack. With the application, it’s as simple as looking for what’s keeping something on when it should be off. For true optimization, you’re looking lower down the stack. You could hit the same problem with FPGA prototypes. You can run a decent portion of real-time and you can run some vectors, but what’s your characterization? You’re mapped to a prototype that doesn’t bear any relationship to real silicon. You need activity plus the characterization with enough compute power to run deep, real system modes.
Kulkarni: You need to turn this whole problem on its head. Why do you have to run Facebook versus YouTube versus GPS software on the same processor design? Why not create a Facebook processor rather than running it on a general-purpose processor? People are writing software applications ranging from medical imaging to health care to whatever else you need, and then tuning the hardware to that. And there will be multicore hardware implementations where it makes sense.
Rowen: That’s absolutely the case. One of the fundamental dynamics to emerge is that as power has become so much more important, people have begun to look at power as the ultimate goal and figuring out how everything else serves that goal. If that means you’re going to build a processor around an application, rather than the other way around, you’ll do it if it saves meaningful amounts of power. There are two key elements. We’ve talked about, if you can measure power that can help you make decisions about one processor versus another. The other angle changes the nature of the processor itself. You want processors where what software you run matters to power. That isn’t an obvious characteristic. A lot of people say that as long as every instruction dissipates power then that’s all you have to worry about. All you have to do is go find the one that consumes the most power and you beat down that one as much as possible. But you’re going to spend very little of your time running the worst-case instruction. You’re going to be running a mix of things. And even within your worst-case task, you’re not going to run your worst-case instruction all the time. You need internal mechanisms for clock gating, power gating logic reduction, so the difference between the lowest-power instruction compared with the highest-power instruction is no more than a factor of 10. If you’re running a lightweight mix, that will use an order of magnitude less power than something that does 128 multiplies in a single cycle. By having this big dynamic range you reduce average power and you make software matter. The programmer has implicit or explicit control over what instructions to use, so they can determine how much power to dissipate. You really need to provide people with energy feedback.
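Rowen's point about instruction mix can be sketched numerically. The snippet below is a minimal illustration, not anything from the discussion itself: the instruction classes and their relative energy costs are hypothetical values chosen only to give a roughly 10x dynamic range, and `average_energy` is an illustrative helper name.

```python
# Hypothetical relative energy per instruction class (arbitrary units).
# These numbers are assumptions for illustration, not measured data.
ENERGY_PER_OP = {"alu": 1.0, "load_store": 2.0, "wide_mac": 10.0}


def average_energy(mix):
    """Return the weighted average energy per instruction.

    `mix` maps an instruction class to the fraction of executed
    instructions in that class; the fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(ENERGY_PER_OP[op] * frac for op, frac in mix.items())


# A lightweight control-style mix versus a multiply-heavy mix.
light = {"alu": 0.8, "load_store": 0.2, "wide_mac": 0.0}
heavy = {"alu": 0.1, "load_store": 0.2, "wide_mac": 0.7}

if __name__ == "__main__":
    print("light mix:", average_energy(light))
    print("heavy mix:", average_energy(heavy))
```

Under these assumed costs the light mix averages about 1.2 units per instruction and the heavy mix about 7.5, which is the sense in which the choice of software changes power when the processor has a wide dynamic range.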
Hardee: Having those processor architectures that match the task is highly critical. But you only get a handful of programmers able to use that unless you have the compiler technology to match. You have to be able to automate and not leave it to the individual programmer to choose which instructions to use. The compiler has to be able to compile for performance versus power, just as you are with synthesis constraints in hardware, and it’s going to need to help me through automation to do the right thing.
Kaiser: Yes, we do need feedback. That needs to be there real time, if possible, and it should be better than an ammeter. You need to be able to graph it and correlate it to what’s running in the system, so when software engineers see a spike they need to know. But there’s another issue. Hardware provides a lot of knobs. The guy writing the algorithm is going to use them as little as possible. He will use those settings unless you tell him what those knobs do and why he needs to move them. Software engineers have no reason to change them. If the 128-matrix multiplication works, then they’re done. It’s functional. Power has been an afterthought for years and years.

Rethinking Good Enough

Thursday, January 12th, 2012

By Ed Sperling
Power has been elevated from an afterthought to one of the top considerations and tradeoffs in SoC design, edging out performance and area in many cases and in some cases even cost and features.

Tradeoffs in design always change, depending upon what the most pressing concern is among consumers at any time. For decades, performance was always at the top of everyone’s list, followed closely by cost. The MIPS and GHz wars made for great competitive marketing. But as devices become more mobile, and as even the largest enterprises focus on energy costs, the reigning king is power. How long does a battery last between charges on a smart phone or a laptop given a normal use case? How many kilowatt-hours does it take to run a server?

This isn’t always a clean tradeoff, however. For one thing, some design features require more power, forcing changes in other parts of a design. And in other cases, the lack of any single use model makes it almost impossible to guess how a device will be used. One consumer may rely on voice calls, while another focuses on text and still another may play games and stream video.

What stays, what goes
Decisions about what to keep aren’t always simple. Consider an LED TV design, for example. Flattening the screen requires audio enhancement because it’s impossible to get good enough sound out of a TV without playing tricks with the sound. That typically means more post-processing, more codecs, and more energy consumed.

“There are lots of things that can be done to enhance the audio experience,” said Larry Przywara, senior director of multimedia marketing at Tensilica. “The TV designers are space constrained. That requires various volume boosts, equalization and sound widening techniques just to do what they used to do. That’s doable, though, because as algorithms have gotten more complex the SoCs have gotten more powerful.”

Overall, they also use less energy to drive the SoC and the complete system. But sometimes that requires increasing power budgets in one place and decreasing them in another. “The issues in the mobile space are now finding their way into home entertainment,” said Przywara. “With post-processing you need slight modifications in other places to keep a limited power budget.”

In televisions, that energy can come from a variety of places. For example, the current design on some TVs relies on brighter pixels in the middle, where most people focus their eyes, and dimmer pixels on the corners where viewers don’t look.

Making tradeoffs
For both video and audio, the real change is a combination of improved technology and what consumers are willing to live with. Fifteen years ago most audiophiles wouldn’t touch a CD, and even several years ago the focus on quality in DVDs was considered the competitive edge. More people have migrated to the center of the spectrum as CD quality improved and streaming offers vast convenience even if it isn’t high-definition.

“Audio, from a technology standpoint, is not a big deal,” said Cary Chin, director of technical marketing for low-power solutions at Synopsys. “The real focus is on video, and today the real question is how you trade off storage with communications. Do you spend more time and energy to compress it or store it? And do you store it locally or in the cloud? As we focus more on portable devices, power and cost are the main factors.”

The other question is just how much power efficiency is enough. A smart phone uses basically the same technology as a tablet, yet the tablet gets significantly longer battery life between charges, while the smart phone needs to be charged every day.

“Tradeoffs are a great way to define areas where the technology is evolving,” Chin said. “In digital video you can improve the resolution, but most of the computation and power is spent in compression and decompression. Even with printers, you can print with finer technology but it’s usually more important to lower the cost. Low power is one of the areas that will become critical to all of these decisions over the next 5 to 10 years.”

But even technologies that can command a premium, such as products from Apple, high-end graphics from Nvidia, and laptops from Lenovo, haven’t skimped when it comes to saving power.

“The tradeoff is how much energy you use at any time and how much energy you need to accomplish a task,” said Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics. “In general there will be more dark silicon and more functionality on a chip, but it won’t all be running at the same time.”

One of the more interesting tradeoffs has to do with which processors are used for what functions. Nvidia’s Tegra 3 processor, for example, has four main CPU cores and a fifth, lower-power and lower-performance companion core for less data-intensive tasks.

Features or function
Perhaps the hardest thing to determine is whether to cut features, cut performance, or live with more power consumption when it’s needed. Will Ruby, senior director of RTL power product engineering at Apache Design, said what’s changed is that power is a fundamental requirement, along with features and functions. Engineers have to meet the power spec, even if that means some tradeoffs.

“There are two aspects to a tradeoff,” he said. “One is at the spec level. What features can you add for what performance and power. As more and more people learn how to do low-power design, they will meet or beat specs. Of course that usually means even more aggressive specs in the next design. The second is a spe-level tradeoff. How much time does it take to switch to a different application, for example? If it’s one-tenth of a second that will be a big difference from two-tenths of a second.”

Some tradeoffs also occur on the process side. Do you use older low-power process technology, or do you use the fastest general-purpose process technology and turn a block off as quickly as possible? Or do you dope the channel or swap to fully depleted silicon on insulator substrates?

Conclusion
None of these tradeoffs are fixed. They can be tweaked and tweaked again, because what may be good enough for one market, one group of users or at any point in time may be different somewhere else six months later.

What is significant, though, is just how integral a part power has become in all of these decisions. “The real key is how you can exploit all the possibilities of what you can get with relatively low power,” said Pete Hardee, marketing director at Cadence. “If you’re trying to freeze frame a golf swing in video, you may want to go completely the other way—all the way up to 60 frames per second. If power is the issue, you may want a slower frame rate. And it’s not just about battery life. Reliability is a big headache for customers. The ability of low-power techniques to control the performance profile can increase reliability, too.”

Status Report: Power-Aware Design Flow

Thursday, January 12th, 2012

By Ann Steffora Mutschler
While the term “design flow” can be a moving target, there are some specific requirements for a low-power/power-aware tool flow. Looking at this from a high level, where is the industry today, and where is it headed?

There are really two sides to power, which are almost like two sides of the same coin: power consumption and power integrity. And both of those are global, spanning the system and the package and the increasing convergence of both.

“One thing required in this day and age of ever-shrinking product lifecycles is some degree of predictability,” said William Ruby, senior director of RTL power product engineering at Apache Design. “You want to be able to predict early on, when you’re not even halfway finished with the design, what is your power consumption going to be with a reasonable degree of accuracy? What does the thermal picture look like, even spilling over into power integrity? If I can estimate my power, I should be able to also predict some of the power-induced noise considerations, as well. Looking at the power-aware flow from that perspective, early power analysis for the consumption side as well as the power integrity side is really one of the keys here.”

But what about the tools? The back-of-the-napkin or spreadsheet-type calculations worked to a certain extent when things were not very complicated. There needs to be more precision built in. Apache’s answer to this is the RTL power model (RPM) to get better accuracy and more predictability early on. Ruby explained the RTL description allows for a good power number early on, looking at various operating modes. It takes that data into the power integrity side for early chip power integrity analysis. The predictability comes also with the ability to use RPM throughout the design flow to maintain consistency.

Mary Ann White, director of Galaxy power marketing at Synopsys, said various tools exist today that can deal with many aspects of the complete low-power flow. The problem is that systems engineers don’t tend to think about tools in this way. “Just within the implementation flow, there’s verification and implementation, and we find that those engineers don’t exactly talk and work together as easily, so can you imagine what the challenge would be if it went all the way from system-level to somebody that has to deal with manufacturing and then packaging? Even though we tend to provide solutions in those spaces, we find that customers are still very specialized in their very specific areas.”

What engineers want
Krishna Balachandran, director of low-power verification marketing at Synopsys, said to understand what engineering teams really need it helps to segment customers into different buckets. “There are customers that are very advanced in their needs and there are some other customers who have some low-power needs but they kind of know what they are doing—they’ve been doing low-power for longer than the power formats have existed so they’ve evolved with what has happened in terms of power formats and they’ve started using that. Then there are some new customers that are being forced to think about power not because their devices are by themselves low-power, but by virtue of the fact that they are using smaller geometries to reduce the cost and to take advantage of the wafer pricing which can drop. Those customers think that if they drop down to the lower geometries they’ll have to use some power techniques now in their design, because if they don’t then the leakage power becomes unacceptable. So for these reasons some of these customers are coming into the flow and their requirements are very modest. They are almost able to address in an ad hoc way what they have to deal with, rather than by design aiming for lower power chips. There is a whole range of sophistication when it comes to low-power designs and flows. I see that their needs are very different.”

Barry Pangrle, solutions architect for low-power design at Mentor Graphics, said in the future there will be more emphasis on front-end tools. “That will include architectural-level, system-level type stuff, especially hardware/software tools that will allow designers and even software developers to be able to get a better understanding of how the code they are writing impacts the overall power of the products they are developing. You can have really great hardware and if the software doesn’t take advantage of all the capabilities of the hardware, you throw all that effort away.”

Power formats, mixed-signal designs
In the middle part of the flow, one positive step forward last year was that all major EDA vendors came together to pledge their support for the IEEE 1801 power format standard, which should help with tying everything together. Beyond power format support, the underlying methodology is also critical. Qi Wang, technical marketing group director of solutions marketing at Cadence, said a converged methodology is still needed: a single power-intent description that can be used in every stage of the design flow to provide consistency.

Overall, he said, it looks as if we have all the pieces of the power-aware design flow, but there’s still a long way to go to address the multi-vendor flow. “Right now we have two formats. Even if we have one format there will still be challenges, but that will play out over the years because at least the whole market on the customer side will be adopting the same power format approach. Right now some of them use CPF, some of them use UPF. The methodology shift is happening. That train has left the station; that will not be changed. It just takes time for the vendors to work out this multi-vendor flow.”

However, he pointed out, there still are technical areas that need more investment. “One big important thing is in the area of mixed-signal design. If you look at all the hard products right now, it’s all about mixed-signal and low power: you have a mobile application, you want to access everywhere, you have wireless, you have Wi-Fi here and there. It’s all about mobile and battery-powered. This means low power and mixed signal. Customers have combined these together. The technologies need to be combined, as well.”

Another key area is verification. Erich Marschner, product marketing manager for functional verification at Mentor Graphics, said, “The verification aspects of low power are largely related to methodology because the capabilities in the tools have been developed over the last four or five years to model the effects of low power, power management and active power management. Users are still behind the curve in terms of trying to understand what to do with those capabilities. Most of the low power simulations that are done today are still done in the context of UPF 1.0 – the previous version of the standard.”

In this regard, many users still have a way to go to take full advantage of the technology available today.

The Next Big Challenge

Thursday, January 12th, 2012

By Ed Sperling
Software is the next big target in the quest to make electronics more energy efficient, but it’s proving a far bigger challenge than most systems architects originally believed it would be.

There are several very large problems to deal with in software. Writing efficient code for small processors isn’t one of them. In fact, the proliferation of small processors across an SoC makes it easier to deal with at least a portion of the software. Code can run directly on the bare metal, some of it can be nothing more than an executable file, and still other code can run on a real-time operating system written for a specific purpose or even on slimmed down versions of operating system code.

But bringing all of this code under the control of an SoC is another matter, despite the fact that this is the best way to manage power and minimize physical effects in a chip. Solving this problem requires integration and coherency across a chip, which in turn requires software architects and system architects to work together up front. This may be a goal among companies, but it certainly isn’t a reality.

“You need coherence to develop a high-end software design,” said Dan Driscoll, Nucleus software architect for Mentor Graphics’ Embedded Software Division. “At this point integration is a large portion of the effort, and the problem has yet to be solved. One thing that helps is a single development environment. If you use multiple profiling tools it’s more difficult to pull that together into a system.”

Devils in the details
Just understanding the interactions between various hardware portions of an SoC has far exceeded human limits in complex SoCs, even at mainstream process nodes. Most companies use a block or subsystem approach to deal with this complexity, working on smaller pieces and then assembling them into the whole and hoping it works as a single system.

Software increases the complexity by orders of magnitude, because an increasing amount of software now controls functionality across the chip. It determines what remains on, what gets turned off, in what sequence, at what speed, and what gets priority. It also determines how much power and memory can be allocated to a given function or logic subsystem—at least in 2D designs. (In stacked die, it may be possible to dedicate portions of memory to logic blocks to minimize this issue).

“This is the job of the controller software for the overall system,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “You tell it to execute this API or put data over here. This is a high-level sequence, and it can do connectivity between different cores of a processor. You also can add up the energy transactions and memory transactions that will trigger.”

Multi-core, many-core, and multiple processors
A second big problem stems from the types of processors being used. The ability to write software applications that can take advantage of multiple cores is an old and well-understood issue—about four decades old, in fact. And while it’s easy for processor makers to add more cores onto a piece of silicon and hand it off to applications developers to deal with, the reality is that most applications cannot be parsed to take advantage of more than eight cores, and in many cases the number is likely to be fewer than four.

Databases, scientific calculations and graphics rendering, where there is extreme redundancy, are the exceptions. Even some games can have functionality parsed across cores. For most other applications, though, the limit is probably two to four cores. And if these cores are running popular general-purpose operating systems such as Windows, Mac OS X or Linux, chances are pretty good that it’s not the most efficient implementation of a function even though it may be the most convenient.
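The diminishing return from adding cores can be illustrated with Amdahl's law, a standard model that the article does not name and that is brought in here only as an illustration. The sketch below assumes a hypothetical application in which 75% of the work can run in parallel; `amdahl_speedup` is an illustrative helper, not part of any tool mentioned above.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of identical cores (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 75% parallelizable. The value is hypothetical.
    for n in (1, 2, 4, 8, 16):
        print(n, "cores ->", round(amdahl_speedup(0.75, n), 2), "x")
```

With that assumed 75% figure the speedup can never exceed 4x however many cores are added, and most of the gain is already captured by four to eight cores. This simple model ignores communication and synchronization overhead, which would make the picture worse.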

RTOSes have been used by the military for decades as a much more energy-efficient alternative, although most of that work was far less concerned about the energy than about security and performance. Their shift into commercial applications such as mobile phones makes them especially suitable for managing specific functions on separate processor cores in an SoC. It doesn’t make sense, for example, to utilize a multicore general-purpose processor for audio enhancements, and if it isn’t running on a general-purpose processor then it probably doesn’t need a general-purpose OS, either. But those functions still have to work with other parts of the chip without affecting signal integrity or creating hardware proximity effects such as heat, ESD and electromigration.

“The idea of SMP (symmetric multiprocessing) beyond 8 to 16 cores is not realistic for most applications,” said Mentor’s Driscoll. “We’re almost stuck with AMP (asymmetric multiprocessing) as part of large multicore implementations. But we’re seeing cases where you may have a TI OMAP 5, running a dual-core ARM Cortex-A9, an A4 and a DSP. You may have six or seven cores, and a general-purpose operating system going through this part of the system. That operating system may control other DSP interfaces, including RTOSes.”

Verification and testing brain freezes
This approach leads to another problem, though. How do engineering teams verify and test this complex SoC, which now may include multiple types of processors and processor cores, various types of software, and a central software management scheme that probably involves a standard operating system? There may even be middleware making some of the connections, and in homogeneous environments possibly even a virtualization layer that may include hypervisors that can run on bare metal.

“The first thing you have to deal with is a traffic debug issue,” said Cadence’s Schirrmeister. “In many cases, the partitioning may happen by hand. But how you pull this all together may affect your debug strategy. Tensilica presented an extreme example involving a printer design, where they had a block diagram of the functionality and the cores. The printer company used Tensilica cores, which allowed them to replace the functions done in RTL with programmable functions. The connections worked, the memories worked, and the functionality was done in software as bare-metal, low-level software.”

There’s a tradeoff in doing that, however. Driscoll said that pushing functionality down to lower-end processors makes integration more complex. In addition, measuring power consumption becomes more difficult because it means adding up energy transactions that the memory transactions will trigger.

“That means you need data to verify what works at the block level, the subsystem and in the overall system,” Schirrmeister said. “And some chips have processors you can’t access from outside for security reasons. You need flexibility in the software because of security, but you are not allowed to see it from the outside.”

Conclusion
While there has been much attention devoted to finding a common language between hardware and software engineers, the real path forward may be more focused on matching goals at the architectural stage, and then being able to swap information as a design progresses.

Virtual platforms that allow software to be developed earlier in the process help. So do some of the features that are being built into RTOSes these days. In addition, stacked die will help eliminate some issues, while creating new ones. But the real challenges will continue to be integration of hardware and software, and of various types of software with other software—with an eye toward remaining within a power budget and understanding how code affects energy consumed over time.

When Worlds Collide: Saving Power In Communications Applications

Thursday, January 12th, 2012

By Ann Steffora Mutschler
The interplay of hardware and software is a given in every device that contains a semiconductor chip, but is typically felt more acutely in communications applications given the extremely close dependencies for everything power-related. Managing power in these situations just gets more challenging as consumers demand more and better applications on their tablets, smartphones and other mobile devices.

Power is always one of those things that needs to be addressed at many levels simultaneously because there are both raw technology factors (the semiconductor technology) and system issues (what algorithms you run and what collaboration takes place between handset and base station), stressed Chris Rowen, CTO of Tensilica. “All of this can have significant effects on what the total energy consumption is of the system. As you go up in the level of abstraction you move away from the individual transistors and talk about what the system behavior is, so you can get larger and larger relative savings.”

This is because if the system can be organized such that no communication is required, or one bit of information tells the whole story, then gigabytes of information may not have to be moved across the network, he said. The problem is that the engineering team must know exactly what single bit tells the story. “There are many things that people do in finding ways to store data instead of communicate data, to encode data more cleverly, to make data communication more resilient so that they can avoid doing work, avoid doing communication processing and therefore save lots of power because the radio never has to go on, or has to go on very infrequently and the encoding can be greatly simplified.”

“At the other end of the spectrum,” Rowen continued, “there are many things that you can do at the level of the circuit design, the logic design, the processing architecture, which can significantly reduce the power as well, even once you accept that certain standards and certain communications protocols can be used and the intelligent chip architect or system architect is aware of when they have to live within the ground rules of the standard that they are implementing.”

Techniques for saving power
From a software perspective, power-saving techniques are being driven by emerging new architectures such as ARM’s big.LITTLE, in which a companion CPU is able to take over the system when performance requirements are low and energy efficiency matters most, while faster CPUs handle the high-performance requirements, said Achim Nohl, technical marketing manager for Synopsys’ solutions group. Within this approach is the new concept of switching, at a very coarse-grained level, from a high-performance, high-energy profile CPU to a low-performance, low-energy footprint CPU.

“At the same time,” he said, “there is an orthogonal technique for power saving—dynamic voltage and frequency scaling (DVFS) where in parallel to big.LITTLE you are able to scale down the frequency of a cluster or of a single CPU. That can only be done by predicting what the performance requirements are for the specific workload to perform a just-in-time completion of a specific task. I need to know how much processing power will be required in order to satisfy this task so that it can be computed just-in-time.”

There’s a lot of impact on the software and on the whole workload prediction. Schedulers must become power-aware. There is also a contrasting scheme called “race to idle,” where rather than scaling voltage and frequency you run as fast as possible and then remain in idle mode as long as possible. But these solutions are hard to evaluate against each other because they are highly scenario-dependent. Scenario means the software and the whole user scenario, Nohl said.
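Why the two schemes are scenario-dependent can be seen in a toy energy model. The sketch below is an assumption-laden illustration, not a description of any vendor's scheduler: it assumes dynamic power scales with the cube of frequency (voltage scaled linearly with frequency), a constant leakage power while the CPU is active, and a small fixed idle power. All parameter names and values are hypothetical.

```python
def task_energy(cycles, deadline_s, freq_hz, f_max_hz,
                p_dyn_max_w, p_leak_w, p_idle_w):
    """Energy in joules to run `cycles` at `freq_hz`, then idle until the deadline.

    Simplifying assumptions: dynamic power is p_dyn_max_w * (f / f_max)**3,
    leakage p_leak_w is paid only while active, and p_idle_w is paid while idle.
    """
    if freq_hz <= 0 or freq_hz > f_max_hz:
        raise ValueError("freq_hz must be in (0, f_max_hz]")
    busy_s = cycles / freq_hz
    if busy_s > deadline_s + 1e-12:
        raise ValueError("task would miss its deadline at this frequency")
    scale = freq_hz / f_max_hz
    p_active_w = p_dyn_max_w * scale ** 3 + p_leak_w
    return p_active_w * busy_s + p_idle_w * (deadline_s - busy_s)


def compare(p_leak_w):
    """Return (dvfs_energy, race_to_idle_energy) for one hypothetical workload."""
    cycles, deadline_s, f_max_hz = 5e8, 1.0, 1e9
    common = dict(cycles=cycles, deadline_s=deadline_s, f_max_hz=f_max_hz,
                  p_dyn_max_w=1.0, p_leak_w=p_leak_w, p_idle_w=0.01)
    # DVFS: pick the lowest frequency that finishes just in time.
    dvfs = task_energy(freq_hz=cycles / deadline_s, **common)
    # Race to idle: run flat out, then sit in the idle state.
    race = task_energy(freq_hz=f_max_hz, **common)
    return dvfs, race


if __name__ == "__main__":
    print("low leakage  (0.1 W):", compare(0.1))
    print("high leakage (1.0 W):", compare(1.0))
```

With low leakage the just-in-time DVFS run comes out at 0.225 J against 0.555 J for race to idle, while with high leakage the order flips (1.125 J against 1.005 J), because staying active longer costs more than the dynamic power saved. Real silicon has voltage floors, discrete operating points and wake-up costs that this model leaves out.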

Rowen pointed out that a hardware technique for power savings that is gaining steam is the more careful adaptation of the processor engines to fit the tasks, because there are very distinctive things that you do in some parts of the receiver such as FFTs, while in other parts there is a lot of filtering, and in other parts there is forward error correction, which is a successive approximation method for determining what the signal was.

For those at the sharp end of silicon platforms for mobile devices, Pete Hardee, director of solutions marketing at Cadence, observed that semiconductor and systems companies are seeking all the power saving techniques they can get. “This is where people have been using the regular techniques like power shut-off. We’re going to see, as well as power shut-off, a lot more use of DVFS – that’s certainly going to be seen a lot more as people struggle with power.”

One interesting technique as designs go from node to node is substrate biasing, which has been used effectively at earlier nodes like 90nm. “Once you get under 90nm there is a lot of debate as to whether or not substrate biasing is an effective technique. It is applying a negative bias to the body of the silicon, which reduces leakage especially when you’re at near-threshold voltages. We see substrate biasing being used even at very deep submicron, especially in relation to standby modes of memories that reduce leakage. [But] there is a lot of debate on the effectiveness of substrate biasing beyond standby mode of memories and the reason is there’s a routing issue with all of these bias signals. Per transistor, you’re effectively supplying a bias supply to each transistor of the chip so that gets very expensive from the power routing point of view and you start to hit routing congestion and so on. We see people using substrate biasing at the 40nm node and then it gets a lot fewer at 28nm and people are starting to wonder if it’s going to be effective at 20nm (22nm for Intel). One thing that we’re figuring out is that finFET probably obviates the need for substrate biasing. You’ve got a lot better control of leakage due to the topology of the gates through the 3D construction of the finFET transistors. When finFET becomes the norm, we think we’ll see the end of substrate biasing as a technique.”

Intersection of low power and test
While test is not so much an area where much can be done at this point to help save power, it is nonetheless an important part of the design process with unique issues, noted Greg Aldrich, director of marketing for the Silicon Test Systems group at Mentor Graphics. “The two aspects that we have to deal with for low power are first, when we are inserting structures into the design, are we consistent with all of the power intent or are we respecting all of the power intent, the power islands and what is required for the low power design when we are inserting logic? Are we making sure that, for example, when we are inserting compression logic into a design that may have three or four different power islands or power partitions that we’re not crossing those power partitions with the test logic that we insert or that we’re properly isolating those power partitions? If there’s a constraint in the low-power design where I can’t power up all of the three partitions at the same time, I need to be able to test it in that manner, as well.”

The second and, he said, maybe more concerning issue is that when testing a low-power design the tests cannot overstress the power design. “If it’s a low-power design, typically that means that there is very little switching activity, a lot of the design is turned off and the power rails, the power system is designed that way. When you do testing typically you want to get as much activity as quickly as possible so that you can test the device as quickly as possible. You’re testing every single device you manufacture, so every second you spend testing that device costs you more money in the manufacturing cycle.”

In test the objective has always been to get as much activity in the circuit as possible in order to test it as fast as possible, but for low-power designs that approach could damage the device.
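
The tension described above, between toggling as much logic as possible per pattern and staying within what the power rails were designed for, can be illustrated with a rough sketch. The function names, the bit-string pattern format and the 30% budget are all hypothetical, and real test tools use far more detailed power models than a simple toggle count.

```python
def toggle_rate(prev: str, curr: str) -> float:
    """Fraction of scan bits that change between two consecutive patterns."""
    if len(prev) != len(curr):
        raise ValueError("patterns must have the same length")
    flips = sum(a != b for a, b in zip(prev, curr))
    return flips / len(curr)


def patterns_over_budget(patterns, budget):
    """Indices of patterns whose toggle rate versus the previous pattern exceeds the budget."""
    return [
        i
        for i in range(1, len(patterns))
        if toggle_rate(patterns[i - 1], patterns[i]) > budget
    ]


# Hypothetical 8-bit scan patterns; the budget is an assumed limit, not a real spec.
patterns = ["00000000", "00000011", "11111100", "11111101"]
flagged = patterns_over_budget(patterns, budget=0.30)
print("patterns exceeding the toggle budget:", flagged)
```

A flagged pattern would be a candidate for reordering or for splitting across test sessions, at the cost of longer test time.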

At the end of the day, the biggest problem in looking to save power in communications applications is that hardware and software cannot be treated separately, according to Marc Serughetti, director of product marketing for virtual prototyping at Synopsys. “It’s not about hardware, it’s not about software. It’s about the two together, and when it comes to software it’s not about the low-level software either, it’s about the entire software stack because a simple application can create a significant problem when it comes to power consumption. Now you are talking two different worlds once again colliding, and if you approach this purely from a hardware perspective you are going to end up in a situation that may sound interesting for the hardware people, but when it comes to the software world where you need to be able to run Android or Windows Mobile, the performance of the environment you need to use is a significant component that must be analyzed.”

Experts At The Table: Making Software More Energy-Efficient

Thursday, January 12th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: Software is causing as many problems in low-power designs as anything else. How do we fix that?
Neifert: Software is creating problems everywhere because software is driving everything. Customers are using software for more and more stuff, including software-driven verification and architectural analysis. It’s only natural to take traditional back-end tasks such as power and move those forward to enable that analysis to be done earlier in the process and see how the software will impact the system. As software increasingly dominates power usage, because the processor and IP guys are getting smarter and smarter about turning things off when they’re not being used, you can’t just blindly use some verification vectors. You need to really see how it’s being used by the software. That requires virtual prototypes in conjunction with power tools.
Hardee: The software is controlling everything, but in recent years we’ve seen an explosion of applications doing all kinds of things you want to do on mobile devices. One thing we see people struggling with is the best way of optimizing for power for different applications, which have different needs. If you’re watching video, you have a frame rate to deal with. Rather than shut down, you’ll probably want to optimize framework lengths or run as slow as possible and make sure memory transactions are being accessed from cache. That’s completely different when you’re on the Web, where you want to run as fast as possible to get a good image and shut down. What you’re actually doing with a device will require very different power strategies.

LPE: So how do you analyze that?
Hardee: You have to run a lot of cycles in the various system modes to be able to model it. That’s where it really gets very disconnected from the test vectors in logic simulation. You can’t use those anymore. You have to have a set of vectors that is representative of the various application modes you have on the device. That’s a big change for a lot of people.
Kaiser: If you’re going to have software guys doing power optimization, you need to get them some kind of metric to measure or estimate the power. If you go to a software engineer today and ask them to optimize power, how do they know if they’re doing better or worse? Giving them a power meter helps. Giving them a power meter that self calibrates helps a lot more because the software guy really doesn’t know what calibration is for A-to-Ds. He sees negative current and figures he must be doing something wrong because he’s charging the battery. First give these engineers something to measure. Second, when you’re doing a product you have to identify all these different use cases. Internet surfing is one. Video playback is another. Software teams need to optimize individual use cases. But to do that, you have to figure out how much you’re going to be doing of what. A cell phone is sitting in idle 99% of the time, but 99% of the energy is not used on standby. So you have to look at the different use cases and figure out which use case you really want to optimize.
Kulkarni: What we’re really concerned with is energy consumption over time. Instantaneous power, dynamic power and static power are well known, but energy consumption over time is where software enters the picture and turns it into system-driven power consumption analysis. If you look at power times time, it’s the clock frequency and the duration of the cycle. That’s where most of the applications are causing all these headaches. Why does a GPS application versus YouTube versus music have so many energy profiles? We can control instantaneous power and dynamic power quite well at RTL and below. Static power can be controlled quite well. But testbenches that look at the overall functional verification are not relevant anymore because you need to look at the states, too. And depending on which application is used, the energy will be different. We need to instrument that with respect to the power states. How you create the right test vector set or testbenches becomes a real challenge. Then, looking at average power and average voltage versus average current over time. That’s where the issue of co-simulation comes in. Running the complete simulation on a virtual platform becomes more interesting. At the moment, instrumenting the software is not possible.
Rowen: This is a serious problem. One problem is that the software guy has a simplified view of all the clever things the hardware guy did to make it possible to reduce the power. The hardware guys are obsessed with power these days. Not all of what they come up with is good or practical, but they are at least thinking about it. They have power modes, techniques for certain operations or instructions to meet power requirements. That may be pretty far removed from what the software guy, who’s on the front lines of getting a product out the door, has visibility into. It’s made worse by the fact that most of the tools people have work well in a simulation environment, but often what happens at the end of the day is you have a programmer, a prototype board and an ammeter. It’s a very crude picture of what’s going on. The poor software guy is trying to figure out what he can do overnight that will move the ammeter.
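
Kaiser's point about weighing use cases and Kulkarni's point about energy over time can be put into a small worked sketch, where energy is simply average power multiplied by time spent in each mode. The use-case names, time fractions and power figures below are invented for illustration and do not come from any measured device.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    time_fraction: float  # share of the day spent in this mode (hypothetical)
    avg_power_mw: float   # average power in this mode, in milliwatts (hypothetical)


def energy_shares(use_cases, hours=24.0):
    """Return {name: (energy_mWh, share_of_total)} using energy = power * time."""
    energy = {u.name: u.avg_power_mw * u.time_fraction * hours for u in use_cases}
    total = sum(energy.values())
    return {name: (e, e / total) for name, e in energy.items()}


# Made-up daily profile: mostly standby, with short bursts of heavy use.
profile = [
    UseCase("standby", 0.90, 10.0),
    UseCase("web", 0.06, 800.0),
    UseCase("video", 0.04, 1200.0),
]

shares = energy_shares(profile)
for name, (mwh, share) in sorted(shares.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:8s} {mwh:8.1f} mWh  {share:6.1%}")
```

With these made-up numbers, standby takes most of the time but only a small share of the energy, which is why the time split alone says little about which use case to optimize first.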

LPE: We have no standard gauge for software. It varies by application, middleware and operating system as well as by the usage from one person to the next. How do we deal with this?
Hardee: If you look at the need for system vectors that reflect the application you’re running, that becomes a problem when you’re trying to provide a power meter for the software engineers. That works well if you can run actual software against the target device, which is the only chance where you can run it fast and get accurate readings. You need accurate vectors and you need accurate characterization to get any sense of power or energy. The difference between power and energy is you need to know over time what the system is doing and model that correctly. Virtual platforms have some potential to help, but they’re problematic. If you’ve got the kind of virtual platform that runs fast enough to make a software engineer happy, you’ll be modeling that at an untimed level. At the untimed level the virtual platform is instruction accurate, so it’s getting its timing and instruction cycles from the processor. If you think about what’s going on with software and the choices to process things, can you do it from cache or do you have to go out and fetch it from some other level of memory. Those have a huge impact on the number of clock cycles, rather than instruction cycles, that it takes to perform those tasks. So the point is that you need a lot of timing accuracy before you can get any kind of energy accuracy. That’s difficult to build into a virtual platform.
Kaiser: You don’t need the actual numbers. You just need to know if it’s getting better or worse. You can give the software team a relative number. Second, you can start doing estimations. With an MP3 you want to know what your cache-miss ratio is.
Hardee: You need to start to measure the things that you know drive high energy usage, as opposed to measuring the energy usage itself. When you have 400% or 500% difference between cache and memory, it’s hard to put different algorithms in the right order. You don’t even have the relative accuracy you’re looking for.
Kaiser: So are you looking at platform-to-platform comparisons? I’m thinking you take the platform and get the software guy to make it as best as it can be.
Hardee: You’re coming at it from the standpoint of post-hardware. How does the software guy optimize his programs?
Kaiser: Or even if you don’t have the physical hardware yet.
Hardee: I’m approaching it from the view of the system architect designing a new system. How do they know they’re going to meet the power spec? If you’re rendering graphics versus video, you have to be running the right algorithm on the right core. There are multiple choices, and you have to figure this out even before you measure things relatively, let alone absolutely.
Kaiser: That’s a system architect challenge, not a software challenge. The most the system guy can do is identify the best possible scenario. The software guy may or may not come close. Sometimes it may not even be possible.
Hardee: And you can really mess up the software. The system architect does have to make an assumption that what he’s building into the system architecture will be used efficiently by the software guys.
Rowen: You really want someone who has a deep understanding of what instructions to use, what compiler flags and power modes should be used, and what is the realistic scenario that will contribute to the worst case. In general, things are better when more things are programmable. The worst thing is where the controls are inside some obscure, hard-wired function unit. We had a big customer recently that had trouble meeting power goals. The fact that they were using programmable audio made it a lot easier to come up with another way to buffer the data and initiate the applications.
Neifert: When you have the chip, at least you have an ammeter sitting there. Before you have the chip, the software guy is in the dark. He often doesn’t have any indication of what’s going on in the system from a power perspective because there isn’t much there to tell him that.
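
The exchange above between Kaiser and Hardee, about giving software teams a relative number driven by something measurable such as cache-miss ratio, might look like the following sketch. The function name and the cost weights are assumptions: the 5x memory cost is loosely modeled on the cache-versus-memory gap Hardee mentions and is not a characterized figure for any real platform.

```python
def relative_memory_energy(accesses, miss_ratio, cache_cost=1.0, memory_cost=5.0):
    """Estimate memory-access energy in arbitrary units.

    cache_cost and memory_cost are hypothetical per-access weights; only the
    comparison between two runs is meaningful, not the absolute value.
    """
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    hits = accesses * (1.0 - miss_ratio)
    misses = accesses * miss_ratio
    return hits * cache_cost + misses * memory_cost


# Compare two hypothetical builds of the same decoder loop.
baseline = relative_memory_energy(1_000_000, miss_ratio=0.10)
tuned = relative_memory_energy(1_000_000, miss_ratio=0.02)
change = (tuned - baseline) / baseline
print(f"relative change in estimated memory energy: {change:+.1%}")
```

As Hardee cautions, a proxy like this is only as good as the timing and characterization behind the weights, so it indicates better or worse rather than an actual energy number.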

Making Software Better

Wednesday, January 11th, 2012

Low-Power Engineering talks about what will make software more energy-efficient with Pete Hardee, marketing director at Cadence; Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior VP and General Manager of Apache Design, and Bill Neifert, CTO of Carbon Design.
