Posts Tagged ‘Mentor Graphics’

Next Page »

Experts At The Table: Low-Power Verification

Friday, February 24th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss the problems of identifying and verifying power issues with Barry Pangrle, solutions architect for low-power design at Mentor Graphics; Krishna Balachandran, director of low-power verification marketing at Synopsys; Kalar Rajendiran, senior director of marketing at eSilicon; Will Ruby, senior director of technical sales and support at Apache Design; and Lauro Rizzatti, general manager of EVE-USA. What follows are excerpts of that conversation.

LPE: What standards do we need that we don’t have?
Ruby: My vote would be for a power-consumption level. That’s absolutely critical. Gates have their own built-in standard cell libraries and power models. You abstract to RTL, and we have found a way of using the same standard-cell library. But once you go above RTL there is nothing in terms of power consumption models.
Pangrle: With UPF, SAIF (Switching Activity Interchange Format) had been included. The standard is really set up for power intent. But when you look at the back, there’s this ‘SAIF’ information, as well. There has been a little bit of talk on that committee about what we should be doing with this. If you’re going to have any type of standard to specify the energy coming out of a given part, that’s going to be tied to some activity. You have to have some way of specifying that, as well. There is a potential for tying that together in the future, whether it’s there or somewhere else.
Rajendiran: By definition, a standard means multiple parties need to talk. Models have come out of that with standards like UPF. You can specify the power intent, but that’s a very high level. Where it gets implemented, when you’re splitting microwatts and nanowatts, it’s necessary but not sufficient. You need a standard, but you need good tools and someone who can combine the high-level intent with all the nitty gritty low-level details of the variants and power and know how these things will be put together and verified. I don’t think we’ll ever have a situation where you push a button and determine whether function is correct and timing is correct and power is correct. If we do, that means we have stagnated. We will always have something pushing us.

LPE: But we’re still at the point where we don’t even know how to write testbenches for power, right?
Balachandran: Yes. When a customer starts with their first low-power design, they’re struggling to figure out what to do to exercise all the low-power functionality in the design. If the customer is more experienced, they already have a way to do that. They’re learning how to do a low-power design from a test perspective. Creating that first test plan and testbench for a low-power design is not a trivial task. It’s a huge methodology change. They’ve got to invest resources in terms of learning what it takes to do it and allocating resources to do it. That challenge is still not solved.
Ruby: If you look at it from a hardware perspective you’re absolutely right. But if we can somehow leverage the applications running on this hardware and transform them to testbenches, that would be another element of this solution.
Rizzatti: But is it the embedded software doing that? The application is the embedded software. You don’t have to invent anything.
Pangrle: It’s been a challenging transition for the hardware guys to deal with all the complexity brought about by power. If you look at the software, it’s almost always been purely about functionality. Either the software works or it doesn’t. Now you’re asking the software guys to look at the way they write their code because it can have a big impact on the performance of a device.
Rajendiran: A developer writes a piece of code and even provides a way to test it. The developer knows the guts of the software, so he comes up with a test plan. We used to call that white-box testing. Then we also used to pick up the most junior guy and tell him to do a black-box test. The reason we did that is the developer knows a certain path to exercise. If you give it to some random guy he’ll exercise it in ways the developer never thought about. It’s probably impossible for the developer to write a test plan for power, though. You probably need a combination of test plans. The better companies have people writing power verification plans and the developer writes the functionality part.
Balachandran: I’ve seen the same thing. The designer and the verification engineer are still working independently. The design engineer is not the verification expert and isn’t the architect of the low-power system. The verification engineer comes from a software background and has expertise in writing a test environment without much of the low-power hardware knowledge. The low-power architect has very little idea about how to write the software and how to write the testbenches to test his design. It’s almost as if you need a third person who knows enough about both of them to be an intermediary in order to close the gap. That’s the challenge companies are facing. You either have a verification expert, who is not a low-power expert, or you have a low-power design expert who doesn’t have enough knowledge of the verification concepts. Large companies are investing a lot in methodology. They have low-power methodology teams in place, and they’re tasked with coming up with the proper flow to get this working. Some companies are expending as many as one or two years to come up with the right methodology for their environment before they put a flow in place for their design teams.
Ruby: What if you can say, ‘Look, I’m designing this chip for a cell phone and I can verify it runs this operating system and 500 applications?’ If I forget about testbenches and power and I can just run the applications on my hardware, to me that would be golden. So can you run an application to analyze your power grid? That’s a big question.
Balachandran: That’s a great idea, but the apps don’t come out before the product is launched in many cases. How do you test for that before you get the product out in silicon and ship it? That’s where the challenge is.
Ruby: You test it the same way as our customers do. You test it with what you’ve got at the time.
Balachandran: That requires a very big effort on the part of the software developers for an application and the system providers working ahead well in advance. Only a couple companies in the world have that kind of clout.
Ruby: Look at the consolidation in the semiconductor industry. That’s exactly where we’re going.
Balachandran: But even those companies that are the system builders are sourcing their ICs from different companies. This has to go down the chain where the IC companies have those requirements and specs for apps in mind from the beginning, because if you talk to the typical IC maker they don’t care about apps as much.
Ruby: They should.
Balachandran: Maybe they should, but it’s a big change in the entire ecosystem. It’s possible, but not easy to accomplish.

LPE: What happens with stacked die? Does that change the verification process because there are new issues with power in stacks?
Pangrle: That’s more on the physical side. On the functional side it’s just a bigger system. On the physical side, the interaction of heat dissipation comes into play. If you’ve got a single die you can stick a heat sink on top or do some active cooling. If you start stacking something in the middle it doesn’t have a direct path out to that heat spreader anymore.
Rajendiran: When you think about flip chip, it took 30 years before it came into common use. The reason is that we have vectors pulling us in different directions. On one hand the mobile and consumer space are pushing us into the cheapest and lowest power. That’s a completely different direction than some of the biggest companies are taking. The biggest benefit people are looking to gain with stacked die isn’t low power. It’s that the cost is too high to move to 28nm or 22nm. Already verified chips at 180nm are perfectly fine because that portion doesn’t need to run that fast. If you combine 180 and 130 and 65, and maybe do the logic that ties them together in 90nm, time to market is the key.

LPE: But when you verify that, three good chips don’t necessarily make one good stacked die, right?
Rajendiran: No, and verification will continue to be a challenge. We need standards and models, and the verification will only be as good as the model. The overall full-chip model has to make some assumptions about timing and power, and that has to come from the IP supplier and manufacturers.

Experts At The Table: Low-Power Verification

Friday, February 17th, 2012

Low-Power Engineering sat down to discuss the problems of identifying and verifying power issues with Barry Pangrle, solutions architect for low-power design at Mentor Graphics; Krishna Balachandran, director of low-power verification marketing at Synopsys; Kalar Rajendiran, senior director of marketing at eSilicon; Will Ruby, senior director of technical sales and support at Apache Design; and Lauro Rizzatti, general manager of EVE-USA. What follows are excerpts of that conversation.

LPE: How does IP affect verification from a power perspective?
Pangrle: In the old days you’d see processor cores quoting watts per megahertz. When you’re operating at a standard clock frequency and voltage the whole time there is no other state that’s at least a reasonable approximation. Now you look at the levels of complexity going on above that with multicore designs that may be operating at different voltage levels or different clock frequencies. Watts per megahertz goes right out the window. There is a challenge in how you get that kind of information. If I’m a designer working in an SoC I’m trying to figure out what is my lowest energy core to put in there and what voltage and what frequency you want to run. How do you make these tradeoffs if you don’t have any data to work with? And then, once you have this, how do you hook it all together. You need to give that to the customer who’s going to use all this IP. That can be a third-party IP from outside or IP that’s developed in-house that you want to re-use from project to project. How do you keep that information so the next designer who picks that up knows how to use it? That’s complicated further by the fact that a lot of control for these SoCs is moving out of hardware and into software. From a verification standpoint you need some environment where you can bring in the software and get an idea. Often what’s going to determine the power is the software running on top of it. With phones we see instances with software where you download an update and your battery life gets a lot better—or it gets worse.

LPE: We’re used to verifying pieces of this, but now we’re dealing with interactions, as well. Don’t we have to raise up the level of verification?
Pangrle: You do. There’s almost a renewed interest in the high-speed emulation. Clock frequencies, especially for x86 processors that are used for most simulation, have stalled out since about 2004. We’re getting access to more cores, and if you’re doing regression tests that’s good. But if you’ve got software that you need to run and you can’t break that up, then the reality is that single-threaded performance hasn’t increased much. We’re seeing a lot of interest in higher-speed simulation and verification to help address that problem.
Rizzatti: Raising the level of abstraction won’t do any good in computing power consumption and verifying power domains because you don’t have the accuracy. Here we are really talking about cycle accuracy. Emulation is the alternative. It’s cycle accurate but you have performance that is orders of magnitude faster than a simulator. You can simulate the interaction of the software with the hardware and verify power domains turning on and off. There is lots of interest here.
Rajendiran: With chips becoming more complex and larger, if you have 100 power islands, at any time you may have one or two operating and the rest you want to put into a lower power state. So how do you know those things have been turned down to a low power state? That becomes a verification challenge. The more domains and islands you have, the bigger the challenge to make sure your intent is implemented at the chip level.
Ruby: Power verification has many different aspects. There is power intent verification. There is power consumption verification. There’s probably also something to be said about power integrity verification. Test power is orders of magnitude higher than functional power because all of the low-power techniques and clock gating are not working in the test mode. Everything is open and running at the same time. You have to verify physical power integrity, as well. Verification in the past used to be RTL or something above that. For power it needs to be done at multiple levels, from the system level to RTL. And what do you feel comfortable with? If you do RTL emulation you can understand the hardware behavior of your design. The question is how does software drive that hardware. Can you establish that kind of behavior in accelerators, or do you need to bring the hardware up a level and truly make it work with the software? Based on the RTL description we can estimate power consumption. The further you move up in levels of abstraction the more you lose. At the gate level you get good accuracy. As you move up to the RTL you start losing accuracy. As you move up to system level you may lose so much accuracy that the analogy becomes useless. There is a tradeoff here between the types of verification you do at the higher level, including power consumption, and the turnaround time that’s required.

LPE: Aren’t we changing verification, though? We’re taking it both up in abstraction and down to the most basic level.
Pangrle: It isn’t an either/or issue. It’s an ‘and.’ You need to be checking this at multiple levels. You can’t ignore one level and hope everything will work out okay. You have to make sure you’re accounting for all the different scenarios a complex SoC may run under. One thing that gets challenging is that if you’re looking at running software on a virtual prototype or some type of accelerated environment, you can’t run that many cycles when you’re looking at what’s happening on the power grid. Only certain pieces are of interest to you. When it’s doing what it’s supposed to be doing most of the time, that’s probably not where it’s going to fail.
Balachandran: Software verification is important. A lot of it is also done with methodology. The verification methodology is changing compared to non-low-power verification. They also have a link to SystemC simulation in their environments. And we are seeing customers with a renewed vigor to do gate-level simulations. Some customers now have a methodology of not taping out their chip without doing more extensive gate-level simulations for the power. The low power has caused a renewed focus in that area.
Rajendiran: When you are splitting microwatts and nanowatts, you have to go to that level. RTL was great 20 years ago because you were doing simple blocks with 20,000 gates. There are so many libraries at the same process node and for the same flavor that it really becomes very critical which library you pick. That can have a huge swing factor for a mobile device. If you have a chip verified in this process at this foundry but it’s not at a price point you like, you can’t just move it. Gate-level is becoming a lot more important at that perspective.
Rizzatti: A simulator running at the gate level will take forever. We’re talking many years.
Ruby: We need to do this with multiple levels. But one of the key elements of the solution involves technology-dependent models that can take you through the levels of abstraction. There may be multiple models for multiple levels, but you may want to say you’ve done this design and you know what the power profile is and you can measure all the way down to gate or transistor level. You can then abstract the model for that block, push it up several levels of abstraction, and still maintain as much accuracy as is practically possible. You can do this because it’s not being done from scratch. If you don’t a netlist and you don’t have a post-synthesis result, you can still estimate power consumption based on RTL and create an RTL power model. Technology-dependent modeling, not milliwatts per megahertz, will be the key to this puzzle.

Experts At The Table: Low-Power Verification

Thursday, February 9th, 2012

Low-Power Engineering sat down to discuss the problems of identifying and verifying power issues with Barry Pangrle, solutions architect for low-power design at Mentor Graphics; Krishna Balachandran, director of low-power verification marketing at Synopsys; Kalar Rajendiran, senior director of marketing at eSilicon; Will Ruby, senior director of technical sales and support at Apache Design; and Lauro Rizzatti, general manager of EVE-USA. What follows are excerpts of that conversation.

LPE: What’s the big challenge with verifying power in an SoC?
Ruby: Power has a couple of different components. One is how the low-power techniques impact functionality. If you talk about things like power gating, power supply shutoff, multiple supply voltages and so on, this is where you need to understand certain rules of turning on and off power supplies. You need to be able to create retention cells, to be able to retain state, and to retain functionality. That’s one major aspect. The other side is that you have to look at the power consumption itself. How do you verify that you are on target, if you have a target, and that you are not exceeding a specification? And how do you ensure the design has efficiency built in.
Rajendiran: This is all about is trying to verify what your intentions were that you stated in the beginning and making sure that has been implemented—and when the chip comes out, making sure it is functioning that way. In the old days we simply meant functional and timing verification. Now, just on the functional side, it has become so complex that just getting it out the door is a challenge. It’s the same with software. No one thinks about verifying it all. That’s the practical problem. The person who is verifying the power states doesn’t have the time to put in the right hooks. We have the Unified Power Format to help, but we still don’t have standardization as to how you verify the states. Tools rely a lot of naming conventions, but even though there are fewer companies there is still not compatibility in reading all of those things. Tools are always playing catch-up, too. The ideal solution will be a combination of great tools and planning. In addition, you can have the best tools, but if you put them in the wrong hands you don’t get results.
Pangrle: There’s a functional part and a physical part of verification. A lot of what is going on in the industry right now, especially with the power formats and the convergence around UPF and the 1801 IEEE working group, has been to keep the power intent separate from what has been the standard part of functional verification. It’s allowing people to use their standard flow, take it to RTL, and still be able to design RTL blocks that can be used in different design scenarios with different power management. You don’t have to hard-code isolation, level-shifting, retention registers into those blocks. You can still design your block your same way, and if in one design you’re going to power down that block that’s okay because the intent information is in a separate format and you can bring that in. From that standpoint, there has been good collaboration between EDA companies and their customers. From the standpoint of putting it all together and being able to support the tools, one of the things we’re seeing is that as EDA companies work with designers there are times where something is a little different and different vendors have created support. That’s where it gets tougher to move designs from one company’s set of tools to another. It also brings up some new questions. From the physical side, if you’re powering up and down blocks it has a real impact on your power grid and whether it’s going to function. Just because logically it looks as if it should work, that doesn’t mean when you get your chip back from the foundry you’re not going to run into other issues. And in terms of the complexity of testing, you can do the standard ATPG, but when you go through the dynamics of running different voltages and frequencies and bringing things up and taking them down, to what extent are you actually going to test that?
Balachandran: Verification is complex enough without low power, stretching the resources from both a verification productivity standpoint as well as IP cost. When you add low power into the mix, it makes things much worse. The complexity of low-power designs has been going up slowly but steadily. Some companies that are on the cutting edge, particularly in the mobile market, started adopting low-power designs about five or six years ago. They were the frontrunners of the whole low-power wave. They put the initial pressure on low-power verification, because now you have to start thinking about verification differently. You have to start thinking about voltages, multiple supplies, and whether things going to work in all those conditions. Clock gating is the most basic technique, and almost every company you talk with has been doing clock gating. Now that has expanded into more sophisticated techniques to curb the power, and with that comes the burden to verify properly. All it takes is one unverified state or transition or sequence for the design to completely lock up and not function at all.’

LPE: How bad is this problem?
Balachandran: It’s becoming more widespread. There are government regulations and green initiatives. Everything is going green. There are demands on specifications, and even on power for devices connected to the wall. That requires chipmakers to make their designs much more power-efficient. Customers typically start with four or five power domains. Some of that verification can be done with static techniques or with some rudimentary simulation. But it’s becoming more complex, and this complexity is increasing for the mainstream market, not just the mobile market. The number of power domains is exploding. We’ve seen designs with 50 power domains, which is potentially 250 power states. It’s pretty much impossible to verify all of them. So you need to come up with a really good test plan. When people are confronted with low-power designs the first time, they have no clue about how to write a testbench for low power. Often they need a lot of methodology help, in addition to having the right tools in place, to figure out what they’re going to do, how they’re going to go about doing it, and how they know when they’re done. Then, what is the measure of confidence they have at the end to figure out if they’re really done?
Rizzatti: From the perspective of emulation, this technology has been used for functional verification. Ten years ago, power management was essentially a gated clock. You turned off and on some part of the chip and saved energy there. Around 2001-2002, designs with 10 or 20 of these were called derived clocks. Today we have customers with 100,000 derived clocks. There’s an explosion. But that’s only one problem. Over the past five years, and especially in the past one or two, there are all these new techniques for turning on and off voltages. We had one customer with well more than 100 power domains. The whole industry is changing. Power management is a nightmare, and it makes SoC verification orders of magnitude more difficult.

LPE: With a disaggregated supply chain and more IP re-use, does it make it more difficult to verify the design? Not all of the IP is fully characterized for power.
Balachandran: UPF, or IEEE 1801, and CPF have ways to model the power intent of IP. The issue isn’t so much the ability to specify the power intent of IP. Talking to all the major customers, everybody is either integrating internal IP or using third-party IP. Some of the IP blocks have their own power management, too. It has to be communicated to the integrator of that SoC as to what are the legal ways to integrate the IP into the SoC. That information has to be passed along. The power format is not the right way to pass that information. So the industry has to work out a way—together—to solve this problem. The IP companies, the EDA companies and the whole ecosystem has to work on this to facilitate communicating the right behavior that IP can be integrated from a power perspective, and to tell the IP integrator when they are doing something wrong. If IP is coming from a third party and you have no idea what is going on with that IP in terms of its inner functionality or how the power is implemented and what ways you can put it together on the block, then you can shoot yourself in the foot pretty quickly. This is a problem that needs to be solved. One potential solution is to create assertions for an IP block. The IP developer doesn’t know how IP is going to be used, but they do know what is legal or not. They can create assertions for that and ship it with the IP. Then, when the integrator puts it into the SoC and runs the verification, they are able to figure out if they’ve done it properly or not. If it’s not, then they can have a dialog with the IP company. It’s a way of communicating the data sheet of the IP to the next-level integrator. This is one way of solving the problem. It requires close collaboration between IP partners and EDA and design services companies.
Rajendiran: More times than not, people don’t do that. There are many ways that tools can help, too. If some expert designed the IP block, he can provide some input and then a tool can insert assertions back into the RTL. Ideally you want to keep it separate as a companion file. That’s one approach. But the problem is more complex than that when it comes to low-power verification. IP is one issue. There is physical IP where you can’t do much because it’s already hard coded. There’s also soft IP. Each of the classes has its own challenges. With the soft IP, a lot of activity only happens at the gate level. Depending on how the RTL gets synthesized and mapped, you can have a perfectly functioning solution when you use a particular library in a particular foundry, and the same thing may not work somewhere else. You need deep knowledge about this stuff. You need collaboration of tools, the integrator and the IP developer to make sure you at least get the product out to market on time.
Ruby: There is another dimension of IP—the power intent side, which is the functional verification aspect. That’s absolutely essential to ensure the functionality. Time and time again, what I’ve come across is the need for some way to describe the power consumption behavior of IP, as well. It could be technology dependent or technology independent. It could be models that describe assumptions as a function of clock frequency or data rates. From my customer perspective, this is also becoming essential in the power verification area because they’re not just worried about functional intent. They’re also worried about hitting their power specs. They need models for the IP coming in. If they plug IP into their design and they run their clock frequency at a certain rate, what power consumption can they expect? That’s another very important element to this verification challenge.

Step Away From the Spreadsheet

Thursday, February 9th, 2012

By Ann Steffora Mutschler
Engineers today spend more than a quarter of their time trying to meet power specifications.

A survey of more than 700 engineers by Calypto illustrates just how important and time-consuming power management is today for engineering teams. As consumer devices grow ever more complex, the need to deal with, analyze and optimize power at not just the RTL but at the system level is the next challenge, even if the path to reach that goal is not yet clear.

The opportunities for optimizing a design for power efficiency are greatest at the architectural level of abstraction. The further a design moves downstream the less effective optimization techniques become, noted Yossi Veller, chief scientist for ESL at Mentor Graphics, in a white paper he co-authored for ARM’s IQ Magazine. “Power optimization must begin with architectural analysis, exploration, and optimization of power and timing at the electronic system level (ESL). According to a study by LSI Logic, techniques available at the RTL synthesis phase have the ability to reduce power by 20%; those at the gate level offer a 10% reduction; while those at the layout level can reduce power by only 5%. Waiting until the RTL to begin optimizing for power is a wasted opportunity because power usage can be reduced by 80% at the ESL.”

Fig. 1: The ability to optimize power at the architectural far exceeds that at lower levels of abstraction.

“Traditional power optimization tools are really working at the lower levels of abstraction,” explained William Ruby, senior director of RTL power product engineering at Apache Design. “If you look at synthesis, if you look at physical design, there are some automated techniques that are available in those tools. But those are in a category of additional refinement-type steps. Once you have the design architecture nailed down, then you can add in some optimizations based on those tools and you can get some additional incremental power savings, but the part that is missing is enabling the true design-for-power efficiency. If you look at modern chip architectures, they are extremely complex and the RTL descriptions of these architectures are even more complex such that RTL in some cases is no longer seen as a viable architectural description language. You want to be able to describe the architecture of the design in a high level of abstraction.”

With this description comes the requirement to be able to analyze power. Today, this is done by synthesizing the design from a high-level description such as C++ down to RTL, and then an RTL power analysis tool can function and give feedback into the architectural domain. But what needs to accompany this synthesis-loop-back type of flow and give some indication of what the power numbers is more intelligence in those high level tools. They need to point out inefficiencies in a design at both the RTL and architectural levels.

Chris Rowen, CTO and co-founder of Tensilica sees two big challenges for power analysis tools. “One, it is very, very difficult to isolate where the real problem is. It only makes sense to really measure power at the level when you have really synthesized the logic and laid it out and you actually know what the physical design looks like, because the physical design has a huge impact on what the power dissipation of the circuit it.”

By the time it has gone through synthesis and place and route, you have really very little visibility into what was the original logic being questioned. “It all goes into the Cuisinart and all you get is this amorphous mush of gates at the end. So if someone asks you, ‘How much power is being dissipated in my multiplier versus in my divider versus in my register file,’ I don’t know anymore because I have to process them all together in order to get good physical results. But then it all has been aggressively remapped into other logic forms and I can’t isolate the power easily. So you have to work in rather indirect ways to figure out whether the power was being dissipated in one function versus another.”

A second problem, he said, involves system-level tracking of different scenarios. “It is extremely difficult to reach your power goal if you say, ‘Let me use the worst case assumption about each subsystem. I’m going to assume that every piece of my baseband is on, and every piece of my Layer 2 and Layer 3 protocol stack is on, and my image processor is on, and my apps processor is running full out, and all of my RF subsystems are running,’ because of course you’d exceed your power budget by a factor of two or three. Instead people recognize they’re not all on at the same time, the system doesn’t work that way. When you are doing one thing, then you’re typically not doing something else. Therefore, you only have to look at the particular combination of subsystems that is on at that time. However, the software guys have really poor tools to correlate what’s going on in the higher-level operating modes to what’s going on in terms of actual power dissipation in different subsystems. They are completely shooting in the dark where they do not have anything like the kind of accuracy for the modeling of these things.”

As a step towards true system-level power analysis, engineering teams are gradually figuring out that they need to build approximate models of power in addition to simulation environments that are fast enough to run realistic scenarios and to capture real activity. “Ironically getting power information is more than anything else probably a function of getting fast enough simulation, because only if you can run realistic size scenarios will you really gain interesting information,” he said.

This has become one of the big drivers of ESL, which until recently has been relatively slow to catch on. But complexity at advanced nodes, including power considerations, have significantly boosted it’s appeal.

“What the user would like is to have at the very early stages, when he has a TLM model of the design, is at least a relative assessment what architecture decisions will impact the energy in which direction,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “He will also want to know how the software impacts all of that. From a technology perspective, TLM models allow you to do that so it’s fairly straightforward to annotate power-related data into TLM models,” he asserted.

Annotating models with data just like annotating performance is a challenge and can be approached in three ways:

First, he said, “You can start with your assumptions, with your power budget. TLM models and virtual prototypes allow you to then execute your assumptions so you have in your power envelope/power budget. You say, ‘These tasks should take that much power, I know that from past experience,’ and then you execute your virtual platform with those annotated, estimated data or budgeted data. And you get dynamic results depending on what tasks the software ends up calling, how long a cell phone is used for which task in a day, and so forth.”

Second, annotate back from when you have RTL. “At the RTL level you have these switching formats that you can derive from the RTL to get a good idea about the activity,” Schirrmeister continued.

And third, it can be dealt with at the silicon level by taking previous designs, measuring power information and annotating back into TLM models.

Design engineers are undoubtedly looking for analysis and optimization at the system level so they can do power analysis and power estimation before RTL is available and before they can do gate-level simulations. But are they truly ready to adopt it?

Achim Nohl, technical marketing manager for Synopsys’ solutions group pointed out that today, power analysis starts with gate-level simulation. “If you talk to a hardware engineer and tell him, ‘We are going to employ virtual prototyping and high-level models to do power analysis,’ he will certainly look at you a little strange because he thinks, ‘I’m doing all those back-end optimizations and all those specific things to optimize power. How will you ever be able to reflect that in a virtual prototype simulation?’ But that’s not the point. For virtual prototyping, the granularity of a system is very much different. You’re not looking at just the memory controller. You’re looking at the CPU with the memory controller, the buses, the interconnect, the peripherals and how all those things are orchestrated to find out where the different hot spots are and what is best way to program all those pieces. What is the best scheduling technique? That is the concern at that level.”

When a new chip is architected today, estimates are done to determine whether the chip is feasible at all from a power perspective, he said. “Today, people are using spreadsheets in order to do this analysis, and this can only be a worst case analysis because they don’t know the dynamics and can’t reflect the dynamics of the system in those spreadsheets.”

While the pure architectural level tools don’t exist yet, many users are likely content with high-level synthesis tools for the time being. Apache’s Ruby believes they are good in their own respects but they are not actually meant to give architectural guidance; they are just meant to synthesize the design above the RTL.

One final thought for nervous system architects: The architectural tools of the near future will not replace the actual architect unless they become truly artificial intelligence, which is not likely to happen any time soon, Ruby concluded.

Margin Of Error

Thursday, February 9th, 2012

By Ed Sperling
Adding extra circuits and silicon area to a chip has always been frowned upon by chipmakers. Extra silicon means extra money, and for most chips the least expensive is always the better choice. But at advanced process nodes, margin also can slow performance, increase power consumption, and make it harder to achieve timing closure.

The obvious solution is to reduce margin throughout the design, but the reality is that margin budgets for a complex SoC will never go down. The best that design teams can hope for, in fact, is to keep margin constant from node to node and across stacked configurations. While this will require constant vigilance on the part of architects, it also will increase challenges from the conceptual stages of the design all the way to achieving acceptable yields in manufacturing.

What can’t be fixed
In some cases excess margin is out of reach of design teams. With more and more third-party IP now included in designs—and as much as 90% of the design now a combination of third-party and re-used IP—it’s difficult to even get a firm handle on the amount of guard-banding being done. So far, this hasn’t been a problem because most of the industry still isn’t producing 28nm chips in volume.

“Right now it’s only really a worry for the ‘star-IP,’ because if my USB controller is a bit bigger and power hungry than it might be, it is still peanuts compared with the overall platform figures,” said one architect at a large chip company, who spoke on condition that he not be named. “Even the sum of the power of all the little things doesn’t approach the star-IP. And here’s a thing about the star-IP: It may be big and power-hungry, but it there’s still a case for it. Some IP has a well-defined job to do and has to get that job done as efficiently as possible. But with star-IP, it’s mainly ‘faster is better.’ So sure your Web browser would be more power- and area-efficient on a Cortex-A8 than a Cortex-A9, but I bet you’d rather buy the A9-based tablet.”

Those kinds of choices, as well as time-to-market pressures where IP can be re-used quickly, make guard-banding almost inevitable. What’s surprising is not that it still exists, but that it has remained relatively constant given the explosion in the number of components on an SoC.

Where margin matters most
But margin still causes signal propagation issues because there is more silicon and more wires that signals need to be driven through. That, in turn, leads to the need for wider buses.

“When you guard band you need to ratchet up the intended operating frequencies and increase the clock frequency,” said Neil Hand, group marketing director for Cadence’s SoC Realization Group. “All challenges are made worse. In some parts of the design there is no impact. If you have a low-speed peripheral you probably don’t need to worry about it. But with something like high-performance PCI Express, gen 3, you have fast protocols and huge pipes and margin becomes a critical issue. You have a hard time meeting closure even with no margin. Margin makes it worse.”

He said the key is not so much reducing the percentage of guard banding. The rate has been relatively constant, with about 20% margin at 65nm and 90nm, and at least 15% at 28nm and 20nm.

“With that number there’s a lot more slack,” he noted. “You need to know where the slack is and where it’s going to impact the design. Where you do have room to move it may drive different IP use. There may be better IP externally.”

He’s not alone in that view. In fact, all of the Big Three EDA vendors are counting on the need to trim margin to boost their IP sales over internally developed IP blocks.

“There are a lot of challenges working with 28/20nm because of the variability in processes,” said Navraj Nandra, senior director of marketing in Synopsys’ Analog and Mixed Signal IP Solutions Group. “Reducing margin makes a different for getting performance out of analog. You also want to be competitive in price-performance-area. The question is how much margin you can accept in IP to meet those goals but not compromise on yield or variability.”

This becomes a difficult engineering tradeoff, however. Do you design IP for a specific chip, or do you add enough margin to allow it to easily plug into other designs? For commercial IP, the answer is clearly versatility, but there is a cost to that flexibility.

“You can’t be competitive and have slop in the design, but you can’t build something so competitive that it will only work for one design,” Nandra said. “It’s like a drag car where you run it for a half mile and then you have to replace the engine, the tires, and add more nitrous oxide. You can do the same for super high-performance chips for one temperature range and one process, but it’s useless for anything else. The goal is to build in enough circuit techniques with just enough margin not to risk performance problems if there is variability in the process.”

Manufacturability
Process variability has become particularly troublesome at advanced nodes. Coupled with double patterning at 20nm, and the likelihood of triple patterning at 14nm, margin takes on entirely new dimensions.

“We’re trying to characterize process corners and design around a nominal target,” said Jean-Marie Brunet, director of product marketing for model-based DFM and place and route integration at Mentor Graphics. “Third-party integration is a real challenge. Fill used to be a simple process where you insert it at every layer. But you don’t know what is in the IP these days, so fill has to be re-done. That doesn’t help with the integrity of the IP.”

He said that for most IP, there usually is guard-banding on the periphery of the IP to deal with fill. That impacts timing, area and performance.

“This is really an issue for the big chip companies that do 300 to 400 tapeouts a year, not for the microprocessor houses that can take their time to eliminate margin. The problem is there is no magic bullet for everyone else. And when we get into double patterning, this is really going to be an issue because you’re overlaying two masks, and any shift of the overlay will have a dramatic impact on the chip.”

The future
While pressure to reduce guard banding will continue, there is at least some hope for dealing with the problem more effectively. One involves new materials, such as graphene and silicon on insulator, which help reduce power, and new structures such as finFETs and carbon nanotube FETs, which minimize the effects of leakage and thereby make up for some of the power drawn by the extra margin.

A second approach is better tools. Knowing what the variability is in a process allows engineers to design in a minimum amount of margin. Building more accurate models can help, particularly in conjunction with analysis tools for exploring one IP block versus another.

And finally, stacked die will alleviate at least some concerns because portions such as analog can be developed at older nodes where they make more sense, rather than trying to fit everything into the latest process node.

Mechanical Meets Electrical

Thursday, February 9th, 2012

By Ed Sperling
For the first part of the 20th century mechanical engineering dominated almost everything in technology. For the second half, once the transistor and the integrated circuit became well entrenched, those two disciplines largely divided up the tech market.

More recently, however, they are being forced to collaborate in teams that historically had nothing in common. While the combination of electrical engineering with software has raised questions about how to trade information back and forth, mechanical and electrical engineering arguably are even further apart. But there is at least one consistent element throughout the most recent combination—power.

Power and heat
One of the biggest changes in engineering is that power is global. Physical effects such as heat, electrostatic discharge and leakage current can affect many other levels of a much larger system. That larger system could be a car, an airplane, or a data center.

“Inside of an engineering organization, someone near the top has to worry about the entire system,” said Larry Williams, director of product management for the electronics business unit of Ansys. “They have to think about boundaries between systems and subsystems, or between mechanical engineering and electrical engineering, because many firms are organized that way. When building a system, the optimum design can be found by considering the system as a whole, and additional margin is often found at those boundaries.”

He said that at a meeting within one defense contractor, he actually introduced the mechanical and electrical engineering teams, who had never met even though they worked on the same projects. Those silos have since begun breaking down, in part because systems demand power efficiency, better reliability and lower cost. Things that used to be done as purely mechanical engineering may be mixed together as part of a bigger system.

But the perspective of each is different. Consider thermal budgets, for example. Electrical engineers focus on turning off as much of a chip as possible when it’s not in use, and running what’s in use as efficiently as possible—even going so far as to weigh whether specific operations use less energy when they’re run at maximum speed for short periods of time or slower speeds over longer periods of time. Mechanical engineers, meanwhile, focus in the other direction—cooling the devices as close to the heat generation as possible. In the past, that meant simply drilling holes into metal and adding heat sinks and fans.

“As density has increased it is no longer possible to thermally manage a device around the PCB,” said Robin Bornoff, FloTherm product marketing manager in Mentor Graphics’ Mechanical Analysis Division. “It’s gotten to the point where the mechanical perspective cannot be a separate discipline. We’re now seeing representatives of the thermal design teams showing up right from the beginning in meetings with the system architect. They have to work together.”

That discussion becomes even more critical in 3D stacking, where heat can get trapped between two die. And it’s not just the stacked die that needs to be considered. It’s what’s around it, as well.

“Heat doesn’t obey existing design discipline barriers,” said Bornoff. “The heat will spread into the air, the chassis, the room, and out from there. How hot the silicon gets affects everything, sometimes even outside the building. That’s why you’re starting to see water-cooling in space applications and in data centers. It’s 1,000 times denser than air and 1,000 times better at removing heat.”

The challenge is to get that cooling as close to the source of heat as possible. So rather than just cooling a server cabinet, for example, the liquid is pumped around the processors producing the heat. There is even research under way in microfluidics to pump liquid around the chip itself in a stacked die. Bornoff noted that initial approaches tried to squeeze the fluid through very narrow channels, which required massive pressure. He said the latest research uses piezoelectric fans and pumps, whereby vibration creates movement in the fluid.

Fig. 1: Microfluidics. (Source: Imperial College of London)

MEMS and energy harvesting
Another confluence of mechanical and electrical engineering skills has been the MEMs world—microelectromechanical systems—which are growing in importance in markets ranging from touch screens to smart sensors and analog signal conditioners. There are even micromotors with gears attached to semiconductors.

“Electronics is relatively young compared to mechanical engineering,” said Cary Chin, director of technical marketing for low-power solutions at Synopsys. “But the next big rev of the market is pointed toward electromechanical systems. A lot of these are being looked at for technologies that will start to solve the power problem. With a mechanical system there is no leakage. And for devices that don’t require a really high level of performance, they may be able to power a system forever.”

Think about biomedical devices such as a pacemaker, for example. An energy scavenging system that includes semiconductor technology and mechanical energy harvesting can be used to provide enough power just from a person’s own heartbeat to both keep a steady pace, detect when there is an irregularity, and even act as a defibrillator for one or two stored duty cycles.

Fig. 2: Mini motors. (Source: Sandia National Laboratories)

“The challenge for the tools world will be to rethink optimization,” said Chin. “With power we already had to make significant changes for implementation and verification. Now what we may be looking at is support electronics, where the heavy lifting of computing is moved into the cloud.”

The future
The so-called Internet of things is another big driver in this whole shift to fuse together electrical and mechanical engineering. Within this scheme, systems will be defined as much collectively as individually, much as they are from subsystem to system, with the actual location of computing as distributed along the lines of the Internet.

Within this scheme, there will be many places that mechanical and electrical engineering cross paths, many driven by power, heat, signal integrity and new applications that are just now on the drawing board. For that there will also be new opportunities for tools that can explore tradeoffs of something done mechanically versus electrically, just as those types of tradeoffs are now made for the best kind of IP and processor cores within a given power budget. And as the silos break down, the possibilities are mind-boggling.

State Of The Art In Solid State Lighting Thermal Design

Thursday, February 9th, 2012

Unlike incandescent lighting that relies on heat to cause a filament to glow and produce light as hot black body, light emitting diodes (LEDs) are semiconductors and as such must be kept cool. When LEDs produce light, heat is a by-product. Heat generated in an LED increases its temperature. As the LED’s temperature increases, the light output decreases, the light changes color, and the lifetime of the LED reduces. Temperature adversely affects both the functional performance of the LED and its longevity. As a consequence, thermal management has become the most predominant issue in solid state lighting (SSL) design.

To download this white paper, click here.

Low-Power Verification

Wednesday, February 8th, 2012

Low-Power Engineering talks about how to verify the power portion of semiconductor designs with Krishna Balachandran of Synopsys; Barry Pangrle of Mentor Graphics; Kalar Rajendiran of eSilicon; Will Ruby of Apache Design, and Lauro Rizzatti of Eve-USA.

YouTube Preview Image

Experts At The Table: Making Software More Energy-Efficient

Friday, January 27th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How much of the battery drain on a smart phone is caused by the hardware, how much is caused by the software, and how much is caused by bad reception?
Kaiser: Software controls a lot of it. Bad hardware that does not allow you to turn something off is one cause. But that doesn’t happen as often as bad software. If the hardware has one clock that turns everything off then you have a problem because whenever you want to use one little block you have to turn on five. But with software you have to give engineers feedback and tell them what knobs to turn. Ideally, you even give them an algorithm for how to tweak those knobs. We tried to do this with Nucleus. The drivers automatically manage their own power for WiFi or anything else. If no one opens the driver it won’t burn power. If you can lower power, don’t worry about the rest of the OS. Just minimize dynamically. You can set up limits for the driver. Then the application guy just needs to be able to allow the device to turn on. You need to give people simple metrics like CPU utilization. And if you give metrics on how much power your CPU is using while idle and how much it’s using when it’s busy, you can tell how much your CPU is using. Then, if you lower the frequency to half and the CPU is twice as busy, it’s actually burning more power. The compiler needs to do the job.
Rowen: The compiler can do a good job of the lower level things, but the choice of algorithms and which states you’re going to transition among is way beyond what the compiler has any access to. I recently saw a study of the number of states that a cell phone goes through. Something like 38 messages had to go back and forth between the software running on the phone and what was going on in the base station that were basically a negotiation as the phone entered a cell. There are some very tough and complex tradeoffs to make about whether you want to save power at one level by doing fewer transactions or you want to be aggressive and get the negotiation done as quickly as possible because it allows you to get into the lower power state as quickly as possible. There are some non-obvious tradeoffs at work at the system level because you have to determine if the phone is in a low-power or high-power state. They’re not things that you’re going to work out between Microsoft and Nokia. It’s going to be between Nokia and AT&T.
Kaiser: Does it matter? How often do you associate with a particular cell station? It affects standby time, but standby time is already pretty long. Does it really matter if you optimize that case, or do you care about other cases? How much of your battery went into this handshake?
Rowen: With the scenarios I’ve seen it could matter a lot.
Hardee: If you change the data arrival rate to those processes that are rendering Web pages, it’s a big difference. You could be running your graphics processors continually just because you have a slow data arrival rate, as opposed to processing everything and shutting down. It would be difficult for the software guys to optimize for those cases. What they can optimize for is how predictable stuff is. Can you do predictive scheduling? That changes what the application is doing. Those decisions are set pretty low down in the software stack, but what’s available to use and how effectively it can be used is another thing the software engineer has to think about.

LPE: How much of this information is making its way between hardware and software teams?
Kulkarni: That’s where virtual platforms come in. A co-simulation platform is a better description. But the marriage of the software with the hardware and how we capture that in instrumentation then can be driven toward a meter, which may be RTL power, a hardware description. But it all has to convert into power analysis at the end of the day. The feedback can be given to the system designer and the software designer, but all those things are missing. What Carbon is doing is an important step toward that. You can do the power analysis and get that feedback. We have to look at the application over time, and the feedback has to be in real time. In one of our customer applications for digital TV, they asked us if your eyes are looking at the oval in the middle of the screen can you turn off the power at the edges. They’re looking at pixel-by-pixel power control. This is real-time feedback of hardware and software applications.
Kaiser: You can re-encode movies based upon brightness. If it’s pretty dark, you can show it with much lower backlight. The backlight can vary and the screen looks the same. And it can vary by region. That’s beyond the scope of hardware. It’s algorithms.
Kulkarni: This customer is looking for software energy-reducing concepts. They want to know where their software is consuming more power.
Kaiser: They want the drivers. And if you’re going to be varying the CPU, then you also need to provide the compiler.
Rowen: Depending on what level in the system you’re talking about, the hardware has always provided the software. We’re doing a lot of advanced baseband design. The next thing after the industry specification that you do is make it happen in 150 milliwatts at 300 Mbits per second. That drives all the subsequent design, including the choice of algorithms, the processors, the allocation of memory and the interconnect. They’re all driven within a power budget. Everyone working at layer one knows the power. This very tight hardware-software co-design is very established there. It starts to loosen up as you go up, in part because you’re aggregating these much more complex systems together.
Neifert: That’s where it’s missing. The power is really a system context. Five or six years ago I started getting inquiries from leading-edge customers. A couple years later it was leading-edge research groups. About two years ago it made it out of research, and now about 30% or 40% of our customers are doing this in some way. It’s of great importance now.
Hardee: We all tend to gravitate toward the simulation model or the virtual platform’s ability to do power estimation. That’s not actually the low-hanging fruit, though. The thing that can be done relatively simply is system integration testing of power management software. Can you switch the mains on and off? Is it idle when you think it’s idle? That’s a lot lower-hanging fruit in a SystemC TLM 2.0 modeling environment than in power estimation. For power estimation, we have a ways to go even in the activity formats used. You have to use averaging formats over defined windows. These all apply at the signal level. How do we bring them up to the TLM 2.0 level to make them run faster? That can be an issue. There are circumstances where you can say you have an AXI protocol and 64 bits, and you can do the math to get from signal level to architectural level. But then you look at all the architectural differences that start to become nuances in that model, like whether you’re doing split transactions and how are bus transactions being pipelined. Is that being correctly modeled in the platform. There’s a lot of complication. Even to get relative accuracy you will need to model this.
Rowen: We’ve gone up halfway between this signal and toggle level and TLM. Processors are nicely defined. What we’ve done is to automatically derive instruction-execution-level energy models so we can, as part of the initial instruction set characterization, come up with a pretty good energy model per execution. It’s still data independent, but there’s a summary number. The simulator knows how to count things like memory references. Then the whole processor plus memory subsystem has very accurate relative and kind of accurate absolute energy at a level that runs at the full speed of a fast simulator, not at RTL speed. Therefore you can start to make that a building block within a transaction-level approach. That’s one of the pieces of raising energy in abstraction and getting past the toggle.
Neifert: You start doing toggles and you slow everything down. You may use the toggles as an instrument for calibration, and then you go back and put that in and say, when I do this I take this much power per cycle. Then you can start aggregating some of those numbers to at least get a relative figure.

Experts At The Table: Making Software More Energy-Efficient

Friday, January 20th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design, and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How do you get around the fact that there isn’t enough information available to the software team?
Kaiser: You could profile.
Neifert: You can profile to a point. A high-level virtual platform is too abstracted. It doesn’t have the concept of cycles in there. You need a level of accuracy with sufficient instrumentation. Often it’s not just doing any one single task. It’s what happens when these tasks intersect. What happens when you’re watching a video on your phone, talking to another person and someone calls on another line? It’s power drain happening at the same time, and you don’t necessarily test that from a system perspective.

LPE: Where are we starting to see the most energy consumed by software? Is it at the embedded level or further up the stack?
Kaiser: It can be anywhere.
Rowen: I don’t know how you can really separate power dissipation of software from hardware. You have to look at different subsystems. What’s the baseband subsystem doing versus the imaging subsystem versus the audio subsystem versus the graphics subsystem? Each one of those is going to be some compound of hardware and software issues. You can look at the independent worst-case scenario for each of them. Your first job is to make sure you have good characterization of what’s going on in each subsystem. Then you get to these interesting interactions. If you’re playing back a video you know you’re not doing maximum download on your wireless connection. Or when you’re recording a video, you know that something else is not happening. People have been forced to move from simple subsystem-by-subsystem worst-case analysis to looking at the whole interaction. It’s largely because they can get to a smaller worst-case number than if they didn’t consider scenarios.
Kulkarni: We found that with worst-case scenarios it’s easier to manage the power and to do hardware-software co-simulation, but within the subsystem itself there are so many different modes of operation that co-simulation gets even more interesting. It’s one level of power or energy reduction if you shut off the subsystem, but below that level how do you optimize that subsystem? You’re running software applications, which have to be co-simulated with the hardware. Relative accuracy becomes critical, although not necessarily the absolute accuracy. So how do you generate the testbench? How do you create power patterns? Selecting the critical energy-consuming patterns becomes a challenge. It’s one thing to model or create instrumentation for your software application, but you need a meaningful set of vectors for power consumption. Most of the functional testbenches are useless from a power consumption point of view. Looking at finite-state machines gets to be more and more critical. From the software application, how do you translate that into finite state machines that are control registers, which will then be translated into RTL? And then software is managing all of that. With one mobile phone application we worked with three different vendors for IP models, SystemC, an OSCI simulator and then a five-minute talk time. It would have taken about three months if the customer had not created higher-level models and energy-consuming signals out of that whole environment running together.

LPE: So the hardware guys are worried about power, but the software guys aren’t even thinking about it. How do we change that? Is putting up an ammeter enough?
Neifert: The ammeter is certainly a start. It’s a lot better than what they have today on the software side. We always have talked about concurrent engineering, and more and more processes get applied to that. This is just the next application. The first key is to provide a mechanism, then leverage it across everything and make sure that mechanism is as accurate as possible. But even a relative number is essential. Does this setting take 20% more? Give engineers good tools and they’ll figure out how to apply them.
Rowen: It’s the same as with a video game. If you give someone real-time feedback on the effect of what they’re doing, and they know what the green zone is doing versus the red zone, they’re amazingly effective at getting the needle down into the green zone and keeping it there.
Hardee: It’s not just optimization, especially at the higher levels of the stack. It’s more a case of, ‘The power consumption in this model is worse than the previous model. Something is wrong with the software. Fix it.’ And then the software guy goes off and finds the routine that is polling the modem way more often than is necessary and preventing it from going into sleep mode. It’s those gross errors that are being found when something goes wrong. Are we using the right capability, the right parallelism, the right pipelining or the right architectural facet of the platform to run the right piece of software? Those decisions are usually way down the stack. That’s something the operating system and the drivers have to understand for the various system calls that are going on. When you get into those lower levels of software, you need an accurate model of the platform that can start to tell you the energy usage you’ll get with those various selections to optimize further down the stack. With the application, it’s as simple as looking for what’s keeping something on when it should be off. For true optimization, you’re looking lower down the stack. You could hit the same problem with FPGA prototypes. You can run a decent portion of real-time and you can run some vectors, but what’s your characterization? You’re mapped to a prototype that doesn’t bear any relationship to real silicon. You need activity plus the characterization with enough compute power to run deep, real system modes.
Kulkarni: You need to turn this whole problem on its head. Why do you have to run Facebook versus YouTube versus GPS software on the same processor design? Why not create a Facebook processor rather than running it on a general-purpose processor? People are writing software applications ranging from medical imaging to health care to whatever else you need, and then tuning the hardware to that. And there will be multicore hardware implementations where it makes sense.
Rowen: That’s absolutely the case. One of the fundamental dynamics to emerge is that as power has become so much more important, people have begun to look at power as the ultimate goal and figuring out how everything else serves that goal. If that means you’re going to build a processor around an application, rather than the other way around, you’ll do it if it saves meaningful amounts of power. There are two key elements. We’ve talked about, if you can measure power that can help you make decisions about one processor versus another. The other angle changes the nature of the processor itself. You want processors where what software you run matters to power. That isn’t an obvious characteristic. A lot of people say that as long as every instruction dissipates power then that’s all you have to worry about. All you have to do is go find the one that consumes the most power and you beat down that one as much as possible. But you’re going to spend very little of your time running the worst-case instruction. You’re going to be running a mix of things. And even within your worst-case task, you’re not going to run your worst-case instruction all the time. You need internal mechanisms for clock gating, power gating logic reduction, so the difference between the lowest-power instruction compared with the highest-power instruction is no more than a factor of 10. If you’re running a lightweight mix, that will use an order of magnitude less power than something that does 128 multiplies in a single cycle. By having this big dynamic range you reduce average power and you make software matter. The programmer has implicit or explicit control over what instructions to use, so they can determine how much power to dissipate. You really need to provide people with energy feedback.
Hardee: Having those processor architectures that match the task is highly critical. But you only get a handful of programmers able to use that unless you have the compiler technology to match. You have to be able to automate and not leave it to the individual programmer to choose which instructions to use. The compiler has to be able to compile for performance versus power, just as you are with synthesis constraints in hardware, and it’s going to need to help me through automation to do the right thing.
Kaiser: Yes, we do need feedback. That needs to be there real time, if possible, and it should be better than an ammeter. You need to be able to graph it and correlate it to what’s running in the system, so when software engineers see a spike they need to know. But there’s another issue. Hardware provides a lot of knobs. The guy writing the algorithm is going to use them as little as possible. He will use those settings unless you tell him what those knobs do and why he needs to move them. Software engineers have no reason to change them. If the 128-matrix multiplication works, then they’re done. It’s functional. Power has been an afterthought for years and years.

Next Page »