Posts Tagged ‘Mentor Graphics’

Experts At The Table: Low-Power Verification

Thursday, February 9th, 2012

Low-Power Engineering sat down to discuss the problems of identifying and verifying power issues with Barry Pangrle, solutions architect for low-power design at Mentor Graphics; Krishna Balachandran, director of low-power verification marketing at Synopsys; Kalar Rajendiran, senior director of marketing at eSilicon; Will Ruby, senior director of technical sales and support at Apache Design; and Lauro Rizzatti, general manager of EVE-USA. What follows are excerpts of that conversation.

LPE: What’s the big challenge with verifying power in an SoC?
Ruby: Power has a couple of different components. One is how the low-power techniques impact functionality. If you talk about things like power gating, power supply shutoff, multiple supply voltages and so on, this is where you need to understand certain rules of turning on and off power supplies. You need to be able to create retention cells, to be able to retain state, and to retain functionality. That’s one major aspect. The other side is that you have to look at the power consumption itself. How do you verify that you are on target, if you have a target, and that you are not exceeding a specification? And how do you ensure the design has efficiency built in?
Rajendiran: This is all about trying to verify what your intentions were that you stated in the beginning and making sure that has been implemented, and when the chip comes out, making sure it is functioning that way. In the old days we simply meant functional and timing verification. Now, just on the functional side, it has become so complex that just getting it out the door is a challenge. It’s the same with software. No one thinks about verifying it all. That’s the practical problem. The person who is verifying the power states doesn’t have the time to put in the right hooks. We have the Unified Power Format to help, but we still don’t have standardization as to how you verify the states. Tools rely a lot on naming conventions, but even though there are fewer companies there is still not compatibility in reading all of those things. Tools are always playing catch-up, too. The ideal solution will be a combination of great tools and planning. In addition, you can have the best tools, but if you put them in the wrong hands you don’t get results.
Pangrle: There’s a functional part and a physical part of verification. A lot of what is going on in the industry right now, especially with the power formats and the convergence around UPF and the 1801 IEEE working group, has been to keep the power intent separate from what has been the standard part of functional verification. It’s allowing people to use their standard flow, take it to RTL, and still be able to design RTL blocks that can be used in different design scenarios with different power management. You don’t have to hard-code isolation, level-shifting, retention registers into those blocks. You can still design your block your same way, and if in one design you’re going to power down that block that’s okay because the intent information is in a separate format and you can bring that in. From that standpoint, there has been good collaboration between EDA companies and their customers. From the standpoint of putting it all together and being able to support the tools, one of the things we’re seeing is that as EDA companies work with designers there are times where something is a little different and different vendors have created support. That’s where it gets tougher to move designs from one company’s set of tools to another. It also brings up some new questions. From the physical side, if you’re powering up and down blocks it has a real impact on your power grid and whether it’s going to function. Just because logically it looks as if it should work, that doesn’t mean when you get your chip back from the foundry you’re not going to run into other issues. And in terms of the complexity of testing, you can do the standard ATPG, but when you go through the dynamics of running different voltages and frequencies and bringing things up and taking them down, to what extent are you actually going to test that?
Balachandran: Verification is complex enough without low power, stretching the resources from both a verification productivity standpoint as well as IP cost. When you add low power into the mix, it makes things much worse. The complexity of low-power designs has been going up slowly but steadily. Some companies that are on the cutting edge, particularly in the mobile market, started adopting low-power designs about five or six years ago. They were the frontrunners of the whole low-power wave. They put the initial pressure on low-power verification, because now you have to start thinking about verification differently. You have to start thinking about voltages, multiple supplies, and whether things are going to work in all those conditions. Clock gating is the most basic technique, and almost every company you talk with has been doing clock gating. Now that has expanded into more sophisticated techniques to curb the power, and with that comes the burden to verify properly. All it takes is one unverified state or transition or sequence for the design to completely lock up and not function at all.

LPE: How bad is this problem?
Balachandran: It’s becoming more widespread. There are government regulations and green initiatives. Everything is going green. There are demands on specifications, and even on power for devices connected to the wall. That requires chipmakers to make their designs much more power-efficient. Customers typically start with four or five power domains. Some of that verification can be done with static techniques or with some rudimentary simulation. But it’s becoming more complex, and this complexity is increasing for the mainstream market, not just the mobile market. The number of power domains is exploding. We’ve seen designs with 50 power domains, which is potentially 2^50 power states. It’s pretty much impossible to verify all of them. So you need to come up with a really good test plan. When people are confronted with low-power designs the first time, they have no clue about how to write a testbench for low power. Often they need a lot of methodology help, in addition to having the right tools in place, to figure out what they’re going to do, how they’re going to go about doing it, and how they know when they’re done. Then, what is the measure of confidence they have at the end to figure out if they’re really done?
Rizzatti: From the perspective of emulation, this technology has been used for functional verification. Ten years ago, power management was essentially a gated clock. You turned off and on some part of the chip and saved energy there. Around 2001-2002, designs with 10 or 20 of these were called derived clocks. Today we have customers with 100,000 derived clocks. There’s an explosion. But that’s only one problem. Over the past five years, and especially in the past one or two, there are all these new techniques for turning on and off voltages. We had one customer with well more than 100 power domains. The whole industry is changing. Power management is a nightmare, and it makes SoC verification orders of magnitude more difficult.
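
To put the scale of the power-domain problem in numbers: if each domain can independently be either on or off, the number of possible combinations doubles with every added domain. The short sketch below only illustrates that arithmetic. The two-states-per-domain assumption and the function name are ours, not the panelists'; real designs may allow more states per domain and far fewer legal combinations.

```python
def power_state_upper_bound(num_domains: int, states_per_domain: int = 2) -> int:
    """Upper bound on power-state combinations, assuming every domain
    switches independently (an illustrative simplification)."""
    return states_per_domain ** num_domains


# Five domains (a typical starting point) versus fifty.
few_domains = power_state_upper_bound(5)
many_domains = power_state_upper_bound(50)
print(f"5 domains:  {few_domains} combinations")
print(f"50 domains: {many_domains} combinations")
```

Even if only a small fraction of those combinations is legal, exhaustive simulation is out of reach, which is why a targeted test plan matters.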

LPE: With a disaggregated supply chain and more IP re-use, does it make it more difficult to verify the design? Not all of the IP is fully characterized for power.
Balachandran: UPF, or IEEE 1801, and CPF have ways to model the power intent of IP. The issue isn’t so much the ability to specify the power intent of IP. Talking to all the major customers, everybody is either integrating internal IP or using third-party IP. Some of the IP blocks have their own power management, too. It has to be communicated to the integrator of that SoC as to what are the legal ways to integrate the IP into the SoC. That information has to be passed along. The power format is not the right way to pass that information. So the industry has to work out a way—together—to solve this problem. The IP companies, the EDA companies and the whole ecosystem has to work on this to facilitate communicating the right behavior that IP can be integrated from a power perspective, and to tell the IP integrator when they are doing something wrong. If IP is coming from a third party and you have no idea what is going on with that IP in terms of its inner functionality or how the power is implemented and what ways you can put it together on the block, then you can shoot yourself in the foot pretty quickly. This is a problem that needs to be solved. One potential solution is to create assertions for an IP block. The IP developer doesn’t know how IP is going to be used, but they do know what is legal or not. They can create assertions for that and ship it with the IP. Then, when the integrator puts it into the SoC and runs the verification, they are able to figure out if they’ve done it properly or not. If it’s not, then they can have a dialog with the IP company. It’s a way of communicating the data sheet of the IP to the next-level integrator. This is one way of solving the problem. It requires close collaboration between IP partners and EDA and design services companies.
Rajendiran: More times than not, people don’t do that. There are many ways that tools can help, too. If some expert designed the IP block, he can provide some input and then a tool can insert assertions back into the RTL. Ideally you want to keep it separate as a companion file. That’s one approach. But the problem is more complex than that when it comes to low-power verification. IP is one issue. There is physical IP where you can’t do much because it’s already hard coded. There’s also soft IP. Each of the classes has its own challenges. With the soft IP, a lot of activity only happens at the gate level. Depending on how the RTL gets synthesized and mapped, you can have a perfectly functioning solution when you use a particular library in a particular foundry, and the same thing may not work somewhere else. You need deep knowledge about this stuff. You need collaboration of tools, the integrator and the IP developer to make sure you at least get the product out to market on time.
Ruby: There is another dimension of IP—the power intent side, which is the functional verification aspect. That’s absolutely essential to ensure the functionality. Time and time again, what I’ve come across is the need for some way to describe the power consumption behavior of IP, as well. It could be technology dependent or technology independent. It could be models that describe assumptions as a function of clock frequency or data rates. From my customer perspective, this is also becoming essential in the power verification area because they’re not just worried about functional intent. They’re also worried about hitting their power specs. They need models for the IP coming in. If they plug IP into their design and they run their clock frequency at a certain rate, what power consumption can they expect? That’s another very important element to this verification challenge.
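
The idea of shipping assertions with an IP block, raised by Balachandran above, can be sketched in a few lines. This is only an illustration under stated assumptions: the state names, the table of legal transitions and the function `check_power_sequence` are hypothetical, and a real flow would express such checks in the verification environment's own assertion language rather than in Python.

```python
# Hypothetical "data sheet" from the IP developer: the only power-state
# transitions the block is designed to survive.
LEGAL_TRANSITIONS = {
    ("ON", "RETENTION"),
    ("RETENTION", "ON"),
    ("RETENTION", "OFF"),
    ("OFF", "ON"),
}


def check_power_sequence(states):
    """Return (index, previous, current) for every illegal transition
    found in a recorded sequence of power states."""
    violations = []
    for i in range(1, len(states)):
        prev, cur = states[i - 1], states[i]
        if prev != cur and (prev, cur) not in LEGAL_TRANSITIONS:
            violations.append((i, prev, cur))
    return violations


# The integrator's simulation trace: the last step powers the block off
# without first saving state through RETENTION.
trace = ["ON", "RETENTION", "OFF", "ON", "OFF"]
violations = check_power_sequence(trace)
print(violations)
```

A non-empty result is the cue for the dialog between integrator and IP company that the panel describes.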

Step Away From the Spreadsheet

Thursday, February 9th, 2012

By Ann Steffora Mutschler
Engineers today spend more than a quarter of their time trying to meet power specifications.

A survey of more than 700 engineers by Calypto illustrates just how important and time-consuming power management is today for engineering teams. As consumer devices grow ever more complex, the need to deal with, analyze and optimize power not just at the RTL but at the system level is the next challenge, even if the path to reach that goal is not yet clear.

The opportunities for optimizing a design for power efficiency are greatest at the architectural level of abstraction. The further a design moves downstream the less effective optimization techniques become, noted Yossi Veller, chief scientist for ESL at Mentor Graphics, in a white paper he co-authored for ARM’s IQ Magazine. “Power optimization must begin with architectural analysis, exploration, and optimization of power and timing at the electronic system level (ESL). According to a study by LSI Logic, techniques available at the RTL synthesis phase have the ability to reduce power by 20%; those at the gate level offer a 10% reduction; while those at the layout level can reduce power by only 5%. Waiting until the RTL to begin optimizing for power is a wasted opportunity because power usage can be reduced by 80% at the ESL.”

Fig. 1: The ability to optimize power at the architectural level far exceeds that at lower levels of abstraction.

“Traditional power optimization tools are really working at the lower levels of abstraction,” explained William Ruby, senior director of RTL power product engineering at Apache Design. “If you look at synthesis, if you look at physical design, there are some automated techniques that are available in those tools. But those are in a category of additional refinement-type steps. Once you have the design architecture nailed down, then you can add in some optimizations based on those tools and you can get some additional incremental power savings, but the part that is missing is enabling the true design-for-power efficiency. If you look at modern chip architectures, they are extremely complex and the RTL descriptions of these architectures are even more complex such that RTL in some cases is no longer seen as a viable architectural description language. You want to be able to describe the architecture of the design in a high level of abstraction.”

With this description comes the requirement to be able to analyze power. Today, this is done by synthesizing the design from a high-level description such as C++ down to RTL, and then an RTL power analysis tool can function and give feedback into the architectural domain. But what needs to accompany this synthesis-loop-back type of flow, and give some indication of what the power numbers are, is more intelligence in those high-level tools. They need to point out inefficiencies in a design at both the RTL and architectural levels.

Chris Rowen, CTO and co-founder of Tensilica, sees two big challenges for power analysis tools. “One, it is very, very difficult to isolate where the real problem is. It only makes sense to really measure power at the level when you have really synthesized the logic and laid it out and you actually know what the physical design looks like, because the physical design has a huge impact on what the power dissipation of the circuit is.”

By the time it has gone through synthesis and place and route, you have really very little visibility into what was the original logic being questioned. “It all goes into the Cuisinart and all you get is this amorphous mush of gates at the end. So if someone asks you, ‘How much power is being dissipated in my multiplier versus in my divider versus in my register file,’ I don’t know anymore because I have to process them all together in order to get good physical results. But then it all has been aggressively remapped into other logic forms and I can’t isolate the power easily. So you have to work in rather indirect ways to figure out whether the power was being dissipated in one function versus another.”

A second problem, he said, involves system-level tracking of different scenarios. “It is extremely difficult to reach your power goal if you say, ‘Let me use the worst case assumption about each subsystem. I’m going to assume that every piece of my baseband is on, and every piece of my Layer 2 and Layer 3 protocol stack is on, and my image processor is on, and my apps processor is running full out, and all of my RF subsystems are running,’ because of course you’d exceed your power budget by a factor of two or three. Instead people recognize they’re not all on at the same time, the system doesn’t work that way. When you are doing one thing, then you’re typically not doing something else. Therefore, you only have to look at the particular combination of subsystems that is on at that time. However, the software guys have really poor tools to correlate what’s going on in the higher-level operating modes to what’s going on in terms of actual power dissipation in different subsystems. They are completely shooting in the dark where they do not have anything like the kind of accuracy for the modeling of these things.”
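
The gap between the worst-case assumption and scenario-based budgeting can be shown with a toy calculation. All subsystem names, power figures and scenarios below are invented for illustration; none of them come from Tensilica or from any real device.

```python
# Hypothetical peak power per subsystem, in milliwatts.
PEAK_MW = {
    "baseband": 400,
    "protocol_stack": 150,
    "image_processor": 500,
    "apps_processor": 700,
    "rf": 350,
}

# Hypothetical operating modes and the subsystems active in each.
SCENARIOS = {
    "voice_call": {"baseband", "protocol_stack", "rf"},
    "video_playback": {"image_processor", "apps_processor"},
    "web_browsing": {"baseband", "protocol_stack", "rf", "apps_processor"},
}


def worst_case_mw(peak):
    """Everything on at once: the assumption that overshoots the budget."""
    return sum(peak.values())


def scenario_mw(peak, active):
    """Power for one operating mode: only the active subsystems count."""
    return sum(peak[name] for name in active)


def required_budget_mw(peak, scenarios):
    """The budget only has to cover the most demanding realistic mode."""
    return max(scenario_mw(peak, active) for active in scenarios.values())


everything_on = worst_case_mw(PEAK_MW)
realistic_peak = required_budget_mw(PEAK_MW, SCENARIOS)
print(f"all subsystems on: {everything_on} mW")
print(f"worst real scenario: {realistic_peak} mW")
```

With these made-up numbers the all-on figure is 2,100 mW against 1,600 mW for the heaviest realistic mode. The hard part Rowen points to is not the sum itself but knowing which subsystems are truly active in each software mode.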

As a step towards true system-level power analysis, engineering teams are gradually figuring out that they need to build approximate models of power in addition to simulation environments that are fast enough to run realistic scenarios and to capture real activity. “Ironically getting power information is more than anything else probably a function of getting fast enough simulation, because only if you can run realistic size scenarios will you really gain interesting information,” he said.

This has become one of the big drivers of ESL, which until recently has been relatively slow to catch on. But complexity at advanced nodes, including power considerations, has significantly boosted its appeal.

“What the user would like to have at the very early stages, when he has a TLM model of the design, is at least a relative assessment of what architecture decisions will impact the energy in which direction,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “He will also want to know how the software impacts all of that. From a technology perspective, TLM models allow you to do that so it’s fairly straightforward to annotate power-related data into TLM models,” he asserted.

Annotating models with power data, just like annotating them with performance data, is a challenge and can be approached in three ways:

First, he said, “You can start with your assumptions, with your power budget. TLM models and virtual prototypes allow you to then execute your assumptions so you have in your power envelope/power budget. You say, ‘These tasks should take that much power, I know that from past experience,’ and then you execute your virtual platform with those annotated, estimated data or budgeted data. And you get dynamic results depending on what tasks the software ends up calling, how long a cell phone is used for which task in a day, and so forth.”

Second, annotate back from when you have RTL. “At the RTL level you have these switching formats that you can derive from the RTL to get a good idea about the activity,” Schirrmeister continued.

And third, it can be dealt with at the silicon level by taking previous designs, measuring power information and annotating back into TLM models.
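
Whichever of the three sources supplies the numbers, the mechanics of executing annotated data against a usage profile are similar. The sketch below is a deliberately simplified stand-in for a virtual platform: the task names, power values and daily usage hours are hypothetical, and the annotation table could equally hold budgeted, RTL-derived or silicon-measured figures.

```python
# Hypothetical annotated average power per task, in milliwatts.
# These could be budgets, RTL-derived estimates or silicon measurements.
TASK_POWER_MW = {
    "standby": 5,
    "voice_call": 900,
    "video_playback": 1200,
}

# Hypothetical usage profile: hours spent in each task per day.
USAGE_HOURS = {
    "standby": 20,
    "voice_call": 1,
    "video_playback": 3,
}


def energy_by_task_mwh(power_mw, hours):
    """Energy per task in milliwatt-hours for one day of use."""
    return {task: power_mw[task] * h for task, h in hours.items()}


def daily_energy_mwh(power_mw, hours):
    """Total energy for the day: the sum of power x time over all tasks."""
    return sum(energy_by_task_mwh(power_mw, hours).values())


breakdown = energy_by_task_mwh(TASK_POWER_MW, USAGE_HOURS)
total = daily_energy_mwh(TASK_POWER_MW, USAGE_HOURS)
print(breakdown)
print(f"total: {total} mWh per day")
```

Swapping in a different annotation table or usage profile changes the result without touching the model, which is the appeal of separating the estimates from the scenario.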

Design engineers are undoubtedly looking for analysis and optimization at the system level so they can do power analysis and power estimation before RTL is available and before they can do gate-level simulations. But are they truly ready to adopt it?

Achim Nohl, technical marketing manager for Synopsys’ solutions group, pointed out that today, power analysis starts with gate-level simulation. “If you talk to a hardware engineer and tell him, ‘We are going to employ virtual prototyping and high-level models to do power analysis,’ he will certainly look at you a little strange because he thinks, ‘I’m doing all those back-end optimizations and all those specific things to optimize power. How will you ever be able to reflect that in a virtual prototype simulation?’ But that’s not the point. For virtual prototyping, the granularity of a system is very much different. You’re not looking at just the memory controller. You’re looking at the CPU with the memory controller, the buses, the interconnect, the peripherals and how all those things are orchestrated to find out where the different hot spots are and what is the best way to program all those pieces. What is the best scheduling technique? That is the concern at that level.”

When a new chip is architected today, estimates are done to determine whether the chip is feasible at all from a power perspective, he said. “Today, people are using spreadsheets in order to do this analysis, and this can only be a worst case analysis because they don’t know the dynamics and can’t reflect the dynamics of the system in those spreadsheets.”

While the pure architectural level tools don’t exist yet, many users are likely content with high-level synthesis tools for the time being. Apache’s Ruby believes they are good in their own respects but they are not actually meant to give architectural guidance; they are just meant to synthesize the design above the RTL.

One final thought for nervous system architects: The architectural tools of the near future will not replace the actual architect unless they become truly artificial intelligence, which is not likely to happen any time soon, Ruby concluded.

Margin Of Error

Thursday, February 9th, 2012

By Ed Sperling
Adding extra circuits and silicon area to a chip has always been frowned upon by chipmakers. Extra silicon means extra money, and for most chips the least expensive is always the better choice. But at advanced process nodes, margin also can slow performance, increase power consumption, and make it harder to achieve timing closure.

The obvious solution is to reduce margin throughout the design, but the reality is that margin budgets for a complex SoC will never go down. The best that design teams can hope for, in fact, is to keep margin constant from node to node and across stacked configurations. While this will require constant vigilance on the part of architects, it also will increase challenges from the conceptual stages of the design all the way to achieving acceptable yields in manufacturing.

What can’t be fixed
In some cases excess margin is out of reach of design teams. With more and more third-party IP now included in designs—and as much as 90% of the design now a combination of third-party and re-used IP—it’s difficult to even get a firm handle on the amount of guard-banding being done. So far, this hasn’t been a problem because most of the industry still isn’t producing 28nm chips in volume.

“Right now it’s only really a worry for the ‘star-IP,’ because if my USB controller is a bit bigger and more power hungry than it might be, it is still peanuts compared with the overall platform figures,” said one architect at a large chip company, who spoke on condition that he not be named. “Even the sum of the power of all the little things doesn’t approach the star-IP. And here’s a thing about the star-IP: It may be big and power-hungry, but there’s still a case for it. Some IP has a well-defined job to do and has to get that job done as efficiently as possible. But with star-IP, it’s mainly ‘faster is better.’ So sure your Web browser would be more power- and area-efficient on a Cortex-A8 than a Cortex-A9, but I bet you’d rather buy the A9-based tablet.”

Those kinds of choices, as well as time-to-market pressures where IP can be re-used quickly, make guard-banding almost inevitable. What’s surprising is not that it still exists, but that it has remained relatively constant given the explosion in the number of components on an SoC.

Where margin matters most
But margin still causes signal propagation issues because there is more silicon and more wires that signals need to be driven through. That, in turn, leads to the need for wider buses.

“When you guard band you need to ratchet up the intended operating frequencies and increase the clock frequency,” said Neil Hand, group marketing director for Cadence’s SoC Realization Group. “All challenges are made worse. In some parts of the design there is no impact. If you have a low-speed peripheral you probably don’t need to worry about it. But with something like high-performance PCI Express, gen 3, you have fast protocols and huge pipes and margin becomes a critical issue. You have a hard time meeting closure even with no margin. Margin makes it worse.”

He said the key is not so much reducing the percentage of guard banding. The rate has been relatively constant, with about 20% margin at 65nm and 90nm, and at least 15% at 28nm and 20nm.

“With that number there’s a lot more slack,” he noted. “You need to know where the slack is and where it’s going to impact the design. Where you do have room to move it may drive different IP use. There may be better IP externally.”

He’s not alone in that view. In fact, all of the Big Three EDA vendors are counting on the need to trim margin to boost their IP sales over internally developed IP blocks.

“There are a lot of challenges working with 28/20nm because of the variability in processes,” said Navraj Nandra, senior director of marketing in Synopsys’ Analog and Mixed Signal IP Solutions Group. “Reducing margin makes a difference for getting performance out of analog. You also want to be competitive in price-performance-area. The question is how much margin you can accept in IP to meet those goals but not compromise on yield or variability.”

This becomes a difficult engineering tradeoff, however. Do you design IP for a specific chip, or do you add enough margin to allow it to easily plug into other designs? For commercial IP, the answer is clearly versatility, but there is a cost to that flexibility.

“You can’t be competitive and have slop in the design, but you can’t build something so competitive that it will only work for one design,” Nandra said. “It’s like a drag car where you run it for a half mile and then you have to replace the engine, the tires, and add more nitrous oxide. You can do the same for super high-performance chips for one temperature range and one process, but it’s useless for anything else. The goal is to build in enough circuit techniques with just enough margin not to risk performance problems if there is variability in the process.”

Manufacturability
Process variability has become particularly troublesome at advanced nodes. Coupled with double patterning at 20nm, and the likelihood of triple patterning at 14nm, margin takes on entirely new dimensions.

“We’re trying to characterize process corners and design around a nominal target,” said Jean-Marie Brunet, director of product marketing for model-based DFM and place and route integration at Mentor Graphics. “Third-party integration is a real challenge. Fill used to be a simple process where you insert it at every layer. But you don’t know what is in the IP these days, so fill has to be re-done. That doesn’t help with the integrity of the IP.”

He said that for most IP, there usually is guard-banding on the periphery of the IP to deal with fill. That impacts timing, area and performance.

“This is really an issue for the big chip companies that do 300 to 400 tapeouts a year, not for the microprocessor houses that can take their time to eliminate margin. The problem is there is no magic bullet for everyone else. And when we get into double patterning, this is really going to be an issue because you’re overlaying two masks, and any shift of the overlay will have a dramatic impact on the chip.”

The future
While pressure to reduce guard banding will continue, there is at least some hope for dealing with the problem more effectively. One approach involves new materials, such as graphene and silicon on insulator, which help reduce power, and new structures such as finFETs and carbon nanotube FETs, which minimize the effects of leakage and thereby make up for some of the power drawn by the extra margin.

A second approach is better tools. Knowing what the variability is in a process allows engineers to design in a minimum amount of margin. Building more accurate models can help, particularly in conjunction with analysis tools for exploring one IP block versus another.

And finally, stacked die will alleviate at least some concerns because portions such as analog can be developed at older nodes where they make more sense, rather than trying to fit everything into the latest process node.

Mechanical Meets Electrical

Thursday, February 9th, 2012

By Ed Sperling
For the first half of the 20th century mechanical engineering dominated almost everything in technology. For the second half, once the transistor and the integrated circuit became well entrenched, mechanical and electrical engineering largely divided up the tech market.

More recently, however, they are being forced to collaborate in teams that historically had nothing in common. While the combination of electrical engineering with software has raised questions about how to trade information back and forth, mechanical and electrical engineering arguably are even further apart. But there is at least one consistent element throughout the most recent combination—power.

Power and heat
One of the biggest changes in engineering is that power is global. Physical effects such as heat, electrostatic discharge and leakage current can affect many other levels of a much larger system. That larger system could be a car, an airplane, or a data center.

“Inside of an engineering organization, someone near the top has to worry about the entire system,” said Larry Williams, director of product management for the electronics business unit of Ansys. “They have to think about boundaries between systems and subsystems, or between mechanical engineering and electrical engineering, because many firms are organized that way. When building a system, the optimum design can be found by considering the system as a whole, and additional margin is often found at those boundaries.”

He said that at a meeting within one defense contractor, he actually introduced the mechanical and electrical engineering teams, who had never met even though they worked on the same projects. Those silos have since begun breaking down, in part because systems demand power efficiency, better reliability and lower cost. Things that used to be done as purely mechanical engineering may be mixed together as part of a bigger system.

But the perspective of each is different. Consider thermal budgets, for example. Electrical engineers focus on turning off as much of a chip as possible when it’s not in use, and running what’s in use as efficiently as possible—even going so far as to weigh whether specific operations use less energy when they’re run at maximum speed for short periods of time or slower speeds over longer periods of time. Mechanical engineers, meanwhile, focus in the other direction—cooling the devices as close to the heat generation as possible. In the past, that meant simply drilling holes into metal and adding heat sinks and fans.
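
The question of whether an operation uses less energy run fast and then idled, or run slowly for longer, can be framed with a first-order model. The sketch assumes the textbook approximation that dynamic power scales as C·V²·f, plus a fixed leakage term while the logic is powered and a small idle term afterward. Every number below is invented for illustration, and real silicon behaves less tidily.

```python
def task_energy_joules(cycles, freq_hz, vdd, c_eff, leak_w, idle_w, deadline_s):
    """Energy to finish `cycles` of work before `deadline_s`, then idle.

    Assumes dynamic power = c_eff * vdd^2 * freq_hz (first-order model),
    constant leakage `leak_w` while active and `idle_w` once gated.
    """
    active_s = cycles / freq_hz
    if active_s > deadline_s:
        raise ValueError("task misses its deadline at this frequency")
    dynamic_w = c_eff * vdd ** 2 * freq_hz
    return (dynamic_w + leak_w) * active_s + idle_w * (deadline_s - active_s)


# Hypothetical workload: 1e9 cycles, 2-second deadline, 1 nF switched capacitance.
WORK = dict(cycles=1e9, c_eff=1e-9, idle_w=0.01, deadline_s=2.0)

# Low-leakage process: running slowly at reduced voltage wins.
fast_low_leak = task_energy_joules(freq_hz=1e9, vdd=1.0, leak_w=0.2, **WORK)
slow_low_leak = task_energy_joules(freq_hz=0.5e9, vdd=0.8, leak_w=0.15, **WORK)

# High-leakage process: finishing early and shutting off wins.
fast_high_leak = task_energy_joules(freq_hz=1e9, vdd=1.0, leak_w=0.6, **WORK)
slow_high_leak = task_energy_joules(freq_hz=0.5e9, vdd=0.8, leak_w=0.5, **WORK)

print(f"low leakage:  fast {fast_low_leak:.2f} J, slow {slow_low_leak:.2f} J")
print(f"high leakage: fast {fast_high_leak:.2f} J, slow {slow_high_leak:.2f} J")
```

Under these assumptions the answer flips with the leakage figure alone, which is why the choice has to be weighed per design rather than settled by rule.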

“As density has increased it is no longer possible to thermally manage a device around the PCB,” said Robin Bornoff, FloTherm product marketing manager in Mentor Graphics’ Mechanical Analysis Division. “It’s gotten to the point where the mechanical perspective cannot be a separate discipline. We’re now seeing representatives of the thermal design teams showing up right from the beginning in meetings with the system architect. They have to work together.”

That discussion becomes even more critical in 3D stacking, where heat can get trapped between two die. And it’s not just the stacked die that needs to be considered. It’s what’s around it, as well.

“Heat doesn’t obey existing design discipline barriers,” said Bornoff. “The heat will spread into the air, the chassis, the room, and out from there. How hot the silicon gets affects everything, sometimes even outside the building. That’s why you’re starting to see water-cooling in space applications and in data centers. It’s 1,000 times denser than air and 1,000 times better at removing heat.”

The challenge is to get that cooling as close to the source of heat as possible. So rather than just cooling a server cabinet, for example, the liquid is pumped around the processors producing the heat. There is even research under way in microfluidics to pump liquid around the chip itself in a stacked die. Bornoff noted that initial approaches tried to squeeze the fluid through very narrow channels, which required massive pressure. He said the latest research uses piezoelectric fans and pumps, whereby vibration creates movement in the fluid.

Fig. 1: Microfluidics. (Source: Imperial College London)

MEMS and energy harvesting
Another confluence of mechanical and electrical engineering skills has been the MEMS (microelectromechanical systems) world, which is growing in importance in markets ranging from touch screens to smart sensors and analog signal conditioners. There are even micromotors with gears attached to semiconductors.

“Electronics is relatively young compared to mechanical engineering,” said Cary Chin, director of technical marketing for low-power solutions at Synopsys. “But the next big rev of the market is pointed toward electromechanical systems. A lot of these are being looked at for technologies that will start to solve the power problem. With a mechanical system there is no leakage. And for devices that don’t require a really high level of performance, they may be able to power a system forever.”

Think about biomedical devices such as a pacemaker, for example. An energy scavenging system that includes semiconductor technology and mechanical energy harvesting can be used to provide enough power just from a person’s own heartbeat to keep a steady pace, detect when there is an irregularity, and even act as a defibrillator for one or two stored duty cycles.

Fig. 2: Mini motors. (Source: Sandia National Laboratories)

“The challenge for the tools world will be to rethink optimization,” said Chin. “With power we already had to make significant changes for implementation and verification. Now what we may be looking at is support electronics, where the heavy lifting of computing is moved into the cloud.”

The future
The so-called Internet of Things is another big driver in this shift to fuse electrical and mechanical engineering. Within this scheme, systems will be defined as much collectively as individually, much as they are from subsystem to system, with the computing itself distributed along the lines of the Internet.

There will be many places where mechanical and electrical engineering cross paths, many driven by power, heat, signal integrity and new applications that are just now on the drawing board. That will also create new opportunities for tools that can explore tradeoffs of something done mechanically versus electrically, just as those types of tradeoffs are now made for the best kind of IP and processor cores within a given power budget. And as the silos break down, the possibilities are mind-boggling.

State Of The Art In Solid State Lighting Thermal Design

Thursday, February 9th, 2012

Unlike incandescent lighting, which relies on heat to make a filament glow and produce light as a hot black body, light emitting diodes (LEDs) are semiconductors and as such must be kept cool. When LEDs produce light, heat is a by-product. Heat generated in an LED increases its temperature. As the LED’s temperature increases, the light output decreases, the light changes color, and the lifetime of the LED shortens. Temperature adversely affects both the functional performance of the LED and its longevity. As a consequence, thermal management has become the predominant issue in solid state lighting (SSL) design.

To download this white paper, click here.

Low-Power Verification

Wednesday, February 8th, 2012

Low-Power Engineering talks about how to verify the power portion of semiconductor designs with Krishna Balachandran of Synopsys; Barry Pangrle of Mentor Graphics; Kalar Rajendiran of eSilicon; Will Ruby of Apache Design; and Lauro Rizzatti of EVE-USA.


Experts At The Table: Making Software More Energy-Efficient

Friday, January 27th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design; and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How much of the battery drain on a smart phone is caused by the hardware, how much is caused by the software, and how much is caused by bad reception?
Kaiser: Software controls a lot of it. Bad hardware that does not allow you to turn something off is one cause. But that doesn’t happen as often as bad software. If the hardware has one clock that turns everything off then you have a problem because whenever you want to use one little block you have to turn on five. But with software you have to give engineers feedback and tell them what knobs to turn. Ideally, you even give them an algorithm for how to tweak those knobs. We tried to do this with Nucleus. The drivers automatically manage their own power for WiFi or anything else. If no one opens the driver it won’t burn power. If you can lower power, don’t worry about the rest of the OS. Just minimize dynamically. You can set up limits for the driver. Then the application guy just needs to be able to allow the device to turn on. You need to give people simple metrics like CPU utilization. And if you give metrics on how much power your CPU is using while idle and how much it’s using when it’s busy, you can tell how much your CPU is using. Then, if you lower the frequency to half and the CPU is twice as busy, it’s actually burning more power. The compiler needs to do the job.
Rowen: The compiler can do a good job of the lower level things, but the choice of algorithms and which states you’re going to transition among is way beyond what the compiler has any access to. I recently saw a study of the number of states that a cell phone goes through. Something like 38 messages had to go back and forth between the software running on the phone and what was going on in the base station that were basically a negotiation as the phone entered a cell. There are some very tough and complex tradeoffs to make about whether you want to save power at one level by doing fewer transactions or you want to be aggressive and get the negotiation done as quickly as possible because it allows you to get into the lower power state as quickly as possible. There are some non-obvious tradeoffs at work at the system level because you have to determine if the phone is in a low-power or high-power state. They’re not things that you’re going to work out between Microsoft and Nokia. It’s going to be between Nokia and AT&T.
Kaiser: Does it matter? How often do you associate with a particular cell station? It affects standby time, but standby time is already pretty long. Does it really matter if you optimize that case, or do you care about other cases? How much of your battery went into this handshake?
Rowen: With the scenarios I’ve seen it could matter a lot.
Hardee: If you change the data arrival rate to those processes that are rendering Web pages, it’s a big difference. You could be running your graphics processors continually just because you have a slow data arrival rate, as opposed to processing everything and shutting down. It would be difficult for the software guys to optimize for those cases. What they can optimize for is how predictable stuff is. Can you do predictive scheduling? That changes what the application is doing. Those decisions are set pretty low down in the software stack, but what’s available to use and how effectively it can be used is another thing the software engineer has to think about.
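Kaiser’s point about idle and busy metrics can be illustrated with a back-of-the-envelope model. The sketch below is not from the discussion. It assumes a hypothetical two-state CPU whose active power splits into a frequency-independent part and a frequency-dependent part, with the supply voltage held fixed, and every number in it is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuPowerModel:
    """Hypothetical two-state CPU power model (all numbers are assumptions)."""
    idle_mw: float      # power while the core is idle
    static_mw: float    # frequency-independent power while the core is active
    dynamic_mw: float   # frequency-dependent power at full clock speed

    def average_mw(self, utilization: float, freq_scale: float = 1.0) -> float:
        """Average power for a fixed workload.

        `utilization` is the busy fraction measured at full clock speed.
        Running at `freq_scale` (0 < freq_scale <= 1) stretches the busy
        time by 1/freq_scale and scales only the dynamic term, because
        the voltage is assumed to stay fixed.
        """
        busy = utilization / freq_scale
        if not 0.0 <= busy <= 1.0:
            raise ValueError("workload does not fit at this clock speed")
        active_mw = self.static_mw + self.dynamic_mw * freq_scale
        return busy * active_mw + (1.0 - busy) * self.idle_mw


# Illustrative values only: 30% busy at full speed, 60% busy at half speed.
model = CpuPowerModel(idle_mw=20.0, static_mw=120.0, dynamic_mw=400.0)
full = model.average_mw(utilization=0.30)
half = model.average_mw(utilization=0.30, freq_scale=0.5)
print(f"full speed: {full:.1f} mW, half speed: {half:.1f} mW")
```

Under these assumptions the half-speed case averages more power than the full-speed case, because the core spends twice as long paying the frequency-independent active cost instead of sitting idle. If voltage were lowered along with frequency, the comparison could come out the other way.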

LPE: How much of this information is making its way between hardware and software teams?
Kulkarni: That’s where virtual platforms come in. A co-simulation platform is a better description. But the marriage of the software with the hardware and how we capture that in instrumentation then can be driven toward a meter, which may be RTL power, a hardware description. But it all has to convert into power analysis at the end of the day. The feedback can be given to the system designer and the software designer, but all those things are missing. What Carbon is doing is an important step toward that. You can do the power analysis and get that feedback. We have to look at the application over time, and the feedback has to be in real time. In one of our customer applications for digital TV, they asked us if your eyes are looking at the oval in the middle of the screen can you turn off the power at the edges. They’re looking at pixel-by-pixel power control. This is real-time feedback of hardware and software applications.
Kaiser: You can re-encode movies based upon brightness. If it’s pretty dark, you can show it with much lower backlight. The backlight can vary and the screen looks the same. And it can vary by region. That’s beyond the scope of hardware. It’s algorithms.
Kulkarni: This customer is looking for software energy-reducing concepts. They want to know where their software is consuming more power.
Kaiser: They want the drivers. And if you’re going to be varying the CPU, then you also need to provide the compiler.
Rowen: Depending on what level in the system you’re talking about, the hardware has always provided the software. We’re doing a lot of advanced baseband design. The next thing after the industry specification that you do is make it happen in 150 milliwatts at 300 Mbits per second. That drives all the subsequent design, including the choice of algorithms, the processors, the allocation of memory and the interconnect. They’re all driven within a power budget. Everyone working at layer one knows the power. This very tight hardware-software co-design is very established there. It starts to loosen up as you go up, in part because you’re aggregating these much more complex systems together.
Neifert: That’s where it’s missing. The power is really a system context. Five or six years ago I started getting inquiries from leading-edge customers. A couple years later it was leading-edge research groups. About two years ago it made it out of research, and now about 30% or 40% of our customers are doing this in some way. It’s of great importance now.
Hardee: We all tend to gravitate toward the simulation model or the virtual platform’s ability to do power estimation. That’s not actually the low-hanging fruit, though. The thing that can be done relatively simply is system integration testing of power management software. Can you switch the mains on and off? Is it idle when you think it’s idle? That’s a lot lower-hanging fruit in a SystemC TLM 2.0 modeling environment than in power estimation. For power estimation, we have a ways to go even in the activity formats used. You have to use averaging formats over defined windows. These all apply at the signal level. How do we bring them up to the TLM 2.0 level to make them run faster? That can be an issue. There are circumstances where you can say you have an AXI protocol and 64 bits, and you can do the math to get from signal level to architectural level. But then you look at all the architectural differences that start to become nuances in that model, like whether you’re doing split transactions and how bus transactions are being pipelined. Is that being correctly modeled in the platform? There’s a lot of complication. Even to get relative accuracy you will need to model this.
Rowen: We’ve gone up halfway between this signal and toggle level and TLM. Processors are nicely defined. What we’ve done is to automatically derive instruction-execution-level energy models so we can, as part of the initial instruction set characterization, come up with a pretty good energy model per execution. It’s still data independent, but there’s a summary number. The simulator knows how to count things like memory references. Then the whole processor plus memory subsystem has very accurate relative and kind of accurate absolute energy at a level that runs at the full speed of a fast simulator, not at RTL speed. Therefore you can start to make that a building block within a transaction-level approach. That’s one of the pieces of raising energy in abstraction and getting past the toggle.
Neifert: You start doing toggles and you slow everything down. You may use the toggles as an instrument for calibration, and then you go back and put that in and say, when I do this I take this much power per cycle. Then you can start aggregating some of those numbers to at least get a relative figure.
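The instruction-level approach Rowen and Neifert describe, counting events in a fast simulator and multiplying by calibrated per-event energy, can be sketched roughly as follows. This is a minimal illustration, not any vendor’s actual model: the event names, the picojoule figures and both traces are hypothetical.

```python
from collections import Counter

# Hypothetical per-event energy costs in picojoules. In practice these would
# come from calibrating against a toggle-level (gate or RTL) power run.
ENERGY_PJ = {
    "alu": 5.0,
    "mul": 20.0,
    "load": 35.0,
    "store": 30.0,
    "branch": 8.0,
}


def estimate_energy_pj(trace, energy_table=ENERGY_PJ):
    """Sum data-independent energy over a trace of executed event kinds."""
    counts = Counter(trace)
    unknown = set(counts) - set(energy_table)
    if unknown:
        raise KeyError(f"no energy figure for: {sorted(unknown)}")
    return sum(energy_table[kind] * n for kind, n in counts.items())


def relative_cost(trace_a, trace_b):
    """Ratio of estimated energies; relative accuracy is the goal here."""
    return estimate_energy_pj(trace_a) / estimate_energy_pj(trace_b)


# Two made-up traces for the same task: one touches memory on every
# multiply, the other keeps most operands in registers.
naive = ["load", "mul", "store"] * 100 + ["branch"] * 100
tuned = ["load"] * 25 + ["mul"] * 100 + ["store"] * 25 + ["branch"] * 25
print(estimate_energy_pj(naive), estimate_energy_pj(tuned))
print(f"naive / tuned = {relative_cost(naive, tuned):.2f}")
```

Because the model only counts events, it runs at simulator speed rather than RTL speed. The price is that it ignores data-dependent switching, which is why the ratio between two runs is more trustworthy than either absolute figure.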

Experts At The Table: Making Software More Energy-Efficient

Friday, January 20th, 2012

By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design; and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How do you get around the fact that there isn’t enough information available to the software team?
Kaiser: You could profile.
Neifert: You can profile to a point. A high-level virtual platform is too abstracted. It doesn’t have the concept of cycles in there. You need a level of accuracy with sufficient instrumentation. Often it’s not just doing any one single task. It’s what happens when these tasks intersect. What happens when you’re watching a video on your phone, talking to another person and someone calls on another line? It’s power drain happening at the same time, and you don’t necessarily test that from a system perspective.

LPE: Where are we starting to see the most energy consumed by software? Is it at the embedded level or further up the stack?
Kaiser: It can be anywhere.
Rowen: I don’t know how you can really separate power dissipation of software from hardware. You have to look at different subsystems. What’s the baseband subsystem doing versus the imaging subsystem versus the audio subsystem versus the graphics subsystem? Each one of those is going to be some compound of hardware and software issues. You can look at the independent worst-case scenario for each of them. Your first job is to make sure you have good characterization of what’s going on in each subsystem. Then you get to these interesting interactions. If you’re playing back a video you know you’re not doing maximum download on your wireless connection. Or when you’re recording a video, you know that something else is not happening. People have been forced to move from simple subsystem-by-subsystem worst-case analysis to looking at the whole interaction. It’s largely because they can get to a smaller worst-case number than if they didn’t consider scenarios.
Kulkarni: We found that with worst-case scenarios it’s easier to manage the power and to do hardware-software co-simulation, but within the subsystem itself there are so many different modes of operation that co-simulation gets even more interesting. It’s one level of power or energy reduction if you shut off the subsystem, but below that level how do you optimize that subsystem? You’re running software applications, which have to be co-simulated with the hardware. Relative accuracy becomes critical, although not necessarily the absolute accuracy. So how do you generate the testbench? How do you create power patterns? Selecting the critical energy-consuming patterns becomes a challenge. It’s one thing to model or create instrumentation for your software application, but you need a meaningful set of vectors for power consumption. Most of the functional testbenches are useless from a power consumption point of view. Looking at finite-state machines gets to be more and more critical. From the software application, how do you translate that into finite state machines that are control registers, which will then be translated into RTL? And then software is managing all of that. With one mobile phone application we worked with three different vendors for IP models, SystemC, an OSCI simulator and then a five-minute talk time. It would have taken about three months if the customer had not created higher-level models and energy-consuming signals out of that whole environment running together.
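Rowen’s observation that scenario analysis yields a smaller worst-case number than subsystem-by-subsystem analysis comes down to simple arithmetic. The sketch below uses invented subsystem names and milliwatt figures purely to show the difference between the two ways of adding things up.

```python
# Hypothetical peak power per subsystem in milliwatts, by use case.
# A missing entry means the subsystem is assumed off in that scenario.
SCENARIOS = {
    "video_playback": {"baseband": 150, "imaging": 0, "audio": 60, "graphics": 400},
    "video_record":   {"baseband": 50, "imaging": 450, "audio": 60, "graphics": 200},
    "max_download":   {"baseband": 600, "imaging": 0, "audio": 0, "graphics": 100},
}


def independent_worst_case(scenarios):
    """Sum each subsystem's own peak, as if all peaks coincided."""
    subsystems = {name for loads in scenarios.values() for name in loads}
    return sum(
        max(loads.get(name, 0) for loads in scenarios.values())
        for name in subsystems
    )


def scenario_worst_case(scenarios):
    """Return (name, total) for the heaviest scenario in the table."""
    totals = {name: sum(loads.values()) for name, loads in scenarios.items()}
    worst = max(totals, key=totals.get)
    return worst, totals[worst]


print("independent worst case:", independent_worst_case(SCENARIOS), "mW")
print("scenario worst case:", scenario_worst_case(SCENARIOS))
```

With these made-up numbers, summing the independent peaks gives 1,510 mW, while the heaviest single scenario is 760 mW. The gap is the budget that a design team would otherwise spend guarding against a combination that, in this table, never happens.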

LPE: So the hardware guys are worried about power, but the software guys aren’t even thinking about it. How do we change that? Is putting up an ammeter enough?
Neifert: The ammeter is certainly a start. It’s a lot better than what they have today on the software side. We always have talked about concurrent engineering, and more and more processes get applied to that. This is just the next application. The first key is to provide a mechanism, then leverage it across everything and make sure that mechanism is as accurate as possible. But even a relative number is essential. Does this setting take 20% more? Give engineers good tools and they’ll figure out how to apply them.
Rowen: It’s the same as with a video game. If you give someone real-time feedback on the effect of what they’re doing, and they know what the green zone is doing versus the red zone, they’re amazingly effective at getting the needle down into the green zone and keeping it there.
Hardee: It’s not just optimization, especially at the higher levels of the stack. It’s more a case of, ‘The power consumption in this model is worse than the previous model. Something is wrong with the software. Fix it.’ And then the software guy goes off and finds the routine that is polling the modem way more often than is necessary and preventing it from going into sleep mode. It’s those gross errors that are being found when something goes wrong. Are we using the right capability, the right parallelism, the right pipelining or the right architectural facet of the platform to run the right piece of software? Those decisions are usually way down the stack. That’s something the operating system and the drivers have to understand for the various system calls that are going on. When you get into those lower levels of software, you need an accurate model of the platform that can start to tell you the energy usage you’ll get with those various selections to optimize further down the stack. With the application, it’s as simple as looking for what’s keeping something on when it should be off. For true optimization, you’re looking lower down the stack. You could hit the same problem with FPGA prototypes. You can run a decent portion of real-time and you can run some vectors, but what’s your characterization? You’re mapped to a prototype that doesn’t bear any relationship to real silicon. You need activity plus the characterization with enough compute power to run deep, real system modes.
Kulkarni: You need to turn this whole problem on its head. Why do you have to run Facebook versus YouTube versus GPS software on the same processor design? Why not create a Facebook processor rather than running it on a general-purpose processor? People are writing software applications ranging from medical imaging to health care to whatever else you need, and then tuning the hardware to that. And there will be multicore hardware implementations where it makes sense.
Rowen: That’s absolutely the case. One of the fundamental dynamics to emerge is that as power has become so much more important, people have begun to look at power as the ultimate goal and to figure out how everything else serves that goal. If that means you’re going to build a processor around an application, rather than the other way around, you’ll do it if it saves meaningful amounts of power. There are two key elements. We’ve talked about how, if you can measure power, that can help you make decisions about one processor versus another. The other angle changes the nature of the processor itself. You want processors where what software you run matters to power. That isn’t an obvious characteristic. A lot of people say that as long as every instruction dissipates power then that’s all you have to worry about. All you have to do is go find the one that consumes the most power and you beat down that one as much as possible. But you’re going to spend very little of your time running the worst-case instruction. You’re going to be running a mix of things. And even within your worst-case task, you’re not going to run your worst-case instruction all the time. You need internal mechanisms for clock gating, power gating and logic reduction, so the difference between the lowest-power instruction compared with the highest-power instruction is no more than a factor of 10. If you’re running a lightweight mix, that will use an order of magnitude less power than something that does 128 multiplies in a single cycle. By having this big dynamic range you reduce average power and you make software matter. The programmer has implicit or explicit control over what instructions to use, so they can determine how much power to dissipate. You really need to provide people with energy feedback.
Hardee: Having those processor architectures that match the task is highly critical. But you only get a handful of programmers able to use that unless you have the compiler technology to match. You have to be able to automate and not leave it to the individual programmer to choose which instructions to use. The compiler has to be able to compile for performance versus power, just as you are with synthesis constraints in hardware, and it’s going to need to help me through automation to do the right thing.
Kaiser: Yes, we do need feedback. That needs to be there real time, if possible, and it should be better than an ammeter. You need to be able to graph it and correlate it to what’s running in the system, so when software engineers see a spike they need to know. But there’s another issue. Hardware provides a lot of knobs. The guy writing the algorithm is going to use them as little as possible. He will use those settings unless you tell him what those knobs do and why he needs to move them. Software engineers have no reason to change them. If the 128-matrix multiplication works, then they’re done. It’s functional. Power has been an afterthought for years and years.
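Kaiser’s request to graph power and correlate it with what is running in the system could look something like the sketch below. It assumes two hypothetical inputs that the discussion does not specify: a list of evenly spaced power samples and a scheduler trace recording when each task started. The attribution is approximate, since each sample is charged in full to whichever task was running at that instant.

```python
from bisect import bisect_right
from collections import defaultdict


def energy_by_task(samples, schedule):
    """Attribute sampled power to whichever task was running.

    samples:  list of (time_s, power_mw), taken at a fixed interval.
    schedule: list of (start_time_s, task_name), sorted by start time.
    Returns a dict of task name -> energy in millijoules.
    """
    if len(samples) < 2:
        return {}
    dt = samples[1][0] - samples[0][0]          # assumed uniform sampling
    starts = [start for start, _ in schedule]
    energy = defaultdict(float)
    for t, power_mw in samples:
        idx = bisect_right(starts, t) - 1       # last task started at or before t
        task = schedule[idx][1] if idx >= 0 else "unknown"
        energy[task] += power_mw * dt           # mW * s = mJ
    return dict(energy)


# Made-up trace: a power spike that lines up with one task.
samples = [(0.0, 20.0), (0.5, 20.0), (1.0, 300.0),
           (1.5, 320.0), (2.0, 25.0), (2.5, 20.0)]
schedule = [(0.0, "idle"), (1.0, "matrix_multiply"), (2.0, "idle")]

report = energy_by_task(samples, schedule)
for task, millijoules in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{task}: {millijoules:.1f} mJ")
```

A per-task breakdown like this is what turns a spike on an ammeter into something a software engineer can act on, because it names the code that was running when the spike occurred.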

Characterizing PLL Jitter From Power Supply Fluctuation Using Mixed-Signal

Thursday, January 12th, 2012

Characterizing PLL jitter is important yet challenging. It is usually done through transistor-level transient analysis, and slow simulation speed has been the major bottleneck preventing jitter from being characterized in a timely manner. This paper presents an approach for fast jitter characterization using mixed-signal simulation (a combination of transistor-level blocks and calibrated behavioral models). Among various PLL jitter mechanisms, jitter from CMOS gate switching threshold variation due to power supply fluctuation is chosen to be the focus. Analog/digital converters carrying dynamic power supply dependency, together with behavioral models written in Verilog-AMS, are used to approximately model and characterize the targeted type of jitter. Jitter characterization using this method is applied to two PLL blocks, the phase detector and the frequency divider. Results show that jitter measured with the proposed method is in good agreement with transistor-level simulation and that the speed improvement from mixed-signal simulation is significant, proving this method to be a feasible approach for fast jitter characterization. Keywords: PLL jitter, power supply fluctuation, gate switching threshold variation, analog/digital converter, dynamic power supply dependency.

To download this white paper, click here.

Rethinking Good Enough

Thursday, January 12th, 2012

By Ed Sperling
Power has been elevated from an afterthought to one of the top considerations and tradeoffs in SoC design, edging out performance and area in many cases and in some cases even cost and features.

Tradeoffs in design always change, depending upon what the most pressing concern is among consumers at any time. For decades, performance was always at the top of everyone’s list, followed closely by cost. The MIPS and GHz wars made for great competitive marketing. But as devices become more mobile, and as even the largest enterprises focus on energy costs, the reigning king is power. How long does a battery last between charges on a smart phone or a laptop given a normal use case? How many kilowatt-hours does it take to run a server?

This isn’t always a clean tradeoff, however. For one thing, some design features require more power, forcing changes in other parts of a design. And in other cases, the lack of any single use model makes it almost impossible to guess how a device will be used. One consumer may rely on voice calls, while another focuses on text and still another may play games and stream video.

What stays, what goes
Decisions about what to keep aren’t always simple. Consider an LED TV design, for example. Flattening the screen requires audio enhancement because it’s impossible to get good enough sound out of a TV without playing tricks with the sound. That typically means more post-processing, more codecs, and more energy consumed.

“There are lots of things that can be done to enhance the audio experience,” said Larry Przywara, senior director of multimedia marketing at Tensilica. “The TV designers are space constrained. That requires various volume boosts, equalization and sound widening techniques just to do what they used to do. That’s doable, though, because as algorithms have gotten more complex the SoCs have gotten more powerful.”

Overall, they also use less energy to drive the SoC and the complete system. But sometimes that requires increasing power budgets in one place and decreasing them in another. “The issues in the mobile space are now finding their way into home entertainment,” said Przywara. “With post-processing you need slight modifications in other places to keep a limited power budget.”

In televisions, those savings can come from a variety of places. For example, the current design on some TVs relies on brighter pixels in the middle, where most people focus their eyes, and dimmer pixels on the corners where viewers don’t look.

Making tradeoffs
For both video and audio, the real change is a combination of improved technology and what consumers are willing to live with. Fifteen years ago most audiophiles wouldn’t touch a CD, and even several years ago the focus on quality in DVDs was considered the competitive edge. More people have migrated to the center of the spectrum as CD quality has improved and streaming has offered vast convenience, even if it isn’t high-definition.

“Audio, from a technology standpoint, is not a big deal,” said Cary Chin, director of technical marketing for low-power solutions at Synopsys. “The real focus is on video, and today the real question is how you trade off storage with communications. Do you spend more time and energy to compress it or store it? And do you store it locally or in the cloud? As we focus more on portable devices, power and cost are the main factors.”

The other question is just how much power efficiency is enough. A smart phone uses basically the same technology as a tablet, yet the tablet gets significantly longer battery life between charges, while the smart phone needs to be charged every day.

“Tradeoffs are a great way to define areas where the technology is evolving,” Chin said. “In digital video you can improve the resolution, but most of the computation and power is spent in compression and decompression. Even with printers, you can print with finer technology but it’s usually more important to lower the cost. Low power is one of the areas that will become critical to all of these decisions over the next 5 to 10 years.”

But even products that can command a premium, such as those from Apple, high-end graphics from Nvidia, and laptops from Lenovo, haven’t skimped when it comes to saving power.

“The tradeoff is how much energy you use at any time and how much energy you need to accomplish a task,” said Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics. “In general there will be more dark silicon and more functionality on a chip, but it won’t all be running at the same time.”

One of the more interesting tradeoffs has to do with which processors are used for what functions. Nvidia’s Tegra 3 processor, for example, has four main CPU cores and a fifth, lower-power and lower-performance companion core for less demanding tasks.

Features or function
Perhaps the hardest thing to determine is whether to cut features, cut performance, or live with more power consumption when it’s needed. Will Ruby, senior director of RTL power product engineering at Apache Design, said what’s changed is that power is a fundamental requirement, along with features and functions. Engineers have to meet the power spec, even if that means some tradeoffs.

“There are two aspects to a tradeoff,” he said. “One is at the spec level. What features can you add for what performance and power? As more and more people learn how to do low-power design, they will meet or beat specs. Of course that usually means even more aggressive specs in the next design. The second is a speed-level tradeoff. How much time does it take to switch to a different application, for example? If it’s one-tenth of a second that will be a big difference from two-tenths of a second.”

Some tradeoffs also occur on the process side. Do you use older low-power process technology, or do you use the fastest general-purpose process technology and turn a block off as quickly as possible? Or do you dope the channel or swap to fully depleted silicon-on-insulator substrates?

Conclusion
None of these tradeoffs are fixed. They can be tweaked and tweaked again, because what may be good enough for one market, one group of users or at any point in time may be different somewhere else six months later.

What is significant, though, is just how integral a part power has become in all of these decisions. “The real key is how you can exploit all the possibilities of what you can get with relatively low power,” said Pete Hardee, marketing director at Cadence. “If you’re trying to freeze frame a golf swing in video, you may want to go completely the other way—all the way up to 60 frames per second. If power is the issue, you may want a slower frame rate. And it’s not just about battery life. Reliability is a big headache for customers. The ability of low-power techniques to control the performance profile can increase reliability, too.”
