Posts Tagged ‘Mimasic’

5 Ways To Cut Power

Thursday, June 16th, 2011

By Ed Sperling
Low energy consumption with minimal leakage has emerged as the most competitive element in an IC design, regardless of whether the device plugs into the wall, runs on a battery or is powered by a gasoline engine.

While components on an SoC aren’t always power-aware, they’ll have to be in the future as consumers put energy efficiency first. With rising fuel costs, concern over global warming and a steady reminder that smart phones have to be plugged in every night, car companies are shifting their strategy from efficient hybrids to even more efficient plug-in hybrids and electric vehicles, and California has gone so far as to mandate that one-third of all electricity sold in the state by the end of 2020 must come from renewable sources.

This shift in public awareness hasn’t been lost on the chip industry, which has been rolling out some very complex advances well ahead of schedule. Here are some of the most important:

Clouds
The push toward a cloud-based infrastructure is a way of centralizing computing, basically a return to the time-sharing model once perfected by the mainframe and then redistributed with the advent of the commodity PC server. The data processing world is re-aggregating, but this time with a difference. It’s not just that the computing is being centralized. It’s that the centralization is taking place close to cheap power sources such as hydroelectric plants, nuclear plants (for now) and wind farms.

“Cloud leads to big efficiency gains,” said Chris Rowen, chief technology officer at Tensilica. “Now you can put the computing farm where the energy is available. It’s an arbitrage opportunity. It’s not hard to ship bits when you compare that to the difficulty in transporting electricity.”

There’s a clear business case to be made on this front. An estimated 6.5% of electricity is lost in transmission, according to the U.S. Energy Information Administration. That may not seem like a lot until you consider how much energy those high-voltage transmission lines carry. Moving bits is cheap by comparison, even trillions of them, which is why there is now talk of centralizing portions of even base stations. The parts that do intensive computation with a high degree of redundancy are prime candidates for being located in a data center.

“There’s a lot of computation needed to reduce noise and create a clean signal,” said Rowen. “But there’s also some computing that has to be done locally because there are tough latency requirements.”

Adaptive Body Biasing
Adaptive body biasing has been under serious discussion for the past five years as a way of reducing leakage current by controlling a transistor’s body voltage. Applying a reverse body bias raises the threshold voltage, which cuts leakage. The big advantage is that leakage can be reduced without switching blocks fully off. The downside is that it has been difficult to design and manufacture.

“This was not seen as a mainstream approach, but now it’s showing up almost everywhere,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “This was seen as a challenging technique to implement, but now TI and Samsung are using it. If you change the body bias voltage, you impact the threshold voltage. You can increase or decrease leakage, as needed, and boost performance.”

Consultant Bhanu Kapoor, president of Mimasic, noted that for some high-performance applications, alternatives such as power gating may be impractical because it simply takes too long to turn sections of a chip off and on. In those cases, body biasing is the only choice.
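Sarkar’s description follows from two textbook relations: the body-effect equation, which shifts the threshold voltage with source-to-body bias, and the exponential dependence of sub-threshold leakage on threshold voltage. The Python sketch below uses made-up parameter values (not data for any real process) to show the trend: reverse bias raises Vth and cuts leakage, while forward bias lowers Vth, trading extra leakage for speed.

```python
import math

# Hypothetical parameters chosen for illustration only (not any real process).
VTH0 = 0.35       # zero-bias threshold voltage (V)
GAMMA = 0.4       # body-effect coefficient (V^0.5)
TWO_PHI_F = 0.8   # surface potential term 2*phi_F (V)
N = 1.5           # sub-threshold slope factor
V_T = 0.0259      # thermal voltage kT/q near 300 K (V)


def threshold_voltage(v_sb: float) -> float:
    """Body-effect equation. v_sb > 0 is reverse body bias and raises Vth;
    v_sb < 0 is forward body bias and lowers it."""
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def relative_leakage(v_sb: float) -> float:
    """Sub-threshold leakage relative to zero bias, using I_sub ~ exp(-Vth / (n * V_T))."""
    return math.exp(-(threshold_voltage(v_sb) - VTH0) / (N * V_T))


for v_sb in (-0.2, 0.0, 0.3, 0.5):
    print(f"V_SB = {v_sb:+.1f} V: Vth = {threshold_voltage(v_sb):.3f} V, "
          f"leakage x{relative_leakage(v_sb):.3f}")
```

With these assumed numbers, 0.5 V of reverse bias cuts sub-threshold leakage by roughly an order of magnitude; real gains depend heavily on the process and shrink at advanced nodes, where the body effect weakens.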

Atomic-Level Changes
Another technique that has been particularly difficult to master is atomic-level control of channel doping on the manufacturing side. And while most experts don’t expect the process and manufacturing side to offer any huge gains, this one may be the exception.

Scott Thompson, chief technology officer at startup SuVolta, said that by improving the doping technique, both dynamic and static power can be reduced in regular bulk CMOS.

“The problem is that the wall around the channel is leaky and it’s hard to control the shape,” said Thompson. “Strain engineering helps to control the atomic-level analysis. But there has been no other breakthrough other than changing the transistor, and we don’t see a need for that for all architectures.”

At its unveiling last week, SuVolta had lined up support from Fujitsu, Cypress, ARM and Broadcom. The company claims the technology is an alternative to FinFETs, which are more difficult to manufacture.

3D Transistors And Packaging
Nevertheless, the major foundries have committed to building FinFETs at advanced nodes. Intel’s announcement of a Tri-Gate three-dimensional transistor at 22nm has been a major topic in the semiconductor industry. Now that Intel has publicly committed to the technology, the question is whether it really can be manufactured with sufficient yield. And can it be built effectively using the disaggregated foundry model in the near future?

These kinds of questions will remain unanswered for at least the next couple of years. TSMC is planning to use FinFETs at 14nm, and GlobalFoundries has been working on the same technology. Either way, the big advantage of FinFET technology is a sharp reduction in leakage along with a significant performance boost.

Creating stacks of die also has a huge effect on power, in part because the distances between logic and memory can be shortened significantly. A system-in-package version of stacked die, using interposer technology, is expected to begin widespread production over the next 12 to 18 months, bolstered by the new Wide I/O standard that increases the size of the pipes between logic and memory.

New Materials
Fully depleted SOI, silicon on sapphire, as well as new ways of putting them all together in stacks connected by low-cost interposers that can be made of glass have turned into major research efforts as companies seek to knock costs out of the bill of materials for new chips.

While FD SOI has been well tested for years by the Common Platform participants, the other approaches have been used only on a very limited basis. One approach now being considered is actually designing chips to run hotter rather than trying to keep the power down. While there are limits to this approach (no one wants to pick up a hot phone), there are times when performance is more important than heat.

Taken as a whole, all of these changes can significantly reduce power, particularly when coupled with efficient software code, more customized user controls and end devices that actually use the power-saving technology being built into these chips.

Power Panel: IP And Other Key Issues For Future Development

Thursday, March 17th, 2011

By Ed Sperling
Low-Power Engineering chaired a DesignCon panel of low-power experts with Bhanu Kapoor, president of Mimasic; Kesava Talupuru, DV engineer at MIPS; Prapanna Tiwari, CAE manager at Synopsys, and Rob Aitken, an ARM Fellow. What follows are excerpts of their presentations and the panel discussion that followed.

Prapanna Tiwari: UPF and CPF are text files that capture the power intent of the design.

Power management is one of the main problems we’re trying to solve in every design. The goal is to operate every given part of the chip at the lowest voltage you can get away with. If you can shut it off, you do that. If you can’t shut it off—and you can’t shut off memories—then you reduce the voltage to the lowest possible level so you don’t lose as much through leakage. From a verification standpoint, what you used to write in Verilog would appear in silicon and that was all there was to it. That’s no longer true. Now there is this idea of power intent that has to be captured. It has structure to it. It has semantics, and it has simulation sequences. It impacts every part of a design. (See Fig. 1)

Fig. 1

The product behavior has two components. One is the design. The other is the power. Verification needs to take care of this.

The power intent itself has two aspects. One is static. What are the regions? How are the regions partitioned? How do they map onto my design hierarchy? That’s where UPF comes into play. It says these are the domains. These are the different level shifters you’re going to insert in your design. That’s the structure.

But there’s a second aspect, which is dynamic. How are you going to exercise these different voltage regions on a chip? What is allowed, and what isn’t? If ARM or MIPS delivers cores to customers, they need to tell those customers how the cores should be used. In our current methodology, when you deliver a Verilog model, there is no way to specify what voltage levels it’s supposed to be instantiated at. There’s nothing in Verilog that lets you do that. Different customers will use ARM and MIPS cores with different power management techniques, different voltage levels and different process nodes. How do you let them know they’re not supposed to do certain things?
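As a rough illustration of the static side of power intent, the following Python sketch derives the protection cells a domain crossing needs from a description of domains and their supplies. It deliberately does not mimic UPF or CPF syntax; the domain names, voltages and both rules are hypothetical simplifications.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float     # nominal supply in volts
    switchable: bool   # True if the domain can be power-gated


@dataclass(frozen=True)
class Crossing:
    signal: str
    src: Domain
    dst: Domain


def required_cells(c: Crossing) -> list[str]:
    """Conservative static rules: a level shifter when the two supplies differ,
    and an isolation cell whenever the driving domain can be switched off."""
    cells = []
    if c.src.voltage != c.dst.voltage:
        cells.append("level_shifter")
    if c.src.switchable:
        cells.append("isolation")
    return cells


cpu = Domain("PD_CPU", 1.0, switchable=True)
aon = Domain("PD_AON", 1.2, switchable=False)
print(required_cells(Crossing("irq_out", cpu, aon)))   # ['level_shifter', 'isolation']
print(required_cells(Crossing("wake_req", aon, cpu)))  # ['level_shifter']
```

The isolation rule here is deliberately pessimistic. Refining it requires knowing which combinations of domain states are legal, which is exactly the dynamic information described above.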

Power intent is what would let an IP provider specify constraints so that the IP can’t be used in unintended ways. You can do that from a functional standpoint today. You cannot do it in a power-aware model. There’s no way to figure out where the IP gets used. Context is missing. (See Fig. 2)

Fig. 2

Even within the same semiconductor company you will see different modules with different design owners. You don’t want anyone to be using IP in the wrong way, even years from now. There is IP in designs where no one has any clue where it came from. That’s one of the key challenges for an IP provider: to generate behavioral and verification IP, not just with the VHDL view but with the power-aware view to go with it. If you can deliver this, it will eliminate an enormous amount of risk and reduce cost.

For any verification, there are three pieces. There is the testbench, the design itself and then assertions and coverage. (see Fig. 3)

Fig. 3

In the overall verification, the testbench needs to be power-aware. IP users need to be able to monitor any region of the design. The way IP is growing, it may have many power domains inside. It may even have its own power controller that reacts to events from outside the IP. A customer testbench needs to know power events and sequences in different parts of the IP. Otherwise you have no idea if the IP really shut down or not.

You also need to be able to write models for the IP. One user may be at 1.2 volts. Another might be at 1.0 volts. Different signals will react differently. All this behavior needs to access the power information.

All of that power information needs to be available, and it should be context-free. And last but not least, assertions need to be power-aware. When the system is shut down, how is the IP isolated, and which level shifters are inserted?
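A power-aware assertion can be thought of as a check over a simulation trace that also knows the supply state. The sketch below, with a hypothetical trace format and clamp value, flags isolation that is enabled too late or released while the domain is still off.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Sample:
    cycle: int
    domain_on: bool  # supply state of the IP's switchable domain
    iso_en: bool     # isolation enable from the power controller
    out: str         # value seen on the always-on side: "0", "1" or "X"


def check_isolation(trace: list[Sample], clamp: str = "0") -> list[str]:
    """Power-aware checks over a simulation trace:
    1. isolation must already be enabled in the cycle before the domain powers down;
    2. while the domain is off, isolation must stay enabled and the output
       must hold the clamp value rather than X."""
    errors = []
    prev = None
    for s in trace:
        if prev is not None and prev.domain_on and not s.domain_on and not prev.iso_en:
            errors.append(f"cycle {s.cycle}: isolation not enabled before power-down")
        if not s.domain_on:
            if not s.iso_en:
                errors.append(f"cycle {s.cycle}: domain off without isolation")
            elif s.out != clamp:
                errors.append(f"cycle {s.cycle}: expected clamp {clamp}, saw {s.out}")
        prev = s
    return errors


trace = [
    Sample(0, True, False, "1"),
    Sample(1, True, True, "0"),    # isolate first...
    Sample(2, False, True, "0"),   # ...then power down: OK
    Sample(3, False, False, "X"),  # isolation released while still off: bug
]
print(check_isolation(trace))  # ['cycle 3: domain off without isolation']
```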

To solve this, you need to be able to merge UPF and HDL into one. In your RTL you should be able to query information and build models around it.

Rob Aitken: If it’s not clear what issues are involved, what would make it clear?

There’s an existence proof. In one chip we had some RTL and a power spec and it turned into a chip and the chip worked. There were multiple decades of ARM experience, the latest IP and EDA tools, access to IP designers, skill in all available EDA tools and some magic smoke. But what if you don’t have all that? How many of those things do you need?

In addition, there’s no one thing called IP and there are lots of different uses for the same IP. There’s one group that says, ‘Whatever it is, give it to me, I want it to work and be done.’ Then there’s another group that might say, ‘I don’t care what you think should be done with this IP. Give me the parts and I’ll do it myself.’ What we really want to make sure of is that the standards don’t interfere with the use models and that they cover all of the possible use cases.

Context also matters. We like to talk about something like an always-on buffer. If it’s in a system where there’s a battery connected to it and part of the processor is shut down, that has a different meaning than when it’s plugged into the wall and the system is turned off. That always-on buffer isn’t always on anymore. It’s just sometimes on.

And what happens if I run my IP at 0.6 volts? If no one designed it for that, will it work? Maybe.

There are all sorts of other clever things we can do. An SRAM will retain data at a much lower voltage than the voltage at which you can read or write it, so its behavior is voltage-dependent. At a very low voltage it can store data, but if you try to write to it, the write will fail. Trying to model that in a high-level language is an interesting challenge.
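One simple way to capture this SRAM behavior in a behavioral model is to give it two voltage limits. The class below is a hypothetical sketch; the 0.6 V retention and 0.8 V read/write limits are placeholders, not characterized values.

```python
from __future__ import annotations


class LowVoltageSram:
    """Behavioral sketch: contents survive down to V_RETAIN, but reads and
    writes are only reliable at or above V_MIN_RW. Both limits are hypothetical."""
    V_RETAIN = 0.6
    V_MIN_RW = 0.8

    def __init__(self, words: int):
        self.vdd = 1.0
        self.mem: list[int | None] = [0] * words  # None models a corrupted (X) word

    def set_vdd(self, vdd: float) -> None:
        self.vdd = vdd
        if vdd < self.V_RETAIN:
            self.mem = [None] * len(self.mem)  # below retention voltage, data is lost

    def write(self, addr: int, value: int) -> None:
        # A write attempted in the retention-only range fails and corrupts the word.
        self.mem[addr] = value if self.vdd >= self.V_MIN_RW else None

    def read(self, addr: int) -> int | None:
        # Reads are unreliable in the retention-only range.
        return self.mem[addr] if self.vdd >= self.V_MIN_RW else None


sram = LowVoltageSram(4)
sram.write(0, 0xA5)
sram.set_vdd(0.7)    # retention-only range
sram.write(1, 0x3C)  # this write fails
sram.set_vdd(1.0)
print(sram.read(0), sram.read(1))  # 165 None
```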

From a soft IP standpoint, you can say here’s some RTL and here’s a power description. You only need four things:

  1. What are the atomic power domains? Is there more than one?
  2. If you shut it down, some key element of the state needs to be retained. If you haven’t thought about that initially, it’s pretty much every flip-flop.
  3. You need to know the signals that need to be isolated.
  4. And you need to know the legal power states and the transitions between them.

If you have those four things, you have the power intent for soft IP. That’s not enough to actually build something, so you then take the low-power intent and refine it. (see Fig. 4)
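The four items on Aitken’s list map naturally onto a small data structure. The Python sketch below is hypothetical (it is not UPF, CPF or any ARM format) and shows how the legal states and transitions in item 4 let a tool reject an illegal power sequence.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class SoftIpPowerIntent:
    domains: set[str]                   # 1. atomic power domains
    retained: dict[str, set[str]]       # 2. state to retain, per domain
    isolated: dict[str, set[str]]       # 3. signals to isolate, per domain
    states: dict[str, dict[str, bool]]  # 4a. legal power states: domain -> on?
    transitions: set[tuple[str, str]]   # 4b. legal transitions between states

    def validate(self) -> list[str]:
        """Basic consistency checks on the intent itself."""
        problems = []
        for name, table in sorted(self.states.items()):
            if set(table) != self.domains:
                problems.append(f"state {name} does not cover every domain")
        for a, b in sorted(self.transitions):
            if a not in self.states or b not in self.states:
                problems.append(f"transition {a}->{b} uses an undefined state")
        return problems

    def check_sequence(self, seq: list[str]) -> list[str]:
        """Flag each step of a power-state sequence that is not a declared transition."""
        return [f"illegal transition {a}->{b}"
                for a, b in zip(seq, seq[1:]) if (a, b) not in self.transitions]


intent = SoftIpPowerIntent(
    domains={"PD_TOP", "PD_CORE"},
    retained={"PD_CORE": {"pc", "status"}},
    isolated={"PD_CORE": {"core_irq", "core_ack"}},
    states={
        "RUN":   {"PD_TOP": True,  "PD_CORE": True},
        "SLEEP": {"PD_TOP": True,  "PD_CORE": False},
        "OFF":   {"PD_TOP": False, "PD_CORE": False},
    },
    transitions={("RUN", "SLEEP"), ("SLEEP", "RUN"), ("SLEEP", "OFF"), ("OFF", "RUN")},
)
print(intent.validate())                                        # []
print(intent.check_sequence(["RUN", "SLEEP", "OFF", "SLEEP"]))  # ['illegal transition OFF->SLEEP']
```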

Fig. 4

Based on the various failures we’ve had, here are some things not to do. First, avoid non-contiguous power domains. When in doubt, align them with the logic hierarchy. Second, don’t use clock gating on both edges of the clock. That ties you to specific libraries. Third, avoid partial retention within a power domain. Don’t try to retain some things but not all. It leads to weird behavior. And make sure that your power domains’ clocks and resets can be controlled externally. One other thing I would add is to avoid test power or scan chains crossing multiple domains, because that leads to interesting test challenges.
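Two of these guidelines lend themselves to simple automated checks. Here is a hypothetical lint sketch in Python that flags partial retention within a domain and scan chains that span domains; the netlist is reduced to plain dictionaries for illustration.

```python
from __future__ import annotations


def lint_power_rules(flop_domain: dict[str, str],
                     retained: set[str],
                     scan_chains: dict[str, list[str]]) -> list[str]:
    """Flag domains where some but not all flops are retained, and scan chains
    whose flops live in more than one power domain."""
    report = []
    by_domain: dict[str, set[str]] = {}
    for flop, dom in flop_domain.items():
        by_domain.setdefault(dom, set()).add(flop)
    for dom, flops in sorted(by_domain.items()):
        kept = flops & retained
        if kept and kept != flops:
            report.append(f"{dom}: partial retention ({len(kept)} of {len(flops)} flops)")
    for chain, flops in sorted(scan_chains.items()):
        doms = sorted({flop_domain[f] for f in flops})
        if len(doms) > 1:
            report.append(f"{chain}: crosses domains {', '.join(doms)}")
    return report


report = lint_power_rules(
    flop_domain={"r0": "PD_A", "r1": "PD_A", "r2": "PD_B", "r3": "PD_B"},
    retained={"r0"},
    scan_chains={"chain0": ["r0", "r1"], "chain1": ["r1", "r2", "r3"]},
)
for line in report:
    print(line)
# PD_A: partial retention (1 of 2 flops)
# chain1: crosses domains PD_A, PD_B
```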

Power Panel: IP And Other Key Issues For Future Development

Thursday, February 10th, 2011

Low-Power Engineering chaired a DesignCon panel of low-power experts with Bhanu Kapoor, president of Mimasic; Kesava Talupuru, DV engineer at MIPS; Prapanna Tiwari, CAE manager at Synopsys, and Rob Aitken, an ARM Fellow. What follows are excerpts of their presentations and the panel discussion that followed.

Bhanu Kapoor: There are two components of power—dynamic and leakage. Dynamic is what gets used for some useful activity on a chip. Leakage is wasted power. To put this in perspective, at the 65nm technology node leakage power is about the same as dynamic power.

Dynamic power depends on frequency, capacitance and supply voltage. Changing the supply voltage makes a big difference, because dynamic power scales with the square of the voltage.

Leakage has two components: sub-threshold and gate tunneling. Gate tunneling is addressed by high-k/metal gate technology. Sub-threshold leakage remains, and it is growing exponentially. While it was not a factor at 130nm, it has become a critical factor at 65nm and beyond. When you manage power, you have to manage dynamic power and leakage in both active and standby mode.

You’ll need high voltage if you want to operate at high frequency. Conversely, you can reduce the voltage if your application doesn’t need high performance. Because the attainable frequency scales roughly with voltage, scaling voltage and frequency together has a cubic effect on dynamic power consumption. In standby mode you want to completely switch off the supply. Power is the product of current and voltage, so if you turn off the voltage you can eliminate most of the standby leakage.
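The cubic effect comes from the standard switching-power estimate P = αCV²f: if frequency scales roughly linearly with voltage, scaling both by the same factor k scales dynamic power by about k³. A worked Python example with hypothetical block parameters:

```python
def dynamic_power_w(alpha: float, c_f: float, vdd_v: float, f_hz: float) -> float:
    """Switching power estimate P = alpha * C * Vdd^2 * f."""
    return alpha * c_f * vdd_v ** 2 * f_hz


# Hypothetical block: 20% activity, 1 nF switched capacitance, 1.1 V, 500 MHz.
base = dynamic_power_w(0.2, 1e-9, 1.1, 500e6)
# Scale voltage and frequency together by k = 0.8, assuming frequency tracks voltage.
k = 0.8
scaled = dynamic_power_w(0.2, 1e-9, 1.1 * k, 500e6 * k)
print(f"{base:.3f} W -> {scaled:.3f} W (ratio {scaled / base:.3f}, k**3 = {k**3:.3f})")
# 0.121 W -> 0.062 W (ratio 0.512, k**3 = 0.512)
```

Leakage is not included in this estimate; it also drops when the supply is lowered, but it follows a different law.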

There are various power management techniques to deal with leakage (see Fig. 1). Voltage is a key parameter to address power. It’s the use of multiple voltages, combined with design description languages that don’t allow voltage as an input, that has made design so difficult.

Fig. 1

You can’t stray far from what’s happening in process technology if you’re targeting your IP for future generations of chips. Process variation is a problem. You could be doing everything right, but process variations may lead to a leaky part. Unless you have controls such as adaptive body biasing to address leakage caused by those variations, it’s going to be a potentially fatal factor.

There are different EDA tool flows and because of that we’ve got different formats for describing power. On top of that, soft IP is unqualified.

IP will be running in different power states, and there are different voltage levels for different portions of the chip. This information needs to be provided to SoC teams. Isolation and level shifting have to be taken into account. State retention is another technique. Bring-up current may be an issue. The spike in current could lead to voltage issues. For all of these reasons, if you’re a small IP vendor doing low-power design, life is very, very difficult.

Kesava Talupuru: There are a number of techniques you can use to reduce power. With power gating you can shut off any pieces that are not in use to save leakage power. With tree-root clock gating you can save dynamic power. With multi-voltage designs, you can run any part that does not require maximum frequency at a lower voltage to minimize dynamic power. And with multi-threshold libraries you can minimize leakage power.

So what are the challenges for low-power verification? One is that traditional functional simulators are not power-aware. They assume that the supply voltage is constant and that signals are simply zero or one. They cannot emulate protection gate behavior. They cannot model power ports and switches, and they cannot find structural errors. On top of that, their power-on and power-down sequence checks are not adequate. They do not understand voltage transitions. When you do a reset they initialize the signal immediately, and when you power down, the flops still retain their values.

The verification environment should be power-aware. You need voltage-level-aware simulation for dynamic voltage scaling and low-Vdd standby techniques, and you should simulate real silicon behavior. You should be able to model power switches and protection gates and check for illegal power state transitions. The environment also should support recovery sequences.
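To see what “power-aware” means at the level of a single flop, here is a hypothetical Python model that does what Talupuru says traditional simulators do not: it corrupts state to X on power-down unless a retention flop saved it first. The save/restore interface is an assumption for illustration, not any simulator’s API.

```python
from __future__ import annotations

X = "X"  # unknown value, as a power-aware simulator would display it


class PowerAwareFlop:
    """Hypothetical flop model: state is lost on power-down unless this is a
    retention flop and save() ran first. A traditional simulator would simply
    keep the old value."""

    def __init__(self, retention: bool = False):
        self.retention = retention
        self.powered = True
        self.q: int | str = X
        self.shadow: int | str | None = None  # balloon latch on the always-on supply

    def clock(self, d: int) -> None:
        if self.powered:
            self.q = d

    def reset(self, value: int = 0) -> None:
        if self.powered:
            self.q = value

    def save(self) -> None:
        if self.retention and self.powered:
            self.shadow = self.q

    def power_off(self) -> None:
        self.powered = False
        self.q = X  # the main latch loses its contents

    def power_on(self, restore: bool = False) -> None:
        self.powered = True
        if restore and self.retention and self.shadow is not None:
            self.q = self.shadow
        # otherwise q stays X until reset() or the next clock()


plain, ret = PowerAwareFlop(), PowerAwareFlop(retention=True)
for f in (plain, ret):
    f.clock(1)
    f.save()
    f.power_off()
    f.power_on(restore=True)
print(plain.q, ret.q)  # X 1
```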

At MIPS we used three different techniques to deal with this. One is formal verification of the power manager unit, which gives you full control of the logic, keeps the design size small and provides formal proofs. A second is power-aware simulation of the entire system. This is useful for finding isolation polarity issues, retention and restore behavioral issues, and problems with power-up/power-down sequences. The third, static verification, was basically lint checking. The tools can find any missing isolation cells or level shifters.

For the power manager we added hardware and software control. The software-related properties include read, write, hold and reset values. The hardware FSM properties included state transitions, illegal states, power up and power down sequences and hardware/software priorities.
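The kind of property checked formally here can be illustrated with an exhaustive explicit-state search. This is only an analogy: commercial formal tools reason symbolically over RTL, and the FSM below is hypothetical, loosely inspired by the requirement that power-down wait for outstanding transactions.

```python
from __future__ import annotations
from collections import deque

# Hypothetical power-manager FSM: (state, input) -> next state.
# Any (state, input) pair not listed leaves the state unchanged.
FSM = {
    ("ON", "sleep_req"): "DRAIN",
    ("DRAIN", "busy"): "DRAIN",     # wait until outstanding transactions complete
    ("DRAIN", "idle"): "ISOLATE",
    ("ISOLATE", "ack"): "OFF",
    ("OFF", "wake"): "RESTORE",
    ("RESTORE", "ack"): "ON",
}
LEGAL = {"ON", "DRAIN", "ISOLATE", "OFF", "RESTORE"}


def reachable(fsm: dict[tuple[str, str], str], start: str) -> set[str]:
    """Breadth-first search over every state reachable under any input sequence."""
    inputs = {i for (_, i) in fsm}
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for i in inputs:
            t = fsm.get((s, i), s)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def check(fsm: dict[tuple[str, str], str], start: str = "ON") -> list[str]:
    """Exhaustively check two properties over the reachable states:
    no illegal state is reachable, and OFF is entered only from ISOLATE."""
    states = reachable(fsm, start)
    errors = [f"illegal state {s} reachable" for s in sorted(states - LEGAL)]
    errors += [f"{src} -> OFF on {inp} skips isolation"
               for (src, inp), dst in sorted(fsm.items())
               if dst == "OFF" and src in states and src != "ISOLATE"]
    return errors


print(check(FSM))    # []
buggy = {**FSM, ("DRAIN", "timeout"): "OFF"}  # a shortcut that bypasses isolation
print(check(buggy))  # ['DRAIN -> OFF on timeout skips isolation']
```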

We found a number of bugs using our flow (see Fig. 2). Through formal verification, we found bugs in state transitions, illegal states and power-up/power-down sequence errors. We found errors in the power-down sequence, where it needed to wait until all the transactions were completed. There also was a problem with the coherent to non-coherent switching. Using power-aware simulation we found missing isolation, some wrong isolation polarity and architectural bugs.

Fig. 2


Killer Bugs

Thursday, April 8th, 2010

By Ed Sperling

Hardware and software bugs are all around us. When an application suddenly dies or a smart phone freezes because of the unanticipated interaction between hardware and software blocks in a system on chip, most users aren’t the least bit fazed. They usually just re-boot and forget about it.

Bugs caused by power are an entirely different matter, however. For one thing, they’re usually fatal. For another, they’re getting much, much harder to detect. And third, they’re harder to fix when they are detected.

“Debugging is getting much more difficult because when the lead generator is powered off, how do you find out that there’s a problem? You may have two power lines with a different Vdd because of connectivity and it will not work,” said Bhanu Kapoor, president of Mimasic, a consultancy focused on low power. “With power you used to have a single voltage. Now you have different supply lines, so you get new problems. Some of this can be detected in the netlist, but some of these problems also show up in the course of manufacturing.”

The problem is magnified by the addition of multiple power islands and multiple cores.

“Correct delivery of a power supply is at the core of many of the power issues, and traditional testing methods use a fault model that is based on wires erroneously connected to supply or ground,” Kapoor said. “And needless to say, incorrect delivery of power will result in fatal issues for proper operation of the chip. For example, an isolation cell at the output of a power domain ensures active regions receive a meaningful signal when this domain is shut down. If the supply to the isolation cell itself is switched off due to either incorrect wiring or improper placement of the isolation cell, then active regions will see some unknown values that will lead to failure of operation in this mode.”

Similar things will happen if the power supply for a level shifter has wiring issues. It may be worse here since, depending on the voltage differences, the issue may only show up some of the time. And there may be very hard-to-find sneaky leakage paths that drain the battery much faster without any functional problem ever showing up. They will also sneak through testing methods and only show up as a fatal business issue.

Consider a real-world example: A major wireless chipmaker was recently headed to tapeout when it ran some additional tests and found eight bugs related to wrong implementations of power intent. “They would have caused catastrophic failures,” said Peter Hardee, director of solutions marketing at Cadence Design Systems. “Things are getting a lot more complex. You may have power domain ‘A’ physically separated from power domain ‘B,’ and at some point they need to talk. The problem is that the wires may run through power domain ‘C.’ Was ‘C’ on or off when you verified the chip?”
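Both failure modes, an isolation cell powered by the very domain it is supposed to isolate and a feedthrough buffer sitting in a domain that may be off, can be caught by walking the legal power states. The Python sketch below uses a hypothetical power state table with domains A, B, C and an always-on domain AON; the two rules are simplifications of what static power-verification tools check.

```python
from __future__ import annotations

# Hypothetical power state table: which supplies are on in each legal state.
STATES = {
    "ALL_ON": {"A": True,  "B": True, "C": True,  "AON": True},
    "C_OFF":  {"A": True,  "B": True, "C": False, "AON": True},
    "A_OFF":  {"A": False, "B": True, "C": True,  "AON": True},
}


def check_path(src: str, dst: str, iso_supply: str,
               feedthrough_domains: list[str]) -> list[str]:
    """Check a signal path from domain src to domain dst in every legal state
    where the receiver (dst) is on:
    - if src is off, the isolation cell's own supply must be on, or dst sees X;
    - if src is on, every feedthrough buffer along the route must be powered."""
    errors = []
    for state, on in STATES.items():
        if not on[dst]:
            continue
        if not on[src] and not on[iso_supply]:
            errors.append(f"{state}: isolation cell supply {iso_supply} is off")
        if on[src]:
            for d in feedthrough_domains:
                if not on[d]:
                    errors.append(f"{state}: feedthrough buffer in {d} is off")
    return errors


# Route from A to B runs through C; isolation cell on the always-on supply.
print(check_path("A", "B", iso_supply="AON", feedthrough_domains=["C"]))
# ['C_OFF: feedthrough buffer in C is off']
# Isolation cell wrongly powered by the switched domain A itself.
print(check_path("A", "B", iso_supply="A", feedthrough_domains=[]))
# ['A_OFF: isolation cell supply A is off']
```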

It’s not that the wireless chipmaker didn’t understand all of these issues, either. Even at the most sophisticated chip companies, where power intent and design were part of the up-front architectural decisions, problems still surface late in the design cycle. A device may pass functional verification and still have fatal errors. And there’s no magic button to push, or even an integrated tool flow, that solves everything.

“A lot of things that used to be secondary issues are now primary issues,” said Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions. “In the past, you could just put a lot of margin into the design, but the voltage has to be high for that to work. Today, the margin is no longer there.”

Dueling priorities
Creating SoC designs has always been about making tradeoffs between area, power and performance. Before 90nm, however, power was more of an afterthought than part of the initial planning process. At 65nm and beyond, it is an integral part of every chip, along with software and IP, which also were afterthoughts at older process nodes.

“The reality is if you have a performance issue or a power problem, it stems from the fact that you may have validated the hardware in isolation, but not in the context of the software application,” said Shabtay Matalon, ESL marketing manager in Mentor Graphics’ Design Creation Division. “There are ways to fix functionality in terms of software. But I’m not aware of one that can fix power or performance by fixing the software.”

IP is likewise a problem when it comes from multiple sources and when it involves multiple voltages. Big IP vendors are all emphasizing power-aware IP so that it can be re-used more easily. But the amount of IP inside all SoCs is growing steadily, in large part because there are too few engineers inside companies to re-invent that IP and still get a chip to market on time.

Not all of that IP runs at the same voltage, and not all of it is necessarily used in the manner the IP vendor intended. And while power formats such as UPF and CPF are supposed to account for that, some of it still slips through the cracks. In the best-case scenario, some of that can be fixed with software. There are plenty of cases that don’t fit that description, however.

“The fatal bugs are the ones that kill the company before the product ships,” said the CEO of MCCI Corp. “What causes those are mask spins. Behind those are system-level problems. You hook it up to a critical system and it doesn’t work. It’s down at the PHY level or the RTL level and it’s not accessible to software.”

Experts At The Table: Low-Power Management And Verification

Thursday, March 11th, 2010

By Ed Sperling

Low-Power Engineering moderated a panel featuring Bhanu Kapoor, president of Mimasic; John Goodenough, director of design technology at ARM; and Prapanna Tiwari, CAE manager at Synopsys. What follows are excerpts of their presentations, as well as the question-and-answer exchange that followed.

Bhanu Kapoor: There are two types of power you need to consider: Dynamic power, which is consumed because you are doing some useful activity, and leakage power, which gets consumed whether you’re doing something or not.

Dynamic power depends on switching activity, frequency, capacitance and supply voltage. There are two components of leakage: sub-threshold and gate tunneling. Gate tunneling is addressed by advances in process technology such as metal gates. Sub-threshold leakage grows exponentially as the threshold voltage decreases. At 90nm it was significant, at 65nm it was equal to dynamic power, and it grows from there.

If you look at the typical smart phone, it’s the same system-on-chip that is running different applications. These different modes of operation have different performance requirements. You can use different voltages to achieve those different levels of performance.

A typical power-managed SoC includes a power-management IC that supplies power to the different cores. One core can be a processor, and if it’s an ARM Cortex-A9, there is power management in that core as well. A second core might be for mixed signal, which potentially could require higher performance. And then there is the power controller, which is on all the time.

All of these power techniques have an implication on verification.

Slide5

If you look at standby leakage, one of the techniques is power gating, which is cutting off power to certain regions. If you don’t need portions of the chip to be on, you can completely shut them down. But that has an effect on performance, because turning a function off and on is a long event compared to a clock cycle. Sometimes you need to retain the state so you can come back up fairly quickly.
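Whether power gating pays off depends on how long the block stays idle. A common back-of-the-envelope check compares the energy spent entering and leaving the gated state with the leakage energy saved while off. The numbers in this Python sketch are hypothetical.

```python
def break_even_idle_s(e_overhead_j: float, p_leak_w: float) -> float:
    """Shortest idle period for which gating saves energy: the leakage energy
    avoided (p_leak_w * t) must exceed the energy spent entering and leaving
    the gated state (saving state, recharging the domain, restoring state)."""
    return e_overhead_j / p_leak_w


def wake_latency_cycles(t_wake_s: float, f_clk_hz: float) -> float:
    """Wake-up time expressed in clock cycles."""
    return t_wake_s * f_clk_hz


# Hypothetical block: 2 uJ overhead per off/on cycle, 5 mW leakage,
# 10 us wake-up time, 500 MHz clock.
print(f"break-even idle time: {break_even_idle_s(2e-6, 5e-3) * 1e6:.0f} us")  # 400 us
print(f"wake-up latency: {wake_latency_cycles(10e-6, 500e6):.0f} cycles")    # 5000 cycles
```

The model ignores leakage that still flows through the power switches and any always-on retention latches, so it is somewhat optimistic about the savings.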

All of this has an effect on verification, as you can see from the following chart.

Slide6

If you can do gate-level simulation, that is very helpful. You need input/output and power connected, and you need to have modified your library definitions so that power is one of the variables. With domain isolation, once you shut down a domain you have to make sure you are not sending floating values to other regions. You have to clamp the outputs to proper ones and zeros using isolation gates, which you can check with a rule-based checker.

If you have power in your simulation, a lot of rule-based issues can be addressed right up front. Historically, simulation was not power-aware. In the future, simulation will take on a more and more important role. Simulation, by default, will incorporate power.

John Goodenough: We are verifying systems on chip. They’re large. They have lots of power domains to match all the application workloads those devices will be expected to run. They have processors and software. Some of the domains are being switched on and off to meet the energy profile. They have virtually every technique available. The state space you’re trying to validate is therefore exploding by an order of magnitude.

One of the things we think about a lot at ARM is that it’s not so much the techniques that you can apply. It’s how you’re going to scale them to tackle these problems. There are lots of clever ways to validate, but not all of them scale effectively into workflows and onto your infrastructure. Power verification is not just about logical verification.

If you get a chip like the one below, you can mess it up in a lot of different ways.

Slide3

Usually, you can fix it in software. But you also can mess up the connectivity between the power domains. If you get your level shifter or always-on buffer or retention register wired up wrong, it’s not going to work. It’s going to be D.O.A. on the bench. A lot of chip failures are being caused by the failure to verify the integrity of the power network.

That’s a non-standard piece of verification, particularly where that interacts with the logical function of the chip and you’re trying to measure the maximum in-rush current and the average in-rush current. If you’re switching domains on and off, what’s the power domain going to look like from an electrical perspective? Is turning one domain on and turning another domain off going to put the voltages on either side of a level shifter into a pathological state that will damage or degrade the transistors and the level-shifting buffer?
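A first-order estimate of in-rush current treats the gated domain as a capacitance charged through the power switches, so I ≈ C·ΔV/Δt. The Python sketch below uses hypothetical values to show why designers stagger (daisy-chain) switch enables to stretch the ramp.

```python
def inrush_current_a(c_domain_f: float, vdd_v: float, ramp_s: float) -> float:
    """Charging current for a gated domain ramped linearly from 0 V to vdd_v:
    I = C * dV/dt. With a linear ramp this is also the peak current."""
    return c_domain_f * vdd_v / ramp_s


C_DOMAIN, VDD = 10e-9, 0.9  # hypothetical: 10 nF of switched capacitance at 0.9 V
abrupt = inrush_current_a(C_DOMAIN, VDD, 10e-9)    # all switches enabled at once
staggered = inrush_current_a(C_DOMAIN, VDD, 1e-6)  # daisy-chained enables
print(f"abrupt: {abrupt:.2f} A, staggered: {staggered * 1e3:.1f} mA")
# abrupt: 0.90 A, staggered: 9.0 mA
```

The price of the slower ramp is a longer wake-up time, which is the same latency tradeoff that limits power gating in the first place.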

There are some very interesting cross-coverage issues between what is traditionally more of the analog verification space on the power network and the logical verification space. We need, when considering power simulation, to run abstracted analog simulations, SPICE-level simulations, and cross between the two.

Unfortunately, the explosion in states is compounded by the number of software states and field configuration states. From a verification standpoint, not only are you adding a multiplier due to power states, you also have things like secure and non-secure states. If a chip works when it is configured for one package and pinout, will it still work in another? There’s an explosion in these operating modes.
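The multiplication Goodenough describes is easy to underestimate. This Python sketch counts operating modes for a hypothetical chip with three switchable domains, four power states per domain, two security states and two package options.

```python
from itertools import product

# Hypothetical configuration axes for one SoC.
POWER_STATES = ["run", "idle", "retention", "off"]  # per switchable domain
SWITCHABLE_DOMAINS = 3
SECURITY = ["secure", "non_secure"]
PACKAGES = ["pkg_a", "pkg_b"]

# Every combination of per-domain power states, security mode and package option.
modes = list(product(product(POWER_STATES, repeat=SWITCHABLE_DOMAINS), SECURITY, PACKAGES))
print(len(modes))  # 256 = 4**3 * 2 * 2

# The count grows exponentially with the number of domains; with 8 domains:
print(len(POWER_STATES) ** 8 * len(SECURITY) * len(PACKAGES))  # 262144
```

Many of these combinations are illegal or irrelevant in practice, which is why pruning them in a verification plan matters as much as raw simulation capacity.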

The other pressure we have is making sure you’re going to hit a given schedule. In looking at the power metrics it’s important to see how they can be applied into practical workflows and how you can feed performance metrics from wherever you are in the process back up into reporting and closure reporting. If you combine the need for those two, one of the things it leads to is enterprise scaling, both in terms of infrastructure to support the simulation and how you scale this across workgroups that are not co-located.

The other problem you face is that if you try to do all of the verification, you’re never going to get the chip out the door. You’ve got to have a verification plan and really narrow down which of the power modes are going to be pathological and which ones can be worked around in software. A major part of this power verification is integrating what amounts to a VP of engineering’s risk-reduction play into mainstream verification practice.

We’ve come a long way in a lot of the techniques, but at the end of the day you have a block diagram that needs to be simulated. Today that block diagram consists of RTL and some way of describing the power network or the power intent and power state space of the design. You also have to support the verification IP and transactors. You need coverage across the RTL and the power descriptions. It’s not rocket science. It’s just a more complicated block diagram.

Slide4

Power Delivery Issues

Wednesday, November 11th, 2009

By Ed Sperling
Reducing the voltage in a system on chip is like turning down the water pressure on a home plumbing system. Pretty soon you find out that not all the faucets work properly because there isn’t enough pressure behind them.

While it’s vital to drop the voltage to boost battery life in mobile devices, not to mention reduce the overall power consumption in plug-in devices, the effects aren’t always well understood ahead of time. Power delivery changes with the voltage, and not always in anticipated ways. The problem is that chips are getting so complicated with power islands and multiple cores that it’s difficult to anticipate all the possible permutations up front.

“There are indeed challenges,” said Jan Rabaey, who heads the Wireless Research Center at the University of California at Berkeley. “Fluctuations in currents are an obvious result of turning domains on and off.”

In fact, the more abrupt the on/off states, the greater the likelihood of power delivery problems. “It’s like hitching a car to a trailer and taking off,” said Srikanth Jadcherla, group director for R&D in Synopsys’ verification group. “It doesn’t move the same way.”

And the more power islands, the worse those problems get. “This is something that’s well known in the cell phone industry,” said Bhanu Kapoor, head of Mimasic, a low-power consultancy. “They’ve got ARM cores, DSPs and memory blocks on a cell phone processor and they have a power supply for all of these different modules. But when you need to switch on a new block, the power supply has to deliver power to both. The power supply inductor tries to guard against any change, though, so it actually gives parts a lower voltage. That causes a temporary malfunction.”

Thinking about delivery in the architecture
While the effects of power islands have gotten the lion’s share of attention in low-power designs, they’re certainly not the only thing that can go wrong. Failing to account for all possibilities up front can cause problems that grow as the chip moves from architecture to design and verification.

“Blocked frequencies and domains shutting off are a result of badly designed power distribution networks, which can happen even if you don’t have power islands,” said Rabaey. “By changing the resonant frequencies of the power network, you may see potential interplay with the clock frequency of the modules. But again, this is a generic problem with power distribution networks and has nothing to do with having power islands or not.”
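To a first order, a power network’s inductance and decoupling capacitance form a resonant LC circuit with f = 1/(2π√(LC)), so changing either one moves the resonance, and trouble follows when current demand repeats near that frequency. A minimal sketch; the inductance, capacitance, activity frequencies, and 20% tolerance are all hypothetical:

```python
import math


def pdn_resonance_hz(inductance_h, capacitance_f):
    """First-order LC resonance of a power delivery network: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def near_resonance(f_res_hz, activity_freqs_hz, tolerance=0.2):
    """Return the activity frequencies within +/- tolerance (as a fraction) of the resonance."""
    return [f for f in activity_freqs_hz if abs(f - f_res_hz) / f_res_hz <= tolerance]


# Hypothetical values: 0.1 nH of package inductance and 50 nF of on-die decoupling.
f_res = pdn_resonance_hz(0.1e-9, 50e-9)
print(f"resonance ~ {f_res / 1e6:.1f} MHz")

# Hypothetical frequencies at which blocks' current demand repeats (bursty workloads, divided clocks).
print(near_resonance(f_res, [25e6, 66e6, 75e6, 200e6]))
```

Because f scales with 1/√C, doubling the decoupling capacitance lowers the resonance by a factor of √2, which can move it onto or away from a module’s activity frequency.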

Problems also grow as the semiconductor process shrinks. One of the problems in delivering power at smaller geometries is the width of the wires themselves. While most engineers went through school assuming that electrons move through wires at a fairly constant rate, depending on the type of wire rather than its thickness, that’s clearly not the case. IBM began noticing earlier in the decade that the resistance of smaller wires was increasing because electrons were colliding more often with the atoms in the wires. Increased density meant more collisions.
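Wire resistance follows R = ρL/(W·H), so shrinking width and height raises resistance geometrically; the scattering effect described above raises it further by increasing the effective resistivity ρ itself. In the sketch below, the bulk copper resistivity and electron mean free path are approximate room-temperature values, but the size-effect formula and its constant `k` are a hypothetical illustration, not a calibrated model for any process.

```python
# Approximate room-temperature values for copper.
RHO_CU_BULK = 1.68e-8   # bulk resistivity, ohm * m (about 1.7 micro-ohm-cm)
MFP_CU = 39e-9          # electron mean free path, m


def effective_resistivity(width_m, k=1.0):
    """Illustrative size-effect model: resistivity rises as width approaches the mean free path.

    rho_eff = rho_bulk * (1 + k * mfp / width); k is a hypothetical fitting constant.
    """
    return RHO_CU_BULK * (1.0 + k * MFP_CU / width_m)


def wire_resistance(length_m, width_m, height_m, size_effect=True):
    """R = rho * L / (W * H), optionally using the width-dependent resistivity above."""
    rho = effective_resistivity(width_m) if size_effect else RHO_CU_BULK
    return rho * length_m / (width_m * height_m)


# A hypothetical 100 um wire with a 2:1 height-to-width ratio, at two widths.
for w in (180e-9, 45e-9):
    geometric = wire_resistance(100e-6, w, 2 * w, size_effect=False)
    with_scattering = wire_resistance(100e-6, w, 2 * w)
    print(f"{w * 1e9:.0f} nm: {geometric:.1f} ohm geometric, {with_scattering:.1f} ohm with scattering")
```

Shrinking the width fourfold raises the geometric resistance 16x in this sketch, and the scattering term pushes the increase well beyond that.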

The typical route for chipmakers is to engineer a solution to these kinds of problems. But that also increases the complexity and the price, because it usually means more parts. A 10-cent decoupling capacitor on a chip that is sold in quantities of 50 million adds $5 million to the overall cost. And that doesn’t include assembly, which typically adds another nickel per unit, or $2.5 million.
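The arithmetic behind those figures is per-unit cost times volume. A minimal sketch using the numbers from the paragraph above, kept in integer cents so the totals come out exact:

```python
UNITS = 50_000_000       # chips shipped, from the example above
DECAP_CENTS = 10         # per-unit cost of the decoupling capacitor
ASSEMBLY_CENTS = 5       # per-unit assembly cost ("another nickel")


def added_cost_dollars(cents_per_unit, units):
    """Total added cost in dollars; integer cents avoid floating-point rounding."""
    return cents_per_unit * units // 100


part = added_cost_dollars(DECAP_CENTS, UNITS)
assembly = added_cost_dollars(ASSEMBLY_CENTS, UNITS)
print(f"part: ${part:,}  assembly: ${assembly:,}  total: ${part + assembly:,}")
```

Together, one extra capacitor costs $7.5 million at this volume.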

More parts also mean more complexity in the design. And more complexity means more things can go wrong.

“There was one chip we were developing where the clock gating domain produced a spike in current,” said one engineer, who asked not to be named. “We came up with logic to control the wake up, but when you shut down the clock it staggers it. As you’d expect, it got stuck. So we took off the clock-gating circuitry and there was a huge droop in voltage.”

In another real-world example, chip development was stopped the day before tapeout because there was insufficient decoupling capacitance. That affects timing. The chip arrived at tapeout two days later because a crew of engineers worked solidly for 36 hours to fix the problem. Needless to say, they wished the chip architects had figured this out ahead of time.

How Many Power Islands Is Too Many?

Wednesday, May 13th, 2009

By Ed Sperling

Power domains, also known as power islands, have become to design engineers what multiple cores are to processor architects. They can serve a purpose, namely reducing static current leakage and saving battery life. But they also can add so much complexity that they can make it almost impossible to get a new chip out the door.

Just as there has been talk of hundreds of cores, there has been talk of hundreds of power islands. But trying to verify a chip with that number of power islands is beyond human comprehension at this point, and so far there are no tools to make it simpler. As with multicore programming, there may never be, which is why companies like AMD are now considering dedicating different features to one or more cores rather than trying to split applications into myriad parts.

But power islands bring their own set of unique challenges. When you have 20 power islands, for example, each combination has to be tested. If one is on while another is off, that combination has to be tested, and so do the cases where both are on, both are off, or both are in sleep mode (or any of various other modes that draw less power). Add another couple dozen power islands and the problem begins approaching epic proportions.
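The growth is exponential: with s states per island and n independent islands there are sⁿ combinations. A minimal sketch comparing that exhaustive count with the much smaller number of per-pair state combinations, which is the view the paragraph above walks through; the island and state counts are illustrative:

```python
from math import comb


def exhaustive_states(num_islands, states_per_island):
    """Every island can be in any state independently: states ** islands combinations."""
    return states_per_island ** num_islands


def pairwise_state_combos(num_islands, states_per_island):
    """Distinct (island pair, state pair) combinations, i.e. coverage items for two islands at a time."""
    return comb(num_islands, 2) * states_per_island ** 2


for islands in (20, 44):   # 20 islands, then "another couple dozen"
    for states in (2, 3):  # on/off, then on/off/sleep
        print(islands, states, exhaustive_states(islands, states), pairwise_state_combos(islands, states))
```

Twenty on/off islands already give over a million combinations, and adding a sleep state pushes that past three billion. Pairwise coverage stays in the thousands, but it will miss bugs that only appear when three or more islands interact.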

Shireesh Verma, a verification expert in Conexant’s Imaging and PC Media Group, said at this point there are definitely practical limits for the number of power islands.

“The maximum I have seen is 28, but typical is less than 20,” Verma said. “But it is not the complexity in the number of domains. It’s the combination of domains and the sequences you have—how many you have at different power states.”
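Verma’s point that sequences, not just domain counts, drive the complexity can be illustrated by counting the orders in which domains can be powered up. A minimal sketch, assuming a hypothetical rule format in which a pair (a, b) means domain a must be on before domain b; it counts only a single power-up pass, not power-down or repeated transitions, using dynamic programming over the set of domains already powered.

```python
from functools import lru_cache


def count_power_up_orders(num_domains, must_precede):
    """Count the orders in which all domains can be powered up, one at a time.

    must_precede: iterable of (a, b) pairs meaning domain a must be on before domain b.
    Dynamic programming over bitmasks of already-powered domains; practical up to ~20 domains.
    """
    prereq = [0] * num_domains
    for a, b in must_precede:
        prereq[b] |= 1 << a
    full = (1 << num_domains) - 1

    @lru_cache(maxsize=None)
    def orders_from(powered):
        if powered == full:
            return 1
        total = 0
        for d in range(num_domains):
            bit = 1 << d
            ready = (prereq[d] & powered) == prereq[d]   # all prerequisites already on
            if not (powered & bit) and ready:
                total += orders_from(powered | bit)
        return total

    return orders_from(0)


print(count_power_up_orders(10, set()))                           # unconstrained: 10!
print(count_power_up_orders(10, {(0, d) for d in range(1, 10)}))  # hypothetical: domain 0 always first
```

Ordering constraints prune the space (here by a factor of ten), but even ten domains leave hundreds of thousands of legal sequences.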

Power islands must be balanced with the number of cores and the ability to verify the design. At least some of these techniques are needed. Bhanu Kapoor, founder of Mimasic, a consultancy in Richardson, Texas, said that clock gating sufficed as a way of controlling dynamic power until 90nm. But he said from 65nm on, every trick is needed.

“I’ve seen 5 to 9 power islands as the most common number,” Kapoor said. “The largest I’ve seen is from Renesas, which had 23. They had an interesting hierarchical power management scheme. But they’re not all independent [power islands].”

He noted that Nvidia is working on chips with up to 500 cores for graphics processing, which is one of the very few highly parallelizable mainstream applications. He said each power island on an Nvidia chip may control 24 cores.

In addition, there are diminishing returns for power islands. While shutting down functions clearly can save battery power by limiting static leakage, waking up and managing power islands also consumes power, both in the state changes themselves and in managing those various states.
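One common way to reason about these diminishing returns is a break-even time: shutting an island down pays off only if it stays idle long enough for the leakage saved to exceed the energy spent powering down, saving state, and waking up again. A minimal sketch with hypothetical energy and leakage figures:

```python
def break_even_idle_s(transition_energy_j, leakage_saved_w):
    """Idle time after which gating saves more leakage energy than the shutdown/wake-up cycle costs."""
    return transition_energy_j / leakage_saved_w


def should_power_gate(expected_idle_s, transition_energy_j, leakage_saved_w):
    """Gate only if the island is expected to stay idle longer than the break-even time."""
    return expected_idle_s > break_even_idle_s(transition_energy_j, leakage_saved_w)


# Hypothetical island: 2 uJ to power down, save state, and wake up again; 10 mW of leakage avoided while off.
E_TRANSITION_J = 2e-6
P_SAVED_W = 10e-3

print(f"break-even idle time: {break_even_idle_s(E_TRANSITION_J, P_SAVED_W) * 1e6:.0f} us")
print(should_power_gate(1e-3, E_TRANSITION_J, P_SAVED_W))    # idle 1 ms: worth gating
print(should_power_gate(50e-6, E_TRANSITION_J, P_SAVED_W))   # idle 50 us: costs more than it saves
```

The more islands a design has, the smaller each one’s leakage savings tend to be relative to its management overhead, which pushes the break-even time up.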

For most design engineers, though, power islands are a relatively new concept. While the largest semiconductor companies have been working with them since 90nm, most of the work has been experimental.

ARM has had power domain test chips since the 130nm node, but most customers never really began thinking about them until the 90nm node. They’re now starting to hit production in high-volume applications such as smart phones, where turning off functions is essential for preserving battery life.

“A lot of times people will settle for two power domains: here’s the CPU and here’s everything else,” said Rob Aitken, R&D fellow at ARM. “We’ve been interested in the question of how many domains per CPU. We have settled on two. It’s only the more recent cores architected with cores in mind.”

ARM has been demonstrating its 1176 processor cores with state retention, but Aitken said there’s a question of whether design engineers will want to keep everything in the same state. He said that with state retention, there are no more than two power domains per CPU.

“It limits architecturally the things the processor ought to do. There’s also a concept that if you can do it in a nice way that’s transparent to the rest of the methodology, then you can have more. If your RAM had a power switch and it didn’t interfere with anyone’s verification or regulators, then you could put switches on it and buy people something. The added complexity of these domains limits what you can do before you throw up your hands in despair. The limits are in the 20s,” he said.

Less Room For Error

Wednesday, May 13th, 2009

By Ed Sperling

Say goodbye to fat design margins in advanced SoCs. The commonly used method of adding extra performance or area into semiconductors to overcome variability in manufacturing processes or timing closure issues has begun to create problems of its own.

While there was plenty of slack available at 90nm, adding margins at 45nm and 32nm disrupts performance or eats into an increasingly tight power budget, or both. And while this may seem like a relatively simple problem-solving exercise, margins are to a design engineer what a safety net is to a high-wire acrobat. They allow engineering teams to get to market on time and on budget, with an incredibly small number of bugs considering the complexity of current designs.

Cutting margins means substantially more up-front modeling and much more work in figuring out where the variability is in new manufacturing processes. It also means potentially more restrictive design rules and less creativity at the very front end of Moore’s Law.

Different approaches

“At 45nm and 32nm, you can’t put a margin on everything because your performance would go to zero,” said Rob Aitken, a research fellow at ARM. “For the relationship between design and low power, there are two approaches being advocated. One is to do a better job quantifying the margins. Instead of putting a finger in the air and saying, ‘Let’s worst case this and worst case that,’ the solution is more, ‘Let’s actually look at data and figure out where the worst cases lie, look for correlations and relationships between the amount of timing slack we have and our verification extraction methodology. Maybe we can use a better extraction technique and shave off some of that margin.’”

A second approach is a more adaptive one, where you know there will be some margins but you don’t know exactly what they are. “When you get your silicon you have adjustable parameters, whether they’re voltage or clock frequency or something else, that you can tune on a per-chip basis to boost up yield and achieve margin without necessarily putting it in the design,” Aitken said.
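The adaptive approach Aitken describes can be sketched as a per-chip search: test each part at progressively higher supply voltages and keep the lowest setting that passes, instead of running every part at a worst-case voltage. In this minimal sketch, the candidate voltages and the per-chip minimum working voltages are hypothetical stand-ins for real silicon test results.

```python
from typing import Callable, Iterable, Optional


def tune_supply(passes_at: Callable[[float], bool], candidate_volts: Iterable[float]) -> Optional[float]:
    """Return the lowest candidate supply voltage at which this chip passes, or None if none does."""
    for volts in sorted(candidate_volts):
        if passes_at(volts):
            return volts
    return None


# Candidate settings from 0.80 V to 1.10 V in 50 mV steps (rounded to avoid float drift).
CANDIDATES = [round(0.80 + 0.05 * i, 2) for i in range(7)]

# Hypothetical per-chip minimum working voltages standing in for real silicon test results.
chip_vmin = [0.83, 0.91, 0.97, 1.05, 1.20]

settings = [tune_supply(lambda v, vmin=vmin: v >= vmin, CANDIDATES) for vmin in chip_vmin]
print(settings)  # the last chip never passes, so it would be binned out or rejected
```

A fixed worst-case design would run all of these chips at 1.10 V; per-chip tuning lets the faster parts run lower and recovers that margin in silicon rather than in the design.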

There are other approaches being advocated, as well. Bhanu Kapoor, founder of Mimasic, a consultancy in Richardson, Texas, said building workarounds such as classic fault tolerance into chips is an acceptable option.

“We need to start learning to live with errors,” Kapoor said. “Margin-related issues will lead to errors and they will not function correctly at times. That’s where you have to bring in techniques like fault tolerance, where you have error correction. That is a very useful technique for low power, too, because you can work at lower voltages. There will be times when your critical path timing will not be met and you will have errors. Then you try to detect the errors, correct them and learn to live with them.”
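Kapoor does not name a particular error-correction scheme. As one classic illustration, not necessarily what any low-voltage design uses, here is a Hamming(7,4) code: three parity bits protect four data bits, so any single flipped bit (for instance, a value corrupted when a critical path misses timing) can be located and corrected.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword laid out as positions 1..7: [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # the syndrome spells out the position of the bad bit
    if pos:
        c[pos - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                     # simulate an error flipping bit position 5
print(hamming74_decode(word))    # -> ([1, 0, 1, 1], 5)
```

The cost is three extra bits per four data bits plus the encode and decode logic, which is the overhead a design trades for being able to run closer to the edge.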

Still others say there should be no workarounds. Vinay Srinivas, group director for R&D at Synopsys, said the solution is eliminating variability up front so there is less need for margins and far fewer errors.

“You need better tools, modeling and methodology,” Srinivas said. “Having these guardbands is not acceptable. If you were to guardband everything when the system wakes up you would have so much latency that you couldn’t afford it in the design. At 45nm and 32nm, you need more voltage-aware modeling.”

What works?

While companies such as Synopsys are pushing for better designs up front, the majority of designs will still include some design margins—at least in the short term. Hamid Mahmoodi, assistant professor of electrical and computer engineering at San Francisco State University’s School of Engineering, said there are times when each approach works.

“There is a lot of variability and unpredictability in designs,” Mahmoodi said. “Adding margins is the easiest way to solve that. You can make the design faster than expected by adding in additional biasing or something to cope with the variation in processes. But adding margin means more silicon area and more power. There is cost in terms of additional sensors or voltage regulators. Even corrective action requires overhead.”

Sometimes, in fact, adding margin can be the most cost-effective solution.

“In a given process, which is more cost effective depends,” Mahmoodi said. “If the variability is small, adding margins is the most cost effective solution. When the variability is large, and there are variations in process parameters and voltage, then adding margins is too expensive. At that point, it’s best to consider fault tolerance schemes or adaptive asset calibration methods to make the design more reliable.”
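Mahmoodi’s crossover can be illustrated with a deliberately simple model. A guardband sized to a fixed number of standard deviations grows with the variability, while an adaptive or fault-tolerant scheme carries a roughly fixed overhead for sensors, regulators, or correction logic. Every constant below (3 sigma, the unit cost of margin, the 8% adaptive overhead) is hypothetical.

```python
def margin_overhead(sigma_frac, k_sigma=3.0, cost_per_unit_margin=1.0):
    """Overhead of guardbanding delay by k_sigma standard deviations (fraction of nominal)."""
    return cost_per_unit_margin * k_sigma * sigma_frac


def cheaper_approach(sigma_frac, adaptive_overhead=0.08):
    """Pick the lower modeled overhead; adaptive_overhead stands in for sensors, regulators, correction."""
    return "margin" if margin_overhead(sigma_frac) <= adaptive_overhead else "adaptive/fault-tolerant"


for sigma in (0.01, 0.02, 0.05):
    print(f"sigma={sigma:.0%}: {cheaper_approach(sigma)}")
```

With these assumed numbers the crossover sits near 2.7% variability. The real crossover depends entirely on a process’s actual variation and on what the adaptive hardware costs.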

Conclusion

The bottom line is that even the experts disagree on what route to take when. That largely will be up to the design teams working under intense deadlines to get their chips out the door. But at each new process node, there clearly is less room for adding margins and more restrictive design rules for getting chips to yield properly and perform as planned within power limits defined by customers. And if you think it’s hard at 45nm, it’s only going to get more difficult over the next couple of nodes.

Lower Power, Bigger Problems

Wednesday, May 13th, 2009

By Ed Sperling

Low power used to be an afterthought in semiconductor design, and it was almost never a consideration in verification or manufacturability. But at each new process node, the number of power considerations goes up as the line widths go down.

To begin with, there are two basic types of power. The first is dynamic, which has been a consideration ever since batteries were added into devices. Dynamic power is the power consumed when transistors switch, which is to say when a device is doing something useful. And while components continue to get more efficient, those improvements typically are measured in single-digit percentages.

Much bigger gains come from more efficient use of those components, particularly turning them on and off. At 130nm and above, turning off components was a “nice to have.” Below 130nm, it’s a requirement because of static power consumption—the current that leaks out of transistors that are left in the “on” state when they’re not being used.
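To a first order, dynamic power is activity × C × V² × f and static power is V × I_leak. The sketch below uses hypothetical block parameters to show why, in a mostly idle device, leakage during idle time can dominate total energy unless the block is actually shut off. It assumes zero residual leakage when gated, which real power switches do not quite achieve.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def static_power_w(vdd_v, leakage_a):
    """Leakage power: P = V * I_leak (first-order)."""
    return vdd_v * leakage_a


def hourly_energy_j(active_fraction, p_dyn_w, p_static_w, gated_when_idle):
    """Energy over one hour; idle-time leakage disappears only if the block is shut off (assumed fully)."""
    active_s = 3600 * active_fraction
    idle_s = 3600 - active_s
    energy = (p_dyn_w + p_static_w) * active_s
    if not gated_when_idle:
        energy += p_static_w * idle_s
    return energy


# Hypothetical block: 10% activity factor, 1 nF switched capacitance, 1.0 V, 500 MHz, 10 mA leakage.
p_dyn = dynamic_power_w(0.1, 1e-9, 1.0, 500e6)   # about 0.05 W
p_stat = static_power_w(1.0, 10e-3)              # about 0.01 W
print(hourly_energy_j(0.05, p_dyn, p_stat, gated_when_idle=False))  # left on while idle
print(hourly_energy_j(0.05, p_dyn, p_stat, gated_when_idle=True))   # shut off while idle
```

Active 5% of the time, this block spends about 45 J per hour if left on and about 10.8 J if shut off while idle. Clock gating would not close that gap, because it stops switching but not leakage.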

The effects are easy to see when nothing is done at different nodes versus remediation with power shutdowns, as the following diagrams show:

[Diagrams: power consumption at different process nodes with no action taken versus with power shutdowns. Source: Mimasic]

Bhanu Kapoor, founder of Mimasic, a Richardson, Texas-based consultancy, said during a recent speech that there are several interdependencies in static leakage that need to be considered.

“Leakage has a linear relationship with the supply voltage,” Kapoor said. “Leakage also has sub-threshold and gate-tunneling components, which have been growing exponentially with respect to the threshold voltage. Gate tunneling is being addressed with high-k materials and metal gate technology. The sub-threshold component is still there.”
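The exponential dependence Kapoor mentions runs in the direction that hurts scaling: subthreshold leakage rises by one decade for every "subthreshold swing" worth of reduction in threshold voltage, while leakage power at a given leakage current is simply linear in supply voltage (ignoring effects such as drain-induced barrier lowering). A minimal sketch; the 100 mV/decade swing is a typical assumed value, not a figure from the article.

```python
def subthreshold_leakage_ratio(vth_reduction_mv, swing_mv_per_decade=100.0):
    """Factor by which subthreshold leakage rises when Vth drops by vth_reduction_mv.

    Leakage changes by one decade per `swing_mv_per_decade`; about 60 mV/decade is the
    room-temperature limit for conventional MOSFETs, and real devices are somewhat worse.
    """
    return 10 ** (vth_reduction_mv / swing_mv_per_decade)


def leakage_power_w(vdd_v, leakage_current_a):
    """First-order leakage power, linear in supply voltage for a given leakage current."""
    return vdd_v * leakage_current_a


for drop in (50, 100, 200):
    print(f"Vth lowered by {drop} mV -> {subthreshold_leakage_ratio(drop):.1f}x leakage")
```

Lowering Vth by 200 mV to regain speed at a lower supply voltage multiplies subthreshold leakage roughly a hundredfold under this assumption. Shutting the circuit off is the only technique that removes it.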

Old problem, new tricks

Addressing static leakage is absolutely essential at 90nm and beyond. Kapoor said that at 90nm, leakage amounts to 20% to 30% of the total power consumed by a device, and techniques such as clock gating have no impact on static leakage. The only thing that really has an effect is shutting down portions of a chip or device that are not in use. So while a cell phone also has a camera, games and music, the only function that has to be on all the time is the ability to receive calls.

“There is a strong dependence on power with respect to voltage, and there are several techniques to use voltage to get a handle on power consumption,” he said. “A typical application like a cell phone has times when you use applications and there are long periods of standby when the device is not in use.”

Voltage also can be scaled for a specific function or application, such as those on a cell phone, an approach that is beginning to make its way into heterogeneous multicore design. The basic idea is that you have a fixed power budget for a design, and you can better utilize that budget if all the cores aren’t running at the same voltage. The phone function may need more power than the camera, for example, so the core design can be changed to reflect that. Similarly, logic and memory for one function may be significantly smaller than for another.
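The payoff comes from the V² term in dynamic power. If a core’s clock frequency is also lowered roughly in proportion to its voltage (a first-order assumption), its dynamic power falls with about the cube of the voltage. A minimal sketch of per-function voltages under a shared budget; the function names and voltage fractions are hypothetical.

```python
def relative_dynamic_power(v_scale, f_scale=None):
    """Dynamic power relative to nominal: (V/V0)^2 * (f/f0).

    If f_scale is omitted, frequency is assumed to track voltage (a first-order assumption).
    """
    if f_scale is None:
        f_scale = v_scale
    return v_scale ** 2 * f_scale


# Hypothetical heterogeneous design: each function's core runs at its own fraction of nominal voltage.
cores = {"phone": 1.0, "camera": 0.7, "audio": 0.6}
per_core = {name: round(relative_dynamic_power(v), 3) for name, v in cores.items()}
print(per_core)
total = sum(relative_dynamic_power(v) for v in cores.values())
print(f"total {total:.3f} vs {len(cores):.1f} if every core ran at nominal voltage")
```

Under these assumptions the three cores together draw about half the dynamic power they would at a single nominal voltage, which leaves headroom in a fixed budget.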

Verification challenges

But even after a design team has done everything to minimize power consumption, the problem is far from solved. Verification, which accounts for 70% of the time spent in chip design, gets significantly more complicated as each of these new tricks is implemented. There are now a lot of different power states in the design, and there are voltage islands that can be on, off, or somewhere in between. Intel, for example, has seven sleep states in its core processors.

“What all of these techniques did was introduce voltage as a variable in the design process,” Kapoor said. “Verilog and VHDL, or any other language, don’t have a notion of voltage as a variable. You need additional description to go along with the functional descriptions to describe the power management architecture. That has led to a power architecture description format that is being standardized through IEEE’s 1801 working group. But in terms of verification, now that you are powering down different regions of the chip you need to isolate those and retain values when you are powering down. When you are going from one region to another you need to level shift these signals. And all of these things need to be validated.”

He said that requires protocols for shifting things on and off, potentially changing the design to allow for verification, and some formal assertions for when one area powers down and what effect it has on other areas of the design.
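Two of the rules Kapoor lists can be expressed as simple structural checks on the signals that cross between domains: outputs from a domain that can power off need isolation, and signals between domains at different voltages need level shifters. The sketch below uses a hypothetical data model, not IEEE 1801 (UPF) syntax. It is conservative (it assumes a receiving domain may be on while the source is off) and ignores retention and power-up sequencing, which real tools also verify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float        # nominal supply, volts
    switchable: bool      # True if the domain can be powered off


@dataclass(frozen=True)
class Crossing:
    signal: str
    src: Domain
    dst: Domain
    isolated: bool = False
    level_shifted: bool = False


def check_crossings(crossings):
    """Return readable violations of two basic power-intent rules."""
    problems = []
    for c in crossings:
        # Conservative: assume the destination may be on while the source is off.
        if c.src.switchable and not c.isolated:
            problems.append(f"{c.signal}: {c.src.name} can power off but its output is not isolated")
        if c.src.voltage != c.dst.voltage and not c.level_shifted:
            problems.append(f"{c.signal}: {c.src.name}->{c.dst.name} crosses voltages without a level shifter")
    return problems


# Hypothetical domains and crossings.
ALWAYS_ON = Domain("always_on", 1.0, switchable=False)
CPU = Domain("cpu", 0.9, switchable=True)
CAMERA = Domain("camera", 1.0, switchable=True)

problems = check_crossings([
    Crossing("cpu_irq", CPU, ALWAYS_ON, isolated=True),         # missing level shifter
    Crossing("cam_data", CAMERA, ALWAYS_ON),                    # missing isolation
    Crossing("cpu_cfg", ALWAYS_ON, CPU, level_shifted=True),    # clean
])
print("\n".join(problems))
```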

On, Off and Mostly Off

Friday, March 13th, 2009

By Ed Sperling

System-on-chip architecture has always been about getting the most performance out of a device, and the basic premise is that when you turn on a device it is always on.

That approach has been challenged over the past few years with a fundamental shift toward more of the design being in the ‘off’ position. Aside from reversing decades of engineering practices and assumptions, that accomplishes a couple of very significant things.

First of all, with static leakage a persistent issue in all devices at 90nm and below, the simplest thing to do from the standpoint of the device’s power budget is to turn parts of a chip completely off. That has become the norm in most designs, which is why the number of power domains is growing. Some use different voltages, some are turned off completely when not in use, and still others are reduced to various levels of standby, depending upon how quickly they need to return to a full “on” position. All of this saves battery life in handheld devices, and it saves power in large racks of servers in data centers.

From an architectural standpoint, the key concern has been prioritization of function and what is most important to the consumer. In a smart phone, for example, the phone must be able to receive a call at all times while data needs to be uploaded regularly but not in place of a phone call. And a camera can be switched off almost all the time. In a television or computer, almost all functions are on at all times, but in a more acceptable state. The long delay in booting up a computer from scratch or waiting for a television to warm up was considered unacceptable by consumers so a standby mode was added, basically giving priority to their time while reducing energy consumption.

In the future, however, more of the device will move to the off position, regardless of whether it’s a home appliance, a computer in the home or in the corporate enterprise, or a handheld device with limited battery life. Work is underway to develop intelligent devices that reside inside plugs so that once devices are fully charged they no longer draw current.

Ferroelectric memory (FeRAM) is another option in devices. Its construction is similar to DRAM, but it uses a ferroelectric layer rather than a dielectric one. The advantages are lower power draw, higher speed and more write-erase cycles. So far, cost has been a deterrent, but with power now at a premium in designs, experts believe there is some hope that FeRAM could grow as part of an overall low-power design.

The more immediate solution, however, is multiple power islands, according to Bhanu Kapoor, founder of Mimasic, a low-power design services company. The problem comes when you turn those islands on and off.

“It’s not hard to imagine a situation where you go from ‘standby’ to ‘on’ and then to a large portion of the chip being ‘on,’” said Kapoor. “That can lead to voltage spikes on the device, and it gets worse as you move to many-core computing where you have a large number of processing cores.”

He noted that Nvidia is developing a 512-core graphics chip that is highly parallel with cores divided into groups of 24—a many-core approach as differentiated from a multicore approach. That could create as many as 30 power islands, however, and he said each of those islands has to be sequenced to avoid huge power spikes. From a design standpoint, that is no simple task.
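Sequencing trades wake-up latency for a lower peak current. A minimal sketch, assuming each island draws a fixed inrush current for a fixed ramp time while its rail charges, and that at most a chosen number of islands may ramp at once; the 30 islands, 0.25 A, and 100 ns figures are hypothetical.

```python
def staggered_schedule(num_islands, max_concurrent, ramp_ns):
    """Turn-on time (ns) for each island when at most max_concurrent islands ramp up together."""
    return [(i // max_concurrent) * ramp_ns for i in range(num_islands)]


def peak_inrush_a(schedule, ramp_ns, inrush_a):
    """Peak total inrush, assuming each island draws inrush_a for ramp_ns after its turn-on time."""
    peak = 0.0
    for t in set(schedule):  # the total only rises at a turn-on instant, so checking those suffices
        ramping = sum(1 for start in schedule if start <= t < start + ramp_ns)
        peak = max(peak, ramping * inrush_a)
    return peak


# Hypothetical: 30 islands, each pulling 0.25 A of inrush for 100 ns while its rail charges.
ISLANDS, RAMP_NS, INRUSH_A = 30, 100, 0.25
for group in (30, 5):
    sched = staggered_schedule(ISLANDS, group, RAMP_NS)
    done_ns = max(sched) + RAMP_NS
    print(f"{group} at a time: peak {peak_inrush_a(sched, RAMP_NS, INRUSH_A):.2f} A, all on after {done_ns} ns")
```

Waking five islands at a time cuts the peak inrush sixfold in this sketch, but the last island comes up six times later, so the group size becomes a design knob between supply noise and wake-up latency.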