

Experts At The Table: Power Budgeting

Friday, June 3rd, 2011

By Ed Sperling
Low-Power Engineering sat down with Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics; Cary Chin, director of technical marketing for low-power solutions at Synopsys; Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions; Matt Klein, principal engineer for power and broadcast applications at Xilinx; and Paul van Besouw, president and CEO of Oasys Design Systems. What follows are excerpts of that conversation.

LPE: Where are the gotchas in design?
Klein: There’s a good Las Vegas analogy for place and route. Even if there are 50-50 odds, you’ll lose because you don’t have an infinite supply of money. It’s similar with capacitance, and you have the same thing with RTL code. You can’t connect to an infinite number of places using the shortest distance, so you need to make tradeoffs. What is the V² and what is the F of this group? Should you allow more capacitance here, or less? You can’t connect to anything in zero time.
Chin: Aside from the obvious V, which is squared, it’s the F that’s important. C is an implementation detail that is impacted by the other variables, but F is the most important. It’s the switching activity. The better you understand that switching activity, the more you can affect how this thing operates and when it operates. The biggest cause of starting with a power budget and later being surprised is usually a bad assumption about F. At a very high level we can understand that. A lot of things we intend to be mostly ‘off’ aren’t quite so ‘off.’ Even though your phone is sleeping, it isn’t quite sleeping. When you move around, accelerometers wake up even though the screen is still black. This is leakage with a small ‘L.’ It’s not static power. It’s operations happening at the device level that aren’t doing anything useful. It’s high-level leakage of power that isn’t accounted for in the implementation flow because we assume it’s ‘off.’
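Chin’s point about F can be made concrete with the standard switching-power estimate P = αCV²f, where α is the switching activity. The sketch below is illustrative only: the capacitance, voltage, frequency and the 2% residual activity are hypothetical numbers, not figures from the panel. It shows how a block budgeted as fully ‘off’ (α = 0) contributes nothing on paper but a measurable amount once a little residual toggling is counted.

```python
def dynamic_power(activity: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Switching-power estimate P = alpha * C * V^2 * f, in watts."""
    return activity * cap_farads * vdd ** 2 * freq_hz


# Hypothetical block: 2 nF of switched capacitance at 0.9 V, clocked at 500 MHz.
C, V, F = 2e-9, 0.9, 500e6

assumed = dynamic_power(0.0, C, V, F)  # budget assumes the block is fully "off"
actual = dynamic_power(0.02, C, V, F)  # 2% residual toggling while "asleep"
print(f"assumed {assumed * 1e3:.1f} mW, actual {actual * 1e3:.1f} mW")
```

Because activity enters the equation linearly, any error in the assumed α carries straight through to the power estimate, which is why a bad assumption about F is so costly.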
Klein: We don’t assume it’s going to be off because in the FPGA we don’t have islands where we turn off an entire function.
Chin: But in terms of activity, we’re assuming there’s not going to be any toggling of activity, and therefore no dynamic power.
Klein: In regards to noise, sometimes you can create an algorithm that you didn’t expect to be generating too much noise. But I’ve worked with customers where they’re getting noise and they can’t figure out why. We’ve said, ‘It looks like you’re doing four cycles and then rest and then four cycles and then rest.’ And they are doing that to match rates of blocks. That produces noise, and noise can produce power, as well. Sometimes those require algorithmic readjustments.
Kulkarni: We’re finding that the glue is missing between these areas. As the problems get more complex with Moore’s Law, architectural designers have no way to talk with SoC designers and IP designers. Sometimes the IP is designed by another company or in another country. And then, finally, there’s the package at the end of the chain, the PCB and the DDRs. DDR4 is going between 1.4 and 1.6GHz, and it’s creating tremendous noise. It’s creating overall power budgeting issues that transcend just the SoC. You have to manage that off-chip.

LPE: That brings up a good point. How much of this is out of your control now as you bring in more third-party IP and software from other vendors?
Van Besouw: Most of the IP that people bring in right now is RTL-based. It’s soft. IP may even go to the system level, as well. But you have to question whether it’s a budgeting issue going from RTL or whether it is knowing how the power consumption will be once you go to silicon. One of the reasons system-level design isn’t everywhere is that it’s not connected to the rest of the design flow. We need to create tools to make that connection. You need to make the right choices at the right time so you can still make a big impact on those variables. That’s the key. It’s more about automation than trying to impose more methodologies around the limited tools we have today. EDA has allowed that, and it has affected innovation.
Kulkarni: It’s our responsibility as a tools vendor to allow cross-domain enablement. How do we connect these static blocks of companies, groups, divisions, or the knowledge base? We need to encapsulate those models—and modeling is an important part of the power budgeting equation. The power models may contain a die model for a power grid, current sources, power profiles and the parasitic capacitances. We need to encapsulate that, along with the right frames that are selected from a stimulus, which can then include IP—including hard IP and hard analog IP—plus RTL that’s configurable. Then you need to create a chip-power-model. The job of a package engineer becomes easier. The job of an IP engineer becomes easier. And the job of an RTL engineer becomes easier. The models are encapsulated in the flows and the tools themselves.
Pangrle: There are certainly challenges there, but it opens up opportunities for everybody here to provide better methodology in tools. There’s also room for improvement in how the IP is being delivered to the customer. There’s a lot more information that can be included with the IP to make everybody’s life easier.
Kulkarni: There has been a lot of work in the functional and timing areas, but power is where those areas were 20 years ago.
Van Besouw: Power is like an afterthought.
Klein: There is some flexibility in FPGAs to alter the power afterwards, but you have to design up front the parts that can’t be altered after the fact, such as the process choice, static power, and what user functionality you have for power.
Van Besouw: But even configurable approaches have limitations.
Klein: The attitude about timing is different than power. When the timing doesn’t do what it says, the design fails. Nobody will accept that. But if the power is 10% high, the attitude is, ‘Oh well, I didn’t really think I could predict that accurately, anyway.’ With timing if you’re off by 10% it doesn’t work.
Van Besouw: That’s changing for power.
Klein: Yes, it is getting that way.
Chin: On my phone I’m tolerant of dropping calls, and I’m becoming more tolerant of waiting for my phone to do things. But it’s really a problem when it’s out of power. From a mobile device standpoint, that’s like getting the wrong answer on your calculator. Power is the most important thing on a mobile device.

LPE: So how is this changing the design priorities?
Kulkarni: At least 10 companies we talk to all compete on power. They spec it that way.
Klein: We’re seeing the same thing. Our customers are competing with each other on power. They never were before.
Chin: That’s really where the problem is. You know the applications you want to run, but there’s no good way of determining whether it’s 5 watts for one application or 1 watt for another. No one can really tell. We don’t have a hard limit on power. We can’t say if it’s 1.1 watts or 1.2 watts to view a YouTube video, because there are lots of other things going on.
Pangrle: There are lots of other variables to play with. The reason power is becoming more important is that if you’re using a certain package and you know what the cost of that package is, if you’ve blown your power budget and need another more expensive package you may not have a market anymore. You may price yourself out of the market. On the other hand, you can suffer a little on performance to keep your cost within an acceptable range. If you look at applications such as graphics, how accurate do you have to be? If you slip a few frames a second the end user isn’t going to notice.
Van Besouw: It’s like how far can you go on a gallon of gas. It’s not a matter of how fast you can go. Power is the same thing. How long can you talk on your phone?
Chin: And given that battery technology is relatively equal across the vendors, it’s all about power efficiency.

Experts At The Table: Power Budgeting

Friday, May 20th, 2011


LPE: What’s the best practice for dealing with power in complex chips?
Van Besouw: In today’s chips, with 50 million or 100 million gates, it’s inconceivable that the whole chip is functioning at the same time. You have to make decisions at the architectural level as to what’s on and off and which voltage islands to use. It’s another level of complexity. We’re just scratching the surface of trying to manage that. It’s one thing if you’re doing it on a small design. It’s another if you’re doing it on a complex SoC. There’s still a lot of headroom.
Chin: There are two separate things we’re dealing with. One is high-level complexity. Everyone agrees you have more leverage at the architectural level. But if you look at dynamic power optimization, a lot of what we can do at the implementation level we can do better if we understand what modes of operation we’re trying to optimize for. If you’re trying to optimize your silicon—even your physical implementation for a specific case—then you can optimize for that case very well. If we can take our optimization and weight it in the direction we want, whether it’s 75% of the time or the maximum operating condition or anything else, understanding those things has become very important. We have ways of optimizing our tools for activity constraints. We can put in specific vectors. All of these things can lead to different implementations, both logically and physically, of the actual circuit. Circuits can be optimized for certain conditions. But the problem I’m seeing today is even in the few cases where we have information about switching activity, there’s no context to tell us, ‘This is switching activity for this particular mode with these pieces powered down and these pieces running.’ We need to add that kind of information. That will make all of today’s tools work much better than they do today.
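Weighting optimization toward particular operating modes presupposes some model of how much time the device spends in each one. A minimal sketch, with hypothetical mode names, time fractions and per-mode power numbers, combines them into a time-weighted average alongside the worst-case mode, the two kinds of target Chin mentions.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    time_fraction: float  # share of operating time spent in this mode
    power_mw: float       # estimated average power in this mode, milliwatts


def weighted_average_power(modes: list[Mode]) -> float:
    """Time-weighted average power across modes, in milliwatts."""
    total = sum(m.time_fraction for m in modes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("mode time fractions must sum to 1")
    return sum(m.time_fraction * m.power_mw for m in modes)


def worst_case_power(modes: list[Mode]) -> float:
    """Power of the most demanding mode, in milliwatts."""
    return max(m.power_mw for m in modes)


# Hypothetical usage profile for a handheld device.
profile = [
    Mode("standby", 0.75, 5.0),
    Mode("audio", 0.15, 60.0),
    Mode("video", 0.10, 400.0),
]
print(weighted_average_power(profile), worst_case_power(profile))
```

In this made-up profile the video mode is active only 10% of the time yet contributes 40 of the roughly 53 mW average, so that is where mode-specific optimization would pay off most.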

LPE: Doesn’t that make it harder to devise a derivative chip because you’re optimizing for each market?
Chin: Not really. Imagine if you had a device that ran twice as long just for gaming. Or you have a phone that gives you three weeks of talk time if you don’t do any gaming. This is an opportunity from the design side and the manufacturing side. In an era of dark silicon, where we have more gates than we can power at any given time, we can customize that hardware. Let’s push in the direction of specific implementations that allow you to optimize for all these applications. It’s also a way for people to differentiate their products.
Kulkarni: We almost need to reverse the paradigm of design. Why should we have a processor architecture for an embedded processor and use it for various applications such as YouTube or e-mail or music? We should do it in reverse. What kind of architecture does Facebook need? There could be a ‘processor-Facebook’ and a ‘processor-YouTube.’ The stimulus is different, the power consumption is different, and stimulus is becoming more critical for power consumption. You essentially are taking your VCD power and analyzing your function based on your VCD set. The real problem is power, but we’re searching for solutions because some VCD set is available to you. So now you can start talking about power budgeting. What will happen to a power grid design? Will you create EMI problems or EM problems? Will it blow up while you’re using a Facebook application?
Chin: Today you can carry around a navigation system in your car that will communicate back into the cloud in real time so other people will know to avoid traffic congestion. Wouldn’t it be great if we could use that same technology to understand what modes in the chip are being used? That’s exactly what we want to know. Here is the switching activity that millions of people are using. We could write that today with EDA tools that can run on your phone. We don’t have that information. We need to customize more on usage all the way up to the application so we really can have a Facebook processor.
Pangrle: A lot of the architects have been focused on performance. It varies from application area to application area, but the guys doing processors with standard instruction sets know what they need to cover and how to get more performance. They need to extend what they’re doing to include both performance and power.
Chin: And they need the analysis for power. We do really good analysis for performance these days, but for power it’s a lot more nebulous. And to a certain extent we do performance analysis statically. But in the power realm, it depends on what you’re executing. At some point, when you’re averaging across too many cycles, you’re losing the information you need. Today it’s still important to look at dynamic capabilities, especially with power budgeting. What people want to know is how much resolution do you need. You need to know where all the peaks are. Today, power reminds me of dynamic simulation 20 years ago when people looked at timing of specific long paths.
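Chin’s warning about averaging across too many cycles can be illustrated with a synthetic per-cycle power trace (hypothetical values) containing a short burst. As the averaging window grows, the burst is diluted until it barely stands out against the quiet background.

```python
def window_averages(trace: list[float], window: int) -> list[float]:
    """Average per-cycle power over consecutive non-overlapping windows."""
    chunks = (trace[i:i + window] for i in range(0, len(trace), window))
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical per-cycle power trace (mW): 96 quiet cycles, then a 4-cycle burst.
trace = [10.0] * 96 + [200.0] * 4

for window in (1, 20, 100):
    print(window, max(window_averages(trace, window)))
```

At one-cycle resolution the 200 mW peak is visible; averaged over 20 cycles it appears as 48 mW, and over the whole trace as about 18 mW, which is the information loss Chin describes.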
Pangrle: Being able to look at different modes is important. But if everything comes in at the same activity level you can’t distinguish or optimize for that. Being able to capture at different modes what that activity is that’s related to that mode is a big help. The things that can be done at an architectural level will have an impact downstream, even in regard to the tools needed to get the job done. This is like multicorner-multimode at logic synthesis or place-and-route. You need to make sure you’re not just meeting performance, timing and power at one operating point. These things are operating across multiple voltages. You need to make sure you’re safe across all the process corners as well as all the operating modes. That’s having an impact on the downstream tools. I like the idea of having a different processor for a different application.

LPE: It seems that another large EDA company has pitched a similar idea.
Klein: That’s where the FPGA is unique. It is, by definition, meant for an arbitrary number of modes of operation. We can dynamically reprogram sections of the FPGA while other parts are running. You can change functionality based upon what you see coming in. Additionally, because we have a programmable device, we need to put levels of hierarchical things that people doing synthesis can take advantage of—or which smart designers who understand all the modes of operation can utilize. We have hierarchical levels of clock gating. You can globally gate off the clocks with a one or zero. At a regional level you can have multiple clocks to gate off tens of thousands of flip-flops or block RAMs or DSPs at one time. Then we have finer gating at the individual block level. Each of them has various benefits and deficits. We also look at whether the contents of this flip-flop will be consumed on the next clock cycle. If not, I can gate it off on that clock cycle only. Knowing the functionality would be helpful for more global analysis, but if we don’t put the hardware capability in there in the first place to gate off locally, regionally and globally, it won’t matter what the software does because we won’t have the hardware features to take advantage of it.
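The gating hierarchy Klein describes can be modeled behaviorally: a flip-flop sees a clock edge only on cycles where the global, regional and local enables are all active. This is a hypothetical Python model for illustration, not a description of Xilinx’s actual clock network, and the enable patterns are made up. The local enable is low on a cycle where, as in Klein’s example, the register’s next value would not be consumed.

```python
def delivered_clock_edges(global_en: list[int], regional_en: list[int],
                          local_en: list[int]) -> int:
    """Count cycles on which a flip-flop actually receives a clock edge.

    An edge reaches the flop only when every level of the gating
    hierarchy (global, regional, local) is enabled on that cycle.
    """
    return sum(1 for g, r, l in zip(global_en, regional_en, local_en) if g and r and l)


cycles = 8
global_en = [1] * cycles                  # device-wide clock running
regional_en = [1, 1, 1, 1, 0, 0, 0, 0]    # region idled for the second half
local_en = [1, 0, 1, 1, 1, 1, 1, 1]       # value not consumed after cycle 1
print(delivered_clock_edges(global_en, regional_en, local_en), "of", cycles)
```

Each level removes a different kind of wasted clocking: regional gating catches whole idle functions, while local gating catches single cycles that the coarser levels cannot see.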
Van Besouw: You have to make assumptions. The same functionality may be used on many different modes. That’s interesting because for one mode you may write completely different RTL. What you generate as an end product may be completely different. It may have different timing and physical constraints.
Chin: And these days that’s not beyond the realm of possibility. You can implement multiple modes because you have much more silicon than you can use. So why not have the Facebook processor as well as the gaming processor all on the same chip? You can power up the different sections based upon what you’re doing. In total you can save a lot more power. The tradeoff has always been timing and area. Now it’s timing, power and area, and area is probably third on the list these days. There are a lot of transistors on that chip. Figuring out what to do with them is something we’re having problems with. And the best way to control leakage is to shut things down. We’re starting to approach the more optimal implementations. It’s the reverse of resource sharing. There’s more and more hardware with specific functions.
Kulkarni: Besides power analysis, how do you refine the band of power? The assumptions you make at RTL almost always get thrown off the moment you go to clock-tree synthesis and place-and-route.

LPE: Meaning that when you take real measurements they’re not accurate?
Kulkarni: Yes, they may be off. So that means it’s not just the tools. Power budgeting is a set of tools and a methodology for the whole refinement from ESL to RTL to CTS to P&R and power-grid design. The capacitance can go haywire. Between the clock tree, what are the so-called source tree and leaves? What happens to the mesh clock structure if P&R tools play with it to do timing optimization? You can throw off all the assumptions you make at the RT level for power consumption unless there is a methodology to define power accuracy or inaccuracy. The plus or minus 30% should go down to 3% to 5% when you are doing final dynamic voltage-level signoff. The power intent will tell the tools what to do, but CPF and UPF do not tell you how to implement the low-power design.
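One way to read Kulkarni’s accuracy band is as a pessimistic check: an estimate only counts as meeting the budget if its upper bound, estimate × (1 + band), still fits. The sketch assumes a hypothetical 5 W limit and a 4.2 W estimate, and applies the ±30% and 5% figures he cites for early estimates and final signoff.

```python
def fits_budget(estimate_w: float, band: float, budget_w: float) -> bool:
    """True if the pessimistic end of the estimate's accuracy band meets the budget."""
    return estimate_w * (1.0 + band) <= budget_w


budget = 5.0     # hypothetical maximum power spec, watts
estimate = 4.2   # hypothetical power estimate, watts

for stage, band in [("early RTL estimate", 0.30), ("final signoff", 0.05)]:
    print(stage, fits_budget(estimate, band, budget))
```

The same 4.2 W number fails the check with a 30% band (upper bound 5.46 W) but passes with a 5% band (about 4.41 W), which is why narrowing the band matters as much as lowering the estimate.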

Experts At The Table: Power Budgeting

Thursday, May 12th, 2011


LPE: The ITRS road map points to serious problems ahead with power budgets. How do we solve that?
Chin: We want to be talking about power budgets at as high a level as possible. You have much more leverage for impacting power the sooner you can attack the problem. You want to look at the problem as far up in the architectural cycle as possible. That’s critical to power budgets. Another issue involves how we close in on the specific details. On the verification and test side we talked about vectors driving the tests. That’s been the case with power at a detail level, but not at the higher level. We need to better understand what are the operating modes, how we get into these modes, standards, how we specify power intent, and the implications from the hardware through the software stack, including the firmware, middleware, operating system, all the way up to the applications. Application software itself has a big implication on the power in the device. We need to be able to boil that down to the hardware and meet somewhere in the middle.
Klein: Static power has been a big driver since 90nm. Because it’s process- and temperature-dependent, it matters wherever chips are used in a rack with limited airflow. We see thermal and power as very important, and each generation we’ve gone further than the recommendations of the ITRS. We’ve had to reduce those by process choices and other techniques. But as we begin to drive down static power we have to consider dynamic power. As we grow the cores of the FPGA bigger and bigger you can have more things that are toggling. That’s where power optimization techniques come in. We’ve invested in post-synthesis power optimization through SPICE-level clock gating. And once you’ve dealt with all those things, you’re still left with I/O power. You have to look at what you can do with the architecture and how you can deal with that. It’s a multi-pronged approach between static, dynamic and I/O power.
Pangrle: A lot of what drives the power budget is the target market for the chip. If you look at high-performance microprocessors in servers, they have topped out under the 150-watt range. If you’re looking at a cell phone you may be limited to 1 watt. There’s a broad spectrum, and for each one of these there are a lot of networking applications. If it’s going on a board you may be looking at a total of 17 watts. What’s driving this is the total cost of the system, which includes the packaging and how you’re going to cool it. With high-level servers, if you start going above a certain level you have to start looking at liquid cooling rather than just using air and fans to cool them.

LPE: At one point even fans were considered exotic and too pricey, right?
Pangrle: Yes, that’s correct. It all comes down to cost, and at 90nm we’re seeing a real big impact in leakage current and static power. When digital watches first came out they were running at 9 volts and static power was practically non-existent. Threshold voltages were high and you didn’t even take those into account. When we got below 100nm we went to 1 volt. Even if you look at the 28nm process, the nominal Vdd is still in the range of 0.85 to 1 volt, so we’ve lost that scaling. If we’re following Moore’s Law and doubling what we’re putting on each chip at every new technology node, but the energy per device isn’t being halved, that creates some real issues. We need higher-level optimization and tradeoffs between hardware and software.
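Pangrle’s scaling argument comes down to simple arithmetic. Switching energy per device goes roughly as C·V²; if the device count doubles each node while V stays flat, only the capacitance reduction is left to offset it. The sketch assumes a constant clock rate and a hypothetical 0.7× capacitance scale per node.

```python
def power_growth_per_node(device_growth: float, cap_scale: float, volt_scale: float) -> float:
    """Relative chip switching power after one node, at a fixed clock rate.

    Energy per switching event scales as C * V^2, so the per-device energy
    factor is cap_scale * volt_scale**2; total power scales with device count.
    """
    return device_growth * cap_scale * volt_scale ** 2


flat_vdd = power_growth_per_node(2.0, 0.7, 1.0)    # supply voltage no longer scaling
scaled_vdd = power_growth_per_node(2.0, 0.7, 0.7)  # supply also drops by 0.7x

for nodes in (1, 2, 3):
    print(nodes, round(flat_vdd ** nodes, 2), round(scaled_vdd ** nodes, 2))
```

With a flat supply, power grows about 1.4× per node (roughly 2.7× over three nodes); if V could still scale by 0.7×, the same doubling of devices would actually lower power.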
Van Besouw: Power is the limiting factor in everything from small devices to set-top boxes for meeting performance goals. It’s a very complex process in determining how much power is going to be distributed to each smaller block. If you have hundreds of millions of gates that means you have hundreds of blocks. But when you distribute power that isn’t evenly distributed across the chip. It very much depends on the functionality of each block. You don’t know until you go down to the placement how much power is really being consumed. You want to do the power optimization at as high a level as possible—at the architectural level. You want to design this at the chip level, not at the low level, but there’s also another problem. You need details. You need to know what’s being used. That will determine how you implement RTL. It depends on the power requirements, and that impacts timing and placement. It’s a very connected problem. You want to make the right decision at the RTL level, but you need accurate information for placement. Floor planning, in turn, has an impact on timing closure and the timing characteristics, which includes the use of voltage islands. It’s like putting a 3D puzzle together where the shape and size of the puzzle is constantly changing.
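A first-pass split of a chip-level budget across blocks might simply follow each block’s estimated power, with some reserve held back for the surprises Van Besouw says appear once placement data arrives. The proportional rule, block names, numbers and 10% reserve below are hypothetical assumptions for illustration, not a method the panel described.

```python
def allocate_budget(chip_budget_w: float, block_estimates_w: dict[str, float],
                    reserve: float = 0.10) -> dict[str, float]:
    """Split a chip power budget across blocks in proportion to estimated power.

    A fraction `reserve` of the budget is held back for later refinement.
    """
    usable = chip_budget_w * (1.0 - reserve)
    total = sum(block_estimates_w.values())
    return {name: usable * est / total for name, est in block_estimates_w.items()}


estimates = {"cpu": 2.0, "gpu": 1.5, "modem": 0.5}  # hypothetical early estimates, watts
for block, watts in allocate_budget(5.0, estimates).items():
    print(f"{block}: {watts:.2f} W")
```

A proportional split is only a starting point; as placement and activity data refine each block’s estimate, the allocations would be recomputed and the reserve drawn down.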
Kulkarni: There is a difference between the classic Moore’s Law consumption and the Moore’s Law expectation. The trend is toward ‘More than Moore,’ or MtM, which creates demand for power budgeting. The Moore’s Law that has been scaling transistors and process geometries was really driven by timing and performance. The MtM roadmap shows this problem isn’t just ICs. It’s also 3D stacked ICs. When you look at all the new tablets and smartphones we’re looking at stacked ICs. The power budgeting is exacerbated by the MtM law, which is taking over now. Moore’s Law will continue, which is ‘More of Moore.’ There will also be ‘More than Moore,’ which is MtM. We see that in the mobile markets, which are 100% focused on power and noise. OEMs are defining a power spec at RTL and then asking chip vendors to bid for it. The chip vendors need to define a band of accuracy for power all the way through post-synthesis, clock re-synthesis, block placement, place and route, and then dynamic voltage drop. Power budgeting came about as an emerging challenge. You need to make sure you can deliver the 5 watts maximum that the customer is asking for. But you have to be careful because you also can create voltage drop issues downstream on a PCB once everything is signed off. How do you predict that with the right stimulus management?

LPE: How much can still be saved in power in 2D for a reasonable cost?
Klein: We think there’s a lot of room left. We have spent more and more time at each generation looking at ways to save power. There’s a lot of low-hanging fruit that you don’t necessarily think is there. Even in the dynamic power area there’s low-hanging fruit. We’ve implemented fine-grained clock gating, and smart software at the post-synthesis level can take advantage of that. Designers could also do it pre-synthesis. There are so many things people haven’t done yet that there is a lot available. We’ve also built headroom into our 28nm processes. We have a large number of parts we can offer at a lower voltage, which gives us the ability to lower dynamic power by 20% just based on the square of the voltage. Then you can still architect hard blocks, which will be much more power-efficient than FPGA soft logic. Because we’re coming from the FPGA space, there is significant room for improvement.
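The 20% figure follows from the V² term in dynamic power. Klein does not give the actual voltages, so the values below are hypothetical: with frequency and capacitance held fixed, dropping the supply by 10% cuts dynamic power by about 19%.

```python
def dynamic_power_saving(v_new: float, v_nominal: float) -> float:
    """Fractional dynamic-power reduction from a supply change, C and f held fixed."""
    return 1.0 - (v_new / v_nominal) ** 2


for v in (1.0, 0.95, 0.9):  # hypothetical supply voltages, nominal = 1.0 V
    print(f"{v:.2f} V -> {dynamic_power_saving(v, 1.0):.0%} lower dynamic power")
```

Because the saving is quadratic, even a 5% voltage reduction yields close to 10% lower dynamic power.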
Kulkarni: We’ve found there is significant room at the RT level. In the mobile area we found a sophisticated designer had done all he could for a quad-core design. In certain modes, there were three cores shut off and only one was running. But he found hot spots on those cores when they weren’t supposed to be running, so he investigated further. It turns out that the dynamic power, which is the relationship of four signals—data, clock, enable and reset—was not off completely. Data was circulating. Functional verification showed no problem. Formal verification showed no problem. But those three cores were consuming useless power. Once he found the problem, he reduced dynamic power by 22%. But there is no single button to push for that. RTL debug is becoming a way of finding those problems. That’s not really low-hanging fruit, though.
Pangrle: A lot of what you’re bringing up is that customers are new to active power management. That can get built into their flow, and it’s something that could be caught during verification. You can create assertions to catch those signals. People will find incremental ways to improve things from the RTL level down, but the real progress will be in looking at it from a system perspective.
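The kind of check Pangrle describes would normally be written as a property in an assertion language such as SystemVerilog Assertions. As a language-neutral sketch, the Python function below scans a sampled trace and flags cycles where a core’s data bus changes while its enable is low, the ‘data circulating’ symptom in Kulkarni’s quad-core example. The signal names and trace values are hypothetical.

```python
from typing import Sequence


def wasted_toggles(enable: Sequence[int], data: Sequence[int]) -> list[int]:
    """Return cycle indices where `data` changes while `enable` is low."""
    violations = []
    for cycle in range(1, len(data)):
        if not enable[cycle] and data[cycle] != data[cycle - 1]:
            violations.append(cycle)
    return violations


# Hypothetical sampled trace for one core that is supposed to be idle from cycle 2.
enable = [1, 1, 0, 0, 0, 0]
data = [0x3, 0x5, 0x5, 0x7, 0x7, 0x1]
print(wasted_toggles(enable, data))
```

Functionally the extra toggles are harmless, which is why functional and formal checks pass; a check like this targets power behavior rather than correctness.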

Power Noise Analysis For Next Generation ICs

Thursday, May 12th, 2011

In advanced technology nodes, SoC designs face complex power supply challenges driven by changes such as higher gate placement density, smaller wire and via geometries, and lower supply voltages in sophisticated, multi-layered packages and boards. The challenges associated with power delivery networks (PDN) anywhere on the die, package and board designs can be seen in all types of ICs, including low-power ICs, which have seen a surge in demand and applications due to the proliferation of handheld devices. These circuits employ several design techniques, both in the die and on the package, to control operational and standby power. The PDN design then has to contend with the transition between the various operating modes introduced by these circuit design techniques. If the package design combines several die together, either in a stacked or in a multi-chip module configuration, the impact of these various ICs has to be considered in the overall system PDN design. Devices that operate or combine elements of both high-performance and low-power circuits must address the concerns seen by both.


Extraction, Power And Final Silicon

Thursday, May 12th, 2011

By Ann Steffora Mutschler
As semiconductor technology scales down, manufacturing effects are coming front and center, putting constant pressure on design teams to make sure that silicon can be modeled through the extraction process while performing analysis accurately.

Extraction technology is one of the basic components needed to gain an accurate measurement of power, timing and signal integrity.

“Device characteristics are changing from what they used to be,” said Sudhakar Jilla, product manager for place and route at Mentor Graphics. “In the past a simple resistance and capacitance calculation of a given wire segment used to be sufficient to do any timing, power, signal integrity analysis. What we have seen, especially going from 65 into 45 and now 28 and even 20, is that the device has big fat short wires vs. thin, long wires. As a result, traditional programming methods are no longer sufficient to capture the electrical behavior of your wires.”

That means more corners to consider in each design, whether it’s power analysis, timing analysis or signal integrity analysis. And at each node, there are a lot more corners because there are more components, more interactions, and much more complexity. That complexity and the changes within a design aren’t necessarily taken into account in the overall tool flows, requiring the addition of device and parasitic extraction tools.

“The problems are getting a lot more complex in terms of extracted devices and the parasitics all at the same time, and making sure they are properly hooked up through the extraction,” explained Harish Kriplani, R&D group director in charge of power analysis and IR drop analysis at Cadence. “There are a lot of manufacturing effects that need to be taken into account, and that’s where the complexity is coming from in the flow.”

Data sizes exploding
The focus on manufacturing effects, the sizes of designs, and the drive for better extraction accuracy also has led to an explosion in data from extraction. Design teams are challenged to manage the data while still analyzing it in a reasonable amount of time. To handle the billion-plus pairs of elements in some of today’s cutting-edge chips, EDA tool providers have invested in parasitic reduction while also increasing the capacity and performance of solvers and equipping tools to leverage threading and parallel processing to compensate for long runtimes.

To address the need for more accurate modeling, EDA tool developers take into account issues such as corner capacitance, fringe capacitance and other electrical effects that weren’t critical at older process nodes. Because of proximity effects at smaller geometries, corner and fringe capacitance must now be examined more closely.

In addition, designers are running into huge brick walls when they don’t include the package in their modeling of the whole system: the inductance, the package and the capacitance on the chip. That creates a very different kind of power analysis environment, and if it is ignored, the chip may work fine on the tester but fail once it is plugged into the package, said John Kane, senior product marketing manager for timing and power tools at Cadence.

From an implementation and sign-off perspective, even when it comes to just the pure layout of the wires and the effects, the extraction model has become a lot more complex. “You need to look at not only the length and width of the wires; you need to look at the neighbors, the top and bottom, the layers below and above, to capture the complete effect of what you are talking about,” said Mentor’s Jilla. “Especially from an implementation perspective, every optimization that the engine tries needs to look at the timing, power and signal integrity effects. And this has ripple effects throughout the flow because all the pieces are tied together. Traditionally, most implementation systems looked at power and SI only after the timing was met. Now, as we go down the technology curve, what we are seeing is that all of these analyses must be concurrent.”

Going forward to 20nm and 14nm, technologies such as double patterning and triple patterning enter the picture, complicating matters further.

“The other axis of this whole thing is 3D. Now you’re basically adding one or two more levels of complexity to this whole modeling and to what needs to be considered during extraction. If we thought it was hard before, it’s going to get even more challenging going to 20nm and 14nm, especially with respect to power. The way our extractor works depends on where you are in the flow. The extraction accuracy and runtime tradeoffs change. That is needed because, for example, during global routing or in the floor-planning phase you just need an estimate of what the Rs and Cs might be. When you go toward the end of the flow and you have the actual detailed routes in place, you can extract more accurate numbers. It’s like a dial. In the beginning it’s accurate enough but very fast. Toward the end, after the implementation flow, when you have the detailed post-route data, you go for full accuracy.”
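The “dial” described above can be sketched as a small policy that picks extraction settings by flow stage. This is a minimal illustration under stated assumptions: the stage names, the ExtractionSettings fields and the three mode labels are hypothetical and do not describe Mentor’s or any other vendor’s extractor.

```python
from dataclasses import dataclass
from enum import IntEnum


class FlowStage(IntEnum):
    """Hypothetical implementation stages, ordered from earliest to latest."""
    FLOORPLAN = 0
    GLOBAL_ROUTE = 1
    DETAIL_ROUTE = 2
    POST_ROUTE_SIGNOFF = 3


@dataclass(frozen=True)
class ExtractionSettings:
    mode: str            # hypothetical label for the extraction engine mode
    coupling_caps: bool  # include neighbor-to-neighbor coupling capacitance
    field_solver: bool   # run a slow but accurate field solver on critical nets


def extraction_for(stage: FlowStage) -> ExtractionSettings:
    """Turn the accuracy/runtime 'dial' according to where we are in the flow."""
    if stage <= FlowStage.GLOBAL_ROUTE:
        # No detailed wires yet: estimate R and C from wire-length statistics.
        return ExtractionSettings("estimate", coupling_caps=False, field_solver=False)
    if stage == FlowStage.DETAIL_ROUTE:
        # Real routes exist: pattern-based extraction that accounts for neighbors.
        return ExtractionSettings("pattern", coupling_caps=True, field_solver=False)
    # Sign-off: full accuracy, runtime is secondary.
    return ExtractionSettings("full", coupling_caps=True, field_solver=True)
```

Keeping the policy in one function makes the accuracy/runtime tradeoff explicit: extraction_for(FlowStage.DETAIL_ROUTE), for example, returns pattern-based settings with coupling capacitance enabled but no field solver.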

Power budgeting needed
While extraction can provide extremely valuable information to the design team, there is another issue: next-generation chip architectures are no longer constrained by silicon area or library speed. What becomes critical is how much power they can afford to consume and how much heat they are allowed to dissipate. Chips must meet maximum power specification limits to close the widening gap between what the system will consume and what it can afford to consume while remaining competitive.

“Below 28nm, a lot of customers are requesting that we narrow the band for power budgeting,” said Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions. “Budgeting, very simply put, is about the different modes of operation on a cell phone, for instance. The power consumption varies dramatically because different stimuli are applied during different applications. Reflecting that in the power grid design is a very difficult challenge. How do you predict that at the RT level? And how do you select the high-power-consuming vectors from the millions of cycles you have?”
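One simple way to approach the vector-selection question is to slide a fixed-size window over a per-cycle power estimate and keep the window with the highest average power. The sketch below assumes such a per-cycle trace already exists (for example, exported from an RTL power analysis run); the function name and the trace values are hypothetical, and this heuristic is not presented as Apache’s method.

```python
from typing import Sequence, Tuple


def peak_power_window(cycle_power: Sequence[float], window: int) -> Tuple[int, float]:
    """Return (start_cycle, average_power) of the window with the highest mean power."""
    if window <= 0 or window > len(cycle_power):
        raise ValueError("window must be between 1 and len(cycle_power)")
    running = sum(cycle_power[:window])
    best_start, best_sum = 0, running
    for start in range(1, len(cycle_power) - window + 1):
        # Slide by one cycle: add the newest cycle, drop the oldest one.
        running += cycle_power[start + window - 1] - cycle_power[start - 1]
        if running > best_sum:
            best_start, best_sum = start, running
    return best_start, best_sum / window


# Hypothetical per-cycle power trace in milliwatts.
trace = [1.0, 1.2, 5.0, 6.0, 5.5, 1.1, 0.9]
start, avg = peak_power_window(trace, 3)
```

For this toy trace the hottest three-cycle window starts at cycle 2 and averages about 5.5 mW. The running sum keeps the scan linear in the number of cycles, which matters when the trace covers millions of cycles.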

There is a need to bring physical effects up in the design flow to predict what happens to the RTL power downstream—after synthesis, place and route, and clock-tree synthesis. A big technical challenge is making sure that the range of power numbers predicted early in the design are maintained throughout the design process—and also that the numbers collected early enough in the cycle actually work in final silicon.
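Checking that early numbers hold through the flow can be as simple as comparing each stage’s estimate against the budget band set at RTL. The sketch below is illustrative only: the stage names, the 500 mW budget and the ±10% band are assumptions, not figures from the article.

```python
from typing import Dict, List


def check_budget_band(estimates: Dict[str, float], budget_mw: float, tolerance: float) -> List[str]:
    """Return the stages whose power estimate falls outside budget * (1 +/- tolerance)."""
    low = budget_mw * (1.0 - tolerance)
    high = budget_mw * (1.0 + tolerance)
    return [stage for stage, p in estimates.items() if not (low <= p <= high)]


# Hypothetical estimates for one block, in milliwatts, as the design moves downstream.
estimates = {
    "RTL": 480.0,
    "synthesis": 505.0,
    "place_and_route": 540.0,
    "clock_tree": 590.0,
}
violations = check_budget_band(estimates, budget_mw=500.0, tolerance=0.10)
```

Here only the post-clock-tree estimate falls outside the 450 to 550 mW band, which is the kind of late surprise an early budgeting flow is meant to catch.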

Beyond the standard issues of timing closure with accurate and fast grid extraction, RC extraction will also play an important role in other parts of the overall power, noise and electromigration (EM) analyses. Capacitance (“C”) modeling of wires and clock structures can be a key element in modeling their impact at an early RTL power-planning stage, and can be part of a power budgeting flow. At the same time, the accuracy of the resistance (“R”) part of extraction is expected to play a role in EM analysis of the power grid as well as in signal EM, and it will be critical as the industry embraces 3D structures.
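To make the role of R concrete, the sketch below applies the textbook relations R = ρL/(W·t) and J = I/(W·t) to a single wire segment. It is a minimal illustration: the wire dimensions, the bulk-copper resistivity default and the EM current-density limit are assumed values, not data from any process or sign-off tool.

```python
# Bulk copper resistivity in ohm-meters. Thin on-chip wires are noticeably more
# resistive than bulk copper, so treat this default as an optimistic assumption.
CU_RESISTIVITY_OHM_M = 1.7e-8


def wire_resistance_ohm(length_um: float, width_um: float, thickness_um: float,
                        resistivity_ohm_m: float = CU_RESISTIVITY_OHM_M) -> float:
    """R = rho * L / (W * t), with dimensions given in micrometers."""
    length_m = length_um * 1e-6
    area_m2 = (width_um * 1e-6) * (thickness_um * 1e-6)
    return resistivity_ohm_m * length_m / area_m2


def current_density_ma_per_um2(current_ma: float, width_um: float, thickness_um: float) -> float:
    """J = I / (W * t): the quantity an EM check compares against a limit."""
    return current_ma / (width_um * thickness_um)


def em_violation(current_ma: float, width_um: float, thickness_um: float,
                 limit_ma_per_um2: float) -> bool:
    """True if the wire's current density exceeds the (assumed) EM limit."""
    return current_density_ma_per_um2(current_ma, width_um, thickness_um) > limit_ma_per_um2


# Hypothetical segment: 100 um long, 0.1 um wide, 0.1 um thick, carrying 0.05 mA.
r = wire_resistance_ohm(100, 0.1, 0.1)
j = current_density_ma_per_um2(0.05, 0.1, 0.1)
```

This segment works out to about 170 Ω and 5 mA/µm², so it would fail an assumed 2 mA/µm² limit. The same R also sets the IR drop (V = I·R): 0.05 mA through 170 Ω is 8.5 mV, which is why R accuracy matters for power-grid and signal EM alike.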

Power Budgets: Where Is The Low-Hanging Fruit?

Wednesday, May 11th, 2011

Low-Power Engineering looks for the best ways to cut power for a reasonable cost with Mentor Graphics, Synopsys, Apache Design Solutions, Oasys Design Systems and Xilinx.


Rationalization For Power

Thursday, April 14th, 2011

By Ed Sperling
Power budgets are becoming almost universally problematic. What used to be a unique headache for the cell-phone market has evolved into an ugly migraine that now includes everything with a battery—and increasingly even those devices that rely on a plug.

The result is a cascade of effects that are widespread and growing. And while the drivers of this effort vary widely from one market to the next, the push toward greater efficiency in everything electronic—and the migration from mechanical to electrical—is creating some radical changes in the design process.

What’s changing
The primary change underway is greater granularity and the elimination of waste. While advanced mobile designs are among the most efficient designs on the planet, there is still more efficiency to be wrung out of them, and over the next several years those additional gains will be required. A smartphone that used to handle just voice and e-mail is evolving into what some are labeling “superphones,” with the ability to process high-definition streaming video and audio, connect to WiFi, LTE and LTE Advanced, and run new, far more graphics-intensive applications.

There is plenty of capability to add processing power to these devices to boost performance, but just throwing cores at a problem will eat up the power budget, overheat the device and burn through batteries in a matter of minutes. Turning on and off portions of the chip as needed helps, but even that won’t solve the entire problem.

“There’s a lot more that has to be done,” said Eric Dewannain, vice president and general manager of Tensilica’s baseband business unit. “We need to reduce the frequency of certain parts, throw away gates you don’t need, and tailor performance, power and area. We’ve gone from performance to performance, power and area. To reduce the power you need the smallest power and area.”

To do that requires a deeper understanding of exactly what various components such as a processor actually are doing. A general-purpose processor with four cores provides an enormous amount of processing power, but it’s often overkill for a particular function and most software can’t take advantage of all those cores.

When multi-core chips were first rolled out, they were more a recognition that classical scaling of a single core was doomed rather than a well thought-out plan to utilize multiple cores at lower frequency in unison. The argument at the time was that software would make great strides in parallelism, something it had never been able to achieve in 50 years of programming. And while it’s true that some applications have been threaded onto two, and in some cases four cores, the reality is that most applications people use still don’t utilize all those cores—and most don’t need all the power provided by all the cores.
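The argument above can be quantified with Amdahl’s law, a standard model that is not cited in the article: if only a fraction p of a program runs in parallel, n cores give a speedup of 1 / ((1 - p) + p/n). The parallel fractions used below are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup from running the parallel fraction on `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Assume half of an application's work can run in parallel.
two_core = amdahl_speedup(0.5, 2)
four_core = amdahl_speedup(0.5, 4)
```

With half the work parallel, going from two to four cores only raises the speedup from about 1.33 to 1.6, while the number of active cores, and to first order their power, doubles.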

This has led to renewed interest in heterogeneous cores—each sized according to the needs of the device and the applications that will run on them—with the ability to power down cores when they’re not needed. In some cases that involves deep sleep modes, and in others it means turning them off completely.

“You need to tailor the processor for the task,” said Dewannain. “You may be able to reduce the size of the processor by a factor of two or three that way. To do that you need to know what actually needs to run, what in the architecture you don’t really need, and how to tweak various functions to lower the megahertz. If you have something running in 5 to 10 cycles, you may be able to reduce that to 1.”
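The arithmetic behind Dewannain’s last point is straightforward: the clock needed to sustain a workload is the operation rate times the cycles per operation, and dynamic power scales with frequency (and with V² if the supply can also be lowered). The sketch below assumes a hypothetical workload of 100 million operations per second and, as a simplification, that the switched capacitance does not change when the custom instruction is added.

```python
def required_clock_mhz(ops_per_second: float, cycles_per_op: float) -> float:
    """Clock frequency (MHz) needed to sustain a given operation rate."""
    return ops_per_second * cycles_per_op / 1e6


def relative_dynamic_power(f_new: float, f_old: float,
                           v_new: float = 1.0, v_old: float = 1.0) -> float:
    """Ratio of dynamic power (~ alpha*C*V^2*f), assuming switched capacitance is unchanged."""
    return (f_new / f_old) * (v_new / v_old) ** 2


ops = 100e6  # hypothetical workload: 100 million operations per second
f_generic = required_clock_mhz(ops, 10)  # 10 cycles per operation
f_tailored = required_clock_mhz(ops, 1)  # 1 cycle with a tailored instruction
ratio = relative_dynamic_power(f_tailored, f_generic)
```

Collapsing 10 cycles to 1 lets the same work run at 100 MHz instead of 1,000 MHz, about a tenth of the dynamic power at the same voltage. In practice the extra hardware adds some capacitance back, which this sketch ignores.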

Drivers of change
What’s clear is this problem isn’t getting any easier. Roger Smullen, director of strategic marketing at AnalogicTech, said the demand for efficiency is a “constant battle.”

“Our customers are looking for smaller chips and more efficient battery life,” he said. “That’s forcing improvements everywhere. We are implementing light modes, depending on the load required by the system. There’s also a huge push in the industry to move from linear battery chargers to switching power chargers, which are much more efficient. They produce less heat and speed up the charging process. And there’s a need for power management inside of systems where you need less energy to do the same job.”
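The efficiency gap between linear and switching chargers follows from how each one drops voltage. A linear charger passes the full charge current through a transistor that absorbs the difference between input and battery voltage, while a switching charger loses a fraction set by its conversion efficiency. The sketch below uses assumed values (5 V input, 3.7 V battery, 1 A charge current, 90% switching efficiency) and ignores quiescent current.

```python
def linear_charger_loss_w(v_in: float, v_bat: float, i_charge_a: float) -> float:
    """Heat in a linear charger: (Vin - Vbat) dropped across the pass device at the charge current."""
    return (v_in - v_bat) * i_charge_a


def switching_charger_loss_w(v_bat: float, i_charge_a: float, efficiency: float) -> float:
    """Heat in a switching charger with an assumed conversion efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    p_out = v_bat * i_charge_a
    return p_out * (1.0 / efficiency - 1.0)


linear_loss = linear_charger_loss_w(5.0, 3.7, 1.0)
switching_loss = switching_charger_loss_w(3.7, 1.0, 0.9)
```

Under these assumptions the linear charger turns about 1.3 W into heat (roughly 74% efficient, Vbat/Vin), versus about 0.41 W for the switching charger, which is why switching chargers run cooler and can charge faster within the same thermal limit.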

But shrinking geometries and area cause other problems, too. “The multicore side means you can get better performance,” Smullen said. “But the other, non-sexy part is that you have to divvy up power domains depending on noise. That requires a lot more communication between power management and components.”

It also requires a lot more up-front planning about exactly what will be used and how it will be used in devices. One of the baseline comparisons frequently used by power experts in the IC industry is the International Technology Roadmap for Semiconductors (ITRS), which shows power rising by double digits as Moore’s Law progresses if nothing is done to address it.

“There is a widening gulf between what will happen with power consumption vs. what is needed,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “If you look at the recent OMAP 5 announcement by Texas Instruments, there are 2 ARM processors running at 1.5GHz, 2 cores running at lower frequencies, and graphics. Qualcomm’s Snapdragon has a quad-core processor, a DSP and a modem, plus high-speed I/O and memory. All of this is creating a power budget gap.”

Dealing with that gap between what’s available and what’s needed is forcing changes in how systems are designed, what’s used on an SoC, how it’s used—and whether there is a better way to achieve the same goal.

“The first step is predicting power early in the design,” said Sarkar. “RTL is a good starting point. You want to make sure the number is consistent, so you need to define up front the number you want to have. That will help guide you to where reductions can be achieved. If you have a 1.5GHz chip you can’t use all the functions at once, so you need to develop a power-budgeting flow with an analysis-based approach. Right now power is not part of the design process. It may be functionally correct, but it does not tell you how to shut off what’s not needed. If you have a multi-core design you may shut off the clock to the other cores, for example, but what often happens is you still have data going to these other cores. You need to shut off the data in these cores, too.”
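Sarkar’s point about data still reaching a clock-gated core can be illustrated by counting bit toggles on the core’s input bus, a common first-order proxy for dynamic power in the logic it feeds. The sketch below compares a free-running bus with one that holds its last value while the core is idle (one form of operand isolation). The bus words and activity pattern are hypothetical.

```python
from typing import List, Sequence


def count_toggles(words: Sequence[int]) -> int:
    """Total bit flips between consecutive (non-negative) bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def isolate(words: Sequence[int], core_active: Sequence[bool]) -> List[int]:
    """Operand isolation: hold the bus at its last value while the core is idle."""
    held: List[int] = []
    last = 0
    for word, active in zip(words, core_active):
        if active:
            last = word
        held.append(last)
    return held


# Hypothetical 4-bit bus: the core is active for one cycle, then idle.
bus = [0b1010, 0b0101, 0b1111, 0b0000]
active = [True, False, False, False]
raw_toggles = count_toggles(bus)
isolated_toggles = count_toggles(isolate(bus, active))
```

Here the free-running bus produces 10 toggles while the core is doing nothing useful, and the isolated bus produces none. Gating the data to zero instead of holding it is another common style with the same intent.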

New capabilities, new tradeoffs.
The key in effectively managing power budgets is dealing with power very early in the design—and understanding what works best from a system level rather than from a design expertise or functionality level.

“The one lesson from the ITRS report is that you need to place more emphasis on power decisions at the architectural level,” said Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics. “The earlier in the design process, the more options that are available. The key is to provide designers with the capability to do more tradeoffs.”

Those tradeoffs aren’t always intuitive, however. While more and more functions—particularly power management—are relegated to software, one large chipmaker discovered that using that approach for one of its mixed-state processors would require a power budget of 50 watts. By doing everything in hardware with custom-designed accelerators, the power budget of the chip dropped to 15 watts.

“The challenge is looking at this from a system standpoint,” said Pangrle. “Mike Muller (ARM’s chief technology officer) said that if you go from 40/45nm down to 10/11nm, you’ve got 16 times the amount of integration that has to be done. If you’re using the same power budget, that’s 16 times as many devices and each needs to consume 1/16th as much energy. The reality is those devices will probably use about one-third as much. We’re seeing more and more functionality moving to software, but you really need to think about what runs better in hardware.”
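Muller’s numbers can be checked with a short calculation. Assuming each device stays as active as today’s devices, total power relative to today’s budget is the integration factor times the energy each device uses relative to today. The 16x factor and the one-third estimate come from the quote above; treating energy scaling as power scaling is a simplifying assumption.

```python
def power_vs_budget(integration_factor: float, energy_scale_per_device: float) -> float:
    """Total power relative to today's budget, assuming unchanged per-device activity."""
    return integration_factor * energy_scale_per_device


needed = power_vs_budget(16, 1 / 16)  # what staying within budget requires
likely = power_vs_budget(16, 1 / 3)   # "about one-third as much" energy per device
```

If devices only get down to about one-third of today’s energy, the chip would need roughly 5.3 times today’s power budget, which is the gap that pushes designers to rethink what runs in hardware versus software.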

This puts added pressure on IP vendors—particularly those creating soft PHY—to allow their offerings to be customized for specific designs. The one-size-fits-all approach of highly characterized and tested IP with deep understanding of context is important, but that IP also needs to be flexible for power reasons.

“The key here is flexibility and configurability,” said Vishal Kapoor, vice president and general manager of the SoC realization group at Cadence. “With hard PHY you have to pick one. But with soft PHY the customer can tailor the IP to the needs of power or performance. We’re seeing more and more effort focused on that in an SoC, and we’re having a lot more discussion with customers and requests in that area.”

Spillover into other markets
The mobile markets have embraced this approach to varying degrees, with smartphone chipmakers and increasingly tablet chipmakers leading the way, followed by a variety of other devices. But it’s also starting to become an issue inside of datacenters, where the cost of powering servers and cooling them can reach millions of dollars each year.

Virtualization was a first step in improving server utilization. Cloud computing is a second. But what cloud services also do is allow companies to rethink what they run and whether they actually need everything in-house—or whether it can be replaced by more efficient applications and server strategies.

This is already an important consideration for whether chipmakers use simulation or emulation to verify their designs using their own datacenters, but increasingly it’s also driving some non-technology companies to consider splitting server functions between Intel’s most powerful Xeon chips and its power-saving Atom processors. Even ARM has made inroads into datacenters with Linux-based applications. And considering that IT departments are extremely cautious about making any changes because of the potential for disaster inside of major companies, this is an almost startling pace of change.

In the mil/aero markets, these kinds of decisions are playing a role in reducing the weight that soldiers have to carry and in extending the range of a variety of equipment ranging from drones to ships. And in the automotive market, the rising price of gasoline is pushing the market to consider a variety of new options.

In all cases, power is now the driver. The race is on to re-think and rationalize every decision in the design process based upon a combination of how devices will be used, how much performance is needed and when, and how to best achieve that goal using as little power as possible.

Power Issues In 3D

Thursday, April 14th, 2011

By Ann Steffora Mutschler
The challenges associated with implementing IP subsystems range from maintaining a consistent I/O voltage and achieving consistency in metal stacks to managing a clock distribution network and creating adequate isolation between subsystems on a chip. It’s enough to make your brain hurt. Add 3D or 2.5D stacking, and the engineering considerations grow substantially.

The concept of stacking die has captivated the semiconductor industry with its promise of, among other things, better performance, shorter signal distances and in some cases a smaller footprint. But with it comes additional design complexity and cost considerations.

“In the old days when people talked about IP subsystems very often they were talking about one SoC, because within this one SoC you have IP that has certain well-defined behavior and has a good interface with protocols around it,” said Dian Yang, general manager and senior VP of product management at Apache Design Solutions. “Now the problem is that those kinds of subsystems are getting more and more complicated, so the subsystem itself becomes a gigantic chip. To be embedded inside a chip, sometimes it may not be economically feasible or may not be technically feasible in theory, especially when you look at 3D implementation.”

For example, if an IP subsystem is implemented on a separate die and then stacked with the original SoC, the user interface is well-defined and a micro bump or TSV is used to connect them. But if you separate that design in order to achieve 3D, there are some disadvantages: you have two wafers, two dies, and have to use a more expensive 3D package, he said.

There are advantages, too, Yang said. “Let’s say the subsystem itself can be independently manufactured from the SoC guys in terms of the process technology: one is in 65nm and the other one can be 40nm or 28nm. The second advantage is that the testing embedded inside the 3D SoC can be a little more difficult but separately, you can do wafer-level testing much easier.”

A third implementation advantage is that if the chip can be separated into two die versus one die and uses a 3D package, sometimes the cost can actually be lower. This isn’t always true, but in some cases, especially when mixing two process technologies, the end cost can indeed be less.

“Customers may already have a subsystem implemented and well-tested – they don’t want to change anything because there is always risk associated with that…especially subsystems from a third party. For whatever economical reason they don’t want to migrate to 40 or they don’t want to migrate to 28, on the other side, if I want to shrink my whole SoC – that economically makes sense,” Yang said.

Today, RF subsystems commonly are implemented separately from the SoC because to combine them is not easy in terms of technology shrinking. Memory subsystems and IP such as embedded DRAM are also often implemented separately from the SoC as it is much more cost-effective to separate the DRAM using stacked die.

Fig. 1: Example of a contemporary audio IP subsystem. (Source: Semico/Synopsys)

The impact of 3D on IP subsystems
Understanding the impact of 3D on IP subsystem implementation is key to moving ahead, said Samta Bansal, product marketing for the Encounter Digital Implementation System and a 3D IC expert at Cadence. “When I look at immediate implementation of 3D, I actually think about it utilizing the IP subsystems. Immediately, I see we will leverage the 3D with TSV structures as being able to use IPs and build something more, tailoring it to different applications.”

3D impacts IP subsystems on a number of fronts, including what people will actually be designing and partitioning in 3D and how to utilize the IP that is going to come from third parties. “In bringing all of this together, what becomes very important is co-design: how are you going to co-design everything together knowing that IP will come from different sources,” she explained. DFT is another area that is affected by 3D. Every chip could have its own DFT, but when put together as a system how will they talk to one another?

“There are a couple of ways to solve the challenge when you put power in the context of 3D or IP subsystems. Number one, we can solve it by adding additional resources. I can add decoupling capacitance, I can add extra vias, I can do more I/Os and increase the routing area through the power distribution network but that means cost. And the whole idea of doing 3D is to somehow manage the cost in addition to the power and performance that you would get. That is a way to do it, but is that an effective way? It depends on the application that it is targeted for, and the volume you are going to bring up, that may work and it may not work,” Bansal offered.
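The cost of the “add more decap” option can be seen from a first-order charge-balance estimate: during a current step lasting Δt, the charge I·Δt must come from on-die capacitance without the supply drooping more than ΔV, so C ≈ I·Δt/ΔV. The sketch below uses assumed numbers and ignores package inductance, grid resistance and regulator response, all of which matter in a real analysis.

```python
def decap_needed_f(current_step_a: float, duration_s: float, max_droop_v: float) -> float:
    """First-order decap estimate: C = I * dt / dV (charge balance only)."""
    if max_droop_v <= 0:
        raise ValueError("max_droop_v must be positive")
    return current_step_a * duration_s / max_droop_v


# Hypothetical event: a 0.5 A current step lasting 0.2 ns, with 50 mV allowed droop.
c_50mv = decap_needed_f(0.5, 0.2e-9, 0.05)
# Tightening the droop budget to 25 mV doubles the required capacitance (and area).
c_25mv = decap_needed_f(0.5, 0.2e-9, 0.025)
```

This gives about 2 nF for a 50 mV droop and about 4 nF for 25 mV, which illustrates why brute-force decoupling translates directly into area and cost, and why denser decap structures are attractive.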

Good engineering always helps too, she said, pointing to Freescale’s use of stacked decoupling capacitors (decaps) instead of its standard gate oxide decaps, which shows how smart, next-generation decaps can be leveraged to realize some improvement in the clock frequency. That, in turn, helps the power network overall. IBM, in contrast, uses trench capacitors for decoupling. “You can either throw cost at the problem with more capacitance, more routing areas, more vias and try to solve the challenge of power, or you can implement these smart engineering techniques. And you can use EDA tools to optimize the power.”

Looking at power in IP subsystems from a higher level, Bansal believes the dies farther away from the die connected to the package will be more susceptible to noise. They will get noise from their own switching, and they will be affected by power noise from the die below. Because of pin-count and routing limitations, it is sometimes very difficult to isolate the power distribution networks of the two dies, for both the power and the ground supply networks.

Navraj Nandra, director of analog/mixed signal marketing at Synopsys, disagrees on this point. He said the biggest challenge in implementing IP subsystems that contain 3D structures is packaging and stacking the various die so that the interaction of the IP functions is managed in terms of signal integrity and noise. “The idea of 3D or 2.5D actually solves a lot of these problems because the goal of 3D is to get your form factor into something that’s mobile, but what it does is reduce a lot of the trace lengths. Imagine if you had two chips communicating with each other across a board and now they are communicating with each other on top of each other through TSVs so that distance now is reduced. This actually makes the challenge from a noise perspective less because there is less distance to communicate over and the signal integrity will improve.”

Bansal doesn’t agree. “Even if you can think of these IP subsystems having their own power and ground networks unique to them, you can imagine the chip higher up in the stack will probably have more increased power noise and voltage drop because there are now a large number of metal and via layers that are added in the conduction path, and also because of the capacitance/inductance noise that must be happening from the intermediate chips. To mitigate that, design teams will not only have to focus on how to design the robust power grid on that IP subsystem, but also to ensure that the power delivery network that gets designed in other chips in that stack are also adequate to meet the system need. They can’t just worry about the design and density of the power grid routing. Now they also need to worry about the via count, the design and placement of the TSVs that are going to connect those power grid networks to the other die. That means when you are thinking about the power grid network or distribution, this co-design becomes very important.”

In all cases, this will be an interesting space to watch as IP vendors look for ways to combine their IP in unique ways. In fact, Semico Research Corp. predicts that advanced performance multicore SoCs will be the device type shipping with the most IP subsystems, reaching 1.558 billion units by 2015, a 25% CAGR.

Advanced Modeling Technologies For Chip, Package, System Co-Analysis And Co-Optimization

Thursday, April 14th, 2011

The traditional approach to chip-package-system (CPS) co-analysis and co-optimization lacks the required accuracy and limits productivity. Meeting increasing demands for lower system cost calls for a new, more comprehensive methodology. This white paper outlines the Chip Power Model (CPM™) technologies and solutions available from Apache Design Solutions to help address the CPS convergence challenge.


ESL Power Optimization Flow Requires Ecosystem

Thursday, April 14th, 2011

By Ann Steffora Mutschler
The issue of power optimization today is very painful for many chip architects who are tasked with determining, meeting and holding to a tight power envelope. Questions concerning how well and to what extent power can truly be understood at the architectural level, let alone optimized, are the subject of debate.

The ITRS’s most recent projection provides some insight as to current market drivers. The following figure illustrates that the power consumption trend versus power requirements is creating the “Power Gap” akin to the “Design Gap” that the industry dealt with a decade ago, noted Vic Kulkarni, senior VP and general manager at Apache Design Solutions. “This gap is forcing people to think hard on how to manage power at all levels of abstraction.”

Source: ITRS

With mainstream users, there is no controversy about whether abstraction of power and performance needs to shift higher in the designer’s mindset. There is no choice. Designers must shift their thinking from high accuracy/power validation to relative power/power exploration, but making that shift is easier said than done. Designers are not typically accustomed to thinking that way, barring a few architects at some of the largest and most advanced chip companies.

From a technical perspective, the role of power at different levels of abstraction, as well as its nuances and characteristics, is not always well understood.

“There is a need for a lot of education here,” said Shabtay Matalon, Mentor’s ESL market development manager. “I think that power is mostly understood at the transistor level because that’s where people can very tightly correlate power with transistor switching activity, with the threshold levels, with Vdd, and all of those basic equations that people can calculate the static power and dynamic power.”
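The “basic equations” Matalon refers to are usually written as P_dynamic = α·C·V²·f and P_static = V·I_leak. The sketch below encodes them under one common convention, where α is the fraction of the capacitance C that is charged each clock cycle (other texts fold a factor of ½ into the definition). All parameter values are illustrative assumptions.

```python
def dynamic_power_w(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power: alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def static_power_w(vdd_v: float, leakage_current_a: float) -> float:
    """Leakage power: Vdd * I_leak."""
    return vdd_v * leakage_current_a


# Hypothetical block: 1 nF total capacitance, 10% activity, 1.0 V, 1 GHz, 20 mA leakage.
p_dyn = dynamic_power_w(0.1, 1e-9, 1.0, 1e9)
p_leak = static_power_w(1.0, 0.02)
# Lowering Vdd to 0.9 V cuts dynamic power to 81% of its original value.
p_dyn_low_v = dynamic_power_w(0.1, 1e-9, 0.9, 1e9)
```

This block would draw about 0.1 W of dynamic power and 0.02 W of leakage. The V² term is why voltage is the strongest lever, and the activity term α is the quantity that becomes hard to pin down once designers move above the gate level.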

At the gate level, design engineers still have a good understanding of problems because that involves a relatively low level of abstraction. They can change states on those gates and still clock the flops. Move further along in the flow and things become less clear, however.

“When you go to the RTL things become very vague. Frankly, the challenge here is that there is a limitation what level of accuracy you get at the RTL,” Matalon noted. Raising that to transaction-level modeling (TLM) will offer some relief here, but not until there is more education about how to use these ESL approaches. “That’s the reality. We are dealing with this reality in the marketplace. However, what works well in favor of dealing with power at the TLM is that the payoff is huge, in terms of power optimization.”

Moving up in abstraction from the gate level to RTL it is possible to achieve approximately 5% to 10% improvement in power optimization. “Given that the payoff is so high at the architectural level (up to 80%), on one hand we are seeing that there is a lot of attention to it but on the other hand, I can’t say that the knowledge is yet prolific,” Matalon added.

Where power optimization occurs
While power needs to be planned for at the architectural level, the real optimization of that power happens further down the flow.

“Power optimization really happens purely on the hardware level and purely on the RTL down level so you have all these cool techniques starting at RTL down,” said Frank Schirrmeister, director of product marketing for system level solutions at Synopsys.

The following illustration presented by Synopsys at ARM’s TechCon shows the impact of power optimization techniques at different levels of abstraction and stages in the design.

“One triangle identifies the leverage you have, and the other identifies the time you need to implement, which is the cycle time. The earlier you start, obviously the more impact you have (shown by the wider part of the inverse triangle) and you need less time to do it because you have a shorter cycle time and you can still make changes,” Schirrmeister explained.

Today the majority of techniques are employed at the RTL on the hardware side with the software then trying to optimize things like cache utilization by itself on fixed hardware.

“So now the objective has to be, given the very intuitive notion, that the earlier you start the more impact you have and the more leverage you have on power consumption. We need to move upwards,” Schirrmeister said. “On the architectural level, before you have even decided between hardware and software, you will try to make very early considerations about how to separate it into hardware and software. That’s the architectural analysis part and what people are doing there is really around taking abstract descriptions of the software and the function and figuring out whether the architecture will actually support that. Once you have made the decisions between hardware and software you really have a couple of components running in parallel— in the wider sense it’s block design and how those blocks integrate.”

Bringing these concepts together, Apache believes that an ESL power design flow can be realized by leveraging ESL simulation, ESL synthesis to RTL and RTL power analysis using ESL simulation results. Kulkarni stressed this has been demonstrated successfully by working closely with an ecosystem of an IP provider and a system company.

In addition, virtual prototyping will play a vital role in this upcoming ESL power design flow. As just one example, Mentor’s Vista can be extended to the virtual prototyping space. Traditionally, virtual prototypes have been perceived as purely functional models that only allow people to verify software against a hardware model. This landscape is changing because software is becoming such a dominant part of a modern design. So much design know-how and IP is implemented in software, and software plays such a major role in setting performance and power, that it is no longer sufficient to provide legacy virtual prototypes that are only functional. Software engineers need a more hardware-aware virtual prototype in which power and timing are modeled, so they can evaluate their software against not only the functional spec of the design but also its performance and power specs, Mentor’s Matalon observed.
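A power-aware virtual prototype can be approximated, at its simplest, by attaching an energy cost to each transaction type and summing over the transactions a piece of software generates. The sketch below is a hypothetical illustration: the transaction names and per-transaction energies are invented, and it does not reflect Vista’s or any other product’s modeling interface.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical energy cost per transaction type, in nanojoules.
ENERGY_NJ: Dict[str, float] = {
    "cpu_instr": 0.05,
    "dram_read": 2.0,
    "dram_write": 2.5,
}


def estimate_energy_nj(trace: Iterable[str], energy_table: Dict[str, float] = ENERGY_NJ) -> float:
    """Sum per-transaction energy over a software-generated transaction trace."""
    counts = Counter(trace)
    unknown = set(counts) - set(energy_table)
    if unknown:
        raise KeyError(f"no energy model for: {sorted(unknown)}")
    return sum(energy_table[kind] * n for kind, n in counts.items())


def average_power_mw(energy_nj: float, duration_s: float) -> float:
    """Average power (mW) for energy (nJ) spent over a time window (s)."""
    return energy_nj * 1e-9 / duration_s * 1e3


# A toy trace: 1,000 instructions and 10 DRAM reads within 1 microsecond.
trace = ["cpu_instr"] * 1000 + ["dram_read"] * 10
energy = estimate_energy_nj(trace)
power = average_power_mw(energy, 1e-6)
```

The toy trace costs about 70 nJ, or roughly 70 mW averaged over one microsecond. Even a crude table like this lets a software engineer see that, for example, cutting DRAM traffic can matter as much as cutting instruction count.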

Synopsys plays heavily in the virtual prototyping space, while Cadence is still mum on the topic, focusing more on its hardware emulation approach.

The power optimization challenge
Still, as we go up in abstraction, designers need to try to understand software issues and how the software relates to hardware.

“As you get into advanced power management schemes in the hardware a lot of times that’s controlled by the software, so how effective is the software at doing that?” asked Jack Erickson, product marketing director at Cadence. “Ideally, you’d like to be able to run your software on your hardware and have a better understanding of how much power is consumed by software. The more you can do in the software before you ship the product the better. If you can get your chip into emulation and run your actual software in emulation and examine the power effects of your software, that’s late in the hardware design cycle, but you can have large effects on your software before you ship your system.”

The biggest challenge is that it’s really a new space, he said. “Folks have started to worry about power only in the past few years and now we’re also talking about moving up to a higher level where they have even less experience typically. The combination of those two is very difficult,” Erickson concluded.
