Posts Tagged ‘Xilinx’

Anatomy Of An Acquisition

Thursday, December 15th, 2011

By John Blyler
Lattice Semiconductor’s proposed acquisition of FPGA start-up SiliconBlue Technologies for $62 million in cash is the latest signal that the smart-phone market may be showing signs of overcrowding.

While researchers are quick to point out the growth rate of smart phone sales versus computers, there also is an unprecedented number of companies vying for a stake in that market. Lattice’s push into adjacent markets is a hedge against that overcrowding.

Lattice until now has focused on the high end of the smart phone market. SiliconBlue targets mid-range players such as watch companies.

Doug Hunter, vice president of marketing at Lattice, said both companies occupy complementary spaces in the mobile consumer market. SiliconBlue offers a reduced feature set at lower power and with a one-time programmable (OTP) memory technology that it licensed exclusively from Kilopass. “This will allow us to go into customers with both a simpler and smaller or bigger and more fully featured suite of products,” explained Hunter.

By far the larger company, Lattice has more than $250 million in cash on the balance sheet with a good quality track record, said Hunter. The company also has a much wider distribution and sales network than start-up SiliconBlue, which should help win sales from customers that are reluctant to deal with a start-up company.

Still, Lattice has had its share of challenges in recent times, including numerous CEOs over the last six years and loss of market share to giants such as Xilinx and Altera. Hunter acknowledged these challenges, but highlighted the company’s current strategy of finding niches to “differentiate, duck, bob and weave” against the two industry giants.

The acquisition of SiliconBlue fits that strategy. In addition to its mid-range handset sales, SiliconBlue recently won a design in an unusual ultra-low-power niche market. Watchmaker giant Citizen Watch selected SiliconBlue’s extremely low-power FPGA device for use in its new Eco-Drive Satellite Wave watch. Citizen claims that this is the world’s first solar-powered GPS-synchronized watch.

One key element in this selection by Citizen was the ultra-low power of the company’s 8,000 FPGA logic cells, based on TSMC’s 65nm low-power standard CMOS process. The other key factor was the tiny 4×5 mm footprint of the wafer-level chip package, where the ball-grid array (BGA) is placed directly on the wafer. This ensures a very thin package, essentially the same size as the die.

SiliconBlue optimizes its designs for ultra-low power by using transistors with very fast switching speeds in critical areas of the design like clock trees. Additionally, its design makes use of the default “off” state inherent in FPGAs. “The network is only switched on when it is being used,” explained a company spokesman.

This move by Citizen to incorporate greater electronic functionality in its watches represents an interesting convergence between the worlds of traditionally mechanical-digital systems and fully electronic systems. Citizen’s Eco-Watch is a traditionally high-end timepiece that incorporates modern GPS technology. On the other side of the convergence are fully electronic systems like Apple’s Nano, a multimedia player with Wi-Fi connectivity that now incorporates a digital watch display.

TSVs Ease Heat In 3D ICs

Thursday, August 11th, 2011

By Ann Steffora Mutschler
In the evolving discussion of 3D ICs and through-silicon via (TSV) technology, a key issue engineering teams are facing today is how to reduce the thermal coefficients between substrates in a stacked die. Simply put, what is the best way to get the heat out of the 2.5D or 3D IC?

The answer, of course, is anything but simple.

“In a 3D system, the heat hierarchy is through the package, through the heat sink, through the bumps, through the adhesives and through the stacked tier layers. If the wafers were thicker, the heat would have a chance to flow out horizontally or vertically and dissipate a bit. As wafers are thinned more and more, the heat dissipation becomes an issue, and if you stack them it gets worse. Within them, thermal flux increases, your peak temperature within the stack increases and since your wafers are thinner, you also have a higher temperature gradient across the thinned wafer,” explained Sesh Ramaswami, senior director at Applied Materials.

The first step in managing thermal issues today is accurately calculating the power and leakage in the design, with leakage now one of the most dominant issues to be addressed.

“The low-dielectric constant materials are actually causing more of a problem because they’ve got lower thermal conductivity. That in itself is not helping with thermal gradients on a die,” said Pete McCrorie, product marketing director at Cadence. Thermal analysis technology employs IR drop power rail calculation to generate instance power in the design, so for each of the instances in a design that power is based on activity information. That is added to the leakage power, which is calculated, and it all gets thrown into a solution where the thermal conductivity of the substrate, interconnects, ball bonds and package is extracted and is solved for thermal at that point.

“When you’re stacking—and it doesn’t matter if it’s a 3D package where the package is a system-in-package or MCM, or a number of die on top of each other—with 3D IC it’s the same thing,” observed Navraj Nandra, senior director of marketing for analog/mixed-signal IP at Synopsys. “The difference between a 3D package is that you do all the signaling off the die, so you have wire bonds going off the substrate and connecting into the other substrate. With a 3D IC, you have on-chip signaling, so you’ve got the communication between the various substrates happening through TSVs, for example.”

So when it comes to all the heat issues, everyone dealing with stacking is basically having the same problems. To combat the heat, engineers try to put the active devices—those devices or transistors doing all of the switching—at the top of the substrate hierarchy where a lot of heat is generated. Looking down further into the stack, IC developers are trying to increase the effective heat transfer coefficient, meaning they are looking for technologies that can shift the heat out very quickly from that substrate.

In terms of the packaging aspect of stacking 3D ICs, one approach is to develop a substrate with better thermal conductivity, but that’s cost-prohibitive for most developers. Intel has done research in this area and released a paper in 2007 with its suggestions for managing thermal issues (http://download.intel.com/design/iio/applnots/31505102.pdf). Other approaches leverage familiar techniques that use copper, such as including a copper spreader or copper underfill between the substrates to dissipate the heat.

Then, when it comes to 3D ICs, TSV technology not only gives area, bandwidth and latency benefits, but it also can be used to manage all the thermal problems on a 3D IC by using the signal and power TSVs to dissipate the heat.

Synopsys’ experiments in this area involve taking the concept and introducing more vias or a via array—specifically, a TSV array—to reduce the temperature, Nandra said. “The idea is that if you can understand where the hot spots are going to occur in your design and somehow predict that in your EDA methodology, you can then insert a bunch of TSVs and those will help in the thermal dissipation. The question is how much do they help? We are seeing that they certainly help to reduce the peak temperature and the overall temperature gradients, but they don’t get the minimum temperature down any further.”

Not just for cell phones
Engineers tend to think of low-power designs as wireless-type solutions, but today everyone, including high-performance server developers, is looking at lowering power because of the associated heat and the high energy costs, Cadence’s McCrorie pointed out. “You think about the heating problem of the chip, but then when you try and dissipate the heat from the board and from the environment, that all gets very expensive if you’re generating too much heat.”

Meanwhile, Applied’s Ramaswami believes 3D stacking and TSVs may well pan out in the datacenter. With servers containing multicore CPUs that require lots and lots of data, if DRAMs are used in traditional DIMM approaches there will be several DIMMs on the board. “These DIMMs have a latency factor. They are a little slower because of the wire length, and so on. For the server market the blade would probably have these memory cubes on them [referring to Micron’s Hybrid Memory Cube as an example] with the following advantages: You get more memory per unit volume, which is much closer to the CPU, and because of that the latency goes down and your power dissipation goes down.”

Also, teams building chipsets for datacenters are asking for much lower power consumption for high-speed interfaces than what was typically thought of in the past. “They want something like a 10 or 12Gbps interface, but the power consumption numbers that they are asking for are very similar to what we would have thought in the past would be required by someone in the consumer industry,” Synopsys’ Nandra said. And they don’t always push for the higher performance, opting instead for lower power. “They say, ‘We want the 10Gbps interface, but what we really want is not for you to show us that you can take that 10 to 15Gbps or whatever. We want you to show us your roadmap to get the power consumption of that interface down.’ That’s a different requirement from customers.”

Modeling first
Of course, knowing where to put the TSVs is critical. From the design perspective, the first step is to model the problem, which requires three pieces of information: current, resistance and voltage. “Once you’ve modeled the problem then you can think about some kind of automated EDA implementation. The way to think about this is going back to some very basic analogies. In order to do your thermal simulation, you can think of the heat source like a current source, because the current is directly related to heat, and you have an equation to do that,” Nandra said.

Thermal resistance, the other contributor to the heat problem, is analogous to electrical resistance and can be modeled as a resistor. Add to this the temperature gradient, which is analogous to the electric potential, or voltage. With these three parameters a thermal model can be built, which can then be run on any numerical simulator, such as SPICE, to do the thermal-equivalent simulation, he explained.
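As a rough illustration of the analogy Nandra describes, the sketch below treats a die stack as a series chain of thermal resistors and applies the thermal form of Ohm's law, temperature rise = heat flow × thermal resistance. It is a simplified one-dimensional model, not any vendor's method: it assumes all heat leaves through a single heat sink, and the function name, tier ordering and numeric values are hypothetical.

```python
def stack_temperatures(powers_w, resistances_k_per_w, ambient_c):
    """Temperature of each tier in a series thermal chain (assumed 1-D model).

    powers_w[i]            heat generated in tier i, in watts
                           (tier 0 is the tier nearest the heat sink)
    resistances_k_per_w[i] thermal resistance between tier i and tier i-1;
                           for i == 0, between tier 0 and ambient
    ambient_c              ambient temperature in degrees C
    """
    temps = []
    node_temp = ambient_c
    for i, resistance in enumerate(resistances_k_per_w):
        # Heat generated in tier i and every tier above it must flow through
        # resistance i on its way to the heat sink (heat ~ current).
        heat_through = sum(powers_w[i:])
        # Thermal "Ohm's law": temperature rise (~ voltage) = heat * resistance.
        node_temp += heat_through * resistance
        temps.append(node_temp)
    return temps


# Hypothetical three-tier stack in 25 C ambient.
tiers = stack_temperatures(
    powers_w=[2.0, 1.0, 0.5],
    resistances_k_per_w=[1.0, 2.0, 4.0],
    ambient_c=25.0,
)
print(tiers)
```

In this model the tier farthest from the heat sink is always the hottest, because its heat has to cross every resistance below it.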

“Then the question is, where do you implement it in the design flow. Fundamentally, the whole idea of 3D ICs is to solve the wiring crisis of interconnects. You’ve tried to solve the RC delay problem by having a vertical interconnect system. But now the next question is, in that EDA model, where do you implement the simulation of the vertical stack of heat?”

Grossly simplified, this is not a terribly complex problem in terms of modeling, he admitted. “The complexity is the fact that when you’ve got millions of TSVs in your network that you’re simulating, it all goes into this big matrix in SPICE or whatever numerical simulator you’re using and that becomes a challenge.” As such, there is work to be done with simulators for thermal analysis. More knowledge or heuristics need to be built in to help designers determine where to focus the model of the simulation.

Nandra believes that’s the most interesting aspect of this. “You can take this simple model and apply it blindly to the whole 3D IC, and that’s going to make the matrix that you’re running on the simulator huge. Or you can intelligently think, with some heuristics, ‘Okay, I’ve got 15 areas where this thing is going to get hot, and that’s where I want to apply the model.’ The reason you want to apply the model is because once you understand that the hotspot is occurring in this region, that’s where you want to put your TSV array to reduce the heat in that area. Then you need to know how many vias to put there because there is an area impact. You can do your insertion in that region. In the end, it becomes like a synthesis problem in a way. It’s almost like the way Design Compiler started because there was a way that Design Compiler initially worked out how to size gates based on logical effort and then, over years, the scientists that were working on it figured out some heuristics to make the optimization of that logical effort tuned to what you were trying to synthesize. I think that’s the way that this technology in terms of EDA automation is going to go,” he concluded.
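The hotspot-driven approach Nandra outlines can be sketched in a few lines. The example below is only an illustration under strong assumptions, not a description of any Synopsys tool: each TSV is treated as an identical thermal resistor in parallel with the region's existing heat path, a fixed cap on vias per region stands in for the area impact, and all names and values are hypothetical.

```python
import math


def tsvs_needed(power_w, r_base, r_tsv, max_rise_k):
    """Number of parallel TSVs needed to keep a region's temperature rise
    at or below max_rise_k (assumed lumped model).

    r_base  thermal resistance of the region without TSVs, in K/W
    r_tsv   thermal resistance of a single TSV, in K/W
    """
    # Required thermal conductance (W/K) so that rise = power / conductance.
    required = power_w / max_rise_k
    missing = required - 1.0 / r_base
    if missing <= 0:
        return 0  # region is already cool enough; no array needed
    # Each TSV contributes 1 / r_tsv of conductance in parallel.
    return math.ceil(missing * r_tsv)


def plan_tsv_arrays(region_power_w, r_base, r_tsv, max_rise_k, max_vias):
    """Apply the model only to regions that need help, capped by area."""
    plan = {}
    for region, power_w in region_power_w.items():
        count = tsvs_needed(power_w, r_base, r_tsv, max_rise_k)
        if count > 0:
            plan[region] = min(count, max_vias)  # area budget limits the array
    return plan


# Hypothetical floorplan: power per region in watts.
plan = plan_tsv_arrays(
    {"cpu": 2.0, "cache": 0.25, "io": 4.0},
    r_base=16.0, r_tsv=160.0, max_rise_k=8.0, max_vias=40,
)
print(plan)
```

Regions that already meet the temperature target are skipped entirely, which is the point of the heuristic: the detailed model and the via insertion are spent only where a hotspot is expected.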

Additional resources:
Examples of TSV technology in production designs today

1. TSV with interposer – Xilinx
2. TSV through memory – Elpida
3. TSVs through a logic chip – Qualcomm (no product out yet, but discussed at many conferences)

Experts At The Table: Power Budgeting

Friday, June 3rd, 2011

By Ed Sperling
Low-Power Engineering sat down with Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics; Cary Chin, director of technical marketing for low-power solutions at Synopsys; Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions; Matt Klein, principal engineer for power and broadcast applications at Xilinx; and Paul van Besouw, president and CEO of Oasys Design Systems. What follows are excerpts of that conversation.

LPE: Where are the gotchas in design?
Klein: There’s a good Las Vegas analogy for place and route. Even if there’s 50-50 odds, you’ll lose because you don’t have an infinite supply of money. It’s similar with capacitance. You have the same thing with RTL code. You can’t connect to an infinite number of places using the shortest distance. So then you need to make tradeoffs. What is the V² and what is the F of this group? Should you allow more capacitance here, or less? You can’t connect to anything in zero time.
Chin: Aside from the obvious V, which is squared, it’s the F that’s important. C is an implementation—a detail—that is impacted by the other variables, but F is the most important. It’s the switching activity. The more understanding you have of that switching activity, the more you can affect how is this thing operating and when is it operating. The biggest problem of being able to start with some budget of power and later being surprised is usually associated with having a bad assumption about F. At a very high level we can understand that. A lot of things we intend to be mostly ‘off’ aren’t quite so ‘off.’ Even though your phone is sleeping, it isn’t quite sleeping. When you move around, accelerometers wake up even though the screen is still black. This is the leakage with a small ‘L.’ It’s not static power. It’s operations that are happening at the device level that aren’t doing anything useful. It’s high-level leakage of power that isn’t computed into the implementation flow because we assume it’s ‘off.’
Klein: We don’t assume it’s going to be off because in the FPGA we don’t have islands where we turn off an entire function.
Chin: But in terms of activity, we’re assuming there’s not going to be any toggling of activity, and therefore no dynamic power.
Klein: In regards to noise, sometimes you can create an algorithm that you didn’t expect to be generating too much noise. But I’ve worked with customers where they’re getting noise and they can’t figure out why. We’ve said, ‘It looks like you’re doing four cycles and then rest and then four cycles and then rest.’ And they are doing that to match rates of blocks. That produces noise, and noise can produce power, as well. Sometimes those require algorithmic readjustments.
Kulkarni: We’re finding that the glue is missing between these areas. As the problems get more complex with Moore’s Law, there’s no ability of architectural designers to talk with SoC designers and IP designers. Sometimes the IP is designed by another company or in another country. And then, finally, there’s the package at the end of the chain, the PCB and the DDRs. DDR4 is going between 1.4 and 1.6GHz, and it’s creating tremendous noise. It’s creating overall power budgeting issues that transcend just the SoC. You have to manage that off-chip.
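The C, V and F that Klein and Chin refer to above are the terms of the standard dynamic power relation, P = α·C·V²·F, where α is the switching activity factor. The sketch below uses hypothetical numbers to show how a wrong activity assumption for a block that is supposed to be mostly "off" can blow a power budget even when C, V and F are known exactly.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """Average switching power in watts: activity * C * V^2 * F."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 2 nF of switched capacitance at 1.0 V and 200 MHz.
C_F, VDD_V, FREQ_HZ = 2e-9, 1.0, 200e6

# Budget assumed the block toggles 5% of the time; in use it toggles 15%.
budgeted = dynamic_power_w(0.05, C_F, VDD_V, FREQ_HZ)
observed = dynamic_power_w(0.15, C_F, VDD_V, FREQ_HZ)

print(f"budgeted {budgeted:.3f} W, observed {observed:.3f} W")
print(f"overshoot: {observed / budgeted:.1f}x")
```

Because power is linear in activity but quadratic in voltage, an error in the activity estimate carries straight through to the budget, while any change in V has a squared effect.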

LPE: That brings up a good point. How much of this is out of your control now as you bring in more third-party IP and software from other vendors?
Van Besouw: Most of the IP that people bring in right now is RTL-based. It’s soft. IP may even go to the system level, as well. But you have to question whether it’s a budgeting issue going from RTL or whether it is knowing how the power consumption will be once you go to silicon. One of the reasons system-level design isn’t everywhere is that it’s not connected to the rest of the design flow. We need to create tools to make that connection. You need to make the right choices at the right time so you can still make a big impact on those variables. That’s the key. It’s more about automation than trying to impose more methodologies around the limited tools we have today. EDA has allowed that, and it has affected innovation.
Kulkarni: It’s our responsibility as a tools vendor to allow cross-domain enablement. How do we connect these static blocks of companies, groups, divisions, or the knowledge base? We need to encapsulate those models—and modeling is an important part of the power budgeting equation. The power models may contain a die model for a power grid, current sources, power profiles and the parasitic capacitances. We need to encapsulate that, along with the right frames that are selected from a stimulus, which can then include IP—including hard IP and hard analog IP—plus RTL that’s configurable. Then you need to create a chip-power-model. The job of a package engineer becomes easier. The job of an IP engineer becomes easier. And the job of an RTL engineer becomes easier. The models are encapsulated in the flows and the tools themselves.
Pangrle: There are certainly challenges there, but it opens up opportunities for everybody here to provide better methodology in tools. There’s also room for improvement in how the IP is being delivered to the customer. There’s a lot more information that can be included with the IP to make everybody’s life easier.
Kulkarni: There has been a lot of work in the functional and timing areas, but power is where those areas were 20 years ago.
Van Besouw: Power is like an afterthought.
Klein: There is some flexibility in FPGAs to alter the power afterwards, but you have to do advanced design in the parts that can’t be altered after the fact such as the process choice, static power, what user functionality you have for power.
Van Besouw: But even configurable approaches have limitations.
Klein: The attitude about timing is different than power. When the timing doesn’t do what it says, the design fails. Nobody will accept that. But if the power is 10% high, the attitude is, ‘Oh well, I didn’t really think I could predict that accurately, anyway.’ With timing if you’re off by 10% it doesn’t work.
Van Besouw: That’s changing for power.
Klein: Yes, it is getting that way.
Chin: On my phone I’m tolerant of dropping calls, and I’m becoming more tolerant of waiting for my phone to do things. But it’s really a problem when it’s out of power. From a mobile device standpoint, that’s like getting the wrong answer on your calculator. Power is the most important thing on a mobile device.

LPE: So how is this changing the design priorities?
Kulkarni: At least 10 companies we talk to all compete on power. They spec it that way.
Klein: We’re seeing the same thing. Our customers are competing with each other on power. They never were before.
Chin: That’s really where the problem is. You know the applications you want to run, but there’s no good way of determining whether it’s 5 watts for one application or 1 watt for another. No one can really tell. We don’t have a hard limit on power. We can’t say if it’s 1.1 watts or 1.2 watts to view a YouTube video, because there are lots of other things going on.
Pangrle: There are lots of other variables to play with. The reason power is becoming more important is that if you’re using a certain package and you know what the cost of that package is, if you’ve blown your power budget and need another more expensive package you may not have a market anymore. You may price yourself out of the market. On the other hand, you can suffer a little on performance to keep your cost within an acceptable range. If you look at applications such as graphics, how accurate do you have to be? If you slip a few frames a second the end user isn’t going to notice.
Van Besouw: It’s like how far can you go on a gallon of gas. It’s not a matter of how fast you can go. Power is the same thing. How long can you talk on your phone?
Chin: And given that battery technology is relatively equal across the vendors, it’s all about power efficiency.

Experts At The Table: Power Budgeting

Friday, May 20th, 2011

Low-Power Engineering sat down with Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics; Cary Chin, director of technical marketing for low-power solutions at Synopsys; Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions; Matt Klein, principal engineer for power and broadcast applications at Xilinx; and Paul van Besouw, president and CEO of Oasys Design Systems. What follows are excerpts of that conversation.

LPE: What’s the best practice for dealing with power in complex chips?
Van Besouw: In today’s chips, with 50 million or 100 million gates, it’s inconceivable that the whole chip is functioning at the same time. You have to make decisions at the architectural level as to what’s on and off and which voltage islands to use. It’s another level of complexity. We’re just scratching the surface of trying to manage that. It’s one thing if you’re doing it on a small design. It’s another if you’re doing it on a complex SoC. There’s still a lot of headroom.
Chin: There are two separate things we’re dealing with. One is high-level complexity. Everyone agrees you have more leverage at the architectural level. But if you look at dynamic power optimization, a lot of what we can do at the implementation level we can do better if we understand what modes of operation we’re trying to optimize for. If you’re trying to optimize your silicon—even your physical implementation for a specific case—then you can optimize for that case very well. If we can take our optimization and weight it in the direction we want, whether it’s 75% of the time or the maximum operating condition or anything else, understanding those things has become very important. We have ways of optimizing our tools for activity constraints. We can put in specific vectors. All of these things can lead to different implementations, both logically and physically, of the actual circuit. Circuits can be optimized for certain conditions. But the problem I’m seeing today is even in the few cases where we have information about switching activity, there’s no context to tell us, ‘This is switching activity for this particular mode with these pieces powered down and these pieces running.’ We need to add that kind of information. That will make all of today’s tools work much better than they do today.

LPE: Doesn’t that make it harder to devise a derivative chip because you’re optimizing for each market?
Chin: Not really. Imagine if you had a device that ran twice as long just for gaming. Or you have a phone that gives you three weeks of talk time if you don’t do any gaming. This is an opportunity from the design side and the manufacturing side. In an era of dark silicon, where we have more gates than we can power at any given time, we can customize that hardware. Let’s push in the direction of specific implementations that allow you to optimize for all these applications. It’s also a way for people to differentiate their products.
Kulkarni: We almost need to reverse the paradigm of design. Why should we have a processor architecture for an embedded processor and use it for various applications such as YouTube or e-mail or music? We should do it in reverse. What kind of architecture does Facebook need? There could be a ‘processor-Facebook’ and a ‘processor-YouTube.’ The stimulus is different, the power consumption is different, and stimulus is becoming more critical for power consumption. You essentially are taking your VCD power and analyzing your function based on your VCD set. The real problem is power, but we’re searching for solutions because some VCD set is available to you. So now you can start talking about power budgeting. What will happen to a power grid design? Will you create EMI problems or EM problems? Will it blow up while you’re using a Facebook application?
Chin: Today you can carry around a navigation system in your car that will communicate back into the cloud in real time so other people will know to avoid traffic congestion. Wouldn’t it be great if we could use that same technology to understand what modes in the chip are being used? That’s exactly what we want to know. Here is the switching activity that millions of people are using. We could write that today with EDA tools that can run on your phone. We don’t have that information. We need to customize more on usage all the way up to the application so we really can have a Facebook processor.
Pangrle: A lot of the architects have been focused on performance. It varies from application area to application area, but the guys doing processors with standard instruction sets know what they need to cover and how to get more performance. They need to extend what they’re doing to include both performance and power.
Chin: And they need the analysis for power. We do really good analysis for performance these days, but for power it’s a lot more nebulous. And to a certain extent we do performance analysis statically. But in the power realm, it depends on what you’re executing. At some point, when you’re averaging across too many cycles, you’re losing the information you need. Today it’s still important to look at dynamic capabilities, especially with power budgeting. What people want to know is how much resolution do you need. You need to know where all the peaks are. Today, power reminds me of dynamic simulation 20 years ago when people looked at timing of specific long paths.
Pangrle: Being able to look at different modes is important. But if everything comes in at the same activity level you can’t distinguish or optimize for that. Being able to capture at different modes what that activity is that’s related to that mode is a big help. The things that can be done at an architectural level will have an impact downstream, even in regard to the tools needed to get the job done. This is like multicorner-multimode at logic synthesis or place-and-route. You need to make sure you’re not just meeting performance, timing and power at one operating point. These things are operating across multiple voltages. You need to make sure you’re safe across all the process corners as well as all the operating modes. That’s having an impact on the downstream tools. I like the idea of having a different processor for a different application.

LPE: It seems that another large EDA company has pitched a similar idea.
Klein: That’s where the FPGA is unique. It is, by definition, meant for an arbitrary number of modes of operation. We can dynamically reprogram sections of the FPGA while other parts are running. You can change functionality based upon what you see coming in. Additionally, because we have a programmable device, we need to put levels of hierarchical things that people doing synthesis can take advantage of—or which smart designers who understand all the modes of operation can utilize. We have hierarchical levels of clock gating. You can globally gate off the clocks with a one or zero. At a regional level you can have multiple clocks to gate off tens of thousands of flip-flops or block RAMs or DSPs at one time. Then we have finer gating at the individual block level. Each of them has various benefits and deficits. We also look at whether the contents of this flip-flop will be consumed on the next clock cycle. If not, I can gate it off on that clock cycle only. Knowing the functionality would be helpful for more global analysis, but if we don’t put the hardware capability in there in the first place to gate off locally, regionally and globally, it won’t matter what the software does because we won’t have the hardware features to take advantage of it.
Van Besouw: You have to make assumptions. The same functionality may be used on many different modes. That’s interesting because for one mode you may write completely different RTL. What you generate as an end product may be completely different. It may have different timing and physical constraints.
Chin: And these days that’s not beyond the realm of possibility. You can implement multiple modes because you have much more silicon than you can use. So why not have the Facebook processor as well as the gaming processor all on the same chip? You can power up the different sections based upon what you’re doing. In total you can save a lot more power. The tradeoff has always been timing and area. Now it’s timing, power and area, and area is probably third on the list these days. There are a lot of transistors on that chip. Figuring out what to do with them is something we’re having problems with. And the best way to control leakage is to shut things down. We’re starting to approach the more optimal implementations. It’s the reverse of resource sharing. There’s more and more hardware with specific functions.
Kulkarni: Besides power analysis, how do you refine the band of power? The assumptions you make at RTL almost always get thrown off the moment you go to clock-tree synthesis and place-and-route.

LPE: Meaning that when you take real measurements they’re not accurate?
Kulkarni: Yes, they may be off. So that means it’s not just the tools. Power budgeting is a set of tools and a methodology for the whole refinement from ESL to RTL to CTS to P&R and power-grid design. The capacitance can go haywire. Between the clock tree, what are the so-called source tree and leaves? What happens to the mesh clock structure if P&R tools play with it to do timing optimization? You can throw off all the assumptions you make at the RT level for power consumption unless there is a methodology to define power accuracy or inaccuracy. The plus or minus 30% should go down to 3% to 5% when you are doing final dynamic voltage-level signoff. The power intent will tell the tools what to do, but CPF and UPF do not tell you how to implement the low-power design.

Experts At The Table: Power Budgeting

Thursday, May 12th, 2011

Low-Power Engineering sat down with Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics; Cary Chin, director of technical marketing for low-power solutions at Synopsys; Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions; Matt Klein, principal engineer for power and broadcast applications at Xilinx; and Paul van Besouw, president and CEO of Oasys Design Systems. What follows are excerpts of that conversation.

LPE: The ITRS road map points to serious problems ahead with power budgets. How do we solve that?
Chin: We want to be talking about power budgets at as high a level as possible. You have much more leverage for impacting power the sooner you can attack the problem. You want to look at the problem as far up in the architectural cycle as possible. That’s critical to power budgets. Another issue involves how we close in on the specific details. On the verification and test side we talked about vectors driving the tests. That’s been the case with power at a detail level, but not at the higher level. We need to better understand what are the operating modes, how we get into these modes, standards, how we specify power intent, and the implications from the hardware through the software stack, including the firmware, middleware, operating system, all the way up to the applications. Application software itself has a big implication on the power in the device. We need to be able to boil that down to the hardware and meet somewhere in the middle.
Klein: Static power has been a big driver after 90nm. Because it’s process- and temperature-dependent, it affects things where the chips are being used in a rack with limited airflow. We see thermal and power as very important, and each generation we’ve gone further than the recommendations of the ITRS. We’ve had to reduce those by process choices and other techniques. But as we begin to drive down static power we have to consider dynamic power. As we grow the cores of the FPGA bigger and bigger you can have more things that are toggling. That’s where power optimization techniques come in. We’ve invested in post-synthesis power optimization through fine-grained clock gating. And once you’ve dealt with all those things, you’re still left with I/O power. You have to look at what you can do with the architecture and how you can deal with that. It’s a multi-pronged approach between static, dynamic and I/O power.
Pangrle: A lot of what drives the power budget is the target market for the chip. If you look at high-performance microprocessors in servers, they have topped out under the 150-watt range. If you’re looking at a cell phone you may be limited to 1 watt. There’s a broad spectrum, and for each one of these there are a lot of networking applications. If it’s going on a board you may be looking at a total of 17 watts. What’s driving this is the total cost of the system, which includes the packaging and how you’re going to cool it. With high-level servers, if you start going above a certain level you have to start looking at liquid cooling rather than just using air and fans to cool them.

LPE: At one point even fans were considered exotic and too pricey, right?
Pangrle: Yes, that’s correct. It all comes down to cost, and at 90nm we’re seeing a real big impact in leakage current and static power. When digital watches first came out they were running at 9 volts and static power was practically non-existent. Threshold voltages were high and you didn’t even take those into account. When we got below 100nm we went to 1 volt. Even if you look at the 28nm process, the nominal Vdd is still in the range of 0.85 to 1 volt, so we’ve lost that scaling. If we’re following Moore’s Law and doubling what we’re putting on each chip at every new technology node, but the energy per device isn’t being halved, that creates some real issues. We need higher-level optimization and tradeoffs between hardware and software.
Van Besouw: Power is the limiting factor in everything from small devices to set-top boxes for meeting performance goals. It’s a very complex process in determining how much power is going to be distributed to each smaller block. If you have hundreds of millions of gates that means you have hundreds of blocks. But when you distribute power that isn’t evenly distributed across the chip. It very much depends on the functionality of each block. You don’t know until you go down to the placement how much power is really being consumed. You want to do the power optimization at as high a level as possible—at the architectural level. You want to design this at the chip level, not at the low level, but there’s also another problem. You need details. You need to know what’s being used. That will determine how you implement RTL. It depends on the power requirements, and that impacts timing and placement. It’s a very connected problem. You want to make the right decision at the RTL level, but you need accurate information for placement. Floor planning, in turn, has an impact on timing closure and the timing characteristics, which includes the use of voltage islands. It’s like putting a 3D puzzle together where the shape and size of the puzzle is constantly changing.
Kulkarni: There is a difference between the classic Moore’s Law consumption and the Moore’s Law expectation. The trend is toward ‘More than Moore,’ or MtM, which creates demand for power budgeting. The Moore’s Law that has been scaling transistors and process geometries was really driven by timing and performance. The MtM roadmap shows this problem isn’t just ICs. It’s also 3D stacked ICs. When you look at all the new tablets and smartphones we’re looking at stacked ICs. The power budgeting is exacerbated by the MtM law, which is taking over now. Moore’s Law will continue, which is ‘More of Moore.’ There will also be ‘More than Moore,’ which is MtM. We see that in the mobile markets, which are 100% focused on power and noise. OEMs are defining a power spec at RTL and then asking chip vendors to bid for it. The chip vendors need to define a band of accuracy for power all the way through post synthesis, clock re-synthesis, block placement, place and route, and then dynamic voltage drop. Power budgeting came about as an emerging challenge. You need to make sure you can deliver the 5 watts maximum that the customer is asking for. But you have to be careful because you also can create voltage drop issues downstream on a PCB once everything is signed off. How do you predict that with the right stimulus management?

LPE: How much can still be saved in power in 2D for a reasonable cost?
Klein: We think there’s a lot of room left. We have spent more and more time at each generation looking at ways to save power. There’s a lot of low-hanging fruit that you don’t necessarily think is there. Even in the dynamic power area there’s low-hanging fruit. We’ve implemented fine-grained clock gating, and smart software at the post-synthesis level can take advantage of that. Designers also could do it pre-synthesis, as well. There are so many things people haven’t done yet that there is a lot available. We’ve also built headroom into our 28nm processes. We have a large number of parts we can offer at a lower voltage, which gives us the ability to lower dynamic power by 20% just based on the square of the voltage. Then you can still architect hard blocks, which compared to FPGA soft logic will be much better. Because we’re coming from the FPGA space, there is significant improvement.
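Klein's point about the square of the voltage follows from the standard first-order relation for switching power, P = a * C * V^2 * f, where a is the activity factor, C the switched capacitance and f the clock frequency. The sketch below works the arithmetic. The 1.0 V and 0.9 V supply values are assumptions picked for illustration; they are not figures quoted in the discussion.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power after a supply change, with activity,
    capacitance and frequency held constant (P is proportional to V squared)."""
    return (v_new / v_old) ** 2


def voltage_for_saving(saving, v_old):
    """Supply voltage needed to cut dynamic power by the given fraction."""
    return v_old * (1 - saving) ** 0.5


# Assumed example: dropping a nominal 1.0 V supply to 0.9 V.
ratio = dynamic_power_ratio(0.9, 1.0)
print(f"Dynamic power falls to {ratio:.0%} of its original value")

# Working backwards from a 20% saving at an assumed 1.0 V nominal supply.
print(f"A 20% saving needs about {voltage_for_saving(0.20, 1.0):.3f} V")
```

Under these assumptions, a supply reduction of roughly 10% is enough to account for a dynamic power saving of about 20%.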
Kulkarni: We’ve found there is significant room at the RT level. In the mobile area we found a sophisticated designer had done all he could for a quad-core design. In certain modes, there were three cores shut off and only one was running. But he found hot spots on those cores when they weren’t supposed to be running, so he investigated further. It turns out that the dynamic power, which is the relationship of four signals—data, clock, enable and reset—was not off completely. Data was circulating. Functional verification showed no problem. Formal verification showed no problem. But those three cores were consuming useless power. Once he found the problem, he reduced dynamic power by 22%. But there is no single button to push for that. RTL debug is becoming a way of finding those problems. That’s not really low-hanging fruit, though.
Pangrle: A lot of what you’re bringing up is that customers are new to active power management. That can get built into their flow and it’s something that could be caught during verification. You can create assertions to catch those signals. People will find incremental ways to improve things from the RTL level down, but the real progress will be in looking at it from a system perspective.
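The kind of check Pangrle mentions can be sketched as a scan over a recorded signal trace that flags cycles in which data keeps changing while a core's enable is low. In a real flow this would more likely be written as a simulation assertion; Python is used here only as a minimal illustration, and the trace format and function name are hypothetical.

```python
def idle_toggles(trace):
    """Return the cycle numbers at which data changed while the block was disabled.

    trace is a list of (enable, data) samples, one per clock cycle. A cycle is
    flagged when enable is low in both that cycle and the previous one, yet the
    data value differs between them.
    """
    violations = []
    for cycle in range(1, len(trace)):
        prev_enable, prev_data = trace[cycle - 1]
        enable, data = trace[cycle]
        if not enable and not prev_enable and data != prev_data:
            violations.append(cycle)
    return violations


# Hypothetical trace: the core is switched off at cycle 2,
# but its data bus keeps circulating from cycle 4 onward.
trace = [(1, 0x0), (1, 0x3), (0, 0x3), (0, 0x3), (0, 0x7), (0, 0x1)]
print(idle_toggles(trace))  # → [4, 5]
```

A functional test would pass on this trace because the outputs are never used while the core is off; only a check aimed at activity, rather than correctness, exposes the wasted switching.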

Worst Case Power Varies With Geometries

Thursday, July 8th, 2010

By John Blyler
When designing for low-power operation, engineers are constrained by the worst-case (highest-power) ratings for the silicon. But the power distribution characteristics of silicon can vary significantly from wafer lot to wafer lot for the latest, smallest process geometries. How can designers deal with the worst-case power ratings in their low-power, high-volume FPGA designs?

First, let’s consider the process. To establish the power distribution range for their products, FPGA vendors start with a target yield. This yield provides the initial cost structure and allows them to publish numbers based on characterization over a statistically meaningful number of wafer lots, notes Christian Plante, director of marketing for low-power and mixed-signal FPGAs at Actel. “We characterize our silicon over many lots. Thus, it can take us a little while to put worst-case numbers (for the latest geometries) into our software modeling tools.” The reason for this delay is that the latest process geometry nodes are less tamed than the older, higher, established nodes.

Characterizing worst-case conditions at higher nodes like 130nm isn’t a big problem. The manufacturing processes at these geometries are well known. Thus, the power distribution curves are much tighter with less variation.

It’s at the lower geometries, like Xilinx’s and Altera’s 28nm processes, where the power distribution between wafer lots will vary the most. And while this variation will tighten up as the process matures, that will take some time.
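To see why a wider spread between wafer lots pushes up the worst-case rating even when the typical value is unchanged, consider the sketch below. It derives a worst-case figure as the mean plus three standard deviations, which is a common statistical convention assumed here for illustration and not a description of any vendor's characterization method. The per-lot static power readings are hypothetical.

```python
import statistics


def worst_case(samples, k=3.0):
    """Worst-case estimate as mean plus k sample standard deviations."""
    return statistics.mean(samples) + k * statistics.stdev(samples)


# Hypothetical per-lot static power readings in milliwatts.
mature_node_mw = [10.0, 10.2, 9.9, 10.1, 9.8]  # tight distribution
new_node_mw = [10.0, 12.5, 8.0, 11.5, 8.0]     # same mean, wider spread

for name, lots in [("mature node", mature_node_mw), ("new node", new_node_mw)]:
    print(f"{name}: mean {statistics.mean(lots):.1f} mW, "
          f"worst case {worst_case(lots):.1f} mW")
```

Both sets of readings average the same value, but the wider spread yields a noticeably higher worst-case number, and that number is what designers have to budget against.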

Process variations during manufacturing also can worsen the effects of static power leakage, notes Michael Kendrick, product planning manager for Lattice Semiconductor. “As we move forward with geometries the voltage threshold decreases, which in turn causes static power leakage to increase, relative to dynamic power.” This results in a wider distribution of static power consumption over time – increasing the worst-case power constraints for FPGA designers.
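The link Kendrick draws between threshold voltage and leakage can be illustrated with the standard first-order model of subthreshold current, in which leakage grows exponentially as the threshold voltage falls: I is proportional to exp(-Vth / (n * kT/q)). The sketch below evaluates that relation. The slope factor n = 1.5, the 300 K temperature and the 100 mV threshold reduction are assumptions chosen for illustration, not figures from the article.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def leakage_ratio(delta_vth, temp_k=300.0, n=1.5):
    """Factor by which subthreshold leakage grows when the threshold
    voltage drops by delta_vth volts (first-order model; n is an assumed
    subthreshold slope factor)."""
    return math.exp(delta_vth / (n * thermal_voltage(temp_k)))


# Assumed example: threshold voltage lowered by 100 mV at 300 K.
print(f"Thermal voltage at 300 K: {thermal_voltage(300.0) * 1000:.1f} mV")
print(f"Leakage grows by a factor of about {leakage_ratio(0.1):.0f}")
```

Under these assumptions a 100 mV drop in threshold voltage raises leakage by roughly an order of magnitude, which is why small lot-to-lot shifts in threshold voltage translate into a wide spread in static power.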

Engineers are not without options. There are several techniques to mitigate the effects of static power leakage. For example, designers can be more careful on the mix of high-speed transistors used, since these transistors have higher leakage, says Kendrick. There are also process improvements that reduce leakage at 28nm.

The uncertainties of exact worst-case low-power conditions at lower geometries, like 28nm, may give FPGA vendors of higher-node chips an advantage. After all, the power distribution at higher nodes is more fully understood. Less variation in the power distribution of well-known, higher-node geometries should translate to less variation in the worst-case power ranges.

But Actel’s Plante adds a note of caution, explaining that if the power distribution strays too far outside of customer expectations then the FPGA vendors can’t sell those chips—except to a customer that will accept the additional power consumption.

Further, FPGAs at the lower process nodes, like Xilinx’s new 28nm Virtex 7 and Altera’s Stratix V product lines, offer the lower power that is inherent in the move to smaller process geometries. Also, Xilinx emphasizes the power benefits of scalability with its new 28nm offerings. Both its lower-end, higher-volume and high-end, higher-performance FPGA families are built on the same underlying architecture, which may help mitigate the effects of wafer power distribution variations at the newer node.

The move to new process geometries always brings new challenges. Fully understanding the variation of power distributions within the silicon is but one of the challenges FPGA designers face when designing to worst-case power conditions.

Special Report: Using FPGAs For 3D Stacking

Thursday, June 10th, 2010

By Ed Sperling
Xilinx is developing a 3D architecture for its FPGAs and Actel has been approached by SoC makers to use its flash-based FPGA as a layer in a 3D IC stack. Both approaches could radically alter the fundamental equation about the tradeoffs between FPGAs and ASICs—particularly the power and performance overhead normally associated with programmable logic.

Xilinx declined to comment, but a half-dozen independent industry sources familiar with its efforts have confirmed the 3D development is well under way. Rich Kapusta, Actel’s vice president of marketing, applications and business development confirmed his company has been approached by SoC makers to use the company’s non-volatile flash-based FPGA as a layer in their 3D SoCs. He declined to comment further.

Getting this kind of work done in 3D chips is anything but guaranteed. It’s complicated and there are lots of pitfalls, such as accessing RAM or logic across multiple die. Nevertheless, the implications of these developments are enormous. Because of the very regular and controlled structure of an FPGA, it is extremely well suited to defining where components can be placed on a chip. That makes it much easier to predict hot spots caused by putting two or more chips together, a problem that becomes particularly thorny when chip layers are developed by multiple vendors without knowledge of the thermal characteristics and layout of the other components.

3D stacking makes it far easier to bump up performance at advanced nodes using shorter wires while reducing power because it takes less power to achieve that performance over shorter distances. But getting this accomplished with SoCs has been particularly difficult. As a result, sources say the need for FPGA prototypes may change FPGAs into the end game rather than an in-between step.

Moreover, both moves also are expected to open huge markets, finally, for advanced EDA tools to work on complex FPGA designs, as well as third-party IP, processor cores from companies like ARM, MIPS and Virage Logic, and interconnect fabrics such as network on chip. They also can open up 3D to mainstream development. While companies such as IBM, Freescale, Qualcomm and Texas Instruments have been working on 3D chips for years—IBM started its R&D in this area almost a decade ago—most of that work has been a closely held secret because it is considered a competitive advantage for performance and power. FPGAs can quickly turn that into a less expensive option that may have more overhead than bottom-to-top 3D ASIC designs, but far less than 2D ASICs.

Issues in 3D
FPGAs can solve one of the biggest problems in 3D stacking, namely standards for placement of components. Without those standardized approaches there will likely be some ugly finger-pointing when two chips are put together.

“One of the problems that we see coming is who’s going to pay for a bad part,” said Andrew Yang, chairman and CEO of Apache Design Solutions. “Testing may show that memory and logic are all good and that the die works, but when you put it together with another chip it may turn into a bad part. So you can say it’s good, and all your testing and verification may show that it is, but when it doesn’t work who pays?”

Yang said there is a need for far more analysis of the stacked die, measuring everything from heat and power to electrostatic discharge and signal integrity.

“We also need to understand what are the killer applications and what applications are not good for 3D,” he said. “The compelling value of 3D is shorter distance, which is the TSV promise. The challenge is in coupling chips together. In 2D you could shield high-speed signal transmissions. You get a cross-coupling effect with a TSV, so there is promise but there are also challenges.”

One of the big draws for 3D in general is the ability to re-use IP, which may come in the form of entire chips. That doesn’t work too well, however, when those chips were created for the best utilization of real estate on a 2D structure, where heat dissipation is relatively simple. In 3D, putting chips together can sandwich heat between die with no way to get it out of the chip.

“When you stack die you concentrate the heat,” said Carey Robertson, product marketing director for Calibre Design Solutions at Mentor Graphics. “That affects chip reliability, either short-term or long-term because they’re operating at temperatures they’re not expected to operate at. Circuits perform differently at 100C or 125C or 130C. At 130C it may affect the core, the timing, the signal integrity.”

While the overall heat of a chip hasn’t changed much, the more tightly everything is packed together the more difficult it is to cool. “When you stack them, you concentrate that heat even more,” Robertson said. “Potentially, when you move the wires closer together you can reduce resistance and IR drop. There would be a decrease in power and heat, but we have not seen enough of that yet to draw that conclusion.”

Under the covers, there are two technical ways to make this all possible, according to an ARM insider. “The first is for TSVs at similar pitch to solder bumps (about 50µm). This expands the capability of FPGAs and creates what amounts to multi-FPGA chips, as well as allowing for better-integrated flash, DRAM, and high-performance logic. The limited inter-chip bandwidth and power delivery, along with thermal issues, keep this as more of a cost dynamic – an extension to existing SiP approaches,” said the source. “The second answer is for high-density future TSVs, at a pitch of less than 5µm. These increase inter-chip bandwidth by a factor of 100 over the first solution and allow for some game-changing capability, including wide word high-speed off-chip memory access, combined FPGA/logic solutions, multi-die FPGA (greatly increased gate count) and so on. The reconfigurable aspect of FPGAs may also help solve the test and fault tolerance issues that are a very significant impediment to making tight pitch TSVs viable. Neither of these eliminates the crossover argument on power and performance, but they both have the potential to move it.”
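The factor of 100 quoted for inter-chip bandwidth is consistent with simple geometry: the two pitches differ by a factor of ten, and the number of connections that fit in a given area scales with the inverse square of the pitch. The sketch below works that out, assuming the TSVs sit on a regular square grid and each connection carries the same bandwidth. Both assumptions are simplifications made for illustration.

```python
def density_gain(old_pitch, new_pitch):
    """Increase in connections per unit area when moving from old_pitch to
    new_pitch (same units), assuming a regular square grid of connections."""
    return (old_pitch / new_pitch) ** 2


# A tenfold reduction in pitch, as in the two TSV options described above.
gain = density_gain(50, 5)
print(f"Connections per unit area increase by a factor of {gain:.0f}")
```

If per-connection bandwidth stays constant, aggregate bandwidth across the same die area rises by the same factor as the connection density.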

Programming the future

Whether this effort ultimately succeeds is anyone’s guess. What is known is that a lot of resources are being marshaled into 3D stacking and a lot of hopes are being pinned on the back of efforts such as those from Xilinx and Actel’s partners.

Tom Quan, deputy director of design methodology at TSMC, said the great advantage of FPGAs is that they are very regular. “You can predict the thermal profile much better than with a mixed-signal SoC. Analog can be all over the map. But while the base array may be regular, in another corner of the chip you might have a USB so the outside of the chip might be hotter than the inside.”

Still, there was a lot of hype behind multi-chip modules in the 1990s and so far they have failed to materialize as a popular solution, largely because of cost. That could change as double patterning becomes the norm at 22/20nm and standard production costs rise, but visibility remains limited at that node.

At the very least, the moves by FPGA players are worth tracking, and a lot of companies are predicting major changes if these scenarios work. There are reasons FPGAs may hold more promise than multi-vendor or multi-generational SoCs. But there are still a lot of challenges to resolve before the total cost of development is known.

The Week In Review: June 4

Friday, June 4th, 2010

By Ed Sperling
ARM, Freescale, IBM, Samsung, ST-Ericsson and Texas Instruments teamed up to create “Linaro,” an open-source software engineering company. The stated goal is to speed the development of Linux tools and foundation software. While this is great for large processors, the real question is just how much Linux technology will be scaled down. In many applications, size matters, and being able to work with open source software in a smaller footprint is a big plus when it comes to power issues.

MIPS added symmetric multiprocessing support for the Android platform using multicore MIPS SoCs. This gets particularly interesting because in addition to multi-threaded applications, there is a trend to dedicate specific functions for cores. The possibilities are enormous, both in terms of functionality and more efficient power utilization.

Mentor Graphics updated its verification lineup just in time for DAC. The company rolled out version 3 of its 0-In formal verification, adding better support for mixed-language design and tighter integration with its Questa platform. The company also released a 0-In CDC update for clock-domain crossing verification. While these are interesting releases in their own right, they look particularly interesting for SiP and 3D stacking.

Synopsys, meanwhile, rolled out high-level synthesis support for Xilinx’s Virtex-6 FPGAs. Design of FPGAs used to be relatively straightforward, but at advanced process nodes they encounter the same headaches that SoCs do—area, power, performance and verification.

Sound quality may be the next big selling point in the PC and netbook space, along with battery life and I/O speed. ASUS is betting the bank on Virage Logic’s Sonic Focus as a differentiator, complete with new enhancements. So much for the tinny-sounding speakers that make it next to impossible to understand anything.

The Week In Review: March 5

Friday, March 5th, 2010

Actel set the FPGA market ablaze with its new SmartFusion device, which combines programmable analog with a complete microcontroller subsystem and an integrated programming environment, including tools. This is an interesting move, and it will be equally interesting to see how long it takes Actel’s top rivals to respond. Actel insiders, most of whom came from Xilinx and Altera, say the catch-up period may be quite lengthy. They may have a bone to pick, but the low-power angle is definitely interesting. This also should grab some attention from the companies that have been developing multichip solutions because they don’t want to deal with integrating analog and digital.

Mentor Graphics announced its financials for fiscal Q4 and the full year ending Jan. 31. Full-year revenue was $802.7 million, up 2% from fiscal 2009. Non-GAAP earnings more than doubled to $0.47 per share, while the GAAP loss was $0.23 per share. That’s a lot better than a loss of $0.99 per share. For fiscal Q4, Mentor reported revenues of $237.1 million, non-GAAP earnings per share of $.30, and GAAP earnings per share of $.39. As Mentor chairman and CEO Wally Rhines pointed out, “the electronics industry recovery seems to be well underway.” Break out the champagne, but don’t spend more than $8 a bottle or the corporate accounting department won’t approve it.

Synopsys bolstered the capabilities of its System Studio C/C++ analysis and simulation environment. The product now includes support for matrix and vector data types, which the company says significantly reduces coding and debugging efforts.

The Taiwanese earthquake earlier this week registered 6.4 and cost about 1.5 days in wafer movement from TSMC’s fabs in Tainan. This was a big earthquake, but the impact was slightly less near the Tainan fabs.

You have to wonder about Wall Street. Marvell beats estimates by $500,000 and the stock tumbles. According to analysts, the company didn’t beat estimates by enough. Isn’t the whole point to meet estimates?

Intel added the Atom processor to the networked small office/home office storage market. What’s interesting about this announcement isn’t Intel’s push into this market. It’s that there is now a dual-core version of Atom available. This should make for a nifty ultra-low power solution.

Writing Application Software Directly To The Metal

Friday, March 13th, 2009

By Ed Sperling

How necessary is an operating system?

That question would have been considered superfluous a decade ago, possibly even blasphemous and career-limiting. But it now is beginning to surface in low-power discussions, particularly in compute-intensive applications where performance and power are both critical. General-purpose operating systems constantly call on the processor for updates, while software written straight into the metal using Verilog or SystemC can be written for specific cores.

Highly parallelized applications such as search, particularly in bioinformatics, already are exploring writing applications directly into FPGAs. And heterogeneous cores may give application developers more reason to write to the chip rather than an operating system application programming interface (API).

For application developers, power is as much a balancing act with performance as it is for hardware developers. While classical scaling before 90nm provided both power and performance benefits at each process node, the decision has moved largely to one or the other. For every gain in performance, there has to be a subsequent drop in power somewhere on the chip. Otherwise the clock speed cannot be improved without burning up the chip.

That has prompted software developers to look for different solutions. Even Intel, whose success was built almost entirely on tight integration with operating systems (Windows, Mac OS X and Linux), is looking at utilizing some of the cores in its future chips differently.

“There is broad agreement that we need to be able to represent the ability to do parallelism at the application level and not force everything through the operating system,” said Pat Gelsinger, senior vice president in charge of Intel’s Enterprise Group. “Any time you have a call through the operating system to get a resource—whether it’s a thread or an I/O—your application has gone away for thousands of clock cycles. You want to do that when you need something that only the operating system can give you.”

Typically the operating system acts like a layer of middleware. It makes the connections through its APIs that allow applications like Office to work together so that portions of one application can be dragged and dropped into another. But in highly parallel applications, the interactions are largely within the application rather than with other applications.

“There is an active effort to move some of this parallelism to the application level so the application programmer, given the right tools and libraries, can take advantage of that,” Gelsinger said. “Microsoft has taken steps like that recently with networking and the NPI (network programming interface) layer—moving it into the user space. Use the operating system for what you need it for, but allow parallelism to be more lightweight. Those steps are under way, and they will have great benefit. It started out as the HPC (high-performance computing) community, where they were using tens of thousands of threads.”

IBM is likewise experimenting with a thinner operating system layer for its Power architecture. Brad McCredie, chief architect of the new Power 6 chip and an IBM Fellow, said one of the first examples is hardware accelerators, which are being used to speed up applications.

“We’ve already created an architected layer in the Cell processor,” said McCredie. “It’s not exactly writing software into the metal. We gave the software programmers an architected interface, so we hid some of the messiness of the 100 gigaflop accelerator with a new generalized interface, which is OpenCL. We expect to put in multiple types of accelerators in the future.”

At some point, though, even this approach will run out of steam. McCredie said the debate inside IBM right now is when exactly that point will occur. He believes it will happen at 22nm.

“Eventually we’re going to run out of power on a chip,” he said. “The next way will be to design devices to do fewer and fewer things. That trend will happen. The question is whether we will be able to invent a more specific device that can do 80% of the workloads at less power? If it only does 10%, then no one will write a line of code for it. But if it covers 80%, then it will have much better power/performance.