Posts Tagged ‘ASIC’

End User Report: Reliability

Thursday, October 15th, 2009

John Kern, vice president of product operations inside Cisco Systems’ customer value chain management group, sat down with Low-Power Engineering to talk about the company’s internal focus on reliability and what factors are causing the most concern. What follows are excerpts of that conversation.

By Ed Sperling

LPE: How does Cisco gauge reliability?
John Kern: The bulk of our revenue today is switching and routing products, which make heavy use of complex ASIC and microprocessor technology. We have a pretty extensive technology qualification process, as well as an individual component quality process. At a high level, we focus on robust design with solid margin and on manufacturing, and we complement all of that with high-reliability components. We have one process for all right now, but we clearly see the need to differentiate depending upon the use case.

What’s involved in that process?
We have a preventive process and a reactive process. The preventive piece starts with partnering with the right suppliers and with component selection for the [bills of material].

Does that require new attention to suppliers?
For the past five years, we’ve had a process to have very tight alignment across our critical technologies. So for our ASIC supply base, SERDES and PHY, we have deep partnerships with a handful of companies that we really rely on. We count on them to make investments in areas where we have need, and to do that we have to be articulate about what our future holds. The payback for that investment, and at times it’s a significant investment, is that we reward them with new business. This has borne a lot of fruit in terms of major technology transitions for us. It also has given us access to intellectual property that has enabled our products.

Does complexity from low-power designs change anything?
Nothing is radically changing in terms of complexity. We embed a lot of differentiation and intellectual property in our ASICs. Each technology node has its own transition. The transitions of late have not been as severe. As we move to 32nm and beyond, the complexity curve goes up. The issue around power and reliability is definitely more of a challenge, and it’s something we’re spending a lot more time and energy trying to get ahead of. We’re using techniques that emphasize power reduction in our ASIC designs that are probably commonplace in handsets, but they haven’t been as prevalent in switches and routers.

These are techniques like power islands and various on/off states?
Yes, multiple Vt’s and clock gating and making use of techniques to optimize for power. We had the luxury in the past of creating an architecture, and whatever the power turned out to be, it was acceptable and we designed the system around that. We’ve flipped that around now to where we start with a system-level power budgeting process that then drives down to the individual boards and the individual components.

Why is Cisco involved that deeply?
A lot of it begins with our customers. There’s also a green component. We started to do things like embed into our requirements documents, which define the deployment of the product way up front, more considerations around green and sustainability. These are things like considerations for high-efficiency power supplies for our ASICs and recycling at end of life, which are things we never built into creation of a new product.

Is it all ASICs, or are you moving into programmable chips and SoCs, as well?
In terms of the effect we can make on the system-level, it’s largely ASICs. We are also probably one of the largest users in the world of PLDs. But optimizing in those areas doesn’t make as big an impact on the system level as we can with ASICs. And as far as SoCs, the lines are blurring between SoCs and ASICs. The distinction we make there is we control the designs.

In the low-power world, there’s so much complexity that debugging the chip is becoming more difficult. Is that a problem?
Clearly. At every process node there is a shift. Given the complexity of the ASIC designs we have, this isn’t a new phenomenon. We’ve been dealing with power modeling, signal integrity, multiple Vt planes on the same chip, some of the interaction between substrate design and chip design.

What node is Cisco at?
We’re at 65nm. We’ve launched a host of 40nm designs. We’re on an unusual schedule, though. A lot of companies will launch designs at the lowest power process because they want the learning early and the ramp to volume is faster. We require performance and lower power, so it’s really a function of where the IP qualification is and how far behind the process schedule that is.

Does Cisco do its own designs?
We do most of the ASIC work ourselves. When we enter into an SoC joint development it can be shared, where pieces are done by third parties and our suppliers and pieces are done by us. In most cases we’ll handle the stitching of the chip. We’ve outsourced that a few times, but it’s pretty rare.

Cisco doesn’t have its own fabs though, right?
No. We outsource the fabrication to the traditional players.

Does that mean you’re subject to the more restrictive design rules?
Yes. We will engage with more traditional ASIC players that have something unique in their flow, but we’ll usually wrap the design around their constraints. We use their design rules and libraries and we follow those, depending upon the supplier we’re working with. A few years ago we did all the intellectual property and back-end work. We were our own fabless chip supplier. It played out well for a couple of nodes, but when we moved beyond 90nm it didn’t make sense commercially.

As you push into 32nm, do you foresee more issues with quality and reliability?
For reliability, absolutely. IBM is very fearful that the techniques that have served them well for predicting the lifecycle of a product will be affected by pushing the voltage curve. Our products should last for 10 years. We’re certainly aware of the risk and we’re working with our key suppliers to learn what we can as we venture into that node.

Will you necessarily move to the next node as quickly as in the past?
I think so. It’s hard to tell how much is an aberration based on the current economy or Moore’s Law. We are certainly starting fewer designs because we can make use of the technology to pack more functionality into devices. The devices are more complex. When you look at future nodes, though, the learning curve is going to get steeper.

One last question: How does all of this affect your make or buy decision?
We’ll use as much outside content as possible for the areas that aren’t differentiating. If we do a good job articulating to our partners what we need and it works for them, we’ll usually get the investment we need. There are some cases where we need a capability that’s core to us, so we’ll partition those off. As a company, we’re also starting to get into some adjacent markets. We’ll use whatever is available to get us into the market quickly. The benefit of leverage or cost or scaling that custom silicon can give you isn’t a factor for jumping into a new market.

Feel The (Low) Power

Thursday, September 17th, 2009

By Clive (Max) Maxfield

When I designed my first ASIC way back in the mists of time (circa 1980), its power consumption was the last thing on my mind. You have to remember that we’re talking about a device containing only about 2,000 equivalent gates implemented in a 5 micron technology. Also, I was designing this little scamp as a gate-register-level schematic using pencil and paper (I predate EDA as we know it today).

We didn’t have automated schematic capture or logic simulation or timing analysis tools. Functional verification was performed by your peers looking at your schematics. You explained what a particular portion of the design was going to do, and they thought about it and said: “That looks good to us.” Similarly, timing analysis involved deciding which paths were important, and then adding up all of the lump-load delays specified in the cell library data book. Once again this task was performed by hand using pencil and paper (no one I knew could afford one of the recently-introduced electronic calculators. Those were of interest only to well-paid managers who needed help balancing their expense accounts. What do you mean? Of course I’m not bitter.)

So my main concern was squeezing my design into 2,000 gates. The thought of how much power my device was going to consume literally never even crossed my mind. I’m sure that someone at the system level had some sort of power budget. Actually, as opposed to a “budget,” which implies some upper, not-to-be-exceeded value, I think it was more a case of keeping track of the total estimated power consumption; multiplying this by some factor to provide a margin of safety; and then throwing a big enough power supply unit into the system.

Once again, you have to remember what we were trying to achieve. This was before the days of hand-held, battery-powered, portable electronics products like MP3 players and GPS receivers and cell phones. My ASIC was intended for use in the CPU of a mainframe computer. The circuit board (one of many forming the system) on which my device was to reside was about 1/4″ thick, around three feet long by two feet wide, with its periphery populated by power studs capable of handling the power requirements of a small town. Ah, the good old days…

How the world has changed. Over the last few years, power consumption has moved to the forefront of ASIC and SoC development concerns. And this interest in low-power design is not restricted to portable, handheld products. Instead, it spans the entire deployment domain including fixed installations such as all forms of computing, networking, and set-top boxes, to name but a few.

As a simple example, consider the fact that every time you request a search on Google, the servers in the data centers consume an average of 4.5 watt-hours of energy. Remembering that Google is easily processing 400 million queries a day, this equates to 1.8 billion (1,800,000,000) watt-hours of energy being used daily to handle basic search queries. If you can significantly reduce the power being consumed by the CPUs and support chips in these servers, this will have a HUGE impact on Google’s bottom line and – more importantly – the environment.
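The arithmetic behind that daily total is easy to check. The following is a minimal sketch that simply multiplies the two figures quoted above; the names are illustrative, and the per-query figure is read as an energy in watt-hours, which is what the daily total implies.

```python
# Figures quoted in the article; treat them as rough estimates.
ENERGY_PER_QUERY_WH = 4.5        # per-query figure, read here as watt-hours
QUERIES_PER_DAY = 400_000_000    # 400 million queries a day


def daily_energy_wh(energy_per_query_wh: float, queries_per_day: int) -> float:
    """Total energy used per day, in watt-hours."""
    return energy_per_query_wh * queries_per_day


total_wh = daily_energy_wh(ENERGY_PER_QUERY_WH, QUERIES_PER_DAY)
print(f"{total_wh:,.0f} Wh per day")        # 1,800,000,000 Wh per day
print(f"{total_wh / 1e9:.1f} GWh per day")  # 1.8 GWh per day
```

Expressed in larger units, 1.8 billion watt-hours is 1.8 gigawatt-hours per day.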

Over recent years a wide variety of low-power design techniques have evolved to address the various aspects of the power problem and to meet ever-more-aggressive power specifications. These days, power planning can no longer be treated as an afterthought; instead, system architects need to make power-aware architectural decisions and power-aware third-party IP selections at the very beginning of the development process.

There are so many aspects to the low-power story that we could write a book on it, but I’m a hardware design engineer by trade, so for the purposes of this article I’m going to briefly summarize the various low-power implementation technologies that are available to us as follows:

Clock gating
Although clock-gating may seem a little boring, doing it right can significantly reduce the design’s dynamic power consumption. This is because the clock trees in a modern ASIC/SoC can account for one-third to one-half of a chip’s dynamic power consumption.

Clock-gating involves all sorts of decisions. For example, should it be performed only at the bottom of the tree (the leaf nodes), at the top of the tree, in the middle of the branches, or as a mixture of all of these cases? There are tools that can help in moving the clock-gating structures around (upstream and/or downstream) and in performing tasks like splitting and cloning.

The current state-of-the-art in clock gating is “multi-stage gating,” in which a common enable is split into multiple sub-enables that are active at different times and/or under different operating modes. Although clock-gating offers big paybacks, it adds substantially to the task of physically implementing the clock tree(s) and also verifying the tree(s).
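To get a feel for what the one-third to one-half figure means in practice, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model in which gating removes some fraction of the clock tree's switching and nothing else changes; the `gated_fraction` parameter and its value of 0.6 are made up for illustration, not taken from any real design.

```python
def dynamic_power_after_gating(clock_share: float, gated_fraction: float) -> float:
    """Remaining dynamic power as a fraction of the ungated total.

    clock_share    -- share of dynamic power burned in the clock tree (0..1)
    gated_fraction -- share of clock-tree switching that gating removes (0..1);
                      a hypothetical knob for illustration, not a measured value
    """
    if not (0.0 <= clock_share <= 1.0 and 0.0 <= gated_fraction <= 1.0):
        raise ValueError("both arguments must lie between 0 and 1")
    return 1.0 - clock_share * gated_fraction


# The article's range for the clock tree's share: one-third to one-half.
for clock_share in (1 / 3, 1 / 2):
    remaining = dynamic_power_after_gating(clock_share, gated_fraction=0.6)
    saved = 1.0 - remaining
    print(f"clock share {clock_share:.2f}: about {saved:.0%} of dynamic power saved")
```

Under these assumptions the savings come out at roughly 20% to 30% of total dynamic power. The model ignores both the extra savings in registers whose clocks stop toggling and the overhead of the gating cells themselves, so it is only a first-order estimate.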

[Figure LP-01-a]

Note: Terms such as “Little”, “Low”, “Medium”, and so forth as used in diagrams like the one shown above are intended only to convey relative quantities in the context of the entire chip and/or development process.

Multi-Vt optimization
Static power dissipation is associated with logic gates when they are inactive (static); that is, not currently switching from one state to another. In this case, these gates should theoretically not be consuming any power at all. In reality, however, there is always some amount of leakage current passing through the transistors, which means they do consume a certain amount of power.

Even though the static power consumption associated with an individual logic gate is extremely small, the total effect becomes significant when we’re playing with devices containing tens of millions of gates. Furthermore, as transistors shrink in size when the industry moves from one technology node to another, the level of doping has to be increased, thereby causing leakage currents to become relatively larger. The end result is that even if a large portion of the device is totally inactive it may still be consuming a significant amount of power. In fact, static power dissipation is expected to exceed dynamic power dissipation for many devices in the near future.

Now, static power dissipation has an exponential dependence on the switching threshold of the transistors (Vt). In order to address low-power designs, each type of logic gate is available in two (or more) forms: with low-threshold transistors that switch quickly but have higher leakage and consume more power, or with high-threshold transistors that have lower leakage and consume less power but switch more slowly. Of course this leads to other problems, such as unwanted signal integrity effects, but that’s a topic for another day.
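The exponential dependence can be illustrated with the usual first-order subthreshold model, in which leakage current is proportional to exp(-Vt / (n * kT/q)). The sketch below is only that, a sketch: the slope factor `n = 1.5` and the two threshold voltages are assumed values chosen for illustration, and real cell libraries will differ.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """kT/q in volts (about 26 mV at room temperature)."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(vt_low: float, vt_high: float,
                  n: float = 1.5, temp_kelvin: float = 300.0) -> float:
    """How many times more a low-Vt transistor leaks than a high-Vt one.

    First-order model: leakage is proportional to exp(-Vt / (n * kT/q)).
    n is the subthreshold slope factor (assumed, not from the article).
    """
    return math.exp((vt_high - vt_low) / (n * thermal_voltage(temp_kelvin)))


# Hypothetical thresholds: 0.30 V for the fast cell, 0.40 V for the slow one.
ratio = leakage_ratio(vt_low=0.30, vt_high=0.40)
print(f"low-Vt cell leaks about {ratio:.0f}x more than the high-Vt cell")
```

With these assumed numbers, a 100 mV difference in threshold changes leakage by roughly 13 times, which is why tools typically keep low-Vt cells for the timing-critical paths and use high-Vt cells everywhere else.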

[Figure LP-01-b]

Multi-supply multi-voltage (MSMV)
The idea here is that blocks in the design that are powered by higher voltages run faster and consume more power than blocks that are powered by lower voltages. Thus, if we have a block that can run slower than surrounding blocks, it may make sense to implement this block as its own “voltage island”.

The downside is that we now require the insertion, placement, and connection of specialized power structures, such as level shifters, power pads, and so forth.

[Figure LP-01-c]

Dynamic and adaptive voltage and frequency scaling (DVFS)
In this case, the idea is to optimize the tradeoff between frequency and power by varying the voltage or frequency in relatively large discrete “chunks.” For example, the nominal frequency may be doubled to satisfy short bursts of high-performance requirements or halved during times of relatively low activity.

Similarly, a nominal voltage of 1.0V may be boosted to 1.2V to improve the performance, or reduced to 0.8V to reduce the power dissipation. Of course this can quickly become a verification nightmare, because each of these scenarios has to be tested in the context of surrounding blocks, which may themselves switch from one mode to another.
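The payoff from these discrete steps can be estimated with the standard first-order relation that dynamic power scales with V² × f. The sketch below plugs in the voltages and frequency steps mentioned above; pairing each voltage with a particular frequency step is an assumption made for illustration, and the model says nothing about whether a block would actually meet timing at a given operating point.

```python
def relative_dynamic_power(voltage: float, freq_scale: float,
                           nominal_voltage: float = 1.0) -> float:
    """Dynamic power relative to the nominal operating point (P ~ V^2 * f)."""
    return (voltage / nominal_voltage) ** 2 * freq_scale


# (supply voltage in volts, frequency relative to nominal); pairings are assumed.
operating_points = {
    "boost":   (1.2, 2.0),
    "nominal": (1.0, 1.0),
    "low":     (0.8, 0.5),
}

for name, (volts, freq) in operating_points.items():
    power = relative_dynamic_power(volts, freq)
    print(f"{name:8s} V={volts:.1f} f x{freq:.1f} -> {power:.2f}x dynamic power")
```

Under this model the boost point costs about 2.88 times the nominal dynamic power, while the low point uses about 0.32 times, which is why even coarse voltage and frequency steps are worth the verification effort.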

[Figure LP-01-d]

Power shutoff (PSO)
As its name suggests, power-shut-off refers to powering-down selected portions of the design that are not currently in use. If, for example, your cell phone includes an MP3 player capability but you aren’t currently listening to any music, then powering-down that function will save power.

In this case, designers have to choose between “simple power shut-off” where everything in the block is powered down, and “state retention power shut-off,” in which the bulk of the logic is powered down but key register elements remain “alive.” This latter technique can significantly reduce the subsequent boot-up time, but state-retention registers consume power and also have an impact on silicon real-estate utilization.

[Figure LP-01-e]

Substrate biasing
Substrate biasing is typically applied only to selected portions of the design. The idea here is that a functional block typically doesn’t need to run at top speed for the majority of the time, in which case substrate biasing can be applied, which causes that block to run at a slower speed but with significantly reduced leakage power. The benefits from substrate biasing can be large, but actually implementing it can be a pain in the rear end.

[Figure LP-01-f]

Summary
The problem with low-power design is that there’s so much of it. We really only have scratched the surface of it here. For example, at the initial system-design, architectural evaluation level, one critical task is to partition the system into its hardware and software components. Hardware implementations are fast and consume relatively little power, but they are “frozen in silicon” and cannot be easily modified to address changes in the standards or the protocols. By comparison, software implementations are slow and consume a relatively large amount of power, but they are extremely versatile and can be modified long after the chip has gone into production.

Another interesting area to consider is the interconnect mechanism used to link the various functional blocks forming the chip. For example, conventional synchronous bus architectures constantly burn power, even if they aren’t actually moving any data around. One solution is to move to a globally asynchronous locally synchronous (GALS) architecture. In this case, data flows as fast as possible through the self-timed (asynchronous) interconnect network because there is no waiting for clock edges, and the power consumed by the buses is dictated by their traffic loads. Furthermore, the clocks associated with the synchronous blocks can be stopped (or gated) when those blocks are not being used.

And we’ve really only pondered various aspects of low-power design. Another huge area of interest is low-power-aware verification. Take the case of power shut-off for example. This rarely involves powering-down just a single block, and when multiple blocks are being powered-down (and back up again) there has to be a defined sequence. And what happens if the device is halfway through a power-down sequence when that pesky user presses a button requesting this functionality? In this case the chip has to gracefully abort the power-down and return any already-disabled functions to their active state. All of this has to be verified. But that’s a topic for another day.
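The interrupted power-down scenario is easier to reason about with a toy model. The sketch below is purely behavioral and hypothetical: the `PowerSequencer` class, the block names, and the choice to restore blocks in reverse order are all assumptions made for illustration, not a description of how any real power controller works.

```python
class PowerSequencer:
    """Powers blocks down in a fixed order and can abort partway through."""

    def __init__(self, power_down_order):
        self.order = list(power_down_order)
        self.powered = {name: True for name in self.order}
        self._next = 0  # index of the next block to switch off

    def step_down(self) -> bool:
        """Switch off the next block; return False once everything is off."""
        if self._next >= len(self.order):
            return False
        self.powered[self.order[self._next]] = False
        self._next += 1
        return True

    def abort(self):
        """Undo a partial power-down, restoring blocks in reverse order."""
        restored = []
        while self._next > 0:
            self._next -= 1
            name = self.order[self._next]
            self.powered[name] = True
            restored.append(name)
        return restored


# Hypothetical audio subsystem with three blocks.
seq = PowerSequencer(["audio_dsp", "audio_codec", "audio_pll"])
seq.step_down()
seq.step_down()            # the user presses "play" at this point
restored = seq.abort()
print(restored)                    # ['audio_codec', 'audio_dsp']
print(all(seq.powered.values()))   # True
```

Even in a model this small, the verification questions are visible: every point in the sequence needs an abort path, and the restore order matters. In real silicon each step also involves isolation, retention, and reset handling, which this sketch leaves out entirely.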

Experts At The Table: Greener Design

Wednesday, April 15th, 2009

By Ed Sperling

Low-Power Design sat down to discuss green technology and the future of low-power design with Rich Kapusta, Actel vice president of marketing and business development; Tom Quan, TSMC senior director of EDA and design service marketing, and Brani Buric, Virage Logic executive vice president of marketing and sales. What follows are excerpts of that conversation.

LPD: Is communication more open these days between the various parties involved in the design-to-manufacturing flow?

Quan: It’s improved greatly in the past couple of years. Traditionally, for people doing digital designs, you needed the SPICE model. That has been available and there is no issue there, except that there is a lot more information than before when you go down to 45, 32 and 28 nanometers. And most of the digital designs will need to have access to the timing models for standard cells and memories. Those are pretty available. The challenge is when you go to a new process node: before a design can take advantage of it, the infrastructure has to be available. We have to re-work with the IP vendors so companies can start building RAM, even though the process may change. We still need to go to pre-production so customers can start using it. That’s the part that’s more challenging. The actual mechanism of transferring data is not an issue anymore.

Kapusta: We’re on a process technology that’s somewhat unique, so we’re forced to co-develop the process with the foundry, which in our case is UMC. We’re working on 65nm embedded flash with UMC to give us the best technology for our FPGA families. We have our own process development engineers working with UMC, we’re doing test chips together and we’re tweaking the process together. By the time we finally tape out we’ve seen a couple runs of silicon, we understand what we’re doing, and we’re working together to get it right from the very beginning as opposed to waiting for a foundry to create a process node and jump onto that. We’re co-developing the process node.

Buric: That’s been a trend for several process nodes. Foundries are developing application-specific processes. With TSMC, when you go up to 65nm and 90nm, there is optimization for mixed-signal and ultra-low power processes, and even CIS (CMOS image sensor) processes. The idea of a general-purpose process serves a smaller and smaller market segment. More and more you will see applications that drive huge volumes will also be able to drive modifications to the process.

LPD: Are the foundries seeing that, as well?

Quan: Yes, that definitely is the trend. When you go down to 40, 32 and 22 nanometers, there will be mostly SoC designs. You have fewer of those, but the volume is larger and those customers have very specific requirements for their products to be competitive. Those will be more specialized processes. Last year we introduced the open innovation platform, which allows collaboration to go much deeper and much earlier in the process. One of the main features is design co-optimization so that each side can take full advantage of what’s available on the other side. We can trim 20% to 30% of leakage power even before tapeout. That was not possible before when everything was separate.

LPD: Is the concern low power or performance—or both?

Kapusta: For us it’s all about low power. Our customer base is not performance-driven. We’ve already surpassed all of their performance needs at the current node. When we go to the next node it’s all about taking more and more power out of the system, not making it go 10 times faster.

Quan: For the computer guys, it’s still all about performance.

LPD: That’s the plug-in computers, though, right? Not the notebooks?

Quan: Yes, the servers. Even for laptops, it’s hard to say you want less performance. Intel now has a 2GHz version of the Atom that only takes 1 watt. Communication and consumer are all about low power.

Buric: Atom is a good example. Even where there is a need for performance, those designs are built with low power in mind. If you just said run it at the highest performance possible with no concern for power, it would be in a ceramic package and require liquid nitrogen to cool it down. With everything we have seen at 40nm, they design for performance, but they also design for low power because it is cost-effective. With packaging costs and a huge number of transistors, you cannot afford to make those designs if they are not low power.

LPD: In some designs, the clock speeds on individual cores aren’t getting faster. Are we getting to the point where adoption of new nodes will slow down?

Kapusta: We’re not even talking about 32nm. We’re strategically behind the leading edge because we don’t need that performance and we can get the power we need one or two nodes back. We’re at 65nm now.

Quan: For mixed-signal RF and analog, most designs don’t go that fast. With the 65nm general-purpose process, you can push the 60GHz transceiver. That’s probably the highest frequency we push in any market. But for computing and graphics, the trend is still there and going down. We had a lot of activity at 40nm and 28nm. Traditionally there were bell curves with adoption and maturation. It probably will get flatter, though, as time goes on. The highest revenue producer for us is the 90nm and 65nm nodes. More than 60% of our revenue is there. That’s the sweet spot, and it’s where most of the activity will occur for some time.

Kapusta: Even at 130nm when we came out with our ProASIC 3, it was low power but it wasn’t that low power. It was still higher performance. When we came out with the Igloo line on the same node, we pushed the equation further into the power side. That family is more popular than the ProASIC 3. It’s basically the same architecture but lower power vs. higher performance.

LPD: Will Igloo ever fit into embedded applications?

Kapusta: Right now you can embed an ARM soft core into the chip.

LPD: How about the other way around—embedding Igloo into other chips?

Kapusta: We have had a few conversations about that. Some customers are trying to figure out how to embed it into other processors, but so far, no.

LPD: On a different subject, is the trend toward stacked die?

Quan: Stacked die is a different term for 3D chips. It’s already there for the memory companies. Most of the connections are still through bonding, but one technology that is still in the works is through-silicon vias. You actually drill holes through the wafer and fill them with copper. There are still a lot of issues to solve, but the technology is there and the prototypes are done. Certainly timing has to be worked out, but the real issue is how to distribute the thermal crests of these die and how these conducting columns are supposed to behave. The good news is that silicon is not a bad thermal dissipator. The challenge is when you stack things against each other: a processor next to RAM next to analog. You need to analyze what gets affected most.

LPD: Is it lower power when you stack a die, though?

Quan: If there’s any change, it’s the power dissipation in the interconnect. If you have four cores and it’s flat, the signal needs to travel across these connections on a narrower line versus a fat copper interconnect between die.

Buric: A big problem to solve is how to test for a good die before you start stacking things. Your yield changes when you add these interconnects. The problems multiply.

Kapusta: I think you can get lower power by stacking. If you look at two functions you can choose the lowest power process implementation for each of these two functions separately and stack them together versus making a compromise of integrating them on a less optimal process. If you’re looking at 65nm flash and want to stack it with some memory, you can choose a 40nm SRAM process and get the best of both worlds. If you want to implement it on a 65nm flash, you make compromises. You can make lower-power SIPs (systems in package) by taking the best of each element you’re trying to stack.

LPD: Is there a way of manufacturing these so the cores are not the same?

Kapusta: When you think of multicore, you’re thinking high performance. We have customers implementing multiple soft cores on a fabric. As we start embedding hard cores into our FPGA, people will have the opportunity to use hard cores and soft cores and build a system based on the processing chunks they need rather than being forced into some choice.

Quan: There are two trends. One is a more general-purpose platform where the cores will be different, but each one has the same purpose such as processing or graphics. The other thing we see is where each core is custom. They’re all small, very low power, but there may be 500 or 1,000 of them. Those are for very specific purposes like simulation or processing of security applications. Instead of using a general-purpose application where you waste a lot of power, you make the cores very specific and very low power. Each of the cores is maybe one-hundredth of a standard core in terms of size and power.