Posts Tagged ‘MIPS’


The True Test Of IP Reuse

Thursday, March 17th, 2011

By Ann Steffora Mutschler
Fewer and fewer systems and semiconductor companies are designing brand new processors from scratch. Instead, they leverage as much IP as possible in their designs, investing selectively in areas where they can add significant value. The challenges range from low-power issues to process technology migrations.

Generally, IP consumers are doing two levels of IP-based reuse. First, they re-use fixed instruction set architecture (ISA) processors because by configuring and extending the instruction set they can make the base into something of their own. “They can add differentiation to it that you don’t get if you just license totally fixed processors from a totally fixed kind of processor company,” said Grant Martin, chief scientist at Tensilica.

Another level of IP-based reuse includes configuring and extending a processor because substantial reductions in energy consumption and peak power dissipation can be achieved if it can be tightly tuned to the application. “We can actually start by a processor type in a domain. I think audio processing is a really good illustration of how the application-specific nature to a processor allows you to get substantial reductions in energy consumption for very standard things like various kinds of audio codecs because the instructions you execute are so tuned to the application itself,” he said. “In many, many handheld and portable devices, people use programmable audio processors or ASIPs (application-specific instruction-set processors) that are highly tuned in that way. And it’s very few people who, as a result, would be tempted into designing their own unique hardware blocks for that function.”

MIPS has seen much of the same activity among its customers. Mark Throndson, director of product marketing, said customers often leverage standard IP such as a piece of USB IP, as it is “probably much better to leverage somebody’s standard implementation and the various compatibility testing and certifications that it’s already gone through from the supplier than going and rolling their own, over and over again.”

MIPS believes that in addition to the inherent value in terms of core functionality, the ecosystem is just as important. “It’s standards, it’s the breadth of software and an ecosystem around a standard microprocessor architecture in the industry—and the value on that is huge,” he said. “It makes a lot more sense in most cases for companies to leverage a standard architecture like MIPS to gain access to tools, software and surrounding components on the SoC than to ‘roll your own’ in that regard.”

Low-power considerations for reusing IP
When it comes to low-power considerations of IP re-use, there are two areas not governed by standards: power consumption and area.

“You can still meet the standard but consume lots of power or consume lots of die area, so what we’re doing is making sure that for that particular standard the area and power numbers are still attractive to the point where customers consider integrating that piece of IP on their chip,” said Navraj Nandra, director of analog/mixed signal marketing at Synopsys. “Maybe three years ago in the data center/cloud computing-type market, it was all about speed. Customers would come to us and say, ‘We want the fastest thing possible.’ Now they are saying, ‘Well, actually we want the same speed but we want the lowest power solution that we can have.’ We’ve had to change some of our design approaches while still meeting the standards. The re-use model still applies, but we’ve had to figure out new techniques for reducing the power consumption for these interfaces.”

Synopsys has been using techniques such as voltage mode output drivers for high-speed SerDes because they are lower in power consumption, although they are more sensitive to noise.

Process considerations
Ideally, IP providers try to make sure that the hardware they generate will synthesize into a wide variety of process technologies and a wide variety of cell libraries, so the customer has a lot of choices to make in terms of process, cost, power and area.

Tensilica’s Martin said the company considers how something will work on the next-generation process using the next-generation library. “There is always a demand for additional characterization data so people can understand the tradeoff choice. We believe that most of the tuning of the foundry specifics to the higher-level RTL that we generate is actually done by the cell libraries and specialized IP generators, for example for memory, analog/mixed-signal and interface blocks. Those are the areas where foundry and process-specifics really play. We want to remain with good digital RTL that maps across a whole range and gives good results in many different technologies.”

However, some IP is more sensitive to the process technology that it needs to be implemented in than others, noted MIPS’ Throndson. “The more you get to actually having to do a full physical implementation of the IP to make it real in a particular process, the more that case is true. In the case of a USB, it’s the actual USB PHY that is more process-specific. To actually use this and verify that it operates correctly in that particular process node and do testing of it, people end up doing test chips. It becomes very process-specific, so you have to be very aware and have a plan effectively for which nodes you are offering your IP in.”

On the other side, he said, MIPS’ core IP is fully synthesizable as soft cores and can be used in a variety of process nodes and flavors from a number of different foundries.

Synopsys also supports many different foundries, including all of the big ones and some of the smaller ones such as SMIC. Nandra pointed out that what customers are asking for now from a re-use perspective is footprint compatibility between the IPs. If they purchase something from one foundry, and they decide that for second source or whatever reason they want to go to another foundry, they want that same hard macro from Synopsys. They want the pin placement and the size of the macro to be, if not exactly the same, as close as possible so it can be a drop-in replacement between Foundry A and Foundry B.

Is that realistic? “In one design, we went with that target. We had to change the internal layout of the designs so much because the two foundry design rules were different even though they were both 40nm, but actually the design rules are very different. The goal from the customer’s perspective was met because the pin placement and the size of the outline were exactly the same. But in order for us to do that we had to do some very, very clever layout [tricks.] That is the true IP re-use test,” Nandra said.

Power Panel: IP And Other Key Issues For Future Development

Thursday, March 17th, 2011

By Ed Sperling
Low-Power Engineering chaired a DesignCon panel of low-power experts with Bhanu Kapoor, president of Mimasic; Kesava Talupuru, DV engineer at MIPS; Prapanna Tiwari, CAE manager at Synopsys, and Rob Aitken, an ARM Fellow. What follows are excerpts of their presentations and the panel discussion that followed.

Prapanna Tiwari: UPF and CPF are text files that capture the power intent of the design.

Power management is one of the main problems we’re trying to solve in every design. The goal is to operate every given part of the chip at the lowest voltage you can get away with. If you can shut it off, you do that. If you can’t shut it off—and you can’t shut off memories—then you reduce the voltage to the lowest possible level so you don’t lose as much through leakage. From a verification standpoint, what you used to write in Verilog would appear in silicon and that was all there was to it. That’s no longer true. Now there is this idea of power intent that has to be captured. It has structure to it. It has semantics, and it has simulation sequences. It impacts every part of a design. (See Fig. 1)

Fig. 1

The product behavior has two components. One is the design. The other is the power. Verification needs to take care of this.

The power intent itself has two aspects. One is static. What are the regions? How are the regions partitioned? How do they map onto my design hierarchy? That’s where UPF comes into play. It says these are the domains. These are the different level shifters you’re going to insert in your design. That’s the structure.

But there’s a second aspect, which is dynamic. How are you going to exercise these different voltage regions on a chip? What is allowed, what isn’t allowed? If ARM or MIPS delivers cores to their customers, they need to let them know here’s how you should use it. There is no way in our current methodology, when you deliver a Verilog model, to say what voltage levels it’s supposed to be instantiated at. There’s nothing in Verilog that lets you do that. Different customers will use ARM and MIPS cores using different power management techniques, different voltage levels, different process nodes. How do you let them know you’re not supposed to do certain things?

If an IP can provide constraints saying you can’t use the IP in a different way, that’s where power intent comes in. You can do that from a functional standpoint today. You cannot do that in a power-aware model. There’s no way to figure out where IP gets used. Context is missing. (See Fig. 2)

Fig. 2

Even within the same semiconductor company you will see different modules have different design owners. You don’t want anyone to be using IP in the wrong way even years from now. There is IP in designs where no one has any clue where it came from. That’s one of the key challenges for an IP provider: to generate behavioral and verification IP, not just with the VHDL view but with the power-aware view to go with it. If you can deliver this, it will eliminate an enormous amount of risk and reduce the cost.

For any verification, there are three pieces. There is the testbench, the design itself and then assertions and coverage. (see Fig. 3)

Fig. 3

In the overall verification, the testbench needs to be power-aware. IP users need to be able to monitor any region of the design. The way IP is growing, it may have many power domains inside. It may even have its own power controller that reacts to events from outside the IP. A customer testbench needs to know power events and sequences in different parts of the IP. Otherwise you have no idea if the IP really shut down or not.

You also need to be able to write models for the IP. One user may be at 1.2 volts. Another might be at 1.0 volts. Different signals will react differently. All this behavior needs to access the power information.
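
One way to picture a model that has access to power information is a behavioral wrapper that records the supply range it was characterized for and checks the voltage it is instantiated at. The sketch below is a hypothetical Python illustration, not any vendor's API. The class name, the core name and the voltage limits are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPowerModel:
    """Hypothetical behavioral view of an IP block with a supply constraint."""
    name: str
    vmin: float  # lowest supply the block was characterized at (assumed value)
    vmax: float  # highest supply the block was characterized at (assumed value)

    def check_supply(self, volts: float) -> str:
        """Classify a requested supply voltage against the characterized range."""
        if volts <= 0.0:
            return "off"
        if volts < self.vmin:
            return "uncharacterized-low"
        if volts > self.vmax:
            return "uncharacterized-high"
        return "ok"


# Two users at 1.2 V and 1.0 V, plus one outside the assumed range.
core = IpPowerModel(name="example_core", vmin=0.9, vmax=1.3)
for v in (1.2, 1.0, 0.6):
    print(f"{core.name} at {v:.1f} V: {core.check_supply(v)}")
```

The point of the sketch is only that the constraint travels with the model, so a testbench can query it instead of relying on documentation.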

All of that power information needs to be available, and it should be context-free. And last but not least, assertions need to be power-aware. When the system is shut down, how is the IP being isolated, and what level shifters are being inserted?

To solve this, you need to be able to merge UPF and HDL into one. In your RTL you should be able to query information and build models around it.

Rob Aitken: If it’s not clear what issues are involved, what would make it clear?

There’s an existence proof. In one chip we had some RTL and a power spec and it turned into a chip and the chip worked. There were multiple decades of ARM experience, the latest IP and EDA tools, access to IP designers, skill in all available EDA tools and some magic smoke. But what if you don’t have all that? How many of those things do you need?

In addition, there’s no one thing called IP and there are lots of different uses for the same IP. There’s one group that says, ‘Whatever it is, give it to me, I want it to work and be done.’ Then there’s another group that might say, ‘I don’t care what you think should be done with this IP. Give me the parts and I’ll do it myself.’ What we really want to make sure of is that the standards don’t interfere with the use models and that they cover all of the possible use cases.

Context also matters. We like to talk about something like an always-on buffer. If it’s in a system where there’s a battery connected to it and part of the processor is shut down, that has a different meaning than when it’s plugged into the wall and the system is turned off. That always-on buffer isn’t always on anymore. It’s just sometimes on.

And what happens if I run my IP at 0.6 volts? If no one designed it for that, will it work? Maybe.

There are all sorts of other clever things we can do. An SRAM will retain data at much lower voltage than you can read or write it. You can have SRAM-dependent behavior. If it operates at a very low voltage you can store data, but if you write it, it will fail. Trying to model that in a high-level language is an interesting challenge.
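
As a rough illustration of why this is awkward to model, the Python sketch below gives an SRAM two voltage thresholds: one for retaining data and a higher one for reading or writing it. The threshold values are invented for the example, and treating a failed write as leaving the contents untouched is a simplifying assumption. Real silicon may corrupt the cell instead.

```python
class SramModel:
    """Toy SRAM whose behavior depends on supply voltage (thresholds are assumed)."""

    def __init__(self, v_retain: float = 0.5, v_access: float = 0.8):
        self.v_retain = v_retain  # below this, stored data is lost
        self.v_access = v_access  # below this, reads and writes are unreliable
        self.volts = 1.0
        self.cells = {}

    def set_voltage(self, volts: float) -> None:
        self.volts = volts
        if volts < self.v_retain:
            self.cells.clear()  # contents are lost below the retention voltage

    def write(self, addr: int, value: int) -> bool:
        if self.volts < self.v_access:
            return False  # write fails; contents left untouched (assumption)
        self.cells[addr] = value
        return True

    def read(self, addr: int):
        if self.volts < self.v_access:
            return None  # unknown result, like X in a 4-state simulator
        return self.cells.get(addr)


ram = SramModel()
ram.write(0, 42)
ram.set_voltage(0.6)            # low enough to save power, high enough to retain
write_ok = ram.write(0, 7)      # attempted write at retention voltage
ram.set_voltage(1.0)            # back to normal operation
print(write_ok, ram.read(0))
```

Even this toy version needs voltage as an explicit input, which is exactly what a plain RTL description does not carry.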

From a soft IP standpoint, you can say here’s some RTL and here’s a power description. You only need four things:

  1. What are the atomic power domains? Are there more than one?
  2. If you shut it down, some key element of the state needs to be retained. If you haven’t thought about that initially, it’s pretty much every flip-flop.
  3. You need to know the signals that need to be isolated.
  4. And you need to know the legal power states and the transitions between them.

If you have those four things you have the power intent for soft IP. That’s not enough to actually build something. Then you take the low-power intent and refine it. (see Fig. 4)

Fig. 4
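
The four items in the list above can be captured in a small data structure. The following Python sketch is only an illustration of that idea, not UPF or CPF syntax. The domain, signal and state names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PowerIntent:
    """Hypothetical container for the four items of soft-IP power intent."""
    domains: set       # 1. atomic power domains
    retained: dict     # 2. domain -> state elements that must be retained
    isolated: dict     # 3. domain -> output signals that need isolation
    states: set        # 4a. legal power states
    transitions: set = field(default_factory=set)  # 4b. legal (from, to) pairs

    def is_legal_sequence(self, sequence) -> bool:
        """True if every state is legal and every step is a listed transition."""
        if any(s not in self.states for s in sequence):
            return False
        return all((a, b) in self.transitions
                   for a, b in zip(sequence, sequence[1:]))


intent = PowerIntent(
    domains={"PD_CORE", "PD_AON"},
    retained={"PD_CORE": ["pc", "status_reg"]},
    isolated={"PD_CORE": ["irq_out", "bus_req"]},
    states={"RUN", "RETENTION", "OFF"},
    transitions={("RUN", "RETENTION"), ("RETENTION", "RUN"),
                 ("RETENTION", "OFF"), ("OFF", "RUN")},
)
print(intent.is_legal_sequence(["RUN", "RETENTION", "OFF", "RUN"]))  # True
print(intent.is_legal_sequence(["RUN", "OFF"]))                      # False
```

Here a direct jump from RUN to OFF is rejected because it is not among the listed transitions, which is the kind of rule an IP provider would want to hand to its users.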

Based on the various failures we’ve had, here are some things not to do. First, avoid non-contiguous power domains. When in doubt, align it with the logic hierarchy.
Second, don’t use clock gating on both ends of the clock. That ties you to specific libraries. Third, avoid partial retention within a power domain. Don’t try to retain some things but not all. It leads to weird behavior. And make sure that your power domain’s clocks and resets can be controlled externally. One other thing I would add is avoid test power or scan chains crossing multiple domains because that leads to interesting test challenges.

Experts At The Table: Billion-Gate Design Challenges

Thursday, March 17th, 2011

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: What are the big issues we need to contend with in billion-gate designs?
Rajendiran: Billion-gate designs are no longer a fantasy. We can do that at 28nm with a 20 x 20 mm chip. But just to put this in perspective, when we first sent a man to the moon they had three computers. The power and the memory those three had together was less than we have in a phone today. So the question you have to ask is are you really putting that to good use? And from a business perspective, will it work when it comes out and who can help across the business value chain?
Baker: We’re approaching billion-gate designs in the GPU or microprocessor area. In the SoC area, we’re approaching about 100 million gates. In the next generation, we’ll see SoCs with quad cores. Beyond that, there will need to be some very significant changes in what kinds of applications we can apply those to and how we’re going to deal with the power aspects. These will most likely be in the mobile market and we’re going to have to deal with system-level issues like verification, battery life, and power. From an EDA perspective we’re on track for capacity and for some of the turnaround time, but power will need some of the focus.
Throndson: Process migration hasn’t continued to scale forward. We hit a performance wall years ago. Power hasn’t scaled, either, as we reached some of the smaller geometries. Area is the one piece that is scaling better, which enables these large numbers of gates. The keys here are systems integration and multicore processing horsepower.
Browne: When you look at design costs for billion-gate designs you have to look at the markets that are going to drive them. The mobile market has enough volume to handle the cost of these types of designs. It also has a lot of parallelism and concurrency because there is a lot of functionality, and there are a lot of different use scenarios. Traditional EDA is scaling so it can take advantage of this—traditional designs partitioned at a chip boundary in a way that fits well with the system architecture. That’s probably where 80% of us will see business opportunities. The other 20% is where you take a design and partition it across two chips. Their bigger challenge is on the tool and the architecture side and the ability of semiconductor and system companies to manage that level of complexity. When you scale to four or eight cores, there’s a huge amount of parallelism and on-chip memory. The issue we see is how you get that right, and today the solution is a lot of subsystem design. LTE radios are a good example. We’re going to replace GSM radios with LTE radios. They’re going to be 15mm of area and have a half-dozen DSP cores, but it’s going to be a standalone system that allows you to do verification, have a known good block, and which is characterized with the others. But you can’t do this as a billion gates at the top level.
Janac: What I have in my house isn’t a personal computer. My phone is a personal computer, and it will have everything I need in terms of data, family photos, passwords and payment systems. It’s more like a supercomputer and it’s going to be the driver for the billion-gate design. You’ll need storage and the computing power to make this a true PC. There are four criteria for this. The first is processing power. We’re going to have to go to many cores, so you’ll need cache coherency to utilize those cores from a programming perspective. Another key is integration. How do you bring these cities of silicon together, which is where the communication system for the SoC becomes critical? You also need partitioning. As you build more and more functions, those functions have different dynamics. The modem has to go through SoC evaluation, so it’s on an 18-to-24 month cycle, whereas the efficient digital SoC people are going to be on an annual cycle. You have to decide whether you’re going to put it on one die or multiple dies, whether you can stack the functions, and whether you can mix processes in the same dies. The partitioning and the support for the partitioning are going to have to be there. The last part involves the cost of the hardware and software. The hardware cost has been increasing slowly but the software has been increasing rapidly. So how can you use the hardware and the parameters in the hardware to lower the cost of embedded software, if not the operating system?

LPE: Will an increase in granularity in designs, in terms of various core sizes, wider I/O and multiple cores and processors, affect how we build these devices?
Janac: We’re going to have tremendous power, but we’re not going to be able to afford to keep it all on. When you’re doing graphics the GPU will be on and the rest of it needs to be shut off. For audio it will be the same. You need to be able to manage turning on and off of this functionality. And in terms of 3D silicon, some of the high-power parts of the chip such as RF and some of the modems probably need to be on a different die and connected through wide I/O and TSVs (through-silicon vias). These things will need very intelligent and capable power architectures. While you have more transistors you’re still dealing with the same power budgets.

LPE: Won’t it be even tighter budgets? In 3D stacks, the dies are actually thinner?
Browne: The thermals are better in those packages, though. Even though the dies are thinner there is a lot better coefficient with the bonding. But it’s still a problem.
Throndson: But the power source is not scaling with the demands.
Browne: We’re seeing designs today with a dozen to 100 power domains. Those are at 40nm. We have customers starting 14nm designs now. You’re going to have to move to abstractions. There are 1,000 voltage domains. Somebody will have to have a product that generates the HAL (hardware abstraction layer) of software. We generate RTL. Generating RTL and C code are not that different. That’s where you’re going to see a lot of growth in the supply chain.
Rajendiran: If you look at 130nm, we used to have one type of transistor. Now we have multiple types of transistors and different process flavors, which add a level of complexity. You now have a whole bunch of different libraries, depending on which type of transistor you use. That’s an opportunity and a challenge. How are you going to pick and choose your implementation? Then you throw in a billion transistors, and you’re talking about putting it into a single SoC. It’s going to cost a lot of money and you don’t even know if you’re taking the right path to optimize power, performance and the market. And most of it is driven by consumer markets where each person will use a device differently. What you put on the chip affects battery, performance and even leakage. There are great opportunities, but it’s also more complex. It comes down to who can you partner with for the software, for planning the product, and for implementing the chip in hardware. And it really needs to be tied together so you hit the product introduction times.

Billion-Gate Chips

Wednesday, March 16th, 2011

Low-Power Engineering examines hurdles ranging from power to cost in billion-gate IC designs with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma.


Power Panel: IP And Other Key Issues For Future Development

Thursday, February 10th, 2011

Low-Power Engineering chaired a DesignCon panel of low-power experts with Bhanu Kapoor, president of Mimasic; Kesava Talupuru, DV engineer at MIPS; Prapanna Tiwari, CAE manager at Synopsys, and Rob Aitken, an ARM Fellow. What follows are excerpts of their presentations and the panel discussion that followed.

Bhanu Kapoor: There are two components of power—dynamic and leakage. Dynamic is what gets used for some useful activity on a chip. Leakage is wasted power. To put this in perspective, at the 65nm technology node leakage power is about the same as dynamic power.

Dynamic power depends on the frequency, capacitance and supply voltage. Changing supply voltage makes a big difference.

Leakage has two components—sub-threshold and gate tunneling. The gate tunneling is addressed by high k/metal gate technology. The sub-threshold remains there and is growing exponentially. While it was not a factor at 130nm it has become a critical factor at 65nm and beyond. When you manage power, you have to manage dynamic power and leakage in active and standby mode.

You’ll need high voltage if you want to operate at high frequency. As such, you can reduce voltage if your application doesn’t need high performance. There’s a cubic effect on power consumption because of scaling voltage and frequency. In standby mode you want to completely switch off the supply. Power is a product of current and voltage. If you turn off the voltage you can eliminate most of the standby leakage.
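
The cubic effect follows from the usual first-order model of CMOS switching power, P = a * C * V^2 * f, where a is the activity factor. If frequency is scaled in proportion to voltage, power falls with the cube of the scaling factor. The Python sketch below works this through. The capacitance, activity and operating point are illustrative assumptions, not figures from the presentation.

```python
def dynamic_power(activity: float, cap_farads: float,
                  volts: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * cap_farads * volts ** 2 * freq_hz


# Illustrative operating point (assumed values).
base = dynamic_power(activity=0.2, cap_farads=1e-9, volts=1.2, freq_hz=1e9)

# Scale voltage and frequency together by the same factor.
scale = 0.5
scaled = dynamic_power(activity=0.2, cap_farads=1e-9,
                       volts=1.2 * scale, freq_hz=1e9 * scale)

print(f"baseline: {base:.3f} W")
print(f"scaled:   {scaled:.3f} W")
print(f"ratio:    {scaled / base:.3f}")
```

Halving both voltage and frequency gives a ratio of 0.5 cubed, or one-eighth of the original dynamic power, under this simplified model. Leakage is not included.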

There are various power management techniques to deal with leakage (see Fig. 1). Voltage is a key parameter to address power. It’s the use of voltage, combined with design description languages that do not allow voltage to be an input, that has made design so difficult.

Fig. 1

You can’t be far away from what’s happening with process technology if you’re targeting your IP for future generations of chips. The process variation is a problem. You could be doing everything right, but process variations may lead to a leaky part. Unless you have controls such as adaptive body biasing to address leakage in those variations it’s going to be a potentially fatal factor.

There are different EDA tool flows and because of that we’ve got different formats for describing power. On top of that, soft IP is unqualified.

IP will be running in different power states, and there are different voltage levels for different portions of the chip. This information needs to be provided to SoC teams. Isolation and level shifting have to be taken into account. State retention is another technique. Bring-up current may be an issue. The spike in current could lead to voltage issues. For all of these reasons, if you’re a small IP vendor doing low-power design, life is very, very difficult.

Kesava Talupuru: There are a number of techniques you can use to reduce power.
With power gating you can shut off any of the pieces that are not in use to save on leakage power. With tree-root clock gating you can save dynamic power. With multi-voltage designs, for any part that does not require maximum frequency you can minimize dynamic power. And for multi-threshold libraries you can minimize the leakage power.

So what are the challenges for low-power verification? One is that traditional functional simulators are not power-aware. They assume that voltage is constant at zero or one. They cannot emulate protection gate behavior. They cannot model power ports and switches, and they cannot find structural errors. On top of that, the power-on and the power-down sequence checks are not adequate. They do not understand voltage transitions. When you do a reset they initialize the signal immediately, and when you power down the flops still retain value.

The verification environment should be power-aware. You need voltage-level-aware simulation for dynamic voltage and low-Vdd standby techniques, and you should simulate real silicon behavior. You should be able to model power switches and protection gates and check illegal power state transitions. The environment also should support recovery sequences.
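
The difference between a power-unaware and a power-aware view of a flop can be shown with a toy model. In the Python sketch below, a flip-flop's output becomes unknown when its domain is powered down, and only a retention flop gets its value back on power-up. The class and its retention mechanism are hypothetical simplifications, not how any particular simulator implements it.

```python
X = "x"  # unknown value, as in 4-state simulation


class PowerAwareFlop:
    """Toy flip-flop whose state is corrupted when its domain loses power."""

    def __init__(self, retention: bool = False):
        self.retention = retention
        self.powered = True
        self.q = 0
        self._shadow = None  # retention latch contents (hypothetical)

    def clock(self, d):
        if self.powered:
            self.q = d  # an unpowered flop ignores its inputs

    def power_down(self):
        if self.retention:
            self._shadow = self.q  # save state before the supply collapses
        self.powered = False
        self.q = X  # a power-unaware simulator would keep the old value here

    def power_up(self):
        self.powered = True
        if self.retention:
            self.q = self._shadow  # restore the saved state
        # without retention, q stays X until a reset or a new clocked value


plain, retained = PowerAwareFlop(), PowerAwareFlop(retention=True)
for flop in (plain, retained):
    flop.clock(1)
    flop.power_down()
    flop.power_up()
print(plain.q, retained.q)
```

After the power cycle the plain flop holds an unknown value while the retention flop holds its saved 1, which is the behavior a traditional simulator would miss.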

At MIPS we used three different techniques to deal with this. One is formal verification for the power manager unit, which gives you full control of logic, enables a small design size and provides formal proofs. A second is power-aware simulation for the entire system. This is useful for finding polarity isolation issues, retention and restore behavioral issues, and problems with power-up/power-down sequences. The third, static verification, was basically lint checking. The tools can find any missing isolation cells or level shifters.
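
The static check described here can be pictured as a simple rule pass over domain crossings. The Python sketch below is a simplified illustration under stated assumptions: every output of a switchable domain needs isolation, and every crossing between different voltages needs a level shifter. The domain and signal names are hypothetical, and real tools apply far more detailed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    volts: float
    switchable: bool


@dataclass(frozen=True)
class Crossing:
    signal: str
    src: Domain
    dst: Domain
    has_isolation: bool = False
    has_level_shifter: bool = False


def lint_crossings(crossings):
    """Return (signal, problem) pairs for crossings that lack protection."""
    problems = []
    for c in crossings:
        # A switchable source can float when off, so its outputs need isolation.
        if c.src.switchable and not c.has_isolation:
            problems.append((c.signal, "missing isolation"))
        # Different supply voltages need a level shifter between them.
        if c.src.volts != c.dst.volts and not c.has_level_shifter:
            problems.append((c.signal, "missing level shifter"))
    return problems


pd_core = Domain("PD_CORE", 1.0, switchable=True)
pd_aon = Domain("PD_AON", 1.2, switchable=False)
report = lint_crossings([
    Crossing("irq_out", pd_core, pd_aon, has_isolation=True, has_level_shifter=True),
    Crossing("bus_req", pd_core, pd_aon, has_level_shifter=True),
    Crossing("wake", pd_aon, pd_core),
])
for signal, problem in report:
    print(f"{signal}: {problem}")
```

In this example the pass flags `bus_req` for missing isolation and `wake` for a missing level shifter, while `irq_out` is clean.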

For the power manager we added hardware and software control. The software-related properties include read, write, hold and reset values. The hardware FSM properties included state transitions, illegal states, power up and power down sequences and hardware/software priorities.
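
Properties such as illegal states and power-down sequencing lend themselves to exhaustive checking when the controller is small. The Python sketch below is a stand-in for that idea, not MIPS's power manager or a real formal tool. It explores every reachable state of a hypothetical four-state FSM and looks for a state where the block is isolated or off while transactions are still pending.

```python
from collections import deque

MAX_PENDING = 2  # small bound keeps the state space exhaustively searchable


def next_states(state, wait_for_drain=True):
    """All successors of (mode, pending) under any input (hypothetical FSM)."""
    mode, pending = state
    succ = set()
    if mode == "ON":
        succ.add(("DRAIN", pending))            # software requests power-down
        if pending < MAX_PENDING:
            succ.add(("ON", pending + 1))       # new transaction issued
        if pending > 0:
            succ.add(("ON", pending - 1))       # transaction completes
    elif mode == "DRAIN":
        if pending > 0:
            succ.add(("DRAIN", pending - 1))    # outstanding transaction completes
        if pending == 0 or not wait_for_drain:
            succ.add(("ISOLATE", pending))      # clamp outputs
    elif mode == "ISOLATE":
        succ.add(("OFF", pending))              # open the power switch
    elif mode == "OFF":
        succ.add(("ON", 0))                     # wake-up and reset
    return succ


def find_violation(wait_for_drain=True):
    """Breadth-first search for a state that powers off with work pending."""
    start = ("ON", 0)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        mode, pending = state
        if mode in ("ISOLATE", "OFF") and pending > 0:
            return state
        for nxt in next_states(state, wait_for_drain) - seen:
            seen.add(nxt)
            queue.append(nxt)
    return None


print("correct FSM:", find_violation(wait_for_drain=True))
print("buggy FSM:  ", find_violation(wait_for_drain=False))
```

With the drain condition in place the search finds no violation. Removing it exposes a reachable state that isolates the block with a transaction outstanding.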

We found a number of bugs using our flow (see fig. 2). Through formal, we found bugs in state transitions, illegal states and power up/power down sequence errors. We found errors in the power down where it needed to wait until all the transactions were completed. There also was a problem with the coherent to non-coherent switching. Using power-aware simulation we found missing isolation, some wrong isolation polarity and architectural bugs.

Fig. 2


Power Bits: Feb. 4

Friday, February 4th, 2011

By Ed Sperling
AMD jumped into the low-power market with a new version of its Fusion chip for the tablet market, which the company claims can reduce energy consumption by 40%. That puts AMD squarely in competition with Intel’s Atom, ARM’s Cortex A9 and A15, a swath of MIPS chips aimed at Android, Apple’s A4 and probably some others that sources say will be produced for localized markets. AMD also rolled out an updated parallel processing development kit, which is absolutely essential for performance.

The U.S. Air Force is developing a new ultra low-power RF transceiver to preserve battery life in military sensors, including radar and infrared cameras. The less power drawn and the lighter these devices can be made, the smaller they can be designed and the longer they can be in the air.

Toumaz, a U.K.-based developer of low-power telemetry technologies, introduced an ultra low energy radio for wireless sensor networks. The company says the device can run at 1 volt using a single button-sized cell battery and consume less than 3mV of continuous power. That should make for some interesting application possibilities.

Power Bits: Jan. 21

Friday, January 21st, 2011

IBM and ARM are working together on a design platform that will scale down to 14nm. The focus ostensibly is low power and high performance, but the real target of this effort may prove to be much more far-reaching. With more computing being done by smart phones and crossover devices such as tablets, which are making inroads into what previously had been the exclusive domain of PCs or netbooks, this may be a huge opportunity for growth well beyond the smart phone’s footprint.

There is no shortage of companies vying for a piece of this action. IBM, which has largely been frozen out of the PC world after getting trounced first by Microsoft in software and then by Intel in PC processors, is looking at what could be a re-entry point into an even larger market. The combination of Intel and Microsoft works spectacularly well in devices that can be carried in a briefcase, but when it comes to extremely light portable devices they have not been effective competitors.

The bigger threat to Intel and Microsoft comes from the Google Android operating system and Apple’s iOS, and while Apple is functioning more like an integrated device manufacturer Google has a slew of companies willing to use its operating environment on their hardware. The battleground here is low power, which is where ARM and MIPS have carved out a space. Even Synopsys has thrown its hat in the ring with its ARC processor, which it inherited with its purchase of Virage Logic, using its vast array of tools as a differentiator.

But with IBM’s deep pockets and technical expertise, it’s anyone’s guess where this relationship with ARM could go. IBM has been burned more than once in the consumer electronics arena, and it’s uncertain whether it will produce chips for mass consumption or simply license them to other companies. But no matter which route it takes, this may be the clearest sign yet that power will be the big differentiator in portable electronics for years to come—and maybe even in those with a plug.

Power Bits: Nov. 12

Friday, November 12th, 2010

By Ed Sperling
The processor wars have started again, but with a completely different focus this time. It’s no longer all about performance. The real differentiator is power.

ARM, which has always been about lower power, began showing off a way to dramatically reduce power in its chips with a cache coherency layer. This is a big step forward in how software interfaces with the hardware because it can slash the number of calls made to the processor to check things. Think of this as a sophisticated scheduler for one or more cores as well as a bridge between the CPU and GPU.

ARM’s approach will certainly work well in its core market of mobile devices and set-top boxes, but it also is beginning to gain traction in the data center, where lower power, particularly for Linux-based applications, is worth millions of dollars each year.

MIPS has a similar strategy with its own chips, not to mention its own coherent processing system for multiprocessing. Expect the two companies to compete in the same markets with roughly the same approaches, with wins trumpeted on both sides–particularly in the set-top, mobile and automotive worlds.

In the x86 world, AMD this week began showing off its next-generation low-power chips. The new Fusion Accelerated Processing Units combine the CPU and GPU onto the same die, which can significantly reduce the amount of power needed to drive signals off chip. With some slick caching technology, that also greatly reduces the overall power draw while also boosting performance.

Intel will respond early next year with its own lower power chips, which are code-named Oak Trail and expected to deliver up to eight hours of battery life.

Redefining Performance In Mobile Devices

Thursday, October 7th, 2010

By Ann Steffora Mutschler
While mobile product trends can be reliably unpredictable, devices are definitely moving toward supporting more software-based browsers, browser plug-ins, and downloaded codecs for those browsers. As a result, performance targets come down to a best guess. Throw power tradeoffs into the mix and things really start to get interesting.

In terms of defining performance today, one of the first considerations is the usage of the device. Josefina Hobbs, technical solutions architect in the low-power solutions group at Synopsys, said a lot of the challenge is understanding what the users are going to do with these things. “Even tougher, developers have to make some guesses, so of course a better representation of a real usage model is going to help get them there. What it really boils down to is how is this thing going to be used and how well can you guesstimate how it’s going to be used. The minute you are off on your guesstimate you’re going to be negatively impacting your battery life.”

In terms of CPU considerations, Bill Orner, director of platform engineering at MIPS Technologies, explained, “You have to start at the top and work your way down. What features are required for the device? From the features you’ll determine things like the operating system that is necessary to fulfill those features. How much functionality trade-off do you want to do between things that are soft implementations versus dedicated hardware? Take, for example, an iPod where people want to watch video on it. Do you put in dedicated video decode hardware or are you going to expect that the CPU has to do the decode of all the compressed video? That has a massively significant impact on the CPU requirements.”

Another consideration is the fact that to achieve the same level of performance, implementing in hardware is almost certainly an order of magnitude more efficient in terms of power and potentially system cost because software doesn’t have to be developed for that particular function, said MIPS engineering director Darren Jones. “The tradeoff is that when you put it in hardware it’s built exactly once and you can’t upgrade it in the field. But it’s almost certainly much more efficient to put it in hardware.”

Latency is tied very closely with performance in mobile devices. “Can I keep up with this video stream? That is one key thing. The other key thing is not just power, which is how quickly you use your energy but energy itself: energy efficiency. If it’s a battery-powered device it’s not really how much power you use, it’s how much energy you use to finish your computation function. This is why MIPS decided to go the route of multithreading first and then multiprocessing. Multithreading gives a much more efficient use of the existing hardware whereas with a multiprocessor approach, you are replicating efficient hardware,” Jones said.
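Jones’ distinction between power and energy can be made concrete with a little arithmetic. The sketch below is not from MIPS. The function name and every wattage and timing figure are hypothetical, chosen only to show that a configuration drawing more power can still drain less battery if it finishes the job sooner.

```python
def energy_joules(avg_power_watts: float, seconds: float) -> float:
    """Energy used by a task: average power multiplied by run time."""
    return avg_power_watts * seconds


# Hypothetical figures for the same task on two configurations.
fast_energy = energy_joules(avg_power_watts=0.5, seconds=2.0)
slow_energy = energy_joules(avg_power_watts=0.3, seconds=4.0)

# The higher-power configuration finishes sooner and uses less energy overall.
print(f"fast: {fast_energy:.2f} J, slow: {slow_energy:.2f} J")
```

With these made-up numbers the lower-power configuration still costs more energy, which is the quantity a battery actually runs out of.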

For example, a processor runs really well as long as it’s getting instructions and data from its caches. But when it gets a cache miss, especially on a mobile device where cost is certainly a factor, the memory subsystem tends to be pretty slow. It could take 100 or 200 CPU cycles to get the data from memory. The whole time the CPU is sitting doing nothing it’s probably burning power so it’s not really doing nothing, but it’s not doing anything good. Multithreading allows the chip, as soon as it gets that cache miss to switch to a different software thread whose data is in the cache. That means that while it’s waiting for those hundred cycles it’s actually getting some real work done.
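A toy simulation makes the effect easier to see. It is only a sketch under loud assumptions, not a model of any MIPS core: every thread runs a fixed number of cycles and then takes a cache miss, the miss latency is fixed, switching threads costs nothing, and the function and parameter names are invented for illustration.

```python
def simulate(num_threads: int, compute_cycles: int,
             miss_latency: int, total_cycles: int) -> float:
    """Return the fraction of cycles in which the core did useful work."""
    ready_at = [0] * num_threads                 # first cycle each thread can run again
    remaining = [compute_cycles] * num_threads   # work left before the next miss
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        # Round-robin search for a thread whose data is available.
        chosen = None
        for offset in range(num_threads):
            t = (current + offset) % num_threads
            if ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is None:
            continue  # every thread is waiting on memory: a wasted cycle
        current = chosen
        busy += 1
        remaining[chosen] -= 1
        if remaining[chosen] == 0:
            # Cache miss: park this thread and move on to the next one.
            ready_at[chosen] = cycle + 1 + miss_latency
            remaining[chosen] = compute_cycles
            current = (chosen + 1) % num_threads
    return busy / total_cycles


# Assumed workload: 50 cycles of work between misses, 100-cycle miss penalty.
for threads in (1, 2, 3):
    print(threads, round(simulate(threads, 50, 100, 1500), 3))
```

Under these assumptions a single thread keeps the core busy about a third of the time, two threads about two thirds, and three threads hide the miss latency completely. Real hardware has switch overhead and irregular miss patterns, so actual gains would be smaller.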

Challenges in defining performance
The tradeoff between power and performance is the biggest challenge, according to Eyal Bergman, director of product marketing for CEVA. “We see that the same vendors, especially in wireless devices, give pretty much the same power budget that they gave a few years back and this is simply because battery technology has not progressed at the same pace that wireless technology has progressed and as applications have been developed. We need to put more functionality into a device that was originally designed to be powered by a battery. And we are using pretty much the same battery—maybe 10% or 20% better—but pretty much the same technology. And now we see that we need to do much more. It could be 10 times more when we talk about wireless communications.”

Because power is directly related to voltage levels, moving to smaller manufacturing geometries helps here. He pointed out that the same chip manufactured with today’s 40nm technology can be four times better in terms of power than one built with 10-year-old technology.
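The link between voltage and power is usually described with the first-order model for CMOS switching power: activity factor times capacitance times voltage squared times frequency. That model is a textbook approximation, not something stated in the article, and it ignores leakage. The function name and all values below are hypothetical, and the example does not attempt to reproduce Bergman’s figure.

```python
def dynamic_power_watts(activity: float, capacitance_farads: float,
                        voltage: float, frequency_hz: float) -> float:
    """First-order CMOS switching power: activity * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_farads * voltage ** 2 * frequency_hz


# Hypothetical values: same design and clock, supply lowered from 1.2 V to 1.0 V.
older = dynamic_power_watts(0.2, 1e-9, 1.2, 200e6)
newer = dynamic_power_watts(0.2, 1e-9, 1.0, 200e6)

# Power falls with the square of the voltage ratio.
print(f"older/newer power ratio: {older / newer:.2f}")
```

Because voltage enters as a square, even a modest drop in supply voltage produces a disproportionate saving in switching power.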

Improving processor and overall system architecture is also a daunting challenge. “In the past, people used to run everything in CPUs or in other blocks. Now we see more optimized processors for communications, for multimedia and video, for graphics and when you move from a general processing unit to an optimized processor you can get a lot of power reduction because basically the processor is much more efficient for the target application. It can do more with less,” Bergman explained.

Companies that take this approach, MIPS, CEVA and ARM among others, can offer flexibility in terms of having the ability to do a lot of things with software, although it is for a specific type of application. You cannot do video decoding on the wireless processor, but you can do multi-standard communications processing very efficiently.

As such, system architects have a bigger challenge than ever. “When we talk with the system integrators, they want interfaces to the processors. What you have on the system is the power management unit that is becoming more complex than you want. To have interfaces to the system level gives you flexibility to shut down processors. For instance, you want to be able to lower the speed and the voltage of the processor per use case in order to decide which parts of the system need to be activated and which need to be deactivated. And all of these interfaces need to be defined very closely with the architecture in the early stages because once you integrate at the top and don’t have the interfaces you will limit the flexibility later on,” he said.

Paradigm shift in mobile
Given the dynamic nature of the mobile device market, Jones observed a paradigm shift occurring. “A few years ago when you got a phone it had a phone on it and maybe voicemail. Now the phone part is one-third or less of the functions and maybe holds less value because consumers desire to get iPads and smart phones. It used to be that the system designer would put as much functionality in [the device] as possible and it would take up all the power of the available CPU. Now we can give them more powerful CPUs, but the problem is if they used it their battery would last 15 minutes and that’s totally unacceptable–nobody is going to buy the iPhone if the battery only lasts 15 minutes.”

He noted that system designers such as Samsung and Apple now have to think of what feature set can be delivered with a certain battery size and energy budget. “So they are challenging us to give them the energy efficiency with good performance (meaning megahertz and delivered numbers of instruction), but they don’t want the bleeding edge because then they’ve got the 15-minute battery problem. They want something that’s good performance, but energy efficiency is actually the most important thing—more so than performance.”

Special Report: Using FPGAs For 3D Stacking

Thursday, June 10th, 2010

By Ed Sperling
Xilinx is developing a 3D architecture for its FPGAs and Actel has been approached by SoC makers to use its flash-based FPGA as a layer in a 3D IC stack. Both approaches could radically alter the fundamental equation about the tradeoffs between FPGAs and ASICs—particularly the power and performance overhead normally associated with programmable logic.

Xilinx declined to comment, but a half-dozen independent industry sources familiar with its efforts have confirmed the 3D development is well under way. Rich Kapusta, Actel’s vice president of marketing, applications and business development, confirmed his company has been approached by SoC makers to use the company’s non-volatile flash-based FPGA as a layer in their 3D SoCs. He declined to comment further.

Getting this kind of 3D work done is anything but guaranteed. It’s complicated and there are lots of pitfalls, such as accessing RAM or logic across multiple die. Nevertheless, the implications of these developments are enormous. Because of the very regular and controlled structure of an FPGA, it is extremely well suited to defining where components can be placed on a chip. That makes it much easier to predict hot spots caused by putting two or more chips together, a problem that becomes particularly thorny when chip layers are developed by multiple vendors without knowledge of the thermal characteristics and layout of the other components.

3D stacking makes it far easier to bump up performance at advanced nodes using shorter wires while reducing power because it takes less power to achieve that performance over shorter distances. But getting this accomplished with SoCs has been particularly difficult. As a result, sources say the need for FPGA prototypes may change FPGAs into the end game rather than an in-between step.

Moreover, both moves also are expected to open huge markets, finally, for advanced EDA tools to work on complex FPGA designs, as well as third-party IP, processor cores from companies like ARM, MIPS and Virage Logic, and interconnect fabrics such as network on chip. They also can open up 3D to mainstream development. While companies such as IBM, Freescale, Qualcomm and Texas Instruments have been working on 3D chips for years—IBM started its R&D in this area almost a decade ago—most of that work has been a closely held secret because it is considered a competitive advantage for performance and power. FPGAs can quickly turn that into a less expensive option that may have more overhead than bottom-to-top 3D ASIC designs, but far less than 2D ASICs.

Issues in 3D
FPGAs can solve one of the biggest problems in 3D stacking, namely standards for placement of components. Without those standardized approaches there will likely be some ugly finger-pointing when two chips are put together.

“One of the problems that we see coming is who’s going to pay for a bad part,” said Andrew Yang, chairman and CEO of Apache Design Systems. “Testing may show that memory and logic are all good and that the die works, but when you put it together with another chip it may turn into a bad part. So you can say it’s good, and all your testing and verification may show that it is, but when it doesn’t work who pays?”

Yang said there is a need for far more analysis of the stacked die, measuring everything from heat and power to electrostatic discharge and signal integrity.

“We also need to understand what are the killer applications and what applications are not good for 3D,” he said. “The compelling value of 3D is shorter distance, which is the TSV promise. The challenge is in coupling chips together. In 2D you could shield high-speed signal transmissions. You get a cross-coupling effect with a TSV, so there is promise but there are also challenges.”

One of the big draws for 3D in general is the ability to re-use IP, which may come in the form of entire chips. That doesn’t work too well, however, when those chips were created for the best utilization of real estate on a 2D structure, where heat dissipation is relatively simple. In 3D, putting chips together can sandwich heat between die with no way to get it out of the chip.

“When you stack die you concentrate the heat,” said Carey Robertson, product marketing director for Calibre Design Solutions at Mentor Graphics. “That affects chip reliability, either short-term or long-term because they’re operating at temperatures they’re not expected to operate at. Circuits perform differently at 100C or 125C or 130C. At 130C it may affect the core, the timing, the signal integrity.”

While the overall heat of a chip hasn’t changed much, the more tightly everything is packed together the more difficult it is to cool. “When you stack them, you concentrate that heat even more,” Robertson said. “Potentially, when you move the wires closer together you can reduce resistance and IR drop. There would be a decrease in power and heat, but we have not seen enough of that yet to draw that conclusion.”

Under the covers, there are two technical ways to make this all possible, according to an ARM insider. “The first is for TSVs at similar pitch to solder bumps (about 50µm). This expands the capability of FPGAs and creates what amounts to multi-FPGA chips, as well as allowing for better-integrated flash, DRAM, and high-performance logic. The limited inter-chip bandwidth and power delivery, along with thermal issues, keep this as more of a cost dynamic – an extension to existing SiP approaches,” said the source. “The second answer is for high-density future TSVs, at a pitch of less than 5µm. These increase inter-chip bandwidth by a factor of 100 over the first solution and allow for some game-changing capability, including wide word high-speed off-chip memory access, combined FPGA/logic solutions, multi-die FPGA (greatly increased gate count) and so on. The reconfigurable aspect of FPGAs may also help solve the test and fault tolerance issues that are a very significant impediment to making tight pitch TSVs viable. Neither of these eliminates the crossover argument on power and performance, but they both have the potential to move it.”
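The factor of 100 quoted by the source is consistent with simple geometry, assuming (this is not stated in the article) that TSVs sit on a uniform square grid and that inter-chip bandwidth scales with the number of connections per unit area. Shrinking the pitch tenfold in each direction then fits 100 times as many connections into the same footprint. The function name is hypothetical and the two pitches are treated purely as a ratio.

```python
def density_gain(coarse_pitch: float, fine_pitch: float) -> float:
    """Ratio of connections per unit area on a square grid when the pitch shrinks."""
    return (coarse_pitch / fine_pitch) ** 2


# Pitch figures from the quote, in the same units: about 50 versus 5.
gain = density_gain(coarse_pitch=50, fine_pitch=5)
print(f"connections per unit area increase by about {gain:.0f}x")
```

Since the source describes the finer pitch as less than 5, this reading makes 100 a lower bound on the density gain, not an exact figure.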

Programming the future

Whether this effort ultimately succeeds is anyone’s guess. What is known is that a lot of resources are being marshaled into 3D stacking and a lot of hopes are being pinned on the back of efforts such as those from Xilinx and Actel’s partners.

Tom Quan, deputy director of design methodology at TSMC, said the great advantage of FPGAs is that they are very regular. “You can predict the thermal profile much better than with a mixed-signal SoC. Analog can be all over the map. But while the base array may be regular, in another corner of the chip you might have a USB so the outside of the chip might be hotter than the inside.”

Still, there was a lot of hype behind multi-chip modules in the 1990s and so far they have failed to materialize as a popular solution, largely because of cost. That could change as double patterning becomes the norm at 22/20nm and standard production costs rise, but visibility remains limited at that node.

At the very least, the moves by FPGA players are worth tracking, and a lot of companies are predicting major changes if these scenarios work. There are reasons FPGAs may hold more promise than multi-vendor or multi-generational SoCs. But there are still a lot of challenges to resolve before the total cost of development is known.
