Posts Tagged ‘MIPS’

Experts At The Table: Multi-Core And Many-Core

Thursday, August 11th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: How does cloud computing change the need for multicore and many-core processors?
Sherwani: Cloud architectures will evolve differently from mobile architectures. They will be homogeneous 8-, 16- and 32-core architectures. They know a lot about what you are storing. You can put a lot of intelligence into what you’re storing, which is not the case in a mobile device.

LPE: So what does that mean for the mobile devices taking advantage of it?
Sherwani: It can certainly make mobile devices more efficient. You can store a lot more on the mobile devices. You can do a lot of streaming.
Martin: The application cloud interaction may change in character. People will write somewhat different apps in the future that will take advantage of what the cloud has to offer. This is why you’ll see cobwebs on the desktop in the future because no one is very interested in it anymore.
Sherwani: And if you look at video, with the cloud and a good wireless connection you don’t have to store the video. Video cameras will become a lot less expensive.
McDermott: This should be put into context. It’s amazing that people are so excited about a database. That’s all it is. I believe the vision for the mobile device is that you have access to all the data, and you selectively choose how to expose it. The browsing experience is different. You don’t try to replicate the desktop experience on a smaller screen. It’s a given. You take the appropriate content and you display it in a way that’s easiest to digest. I think the hardware on the mobile device will become smart enough to selectively show you the piece that you need on your mobile device. You don’t need an entire map. You just need to know where you are.

LPE: What’s interesting about databases, though, is that they’re one of the very few applications that really can do true parallel processing and scale effectively.
Sherwani: I’ve been saying for the last two years that we should stop giving people content. In five years all the content will be available. If you’re a mechanical engineer, everything you need will be on the Web. What we need to do, though, is teach people how to do something useful. This is the same thing with mobile devices. Whatever device will be useful will be the one that can quickly filter through what you’re looking for to get something done. It’s not about storing more information. Cloud brings that opportunity to people, devices and things. Our view of expertise will change. It won’t matter if you’re an electrical engineer. It’s whether you can get a task or series of tasks done. That will be more important than a Ph.D. We are 10 years from that, but this is how people of the next generation will think.

LPE: What you’re talking about is data mining for the masses?
Sherwani: Yes.
Martin: Before we get too carried away, there are a couple of issues that really need to be solved in this cloud paradigm. We do need to think a lot about privacy, security, and the ability of the infrastructure—both wired and wireless—to deliver all of this content off the cloud and onto the sea of mobile devices. We all know about the experiences of certain smart phones overloading networks and they’re still trying to improve the quality of the network. The wired infrastructure is not fault free. Security and privacy worry me more. If you upload all your data into some big infrastructure, you want your data secured.
Rohatgi: That’s the weakest link. Everybody’s pushing down this path. What worries me is the security and reliability. There are a ton of issues that need to be resolved. Creating a smart infrastructure for data mining can be done today. On the mobile side, there are probably some advances necessary to improve battery life, which is the No. 1 complaint I hear today. But the weakest links we hit are the communications channel, security, privacy and reliability. If those can be resolved then we can progress.
Martin: The technologies we’re all involved with are going to help in a big way. It just requires a bit of mobilization to focus on those issues.
McDermott: This reminds me of where we were with cell phones years ago when the processor went through certification with the carrier. The consumer doesn’t see all the certification on the network. The carrier loves new features. It’s more traffic for their store. It brings in a new wave of users. What they don’t want to see is something that disrupts their infrastructure. For the engineer, the certification is really intense and the field trials are difficult. The cell phone industry has to show a partition that you can certify your baseband and your protocol stack and that has to be isolated from other activity. That underlying security infrastructure is built into the certification. I think we’ll see that extended upward through commercial transactions to having trusted processes and transactions.

LPE: Will cores all be homogeneous or heterogeneous, and will some of them be virtualized?
Sherwani: All of the above. There will be homogeneous cores, heterogeneous cores and there will be virtualization. They all solve different problems. You need virtualization in data centers.

LPE: But will you need virtualization on your smart phone?
Rohatgi: We’re starting to see some of that. I don’t think the operating system wars are dead. And at the end of the day, there is some value to keeping RTOS access to legacy hardware and a high-level operating system like Android or Windows or iOS. From a security angle, it all depends on the use case. The mobile guys are really scared of virtualization of a single processor that has access to all memory. They want separate memory and separate everything.

LPE: This is similar to devices that have a partition between what’s used at home and at the office, right?
Rohatgi: Yes. It’s the same problem. And this almost ties into virtualization. On the privacy side, there isn’t a well-defined security layer with NFC (Near Field Communication) and they’re talking about mobile payments. If you power on an Android phone and shut off all networking then your maps go haywire. Why? Because there’s a back channel that goes to some cloud that helps triangulate where you are. That information is stored to help applications of the future. I’m surprised people aren’t bothered by this. But to return to the question, we’re starting to see some effort down the path of virtualization even though it’s not widespread yet.
Martin: You won’t see virtualization down to the metal. In the dataplane layers it’s nice that processors can emulate other processors effectively, but close to the metal you want extreme efficiency and high performance.
Neifert: And that’s where I see the problem with virtualization. It’s the power. Virtualization is nice, but it’s an abstraction away, which is a power loss. At that point you need heterogeneous processing.
Rohatgi: Transmeta, about nine years ago when they started doing abstractions to hardware, had power numbers that were way down. It’s too bad that green energy wasn’t something that was important then. Still, the genesis of the Atom processor was entirely because of Transmeta.
Sherwani: A typical Bluetooth radio takes about 32 milliwatts of active power. At 65nm we have a Bluetooth radio that only uses 3.2 milliwatts. And there is a design on the board that will take it below 1 milliwatt. There are a bunch of engineers getting excited because over the last 100 years the basic design of a radio has not changed. What Marconi designed is essentially the same as we have today. But when you scale down the power needs to go down. It’s amazing how much lower you can go.
Rohatgi: There’s the other side of this, too. Battery technology has not evolved as much as we would like. For the analog components, it’s the switching characteristics that are governing it. That’s where you’re seeing a lot more intelligence. If you were to look at the power profiles of a mobile device, LEDs and LCDs were supposed to be the promise for low power. That hasn’t worked out. There are still 250 milliwatt drivers. The radio is probably No. 2 on the list after that.
McDermott: People’s expectations were that a screen would be a certain pixel density. Today that needs to be super high-definition. It’s beyond high-def.

LPE: So will we see more cores in the future or have we maxed out?
McDermott: As a programmer, how are you going to keep track of 100 cores? How are you going to program that intelligently? Either it’s going to be some array a programmer can visualize, or it’s going to be three or four very solid cores and let other cores do things like Bluetooth. You can’t keep 100 threads in your mind.
Rohatgi: There’s a limit to this. If you look at the desktop space, in 2006 when Intel began heading out on this multicore approach they found that success wasn’t nearly as fast as they thought. There’s probably a limit on mobile devices, too.
Sherwani: We did all this in the 1980s. nCube used to have a 16-core and 32-core machine. It works great up to 8 cores, but after that you lose it.
Martin: If you are trying to program a concurrent application and split it into different threads, there are inherent limits. Some very specialized applications may be very concurrent, but most are not.
Neifert: The programming model has a human in the center, and humans can only process so much. Until the fundamental programming model changes, you won’t see much advancement.
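
The inherent limit the panelists describe is commonly illustrated with Amdahl's law. The panel does not name it, so the sketch below is an assumption about how to model the point, not something taken from the conversation. The 90% parallel fraction is a hypothetical value chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction is the share of the work that can be split across
    cores; the remainder is assumed to run serially on one core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work can run in parallel.
    for cores in (1, 2, 4, 8, 16, 32, 100):
        print(f"{cores:>3} cores -> {amdahl_speedup(0.90, cores):.2f}x")
```

Under this assumed 90% parallel fraction, the speedup can never reach 10x however many cores are added, because the serial 10% of the work sets the floor on run time.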

Experts At The Table: Multi-Core And Many-Core

Friday, July 29th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Is software taking advantage of the hardware in a power-efficient way?
Rohatgi: Yes, and the ultimate example of that is the Android operating system. Even though it relies on Linux there are on-demand and five levels built into Linux that control, at the software level, the CPU registers or SoC registers to shut down power. You’re already seeing that at the operating-system level.
Martin: It depends upon which software you’re talking about. At the OS level, where lots of apps are running, there may be commoditization happening. Down at the dataplane, where people use application-specific processors, you can argue that’s the infrastructure. People want extreme power efficiency and reliable continuously executing functionality. That’s the place where heterogeneous multiple processors really shine. It’s almost an infrastructure layer in a mobile device. So you see different solutions depending on what level of the device you’re talking about. We see a drive to more heterogeneity, too. Baseband wireless infrastructure works better with heterogeneous processors than trying to shove that onto a multicore device.
Neifert: That’s certainly what we’re seeing in our customer base. They want one processor to run the modem subsystem or the WiFi and partition that off. The last thing you want to do is wake the application processor all the time. The application processors are getting more complex so you can talk and play games at the same time and surf the Web. The application processor has to handle all of that. The application processor may be power efficient, but not as power efficient as one that just runs the radio or data transfer.

LPE: Is it better to actually design a device with multiple processors or a single multicore processor?
Sherwani: When I was at Intel we believed it was the best processor ever developed. I never thought I would see ARM and x86 processors on the same device. We are not that far away right now—and I’m talking about having them on a single chip. Or it may be a MIPS or Tensilica core. Such processors will exist. We are very efficient these days about using power islands. We can put six or eight processors on a chip and we can put them to sleep when they’re not being used.

LPE: Is it more difficult to verify them?
Sherwani: The verification nightmare is growing exponentially, and it’s not clear to me how we will be doing verification five years from now. At the implementation level, verification is becoming a bigger and bigger piece. But it’s more of an architecture question than whether you’re using multicore or many cores.
Martin: This whole approach tends to lead to a more compositional design style where you’re composing well-understood systems. What you need to do is limit the interactions between them to a relatively high level of abstraction or control. You verify significantly each subsystem and then you verify without having a great deal of interaction between the subsystems.
Sherwani: It’s amazing that on a big chip people don’t do flop-to-flop timing on a block. This is a situation that would never happen in software between subroutines, but it happens all the time in hardware. In hardware we have not reached a maturity level where I take care of my block and you take care of your block. We have timing paths going to two blocks and you cannot time it unless you do the timing and verification together.
Neifert: I’ve got customers that will spend months validating their processor, fabric, memory and data path, throwing out all the various options on there and running that. That could be a single-core processor reaching out to memory, and they’ll spend a lot of time optimizing that. Now throw in one other master accessing the same memory and everything goes out the window because of all the different permutations when these things talk to each other. It now blows up exponentially. The nice thing about a multicore approach is that you’ve handed off a lot of that task to the processor guys and hope that they’ve done it properly. It may not be the optimal use for your application, but pushing the problem off to an IP provider and a multicore solution is what a lot of our customers are doing.

LPE: What’s the best way to take advantage of cores? Do you do it with Wide I/O or through multicore and a standard bus?
Sherwani: If you look at where Micron is going with this, the whole interface has been changed. The memory becomes a lot more intelligent instead of a dumb storage. You will be able to ask memory to do certain tasks. Processor people have tried to make memory as dumb as possible in order to commoditize it. All the value comes from the processor side. But balancing would be better so you can offload things. You can combine flash into the most cost-effective memory. Instead of saying, ‘Give me byte No. 7,’ you can say, ‘I need this piece of information.’ It’s a lot more power-efficient to do it that way.
McDermott: It’s quality of service. You’re not just making a data request. You’re saying, ‘I need high bandwidth or high efficiency or low latency.’ A processor may need only a small amount of data, but it may need it very efficiently and very fast. With video you need high bandwidth that is very predictable. Having graphics integrated is one way to go. Unless you have a view of the fabric, the quality of service and the end power engine it’s going to be very hard to engineer a one-point solution.
Martin: With a compositional approach, you may have big memories and then a lot of small distributed memories to keep data close to the area where it is being processed. And maybe you need some intelligent abstractions on things like DMA (direct memory access). That would give programmers more assistance in managing the data flow and data interaction so things will move out of central memory into local memory before they’re needed. That’s a different programming style. We need more flexibility in how hardware and software developers can compose these memory systems together.
Sherwani: If memory is knowledgeable about what is stored inside, it can give you service of the highest level. Right now you can’t do that. The attitude has been, ‘I have a board and I have a DIMM and I want this DIMM to be as low cost as possible.’ That approach has led us down this path. If you’re designing a microprocessor of any kind, it puts a lot of burden on the microprocessor to do all these things with memory. Eventually you will see memory microprocessors—storage with a processor on it—that can gate what is being stored on it. That is a new area, though, and I don’t think much has been done so far.
Rohatgi: In some respects this is already happening. If you think about cache controllers over the last 30 years, this is where you’ve seen a massive improvement. It isn’t user-level aware. It’s bit-level aware. And if your memory isn’t fragmented it works. Or in a multicore design, a coherency module is also very well aware of what it needs to do to keep synchronization between processors. I like the visionary statement of making it user-focused.
Neifert: If you look at the various SoCs on the market, they may use processors from ARM, MIPS and Tensilica, but a large number of them are still doing their own memory controllers because that’s a place to differentiate their design. There are more memory controllers coming out of Synopsys and Cadence, but in large part the bleeding-edge SoCs are still designing their own.
Sherwani: But you can go a lot further.
McDermott: There’s a big difference if you can optimize a path for video and have some pre-fetch algorithm. That may not apply to every chip. But in a custom design, you can partition as needed. When you define your coherency space you need to make them aware of these choices. It’s not just an arbitrary memory spec. You need to make them aware of how to use it.
Martin: That should lead to some opportunities for much more sophisticated memory control, and the kinds of data flows and accesses that people really want to do. That can be reflected in configurable memory IP. I’m not sure how rapidly that’s happening, but there are moves in that direction.
Sherwani: For the work we are doing with the [Micron] Hybrid Memory Cube, there’s a lot of excitement around that space. A completely different level of system design is possible with that kind of hybrid model.

Experts At The Table: Multi-Core And Many-Core

Thursday, July 21st, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Computers aren’t getting the power/performance boost today from multiple cores because the software can’t take advantage of them. How do we fix that?
Martin: Your computer isn’t a place where all the advanced design techniques are used. You have to look at battery-powered, cordless devices to look at the places where people use the most advanced design techniques. There they very often will have specialized application processors for different parts of the applications they want to run on those devices. Those processors are designed to be energy-efficient and to efficiently use battery power, and they probably do work better from one generation to the next—except for the case where they may throw on additional general purpose processors and don’t take advantage of energy consumption. You have to get a big distinction between multiple processors that are application specific vs. general-purpose processors that do not offer efficiency or better performance.
Rohatgi: Once the Intel-AMD megahertz wars ended people started heading down a different dimension of multicore. Back then they believed that changing the software ecosystem so that specific software or systems could be written to take advantage of multi-core, multi-thread, multiple processor designs would actually work. We’ve seen it work in many cases. You can reduce the latency when you’re executing a certain process or multiple processes. Another twist to this paradigm is people use core islands. The operating system may run on one core while another core is used for acceleration. Some people define that as multi-core, and that has been very successful because you can partition between a media processor engine, a video processor engine and a graphics processor engine. In terms of power consumption, that whole element needs to be pieced into this picture. When it comes to embedded SoC design vs. desktop design, those are very different when it comes to power consumption. That element hasn’t been worked through very cleanly on the desktop side, where suddenly you need 800-watt power supplies.
Neifert: The overall user experience that people have when interacting with a device has moved from the underlying hardware to the software. The emphasis has shifted to enhance the user experience. Opening a window on your desktop used to be simple. Now there’s shading and fancy graphics, so the same window that used to come up in 5 instructions may now take 500. It looks a lot nicer and in some cases that changes the user experience. But from the processing side, the focus stopped being on single-thread performance as the megahertz started burning up too much power. They branched out into multicore to solve that, but changing the software to accommodate that has been a big struggle. Changing the hardware to isolate that properly has been a struggle, too. Some of the processing that has been done on computers is difficult to migrate over to mobile devices. A lot of the innovation on the desktop is now taking place in the embedded space. If you want to see the leading-edge design techniques, that is where you have to look.
McDermott: In the mobile area low power is associated with the battery life and the key to the user experience is maintaining functionality throughout a working day. We’ve gotten to that point. Now we’re engineering more productivity. There are more features you can run, more capabilities, more graphics, but still within that working day. Now what we’re seeing is low power is key to other markets. Data centers are predicted over the next few years to rival the airline industry for energy consumption. Cloud computing will lower the power of a node, but that energy is still being used somewhere even though it’s shifted. What cloud changes is that if you run an application on one device and shift to a different device it’s no big deal. It takes advantage of the underlying computing architecture. There also may be a hierarchy of operating systems to deal with it, depending on the device.
Sherwani: We got very interested in how power relates to multiprocessing. If you are trying to predict power within a watt or two that’s no big deal. If you are trying to predict power within a milliwatt, that’s very difficult. We thought that by looking at implementation of the netlist we could predict power. That turned out to be not the case. Then we tried system-level design. That doesn’t work. We finally came to the conclusion that you have to have a user model. We needed a human model—a businessman, a lawyer, a student—and then analyze what they did during the day. Then we had to convert that into system level and then RTL level. This takes us far from what Open-Silicon does as a company, but we have found this the only way to accurately predict power. These kinds of human models don’t exist. We created two models of two types of people who use it. Then we started recording real human beings and calculating the model against them. Good models don’t exist if you want to accurately predict power.
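
The user-model idea Sherwani describes can be sketched as a table of daily activities, each with a duration and an average power draw, summed into energy per day. This is only a minimal illustration of the concept, not Open-Silicon's method. The `Activity` class, the two profiles, and every power and duration figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    hours_per_day: float
    average_power_mw: float  # hypothetical average device power while active


def daily_energy_mwh(profile: list[Activity]) -> float:
    """Sum energy over one day: power (mW) x time (h) = energy (mWh)."""
    total_hours = sum(a.hours_per_day for a in profile)
    if total_hours > 24.0:
        raise ValueError("profile covers more than 24 hours")
    return sum(a.hours_per_day * a.average_power_mw for a in profile)


# Two hypothetical user profiles; every number here is illustrative only.
STUDENT = [
    Activity("gaming", 2.0, 1500.0),
    Activity("music", 3.0, 200.0),
    Activity("standby", 19.0, 10.0),
]
LAWYER = [
    Activity("voice calls", 3.0, 600.0),
    Activity("email", 2.0, 400.0),
    Activity("standby", 19.0, 10.0),
]

if __name__ == "__main__":
    for label, profile in (("student", STUDENT), ("lawyer", LAWYER)):
        print(f"{label}: {daily_energy_mwh(profile):.0f} mWh/day")
```

A real model would refine each activity into system-level and then RTL-level behavior, as described above. The sketch stops at the first step, where the same device ends up with a different daily energy budget for each kind of user.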

LPE: Are we better off with many cores or multiple processors?
Martin: Multiple heterogeneous processors are the way to go, particularly in the mobile domain. With clusters of servers you may have many homogeneous tasks you want to map. The desktop is a bit of the orphan here. If you move to cloud computing and the highly mobile devices and ever-smarter phones, you wonder if people will worry about even having a tethered desktop. That means the innovation may be in the big server farms and the mobile devices, and the desktop may gather dust.
Neifert: It will be replaced by a docking station that you plug your mobile device into.
Martin: That’s right. Or as we have seen, some companies are combining mobile devices and a laptop together. The use cases are extremely interesting because there is no single use case. For a mobile device that has an advanced graphics processor, the game player may burn up battery by hammering that all the time. The music lover may be using MP3 decoding and get significantly longer time out of the battery. That drives significantly different use models and processor choices.
Rohatgi: There are a lot of different vertical markets. It ranges from digital still cameras to anything with a battery. There is a use case for multiple processors. Networking and cloud computing are very large markets. In the embedded space, what has happened is there are a lot of people in the SoC space. The hardware itself is heavily commoditizing. Even the operating system is commoditizing. The differentiation is how you pick and choose your IP. If it comes down to cost in a mobile phone, from the top up they don’t have a feature list or a use model. The discussion begins with, ‘What can you fit in a 7 x 7?’ Based on something like that, what kind of IP can you fit in there and still have a useful device? In the volume mobile phone market, the direction is to shrink the die as small as possible. It may be a 6 x 6 or a 5 x 5. In that case, I would choose multicore rather than multiple processors.
McDermott: In cell phones the issue used to be standby and talk time. People could self control that. If you talk more your battery goes down. People are starting to experience that if you want to play games you have to deal with this. We’re starting to deal with the apps developers. You used to have specialized OSes and applications. With the proliferation of open source you don’t know what could be running on there. It can run any app. We’re reaching out to the app developer to write code that is attentive to the power effects. There is an amazing learning curve through people writing a good game experience in a power budget that’s acceptable. You need to get the apps to be power-efficient.

Power Bits: Why Set-Top Boxes Are Energy Hogs

Thursday, July 21st, 2011

By Ed Sperling
For years, semiconductors have been getting more efficient. Desktop computers that used to peak out at 250 watts are now down to the 30- to 60-watt range. But set-top boxes, those inconspicuous little boxes that connect televisions to services provided by cable companies, can consume even more.

The problem has become bad enough that the Natural Resources Defense Council issued a report last month saying digital video recorders, cable and other pay-TV boxes were costing U.S. consumers $3 billion a year.

So what went wrong? The answer actually has nothing to do with the semiconductors inside the boxes. It’s the back-end systems from the companies that offer pay-TV services—the use model into which chip designers had no visibility.

“The problem is that the MSO (multi-system operator) is querying the boxes regularly, which means they’re also spinning up the hard drives,” said Paolo Masini, principal architect for digital home at MIPS. “Over the long term, this problem will go away because functionality will be absorbed into the residential gateway. But in the short-term—meaning over the next few years—there will be a move of all these services into the cloud. That will offer huge power savings.”

How much savings? The starting target is 70%, and that’s the easy stuff. Add in more power-saving features and it can go significantly higher.

“There is a lot of synergy here with gaming consoles, too,” said Masini. “The companies making these devices have introduced reduced power versions, but they’re only slightly better. They’re now getting a lot of pressure to decrease their energy consumption, as well. The blocks and peripherals on set-top boxes and gaming systems are similar, and they use similar chips.”

Several companies compete in the set-top box chip market. MIPS is the current market leader, but ARM is competing with similar performance and power credentials. Intel has made some inroads, as well, but its primary focus is CPU and graphics performance rather than efficiency.

Design For Power Methodology

Thursday, July 21st, 2011

By Ann Steffora Mutschler
It is rare to find an advanced chip today that has not been designed considering power from the very earliest point. In fact, it is safe to say that power is the No. 1 priority, or a close No. 2.

But to achieve the highest performance for a low-power design, a design-for-power methodology is necessary, comprising the capabilities to implement power in the most efficient way through the design flow.

If power is not implemented in the most efficient way, meaning if it isn’t optimized and reduced to the bare minimum, then what’s the purpose of designing it?

“Whatever the power ends up becoming, it is what it is, and in many traditional designs this has been the approach,” said Shabtay Matalon, ESL market development manager at Mentor Graphics. “There wasn’t in mind an objective to say, ‘Let me design it such that the power will be minimized.’ The power conservation and reducing the power is the primary objective.”

Most tools that address power today begin at the RTL, but there is an increasing consensus that this may not be early enough. “The percentage of gates or transistors in a design that can be exercised at the same time is shrinking and shrinking,” said Matalon. “On one hand we get this huge capacity to put billions of transistors on silicon. On the other hand, the power is [holding back] the percentage of the resources that we put on the chip that can be exercised. There is a need for this intelligence that is usually in the software. I’m sorry to offend anybody on the hardware side, but the intelligence is really in the software that is running the application—the software that understands the application context to play a role in reducing the power in the environment. Obviously, the hardware needs to be below the infrastructure and that’s why RTL might be too late.”

Design-for-power is not just analysis at the RTL. It is design for optimizing power. Some define a design-for-power methodology as having a gate-level representation, running some analysis, then predicting the power. Predicting the power accurately at RTL is highly questionable, though, unless you really run the device under the same operating conditions in which you will actually use it.

“But there is not even a doubt that when you are doing this analysis at RTL down, that you lost your possibility to optimize,” said Matalon. “Design-for-power is not just analysis. It is the reduction of power.”

Example of a power methodology. (Source: Mentor Graphics)

Larry Hudepohl, VP of hardware engineering at MIPS, agrees. He said the importance of power as a design metric is one of the first and foremost criteria, not just an afterthought when putting the final chip together. “In the same way that the analysis of performance has moved much earlier in the design flow in advance of RTL, I see that same trend happening on the power side too. Earlier estimation of power, especially in a complex SoC where there are multiple devices driving multiple complex interfaces so the modeling of that—the power dissipation characteristics of the full chip under different operating conditions, under different power management modes—can really be assisted by modeling in a stage earlier than RTL.”

On the other hand, Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions, stressed that RTL is indeed early enough for a DFP (design-for-power) methodology. “Design for power must be done at the design level of abstraction, and for hardware design this means RTL. Anything after RTL is either automatic optimization (e.g. synthesis) or implementation, which in the case of the digital flow is also automated (i.e. place and route).”

Apache’s view is that a key part of a DFP methodology is power debug and power efficiency analysis, and the benefit of doing these at the RTL is a significant improvement in productivity (and corresponding turnaround time reduction) compared to traditional gate-level flows.

The cost of power-saving techniques
When designing an SoC or a multicore platform, there are a lot of architecture decisions that are clearly set before RTL is written and which must be considered in a design-for-power methodology, said Pete Hardee, director of solutions marketing at Cadence. “There are a lot of decisions that affect power that are already set in concrete before you are coding in RTL. Usually when a device like this is being designed, there is a lot of reuse going on. The rule of thumb is typically 70% to 80% re-use and 20% to 30% new design.”

There are some blocks that are being re-used that have already been characterized for power or known from previous use, or that can be recalculated if moving a design into a new node.

“What needs looking at is the cost of implementing the power saving techniques,” said Hardee. “We’ve got various techniques going on—power shut-off, including state retention. Some people call that power gating. What we are doing is splitting the design into various power domains and doing different things with those power domains, either switching them off and working out which registers need to hold value to come back on quicker or running from multiple supply voltages. There is a cost in implementing all of those techniques. Every time I split something into power domains, for every signal that crosses a power domain that I’m switching differently I need either isolation or level shifters or both. For every register that I need to hold a value during power off, I need a state retention register in there, which is roughly double the size of a regular register. Also, in normal operating mode, it takes greater power, there’s greater leakage due to the state retention registers compared with the normal registers. All of these decisions — how many power domains I’m splitting up into, how I’m switching those power domains — they have a cost and that cost can be assessed before RTL.”
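
The trade-off Hardee describes can be roughed out before any RTL exists. The following is a minimal sketch, not a description of any vendor's tool: the `PowerDomain` fields, the example domains and the isolation and level-shifter areas are all hypothetical. Only the factor of two for retention registers comes from the quote above.

```python
from dataclasses import dataclass


@dataclass
class PowerDomain:
    name: str
    retained_registers: int   # registers that must hold state during shut-off
    crossing_signals: int     # signals crossing into a differently switched domain
    needs_level_shift: bool   # True if the domain runs from a different supply voltage


# Relative areas, normalized so that one regular register = 1.0.
REGISTER_AREA = 1.0
RETENTION_FACTOR = 2.0     # "roughly double the size of a regular register"
ISOLATION_AREA = 0.5       # assumed value, not from the article
LEVEL_SHIFTER_AREA = 0.8   # assumed value, not from the article


def overhead_area(domain: PowerDomain) -> float:
    """Extra area a domain pays for retention, isolation and level shifting."""
    retention = domain.retained_registers * (RETENTION_FACTOR - 1.0) * REGISTER_AREA
    isolation = domain.crossing_signals * ISOLATION_AREA
    shifting = domain.crossing_signals * LEVEL_SHIFTER_AREA if domain.needs_level_shift else 0.0
    return retention + isolation + shifting


# Hypothetical partitioning of a design into two switched domains.
domains = [
    PowerDomain("cpu", retained_registers=8_000, crossing_signals=400, needs_level_shift=True),
    PowerDomain("video", retained_registers=1_000, crossing_signals=250, needs_level_shift=False),
]

for d in domains:
    print(f"{d.name}: overhead = {overhead_area(d):.0f} register-equivalents")
```

A fuller model would also account for the extra leakage of retention registers in normal operation, which the quote mentions but this sketch leaves out.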

Source: Cadence

Today, those costs are typically tracked by the power architect in a large Excel spreadsheet that contains all of the components that will be re-used in the platform. The architect tries to work out how many components need to be added for the power scheme in the new design, which are generally pre-RTL decisions. Of course, a spreadsheet makes it very difficult to work out what is happening across all of the combinations of domains being on and off.
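
To see why a spreadsheet struggles, note that n independently switched domains give 2^n on/off combinations. The sketch below enumerates them for three domains; the domain names and milliwatt figures are made up purely for illustration.

```python
from itertools import product

# Hypothetical per-domain power in milliwatts: (on, off). "Off" is not
# zero here because retention registers and always-on logic still leak.
DOMAIN_POWER_MW = {
    "cpu": (120.0, 1.5),
    "video": (80.0, 0.5),
    "modem": (60.0, 0.8),
}


def enumerate_power_states(domain_power):
    """Yield (state, total_mw) for every on/off combination of domains."""
    names = list(domain_power)
    for flags in product((True, False), repeat=len(names)):
        state = dict(zip(names, flags))
        total = sum(domain_power[n][0] if on else domain_power[n][1]
                    for n, on in state.items())
        yield state, total


states = list(enumerate_power_states(DOMAIN_POWER_MW))
print(f"{len(states)} combinations for {len(DOMAIN_POWER_MW)} domains")
for state, total in sorted(states, key=lambda s: s[1]):
    on = [n for n, is_on in state.items() if is_on] or ["(all off)"]
    print(f"{total:7.1f} mW  on: {', '.join(on)}")
```

With 20 domains the same loop would visit more than a million states, which is the point at which a dedicated modeling framework becomes more practical than a spreadsheet.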

In lieu of the spreadsheet approach, a small number of commercial modeling frameworks are available today from Cadence, Mentor and Docea Power, a French start-up. This is also where things get interesting. A modeling framework captures the static power techniques, which need to be balanced with some kind of dynamic analysis.

Above RTL that means simulation, Hardee pointed out. “This is where virtual platforms come into play and allow engineering teams to start exploring with running some software with a model of the platform and start to bring in a time element…the closer to the real operating environment, the better idea the simulation can give for whether the power architecture is sufficient or if changes need to be made to the power specification. Above RTL, I think most people’s goal is to relatively rank various candidate architectures. It’s a relative thing. What you are really trying to do as a power architect is at least get the ranking right to know if one architecture is better or worse compared with another.”
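
The relative ranking Hardee describes can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the activity trace stands in for what a virtual-platform run might report, and the candidate names and per-mode power figures are invented.

```python
# Hypothetical activity trace from a virtual-platform run: (seconds, mode).
TRACE = [(2.0, "active"), (5.0, "idle"), (1.0, "active"), (12.0, "sleep")]

# Hypothetical per-mode power (mW) for two candidate power architectures.
CANDIDATES = {
    "single_domain": {"active": 300.0, "idle": 90.0, "sleep": 40.0},
    "three_domains": {"active": 310.0, "idle": 45.0, "sleep": 5.0},
}


def energy_mj(trace, mode_power_mw):
    """Integrate power over the trace; mW multiplied by seconds gives mJ."""
    return sum(duration * mode_power_mw[mode] for duration, mode in trace)


# Rank candidates from lowest to highest energy for this workload.
ranking = sorted(CANDIDATES, key=lambda name: energy_mj(TRACE, CANDIDATES[name]))
for name in ranking:
    print(f"{name}: {energy_mj(TRACE, CANDIDATES[name]):.0f} mJ")
```

The absolute numbers matter less than the ordering, and the ordering can flip with a different trace, which is why the quote stresses getting close to the real operating environment.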

Obviously, the RTL tools can’t be abandoned because that’s where a lot of the detail design is done.

“It’s where most of the microarchitectures for the blocks being implemented are decided during the RTL coding phase,” Hardee said. “High-level synthesis is interesting because it can allow you to do better exploration of those microarchitectures before RTL. As soon as you start coding, you fix the microarchitectures. But RTL is still a very critical area. It’s really the first abstraction level that you can accurately verify the power architecture.”

The future of DFP
Looking at design-for-power from a high level, Cary Chin, director of technical marketing for low power solutions at Synopsys, observed, “Advanced low-power optimization has come a long way in the past few years but clearly, we’re not done. There is much more to be done at a high level, looking at new methodologies and better ways of optimizing for power. It’s been a theory of mine that as we go forward, power becomes one of these things that we are designing around and it’s really something that is going to be a requirement and one of the fundamental keys to design going forward. I think we’ll see methodologies evolve even more going forward where, from the very high level, one of the main things you’ll want to consider is going to be power all the way through the design flow.”

And in future designs, the alternatives may be much less attractive.

Power Bits: May 27

Friday, May 27th, 2011

By Ed Sperling

Going Vertical
Now that everyone has gotten the energy-efficiency message down pretty well, the next step is to apply that to specific markets. That’s beginning to happen, too.

A leaked product roadmap from AMD shows machines with all-day battery life and a focus on everything from ultra-mobile notebooks to tablets.

Intel is refining its own message to go after specific markets, as well. The company has created a small-business cloud platform on a pay-as-you-go basis. Given the amount of energy consumed by underutilized servers, this is a huge efficiency play, as well as a way for Intel to sidestep the PC OEM for its share of the profits.

Companies such as Tensilica, meanwhile, have been focused heavily on low-power communications, most recently in the LTE and LTE Advanced space. And ARM and MIPS have been divvying up a variety of specific markets. ARM’s focus on mobile devices and a slew of vertical applications, ranging from medical devices to other consumer electronics, is well documented. Likewise, MIPS has focused on set-top boxes and Android-based devices.

Lowering Carbon Dioxide
The International Energy Agency issued a report today saying that carbon dioxide emissions must be eliminated from electricity generation to limit the rise of global temperature to 2 degrees Celsius.

The report noted that total output of electricity and heat grew 55% between 1990 and 2008, but corresponding CO2 emissions grew 64.5% in the same period. The report recommends greater efficiency in lighting, heating, cooling and information technology, and powering with renewable sources of energy, nuclear, and carbon capture and storage.

This is good news for the electronics industry, in general, and the low-power engineering portion in particular.

The Impact Of Triple Play

Thursday, May 12th, 2011

By Ann Steffora Mutschler
Not so long ago there were multiple networks that supported different kinds of traffic—a telecommunications network based on high-reliability protocols, the Internet for burst-centric data traffic and video distribution networks.

From the consumer standpoint that was highly inefficient. Managing three subscriptions from three service providers was unnecessary, which is why the concept of bundled services over a single broadband line has become increasingly popular. This so-called triple play is focused on efficiency and flexibility, and over time these services have been layered on top of Internet Protocol using a single broadband connection.

“The implications for this is that when you send all of this data on a single broadband, you need to be able to manage the priorities of traffic and be able to provision the network to give the guy that’s paying extra for high-speed Internet the quality of service that he is expecting to receive,” noted Del Rodillas, director of marketing for networking at MIPS. Obviously the silicon at the heart of the delivery devices needs to be pretty smart to accommodate all of this.

Designing for a triple-play environment does dictate certain specifications. “In the absence of different types of traffic you have very predictable traffic profiles,” he explained. “For instance, if I say I’m only going to get voice, I will ensure that every 125 microseconds that I’m processing a voice channel and I can support x number of channels. If you have predictable traffic, you can provision your design to assume that, and you don’t have to plan for highs and lows in terms of traffic.”
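
The 125-microsecond figure corresponds to the 8 kHz sampling rate of telephony voice, and the "x number of channels" follows from how much of that period each channel consumes. A minimal sketch of the arithmetic, assuming a hypothetical per-channel processing cost and a hypothetical `reserved_fraction` parameter for headroom:

```python
FRAME_PERIOD_US = 125.0  # one voice sample per channel every 125 microseconds (8 kHz)


def max_voice_channels(per_channel_us: float, reserved_fraction: float = 0.0) -> int:
    """Channels that fit in one frame period after reserving some headroom."""
    if per_channel_us <= 0:
        raise ValueError("per_channel_us must be positive")
    usable_us = FRAME_PERIOD_US * (1.0 - reserved_fraction)
    return int(usable_us // per_channel_us)


# Hypothetical cost: 0.5 microseconds of processing per channel per frame.
voice_only = max_voice_channels(0.5)
with_headroom = max_voice_channels(0.5, reserved_fraction=0.5)
print(f"voice only: {voice_only} channels")
print(f"half the period reserved for bursty traffic: {with_headroom} channels")
```

With predictable voice-only traffic the whole period can be provisioned; once bursty packet traffic shares the link, some fraction has to be held back, which cuts the guaranteed channel count.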

Rodillas also pointed out that once you have this packet traffic that’s very bursty in nature you need to provide some headroom in your design and the ability to prioritize. “Typically what happens when you implement a triple play design, you need to put in some traffic management/packet classification. If you didn’t have this, it’s pretty straightforward. It’s called framing the data. Bringing in video, voice and voice over IP, your design needs to be a lot smarter and that’s really where processors come in. Processors give the ability to identify what kind of traffic, what kind of quality of service is assigned to that packet and the rest of the hardware can take action based on the intelligence provided by the processors.”
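
The classify-then-prioritize step Rodillas describes can be sketched as a strict-priority queue. This is an illustration only, not how any particular MIPS-based device works: the packet dictionaries, the `classify` rules and the `TrafficManager` class are hypothetical. The DSCP values used (46 for Expedited Forwarding, 34/36/38 for AF41 to AF43) are standard code points, but mapping them to voice and video here is an assumption.

```python
import heapq
from itertools import count

# Hypothetical mapping from traffic class to priority (lower = served first).
PRIORITY = {"voice": 0, "video": 1, "data": 2}


def classify(packet: dict) -> str:
    """Toy classifier keyed on a 'dscp' field in a hypothetical packet dict."""
    dscp = packet.get("dscp", 0)
    if dscp == 46:              # Expedited Forwarding, assumed to carry voice
        return "voice"
    if dscp in (34, 36, 38):    # AF41-AF43, assumed to carry video
        return "video"
    return "data"               # everything else is best-effort data


class TrafficManager:
    """Strict-priority queue: always dequeue the highest-priority packet."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, packet: dict) -> None:
        cls = classify(packet)
        heapq.heappush(self._heap, (PRIORITY[cls], next(self._seq), cls, packet))

    def dequeue(self):
        _, _, cls, packet = heapq.heappop(self._heap)
        return cls, packet

    def __len__(self):
        return len(self._heap)


tm = TrafficManager()
for pkt in [{"id": 1, "dscp": 0}, {"id": 2, "dscp": 34},
            {"id": 3, "dscp": 46}, {"id": 4, "dscp": 0}]:
    tm.enqueue(pkt)

order = []
while tm:  # an empty TrafficManager is falsy because __len__ returns 0
    cls, pkt = tm.dequeue()
    order.append((cls, pkt["id"]))
print(order)
```

Strict priority is the simplest policy to show; real traffic managers typically add weighted scheduling or rate limits so that low-priority data is not starved.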

Along these lines, Steve Roddy, VP of marketing at Tensilica, said that in triple play applications there are generally two classes of designs: wired access (DSL modems, powerline modems) and wireless terminals (LTE designs, etc.). “Both classes of products have high-speed modem, packetized data, IP protocol, various classes of service for audio, video, data, etc. And all of them seem to have the same philosophical approach to the architecture that is the modem itself is obviously integrated–there is one pipe. Speeds there keep increasing and they attack the other elements in the design separately.”

In other words there will be a separate control subsystem, audio subsystem, video subsystem, packet forwarding/routing data subsystem. “If a video stream comes in it is steered or directed into the video subsystem, which is a pretty much standalone entity unto itself so the integration challenge seems to be solved by the divide and conquer approach of these various specialized subsystems,” he explained. For the most part these specialized subsystems are a big benefit to power.

The use of multithreading in processors allows different types of traffic to be directed to different threads, while schedulers inside cores work with the processors to set priorities to achieve the most optimized and efficient use of the processor for the task at hand.

Power planning for mobile vs. connected home

Planning for power looks different depending on the target platform, such as a smartphone versus a set-top box. “In bigger equipment, you have a little bit more of a power budget so the ability to run the core at very high frequency is an option,” MIPS’ Rodillas said. “The ability to run multiple cores is also an option. But once you get into the mobile environment, you need to start being more wary of throwing bandwidth or processing power at the problem.”

Instead of running one core very fast, one option in a mobile application would be to run several cores with the frequency scaled down, which reduces power consumption. Another approach would be to utilize virtual processors, which are essentially hardware threads, so that different types of processing can run on separate virtual processors. Instead of using two actual processors, an SoC could be designed to use just one processor that contains two virtual processors, thereby reducing the area and power consumption by half.
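
The reason several slower cores can beat one fast core comes from the usual first-order estimate of CMOS switching power, P = C × V² × f: halving the frequency usually allows a lower supply voltage, and power falls with the square of that voltage. The sketch below uses illustrative numbers only; the capacitance and the two voltages are assumptions, leakage is ignored, and the workload is assumed to split perfectly across two cores.

```python
def dynamic_power(c_eff: float, voltage: float, freq_hz: float) -> float:
    """First-order CMOS switching-power estimate: P = C_eff * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * freq_hz


C_EFF = 1e-9  # hypothetical effective switched capacitance per core (farads)

# One core at full speed versus two cores at half speed and a lower voltage.
# The voltages are illustrative; real values depend on the process and design.
one_fast = dynamic_power(C_EFF, voltage=1.1, freq_hz=1.0e9)
two_slow = 2 * dynamic_power(C_EFF, voltage=0.8, freq_hz=0.5e9)

print(f"one core  @ 1.0 GHz, 1.1 V: {one_fast:.3f} W")
print(f"two cores @ 0.5 GHz, 0.8 V: {two_slow:.3f} W")
print(f"ratio: {two_slow / one_fast:.2f}")
```

If the voltage could not be lowered at all, the two options would come out equal in this model, so the saving depends entirely on the voltage headroom that the lower frequency opens up.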

Jim McGregor, chief technology strategist at In-Stat, sees the definition of triple play changing drastically. “We used to think of it in the home–entertainment services, phone services, etc., but that has really changed. Even looking at the home, everything goes IP. First we saw people cutting their cords from home phones going cellular. Now the cellular technology is going to IP technology based on OFDM (orthogonal frequency division multiplexing), whether it is WiMAX, LTE or whatever. But also, when you start looking at the home people are now cutting their cable cords because they can get everything downloaded off the Internet. All you need is a high-speed Internet connection. It kind of goes to the point about everything—whether it’s voice communications, data communications, entertainment—it all goes through the Internet.”

Power Bits: May 6

Friday, May 6th, 2011

By Ed Sperling

The Other 3D
Intel will roll out processors using tri-gate finFET transistors at 22nm, which it says will sharply lower the operating voltage, boost performance and reduce leakage.

Multigate transistors have been the subject of research for decades, most prominently at UC Berkeley, because they can be used to reduce current leakage and increase density. Going vertical allows more transistors to be loaded onto a piece of silicon, which in the case of a processor is particularly important because more transistors can translate into better performance.

Intel claims the new structures will improve performance by 37% at low voltages. The company said that makes it ideal for small handheld devices, a market where Intel has not done very well in the past primarily because its chips are considered power hogs next to those using ARM and MIPS cores. That statement alone caused ARM’s stock to plunge 7% as speculation mounted that Intel could replace ARM cores inside of some Apple devices. This is pure speculation, of course. Apple never talks about that stuff and Intel hasn’t even intimated that. ARM’s stock recovered rather quickly, too.

Still, most companies have shied away from finFETs because they are extremely difficult to manufacture and potentially can add to the design and manufacturing cost. Intel’s big advantage in this regard is that it still owns its own fabs and develops its own manufacturing process, something that is far too costly for all but a handful of chipmakers.

An alternative to 3D structures is ultra-thin body silicon on insulator, which is now being tested by IBM, STMicroelectronics, Soitec and Globalfoundries. And there is a possibility of mixing things up to include both. But the writing is on the wall—big changes are ahead, and Intel’s move is a first big step in that direction.

TI Pushes FRAM
Microcontrollers have been used for years to reduce power in devices through such developments as multispeed motor control and intelligent sensors, but the real battle of late has been inside the microcontrollers themselves. Companies in this sector have been playing leapfrog with power numbers taking priority over performance increases.

TI’s latest rollout includes an ultra-low-power FRAM, or ferroelectric RAM (previously written as FeRAM). This type of RAM uses 250 times less power than EEPROM-based microcontrollers, according to TI, and can be written 100 times faster. FRAM is not a new technology. It was developed in the 1990s by Ramtron, and has been manufactured by Fujitsu for more than a decade.

Apparently major strides have been made in the pricing of this technology since then. TI’s microcontroller is priced at $1.20.

Power Bits: April 14

Thursday, April 14th, 2011

Power Struggle Heats Up—So To Speak
The battle between ARM and Intel has come down to a fight over power—which one can run at the lowest power.

This is becoming particularly important in the tablet market. Tablets are becoming the tool of choice for consumers of information (executives and salespeople on the go) rather than creators of content, and the market is one that could seriously eat into Intel’s mainstay computer market.

Intel’s latest salvo in this war involves the next version of its Atom processor, code-named Oak Trail, which is aimed at the tablet market. That will be followed by the 32nm Cedar Trail. New in the technology is what Intel is billing as “all-day” battery life and “enhanced deeper sleep.”

Despite its prowess in the PC world, though, this market appears to be ARM’s to lose. Apple’s iPad runs on an A4 chip, which is a 45nm package-on-package that includes an ARM Cortex-A8 paired with a graphics processor. ARM also has made some inroads into the Android tablet space along with MIPS, which was one of the first processor makers to embrace Android. Both run at lower power than Intel’s Atom chips, in large part because there is no x86 legacy to support. http://en.wikipedia.org/wiki/Apple_A4

Experts At The Table: Billion-Gate Design Challenges

Friday, March 25th, 2011

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: Will anyone be able to afford to create these complex chips in the future?
Janac: Sure, but it will be extremely expensive.
Browne: Apple is doing it. They’ve come at it with a systems approach. The user will have a great experience because they’re going to add a whole bunch of devices. But we’ve got to find ways to attach to the software at a higher level. We’re doing a full system design. We’re not hooking up a couple of widgets anymore.
Baker: Apple has moved up the stack. From an EDA standpoint we see all these challenges. We’re actively seeing designs at 28nm, planning for 20nm. We’ve yet to see designs at 14nm. But the complexity of validating one of these devices, whether it’s a single die or a multiple-die approach and in the future 3D, is increasing by orders of magnitude.
Browne: With 100 times the number of elements you can’t just extend the methodologies we use today. You have to define the interactions so you can abstract this. You can’t manage this many power domains when the use models are different for all the users. There may be 200 things you’re turning on and off to reduce leakage and increase battery life. To date, most people haven’t done that. In the rush to get to production people want to know if it runs Android or Angry Birds, not whether you’ve done all the power management stuff up front. We’re back to the speed of execution in getting it almost right and being early.
Rajendiran: That’s correct. Verizon, after years of rumors, finally launched the iPhone. But as they got near to release they said it cannot do multitasking. Who was asleep at the wheel? Then the next day they had a software fix to enable that. Why didn’t they think about it ahead of time? With all these complications we should really partition who does what.
Browne: Yes, it’s a system problem.
Rajendiran: But it’s something people could have easily thought out ahead of time. We need to define the components that need to be addressed and give it to the people who can address it. If you take a processor and optimize it for a set of libraries vs. another set of libraries, for the same performance level, one might take a third of the power of the other one. But who should tell you that? Should it be the company that makes the processor or the company that builds the SoC?

LPE: But increasingly you’re not building the chip. You’re integrating parts.
Throndson: You can see people racing ahead of each other, depending on the pieces you’re considering. Part of it is just a matter of getting to market early with a solution. But in terms of parallel hardware, it’s still way out in front of parallel software. Even with power part of the answer is going back to better utilize the hardware that’s already there, whether it’s the processor itself or at the larger system level. It’s very difficult to optimize and deliver every component that goes into these systems today.

LPE: From the network-on-chip perspective, will these chips be running at the same node and power or will there be an array of nodes, power and legacy technologies?
Janac: You’re going to be dealing with multiple processes and legacy applications. It doesn’t make sense to put analog IP on a 16nm design. You will have to use multiple die using a system-in-package approach where the digital part of the system is running at the latest nodes optimized for low power and cost and the analog stuff is running on trailing-edge processes where the IP is available.
Browne: We’re building a system using building blocks, and good enough wins if it’s early enough. The more you re-use, theoretically, the quicker you can get there. But the real challenge is how you better enable mix and match in the software area.

LPE: And that ‘good enough’ is also tested well enough?
Browne: Good enough has programmability. The fabric allows reprogramming. We think it’s important to be able to do things in parallel. If you can get enough of them done simultaneously, even if they’re running slower, then you don’t need buffers to manage those serial events and you have less logic and less wires and slower transistors in the linear area of design. That also means there is less leakage.

LPE: Will the tools be able to deal with this kind of structure?
Baker: Re-use has been around for about 15 years. So what’s preventing the re-use? A lot of that scaling and functionality is available today. It’s not a new challenge. The challenge we face is that re-use isn’t happening. We’re redesigning these components with each iteration.
Janac: Once you get past RTL the tools are horizontal. The chain of synthesis, place and route, verification and DFM are applicable to that entire system. Above RTL it’s like the silos of IP. Those tools are not addressing that. The MIPS and ARM processors each have their own tools. Arteris’ NoC has its own tools. You wind up with horizontal silos where the IPs are tied to the tools. Only when they reach RTL do they hit the Magma, Mentor, Synopsys and Cadence tools. There is no horizontal toolset that can handle all of these IPs at the architectural level.
Rajendiran: There’s no reason to keep up with Moore’s Law for things that have already been certified and verified. In the old days we were following it. When Moore came up with that law he wasn’t talking about cost. He was talking about transistors. At that time you could do a chip for $50,000. That’s not the case anymore. People are slowly coming to the realization that if you have a chip working, why bother re-doing all of it? You can put software on it, you can even re-do it on the latest process, and use an interposer to make it work. So 90% of the chip is already validated. You add new software and you get the chip out sooner.
Browne: You also cover more markets, which adds more complexity to the definition. The requirements are different for a smart phone and a tablet computer.

LPE: But some of the functionality may be the same between a smart phone and a set-top box, right?
Browne: Yes, and that’s why the big companies have more data points. They know which subsystems can be re-used. When you’re doing audio on these devices everything works. When you add more cores or video, it’s different. The guys with a bunch of technology in-house just need to add more things out of what they already have.

LPE: How many of these billion-gate designs will be on 2D structures vs. 2.5D or 3D?
Rajendiran: With 3D, the problem is more on the manufacturing side. When you drill a hole there are problems. It’s just a matter of time before full 3D works.
Browne: The fabless community is huge. There are $3 billion fabless companies that have very expensive product portfolios. There are also startups that build similar point devices to try to go after those markets. The difference is the big guys get to run more experiments. The little guy only has one.
Janac: The answer depends on what you’re trying to do. If you’re building a unified chip that fulfills a unique function, throwing it on 16nm process makes sense. If you’re mixing functions that are mixed signal, analog, RF or legacy it makes sense to put it on more die. But fundamentally the mixed-die approach is more expensive than trying to put it all on a single die in 2D, assuming you can use one process and the IP is all packaged correctly.

LPE: How many derivative chips do you need to get these days to make it economically feasible?
Browne: At 28nm the cost is about $80 million. How are you going to get that back?
Janac: People who make wireless chips are spinning them off into automotive and home gateways, so you wind up with seven to 10 derivatives for a successful platform.
Browne: In some cases a subsystem is re-used, in others it’s the same chip.
