Posts Tagged ‘ARM’

Experts At The Table: Retrofitting Older Process Nodes

Friday, September 16th, 2011

By Ed Sperling
Low-Power Engineering sat down with Walter Ng, vice president of the IP ecosystem at GlobalFoundries; Vishal Kapoor, vice president of marketing for SoC realization at Cadence; Naveed Sherwani, CEO of Open-Silicon; John Heinlein, vice president of marketing at ARM; and Jeff Lukanc, director of engineering at IDT. What follows are excerpts of that conversation, which was held in front of a live audience at the Global Technology Conference in Santa Clara, Calif.

LPE: Is it harder to sell EDA tools for older nodes?
Heinlein: On one hand, the EDA requirements of the industry are evolving quickly. There’s one challenge where people have older tools, and you have to do new IP based on older flows. And then you have people wanting to use new tools on older nodes. They may want to use things like CPF. So we have this schizophrenia and we have to support that.
Kapoor: EDA is about tools, IP and services, and the reason the design components start to come in is that when you get to new nodes or existing nodes, you have to go broader than just tooling.
Sherwani: If you are developing tools for 14nm you are dealing with FinFETs and a lot of physical effects. But at 0.18 (microns) the issue is how to make one or two designers very efficient. We need to hire operations business people who have nothing to do with design in order to change that. EDA companies are very much focused on physics and getting to 20nm and 14nm. They still don’t have the mindset toward finishing a 0.18 design in one day. Is it possible to put together a flow that can be done by only one guy? In addition, many designs are derivative designs. Companies may want to add DDR3 to an existing design. The headcount and mindset required is very different.
Kapoor: We were joking before this panel that somewhere between the 34th and 43rd minute if I’m on a panel with Naveed he’s going to ask for free tools. With all due respect, that’s not a business we’re in. But in our core EDA business, we spend a lot of time on engineering efficiency. You will see a set of capabilities from Cadence that will address that. But if you can get an engineer to do a 180nm design in a day, we should spin off a business.
Sherwani: The tools are focused on efficiency, but not on whether you can do designs in a day. If you can do a design in a day, you still have to verify that.
Ng: The point Naveed is raising is a cost issue. Whether it’s at the leading edge or older nodes, cost is in the purview of whether it even makes sense to do the design. Years ago when I was at Cadence we had a seven-day design goal. EDA hasn’t always looked at driving cost and efficiency.
Kapoor: First and foremost, EDA is about density and automation. The second part is that you have to figure out how the economics of the whole industry work. If at 40nm we spend $600 million putting together technology, to have anyone design at that node you need sufficient volume to get an acceptable return. And at 28nm it’s $1 billion and at 20nm $1.5 billion. You have to recognize that everyone can’t have an apps processor. That’s not going to happen. Just like there are limits to technology, there are limits to economics. If that’s what it will take us to put into it as part of the broader industry, that’s what we’re going to have to bear.
Ng: Do you think EDA has been driven to the same level of efficiency as other parts of the supply chain?
Kapoor: That’s not a fair question and here’s why: The way the business model for the tools piece works is different than for the semiconductor manufacturing. In the long term, if we bring on additional services will we look to be more in line with other parts of the supply chain, including Naveed’s business? Absolutely.
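To make Kapoor's volume argument concrete, here is a minimal sketch of the break-even arithmetic behind it. The investment figures are the ones he quotes for 40nm, 28nm and 20nm; the $5 margin per unit is a purely hypothetical assumption chosen for illustration, and the calculation ignores time value of money and everything else a real return model would include.

```python
# Technology investment per node in US dollars, as quoted by Kapoor in the panel.
INVESTMENT_USD = {"40nm": 600e6, "28nm": 1.0e9, "20nm": 1.5e9}

# Hypothetical margin earned per unit shipped (an assumption, not from the panel).
ASSUMED_MARGIN_USD = 5.0


def break_even_units(investment_usd: float, margin_per_unit_usd: float) -> float:
    """Units that must ship before cumulative margin equals the investment."""
    if margin_per_unit_usd <= 0:
        raise ValueError("margin per unit must be positive")
    return investment_usd / margin_per_unit_usd


units_by_node = {
    node: break_even_units(cost, ASSUMED_MARGIN_USD)
    for node, cost in INVESTMENT_USD.items()
}

for node, units in units_by_node.items():
    print(f"{node}: {units / 1e6:.0f} million units to recover the investment")
```

Whatever margin is assumed, the required volume scales linearly with the investment, so the 20nm figure needs 2.5 times the volume of the 40nm figure to reach the same point.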

LPE: If power was not an issue at 180nm in the past, why is it such a big issue now?
Heinlein: Because the bar always moves. People are looking at applications that require low power much more than before. We also need to have power management ICs alongside other chips. And there’s a question of using the right hammer to solve a problem. The bar is different than it used to be.
Lukanc: The mix of things you put in a chip is different. There are mixed signal and power management. You can get a 40-volt PCB process at 0.25 microns. Now 30-volt processes are available at 0.13. You can mix things together and keep mask costs relatively low. Time to market is shorter, investment is lower and it requires fewer people.

LPE: What does the ecosystem look like with more foundries at older nodes?
Sherwani: These are not like TSMC or GlobalFoundries. They have their own IP houses or in-house IP.
Heinlein: That’s correct. These are companies that are very comfortable in their niche markets. That said, we are starting to witness sea changes in areas such as embedded microcontrollers, driven by the so-called Internet of things. That’s going to drive people to put microcontrollers and processors in places where they’ve never been before. Enabling modern software development and EDA development allows you to do more.
Kapoor: If you’re talking about a transducer or something like that, you’ll have to integrate the increasing analog and mixed signal capability with the digital capability. A mature node makes perfect sense. What you have to learn is what you need from the EDA side all the way to the manufacturing side.

Experts At The Table: Retrofitting Older Process Nodes

Thursday, September 8th, 2011

By Ed Sperling
Low-Power Engineering sat down with Walter Ng, vice president of the IP ecosystem at GlobalFoundries; Vishal Kapoor, vice president of marketing for SoC realization at Cadence; Naveed Sherwani, CEO of Open-Silicon; John Heinlein, vice president of marketing at ARM; and Jeff Lukanc, director of engineering at IDT. What follows are excerpts of that conversation, which was held in front of a live audience at the Global Technology Conference in Santa Clara, Calif.

LPE: What is the definition of a mainstream process node these days and why are older nodes so important?
Heinlein: We’re thinking of mainstream as 55nm and older. That’s where a lot of the high volume is. Even though it’s sexy to talk about the leading edge, last year about 75% of ARM’s royalties came from cores that were developed in 2006 and earlier. About 3 million of the 6 million cores we shipped were ARM 7.
Ng: From a manufacturing standpoint, the volumes are at 65nm. From that node it’s moving to 55nm and 40nm, but that’s still the bulk of the industry. A lot of companies are doing some very cool things that are very relevant today at those nodes. Even with some of the biggest companies, a lot of the volume is at 65nm. It’s what pays the bills. If you have 200mm capacity, those fabs are completely depreciated.

LPE: How about for the tools? Does the mainstream part of the market really pay the bills?
Kapoor: From an EDA perspective, 65nm pays the bills as much as 28nm and 20nm.

LPE: Is everything still following Moore’s Law? If a company is designing at 65nm, does it necessarily move to the next node?
Sherwani: We look at everything from networking to consumer applications. Some customers need the latest technology. But there are others who are at 0.18 (microns) and thinking about 0.13, and maybe they don’t need to go there. The velocity of that move is segment-specific.
Lukanc: The mainstream for production is 0.13, but a lot of the new designs are ramping to 65nm. We’re looking at older technology and combining new things through integration. There may be a call management IC with a 30-volt option at 0.13 or 0.18, which allows the unique combination of analog and digital management on one chip. We can re-use some of the older technologies.

LPE: There’s a lot of investment in older processes these days. Why?
Sherwani: I visited about 10 fabs in China and I was surprised that none of them had 65nm processes. Most didn’t even have 90nm processes.
Ng: If you look at what’s driving a lot of technology today, it’s the consumer market. And that’s very cost-conscious. If you can’t take advantage of the latest technology, then you look at where your given application makes sense. Cost is very much a factor that customers consider at each process node. And for us, we have to find ways to keep investments in fabs relevant to our customers. We have a big focus on high voltage and power management. We have to find ways to add value on top of baseline logic, which is a commodity at this point.
Heinlein: If you look at smart phones, everyone is always focused on the processor and the high-end chip. But alongside those are the power management controllers and display drivers and RF/mixed signal. Another area for derivative value-added processes that Walter (Ng) mentioned is low leakage. When you get to 65nm leakage is a problem. There are ultra-low leakage variants and high-voltage variants coming out at the high end and the low end, so people can put those into applications that can run on a coin-cell battery for 10 years. To complement that there are ultra-dense libraries that bring the cost and the leakage down and which are suited very well to these kinds of applications.
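As a rough illustration of why leakage dominates the coin-cell case Heinlein describes, the sketch below works out the average current a design can draw if one cell has to last 10 years. The 225 mAh capacity is an assumed value for a typical CR2032-class cell, not a number from the panel, and battery self-discharge is ignored.

```python
HOURS_PER_YEAR = 24 * 365  # leap days ignored for simplicity


def average_current_budget_ua(capacity_mah: float, years: float) -> float:
    """Average current in microamps that drains capacity_mah over the given years."""
    if capacity_mah <= 0 or years <= 0:
        raise ValueError("capacity and lifetime must be positive")
    hours = years * HOURS_PER_YEAR
    return capacity_mah / hours * 1000.0  # convert mA to uA


# Assumed capacity (not from the panel): roughly a CR2032-class coin cell.
budget_ua = average_current_budget_ua(capacity_mah=225.0, years=10.0)
print(f"Average current budget: {budget_ua:.2f} uA")
```

Under these assumptions the whole chip, active and asleep, has to average only a few microamps, which is why low-leakage process variants and libraries matter at these nodes.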

LPE: If you develop a chip at 180nm and the process changes to low leakage or low power, does it yield the same?
Ng: The strategy in developing these new processes or modules on top of derivatives is to preserve the investment that was made earlier. It takes advantage of the proven solutions that are already there. When we originally developed those processes, at that time they were leading-edge processes. As you get much more volume using those processes, the manufacturing window becomes quite tight. You could probably tighten up the bit cells. But it’s a business tradeoff whether you re-invest in that or not. The yields are just as good.

LPE: What happens to the tools and the IP that was developed?
Heinlein: For the most part it all works. If you think about 180nm, nobody cared about leakage because it wasn’t an issue. Now, when people look at 180nm, they do care about leakage and power management. So we’re putting that back into 180nm.
Kapoor: The innovation at the leading nodes is going to drive benefits at the older nodes. You drive it back in terms of products, but you also drive it back in terms of design techniques. We developed a 28nm PHY, and we were challenged to do it differently because it’s for a leading node. Today we’re applying what we’ve learned back to 40nm and 65nm.
Lukanc: The best tools are developed at the leading nodes, but you may want to characterize older libraries for low power and power management.

LPE: If you improve an existing technology at an older node, can you charge more for it?
Lukanc: Yes. In general, what we’re offering is value-added solutions. In some cases we offer value-added solutions that are low power.

LPE: Will it be essential for older processes to be updated when we get into stacked die as a way of decreasing the overall power budgets and physical effects?
Sherwani: The answer is different for each area. There is no single, simple answer to that.
Kapoor: For a long time our industry has looked at the technology piece rather than the economics. The answer is, it depends. Can you get more value out of an older node? Yes. The economics will drive the longevity of nodes and what you can get out of them. But we cannot talk about the value of older nodes unless we invest in the newer ones.
Lukanc: If you have an existing product, you can look at the option of integrating oscillators or an EEPROM or something else on top of it to reduce the system cost. There are a lot of things you can do in a package to reduce the overall cost, but you have to look at the total system cost. You may be offering a smaller footprint to the customer, but they may not be getting value out of that.
Heinlein: If you look at mixed signal and RF design at the leading-edge nodes, it’s really tough to get the transistor variation to be complementary to the analog. There’s a point at which it’s too hard, and in that case a heterogeneous 3D package makes sense.
Kapoor: With 3D ICs there’s a technical capability about whether you can marry different die. But you also have to look at it from a system capability. When you look at tablets, where the SoCs are talking in very high bandwidth to memory, that makes sense. The technology by itself won’t be an answer. You need to find out where it makes sense to use it.

LPE: Is investment in older process nodes an arms race that favors the big foundries?
Sherwani: The specialty foundries being built in places like China have nothing to do with companies like GlobalFoundries and TSMC. They will ship a lot of silicon. Over the next 10 years a lot of the analog silicon will be shipping out of China using all older nodes.
Kapoor: Those boutique fabs are certainly making investments in areas in which they specialize.
Ng: You have to continue to make fabs relevant and to drive a good margin. A big impetus for us in developing modules on top of our processes is that you do get the second- and third-tier foundries coming in and taking the floor out of the base logic price. That’s difficult for us to compete with. So we’re looking at where to add value and how to win a good percentage of market share. We have our investments in 200mm. We will continue to invest there.
Heinlein: We definitely see lots of specialty processes at the smaller players. We work with them and enable them. But once it gets to a certain point in the market, then we work with the big players.

LPE: Will it become a battle of who has the deepest pockets?
Sherwani: The good thing about older nodes is that the investment needed is minuscule compared with the tens of billions of dollars at advanced nodes. A lot more players can be relevant at older nodes. At 14nm I don’t think there will be more than three or four players.
Ng: The incremental investment to bring up these value-added modules is nothing compared to the investment at the leading edge. The other side is that the equipment manufacturers are a leading component of the cost at the leading edge. At the mature nodes, you’re not buying a lot of new, expensive tooling.
Lukanc: That happens on the product development side, as well. To do a 100 million-gate design requires a certain amount of tools and people and mask costs. At the older technologies mask costs are quite cheap. And if you’re re-using technology and adding to it, you can keep NRE low so return on investment is quite high. You need to take advantage of mainstream older nodes as well as more aggressive nodes.
Ng: And most times our relationships with most of the leading-edge companies span multiple nodes.
Kapoor: At 14nm there are 5 or 10 customers. As a foundry, you have to worry about how you’re going to get the rest of the industry in. The economics even for the companies that can afford it aren’t that great. So you’re going to see continued innovation even at the older nodes.
Ng: A major part of the foundries’ concern is up and down the supply chain. It’s not just the fabs. It’s the tools, the support for IP providers, and packaging solutions. That’s a challenge we have to address as an industry.

Experts At The Table: Multi-Core And Many-Core

Thursday, August 11th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: How does cloud computing change the need for multicore and many-core processors?
Sherwani: Cloud architectures will evolve differently from mobile architectures. They will be homogeneous 8-, 16- and 32-core architectures. They know a lot about what you are storing. You can put a lot of intelligence into what you’re storing, which is not the case in a mobile device.

LPE: So what does that mean for the mobile devices taking advantage of it?
Sherwani: It can certainly make mobile devices more efficient. You can store a lot more on the mobile devices. You can do a lot of streaming.
Martin: The application cloud interaction may change in character. People will write somewhat different apps in the future that will take advantage of what the cloud has to offer. This is why you’ll see cobwebs on the desktop in the future because no one is very interested in it anymore.
Sherwani: And if you look at video, with the cloud and a good wireless connection you don’t have to store the video. Video cameras will become a lot less expensive.
McDermott: This should be put into context. It’s amazing that people are so excited about a database. That’s all it is. I believe the vision for the mobile device is that you have access to all the data, and you selectively choose how to expose it. The browsing experience is different. You don’t try to replicate the desktop experience on a smaller screen. It’s a given. You take the appropriate content and you display it in a way that’s easiest to digest. I think the hardware on the mobile device will become smart enough to selectively show you the piece that you need on your mobile device. You don’t need an entire map. You just need to know where you are.

LPE: What’s interesting about databases, though, is that they’re one of the very few applications that really can do true parallel processing and scale effectively.
Sherwani: I’ve been saying for the last two years that we should stop giving people content. In five years all the content will be available. If you’re a mechanical engineer, everything you need will be on the Web. What we need to do, though, is teach people how to do something useful. This is the same thing with mobile devices. Whatever device will be useful will be the one that can quickly filter through what you’re looking for to get something done. It’s not about storing more information. Cloud brings that opportunity to people, devices and things. Our view of expertise will change. It won’t matter if you’re an electrical engineer. It’s whether you can get a task or series of tasks done. That will be more important than a Ph.D. We are 10 years from that, but this is how people of the next generation will think.

LPE: What you’re talking about is data mining for the masses?
Sherwani: Yes.
Martin: Before we get too carried away, there are a couple of issues that really need to be solved in this cloud paradigm. We do need to think a lot about privacy, security, and the ability of the infrastructure—both wired and wireless—to deliver all of this content off the cloud and onto the sea of mobile devices. We all know about the experiences of certain smart phones overloading networks and they’re still trying to improve the quality of the network. The wired infrastructure is not fault free. Security and privacy worry me more. If you upload all your data into some big infrastructure, you want your data secured.
Rohatgi: That’s the weakest link. Everybody’s pushing down this path. What worries me is the security and reliability. There are a ton of issues that need to be resolved. Creating a smart infrastructure for data mining can be done today. On the mobile side, there are probably some advances necessary to improve battery life, which is the No. 1 complaint I hear today. But the weakest links we hit are the communications channel, security, privacy and reliability. If those can be resolved then we can progress.
Martin: The technologies we’re all involved with are going to help in a big way. It just requires a bit of mobilization to focus on those issues.
McDermott: This reminds me of where we were with cell phones years ago when the processor went through certification with the carrier. The consumer doesn’t see all the certification on the network. The carrier loves new features. It’s more traffic for their store. It brings in a new wave of users. What they don’t want to see is something that disrupts their infrastructure. For the engineer, the certification is really intense and the field trials are difficult. The cell phone industry has to show a partition that you can certify your baseband and your protocol stack and that has to be isolated from other activity. That underlying security infrastructure is built into the certification. I think we’ll see that extended upward through commercial transactions to having trusted processes and transactions.

LPE: Will cores all be homogeneous or heterogeneous, and will some of them be virtualized?
Sherwani: All of the above. There will be homogeneous cores, heterogeneous cores and there will be virtualization. They all solve different problems. You need virtualization in data centers.

LPE: But will you need virtualization on your smart phone?
Rohatgi: We’re starting to see some of that. I don’t think the operating system wars are dead. And at the end of the day, there is some value to keeping RTOS access to legacy hardware and a high-level operating system like Android or Windows or iOS. From a security angle, it all depends on the use case. The mobile guys are really scared of virtualization of a single processor that has access to all memory. They want separate memory and separate everything.

LPE: This is similar to devices that have a partition between what’s used at home and at the office, right?
Rohatgi: Yes. It’s the same problem. And this almost ties into virtualization. On the privacy side, there isn’t a well-defined security layer with NFC (near field communication) and they’re talking about mobile payments. If you power on an Android phone and shut off all networking then your maps go haywire. Why? Because there’s a back channel that goes to some cloud that helps triangulate where you are. That information is stored to help applications of the future. I’m surprised people aren’t bothered by this. But to return to the question, we’re starting to see some effort down the path of virtualization even though it’s not widespread yet.
Martin: You won’t see virtualization down to the metal. In the dataplane layers it’s nice that processors can emulate other processors effectively, but close to the metal you want extreme efficiency and high performance.
Neifert: And that’s where I see the problem with virtualization. It’s the power. Virtualization is nice, but it’s an abstraction away, which is a power loss. At that point you need heterogeneous processing.
Rohatgi: Transmeta, about nine years ago when they started doing abstractions to hardware, had power numbers that were way down. It’s too bad that green energy wasn’t something that was important then. Still, the genesis of the Atom processor was entirely because of Transmeta.
Sherwani: A typical Bluetooth radio takes about 32 milliwatts of active power. At 65nm we have a Bluetooth radio that only uses 3.2 milliwatts. And there is a design on the board that will take it below 1 milliwatt. There are a bunch of engineers getting excited because over the last 100 years the basic design of a radio has not changed. What Marconi designed is essentially the same as we have today. But when you scale down the power needs to go down. It’s amazing how much lower you can go.
Rohatgi: There’s the other side of this, too. Battery technology has not evolved as much as we would like. For the analog components, it’s the switching characteristics that are governing it. That’s where you’re seeing a lot more intelligence. If you were to look at the power profiles of a mobile device, LEDs and LCDs were supposed to be the promise for low power. That hasn’t worked out. There are still 250 milliwatt drivers. The radio is probably No. 2 on the list after that.
McDermott: People’s expectations were that a screen would be a certain pixel density. Today that needs to be super high-definition. It’s beyond high-def.
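The power figures quoted above can be put side by side with a few lines of arithmetic. This sketch takes Sherwani's Bluetooth radio numbers (32 milliwatts typical, 3.2 milliwatts at 65nm, and 1 milliwatt as an upper bound for the planned design) and Rohatgi's 250 milliwatt display driver figure, and computes the radio's share of those two loads combined. Treating the display driver and the radio as the only two loads is a simplifying assumption made here for illustration; a real device has many more.

```python
# Display driver power in milliwatts, as quoted by Rohatgi.
DISPLAY_DRIVER_MW = 250.0

# Bluetooth radio active power in milliwatts, as quoted by Sherwani.
# The last entry uses 1.0 mW as an upper bound for the "below 1 milliwatt" design.
RADIO_MW = {
    "typical": 32.0,
    "65nm design": 3.2,
    "planned (upper bound)": 1.0,
}


def radio_share(radio_mw: float, display_mw: float = DISPLAY_DRIVER_MW) -> float:
    """Fraction of the combined display-plus-radio power drawn by the radio."""
    return radio_mw / (radio_mw + display_mw)


shares = {name: radio_share(mw) for name, mw in RADIO_MW.items()}

for name, share in shares.items():
    print(f"{name}: radio is {share:.1%} of the combined budget")
```

Under this simplification the radio falls from roughly a tenth of the combined draw to about one percent, which is consistent with Rohatgi's point that the display side remains the larger problem.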

LPE: So will we see more cores in the future or have we maxed out?
McDermott: As a programmer, how are you going to keep track of 100 cores? How are you going to program that intelligently? Either it’s going to be some array a programmer can visualize, or it’s going to be three or four very solid cores and let other cores do things like Bluetooth. You can’t keep 100 threads in your mind.
Rohatgi: There’s a limit to this. If you look at the desktop space, in 2006 when Intel began heading out on this multicore approach they found that success wasn’t nearly as fast as they thought. There’s probably a limit on mobile devices, too.
Sherwani: We did all this in the 1980s. nCube used to have a 16-core and 32-core machine. It works great up to 8 cores, but after that you lose it.
Martin: If you are trying to program a concurrent application and split it into different threads, there are inherent limits. Some very specialized applications may be very concurrent, but most are not.
Neifert: The programming model has a human in the center, and humans can only process so much. Until the fundamental programming model changes, you won’t see much advancement.
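The inherent limit Martin mentions is commonly modeled with Amdahl's law, which the panelists do not name but which matches the diminishing returns Sherwani describes beyond 8 cores. The sketch below assumes a hypothetical application in which 90% of the work can run in parallel; that fraction is an illustrative assumption, not a figure from the panel.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 90% of the work parallelizes (an assumption).
PARALLEL_FRACTION = 0.9

speedups = {n: amdahl_speedup(PARALLEL_FRACTION, n) for n in (1, 8, 16, 32, 100)}

for n, s in speedups.items():
    print(f"{n:>3} cores: {s:.2f}x speedup")
```

With this assumed fraction, no number of cores can exceed a 10x speedup, because the serial 10% of the work sets the ceiling. Each doubling of core count past 8 buys progressively less.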

Experts At The Table: Multi-Core And Many-Core

Friday, July 29th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Is software taking advantage of the hardware in a power-efficient way?
Rohatgi: Yes, and the ultimate example of that is the Android operating system. Even though it relies on Linux, there are on-demand and five levels built into Linux that control, at the software level, the CPU registers or SoC registers to shut down power. You’re already seeing that at the operating-system level.
Martin: It depends upon which software you’re talking about. At the OS level, where lots of apps are running, there may be commoditization happening. Down at the dataplane, where people use application-specific processors, you can argue that’s the infrastructure. People want extreme power efficiency and reliable continuously executing functionality. That’s the place where heterogeneous multiple processors really shine. It’s almost an infrastructure layer in a mobile device. So you see different solutions depending on what level of the device you’re talking about. We see a drive to more heterogeneity, too. Baseband wireless infrastructure works better with heterogeneous processors than trying to shove that onto a multicore device.
Neifert: That’s certainly what we’re seeing in our customer base. They want one processor to run the modem subsystem or the WiFi and partition that off. The last thing you want to do is wake the application processor all the time. The application processors are getting more complex so you can talk and play games at the same time and surf the Web. The application processor has to handle all of that. The application processor may be power efficient, but not as power efficient as one that just runs the radio or data transfer.

LPE: Is it better to actually design a device with multiple processors or a single multicore processor?
Sherwani: When I was at Intel we believed it was the best processor ever developed. I never thought I would see ARM and x86 processors on the same device. We are not that far away right now—and I’m talking about having them on a single chip. Or it may be a MIPS or Tensilica core. Such processors will exist. We are very efficient these days about using power islands. We can put six or eight processors on a chip and we can put them to sleep when they’re not being used.

LPE: Is it more difficult to verify them?
Sherwani: The verification nightmare is growing exponentially, and it’s not clear to me how we will be doing verification five years from now. At the implementation level, verification is becoming a bigger and bigger piece. But it’s more of an architecture question than whether you’re using multicore or many cores.
Martin: This whole approach tends to lead to a more compositional design style where you’re composing well-understood systems. What you need to do is limit the interactions between them to a relatively high level of abstraction or control. You verify significantly each subsystem and then you verify without having a great deal of interaction between the subsystems.
Sherwani: It’s amazing that on a big chip people don’t do flop-to-flop timing on a block. This is a situation that would never happen in software between subroutines, but it happens all the time in hardware. In hardware we have not reached a maturity level where I take care of my block and you take care of your block. We have timing paths going to two blocks and you cannot time it unless you do the timing and verification together.
Neifert: I’ve got customers that will spend months validating their processor, fabric, memory and data path, throwing out all the various options on there and running that. That could be a single-core processor reaching out to memory, and they’ll spend a lot of time optimizing that. Now throw in one other master accessing the same memory and everything goes out the window because of all the different permutations when these things talk to each other. It now blows up exponentially. The nice thing about a multicore approach is that you’ve handed off a lot of that task to the processor guys and hope that they’ve done it properly. It may not be the optimal use for your application, but pushing the problem off to an IP provider and a multicore solution is what a lot of our customers are doing.

LPE: What’s the best way to take advantage of cores? Do you do it with Wide I/O or through multicore and a standard bus?
Sherwani: If you look at where Micron is going with this, the whole interface has been changed. The memory becomes a lot more intelligent instead of dumb storage. You will be able to ask memory to do certain tasks. Processor people have tried to make memory as dumb as possible in order to commoditize it. All the value comes from the processor side. But balancing would be better so you can offload things. You can combine flash into the most cost-effective memory. Instead of saying, ‘Give me byte No. 7,’ you can say, ‘I need this piece of information.’ It’s a lot more power-efficient to do it that way.
McDermott: It’s quality of service. You’re not just making a data request. You’re saying, ‘I need high bandwidth or high efficiency or low latency.’ A processor may need only a small amount of data, but it may need it very efficiently and very fast. With video you need high bandwidth that is very predictable. Having graphics integrated is one way to go. Unless you have a view of the fabric, the quality of service and the end power engine it’s going to be very hard to engineer a one-point solution.
Martin: With a compositional approach, you may have big memories and then a lot of small distributed memories to keep data close to the area where it is being processed. And maybe you need some intelligent abstractions on things like DMA (direct memory access). That would give programmers more assistance in managing the data flow and data interaction so things will move out of central memory into local memory before they’re needed. That’s a different programming style. We need more flexibility in how hardware and software developers can compose these memory systems together.
Sherwani: If memory is knowledgeable about what is stored inside, it can give you service of the highest level. Right now you can’t do that. The attitude has been, ‘I have a board and I have a DIMM and I want this DIMM to be as low cost as possible.’ That approach has led us down this path. If you’re designing a microprocessor of any kind, it puts a lot of burden on the microprocessor to do all these things with memory. Eventually you will see memory microprocessors—storage with a processor on it—that can gate what is being stored on it. That is a new area, though, and I don’t think much has been done so far.
Rohatgi: In some respects this is already happening. If you think about cache controllers over the last 30 years, this is where you’ve seen a massive improvement. It isn’t user-level aware. It’s bit-level aware. And if your memory isn’t fragmented it works. Or in a multicore design, a coherency module is also very well aware of what it needs to do to keep synchronization between processors. I like the visionary statement of making it user-focused.
Neifert: If you look at the various SoCs on the market, they may use processors from ARM, MIPS and Tensilica, but a large number of them are still doing their own memory controllers because that’s a place to differentiate their design. There are more memory controllers coming out of Synopsys and Cadence, but in large part the bleeding-edge SoCs are still designing their own.
Sherwani: But you can go a lot further.
McDermott: There’s a big difference if you can optimize a path for video and have some pre-fetch algorithm. That may not apply to every chip. But in a custom design, you can partition as needed. When you define your coherency space you need to make them aware of these choices. It’s not just an arbitrary memory spec. You need to make them aware of how to use it.
Martin: That should lead to some opportunities for much more sophisticated memory control, and the kinds of data flows and accesses that people really want to do. That can be reflected in configurable memory IP. I’m not sure how rapidly that’s happening, but there are moves in that direction.
Sherwani: For the work we are doing with the [Micron] Hybrid Memory Cube, there’s a lot of excitement around that space. A completely different level of system design is possible with that kind of hybrid model.

Experts At The Table: Multi-Core And Many-Core

Thursday, July 21st, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Computers aren’t getting the power/performance boost today from multiple cores because the software can’t take advantage of them. How do we fix that?
Martin: Your computer isn’t a place where all the advanced design techniques are used. You have to look at battery-powered, cordless devices to look at the places where people use the most advanced design techniques. There they very often will have specialized application processors for different parts of the applications they want to run on those devices. Those processors are designed to be energy-efficient and to efficiently use battery power, and they probably do work better from one generation to the next—except for the case where they may throw on additional general purpose processors and don’t take advantage of energy consumption. You have to get a big distinction between multiple processors that are application specific vs. general-purpose processors that do not offer efficiency or better performance.
Rohatgi: Once the Intel-AMD megahertz wars ended people started heading down a different dimension of multicore. Back then they believed that changing the software ecosystem so that specific software or systems could be written to take advantage of multi-core, multi-thread, multiple processor designs would actually work. We’ve seen it work in many cases. You can reduce the latency when you’re executing a certain process or multiple processes. Another twist to this paradigm is people use core islands. The operating system may run on one core while another core is used for acceleration. Some people define that as multi-core, and that has been very successful because you can partition between a media processor engine, a video processor engine and a graphics processor engine. In terms of power consumption, that whole element needs to be pieced into this picture. When it comes to embedded SoC design vs. desktop design, those are very different when it comes to power consumption. That element hasn’t been worked through very cleanly on the desktop side, where suddenly you need 800-watt power supplies.
Neifert: The overall user experience that people have when interacting with a device has moved from the underlying hardware to the software. The emphasis has shifted to enhance the user experience. Opening a window on your desktop used to be simple. Now there’s shading and fancy graphics, so the same window that used to come up in 5 instructions may now take 500. It looks a lot nicer and in some cases that changes the user experience. But from the processing side, the focus stopped being on single-thread performance as the megahertz started burning up too much power. They branched out into multicore to solve that, but changing the software to accommodate that has been a big struggle. Changing the hardware to isolate that properly has been a struggle, too. Some of the processing that has been done on computers is difficult to migrate over to mobile devices. A lot of the innovation on the desktop is now taking place in the embedded space. If you want to see the leading-edge design techniques, that is where you have to look.
McDermott: In the mobile area low power is associated with the battery life and the key to the user experience is maintaining functionality throughout a working day. We’ve gotten to that point. Now we’re engineering more productivity. There are more features you can run, more capabilities, more graphics, but still within that working day. Now what we’re seeing is low power is key to other markets. Data centers are predicted over the next few years to rival the airline industry for energy consumption. Cloud computing will lower the power of a node, but that energy is still being used somewhere even though it’s shifted. What cloud changes is that if you run an application on one device and shift to a different device it’s no big deal. It takes advantage of the underlying computing architecture. There also may be a hierarchy of operating systems to deal with it, depending on the device.
Sherwani: We got very interested in how power relates to multiprocessing. If you are trying to predict power within a watt or two that’s no big deal. If you are trying to predict power within a milliwatt, that’s very difficult. We thought that by looking at implementation of the netlist we could predict power. That turned out to be not the case. Then we tried system-level design. That doesn’t work. We finally came to the conclusion that you have to have a user model. We needed a human model—a businessman, a lawyer, a student—and then analyze what they did during the day. Then we had to convert that into system level and then RTL level. This takes us far from what Open-Silicon does as a company, but we have found this the only way to accurately predict power. These kinds of human models don’t exist. We created two models of two types of people who use it. Then we started recording real human beings and calculating the model against them. Good models don’t exist if you want to accurately predict power.
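
The user-model approach Sherwani describes can be sketched roughly in Python. Everything specific here is a hypothetical placeholder rather than Open-Silicon data: the activity names, the per-activity power figures and the two daily schedules are invented for illustration. The sketch only shows why the same device can produce very different energy estimates depending on who is using it.

```python
# Hypothetical average power draw per activity, in milliwatts (illustrative only).
ACTIVITY_POWER_MW = {
    "standby": 5.0,
    "voice_call": 600.0,
    "email": 350.0,
    "video": 1200.0,
}

# Hypothetical user models: hours spent on each activity over a 24-hour day.
USER_MODELS = {
    "lawyer": {"standby": 19.0, "voice_call": 3.0, "email": 2.0, "video": 0.0},
    "student": {"standby": 18.0, "voice_call": 0.5, "email": 1.5, "video": 4.0},
}


def daily_energy_mwh(schedule):
    """Sum power x time over a day's activities; returns milliwatt-hours."""
    hours = sum(schedule.values())
    if abs(hours - 24.0) > 1e-9:
        raise ValueError(f"schedule covers {hours} hours, expected 24")
    return sum(ACTIVITY_POWER_MW[name] * h for name, h in schedule.items())


if __name__ == "__main__":
    for user, schedule in USER_MODELS.items():
        print(f"{user}: {daily_energy_mwh(schedule):.1f} mWh/day")
```

In a real flow, each activity's power figure would itself come from system-level and RTL-level analysis, which is the conversion step Sherwani mentions.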

LPE: Are we better off with many cores or multiple processors?
Martin: Multiple heterogeneous processors are the way to go, particularly in the mobile domain. With clusters of servers you may have many homogeneous tasks you want to map. The desktop is a bit of the orphan here. If you move to cloud computing and the highly mobile devices and ever-smarter phones, you wonder if people will worry about even having a tethered desktop. That means the innovation may be in the big server farms and the mobile devices, and the desktop may gather dust.
Neifert: It will be replaced by a docking station that you plug your mobile device into.
Martin: That’s right. Or as we have seen, some companies are combining mobile devices and a laptop together. The use cases are extremely interesting because there is no single use case. For a mobile device that has an advanced graphics processor, the game player may burn up battery by hammering that all the time. The music lover may be using MP3 decoding and get significantly longer time out of the battery. That drives significantly different use models and processor choices.
Rohatgi: There are a lot of different vertical markets. It ranges from digital still cameras to anything with a battery. There is a use case for multiple processors. Networking and cloud computing are very large markets. In the embedded space, what has happened is there are a lot of people in the SoC space. The hardware itself is heavily commoditizing. Even the operating system is commoditizing. The differentiation is how you pick and choose your IP. If it comes down to cost in a mobile phone, from the top up they don’t have a feature list or a use model. The discussion begins with, ‘What can you fit in a 7 x 7?’ Based on something like that, what kind of IP can you fit in there and still have a useful device? In the volume mobile phone market, the direction is to shrink the die as small as possible. It may be a 6 x 6 or a 5 x 5. In that case, I would choose multicore rather than multiple processors.
McDermott: In cell phones the issue used to be standby and talk time. People could self control that. If you talk more your battery goes down. People are starting to experience that if you want to play games you have to deal with this. We’re starting to deal with the apps developers. You used to have specialized OSes and applications. With the proliferation of open source you don’t know what could be running on there. It can run any app. We’re reaching out to the app developer to write code that is attentive to the power effects. There is an amazing learning curve through people writing a good game experience in a power budget that’s acceptable. You need to get the apps to be power-efficient.

Power Bits: Why Set-Top Boxes Are Energy Hogs

Thursday, July 21st, 2011

By Ed Sperling
For years, semiconductors have been getting more efficient. Desktop computers that used to peak out at 250 watts are now down to the 30- to 60-watt range. But set-top boxes, those inconspicuous little boxes that connect televisions to services provided by cable companies, can consume even more.

The problem has become bad enough that the Natural Resources Defense Council issued a report last month saying digital video recorders, cable and other pay-TV boxes were costing U.S. consumers $3 billion a year.

So what went wrong? The answer actually has nothing to do with the semiconductors inside the boxes. It’s the back-end systems from the companies that offer pay-TV services—the use model into which chip designers had no visibility.

“The problem is that the MSO (multi-system operator) is querying the boxes regularly, which means they’re also spinning up the hard drives,” said Paolo Masini, principal architect for digital home at MIPS. “Over the long term, this problem will go away because functionality will be absorbed into the residential gateway. But in the short-term—meaning over the next few years—there will be a move of all these services into the cloud. That will offer huge power savings.”

How much savings? The starting target is 70%, and that’s the easy stuff. Add in more power-saving features and it can go significantly higher.

“There is a lot of synergy here with gaming consoles, too,” said Masini. “The companies making these devices have introduced reduced power versions, but they’re only slightly better. They’re now getting a lot of pressure to decrease their energy consumption, as well. The blocks and peripherals on set-top boxes and gaming systems are similar, and they use similar chips.”

Several companies compete in the set-top box chip market. MIPS is the current market leader, but ARM is competing with similar performance and power credentials. Intel has made some inroads, as well, but its primary focus is CPU and graphics performance rather than efficiency.

Power Bits: July 8

Friday, July 8th, 2011

By Ed Sperling

Computing is about to get smart—or at least make a bold attempt in that direction. The Universities of Manchester and Southampton in the U.K., the Engineering and Physical Sciences Research Council, and ARM and Silistix, are trying to develop a computer that mimics how nerve cells in the brain interact.

What’s interesting about this, at least from a chip standpoint, is that brains are many times more energy efficient than computers even though they are imperfect and use relatively slow components. But they do work well asynchronously, which is something that big computers have never mastered. Hopefully they’ve got some software engineers involved in this project, as well.

Along the same lines, The Institute of Physics reports that UC Berkeley researchers have developed memory devices that operate at close to the Landauer limit of minimum energy consumption because there are no moving parts. For physicists, that means no moving electrons. Rolf Landauer, incidentally, said that computation doesn’t require a minimum amount of energy but erasing information does.
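
For a sense of scale, the Landauer limit for erasing one bit is k_B × T × ln 2, where k_B is the Boltzmann constant and T is the absolute temperature. The short Python sketch below evaluates it. The 300 K room-temperature figure is an assumption chosen for illustration, not a number from the Berkeley work.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def landauer_limit_joules(temperature_k):
    """Minimum energy dissipated when erasing one bit at the given temperature."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


if __name__ == "__main__":
    # 300 K is an assumed room temperature.
    print(f"{landauer_limit_joules(300.0):.3e} J per erased bit at 300 K")
```

At that assumed temperature the limit works out to a little under 3 zeptojoules per bit.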

5 Ways To Cut Power

Thursday, June 16th, 2011

By Ed Sperling
Low energy consumption with minimal leakage has emerged as the most competitive element in an IC design, regardless of whether it involves a plug, a battery, or whether it’s powered by a gasoline engine.

While components on an SoC aren’t always power-aware, they’ll have to be in the future as consumers focus first on energy efficiency. With rising fuel costs, a concern over global warming and a steady reminder that smart phones have to be plugged in every night, car companies are shifting their strategy from efficient hybrids to even more efficient plug-in hybrids and electric vehicles, and California has gone so far as to mandate that one-third of all electricity sold in the state by the end of 2020 must come from renewable sources.

This shift in public awareness hasn’t been lost on the chip industry, which has been rolling out some very complex advances well ahead of schedule. Here are some of the most important:

Clouds
The push toward a cloud-based infrastructure is a way of centralizing computing—basically a return to the time-sharing model once perfected by the mainframe and then re-distributed with the advent of the commodity PC server. The data processing world is re-aggregating, but this time with a difference. It’s not just that the computing is being centralized. It’s that the centralization is taking place in proximity of cheap power sources such as hydroelectric power, nuclear plants (for now) and wind farms.

“Cloud leads to big efficiency gains,” said Chris Rowen, chief technology officer at Tensilica. “Now you can put the computing farm where the energy is available. It’s an arbitrage opportunity. It’s not hard to ship bits when you compare that to the difficulty in transporting electricity.”

There’s a clear business case to be made on this front. An estimated 6.5% of electricity is lost in transmission, according to the U.S. Energy Information Administration. That may not seem like a lot until you consider those are high-voltage transmission lines. Bits are cheap, in comparison—even trillions of them—which is why there is talk now of centralizing portions of even base stations. Those parts that do intensive computation with a high degree of redundancy are prime candidates for being located in a data center.

“There’s a lot of computation needed to reduce noise and create a clean signal,” said Rowen. “But there’s also some computing that has to be done locally because there are tough latency requirements.”

Adaptive Body Biasing
Adaptive body biasing has been under serious discussion for the past five years as a way of reducing current leakage by controlling a device’s body voltage, which in turn increases the threshold voltage. The big advantage here is less switching to the off state. The downside is this has been difficult stuff to design and manufacture.

“This was not seen as a mainstream approach, but now it’s showing up almost everywhere,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “This was seen as a challenging technique to implement, but now TI and Samsung are using it. If you change the body bias voltage, you impact the threshold voltage. You can increase or decrease leakage, as needed, and boost performance.”

Consultant Bhanu Kapoor, president of Mimasic, noted that for some high-performance applications the alternatives such as power gating may be impractical because it simply takes too long to turn on and off sections of a chip. In those cases, body biasing is the only choice.
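
The link Sarkar describes between body bias, threshold voltage and leakage can be illustrated with two first-order textbook models: the body-effect equation for threshold shift and an exponential subthreshold-leakage model. This Python sketch is illustrative only. The body-effect coefficient, surface potential and subthreshold slope factor are assumed values, not figures from TI, Samsung or Apache, and real devices at advanced nodes deviate from these simple formulas.

```python
import math

# Assumed, illustrative device parameters (not from any real process).
GAMMA = 0.4              # body-effect coefficient, V^0.5
TWO_PHI_F = 0.8          # surface potential 2*phi_F, volts
SLOPE_FACTOR = 1.5       # subthreshold slope factor n
THERMAL_VOLTAGE = 1.380649e-23 * 300.0 / 1.602176634e-19  # kT/q at 300 K, volts


def threshold_shift(v_sb):
    """Threshold-voltage change for a source-to-body voltage v_sb (volts).

    Positive v_sb is reverse body bias (raises Vth); negative is forward bias.
    """
    if TWO_PHI_F + v_sb < 0:
        raise ValueError("bias too far forward for this simple model")
    return GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def leakage_ratio(v_sb):
    """Subthreshold leakage relative to the zero-bias case."""
    return math.exp(-threshold_shift(v_sb) / (SLOPE_FACTOR * THERMAL_VOLTAGE))


if __name__ == "__main__":
    for bias in (-0.3, 0.0, 0.5):
        print(f"V_SB={bias:+.1f} V: dVth={threshold_shift(bias) * 1000:+.1f} mV, "
              f"leakage x{leakage_ratio(bias):.3f}")
```

Under these assumptions, reverse bias raises the threshold and cuts leakage, while forward bias does the opposite in exchange for speed.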

Atomic-Level Changes
Another technique that has been particularly difficult to master is atomic-level control of channel doping on the manufacturing side. And while most experts don’t expect the process and manufacturing side to offer any huge gains, this one may be the exception.

Scott Thompson, chief technology officer at startup SuVolta, said that by improving the doping technique, both dynamic and static current leakage can be reduced with regular bulk CMOS.

“The problem is that the wall around the channel is leaky and it’s hard to control the shape,” said Thompson. “Strain engineering helps to control the atomic-level analysis. But there has been no other breakthrough other than changing the transistor, and we don’t see a need for that for all architectures.”

At its unveiling last week, SuVolta had lined up support from Fujitsu, Cypress, ARM and Broadcom. The company claims the technology is an alternative to FinFETs, which are more difficult to manufacture.

3D Transistors And Packaging
Nevertheless, the major foundries have committed to building FinFETs at advanced nodes. Intel’s announcement of a Tri-Gate three-dimensional transistor at 22nm has been a major topic in the semiconductor industry. The question is now that Intel has publicly committed to the technology, can it really be manufactured with sufficient yield? And can it be built effectively using the disaggregated foundry model in the near future?

These kinds of questions will remain unanswered at least for the next couple years. TSMC is planning to use FinFETs at 14nm, and GlobalFoundries has been working on the same technology. Nevertheless, the big advantage of FinFET technology is a sharp reduction in leakage while providing a significant performance boost.

Creating stacks of die also has a huge effect on power, in part because the distances between logic and memory can be shortened significantly. A system-in-package version of stacked die, using interposer technology, is expected to begin widespread production over the next 12 to 18 months, bolstered by the new Wide I/O standard that increases the size of the pipes between logic and memory.

New Materials
Fully depleted SOI, silicon on sapphire, as well as new ways of putting them all together in stacks connected by low-cost interposers that can be made of glass have turned into major research efforts as companies seek to knock costs out of the bill of materials for new chips.

While the FD SOI has been well tested for years by the Common Platform participants, the others have only been used on a very limited basis. One approach now being considered is actually designing chips to run hotter rather than trying to keep the power down. While there are limits to this approach—no one wants to pick up a hot phone—there are times when performance is more important than heat.

Taken as a whole, all of these changes can yield a significant reduction in power, particularly when coupled with efficient software code and more customized user controls—and end devices that actually use the power-saving technology that is being built into these chips.

Power Bits: May 27

Friday, May 27th, 2011

By Ed Sperling

Going Vertical
Now that everyone has gotten the energy-efficiency message down pretty well, the next step is to apply that to specific markets. That’s beginning to happen, too.

A leaked product roadmap from AMD shows machines with all-day battery life and a focus on everything from ultra-mobile notebooks to tablets.

Intel is refining its own message to go after specific markets, as well. The company has created a small-business cloud platform on a pay-as-you-go basis. Given the amount of energy consumed by underutilized servers, this is a huge efficiency play—as well as a way of Intel sidestepping the PC OEM for its share of the profits.

Companies such as Tensilica, meanwhile, have been focused heavily on low-power communications, most recently in the LTE and LTE Advanced space. And ARM and MIPS have been divvying up a variety of specific markets. ARM’s focus on mobile devices and a slew of vertical applications ranging from medical devices to other consumer electronics is well documented. Likewise, MIPS has focused on set-top boxes and Android-based devices.

Lowering Carbon Dioxide
The International Energy Agency issued a report today that carbon dioxide emissions must be eliminated from electricity generation to limit the rise of global temperature to 2 degrees Celsius.

The report noted that total output of electricity and heat grew 55% between 1990 and 2008, but corresponding CO2 emissions grew 64.5% in the same period. The report recommends greater efficiency in lighting, heating, cooling and information technology, and powering with renewable sources of energy, nuclear, and carbon capture and storage.
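
The two growth figures above imply that generation became more carbon-intensive over the period, because emissions grew faster than output. This small Python sketch works through the arithmetic. It uses only the 55% and 64.5% figures quoted from the report, and the function name is illustrative.

```python
def intensity_change(output_growth, emissions_growth):
    """Fractional change in emissions per unit of output.

    Both arguments are fractional growth rates, e.g. 0.55 for 55%.
    """
    if output_growth <= -1.0:
        raise ValueError("output cannot shrink by 100% or more")
    return (1.0 + emissions_growth) / (1.0 + output_growth) - 1.0


if __name__ == "__main__":
    change = intensity_change(0.55, 0.645)
    print(f"CO2 per unit of electricity and heat changed by {change:+.1%}")
```

On those numbers, CO2 emitted per unit of electricity and heat rose by roughly 6% between 1990 and 2008.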

This is good news for the electronics industry, in general, and the low-power engineering portion in particular.

Power Bits: May 6

Friday, May 6th, 2011

By Ed Sperling

The Other 3D
Intel will roll out processors using tri-gate finFET transistors at 22nm, which it says will sharply lower the operating voltage, boost performance and reduce leakage.

Multigate transistors have been the subject of research for decades, most prominently at UC Berkeley, because they can be used to reduce current leakage and increase density. Going vertical allows more transistors to be loaded onto a piece of silicon, which in the case of a processor is particularly important because more transistors can translate into better performance.

Intel claims the new structures will improve performance by 37% at low voltages. The company said that makes it ideal for small handheld devices, a market where Intel has not done very well in the past primarily because its chips are considered power hogs next to those using ARM and MIPS cores. That statement alone caused ARM’s stock to plunge 7% as speculation mounted that Intel could replace ARM cores inside of some Apple devices. This is pure speculation, of course. Apple never talks about that stuff and Intel hasn’t even intimated that. ARM’s stock recovered rather quickly, too.

Still, most companies have shied away from finFETs because they are extremely difficult to manufacture and potentially can add to the design and manufacturing cost. Intel’s big advantage in this regard is that it still owns its own fabs and develops its own manufacturing process, something that is far too costly for all but a handful of chipmakers.

An alternative to 3D structures is ultra-thin body silicon on insulator, which is now being tested by IBM, STMicroelectronics, Soitec and GlobalFoundries. And there is a possibility of mixing things up to include both. But the writing is on the wall—big changes are ahead, and Intel’s move is a first big step in that direction.

TI Pushes FRAM
Microcontrollers have been used for years to reduce power in devices through such developments as multispeed motor control and intelligent sensors, but the real battle of late has been inside the microcontrollers themselves. Companies in this sector have been playing leapfrog with power numbers taking priority over performance increases.

TI’s latest rollout includes an ultra-low-power FRAM, or ferroelectric RAM (previously written as FeRAM). This type of RAM uses 250 times less power than EEPROM-based microcontrollers, according to TI, and can be written 100 times faster. FRAM is not a new technology. It was developed in the 1990s by Ramtron, and has been manufactured by Fujitsu for more than a decade.

Apparently major strides have been made in the pricing of this technology since then. TI’s microcontroller is priced at $1.20.
