Posts Tagged ‘Open Silicon’

3D DRAM Makers Inch Closer To Production

Thursday, December 1st, 2011

By Mark LaPedus
DRAM makers have been developing 3D memory chips for years, but commercial products are still some way off because of technical and cost issues.

But the advent of the 3D DRAM era could be near the turning point, as two memory rivals have separately moved to bring their respective technologies closer to production. In one move, Micron Technology Inc. has disclosed the manufacturing flow for its recently announced Hybrid Memory Cube (HMC) technology, a 3D DRAM scheme geared for high-end servers and networking systems. Under the plan, IBM will manufacture the controller logic portions of the HMC within its own fab. Micron will make the memory portions, and will assemble and test the HMC devices, within its own operations.

On another and more surprising front, Japanese DRAM maker Elpida Memory apparently has beaten its larger rivals to the punch by announcing the industry’s first commercial Wide I/O DRAMs. The first device from Elpida, dubbed Wide IO Mobile RAM, is a 4 Gbit device based on a 30nm process technology and a 3D structure using through-silicon vias (TSVs). Elpida plans to sample its first Wide I/O DRAM devices this month. The devices are geared for next-generation smartphones and tablets.

Samsung Electronics Co. Ltd. and Hynix Semiconductor Inc. are also separately developing 3D DRAMs. The idea behind a 3D device is to stack existing die and connect them using TSVs, thereby lowering the interconnect resistance and boosting bandwidth. But the problems with 3D devices based on TSVs involve cost, technical issues and supply-chain headaches.

“There is a lot of attention and engineering resources being thrown at 3D right now by all DRAM developers, including Samsung, Micron, Elpida, and Hynix,” said Mike Howard, senior principal analyst for DRAM and memory at IHS iSuppli. “Wide I/O has yet to really reach a cost level that makes it competitive and we are likely still a few years away from mass adoption. Elpida may very well have a functioning part in the lab and may be able to produce test samples, but I think we’re still a few years away from this being used in anything but the most premium markets.”

Hank Lai, product planning for memory marketing at Samsung Semiconductor Inc., said Wide I/O DRAMs are not expected to gain traction until sometime in 2013. At present, smart phones and tablets are using plain-vanilla, low-power DDR3 DRAMs or mobile DRAMs based on the LPDDR2 interface standard. Before Wide I/O, the mobile market will move from LPDDR2 to the next-generation LPDDR3 interface standard, Lai said.
LPDDR2 has a maximum throughput of 8.5 Gbytes/second. LPDDR3 has a peak throughput of 12.8 Gbytes/second. Samsung claims its new LPDDR3 devices consume 20% less power than LPDDR2.

Elpida’s Wide IO Mobile RAM has 512 I/O pins. The device is said to achieve a data transfer rate of 12.8 Gbytes/second, roughly similar to LPDDR3. But Elpida’s Wide IO Mobile RAM has a height of 1.0mm, compared to 1.4mm with existing mobile DRAMs based on today’s package-on-package (PoP) technology.
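As a rough check on these figures, the per-pin data rate implied by Elpida’s numbers can be worked out directly. The sketch below assumes that all 512 I/O pins carry data and that 1 Gbyte means 10^9 bytes; neither assumption is stated by Elpida, and the function name is purely illustrative.

```python
def per_pin_rate_mbps(total_gbytes_per_s: float, data_pins: int) -> float:
    """Return the per-pin data rate in Mbit/s implied by an aggregate bandwidth."""
    total_bits_per_s = total_gbytes_per_s * 1e9 * 8  # Gbytes/s -> bits/s
    return total_bits_per_s / data_pins / 1e6        # bits/s per pin -> Mbit/s


# Figures quoted in the article.
WIDE_IO_GBYTES_PER_S = 12.8
WIDE_IO_PINS = 512  # assumption: every I/O pin is a data pin

wide_io_pin_rate = per_pin_rate_mbps(WIDE_IO_GBYTES_PER_S, WIDE_IO_PINS)
print(f"Implied Wide I/O rate: {wide_io_pin_rate:.0f} Mbit/s per pin")
```

Under those assumptions each pin needs to run at only about 200 Mbit/s, which illustrates how a very wide interface can match LPDDR3’s aggregate throughput with slow individual signals.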

Elpida acknowledged that the Wide I/O market will take time to evolve. The 4 Gbit Wide I/O DRAM will sample next month, but production “will take place sometime in the second half of 2012,” according to officials from Elpida. “For volume production, it will be sometime in 2013.”

In March of 2012, Elpida plans to sample a 16-Gbit DRAM, which is based on stacking four 4-Gbit Wide IO Mobile RAM chips. Mass production is due sometime after 2013, according to Elpida.

On the other end of the spectrum, Micron and Samsung are moving full speed ahead with HMC. “This is a slightly different product than Elpida’s and is targeted at server customers. The specs are very promising, but again, this is still a few years from hitting the big time—2013 at the soonest,” Howard said. “Samsung is also a part of the HMC group, lending weight to the product’s chances.”

In October, Samsung and Micron announced the creation of a consortium to develop an open interface specification for HMC. Micron is the actual designer of the HMC technology. Micron and Samsung, as well as Open-Silicon, Altera and Xilinx, are the founding members of the Hybrid Memory Cube Consortium (HMCC).

HMC will incorporate DRAM arrays stacked on a logic chip. The device is connected with 2,000 to 3,000 TSVs. HMC prototypes are said to clock in with bandwidth of 128 Gbytes/second.

It is not a widely known fact, but fabless ASIC house Open-Silicon is developing the controller IP for HMC. Colin Baldwin, director of marketing and business development for Open-Silicon, said the HMC controller will be based on the company’s Interlaken controller IP. Interlaken is a high-speed, chip-to-chip interface protocol that builds on the channelization and per-channel flow control features of SPI4.2. The Interlaken controller will serve as the interface between the memory and physical layer to help “boost the bandwidth” in the device, Baldwin said.

On the manufacturing front, the HMC device itself will go through a two-step process. The controller logic portion of HMC will be manufactured at IBM’s semiconductor fab in East Fishkill, N.Y., using the company’s 32nm, high-k metal gate process technology. IBM also will handle the TSV creation process based on Micron’s specifications.

Micron will develop and make the DRAM arrays in-house based on a 3xnm process within its own fabs, said Mike Black, a technology strategist at Micron. Micron will take the logic controller from IBM, along with the memory arrays made in-house, and then will assemble and test the entire HMC device on its R&D production line in Boise, Idaho, Black said.

Micron is in the qualification stage with the device. “We are feeling pretty good about it,” he said. “Most of the learning is done.”

Experts At The Table: Retrofitting Older Process Nodes

Friday, September 16th, 2011

By Ed Sperling
Low-Power Engineering sat down with Walter Ng, vice president of the IP ecosystem at GlobalFoundries; Vishal Kapoor, vice president of marketing for SoC realization at Cadence; Naveed Sherwani, CEO of Open-Silicon; John Heinlein, vice president of marketing at ARM; and Jeff Lukanc, director of engineering at IDT. What follows are excerpts of that conversation, which was held in front of a live audience at the Global Technology Conference in Santa Clara, Calif.

LPE: Is it harder to sell EDA tools for older nodes?
Heinlein: On one hand, the EDA requirements of the industry are evolving quickly. There’s one challenge where people have older tools, and you have to do new IP based on older flows. And then you have people wanting to use new tools on older nodes. They may want to use things like CPF. So we have this schizophrenia and we have to support that.
Kapoor: EDA is about tools, IP and services, and the reason the design components start to come in is that when you get to new nodes or existing nodes, you have to go broader than just tooling.
Sherwani: If you are developing tools for 14nm you are dealing with FinFETs and a lot of physical effects. But at 0.18 (microns) the issue is how to make one or two designers very efficient. We need to hire operations business people who have nothing to do with design in order to change that. EDA companies are very much focused on physics and getting to 20nm and 14nm. They still don’t have the mindset toward finishing a 0.18 design in one day. Is it possible to put together a flow that can be done by only one guy? In addition, many designs are derivative designs. Companies may want to add DDR3 to an existing design. The headcount and mindset required is very different.
Kapoor: We were joking before this panel that somewhere between the 34th and 43rd minute if I’m on a panel with Naveed he’s going to ask for free tools. With all due respect, that’s not a business we’re in. But in our core EDA business, we spend a lot of time on engineering efficiency. You will see a set of capabilities from Cadence that will address that. But if you can get an engineer to do a 180nm design in a day, we should spin off a business.
Sherwani: The tools are focused on efficiency, but not on whether you can do designs in a day. If you can do a design in a day, you still have to verify that.
Ng: The point Naveed is raising is a cost issue. Whether it’s at the leading edge or older nodes, cost is in the purview of whether it even makes sense to do the design. Years ago when I was at Cadence we had a seven-day design goal. EDA hasn’t always looked at driving cost and efficiency.
Kapoor: First and foremost, EDA is about density and automation. The second part is that you have to figure out how the economics of the whole industry work. If at 40nm we spend $600 million putting together technology, to have anyone design at that node you need sufficient volume to get an acceptable return. And at 28nm it’s $1 billion and at 20nm $1.5 billion. You have to recognize that everyone can’t have an apps processor. That’s not going to happen. Just like there are limits to technology, there are limits to economics. If that’s what it will take us to put into it as part of the broader industry, that’s what we’re going to have to bear.
Ng: Do you think EDA has been driven to the same level of efficiency as other parts of the supply chain?
Kapoor: That’s not a fair question and here’s why: The way the business model for the tools piece works is different than for the semiconductor manufacturing. In the long term, if we bring on additional services will we look to be more in line with other parts of the supply chain, including Naveed’s business? Absolutely.

LPE: If power was not an issue at 180nm in the past, why is it such a big issue now?
Heinlein: Because the bar always moves. People are looking at applications that require low power much more than before. We also need to have power management ICs alongside other chips. And there’s a question of using the right hammer to solve a problem. The bar is different than it used to be.
Lukanc: The mix of things you put in a chip is different. There are mixed signal and power management. You can get a 40-volt PCB process at 0.25 microns. Now 30-volt processes are available at 0.13. You can mix things together and keep mask costs relatively low. Time to market is shorter, investment is lower and it requires fewer people.

LPE: What does the ecosystem look like with more foundries at older nodes?
Sherwani: These are not like TSMC or GlobalFoundries. They have their own IP houses or in-house IP.
Heinlein: That’s correct. These are companies that are very comfortable in their niche markets. That said, we are starting to witness sea changes in areas such as embedded microcontrollers, driven by the so-called Internet of things. That’s going to drive people to put microcontrollers and processors in places where they’ve never been before. Enabling modern software development and EDA development allows you to do more.
Kapoor: If you’re talking about a transducer or something like that, you’ll have to integrate the increasing analog and mixed signal capability with the digital capability. A mature node makes perfect sense. What you have to learn is what you need from the EDA side all the way to the manufacturing side.

Experts At The Table: Retrofitting Older Process Nodes

Thursday, September 8th, 2011

By Ed Sperling
Low-Power Engineering sat down with Walter Ng, vice president of the IP ecosystem at GlobalFoundries; Vishal Kapoor, vice president of marketing for SoC realization at Cadence; Naveed Sherwani, CEO of Open-Silicon; John Heinlein, vice president of marketing at ARM; and Jeff Lukanc, director of engineering at IDT. What follows are excerpts of that conversation, which was held in front of a live audience at the Global Technology Conference in Santa Clara, Calif.

LPE: What is the definition of a mainstream process node these days and why are older nodes so important?
Heinlein: We’re thinking of mainstream as 55nm and older. That’s where a lot of the high volume is. Even though it’s sexy to talk about the leading edge, last year about 75% of ARM’s royalties came from cores that were developed in 2006 and earlier. About 3 million of the 6 million cores we shipped were ARM7.
Ng: From a manufacturing standpoint, the volumes are at 65nm. From that node it’s moving to 55nm and 40nm, but that’s still the bulk of the industry. A lot of companies are doing some very cool things that are very relevant today at those nodes. Even with some of the biggest companies, a lot of the volume is at 65nm. It’s what pays the bills. If you have 200mm capacity, those fabs are completely depreciated.

LPE: How about for the tools? Does the mainstream part of the market really pay the bills?
Kapoor: From an EDA perspective, 65nm pays the bills as much as 28nm and 20nm.

LPE: Is everything still following Moore’s Law? If a company is designing at 65nm, does it necessarily move to the next node?
Sherwani: We look at everything from networking to consumer applications. Some customers need the latest technology. But there are others who are at 0.18 (microns) and thinking about 0.13, and maybe they don’t need to go there. The velocity of that move is segment-specific.
Lukanc: The mainstream for production is 0.13, but a lot of the new designs are ramping to 65nm. We’re looking at older technology and combining new things through integration. There may be a call management IC with a 30-volt option at 0.13 or 0.18, which allows the unique combination of analog and digital management on one chip. We can re-use some of the older technologies.

LPE: There’s a lot of investment in older processes these days. Why?
Sherwani: I visited about 10 fabs in China and I was surprised that none of them had 65nm processes. Most didn’t even have 90nm processes.
Ng: If you look at what’s driving a lot of technology today, it’s the consumer market. And that’s very cost-conscious. If you can’t take advantage of the latest technology, then you look at where your given application makes sense. Cost is very much a factor that customers consider at each process node. And for us, we have to find ways to keep investments in fabs relevant to our customers. We have a big focus on high voltage and power management. We have to find ways to add value on top of baseline logic, which is a commodity at this point.
Heinlein: If you look at smart phones, everyone is always focused on the processor and the high-end chip. But alongside those are the power management controllers and display drivers and RF/mixed signal. Another area for derivative value-added processes that Walter (Ng) mentioned is low leakage. When you get to 65nm leakage is a problem. There are ultra-low leakage variants and high-voltage variants coming out at the high end and the low end, so people can put those into applications that can run on a coin-cell battery for 10 years. To complement that there are ultra-dense libraries that bring the cost and the leakage down and which are suited very well to these kinds of applications.
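To see why leakage dominates in such applications, it helps to work out the average current a device may draw if a coin cell is to last 10 years. The sketch below is illustrative only: the 225 mAh capacity is an assumed typical value for a coin cell, not a figure from the panel, and battery self-discharge is ignored.

```python
HOURS_PER_YEAR = 365.25 * 24


def average_current_budget_ua(capacity_mah: float, years: float) -> float:
    """Average current, in microamps, that drains capacity_mah over the given years."""
    hours = years * HOURS_PER_YEAR
    return capacity_mah / hours * 1000.0  # mA -> uA


ASSUMED_CAPACITY_MAH = 225.0  # assumption: typical coin-cell capacity, not from the panel

budget_ua = average_current_budget_ua(ASSUMED_CAPACITY_MAH, 10)
print(f"Average current budget: {budget_ua:.2f} uA")
```

With that assumed capacity, the whole chip has to average only a few microamps, active and idle time combined, so standby leakage quickly consumes the entire budget.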

LPE: If you develop a chip at 180nm and the process changes to low leakage or low power, does it yield the same?
Ng: The strategy in developing these new processes or modules on top of derivatives is to preserve the investment that was made earlier. It takes advantage of the proven solutions that are already there. When we originally developed those processes, at that time they were leading-edge processes. As you get much more volume using those processes, the manufacturing window becomes quite tight. You could probably tighten up the bit cells. But it’s a business tradeoff whether you re-invest in that or not. The yields are just as good.

LPE: What happens to the tools and the IP that was developed?
Heinlein: For the most part it all works. If you think about 180nm, nobody cared about leakage because it wasn’t an issue. Now, when people look at 180nm, they do care about leakage and power management. So we’re putting that back into 180nm.
Kapoor: The innovation at the leading nodes is going to drive benefits at the older nodes. You drive it back in terms of products, but you also drive it back in terms of design techniques. We developed a 28nm PHY, and we were challenged to do it differently because it’s for a leading node. Today we’re applying what we’ve learned back to 40nm and 65nm.
Lukanc: The best tools are developed at the leading nodes, but you may want to characterize older libraries for low power and power management.

LPE: If you improve an existing technology at an older node, can you charge more for it?
Lukanc: Yes. In general, what we’re offering is value-added solutions. In some cases we offer value-added solutions that are low power.

LPE: Will it be essential for older processes to be updated when we get into stacked die as a way of decreasing the overall power budgets and physical effects?
Sherwani: The answer is different for each area. There is no single, simple answer to that.
Kapoor: For a long time our industry has looked at the technology piece rather than the economics. The answer is, it depends. Can you get more value out of an older node? Yes. The economics will drive the longevity of nodes and what you can get out of them. But we cannot talk about the value of older nodes unless we invest in the newer ones.
Lukanc: If you have an existing product, you can look at the option of integrating oscillators or an EEPROM or something else on top of it to reduce the system cost. There are a lot of things you can do in a package to reduce the overall cost, but you have to look at the total system cost. You may be offering a smaller footprint to the customer, but they may not be getting value out of that.
Heinlein: If you look at mixed signal and RF design at the leading-edge nodes, it’s really tough to get the transistor variation to be complementary to the analog. There’s a point at which it’s too hard, and in that case a heterogeneous 3D package makes sense.
Kapoor: With 3D ICs there’s a technical capability about whether you can marry different die. But you also have to look at it from a system capability. When you look at tablets, where the SoCs are talking in very high bandwidth to memory, that makes sense. The technology by itself won’t be an answer. You need to find out where it makes sense to use it.

LPE: Is investment in older process nodes an arms race that favors the big foundries?
Sherwani: The specialty foundries being built in places like China have nothing to do with companies like GlobalFoundries and TSMC. They will ship a lot of silicon. Over the next 10 years a lot of the analog silicon will be shipping out of China using all older nodes.
Kapoor: Those boutique fabs are certainly making investments in areas in which they specialize.
Ng: You have to continue to make fabs relevant and to drive a good margin. A big impetus for us in developing modules on top of our processes is that you do get the second- and third-tier foundries coming in and taking the floor out of the base logic price. That’s difficult for us to compete with. So we’re looking at where to add value and how to win a good percentage of market share. We have our investments in 200mm. We will continue to invest there.
Heinlein: We definitely see lots of specialty processes at the smaller players. We work with them and enable them. But once it gets to a certain point in the market then we work with the big players.

LPE: Will it become a battle of who has the deepest pockets?
Sherwani: The good thing about older nodes is that the investment needed is minuscule compared with the tens of billions of dollars at advanced nodes. A lot more players can be relevant at older nodes. At 14nm I don’t think there will be more than three or four players.
Ng: The incremental investment to bring up these value-added modules is nothing compared to the investment at the leading edge. The other side is that the equipment manufacturers are a leading component of the cost at the leading edge. At the mature nodes, you’re not buying a lot of new, expensive tooling.
Lukanc: That happens on the product development side, as well. To do a 100 million-gate design requires a certain amount of tools and people and mask costs. At the older technologies mask costs are quite cheap. And if you’re re-using technology and adding to it, you can keep NRE low so return on investment is quite high. You need to take advantage of mainstream older nodes as well as more aggressive nodes.
Ng: And most times our relationships with the leading-edge companies span multiple nodes.
Kapoor: At 14nm there are 5 or 10 customers. As a foundry, you have to worry about how you’re going to get the rest of the industry in. The economics even for the companies that can afford it aren’t that great. So you’re going to see continued innovation even at the older nodes.
Ng: A major part of the foundries’ concern is up and down the supply chain. It’s not just the fabs. It’s the tools, the support for IP providers, and packaging solutions. That’s a challenge we have to address as an industry.

Experts At The Table: Multi-Core And Many-Core

Thursday, August 11th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: How does cloud computing change the need for multicore and many-core processors?
Sherwani: Cloud architectures will evolve differently from mobile architectures. They will be homogeneous 8-, 16- and 32-core architectures. They know a lot about what you are storing. You can put a lot of intelligence into what you’re storing, which is not the case in a mobile device.

LPE: So what does that mean for the mobile devices taking advantage of it?
Sherwani: It can certainly make mobile devices more efficient. You can store a lot more on the mobile devices. You can do a lot of streaming.
Martin: The application cloud interaction may change in character. People will write somewhat different apps in the future that will take advantage of what the cloud has to offer. This is why you’ll see cobwebs on the desktop in the future because no one is very interested in it anymore.
Sherwani: And if you look at video, with the cloud and a good wireless connection you don’t have to store the video. Video cameras will become a lot less expensive.
McDermott: This should be put into context. It’s amazing that people are so excited about a database. That’s all it is. I believe the vision for the mobile device is that you have access to all the data, and you selectively choose how to expose it. The browsing experience is different. You don’t try to replicate the desktop experience on a smaller screen. It’s a given. You take the appropriate content and you display it in a way that’s easiest to digest. I think the hardware on the mobile device will become smart enough to selectively show you the piece that you need on your mobile device. You don’t need an entire map. You just need to know where you are.

LPE: What’s interesting about databases, though, is that they’re one of the very few applications that really can do true parallel processing and scale effectively.
Sherwani: I’ve been saying for the last two years that we should stop giving people content. In five years all the content will be available. If you’re a mechanical engineer, everything you need will be on the Web. What we need to do, though, is teach people how to do something useful. This is the same thing with mobile devices. Whatever device will be useful will be the one that can quickly filter through what you’re looking for to get something done. It’s not about storing more information. Cloud brings that opportunity to people, devices and things. Our view of expertise will change. It won’t matter if you’re an electrical engineer. It’s whether you can get a task or series of tasks done. That will be more important than a Ph.D. We are 10 years from that, but this is how people of the next generation will think.

LPE: What you’re talking about is data mining for the masses?
Sherwani: Yes.
Martin: Before we get too carried away, there are a couple of issues that really need to be solved in this cloud paradigm. We do need to think a lot about privacy, security, and the ability of the infrastructure—both wired and wireless—to deliver all of this content off the cloud and onto the sea of mobile devices. We all know about the experiences of certain smart phones overloading networks and they’re still trying to improve the quality of the network. The wired infrastructure is not fault free. Security and privacy worry me more. If you upload all your data into some big infrastructure, you want your data secured.
Rohatgi: That’s the weakest link. Everybody’s pushing down this path. What worries me is the security and reliability. There are a ton of issues that need to be resolved. Creating a smart infrastructure for data mining can be done today. On the mobile side, there are probably some advances necessary to improve battery life, which is the No. 1 complaint I hear today. But the weakest links we hit are the communications channel, security, privacy and reliability. If those can be resolved then we can progress.
Martin: The technologies we’re all involved with are going to help in a big way. It just requires a bit of mobilization to focus on those issues.
McDermott: This reminds me of where we were with cell phones years ago when the processor went through certification with the carrier. The consumer doesn’t see all the certification on the network. The carrier loves new features. It’s more traffic for their store. It brings in a new wave of users. What they don’t want to see is something that disrupts their infrastructure. For the engineer, the certification is really intense and the field trials are difficult. The cell phone industry has to show a partition that you can certify your baseband and your protocol stack and that has to be isolated from other activity. That underlying security infrastructure is built into the certification. I think we’ll see that extended upward through commercial transactions to having trusted processes and transactions.

LPE: Will cores all be homogeneous or heterogeneous, and will some of them be virtualized?
Sherwani: All of the above. There will be homogeneous cores, heterogeneous cores and there will be virtualization. They all solve different problems. You need virtualization in data centers.

LPE: But will you need virtualization on your smart phone?
Rohatgi: We’re starting to see some of that. I don’t think the operating system wars are dead. And at the end of the day, there is some value to keeping RTOS access to legacy hardware and a high-level operating system like Android or Windows or iOS. From a security angle, it all depends on the use case. The mobile guys are really scared of virtualization of a single processor that has access to all memory. They want separate memory and separate everything.

LPE: This is similar to devices that have a partition between what’s used at home and at the office, right?
Rohatgi: Yes. It’s the same problem. And this almost ties into virtualization. On the privacy side, there isn’t a well-defined security layer with NFC (near field communication), and people are already talking about mobile payments. If you power on an Android phone and shut off all networking then your maps go haywire. Why? Because there’s a back channel that goes to some cloud that helps triangulate where you are. That information is stored to help applications of the future. I’m surprised people aren’t bothered by this. But to return to the question, we’re starting to see some effort down the path of virtualization even though it’s not widespread yet.
Martin: You won’t see virtualization down to the metal. In the dataplane layers it’s nice that processors can emulate other processors effectively, but close to the metal you want extreme efficiency and high performance.
Neifert: And that’s where I see the problem with virtualization. It’s the power. Virtualization is nice, but it’s an abstraction away, which is a power loss. At that point you need heterogeneous processing.
Rohatgi: Transmeta, about nine years ago when they started doing abstractions to hardware, had power numbers that were way down. It’s too bad that green energy wasn’t something that was important then. Still, the genesis of the Atom processor was entirely because of Transmeta.
Sherwani: A typical Bluetooth radio takes about 32 milliwatts of active power. At 65nm we have a Bluetooth radio that only uses 3.2 milliwatts. And there is a design on the board that will take it below 1 milliwatt. There are a bunch of engineers getting excited because over the last 100 years the basic design of a radio has not changed. What Marconi designed is essentially the same as we have today. But when you scale down the power needs to go down. It’s amazing how much lower you can go.
Rohatgi: There’s the other side of this, too. Battery technology has not evolved as much as we would like. For the analog components, it’s the switching characteristics that are governing it. That’s where you’re seeing a lot more intelligence. If you were to look at the power profiles of a mobile device, LEDs and LCDs were supposed to be the promise for low power. That hasn’t worked out. There are still 250 milliwatt drivers. The radio is probably No. 2 on the list after that.
McDermott: People’s expectations were that a screen would be a certain pixel density. Today that needs to be super high-definition. It’s beyond high-def.

LPE: So will we see more cores in the future or have we maxed out?
McDermott: As a programmer, how are you going to keep track of 100 cores? How are you going to program that intelligently? Either it’s going to be some array a programmer can visualize, or it’s going to be three or four very solid cores and let other cores do things like Bluetooth. You can’t keep 100 threads in your mind.
Rohatgi: There’s a limit to this. If you look at the desktop space, in 2006 when Intel began heading out on this multicore approach they found that success wasn’t nearly as fast as they thought. There’s probably a limit on mobile devices, too.
Sherwani: We did all this in the 1980s. nCube used to have a 16-core and 32-core machine. It works great up to 8 cores, but after that you lose it.
Martin: If you are trying to program a concurrent application and split it into different threads, there are inherent limits. Some very specialized applications may be very concurrent, but most are not.
Neifert: The programming model has a human in the center, and humans can only process so much. Until the fundamental programming model changes, you won’t see much advancement.
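The “inherent limits” the panelists describe are commonly modeled with Amdahl’s law, a standard formula that none of the speakers names explicitly. The sketch below applies it with an assumed parallel fraction of 90%; that figure is a hypothetical chosen for illustration, not a measurement from the panel.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


ASSUMED_PARALLEL_FRACTION = 0.90  # hypothetical: 90% of the work can run concurrently

for cores in (1, 2, 4, 8, 16, 32, 100):
    speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, cores)
    print(f"{cores:>3} cores: {speedup:5.2f}x speedup")
```

With that assumption the speedup can never reach 10x no matter how many cores are added, and each doubling of cores past eight buys less than the one before. The model ignores communication and synchronization overhead, so real applications typically do worse.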

Experts At The Table: Multi-Core And Many-Core

Friday, July 29th, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Is software taking advantage of the hardware in a power-efficient way?
Rohatgi: Yes, and the ultimate example of that is the Android operating system. Even though it relies on Linux, there are on-demand and five levels built into Linux that control, at the software level, the CPU registers or SoC registers to shut down power. You’re already seeing that at the operating-system level.
Martin: It depends upon which software you’re talking about. At the OS level, where lots of apps are running, there may be commoditization happening. Down at the dataplane, where people use application-specific processors, you can argue that’s the infrastructure. People want extreme power efficiency and reliable continuously executing functionality. That’s the place where heterogeneous multiple processors really shine. It’s almost an infrastructure layer in a mobile device. So you see different solutions depending on what level of the device you’re talking about. We see a drive to more heterogeneity, too. Baseband wireless infrastructure works better with heterogeneous processors than trying to shove that onto a multicore device.
Neifert: That’s certainly what we’re seeing in our customer base. They want one processor to run the modem subsystem or the WiFi and partition that off. The last thing you want to do is wake the application processor all the time. The application processors are getting more complex so you can talk and play games at the same time and surf the Web. The application processor has to handle all of that. The application processor may be power efficient, but not as power efficient as one that just runs the radio or data transfer.

LPE: Is it better to actually design a device with multiple processors or a single multicore processor?
Sherwani: When I was at Intel we believed it was the best processor ever developed. I never thought I would see ARM and x86 processors on the same device. We are not that far away right now—and I’m talking about having them on a single chip. Or it may be a MIPS or Tensilica core. Such processors will exist. We are very efficient these days about using power islands. We can put six or eight processors on a chip and we can put them to sleep when they’re not being used.

LPE: Is it more difficult to verify them?
Sherwani: The verification nightmare is growing exponentially, and it’s not clear to me how we will be doing verification five years from now. At the implementation level, verification is becoming a bigger and bigger piece. But it’s more of an architecture question than whether you’re using multicore or many cores.
Martin: This whole approach tends to lead to a more compositional design style where you’re composing well-understood systems. What you need to do is limit the interactions between them to a relatively high level of abstraction or control. You verify significantly each subsystem and then you verify without having a great deal of interaction between the subsystems.
Sherwani: It’s amazing that on a big chip people don’t do flop-to-flop timing on a block. This is a situation that would never happen in software between subroutines, but it happens all the time in hardware. In hardware we have not reached a maturity level where I take care of my block and you take care of your block. We have timing paths going to two blocks and you cannot time it unless you do the timing and verification together.
Neifert: I’ve got customers that will spend months validating their processor, fabric, memory and data path, throwing out all the various options on there and running that. That could be a single-core processor reaching out to memory, and they’ll spend a lot of time optimizing that. Now throw in one other master accessing the same memory and everything goes out the window because of all the different permutations when these things talk to each other. It now blows up exponentially. The nice thing about a multicore approach is that you’ve handed off a lot of that task to the processor guys and hope that they’ve done it properly. It may not be the optimal use for your application, but pushing the problem off to an IP provider and a multicore solution is what a lot of our customers are doing.

LPE: What’s the best way to take advantage of cores? Do you do it with Wide I/O or through multicore and a standard bus?
Sherwani: If you look at where Micron is going with this, the whole interface has been changed. The memory becomes a lot more intelligent instead of a dumb storage. You will be able to ask memory to do certain tasks. Processor people have tried to make memory as dumb as possible in order to commoditize it. All the value comes from the processor side. But balancing would be better so you can offload things. You can combine flash into the most cost-effective memory. Instead of saying, ‘Give me byte No. 7,’ you can say, ‘I need this piece of information.’ It’s a lot more power-efficient to do it that way.
McDermott: It’s quality of service. You’re not just making a data request. You’re saying, ‘I need high bandwidth or high efficiency or low latency.’ A processor may need only a small amount of data, but it may need it very efficiently and very fast. With video you need high bandwidth that is very predictable. Having graphics integrated is one way to go. Unless you have a view of the fabric, the quality of service and the end power engine it’s going to be very hard to engineer a one-point solution.
Martin: With a compositional approach, you may have big memories and then a lot of small distributed memories to keep data close to the area where it is being processed. And maybe you need some intelligent abstractions on things like DMA (direct memory access). That would give programmers more assistance in managing the data flow and data interaction so things will move out of central memory into local memory before they’re needed. That’s a different programming style. We need more flexibility in how hardware and software developers can compose these memory systems together.
Sherwani: If memory is knowledgeable about what is stored inside, it can give you service of the highest level. Right now you can’t do that. The attitude has been, ‘I have a board and I have a DIMM and I want this DIMM to be as low cost as possible.’ That approach has led us down this path. If you’re designing a microprocessor of any kind, it puts a lot of burden on the microprocessor to do all these things with memory. Eventually you will see memory microprocessors—storage with a processor on it—that can gate what is being stored on it. That is a new area, though, and I don’t think much has been done so far.
Rohatgi: In some respects this is already happening. If you think about cache controllers over the last 30 years, this is where you’ve seen a massive improvement. It isn’t user-level aware. It’s bit-level aware. And if your memory isn’t fragmented it works. Or in a multicore design, a coherency module is also very well aware of what it needs to do to keep synchronization between processors. I like the visionary statement of making it user-focused.
Neifert: If you look at the various SoCs on the market, they may use processors from ARM, MIPS and Tensilica, but a large number of them are still doing their own memory controllers because that’s a place to differentiate their design. There are more memory controllers coming out of Synopsys and Cadence, but in large part the bleeding-edge SoCs are still designing their own.
Sherwani: But you can go a lot further.
McDermott: There’s a big difference if you can optimize a path for video and have some pre-fetch algorithm. That may not apply to every chip. But in a custom design, you can partition as needed. When you define your coherency space you need to make them aware of these choices. It’s not just an arbitrary memory spec. You need to make them aware of how to use it.
Martin: That should lead to some opportunities for much more sophisticated memory control, and the kinds of data flows and accesses that people really want to do. That can be reflected in configurable memory IP. I’m not sure how rapidly that’s happening, but there are moves in that direction.
Sherwani: For the work we are doing with the [Micron] Hybrid Memory Cube, there’s a lot of excitement around that space. A completely different level of system design is possible with that kind of hybrid model.

Experts At The Table: Multi-Core And Many-Core

Thursday, July 21st, 2011

By Ed Sperling
Low-Power Engineering sat down with Naveed Sherwani, CEO of Open-Silicon; Amit Rohatgi, principal mobile architect at MIPS; Grant Martin, chief scientist at Tensilica; Bill Neifert, CTO at Carbon Design Systems; and Kevin McDermott, director of market development for ARM’s System Design Division. What follows are excerpts of that conversation.

LPE: Computers aren’t getting the power/performance boost today from multiple cores because the software can’t take advantage of them. How do we fix that?
Martin: Your computer isn’t a place where all the advanced design techniques are used. You have to look at battery-powered, cordless devices to look at the places where people use the most advanced design techniques. There they very often will have specialized application processors for different parts of the applications they want to run on those devices. Those processors are designed to be energy-efficient and to efficiently use battery power, and they probably do work better from one generation to the next—except for the case where they may throw on additional general purpose processors and don’t take advantage of energy consumption. You have to get a big distinction between multiple processors that are application specific vs. general-purpose processors that do not offer efficiency or better performance.
Rohatgi: Once the Intel-AMD megahertz wars ended people started heading down a different dimension of multicore. Back then they believed that changing the software ecosystem so that specific software or systems could be written to take advantage of multi-core, multi-thread, multiple processor designs would actually work. We’ve seen it work in many cases. You can reduce the latency when you’re executing a certain process or multiple processes. Another twist to this paradigm is people use core islands. The operating system may run on one core while another core is used for acceleration. Some people define that as multi-core, and that has been very successful because you can partition between a media processor engine, a video processor engine and a graphics processor engine. In terms of power consumption, that whole element needs to be pieced into this picture. When it comes to embedded SoC design vs. desktop design, those are very different when it comes to power consumption. That element hasn’t been worked through very cleanly on the desktop side, where suddenly you need 800-watt power supplies.
Neifert: The overall user experience that people have when interacting with a device has moved from the underlying hardware to the software. The emphasis has shifted to enhance the user experience. Opening a window on your desktop used to be simple. Now there’s shading and fancy graphics, so the same window that used to come up in 5 instructions may now take 500. It looks a lot nicer and in some cases that changes the user experience. But from the processing side, the focus stopped being on single-thread performance as the megahertz started burning up too much power. They branched out into multicore to solve that, but changing the software to accommodate that has been a big struggle. Changing the hardware to isolate that properly has been a struggle, too. Some of the processing that has been done on computers is difficult to migrate over to mobile devices. A lot of the innovation on the desktop is now taking place in the embedded space. If you want to see the leading-edge design techniques, that is where you have to look.
McDermott: In the mobile area low power is associated with the battery life and the key to the user experience is maintaining functionality throughout a working day. We’ve gotten to that point. Now we’re engineering more productivity. There are more features you can run, more capabilities, more graphics, but still within that working day. Now what we’re seeing is low power is key to other markets. Data centers are predicted over the next few years to rival the airline industry for energy consumption. Cloud computing will lower the power per node, but that energy is still being used somewhere even though it’s shifted. What cloud changes is that if you run an application on one device and shift to a different device it’s no big deal. It takes advantage of the underlying computing architecture. There also may be a hierarchy of operating systems to deal with it, depending on the device.
Sherwani: We got very interested in how power relates to multiprocessing. If you are trying to predict power within a watt or two that’s no big deal. If you are trying to predict power within a milliwatt, that’s very difficult. We thought that by looking at implementation of the netlist we could predict power. That turned out to be not the case. Then we tried system-level design. That doesn’t work. We finally came to the conclusion that you have to have a user model. We needed a human model—a businessman, a lawyer, a student—and then analyze what they did during the day. Then we had to convert that into system level and then RTL level. This takes us far from what Open-Silicon does as a company, but we have found this the only way to accurately predict power. These kinds of human models don’t exist. We created two models of two types of people who use it. Then we started recording real human beings and calculating the model against them. Good models don’t exist if you want to accurately predict power.
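
The user-model idea Sherwani describes can be sketched as a usage profile (hours per activity in a day) multiplied by an average power per activity. Everything in the sketch below is hypothetical: the activity names, the milliwatt figures and the two profiles are illustrative assumptions, not Open-Silicon's models or data.

```python
# Hypothetical average power per activity, in milliwatts (illustrative only).
ACTIVITY_POWER_MW = {
    "standby": 5.0,
    "voice_call": 600.0,
    "email": 350.0,
    "gaming": 1500.0,
}


def daily_energy_mwh(profile: dict) -> float:
    """Energy in milliwatt-hours for one day of use.

    profile maps an activity name to the hours spent on it; hours must sum to 24.
    """
    total_hours = sum(profile.values())
    if abs(total_hours - 24.0) > 1e-9:
        raise ValueError("profile must cover exactly 24 hours")
    return sum(ACTIVITY_POWER_MW[activity] * hours
               for activity, hours in profile.items())


# Two hypothetical user models, in the spirit of "a businessman, a student".
BUSINESSMAN = {"standby": 20.0, "voice_call": 2.0, "email": 2.0, "gaming": 0.0}
STUDENT = {"standby": 19.0, "voice_call": 0.5, "email": 0.5, "gaming": 4.0}

if __name__ == "__main__":
    print("businessman:", daily_energy_mwh(BUSINESSMAN), "mWh")
    print("student:", daily_energy_mwh(STUDENT), "mWh")
```

Even with identical hardware numbers, the two profiles differ by more than 3x in daily energy, which illustrates why a prediction made without a usage model can miss badly.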

LPE: Are we better off with many cores or multiple processors?
Martin: Multiple heterogeneous processors are the way to go, particularly in the mobile domain. With clusters of servers you may have many homogeneous tasks you want to map. The desktop is a bit of the orphan here. If you move to cloud computing and the highly mobile devices and ever-smarter phones, you wonder if people will worry about even having a tethered desktop. That means the innovation may be in the big server farms and the mobile devices, and the desktop may gather dust.
Neifert: It will be replaced by a docking station that you plug your mobile device into.
Martin: That’s right. Or as we have seen, some companies are combining mobile devices and a laptop together. The use cases are extremely interesting because there is no single use case. For a mobile device that has an advanced graphics processor, the game player may burn up battery by hammering that all the time. The music lover may be using MP3 decoding and get significantly longer time out of the battery. That drives significantly different use models and processor choices.
Rohatgi: There are a lot of different vertical markets. It ranges from digital still cameras to anything with a battery. There is a use case for multiple processors. Networking and cloud computing are very large markets. In the embedded space, what has happened is there are a lot of people in the SoC space. The hardware itself is heavily commoditizing. Even the operating system is commoditizing. The differentiation is how you pick and choose your IP. If it comes down to cost in a mobile phone, from the top up they don’t have a feature list or a use model. The discussion begins with, ‘What can you fit in a 7 x 7?’ Based on something like that, what kind of IP can you fit in there and still have a useful device? In the volume mobile phone market, the direction is to shrink the die as small as possible. It may be a 6 x 6 or a 5 x 5. In that case, I would choose multicore rather than multiple processors.
McDermott: In cell phones the issue used to be standby and talk time. People could self control that. If you talk more your battery goes down. People are starting to experience that if you want to play games you have to deal with this. We’re starting to deal with the apps developers. You used to have specialized OSes and applications. With the proliferation of open source you don’t know what could be running on there. It can run any app. We’re reaching out to the app developer to write code that is attentive to the power effects. There is an amazing learning curve through people writing a good game experience in a power budget that’s acceptable. You need to get the apps to be power-efficient.

The Week In Review: Nov. 19

Friday, November 19th, 2010

By Ed Sperling
Synopsys announced the immediate availability of its DesignWare ARC processor for Blu-ray players, which it says uses less power and has higher performance. Synopsys acquired ARC with its Virage Logic purchase, as you remember. And if it seems as if Synopsys has taken a sidestep into another market, check out what it picked up with the acquisition of Optical Research Associates: illumination design and analysis software. Still, this is going to be a really interesting market as the world moves to lower-power light sources. You’ll be able to dial in color as well as intensity.

Synopsys also teamed up with SMIC to deliver a comprehensive SoC solution for SMIC’s 65nm process, and the two are working on the 40nm process.

Mentor Graphics added NetLogic’s multicore support to its embedded Linux portfolio, which has been used in a broad swath of markets ranging from mobile to network infrastructure.

Open-Silicon taped out a 2.4GHz processor using Cadence’s Silicon Realization integrated tool suite.

Verifying Low-Power Designs

Thursday, January 14th, 2010

By Ed Sperling
Power islands and multiple voltages used to be reserved for cell phone and process companies, but as more companies move to 65nm and 45nm process nodes these approaches to saving power—particularly in chips with multiple cores—are becoming mainstream.

The problem isn’t in the architecture of the chips, although that certainly brings its own set of challenges. More and more, the real holdup is at the verification level. While the percentage of time spent in verification has remained relatively steady, at anywhere from 50% to 75% of the total time between architectural design and tapeout, the size of the verification teams has doubled and in some cases tripled.

“Verification is the next big challenge,” said Naveed Sherwani, CEO of Open-Silicon. “As an industry we have not done a good job managing verification. A new methodology would be very welcome. We have had to develop methodologies in-house to deal with this.”

Sizing up the problem
All of the major EDA vendors recognize the extent of the problem. They’ve been dealing with horror stories from the field since the 90nm process node. And according to TSMC, about two-thirds of the industry is now at that node or beyond.

The most advanced parts of the semiconductor industry are now working on 32nm and 28nm, with even more power states—on, off, sleep, and sometimes even more in-between states—more power islands and more processor cores. In the most advanced chips, some of those cores are even heterogeneous, which means they may have different voltages and states than the other cores. That allows a system to reduce power consumption overall and concentrate power where and when it’s most needed.

“When you cross 100nm, you’ve got to design this stuff in or you’re not competitive,” said Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics. “We’ve got a number of people well down the road on this. Larger companies with larger design teams can afford the engineering expense to make this work. But as more people go to more advanced nodes they’re going to be dealing with issues they never had to deal with before.”

The first thing that most designers encounter is complexity. What used to be done on a spreadsheet is much harder to manage now.

“There are a whole series of interrelated topics of increasing complexity,” said Srikanth Jadcherla, group director for R&T at Synopsys. “The state space is huge, and when you start dealing with three or four power islands it’s amazing how quickly the number of states and sequences explodes.”

It’s also amazing how complicated this stuff can get very quickly. Consider, for example, what happens when you’ve got a device and you’re checking e-mail. The processor wakes up a number of mixed signal blocks, then turns off what’s not being used. But that sequence also has to be ordered, which means you also have to order the power islands.

“You may wire it from low to high when you need to go from high to low,” said Jadcherla. “The problem is that you’re trying to predict island orders. You can create a safe graph, which is a set of possible states so you can look at a design and ask, ‘What are the safe ways this will work?’ But when you’re dealing with 36 to 40 islands, there’s no way you can set it up safely.”
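
The safe-graph idea can be illustrated with a toy model. The sketch below assumes three hypothetical islands and two made-up dependency rules, then brute-forces every power-up order to find those that pass only through safe states. It is not how any EDA tool implements the check.

```python
from itertools import permutations

# Hypothetical power islands (names are assumptions for illustration).
ISLANDS = ("memory", "cpu", "radio")


def is_safe(powered_on: frozenset) -> bool:
    """Return True if this set of powered-on islands is a safe state.

    Assumed rules: the CPU needs memory powered, and the radio needs the CPU.
    """
    if "cpu" in powered_on and "memory" not in powered_on:
        return False
    if "radio" in powered_on and "cpu" not in powered_on:
        return False
    return True


def is_safe_power_up(order) -> bool:
    """Check that every intermediate state of a power-up order is safe."""
    powered_on = set()
    for island in order:
        powered_on.add(island)
        if not is_safe(frozenset(powered_on)):
            return False
    return True


def safe_power_up_orders():
    """Brute-force all orders; feasible only for a handful of islands."""
    return [order for order in permutations(ISLANDS) if is_safe_power_up(order)]


if __name__ == "__main__":
    print(safe_power_up_orders())
```

The number of orders grows factorially with the island count, so exhaustive enumeration of this kind stops being practical long before the 36 to 40 islands Jadcherla mentions.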

Tales from the crypt
One of the most common mistakes that design teams make in chip engineering is internal organization and communication. The team design and communication has to reflect what’s going on in the chip design and verification.

“We’ve seen problems in a library group, for example, where they save power in a certain way that’s different from other groups,” said Mike Carroll, product marketing manager for front-end design at Cadence. “Communications between teams is not always the tightest loop. If one group instantiates it the wrong way, you may have power shutoff without state retention.”

In a library, that can be disastrous for a system—or at least some of the system’s functions.
It’s also a big problem in flash. Consider, for example, a smart phone where the low-battery signal is flashing and the system is ready to shut down to keep enough charge in the device to maintain essential data in memory.

“If you get a phone call at that time and you pick it up, it can be disastrous for the system,” said Synopsys’ Jadcherla. “But how do you prove that? It’s not easy. You need to come up with a methodology to test it. That’s where random constraints and testing come in.”

Another problem is when engineers route signals across other blocks or power domains. Pangrle noted that may not show up in the block diagram, particularly if the block is powered down.

“The key is to keep the logical hierarchy matching the physical hierarchy,” he said. “But design teams are not experienced with that. Another problem is that the signal may not be the same on one side as the other.”

That can also happen at advanced process nodes with process variations—an issue that no one even paid attention to at 130nm. At 45nm, it can be the difference between a functioning chip and a buggy one.

Advice from the experts
Low-power experts have consistently advised design teams to think about low power at the architectural level, and nothing has changed in that regard. What has changed are the numbers of possibilities for verification. Adam Sherer, product marketing manager at Cadence, said that the number of possible states is two raised to the power of the number of power domains. So if there are two domains, there are four possible states, and so on.
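
The arithmetic can be made concrete with a short sketch that enumerates every on/off combination for a list of power domains. It assumes each domain has only two states; the domain names are hypothetical, and with additional states such as sleep the count would grow even faster.

```python
from itertools import product


def power_states(domains):
    """List every on/off combination for the given power domains.

    Returns 2 ** len(domains) dictionaries mapping domain name to "off" or "on".
    """
    return [dict(zip(domains, combo))
            for combo in product(("off", "on"), repeat=len(domains))]


if __name__ == "__main__":
    # Hypothetical domain names, used only to show the growth in state count.
    for count in (2, 4, 8, 16):
        names = [f"domain{i}" for i in range(count)]
        print(count, "domains ->", len(power_states(names)), "states")
```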

“Verification does not have a theoretical limit, but pragmatically there are limitations,” Sherer said. “The problem is coverage. If you can manage to create a loop, you can extend it to the power domains. We’re seeing the same from the functional teams. Randomization testing is where the functional coverage comes in. As long as there is coverage and you can see functional sequences you have vision into the power domain space. It has to be able to come out of shutdown and on the implementation side it has to work.”
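
As a rough illustration of how randomization and coverage fit together, the sketch below randomly samples on/off combinations of power domains and reports the fraction of the state space that was hit. It is a simplified stand-in for a real coverage model: the function name, the uniform sampling and the fixed seed are all assumptions, and no actual design is simulated.

```python
import random
from itertools import product


def random_state_coverage(domains, trials, seed=0):
    """Sample random on/off states and return the fraction of states seen.

    domains: sequence of power-domain names (only its length matters here).
    trials: number of random states to generate.
    seed: fixed seed so a run can be reproduced.
    """
    rng = random.Random(seed)
    all_states = set(product((False, True), repeat=len(domains)))
    seen = set()
    for _ in range(trials):
        # Each domain is independently on or off with equal probability.
        state = tuple(rng.random() < 0.5 for _ in domains)
        seen.add(state)
    return len(seen) / len(all_states)


if __name__ == "__main__":
    names = [f"domain{i}" for i in range(12)]  # hypothetical domains
    for trials in (100, 1000, 10000):
        print(trials, "trials ->", random_state_coverage(names, trials))
```

A coverage number like this only says which power states were visited; whether the design actually comes out of shutdown correctly in each one still has to be checked by the testbench.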

That means establishing power intent so you shut off something at a particular time.

All the EDA companies say that a verification methodology helps as well, although each favors its own flavor, whether it’s OVM or VMM. Higher-level abstraction standards such as CPF, UPF and TLM 2.0 also help significantly.

“With TLM you can figure out what’s in hardware and what’s in software and which blocks run at which voltage,” said Pangrle. “Then you can put in which blocks to shut down entirely and specify the power states.”

And if you can create an effective coverage model based upon those factors, then at least you have a chance of getting a chip out the door on time, possibly within budget, and one that actually works.