Posts Tagged ‘Arteris’

Getting The Balance Right

Thursday, August 11th, 2011

By Ann Steffora Mutschler
Defining the power architecture for a low-power design means striking a balance between the high-level abstraction and measurements made typically at RTL and below, but today that is easier said than done.

“The balance is that at the high level of abstraction, the design choices you make have a big effect over power, yet your ability to measure them is incomplete until you get much further down the design flow. That’s a balance that people have to strike and it tends to be a problem,” said Pete Hardee, director of solutions marketing at Cadence.

What works best at the high level of abstraction is the ability to run real system modes and get real activity vectors, which are becoming increasingly important. It’s actually better to take that information at the earlier abstraction level when a lot of data can be run. Software can be run on a virtual platform or an emulation box, which provide activity data. “It’s important to understand the modes because of all the complexity—the different power modes that a system is in relating to all the different system modes that need to be covered,” Hardee said.

The other piece of the equation is the characterization of every time there is activity, every time switching occurs, and what that means at the device level in terms of power. The problem is that characterization often isn’t available until later on in the design process.

“RTL is a good place where those come together,” Hardee noted. “Above RTL, we’re often guessing at that. If we can get to at least a relative ranking of the various architecture changes you have in mind, then you’re doing really well. And that’s all the above-RTL or system-level guys are trying to do at that stage.”

Fortunately, derivative designs allow you to do a little better than that. If a similar platform has already been built, there is probably good characterization data from the previous design that can be used formally or informally.

For any level of abstraction, the most important thing is to understand the limitations of the model, said Cary Chin, director of technical marketing for low power solutions at Synopsys. “Models that are used as intended can be quite accurate, but accuracy tends to drop off quickly if the assumptions are not met. For example, a high-level model for computing dynamic power based on transition frequency might be very accurate when a block is in normal operating mode, but in some special power saving mode the assumptions might need to be specially validated or the model adjusted or extended.”
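Chin's example of a high-level dynamic power model based on transition frequency can be made concrete with the textbook CMOS switching-power relation, P = activity x C x V^2 x f. The sketch below is an illustration only, not the model Chin or Synopsys uses; the capacitance, voltage, frequency and activity values are hypothetical. It shows why a model calibrated for normal operation needs its assumptions rechecked in a power-saving mode, where several inputs change at once.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, frequency_hz):
    """First-order switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * frequency_hz


# Hypothetical block with 1 nF of total switched capacitance.
BLOCK_CAP = 1e-9

# Normal operating mode: assumed 15% activity, 1.0 V, 500 MHz.
normal = dynamic_power_watts(0.15, BLOCK_CAP, 1.0, 500e6)

# Assumed power-saving mode: activity, voltage and clock all drop together,
# so parameters fitted in normal mode no longer apply without revalidation.
power_save = dynamic_power_watts(0.02, BLOCK_CAP, 0.8, 100e6)

print(f"normal mode:       {normal * 1e3:.2f} mW")
print(f"power-saving mode: {power_save * 1e3:.2f} mW")
```

Because voltage enters as a square, an error in the assumed supply voltage for a special mode distorts the estimate more than the same relative error in frequency or activity.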

Exactly what are we measuring?
“When you are measuring power, you are doing two different calculations–in a certain amount of time, with a certain amount of load, how many transistors are flipping on and off. Each time you do that is the act of power. It’s a very interesting problem to solve because nowadays it’s not just performance. [It’s about] how do you do it at a high level so you can get an architecture before you go down to the details. You don’t want to try it and see,” said Kurt Shuler, director of marketing at Arteris.

Models are the way to go from the high-level, and are typically validated against simulation at the lower level, Synopsys’ Chin said. So a block-level IP power model could be checked against a gate level analysis to verify correctness in multiple modes of operation. Similarly, gate models are validated against circuit simulation, and so on. “At each level, it’s important that the validation be as exhaustive as possible (including some measure of completeness) in order to build confidence at the higher levels of abstraction,” he said.

This data is generally available, but the model accuracy varies as the model is tuned. “Determining an accurate and compact set of parameters for any model is the ultimate goal, but that’s easier said than done. We learn by experience, applying new information to refine successive versions of the model to achieve better accuracy over time. The usual tradeoffs apply—time vs. space vs. accuracy,” Chin observed.

Captured within those models are dynamic and leakage power.

“It used to be that you needed activity to measure the dynamic power and leakage power,” Cadence’s Hardee said. “What’s changed is that now we have leakage increasing in today’s advanced nodes, and that has led to techniques specifically to control leakage like power shutoff. You’ve got to remember that the leakage calculation depends on the system modes and how long the blocks are shut off for, and that has to be factored in.”

That can be done at a number of levels—running either system software on a prototype or system software on a previous version of the chip if it is a derivative. What you are looking for are typical usage scenarios, such as how long you are in each of the identified high-level system modes, and what’s on and what’s off. From that you can create profiles, which in turn can be used to measure dynamic power and to affect leakage power.
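A usage profile of this kind can be sketched as a time-weighted sum over system modes. The mode names, time fractions and power figures below are hypothetical placeholders, not measured data; the point is only that leakage is counted per mode, so a block that is shut off in a given mode contributes nothing to that mode's leakage.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    time_fraction: float  # share of total time spent in this mode
    dynamic_mw: float     # dynamic power while in this mode
    leakage_mw: float     # leakage of the blocks left powered in this mode


def average_power_mw(profile):
    """Time-weighted average of dynamic plus leakage power across modes."""
    total = sum(m.time_fraction for m in profile)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(m.time_fraction * (m.dynamic_mw + m.leakage_mw) for m in profile)


# Hypothetical usage scenario: mostly standby, with most blocks shut off.
profile = [
    Mode("video", 0.10, 400.0, 50.0),
    Mode("audio", 0.20, 60.0, 20.0),
    Mode("standby", 0.70, 1.0, 2.0),
]

avg = average_power_mw(profile)
print(f"average power: {avg:.1f} mW")
```

In this made-up profile the device spends 70% of its time in standby, so how long blocks stay shut off there influences the average as much as peak video power does.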

The software perspective
Considering power consumption from the software point of view, Marc Serughetti, director of product marketing for virtual prototyping at Synopsys, noted that open software platforms such as Android have unlocked smart phones to a worldwide community of open source software developers.

“While users clearly benefit, what is the impact on power and battery life?” said Serughetti. “Power efficiency is becoming a key issue for software developers, and an important quality criterion for their software. This impacts all the layers in the software stack. All layers need to be well integrated from a power management perspective and all functional entities contained in these layers need to cooperate. The big challenge for software engineers is getting insight into how well the system is performing in terms of power.”

Here, virtual prototypes are useful as they provide a means to access such information as long as the information is available from the virtual prototype model. To be sure, advanced low-power techniques will soon be ubiquitous not just in mobile designs but in all designs: consumer electronics, data centers, and many other areas. Once stable they are expected to be widely available.

Power Bits: Cozying Up To Electronics

Friday, July 29th, 2011

By Ed Sperling
How do you interface with your electronics? For decades, the only way was a keyboard. Then came the mouse and other pointing devices. Then came voice commands. And finally, we have entered the realm of touchscreens.

In the future, it will probably be all of the above, and more. Kurt Shuler, marketing director at Arteris, in his blog this week pointed to gesture recognition as the next wave in interfaces, basically a way of making interactions even better. IBM pioneered some lip-reading technology a decade ago, which leads you to wonder exactly why it took so long.

But the current market for these devices may be less about what can be done than what can be done within a given power budget. Pattern and movement recognition have been part of artificial intelligence for decades. IBM even developed a database that can recognize shapes, and Bell Labs prior to the sale of Lucent was working on a program that could identify parts of faces and shapes of heads, with a distribution of probabilities for correct identification.

Semtech yesterday rolled out a proximity sensing and haptics control that it claims to be ultra low power—2.3 volts to 3.6 volts. For a device with a plug, this is a non-issue. But for mobile devices that are in sleep mode or off most of the time, waking up quickly with a touch or a gesture will be more difficult. Interfaces require at least something to remain on, and with power budgets being what they are, and more functionality on chips, it remains to be seen just what gets implemented on mobile devices.

Experts At The Table: Billion-Gate Design Challenges

Friday, April 1st, 2011

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: If you don’t look at this as 1 billion gates, but instead look at it from the standpoint of subsystems, is it easier to justify from a business standpoint?
Browne: Yes because you know these customers because you’re tier one, two or three in this segment and you know what to put together. You may be nowhere in another segment. So here you do something original. Here you try something new.
Rajendiran: There is a difference between a derivative and a variant. You can start out with one chip, do a re-spin and get a derivative. A variant is where you start out with a big system and then that hardware is given to all the divisions in the company. Each product line comes up with what to do to create a variant, mostly in software.
Janac: Is a variant a software re-spin?
Rajendiran: That’s what we’re seeing. It’s like a superset.
Browne: We’ve seen that in a lot of companies, too. You don’t know what you need for a particular market so you create a superset.
Rajendiran: Not only don’t you know what you need, but the markets are changing. You don’t have time to figure it out. Are you really going to do a billion-gate design from scratch? Probably not. When you do a new chip the traditional defect density model tells you that your yield is low. So can you easily take what you have and do it in four chips? This isn’t the traditional way of doing integration. If I can make them into four chips and tie them together with 2.5D, then you get better yield.
Browne: Or can company ‘A’ race to market with multiple chips. If not, then the slow and steady guy may win. How far do you jump out ahead before it’s off a cliff?
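Rajendiran's yield argument can be illustrated with a simple Poisson defect model, Y = exp(-area x defect density), which is one common form of the traditional defect density model (others exist, and the panel does not say which one is meant). The defect density and total area below are assumptions chosen for illustration. The sketch compares the wafer area consumed per working system for one large die against four smaller dies that are each tested before being tied together, and it ignores interposer, assembly and test costs.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies expected to be defect-free under a Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_system(total_area_cm2, n_dies, defects_per_cm2):
    """Wafer area consumed per working system when each die is tested
    individually and only good dies are assembled."""
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


D0 = 0.5    # assumed defect density in defects per cm^2 (hypothetical)
AREA = 4.0  # assumed total logic area in cm^2 (hypothetical)

mono = silicon_per_good_system(AREA, 1, D0)
quad = silicon_per_good_system(AREA, 4, D0)

print(f"single die: {mono:.1f} cm^2 of wafer per good system")
print(f"four dies:  {quad:.1f} cm^2 of wafer per good system")
```

The gain comes from discarding only the small dies that fail, rather than a whole large die for a single defect; whether that outweighs the added packaging cost is exactly the tradeoff the panel is debating.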

LPE: On a multicore/many-core implementation, are these core sizes becoming more heterogeneous?
Throndson: There’s definitely a lot more interest in that area. One of the more popular configurations in the application processor space for these Internet-connected applications is in mobile or the digital home. It may have floating point or no floating point, which can affect a significant chunk of the core size. That works on other features, too.
Browne: It’s hardware vs. software.
Throndson: Yes. Software needs to be a little bit more aware of where those dedicated resources exist, but that’s a manageable problem. It definitely helps to save power and area, though.

LPE: EDA traditionally has been one size fits all. Are the tools moving in those directions?
Baker: System-level change based on applications is very interesting. Right now we’re in a vertical space and there are functional verification, custom design and digital implementation areas. All of us are trying to find ways to automate the process by abstracting it up a level to get to an answer more quickly. The EDA industry needs to make tradeoffs on area, power and cost so we can add productivity to the design teams. Everyone is working on that now.

LPE: It used to be a tradeoff between power and performance. Is performance no longer an issue?
Janac: It depends on the market. If you’re in DTV and you’re operating on a 25% or 30% gross margin, the die size becomes very important because it’s so cost-sensitive. If you’re in a high-margin base station, area is less important. It’s all performance. It depends on the market. But in the billion-gate chip, the big concern will be risk. People get fired for being late and for quality problems.
Browne: But risk is different things for different people. Samsung’s president said his company will be using TSVs in 2013. There are ‘Haves’ and ‘Have Nots.’ If you need to get there first you’re going to have a different risk profile than if you’re a follower. And it’s a whole continuum.
Janac: But whoever is in charge of the Samsung TSV chip is going to get fired if he doesn’t get there by 2013. He’s got to be very cognizant of the implementation risk he’s going to take to do the project.
Browne: And someone else will get fired if the factory isn’t full.
Janac: But the guys who create the design don’t get fired if the factory isn’t full. They get fired for not delivering on time, on scope and with quality.

LPE: Will we be able to get these chips out the door on time with a billion gates?
Browne: We have to improve on quality at the same rate as we improve on dealing with complexity. It’s a marathon race, not a sprint.
Rajendiran: Some companies can afford to take a huge risk. Hopefully other companies will be smarter about how they approach this. It is important to differentiate by market. But there are more ways to get there than just by following Moore’s Law. We don’t have billions of dollars to write off.

LPE: Isn’t some of this about getting more granular in the design?
Janac: The key in a billion-gate design is how you manage the partitioning and the IP re-use. You need to understand the risk of not redoing the IP, as well as the risk of redoing it.
Browne: It’s all about how it works in the system. The guy with more understanding will have the ability to reuse more cleverly.
Baker: Certain companies will rise and succeed because they’ve built the knowledge base internally.

LPE: What happens on the manufacturing side? How do you manage yield issues?
Rajendiran: At any process node it’s the same. One thing the better foundries do is apply their learning to get to a better level of yield. The more chips you do, the more expertise you have, the better you get. We’ve done it and learned it with in-house expertise. You have the building blocks, the tools and the expertise. That’s what sets one company apart from another. Anyone can buy the tools, but can they produce it?

Experts At The Table: Billion-Gate Design Challenges

Friday, March 25th, 2011

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: Will anyone be able to afford to create these complex chips in the future?
Janac: Sure, but it will be extremely expensive.
Browne: Apple is doing it. They’ve come at it with a systems approach. The user will have a great experience because they’re going to add a whole bunch of devices. But we’ve got to find ways to attach to the software at a higher level. We’re doing a full system design. We’re not hooking up a couple of widgets anymore.
Baker: Apple has moved up the stack. From an EDA standpoint we see all these challenges. We’re actively seeing designs at 28nm, planning for 20nm. We’ve yet to see designs at 14nm. But the complexity of validating one of these devices, whether it’s a single die or a multiple-die approach and in the future 3D, is increasing by orders of magnitude.
Browne: With 100 times the number of elements you can’t just extend the methodologies we use today. You have to define the interactions so you can abstract this. You can’t manage this many power domains when the use models are different for all the users. There may be 200 things you’re turning on and off to reduce leakage and increase battery life. To date, most people haven’t done that. In the rush to get to production people want to know if it runs Android or Angry Birds, not whether you’ve done all the power management stuff up front. We’re back to the speed of execution in getting it almost right and being early.
Rajendiran: That’s correct. Verizon, after years of rumors, finally launched the iPhone. But as they got near to release they said it cannot do multitasking. Who was asleep at the wheel? Then the next day they had a software fix to enable that. Why didn’t they think about it ahead of time? With all these complications we should really partition who does what.
Browne: Yes, it’s a system problem.
Rajendiran: But it’s something people could have easily thought out ahead of time. We need to define the components that need to be addressed and give it to the people who can address it. If you take a processor and optimize it for a set of libraries vs. another set of libraries, for the same performance level, one might take a third of the power of the other one. But who should tell you that? Should it be the company that makes the processor or the company that builds the SoC?

LPE: But increasingly you’re not building the chip. You’re integrating parts.
Throndson: You can see people racing ahead of each other, depending on the pieces you’re considering. Part of it is just a matter of getting to market early with a solution. But in terms of parallel hardware, it’s still way out in front of parallel software. Even with power, part of the answer is going back to better utilize the hardware that’s already there, whether it’s the processor itself or at the larger system level. It’s very difficult to optimize and deliver every component that goes into these systems today.

LPE: From the network-on-chip perspective, will these chips be running at the same node and power or will there be an array of nodes, power and legacy technologies?
Janac: You’re going to be dealing with multiple processes and legacy applications. It doesn’t make sense to put analog IP on a 16nm design. You will have to use multiple die using a system-in-package approach where the digital part of the system is running at the latest nodes optimized for low power and cost and the analog stuff is running on trailing-edge processes where the IP is available.
Browne: We’re building a system using building blocks, and good enough wins if it’s early enough. The more you re-use, theoretically, the quicker you can get there. But the real challenge is how you better enable mix and match in the software area.

LPE: And that ‘good enough’ is also tested well enough?
Browne: Good enough has programmability. The fabric allows reprogramming. We think it’s important to be able to do things in parallel. If you can get enough of them done simultaneously, even if they’re running slower, then you don’t need buffers to manage those serial events and you have less logic and less wires and slower transistors in the linear area of design. That also means there is less leakage.

LPE: Will the tools be able to deal with this kind of structure?
Baker: Re-use has been around for about 15 years. So what’s preventing the re-use? A lot of that scaling and functionality is available today. It’s not a new challenge. The challenge we face is that re-use isn’t happening. We’re redesigning these components with each iteration.
Janac: Once you get past RTL the tools are horizontal. The chain of synthesis, place and route, verification and DFM are applicable to that entire system. Above RTL it’s like the silos of IP. Those tools are not addressing that. The MIPS and ARM processors each have their own tools. Arteris’ NoC has its own tools. You wind up with horizontal silos where the IPs are tied to the tools. Only when they reach RTL do they hit the Magma, Mentor, Synopsys and Cadence tools. There is no horizontal toolset that can handle all of these IPs at the architectural level.
Rajendiran: There’s no reason to keep up with Moore’s Law for things that have already been certified and verified. In the old days we were following it. When Moore came up with that law he wasn’t talking about cost. He was talking about transistors. At that time you could do a chip for $50,000. That’s not the case anymore. People are slowly coming to the realization that if you have a chip working, why bother re-doing all of it? You can put software on it, you can even re-do it on the latest process, and use an interposer to make it work. So 90% of the chip is already validated. You add new software and you get the chip out sooner.
Browne: You also cover more markets, which adds more complexity to the definition. The requirements are different for a smart phone and a tablet computer.

LPE: But some of the functionality may be the same between a smart phone and a set-top box, right?
Browne: Yes, and that’s why the big companies have more data points. They know which subsystems can be re-used. When you’re doing audio on these devices everything works. When you add more cores or video, it’s different. The guys with a bunch of technology in-house just need to add more things out of what they already have.

LPE: How many of these billion-gate designs will be on 2D structures vs. 2.5D or 3D?
Rajendiran: With 3D, the problem is more on the manufacturing side. When you drill a hole there are problems. It’s just a matter of time before full 3D works.
Browne: The fabless community is huge. There are $3 billion fabless companies that have very expensive product portfolios. There are also startups that build similar point devices to try to go after those markets. The difference is the big guys get to run more experiments. The little guy only has one.
Janac: The answer depends on what you’re trying to do. If you’re building a unified chip that fulfills a unique function, throwing it on 16nm process makes sense. If you’re mixing functions that are mixed signal, analog, RF or legacy it makes sense to put it on more die. But fundamentally the mixed-die approach is more expensive than trying to put it all on a single die in 2D, assuming you can use one process and the IP is all packaged correctly.

LPE: How many derivative chips do you need to get these days to make it economically feasible?
Browne: At 28nm the cost is about $80 million. How are you going to get that back?
Janac: People who make wireless chips are spinning them off into automotive and home gateways, so you wind up with seven to 10 derivatives for a successful platform.
Browne: In some cases a subsystem is re-used, in others it’s the same chip.
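The economics behind that answer can be sketched as simple amortization of the platform cost across derivatives. Only the $80 million figure and the range of seven to 10 derivatives come from the discussion; the shipment volume per derivative is a hypothetical number, and the sketch ignores the incremental cost of developing each derivative.

```python
def nre_per_unit(nre_dollars, derivatives, units_per_derivative):
    """Platform development cost spread evenly over every unit shipped."""
    return nre_dollars / (derivatives * units_per_derivative)


NRE = 80e6         # platform cost at 28nm quoted in the discussion
UNITS = 5_000_000  # hypothetical lifetime volume per derivative

for n in (1, 7, 10):
    print(f"{n:2d} derivative(s): ${nre_per_unit(NRE, n, UNITS):.2f} of NRE per unit")
```

Under these assumptions a single chip would have to carry $16 of development cost per unit, while ten derivatives bring that down to $1.60.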

Widening The Channels

Thursday, March 17th, 2011

By Ed Sperling
Wide I/O—both as a specific memory standard and as a generic approach for on-chip networking—has been looked at for the past couple of chip generations as a way of improving SoC performance. Increasingly, it also is being used as a key strategy for reducing energy consumption.

Wide I/O refers to a number of different approaches in on-chip networking, ranging from through-silicon vias in 3D stacks to interposers in 2.5D stacking. It also refers to a standard for memory communication being developed by JEDEC, as well as more dedicated channels for signals. In all cases, the added benefit is a reduction in power needed to drive a signal.

The tradeoff typically is between serial I/O and wide I/O. Serial I/O is simpler to design and works over longer distances, but it is far less power efficient. Wide I/O, in contrast, is higher bandwidth with big power savings—Samsung, for example, estimates its new 1Gbit mobile DRAM based on a 50nm process consumes 87% less power—but the technology is also more complicated to use. And in most cases, it’s also more costly.

Eliminating complexity while adding more
The concept of bigger pipes has always been a last resort for chip architects. It’s well known that shortening the distance a signal travels and reducing the resistance can drive down the amount of power needed for a signal. Reducing the overhead of serialization and deserialization can cut the power even further. But ironically, it has taken an explosion in SoC complexity for chip architects to seriously consider simplifying signal paths.

“We always go through this pendulum swing of what’s the optimal physical implementation vs. what’s the simplest way to do it even if it costs more silicon,” said Steve Roddy, vice president of marketing and business development at Tensilica. “So you can do things with 128 wires using wide I/O, or you can do it with a lot fewer using serialized I/O. The serialized I/O requires deserialization, which costs power. With wide I/O, which could simply be a lot of wires connected to the next block, you can lower the frequency and widen the channel.”
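The width-versus-frequency tradeoff Roddy describes follows from bandwidth being roughly the number of wires times the data rate per wire. The sketch below uses a hypothetical target bandwidth and hypothetical channel widths, and it ignores encoding overhead and the power cost of the serializer and deserializer circuits themselves.

```python
def per_wire_rate_mbps(target_bandwidth_mbps, wires):
    """Data rate each wire must sustain to reach the target bandwidth."""
    if wires <= 0:
        raise ValueError("wires must be positive")
    return target_bandwidth_mbps / wires


TARGET_MBPS = 12_800  # hypothetical aggregate bandwidth requirement

for wires in (4, 128, 512):
    rate = per_wire_rate_mbps(TARGET_MBPS, wires)
    print(f"{wires:3d} wires -> {rate:7.1f} Mbit/s per wire")
```

With these numbers a 4-wire serial link has to run each wire at 3,200 Mbit/s, while a 512-wire channel needs only 25 Mbit/s per wire, which is what allows the lower clock frequency mentioned in the quote.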

In a 2.5D stack, that extra silicon is easier to justify because it doesn’t add significantly to the overall footprint. In a system-in-package or package-on-package it may involve an interposer, which is another piece of silicon. It also can involve a through-silicon via in a 3D stack, which is wide enough to avoid any congestion.

“With a TSV you don’t need a standard I/O, which includes the I/O circuitry, patch and bond wire,” said Tom Quan, deputy director of design methodology and service marketing at TSMC. “So you get rid of all the I/O circuitry, and you have the same area, power and current. That results in a tremendous power savings. You also get a big boost in timing. And if you use an interposer, that’s silicon so it has the same resistance and capacitance of a standard IC. You can simulate them both together and get a predictable result.”

Eliminating bottlenecks
There are many good reasons for using wider pipes. One is that multicore and multiprocessor implementations generally are inefficient. The whole idea behind these implementations was that software would be able to run across multiple cores and multiple processors. That didn’t work out as planned, due to the inability to parallelize many applications, but cores were still designed to share the same memory.

That’s inefficient from a performance and a power perspective. Cores that are not in use should be turned off or powered way down. Moreover, when they need to connect to memory it should be along a clear path with as little congestion as possible and over the shortest distance possible.

“For some years to come we’re going to be seeing systems in package with interposers as the ideal solution,” said Joe Sawicki, vice president and general manager of Mentor Graphics’ Design-To-Silicon Division. “That will involve a lot faster interconnects, mostly to memory, and potentially to homogeneous logic. One of our customers was developing a digital chip and needed Bluetooth. They did it in a digital IC and they also did it in a SiP. The SiP destroyed the SoC in performance and power.”

But the question also is at what cost. While 2.5D approaches are relatively straightforward, the interposer does add some cost and the TSV can add even more.

“We are pursuing full 3D and so are most of the people in the phone business, primarily because of the form factor and cost,” said Riko Radojcic, director of engineering at Qualcomm. “If you think about an interposer, you’re adding another die to the cost. Conceptually an interposer is an elegant solution and it works fine for someone who sells a product for $100. If you throw in a $1 interposer it’s no big deal. But if you’re making a $5 die and you throw in an interposer, it is a big deal.”

The same is true of through-silicon vias, although the ultimate advantages of this approach are expected to become more significant over time.

“TSV is expensive but is a good way of meeting the form factor,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog and MSIP Solutions Group. “You need to optimize for both low power and low cost packages. It’s like buying a $50k hybrid car that gives you 32mpg compared to a $22k 1.2L, 3-cylinder petrol engine car that gives you 50mpg. Everyone is excited about the hybrid car.”

Optimizing the signals
Behind the hubbub about the I/O technology is another often overlooked piece of the equation. The move to multiple processors and multiple cores was done largely as a knee-jerk response to the end of classical scaling at 90nm. What has happened since then is a much more measured response to how to use these cores more effectively, which requires much more granularity in the design process. Not all cores need to be on an ARM or MIPS processor, for example, and not all of them need to be in one place on an SoC—or even on the same die of a SiP or 3D stack.

In addition, not all of those cores or processors need to be the same size or run the same software.

“In addition to wide I/O there are dedicated point-to-point connections to relieve the system congestion,” said Tensilica’s Roddy. “Those can include general purpose memory and processor. When the system architect knows beforehand what’s going to be in the system they can add those connections up front. So you may have a video decoder and buffer and an audio decoder using separate memories, and those may change depending on whether they end up in a cell phone or a set-top box. But there are some things you don’t know at design time and you need the ability to generate system-specific interconnects, which is what’s being sold by companies like Arteris and Sonics.”

And finally, there is a simple mathematical principle behind the push to reduce power.

“The longer a signal has to travel, the more power it takes,” said Qi Wang, technical marketing group marketing director for Cadence Solutions Marketing. “A lot of issues in design come down to power. If you put the memory outside the chip, that takes power. If you want to speed up performance, that takes power.”
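One first-order way to see the principle: wire capacitance grows roughly in proportion to length, and the energy dissipated per signal transition is about one half of C times V squared. The capacitance per millimeter and the supply voltage below are assumed round numbers for illustration, and the sketch ignores wire resistance, repeaters and off-chip driver overhead.

```python
def wire_energy_pj(length_mm, cap_pf_per_mm=0.2, voltage=1.0):
    """Energy per transition in picojoules, E = 0.5 * C * V^2,
    with wire capacitance assumed proportional to length."""
    capacitance_pf = cap_pf_per_mm * length_mm
    return 0.5 * capacitance_pf * voltage ** 2


for length in (1, 5, 10):
    print(f"{length:2d} mm wire: {wire_energy_pj(length):.2f} pJ per transition")
```

Under this linear assumption, a signal that travels ten times farther costs ten times the switching energy, before counting any extra drivers a long or off-chip path needs.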

Bigger pipes over shorter distances can help solve that problem, and it’s a solution that is beginning to garner much more attention these days.

Experts At The Table: Billion-Gate Design Challenges

Thursday, March 17th, 2011

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: What are the big issues we need to contend with in billion-gate designs?
Rajendiran: Billion-gate designs are no longer a fantasy. We can do that at 28nm with a 20 x 20 mm chip. But just to put this in perspective, when we first sent a man to the moon they had three computers. The power and the memory those three had together was less than we have in a phone today. So the question you have to ask is are you really putting that to good use? And from a business perspective, will it work when it comes out and who can help across the business value chain?
Baker: We’re approaching billion-gate designs in the GPU or microprocessor area. In the SoC area, we’re approaching about 100 million gates. In the next generation, we’ll see SoCs with quad cores. Beyond that, there will need to be some very significant changes in what kinds of applications we can apply those to and how we’re going to deal with the power aspects. These will most likely be in the mobile market and we’re going to have to deal with system-level issues like verification, battery life, and power. From an EDA perspective we’re on track for capacity and for some of the turnaround time, but power will need some of the focus.
Throndson: Process migration hasn’t continued to scale forward. We hit a performance wall years ago. Power hasn’t scaled, either, as we reached some of the smaller geometries. Area is the one piece that is scaling better, which enables these large numbers of gates. The keys here are systems integration and multicore processing horsepower.
Browne: When you look at design costs for billion-gate designs you have to look at the markets that are going to drive them. The mobile market has enough volume to handle the cost of these types of designs. It also has a lot of parallelism and concurrency because there is a lot of functionality, and there are a lot of different use scenarios. Traditional EDA is scaling so it can take advantage of this—traditional designs partitioned at a chip boundary in a way that fits well with the system architecture. That’s probably where 80% of us will see business opportunities. The other 20% is where you take a design and partition it across two chips. Their bigger challenge is on the tool and the architecture side and the ability of semiconductor and system companies to manage that level of complexity. When you scale to four or eight cores, there’s a huge amount of parallelism and on-chip memory. The issue we see is how you get that right, and today the solution is a lot of subsystem design. LTE radios are a good example. We’re going to replace GSM radios with LTE radios. They’re going to be 15mm of area and have a half-dozen DSP cores, but it’s going to be a standalone system that allows you to do verification, have a known good block, and which is characterized with the others. But you can’t do this as a billion gates at the top level.
Janac: What I have in my house isn’t a personal computer. My phone is a personal computer, and it will have everything I need in terms of data, family photos, passwords and payment systems. It’s more like a supercomputer and it’s going to be the driver for the billion-gate design. You’ll need storage and the computing power to make this a true PC. There are four criteria for this. The first is processing power. We’re going to have to go to many cores, so you’ll need cache coherency to utilize those cores from a programming perspective. Another key is integration. How do you bring these cities of silicon together, which is where the communication system for the SoC becomes critical? You also need partitioning. As you build more and more functions, those functions have different dynamics. The modem has to go through SoC evaluation, so it’s on an 18-to-24 month cycle, whereas the efficient digital SoC people are going to be on an annual cycle. You have to decide whether you’re going to put it on one die or multiple dies, whether you can stack the functions, and whether you can mix processes in the same dies. The partitioning and the support for the partitioning are going to have to be there. The last part involves the cost of the hardware and software. The hardware cost has been increasing slowly but the software has been increasing rapidly. So how can you use the hardware and the parameters in the hardware to lower the cost of embedded software, if not the operating system?

LPE: Will an increase in granularity in designs, in terms of various core sizes, wider I/O and multiple cores and processors, affect how we build these devices?
Janac: We’re going to have tremendous power, but we’re not going to be able to afford to keep it all on. When you’re doing graphics the GPU will be on and the rest of it needs to be shut off. For audio it will be the same. You need to be able to manage turning on and off of this functionality. And in terms of 3D silicon, some of the high-power parts of the chip such as RF and some of the modems probably need to be on a different die and connected through wide I/O and TSVs (through-silicon vias). These things will need very intelligent and capable power architectures. While you have more transistors you’re still dealing with the same power budgets.

LPE: Won’t it be even tighter budgets? In 3D stacks, the dies are actually thinner?
Browne: The thermals are better in those packages, though. Even though the dies are thinner, there is a much better coefficient with the bonding. But it’s still a problem.
Throndson: But the power source is not scaling with the demands.
Browne: We’re seeing designs today with a dozen to 100 power domains. Those are at 40nm. We have customers starting 14nm designs now. You’re going to have to move to abstractions. There are 1,000 voltage domains. Somebody will have to have a product that generates the HAL (hardware abstraction layer) of software. We generate RTL. Generating RTL and C code are not that different. That’s where you’re going to see a lot of growth in the supply chain.
Rajendiran: If you look at 130nm, we used to have one type of transistor. Now we have multiple types of transistors and different process flavors, which add a level of complexity. You now have a whole bunch of different libraries, depending on which type of transistor you use. That’s an opportunity and a challenge. How are you going to pick and choose your implementation? Then you throw in a billion transistors, and you’re talking about putting it into a single SoC. It’s going to cost a lot of money and you don’t even know if you’re taking the right path to optimize power, performance and the market. And most of it is driven by consumer markets where each person will use a device differently. What you put on the chip affects battery, performance and even leakage. There are great opportunities, but it’s also more complex. It comes down to who can you partner with for the software, for planning the product, and for implementing the chip in hardware. And it really needs to be tied together so you hit the product introduction times.

Billion-Gate Chips

Wednesday, March 16th, 2011

Low-Power Engineering examines hurdles ranging from power to cost in billion-gate IC designs with Charlie Janac, president and CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma.

Experts At The Table: Concurrent Design

Friday, February 25th, 2011

Low-Power Engineering sat down with Marco Brambilla, ASIC design manager at STMicroelectronics; Charlie Janac, president and CEO of Arteris; Mike Gianfagna, vice president of marketing at Atrenta, and Javier DeLaCruz, director of semiconductor packaging at eSilicon. What follows are excerpts of that discussion.

LPE: Is concurrent design strategic—meaning is it done at the architectural level—or is it tactical across all phases of the design?
Gianfagna: You need both. That’s the bad news.
Brambilla: The tactical portion of it is possible today, but I have no idea how to do it strategically. Nobody is writing machine code anymore. It’s not efficient. If someone wrote Windows in machine code it would be a 1 megabyte executable instead of a 1 terabyte executable.
Janac: It would never get done. At Cadence for years we were trying to catch Calma, which was the leader in layout. They had 4 megabytes of RAM and they were running on a 16-bit minicomputer. And they were more efficient until the Sun 260 generation, which had 256 megabytes of memory, but Calma could never get off the Eclipse. You’re trading off use and configurability vs. time to market. If your cost is too high and you’re too inefficient you will not be competitive. On the other hand if you’re too efficient and make the last optimization of hardware you’ll never get done and you’ll lose the market.

LPE: Going forward, if time to market and standardized IP are essential, do we have the expertise to do concurrent design?
Brambilla: With different IPs, you have the problem of porting. You may have a piece of IP that works beautifully on a 65nm process. TSMC’s process will not be that different from ST’s, but you still have to port it to make it work. That’s a problem, because you have to face all the implementation steps. Today we don’t have the tactical portion done well enough. People need to know about certain coding styles or electromigration issues. That’s the tactical portion. The strategic portion is what you can do so people don’t have to be concerned about it. If we need to distribute a bus, we have to almost do the buffers by hand. There will always be certain areas where you need the expertise of people who have done it. But I’d rather have someone who understands what areas need to be addressed rather than have to deal with every portion of the design.
Janac: When you want to synthesize the network of a bus you don’t want to do it by hand.
Brambilla: Or if I know that is a peculiarity, I can deal with that, as well.
Janac: But if you look at the strategy of Synopsys, they’re on the right track. They are packaging the IP, and someday they’ll package it with their tools so they wind up with a system where you can do that kind of analysis at the architectural level. But it’s going to take a combination of tools and IP. When you’re at SystemC level, how good is that analysis going to be? It won’t be very good unless it’s dealing with the USB 3.0 model or the network-on-chip model or the actual ARM model. You’d better have those models and they’d better come directly from the IP.
Brambilla: If I go to a vendor and they have the 32 L and the 32 G, which one do I choose? I have to make a decision at that level because if I choose the wrong node I might not be able to mix them. There will be vendors that will offer an L and G process and others that will offer an LG. I can kill myself with leakage or performance.
Gianfagna: We’re describing an interesting change. Picture a funnel, which is wide at the top and narrow at the bottom. In the bottom half, concurrent design is so hard in terms of balancing the physical effects, the variability and the integration effects that there are a small number of companies capable of dealing with that and build a chip that yields and works. But how does that small group of companies serve what’s above them in the funnel? The answer is some number of architectures that work, and then add in enough programmability and variability. The bottom of the funnel is a small number of companies that understand how to go from gate architecture to silicon. What’s above the funnel today—a large number of fabless semiconductor companies—they go away. What those people do then is to figure out how to add their own customization, whether it’s in the form of FPGA programming files or interesting ways to build a 3D stack and software. These increasingly will be software companies. The hardware will be an assumed thing.
Janac: The problem is that the top of the funnel is feeding junk into the bottom of the funnel. How do you get that knowledge from those experts into the front end of the cycle so you don’t get junk?
DeLaCruz: Yes, by the time they get it it’s too difficult to change anything. There are tradeoffs we make on IP selection. Sometimes high-end IP has a pre-set bump assignment on it, such as SerDes. That will dictate what stack you can use. Maybe going to a different performance can change the economics. There’s no way an EDA company can figure out what those tradeoffs are going to be. Or if you make other tradeoffs like increasing the amount of capacitance on a chip so you don’t have to put that capacitance in the package or on the PCB. There’s no one tool that considers all those different things. Or if I make this one tradeoff my power will go down, but I may be going to a process that has more leakage. These are the kinds of tradeoffs you need to make very early on. Do you go system on chip, network on chip or system in package? There are different tradeoffs. It’s a combination of expertise and resources.
Brambilla: I totally agree. To solve that you don’t need the packaging expert. You need someone with a vision of the final device. When we start we need to know this thing has to go on a PCB. If I use a package that’s too small it may save you $1 on the package and cost you $20 more on the board.
Gianfagna: The system-level engineers and architects are the guys at the top of the funnel. I would argue those people can’t worry about all the issues we’re talking about. They’ve picked a package and a set of silicon and a set of programmability, and now they’re trying to figure out how to use that effectively. That might be programming an FPGA layer in a stack. It might be choosing a different memory because there are a few that are pre-qualified. And then there’s a lot of application software that needs to be run on this. Those are the decisions that are made at the top of the funnel. The minute you start factoring in the technology node, you can’t get there.
Brambilla: There are limits to that.
Janac: People are starting to figure out the software and functionality use cases, and then they’re starting to figure out the hardware that supports those use cases.

Experts At The Table: Concurrent Design

Friday, February 18th, 2011

Low-Power Engineering sat down with Marco Brambilla, ASIC design manager at STMicroelectronics; Charlie Janac, president and CEO of Arteris; Mike Gianfagna, vice president of marketing at Atrenta, and Javier DeLaCruz, director of semiconductor packaging at eSilicon. What follows are excerpts of that discussion.

LPE: Is there cross-training going on to allow for concurrent design?
Brambilla: Yes, but the first step is that you need the teams to know what’s available. That includes training the managers and having good internal discussions and distribution of knowledge. At the initial phases you need the packaging guys in. You need the test guys in because if you put in an embedded DRAM and it takes three minutes to test, that’s not an option. We have the packaging, test, the back end and all the functions.
Janac: Do you have people in ST that are responsible for the overall methodology?
Brambilla: Yes. It’s a little more bottom-up, though. We know what kind of ASIC we do. Every division in ST has a more functional approach because we do it all. So we have central R&D that goes with a reference flow and tools. And inside the divisions we have dedicated people who think about what is the best flow to implement what we do. But design teams no longer have time to think about why they should invent the next clock distribution. I want someone to tell me that with this kind of complexity you go mesh.
DeLaCruz: Are you doing the same number of tapeouts now as in the past?
Brambilla: No.
DeLaCruz: So here’s the problem. No one is doing nearly as many tapeouts now because what used to be $100,000 for a mask set is now $3 million.
Brambilla: That’s not the big issue. The big guys with 60% or 70% market share don’t care about the cost of a mask set. The problem is productivity. You need 4x productivity at each new node. I had an ASIC at 65nm with 25 sub-chips, and every piece of this thing was different. So we will need 100 sub-chips for the next version at 22nm. It’s not the $1 million or $3 million for the mask sets. It’s the $40 million or $50 million to develop the ASIC.
DeLaCruz: But there’s also the issue of having all these high-end specialists around. If you’re going from 25 chips a year to 4 chips a year, then you have all these people who are going to be intensively involved in the chip for five or six weeks. You can’t have that. It’s going to drive the need for cross training and concurrent design. You can’t align things vertically anymore. You need broad levels of expertise.
Brambilla: I hear what you’re saying, but the big issue we’re seeing is productivity. We don’t have people idle because they’re doing fewer chips. To do those four chips today I need the same amount of people, plus some more, that I needed to do 16 chips at 65nm.

LPE: Where are the biggest problems in concurrent design? Is it the software and hardware, verification or something else?
Janac: It’s basically about wires and gates. The gates scale but the wires don’t, so you need a better way of managing the wires and assembling the SoC. You can’t afford to re-do all of them in the next generation, so one of the big issues of IP re-use is how you support the protocols those subsystems communicate in, how you get them integrated easier into the next generation of the chip. It all comes down to architectural improvements to get to the next generation.
Brambilla: The next time you do a chip you need more bandwidth. Your Verilog is probably useless—or at least it’s not efficient. It was efficient when you designed it in that node. If you change the frequency there’s a problem.
Gianfagna: You’d need to change the microarchitecture, which is hard to do with Verilog.
Brambilla: Yes, so you’re redesigning it. To me there is a big issue every time you change from software to hardware, which is co-development. When you go from RTL to the physical world it’s more co-development. When you go from silicon to the package that’s more co-development. It used to be more than just separate islands. They were like separate continents. But the infrastructure today doesn’t help you as much as you need to increase productivity. I would need to move people just to describe the algorithms and have some tool generate the RTL, but that tool should generate the RTL knowing there are physical constructs. The RTL should be able to predict power and congestion issues. Today we have problems of power integrity because at 32/28nm and 22nm the density of the gates cannot be supported by the power grid.
DeLaCruz: What if you use two pieces of silicon instead of one? How do you deal with your structure then?
Brambilla: You can only handle that at the top level. This is something that requires training. It may make sense to do 5mm square on both ASICs and create more efficient communication between them. It costs more, but it may shave three or four months off the development time.
Gianfagna: What you’re describing is the need for better methodology with a globalized company and more localized infrastructure to use those resources. ST is a big enough company to have the resources to make that work. But what about the guys who don’t have that luxury? There are a lot of fairly large fabless companies that don’t have infrastructure to allow that to happen. How are they going to get to this new level of integration and new way of working? That’s a big challenge.
Brambilla: I know of five companies today that do ASIC services through the fab.
DeLaCruz: But ASIC services can be another island, unless they’re totally integrated with their supply chain.
Brambilla: It’s a huge problem.
DeLaCruz: Historically, there were design services for chip layout and packaging services. You can’t isolate those. It’s easier to get people to overlap in the same company. It’s really difficult to get people to overlap in different companies.
Brambilla: That’s why ST decided not to go fabless. At 20nm, if you don’t control the process how are you going to tune your back-end flow? How much does it cost to run silicon at a third-party fab to verify if your mesh clock tree or H-tree work?
Janac: But don’t the big guys have process teams? Guys like Qualcomm are basically running their own process.
Gianfagna: Yes, and if you look at their org chart you’d swear they own a fab.
Janac: But what you’re saying is that’s not the case with medium-sized companies, right?
Gianfagna: Yes, there are a lot of those companies.

LPE: In 3D stacking you may have a platform developed by a large IDM bolted onto something else. Does that work with the existing players and infrastructure, or do we need to re-think the design process?
Janac: If the bridges are well defined, you can make that work. You can envision an analog die in 90nm and another die in 22nm going to a memory. As long as the way it comes together is well defined, it should work. I don’t see another choice. Otherwise these mid-size companies go to FPGAs, or they become IP providers, or they die.

LPE: What you’re talking about is concurrent design across an ecosystem, not just within a single company, with a focus on everything from interoperability to power.
Janac: That’s right.

LPE: But it’s never been effectively done.
Janac: Companies like ARM can organize an ecosystem across multiple generations of products and multiple companies. We need to see more of that. If someone defines a 3D silicon methodology it can work. There aren’t other choices. A small guy cannot afford to make a 22nm chip. They may be able to go to a company like eSilicon, but there won’t be enough capital around the small and medium-sized guys to go to the latest nodes.
Brambilla: If you’re a startup, you need to prove your technology. If you’re lucky you can prove it at 90nm and then you hope you can be bought. If you’re trying to prove it at 20nm then your best bet is to be part of another company’s mask set. If you’re very small, you might have to wait until there are enough contributors to that mask set. It is true that you also need the ecosystem outside, and you will need some way of describing that—almost a super version of IP-XACT. But inside the ASIC we need to start solving the need of automating the tradeoff analysis. I want people to stop writing Verilog and algorithms, and then use a tool chain that allows them to converge toward silicon in a way that avoids all the issues you deal with today.
Gianfagna: You’re describing a top-down design methodology that comprehends hardware-software co-design, partitioning and physical implementation issues, and which balances it from the algorithm all the way through. That’s a great vision. But an alternative vision is that it’s too hard to do that. What if you come up with a hardware-based design flow that targets a large market with the ability for customization in software, and then you build a chip to address that? Now the co-design problem becomes, ‘Which architecture is most compatible with my software?’ I can just use that chip and customize the software. We’ve been predicting this for a long time, namely that all the differentiation becomes software.
Brambilla: We do have some progress in that direction. I see it as an intelligent way of attacking certain markets. I don’t see it in the switching market or cell phones.
Marvell designs a chip set, throws functions into a chip set, and they give it to Nokia or whoever they like.
Janac: Their volume is just barely enough to stay in that business.
Gianfagna: MediaTek has a similar strategy and they’re selling into the Chinese market.
Janac: But their stuff is highly optimized.
Gianfagna: That’s true. But the cell-phone market and the smart-phone application are very similar. We have 3G, 4G and a way to deliver the video. We have Wi-Fi. That all gets standardized. So the way that ‘Vendor A’ differentiates itself from ‘Vendor B’ is the software interface and maybe some clever stuff with touch screens. It’s more mechanical.
Brambilla: In that space I agree with you.
Janac: I don’t. One of the things that’s happening is we are in a computing architecture switch, from PC server to the cloud. What people have gotten wrong is those edge devices will need to become extremely sophisticated. The cloud will not always be available and you will need that sophistication to take advantage of the information that’s in the cloud. So those devices are going to go through a huge amount of innovation and become way more powerful than today. It may take several years but it will happen.
DeLaCruz: If you’re very highly standardized, you can probably program software to make some tradeoffs for you. When you’re dealing with a wider range of chips with analog content and some interface into memory you’re dealing with very different problems. I don’t think I would trust an EDA tool vendor to think of all these different options. They’ll implement certain things, but they’re going to be behind the curve by at least a year.
Janac: With the physical layout the tools were driven by design rules. But at the architectural level you really need IP. Without IP the tools do not have any reality. We’re going to see a combination of tools and IP at the architectural level. Without IP, ESL is a $50 million market. On the other hand, if you have the tools and the IP you can generate a lot of value. ARM cores will come with tools. Our interconnect will come with tools. The memory controllers will have tools. You’re going to see a unification of IP and EDA at the architectural level.
DeLaCruz: At that point in time the only options you’re presenting yourself with are the ones the IP vendors are giving you. It’s limited. But stepping back and taking a higher-level view, there may be a different way of looking at this problem.
Janac: The economics are forcing each company to build its own IP that’s core to its value. Otherwise it’s too hard to be competitive across all IP and 60 subsystems. You have to pick from a menu of IP to build those parts of the chip that economics don’t allow you to build yourself.

Concurrent Design

Friday, February 11th, 2011

The idea of developing software and hardware simultaneously isn’t new, but it has taken on renewed urgency in IC design because of growing complexity, including power and proximity issues. Low-Power Engineering captures the perspective of executives at four companies working in this market: Marco Brambilla of STMicroelectronics; Charlie Janac of Arteris; Mike Gianfagna of Atrenta, and Javier DeLaCruz of eSilicon.