Experts At The Table: Changing Design
Friday, February 3rd, 2012
By Ed Sperling
System-Level Design sat down to discuss the changing design landscape with Juan Rey, senior director of engineering for Calibre in Mentor Graphics’ Design to Silicon Division; Michael McNamara, vice president and general manager of Cadence’s System-Level Division; Yervant Zorian, chief architect at Synopsys; Prasad Subramaniam, vice president of design technology at eSilicon; and Ravi Varadarajan, an Atrenta fellow. What follows are excerpts of that conversation.
SLD: As we go up in abstraction, will transaction-level modeling be enough? It doesn’t address software or the physical effects, does it?
Zorian: That’s correct. It’s a higher-level model. But it’s good for initial exploration.
Subramaniam: You need to be able to explore all the different multidimensional aspects of the problem—power, performance and area. Being able to look at one aspect is good, but it’s not sufficient. That’s the missing link here.
McNamara: That’s where you get to things like parasitic extraction. You need to lift up the power information from the RTL and the gate-level simulation and project that up to the TLM so you can run simulations there and boot software on it and get a sense of your power. As you switch to a different manufacturer, that also has an effect on the leakage power. You’d like to be able to project that information up, too. That’s crossing nodes with abandon. You’re going from a choice of one transistor to another and projecting that onto system-level power. But for us to tame the complexity we have to have a system that lets us do that. I may not care how fast it works. I may just need more than a day of life on a battery. I may need it to last the flight from here to India.
Varadarajan: The abstraction only works when there is correlation. It enables you to do design exploration. If you make all these choices and you can’t trust and implement those choices when you go to the back end then it’s not useful. The correlation is key as you go down in implementation. In a typical SoC you have IPs and complex bus fabrics switching up these IPs. What designers do is transform the SoC from a hierarchy where they collect IPs and push them to the bus fabrics into larger subsystems that get implemented by design teams. That is a difficult task. One of our customers complained there was a timing signal that went from the lower left corner of the chip to the top and all the way back down. That’s something that should have been addressed up front. When you go down into the implementation of a very complex IP that is coming from a third party, the implementation is going to be a function of how you break down the data flow, how you organize the memory and the floor plan of the entire IP. Being able to capture that and make high-value decisions, like whether you can push it from 400MHz to 500MHz, is going to be a function of how the floor plan is going to look. You need a predictable way of handling that. Once you have taken care of the global topology, all the physical synthesis tools are commodities. If you organize the data flow, they can all converge to the same timing targets. You need to demystify IP and break it down at the abstract level, and have enough confidence you can achieve your targets, whether that’s timing or condition or power, and then be able to implement that predictably when you go into the back end.
SLD: To get a design out faster you’re decoupling IP from the rest of the design. But to do exploration you need to really understand all the tradeoffs of performance, power and area from a system level. How do we bridge those disparate ideas?
McNamara: There’s a tools aspect to it of being able to extract information so it can be shared. The end customer needs to be able to trade off ‘A’ versus ‘B.’ You can assemble one company’s IP with another company’s IP, but you’re going to be just like someone else. So how do you differentiate?
Subramaniam: This is where system-level tools come into play. Today you can figure out performance and what speed you need to run your system at. But they don’t have the ability to absorb information that is available at the lower level. What we need to do is to input the implementation-related information up into the system-level tools. When the system-level designer is exploring the system bandwidth requirements and how to architect the system and organize the software and hardware partition, they should be cognizant of the implementation-level details at that level. That is a missing link today, and it’s a gap that needs to be addressed.
Rey: There is the issue of certification. You need to understand interactions when IP is placed in a certain area. Look at 3D integration, where you have things in different die that have to work together. Even though the general consensus is this is really a cost equation, as soon as you want to stretch the limits a little bit and go to higher frequency the approach breaks down. More research into implementation and testing will be required to make sure it works. When you go to very high frequencies we’re starting to see some effects that were not important before.
Subramaniam: If you look at memory, there are different kinds of memory interfaces. You can have a wide I/O interface in a 3D stack that is running at a much lower speed but giving you the same bandwidth as a much higher-speed interface that is SerDes-based and has a lower pin count. In a 3D environment, both are applicable. In fact, the wide I/O is more amenable to a 3D environment. One of the things system-level tools should enable is for you to determine which makes more sense for this design. Should you use a SerDes-based high-speed serial interface to access memory or a wide I/O low-speed interface? This is where the power-performance-bandwidth-area tradeoff comes into play.
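The bandwidth equivalence described above can be sketched with simple arithmetic: peak bandwidth is roughly the number of data lanes multiplied by the per-lane data rate. The lane counts and rates below are hypothetical values chosen only so the two options match; they are not taken from the discussion or from any specific standard, and the function name is an assumption for illustration.

```python
def peak_bandwidth_mbps(num_lanes: int, rate_mbps_per_lane: int) -> int:
    """Peak bandwidth in Mb/s = number of data lanes x per-lane rate in Mb/s."""
    return num_lanes * rate_mbps_per_lane


# Hypothetical wide I/O interface: many slow lanes (assumed values).
wide_io_mbps = peak_bandwidth_mbps(num_lanes=512, rate_mbps_per_lane=200)

# Hypothetical SerDes interface: few fast lanes (assumed values).
serdes_mbps = peak_bandwidth_mbps(num_lanes=16, rate_mbps_per_lane=6400)

# Both configurations deliver the same peak bandwidth with very different
# pin counts and signaling speeds.
same_bandwidth = wide_io_mbps == serdes_mbps
print(wide_io_mbps, serdes_mbps, same_bandwidth)
```

With equal peak bandwidth, the choice would come down to the other dimensions mentioned here, such as power, area and pin count, which this sketch does not model.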
McNamara: It’s also the implementation. If it’s displaying video you need a lot of data, but it’s also 24 frames per second so you know exactly what bandwidth is needed for the next frame. Wide I/O could make more sense in that case. If you have a low-latency message and you need to get it quickly and it’s small, both would work. You could send the same message over either channel. But you need to see which one is better for the application. We’ve talked about power-performance-area, but there are many other cost metrics—things like routability. One design may be 10% smaller but the routing guys will kill you. You also have reliability. As geometries get smaller and smaller, maybe there’s another way of implementing something that’s more reliable, particularly if this is going into the back office of a bank. If you care about one of these other dimensions, is there a way to extract that information and project it through the system?
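As a rough illustration of why video bandwidth is predictable, the sustained rate for uncompressed frames can be estimated from the frame size and the frame rate. Only the 24 frames per second figure comes from the discussion; the resolution, the bytes per pixel and the function name are assumptions made for this sketch.

```python
def video_bandwidth_bytes_per_s(width: int, height: int,
                                bytes_per_pixel: int, fps: int) -> int:
    """Sustained bandwidth for uncompressed video: bytes per frame x frames per second."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps


# Assumed 1920x1080 frames at 3 bytes per pixel; 24 fps is from the discussion.
required = video_bandwidth_bytes_per_s(width=1920, height=1080,
                                       bytes_per_pixel=3, fps=24)
print(required)  # 149299200 bytes per second, roughly 149 MB/s
```

Because the rate is fixed by the frame format, the requirement is known ahead of each frame, which is the property that makes a steady, wide, lower-speed channel a plausible fit.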
Subramaniam: You need to define the metrics and have those metrics defined for each one of these subsystems. That’s required for exploration.
Zorian: Exactly. But architecture-level exploration is there already—the prototyping capabilities where you can explore different architecture options, whether it’s 3D or 2D or wide I/O or not wide I/O. You can play with those. We do have virtual prototyping capabilities that allow you to play the game early on—pre-silicon.
SLD: That’s assuming your IP is very well characterized, though. From the big companies like Cadence, Synopsys, Mentor and ARM, that’s going to happen. From other sources, maybe not.
Zorian: When we started IP it was just for re-use. It was a design piece. Today that isn’t the case. There is differentiation between IP providers. I believe in IP completeness. You need to put out a complete package for IP. From a design point of view and from a manufacturing point of view you have to prove it in silicon, you have to have the silicon reports with it, and you have to maintain it internally with built-in self-test and built-in self-repair and debug diagnostics. That makes your IP complete, and it’s a differentiator in the future between one IP provider and another.
Subramaniam: This is where a standard like IP-XACT may come into play. If you have a host of third-party IP providers out there, how can the small IP players enable their IP in the ecosystem? The answer is by using a common standard to characterize the IP and fit their IP into the tools environment. That will enable designers to access that IP in their environment.
McNamara: And you mentioned the subsystem. It’s no longer that I want this PCI Express. I need a compute subsystem that gives me the floating point, the graphics, and so on. There’s IP that’s software for that. Often IP suppliers of these subsystems are giving you demo drivers. They almost work. You’re assembling this device to get to market by Christmas. You figure all you have to do is slap this thing together, put on Android and you’re there. Then you realize the driver doesn’t actually work. Then you integrate it all with the rest of the system and you find out you have to fix the driver again, because while it worked great after you fixed it on a point-to-point link, it doesn’t work as well on a shared bus. There’s also a software component there that has to be tested and proven. When you get to platform-based design, there will be common subsystems and the Linux kernels will already know about these. They’ll already have the device drivers. So then there may be hot IP from another company that gives you 10% more performance with 15% less area, but the Linux kernel doesn’t know about it. That’s now another challenge to the small IP providers. They need Unix developers working for them to deliver all of this and they may not be able to afford them. But how we’ve made these devices better every year is by taming complexity.
SLD: We also have to start dealing with yield issues, right? Multiple known good die could end up in a package or stack as all bad die.
Zorian: It’s not only the dies. It’s also the interconnects. Those TSVs are of a different nature. The defects that can hurt them are different. Our ability to test them at the end is not sufficient because by then you’ve destroyed your whole device. At the stack level you have to test and prove and retest.
SLD: It’s also the interposer, right?
Zorian: Same approach.
SLD: From a designer standpoint, assuming we have subsystems that can plug and play and IP that can be mixed and matched, how do companies differentiate themselves?
McNamara: That’s where having a complete flow is essential. One company did a great memory controller that could handle any device with 15 common memories, and you could use this IP on your device and build something that didn’t require custom memory. What this meant was there was wasted silicon. There were bits available to talk to memories that this device would never talk to. When we have this great system where we have the transformation tools down and the analysis tools up and this way to select IP and put it together, there’s another phase of this optimization that goes across the whole thing. It isn’t just gate-level optimization to clean up a couple bits. You’re looking across the whole stack and realizing the customer is going to run iOS on it and it will never run Android, so maybe there’s something you can get rid of reliably on this design, but keep the IP for another design and delete something else. That’s a differentiator. You need a way to optimize a design.
Rey: There is always the possibility of stretching the technology into limits that are defined in a conservative way by the manufacturer. Yield is one aspect. Another aspect is related to performance. You can see the companies that have an intimate understanding of the technology processes. They can go beyond the recommended rules and the established rules that the foundries have imposed, and it can give them an advantage.
SLD: Can that be done in a disaggregated market, or can it only be done by the large IDMs?
Rey: It can be done in a disaggregated market, and it has been proven by some of the companies we work with. What you need is a certain level of volume so the manufacturers will pay attention to you. Otherwise it’s going to be a lot harder.
Subramaniam: I agree. If you look at the performance of different technology nodes, there’s a significant overlap in performance between neighboring technologies—40nm and 28nm, and 40nm and 65nm. Somebody who makes a smart choice can differentiate their product by implementing it in a cheaper technology versus someone else who uses brute force in a more expensive technology. That’s one way of differentiation. But in spite of there being a lot of re-use, there’s always going to be some aspect of the problem which is unique to the individual—algorithms, software techniques—and those will be designed independently. They will be complementary to the re-usable IP and that will always be there. But when you talk about wasted silicon, that is a by-product of re-use. Not everyone can afford to do a custom design for every design. There will be some customization, but that will be dedicated to a small portion of the overall system.

