By Ed Sperling
System-Level Design sat down to discuss verification with Albert Camilleri, engineering director for Qualcomm’s QCT Division; Jim Kenney, marketing director for Mentor Graphics’ emulation division; George Zafiropoulos, vice president of solutions marketing at Synopsys; Ran Avinun, marketing group director for Cadence’s system design and verification group; Charles Janac, CEO of Arteris; and Luc Burgun, CEO of EVE. What follows are excerpts of that conversation.
SLD: What needs to change when we get into networking the signal around the chip? Is it mindset, is it adding something new to verification, or are we dealing with a whole different problem than we’ve dealt with in the past?
Janac: One of the things that makes verification easier is that the networking techniques can be used to isolate the individual blocks and IPs from the data communication. You can then start to verify with verification IP as a proxy for the traffic. You can verify the communications subsystem in isolation. That makes the integration a lot easier. It gives you the ability to run a bunch of verification tests for the chip communication subsystem by itself. Before, when it was all mixed, it was much more difficult. The verification of the top level gets easier.
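As an illustration of the point about verifying the communications subsystem in isolation, here is a minimal Python sketch. It is not any vendor's flow: `ToyInterconnect`, `traffic_proxy` and `run_isolated_test` are hypothetical names, and the interconnect is reduced to a model that delivers packets to destination ports. A seeded traffic generator stands in for the real IP blocks, and a scoreboard checks that every packet arrives at the right port in order.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: int
    dst: int
    payload: int


class ToyInterconnect:
    """Hypothetical stand-in for the communication subsystem under test."""

    def __init__(self, num_ports):
        self.delivered = {port: [] for port in range(num_ports)}

    def send(self, packet):
        # A real interconnect would route, arbitrate and buffer here.
        self.delivered[packet.dst].append(packet)


def traffic_proxy(num_ports, count, seed=0):
    """Generate reproducible random traffic in place of the real IP blocks."""
    rng = random.Random(seed)
    for _ in range(count):
        yield Packet(rng.randrange(num_ports),
                     rng.randrange(num_ports),
                     rng.getrandbits(32))


def run_isolated_test(num_ports=4, count=1000):
    """Drive the interconnect with proxy traffic and check delivery."""
    noc = ToyInterconnect(num_ports)
    sent = list(traffic_proxy(num_ports, count))
    for packet in sent:
        noc.send(packet)
    # Scoreboard: each port must receive exactly its packets, in send order.
    for port in range(num_ports):
        expected = [p for p in sent if p.dst == port]
        if noc.delivered[port] != expected:
            return False
    return True
```

Because the traffic is seeded, a failing run can be replayed exactly, without any of the real blocks being present.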
SLD: So there appear to be two trends going on here. One is a discrete way of looking at this functionally. The other way is to put it all together into a high-level system approach. But something isn’t working because it still costs too much.
Janac: What you’re dealing with is a race between extreme growth of complexity and functionality and improving verification—emulation that’s able to be used earlier in the design cycle, divide-and-conquer module verification, software-hardware co-simulation ability. Do you make verification easier or keep up with complexity?
Kenney: Just being able to keep up is a win. We’re doing well just by keeping up.
Mentor internally has a culture of hardware and software. Not enough software developers are using the tools we’ve developed, so we aren’t solving the problem for everyone. But internally there’s a lot of cross-pollination between the embedded tool developers and the functional verification tool developers. That’s how we’re trying to solve the hardware-software issue. And now you see power management software being layered on top of that, tying power consumption to various software routines. All of that has to come together to make the system work.
SLD: Will subsystems, which are basically larger blocks, make verification easier or more difficult?
Camilleri: It depends. Re-use is always helpful, and the re-use of a subsystem would be very helpful. But it’s never simple. The demand may be something that isn’t quite what the previous one was. There may be demand for a version that works at a lower tier or a higher tier, or one that does more video. Much as we’d like to re-use more at the system level, we’re more likely to re-use at the core level. But verifying subsystems is the way to go, and hopefully we will develop methodologies that allow us to tolerate smaller changes in subsystems and be able to re-use the bulk of it. A lot of the testbenches can be re-used. But the whole concept is a challenge.
Avinun: If you’re creating verification IP, there’s probably more functionality than you need at the system level. You need it at the block level. We’ve made tremendous progress to scale those and to be able to re-use verification IP for hardware-software verification as well as TLM. But now the question is whether you really want to re-use them, because 90% of what you did doesn’t really require re-use at the system level, where you’re wasting your cycles. And then you have other kinds of tests that you need to run at the system level that you can’t run at the block level, so there is nothing to re-use. And beyond this, when you look at verification IP that is driven by software like Android or operating systems, that will look totally different. So the question is whether you want to re-use everything. As an industry we’re trying to figure out how to automate block-level verification, which verification IP that has been used for blocks should be used for systems, and which other things need to be created. There may be verification IP that addresses H.264 and MPEG vs. PCI. We’re working on this, but it isn’t defined yet.
SLD: How are you defining re-use?
Avinun: From block level to system level, and not so much from one design to another one.
Zafiropoulos: In the abstract, you can’t possibly design everything from scratch on big chips. The challenge is what’s the quality and can it be deployed and how long does it take to be able to absorb somebody else’s design content—and be able to re-deploy it. In the typical SoC today half of the blocks on the top are easily supplied by third parties. That gives the designer the ability to focus on IP that’s differentiating for them. The downside is, how many different people are you going to be buying this IP from? You could put 60 or 70 IP cores on an SoC. If you’re buying that from 20 different vendors, that’s immense. Just absorbing all of this third-party content, and then understanding the quality of that content, becomes very important.
Camilleri: One thing we haven’t touched upon is debug. There’s the verification aspect of this, which is the proactive work we have to do. But then there’s the debug, and this depends on your meaning of re-use. If I interpret the word re-use to include the portability of test as well as design, being able to debug quickly and efficiently and on the best platform for finding a bug is very important to our schedules. That’s why you need a seamless methodology across platforms. If a software person finds a bug, you need to be able to find the root cause as quickly as you can. It’s very difficult to do that if they’re using a completely different set of tools than what we’ve got, and we can’t replicate it to see what’s going on. Being able to leverage my testbenches across all the different platforms is paramount.
Zafiropoulos: But debug for a hardware guy and for a software guy is quite different. On the hardware side, debugging a hardware interconnect has to be done at the protocol level. You can’t look at it using bits and bytes because there’s too much to look at. You have to look at it at the protocol level or at the application-specific level to understand what is really going on.
Camilleri: And you have to root-cause it to the block you’re working on.
Zafiropoulos: It’s almost like there’s a debug stack where you look at it using multiple levels of abstraction.
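To make the idea of moving up a level of abstraction concrete, the sketch below lifts a cycle-by-cycle signal trace into protocol-level transactions. The valid/ready handshake and the field names (`valid`, `ready`, `write`, `addr`, `data`) are assumptions chosen for illustration, not a protocol named in the discussion.

```python
def decode_transactions(samples):
    """Turn per-cycle signal samples into readable transaction records.

    `samples` is a list of dicts, one per clock cycle, using hypothetical
    signal names. A transfer is counted only on cycles where both `valid`
    and `ready` are high.
    """
    transactions = []
    for cycle, s in enumerate(samples):
        if s["valid"] and s["ready"]:
            kind = "WRITE" if s["write"] else "READ"
            transactions.append(
                f"cycle {cycle}: {kind} addr=0x{s['addr']:04x} "
                f"data=0x{s['data']:08x}"
            )
    return transactions


# Hypothetical four-cycle trace: a stalled write, then a read.
trace = [
    {"valid": 1, "ready": 0, "write": 1, "addr": 0x10, "data": 0xAB},
    {"valid": 1, "ready": 1, "write": 1, "addr": 0x10, "data": 0xAB},
    {"valid": 0, "ready": 1, "write": 0, "addr": 0x00, "data": 0x00},
    {"valid": 1, "ready": 1, "write": 0, "addr": 0x20, "data": 0x00},
]
for line in decode_transactions(trace):
    print(line)
```

Four cycles of raw signal values collapse into two transaction records, which is the kind of reduction that makes a large trace readable.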
Kenney: We have a customer who’s helping us solve this problem. If the software guy has a problem here, how does he communicate that to the hardware guy? They came up with an interesting way of dealing with this. They say, ‘go solve this using this particular piece,’ and then we can synchronize where the software failed and what that’s doing in hardware. It allows you to get to the same place quicker.
SLD: Does NoC technology make it easier to communicate between the hardware and software?
Janac: Yes, because the data traffic is carried by the NoC. You can put in things like latency probes, observation probes and statistics collectors that give you information in the software about what is happening in the hardware. You can use it to trap errors. That’s an area that’s just starting up, but the networking technology enables that.
Burgun: That’s a good way to write an abstraction. In the end the designers don’t need to deal with huge waveform files to understand what is going on in the design.
Janac: That information can be processed in the emulation systems much more effectively.
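The latency probes and statistics collectors mentioned above can be pictured with a small Python model. This is a sketch under stated assumptions, not an actual NoC probe interface: `LatencyProbe` and its methods are hypothetical, and cycle counts are passed in by the caller rather than read from hardware.

```python
from collections import defaultdict


class LatencyProbe:
    """Hypothetical probe that records per-route transaction latency."""

    def __init__(self):
        self._in_flight = {}                  # txn_id -> (route, inject cycle)
        self._latencies = defaultdict(list)   # route -> list of latencies

    def on_inject(self, txn_id, route, cycle):
        self._in_flight[txn_id] = (route, cycle)

    def on_deliver(self, txn_id, cycle):
        route, start = self._in_flight.pop(txn_id)
        self._latencies[route].append(cycle - start)

    def summary(self):
        """Per-route statistics: a compact alternative to raw waveforms."""
        return {
            route: {
                "count": len(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
            for route, values in self._latencies.items()
        }

    def outstanding(self):
        """Transactions injected but never delivered: candidates for an error trap."""
        return sorted(self._in_flight)
```

Software can read `summary()` to see how the hardware is behaving, while a non-empty `outstanding()` list at the end of a test flags transactions that never completed.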
Avinun: There are multiple platforms. Some are better to run higher performance. Some are better to run debug. Our philosophy is that the way to solve it is to use the platform for what it was created for. Or you write for one platform and then you have 100% correlation to another platform and do the debug there. There are different methods to help you debug on top of this.
SLD: Do the existing verification methods and tools work in stacked die?
Zafiropoulos: We don’t know for sure, but we have some suspicions. One suspicion is there will be thermal variations. What will the impact on power be if it’s not uniform across the die? We don’t know. But if you look at power, functional verification was perfectly good in the 1 and 0 world. When low-power techniques became prevalent, functional started to go a little more analog, and we’re wondering if there will be an effect from thermal on things like timing. It might be that there are more corner cases to be simulated.
Burgun: 3D will enable more complexity. This is where network-on-chip tools will be beneficial. But the architecture will create constraints and we don’t know what the end result will be. Will it create more issues? We don’t know.
Kenney: From where we sit, functional verification hits a lot more function in the 3D space.
SLD: Doesn’t this redefine what a system is?
Camilleri: Yes, it’s more and more integration. There will be a whole set of electrical issues. Will there be more functionality than we’re already dealing with? We don’t know. Electrical is certainly harder, and there are more physical issues, so there is more strain on the hardware-software interdependency.
SLD: Do we start changing our perception of what is good enough?
Janac: The only solution is divide and conquer. If you add a die that has been completely verified, as long as you engineer the interface correctly then you only have to verify the interface.
Zafiropoulos: But if you can’t verify the full end-to-end functionality of the device it may be different.
Janac: You will have to do system-level verification, but you shouldn’t have to delve down into that die.
Zafiropoulos: Yes, the question is whether you have to do gate-level, transistor-level verification of everything. Hopefully not.
Kenney: If we can equate it to separate chips today, they settle for verifying the chips separately. Maybe they’ll do the die individually and assume they’re going to go together okay. We don’t know yet.