Posts Tagged ‘debug’

Best Practices For Multicore SoC Test And Debug

Thursday, October 21st, 2010

By Ann Steffora Mutschler
In increasingly complex SoC designs, many of which contain multiple cores and multiple modes, determining best practices for testing and debugging is a moving target.

Jason Andrews, architect at Cadence Design Systems, said multicore debug is a huge issue. It isn’t easy to do, and there aren’t many good ways to do it.

He suggested one approach is to use virtual platforms to do multicore debug in the context of software running on multiple cores plus whatever the state of the hardware system is. “I look at it and try to provide an environment that is a programmer’s view for all the different cores in order to see what’s going on with each core, including the importance to the hardware and the peripherals. The programmers are reading and writing registers and interacting with the hardware behavior. Today there is not an easy way to do that, so even if you have a good software debugger it only sees software. It doesn’t really see hardware,” he explained.

Complexity also can be exacerbated by the type of multicore design, as Frank Schirrmeister, director of product marketing for system-level solutions at Synopsys, noted.

“As you look at homogeneous multicore where you just are adding compute resources and you basically don’t have any dependency of the functions running on those cores between each other, there’s not much challenging there. Every core does the same thing in principle. You’re just feeding the data into it step by step. So if you have that decoupled I think the current techniques will work because you do single core debug and then multiply it because you don’t have much dependency,” he said. “The challenges come in when you do heterogeneous multicore and you are trying to distribute functions across those cores. Now you have very intricate dependencies.”

With heterogeneous multicore developments on the hardware side, software debug very likely will make virtual platforms essential because traditional techniques are breaking. If you are doing this on a real chip and your hardware stops, you’re stuck, Schirrmeister said. “What you can do in simulation is backtrack. You can reproduce it in simulation, you can do things that are simply not possible in the final chip. You can hold one processor and let the others continue to hone in on the dependencies. Multicore and debug on virtual platforms gives an additional push because the traditional techniques are breaking.”
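To make the idea of holding one processor while the others continue a little more concrete, here is a minimal sketch in Python. It is not any vendor's virtual platform API: the `Core` and `VirtualPlatform` classes, the `held` flag, and the snapshot/restore methods are all hypothetical names invented for illustration, and a "core" here is nothing more than a program counter.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical simulated core: just a program counter and a hold flag."""
    name: str
    pc: int = 0
    held: bool = False

    def step(self) -> None:
        # A held core keeps its state frozen; any other core advances by one.
        if not self.held:
            self.pc += 1


@dataclass
class VirtualPlatform:
    """Hypothetical platform that steps every core in lockstep."""
    cores: list = field(default_factory=list)

    def run(self, cycles: int) -> None:
        for _ in range(cycles):
            for core in self.cores:
                core.step()

    def snapshot(self) -> dict:
        # Record the state so a later run can be rewound to this point.
        return {core.name: core.pc for core in self.cores}

    def restore(self, snap: dict) -> None:
        # "Backtrack" by putting every core back to the recorded state.
        for core in self.cores:
            core.pc = snap[core.name]


platform = VirtualPlatform([Core("cpu0"), Core("dsp0")])
platform.run(5)
checkpoint = platform.snapshot()

platform.cores[0].held = True      # hold cpu0, let dsp0 keep running
platform.run(3)
after_hold = platform.snapshot()

platform.restore(checkpoint)       # rewind both cores to the checkpoint
after_restore = platform.snapshot()
```

In this toy run, `cpu0` stays at the checkpointed count while `dsp0` moves ahead, and the restore puts both back. On real silicon neither step is generally available, which is the point Schirrmeister makes about simulation.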

In terms of visualizing SoC complexity, Andrews believes the best approach is to give the person doing the debugging a visual picture of the state of the system. “A big part of it, in terms of system and multicore, is to visualize the state of the system in an easy way. We do this a lot either with programs that give you a list of hardware, or transaction-level views to get a concise summary of the behavior that just happened or is happening.”

Cadence has embraced SystemC and TLM-2.0, which it said provides the interconnect between the components so that all of the behavior between the cores and between the different peripherals in the hardware can be automatically extracted, he explained. The company also extended SystemC a bit to define classes that are easy to visualize in tools, such as the state of a particular peripheral or the registers in the peripheral.

Still, more is needed to allow better visualization of the hardware. “A lot of it is extracting the information so that you can make sense out of it because for a particular operation it can have many low-level transactions and it’s really hard to sit and sift through one at a time. By providing tools that automatically extract transaction sequences which say, ‘These last 20 things you did are setting up a DMA between the video controller and the memory,’ and show you the parameters and behaviorally what it is doing, then it makes it a lot easier to understand,” he continued.
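The kind of extraction Andrews describes, turning a run of low-level register writes into one readable statement, might be sketched as follows. This is an illustration only, not how Cadence's tools work: the register map, the addresses, the `CTRL_START` bit and the `summarize_dma` function are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One low-level bus write, as a transaction-level trace might record it."""
    address: int
    value: int


# Hypothetical register map for an imaginary DMA controller.
DMA_REGS = {0x4000: "SRC", 0x4004: "DST", 0x4008: "LEN", 0x400C: "CTRL"}
CTRL_START = 0x1  # assumed "start transfer" bit in the CTRL register


def summarize_dma(trace):
    """Collapse runs of DMA register writes into one-line summaries."""
    summaries = []
    pending = {}
    for write in trace:
        reg = DMA_REGS.get(write.address)
        if reg is None:
            continue  # not a DMA register; ignored in this sketch
        pending[reg] = write.value
        if reg == "CTRL" and write.value & CTRL_START:
            # Only report a transfer once all of its parameters were seen.
            if {"SRC", "DST", "LEN"} <= pending.keys():
                summaries.append(
                    f"DMA transfer: {pending['LEN']} bytes "
                    f"from 0x{pending['SRC']:08X} to 0x{pending['DST']:08X}"
                )
            pending = {}
    return summaries


trace = [
    Write(0x4000, 0x80000000),  # source address
    Write(0x9000, 0x1),         # unrelated peripheral write
    Write(0x4004, 0x20000000),  # destination address
    Write(0x4008, 4096),        # length in bytes
    Write(0x400C, CTRL_START),  # kick off the transfer
]
summary = summarize_dma(trace)
```

Five writes collapse into a single line describing the transfer and its parameters. A real tool would draw the register map from the platform description rather than from a hand-written table.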

In essence, the information provided during the construction of the hardware platform is also used at run time for analysis and visualization.

The market for virtual platforms has developed such that customers are partnering closely with tool providers. The tool companies are doing some amount of model building and platform development to bring to customers, and then may extend the platform and add more models based on customer requirements.

The first wave of virtual platforms involved fixed simulation configurations. However, the current generation of customers wants to have more flexibility, Andrews said. “They want somebody to come with the tools, with a good library of models and connections to the right IP providers but then they also want to have something they can extend and work with.”

Managing capacity and size
At the other end of the spectrum but just as critical is manufacturing test of SoCs.

Complex SoCs today have hundreds of millions of gates, and managing the sheer capacity and size of designs puts tremendous pressure on test engineers, said Greg Aldrich, director of marketing for Mentor Graphics’ silicon test solutions group. “In manufacturing test, one of the things that you have to deal with is concern about creating the test patterns in a reasonable amount of time: Can I create the data? How long does it take to create? How many machines are needed? How much horsepower is needed on the machines to create that?”

Even more critical is whether or not the test data will provide adequate coverage and will fit on the test equipment, he said. “It’s really a throughput in manufacturing issue because that’s where it’s not a one-time cost—it’s a recurring per device cost. So every additional second you have to sit on the tester to test this SoC is an additional cost that you will incur for every chip that gets manufactured.”
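Aldrich's point about recurring cost is simple arithmetic, and a short sketch shows how quickly it scales. The figures below are placeholders chosen for illustration, not numbers from Mentor or from any real tester, and `added_test_cost` is a hypothetical helper.

```python
def added_test_cost(extra_seconds: float,
                    cost_per_tester_second: float,
                    devices: int) -> float:
    """Recurring cost of extra tester time across a whole production run."""
    return extra_seconds * cost_per_tester_second * devices


# All three inputs are made-up placeholders, not data from the article.
cost = added_test_cost(
    extra_seconds=1.0,            # one additional second on the tester
    cost_per_tester_second=0.05,  # assumed tester cost in dollars per second
    devices=10_000_000,           # assumed production volume
)
print(f"Added cost: ${cost:,.0f}")
```

Under these assumed inputs, one extra second per device adds roughly half a million dollars over the production run, which is why test time is treated as a throughput problem rather than a one-time expense.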

To deal with the volume of test data today, engineers use embedded compression, a technology that has been around almost 10 years and allows a piece of logic to be embedded onto the front of the scan chains. Another way test engineers are dealing with the volume of data and the cost of test is built-in self-test (BIST), which is mainstream for memories, Aldrich said.

Taking a hierarchical approach to test can also be employed to manage the sheer size of SoCs. “Test has been a back-end process where you don’t do anything until the complete gate-level netlist is done, and then you do it all at once. With a few hundred million gates, that becomes increasingly more difficult and in fact most large SoCs are being partitioned when they are being designed anyway,” he explained. “In a lot of cases, customers are looking to use a partitioned approach or hierarchical approach to doing all of the test as well: generating the test patterns instead of the whole design at one time or doing one partition of the design and then leveraging those test patterns and propagating them up to the top level of the SoC when it gets compiled.”

Challenges on the horizon
Not surprisingly, power is an issue at every turn. “It doesn’t matter if it’s a low-power cell phone application or a high-power server chip. If you’ve got a few hundred million gates, power can be an issue with a chip running at several watts or several milliwatts,” Aldrich noted.

In addition, customers today are struggling with figuring out why they are getting failures in manufacturing. “You might have a process that is at 70% yield, which means you are getting, if you are in high-volume, thousands and thousands of failures that are coming from the test patterns. The challenge a lot of people are looking to now is how to take that data and figure out why this thing is failing. Is it a yield problem? Is it a design problem? Is it one of my design-for-manufacturing rules that isn’t quite right?”

Over the last couple of years, Mentor has observed the emergence of scan pattern failure diagnosis as a technology area. The failures coming from the tester during manufacturing test are fed back into a diagnosis tool along with all of the design information (including the physical layout information) in order to identify what caused each failure and where it is located.

Being able to leverage that diagnosis data for statistical yield analysis, to determine whether failures are all just random particle effects or reflect systematic issues within the manufacturing or design process, is significant. “This is happening in high-volume design and especially as customers are moving to smaller technology nodes,” Aldrich said.
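One simple way to picture the random-versus-systematic question is to compare how often each diagnosed suspect shows up against how often it would show up if defects landed at random. The sketch below uses a plain Poisson tail test for that comparison. This is a stand-in technique chosen for illustration, not a description of what Mentor's diagnosis tools do, and the suspect names, expected shares and threshold are all invented.

```python
import math
from collections import Counter


def poisson_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), computed directly from the PMF."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def flag_systematic(diagnosed, expected_share, alpha=0.001):
    """Return suspects seen far more often than random defects would explain.

    diagnosed:      one suspect label per failing device (hypothetical data)
    expected_share: fraction of failures each suspect would draw if defects
                    were purely random, e.g. proportional to layout area
    alpha:          assumed significance threshold for this sketch
    """
    counts = Counter(diagnosed)
    total = len(diagnosed)
    flagged = []
    for suspect, share in expected_share.items():
        expected = total * share
        if poisson_tail(counts[suspect], expected) < alpha:
            flagged.append(suspect)
    return flagged


# Invented diagnosis results for 100 failing devices.
diagnosed = ["via_M3"] * 40 + ["cell_NAND2"] * 30 + ["cell_DFF"] * 30
expected_share = {"via_M3": 0.10, "cell_NAND2": 0.45, "cell_DFF": 0.45}

flagged = flag_systematic(diagnosed, expected_share)
```

In this made-up data set the via accounts for four times the failures its share would predict, so it is flagged as a likely systematic issue, while the two cell types are not. Real yield analysis would also need to account for diagnosis ambiguity and for testing many suspects at once.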

Mentor’s diagnosis and analysis tools in this area all fall under the Tessent brand name, following the company’s acquisition last year of LogicVision.

Experts At The Table: System-Level Verification

Friday, July 17th, 2009

System Level Design sat down to discuss issues in system-level verification with Frank Schirrmeister, director of product development in Synopsys’ solutions group; Donald Cramb, director of professional services at Eve; Patrick Sheridan, director of marketing at CoWare, and Scott Sandler, president of SpringSoft USA. What follows are excerpts of that conversation.

By Ed Sperling

SLD: Verification has been the biggest time hog in chip design, and it has become a system-level problem at advanced process nodes. How are we going to resolve this?
Sheridan: You’re right that it isn’t just a chip-level problem anymore. It’s a system verification problem. We have to do things differently. In the context of multicore, it’s potentially multiple subsystems, multiple processors, multiple software stacks, integrating them and debugging both the hardware and software side.
Cramb: I think it’s the word ‘verification’ that gets people confused. There’s this thing called chip verification. But from a system-level, it’s validation. You bring the software and hardware together and validate the whole environment. You’re not just verifying against a spec. You’re validating how this thing is going to work for the end user.
Sandler: Is it validation if it’s a whole system and verification if it’s a chip?
Schirrmeister: No, validation is against intent. Is it really what the user wanted? There are three elements to verification. One is the traditional hardware verification. The second is system validation, which is the chip in the context of the overall system. The third is software. It’s the whole aspect of software development and verification in the context of hardware. Whenever we had too much complexity in the past, we went up in levels of abstraction. The same will happen in verification. We are using transaction-based verification, and eventually we will have to figure out how to verify these higher-level models.

SLD: Is validation the same as debugging?
Sandler: There’s a general tendency to say you’re debugging when you’re verifying or validating. Ultimately what happens is when you validate or verify, things happen that you don’t understand. In my view, that’s the point at which you shift to debugging. You let the thing run, it does something you don’t understand, and then a human has to get involved. Debugging can only happen when someone is looking at some output or data.
Schirrmeister: Validation and verification can both be done if you express the design intent for validation and if you have a specification that you can verify against. Both can be done offline. You have a regression suite and it says, ‘It works.’ Debug is the last resort when this regression suite says, ‘It doesn’t work.’ Now you need to go in and debug. The hardware-software interface becomes critical there.
Cramb: The level of abstraction for verification is getting higher, but the same is true for debug.
Sandler: No matter what level you do the validation or verification, you have to be able to do the debug at that same level. The same things that affect simulators and models affect the data capture mechanisms you need to hook the person into the process and show people what is going on.

SLD: So what’s new here?
Cramb: For us the term is smart debug. You bring together not just wave forms and monitors, but you raise it to the right level of abstraction.
Sandler: Debug is the same thing we were doing years ago with the logic analyzer and the oscilloscope on the bench. Then we went to wave forms and source views. When you do that at the next level up with transactions and C-level source code and you have the debugger linked in and synchronized, it’s just a progression.
Schirrmeister: It’s not the word ‘debug’ that’s in question. It’s the scope of the work. There’s IP debug, which is block-level, there’s chip debug and there’s system debug. The way we phrase things at Synopsys is system prototyping. You want to mix the different abstraction levels. You want to have virtual, including the software, connected to prototypes, which run in hardware-assisted environments. Together with system prototyping, you have system debug. You need to be able to correlate what happens in the software on one side with what happens on the hardware side. One of the things users are looking for is the ability to set an assertion in hardware, or set a break point in software, which is kind of the same thing if it’s conditional, and be able to hold the whole system. The scope really has changed since the complexity has increased.
Sheridan: When you’re validating, you’re still in the process of designing and making tradeoffs of features vs. performance vs. cost. You may not be at the point where you’re verifying whether it’s being implemented correctly. Those problems become more important to solve at the system level. Moving up in abstraction is the only way to make this work productive, whether you’re the designer of the system who is looking at architectural issues, or the people implementing hardware or software. They all have to come together.

SLD: In the past, verification occurred several steps down the line after the initial design. With increased complexity does it now need to move forward in the design flow?
Cramb: That’s validation.
Schirrmeister: It’s a thin line. Validation is against intent. On the architecture side, it’s validation.
Sandler: Since the beginning of architectural design, there’s been some intent specified. It has to run this fast, it has to be able to handle these kinds of problems, this is the input and this is the output. There’s always been a level of validation that had to be done. I don’t think there’s much change here. There’s more tools for this now and more commercialization.
Schirrmeister: Yes, it always was a requirement, but commercially it was not supported by tools. People always did it with C and C++ models. We tried to commercialize it, but it was too early. But over the past two years, the pressure has been growing to solve this issue. If you get your design out in time but you haven’t met all the architectural requirements for performance, your design will be very short-lived and your career as a project manager may be very short-lived, also. Pressure has increased over the last two or three years. We always wait when we make these transitions for projects to run into the wall over and over again. But people now realize they will have to switch to system-level design.
Sheridan: If you have a system design that’s multicore with multiple subsystems and you’re changing the camera piece from 3 megapixel to 8 megapixel, that’s more than just verification tests. It’s adjusting the challenges that impact performance of the entire product. So you have a system-level issue that’s on top of the verification of how you build the camera. The complexity is greater, and the impact of any changes is greater.

Web Seminar: Making The Right Architectural Decisions

Tuesday, April 28th, 2009

On May 6, from 11 a.m. to noon, Mentor Graphics will examine:

  • How to create a system-level transaction model;
  • Simulation of the TLM to approximate system processing and traffic;
  • How to debug the platform to achieve confidence that it is appropriately modeling the system activity, and
  • Analysis of the system to identify bottlenecks and potential tradeoffs in performance and power consumption.