Published on December 07th, 2005
The statistics published by Collett International tell me that over 70% of chips have functional bugs in them the first time around. Many of these problems aren't severe enough to cause a chip re-spin. As a result, many companies still claim that their chips work on first silicon. But every one of the problems found in those almost-functional chips has to be diagnosed and debugged before a determination can be made about the bug severity. Is there a workaround? Can it be fixed in software? You don't know until you understand the problem.
Until recently, debugging such systems wasn't a big problem. A system-on-a-chip (SoC) contained a single processor, some memory, and a few peripherals. In most cases, the principal means of communication between the components of the chip--the bus--was brought out to the external pins of the chip. This approach allowed additional logic or memory to be added.
A quick look into an SoC today shows a much different picture. Multiple processors talk over several buses while multiple subsystems are tightly bound to each other. It becomes impossible to bring all of the necessary communications signals to the outside of the chip. Clearly, on-chip visibility has become a real issue.
Even with the added complexity, a number of companies found it sufficient to monitor the processors. Adaptation of the JTAG standard made it possible for logic in the processor to capture the state of the registers. That logic would then feed the register contents out to a software debugger so that the designer had full visibility into what the processor was doing.
Unfortunately, JTAG has some severe performance restrictions that make it difficult to analyze the system in real time. When multiple processors are added, engineers face a tough choice. They can put both processors onto the same JTAG chain. In this case, they can see the relationship of events between them. This approach slows the system even more, however. Engineers also can choose to use different ports for each processor and lose the ability to see the interaction between them. Even without this problem, it's not often possible to see everything going on inside the chip. A more global chip visibility and debug solution was therefore needed.
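The cost of sharing one chain can be seen with a rough software model. The sketch below assumes the usual IEEE 1149.1 behavior in which a device placed in BYPASS contributes a single bit to the data-register path. The device names, register widths, and function names are hypothetical and chosen only for illustration; real debug registers and scan sequences vary by processor.

```python
from dataclasses import dataclass

@dataclass
class Tap:
    name: str      # hypothetical device label
    ir_bits: int   # instruction register length (assumed)
    dr_bits: int   # width of the debug data register being read (assumed)

def dr_scan_bits(chain, target):
    """Bits shifted to read one device's data register,
    with every other device on the chain in BYPASS (1 bit each)."""
    return sum(tap.dr_bits if tap.name == target else 1 for tap in chain)

def snapshot_bits(chain):
    """Bits shifted to read every device's data register in one scan."""
    return sum(tap.dr_bits for tap in chain)

def ir_scan_bits(chain):
    """Bits shifted to load an instruction into every device on the chain."""
    return sum(tap.ir_bits for tap in chain)

cpu0 = Tap("cpu0", ir_bits=4, dr_bits=32)
cpu1 = Tap("cpu1", ir_bits=4, dr_bits=32)
separate_port = [cpu0]       # one processor per JTAG port
shared_chain = [cpu0, cpu1]  # both processors on one chain

print(dr_scan_bits(separate_port, "cpu0"))  # 32
print(dr_scan_bits(shared_chain, "cpu0"))   # 33
print(snapshot_bits(shared_chain))          # 64
print(ir_scan_bits(shared_chain))           # 8
```

In this toy model, a correlated snapshot of both processors costs twice the shift cycles of a single-processor read, and every instruction load must pass through both instruction registers. That is the trade-off described above: one chain shows the interaction but shifts more bits per sample.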
A few years ago, a market started to develop for prefabricated, parameterizable IP blocks that could be added into the design. Post-silicon verification can take a number of different forms. In the simplest of cases, scan cells can be added in strategic places and hooked up through the JTAG port. Recognizing the visibility problem, FPGA vendors have provided tools like ChipScope to help designers see what is going on internally. But many people want more than just visibility. They want to be able to place logic analyzers on the chip.
Some companies, such as FS2 (recently purchased by MIPS), have concentrated on tracing the processor and buses. Other firms, such as Temento Systems, are trying to make debug a more integral part of the chip's functionality. Companies that have adopted assertions can communicate the value of adding these checkers. They increase visibility and improve comprehension about what's going on in the chip. They make debug considerably faster while offering numerous other benefits.
Why throw away those assertions once you go to the FPGA or silicon? Temento Systems ties them both together, enabling the assertions to migrate down the tool chain and assist with the post-silicon debug as well. The assertions have already defined the critical aspects of the design. As a result, most of the work has already been done. Just hook up those assertions to an on-chip transaction storage system or provide a mechanism to stream this off chip. The chip can now become as easy to debug as it was with a simulator.
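As a rough illustration of hooking an assertion to on-chip storage, here is a minimal software model, not any vendor's implementation. It assumes a hypothetical request/acknowledge handshake rule ("every request is acknowledged within `max_wait` cycles") and uses a fixed-depth ring buffer to stand in for the on-chip transaction store. The class name, signal names, and event labels are all invented for the example.

```python
from collections import deque

class HandshakeChecker:
    """Model of a checker: each req must see an ack within max_wait cycles.
    Events are kept in a fixed-depth ring buffer, as a small on-chip
    trace memory would keep only the most recent entries."""

    def __init__(self, max_wait, trace_depth):
        self.max_wait = max_wait
        self.pending_since = None               # cycle of the open request
        self.trace = deque(maxlen=trace_depth)  # oldest entries drop off

    def step(self, cycle, req, ack):
        if self.pending_since is not None:
            waited = cycle - self.pending_since
            if ack:
                self.trace.append((cycle, "ack", waited))
                self.pending_since = None
            elif waited >= self.max_wait:
                self.trace.append((cycle, "FAIL", waited))
                self.pending_since = None
        if req and self.pending_since is None:
            self.pending_since = cycle
            self.trace.append((cycle, "req", 0))

# (req, ack) per cycle: the first request is acknowledged, the second is not.
signals = [(1, 0), (0, 0), (0, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
checker = HandshakeChecker(max_wait=3, trace_depth=3)
for cycle, (req, ack) in enumerate(signals):
    checker.step(cycle, req, ack)

print(list(checker.trace))
# [(2, 'ack', 2), (3, 'req', 0), (6, 'FAIL', 3)]
```

With a depth of three, the first request event has already been overwritten by the time the failure is recorded. Sizing the buffer, or streaming events off chip instead, is the kind of area-versus-visibility decision the paragraph above alludes to.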
Sure, there's a cost to all of this extra logic. But nobody would say that extra logic for test purposes isn't worth every penny. In many cases, pins also have become more valuable than logic area. Some of today's chips have their die size set by the number of pins rather than the amount of logic needed. The spare die area can mean going to larger geometries with cheaper production costs. Or it can be used for additional logic that helps to ensure a quality product. If you have die area to spare or room in the FPGA, why not use it to effectively reduce your debug and comprehension times? One of the approaches talked about here may fit in with your design flow.