Debugging and the Scientific Method


Laboratory work is getting fairly heated here at A&M. Equipment is becoming available for use, and as each new bouncing bundle of joy makes its way into the laboratory, it has to be fully characterized. Many times, sadly, the initial characterization is that the ‘new to us’ piece of equipment is just flat-out broken. Here’s where our laboratory work parallels the world of chip design and functional verification. While it’s rather easy to determine that the damned thing has bugs, it can be a complete bitch to figure out their root cause. Not shockingly, since my work is taking place in a laboratory these days, I’m using the scientific method to weed through the possibilities. What’s interesting is that I’m using a technique I brought with me from the world of functional verification, one I learned in a management training class, of all places.

A Few Debug Moments Back on the Homestead

The technique starts with that buzzword of all management buzzwords: brainstorming. Wait! Don’t run! This brainstorming can be done in the relative peace and quiet of your own desk. Of course, it would probably be even better if you involved a few of your cohorts, but we’ll save that for another post. In a spreadsheet or word-processor table, just write down every possible thing you can think of that could cause the bug.
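To make that concrete, here’s a minimal sketch in Python of starting such a log. Everything in it, from the file name to the hypothesis text, is invented for illustration; a plain spreadsheet works just as well.

    import csv

    # Raw brainstorm: every plausible cause of the bug, one per row.
    # All of the hypothesis text here is invented for illustration.
    hypotheses = [
        "Reset is released before the PLL locks",
        "The scoreboard drops packets when the FIFO wraps",
        "The driver violates the bus protocol on back-to-back writes",
    ]

    with open("debug_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["hypothesis"])
        for h in hypotheses:
            writer.writerow([h])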

Each of these ideas is a hypothesis. There are no bad ideas in brainstorming; we’ve all heard that in team-building class, right? We do, however, have to qualify each hypothesis just a bit. Can you devise an experiment that will prove or disprove the hypothesis by changing only one piece of your simulation? If not, rework the hypothesis into two or more independent (in the lab we like to use the big word ‘orthogonal’; makes us feel important) hypotheses, each of which can be tested by modifying only one piece of the simulation. As you’re doing this work, be sure to document, in a sentence or two, what the test for each hypothesis is in a column to the right of the one holding the hypothesis. By orthogonalizing your hypotheses in this manner, you won’t have to perform further analysis later to figure out which modification changed the simulation’s behavior. In addition, should your modification leave the simulation even more broken, you’ll have an easy time finding the culprit.
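Here’s what that orthogonalization might look like in the log. The compound guess, the split hypotheses, and the one-change tests below are all invented for illustration:

    import csv

    # A compound guess that can't be tested with a single change...
    compound = "The FIFO pointers or the gray-code sync are wrong"

    # ...split into orthogonal hypotheses, each paired with a test that
    # modifies exactly one piece of the simulation. The test description
    # goes in the column to the right of its hypothesis.
    rows = [
        ("FIFO write/read pointers wrap incorrectly",
         "Force the FIFO depth to 2 and replay the failing seed"),
        ("Gray-code synchronizer drops a bit between clock domains",
         "Drive both clocks from one domain so the synchronizer is a no-op"),
    ]

    with open("debug_log.csv", "a", newline="") as f:
        writer = csv.writer(f)
        for hypothesis, test in rows:
            writer.writerow([hypothesis, test])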

Now that you have a list of orthogonalized hypotheses, the next step is to assign a priority to each of them. Ideas that you believe are more likely to be the root problem should receive a higher priority. Experiments, or tests, are then run in the order in which they were prioritized. When each experiment is complete, its results are recorded in an additional column in the spreadsheet, and any new hypotheses based on the data returned are added as new rows.
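In script form, that loop looks something like the sketch below; run_experiment() is just a placeholder for however you apply the one modification and rerun your simulation, and the rows are invented:

    def run_experiment(test: str) -> str:
        """Placeholder: apply the one modification, rerun, report the outcome."""
        return "no change"  # e.g. "bug gone", "worse", "no change"

    # Higher priority means more likely to be the root cause, so it runs first.
    log = [
        {"hypothesis": "Reset released before PLL lock",
         "test": "Delay reset release by 100 cycles", "priority": 3, "result": None},
        {"hypothesis": "Scoreboard drops packets on FIFO wrap",
         "test": "Shrink the FIFO depth to force wraps", "priority": 5, "result": None},
    ]

    for row in sorted(log, key=lambda r: r["priority"], reverse=True):
        row["result"] = run_experiment(row["test"])
        # Any new hypothesis the data suggests becomes a new row:
        # log.append({"hypothesis": ..., "test": ..., "priority": 1, "result": None})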


Utilizing the Compute Farm

In functional verification, one of the beautiful advantages we have over working in a lab with an actual piece of hardware is that we can kick off hundreds of simulations in parallel if we want to. If it’s possible to check all your hypotheses at once, do it! Make sure to retain any testcases you generate, even for hypotheses that weren’t correct. If you suspected a problem could arise, for whatever reason, chances are it eventually will, and now you have a testcase watching for the issue going forward.
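If your testcases are scriptable, even a short Python sketch gets them all in flight at once. The make sim invocation below is hypothetical, so substitute whatever actually launches a simulation in your flow; on a real compute farm you’d hand these to the job scheduler instead of a local thread pool.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # One testcase per hypothesis; the names are invented.
    testcases = ["delayed_reset_test", "fifo_wrap_test", "b2b_write_test"]

    def run_sim(testcase: str) -> tuple:
        """Launch one simulation and report (testcase, exit status)."""
        proc = subprocess.run(["make", "sim", f"TEST={testcase}"],
                              capture_output=True)
        return testcase, proc.returncode

    # The real work happens in the subprocesses, so a thread pool is
    # plenty for fanning the jobs out.
    with ThreadPoolExecutor(max_workers=len(testcases)) as pool:
        for name, status in pool.map(run_sim, testcases):
            print(f"{name}: {'PASS' if status == 0 else 'FAIL'}")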


Wringing Every Last Drop from the Process

Once the issue is completely debugged back to its root cause, take a few minutes to review your debug spreadsheet. Look at your failed hypotheses; there’s still value there. Why did you think each one might be the issue? Was it a piece of your code that has broken frequently in the past? Does it need more random testing? Was it an under-documented portion of the design? Can you file an issue to enhance the documentation, or add a few assertions that will fail if your perception of the block’s use model turns out not to be correct? It’s a simple game: look at the hypothesis, ask yourself what was worrisome there, and then ask how those worries can be addressed most efficiently, so they never have to be addressed again.
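To give a flavor of that last idea, here’s a plain-Python stand-in for the kind of check I mean; in a real flow this would be a SystemVerilog assertion or live in your testbench monitor, and the signal names are invented:

    def check_no_request_while_busy(valid: bool, busy: bool) -> None:
        """Use-model assumption: the block never sees a new request while busy."""
        assert not (valid and busy), (
            "valid asserted while busy: our perception of the use model was wrong"
        )

    # Called on every clock edge from the monitor, for example:
    check_no_request_while_busy(valid=True, busy=False)  # passes quietly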

Finally, check the spreadsheet into revision control and record a link to it in your bug database. It memorializes your monumental effort to make everything right with the design once again. More importantly, when something similar comes up, you won’t have that “Does anyone remember what we tried when…?” moment. You’ll know what you did, what did and didn’t work, and why you did it.

