Published in December / January 2005 issue of Chip Design Magazine
Cinematic-Graphics Design Deluges Traditional Verification
Combine acceleration technology with traditional emulation tools for a very effective and high-performance verification environment.
The size of graphics-based processor designs is on the rise. As a result, it can take several weeks to several months to run the tests that are required to verify the system-level sign-off design. This trend has prompted a dramatic upsurge in the popularity of hardware-assisted verification from leading graphics-processor developers. Tools like logic emulation offer significant throughput at the expense of long setup time and design alteration. Emulation is therefore suitable for the back end of the verification cycle or system-level integration. Traditionally, however, it has not been viable for use during system integration. Failures that are detected during this phase can be effectively debugged using an alternative hardware-assisted solution called acceleration (see Figure 1).
One company that’s finding dramatic success with this type of verification flow is NVIDIA. Using best-in-class verification tools, NVIDIA has realized the benefits of acceleration. It also has incorporated a methodology in which acceleration and emulation can co-exist.
Interactive 3D graphics are integral to various computing and entertainment platforms, such as workstations, consumer and commercial desktop PCs, Internet appliances, and video-gaming consoles. 3D graphics form a powerful broadband medium that enables the communication and visualization of information in both professional and commercial applications. Professional applications include digital-content creation and computer-assisted design and manufacturing (CAD/CAM). Commercial applications include financial analysis and business-to-business collaboration. 3D graphics are also an enabling technology for simply surfing the Internet or playing games.
From a verification standpoint, the visually engaging and interactive nature of 3D-graphics chips makes them extremely complex and challenging. In-circuit emulation has made it possible to rapidly model the functionality of large digital designs. During the design’s debug phase, emulators provide full visibility--but with only limited depth. An alternative methodology combines hardware acceleration as a logic-verification engine for design debug. For system integration, in-circuit emulation is deployed as a regression engine. Many advantages can be derived from supplementing acceleration as a debug engine while emulation is used for software development and running long regressions. This approach improves the utility of emulation and debug. It also shortens the turnaround time by virtue of 100% visibility without reducing speed or compromising depth.
NVIDIA is known for driving innovative design methodologies. In part, this reputation stems from the extreme complexity, pricing pressure, and product life of graphics processing units (GPUs). NVIDIA’s goals were straightforward and yet demanding. After all, its high-end GeForce 6800 product packs 222 million transistors (see Figure 2). Because it is manufactured using IBM’s 0.13-micron process geometry, the company needed to meet an aggressive verification schedule while increasing verification coverage.
Figure 1: A significant gap in verification time and performance exists between the traditional methodologies of hardware emulation and event-driven software simulation.
The specific verification goals that were required for the GeForce 6800 project include:
• Bring up a new generation of GPUs on an accelerated verification platform in a one-week time frame. Derivative chips must be brought up in a few days.
• Automate the “Compile-Run-Debug” process so that ASIC design engineers could use an accelerated verification platform.
• Verify GPU and frame-buffer/system-memory interaction.
• Validate AGP/PCI-bus interface functions.
• Ensure functionality at various levels of abstraction (RTL and gates).
• Expand accelerated verification solution to ATPG and BIST applications.
For this project, NVIDIA’s design methodology involved the development of “transaction-accurate” C-language models. These models are compared against Register-Transfer-Level (RTL) models. Until this comparison is validated, the C model is not considered “golden.” Designers can write tests at any level and reuse them at different levels of abstraction.
The entire design methodology is based upon Verilog 2001. Designers create tests to validate the different functionality of the system-on-a-chip (SoC). Depending on that functionality, tests are classified into various levels. For example, higher-level tests run for a longer duration. Each test contains intelligent checkers and trackers, which provide valuable data upon test failure. These features enable designers to focus on the subset of the SoC, thereby expediting the debug process.
Figure 2: This graphic depicts the architectural block diagram of the GeForce 6800 series product.
As soon as the team builds a new design database, it is validated through a series of sanity tests using software simulators. Typically, these tests take four to five hours. Upon the successful completion of the sanity tests, the design database is made available for full-chip validation with logic emulation.
Before generating an emulation model, the GPU is synthesized to gates using a synthesis tool. The preparation for generating an emulation model then begins. It involves the generation of emulation-friendly memory models and simplified phase-locked-loop (PLL) models. The preparation also includes the mapping of simulation checkers and trackers to emulation-friendly, memory-based implementation. In addition, the testbench environment is detached from the design under test (DUT). The emulation model is then subjected to the real-life stimulus, which is applied through a PCI/AGP rate adaptor.
Preparation for emulation can take a couple of weeks. As soon as the emulation model is up and running with the same sanity tests, several thousand tests are automatically submitted to an emulation system. When design failures occur on the emulator, appropriate triggers are set using a built-in logic analyzer. This analyzer is based on intelligence that was gathered from the messages generated by checkers and trackers. Upon setting trigger conditions, failing tests are re-run on the emulators. Their goal is to capture a snapshot of waveforms, which are used to debug failures. It takes a few minutes to set appropriate triggers in the logic analyzer and capture a window of waveforms. If the failure is not captured by the trigger conditions, the above process is repeated. A new set of trigger conditions is then defined for the logic analyzer.
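The trigger-and-capture loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trace format, the signal names (`overflow`, `error`), and both helper functions are hypothetical and do not represent any emulator's actual logic-analyzer interface. The sketch shows why a limited capture depth can force several re-runs before the root cause lands inside the captured window.

```python
def capture_window(trace, trigger, depth):
    """Return up to `depth` samples ending at the first sample that fires `trigger`.

    Models a logic analyzer with limited capture depth (hypothetical model).
    """
    for i, sample in enumerate(trace):
        if trigger(sample):
            start = max(0, i - depth + 1)
            return trace[start:i + 1]
    return []  # the trigger never fired during this run


def debug_with_triggers(trace, triggers, depth, shows_root_cause):
    """Re-run the failing test once per trigger condition until a captured
    window exposes the root cause. Returns (attempt_number, window) or None."""
    for attempt, trigger in enumerate(triggers, start=1):
        window = capture_window(trace, trigger, depth)
        if window and shows_root_cause(window):
            return attempt, window
    return None


# Hypothetical failing run: the bug occurs at cycle 40, but the checker
# only reports an error from cycle 90 onward.
trace = [{"cycle": c, "overflow": c == 40, "error": c >= 90} for c in range(100)]
triggers = [
    lambda s: s["error"],     # first guess: trigger on the reported error
    lambda s: s["overflow"],  # refined guess: trigger on the suspected cause
]
result = debug_with_triggers(
    trace, triggers, depth=16,
    shows_root_cause=lambda w: any(s["overflow"] for s in w),
)
```

In this example the first trigger captures cycles 75 to 90, which misses the overflow, so a second run with a refined trigger is needed.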
Figure 3: For a viable solution to the verification gap, use a product like Tharas Systems’ Hammer-100 as the logic verification and debug engine.
The emulation-based methodology successfully enables NVIDIA engineers to debug GPUs in the context of the entire system with real-life games. To provide more debug throughput, however, an alternative methodology also is deployed. It is based on accelerated verification, which allows designers to capture waveforms for the entire duration of a test. It therefore alleviates the need to set triggers in emulation, which is a complex and time-consuming process. In addition, this accelerated-verification methodology is deployed to run applications like ATPG, BIST, or functional-gate-level simulation (see Figure 3).
REFINED FUNCTIONAL VERIFICATION
The refined-functional-verification methodology uses Tharas Systems’ logic verification and debug engine, the Hammer-100. As soon as full-chip Register-Transfer-Level is available, it is compiled using fully automated scripts for Hammer-100. To accelerate the turnaround time, scripts automatically detect incremental changes in the Verilog design database. New memory configurations are an example of such a change. This methodology uses the same Verilog code that is used for software simulation.
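One common way to detect incremental changes in a source database is to compare content hashes between two snapshots. The following Python sketch illustrates that idea only; the article does not describe how NVIDIA's scripts work internally, and the function names and the `*.v` file pattern are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each Verilog source file under `root` to a SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(root.rglob("*.v")):
        relative_name = path.relative_to(root).as_posix()
        digests[relative_name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the files that were added, removed, or modified between snapshots."""
    names = set(previous) | set(current)
    return sorted(name for name in names if previous.get(name) != current.get(name))
```

A build script could store the previous fingerprint alongside the compiled database and launch a recompile only when `changed_files` returns a non-empty list.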
Figure 4 shows the refined methodology, in which designers use the Hammer-100 as a main debug engine. Yet it also co-exists with logic emulation technology. Failing tests on the emulator are sent off automatically to the Hammer-100 acceleration farm. There, waveforms are generated for each test. At the same time, designers use Hammer-100 for traditional RTL and gate-level design debug.
Thanks to innovations in the Hammer-100 compilation process, the designer can compile designs at a rate of up to 50-million RTL-gate equivalents per hour on a single 3-GHz Linux workstation. The Hammer-100’s processor-based architecture, scheduling algorithms, and compact database enabled large multimillion-gate designs to be compiled on a single PC in 10 to 15 minutes. This speed allows an entire design to be turned around multiple times a day. For even faster design debug cycles, Hammer-100 offers a modular and incremental compile feature. That feature only re-compiles portions of the design or testbench that have changed. It also supports cross-compiler capability, as it compiles on Linux and runs on Solaris or vice versa.
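As a rough consistency check on these figures, the quoted peak compile rate bounds the design size that fits in a 10- to 15-minute compile. The short Python sketch below works through that arithmetic; it assumes the peak rate is sustained for the whole compile, which the article does not claim, so the results are upper bounds only.

```python
PEAK_RATE_PER_HOUR = 50_000_000  # RTL-gate equivalents per hour, as quoted above


def max_design_size(compile_minutes: float,
                    rate_per_hour: float = PEAK_RATE_PER_HOUR) -> float:
    """Upper bound on gate equivalents compiled in `compile_minutes`,
    assuming the peak rate holds throughout (an idealization)."""
    return rate_per_hour * compile_minutes / 60.0


low = max_design_size(10)   # about 8.3 million gate equivalents
high = max_design_size(15)  # 12.5 million gate equivalents
```

Under that assumption, a 10- to 15-minute compile corresponds to a design of at most roughly 8 to 12.5 million RTL-gate equivalents.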
Figure 4: This graphic illustrates the concept of a refined verification methodology.
In this effort, support for IEEE Verilog 1364-2001 was extremely valuable and timely. The Hammer-100 accelerates a broad set of hardware-description-language (HDL) constructs. Among those constructs are latches, multiple clocks, gated clocks, switch-level models, models with signal strengths, and user-defined primitives (UDPs). Nonsynthesizable Verilog constructs, which include memory and pad models, benefited from acceleration. Behavioral constructs, such as system tasks, user tasks, and initial blocks, also were accelerated. Typically, those constructs are deployed in testbenches. In addition, the Hammer-100’s ability to obey design delays proved to be an especially useful feature. This characteristic enables the Hammer-100 to accommodate delays on the output pins.
Because the Hammer-100 is well integrated with standard debug tools, it was adopted into NVIDIA’s existing simulation methodology. This approach made every signal within the design visible to the user via low-overhead waveform dumping (fsdb, vpd, vcd, and sst formats). The Hammer-100 could be run in regression mode, or the design team could set breakpoints and step through a simulation region of interest. Debugging was performed in hardware just as it is with an emulator. It was therefore extremely fast. The design team also examined signal values and set breakpoints in the accelerator.
NVIDIA utilized the flexibility of concurrent, parallel, and progressive trace conversion. Based on a user-definable trace translation period, concurrent trace conversion rapidly delivered the entire simulation trace data. It delivered this data soon after the acceleration run was completed. Using a farm of PCs, parallel trace conversion generated post-processed trace data. Meanwhile, progressive trace conversion helped store the last “n” cycles of trace data from the time of interest.
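Keeping only the last "n" cycles of trace data is a natural fit for a fixed-size ring buffer. The Python sketch below illustrates that general technique; the `ProgressiveTrace` class and its methods are hypothetical names, and nothing here reflects how the Hammer-100 actually stores trace data.

```python
from collections import deque


class ProgressiveTrace:
    """Retain only the most recent `n` cycles of signal values (illustrative only)."""

    def __init__(self, n: int):
        # A deque with maxlen silently discards the oldest entry when full.
        self._window = deque(maxlen=n)

    def record(self, cycle: int, signals: dict) -> None:
        """Store a copy of the signal values observed at `cycle`."""
        self._window.append((cycle, dict(signals)))

    def snapshot(self) -> list:
        """Return the retained cycles, oldest first."""
        return list(self._window)


trace = ProgressiveTrace(n=3)
for cycle in range(10):
    trace.record(cycle, {"valid": cycle % 2})
retained_cycles = [cycle for cycle, _ in trace.snapshot()]
```

After ten recorded cycles with `n=3`, only cycles 7, 8, and 9 remain in the buffer, so storage stays bounded regardless of test length.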
A BETTER WAY
Using acceleration technology from Tharas, NVIDIA put together a very effective and highly productive verification environment. It combined the strengths of both the acceleration technology from Tharas and existing emulation tools. The resulting verification flow was automated with existing emulation tools. Failures detected during emulation prompted the necessary tests to be launched on the Hammer-100 with 100% signal visibility. As tests progress on the accelerator, waveforms are collected. They are then automatically dispatched to the appropriate designers for debug.
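The hand-off from emulation failure to accelerator re-run to designer can be pictured as a small routing step. The Python sketch below is a hypothetical illustration, not NVIDIA's flow: the `Failure` record, the owner field, the version strings, and the waveform file naming are all assumptions. It only models two ideas from the article, namely that a re-run uses the same database version as the failing emulation run and that waveforms go to the appropriate designers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    test_name: str   # test that failed on the emulator
    db_version: str  # design database version the emulator was running
    owner: str       # designer responsible for the failing block (assumed field)


def plan_reruns(failures, compiled_versions):
    """Split failures into those that can re-run on the accelerator now
    (a matching compiled database exists) and those awaiting a compile."""
    ready, pending = [], []
    for failure in failures:
        if failure.db_version in compiled_versions:
            ready.append(failure)
        else:
            pending.append(failure)
    return ready, pending


def route_waveforms(reruns):
    """Group the waveform files produced by the re-runs by responsible designer."""
    inbox = {}
    for failure in reruns:
        # Hypothetical naming scheme: <test>.<database version>.fsdb
        waveform = f"{failure.test_name}.{failure.db_version}.fsdb"
        inbox.setdefault(failure.owner, []).append(waveform)
    return inbox
```

Failures whose database version has not yet been compiled for the accelerator stay in the pending list until a matching compile is available.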
Figure 5: The combination of acceleration technology and existing emulation tools results in a highly productive and effective verification environment.
Specific deliverables of the NVIDIA verification environment included:
• Scripts to compile Register-Transfer-Level/gate-level design
• Scripts to automatically generate memory cells modeled inside Hammer-100. The design database evolves to generate memory models, which are mapped inside Hammer-100. As those models are generated, the scripts automatically identify newly added memory cells.
• Automated compile scripts. Using Hammer-100, multiple compile databases are maintained. They establish synchronization between in-circuit emulation and accelerated verification. When the design changes, a new compile is automatically launched.
• A script that automatically launches a test or suite of tests on Hammer-100 as soon as a failure is detected on an in-circuit emulator. It uses the same database version.
• An automated trace-data conversion script that utilizes the Linux farm at NVIDIA. As a result, waveforms are available for debug in minutes.
• An intelligent regression script that determines, based on the time spent in generating vector stimulus, whether a simulation job is suitable for the Linux farm. It utilizes either a software event simulator or accelerated verification using Hammer-100.
• On-the-fly waveform generation for emulation failures
• Fully automated Hammer flow with existing simulation and emulation environments
• ATPG and BIST acceleration environment
• Verification speedups of 47X to 53X for gate-level ATPG and BIST runs and 10X to 20X for RTL (full dump)
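The regression-script deliverable above chooses an engine based on the time a job spends generating vector stimulus. One plausible reading, sketched below in Python, is an Amdahl-style estimate: only the design portion of a run is accelerated, so a job dominated by stimulus generation gains little from the accelerator. The article does not say how the script decides; the function names, the default speedup of 10 (the low end of the quoted RTL range), and the `min_speedup` threshold are all assumptions.

```python
def estimated_speedup(stimulus_fraction: float, dut_speedup: float) -> float:
    """Amdahl-style estimate of overall speedup when only the design-under-test
    portion of the run is accelerated and stimulus generation is not."""
    return 1.0 / (stimulus_fraction + (1.0 - stimulus_fraction) / dut_speedup)


def choose_engine(stimulus_fraction: float,
                  dut_speedup: float = 10.0,
                  min_speedup: float = 2.0) -> str:
    """Pick "accelerator" when the estimated gain clears the (assumed) threshold,
    otherwise fall back to the software simulator on the compute farm."""
    if not 0.0 <= stimulus_fraction <= 1.0:
        raise ValueError("stimulus_fraction must be between 0 and 1")
    if estimated_speedup(stimulus_fraction, dut_speedup) >= min_speedup:
        return "accelerator"
    return "simulator"
```

For example, a job that spends 5% of its time in stimulus generation is routed to the accelerator under these assumptions, while one that spends 80% there stays on the software simulator.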
Using this methodology, NVIDIA has verified several generations of GPUs.
While several Hammer-100s are used for RTL debug, a few systems are dedicated to gate-level applications like ATPG and BIST. With its ability to force/release nodes and its huge in-system memory, Hammer-100 is an effective solution for ATPG and BIST. It also is being deployed as a gate-level functional acceleration engine within NVIDIA.
Using the combination of Hammer-based accelerated verification and NVIDIA’s existing in-circuit emulation methodology, NVIDIA’s verification team was able to successfully complete verification of the GeForce 6 series (see Figure 5). The series architecture supports Microsoft Corporation’s DirectX 9.0 Shader Model 3.0. It also incorporates a fully programmable, high-definition-video processor.
Narendra Konda is a verification manager at NVIDIA. He is responsible for the management of all hardware-assisted verification projects. Prior to NVIDIA, he worked at 3dfx Interactive Inc., where he pioneered a verification methodology. He also spent five years at Quickturn Design Systems in the applications engineering group. Konda holds an M.S. in Electrical Engineering from Portland University.
Sanjay Sawant is director of marketing and business development at Tharas Systems. He is responsible for driving the existing and future direction of the hardware-assisted product line. Prior to Tharas, he worked at Cadence. There, he evolved RTL emulation technology from concept to reality. Sawant holds an M.B.A. in Technology Management from the University of Phoenix.