
Published in December/January 2004 issue of Chip Design Magazine

Hardware Tools for Design

Risk reduction always comes at a cost--trial and error will determine how much.

As IC size and complexity increase in accordance with Moore's Law, designers need to raise their verification throughput to speeds that are orders of magnitude faster than a software simulator can deliver. Faster simulation is particularly important in products that include human interfaces, such as audio or video applications, and in any application that has to handle high data-rate signals.

Unfortunately for circuit simulators, today's data sets and logic interconnections are reducing simulator throughput to less than 10 Hz for large, complex designs. Just getting through the boot sequence of an embedded microprocessor can take a full day, and only then can you start feeding gigabytes of data into the system.

In-circuit emulation remains the only way to check many of the larger and faster designs in real-world situations. Applications including communications devices and PCs--and now some consumer products--need a "gold standard" successfully running non-trivial amounts of data before they can be deemed ready for release to market. In fact, long data sequences are making emulation an increasingly mainstream technology, because some designs reveal a real bug only after a full 10 seconds of real-time operation (something that's next to impossible in a 10 Hz software simulator).
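A back-of-the-envelope calculation makes the gap concrete. The sketch below assumes a hypothetical 100 MHz target clock (no clock rate is given here) and uses round illustrative throughput figures for the hardware-assisted platforms; only the 10 Hz simulator rate and the 10 seconds of operation come from the text.

```python
# Wall-clock time needed to cover a stretch of real-time operation at a
# given verification throughput. The 100 MHz target clock and the
# accelerator/emulator rates are assumptions chosen for illustration.

SECONDS_PER_DAY = 24 * 60 * 60


def cycles_needed(real_time_s: float, target_clock_hz: float) -> float:
    """Clock cycles the design executes in `real_time_s` seconds of operation."""
    return real_time_s * target_clock_hz


def wall_clock_days(cycles: float, throughput_hz: float) -> float:
    """Days required to run `cycles` at `throughput_hz` cycles per second."""
    return cycles / throughput_hz / SECONDS_PER_DAY


if __name__ == "__main__":
    cycles = cycles_needed(real_time_s=10, target_clock_hz=100e6)  # assumed clock
    for label, hz in [("software simulator", 10),
                      ("accelerator (assumed)", 100e3),
                      ("emulator (assumed)", 1e6)]:
        print(f"{label:>22}: {wall_clock_days(cycles, hz):,.3f} days")
```

Under these assumptions, the 10 Hz simulator would need roughly 1,157 days to cover what the hardware platforms finish in minutes to hours, which is why long data sequences push teams toward emulation.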

Mitch Weaver, vice president of marketing for the functional verification group at Cadence Design Systems, Inc. (San Jose, CA), notes that designs have grown to the point where an average design (around 2.5 million gates) needs to run on an accelerated hardware platform to get through verification. Consequently, current generations of machines present unified platforms of simulation software and accelerated hardware, where the user can move back and forth between the software and hardware with little change in operating calls across functions.

Facing hard time

Further exacerbating the situation, software developers need to integrate their work with the hardware models, but the respective time domains are radically different. Hardware single-stepping increments a clock by one count, while software single-stepping advances the code by one instruction. Unfortunately, a single instruction can use anywhere from one clock to a dozen or more, depending upon the processor and instruction set, while the execution chronology of a single instruction is further blurred by the multiple-issue, multiple-execution-unit architectures in use.
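The mismatch between the two stepping granularities can be sketched in a few lines. The cycle counts below are hypothetical, and the model assumes a simple in-order, single-issue processor; multiple-issue machines blur the picture further, as noted above.

```python
# Hypothetical cycles per instruction; real values depend on the
# processor and instruction set.
CYCLES_PER_INSTRUCTION = {"add": 1, "load": 3, "mul": 4, "div": 12}


def clock_after_each_step(program, cycles=CYCLES_PER_INSTRUCTION):
    """Return (instruction, hardware clock count) after each software single-step."""
    clock = 0
    trace = []
    for instruction in program:
        # One software step corresponds to one or more hardware clock steps.
        clock += cycles[instruction]
        trace.append((instruction, clock))
    return trace


if __name__ == "__main__":
    for instruction, clock in clock_after_each_step(["add", "load", "div"]):
        print(f"after {instruction:<4} -> hardware clock = {clock}")
```

Three software steps here span 16 hardware steps, so a debugger that halts "after one step" means something different on each side of the hardware/software boundary.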

One solution is to port the design to a hardware prototyping system. Many have tried to use a number of large programmable logic devices (PLDs) to create a custom hardware set that matches the design both physically and logically. This process has grown easier of late as PLD manufacturers have begun to embed microprocessors into their products.

However, it's still a daunting task to map a design into the fewest possible programmable devices (ideally, just one). Worse yet, if the original target was an ASIC, the design has to be modified to map it to an FPGA, and often it must be partitioned across multiple FPGAs because it's simply too large to fit into one programmable chip. Meanwhile, developing a prototype using PLDs requires the creation of a custom PCB and jury-rigged debug hardware and software, while partitioning the system into multiple PLDs and accounting for the differences between PLD logic structures and actual gates adds to the challenge.

As an interesting aside, some users of the largest programmable logic devices are themselves starting to look closely at the use of acceleration hardware. As their designs exceed the 500k gate barrier and their tools show the throughput degradation characteristic of all software simulators approaching overload, the alternatives available to PLD designers are beginning to converge towards those available to ASIC designers.

Meanwhile, as an alternative to the use of custom hardware, users can always try to construct a large server farm and run hundreds of simultaneous simulations. However, this approach only improves throughput if the design or data are highly parallel and the system can flag the differences among the many runs. Otherwise, the designer is forced to somehow extract the data and thereby identify marginal or unacceptable configurations.

Software tools continue to change and improve. In the verification space, formal methods can certainly improve throughput, but only to the extent that the formal tools can handle the design sizes and complexities. One verification start-up, Carbon Design Systems, Inc. (Waltham, MA), converts RTL into a custom object-oriented representation that runs at kHz speeds, with the intent of providing a working environment for both drivers and firmware development.

Ultimately, users can avail themselves of a specialized hardware system designed for the task. This solution is an aggregation of hardware and software that hides the partitioning and transformation issues from the designer. Difficulties notwithstanding, simulation is still the most popular tool for most phases of design because it continues to offer full visibility into all nodes. Simulation also lets the user control all nodes through presets or initial states. Accelerators will have to be compatible with the native simulator to maximize acceptance in the industry.

Laying out the case

The two main flavors of hardware are emulators (hardware that can run at up to a few MHz with some links to native simulation software) and accelerators (specialized hardware that speeds up the execution of the underlying HDL, much like a DSP speeds up numerical calculations in a system-on-chip, or SoC). Establishing the definitions in greater detail:

Emulation is the process of moving the actual gate or RTL representation into hardware, either programmable logic or dedicated hardware. Capable of providing simulation speeds from the hundreds of thousands of cycles per second to a few million cycles per second, emulation is probably best suited for system and software integration, especially with the lower-cost replicate boxes. The downsides of the technology include high costs (exceeding $1 million), the effort and resources required to load the design into the machine, and the expectation that the design is fairly close to completion.

Acceleration is a software-based simulation process, but with specialized hardware to speed up the simulation. These systems are capable of running a gate-level simulation at 1,000 to 100,000 cycles per second--not fast enough to use at-speed data or support a usable human interface, but orders of magnitude faster than the software simulator. Accelerators are well suited for early-stage and block-level debug tasks, where simulation is the primary tool. Unfortunately, the use of accelerators is always challenged by improvements in simulator performance and by ever-increasing speeds in underlying workstations and server farms. If an accelerator only gets 4x to 10x the performance of the base simulator, it may not be worth the costs in time and money.

In addition to emulation and acceleration systems, there are also various combinations of hardware/software that can co-simulate target hardware/software, meeting a growing need in many large designs where time-to-first-samples is constrained by the availability of the drivers and the firmware.

In the witness box

Charlie Miller, vice president of marketing for Aptix Corp. (Sunnyvale, CA), maintains that in-circuit emulation is a partial solution at best. You can't use the testbenches, and that leads to the increased use of co-emulation. In co-emulation, some of the job runs on the emulator and the balance runs on a simulator; the two are linked through a set of software and hardware interfaces.

Ron Burns, vice president of marketing at Axis Systems, Inc. (Sunnyvale, CA), agrees and notes that there's an additional issue in the time required to bring up the design in the system. Burns says IC designers need a way to transition from acceleration to targetless emulation, while more quickly incorporating the testbench into the box. The performance of an emulator is measured in two parts: first, the time to generate the emulator model, and second, the time to perform the checking. Test generation can be raised to the transaction level, which greatly reduces detail and increases performance, while further work goes on in behavioral processors and transaction interfaces.
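The two-part measure can be written as a simple sum: total turnaround equals model-generation time plus run time. The sketch below is only an illustration; the function names and every number in the usage example are hypothetical, not vendor figures.

```python
def turnaround_hours(build_hours: float, cycles: float, speed_hz: float) -> float:
    """Total turnaround: time to generate the model plus time to run the checks."""
    return build_hours + cycles / speed_hz / 3600.0


def break_even_cycles(build_a_h: float, speed_a_hz: float,
                      build_b_h: float, speed_b_hz: float) -> float:
    """Cycle count at which platform B (longer build, faster run) catches up with A."""
    extra_build_s = (build_b_h - build_a_h) * 3600.0
    saved_per_cycle_s = 1.0 / speed_a_hz - 1.0 / speed_b_hz
    if saved_per_cycle_s <= 0:
        raise ValueError("platform B must run faster than platform A")
    return extra_build_s / saved_per_cycle_s


if __name__ == "__main__":
    # Assumed figures: a 10 Hz simulator with no build step versus a
    # 1 MHz emulator that needs 4 hours of model generation.
    cycles = break_even_cycles(0.0, 10.0, 4.0, 1e6)
    print(f"emulator pays off beyond about {cycles:,.0f} cycles")
```

With these assumed figures, the emulator's build time is recovered after about 144,000 cycles; shorter runs are still better served by the simulator.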

Meanwhile, emulation still takes too much time to set up and run, according to Dave Ruggiero, vice president of marketing for Pittsburgh Simulation (Pittsburgh, PA). He says users have to change their concept of what constitutes an unknown input in order to use the boxes. The degree to which the methodology changes can lengthen the transition to a hardware-based debug and verification environment to a point that most designers can accept.

For interfaces to real-world data streams, someone needs to build a custom rate adapter for each application. A rate adapter is basically an extended FIFO that starts and stops the data stream when the hardware can't keep up. The vendor, the user, or the two working together must invest in developing these custom rate adapters.
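The start/stop behavior of such a rate adapter can be sketched as a bounded FIFO with high and low watermarks. This is a minimal illustration under assumed names and thresholds (`RateAdapter`, `high_water`, `low_water`), not a description of any vendor's implementation.

```python
from collections import deque


class RateAdapter:
    """Bounded FIFO between a fast data source and slower verification hardware.

    Hypothetical sketch: the class name, watermarks, and methods are
    illustrative only.
    """

    def __init__(self, depth: int, high_water: int, low_water: int):
        if not 0 <= low_water < high_water <= depth:
            raise ValueError("need 0 <= low_water < high_water <= depth")
        self.depth = depth
        self.high_water = high_water
        self.low_water = low_water
        self.fifo = deque()
        self.source_paused = False

    def push(self, word) -> bool:
        """Source side: accept a word unless the stream is paused or the FIFO is full."""
        if self.source_paused or len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word)
        if len(self.fifo) >= self.high_water:
            self.source_paused = True  # stop the stream: the hardware is falling behind
        return True

    def pop(self):
        """Hardware side: take the oldest word, or None if the FIFO is empty."""
        if not self.fifo:
            return None
        word = self.fifo.popleft()
        if len(self.fifo) <= self.low_water:
            self.source_paused = False  # restart the stream
        return word
```

Using two watermarks rather than one keeps the stream from toggling on every word: the source stays stopped until the hardware has drained the FIFO down to the low mark.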

Accelerators are like real hardware: there are no X or unknown states once the machine starts running. So says Sanjay Sawant, director of marketing at Tharas Systems (San Jose, CA), who adds that this limitation is one of the major barriers to more widespread acceptance of hardware-assisted technology.

Polling the jury

Acceleration hardware can help debug systems-level designs much more easily if it can handle a wide range of intellectual property (IP) reuse in many forms--RTL, gates, assertions, C, firmware, and application software. To ease the integration of IP, emulation at RTL is moving towards transaction-based designs and testbenches.

Bo Nilsson, marketing manager at Hardi Electronics AB (Lund, Sweden), opines that when buying IP, it's very important to know how to verify that the IP actually works. Too many people are familiar with the pain of buying IP that came with a weak spec and bad test/verification procedures. For those veterans of the vagaries of IP, it's good to be able to run the blocks in hardware to test that they really work before they're committed to a design.

Integrating IP is an excellent use for emulation, according to Giovanni Mancini, product marketing director for the emulation division at Mentor Graphics Corp. (Wilsonville, OR). He asserts that while IP can enhance design size and complexity, the blocks are usually designed so that each one is considered to be in a separate domain. Only when the system is integrated do conflicts across domains become apparent and cause data throughput to decrease or cease outright. It goes without saying that design teams need to solve these issues in a timely manner, well before they become an impediment to a project.

Miller of Aptix comments that the increased level of abstraction moves the designer from bits and bytes to transactions. As the amount of detailed data decreases, useful information on system-level functionality increases, allowing for easier access to trouble spots. The important analysis work moves up to the block (and interface) level. To optimize performance in the hardware, therefore, designers need to develop higher levels of reusable testbenches and verification intellectual property.

While difficulties persist in checking the interfaces and handshakes associated with block-level integration, users today are increasing efforts to improve performance and flexibility at super-block integration levels of design. And as verification challenges are increasing at this level, users are rapidly migrating to high-level verification languages and increased levels of abstraction.

Beyond a reasonable doubt

Accellera has recently released the SCE-MI 1.0 standard (Standard Co-Emulation Modeling Interface) and some companies are now supporting it. The companies that helped develop the standard are obviously in favor of it, but others in the industry remain skeptical. (See sidebar: "Hardware/Software Interface.")

Meanwhile, Chris Tice, senior vice president and general manager for the verification acceleration group at Cadence, says that the SCE-API (SCE Application Programming Interface) is good, but needs to be more open to other simulators and hardware. Cadence is leading the Design Working Group (DWG) for the API. The current release depends on hardware specific calls. The next version will extend the openness to other simulators and emulators.

Mitch Dale, product marketing manager for the emulation division at Mentor Graphics, declares that Ikos donated the original standards to Accellera and now has equipment in production use at six user sites. He admits that some of the standard may be a little too hardware-specific. Nevertheless, the existing SCE-API defines an interface from the hardware to the HDL simulator, and to any testbench languages. The long-term vision is that the SCE-API will become a part of SystemVerilog. This vision is driven by the push to increased levels of abstraction, the need to address increasing levels of detail to complete the design, and the integration of functions in the tools, which helps to span the gaps between simulation and hardware.

Steve Wang, vice president at Axis Systems, agrees that the current version is too hardware dependent and too focused on cycle-based emulation. Wang argues that the next generation of the specification should address this weakness by eliminating the need for vendor-specific calls. The specification needs a standard transaction interface, he says.

Approaching the bench

In practice, it takes from 2.5 to 5 years to develop and implement the changes in tools, design rules, philosophies, and methodologies needed to guarantee the benefits of emulation. But users need the right tool for the job today. Software simulation continues to improve its capabilities and performance, and though acceleration acts as a co-simulator at the block level, software simulators are still the best tools for HDL testbenches.

One specific element that differentiates between simulation and accelerated simulation is that the emulator cannot map the pads into the design. Therefore, design styles need to be flexible enough to account for fairly major differences in accelerated and software simulated operations, just as the design environments have to adjust for differences in simulation tools.

Adding a box to the design flow requires changes in methodologies, especially when it is operating in conjunction with a simulator or providing outputs into a simulator environment. For those machines that are based on programmable logic, for instance, one obvious change required is the need to partition the design to extract those portion(s) that benefit from speed up and that will fit in the available PLDs. Because of the differences in intermediate details, the designer needs to change the design process and create a parallel emulation database that mirrors the full simulation.

For the greatest gains in overall productivity, users need to acquire hardware systems that have fast compile times to map the design to the analysis engine, fast run times to reduce the time required to bring up the design, good debug productivity to quickly find and fix bugs, and--last, but not least--affordability.

Yiftach Tzori, founder of Simpod, Inc. (Santa Clara, CA), suggests that someone planning to start working in an accelerated hardware mode should pay attention to the following considerations. First, the user needs to ensure that the testbench is not intrusive at the point of acceleration --or can be selectively disabled while in acceleration mode. Second, if the design block exceeds the capacity of a single FPGA (over 500k gates), the user must pay close attention to the interconnections between modules due to the pin limitations of the FPGAs. Finally, users should be ever mindful of any special constraints associated with proprietary microprocessors or buses.
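The second consideration, pin limits between FPGAs, lends itself to a quick sanity check before committing to a partition. The sketch below assumes one pin per boundary-crossing net on each FPGA it touches (no pin multiplexing); the module names, placement, and pin budget are all hypothetical.

```python
def pins_needed(nets, placement):
    """Count, per FPGA, the nets that cross its boundary.

    `nets` is a list of tuples of module names; `placement` maps each
    module to the FPGA holding it. Assumes one pin per crossing net.
    """
    needed = {fpga: 0 for fpga in set(placement.values())}
    for net in nets:
        fpgas = {placement[module] for module in net}
        if len(fpgas) > 1:  # the net leaves at least one device
            for fpga in fpgas:
                needed[fpga] += 1
    return needed


def over_budget(nets, placement, pin_budget):
    """Return the FPGAs whose crossing-net count exceeds `pin_budget`."""
    counts = pins_needed(nets, placement)
    return sorted(fpga for fpga, n in counts.items() if n > pin_budget)


if __name__ == "__main__":
    nets = [("cpu", "bus"), ("bus", "dma"), ("cpu", "dma"), ("dma", "mac")]
    placement = {"cpu": "fpga0", "bus": "fpga0", "dma": "fpga1", "mac": "fpga1"}
    print(pins_needed(nets, placement))
    print(over_budget(nets, placement, pin_budget=1))
```

A partition that fits on gate count alone can still fail this check, which is why the interconnections between modules deserve attention early.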

Beating the system

Key attributes of all systems should include flexibility, reusability, and the ability to expand the system. Of course, the speed and capacity of the systems are important, but even the fastest and largest hardware system won't help much if design files and output take hours to transfer from the simulator to the hardware and back again. Considering the cost of some of these systems, the ability to share hardware resources across a number of users may be one important way to justify the equipment.

Meanwhile, it's important to keep in mind that certain types of errors, like ambiguities and errors in the specifications, can only be found by prototyping. Changes in languages, tools, and methodologies can unexpectedly negate the speed advantages of the dedicated hardware. And should you choose to ignore these warnings, your sentence may be harsh--hard labor and heavy fines in time and lost market opportunities.

| Company | Product | Function/target verification task | Technology | Gate capacity | Effective simulation speed | Sharing capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| Aptix Corp., Sunnyvale, CA, www.aptix.com | System Explorer MP3CF | System prototypes | Programmable interconnect plus FPGA | FPGA dependent | 40 MHz | yes |
| | System Explorer MP4CF | System prototypes | Programmable interconnect plus FPGA | FPGA dependent | 40 MHz | yes |
| | Software Integration Station | Field programmable circuit board replicate | Programmable interconnect plus FPGA | FPGA dependent | 40 MHz | yes |
| | Modules | Daughterboard for ICs | FPGA, IP cores | | | |
| Axis Systems, Inc., Sunnyvale, CA, www.axisSystems.com | XoC | Co-verification for ARM processor based SoC | Reconfigurable computing (RCC) in FPGA | 2.5 M gates | 500 kHz | yes |
| | Xtreme II | Platform verification | RCC | 100 M gates | 1 MHz | yes |
| | Xtreme | System verification | RCC | 20 M gates | 500 kHz | yes |
| | Xcite-1000 | RTL acceleration | RCC | 2 M gates | 100 kHz | yes |
| | xcite 2000 | RTL acceleration | RCC | 20 M gates | 100 kHz | yes |
| Cadence Design Systems, San Jose, CA, www.quickturn.com | Palladium | Acceleration and in-circuit emulation | Custom processor | 16 M gates | 750 kHz | |
| | Cobalt Plus | Emulation, logic analyzer | Custom processor | 20 M gates | 600 kHz | multi-user mode |
| | Cobalt Ultra | Emulation, acceleration, hardware-software coverification | Custom processor | 112 M gates | 600 kHz | up to 32 users |
| | Mercury | In-circuit emulation | Custom FPGA | 20 M gates | 100 kHz | |
| Hardi Electronics, Lund, Sweden, www.hardi.se | Haps | Rapid prototyping | FPGA | 8 M gates | 200 MHz | no |
| Mentor Graphics Corp., Wilsonville, OR, www.mentor.com/emulation | Nsim | Acceleration | Custom hardware | 25 M gates | 2 kHz | up to 8 users |
| | Ares RTL Accelerator | Acceleration | Custom hardware | 3.6 M (RTL) gates | 1 kHz | no |
| | Celaro | Emulation | Custom hardware | 30 M gates | 2 MHz | up to 4 users |
| | Vstation-30m | Emulation | Custom hardware | 30 M gates | 2 MHz | no |
| | Vstation-15m | Emulation | Custom hardware | 15 M gates | 2 MHz | no |
| | Vstation 5MX/15MX | Replicates for software development | Custom hardware | 15 M gates | 2 MHz | no |
| Pittsburgh Simulation, Pittsburgh, PA, www.pitsim.com | PSC V400 | Logic, timing, and fault simulation | Reconfigurable computing (RCC) in FPGA | 128 M gates | 100k | yes, convert 1 board to controller for each user |
| Simpod, Inc., Santa Clara, CA, www.simpod.com | Deskpod | System prototyping, in-circuit emulation, testing | FPGA | 8 M gates/board | 250 kHz | no |
| | Adapter boards | Daughterboard for processors, other IP | | | | |
| Tharas Systems, Santa Clara, CA, www.tharas.com | Hammer | Acceleration | Custom hardware | 32 M gates | 100 kHz | yes |
