Published in Aug / Sept 2005 issue of Chip Design Magazine

What is the Best Way to Take Algorithmic Design to Implementation?

Instead of a "head-to-head" column, it might be more accurate to entitle this a "head-to-toe" column. Yatin and I agree that designing chips based on highly complex mathematical algorithms warrants special consideration. Given the focus of our respective companies, it won't be surprising to see that we focus on different ends of the design problem.

An increasing number of ASIC designs are based on highly mathematical algorithms. Why? Media-processing systems, whether they handle wireless communications, imaging, or audio processing, are all built on mathematical algorithms. These systems require a distinctive design process that starts with the initial description of the algorithm and continues through the final implementation.

Getting the algorithm right and implementing it on the right mix of hardware and software is the key to a successful system. The implementation decisions start at the architectural level. To ensure design success, however, the high-level model must be tightly coupled to the implementation design flow. More implementation detail must be brought into the algorithmic design so that tradeoffs can be made at a higher level. In turn, more implementation detail must be passed down to the register-transfer-level (RTL), verification, and software engineers, so that they start on a firmer footing as they create the realizable description of the system.

Media-processing systems are signal-processing-centric. Consider Ultra Wideband (UWB), 802.11n, or H.264: in each, the signal-processing algorithm is the intellectual core of the design. The complex mathematical algorithm must be described at a high level so that it can be thoroughly characterized and optimized for mathematical accuracy. The algorithm design language of choice is MATLAB from The MathWorks.

Initially, there isn't a distinction between the hardware and software portions of the algorithm. It's possible that the entire algorithm will be implemented as an application-specific integrated circuit (ASIC). It is also possible that the algorithm will be implemented as software executing on a standard digital signal processor (DSP). For our discussion, let's consider a common case for sophisticated signal-processing systems: Part of the algorithm becomes custom RTL while another part executes on an embedded core.

For mathematical accuracy, holistic design is important. The whole algorithm must be completely characterized before it can be divided. Usually, a small group of system architects starts by creating a MATLAB model of the algorithm. The initial algorithm that's described is an idealized floating-point model. Extensive simulations are executed to characterize the mathematical behavior.
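As a rough illustration of what "extensive simulations" of an idealized floating-point model can look like, the sketch below uses Python as a stand-in for MATLAB. The 8-tap moving-average filter, the function names, and the Monte Carlo setup are all assumptions chosen for illustration. The behavior being characterized is the filter's noise gain, which for white input should approach the sum of the squared taps.

```python
import random
import statistics


def golden_fir(x, taps):
    """Idealized floating-point FIR: y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def theoretical_noise_gain(taps):
    """For white input, output variance / input variance equals the sum of squared taps."""
    return sum(h * h for h in taps)


def characterize(taps, trials=200, length=256, seed=1):
    """Monte Carlo estimate of the noise gain using seeded white Gaussian input."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(length)]
        y = golden_fir(x, taps)
        steady = y[len(taps) - 1:]  # discard the start-up transient
        ratios.append(statistics.pvariance(steady) / statistics.pvariance(x))
    return statistics.mean(ratios)


# Hypothetical 8-tap moving average; theory predicts a noise gain of 1/8.
taps = [1.0 / 8] * 8
print(theoretical_noise_gain(taps), characterize(taps))
```

Seeding the generator keeps every characterization run repeatable, which matters once later, more detailed models are compared against this one.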

To become an end product, this algorithm will have to go through multiple transitions. Being able to reproducibly go from the high-level description of the ideal behavior to implementable RTL or deployable C is fundamentally important to the design process. To make accurate tradeoffs at the implementation level, system architects need a reliable way to go from MATLAB to either RTL or C. In addition, implementation engineers need accurate guidance on the algorithm's technology requirements.
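One common way to make the transition from a high-level model to C or RTL reproducible is a golden-vector regression: drive the reference model and the candidate implementation with the same seeded stimulus and flag any output that differs by more than a tolerance. The Python sketch below assumes both models can be wrapped as functions from a list of samples to a list of outputs; the function names, the gain stage, and the tolerance are hypothetical.

```python
import random


def make_stimulus(seed, length):
    """Seeded stimulus so every regression run sees identical input."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def compare_to_golden(golden, candidate, stimulus, tol):
    """Return (index, golden_value, candidate_value) for every sample outside tol."""
    ref = golden(stimulus)
    out = candidate(stimulus)
    if len(ref) != len(out):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(out)}")
    return [(i, r, c) for i, (r, c) in enumerate(zip(ref, out)) if abs(r - c) > tol]


# Hypothetical models: an ideal gain stage, and a candidate that rounds
# its result to 8 fractional bits (maximum error 2**-9).
def golden_gain(xs):
    return [0.75 * v for v in xs]


def candidate_gain(xs):
    return [round(0.75 * v * 256) / 256 for v in xs]


stim = make_stimulus(seed=42, length=1000)
print(len(compare_to_golden(golden_gain, candidate_gain, stim, tol=2 ** -9)))
```

The tolerance encodes the quantization the implementation is allowed to introduce; tightening it below half a least-significant bit makes the same candidate fail.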

Let's make these tradeoffs more concrete with an example. Consider the design of a wireless communication chip that is roughly divided into an inner and an outer receiver. The data rate of the inner receiver is much higher than that of the outer receiver. It is common to try to implement the outer receiver on an embedded core, with the inner receiver implemented as a custom ASIC. How do the system architects begin to make these assessments so that they can pass the implementation details to the various engineers responsible for building the different pieces? Because the whole system is described in MATLAB, MATLAB's profiling capabilities can be used to determine the relative time spent in each receiver. That first cut will show whether the hardware/software split can be made.
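MATLAB's own profiler reports where simulation time goes; the Python sketch below is only an analogue of that first-cut measurement. The `inner_receiver` and `outer_receiver` functions are hypothetical stand-ins: the inner stage touches every input sample with a 16-sample correlation and then decimates by 16, while the outer stage only makes hard decisions on the decimated symbols, mirroring the rate difference between the two receivers.

```python
import random
import time

TAPS = 16  # hypothetical correlator length and decimation factor


def inner_receiver(samples):
    """High-rate stage: sliding 16-sample correlation (all-ones taps), then decimate by 16."""
    corr = [sum(samples[i:i + TAPS]) for i in range(len(samples) - TAPS + 1)]
    return corr[::TAPS]


def outer_receiver(symbols):
    """Low-rate stage: hard decisions on the decimated symbols."""
    return [1 if s >= 0 else 0 for s in symbols]


def profile_stages(stages, frames):
    """Run every frame through the stages in order; return each stage's share of total time."""
    elapsed = {name: 0.0 for name, _ in stages}
    for frame in frames:
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            elapsed[name] += time.perf_counter() - start
    total = sum(elapsed.values())
    return {name: t / total for name, t in elapsed.items()}


rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(4096)] for _ in range(20)]
shares = profile_stages([("inner", inner_receiver), ("outer", outer_receiver)], frames)
print(shares)
```

A lopsided result, with nearly all time in the inner stage, is the kind of evidence that would support putting that stage in gates and leaving the outer stage in software.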

Let's assume that the split looks promising. To begin modeling the real-world effects of moving to either an embedded core or gates, more detail needs to be added to the ideal, or "golden," floating-point algorithm. By adding this modeling detail to the "golden" algorithm and simulating it early on, it's more likely that the algorithm handed off to the implementation engineers will work well in the target.

For the software portion, this would mean converting the data widths and ideal math of the floating-point algorithm to the actual data widths and math of the target processor. The more complex and realistic model would then have to be simulated again in MATLAB to determine if the behavior of the algorithm is still acceptable. If this modeling can be accomplished early in the design process, software engineers have a better starting point as they write the embedded software. They won't be worrying about quantization effects while trying to optimize code for efficiency.
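The conversion step can be sketched as a bit-accurate re-simulation. Purely for illustration, the Python below assumes a target processor with 16-bit Q15 operands, a wide accumulator, round-to-nearest, and saturation. It runs the same hypothetical FIR in floating point and in fixed point and reports the signal-to-quantization-noise ratio (SQNR) as the acceptance metric. Note that Python's `round()` rounds halves to even, which a real target may not do.

```python
import math
import random


def saturate(value, total_bits):
    """Clamp an integer to the signed two's-complement range of total_bits."""
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, value))


def float_fir(x, taps):
    """Idealized floating-point reference."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0) for n in range(len(x))]


def fixed_fir(x, taps, frac_bits=15, total_bits=16):
    """Quantized inputs and taps, wide integer accumulator, output rounded back to Q(frac_bits)."""
    scale = 1 << frac_bits
    xq = [saturate(round(v * scale), total_bits) for v in x]
    hq = [saturate(round(h * scale), total_bits) for h in taps]
    y = []
    for n in range(len(xq)):
        # Each product carries 2 * frac_bits fractional bits.
        acc = sum(h * xq[n - k] for k, h in enumerate(hq) if n - k >= 0)
        y.append(saturate(round(acc / scale), total_bits) / scale)
    return y


def sqnr_db(ref, test):
    """Signal power over quantization-error power, in dB."""
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


rng = random.Random(7)
x = [rng.uniform(-0.5, 0.5) for _ in range(2000)]
taps = [0.25] * 4  # hypothetical filter
ref = float_fir(x, taps)
print("Q15 SQNR:", sqnr_db(ref, fixed_fir(x, taps)))
print("Q7 SQNR:", sqnr_db(ref, fixed_fir(x, taps, frac_bits=7, total_bits=8)))
```

Whether the resulting SQNR is "still acceptable" is a system-level judgment; the point is that the check happens in the algorithm model, before any embedded code is written.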

For the hardware implementation, chip designers wouldn't want to just add information to model the exact bit widths of the datapaths and hardware math. They'd want to be able to analyze and optimize those widths. RTL simulation isn't the place to debug quantization issues; optimizing for quantization is best done at the MATLAB level. Again, working at the algorithm level gives them the ability to optimize for mathematical efficiency. With accurate and optimized datapath information built into the "golden" algorithm model, RTL engineers can concentrate on building correct RTL instead of debugging quantization issues.
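Width optimization at the algorithm level can be automated in many ways. The Python sketch below uses one simple heuristic, a greedy search that keeps removing one fractional bit from whichever datapath node hurts accuracy least, and stops when any further cut would drop the SQNR below a target. The two-node datapath (input and output widths of a 4-tap FIR), the 40 dB target, and the starting widths are illustrative assumptions, not a recommended methodology.

```python
import math
import random


def q(v, frac_bits):
    """Round v to the nearest multiple of 2**-frac_bits (saturation omitted for brevity)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def fir(x, taps, in_bits=None, out_bits=None):
    """FIR with optional quantization at the input and output nodes (None = full precision)."""
    xs = x if in_bits is None else [q(v, in_bits) for v in x]
    y = []
    for n in range(len(xs)):
        acc = sum(h * xs[n - k] for k, h in enumerate(taps) if n - k >= 0)
        y.append(acc if out_bits is None else q(acc, out_bits))
    return y


def sqnr_db(ref, test):
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def greedy_widths(evaluate, start, target_db):
    """Shave one bit at a time from the node whose cut keeps SQNR highest, while SQNR >= target.

    Assumes the starting widths already meet the target.
    """
    widths = dict(start)
    while True:
        best = None
        for node in widths:
            if widths[node] <= 1:
                continue
            trial = dict(widths)
            trial[node] -= 1
            score = evaluate(trial)
            if score >= target_db and (best is None or score > best[0]):
                best = (score, trial)
        if best is None:
            return widths  # no single-bit cut still meets the target
        widths = best[1]


rng = random.Random(3)
x = [rng.uniform(-0.5, 0.5) for _ in range(1000)]
taps = [0.25] * 4
ref = fir(x, taps)


def evaluate(w):
    return sqnr_db(ref, fir(x, taps, in_bits=w["in"], out_bits=w["out"]))


best = greedy_widths(evaluate, {"in": 16, "out": 16}, target_db=40.0)
print(best, evaluate(best))
```

Greedy search finds a locally minimal set of widths, not necessarily the global optimum, but it shows how the analysis can run entirely in the algorithm model rather than in RTL simulation.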

What happens if the design team finds, as the design progresses, that the hardware/software split doesn't work? With the "golden" algorithm accurately reflecting the split and quantization details, changes can be made and simulated much more easily than at the embedded-software and register-transfer levels. Even more powerful would be the ability to synthesize the MATLAB directly to a more structured and detailed language, such as C. That language could then be used as the starting point for both the embedded-software and RTL creation. Of course, all of the analysis needed to make the detailed power, speed, and size tradeoffs that are necessary to build the system must still be done. But automatic synthesis will allow more exploration of the implementation possibilities.

Lisa Schmidt is Vice President of Marketing for Catalytic Inc. of Palo Alto, Calif. (www.catalyticinc.com).

The industry has frequently debated the benefits of electronic-system-level (ESL) design and of designing at multiple levels of abstraction. Although proponents say that the benefits are numerous, widespread adoption has lagged behind the forecasts. The reasons are many, complex, and intertwined. Whether they're classic systems-on-a-chip (SoCs) or emerging networks-on-a-chip (NoCs), most of today's chips contain several million logic gates, several million bits of memory, and several million lines of embedded software. How do you decide which features should be committed to silicon and which should stay soft? It's not easy to separate software and hardware.

Let's start with the design of a communication chip. If you're a system (chip) architect with hardware genes, your view of the software may be fairly limited. You start thinking about the bits, bytes, and buses, but you're narrowly focused on a handful of options and on how the data is manipulated. Fundamentally, you need to question the quality of the input you receive for implementation. How do you know that one algorithm is better than another, or whether you've been given sufficient physical requirements to implement the device? After all, it's very likely that an inferior algorithm will be much cheaper (cost) and easier to implement (time to market) for an entry-level product. In contrast, cost may be a secondary consideration for the developer of a device that requires high reliability. But you can't evaluate many alternative algorithms with the hardware tools at your disposal.

If you're a system architect with a strong software background, however, algorithms are what you think of first: the flow and manipulation of data. You work out the best-performing approach to achieve the desired throughput. You simulate a MATLAB model under different conditions and validate your assumptions. Then you pass the model on to someone who understands hardware. You may even generate an RTL netlist and hand it over. Your job is done. Or is it?

The implementation of a device, especially a complex one, is very different from algorithm development or validation. Yes, you compile your RTL and simulate, synthesize, analyze, and debug. But then there are the physics, economics, and real-life business issues. I don't mean that algorithm development involves no understanding of these issues. But when it comes to implementing a real device, there are many more considerations than throughput or quantization error. System-level (chip-level) considerations include the following: Can the logic gates for this algorithm fit into the selected device? Can it be manufactured reliably given the power it consumes and the heat it dissipates? Can it be tested, and perhaps repaired in the field, without service disruption? Finally, what will development cost, and can the device still be priced competitively as a solution that makes business sense?

Even between two closely adjacent steps in the design flow, taking an algorithm and translating it into RTL, a number of considerations must be taken into account. Will a complex floating-point operation in the algorithm be implemented as a custom-designed datapath? Or will it be built from discrete logic gates from a library?

Two algorithms may exhibit equal performance--one requiring 1 million bits of multi-port embedded memory and another with many small, distributed register files and a much smaller embedded memory. But will built-in repair requirements for acceptable device reliability yield double the cost? Is the centralized bus architecture for one large memory likely to increase routing congestion? Or will many long wires cause signal-integrity problems? Before you choose the algorithm to implement in hardware, make sure you assess its impact.

Depending on how the high-level algorithm was coded and eventually translated into RTL, the implementation can vary drastically. RTL designers have often dealt with poor coding styles by adhering to strict guidelines, and they have provided special directives to clarify the design intent. The process of mapping algorithms to RTL is a guided one as well, but it often lacks proper physical data (libraries) or an understanding of the implementation considerations. Even when physical data is available, the many optimization controls in logic and physical synthesis can quickly invalidate such considerations, reflecting a lack of common understanding and correlation between the two domains.

This is where a good bridge between algorithm developers and hardware implementers is necessary. The ability to efficiently code and analyze high-level algorithms is very important. The algorithm must then be carried into the implementation world as RTL, together with as many physical constraints as the system designer can possibly identify. Such constraints include area, power, performance, testability, and software accessibility.

At this stage, there should be rapid analysis of the potential implementations. A software engineer can then explore the hardware implementation and assess its feasibility. This task is not the same as implementing hardware. Rather, it's the ability to estimate the area, performance, power, congestion, and testability--the implementation-quality metrics--for a given technology. It is RTL exploration, constraint validation, and the estimation of quality metrics.

The predictability of the implementation plays an important role. By their very nature, estimates are inaccurate, and any iteration to re-code the algorithm because of a difference between estimated and implemented quality metrics is very costly. Such estimation must therefore be done with a system that performs continuous refinement from RTL to GDS and understands the entire RTL-to-GDS spectrum. At the same time, it must offer tradeoffs between accuracy and run time to be of practical use. If exploring an algorithm for a 5-million-gate device takes an entire day (24 hours), you will perform very few such explorations. If it takes an hour or two and provides significant insight into quality metrics, however, you can evaluate many variations of your algorithms.
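A back-of-the-envelope calculation makes the run-time point concrete. Assume, purely for illustration, a five-day exploration window with runs executing back to back around the clock, so the budget is T = 120 hours. The number of explorations N that fit in the budget when each run takes t_run is:

```latex
N = \left\lfloor T / t_{\mathrm{run}} \right\rfloor
\quad\Rightarrow\quad
N\big|_{T = 120\,\mathrm{h}} =
\begin{cases}
5   & t_{\mathrm{run}} = 24\,\mathrm{h} \\
60  & t_{\mathrm{run}} = 2\,\mathrm{h} \\
120 & t_{\mathrm{run}} = 1\,\mathrm{h}
\end{cases}
```

Setup and analysis time between runs would shrink every count, but the 12x to 24x ratio between the one-day case and the one-to-two-hour cases is what matters.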

If the exploration process ran in the same environment that implements the RTL-to-GDS flow, your prior efforts would become the basis for the real implementation. You would use the same RTL, constraints, libraries (i.e., process information), and even the same scripting and debug environment. The result is an improvement in both design predictability and designer productivity.

Chip designers have gained tremendous predictability and productivity by using a unified design environment for the entire RTL-to-GDS flow. The same benefits can be extended to algorithmic designers with RTL exploration. So, is it going to be a standard-cell ASIC, a structured array, or a programmable device? 130, 90, or 65 nm? CDMA or TDMA?

Yatin Trivedi is the Director of Synthesis, DFT, and Formal Verification Product Marketing at Magma Design Automation (www.magma-da.com).
