Chip Design




Making FPGA Synthesis Physically Aware

With the constant move to smaller process geometries, FPGA routing delays dominate timing more than ever. RTL synthesis followed by place and route can no longer guarantee fast timing closure. Because the synthesis engine does not know how the logic will be placed and routed, it is forced to work with limited information. As a result, place and route spends a great deal of time trying to meet the user's timing constraints. In most of today's designs, a single pass cannot achieve timing closure. Consequently, the user is forced into multiple iterations of fine-tuning the constraints, re-coding the logic, and experimenting with different synthesis and place-and-route switches to meet the constraints.

While this approach works for smaller designs, larger and more complex designs need more iterations, and each iteration takes longer. With place and route taking several hours for larger designs, these iterations can severely impact the project schedule. Since time to market can determine the success or failure of a new product, this approach is simply not feasible.

As FPGAs move into more complex design spaces, the market demands a fresh approach: one in which the synthesis engine is aware of the physical aspects of the design (such as packing rules, placement and routing) and also reduces the effort needed to achieve timing closure.

Current Synthesis Technologies

The traditional approach to synthesis has been to compile and synthesize designs with minimal or no feedback from place and route. This method worked well with older FPGA architectures, where cell delay was larger than routing delay. Designs were also simpler, so traditional timing models could yield good results.


Figure 1: Comparison of the Runtimes of Various Synthesis Techniques

With increasing design complexity and shrinking process geometries, the traditional synthesis approach stopped working. Timing correlation became a major issue, and consequently so did quality of results (QoR). FPGA tool vendors therefore looked towards physical synthesis. This approach has the advantage of using real timing and placement data received from the place-and-route tool, which the synthesis tool uses to optimize the logic. While QoR improved, the flow still required multiple iterations. Since each pass required a place and route, this iteration could take 24 to 36 hours for a complex design (Figure 1). These multiple iterations made quick timing closure nearly impossible. Additionally, because of its complexity, physical synthesis is really only usable by expert FPGA designers.

From an EDA vendor's perspective, getting the physical device information and implementing physical synthesis for all FPGA suppliers is a challenge. Vendor-independent tools are able to support only a limited number of devices and vendors. This narrow support limits the choices available to the end user, who needs to fit the design in the smallest, slowest and cheapest possible FPGA.

Multiple alternatives have been tried to avoid the pure physical synthesis approach, but with limited success. The placement-centric flow uses a quick placement pass to determine the placement and then synthesizes the design based on that placement. This flow suffers from the inability to reliably predict routing (and placement), thereby throwing out all the good work done at the synthesis stage. Routing-centric or physical synthesis flows use a full placement and routing pass. But as stated earlier, this flow suffers from costly routing cycles and limited device support. Floorplanning has traditionally been a non-starter since it requires the designer to do a great deal of planning, and more often than not, design changes render the original floorplan useless.

Physically Aware Synthesis

For the synthesis tool to succeed, it needs to be aware of the physical characteristics of the design, and then minimize (or eliminate) design iterations. In addition, the tool needs to have wide device support. These capabilities should also ensure better QoR, quicker timing closure and the ability to select the cheapest (read smallest and slowest) FPGA available to the designer.


Figure 2: Physically Aware Synthesis Design Flow

What is needed is physically aware synthesis built on advanced timing analysis (Figure 2). This approach does a quick pass of delay estimation by applying the concept of statistical timing analysis to the physical layout of the whole chip. During this analysis, a high-level view of the physical chip, including placement and packing rules, is always maintained. Based on this advanced delay estimation, physical synthesis techniques such as re-timing, replication and re-synthesis are performed on critical paths.
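The article does not describe the delay estimator itself, so the following is only a minimal sketch of the general idea of a statistical delay estimate. It assumes each cell or net on a path has an estimated mean delay and standard deviation, that the segments are independent and roughly normal, and that a path is flagged as critical when its mean-plus-3-sigma bound exceeds the clock period. The `Segment` type, the function names and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One cell or net on a timing path, with an assumed delay distribution (ns)."""
    name: str
    mean_ns: float
    sigma_ns: float


def path_delay(segments, n_sigma=3.0):
    """Return (mean, sigma, pessimistic bound) for a path of independent segments.

    Means add directly; standard deviations add in quadrature
    (root-sum-square), which holds only under the independence assumption.
    """
    mean = sum(s.mean_ns for s in segments)
    sigma = math.sqrt(sum(s.sigma_ns ** 2 for s in segments))
    return mean, sigma, mean + n_sigma * sigma


def estimated_slack(segments, clock_period_ns, n_sigma=3.0):
    """Slack against the pessimistic bound; negative means the path is critical."""
    _, _, bound = path_delay(segments, n_sigma)
    return clock_period_ns - bound


# Hypothetical path: cell delays are treated as known, net delays as uncertain
# because routing has not been done yet.
path = [
    Segment("lut_a", 0.5, 0.0),
    Segment("net_a_b", 1.2, 0.3),
    Segment("lut_b", 0.5, 0.0),
    Segment("net_b_ff", 0.8, 0.4),
]

mean, sigma, bound = path_delay(path)
slack = estimated_slack(path, clock_period_ns=4.0)
print(f"mean={mean:.2f} ns  sigma={sigma:.2f} ns  bound={bound:.2f} ns  slack={slack:.2f} ns")
```

In this toy case the mean delay alone would meet a 4 ns clock, but the routing uncertainty pushes the pessimistic bound past it, so the path would be a candidate for re-timing or replication.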


Figure 3: Before Retiming



Figure 4: After Retiming

Retiming across registers is performed when there is negative slack on one side of a register and positive slack on the other (Figure 3). The logic on the critical path that is not meeting timing is moved across the register (Figure 4) to produce a more balanced timing path. This process can be performed across various design elements such as DSP blocks, carry chains, multipliers and multiplexers.
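The slack-balancing rule described above can be sketched as follows. This is a simplified illustration rather than the tool's algorithm: it models the logic on each side of one register as a list of gate delays, ignores setup time and clock-to-output delay, and assumes that any gate next to the register can legally be moved across it (real retiming must also preserve the circuit's function). The function names and delay values are hypothetical.

```python
def stage_slack(gate_delays, period_ns):
    """Slack of one combinational stage: clock period minus total gate delay."""
    return period_ns - sum(gate_delays)


def retime(upstream, downstream, period_ns):
    """Move gates across the register while one side fails and the other has margin.

    upstream / downstream: gate delays (ns) in signal-flow order, on either
    side of the register. Returns the rebalanced (upstream, downstream) lists.
    """
    up, down = list(upstream), list(downstream)
    while True:
        s_up = stage_slack(up, period_ns)
        s_down = stage_slack(down, period_ns)
        if s_up < 0 < s_down and up:
            # Upstream fails: push its last gate to the far side of the register.
            cand_up, cand_down = up[:-1], [up[-1]] + down
        elif s_down < 0 < s_up and down:
            # Downstream fails: pull its first gate to the near side.
            cand_up, cand_down = up + [down[0]], down[1:]
        else:
            break  # both sides meet timing, or neither side has slack to give
        new_worst = min(stage_slack(cand_up, period_ns),
                        stage_slack(cand_down, period_ns))
        if new_worst <= min(s_up, s_down):
            break  # the move would not improve the worst slack
        up, down = cand_up, cand_down
    return up, down


# Hypothetical example with a 4 ns clock: 5.5 ns of logic before the register,
# 1.0 ns after it.
before = ([1.0, 1.5, 1.5, 1.5], [1.0])
after = retime(*before, period_ns=4.0)
print(after)
```

Each accepted move strictly improves the worst slack, so the loop terminates; it stops as soon as both stages meet timing or a further move would not help.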


Figure 5: Before Replication



Figure 6: After Replication

Replication is a useful technique to ensure better timing. With the advanced delay estimation, very long critical paths can be accurately determined. Suitable logic on these paths is identified and replicated (Figure 5 versus Figure 6), which reduces fan-out on critical nets. By reducing fan-out and replicating logic on critical paths, the synthesis tool provides more start and end points to the placer during final place and route. This flexibility helps the place-and-route tool achieve timing closure much faster. The re-synthesis techniques help avoid an increase in area utilization. All of these benefits are achieved by simple, easy-to-use, push-button-style synthesis. The end user is required only to provide reasonable (or required) timing constraints for the tool to provide the best results.
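As a rough illustration of fan-out reduction by replication, the sketch below splits one high-fan-out driver into copies that each drive a bounded number of sinks. The grouping heuristic (sort sinks by an estimated x, then y coordinate so that each copy serves nearby sinks) is an assumption made for this example, as are the function name, the `_rep` naming scheme and the coordinates; the article does not say how the tool chooses what to replicate.

```python
import math


def replicate_driver(driver, sinks, max_fanout):
    """Split a driver into copies so that no copy drives more than max_fanout sinks.

    sinks: list of (name, x, y) tuples with estimated placement coordinates.
    Returns a dict mapping each driver copy to the sink names it drives.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    n_copies = math.ceil(len(sinks) / max_fanout)
    if n_copies <= 1:
        # Fan-out is already within the limit; no replication needed.
        return {driver: [name for name, _, _ in sinks]}
    # Sort by estimated position so each copy serves a cluster of nearby sinks.
    ordered = sorted(sinks, key=lambda s: (s[1], s[2]))
    groups = {}
    for i in range(n_copies):
        chunk = ordered[i * max_fanout:(i + 1) * max_fanout]
        groups[f"{driver}_rep{i}"] = [name for name, _, _ in chunk]
    return groups


# Hypothetical enable signal driving six flip-flops in two regions of the chip.
sinks = [
    ("ff0", 0, 0), ("ff1", 9, 1), ("ff2", 1, 2),
    ("ff3", 8, 0), ("ff4", 0, 3), ("ff5", 9, 4),
]
groups = replicate_driver("en", sinks, max_fanout=3)
for copy_name, driven in groups.items():
    print(copy_name, "->", driven)
```

With each copy driving only nearby sinks, the placer is free to put each copy close to its own cluster, which is the extra flexibility the text refers to.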

This approach can be used successfully with all the leading FPGA vendors, with good results, because it does not require in-depth knowledge of the physical layout of the FPGA. Design iterations can also be greatly reduced (or eliminated), resulting in significant time savings.

Conclusion

The physically aware synthesis solution presented here can provide better QoR and faster timing closure with minimal (or no) impact on area utilization. Furthermore, because it does not require detailed FPGA architectural knowledge, this approach also helps the EDA vendor to quickly support the widest variety of devices. The end user can then select the cheapest or best device for the implementation. In addition, faster timing closure helps the user get to market faster. These benefits combine to produce large cost and time savings.

Comments about this article? Share your thoughts by writing our editorial director: jblyler@extensionmedia.com.
