Making FPGA Synthesis Physically Aware
While this approach works for smaller designs, larger and more complex designs require more, and longer, design iterations to complete. With place-and-route taking several hours for larger designs, this can severely impact the project schedule. Since time to market can determine the success or failure of a new product, this approach is simply not feasible.
As FPGAs move into more complex design spaces, the market demands a fresh approach: one in which the synthesis engine is not only aware of the physical aspects of the design (such as packing rules, placement and routing) but also reduces the effort needed to achieve timing closure.
Current Synthesis Technologies
The traditional approach to synthesis has been to compile and synthesize designs with minimal or no feedback from place and route. This method worked well with older FPGA architectures, in which cell delay was larger than routing delay. Designs were also simpler, so traditional timing models could yield good results.
Figure 1: Comparison of the Runtimes of Various Synthesis Techniques
With increasing design complexity and shrinking process geometries, the traditional synthesis approach no longer worked. Poor correlation between the timing estimated during synthesis and the timing achieved after place and route had become a major issue, and consequently so had quality of results (QoR). FPGA tool vendors therefore turned to physical synthesis. This approach has the advantage of using real timing and placement data from the place-and-route tool, which the synthesis tool then uses to optimize the logic. While QoR improved, multiple iterations were still required. Since each pass required a full place and route, iterating on a complex design could take 24 to 36 hours (Figure 1). These multiple iterations made quick timing closure nearly impossible. Additionally, because of its complexity, physical synthesis is realistically usable only by expert FPGA designers.
From an EDA vendor's perspective, getting the physical device information and implementing physical synthesis for all FPGA suppliers is a challenge. Vendor-independent tools are able to support only a limited number of devices and vendors. This narrow support limits the choices available to the end user, who needs to fit the design in the smallest, slowest and cheapest possible FPGA.
Multiple alternatives to pure physical synthesis have been tried, with limited success. The placement-centric flow uses a quick placement pass and then synthesizes the design based on that placement. Because this flow cannot reliably predict routing (or placement), much of the good work done at the synthesis stage is thrown away. Routing-centric, or physical synthesis, flows use a full placement and routing pass, but, as stated earlier, they suffer from costly routing cycles and limited device support. Floorplanning has traditionally been a non-starter, since it requires the designer to do a great deal of planning, and more often than not, design changes render the original floorplan useless.
Physically Aware Synthesis
For a synthesis tool to succeed, it needs to be aware of the physical characteristics of the design and to minimize (or eliminate) design iterations. It also needs wide device support. Together, these capabilities should deliver better QoR, quicker timing closure and the freedom to select the cheapest (read: smallest and slowest) FPGA available to the designer.
Figure 2: Physically Aware Synthesis Design Flow
What is needed is physically aware synthesis driven by advanced timing analysis (Figure 2). This approach performs a quick delay-estimation pass that applies statistical timing analysis to the physical layout of the whole chip. Throughout this analysis, the tool maintains a high-level view of the physical chip, including placement and packing rules. Based on this delay estimation, physical synthesis techniques such as retiming, replication and re-synthesis are then applied to critical paths.
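The article does not say which statistical model the delay estimator uses. As a minimal sketch, assuming each stage delay on a path is an independent Gaussian random variable with a hypothetical mean and standard deviation (in ns), the path delay is itself Gaussian, with the summed means and summed variances. A k-sigma delay then gives a conservative slack estimate, and the distribution's CDF at the clock period gives the probability that the path meets timing. The function names and numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple

# Each stage is (mean_delay_ns, sigma_ns); values are hypothetical estimates.
Stage = Tuple[float, float]


def path_delay_distribution(stages: List[Stage]) -> NormalDist:
    """Sum independent Gaussian stage delays: means add, variances add."""
    mean = sum(m for m, _ in stages)
    variance = sum(s * s for _, s in stages)
    return NormalDist(mean, sqrt(variance))


def estimated_slack(stages: List[Stage], period_ns: float, k: float = 3.0) -> float:
    """Conservative slack: clock period minus the (mean + k*sigma) path delay."""
    d = path_delay_distribution(stages)
    return period_ns - (d.mean + k * d.stdev)


def timing_yield(stages: List[Stage], period_ns: float) -> float:
    """Probability that the path delay is within the clock period."""
    return path_delay_distribution(stages).cdf(period_ns)


if __name__ == "__main__":
    path = [(1.0, 0.1), (2.0, 0.2), (1.5, 0.2)]  # three hypothetical stages
    print(estimated_slack(path, period_ns=5.5))
    print(timing_yield(path, period_ns=5.5))
```

Real estimators must also handle correlated variation and wire delays derived from placement; the independence assumption here simply keeps the arithmetic visible.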
Figure 3: Before Retiming
Figure 4: After Retiming
Retiming across registers is performed when a register has negative slack on one side and positive slack on the other (Figure 3). Logic on the side that fails timing is moved across the register (Figure 4) to balance the timing of the path. This can be done across various design elements, such as DSP blocks, carry chains, multipliers and multiplexers.
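To make the slack condition concrete, here is a minimal sketch that models a path as a linear chain of gates with hypothetical delays, with registers as cut points between gates. A register is shifted one gate at a time whenever one adjacent stage has negative slack, the other has positive slack, and the move improves the worse of the two slacks. This deliberately ignores what a production retimer must handle, such as multi-input gates (which need a register on every input), initial-state equivalence and cycles in the netlist; all names are hypothetical.

```python
from typing import List


def stage_delays(gates: List[float], regs: List[int]) -> List[float]:
    """Combinational delay of each register-to-register stage.

    regs holds sorted cut positions: a register sits just before gates[r].
    """
    bounds = [0] + regs + [len(gates)]
    return [sum(gates[a:b]) for a, b in zip(bounds, bounds[1:])]


def slacks(gates: List[float], regs: List[int], period: float) -> List[float]:
    return [period - d for d in stage_delays(gates, regs)]


def retime(gates: List[float], regs: List[int], period: float,
           max_moves: int = 100) -> List[int]:
    """Greedily move registers toward the failing side until no move helps."""
    regs = list(regs)
    for _ in range(max_moves):
        s = slacks(gates, regs, period)
        moved = False
        for k in range(len(regs)):
            left, right = s[k], s[k + 1]  # stages before/after register k
            lo = regs[k - 1] if k > 0 else 0
            hi = regs[k + 1] if k + 1 < len(regs) else len(gates)
            if left < 0 < right and regs[k] - 1 > lo:
                # Last gate of the left stage moves into the right stage.
                g = gates[regs[k] - 1]
                if min(left + g, right - g) > min(left, right):
                    regs[k] -= 1
                    moved = True
                    break
            elif right < 0 < left and regs[k] + 1 < hi:
                # First gate of the right stage moves into the left stage.
                g = gates[regs[k]]
                if min(left - g, right + g) > min(left, right):
                    regs[k] += 1
                    moved = True
                    break
        if not moved:
            break
    return regs


if __name__ == "__main__":
    gates = [2, 3, 4, 1, 1]          # hypothetical gate delays (ns)
    new_regs = retime(gates, [3], period=6)
    print(new_regs, stage_delays(gates, new_regs))
```

With a 6 ns period, the original cut gives stages of 9 ns and 2 ns (slacks of -3 and +4); moving the register one gate earlier yields 5 ns and 6 ns, so both stages meet timing.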
Figure 5: Before Replication
Figure 6: After Replication
Replication is another useful technique for improving timing. With advanced delay estimation, very long critical paths can be identified accurately. Suitable logic on these paths is identified and replicated (Figure 5 versus Figure 6), reducing fan-out on critical nets. By reducing fan-out and replicating logic on critical paths, the synthesis tool gives the placer more start and end points to work with during final place and route. This flexibility helps the place-and-route tool achieve timing closure much faster, while the re-synthesis techniques applied help avoid an increase in area utilization. All of this is achieved through simple, push-button synthesis: the end user only has to provide reasonable (or required) timing constraints for the tool to deliver its best results.
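Replication can be sketched in the same spirit. Assuming a hypothetical maximum fan-out limit and estimated (x, y) positions for each sink, the function below splits a net into just enough copies of the driver to respect the limit, balances the sinks across the copies, and gives each copy a spatially contiguous group of sinks so the placer can put each copy near its loads. Sorting on x and then y is a deliberately crude stand-in for real clustering; the names and the `_repN` suffix are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sink:
    name: str
    x: float  # estimated placement coordinates (hypothetical)
    y: float


def replicate_driver(driver: str, sinks: List[Sink],
                     max_fanout: int) -> Dict[str, List[Sink]]:
    """Split a high-fan-out net across copies of `driver`.

    Each copy drives at most max_fanout sinks, and sinks are grouped by
    position so every copy serves a nearby cluster of loads.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be >= 1")
    n_copies = math.ceil(len(sinks) / max_fanout)
    if n_copies <= 1:
        return {driver: list(sinks)}  # fan-out already within the limit

    # Order sinks along x, then y, so consecutive slices are spatially close.
    ordered = sorted(sinks, key=lambda s: (s.x, s.y))
    groups: Dict[str, List[Sink]] = {}
    for i in range(n_copies):
        # Near-equal slices; each has at most ceil(len/n_copies) <= max_fanout.
        start = i * len(ordered) // n_copies
        end = (i + 1) * len(ordered) // n_copies
        name = driver if i == 0 else f"{driver}_rep{i}"
        groups[name] = ordered[start:end]
    return groups


if __name__ == "__main__":
    loads = [Sink(f"ff{i}", float(i), 0.0) for i in range(10)]
    for copy, group in replicate_driver("en", loads, max_fanout=4).items():
        print(copy, [s.name for s in group])
```

Balancing the slices (3, 3 and 4 sinks here rather than 4, 4 and 2) keeps the load on each copy similar, which tends to even out the delay on each replicated net.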
Because it does not require in-depth knowledge of the FPGA's physical layout, this approach can be applied successfully to devices from all the leading FPGA vendors. Design iterations can also be significantly reduced (or eliminated), resulting in substantial time savings.
The innovative physically aware synthesis solution presented here can provide better QoR and faster timing closure with minimal (or no) impact on area utilization. Furthermore, because it does not require detailed knowledge of each FPGA architecture, this approach also helps the EDA vendor quickly support the widest variety of devices. The end user can then select the cheapest or best-suited device for the implementation, and faster timing closure helps the user get to market sooner. Together, these benefits produce large savings in both cost and time.