Gopi Kudva, Cadence
At smaller process nodes, chip designers are struggling to meet their aggressive schedules and power, performance, and area (PPA) demands in the ever-so-competitive system-on-chip (SoC) market. One of the most pressing problems designers are facing these days is not knowing how the netlist they produce in synthesis will work out in the place-and-route (P&R) process.
Not only does this lack of predictability impact the design itself, but it also needlessly hurts your quality of life. After all, isn’t it better to know that what you created is good, so you can go home at a reasonable time each evening without worrying that an unknown problem will surface the next day?
At 28nm and below, SoCs are much more complex, making it more challenging than ever to meet PPA targets. Wires dominate the timing at these advanced nodes, so there’s a greater chance of encountering issues such as routing congestion and timing delays. You must cram more transistors onto the die while also reducing dynamic and leakage power.
So, why are you still doing traditional synthesis?
Physically aware synthesis – the ability to bring in physical considerations much earlier in the logic synthesis process – is something that can dramatically improve the design process and significantly shorten the time spent fixing problems. Let’s discuss some key physically aware synthesis techniques that can help you speed up the physical design closure process for your next high-performance, power-sensitive SoC.
Physically Aware Synthesis
Today’s physically aware synthesis technologies bring physical interconnect modeling earlier into the synthesis process to help you create a better netlist structure, one that’s more suitable for today’s P&R tools.
You can start with no floorplan and allow synthesis to come up with one, or you can give it a very basic floorplan. But the better your floorplan, the better you can take advantage of global synthesis optimization with more detailed physical interconnect. Essentially, you are getting rid of the old logical-physical barrier. You’ll no longer need to wait, fingers crossed, for your “backend” engineer to say “yea” or “nay.”
There are four physically aware synthesis innovations that we will discuss here. They are:
- Physical layout estimation (PLE)
- Physically aware mapping (PAM)
- Physically aware structuring (PAS)
- Physically aware multi-bit cell inferencing (PA-MBCI)
Of course, before you can come up with a good floorplan, you need a good initial netlist. You can still bring physical information into creating that initial netlist by using physical layout estimation (PLE). For this, you need only some basic physical information, such as LEF and cap tables/QRC tech files. The floorplan DEF is optional here.
PLE is a physical modeling technique for capturing timing closure P&R tool behavior for RTL synthesis optimization. It allows you to create a good initial netlist for floorplanning. And the result? Better timing-power-area balance. PLE:
- Uses actual design and physical library info
- Dynamically adapts to changing logic structures in the design
- Has the same runtime as synthesizing with wireload models
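To make the contrast with wireload models concrete, here is a minimal sketch. It is not how PLE is implemented; it only compares a fanout-indexed lookup (the wireload-model style) with parasitics scaled from per-unit-length values of the kind a cap table might supply. Every number, table, and function name below is hypothetical, and the wire length is simply assumed to be known.

```python
# Illustrative only: all values and names are hypothetical, not taken
# from any real library, cap table, or tool.

# Wireload-model style: net capacitance looked up by fanout alone.
WIRELOAD_CAP_FF = {1: 2.0, 2: 4.5, 3: 7.0, 4: 10.0}  # fanout -> fF

# Per-unit-length values of the kind a cap table might provide (assumed).
CAP_PER_UM_FF = 0.2    # fF per micron
RES_PER_UM_OHM = 1.5   # ohms per micron


def wireload_cap(fanout):
    """Capacitance from a fanout-indexed table; blind to physical distance."""
    max_fanout = max(WIRELOAD_CAP_FF)
    return WIRELOAD_CAP_FF[min(fanout, max_fanout)]


def estimated_parasitics(length_um):
    """Return (resistance in ohms, capacitance in fF) scaled from a length."""
    return RES_PER_UM_OHM * length_um, CAP_PER_UM_FF * length_um


if __name__ == "__main__":
    # Two nets with the same fanout but very different lengths.
    for length in (10.0, 500.0):
        r, c = estimated_parasitics(length)
        print(f"fanout=2 length={length:6.1f}um "
              f"wireload C={wireload_cap(2):.1f}fF "
              f"estimated C={c:.1f}fF R={r:.1f}ohm")
```

The fanout lookup returns the same capacitance for both nets, while the length-based estimate separates the short net from the long one.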
Once you have a good initial netlist, you can create a good initial floorplan. Previously, this floorplan was used for P&R stages, and not in synthesis. But now, you can use this floorplan to allow the synthesis engine to “see” long wires before actually building the logic gates for the improved, physically aware netlist.
The steps in the latest physically aware RTL synthesis flow are shown in Figure 1. The three main steps in this synthesis flow are:
- Generic gate placement
- Global physical RTL optimization
- Global physical mapping
Figure 1. Physically aware RTL synthesis flow
Physically aware mapping (PAM)
PAM is all about improving timing through better timing correlation between synthesis and P&R. In this flow, the synthesis engine:
- Initially places the optimized generic gates and macros
- Optimizes the placed generic gates and macros (RTL level optimization, including datapath optimization)
- Estimates routes and congestion for the placed generic gates and macros, taking into account physical constraints such as placement and routing blockages
- Performs parasitic (Resistance and Capacitance) extraction using a unique extraction method on the estimated routes
Figure 2. Physically aware mapping accounts for long wire delays in RTL synthesis
After generic gate placement, every wire in the design has a physical delay. The synthesis engine can now accurately “see” which paths are critical. Global synthesis now does timing-driven cell mapping based on physical wire delays, translating the generic gates into standard gates based on the provided technology library and creating an optimized netlist.
By considering real wire delays, PAM has demonstrated the ability to deliver up to 15% improved timing. After all, if you know in advance that a certain wire will be long and you know where that extra delay is because of the long wire, you can structure the netlist more accurately to account for these delays. With this knowledge, synthesis is also in a better position to “squeeze” critical paths and “relax” non-critical paths based on wire delays.
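As a rough illustration of why placed gates let synthesis “see” long wires, the sketch below estimates a net's length from pin locations using half-perimeter wirelength and converts it to a delay with the Elmore formula for a uniform RC wire. This is a textbook approximation chosen for illustration, not the extraction method the tool uses; the per-unit resistance and capacitance and the pin coordinates are assumed values.

```python
# Illustrative sketch; coordinates and per-unit values are hypothetical.

RES_PER_UM_OHM = 1.5   # wire resistance, ohms per micron (assumed)
CAP_PER_UM_FF = 0.2    # wire capacitance, fF per micron (assumed)


def hpwl_um(pins):
    """Half-perimeter wirelength of the bounding box around (x, y) pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def elmore_delay_ps(length_um, load_cap_ff):
    """Elmore delay of a uniform RC wire driving a lumped load.

    delay = R_wire * (C_wire / 2 + C_load); one ohm * fF equals 1e-3 ps.
    Driver resistance is ignored to keep the sketch short.
    """
    r_wire = RES_PER_UM_OHM * length_um
    c_wire = CAP_PER_UM_FF * length_um
    return r_wire * (c_wire / 2.0 + load_cap_ff) * 1e-3


if __name__ == "__main__":
    nets = {
        "short_net": [(0, 0), (10, 5)],
        "long_net": [(0, 0), (800, 400), (300, 100)],
    }
    for name, pins in nets.items():
        length = hpwl_um(pins)
        delay = elmore_delay_ps(length, load_cap_ff=2.0)
        print(f"{name}: length={length}um delay={delay:.3f}ps")
```

Because wire resistance and capacitance both grow with length, the wire delay grows roughly with the square of the length, which is why a long net can turn an apparently non-critical path into a critical one.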
Physically aware structuring (PAS)
What PAS does:
- Provides optimized binary/one-hot multiplexer (mux) selection
- Targets high-congestion structures, such as cross bars, barrel shifters, and memory-connected mux chains
- Decomposes a large mux into a set of smaller muxes, each of which can potentially share the decode logic. Decoding logic, in turn, is intelligently partitioned using physical input pin knowledge.
- Generates congestion-aware decode islands via smarter select line sharing and duplication
The result of PAS: better placement that decreases routing congestion.
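The mux decomposition idea can be sketched at the logic level as follows. This is a behavioral illustration only: it shows a wide mux split into smaller first-stage muxes that share one decoded select, and it does not model the physical partitioning by pin location that PAS performs. The function names and the group size are assumptions made for the example.

```python
def mux_flat(inputs, sel):
    """Reference behavior: one wide binary-select mux."""
    return inputs[sel]


def one_hot(index, width):
    """Decode a binary index into a one-hot list of 0/1 flags."""
    return [1 if i == index else 0 for i in range(width)]


def mux_one_hot(inputs, hot):
    """AND-OR mux over non-negative integers; one flag in `hot` is set."""
    out = 0
    for value, flag in zip(inputs, hot):
        out |= value if flag else 0
    return out


def mux_decomposed(inputs, sel, group=4):
    """Two-level mux: small first-stage muxes share one low-bit decode."""
    assert len(inputs) % group == 0
    low, high = sel % group, sel // group
    low_hot = one_hot(low, group)  # decoded once, shared by every group
    stage1 = [mux_one_hot(inputs[g:g + group], low_hot)
              for g in range(0, len(inputs), group)]
    high_hot = one_hot(high, len(stage1))
    return mux_one_hot(stage1, high_hot)


if __name__ == "__main__":
    data = [3 * i + 1 for i in range(16)]
    # The decomposed structure must match the flat mux for every select.
    print(all(mux_decomposed(data, s) == mux_flat(data, s)
              for s in range(16)))
```

Each first-stage mux touches only its own group of inputs, so it can sit near those inputs, and only the shared select lines and one output per group need to travel further.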
Figure 3. Physically aware structuring RTL synthesis flow
To illustrate the benefits provided by PAS and PAM, we considered a Flash memory design whose floorplan had a small channel of digital logic surrounded by Flash memory. This design suffered from congestion and timing issues due to a poor logical synthesis wire model. Timing closure was impossible. Once the engineering team used Cadence® Encounter® RTL Compiler Advanced Physical Option, which features the physically aware synthesis capabilities we have been discussing, total negative slack (TNS) improved from ~12,400ns to ~750ns. The technology helped improve timing correlation by identifying long paths during physical synthesis, and it also helped identify and alleviate congestion. In the end, the engineering team was pleased to see significantly reduced design turnaround time and fewer synthesis-to-place-and-route iterations.
As another example, consider a networking SoC with a one-million-instance block and a large number of muxes. With traditional synthesis, the engineers working on this design initially saw significant horizontal and vertical congestion, so the design was not routable. Using the physically aware capabilities of Encounter RTL Compiler Advanced Physical Option, the engineering team met their timing and area goals with a routable design with little congestion.
Typically, just to get the design to route, engineers have to “pad” the layout heavily to compensate for the poor structure of the netlist. The extra spacing that the padding creates also lengthens the wires, which leads to extra buffering and increased power. With physically aware synthesis, you can remove the extra padding and margins, thereby significantly reducing area, shrinking the die, and lowering wire length and power.
Physically aware multi-bit cell inferencing (PA-MBCI)
Multi-bit cell inferencing (MBCI) merges single-bit flops into multi-bit flops. Using a physically aware MBCI (PA-MBCI) synthesis strategy can help reduce total chip power, with dynamic power savings of 10% or better in many cases.
In synthesis, you can merge single-bit flops into multi-bit flops using either a logical or a physical method. With the logical method, synthesis considers only the netlist and converts as many flops as possible into multi-bit flops without considering flop locations and proximity. The disadvantage of this method is that you could end up merging flops at two opposite ends of the floorplan, creating a placement problem and unnecessarily long wires, which in turn can create timing and routing problems.
Encounter RTL Compiler Advanced Physical Option features physically aware multi-bit merging. Physically aware multi-bit merging merges the sequential cells while considering the compatibility and physical neighborhood from the natural placement. This is a “correct by construction” process which makes sure flops are merged after placement, only when there is benefit in a specific cost factor (typically timing, area, leakage, and dynamic power), while not degrading other cost factors.
The result: the PA-MBCI process avoids timing degradation, reduces wirelength, minimizes congestion, and reduces power.
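The difference between the logical and physical merging methods can be sketched with a toy example. The code below is a simplified illustration under stated assumptions: it only forms 2-bit pairs, treats a shared clock as the sole compatibility check, and uses a distance limit as the only cost. The actual flow also weighs timing, area, leakage, and dynamic power, none of which is modeled here. The `Flop` class, function names, and coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Flop:
    name: str
    x: float
    y: float
    clock: str  # the only compatibility check in this sketch


def distance(a, b):
    return hypot(a.x - b.x, a.y - b.y)


def merge_logical(flops):
    """Pair compatible flops in netlist order, ignoring placement."""
    remaining, pairs = list(flops), []
    while remaining:
        a = remaining.pop(0)
        candidates = [b for b in remaining if b.clock == a.clock]
        if candidates:
            b = candidates[0]
            remaining.remove(b)
            pairs.append((a.name, b.name))
    return pairs


def merge_physical(flops, max_dist_um):
    """Pair each flop with its nearest compatible neighbor within a limit."""
    remaining, pairs = list(flops), []
    while remaining:
        a = remaining.pop(0)
        candidates = [b for b in remaining
                      if b.clock == a.clock and distance(a, b) <= max_dist_um]
        if candidates:
            b = min(candidates, key=lambda f: distance(a, f))
            remaining.remove(b)
            pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    flops = [
        Flop("f0", 0.0, 0.0, "clk"),
        Flop("f1", 1000.0, 1000.0, "clk"),
        Flop("f2", 2.0, 0.0, "clk"),
        Flop("f3", 1001.0, 1000.0, "clk"),
    ]
    print("logical: ", merge_logical(flops))
    print("physical:", merge_physical(flops, max_dist_um=10.0))
```

In this example the logical method pairs flops from opposite corners of the floorplan, while the physical method pairs only neighbors and would leave a flop unmerged if nothing compatible were nearby.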
Multi-Bit Flops – Advantages and Best Practices
As an example of the benefits of using an MBCI flow, let’s take a look at the impact of this flow on development of a design based on an advanced-node embedded processor. Compared to using traditional synthesis techniques, applying physically aware synthesis to this processor yielded the following improvements:
- 15% clock tree area
- 60% TNS (improved hold timing)
- 6.4% dynamic power
- 4% leakage power
- 4.7% routing
Tips and Tricks
To get optimal results from physically aware synthesis, consider these techniques:
- For generating an initial netlist, use PLE
- Use this PLE netlist to create a starting floorplan
- With this floorplan, perform synthesis starting from RTL
- Enable PAM
- If your design has high-congestion structures, such as cross bars, barrel shifters, and memory-connected mux chains, enable PAS
- If you have multi-bit flops in your technology libraries, enable PA-MBCI
- Now you have a physically aware netlist: use Encounter RTL Compiler Advanced Physical Option to perform standard cell placement and optimization
- Note that in this article, we only discussed generic gate placement and not standard cell placement
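The checklist above can be summarized as a small decision sketch. This is only a restatement of the tips as code, with a hypothetical function name and plain-text step descriptions; it does not correspond to any tool command or script.

```python
def plan_flow(has_congested_structures, has_multibit_cells):
    """Return the ordered steps from the tips, given two design properties."""
    steps = [
        "Synthesize with PLE to generate an initial netlist",
        "Create a starting floorplan from the PLE netlist",
        "Synthesize from RTL using the floorplan",
        "Enable PAM",
    ]
    if has_congested_structures:
        # Cross bars, barrel shifters, memory-connected mux chains
        steps.append("Enable PAS")
    if has_multibit_cells:
        # Only useful when the technology libraries contain multi-bit flops
        steps.append("Enable PA-MBCI")
    steps.append("Perform standard cell placement and optimization")
    return steps


if __name__ == "__main__":
    for step in plan_flow(has_congested_structures=True,
                          has_multibit_cells=False):
        print("-", step)
```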
Physically aware synthesis techniques can help accelerate the physical design closure process for high-performance, power-sensitive SoCs at 28nm and below.
Given the challenges of aggressive schedules, dominant wires, and the need for improved PPA in advanced-node SoCs, there’s a greater chance of routing congestion and of tapeout delays caused by PPA issues. By accounting for physical considerations much earlier in the logic synthesis process, physically aware synthesis can help accelerate physical design closure. Physically aware synthesis techniques such as PLE, PAS, PAM, and PA-MBCI, all available in Cadence’s Encounter RTL Compiler Advanced Physical Option, are contributing to better PPA and faster design convergence for advanced-node designs.