The Perfect Recipe
By Chris Rowen
I’ve been working on logic synthesis and layout for almost 30 years, and the technology never ceases to amaze me. The core problem is a hard one: How do you take a high-level logic description, written in human-comprehensible terms, and transform it into a near-optimal network of gates, realized in any desired semiconductor process?
Logic synthesis has evolved enormously over the years, applying ever more sophisticated transformations in the combination and sizing of gates in order to find common sub-expressions, map to target logic cells, reduce path delay and lower power dissipation. Layout also has changed radically, as complex standard-cell libraries, optimized for specific CMOS processes, have become the most common building blocks for complex logic functions. And all this operates under the unyielding requirements of strict functional compatibility with the original description, typically Verilog or some high-level language.
The logic transformations of logic synthesis act to chop up the logic into a uniform logic “puree” in which the original gates of the design are no longer identifiable. It’s like putting a tomato in a food processor. As clusters of gates are replaced by logical equivalents, most of the original signal names are lost—only explicit register state elements, usually flip-flops, have a chance of retaining any one-to-one connection to the original Verilog. On the other hand, this logic puree becomes a highly versatile ingredient in the SoC design kitchen. Multiple functional blocks can be stirred together and optimized to create better compound functions.
The radical de-structuring of the logic in synthesis creates great challenges in placement and routing of the cells. Even if the logic was originally expressed in a regular, structured form, as many datapath functions are, that regularity is destroyed in the “puree” process. The job of placement is to discover or rediscover the optimal (x,y) topology for that logic, meeting the often-conflicting goals of area, speed, power, interface organization and block aspect ratio required in the ultimate full-chip design. Then the router must reconnect all those cells within the space available (the routing channels over and between the logic cells) to make the circuit work again. It’s like trying to reconstruct whole tomatoes from tomato paste. The results are imperfect, but still remarkably tasty.
As designs get bigger and bigger, the challenge gets worse. A leading-edge digital signal processor core may contain hundreds of thousands of cells, implementing more than one million basic logic gates. Designers must choose the best recipe. One recipe calls for decomposing the processor core into a dozen or more sub-units, which are pushed individually through synthesis, then placed and routed together. This method retains a degree of structure, but forgoes the benefits of optimizing the logic across the sub-unit boundaries. An alternative recipe calls for pureeing the whole core together, then relying on placement and routing to reconstruct the natural organization of the processor. This second recipe demands more tool runtime, but generally seems to lead to the best results.
Recently my team has been applying this second method to the latest version of our ConnX BaseBandEngine 64 DSP core. We faced a dilemma with the recipe. Even the most advanced synthesis, placement and routing tools produced a design with one small area of very high routing congestion. The density of wires reached a critical threshold where the required connections could be completed, but not with the expected timing. Some wires had to take “scenic routes” around the congested area. But what was causing the congestion? We couldn’t just look at the gates in that small region, because all of their names had been lost in synthesis. We tried coloring the layout plots by major function unit, based on the retained names of their flip-flops, but the area of congestion remained a dark lump. Finally, we devised a way to “taste” the lump, tracing thousands of signals back to their associated flip-flops to learn why all those wires converged in this one place. We quickly identified an obscure and relatively small function unit that was defined with an excessive number of global connections. After a small tweak to this little function unit, the worst-case routing congestion looks significantly better.
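The column doesn’t say how the lump was “tasted.” One plausible approach, sketched here purely as an illustration, is to walk backward through the fan-in cone of each net in the congested region, passing through anonymous combinational cells and stopping at flip-flops whose names survived synthesis, then tally which function units those flip-flops belong to. The dictionary-based netlist, the `unit/register` naming convention, and the helper names `source_flops` and `unit_histogram` are all hypothetical assumptions, not any real tool’s format or API.

```python
from collections import Counter, deque

# Hypothetical netlist model (an assumption for illustration only):
#   drivers[net] -> name of the cell driving that net (absent for primary inputs)
#   inputs[cell] -> list of nets feeding that cell
#   is_flop(cell) -> True for register cells, whose names survive synthesis


def source_flops(net, drivers, inputs, is_flop):
    """Walk backward from a net through combinational cells to the flip-flops feeding it."""
    found, seen = set(), set()
    queue = deque([net])
    while queue:
        n = queue.popleft()
        if n in seen:              # guards against revisiting shared or looping logic
            continue
        seen.add(n)
        cell = drivers.get(n)
        if cell is None:           # primary input: nothing further to trace
            continue
        if is_flop(cell):
            found.add(cell)        # stop at the register boundary
        else:
            queue.extend(inputs.get(cell, []))
    return found


def unit_histogram(region_nets, drivers, inputs, is_flop, unit_of):
    """Count, per function unit, how many congested-region nets trace back to it."""
    counts = Counter()
    for net in region_nets:
        units = {unit_of(f) for f in source_flops(net, drivers, inputs, is_flop)}
        counts.update(units)       # each net counts at most once per unit
    return counts


# Toy example: combinational cells U1..U3 have lost their original names,
# while flip-flops keep a hypothetical "unit/register" name.
drivers = {"n1": "U1", "n2": "U2", "n3": "U3",
           "q_a": "alu/r0", "q_b": "alu/r1", "q_c": "perm/r0"}
inputs = {"U1": ["q_a", "q_c"], "U2": ["n1", "q_b"], "U3": ["q_c"]}
flops = {"alu/r0", "alu/r1", "perm/r0"}

counts = unit_histogram(["n1", "n2", "n3"], drivers, inputs,
                        is_flop=flops.__contains__,
                        unit_of=lambda f: f.split("/")[0])
print(counts)
```

In this toy netlist the small “perm” unit touches every congested net, which is the kind of signal that would point at a unit with too many global connections. A real flow would pull the region’s nets and cell connectivity from the placement database rather than from hand-built dictionaries.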
I love to cook, and processor-based SoC design sometimes poses similar puzzles. A standard recipe is a great starting point, but you also need to pay close attention to the taste and texture of what you’re creating. When you’re in the kitchen doing something new, you can invent new twists on the recipe to make things even more delectable.
—Chris Rowen is the chief technology officer at Tensilica.
