Published on July 29th, 2008
EDA vendors are all talking about their “multi-core plans” these days. We’ve hit the wall on processor clock speeds. CPUs are no longer getting faster, but we are getting more of them. And there is no doubt that we need as many as we can put to work because the computational load is exploding, runtimes are getting worse, and it’s taking longer to close designs. It’s critically important that we make efficient use of multi-core technology, but it’s no easy “slam-dunk.” EDA software must be transformed or rebuilt from the ground up to use many processor cores effectively. Otherwise the results can be very disappointing.
EDA vendors use a variety of terms (multi-threading, multi-core, multi-CPU) to describe the capability of their software, but these terms are ambiguous and cannot, by themselves, differentiate one product from another. It’s difficult to tell which products are really on the right technology roadmap when it comes to adopting multi-core technology. I’ll try to peel the onion a bit to highlight the challenges that multi-core poses for EDA and some potential solutions.
First, what’s driving the need for multi-core in EDA now more than ever before? The demand for compute cycles in EDA is exploding. At every stage of the design lifecycle, from system-level simulation, SPICE modeling, and physical layout and verification to mask resolution enhancement and testing, the computing load is growing due to both the increasing size of designs and the need for more robust and accurate modeling. Take IC implementation, or physical layout, as a case in point. We are facing an exponentially increasing number of gates, an exploding number of modes and corners due to process and manufacturing variability and multiple operating modes, more complex computations to ensure signal integrity (SI), multiple voltage domains, and many other challenges. For example, the latest SoC designs at 45nm have more than 100 million gates, and the number of mode/corner combinations over which the design must be closed can run to the tens or hundreds. This is driving exponentially increasing EDA computing workloads.
Since the growth in computing power is now coming from multiple cores instead of increasing clock speeds, EDA software must be highly optimized for multi-core platforms. We can only get faster turnaround if we take full advantage of the available cores by keeping all of them as busy as possible all the time. This requires breaking the overall problem into pieces that are independent, that is, whose results will not affect each other. Of course, this is the trick in any parallel computing challenge.
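As a minimal sketch of what "independent pieces" means in practice, the Python fragment below farms out one task per mode/corner scenario. The mode and corner names, the `analyze_scenario` function, and its placeholder "score" are all hypothetical stand-ins for illustration; a real tool would run an actual analysis in each task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical mode and corner names, for illustration only.
MODES = ["functional", "test", "sleep"]
CORNERS = ["slow", "typical", "fast"]


def analyze_scenario(scenario):
    """Stand-in for one independent analysis run (one mode/corner pair).

    It reads only its own inputs and returns a fresh result, so tasks
    never touch shared state and need no locks.
    """
    mode, corner = scenario
    # Placeholder "work"; a real tool would run timing analysis here.
    return {"mode": mode, "corner": corner, "score": len(mode) * len(corner)}


def run_scenarios(max_workers=4):
    """Run every mode/corner combination on a pool of workers."""
    scenarios = list(product(MODES, CORNERS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so the output is the same
        # no matter which worker happens to finish first.
        return list(pool.map(analyze_scenario, scenarios))


if __name__ == "__main__":
    for result in run_scenarios():
        print(result)
```

This is the coarse-grained end of the spectrum: it works only because the scenarios do not depend on each other. In CPython, threads will not speed up CPU-bound pure-Python work; `concurrent.futures.ProcessPoolExecutor` offers the same interface and uses separate processes instead.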
In place-and-route, there are many ways to partition the overall workflow, and a lot of the low-hanging fruit has already been picked in the form of coarse-grained parallelism. Unfortunately, straightforward enhancements to legacy code will only provide performance improvement (that is, scale) up to a few cores. To really take advantage of future compute platforms with tens or hundreds of cores, you need a sophisticated approach that combines coarse-, medium-, and fine-grained parallelism aimed at the most critical performance bottlenecks.
Timing analysis is the fundamental “cost optimization function” for all P&R decisions, and virtually every change in a layout will impact timing in complex ways. Consequently, parallelizing timing analysis provides one of the biggest potential improvements across the overall implementation flow. But timing analysis and optimization are difficult to parallelize because the associated data is highly interrelated across the chip. Think about the non-sequential nature of combinatorial loops, signal integrity, and time borrowing. In these problems, running computations in parallel can produce non-deterministic behavior if it is not done correctly.
The solution requires advanced analysis of the data dependencies in each specific IC design or layout, and a highly efficient way to organize processing to minimize the need for synchronization locks and data sharing that can steal away efficiency and limit scalability. To complement dataflow analysis technology, the basic architecture of the EDA tool must lend itself to a parallel approach. In addition, the tool must be able to employ different parallelization strategies using a mix of coarse-, medium-, and fine-grained parallelization based on the characteristics of the specific IC design and the relevant tasks in each step of the design flow.
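One well-known way to organize dependent work so that it needs few locks is levelization: group the nodes of a dependency graph into levels so that each node depends only on earlier levels. The sketch below applies it to a tiny, made-up timing graph; it is an illustration of the general idea, not a description of how any particular EDA tool works. The node names, delays, and the `levelize` and `arrival_times` helpers are assumptions introduced for this example, and the graph is assumed to be acyclic (loops such as the ones mentioned above would need separate handling).

```python
from collections import defaultdict

# Hypothetical timing graph: (source, destination, delay) edges.
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "c", 2), ("b", "c", 1), ("a", "d", 3), ("c", "e", 4), ("d", "e", 1)]


def levelize(edges, nodes):
    """Group the nodes of an acyclic graph into dependency levels.

    A node lands in the level right after the last of its fan-in nodes,
    so nodes within one level never depend on each other.
    """
    fanout = defaultdict(list)
    pending = {n: 0 for n in nodes}  # unresolved fan-in count per node
    for src, dst, _delay in edges:
        fanout[src].append(dst)
        pending[dst] += 1
    level = [n for n in nodes if pending[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for n in level:
            for m in fanout[n]:
                pending[m] -= 1
                if pending[m] == 0:
                    next_level.append(m)
        level = next_level
    return levels


def arrival_times(edges, nodes):
    """Compute worst-case arrival time at each node, one level at a time."""
    fanin = defaultdict(list)
    for src, dst, delay in edges:
        fanin[dst].append((src, delay))
    arrival = {}
    for level in levelize(edges, nodes):
        # Each node here reads only results from earlier levels, so the
        # per-node computations could run on parallel workers without
        # locks; one synchronization point per level is enough.
        computed = {
            n: max((arrival[p] + d for p, d in fanin[n]), default=0.0)
            for n in level
        }
        arrival.update(computed)
    return arrival
```

The design choice is to trade many fine-grained locks for one barrier per level. How well that scales depends on how wide the levels are in a given design, which is one reason the right strategy varies from design to design.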
These requirements create a huge challenge for legacy EDA products. Existing code, written for serial execution, is very difficult to modify for parallelization. As a short-term fix, multi-core support is often provided as a “wrapper” around an existing code kernel. This parallelizes some aspects of the workflow, but not necessarily the most challenging and compute-intensive parts. For example, wrappers can’t parallelize the most critical timing analysis tasks such as extraction, delay calculation, and signal integrity analysis. The result is a modest scaling improvement that extends to only a few cores.
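Amdahl's law, a standard result in parallel computing, gives a rough feel for why leaving the heavy kernels serial caps the benefit. The sketch below assumes, purely for illustration, that a wrapper manages to parallelize 60% of the total runtime; that figure is an assumption, not a measurement of any product.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the runtime scales with core count.

    parallel_fraction: share of the single-core runtime that parallelizes
                       perfectly (0.0 to 1.0); the rest stays serial.
    cores:             number of cores used (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed 60% parallel fraction, for illustration only.
    for cores in (1, 2, 4, 8, 64):
        print(cores, round(amdahl_speedup(0.6, cores), 2))
```

Under that assumption the speedup can never reach 2.5x, however many cores are added, because the serial 40% dominates. Raising the ceiling means parallelizing the kernels themselves.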
Newer software architectures are ahead in the multi-core game in this respect. These can employ a kernel suited to fine-grained parallelization with minimal synchronization overhead. With the right software architecture, even the most critical tasks like timing optimization can scale almost linearly over hundreds of cores.
So when you’re evaluating the potential of multi-core technology to accelerate your IC design flow, don’t be placated by generic software terms like “multi-threading,” “multi-processing,” or “multi-core-enabled.” Look under the cover at the EDA tool architecture and demand performance data that demonstrates real scalability across multiple cores for much faster design closure.
Sudhakar Jilla is the marketing director for Place & Route Products at Mentor Graphics.