Posts Tagged ‘Cadence’


Pushing the Performance Boundaries of ARM Cortex-M Processors for Future Embedded Design

Friday, October 31st, 2014

By Ravi Andrew and Madhuparna Datta, Cadence Design Systems

One of the toughest challenges in the implementation of any processor is balancing the need for the highest performance with the conflicting demands for the lowest possible power and area. Inevitably, there is a tradeoff between power, performance, and area (PPA). This paper examines two unique challenges for design automation methodologies in the new ARM® Cortex®-M processor: how to get maximum performance while designing for a set power budget, and how to get maximum power savings while optimizing for a set target frequency.

Introduction

The ARM® Cortex®-M7 processor is the latest embedded processor by ARM, specifically developed to address digital signal control markets that demand an efficient, easy-to-use blend of control and signal processing capabilities. The ARM Cortex-M7 processor has been designed with a large variety of highly efficient signal processing features, which demand a very power-efficient design.

Figure 1: ARM Cortex-M7 Block Diagram

The energy-efficient, easy-to-use microprocessors in the ARM Cortex-M series have received a large amount of attention recently as portable, wireless, and embedded applications have gained market share. In high-performance designs, power has become an issue, since at high clock frequencies power dissipation can easily reach several tens of watts. The efficient handling of these power levels requires complex heat dissipation techniques at the system level, ultimately resulting in higher costs and potential reliability issues. In this section, we will isolate the different components of power consumption on a chip to demonstrate why power has become a significant issue. The remaining sections will discuss how we approached this problem and resolved it using Cadence® implementation tools, along with other design techniques.

We began the project with the objective of addressing two simultaneous challenges:

1. Reach, as fast as possible, a performance level with optimal power (AFAP)

2. Reduce power to the minimum for a lower frequency scenario (MinPower)

Before getting into the details of how we achieved the desired frequency and power numbers, let’s first examine the components that contribute to dynamic power and the factors that gate the frequency push. This experiment was conducted on the ARM Cortex-M7 processor, which has achieved 5 CoreMark/MHz – 2000 CoreMark* in 40LP – and typically 2X the digital signal processing (DSP) performance of the ARM Cortex-M4 processor.

Dynamic power components

In high-performance microprocessors, several key factors are driving the rise in power dissipation. First, the large number of devices and wires integrated on a big chip results in an overall increase in the total capacitance of the design. Second, the drive for higher performance leads to increasing clock frequencies, and dynamic power is directly proportional to the rate at which capacitances are charged (in other words, the clock frequency). A third factor is how efficiently the available gates are used. The total switching device capacitance consists of gate oxide capacitance, overlap capacitance, and junction capacitance. In addition, we consider the impact of internal nodes of a complex logic gate. For example, the junction capacitance of the series-connected NMOS transistors in a NAND gate contributes to the total switching capacitance, although it does not appear at the output node.

Dynamic power is consumed when a gate switches. Interest has risen in the physical design community in making better use of the available gates by increasing the fraction of clock cycles in which a gate actually switches; this increased device activity also leads to rising power consumption. Dynamic power is the largest component of total chip power consumption (the other components are short-circuit power and leakage power). It results from charging the capacitive loads at the outputs of gates. These capacitive loads take the form of wiring capacitance, junction capacitance, and the input (gate) capacitance of the fan-out gates. Since leakage is less than 2% of total power in this design, the focus of this collaboration was on dynamic power only.

The expression for dynamic power is:

Pdyn = α × C × Vdd² × f        (1)

In (1), C denotes the capacitance being charged/discharged, Vdd is the supply voltage, f is the frequency of operation, and α is the switching activity factor. This expression assumes that the output load experiences a full voltage swing of Vdd. If this is not the case, and there are circuits that take advantage of this fact, (1) becomes proportional to (Vdd * Vswing). A brief discussion of the switching factor α is in order at this point. The switching factor is defined in this model as the probability of a gate experiencing an output low-to-high transition in an arbitrary clock cycle. For instance, a clock buffer sees both a low-to-high and a high-to-low transition in each clock cycle. Therefore, α for a clock signal is 1, as there is unity probability that the buffer will have an energy-consuming transition in a given cycle. Fortunately, most circuits have activity factors much smaller than 1. Some typical values for logic might be about 0.5 for data path logic and 0.03 to 0.05 for control logic. In most instances we will use a default value of 0.15 for α, which is in keeping with values reported in the literature for static CMOS designs [1,2,3]. Notable exceptions to this assumption will be in cache memories, where read/write operations take place nearly every cycle, and in clock-related circuits.
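To make equation (1) concrete, here is a minimal back-of-the-envelope sketch in Python. The supply voltage and per-group capacitance values are hypothetical placeholders; only the activity factors follow the values discussed above (0.5 for data path logic, roughly 0.04 for control logic, 1 for the clock, and 0.15 as a default).

```python
# Rough dynamic power estimate per equation (1): P = alpha * C * Vdd^2 * f.
# Capacitance and voltage figures are invented for illustration; the activity
# factors mirror the typical values quoted in the text.

def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Dynamic power in watts, assuming a full rail-to-rail output swing."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz

VDD = 1.1      # volts (hypothetical)
FREQ = 400e6   # 400MHz, the AFAP target frequency

# (activity factor, total switched capacitance in farads) per logic group
groups = {
    "datapath logic": (0.5, 120e-12),
    "control logic":  (0.04, 80e-12),
    "clock network":  (1.0, 60e-12),
    "other logic":    (0.15, 150e-12),
}

total = 0.0
for name, (alpha, cap) in groups.items():
    p = dynamic_power(alpha, cap, VDD, FREQ)
    total += p
    print(f"{name:15s}: {p * 1e3:6.2f} mW")
print(f"{'total':15s}: {total * 1e3:6.2f} mW")
```

Even with made-up capacitances, the sketch shows why the clock network and the high-activity data path dominate, and why reducing capacitance (wirelength) and activity are the main levers for dynamic power.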

Here are five key components of dynamic power consumption and how we addressed a few of these components:

• Standard cell logic and local wiring

• Global interconnect (mainly busses, inter-modular routing, and other control)

• Global clock distribution (drivers + interconnect + sequential elements)

• Memory (on-chip caches) — this is constant in our case

• I/Os (drivers + off-chip capacitive loads) — this is constant in our case

Timing closure components

One fundamental issue of timing closure is the modeling of physical overcrowding. The problem involves, among other factors, the representation and handling of layout issues. These issues include placement congestion, overlapping of arbitrarily shaped components, routing congestion due to power/ground, clock distribution, signal interconnect, prefixed wires over components, and forbidden regions imposed by engineering concerns. While a clean and universal mathematical model of physical constraints remains an open problem, we tend to formulate the layout problem using multiple constraints with sophisticated details that complicate the implementation. We need to consider multiple constraints with a unified objective function for a timing-closure design process. This is essential because many constraints are mutually conflicting if we view and handle their effects only on the surface. For example, to ease the routing congestion of a local area, we tend to move components out of the area to leave more room for routing. However, with multi-layer routing technology, removing components does not save much routing area, since most routing runs on metal layers above the cells. The spreading of components actually increases the wire length and demands more routing space. The resulting effect can have a negative impact on the goals of the original design; in fact, the timing can become much worse. Consequently, we need an intelligent operation that identifies both the component to move out and the component to move in to improve the design.

Accurately predicting the detail-routed signal-integrity (SI) effects, and their impact on timing, before detail routing happens is of key interest. This is because even a modest misprediction of timing before the detail route can create timing jumps after the routing is done. Historically, designs for which it is tough to close timing have relied solely on post-route optimization to salvage setup/hold timing. With the advent of “in-route optimization”, timing closure is now addressed earlier, during the routing step itself, using track assignment. In addition, if we can reduce the wire lengths and make good judgment calls based on the timing profiles, we can find opportunities to further reduce power. This paper will walk through the Cadence digital implementation flow and the new tool options used to generate performance benefits for the design. The paper will also discuss the flow and tool changes that were made to get the best performance and power efficiency out of the ARM Cortex-M7 processor implementation.

Better Placement and Reduced Wirelength for Better Timing and Lower Power

As discussed in the introduction, wire capacitance and gate capacitance are among the key factors that impact dynamic power, while also affecting wire delays. While evaluating the floorplan and cell placement, it was noticed that the floorplan size was bigger than needed and the cell placement density was uniform. These two aspects could lead to spreading out of cells, resulting in longer wirelength and higher clock latencies. In order to improve the placement densities, certain portions of the design were soft-blocked, and the standard cell densities were kept above 75%.

Figure 2: Soft-Blocked Floorplan
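As a rough illustration of why soft-blocking helps, the short sketch below computes placement density as standard cell area divided by the placeable core area; all area numbers are invented and are not taken from the actual ARM Cortex-M7 floorplan.

```python
# Illustrative placement-density arithmetic: density = standard cell area /
# placeable core area. Soft-blocking part of the floorplan shrinks the
# placeable area, raising density without changing the cell area.
# All numbers below are hypothetical.

def placement_density(cell_area_um2, core_area_um2, blocked_area_um2=0.0):
    """Fraction of the usable core area occupied by standard cells."""
    usable = core_area_um2 - blocked_area_um2
    return cell_area_um2 / usable

CELL_AREA = 380_000.0       # um^2 of standard cells (hypothetical)
ORIGINAL_CORE = 500_000.0   # um^2 of original floorplan (hypothetical)
SOFT_BLOCKED = 53_000.0     # um^2 removed from the placeable area

print(f"density before soft-blocking: {placement_density(CELL_AREA, ORIGINAL_CORE):.0%}")
print(f"density after soft-blocking:  {placement_density(CELL_AREA, ORIGINAL_CORE, SOFT_BLOCKED):.0%}")
```

A denser placement pulls connected cells closer together, which is what ultimately shortens wires and reduces clock latency.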

Standard cell placement plays a vital role. If the placement is done right, it will eventually pay off in terms of better quality of results (QoR) and reduced wirelength. If the placement algorithm can take into account some of the power dissipation-related issues, like reducing the wirelength and considering the overall slack profile of the design, and also make the right moves during placement, QoR improves tremendously. This is the core principle behind the “GigaPlace” placement engine. The GigaPlace engine, available in Cadence Encounter® Digital Implementation System 14.1, places the cells in a timing-driven mode by building up the slack profile of the paths and performing placement adjustments based on these timing slacks. We introduced this new placement engine on the ARM Cortex-M7 design and saw good improvements in overall wirelength and total negative slack (TNS).

Figure 3: “GigaPlace” Placement Engine

With a reduced floorplan, the removal of uniform placement, and the new GigaPlace technology, we were able to reduce the wirelength significantly. This helped push the frequency as well as reduce the power. However, there were still opportunities available to further improve the frequency and dynamic power.

Figure 4: Wirelength Reduced with “GigaPlace and Soft-Blocked” Placement

Figure 5: Total Negative Slack (ns) Chart

In-Route Optimization: SI-Aware Optimization Before Routing to Achieve Final Frequency Target

“In-route optimization” for timing happens before detail routing begins. It works on a very close representation of the real routes, one that does not yet account for DRC fixes and leaf-cell pin access. This enables us to get an accurate view of timing/SI and make bigger changes without disrupting the routes. These changes are then committed to a full detail route. In-route optimization technology utilizes an internal extraction engine for more effective RC modeling. The timing QoR improvement observed after post-route optimization was significant, at the expense of a slight runtime increase (currently observed at only 2%). The successful use of an internal extraction model during in-route optimization helped reduce the timing divergence seen going from the pre-route to the post-route stage. This optimization technology pushed the design to achieve the targeted frequency.

Figure 6: In-Route Optimization Flow Chart

Design Changes and Further Dynamic Power Reduction

In the majority of present-day electronic design automation (EDA) tools, timing closure is the top priority and, hence, many of these tools trade off other metrics in favor of timing. However, opportunities exist to reduce area and gate capacitance by swapping cells to lower gate-capacitance cells and by reducing the wirelength. To address dynamic power reduction in the design, three major sets of experiments were done to examine the above aspects.

In the first set of experiments, two main tool features were used in the process of reducing dynamic power.  These were the introduction of the “dynamic power optimization engine” along with the “area reclaim” feature in the post-route stage.  These options helped save 5% of dynamic power @400MHz and enabled us to nearly halve the gap that earlier existed between the actual and desired power target.

Figure 7: Example of Power Optimization

In the second set of experiments, the floorplan was soft-blocked by 100 microns to reduce the wirelength. This was discussed in detail in an earlier section. This floorplan shrink resulted in:

• Increasing the density from ~76% to 85%

• Wirelength reduction by 5.1% – post route

• Area reduction (with the combination of experiment #1 and the shrink) of ~4% – post route

This helped save an additional 2% @400MHz, and the impact was similar across the frequency sweep.

The third set of experiments was related to design changes, where flop sizes were downsized to a minimum at pre-CTS optimization and the remaining flops of higher drive strengths were set to “don’t use”. This helped to further reduce the sequential power. An important point to note is that the combinational power did not increase significantly. After we introduced the above technique, we were able to reduce power significantly, as shown in the charts below.

Results

By using these latest tool technologies and design techniques, we were able to achieve 10% better frequency and reduce the dynamic power by 10%. Results are shown here at 400MHz and 200MHz for the dynamic power reduction.

Table 1: Dynamic Power Reduction Results

The joint ARM/Cadence work started with addressing challenges at two points/scenarios on the PPA curve:

1. Frequency focus with optimal power (400MHz)

2. Lowest power at reduced frequency (200MHz)

For scenario #1, out-of-the-box 14.1 allowed us to reach 400MHz. With the use of PowerOpt technology, available in Encounter Digital Implementation System 14.1, we were able to reduce power to an optimal number. For scenario #2, the additional use of GigaPlace technology and inherently better SI management, which allowed a relaxed clock slew, made a much higher power reduction at 200MHz possible. With the combination of ARM design techniques and Cadence tool features, we were able to show a 38% dynamic power reduction (for standard cells) going from the 400MHz 13.2-based run to the 200MHz 14.2 best-power-recipe run.

Summary

Reducing the wirelength, placing cells based on the slack profile, and predicting the impact of detailed routing in the early phase of the design are important ways to improve performance and reduce dynamic power consumption. Tools perform better when given the right floorplan along with the proper directives at appropriate places. With a combination of design changes, advanced tools, and engineering expertise, today’s physical design engineers have the means to thoroughly address the challenges associated with timing closure while keeping the dynamic power consumption of their designs low.

Figure 8: Dynamic Power (Normalized) for Logic

Several months of collaborative work between ARM and Cadence, driven by many trials, have led to optimized PPA results. Cadence tools – Encounter RTL Compiler and Encounter Digital Implementation System 14.1 – produced better results out of the box compared to Encounter RTL Compiler and Encounter Digital Implementation System 13.x. The continuous refinement of the flow, along with design techniques such as floorplan reduction and clock slew relaxation, allowed a 38% dynamic power reduction. The ARM/Cadence implementation Reference Methodology (iRM) flow uses a similar recipe for both scenarios: lowest power (MinPower) and highest frequency (AFAP).

References

[1] D. Liu and C. Svensson, “Power consumption estimation in CMOS VLSI chips,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 663-670, June 1994.

[2] A.P. Chandrakasan and R.W. Brodersen, “Minimizing power consumption in digital CMOS circuits,” Proc. of the IEEE, vol. 83, pp. 498-523, April 1995.

[3] G. Gerosa, et al., “250 MHz 5-W PowerPC microprocessor,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 1635-1649, Nov. 1997.

Blog Review – Monday October 27 2014

Monday, October 27th, 2014

Synopsys won’t let the hybrid debate mess with your head; automating automotive verification; the write stuff; software’s role in wearable medical technology; ARM’s bandwidth stretching.
By Caroline Hayes, Senior Editor

Playing with your mind, Michael Posner, Synopsys, relishes a mashup blog, with a lion/zebra image to illustrate IP validation in software development. He does not tease the reader all the way through the blog, though, and gives some sound advice on mixing it up with an ARM-based system for development, an FPGA for validation, and combinations in-between.

Indulging in a little bit of a promo-blog, Richard Goering deconstructs the additions of the Functional Safety Simulator to Incisive and of Functional Safety Analysis to vManager. We will let him off the indulgence, though, as the informative, well-researched piece is as much a blog for vehicle designers as it is for verification professionals.

Not that he needs much practice in a writing studio, Hamilton Carter is still turning up for class and finds parallels in the beauty of prose and the analysis of code. Instead of one replacing the other, he advocates supplementing one with the other so that the message and intent are clear for all.

Taking an appreciative step back, Helene at Dassault reviews the medical market and how the wearable trend might influence it. She also looks at how the company’s software helps designers understand what is needed and create it.

There are plenty of diagrams to illustrate the point that Jakublamik is making in his blog about bandwidth consumption. After clearly setting out the culprits for bandwidth hunger, he lays out the ARM Mali GPU appetizers in a conversational yet detailed and very useful blog (with a Chinese version available too).

Cortex-M processor Family at the Heart of IoT Systems

Saturday, October 25th, 2014

Gabe Moretti, Senior Editor

One cannot have a discussion about the semiconductor industry without hearing the word IoT. It is really not a word, as language lawyers will be ready to point out, but an abbreviation that stands for Internet of Things. And, of course, the abbreviation is fundamentally incorrect, since the “things” will be connected in a variety of ways, not just the Internet. In fact, it is already clear that devices, grouped to form an intelligent subsystem of the IoT, will be connected using a number of protocols like 6LoWPAN, ZigBee, WiFi, and Bluetooth. ARM has developed the Cortex®-M processor family, which is particularly well suited for providing processing power to devices that must consume very little power in their duties of physical data acquisition. This is an instrumental function of the IoT.

Figure 1. The heterogeneous IoT: lots of “things” inter-connected. (Courtesy of ARM)

Figure 1 shows the vision the semiconductor industry holds of the IoT. I believe that the figure shows a goal the industry set for itself, and a very ambitious goal it is. At the moment the complete architecture of the IoT is undefined, and rightly so. The IoT re-introduces a paradigm first used when ASIC devices were thought of as the ultimate solution to everyone’s computational requirements. The business of IP started as an enhancement to application-specific hardware, and now general-purpose platforms constitute the core of most systems. IoT lets the application drive the architecture, and companies like ARM provide the core computational block with an off-the-shelf device like a Cortex MCU.

The ARM Cortex-M processor family is a range of scalable, compatible, energy-efficient, easy-to-use processors designed to help developers meet the needs of tomorrow’s smart and connected embedded applications. Those demands include delivering more features at a lower cost, increasing connectivity, better code reuse, and improved energy efficiency. The ARM Cortex-M7 processor is the most recent and highest performance member of the Cortex-M processor family. But while the Cortex-M7 is at the heart of ARM partner SoCs for IoT systems, other connectivity IP is required to complete the intelligent SoC subsystem.

A collection of some of my favorite IoT-related IP follows.

Figure 2. The Cortex-M7 Architecture (Courtesy of ARM)

Development Ecosystem

To efficiently build a system, no matter how small, that can communicate with other devices, one needs IP.  ARM and Cadence Design Systems have had a long-standing collaboration in the area of both IP and development tools.  In September of this year the companies extended an already existing agreement covering more than 130 IP blocks and software.  The new agreement covers an expanded collaboration for IoT and wearable devices targeting TSMC’s ultra-low power technology platform. The collaboration is expected to enable the rapid development of IoT and wearable devices by optimizing the system integration of ARM IP and Cadence’s integrated flow for mixed-signal design and verification.

The partnership will deliver reference designs and physical design knowledge to integrate ARM Cortex processors, ARM CoreLink system IP, and ARM Artisan physical IP along with RF/analog/mixed-signal IP and embedded flash in the Virtuoso-VDI Mixed-Signal Open Access integrated flow for the TSMC process technology.

“The reduction in leakage of TSMC’s new ULP technology platform combined with the proven power-efficiency of Cortex-M processors will enable a vast range of devices to operate in ultra energy-constrained environments,” said Richard York, vice president of embedded segment marketing, ARM. “Our collaboration with Cadence enables designers to continue developing the most innovative IoT devices in the market.” One of the fundamental changes in design methodology is the aggregation of capabilities from different vendors into one distribution point, like ARM, that serves as the guarantor of a proven development environment.

Communication and Security

System developers need to know that there are a number of sources of IP when deciding on the architecture of a product.  In the case of IoT it is necessary to address both the transmission capabilities and the security of the data.

As a strong partner of ARM, Synopsys provides low-power IP that supports a wide range of low-power features such as configurable shutdown and power modes. The DesignWare family of IP offers both digital and analog components that can be integrated with any Cortex-M MCU. Beyond the extensive list of digital logic, analog IP, including ADCs and DACs plus audio CODECs, plays an important role in IoT applications. Designers also have the opportunity to use Synopsys development and verification tools, which have a strong track record handling ARM-based designs.

The Tensilica group at Cadence has published a paper describing how to use Cadence IP to develop a Wi-Fi 802.11ac transceiver for WLAN (wireless local area network) use. This transceiver design is architected on a programmable platform consisting of Tensilica DSPs, using an anchor DSP from the ConnX BBE family of cores in combination with a smaller specialized DSP and dedicated hardware RTL. Because of the enhanced instruction set and superscalar pipeline in the Cortex-M7, plus the addition of floating-point DSP, Cadence radio IP works well with the Cortex-M7 MCU, as intermediate-band processing, digital down-conversion, post-processing, or WLAN provisioning can be done by the Cortex-M7.

Accent S.A. is an Italian company that is focused on RF products.  Accent’s BASEsoc RF Platform for ARM enables pre-optimized, field-proven single chip wireless systems by serving as near-finished solutions for a number of applications.  This modular platform is easily customizable and supports integration of different wireless standards, such as ZigBee, Bluetooth, RFID and UWB, allowing customers to achieve a shorter time-to-market. The company claims that an ARM processor-based, complex RF-IC could be fully specified, developed and ramped to volume production by Accent in less than nine months.

Sonics offers a network-on-chip (NoC) solution that is both flexible in integrating various communication protocols and highly secure. Figure 3 shows how the Sonics NoC provides secure communication in any SoC architecture.

Figure 3.  Security is Paramount in Data Transmission (Courtesy of Sonics)

According to Drew Wingard, Sonics CTO, “Security is one of the most important, if not the most important, considerations when creating IoT-focused SoCs that collect sensitive information or control expensive equipment and/or resources. ARM’s TrustZone does a good job securing the computing part of the system, but what about the communications, media and sensor/motor subsystems? SoC security goes well beyond the CPU and operating system. SoC designers need a way to ensure complete security for their entire design.”

Drew concludes “The best way to accomplish SoC-wide security is by leveraging on-chip network fabrics like SonicsGN, which has built-in NoCLock features to provide independent, mutually secure domains that enable designers to isolate each subsystem’s shared resources. By minimizing the amount of secure hardware and software in each domain, NoCLock extends ARM TrustZone to provide increased protection and reliability, ensuring that subsystem-level security defects cannot be exploited to compromise the entire system.”

More examples exist, of course, and this is not an exhaustive list of devices supporting protocols that can be used in the intelligent home architecture. The intelligent home, together with wearable medical devices, is the most frequently cited example of an IoT application that could be implemented by 2020. In fact, it is a sure bet that by the time the intelligent home is a reality, many more IP blocks to support the application will be available.

Blog Review – Monday Oct 13, 2014

Monday, October 13th, 2014

Cambridge Wireless discusses wearables; Cadence unmasks Incisive ‘hidden treasures’; ON Semi advocates ESD measures; Synopsys presents at DVCON Europe; 3DIC reveals game-changer move at IEEE S3S

At this month’s Cambridge Wireless SIG (Special Interest Group), David Maidment, ARM, listened to an exchange of ideas for wearables and new business opportunities, with considerations for size, cost and consumer ease of use.

Revealing rampant prejudice for all physical media, Axel Scherer, Cadence, learns a lesson in features that are taken for granted and offers a list of 10 features in Incisive that may not be evident to many users.

Automotive designers need to consider silicon ESD protection, encourages Deres Eshete, explaining the reasons why ON Semiconductor has introduced the ESD7002, ESD7361, and ESD7461 ESD protection devices.

This week, DVCON Europe will include a tutorial from Synopsys about VCS AMS to extend digital verification for mixed-signal SoCs. Hélène Thibiéroz and colleagues from Synopsys, STMicroelectronics and Micronas will present October 14, but some hints of what to expect are on her blog.

Following on from the 2014 IEEE S3S conference, Zvi Or-Bach discusses how monolithic 3D IC will be a game changer as he considers how existing fab transistor processes can be used and looks ahead to EDA efforts for the technology, covered at the conference.

Caroline Hayes – Monday October 13, 2014

Blog Review – Monday Oct. 06, 2014

Monday, October 6th, 2014

Real Intent assesses ARM TechCon; processor update; measure, analyse, change – writing and design rules; Richard Goering enjoyed learning more about the mbed IoT platform.
By Caroline Hayes, Senior Editor

ARM TechCon proved a place for Real Intent’s CTO, Pranav Ashar, to reflect on both ARM’s contribution to the SoC space and changes in the SoC paradigm.

Different uses for the ARM Cortex-A17 and Cortex-A12 are the topic of a blog by Stefan Rosinger, ARM, which also explains how the two will become one – in name, anyway.

There are parallels between electronic design principles and academic writing, discovers Hamilton Carter. The theories at the heart of both disciplines are discussed. Your assignment this week is to compare, contrast and discuss.

Another person taking notes at ARM TechCon was Richard Goering, Cadence. He reviews a keynote by ARM CTO Mike Muller describing the mbed OS platform for ARM Cortex-M based devices and mbed device server.

What Is Not Testable Is Not Fixable

Wednesday, September 17th, 2014

Gabe Moretti, Senior Editor

In the past I have mused that the three-letter acronyms used in EDA, like DFT, DFM, DFY and so on, are superfluous, since the only one that counts is DFP (Design For Profit). This of course may be obvious, since every IC’s reason for existence is to generate income for the seller. But the above observation is also superficial: the IC must be not only manufacturable but also testable, and it must reach a yield figure that makes it cost-effective. Breaking down profitability into major design characteristics is an efficient approach, since a specific tool is certainly easier to work with than a generic one.

Bassilios Petrakis, Product Marketing Director at Cadence, told me that: “DFT includes a number of requirements including manufacturing test, acceptance test, and power-on test. In special cases it may be necessary to test the system while it is in operation to isolate faults or enable redundancies within mission-critical systems. For mission-critical applications, usage of logic and memory built-in self-test (BIST) is a commonly used approach to perform in-system test. Most recently, a new IEEE standard, P1687, was introduced to standardize the integration and instrument access protocol for IPs. Another proposed IEEE standard, P1838, has been initiated to define DFT IP for testing 3D-IC Through-Silicon-Via (TSV) based die stacks.”

Kiran Vittal, Senior Director of Product Marketing at Atrenta Inc. pointed out that: “The test coverage goals for advanced deep submicron designs are in the order of 99% for stuck-at faults and over 80% for transition faults and at-speed defects. These high quality goals can only be achieved by analyzing for testability and making design changes at RTL. The estimation of test coverage at RTL and the ability to find and fix issues that impact testability at RTL reduces design iterations and improves overall productivity to meet design schedules and time to market requirements.”

An 80% figure may seem an underachievement, but it points out the difficulty of proving full testability in light of other, more demanding requirements, like area and power to name just two.

Planning Ahead

I also talked with Bassilios about the need for a DFT approach in design from the point of view of the architecture of an IC. The first thing he pointed out was that there are innumerable considerations that affect the choice of an optimal Design for Test (DFT) architecture for a design. Designers and DFT engineers have to grapple with some considerations early in the design process.

Bassilios noted that “The number of chip pins dedicated to test is often a determining factor in selecting the right test architecture. The optimal pin count is determined by the target chip package, test time, number of pins supported by the automated test equipment (ATE), whether wafer multi-site test will be targeted, and, ultimately, the end-market segment application.”

He continued by noting that: “For instance, as mixed signal designs are mostly dominated by Analog IOs, digital test pins are at a premium and hence require an ultra low pin count test solution. These types of designs might not offer an IEEE1149.1 JTAG interface for board level test or a standardized test access mechanism. In contrast, large digital SoC designs have fewer restrictions and more flexibility in test pin allocation.  Once the pin budget has been established, determining the best test compression architecture is crucial for keeping test costs down by reducing test time and test data volume. Lower test times can be achieved by utilizing higher test compression ratios – typically 30-150X – while test data volume can be reduced by deploying sequential-based scan compression architectures. Test compression architectures are also available for low pin count designs by using the scan deserializer/serializer interface into the compression logic. Inserting test points that target random resistant faults in a design can often help reduce test pattern count (data volume).”
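The arithmetic behind that compression trade-off can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the flop count, pattern count, channel count, and shift frequency are hypothetical, and real compression logic adds pattern inflation, masking, and tester overheads that the sketch ignores.

```python
# Rough scan test time / data volume estimate. With a compression ratio X,
# the external scan channels feed roughly X times more (and therefore X times
# shorter) internal chains, shrinking both shift time and external test data.
# All numbers are hypothetical.

def scan_estimate(scan_flops, patterns, channels, shift_mhz, compression=1):
    internal_chains = channels * compression
    chain_len = -(-scan_flops // internal_chains)      # ceiling division
    shift_cycles = patterns * chain_len
    test_time_ms = shift_cycles / (shift_mhz * 1e6) * 1e3
    data_bits = 2 * patterns * channels * chain_len    # stimulus + response
    return chain_len, test_time_ms, data_bits

for ratio in (1, 30, 100):
    chain_len, t_ms, bits = scan_estimate(
        scan_flops=200_000, patterns=5_000, channels=8,
        shift_mhz=50, compression=ratio)
    print(f"compression {ratio:>3}x: chain length {chain_len:>6}, "
          f"test time {t_ms:8.2f} ms, external data {bits / 8e6:7.2f} MB")
```

The point of the sketch is simply that test time and external data volume scale down almost linearly with the compression ratio, which is why the 30-150X ratios Bassilios mentions matter so much for test cost.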

The early exploration of DFT architectures to meet design requirements – like area, timing, power, and testability – is facilitated by modern logic synthesis tools. Most DFT IP like JTAG boundary scan, memory BIST collars, logic BIST and compression macros are readily integrated into the design netlist and validated during the logic synthesis process per user’s recipe. Such an approach can provide tremendous improvements to designer productivity. DFT design rule checks are run early and often to intercept and correct undesirable logic that can affect testability.

Test power is another factor that needs to be considered by DFT engineers early on. Excessive scan switching activity can inadvertently lead to test pattern failures on an ATE. Testing one or more core or sub-block in a design in isolation together with power-aware Automatic Test Pattern Generation (ATPG) techniques can help mitigate power-related issues. Inserting core-wrapping (or isolation logic) using IEEE1500 is a good way to enable a core-based test, hierarchical test, and general analog mixed signal interactions.

For designs adopting advanced multi-voltage island techniques, DFT insertion has to be power-domain-aware and construct scan chains appropriately, leveraging industry-standard power specifications like the Common Power Format (CPF) and IEEE1801. A seamless integration between logic synthesis and downstream ATPG tools helps prime the test pattern validation and delivery.

ATPG

Kiran delved in greater detail into the subject of testability, giving particular attention to issues with Automatic Test Pattern Generation (ATPG). “For both stuck-at and transition faults, the presence of hard-to-detect faults has a substantial impact on overall ATPG performance in terms of pattern count, runtime, and test coverage, which in turn has a direct impact on the cost of manufacturing test. The ability to measure the density of hard-to-detect faults in a design early, at the RTL stage, is extremely valuable. It gives RTL designers the opportunity to make design changes to address the issue while enabling them to quickly measure the impact of the changes.”

The performance of the ATPG engine is often measured by the following criteria (a rough numeric sketch follows the list):

- How close it comes to finding tests for all testable faults, i.e., how close the ATPG fault coverage comes to the upper bound. This aspect of ATPG performance is referred to as its efficiency. If the ATPG engine finds tests for all testable faults, its efficiency is 100%.

- How long it has to run to generate the tests. Full ATPG runs need to be completed within a certain allocated time, so the quest for finding a test is sometimes abandoned for some hard-to-test faults after the ATPG algorithm exceeds a pre-determined time limit. The larger the number of hard-to-test faults, the lower the ATPG efficiency.

- The total number of tests (patterns) needed to test all testable faults. Note that a single test pattern can detect many testable faults.
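Following the definition above, a small sketch of the fault accounting looks like this. The fault counts are invented, and exact terminology (fault coverage, test coverage, ATPG effectiveness) varies a little from tool to tool, so treat this only as an illustration of the bookkeeping.

```python
# Toy fault-accounting arithmetic for the criteria above. "Efficiency" is
# taken here as detected faults over testable faults, so it reaches 100%
# when every testable fault gets a test; fault coverage divides by all
# faults instead. All counts are hypothetical.

def atpg_metrics(total_faults, detected, untestable):
    testable = total_faults - untestable
    aborted = testable - detected            # hard-to-test faults given up on
    fault_coverage = detected / total_faults
    efficiency = detected / testable
    return fault_coverage, efficiency, aborted

total, detected, untestable = 1_000_000, 985_000, 8_000
coverage, efficiency, aborted = atpg_metrics(total, detected, untestable)
print(f"fault coverage : {coverage:.2%}")    # ~98.50%
print(f"ATPG efficiency: {efficiency:.2%}")  # ~99.29%
print(f"aborted (hard-to-test) faults: {aborted}")
```

The more hard-to-test faults the engine has to abandon, the larger the gap between coverage and the upper bound, which is exactly the effect the RTL-level analysis described next tries to prevent.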

To give a better idea of how test issues can be addressed Kiran provided me with an example.

Figure 1 (Courtesy of Atrenta)

Consider Figure 1, which has wide logic cones of flip-flops and black boxes (memories or analog circuits) feeding a downstream flip-flop. ATPG finds it extremely difficult to generate ‘exhaustive’ patterns, and the test generation time is either long or the fault coverage is compromised. These kinds of designs can be analyzed early at RTL to find areas in the design that have poor controllability and observability, so that the designer can make design changes that improve the efficiency (test data and time) of downstream ATPG tools. The tools can then generate optimal patterns that not only improve the quality of test but also lower the cost of manufacturing test.

Figure 2 (Courtesy of Atrenta)

Figure 2 shows the early RTL analysis using Atrenta’s SpyGlass DFT tool suite. This figure highlights the problem through a schematic representation of the design and shows a thermal map of the areas with low controllability and observability, which the designer can fix easily by recoding the RTL.

The analysis of the impact of hard to test faults at RTL can save significant design time in fixing low fault coverage and improving ATPG effectiveness for runtime and pattern count early in the design cycle, resulting in over 50x more efficiency in the design flow to meet the required test quality goals.

Conclusion

Bassilios concluded that “further improvements to testability can be achieved by performing a “what if” analysis with test point insertion and committing the test points once the desired coverage goals are met. Both top-down and bottom-up hierarchical test synthesis approaches can be supported. Early physical placement floorplan information can be imported into the synthesis cockpit to perform physically aware synthesis as well as scan ordering and congestion-free compression logic placement.”

One thing is certain: engineers will not rest. DFT continues to evolve to address the increased complexity of SoC and 3D-IC design, its realization, and the emergence of new fault models required for sub-20nm process nodes. With every advance, whether in the form of a new algorithm or new IP modules, the EDA tools will need to be updated and, probably, the approach to the IC architecture will need to be changed. As the rate of cost increase of new processes continues to grow, designers will have to be more creative in developing better testing techniques to improve the utilization of already established processes.

Blog Review – Monday, Sept 01, 2014

Monday, September 1st, 2014

The generation gap for connectivity; seeking medical help; IoT messaging protocol; Cadence discusses IoT design challenges.

While marvelling at the Internet of Things (IoT), Seow Yin Lim, Cadence, writes effectively about its design challenges now that traditional processor architectures and approaches do not apply.

ARM hosts a guest blog by Jonah McLeod, Kilopass, who discusses MQTT, the messaging protocol standard for the IoT. His detailed post provides context and a simplified breakdown of the protocol.

Bringing engineers together is the motivation for Thierry Marchal, Ansys, who writes an impassioned blog following the biomedical track of the company’s CADFEM Users Meeting in Nuremberg, Germany. We are all in this together, is the theme, so perhaps we should exchange ideas. It just might catch on.

Admitting to stating the obvious, John Day, Mentor Graphics, notes that younger people are more interested in automotive connectivity than older people are. He goes on to share some interesting stats from a recent survey on automotive connectivity.

Caroline Hayes, Senior Editor

Complexity of Mixed-signal Designs

Thursday, August 28th, 2014

Gabe Moretti, Senior Editor

The major constituent of system complexity today is the integration of computing with mechanical and human interfaces.  Both of these are analog in nature, so designing mixed-signal systems is a necessity.  The entire semiconductor chain is impacted by this requirement.  EDA tools must support mixed-signal development and the semiconductor foundries must adapt to using different processes to build one IC.

Impact on Semiconductor Foundries

Jonah McLeod, Director of Corporate Marketing Communications at Kilopass Technology, was well informed about the foundry situation when ARM processors became widely used in mixed-signal designs. He told me that: “Starting in 2012, with the appearance of smart meters, chip vendors such as NXP began integrating the 32-bit ARM Cortex processor with an analog/mixed-signal metrology engine for smart metering, with two current inputs and a voltage input.

This integration had significant impact on both foundries and analog chip suppliers. The latter had been fabricating mixed-signal chips on process nodes of 180nm and larger, many with their own dedicated fabs. With this integration, they had to incorporate digital logic with their analog designs.

Large semiconductor IDMs like NXP, TI and ST had an advantage over dedicated analog chip companies like Linear and Analog Devices. The former had both logic and analog design expertise they could bring to bear building these SoCs, and they had the fabrication expertise to build digital mixed-signal processes in smaller process geometries.

Long exempt from the pressure to chase smaller process geometries aggressively, the dedicated analog chip companies had a stark choice. They could invest the large capital expenditure required to build smaller geometry fabs, or they could go fab-lite and outsource smaller process geometry designs to the major fabs. This was a major boost for the foundries that needed designs to fill fabs abandoned by digital designs chasing first 40nm and now 28nm. As a result, foundries now have 55nm processes tailored for power management ICs (PMICs) and display driver ICs, among others. Analog expertise still counts in this new world order, but the competitive advantage goes to analog/mixed-signal design teams able to leverage smaller process geometries to achieve cost savings over competitors.”

As form factors in handheld and IoT devices become increasingly smaller, integrating all the components of a system on one IC becomes a necessity.  Thus fabricating mixed-signal chips with smaller geometries processes grows significantly in importance.

Mladen Nizic, Product Marketing Director at Cadence, noted that requirements on foundries are directly connected to new requirements for EDA tools. He observed that: “Advanced process nodes typically introduce more parametric variation, Layout Dependent Effects (LDE), increased impact of parasitics, aging and reliability issues, layout restrictions and other manufacturing effects affecting device performance, making it much harder for designers to predict circuit performance in silicon. To cope with these challenges, designers need an automated flow to understand the impact of manufacturing effects early, sensitivity analysis to identify the most critical devices, rapid analog prototyping to explore layout configurations quickly, a constraint-driven methodology to guide layout creation, and in-design extraction and analysis to enable correct-by-construction design. Moreover, digitally assisted analog has become a common approach to achieving analog performance, leading to an increased need for an integrated mixed-signal design flow.”

Marco Casale-Rossi, Senior Staff Product Marketing Manager, Design Group, Synopsys, points out that there is still much life remaining in the 180nm process. “I’ll give you the 180 nanometer example: when 180 nanometers was introduced as an emerging technology node back in 1999, it offered single-poly, 6 aluminum metals, digital CMOS transistors and embedded SRAM memory only. Today, 180 nanometers is used for state-of-the-art BCD (Bipolar-CMOS-DMOS) processes, e.g. for smart power, automotive, security, MCU applications; it features, as I said, bipolar, CMOS and DMOS devices, double-poly, triple-well, four aluminum layers, integrating a broad catalogue of memories such as DRAM, EEPROM, FLASH, OTP, and more. Power supplies span from 1V to several tens or even hundreds of volts; analog & mixed-signal manufacturing processes at established technology nodes are as complex as the latest and greatest digital manufacturing processes at emerging technology nodes, only the metrics of complexity are different.”

EDA Tools and Design Flow

Mixed-signal designs require a more complex flow than strictly digital designs. They often incorporate multiple analog, RF, mixed-signal, memory and logic blocks operating at high performance and in different power domains. For these reasons, engineers designing a mixed-signal IC need different tools throughout the development process. Architecture, development, testing, and place-and-route functions are all impacted.

Mladen observed that “Mixed-signal chip architects must explore different configurations with all concerned in a design team to avoid costly, late iterations. Designers consider many factors, like block placement relative to other blocks, IO locations, the power grid, sensitive analog routes, and noise avoidance, to arrive at an optimal chip architecture.”

Standards organizations, particularly Accellera and the IEEE, have developed versions of widely used hardware description languages like Verilog and VHDL that provide support for mixed-signal descriptions. VHDL-AMS and Verilog-AMS continue to be supported, and working groups are making sure that the needs of designers are met.

Mladen points out that “recent extensions in standardization efforts for Real Number Modeling (RNM) enable effective abstraction of analog for simulation and functional verification at almost digital speeds. Cadence provides tools for automating analog behavioral and RNM model generation and validation. In the last couple of years, adoption of RNM has been on the rise, driven by the verification challenges of complex mixed-signal designs.”

Design verification is practically always the costliest part of development. This is partly due to the lack of effective correct-by-construction tools, and partly to the increasing complexity of designs, which are often the product of several company design teams as well as the use of third-party IP.

Steve Smith, Sr. Marketing Director, Analog/Mixed-signal Verification at Synopsys, pointed out that: “The need for exhaustive verification of mixed-signal SoCs means that verification teams need to perform mixed-signal simulation as part of their automated design regression test processes. To achieve this requirement, mixed-signal verification is increasingly adopting techniques that have been well proven in the purely digital design arena. These techniques include automated stimulus generation, coverage- and assertion-driven verification combined with low-power verification extended to support analog and mixed-signal functionality. As analog/mixed-signal circuits are developed, design teams will selectively replace high-level models with the SPICE netlist and utilize the high performance and capacity of a FastSPICE simulator coupled to a high-performance digital simulator. This combination provides acceleration for mixed-signal simulation with SPICE-like accuracy to adequately verify the full design. Another benefit of using FastSPICE in this context is post-layout simulation for design signoff within the same verification testbench environment.”

He continued by saying that: “An adjacent aspect of the tighter integration between analog and digital relates to power management – as mixed-signal designs require multiple power domains, low-power verification is becoming more critical. As a result of these growing challenges, design teams are extending proven digital verification methodologies to mixed-signal design.  Accurate and efficient low-power and multiple power domain verification require both knowledge of the overall system’s power intent and careful tracking of signals crossing these power domains. Mixed-signal verification tools are available to perform a comprehensive set of static (rule-based) and dynamic (simulation run-time) circuit checks to quickly identify electric rule violations and power management design errors. With this technology, mixed-signal designers can identify violations such as missing level shifters, leakage paths or power-up checks at the SoC level and avoid significant design errors before tape-out. During detailed mixed-signal verification, these run-time checks can be orchestrated alongside other functional and parametric checks to ensure thorough verification coverage.”

Another significant challenge for designers is placing and routing the design. Mladen described the way Cadence supports this task. “Analog designers use digital logic to calibrate and tune their high-performance circuits. This is called the digitally-assisted-analog approach. There are often tens of thousands of gates integrated with analog in a mixed-signal block, at the periphery (making the block ready for integration into an SoC) as well as embedded inside the hierarchy of the block. The challenges in realizing this kind of design are:

- time and effort needed for iteration among analog and digital designers,

- black-boxing among analog and digital domains with no transparency during the iterations,

- sharing data and constraints among analog and digital designers,

- performing ECO particularly late in the design cycle,

- applying static timing analysis across timing paths spanning gates embedded in hierarchy of mixed-signal block(s).

Cadence has addressed these challenges by integrating the Virtuoso custom and Encounter digital platforms on a common OpenAccess database, enabling data and constraint sharing without translation and full transparency between the analog and digital parts of the layout in both platforms.”

Casale-Rossi described how Synopsys addresses the problem. “Analog & mixed-signal has always had special placement (e.g. symmetry, rotation) and routing (e.g. shielding, length/resistance matching) requirements. With Synopsys’ Custom Designer – IC Compiler round-trip, and with our Galaxy Custom Router, we are providing our partners and customers with an integrated environment for digital and analog & mixed-signal design implementation that helps address the challenges.”

Conclusion

The bottom line is that EDA tool providers, standards-developing organizations and semiconductor foundries have significant further work to do. IC complexity will increase, and with it the number of mixed-signal designs. Mixed-signal third-party IP is by nature tied to a specific foundry and process, since it must be developed and verified at the transistor level. Thus the complexity of integrating IP development and fabrication will limit the number of IP providers to those companies big enough to obtain the attention of the foundry.

Newer Processes Raise ESL Issues

Wednesday, August 13th, 2014

Gabe Moretti, Senior Editor

In June I wrote about how EDA changed its traditional flow in order to support advanced semiconductor manufacturing. I do not think that the changes, although significant and meaningful, are enough to sustain the increase in productivity required by financial demands. What is necessary, in my opinion, is better support for system-level developers.

Leaving the solution to design and integration problems to a later stage of the development process creates more complexity since the network impacted is much larger.  Each node in the architecture is now a collection of components and primitive electronic elements that dilute and thus hide the intended functional architecture.

Front End Design Issues

Changes in the way front-end design is done are being implemented. Anand Iyer, Calypto’s Director of Product Marketing, focused on the need to plan power at the system level. He observed that: “Addressing DFP issues needs to be done in the front-end tools, as the RTL logic structure and architecture choices determine 80% of the power. Designers need to minimize the activity/clock frequency across their designs, since this is the only metric to control dynamic power. They can achieve this in many ways: (1) reducing activity permanently from their design, (2) reducing activity temporarily during the active mode of the design.” Anand went on to cover the two points: “The first point requires a sequential analysis of the entire design to identify opportunities where we can save power. These opportunities need to be evaluated against possible timing and area impact. We need automation when it comes to large and complex designs. PowerPro can help designers optimize their designs for activity.”

As for the other point he said: “The second issue requires understanding the interaction of hardware and software. Techniques like power gating and DVFS fall under this category.”

Anand also recognized that high-level synthesis can be used to achieve low-power designs. Starting from C++ or SystemC, architects can produce alternative microarchitectures and see the power impact of their choices (with physically aware RTL power analysis). This is hugely powerful for enabling exploration, because doing it only at RTL is time consuming, and it is unrealistic to actually try multiple implementations of a complex design. Plus, the RTL low-power techniques are automatically considered and automatically implemented once you have selected the best architecture that meets your power, performance, and cost constraints.

Steve Carlson, Director of Marketing at Cadence, pointed out that about a decade ago design teams had their choice of about four active process nodes when planning their designs. He noted that: “In 2014 there are ten or more active choices for design teams to consider. This means that the solution space for product design has become a lot richer. It also means that design teams need a more fine-grained approach to planning and vendor/node selection. It follows that the assumptions made during the planning process need to be tested as early and as often as possible, and with as much accuracy as possible at each stage. The power/performance and area trade-offs create end-product differentiation. One area that can certainly be improved is the connection to trade-offs between hardware architecture and software. Getting more accurate insight into power profiles can enable trade-offs at the architectural and microarchitectural levels.

Perhaps less obvious is the need for process-accurate early physical planning (i.e., planning that understands design rules for coloring, etc.).”

As shown in the following figure, designers have to be aware that parts of the design come from different suppliers, and thus Steve states that: “It is essential for the front-end physical planning/prototyping stages of design to be process-aware to prevent costly surprises down the implementation road.”

Simulation and Verification

One of the major recent changes in IC design is the growing number of mixed-signal designs. They present new design and verification challenges, particularly when new advanced processes are targeted for manufacturing. On the standards development side, Accellera has responded by releasing a new version of its Verilog-AMS. It is a mature standard originally released in 2000, built on top of the Verilog subset of the IEEE 1800-2012 SystemVerilog standard. The standard defines how analog behavior interacts with event-based functionality, providing a bridge between the analog and digital worlds. To model continuous-time behavior, Verilog-AMS is defined to be applicable to both electrical and non-electrical system descriptions. It supports conservative and signal-flow descriptions and can also be used to describe discrete (digital) systems and the resulting mixed-signal interactions.

The revised standard, Verilog-AMS 2.4, includes extensions to benefit verification, behavioral modeling and compact modeling. There are also several clarifications and over 20 errata fixes that improve the overall quality of the standard. Resources on how best to use the standard and a sample library with power and domain application examples are available from Accellera.

Scott Little, chair of the Verilog AMS WG stated: “This revision adds several features that users have been requesting for some time, such as supply sensitive connect modules, an analog event type to enable efficient electrical-to-real conversion and current checker modules.”

The standard continues to be refined and extended to meet the expanding needs of various user communities. The Verilog-AMS WG is currently exploring options to align Verilog-AMS with SystemVerilog in the form of a dot standard to IEEE 1800. In addition, work is underway to focus on new features and enhancements requested by the community to improve mixed-signal design and verification.

Clearly another aspect of verification that has grown significantly in the past few years is the availability of Verification IP modules.  Together with the new version of the UVM 1.2 (Universal Verification Methodology) standard just released by Accellera, they represent a significant increment in the verification power available to designers.

Jonah McLeod, Director of Corporate Marketing Communications at Kilopass, is also concerned about analog issues. He said: “Accelerating SPICE has to be a major tool development of this generation of tools. The biggest problem designers face in a complex SoC is getting corner cases to converge. This can be time consuming and imprecise with current-generation tools. Start-ups claiming Monte Carlo SPICE acceleration, like Solido Design Automation and CLK Design Automation, are attempting to solve the problem. Both promise to achieve SPICE-level accuracy on complex circuits within a couple of percentage points in a fraction of the time.”

One area of verification that is not often covered is its relationship with manufacturing test. Thomas L. Anderson, Vice President of Marketing at Breker Verification Systems, told me that: “The enormous complexity of a deep submicron (32, 28, 20, 14 nm) SoC has a profound impact on manufacturing test. Today, many test engineers treat the SoC as a black box, applying stimulus and checking results only at the chip I/O pins. Some write a few simple C tests to download into the SoC’s embedded processors and run as part of the manufacturing test process. Such simple tests do not validate the chip well, and many companies are seeing returns with defects missed by the tester. Test time limitations typically prohibit the download and run of an operating system and user applications, but clearly a better test is needed. The answer is available today: automatically generated C test cases that run on “bare metal” (no operating system) while stressing every aspect of the SoC. These run realistic user scenarios in multi-threaded, multi-processor mode within the SoC while coordinating with the I/O pins. These test cases validate far more functionality and performance before the SoC ever leaves the factory, greatly reducing return rates while improving the customer experience.”

Blog Review – Mon. August 11 2014

Monday, August 11th, 2014

VW e-Golf; Cadence’s power signoff launch; summer viewing; power generation research.
By Caroline Hayes, Senior Editor

VW’s plans for all-electric Golf models have captured the interest of John Day, Mentor Graphics. He attended the Management Briefing Seminar and reports on the carbon offsetting and a solar panel co-operation with SunPower. I think I know what Day will be travelling in to get to work.

Cadence has announced plans to tackle power signoff this week and Richard Goering elaborates on the Voltus-Fi Custom Power Integrity launch and provides a detailed and informative blog on the subject.

Grab some popcorn (or not) for this summer’s blockbusters, as lined up by Scott Knowlton, Synopsys. Perhaps not the next Harry Potter series, but certainly a must-see for anyone who missed the company’s demos at PCI-SIG DevCon. This humorous blog continues the cinema analogy for “Industry First: PCI Express 4.0 Controller IP”, “DesignWare PHY IP for PCI Express at 16Gb/s”, “PCI PHY and Controller IP for PCI Express 3.0” and “Synopsys M-PCIe Protocol Analysis with Teledyne LeCroy”.

Fusion energy could be the answer to energy demands, and Steve Leibson, Xilinx, shares Dave Wilson’s (National Instruments) report of a fascinating project by National Instruments to monitor and control a compact spherical tokamak (used as a neutron source) with the UK company Tokamak Solutions.
