Part of the Chip Design Magazine Network

Posts Tagged ‘Cadence’

Blog Review – Monday, December 15 2014

Monday, December 15th, 2014

Rolling up her sleeves and getting down to some hard work – not just words – Carissa Labriola, ARM, opens a promised series of posts with an intelligent and thorough analysis of the Arduino Due, and there is even the chance to win one. This is a refreshingly interactive, focused blog for the engineering community.

It’s coming to the end of the year, so it is only to be expected that there is a blog round-up. Real Intent does not disappoint, and Graham Bell provides his ‘Best of’ with links to blog posts, an interview at TechCon and a survey.

There is a medical feel to the blog by Shelly Stalnake, Mentor Graphics, beginning with a biology textbook image of an organism to lead into an interesting discussion on parasitic extraction. She lists some advice and, more importantly, links to resources to beat the ‘pests’.

Always considerate of his readers, Michael Posner, Synopsys, opens his blog with a warning that it contains technical content. He goes on to unlock the secrets of ASIC clock conversion, referencing Synopsys of course, but also some other sources to get to grips with this prototyping tool. And in the spirit of Christmas, he also has a giveaway, a signed copy of an FPGA-Based Prototyping Methodology Manual if you can answer a question about HAPS shipments.

Another list is presented by Steve Carlson, Cadence, but his is no wishlist or ‘best of’; in fact it’s a ‘worst of’, with the top five issues that can cause mixed-signal verification misery. This blog is one of the liveliest and most colorful this week, with some quirky graphics to accompany the sound advice that he shares on this topic.

Blog Review – Monday December 08 2014

Wednesday, December 10th, 2014

Industry forecasts sustained semi growth; EVs just go on and on; Second-chance webinar; Tickets please; Play time; Missed parade

By Caroline Hayes, Senior Editor

Bringing 2014 to a close on an optimistic note, Falan Yinug, director, Industry Statistics & Economic Policy, Semiconductor Industry Association (SIA), tries to understand the industry’s quirky sense of timing while reporting that the World Semiconductor Trade Statistics (WSTS) program revised its full-year 2014 global semiconductor sales growth forecast to 9% ($333.2 billion in total sales), an increase from the 6.5% it forecast in June. It also forecasts that the positive sales trend will continue with a 3.4% increase in sales in 2015 ($344.5 billion in total sales) and beyond, with $355.3 billion in 2016.

First road rage, now range anxiety. Apparently it is a common ailment for EV (electric vehicle) drivers. John Day, Mentor Graphics, takes heart from a report by IDTechEx which says that a range extender will be fitted to each of the 8 million hybrid cars produced in 2025 and predicts the introduction in 2015 of hybrid EVs with fuel cell range extenders and multi-fuel jet engines to increase driver options.

It’s hardly a stretch to find someone who remembers using public transport before MIFARE ticketing, but Nav Bains, NXP, looks at the next stage: a single, interoperable programming interface that lets commuters tap NFC mobile devices for ticketing.

More time-warp timings, as Phil Dworsky, ARM, tells of a webinar entitled Avoiding Common Pitfalls in Verifying Cache-Coherent ARM-based Designs, which has been and gone but can be watched again, simply by registering. He even names the speakers (Neill Mullinger and Tushar Mattu, both Synopsys) and lists what you missed but can catch again in the recorded webinar.

Enamoured with e code, Hannes, Cadence, directs people who just don’t get it to the edaplayground website, with links to a video for e-beginners.

Recap of what you missed, impactful blogs from the last 3 months
Perhaps frustrated that no-one seems to have noticed, Michael Posner, Synopsys, patiently outlines some of his favourite blog posts from the last couple of months. He wants to draw your attention to prototyping in particular (it features heavily in the list) as well as abstract partitioning and the joy of vertical boards.

IoT Cookbook: Analog and Digital Fusion Bus Recipe

Tuesday, December 2nd, 2014

Experts from ARM, Mathworks, Cadence, Synopsys, Analog Devices, Atrenta, Hillcrest Labs and STMicroelectronics cook up ways to integrate analog with IoT buses.

By John Blyler, Editorial Director

Many embedded engineers approach the development of Internet-of-Things (IoT) devices like a cookbook. By following previous embedded recipes, they hope to create new and deliciously innovative applications. While the recipes may be similar, today’s IoT uses a strong concentration of analog, sensor and wireless ingredients. How will these parts combine with the available high-end bus structures like ARM’s AMBA? To find out, “IoT Embedded Systems” talked with the head technical cooks, including Paul Williamson, Senior Marketing Manager, ARM; Rob O’Reilly, Senior Member Technical Staff at Analog Devices; Mladen Nizic, Engineering Director, Mixed Signal Solution, Cadence; Ron Lowman, Strategic Marketing Manager for IoT, Synopsys; Corey Mathis, Industry Marketing Manager - Communications, Electronics and Semiconductors, MathWorks; Daniel Chaitow, Marketing Manager, Hillcrest Labs; Bernard Murphy, CTO, Atrenta; and Sean Newton, Field Applications Engineering Manager, STMicroelectronics. What follows is a portion of their responses. - JB

Key points:

  • System-level design is needed so that the bus interface can control the analog peripheral through a variety of modes and power-efficient scenarios.
  • One industry challenge is to sort the various sensor data streams in sequence, in types, and include the ability to do sample or rate conversion.
  • To ensure the correct sampling of analog sensor signals and the proper timing of all control and data signals, cycle accurate simulations must be performed.
  • Control system and sensor subsystems are needed to help reduce digital bus cycles by tightly integrating the necessary components.
  • Hardware design and software design have inherently different workflows, and as a result, use different design tools and methodologies.
  • For low-power IoT sensors, the analog-digital converter (ADC) power supply must be designed to minimize noise. Attention must also be paid to the routing of analog signals between the sensors and the ADC.
  • Beyond basic sensor interfacing, designers should consider digitally assisted analog (DAA) – or digital logic embedded in analog circuitry that functions as a digital signal processor.

Blyler: What challenges do designers face when integrating analog sensor and wireless IP with digital buses like ARM’s AMBA and others?

Williamson (ARM): Designers need to consider system-level performance when designing the interface between the processor core and the analog peripherals. For example a sensor peripheral might be running continuously, providing data to the CPU only when event thresholds are reached. Alternatively the analog sensor may be passing bursts of sampled data to the CPU for processing.  These different scenarios may require that the designer develop a digital interface that offers simple register control, or more advanced memory access. The design of the interface needs to enable control of the peripheral through a broad range of modes and in a manner that optimizes power efficiency at a system and application level.

O’Reilly (Analog Devices): One challenge is ultra-low-power design, which is needed to manage overall system power consumption. In IoT systems, typically there is one main SoC connected with multiple sensors running at different Output Data Rates (ODR) using asynchronous clocking. The application processor SoC collects the data from multiple sensors and completes the processing. To keep power consumption low, the SoC generally isn’t active all of the time; it collects data at certain intervals. To support the needs of sensor fusion, the sensor data must include time information. This highlights the second challenge: the ability to align a variety of different data types in the time sequence required for fusion processing. This raises the question, “How can an entire industry adequately sort the various sensor data streams in sequence, in types, and include the ability to do sample or rate conversion?”
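The alignment problem described above can be illustrated with a minimal Python sketch. It assumes each sensor delivers timestamped samples and uses simple linear interpolation as the rate conversion; the stream names, rates and values are hypothetical, and a production fusion pipeline would likely use a more careful resampling filter.

```python
from bisect import bisect_left

def resample(samples, timeline):
    """Linearly interpolate sorted (timestamp, value) samples onto a timeline.

    Points outside the sampled interval are clamped to the nearest sample.
    """
    times = [t for t, _ in samples]
    out = []
    for t in timeline:
        i = bisect_left(times, t)
        if i == 0:
            out.append(samples[0][1])
        elif i == len(samples):
            out.append(samples[-1][1])
        else:
            (t0, v0), (t1, v1) = samples[i - 1], samples[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

# Hypothetical streams with timestamps in milliseconds:
# an accelerometer at 100 Hz and a magnetometer at 40 Hz.
accel = [(0, 0.0), (10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0), (50, 5.0)]
mag = [(0, 10.0), (25, 20.0), (50, 30.0)]

# Common 100 Hz timeline on which the fusion algorithm would run.
timeline = [0, 10, 20, 30, 40, 50]
fused = list(zip(timeline, resample(accel, timeline), resample(mag, timeline)))

for row in fused:
    print(row)
```

Each row of `fused` holds one timestamp with a value from every stream, which is the time-ordered form that fusion processing needs. Without timestamps on the raw samples, this step is not possible.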

Nizic (Cadence): Typically a sensor will generate a small (low voltage/current) analog signal which needs to be properly conditioned and amplified before it is converted to a digital signal and sent over a bus to a memory register for further processing by a DSP or a controller. Sometimes, to save area, multiple sensor signals are multiplexed (sampled) to reduce the number of A2D converters.

From the design methodology aspect, the biggest design challenge is verification. To ensure analog sensor signals are sampled correctly and all control and data signals are timed properly, cycle-accurate simulations must be performed. Since these systems now contain analog, in addition to digital and bus protocol verification, a mixed-signal simulation must cover both hardware and software. To effectively apply mixed-signal simulation, designers must model and abstract the behavior of sensors, analog multiplexers, A2D converters and other analog components. On the physical implementation side, busses will require increased routing resources, which in turn means more careful floor-planning and routing of bus and analog signals to keep chip area at a minimum and avoid signal interference.
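As a rough illustration of what abstracting the analog components can mean, the following Python sketch models several sensors sharing one A2D converter through a round-robin multiplexer, with one conversion per cycle. It is a toy behavioral model under stated assumptions (an ideal unipolar 10-bit converter, a 1.0 V reference, invented sensor waveforms), not a description of any Cadence flow; real mixed-signal verification would use dedicated behavioral modeling languages and simulators.

```python
def adc_convert(voltage, vref=1.0, bits=10):
    """Ideal unipolar ADC: quantize a voltage to an integer code, clamped to range."""
    levels = 1 << bits
    code = int(voltage / vref * levels)
    return max(0, min(levels - 1, code))

def sample_round_robin(sensors, cycles, vref=1.0, bits=10):
    """Perform one conversion per cycle, selecting sensor channels in turn."""
    results = []
    for cycle in range(cycles):
        channel = cycle % len(sensors)        # analog multiplexer select
        voltage = sensors[channel](cycle)     # abstract sensor model
        results.append((cycle, channel, adc_convert(voltage, vref, bits)))
    return results

# Hypothetical sensor models: each maps a cycle number to a voltage.
sensors = [
    lambda cycle: 0.25,                   # constant 0.25 V
    lambda cycle: 0.5,                    # constant 0.5 V
    lambda cycle: min(1.0, 0.1 * cycle),  # slow ramp, saturating at 1.0 V
]

for cycle, channel, code in sample_round_robin(sensors, cycles=6):
    print(f"cycle {cycle}: channel {channel} -> code {code}")
```

Because every conversion is tied to an explicit cycle number, a model like this makes it easy to check which channel was sampled when, which is the kind of timing question the cycle-accurate simulation is meant to answer.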

Lowman (Synopsys): For an IC designer, the digital bus provides a very easy way to snap together an IC by hanging interface controllers such as I2C, SPI, and UARTs to connect to sensors and wireless controllers.  It’s also an easy method to hang USB and Ethernet, as well as analog interfaces, memories and processing engines.  However, things are a bit more complicated on the system level. For example, the sensor in a control system helps some actuator know what to do and when to do it.  The challenge is that there is a delay in bus cycles from sensing to calculating a response to actually delivering a response that ultimately optimizes the control and efficiency of the system.  Examples include motor control, vision systems and power conversion applications. Ideally, you’d want a sensor and control subsystem that has optimized 9D Sensor Fusion application. This subsystem significantly reduces cycles spent traveling over a digital bus by essentially removing the bus and tightly integrating the necessary components needed to sense and process the algorithms. This technique will be critical to reducing power and increasing performance of IoT control systems and sensor applications in a deeply embedded world.

Mathis (Mathworks): It is no surprise that mathematical and signal processing algorithms of increasing complexity are driving many of the innovations in embedded IoT. This trend is partly enabled by the increasing capability of SoC hardware being deployed for the IoT. These SoCs provide embedded engineers greater flexibility regarding where the algorithms get implemented. The greater flexibility, however, leads to new questions in early stage design exploration. Where should the (analog and mixed) signal processing of that data occur? Should it occur in a hardware implementation, which is natively faster but more costly in on-chip resources? Or in software, where inherent latency issues may exist? One key challenge we see is that hardware design and software design have inherently different workflows, and as a result, use different design tools and methodologies. This means SoC architects need to be fluent in both C and HDL, and the hardware/software co-design environments needed for both. Another key challenge is that this integration further exacerbates the functional, gate- or circuit-level, and final sign-off verification problems that have dogged designers for decades. Interestingly, designers facing either or both of these key challenges could benefit significantly from top-down design and verification methodologies. (See last month’s discussion, “Is Hardware Really That Much Different From Software?”)

Chaitow (Hillcrest Labs): In most sensor-based applications, data is ultimately processed in the digital realm so an analog to digital conversion has to occur somewhere in the system before the processing occurs. MEMS sensors measure tiny variations in capacitance, and amplification of that signal is necessary to allow sufficient swing in the signal to ensure a reasonable resolution. Typically the analog to digital conversion is performed at the sensor to allow for reduction of error in the measurement. Errors are generally present because of the presence of noise in the system, but the design of the sensing element and amplifiers have attributes that contribute to error. For a given sensing system minimizing the noise is therefore paramount. The power supply of the ADC needs to be carefully designed to minimize noise and the routing of analog signals between the sensors and the ADC requires careful layout. If the ADC is part of an MCU, then the power regulation of the ADC and the isolation of the analog front end from the digital side of the system is vital to ensure an effective sampling system.

As always with design there are many tradeoffs. A given analog MEMS supplier may be able to provide a superior measurement system to a MEMS supplier that provides a digital output. By accepting the additional complexity of the mixed-signal system and combining the analog sensor with a capable ADC, an improved measurement system can be built. In addition if the application requires multiple sensors, using a single external multiple channel ADC with analog sensors can yield a less expensive system, which will be increasingly important as the IoT revolution continues.

Murphy (Atrenta): Aside from the software needs, there are design and integration considerations. On the design side, there is nothing very odd. The sensor needs to be presented to an AMBA fabric as a slave of some variety (eg APB or AHB), which means it needs all the digital logic to act as a well-behaved slave (see Figure). It should recognize it is not guaranteed to be serviced on demand and therefore should support internal buffering (streaming buffer if an output device for audio, video or other real-time signal). Sensors can be power-hungry so they should support power down that can be signaled by the bus (as requested by software).
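The buffering and power-down behavior described above can be sketched in a few lines of Python. This is only a behavioral illustration: the class name, method names and FIFO depth are hypothetical, and it does not model the actual APB or AHB signaling.

```python
from collections import deque

class BufferedSensorSlave:
    """Toy model of a sensor peripheral that buffers samples until the bus reads them."""

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()
        self.overruns = 0
        self.powered = True

    def push_sample(self, value):
        """Analog side: store a new sample, dropping the oldest if the FIFO is full."""
        if not self.powered:
            return
        if len(self.fifo) == self.depth:
            self.fifo.popleft()
            self.overruns += 1
        self.fifo.append(value)

    def bus_read(self):
        """Bus side: return the oldest buffered sample, or None if the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None

    def set_power(self, on):
        """Bus side: software-requested power control; powering down flushes the FIFO."""
        self.powered = on
        if not on:
            self.fifo.clear()

slave = BufferedSensorSlave(depth=2)
for sample in (1, 2, 3):          # three samples arrive before the bus services the slave
    slave.push_sample(sample)
print(slave.bus_read(), slave.overruns)
```

The overrun counter is one simple way for software to learn that the slave was not serviced in time; how deep the buffer must be depends on the sample rate and the worst-case bus latency.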

The implementation side is definitely more interesting. All of that logic is generally bundled with the analog circuitry into one AMS block and it is usually difficult to pin down a floor-plan outline on such a block until quite close to final layout. This makes full-chip floor planning more challenging because you are connecting to an AMBA switch fabric, which likes to connect to well-constrained interfaces because the switch matrix itself doesn’t constrain layout well on its own. This may lead to a little more iteration of the floor plan than you otherwise might expect.

Beyond basic sensor interfacing, you need to consider digitally assisted analog (DAA). This is when you have digital logic embedded in analog circuitry, functioning as a digital signal processor to perform what is effectively an analog function, but perhaps more flexibly and certainly with more programmability than analog circuitry. Typical applications are for beamforming in radio transmission and for super-accurate ADCs.

Figure: The AMBA Bus SoC Platform is configurable, with several peripherals and system functions, e.g., AHB bus(es), APB bus(es), arbiters and decoders. Popular peripherals include RAM controllers, Ethernet, PCI, USB, 1394a, UARTs, PWMs and PIOs. (Courtesy of ARM Community - http://community.arm.com/docs/DOC-3752)

Newton (STMicroelectronics): Integration of devices such as analog sensors and wireless IP (radios) is widespread today via the use of standard digital bus interfaces such as I2C and SPI. Integration of analog IP with a bus – such as ARM’s AMBA – becomes a matter of connecting the relevant buses to the digital registers contained within the IP. This is exactly what happens when you use I2C or SPI to communicate with standalone sensors or a wireless radio, with the low-speed bus interfaces giving external access to the internal registers of the analog IP. The challenge of integrating with devices that have higher-end busses lies not so much in the bus interface as in defining and qualifying the resulting SoC. In particular, designers must settle the packaging characteristics, the number of GPIOs available, the size of the package, the type of processing device used (MPU or MCU), internal memory capability such as flash or internal SRAM, and of course the power capabilities of the device in question: does it need very low standby power? Wake capability? Most of these questions are driven by market requirements and capabilities and must be weighed against the cost and complexity of the integration effort.

Blyler: Thank you.

This article was sponsored by ARM.

ARM and Cortex are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. mbed is a trademark of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.

Hot Trends for 2015

Tuesday, December 2nd, 2014

Chi-Ping Hsu, Senior Vice President, Chief Strategy Officer, EDA and Chief of Staff to the CEO at Cadence

The new system-design imperative

We’re at a tipping point in system design. In the past, the consumer hung on every word from technology wizards, looking longingly at what was to come. But today, the consumer calls the shots and drives the pace and specifications of future technology directions. This has fostered, in part, a new breed of system design companies that has taken direct control over the semiconductor content.

These systems companies are reaping business (pricing, availability), technical (broader scope of optimization) and strategic (IP protection, secrecy) benefits.  This is clearly a trend in which the winning systems companies are partaking.

They’re less interested in plucking components from shelves and soldering them to boards and much more interested in conceiving, implementing and verifying their systems holistically, from application software down to chip, board and package. To this end, they are embracing the marriage of EDA and IP as a speedy and efficient means of enabling their system visions. For companies positioned with the proper products and services, the growth opportunities in 2015 are enormous.

The shift left

Time-to-market pressures and system complexity force another reconsideration in how systems are designed. Take verification for example. Systems design companies are increasingly designing at higher levels, which requires understanding and validating software earlier in the process. This has led to the “shift left” phenomenon.

The simple way to think about this trend is that everything that was done “later” in the design flow is now being started “earlier” (e.g., software development begins before hardware is completed).  Another way to visualize this macroscopic change is to think about the familiar system development “V-diagram” (Figure 1 below). The essence of this evolution is the examination of any and all dependencies in the product planning and development process to understand how they can be made to overlap in time.

This overlap creates the complication of “more moving parts” but it also enables co-optimization across domains.  Thus, the right side of the “V” shifts left (Figure 2 below) to form more of an accelerated flow. (Note: for all of the engineers in the room, don’t be too literal or precise; it is meant to be thematic of the trend).

FIGURE 1

Prime examples of the shift left are the efforts in software development that are early enough to contemplate hardware changes (i.e., hardware optimization and hardware dependent software optimization), while at the other end of the spectrum we see early collaboration between the foundry, EDA tool makers and IP suppliers to co-optimize the overall enablement offering to maximize the value proposition of the new node.

A by-product of the early software development is the enablement of software-driven verification methodologies that can be used to verify that the integration of sub-systems does not break the design. Another benefit is that performance and energy can be optimized in the system context with both hardware and software optimizations possible.  And, it is no longer just performance and power – quality, security and safety are also moving to the top level of concerns.

FIGURE 2

Chip-package-board interdependencies

Another design area being revolutionized is packaging. Form factors, price points, performance and power are drivers behind squeezing out new ideas.  The lines between PCB, package, interposer and chip are being blurred.

Having design environments that are familiar to the principals in system interconnect creation, whether they are PCB-, package- or die-centric by nature, provides a cockpit from which the cross-fabric structures can be created and optimized. Being able to provide all of the environments also means that interoperable data sharing is smooth between the domains. Possessing analysis tools that operate independently of the design environment offers consistent results for all parties incorporating the cross-fabric interface data. In particular, power and signal integrity are critical analyses to ensure design tolerances without risking the cost penalties of overdesign.

The rise of mixed-signal design

In general, but especially driven by the rise of Internet of Things (IoT) applications, mixed-signal design has soared in recent years. Some experts estimate that as much as 85% of all designs have at least some mixed-signal elements on board.

Figure 3: IBS Mixed-signal design start forecast (source: IBS)

Being able to leverage high quality, high performance mixed signal IP is a very powerful solution to the complexity of mixed signal design in advanced nodes. Energy-efficient design features are also pervasive.  Standards support for power reduction strategies (from multi-supply voltage, voltage/frequency scaling, and power shut-down to multi-threshold cells) can be applied across the array of analysis, verification and optimization technologies.

To verify these designs, the industry has been a little slower to migrate. The reality is that there is only so much tool and methodology change that can be digested by a design team while it remains immersed in the machine that cranks out new designs.  So, offering a step-by-step progression that lends itself to incremental progress is what has been devised.  “Beginning with the end in mind” has been the mantra of the legions of SoC verification teams that start with a sketch of the outcome desired in the planning and management phase at the beginning of the program. The industry best practices are summarized as: MD-UVM-MS – that is, metrics-driven unified verification methodology with mixed signal.

Figure 4: Path to MS Verification Greatness

Blog Review – Monday November 24, 2014

Monday, November 24th, 2014

Call for new technology for SoC verification; Four steps to integrated design; Securing the IoT; My Big Data is bigger than yours; Immigration issues

Cadence fellow Mike Stellfox is the subject of an interesting Q&A, relayed by Richard Goering, Cadence, where he talks about UVM at SoC and system level and the need for a new approach.

Essential tips from a keynote in Japan by architect Cristiano Ceccato are put to good use by Akio, Dassault Systèmes. It turns out that there are parallels with bricks and mortar for those dealing with IP blocks and design teams.

An optimistic note for a secure IoT is sounded by Zach Shelby, ARM, as he details the component parts in this blog.

Perhaps growing tired of empty boasting, Michael Ford looks at just how big data collection is on today’s factory floor, and how savings can be made.

The fact that 3,000 copies of its virtual prototyping book have been distributed is the least noteworthy item in the blog by Tom De Schutter, Synopsys. A follow-up survey has produced some interesting views on software challenges for virtual prototyping.

Taking a different view to Europe, which is currently wrestling with immigration controls, limits and quotas, Peter Muller, Intel and Brian Toohey, SIA welcome the initiatives of President Obama for increasing the skilled visa program that should benefit the industry.

Blog Review – Monday, Nov. 17 2014

Monday, November 17th, 2014

Harking back to analog; What to wear in wearables week; Multicore catch-up; Trusting biometrics
By Caroline Hayes, Senior Editor.

Adding a touch of nostalgia, Richard Goering, Cadence, reviews the mixed-signal keynote that Boris Murmann gave at the Mixed-Signal Summit at Cadence HQ. His ideas for reinvigorating the role of analog make interesting reading.

As if there wasn’t enough stress about what to wear, ARM adds to it with its Wearables Week, although David Blaza finds that Shane Walker, IHS, is pretty relaxed, offering a positive view of the wearables and medical market.

Practice makes perfect, believes Colin Walls, Mentor, who uses his blog to highlight common misconceptions of C++, multicore and MCAPI for communication and synchronisation between cores.

Biometrics are popular and ubiquitous but Thomas Suwald, NXP looks at what needs to be done for secure integration and the future of authentication.

Blog review – Monday, Nov. 03 2014

Monday, November 3rd, 2014

Intel cooks up a vision of the IoT; Cadence turns up the verification volume; Synopsys celebrates being ‘-free’; ARM and AMD join RapidIO.org; Imagination adds some details to wearable devices.

Envisioning the future scenario of the connected kitchen, Dylan Jarson considers the inter-cuisine communication but also the demands this will place on data centers and the changes that this may mean. (His vision of appliances talking to you and to each other makes a refreshing change from the teenager monologue heard in our kitchen: “What is there to eat? When’s lunch? I’m hungry/starving/famished”.)

Cadence, just like Pink, wants to get this party started. Steve Carlson celebrates and urges everyone to join in the SoC verification and even provides a comprehensive list of ingredients (and diagrams) needed for a good mix of progress and innovation.

An interesting premise is proposed by Tom De Schutter: enjoy what is not there. He is talking about hardware-free software development and adapts a gluten-free marketing slogan for engineers who might be hardware-intolerant.

Steve Leibson delivers RapidIO news: ARM and AMD have joined the switched fabric interconnect organization. Xilinx, as a RapidIO.org member, will track the 64-bit processor in preparation for the 64-bit processor specification.

Part of a series, Alexandru Voica adds some stats about wearable devices to pad out a ‘teaser blog’ with links to two SoCs from Imagination’s partners.

Pushing the Performance Boundaries of ARM Cortex-M Processors for Future Embedded Design

Friday, October 31st, 2014

By Ravi Andrew and Madhuparna Datta, Cadence Design Systems

One of the toughest challenges in the implementation of any processor is balancing the need for the highest performance with the conflicting demands for the lowest possible power and area. Inevitably, there is a tradeoff between power, performance, and area (PPA). This paper examines two unique challenges for design automation methodologies in the new ARM® Cortex®-M processor: how to get maximum performance while designing for a set power budget, and how to get maximum power savings while optimizing for a set target frequency.

Introduction

The ARM® Cortex®-M7 processor is the latest embedded processor by ARM, specifically developed to address digital signal control markets that demand an efficient, easy-to-use blend of control and signal processing capabilities. The ARM Cortex-M7 processor has been designed with a large variety of highly efficient signal processing features, which demands a very power-efficient design.

Figure 1: ARM Cortex-M7 Block Diagram

The energy-efficient, easy-to-use microprocessors in the ARM Cortex-M series have received a large amount of attention recently as portable and wireless / embedded applications have gained market share. In high-performance designs, power has become an issue since at those frequencies power dissipation can easily reach several tens of watts.   The efficient handling of these power levels requires complex heat dissipation techniques at the system level, ultimately resulting in higher costs and potential reliability issues. In this section, we will isolate the different components of power consumption on a chip to demonstrate why power has become a significant issue. The remaining sections will discuss how we approached this problem and resolved it using Cadence® implementation tools, along with other design techniques.

We began the project with the objective of addressing two simultaneous challenges:

1. Reach, as fast as possible, a performance level with optimal power (AFAP)

2. Reduce power to the minimum for a lower frequency scenario (MinPower)

Before getting into the details of how we achieved the desired frequency and power numbers, let’s first examine the components which contribute to dynamic power and the factors which gate the frequency push. This experiment has been conducted on the ARM Cortex-M7 processor. The ARM Cortex-M7 processor has achieved 5 CoreMark/MHz – 2000 CoreMark* in 40LP and typically 2X the digital signal processing (DSP) performance of the ARM Cortex-M4 processor.

Dynamic power components

In high-performance microprocessors, several factors are driving a rise in power dissipation. First, the large number of devices and wires integrated on a big chip increases the total capacitance of the design. Second, the drive for higher performance leads to increasing clock frequencies, and dynamic power is directly proportional to the rate of charging capacitances (in other words, the clock frequency). A third factor that may lead to higher power consumption is an inefficient use of gates.

The total switching device capacitance consists of gate oxide capacitance, overlap capacitance, and junction capacitance. In addition, we consider the impact of internal nodes of a complex logic gate. For example, the junction capacitance of the series-connected NMOS transistors in a NAND gate contributes to the total switching capacitance, although it does not appear at the output node.

Dynamic power is consumed when a gate switches. Interest has risen in the physical design area in making better use of the available gates by increasing the ratio of clock cycles in which a gate actually switches; this increased device activity also leads to rising power consumption. Dynamic power is the largest component of total chip power consumption (the other components are short-circuit power and leakage power). It occurs as a result of charging capacitive loads at the output of gates. These capacitive loads are in the form of wiring capacitance, junction capacitance, and the input (gate) capacitance of the fan-out gates. Since leakage is <2% of total power, the focus of this collaboration was only on dynamic power.

The expression for dynamic power is:

$$P_{dyn} = \alpha \, C \, V_{dd}^{2} \, f \qquad (1)$$

In (1), C denotes the capacitance being charged/discharged, Vdd is the supply voltage, f is the frequency of operation, and α is the switching activity factor. This expression assumes that the output load experiences a full voltage swing of Vdd. If this is not the case, and there are circuits that take advantage of this fact, (1) becomes proportional to (Vdd * Vswing). A brief discussion of the switching factor α is in order at this point. The switching factor is defined in this model as the probability of a gate experiencing an output low-to-high transition in an arbitrary clock cycle. For instance, a clock buffer sees both a low-to-high and a high-to-low transition in each clock cycle. Therefore, α for a clock signal is 1, as there is unity probability that the buffer will have an energy-consuming transition in a given cycle. Fortunately, most circuits have activity factors much smaller than 1. Some typical values for logic might be about 0.5 for data path logic and 0.03 to 0.05 for control logic. In most instances we will use a default value of 0.15 for α, which is in keeping with values reported in the literature for static CMOS designs [1,2,3]. Notable exceptions to this assumption will be in cache memories, where read/write operations take place nearly every cycle, and clock-related circuits.
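To make the relationship concrete, here is a small Python sketch that evaluates dynamic power as the product of α, C, Vdd, the voltage swing and f, which reduces to α·C·Vdd²·f for a full swing. The capacitance, supply voltage and frequency below are hypothetical values chosen only for illustration; they are not measurements from the Cortex-M7 implementation.

```python
def dynamic_power(alpha, capacitance, vdd, frequency, vswing=None):
    """Dynamic power in watts: alpha * C * Vdd * Vswing * f.

    With the default full swing (Vswing == Vdd) this is alpha * C * Vdd^2 * f.
    """
    if vswing is None:
        vswing = vdd
    return alpha * capacitance * vdd * vswing * frequency

# Hypothetical operating point (assumed, not taken from the paper).
C_TOTAL = 1e-9   # 1 nF of total switched capacitance
VDD = 1.1        # supply voltage in volts
FREQ = 400e6     # 400 MHz clock

p_logic = dynamic_power(0.15, C_TOTAL, VDD, FREQ)         # default activity factor
p_clock = dynamic_power(1.0, C_TOTAL, VDD, FREQ)          # clock-like node, alpha = 1
p_half_freq = dynamic_power(0.15, C_TOTAL, VDD, FREQ / 2)
p_low_vdd = dynamic_power(0.15, C_TOTAL, 0.9 * VDD, FREQ)

print(f"alpha = 0.15:         {p_logic * 1e3:.1f} mW")
print(f"alpha = 1.0 (clock):  {p_clock * 1e3:.1f} mW")
print(f"half frequency:       {p_half_freq * 1e3:.1f} mW")
print(f"10% lower Vdd:        {p_low_vdd * 1e3:.1f} mW")
```

The comparison shows why voltage is such an effective lever: halving the frequency halves the power, while a 10% reduction in Vdd alone removes about 19% because of the quadratic dependence.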

Here are five key components of dynamic power consumption and how we addressed a few of these components:

• Standard cell logic and local wiring

• Global interconnect (mainly busses, inter-modular routing, and other control)

• Global clock distribution (drivers + interconnect + sequential elements)

• Memory (on-chip caches), which is constant in our case

• I/Os (drivers + off-chip capacitive loads), which are constant in our case

Timing closure components

One fundamental issue of timing closure is the modeling of physical overcrowding.  The problem involves, among other factors, the representation and the handling of layout issues. These issues include placement congestion, overlapping of arbitrary-shaped components, routing congestion due to power/ground, clock distribution, signal interconnect, prefixed wires over components, and forbidden regions of engineering concerns.  While a clean and universal mathematical model of physical constraint remains open, we tend to formulate the layout problem using multiple constraints with sophisticated details that complicate the implementation. We need to consider multiple constraints with a unified objective function for a timing-closure design process. This is essential because many constraints are mutually conflicting if we view and handle their effects only on the surface. For example, to ease the routing congestion of a local area, we tend to distribute components out of the area to leave more room for routing.  However, for multi-layer routing technology, eliminating components does not save much on routing area. The spreading of components actually increases the wire length and demands more routing space. The resultant effect can have a negative impact on the goals of the original design. In fact, the timing can become much worse. Consequently, we need an intelligent operation that identifies both the component to move out and the component to move in to improve the design.

Accurately predicting detail-routed signal-integrity (SI) effects, and their impact on timing, before detail routing happens is of key interest. This is because a sizable misprediction of timing before detail routing creates timing jumps after routing is done. Historically, designs for which it is tough to close timing have relied solely on post-route optimization to salvage setup/hold timing. With the advent of “in-route optimization”, this gap is bridged earlier, during the routing step itself, using track assignment. In addition, if we can reduce the wire lengths and make good judgment calls based on the timing profiles, we can find opportunities to further reduce power. This paper will walk through the Cadence digital implementation flow and the new tool options used to generate performance benefits for the design. The paper will also discuss the flow and tool changes that were made to get the best performance and power efficiency out of the ARM Cortex-M7 processor implementation.

Better Placement and Reduced Wirelength for Better Timing and Lower Power

As discussed in the introduction, wire capacitance and gate capacitance are among the key factors that impact dynamic power, while also affecting wire delays. While evaluating the floorplan and cell placement, it was noticed that the floorplan size was bigger than needed and the cell placement density was uniform. These two aspects could lead to spreading out of cells, resulting in longer wirelength and higher clock latencies. In order to improve the placement densities, certain portions of the design were soft-blocked, and the standard cell densities were kept above 75%.
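The link between cell spreading and wirelength can be sketched with half-perimeter wirelength (HPWL), a common placement-stage estimate. This is only an illustrative model: the text does not say which wirelength metric the tools use, and the nets, coordinates, and function names below are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net from its pin (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum the HPWL estimate over all nets."""
    return sum(hpwl(pins) for pins in nets)


def spread(nets, factor):
    """Scale every pin coordinate, mimicking cells spread over a larger floorplan."""
    return [[(x * factor, y * factor) for x, y in pins] for pins in nets]


# Hypothetical nets; coordinates in microns.
nets = [
    [(0, 0), (10, 20), (30, 5)],
    [(5, 5), (15, 5)],
]

compact = total_hpwl(nets)             # tighter placement
loose = total_hpwl(spread(nets, 1.5))  # same netlist, cells spread out by 1.5x

print(f"compact: {compact} um, spread: {loose} um")
```

In this toy model the estimate scales linearly with the spreading factor, which is the intuition behind shrinking an oversized floorplan to cut wire capacitance.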

Figure 2: Soft-Blocked Floorplan

Standard cell placement plays a vital role. If the placement is done right, it will eventually pay off in terms of better Quality of Results (QoR) and wirelength reduction. If the placement algorithms can take into account some of the power dissipation-related issues, such as reducing the wirelength and considering the overall slack profile of the design, and also make the right moves during placement, both of these improve considerably. This is the core principle behind the “GigaPlace” placement engine. The GigaPlace engine, available in Cadence Encounter® Digital Implementation System 14.1, places the cells in a timing-driven mode by building up the slack profile of the paths and performing placement adjustments based on these timing slacks. We introduced this new placement engine on the ARM Cortex-M7 design and saw good improvements in overall wirelength and Total Negative Slack (TNS).
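For readers less familiar with the metric, TNS is conventionally the sum of the slacks of all violating timing endpoints, and worst negative slack (WNS) is the single most negative one. The minimal sketch below uses hypothetical slack values in nanoseconds; it illustrates the metrics only and says nothing about how the placement engine computes or uses them internally.

```python
def worst_negative_slack(slacks_ns):
    """Most negative endpoint slack, or 0.0 if every endpoint meets timing."""
    return min(min(slacks_ns, default=0.0), 0.0)


def total_negative_slack(slacks_ns):
    """Sum of the slacks of all violating endpoints (those with slack < 0)."""
    return sum(s for s in slacks_ns if s < 0)


# Hypothetical endpoint slacks (ns) before and after a placement change.
before = [-0.5, -0.25, 0.125, 0.5]
after = [-0.25, 0.0, 0.125, 0.5]

print("TNS before:", total_negative_slack(before))
print("TNS after: ", total_negative_slack(after))
print("WNS before:", worst_negative_slack(before))
```

A slack profile is simply this per-endpoint list. A timing-driven placer can prioritize the cells on the most negative paths rather than treating all nets equally.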

Figure 3: “GigaPlace” Placement Engine

With a reduced floorplan and by removing the uniform placement and utilizing the new GigaPlace technologies, we were able to reduce the wirelength significantly. This helped push the frequency as well as reduce the power. But, there were still more opportunities available to further benefit the frequency and dynamic power targets.

Figure 4: Wirelength Reduced with “GigaPlace and Soft-Blocked” Placement

Figure 5: Total Negative Slack (ns) Chart

In-Route Optimization: SI-Aware Optimization Before Routing to Achieve Final Frequency Target

“In-route optimization” for timing happens before detail routing begins, working on track-assigned routes. These are a very close representation of the real routes, although they do not account for DRC fixes and leaf-cell pin access. This gives us an accurate view of timing/SI and lets us make bigger changes without disrupting the routes. These changes are then committed to a full detail route. In-route optimization technology uses an internal extraction engine for more effective RC modeling. The timing QoR improvement observed after post-route optimization was significant, at the expense of a slight runtime increase (currently observed at only 2%). Successful use of the internal extraction model during in-route optimization helped reduce the timing divergence seen in going from the pre-route to the post-route stage. This optimization technology pushed the design to achieve the targeted frequency.

Figure 6: In-Route Optimization Flow Chart

Design Changes and Further Dynamic Power Reduction

In the majority of present-day electronic design automation (EDA) tools, timing closure is the top priority and, hence, many of these tools make trade-offs in favor of timing. However, opportunities exist to reduce area and gate capacitance by swapping cells for cells with lower gate capacitance and by reducing the wirelength. To address dynamic power reduction in the design, three major sets of experiments were done to examine these aspects.

In the first set of experiments, two main tool features were used in the process of reducing dynamic power.  These were the introduction of the “dynamic power optimization engine” along with the “area reclaim” feature in the post-route stage.  These options helped save 5% of dynamic power @400MHz and enabled us to nearly halve the gap that earlier existed between the actual and desired power target.

Figure 7: Example of Power Optimization

In the second set of experiments, the floorplan was soft-blocked by 100 microns to reduce the wirelength. This was discussed in detail in an earlier section. This floorplan shrink resulted in:

• Increasing the density from ~76% to 85%

• Wirelength reduction by 5.1% – post route

• Area (with combo of #1 and shrink) shrinkage by ~4% – post route

This helped save an additional 2% @400MHz, and the impact was similar across the frequency sweep.

The third set of experiments was related to design changes in which flops were downsized to the minimum size at pre-CTS optimization and the remaining flops of higher drive strengths were set to “don’t use”. This helped to further reduce the sequential power. An important point to note is that the combinational power did not increase significantly. After we introduced this technique, we were able to reduce power significantly, as shown in the charts below.

Results

By using these latest tool technologies and design techniques, we were able to achieve 10% better frequency and reduce the dynamic power by 10%. The dynamic power reduction results shown here are for the 400MHz and 200MHz runs.

Table 1: Dynamic Power Reduction Results

The joint ARM/Cadence work started with addressing challenges at two points/scenarios on the PPA curve:

1. Frequency focus with optimal power (400MHz)

2. Lowest power at reduced frequency (200MHz)

For scenario #1, release 14.1 allowed us to reach 400MHz out of the box. With the use of PowerOpt technology, available in Encounter Digital Implementation System 14.1, we were able to reduce power to an optimal number. For scenario #2, the additional use of GigaPlace technology and inherently better SI management allowed a relaxed clock slew, making a much higher power reduction possible at 200MHz. With the combination of ARM design techniques and Cadence tool features, we were able to show a 38% dynamic power reduction (for standard cells) going from the 400MHz 13.2-based run to the 200MHz 14.2 best-power-recipe run.

Summary

Reducing the wirelength, placing cells based on the slack profile, and predicting the impact of detailed routing early in the design are all important for improving performance and reducing dynamic power consumption. Tools perform better when given the right floorplan along with the proper directives at appropriate places. With a combination of design changes, advanced tools, and engineering expertise, today’s physical design engineers have the means to thoroughly address the challenges associated with timing closure while keeping the dynamic power consumption of their designs low.

Figure 8: Dynamic Power (Normalized) for Logic

Several months of collaborative work between ARM and Cadence, driven by many trials, have led to optimized PPA results. The Cadence tools (Encounter RTL Compiler/Encounter Digital Implementation System 14.1) have produced better results out of the box compared to Encounter RTL Compiler/Encounter Digital Implementation System 13.x. The continuous refinement of the flow, along with design techniques such as floorplan reduction and clock slew relaxation, allowed a 38% dynamic power reduction. The ARM/Cadence implementation Reference Methodology (iRM) flow uses a similar recipe for both scenarios: lowest power (MinP) and highest frequency (AFAP).

References

[1] D. Liu and C. Svensson, “Power consumption estimation in CMOS VLSI chips,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 663-670, June 1994.

[2] A.P. Chandrakasan and R.W. Brodersen, “Minimizing power consumption in digital CMOS circuits,” Proc. of the IEEE, vol. 83, pp. 498-523, April 1995.

[3] G. Gerosa, et al., “250 MHz 5-W PowerPC microprocessor,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 1635-1649, Nov. 1997.

Blog Review – Monday October 27 2014

Monday, October 27th, 2014

Synopsys won’t let the hybrid debate mess with your head; automating automotive verification; the write stuff; software’s role in wearable medical technology; ARM’s bandwidth stretching.
By Caroline Hayes, Senior Editor

Playing with your mind, Michael Posner, Synopsys, relishes a mashup blog, with a lion/zebra image to illustrate IP validation in software development. He does not tease the reader all through the blog though, and gives some sound advice on mixing it up with ARM-based systems for development and FPGA for validation, and combinations in between.

Indulging in a little bit of a promo-blog, Richard Goering deconstructs the Incisive additions of Functional Safety Simulator and Functional Safety Analysis for the vManager. We will let him off the indulgence, though, as the informative, well-researched piece is as much a blog for vehicle designers as it is for verification professionals.

Not that he needs much practice in a writing studio, Hamilton Carter is still turning up for class and finds parallels in the beauty of prose and the analysis of code. Instead of one replacing the other, he advocates supplementing one with the other so that the message and intent are clear for all.

Taking an appreciative step back, Helene at Dassault, reviews the medical market and how the wearable trend might influence it. She also looks at how the company’s software helps designers understand what is needed and create it.

There are plenty of diagrams to illustrate the point that Jakub Lamik is making in his blog on bandwidth consumption. After clearly setting out the culprits for bandwidth hunger, he lays out the ARM Mali GPU appetizers in a conversational yet detailed and very useful blog (and with a Chinese version available too).

Cortex-M processor Family at the Heart of IoT Systems

Saturday, October 25th, 2014

Gabe Moretti, Senior Editor

One cannot have a discussion about the semiconductor industry without hearing the word IoT.  It is really not a word as language lawyers will be ready to point out, but an abbreviation that stands for Internet of Things.  And, of course, the abbreviation is fundamentally incorrect, since the “things” will be connected in a variety of ways, not just the Internet.  In fact it is already clear that devices, grouped to form an intelligent subsystem of the IoT, will be connected using a number of protocols like: 6LoWPAN, ZigBee, WiFi, and Bluetooth.  ARM has developed the Cortex®-M processor family that is particularly well suited for providing processing power to devices that consume very low power in their duties of physical data acquisition. This is an instrumental function of the IoT.

Figure 1. The heterogeneous IoT: lots of “things” inter-connected. (Courtesy of ARM)

Figure 1 shows the vision the semiconductor industry holds of the IoT. I believe that the figure shows a goal the industry set for itself, and a very ambitious goal it is. At the moment the complete architecture of the IoT is undefined, and rightly so. The IoT re-introduces a paradigm first used when ASIC devices were thought of as being the ultimate solution to everyone’s computational requirements. The business of IP started as an enhancement to application-specific hardware, and now general purpose platforms constitute the core of most systems. IoT lets the application drive the architecture, and companies like ARM provide the core computational block with an off-the-shelf device like a Cortex MCU.

The ARM Cortex-M processor family is a range of scalable and compatible, energy efficient, easy to use processors designed to help developers meet the needs of tomorrow’s smart and connected embedded applications. Those demands include delivering more features at a lower cost, increasing connectivity, better code reuse and improved energy efficiency. The ARM Cortex-M7 processor is the most recent and highest performance member of the Cortex-M processor family. But where the Cortex-M7 is at the heart of ARM partner SoCs for IoT systems, other connectivity IP is required to complete the intelligent SoC subsystem.

A collection of some of my favorite IoT-related IP follows.

Figure 2. The Cortex-M7 Architecture (Courtesy of ARM)

Development Ecosystem

To efficiently build a system, no matter how small, that can communicate with other devices, one needs IP.  ARM and Cadence Design Systems have had a long-standing collaboration in the area of both IP and development tools.  In September of this year the companies extended an already existing agreement covering more than 130 IP blocks and software.  The new agreement covers an expanded collaboration for IoT and wearable devices targeting TSMC’s ultra-low power technology platform. The collaboration is expected to enable the rapid development of IoT and wearable devices by optimizing the system integration of ARM IP and Cadence’s integrated flow for mixed-signal design and verification.

The partnership will deliver reference designs and physical design knowledge to integrate ARM Cortex processors, ARM CoreLink system IP, and ARM Artisan physical IP along with RF/analog/mixed-signal IP and embedded flash in the Virtuoso-VDI Mixed-Signal Open Access integrated flow for the TSMC process technology.

“The reduction in leakage of TSMC’s new ULP technology platform combined with the proven power-efficiency of Cortex-M processors will enable a vast range of devices to operate in ultra energy-constrained environments,” said Richard York, vice president of embedded segment marketing, ARM. “Our collaboration with Cadence enables designers to continue developing the most innovative IoT devices in the market.” One of the fundamental changes in design methodology is the aggregation of capabilities from different vendors into one distribution point, like ARM, that serves as the guarantor of a proven development environment.

Communication and Security

System developers need to know that there are a number of sources of IP when deciding on the architecture of a product.  In the case of IoT it is necessary to address both the transmission capabilities and the security of the data.

As a strong partner of ARM, Synopsys provides low power IP that supports a wide range of low power features such as configurable shutdown and power modes. The DesignWare family of IP offers both digital and analog components that can be integrated with any Cortex-M MCU. Beyond the extensive list of digital logic, analog IP including ADCs and DACs, plus audio CODECs, plays an important role in IoT applications. Designers also have the opportunity to use Synopsys development and verification tools that have a strong track record handling ARM-based designs.

The Tensilica group at Cadence has published a paper describing how to use Cadence IP to develop a Wi-Fi 802.11ac transceiver used for WLAN (wireless local area network). This transceiver design is architected on a programmable platform consisting of Tensilica DSPs, using an anchor DSP from the ConnX BBE family of cores in combination with a smaller specialized DSP and dedicated hardware RTL. Because of the enhanced instruction set in the Cortex-M7 and superscalar pipeline, plus the addition of floating point DSP, Cadence radio IP works well with the Cortex-M7 MCU as intermediate band, digital down conversion, post-processing or WLAN provisioning can be done by the Cortex-M7.

Accent S.A. is an Italian company that is focused on RF products.  Accent’s BASEsoc RF Platform for ARM enables pre-optimized, field-proven single chip wireless systems by serving as near-finished solutions for a number of applications.  This modular platform is easily customizable and supports integration of different wireless standards, such as ZigBee, Bluetooth, RFID and UWB, allowing customers to achieve a shorter time-to-market. The company claims that an ARM processor-based, complex RF-IC could be fully specified, developed and ramped to volume production by Accent in less than nine months.

Sonics offers a network on chip (NoC) solution that is both flexible in integrating various communication protocols and highly secure.   Figure 3 shows how the Sonics NoC provides secure communication in any SoC architecture.

Figure 3.  Security is Paramount in Data Transmission (Courtesy of Sonics)

According to Drew Wingard, Sonics CTO, “Security is one of the most important, if not the most important, considerations when creating IoT-focused SoCs that collect sensitive information or control expensive equipment and/or resources. ARM’s TrustZone does a good job securing the computing part of the system, but what about the communications, media and sensor/motor subsystems? SoC security goes well beyond the CPU and operating system. SoC designers need a way to ensure complete security for their entire design.”

Drew concludes, “The best way to accomplish SoC-wide security is by leveraging on-chip network fabrics like SonicsGN, which has built-in NoCLock features to provide independent, mutually secure domains that enable designers to isolate each subsystem’s shared resources. By minimizing the amount of secure hardware and software in each domain, NoCLock extends ARM TrustZone to provide increased protection and reliability, ensuring that subsystem-level security defects cannot be exploited to compromise the entire system.”

More examples exist of course and this is not an exhaustive list of devices supporting protocols that can be used in the intelligent home architecture.  The intelligent home, together with wearable medical devices, is the most frequent example of IoT that could be implemented by 2020.  In fact it is a sure bet to say that by the time the intelligent home is a reality many more IP blocks to support the application will be available.
