Posts Tagged ‘Low-Power Design’


Traversing The Abstraction Landscape

Thursday, May 10th, 2012

By Ann Steffora Mutschler

Back in the early days of semiconductor design, engineers could count the number of transistors on their chip with their own two eyes. They designed and ran timing analysis at the same level of abstraction. Tools were SPICE-like, perhaps with timing models slightly simpler than the SPICE-level transistor models.

Thanks to Moore’s Law, the number of transistors that can fit on a chip has grown to the billions, which obviously can’t be counted with the naked eye. Designs of that size also can no longer be simulated practically in SPICE. Abstraction has been the way out, providing a higher-level view of the design.

“Clearly, even when I’m at gate level, I know I’m not getting the same accuracy that I would be getting at SPICE level, but if my models are good enough and it is close enough, I’m willing to take that slight hit to be able to do bigger designs,” noted Barry Pangrle, a solutions architect for low-power design at Mentor Graphics. “That’s the progression that we’ve gone from—transistors to gates. Then people were doing schematic capture and everything was gate-level models. Then we went to RTL and we started moving to RTL models. Now we are moving on into system and bigger components and functional blocks. At each level, we’re giving up some measure of accuracy—it’s just not going to be as detailed. It’s not going to be as fine-grained and the hope is though that we have enough information that we can make the decisions at that level of abstraction.”

The abstraction levels in use today were developed over a long period of time. They are well-defined because a huge amount of work went into modeling, both to make sure designers can move between levels and to ensure each level carries the appropriate detail for what needs to happen there.

“Today, we’ve tuned it and created enough modeling around it so we can get the information that we need out,” said Cary Chin, director of technical marketing for low-power solutions at Synopsys. “But I would say that the model isn’t general enough if we thought of some new use of these connections and voltages and expected it to give us the data that we wanted. Whereas if you did that all in SPICE, it likely would [provide the right data] because that’s one indication of the maturity of the model—whether you can use it for things that weren’t anticipated originally when you built the model.”

At the RTL level engineers synthesize down to a gate-level netlist so that they can bring in their gate level models, Pangrle said, with the hope that based on the information they get from those models, they can create something that’s going to be representative of what they need at the RTL level. “Now we’re looking at going one level beyond that and saying, ‘Okay, at the next level of abstraction what kind of information can we capture here?’ The tricky part is making sure that you still have the level of accuracy that you need to be able to make the types of design decisions that you’re going to rely on that information.”

But these levels of abstraction are not all fun and games. Engineering teams doing low-power designs face many challenges moving between these abstraction views. The biggest is the move from RTL to gate level, because the two levels differ in so many ways, explained Qi Wang, technical marketing group director for low power and mixed signal at Cadence. “On top of that, there is a lot of handshake of tools between those two levels.”

For example, he said an important aspect of low-power design is gathering activity data. RTL simulation is run to collect activity, so all of the signal activity is annotated along with all the signal names. The engineer hopes to re-use that activity at the gate level, but the name seen at the RTL may not be the name seen at the gate level because the synthesis tool renames signals.

Power formats

In addition to this renaming, a lot of optimization can happen between the RTL and the gate level, which means that some signal may simply optimize out. Another possibility is that the logic may not optimize out but the representation can be changed, Wang said. “On the activity side, this is a flow challenge. The activity file you get for the RTL you hope you can re-use for the gate level, but many times you will find it is very difficult.”

Another difficulty involves the power format, no matter which standard is used, Wang noted. “The whole idea is that you describe your power intent in another file… If you write a power format file for RTL, which means it will be used for the RTL so all the names you refer to would be the RTL names. Now when you get to the gate level you hope you can use the same RTL level power intent because I want to keep my golden power intent through the design and verification flow.” But this runs into the same naming problem as the activity file.

To address this, formal verification techniques can be used to produce a name-mapping file that indicates which RTL register names map to which flip-flops in the netlist.
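As a rough illustration of the name-mapping idea, the sketch below re-keys RTL toggle counts onto gate-level names and reports anything it could not map. The signal names, the dictionary form of the map, and the `remap_activity` helper are all hypothetical; real flows rely on tool-specific activity and mapping file formats.

```python
def remap_activity(rtl_activity, name_map):
    """Re-key RTL toggle counts onto gate-level names.

    rtl_activity: dict of RTL signal name -> toggle count
    name_map:     dict of RTL signal name -> gate-level name
    Returns (gate_activity, unmapped). `unmapped` lists RTL signals with
    no gate-level counterpart, for example signals that were optimized
    away or renamed without a map entry.
    """
    gate_activity = {}
    unmapped = []
    for rtl_name, toggles in rtl_activity.items():
        gate_name = name_map.get(rtl_name)
        if gate_name is None:
            unmapped.append(rtl_name)
        else:
            gate_activity[gate_name] = toggles
    return gate_activity, sorted(unmapped)


# Hypothetical example data, not taken from any real design.
rtl_activity = {"core/state[0]": 1200, "core/state[1]": 800, "core/tmp": 40}
name_map = {
    "core/state[0]": "core_state_reg_0_/Q",
    "core/state[1]": "core_state_reg_1_/Q",
}
gate_activity, unmapped = remap_activity(rtl_activity, name_map)
```

Here `core/tmp` ends up in `unmapped`, which stands in for the case of a signal that synthesis optimized out.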

Then on the power intent side, he suggested the easiest way to deal with the renaming issue is to have the synthesis tool write out a new power intent file, which automatically will reflect the name changes and the hierarchy ungrouping. To enable the flow, however, the power intent written out by synthesis must be equivalent to the original power intent, which is where power-aware equivalence checking tools are used to prove that the new and old power intent are equivalent.

Twenty years of hard labor

Traditionally, traversing levels of abstraction has been relatively straightforward—it’s just a lot of work. “If you look at the library modeling process that has evolved to go from kind of transistor level to gate level, things are very well defined today,” Chin said. “Libraries are super solid and vendors know how to characterize things even as the technology changes. That’s an example of a level of abstraction that’s pretty mature because over the last three, four or five generations of technology, we haven’t had to make major changes. There have been many, many little extensions and timing models and functionality and things like that but basically since we haven’t changed the fundamental design flow, the models and libraries have stayed pretty much the same, which is great.”

There have been similar advances in synthesis. “If you look at this between RTL and gate level, synthesis has changed a lot over that time, as well, but in general if you couple synthesis with verification tools and formal verification tools, things have actually grown nicely so that we still have very dependable flows that most people are still pretty happy with. You can push the button and trust what comes out at the other end. And as you recall, it took us 20 years to develop that level of trust,” he concluded.

Once the engineering community moves en masse to the system level, that 20 years could easily be duplicated.

Low Power Drives Performance And TCO

Thursday, October 6th, 2011

By Pallab Chatterjee
A common theme at this year’s Custom Integrated Circuit Conference was the reduction of power and power management while increasing data throughput. Historically, the show has featured new techniques for ultra high accuracy and brute force improvements in performance at all costs. The main theme this year was that in a world of mobile endpoint devices, the goal is to get performance in a stringent low power envelope.

Highly attended sessions at the kickoff of the event were focused on 3D device interconnect, photonic interconnect, design and process interaction and energy efficiency from data center to handhelds. The systems shown operated up to 50Gb/s data rates and typically had less than 20mW/lane power dissipation. These designs were exploring the next generation in high-density connectivity for centralized compute applications.

The 3D and interconnect sessions reviewed the architectural and design considerations of creating mixed and stacked memory and logic systems, traditional stacked memory die with edge connects (with results above 15 die in a stack), and the challenges of testability before and after the die are assembled. Some of the testing issues relate to the use of Known Good Die (KGD) in a pin-available die back-to-front stack vs. a hidden pin face-to-face configuration. A key point is the thermal management of these die stacks, along with care in power supply and multi-state power design to make sure that both die are available at the same time for data to be passed back and forth. As a new constraint, mechanical issues regarding the die stack have become part of the design flow.

There also was discussion about using photonic interconnects as a replacement technology for copper. The driver for this technology is an overwhelming reduction in power and reduction of energy loss through heat and signal degradation over voltage- or current-based signaling in copper. The photonic presentations featured systems operating down to 1.2V on a 0.13µm process to produce 7.4Gbps results, as well as wavelength-division-multiplexed links in sub-65nm processes that had compensation for process and thermal-induced ring resonator mismatches. The session ended with a presentation of power-efficient I/O design that focused on active power reduction design techniques for single-rate symmetric systems.

The session on energy management covered the full power spectrum from large-scale systems through handheld devices. This is one of the first times the ecosystem for the hardware use models was discussed as a whole at an event, and in particular that the entire system hierarchy has common power constraints. At the large system level (data center, storage, computer servers) the issue is thermal management. Simple reduction in component power consumption does not necessarily reduce overall energy costs. A system of rule-based proactive policies has been created to identify operating cost issues and support a variety of thermal management techniques, including allowing the circuits to run at a higher temperature prior to cooling. Currently cooling costs exceed component operating power costs.

The continuation of the systematic design power reduction included wide dynamic range operation for near-threshold-voltage (NTV) designs. This technique allows for the larger scaling of operating power supply voltage, which results in a power reduction proportional to the square of the voltage. The use of FPGAs to access advanced process technologies with lower power characteristics was shown. These designs also can minimize power-hungry, high-current board-level I/Os and traces, while having multiple functions connected by low-power, on-chip interconnects. The results are custom implementations that support multi-task reconfigurable computing while providing high-performance computation for specific tasks.
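The square-law relationship mentioned for supply voltage follows from the standard first-order model of CMOS switching power, P = a·C·V²·f. The sketch below applies that model; the capacitance, voltage, and frequency values are illustrative assumptions, not figures from the conference, and the model ignores leakage and the frequency drop that usually accompanies lower voltage.

```python
def dynamic_power_w(c_eff_farads, vdd_volts, freq_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz


def power_ratio(v_new, v_old):
    """Switching-power ratio when only the supply voltage changes."""
    return (v_new / v_old) ** 2


# Illustrative numbers only: 1 nF effective capacitance at 1 GHz.
p_nominal = dynamic_power_w(1e-9, 1.0, 1e9)   # about 1 W at 1.0 V
p_ntv = dynamic_power_w(1e-9, 0.5, 1e9)       # about 0.25 W at 0.5 V
ratio = power_ratio(0.5, 1.0)                 # halving Vdd quarters the power
```

In practice a near-threshold design also runs at a lower clock frequency, so the energy per operation is often the more useful figure to compare.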

Finally, a discussion of high-efficiency techniques for DC-DC converters addressed the power supplies for the design. Multi-supply designs, including some systems with three or more supply voltages (5V, 3.3V, 1.8V, 1.5V, and 1.25V), waste a lot of power in the step-down conversion of these supplies. High-efficiency fast transient DC-DC converters can help minimize these systemic power losses, which stem from the system architecture rather than from performance.

Outside of the CICC event, these techniques were used in new lower-TCO systems. At the recent Oracle World event, new computer and storage systems were released as “engineered systems.” The basis is a new processor design, which includes new memory and backplane design, tiered storage (DRAM, flash and disk) and connectivity access to a revised operating system and new applications. The “system” is designed with awareness of overall power reduction, energy efficiency and reduced cost of operation. The recognition is that component “sleep state” power reduction alone is no longer sufficient for real applications, and that the use chain is now part of the design constraints.

Extending Battery Life

Thursday, July 21st, 2011

By Ed Sperling
In the past it was all about clock frequency. People bought the latest computer and frequently paid a premium because it could crunch numbers faster. But as computing moves from the desktop into handheld devices, that focus is radically changing.

Low-Power Engineering caught up with Mark Bohr, senior fellow and director of Intel’s process architecture and integration, to talk about this shift and what needs to be solved in the future.

LPE: How much of a performance gain and an energy reduction do you get using Tri-Gate?
Bohr: In the low-voltage range, which is around 0.7 volts, we’re achieving about a 37% speed-up. Or conversely, another benchmark is power savings. If we benchmark these against our 32nm planar devices using Tri-Gate, we reduce active power by about 50%.

LPE: Is there any advantage in dropping the voltage lower than 0.7 volts?
Bohr: Yes, that is the name of the game now—to provide the lowest possible active power. Reducing operating voltage is a way of doing that. When I use 0.7 volts as the benchmark, that does not imply it’s the lowest voltage we can use.

LPE: What’s the foreseeable limit in terms of how low you can drop the operating voltage?
Bohr: That’s an important question for both process technology and circuit design. There’s no simple answer, but we are pushing that from both sides. On the transistor process side we are trying to make them usable from the lowest operating voltage. We also are bringing in some circuit design tricks to better enable low-voltage operation.

LPE: What are the challenges with that?
Bohr: There are two factors that limit you as you try to drive operating voltage lower. One is for state retention on memory elements, like a static RAM cell. The other is performance and how controllable the performance is at a low voltage level. You have to fight both as you push voltage lower.

LPE: We’ve evolved into a society of impatient people, but is the concern now performance or plugging in your mobile device every night?
Bohr: My impression is that a lot of delays we see are not so much dependent on the speed of the processor but the bandwidth to memory.

LPE: We’ve been struggling with bottlenecks for decades, whether it’s I/O or memory. What’s changed?
Bohr: It’s not so much a microprocessor chip that operates at a higher frequency, but one that provides the performance level we expect on our desktop computers but in our hand. That’s where the power reduction is important.

LPE: It’s interesting how much power now dominates Intel’s focus. What’s changed?
Bohr: It’s what the market wants. The market isn’t beating down our door for a 5GHz processor, even in a desktop solution. The fan noise is not going to be pleasant. People want high performance in their hand with long battery life.

Applications And Low Power

Thursday, May 12th, 2011

By Pallab Chatterjee

As new process technologies are being developed to make devices smaller, they are also driving the operating power lower for the devices and systems.

The goal is to reduce the power requirements for the system and hence increase the functional life on a single battery charge. This concept has worked in the semiconductor industry from 10-micron processes down to the 65nm process node. Below 65nm, the rules are changing, not because of the process that is available to manufacture the device but what people are doing with the devices.

At the 40nm node and below, billion-transistor chips are possible. These are not practical at the larger geometries for a number of reasons, all related to manufacturability. The trouble with a billion-transistor chip is that it does a lot of different things and has a lot of computing capability. To support this within a reasonable power budget, the design will support multiple power grids and power controls, so blocks can be turned on and off as needed. This technique helps extend battery life by only running what is needed for a given operation. This is the default direction for general-purpose processor cores and memories in the industry.

For designs that do not need 1 billion devices, a proportionally large chip can be designed for the function at this node because the extra devices have a very small incremental cost. The trouble with adding extra blocks such as display drivers, graphics cores and accelerators, and connectivity blocks is that they are hard to turn off to save power. It is generally not acceptable to turn off the I/Os and connectivity of an appliance if it will be receiving or transmitting data. There is a low-power spec (802.3az) that describes how to power down connectivity, but it requires both sides of the connection to support it. Balancing power in these designs is further complicated by the applications being run on them.

If you think of a tablet product, when it is sending or receiving WiFi/3G, the display does not have to be active and can be powered down. However, the connectivity block must be on. But if the content that is being received is streaming video, then the full function of the tablet has to be on to display the graphics, fill the buffers, and handle the connectivity. This changes the battery use model, as it is typically designed for low-duty cycle applications. Watching a streaming video movie does not constitute a low duty cycle application.
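The effect of duty cycle on battery life can be made concrete with a simple weighted-average model. The sketch below is an assumption-laden illustration: the battery capacity and the active and idle power figures are invented for the example, and real devices have many more power states than two.

```python
def average_power_mw(active_mw, idle_mw, duty_cycle):
    """Average power for a device that is active `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


def battery_life_hours(capacity_mwh, active_mw, idle_mw, duty_cycle):
    """Idealized battery life: capacity divided by average power."""
    return capacity_mwh / average_power_mw(active_mw, idle_mw, duty_cycle)


# Hypothetical tablet: 25 Wh battery, 2 W fully active, 0.1 W idle.
light_use_hours = battery_life_hours(25000, 2000, 100, 0.10)  # low duty cycle
streaming_hours = battery_life_hours(25000, 2000, 100, 1.00)  # always active
```

With these made-up numbers, the 10% duty-cycle case lasts roughly 86 hours while continuous streaming lasts 12.5 hours, which is the gap the paragraph above describes.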

Another driver of power is how much resolution is needed. Modern DSLRs routinely operate in the 10+MP still-image market and video is now almost always 1080p. These large datasets and extended streaming times tax the low-power design, and the chips are not optimized for the high-performance blocks (GPU and NIC) being at 100% duty cycle.

With the release of general-purpose cores for CPUs and GPGPUs, the low-power implementation cannot be limited to bus architectures and power-down blocks. To effectively support the design in a system (smartphone, tablet, netbook) the application has to be considered (gaming, streaming media, office functions, e-mail, web surfing) and the power profile for each application mode optimized. It is this optimization and the steady-state performance, displaying an e-book or streaming a video, that currently drives the power partitioning and the power management methodology that should be used. The verification world now has to consider applications above the OS and use models with sensors/MEMS as the main power handling constraints, not just the “How many devices can I put in the box” mentality that has existed since the mid ’70s.

Power Bits: Sept. 17

Friday, September 17th, 2010

By Ed Sperling
The University of Washington and Georgia Institute of Technology have come up with an interesting concept for cutting the power needed for communication inside of buildings.

The approach is a new twist on wireline communications, which use the electrical wiring in a building as an antenna. That can save power because there is less distance for signals to travel, which means communication can happen at much lower power than using a whole-house or whole-building wireless router.

Signals still have to make their way throughout a building, though, which is why most of the past approaches use a mesh network, whereby nodes communicate with nearby nodes. Under the new approach, detailed in a white paper, only the base station receiver is actually wired to the powerline. The wireless signals are then routed through the powerline rather than through other nodes on the network.

The result is much lower power consumption because there is less distance to cover. Sensors only have to communicate with the nearest plug. It also results in much easier set-up by users and far less frequent battery changes inside of sensors.

The researchers also looked at the most power-intensive piece of wireless sensor nodes, the RF radio, and discovered that during communication the most power is consumed during receiving because it is always active. They substituted devices that could only send but not receive, with the brains of the operation centralized rather than distributed.

Will it catch on? Maybe. There are still some kinks to work out, such as data reliability and what happens at higher frequencies. But if battery life can be increased from months to years or even decades, convenience alone may make these kinds of systems much more interesting.

Estimating Power From Mobile Device Apps

Thursday, September 9th, 2010

By Ann Steffora Mutschler
How do software application developers – even the ones sitting at home on their living room sofas with laptops – measure the power consumption of their application on the target device? This is a big problem today (something that is painfully obvious to owners of iPhones or Blackberries), and it will only get bigger.

Software engineers may think it is not their problem. They can write whatever code they want, then push off the issues to the hardware engineers who, in fact, have limited control.

To be sure, a hardware/software co-design environment is eventually going to be the ‘new frontier’ with models of abstraction used at higher and higher levels so that engineers can emulate certain applications or functions. And, of course, new tools will be needed to take these considerations into account. But from all accounts, those tools may still be years away from the engineers’ workbench, let alone the software development kit of the at-home developer.

Ideally, if high-level models can be created that break through the RTL descriptions of the hardware to the transaction level, hardware information can be captured and brought up to the software applications, whether that includes power consumption, software domains, or the like. Then engineers could see the impact of software and modify hardware accordingly, said Vic Kulkarni, general manager and senior VP of the RTL business unit at Apache Design Solutions. “Today it is the reverse: because you use whatever hardware is available and then software developers they don’t really have knowledge of what that hardware is capable of doing as such.”

Pete Hardee, director of solutions marketing at Cadence Design Systems, noted that today’s smart phones, as convergent devices, contain about as much computing power as stand-alone devices had recently. “A smart phone today can easily contain the same processing power as mainstream PCs or laptops had maybe four or five years ago.” They contain video capabilities that would have been set-top boxes just a couple of years ago; high-definition video, and 3- to 5-megapixel cameras. “At the same time, while we’ve had enormous leaps in the hardware technology, obviously still following Moore’s Law, the leaps in software productivity have actually outpaced Moore’s Law to make that happen on a mobile device. The thing it hasn’t outpaced is poor old battery technology. So despite all of this going on, we’ve still got lithium-ion batteries. Designers have done a great job to squeeze what they can out of them, but fundamentally we still expect to get through at least a full working day and get home and put the phone on charge.”

Granted, it does depend what you’re doing with the phone, but the bottom line is that all of it is under software control. “When you’re analyzing power it’s not just about characterization of the hardware. You have to run with a significant number of system modes that represent the high activity of when I’m busy on all these various applications but also represent the low activity when I’m not busy, and also switching between those system modes so I can work out when it’s worth powering down parts of the device and when it’s not,” he said.

The challenge for many chip companies today is the need to simulate 30 different system modes. In addition, they are painstakingly measuring the bandwidth in all of those modes, in various parts of the chip and working out exactly how the power management system needs to cope: what can be slowed down, what needs to be sped up so it can be shut down for longer. All of these various modes need to be checked out. “Being able to measure the power in response to real system activity running real software becomes a big deal and there are very, very few solutions that can do that,” he said.
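One common way to reason about when powering down is worthwhile is a break-even calculation: shutting a block off only pays if the idle period is long enough to recover the energy spent entering and leaving the off state. The sketch below shows that arithmetic; it is a generic textbook-style model rather than any vendor's method, and all of the power and energy values are assumptions.

```python
def break_even_time_s(idle_power_w, off_power_w, transition_energy_j):
    """Idle duration beyond which powering down saves energy.

    Staying idle for t seconds costs idle_power_w * t.
    Powering down costs transition_energy_j + off_power_w * t.
    The two are equal at t = transition_energy_j / (idle - off).
    """
    saving_w = idle_power_w - off_power_w
    if saving_w <= 0:
        raise ValueError("off state must draw less power than idle state")
    return transition_energy_j / saving_w


def worth_powering_down(expected_idle_s, idle_power_w, off_power_w,
                        transition_energy_j):
    """True when the expected idle period exceeds the break-even time."""
    return expected_idle_s > break_even_time_s(
        idle_power_w, off_power_w, transition_energy_j)


# Hypothetical block: 0.5 W idle, 0 W off, 1 mJ to power down and back up.
t_be = break_even_time_s(0.5, 0.0, 0.001)            # about 2 ms
short_gap = worth_powering_down(0.0005, 0.5, 0.0, 0.001)
long_gap = worth_powering_down(0.0100, 0.5, 0.0, 0.001)
```

The same comparison has to be repeated for every block in every system mode, which is part of why the mode count quoted above matters.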

The prevalent thinking of today leans towards virtual platforms to do this measurement, but Hardee believes they are too abstract to be able to measure the effects on power. “As soon as you really need to look at the power scheme that is implemented in the hardware then you need to run at an accuracy which is going to slow down a virtual platform.”

To be fair, Cadence’s approach does include virtual platforms through its transaction-level simulators, and integration with the fast processor models from ARM and various other processor models available, but the company stresses its hardware-based emulation system for power-aware simulation.

Shabtay Matalon, ESL market development manager at Mentor Graphics, believes engineers already are familiar with the notion of abstraction—they started by abstracting gates to RTL and now there is an abstraction of RTL functionality at the higher-level writing using SystemC and transaction-level modeling. “People are aware that you can also abstract timing by creating a model that doesn’t contain all the information but has sufficient information to get the notion of timing. What people may not be aware is that we can create a model that can be used by the software engineer that contains an abstraction of power all the way up to ESL or TLM.”

This model associates power with the traffic flowing through these transaction-level models. Once those models get created they can be stitched together, Matalon said. The models can be of peripherals, of processors, or of devices, and can be stitched together to create a platform on which applications software can run.
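A minimal sketch of the idea of tying power to transaction traffic might look like the following. It is written in Python for brevity, whereas the models discussed here would normally be SystemC/TLM; the `BlockPowerModel` class, the per-transaction energy values, and the block names are hypothetical and do not represent any vendor's model format.

```python
from dataclasses import dataclass


@dataclass
class BlockPowerModel:
    """Transaction-level power abstraction for one block (hypothetical)."""
    name: str
    energy_per_transaction_nj: float  # assumed characterization value
    idle_power_mw: float              # assumed background power

    def energy_nj(self, transactions, window_us):
        # 1 mW sustained for 1 us equals 1 nJ, so the units line up.
        return (transactions * self.energy_per_transaction_nj
                + self.idle_power_mw * window_us)


def platform_average_power_mw(models, traffic, window_us):
    """Stitch block models together: total energy over the window."""
    total_nj = sum(m.energy_nj(traffic.get(m.name, 0), window_us)
                   for m in models)
    return total_nj / window_us  # nJ per us equals mW


# Hypothetical platform with two blocks observed over a 100 us window.
models = [BlockPowerModel("cpu", 2.0, 10.0), BlockPowerModel("dma", 0.5, 1.0)]
traffic = {"cpu": 1000, "dma": 400}
avg_mw = platform_average_power_mw(models, traffic, 100.0)
```

Because the only inputs are transaction counts, a software engineer could in principle feed such a model from an application trace without knowing the RTL.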

Virtual platforms are the way to go at the very high-level, agreed Cary Chin, director of technical marketing for Synopsys’ low-power solutions group. “There are some pretty good ways to hook into the software stack through a virtual platform. But I still think that the connection from the virtual platform on down through to high-level RTL is still a little bit broken because there’s a lot of stuff that needs to happen to connect those environments together.”

The big question to answer here, though, is how much we want the software developer to be controlling the hardware directly, he said. It’s basically directly up against the idea of information hiding. “In a software development environment we try to hide things because there are things we can’t actually decide better at high-level versus a low-level. Those concepts come in exactly when you’re spanning software down into the hardware realm, as well, so it’s very hard to tell. You want to write software that’s really transportable between environments and things like that, but if you’re tied in too closely to a particular hardware platform it makes that very difficult, as well.”

Educating the software developer
“With all of this, it would still be possible to write bad software that is very inefficient in the way data is used—maybe something that unnecessarily continually refreshes the LCD screen, for instance,” said Hardee. “How people get feedback for that really boils down to the application development kits that are provided by either the phone manufacturer or the network operator (Sprint has an application development network). On phones that use Android, there’s a development system. It would be possible to give people feedback in terms of bad optimization, bad memory usage, etc. in those development kits.”

Part of the solution may be an ecosystem or partnership approach, as well. “The idea of [EDA vendors] at some point partnering with somebody like Apple or Google to really extend their development kits down might actually make as much sense as trying to build stuff up from the hardware side because those guys have a lot of resources and they could actually help a lot in terms of meeting in the middle,” Chin added.

But that still doesn’t solve one of the big issues, which is the great divide that exists between the software and hardware worlds. “The chasm between hardware and software is bigger than the chasm between front-end and back-end design. The two worlds are not really well connected today and ultimately, if you think about it from the software development standpoint, there are different levels of abstraction in some sense that one can think about. There are high-level programming languages like C/C++, and then there is the low-level programming which is assembly code,” noted Will Ruby, senior director of product engineering and applications at Apache Design Solutions.

At least some of this can be dealt with in the short term by using models, but some will also require new technology such as smart compilers.

“Assembly is actually closer to hardware but people typically don’t program in assembly unless they are doing embedded programming. Somehow the notion of hardware needs to be transported into a C/C++ or Java-type development environment. That’s where the models come in. We need models to represent the hardware behavior, but I think we would also need something like a smart compiler that can take advantage of some of these hardware hooks and understand that if you’re writing a program for a mobile application, you need to make some tradeoffs during compilation for performance or power consumption. People on the hardware side think about this all the time, but on the software side it’s not easy to do. So compilers may need to evolve in that direction. Compilers need to be hardware-aware and need to understand what hardware is doing,” he concluded.

A New Reference For Low-Power Processors

Thursday, September 9th, 2010

By Pallab Chatterjee
Just how much power can you squeeze out of a processor without destroying performance?

Ask IBM. The company introduced a new methodology for power and energy management on its multicore processor chips. The new PowerPC chip, the Power7, has eight main processor cores, each with its own L2 and L3 cache, and two central memory controllers. The architecture for the design is built around an energy and power management schema called EnergyScale.

The EnergyScale system is a data-dependent, policy-based system that interprets activities in the processor cores, the memory hierarchy and the main memory. It is made up of four distinct parts: Sense, Decide, Control, and Actuate. The sense function is performed using both digital-thermal sensors (DTS) and critical-path monitors (CPM). The DTS uses 44 on-chip sense points, organized as five per chiplet plus points for emergency self-protect thermal throttling and on the main memory controllers. The CPM detects circuit timing margin to help guide the optimal frequency and voltage adjustments.

The decide block is an off-chip, dedicated-function microcontroller that gets its information on the status of the chip through an EnergyScale I2C Slave communication port. To assist the EnergyScale microcontroller, the system minimizes the communications bandwidth by packing the sensor data to reduce the number of read operations, multicasting the responses to reduce the number of writes, and creating an automated on-chip transaction table that allows the sensor data to be streamed out in a single I2C command.

The control block features per-core frequency control ranging from -50% to +10% of the nominal frequency, on-chip support for off-chip voltage control, memory power management, and a command rate interface control. The core frequency control, in order to minimize latency, has an automated fast frequency slew of more than 50MHz per microsecond. The voltage control is done through a serial voltage I2C command interface, and is fully automated based on the policies that are defined. The memory management includes power-down modes for the DIMMs and also reducing the data access rate as needed. As the Power7 chip is a symmetric multiprocessing (SMP) system and has SMP-based memory interfaces, the command-rate interface control was built with asynchronous control to be as adaptable as possible while addressing the needs of any core chiplet.
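To get a feel for the frequency-control numbers, the sketch below clamps a requested frequency to the -50% to +10% window and estimates how long a change would take at a 50MHz-per-microsecond slew. The 3000MHz nominal frequency is an assumed value for illustration, the helper names are hypothetical, and since the stated slew is "more than" 50MHz per microsecond, the computed time is an upper bound.

```python
def clamp_frequency_mhz(requested_mhz, nominal_mhz, low=-0.50, high=0.10):
    """Limit a requested core frequency to the allowed window around nominal."""
    floor_mhz = nominal_mhz * (1.0 + low)
    ceiling_mhz = nominal_mhz * (1.0 + high)
    return max(floor_mhz, min(requested_mhz, ceiling_mhz))


def slew_time_us(from_mhz, to_mhz, slew_mhz_per_us=50.0):
    """Time to move between two frequencies at a fixed slew rate."""
    return abs(to_mhz - from_mhz) / slew_mhz_per_us


NOMINAL_MHZ = 3000.0  # assumed nominal frequency, for illustration only

lowest = clamp_frequency_mhz(1000.0, NOMINAL_MHZ)    # clamps to 1500 MHz
highest = clamp_frequency_mhz(4000.0, NOMINAL_MHZ)   # clamps to 3300 MHz
full_swing_us = slew_time_us(1500.0, 3300.0)         # at most 36 us
```

Under these assumptions, even a full swing across the allowed range completes in tens of microseconds.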

The Actuate function uses three different power-down modes besides the normal operating mode. These modes are per-core, and are based on both the level of power reduction and the latency to return to full function. The modes are “Nap,” which targets about 5 microseconds of latency to return to operation and works by turning off the clocks to the execution units; “Sleep,” which features 1 millisecond of turn-on latency and shuts off the clocks while also purging the local caches; and “Heavy Sleep,” which has a 2 millisecond target recovery time. In this mode, all the cores are in “Sleep” mode, the voltage to all the cores and caches is reduced, and the states are loaded into low-voltage retention registers. The exit from heavy sleep includes an automated voltage ramp back to full operating voltage as the hardware is automatically initialized. These energy policies are in addition to the per-core frequency scaling, and the associated core voltage scaling that goes with the frequency adjustment.
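The tradeoff between power savings and wake-up latency can be sketched as a simple selection rule: pick the deepest mode whose recovery time still fits the latency the workload can tolerate. The code below is a hypothetical policy built only from the latency targets quoted above. It is not IBM's firmware, and the names `PowerMode` and `deepest_mode` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    wake_latency_us: float  # target latency to return to operation


# Ordered from shallowest to deepest; latencies are the article's targets.
MODES = [
    PowerMode("Run", 0.0),
    PowerMode("Nap", 5.0),
    PowerMode("Sleep", 1000.0),
    PowerMode("Heavy Sleep", 2000.0),
]


def deepest_mode(tolerable_latency_us: float, all_cores_idle: bool) -> PowerMode:
    """Return the deepest mode whose wake latency fits the tolerance.

    Heavy Sleep is only considered when every core is idle, because the
    article describes it as a state in which all cores are asleep.
    """
    chosen = MODES[0]
    for mode in MODES:
        if mode.name == "Heavy Sleep" and not all_cores_idle:
            continue
        if mode.wake_latency_us <= tolerable_latency_us:
            chosen = mode
    return chosen
```

Under this rule, a core that can tolerate 10 microseconds of wake-up delay would nap, while a chip with every core idle and a 5 millisecond tolerance would drop into Heavy Sleep.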

In addition to the direct sense, the firmware of the off-chip microcontroller can estimate functions based on the data coming in to adjust energy for leakage, temperature, and power supply variation. The last portion of intelligence for the energy-control system is the CPM. The circuitry dynamically detects margin in circuit timing and eliminates the need for static, conservative guard-band margins in the active designs.

The net result is more than a 50% improvement in the power for the individual cores as a system package using the automated on-chip controls and the off-chip microcontroller firmware based signal loop (as shown in the following figure).

Pricey Processes For Low Power

Thursday, April 8th, 2010

By Pallab Chatterjee

Recently Samsung gave an update on the status and availability of its advanced 32/28nm process technology for use in foundry. The process is targeted for shipping designs to customers at the end of this year, with a road map that continues through the 22/20nm nodes and down to 15nm.

What was particularly interesting were several key innovations that have made this all possible, as well as the company’s statement that the real driver is reduced power.

The new processes, co-developed with IBM, follow Intel’s commercially successful use of a hafnium-based “Hi-K” metal gate process. Although this terminology has been around for a few years and is the dominant technology in the microprocessor marketplace, there has been some “uncertainty” in the design community about what it actually buys the designer. The Hi-K gate technology is a process development that directly addresses the leakage current problem that arose in CMOS technology at the 90nm node and has persisted through the 45nm node. The scaling of process technology under Moore’s Law is a three-axis scaling: x and y for the length and width of the transistor used to make the basic devices, and z for the vertical dimension. Z is the thickness of the gate dielectric, which controls the intrinsic speed and performance of the device by setting the difference between “on” and “off.”

Since the late 1960s the scaling of all three axes has taken place concurrently, at least until the 90nm node. At 90nm the complexities of lithographic processing, planarization, materials used for interconnect, isolation between devices and reduction in application power supply moved up from third- or fourth-order issues to become the dominant drivers. This made the leakage current and capacitance issues with the z-direction scaling a secondary challenge. This focus on the other processing issues caused the gate scaling to stall, and not continue proportionately with the x and y scaling, resulting in the leakage, multi-power islands, high electric fields, and high-stress devices and designs that have dominated the past few years.

The lithography solution is staying optical with multiple patterning solutions through the 22/20nm node. The planarization, interconnect and device stacking for “multi-die” technologies are progressing to address the function vs. density vs. space requirements going forward, which allowed time to develop the new materials needed to make the gate dielectric (replacement of standard SiO2 with an Hf-based material) and re-start the z-dimension scaling. At the 32/28nm node, the reduced leakage and increased device performance (difference between “on” and “off” states) bring a new level of design capability.

Results using the process in foundry-type circuits (embedded processors with memory, custom logic, and standard commercial interface connectivity) are showing as much as a 35% power reduction for the same operation specification as existing circuits. This power reduction comes from both the ability to drop the operating supply voltage for the same performance specification and from an overall reduction in leakage/standby state power for “idle” modes in a design.

The new process technology, now starting to become available from multiple suppliers, does bring an opportunity to create a new generation of mobile appliances. There is a significant challenge to the design community to address these benefits as a mainstream technology solution. The cost of entry into the design game at these nodes is very high. A typical 32/28nm SoC is probably going to contain more than 500 million devices, including embedded memory, and will likely have a very high pin count. This will require a big design team to architect, design, assemble, and test, not counting the very aggressive 20-plus man-years of IC design (5M devices/man-year for the flow x 20 people = 100M devices, plus 400M in third-party embedded memory), and application software development.
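The device-count estimate above is simple enough to check by hand, but writing it out makes the assumptions visible. The sketch below uses only the figures quoted in the paragraph. The names are illustrative, and the one-year schedule is an assumption implied by pairing 20 people with roughly 20 man-years.

```python
# Figures as quoted in the article; names are illustrative only.
DEVICES_PER_PERSON_YEAR = 5_000_000        # design-flow productivity
TEAM_SIZE = 20                             # IC designers
SCHEDULE_YEARS = 1                         # assumption: 20 people ~ 20 man-years
THIRD_PARTY_MEMORY_DEVICES = 400_000_000   # embedded memory brought in as IP


def soc_device_estimate() -> dict:
    """Reproduce the article's back-of-the-envelope device count."""
    person_years = TEAM_SIZE * SCHEDULE_YEARS
    designed = DEVICES_PER_PERSON_YEAR * person_years
    total = designed + THIRD_PARTY_MEMORY_DEVICES
    return {
        "person_years": person_years,
        "designed_devices": designed,
        "total_devices": total,
        "designed_share": designed / total,
    }
```

On these numbers the in-house team designs 100 million of the 500 million devices, or one-fifth of the chip, with the rest coming from third-party memory.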

These design costs are on top of the fab costs, which are targeted at more than $4M for masks, plus the wafer fab, package and test. And it looks like the big players, at a $30 million minimum per design, are the only ones who will be left at the table for the real “low power” process game.

Killer Bugs

Thursday, April 8th, 2010

By Ed Sperling

Hardware and software bugs are all around us. When an application suddenly dies or a smart phone freezes because of the unanticipated interaction between hardware and software blocks in a system on chip, most users aren’t even the least bit fazed. They usually just re-boot and forget about it.

Bugs caused by power are an entirely different matter, however. For one thing, they’re usually fatal. For another, they’re getting much, much harder to detect. And third, they’re harder to fix when they are detected.

“Debugging is getting much more difficult because when the lead generator is powered off, how do you find out that there’s a problem? You may have two power lines with a different Vdd because of connectivity and it will not work,” said Bhanu Kapoor, president of Mimasic, a consultancy focused on low power. “With power you used to have a single voltage. Now you have different supply lines, so you get new problems. Some of this can be detected in the netlist, but some of these problems also show up in the course of manufacturing.”

The problem is magnified by the addition of multiple power islands and multiple cores.

“Correct delivery of a power supply is at the core of many of the power issues, and traditional testing methods use a fault model that is based on wires erroneously connected to supply or ground,” Kapoor said. “And needless to say, incorrect delivery of power will result in fatal issues for proper operation of the chip. For example, an isolation cell at the output of a power domain ensures active regions receive meaningful signal when this domain is shut down. If the supply to the isolation cell itself is switched off due to either an incorrect wiring or improper placement of the isolation cell then active regions will see some unknown values that will lead to failure of operation in this mode.”

Similar things will happen if the power supply for a level-shifter has wiring issues. It may be worse here because, depending upon the voltage differences, the issue may only show up sometimes. And there may be very hard-to-find, sneaky leakage paths that drain the battery much faster without any functional problem ever showing up. They will also sneak through testing methods and only show up as a fatal business issue.

Consider a real-world example: A major wireless chipmaker was recently headed to tapeout when it ran some additional tests and found eight bugs related to wrong implementations of power intent. “They would have caused catastrophic failures,” said Peter Hardee, director of solutions marketing at Cadence Design Systems. “Things are getting a lot more complex. You may have power domain ‘A’ physically separated from power domain ‘B,’ and at some point they need to talk. The problem is that the wires may run through power domain ‘C.’ Was ‘C’ on or off when you verified the chip?”
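Hardee’s question about domain ‘C’ can be framed as a check over power states. The following is a minimal sketch, not any vendor’s verification flow: the power-state table, the domain names and the function name are hypothetical, and a real flow would derive them from the power intent (UPF or CPF) and the routed netlist.

```python
from typing import Dict, List

# Maps a domain name to whether it is powered on in a given power state.
PowerState = Dict[str, bool]


def feedthrough_violations(
    states: Dict[str, PowerState],
    driver: str,
    receiver: str,
    through: List[str],
) -> List[str]:
    """List the power states in which driver and receiver are both on,
    but a domain that the connecting wires pass through is switched off."""
    bad = []
    for state_name, powered in states.items():
        endpoints_on = powered[driver] and powered[receiver]
        route_on = all(powered[domain] for domain in through)
        if endpoints_on and not route_on:
            bad.append(state_name)
    return bad


# Hypothetical power-state table for domains A, B and C.
STATES = {
    "all_on": {"A": True, "B": True, "C": True},
    "c_gated": {"A": True, "B": True, "C": False},
    "standby": {"A": False, "B": True, "C": False},
}
```

Calling `feedthrough_violations(STATES, "A", "B", ["C"])` on this table flags only the `c_gated` state, which is exactly the case a verification run would miss if it only exercised the chip with ‘C’ powered on.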

It’s not that the wireless chipmaker didn’t understand all of these issues, either. Even at the most sophisticated chip companies where power intent and design was part of the up-front architectural decisions, problems still surface late in the design cycle. A device may be functionally verifiable but have fatal errors. And there’s no magic button to push or even an integrated tools flow that solves everything.

“A lot of things that used to be secondary issues are now primary issues,” said Vic Kulkarni, general manager of the RTL business unit at Apache Design Automation. “In the past, you could just put a lot of margin into the design, but the voltage has to be high for that to work. Today, the margin is no longer there.”

Dueling priorities
Creating SoC designs has always been about making tradeoffs between area, power and performance. Before 90nm, however, the power was more of an afterthought than part of the initial planning process. At 65nm and beyond, it is now an integral part of every chip, along with software and IP—which also were afterthoughts at older process nodes.

“The reality is if you have a performance issue or a power problem, it stems from the fact that you may have validated the hardware in isolation, but not in the context of the software application,” said Shabtay Matalon, ESL marketing manager in Mentor Graphics’ Design Creation Division. “There are ways to fix functionality in terms of software. But I’m not aware of one that can fix power or performance by fixing the software.”

IP is likewise a problem when it comes from multiple sources and when it involves multiple voltages. Big IP vendors are all emphasizing power-aware IP so that it can be re-used more easily. But the amount of IP inside all SoCs is growing steadily, in large part because there are too few engineers inside companies to re-invent that IP and still get a chip to market on time.

Not all of that IP runs at the same voltage, and not all of it is necessarily used in a manner in which it was intended by the IP vendor. And while power methodologies such as UPF and CPF are supposed to account for that, some of it still slips through the cracks. In the best-case scenario, some of that can be fixed with software. There are plenty of cases that don’t fit that description, however.

“The fatal bugs are the ones that kill the company before the product ships,” said the CEO of MCCI Corp. “What causes those are mask spins. Behind those are system-level problems. You hook it up to a critical system and it doesn’t work. It’s down at the PHY level or the RTL level and it’s not accessible to software.”

Rethinking Test

Thursday, February 11th, 2010

By Ann Steffora Mutschler

The responsibility of semiconductor test has long sat solely with the test engineer as the chip designer focused on the functionality of the device. However, particularly in low-power designs, when the device is being tested, much higher power levels are applied than in normal functional operation – sometimes causing the device to fail.

This ‘false failure’ can lead to unnecessary yield loss on the production line requiring significant time and effort to diagnose because the extra power applied to the device may indicate incorrectly that the device is bad when it is not.

The goal of the test engineer is to reduce the cost to test a device. Therefore, they want their automatic test pattern generation (ATPG) tools to generate a lot of activity and test a lot of the chip. As a result, a lot of power is consumed, typically exceeding the functional power budget by 7x to 10x.

This occurs because the chip is designed with a power budget in functional mode. “If you think about the design of a chip, most chips aren’t operating all parts of the chip at the same time and ATPG doesn’t look at functionality — it just looks at the structure and to minimize the cost or minimize the patterns it’s trying to make as much activity happen in the chip in order to get test all simultaneously,” explained Robert Ruiz, senior product marketing manager for test automation products at Synopsys.

In the past, ATPG tools really didn’t need to look at power consumption: the chips were small enough, the power rails were big enough, and low-power designs were not prevalent. On top of that, compression techniques were not yet in use. Compression further exacerbates the problem because the goal of a low-power design is to minimize switching activity, while the goal of compression is to maximize it. This is a very big deal for test engineers, but it is not an issue traditionally highlighted in the design community given designers’ focus on functionality, even though designers may take partial ownership of how to implement some of the design-for-test solutions.

Ruiz indicated that approximately three years ago the impact of power on test became a big area of Synopsys’ R&D effort based on feedback from a number of customers. At that time, he said, there were some customers who reported power issues related to test. They did some redesign, which resolved the issues at hand, but believed it could be a problem in the future. “It has certainly evolved to the point where most customers say they definitely have found a power issue during test,” Ruiz said.

Test is tricky for low-power designs
Greg Aldrich, director of marketing for the Silicon Test Systems group at Mentor Graphics Corp., said one of the problems in test is how to create test patterns that have lower power profiles in terms of what data gets shifted in, which is dramatically complicated by the use of on-chip compression and on-chip test structures. Previously, the scan chains were directly connected to the tester, and test was performed by shifting data into the scan chains, issuing the clock cycle, shifting the data out, and then comparing it to the golden response data.

However, most designs today utilize either built-in self test (BIST) or on-chip/embedded compression, which is still a deterministic process. But instead of the tester directly shifting data into the scan chain, there is a decompressor that it goes through that sits on chip. The tester shifts data into the decompressor, which is expanded internally, essentially creating the data on-chip, Aldrich explained.

What complicates the process is that, since the data is being created on chip, a new piece of on-chip logic must also be created. Mentor therefore invented a new low-power decompressor that allows the designer to control the data generated on chip, he said. “It’s not as simple as just changing what’s on the tester. You actually have to change some of the embedded test logic on chip to be able to control that. I think that is going to be primarily how switching activity is going to be controlled during the test, by controlling how the test patterns are created and then how the test patterns are loaded.”

Similarly, Synopsys rolled out an ATPG approach that doesn’t require any hardware or DFT change (which no customer really wants to do), Ruiz said. The company’s TetraMAX tool was enhanced about three years ago to allow the user to dial in a budget of the switching activity, which serves as a proxy for power consumption. And, if a customer wants to be more aggressive and active in managing power consumption, there are other hardware techniques including Synopsys’ DFTMAX tool as it puts off the scan chain.

Likewise, Mentor’s Aldrich noted that in terms of innovations both on the design side as well as on the test side to help deal with the impact of power on test, “It’s all focused on how to reduce the switching activity during the test. Historically, a lot of that has been done by partitioning the test and that is still the case especially as you move to designs that have multiple voltage domains or multiple power islands. Being able to just sequence the tests for each one of those allows you to test a smaller piece of the design. That has some implications on the test time and cost that it takes to test the device but that’s one approach.”

Mentor has also added more control into its tools over how much switching occurs during the test process. For example, in its ATPG tools, users can specify constraints that tell the test pattern generation tool how much switching is allowed during a test pattern.

“The more aggressive they are in terms of lowering the amount of switching during the test process, the higher it is in terms of test costs. It’s going to take more test patterns, it’s going to take more compute time to create the test patterns but it is a knob they will have control over now. They really have no other choice other than designing the power structures in the design such that they can handle 50% switching activity—that’s the only other alternative,” Aldrich said.
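One generic way to see the tradeoff Aldrich describes is to count transitions in a scan-load pattern and compare the result against a budget. The sketch below is an illustration built on stated assumptions, not how TetraMAX or Mentor’s tools work. It treats the fraction of adjacent-bit transitions as a stand-in for shift switching activity, and it fills unspecified bits (marked X) by repeating the neighboring specified value, a fill strategy commonly described in low-power test literature. All function names are hypothetical.

```python
def adjacent_fill(pattern: str) -> str:
    """Fill each 'X' with the nearest specified bit to its left.

    Leading X's take the value of the first specified bit, so the fill
    itself never introduces a transition. A pattern with no specified
    bits is filled with zeros.
    """
    last = next((bit for bit in pattern if bit in "01"), "0")
    filled = []
    for bit in pattern:
        if bit in "01":
            last = bit
        filled.append(last)
    return "".join(filled)


def toggle_fraction(bits: str) -> float:
    """Fraction of adjacent bit pairs that differ (0.0 to 1.0)."""
    if len(bits) < 2:
        return 0.0
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def within_budget(bits: str, budget: float) -> bool:
    """True if the pattern's toggle fraction does not exceed the budget."""
    return toggle_fraction(bits) <= budget
```

For the partially specified pattern `1XX0XXX1`, adjacent fill yields `11100001`, which toggles on two of seven bit boundaries and fits comfortably under a 50% budget. A fully alternating pattern such as `0101` toggles on every boundary and would be rejected.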

In the end, the objective of test is to create the highest coverage in the smallest number of test patterns. From the perspective of the design, that means trying to switch on everything possible in the design on every cycle on the tester, which is the opposite goal of low-power design. That said, a complete rethinking of compression algorithms and other test technology is in order.
