Posts Tagged ‘Nvidia’

Processor Subject To Change

Thursday, February 9th, 2012

By Ann Steffora Mutschler
With power complexity driving sophisticated management techniques, SoC design engineering teams are turning to a new class of customizable processor architectures from ARM, CEVA, NVIDIA, Qualcomm, Tensilica and others to take advantage of the best in power-saving techniques.

While these new architectures are novel approaches, the concepts are not especially new, particularly in mobile applications.

“If you look at what mobile processors have been doing, I would argue they’ve been doing some sort of big.LITTLE for a long time,” explained Nandan Nayampally, director of applications processor marketing in the processor division of ARM. “By that I mean you have microcontrollers taking charge when the big application processor is not working, or you’ve got video engines being separate from the main application processing. The compartmentalization of the activities around the chip have been always a focus for mobile because you will save power any which way you can. That’s a given.”

ARM has observed that what has changed recently is that the main OS needs to be running more and more of the time, because small apps such as Twitter feeds and Facebook updates run constantly on top of the OS.

As fun and/or useful as they are, these apps are killing battery life.

Nayampally explained the big.LITTLE architecture with an example. “Let’s say I’m doing an MP3 playback in the old days. You’d say, ‘I’m running on the big core, I kick off the task to a little core and then turn off the big processor because the MP3 can run just fine on a microcontroller type device. It’s all on the same die. Then suddenly you get a call and it wakes up the big processor and it takes over again. But when you offloaded that MP3 in the olden days—six months or so ago—you actually could have a separate task that wasn’t really run by the OS. Now there are so many more things and services that people are coming to expect that you can’t have them done specifically for targets that are different from the application processor itself and they run on top of the OS. Now you are telling the chip, ‘No, I won’t do these specialized things as separate things for very power-efficient sub-components, they have to be done by the main processor.’ But the main processor also has to become very schizophrenic in the level of performance it requires for the main tasks as well as what it needs for the little tasks.”

Source: ARM

What makes big.LITTLE interesting is that the processors are fully coherent so the software engineer doesn’t have to worry as much about maintaining every piece of data. The coherency in hardware takes care of that. That makes the software development quicker and can actually improve performance and battery life.

Designed to be an extension of DVFS, there are multiple use models in which big.LITTLE can work, with the simplest use meant to be effectively transparent to the OS, Nayampally continued. “The power management software always speaks to a driver that is the right power and performance needed based on what is required. If, for example, you had today’s processor and it was using the lowest performance level it could while doing Twitter update, it just can’t be as efficient as something that was designed to be a fifth smaller or something like that. What if your DVFS had a next step that is more efficient and you can work there for a while? From an OS standpoint, or an application standpoint, it doesn’t matter. It’s just another step in your DVFS. Underneath it what happens is the driver now can do the kick-off to switch the operations from the big core to the little core or from the little core to the big core or cluster in fact.”
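The "just another step in your DVFS" idea can be illustrated with a minimal Python sketch. This is not ARM's driver or any real OS interface: the operating-point table, its numbers, and the names `OperatingPoint`, `select_opp` and `needs_cluster_switch` are all hypothetical, chosen only to show how a little-core step can sit below the big core's lowest step while the caller simply asks for a performance level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    cluster: str      # "little" or "big" (hypothetical labels)
    freq_mhz: int     # clock frequency of this step
    perf: float       # relative throughput, arbitrary units
    power_mw: float   # illustrative power figure, not measured data


# Hypothetical table: the little-core points act as extra low-end DVFS steps.
OPP_TABLE = [
    OperatingPoint("little", 300, 1.0, 30.0),
    OperatingPoint("little", 600, 2.0, 80.0),
    OperatingPoint("big", 500, 3.0, 250.0),
    OperatingPoint("big", 1000, 6.0, 700.0),
    OperatingPoint("big", 1500, 9.0, 1500.0),
]


def select_opp(required_perf, table=OPP_TABLE):
    """Pick the lowest-power point that meets the requested performance.

    If nothing is fast enough, fall back to the fastest point available.
    """
    candidates = [p for p in table if p.perf >= required_perf]
    if not candidates:
        return max(table, key=lambda p: p.perf)
    return min(candidates, key=lambda p: p.power_mw)


def needs_cluster_switch(current, target):
    """True when moving between steps also means migrating between clusters."""
    return current.cluster != target.cluster


background = select_opp(1.5)   # e.g. a light background update
browsing = select_opp(4.0)     # e.g. a heavier interactive task
print(background.cluster, background.freq_mhz)
print(browsing.cluster, browsing.freq_mhz)
print(needs_cluster_switch(background, browsing))
```

In this sketch the caller never names a core. It asks for a performance level, and only the cluster-switch check underneath knows that a migration between the little and big cores is required.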

NVIDIA’s Tegra 3 employs variable symmetric multiprocessing (vSMP), while Qualcomm uses asynchronous symmetric multiprocessing (aSMP). Both follow the same principles that govern ARM’s big.LITTLE architecture.

NVIDIA’s Tegra 3, launched last November, is a quad-core mobile processor for smartphones and tablets, currently shipping in the ASUS Transformer Android tablet. A company spokesman explained that behind Tegra 3’s power efficiency is a fifth, lower-power “companion” CPU core that goes with the four main CPU cores and is specifically targeted at battery savings. Tegra 3’s architecture allows it to provide the best combination of performance and battery life by switching between the four main CPU cores and the fifth core, which handles less demanding tasks and active standby mode.

For CEVA, which licenses DSPs, programmability has always been the name of the game, according to Eran Briman, the company’s vice president of marketing. About seven years ago it became apparent that general-purpose DSPs were not going to make the cut for next-generation designs, particularly in 40nm communications designs. In one of its newest offerings, the CEVA-XC DSP software-defined radio architecture, users can run the complete receive and transmit channels entirely in software, except for a very few hardware engines whose functions simply don’t make sense in software, he said. To accompany this, CEVA recently released a software development kit that includes advanced power management. Looking ahead, Briman believes there will be fully programmable communications units on SoCs.

CEVA isn’t the only company in the DSP space to see this trend.

“Many baseband designs particularly, when they are operating on complex protocols and care a lot about energy have moved to neither completely hard-wired—because that would be too fragile or intolerant of inevitable corrections and improvements—nor completely general-purpose, because a general-purpose processor is generally much less energy-efficient than something that is more specific to the task at hand,” observed Chris Rowen, CTO at Tensilica. “Especially in low-power baseband processing, we’re seeing more and more optimization of programmable engines to do this, where the baseband subsystem might include 6 or 8 or 10 different cores that are programmable. Some of these still may be fairly general-purpose, because you may say in this function though there’s a wide variety of different tasks that I need to do on the data and it is more energy efficient for me to have one that is shared among these different, diverse functions than to have one piece of hardware for every single function. That would make it too big. Having a programmable solution can in some cases also make it a smaller solution. In general, small is good for energy.”

Tensilica offers a range of DSP cores. It also allows users to build their own customized dataplane processors.

Power Bits: The Price Of Power

Thursday, January 12th, 2012

By Ed Sperling
The Consumer Electronics Show used to be about cool gadgets and really fast performance. Now it’s about really cool gadgets—the kind that use less energy.

Witness Intel’s big announcements at CES this year. The company showed off its Medfield processor aimed at the tablet market, and it entered a multi-year, multi-device relationship with Motorola based on Atom processors and Android. It even made a big splash about the ultrabook market, which is important now that the netbook market has largely evaporated. At the heart of all of this stuff is lower leakage—remember the finFET—less energy consumption, and reasonable performance.

Texas Instruments made a big push into the low-power Bluetooth market with a low-energy SoC it claims uses 33% less power. The company claims the chip will enable low-power sensors that can operate for more than a year on a coin-cell battery.

Even Nvidia, which has always been about raw performance, is now pitching power conservation through its fifth core—a lower performance, lower-power addition to its four-core Tegra 3 chip. In many ways, this does on a single chip implementation what laptop makers are doing with multiple chip implementations.

While each of these moves separately might be a market test, collectively they speak about a much more important trend in design. Making the battery last is now every bit as important to a design as area, performance, features and cost.

Power Bits: Hidden Cores

Friday, September 23rd, 2011

Nvidia has an interesting surprise. Its upcoming four-core Tegra GPU actually has five cores. The extra “companion” core will be used for less-compute-intensive tasks to save on battery and includes an ARM Cortex-A9.

This is a new idea for a processor company, whether it’s a GPU or a CPU. It’s not a new idea for a systems company. Depending on how you define the system, SoC makers have been doing this for the better part of a decade and Dell has been offering similar approaches in its laptops for years.

But what’s intriguing here is that Nvidia is basically turning the GPU into an SoC, and if you had mentioned that to Nvidia five years ago its executives probably would have stared at you like you were from another planet. But given Intel’s push into the SoC world, this is no longer such a foreign concept. Nvidia has just released a white paper on the subject.

One of the interesting side notes in that paper is a hint at a basic flaw in Android 3.x, which appears to suffer from the same limitations as more mature OSes such as Windows and OS X. Android supports multiprocessing, but it assumes all cores are created equal. They are, but they shouldn’t be if power is an issue, which raises questions about what exactly general-purpose processors and operating systems will be used for, or limited to, in the future.

The approach that Nvidia has come up with is variable SMP, meaning cores get used as needed and tasks are split depending on where they can run most efficiently. It doesn’t make sense, for example, to do background maintenance on a GPU, while it also doesn’t make sense to do high-performance tasks on an A9. Efficiency is now the driver, and we are simply at the starting point for re-engineering just about everything.

–Ed Sperling

The Missing Pieces In Power Modeling—And Who’s Going To Provide Them

Thursday, February 10th, 2011

By Ed Sperling
The push to develop power models is growing at each node, and at 22nm it will be virtually impossible to proceed without one or more models for power.

Providing these kinds of models is easier said than done, however. Creating an accurate power model requires accurate data from all the other pieces on a chip that potentially can affect the power. That includes how third-party IP is actually used, the interaction of multiple states, and even how software utilizes a processor.

Consider, for example, a virtualization layer that is added into a consumer device—an approach now under widespread consideration among device manufacturers because not all of the functions can take advantage of multiple cores. At the architectural level this makes perfect sense because virtualization simultaneously maximizes performance and utilization, which is a winning formula for efficiency. The problem is that using more cores also uses more energy, and the distribution of average use may vary greatly depending on applications or the interaction of applications. Running multiple games, for example, could drain a battery in a fraction of the time it would normally last for a voice call or playing music. And multitasking can greatly accelerate battery drain.

That’s only part of the issue, though. Higher utilization generates more heat through both dynamic (switching) current and static leakage current. The more functions in use, the greater the dynamic current. That can affect everything from signal integrity to the ability of memory to function properly to the overall lifespan of a device. And it can make modeling extremely difficult.

“This is a function of the operating system, or whatever software layer you’re using,” said Rob Aitken, an ARM fellow. “You determine the wake-up time and if it’s supposed to shut down different cores. But you can’t power it up right away because the IR drop would be too large, so you have to power up slowly. That means you have to model a speed limit on how quickly it wakes up.”

The challenges grow as more voltages are added for different CPUs. “If you’re operating a CPU at one voltage and the next at a different voltage you get an IR drop across the buses,” said Aitken.
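Aitken’s “speed limit” on wake-up can be pictured with a toy Python model. It rests on assumptions that are not in the article: that a power domain is turned on through groups of power switches, that each group draws a fixed inrush current when enabled, and that the supply network tolerates only a fixed inrush budget per time step. The function name and all numbers are hypothetical, and a real flow would rely on transient analysis rather than arithmetic like this.

```python
def wake_up_schedule(total_switch_groups, inrush_ma_per_group, max_inrush_ma):
    """Return how many switch groups to enable at each wake-up step.

    Toy model: every group enabled in a step adds a fixed inrush current,
    and the sum per step must stay within max_inrush_ma to bound IR drop.
    """
    if total_switch_groups < 0:
        raise ValueError("total_switch_groups must be non-negative")
    if inrush_ma_per_group <= 0 or max_inrush_ma < inrush_ma_per_group:
        raise ValueError("budget must allow at least one group per step")

    per_step = int(max_inrush_ma // inrush_ma_per_group)
    schedule = []
    remaining = total_switch_groups
    while remaining > 0:
        enabled = min(per_step, remaining)
        schedule.append(enabled)
        remaining -= enabled
    return schedule


# Hypothetical domain: 7 switch groups, 50 mA inrush each, 160 mA budget.
steps = wake_up_schedule(7, 50.0, 160.0)
print(steps)       # groups enabled per step
print(len(steps))  # number of steps = modeled wake-up latency
```

The length of the returned schedule is the modeled wake-up latency in steps. Tightening the inrush budget lengthens it, which is the trade-off a power model has to capture.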

New tools
Most of the large chipmakers have developed their own power models, which are specific to their particular designs. This isn’t something many chipmakers see as a core competency, however, which is why a number of EDA companies have put stakes into this market.

One of the most ambitious efforts comes from Apache Design Solutions, which has created a chip power-modeling tool. It’s an important start, but the accuracy depends on a lot of other factors beyond Apache’s control. That explains why Apache is working with the GSA to create some standards in the IP world.

Startup Parallel Engines is providing details about the available information on power, as well, for about 12,000 pieces of IP. But the accuracy of that information varies, in part, depending on how it is used.

“The power model of a chip needs to include accurate characterization of the multiple IPs that are included in the design,” said Dian Yang, Apache’s general manager. “But if those vendors supplying the IP do not give enough details about its power parameters and behavior, the resulting model will not be very accurate. Also, an accurate model needs to know things like the impedance of the die. But a simple power number based on an average estimation does not tell you that. You need a model that is based on transient analysis to address the dynamic behavior and the true impedance of the die.”

All three of the largest EDA vendors have worked to build power intent models, which help greatly on the functional verification side. Both Synopsys and Mentor Graphics back the Unified Power Format, while Cadence backs the Common Power Format. There has been work to bridge those two specifications by major standards organizations such as Si2 and Accellera. But no matter how much the EDA vendors and standards organizations insist that those differences are easy to bridge, that’s not the experience of chip companies.

“I have major issues with these standards,” said Sunil Malkani, director of IC design engineering for the GPS group at Broadcom. “The standards for power intent don’t work together, and sometimes the previous versions of those standards don’t work with the current standard.”

He’s not alone in that viewpoint. John Busco, senior manager for design implementation at Nvidia, said the very existence of competing standards defeats the purpose of having them in the first place.

“I’m a little more forgiving when the standards don’t do everything you want them to do,” he said. “My pet peeve is dueling standards like CPF and UPF.”

While EDA vendors publicly don’t like to challenge their customers or potential customers, they say privately that more often it’s the fault of the IP and the way it’s being used than the power intent models themselves. “The user can capture the intended behavior of the design already, and if they add a few more lines of code involving the IP they can make sure the power intent is captured, too,” said one EDA insider.

Mixed models
The power intent specifications are particularly important in the verification stage, which remains the most time-consuming part of the design process. Those design intent specs are integrated with the power models, allowing engineers to map the power limits of the chip and the safe parameters for operation. But in the IP world, and even when it comes to reusing blocks and subsystems, there are not always power models available. At that point, the best that can be hoped for is that the existing models are power-aware.

“The biggest problem our customers are impacted by is legacy models,” said Prapanna Tiwari, CAE manager at Synopsys. “They were created when low power was not a concern, and the models don’t comprehend voltage. The second problem is that even if they want to create a power-aware model, they can’t do the entire power network in Verilog and hook it up to every power model that is being created.”

Limiting choices
Another major problem is the sheer number of choices that are available to designers and architects of these chips. The number of variables increases with each new process node, as well as the proximity effects of other components in an SoC, packaging, what software is being used and how it is being used, multiple cores, multiple states, multiple voltages and ultimately 3D stacking. Add to that multiple IP options and the effects on power models become overwhelming.

“We may well see standards for limits on the number of power models that are available,” said ARM’s Aitken. “If you look at the 1801 standard (UPF 2.0), there are certain things that are legal in it and certain things that aren’t. This could well be the direction.”

That doesn’t mean having a menu of choices will make SoC development any easier, but at least it would limit the number of variables that engineers have to wrestle every time they decide to integrate third-party IP or re-use their own IP. Still, there are a lot of changes to be made before even this step happens. As with all SoC engineering, nothing is guaranteed and not everything is predictable.

Power Bits: Jan. 7

Friday, January 7th, 2011

By Ed Sperling
Microsoft will develop its next version of Windows for AMD, Intel and ARM SoCs. The emphasis is on SoCs, and the focus of SoCs has been on two things: power and the reusability of existing and commercially developed IP.

This is an interesting challenge for Microsoft, as well as for Intel, AMD, and ARM’s slew of partners. A general-purpose OS takes a lot more code to create—and it takes a lot more power to use—than a real-time operating system or an embedded version. The result is greatly reduced battery life and more time with a plug in the wall. Even open-source Linux has the same problem, which is why companies such as Mentor Graphics offer a slimmed down embedded version.

The big question for architects of these SoCs will be one of priorities. What takes precedence? Is it processing power? Is it performance? Or is it segregation of more efficient code for individual cores?

Microsoft’s announcement doesn’t address these kinds of issues. Intel has said next to nothing other than a canned statement from Douglas Davis, VP and GM of the tablet group: “…what is so exciting is how our two companies will be able to match a tailored, low-powered operating system with future generations of our popular Intel Atom processors…”

And comments from ARM and ARM customers Nvidia, Qualcomm and TI have been no more enlightening. This isn’t a simple problem to solve while maintaining backward compatibility with bloated applications developed when power efficiency was far less critical than ease of use and connectivity. And it’s not one that anyone is likely to be talking about for at least a year. But when they finally do start talking, it will be very interesting to hear how these companies will position Windows and its very large code base.

Power Or Performance?

Thursday, June 10th, 2010

By Pallab Chatterjee
Most microprocessors have shifted to new small-geometry processes in order to be as efficient as possible at both power and high performance. However, there is always a trade-off between power, performance and area (PPA) for semiconductors, and this is especially relevant for processors. In the current design space, processors are created as general-purpose products, but they are generally put into user applications that need to be optimized for either power or performance.

The main CPU processors, such as Intel’s iX series and AMD’s Phenom II series, and Nvidia’s video GPU products are routinely not operated at their standard performance specifications. They are either over-clocked or operated at alternate core voltages in end-user applications to increase performance and data throughput. Because the processors are operated in a non-standard condition, the design requirements have to include acceptable limits for these additional modes of operation. The chip cores can be run at a higher voltage, up to 50% more than the standard voltage, and the main clock rate for the chips, the core master clock, may be as much as 50% faster than the nominal clock frequency. To support all of the other functions such as thermal management, I/O and memory interface, and the standard bus handshake, the chips need additional control logic to support operation at different performance specifications.

Nvidia’s GPUs support additional power supply modes and connections as standard. The nominal core voltage is 1.2V and can be increased up to 1.4V. This configuration alone does not maximize performance. Key portions of the chip also need to be over-clocked, both with and without the voltage adjustment. This over-clocking needs to be balanced across the portions of the chip that get the performance increase so the design does not overrun the local memory or the bus interface and introduce wait states. When these performance changes are made, they are static changes that affect the overall configuration of the graphics board and fan.

Parameters that can be adjusted include: the FSB, memory bus, AGP bus, PCI-E bus, GPU core clock, GPU memory bus, memory timing registers, and hardware-specific performance tuning registers. As these changes affect the dynamic power of the board, fan and cooling controls are included to help keep the design at the nominal die operating temperature. The higher-performance operation can increase the die temperature by as much as 20C if upgraded cooling is not applied. Due to the complexity of the performance enhancement, the voltage scaling and clock scaling are no longer done by just putting in a different regulator and a different crystal.
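A rough sense of why cooling becomes part of the design comes from the usual first-order CMOS approximation that dynamic power scales with capacitance times voltage squared times frequency. That model is a textbook simplification, not something stated in the article, and it ignores leakage and changes in switching activity. The Python sketch below applies it to the 1.2V-to-1.4V core voltage range quoted above, and, as an assumption for illustration only, combines that with a 50% clock increase.

```python
def dynamic_power_ratio(v_nominal, v_new, f_nominal, f_new):
    """Relative dynamic power under the first-order model P ~ C * V^2 * f.

    Capacitance and switching activity are assumed unchanged, so they cancel.
    """
    return (v_new / v_nominal) ** 2 * (f_new / f_nominal)


# Voltage raised from 1.2V to 1.4V with the clock left at nominal.
voltage_only = dynamic_power_ratio(1.2, 1.4, 1.0, 1.0)

# Same voltage change combined with a 50% faster clock (assumed pairing).
voltage_and_clock = dynamic_power_ratio(1.2, 1.4, 1.0, 1.5)

print(f"{voltage_only:.2f}x")       # 1.36x
print(f"{voltage_and_clock:.2f}x")  # 2.04x
```

Under these assumptions the voltage bump alone costs about 36% more dynamic power, and adding the clock increase roughly doubles it, which is consistent with the need for fan and cooling controls.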

To control these changes and make sure that the chip still operates in a safe design area, Nvidia has produced a control software program, nTune, that lets end users adjust these parameters.

General-purpose CPU processors have a few more data dependencies than GPUs, but have the same customer performance issues. Since CPUs were introduced, people have been pushing the performance aspect of the PPA tradeoff. Just as with GPUs, you can adjust the core voltage and also over-clock portions of the chip. Unlike the GPU, the settings are not static and do not produce the same results under all data conditions. For this reason, the higher-performance processors now have automated algorithms for performance improvement based on the data set.

For the Intel processors this is part of the “Turbo Mode,” which does an automatic over-clocking for the duration of the processor operations that need the higher performance. The power envelope for the processor design, including the thermal management, has to take into account these dynamic over-clock modes in addition to traditional systematic over-clocking. Unlike most SoCs, processor designs and most multi-core embedded designs have data-dependent timing and performance characteristics as well as user-adjustable application ranges.

Power Bits

Thursday, April 8th, 2010

By Ed Sperling
Nvidia is jumping into a slew of non-traditional low-power markets with its Tegra 2 chips. Given this is a combination of ARM Cortex-A9 cores and GPUs, it’s an interesting play across a wider variety of consumer markets than the computer graphics market that Nvidia grew out of. This draws battle lines against a slew of new competitors ranging from Intel (in the non-GPU areas) to Freescale and Texas Instruments. The key differentiators: speed, price and battery life.

While consumers are very aware of just how much battery life they get, that arguably pales compared to what’s going on in medical and industrial equipment. If your portable medical device runs out of battery it can be life-threatening. And if your industrial control is sitting in a place where you can’t easily replace a battery, it can bring down an entire assembly line until it’s fixed. Analog Devices seems to be very aware of this with its new line of SHARC processors. You’d guess that others can’t be far behind.

You’d probably guess right, too. STMicroelectronics introduced its own low-power motor controls for everything from air-conditioners to washing machines. The company also has come out with evaluation platforms to let potential customers simulate its chips in action.

And if you’re wondering why companies are paying so much attention to LED lighting, check out the specs on a new GE bulb: 9 watts for the equivalent of a 40-watt incandescent bulb. That may explain why many local governments are ripping out the sodium vapor streetlights and replacing them with LEDs.

Low-Power Architectures Go Mainstream

Thursday, January 14th, 2010

By Pallab Chatterjee
Until recently, low power engineering has been defined by the automated use of EDA tools in the design flow to help cut back on peak dynamic power. The new generation of mobile and video products has forced a change in that methodology.

There are two other fast rising architectural approaches. The first is multicore, which is prevalent in new product introductions from Nvidia, Samsung SLSI, Imagination Technology, NetlogicMicro, Broadcom, and Qualcomm. To address the usability specs required by e-readers, mobile Internet devices and other mobile information products, a new compute architecture was needed that did not just rely on “function disabling” as a power reduction technique. All of these companies introduced designs that are focused on multicore architectures, where there is complete functionality available at all times even though the process has been optimized for low power.

This low power optimization has to do with custom library design creation, modification of internal clocking schemes, datapath and buffer optimization, memory segmentation and placement, and most importantly dynamic control of the design’s power use and speed based on the data content of the information being processed on a per-packet basis. This re-architecture of products was the key enhancement with the new dual Cortex Nvidia Tegra, which is targeted to e-readers and tablet PCs, as well as the high-performance Alchemy multicore and multithreaded processors for automotive and navigation applications, and the many new video and communications appliances from Broadcom and Qualcomm.

The basis for most of these systems is ARM processor cores (A8 or A9 primarily) or MIPS cores. This shift has allowed both a performance increase in the end systems and a near doubling of the operating battery life.

The second prevalent low-power methodology is the segmentation of the design into a CPU and a GPU rather than a single compute engine. While the initial impression is that this takes more power, the GPU is actually more power-efficient on graphics and some video data than the CPU, and on general-use functions the CPU is more power-efficient than the GPU. For most of the smart phones and media processing chips, this approach has replaced bigger single-processor cores with clock-gating and multi-voltage device process solutions.

These architectural changes were implemented to address both the data dependence of the power use and the yield-process variability of sub-wavelength manufacturing. As most of the applications have a very thin and small form factor, they are bound by a fixed or diminishing power envelope. To extend operating time, the components can lower the operating voltage, but that alone does not account for the associated reduction in performance within the power envelope. To address this aspect of design, mobile handset and mobile computing requirements have driven designs to the smallest geometry process flows available.

The utilization of these processes (45nm and 40nm, currently) requires restricted design rules, restricted topologies and limited device size diversity to yield well. These designs are optimized with new RTL and physical libraries, new floor plans, and power routing to highlight the data path symmetry that is required by the data sets being processed. One example is Samsung’s new 40nm 3D media processor for mobile phones, which utilizes the IMG Tech 3D video and graphics engine and a high-performance, ultra-low-power ARM CPU.

The distributed multicore approach also has been used in high-performance products to lower power. AMD/ATI introduced the 5970 Radeon graphics card at the Consumer Electronics Show. The card has two GPUs and is a DirectX 11 product with more than 4.6TFlops of peak performance. The restructuring of the device/cell library, its reliance on proven 40nm bulk CMOS processing and the use of GDDR5 memory allow the product to operate with a peak power of about 300 watts while requiring only 51 watts for nominal operation. The design was optimized for power and a data control flow to support the 3200 parallel stream processors and the 160 texture units. Dynamic power is managed based on how many streams and texture units are needed at any time, which depends on the contents of the data being processed on any given cycle.

Most of these new systems are targeting use of Samsung’s low-power DDR3 memory, which operates at 1.3 volts vs. 1.5 volts and offers higher densities than DDR2. These higher-density, low-power solutions can provide in excess of 35% overall power footprint reduction for the design, if used with 32nm low-power flash memories in SSD applications rather than rotating media.

The takeaway from CES this year is that architectural engineering and new firmware control methods are now seen as essential to address the functional requirements of the new mobile communication and processing platforms. This is an intelligent shift from recent years, when only feature size reduction and blind tool-based selection of power gating and power routing were in vogue.

First Down On The 40nm Line

Tuesday, June 30th, 2009

The race to 40nm is over. Some chipmakers are already there, taping out designs and implementing IP that has already been qualified at the 40nm process.

When exactly volume production begins and when yields improve is a matter of conjecture. TSMC so far is the only major foundry actively using the 40nm process, which is a half-node beyond 45nm. But the Common Platform already has briefed analysts and customers on its 40nm process, even though most of its work is at 45nm, and GlobalFoundries, the AMD spinoff, has 40nm ready to go if there is customer demand.

A side benefit to consumers—and a big headache for design engineers—is that the power envelope continues to shrink with the line-widths. Low power is now standard in every design, which puts pressure on all IP vendors to create low-power versions at least concurrently with their newly qualified IP, if not first—or to make all versions low power. In the past, low-power versions typically trailed initial rollouts by 6 to 18 months.

And while that doesn’t mean all pieces of an SoC design need to be manufactured using a 40nm process—non-volatile memory, for example, is still at least a node behind—it does mean that research is well underway and on track for 32/28nm and that 40nm appears to be a relatively stable manufacturing process.

AMD, with its ATI line, and Nvidia both have 40nm versions of their latest graphics processors, which typically run at the leading edge of Moore’s Law because there is far greater potential for using more cores with existing software than many other chips. Video, in particular, is one of the easier applications to write for multiple cores because graphics rendering can be parsed into discrete units.

Low power everywhere

The power envelope in a more densely-packed piece of silicon has to be significantly lower, however. Signal integrity is a growing problem, according to design engineers, in part because of the density and the amount of current moving through the wires. Higher density also opens up real estate on a single chip for more functions that previously were on multiple chips or even multiple devices.

All of that points to lowering power wherever possible. And it means that to be successful in the market, low power design is a must. Virage Logic, which makes a variety of memory and logic IP, saw the trend clearly at 65nm when it incorporated low-power options into all of its IP instead of offering a separate low-power version.

“At 40 nanometers, if you want to create a new chip it has to be low power,” said Brani Buric, Virage’s executive vice president of marketing and sales. “We used to have high-density, high-speed and low-power versions of our IP. At 40nm, there are no separate low power products. There is a full set of low power features in both our high-density and high-speed IP, whether that’s memories or logic.”

AMD’s graphics processor group rolled out its first product at 40nm this spring. Stan Ossias, director of product management in AMD’s global/discrete graphics unit, said the bulk of the company’s work is still at 55nm and the company got a huge performance gain by re-architecting its 55nm chips.

“A lot of what we do has to do with predicting the readiness of the process at any time,” said Ossias. “We capitalize on the IP that’s available and the design we have to maximize our competitiveness. Last year, we had the choice of going to 40nm using the same architecture, but we thought we could do a better job of reaching our performance goals by redesigning the architecture. We didn’t feel the 40nm process was ready.”

That approach is one that is becoming more common among companies that typically hopped from one process node to the next in the past. The complexity of getting to the next node, along with the rising costs and uncertainties about manufacturability, yield and the IP needed in a design—not all IP available at 40nm has been proven in silicon yet—makes each new process node an increasing risk, and one that is no longer just an automatic decision.

At least part of the risk assessment also has to do with power consumption. Each new node requires reducing power consumption, which involves a litany of design tricks ranging from clock gating for active power to power gating and power islands for static leakage, different gate structures and a variety of exotic insulation materials.

“Power is one of the fundamental areas we think about with technology evolution,” said Ossias. “Every time we shrink the process, we have to put more and more effort into decreasing power. That involves not just the individual device, but how that device interoperates with other devices. It’s a big consideration.”

40 vs. 45nm

Even moving from 45nm to 40nm is raising some questions. The foundry business is extremely competitive and having the next process used to be a competitive advantage, but so far only TSMC is actively pushing 40nm. The foundry told analysts that it opted for 40nm instead of 45nm because the process could be tuned better for device performance.

Joanne Itow, managing director of manufacturing at Semico Research, said the number of half nodes is exploding. She said that gives both foundries and companies a chance to firm up the processes and move more gradually to the next full node. The Common Platform, for example, is working on 28nm, which is the half node between 32nm and 22nm.

Global Foundries, which is the AMD spinoff, will work with customers for a specific implementation at 40nm or refine its bulk 45nm process, according to spokesman Jon Carvill. But he said the next step under development is a 32nm and 28nm bulk CMOS process.

Still, now that the foundries have reached the node and are working on the next one, the question remains of just how many chipmakers will move to the next half node and how quickly. There is a lot of conjecture now that the pieces are falling into place for 40nm production, but so far there are no definitive answers.

On, Off and Mostly Off

Friday, March 13th, 2009

By Ed Sperling

System-on-chip architecture has always been about getting the most performance out of a device, and the basic premise is that when you turn on a device it is always on.

That approach has been challenged over the past few years with a fundamental shift toward more of the design being in the ‘off’ position. Aside from reversing decades of engineering practices and assumptions, that accomplishes a couple of very significant things.

First of all, with static leakage a persistent issue in all devices at 90nm and below, the simplest thing to do from the standpoint of the device’s power budget is to turn parts of a chip completely off. That has become the norm in most designs, which is why the number of power domains is growing. Some use different voltages, some are turned off completely when not in use, and still others are reduced to various levels of standby, depending upon how quickly they need to return to a full “on” position. All of this saves battery life in handheld devices, and it saves power in large racks of servers in data centers.
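The trade-off between leakage and wake-up time can be illustrated with a small sketch. The state names and the leakage and latency figures below are hypothetical, chosen only for illustration and not taken from any real device. The function picks the idle state with the lowest leakage whose wake-up latency still meets a block's response requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    leakage_mw: float       # hypothetical static power while in this state
    wake_latency_us: float  # hypothetical time to return to full "on"


# Illustrative states only; real designs define their own per domain.
STATES = [
    PowerState("off", 0.0, 1000.0),
    PowerState("deep_standby", 0.5, 100.0),
    PowerState("light_standby", 2.0, 10.0),
    PowerState("on", 10.0, 0.0),
]


def pick_idle_state(max_wake_latency_us, states=STATES):
    """Return the lowest-leakage state that can wake up in time."""
    candidates = [s for s in states if s.wake_latency_us <= max_wake_latency_us]
    return min(candidates, key=lambda s: s.leakage_mw)


# A camera block that can tolerate a slow wake-up can be fully off,
# while a block that must respond within 50 microseconds cannot.
camera_state = pick_idle_state(5000.0)
modem_state = pick_idle_state(50.0)
```

Under these assumed numbers, the tolerant block lands in the off state and the latency-sensitive block in a light standby, which mirrors the per-domain choices described above.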

From an architectural standpoint, the key concern has been prioritization of function and what is most important to the consumer. In a smart phone, for example, the phone must be able to receive a call at all times while data needs to be uploaded regularly but not in place of a phone call. And a camera can be switched off almost all the time. In a television or computer, almost all functions are on at all times, but in a more acceptable state. The long delay in booting up a computer from scratch or waiting for a television to warm up was considered unacceptable by consumers so a standby mode was added, basically giving priority to their time while reducing energy consumption.

In the future, however, more of the device will move to the off position, regardless of whether it’s a home appliance, a computer in the home or in the corporate enterprise, or a handheld device with limited battery life. Work is underway to develop intelligent devices that reside inside plugs so that once devices are fully charged they no longer draw current.

Ferroelectric memory (FeRAM) is another option in devices. The construction works the same way as DRAM, but it uses a ferroelectric layer rather than a dielectric one. The advantage is lower power draw, higher speed and more write-erase cycles. So far, cost has been a deterrent, but with power now at a premium in designs, experts believe there is some hope that FeRAM could grow as part of an overall low-power design.

The more immediate solution, however, is multiple power islands, according to Bhanu Kapoor, founder of Mimasic, a low-power design services company. The problem comes when you turn those islands on and off.

“It’s not hard to imagine a situation where you go from ‘standby’ to ‘on’ and then to a large portion of the chip being ‘on,’” said Kapoor. “That can lead to voltage spikes on the device, and it gets worse as you move to many-core computing where you have a large number of processing cores.”

He noted that Nvidia is developing a 512-core graphics chip that is highly parallel with cores divided into groups of 24—a many-core approach as differentiated from a multicore approach. That could create as many as 30 power islands, however, and he said each of those islands has to be sequenced to avoid huge power spikes. From a design standpoint, that is no simple task.
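The sequencing problem Kapoor describes can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular chip does it: each island has a hypothetical inrush current when it powers up, and islands are woken in consecutive steps so that the total inrush in any one step stays within an assumed budget.

```python
def stagger_wakeup(inrush_ma, budget_ma):
    """Group islands into sequential wake-up steps.

    inrush_ma: hypothetical inrush current per island, in milliamps.
    budget_ma: assumed maximum total inrush allowed in a single step.
    Returns a list of steps, each a list of island indices.
    """
    steps, current, load = [], [], 0
    for island, ma in enumerate(inrush_ma):
        if ma > budget_ma:
            raise ValueError(f"island {island} alone exceeds the budget")
        if load + ma > budget_ma:
            # This island would push the step over budget: start a new step.
            steps.append(current)
            current, load = [], 0
        current.append(island)
        load += ma
    if current:
        steps.append(current)
    return steps


# Thirty islands with an illustrative 100 mA inrush each, 400 mA budget.
islands = [100] * 30
schedule = stagger_wakeup(islands, budget_ma=400)

peak_all_at_once = sum(islands)
peak_staggered = max(sum(islands[i] for i in step) for step in schedule)
```

With these made-up figures, waking everything at once draws the sum of all thirty inrush currents, while the staggered schedule never exceeds the budget in any step. The cost is a longer wake-up sequence, which is the design trade-off that makes the task far from simple.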