Posts Tagged ‘GPU’

Applications And Low Power

Thursday, May 12th, 2011

By Pallab Chatterjee

As new process technologies are being developed to make devices smaller, they are also driving the operating power lower for the devices and systems.

The goal is to reduce the power requirements for the system and hence increase the functional life on a single battery charge. This concept has worked in the semiconductor industry from 10-micron processes down to the 65nm process node. Below 65nm, the rules are changing, not because of the process that is available to manufacture the device but because of what people are doing with the devices.

At the 40nm node and below, billion-transistor chips are possible. These are not practical at the larger geometries for a number of reasons, all related to manufacturability. The trouble with a billion-transistor chip is that it does a lot of different things and has a lot of computing capability. To support this within a reasonable power budget, the design uses multiple power grids and power controls, so blocks can be turned on and off as needed. This technique helps extend battery life by running only what is needed for a given operation. This is the default direction for general-purpose processor cores and memories in the industry.
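
The block-level switching described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's power-management API: the `PowerDomain` and `PowerController` names, the four blocks, and every milliwatt figure are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PowerDomain:
    """One independently switchable block on the chip (hypothetical model)."""
    name: str
    active_mw: float        # assumed draw when powered, illustrative only
    gated_mw: float = 0.0   # assumed residual leakage when gated off
    on: bool = False


class PowerController:
    """Keeps powered only the domains that the current operation needs."""

    def __init__(self, domains):
        self.domains = {d.name: d for d in domains}

    def apply(self, needed):
        """Power up the domains named in `needed` and gate all the others."""
        unknown = set(needed) - set(self.domains)
        if unknown:
            raise KeyError(f"unknown power domains: {sorted(unknown)}")
        for name, domain in self.domains.items():
            domain.on = name in needed

    def total_mw(self):
        """Sum the draw of powered domains and the leakage of gated ones."""
        return sum(d.active_mw if d.on else d.gated_mw
                   for d in self.domains.values())


chip = PowerController([
    PowerDomain("cpu", active_mw=500.0, gated_mw=5.0),
    PowerDomain("gpu", active_mw=800.0, gated_mw=5.0),
    PowerDomain("radio", active_mw=300.0, gated_mw=5.0),
    PowerDomain("display", active_mw=700.0, gated_mw=5.0),
])

chip.apply({"cpu", "radio"})   # background download: GPU and display gated
print(chip.total_mw())         # 810.0
```

With all four blocks on, the same model draws 2300 mW, so gating the two unused blocks saves roughly two thirds of the power in this made-up configuration.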

For designs that do not need 1 billion devices, a proportionally large chip can be designed for the function at this node because the extra devices have a very small incremental cost. The trouble with adding extra blocks such as display drivers, graphics cores and accelerators, and connectivity blocks is that they are hard to turn off to save power. It is generally not acceptable to turn off the I/Os and connectivity of an appliance if it will be receiving or transmitting data. There is a low-power spec (IEEE 802.3az, Energy-Efficient Ethernet) that describes how to power down connectivity, but it works only if both sides of the connection support it. Balancing power in these designs is also hampered by the applications that are run on them.

Consider a tablet product. When it is sending or receiving data over WiFi/3G, the display does not have to be active and can be powered down. However, the connectivity block must be on. But if the content being received is streaming video, then the full function of the tablet has to be on to display the graphics, fill the buffers, and handle the connectivity. This changes the battery use model, which is typically designed for low-duty-cycle applications. Watching a streaming movie is not a low-duty-cycle application.
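
The effect of duty cycle on battery life is simple arithmetic: average power is the time-weighted sum of the power in each mode, and runtime is battery capacity divided by that average. The sketch below works one example. The 25 Wh capacity, the 4000 mW active draw, and the 200 mW idle draw are assumed round numbers, not measurements of any real tablet.

```python
def average_power_mw(modes):
    """Time-weighted average power for (duty_fraction, power_mw) pairs."""
    total_duty = sum(duty for duty, _ in modes)
    if abs(total_duty - 1.0) > 1e-9:
        raise ValueError("duty fractions must sum to 1")
    return sum(duty * power for duty, power in modes)


def battery_life_hours(capacity_mwh, modes):
    """Runtime on one charge, ignoring converter losses and battery aging."""
    return capacity_mwh / average_power_mw(modes)


CAPACITY_MWH = 25_000.0   # assumed 25 Wh battery

# Low duty cycle: fully active 10% of the time, near idle otherwise.
browsing = [(0.10, 4000.0), (0.90, 200.0)]
# Streaming video: display, GPU, and radio stay on for the whole session.
streaming = [(1.00, 4000.0)]

print(round(battery_life_hours(CAPACITY_MWH, browsing), 1))    # 43.1
print(round(battery_life_hours(CAPACITY_MWH, streaming), 2))   # 6.25
```

Under these assumptions the same battery lasts about seven times longer in the low-duty-cycle case, which is why a battery sized for occasional use falls short for continuous streaming.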

Another driver of power consumption is how much resolution is needed. Modern DSLRs routinely capture still images of 10MP or more, and video is now almost always 1080p. These large datasets and extended streaming times tax the low-power design, and the chips are not optimized for the high-performance blocks (GPU and NIC) running at a 100% duty cycle.

With the release of general-purpose cores for CPUs and GPGPUs, the low-power implementation cannot be limited to bus architectures and power-down blocks. To effectively support the design in a system (smartphone, tablet, netbook), the application has to be considered (gaming, streaming media, office functions, e-mail, web surfing) and the power profile for each application mode optimized. It is this optimization and the steady-state performance, whether displaying an e-book or streaming a video, that currently drive the power partitioning and the power management methodology that should be used. The verification world now has to consider applications above the OS and use models with sensors/MEMS as the main power handling constraints, not just the “How many devices can I put in the box” mentality that has existed since the mid ’70s.
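
One way to picture a per-application power profile is as a table of steady-state block utilization for each mode. The sketch below computes the resulting power per block and reports which block dominates. The mode names follow the article, but the block list, the full-load draws, and the utilization figures are hypothetical values chosen only to make the arithmetic visible.

```python
# Assumed full-load draw per block in mW (illustrative, not measured).
BLOCK_MAX_MW = {"cpu": 1000.0, "gpu": 1500.0, "radio": 600.0, "display": 900.0}

# Hypothetical steady-state utilization (0.0 to 1.0) of each block per mode.
PROFILES = {
    "e-book":    {"cpu": 0.05, "gpu": 0.0, "radio": 0.0,  "display": 1.0},
    "streaming": {"cpu": 0.5,  "gpu": 0.5, "radio": 1.0,  "display": 1.0},
    "gaming":    {"cpu": 0.75, "gpu": 1.0, "radio": 0.25, "display": 1.0},
}


def mode_power_mw(mode):
    """Steady-state draw of each block, in mW, for one application mode."""
    return {block: BLOCK_MAX_MW[block] * util
            for block, util in PROFILES[mode].items()}


def dominant_block(mode):
    """Block that draws the most power in this mode."""
    per_block = mode_power_mw(mode)
    return max(per_block, key=per_block.get)


for mode in PROFILES:
    total = sum(mode_power_mw(mode).values())
    print(f"{mode}: {total:.0f} mW total, dominated by {dominant_block(mode)}")
```

A table like this makes the partitioning question concrete: in the e-book row the display is nearly the whole budget, while in the gaming row the GPU is, so the two modes call for different optimizations.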

Low-Power Architectures Go Mainstream

Thursday, January 14th, 2010

By Pallab Chatterjee

Until recently, low power engineering has been defined by the automated use of EDA tools in the design flow to help cut back on peak dynamic power. The new generation of mobile and video products has forced a change in that methodology.

Two other fast-rising architectural approaches have emerged. The first is multicore, which is prevalent in new product introductions from Nvidia, Samsung SLSI, Imagination Technologies, NetLogic Microsystems, Broadcom, and Qualcomm. To address the usability specs required by e-readers, mobile Internet devices and other mobile information products, a new compute architecture was needed that did not rely only on “function disabling” as a power reduction technique. All of these companies introduced designs that are focused on multicore architectures, where complete functionality is available at all times even though the process has been optimized for low power.

This low-power optimization involves custom library design, modification of internal clocking schemes, datapath and buffer optimization, memory segmentation and placement, and, most importantly, dynamic control of the design’s power use and speed based on the data content of the information being processed on a per-packet basis. This re-architecture was the key enhancement in the new dual-Cortex Nvidia Tegra, which is targeted at e-readers and tablet PCs. The same is true of the high-performance Alchemy multicore and multithreaded processors for automotive and navigation applications, and of the many new video and communications appliances from Broadcom and Qualcomm.

The basis for most of these systems is ARM processor cores (primarily the Cortex-A8 or Cortex-A9) or MIPS cores. This shift has allowed both a performance increase in the end systems and a near doubling of the operating battery life.

The second prevalent low-power methodology is the segmentation of the design into a CPU and a GPU rather than a single compute engine. While the initial impression is that this takes more power, the GPU is actually more power-efficient than the CPU on graphics and some video data, and the CPU is more power-efficient than the GPU on general-use functions. For most smartphone and media processing chips, this approach has replaced bigger single-processor cores that relied on clock gating and multi-voltage device process solutions.
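
The argument for splitting work between a CPU and a GPU can be shown with a small dispatch sketch: send each job to whichever engine spends less energy on that kind of work. The energy-per-unit figures below are hypothetical and chosen only to reflect the article's claim that the GPU wins on graphics and some video while the CPU wins on general-purpose work.

```python
# Hypothetical energy cost in millijoules per unit of work, by workload type.
ENERGY_MJ_PER_UNIT = {
    "cpu": {"graphics": 9.0, "video": 6.0, "general": 1.0},
    "gpu": {"graphics": 2.0, "video": 2.5, "general": 4.0},
}


def pick_engine(workload):
    """Return the engine with the lowest assumed energy for this workload."""
    return min(ENERGY_MJ_PER_UNIT,
               key=lambda engine: ENERGY_MJ_PER_UNIT[engine][workload])


def total_energy_mj(jobs, engine=None):
    """Energy for (workload, units) jobs; engine=None dispatches per job."""
    total = 0.0
    for workload, units in jobs:
        chosen = engine or pick_engine(workload)
        total += ENERGY_MJ_PER_UNIT[chosen][workload] * units
    return total


jobs = [("graphics", 10), ("video", 4), ("general", 20)]

print(total_energy_mj(jobs))          # 50.0  (split across CPU and GPU)
print(total_energy_mj(jobs, "cpu"))   # 134.0 (everything on the CPU)
print(total_energy_mj(jobs, "gpu"))   # 110.0 (everything on the GPU)
```

With these made-up costs, the split design uses less than half the energy of either single-engine option, even though it has more silicon available to draw power.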

These architectural changes were implemented to address both the data dependence of the power use and the yield-process variability of sub-wavelength manufacturing. As most of the applications have a very thin and small form factor, they are bound by a fixed or diminishing power envelope. To extend operating time, the components can lower the operating voltage, but doing so brings a reduction in performance within the same power envelope, which must also be accounted for. To address this aspect of design, mobile handset and mobile computing requirements have driven designs to the smallest-geometry process flows available.
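
The voltage trade-off can be illustrated with the standard first-order model of CMOS switching power, P = C_eff x Vdd^2 x f. The sketch below compares one task at a nominal and a reduced operating point. The effective capacitance, the cycle count, and the assumption that the clock must drop to 0.6 GHz at 0.8 V are illustrative values, and leakage power is ignored.

```python
def dynamic_power_w(c_eff_farads, vdd_volts, freq_hz):
    """First-order CMOS switching power: P = C_eff * Vdd^2 * f."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


def run_task(cycles, c_eff_farads, vdd_volts, freq_hz):
    """Return (seconds, joules) for a task that needs `cycles` clock cycles."""
    seconds = cycles / freq_hz
    joules = dynamic_power_w(c_eff_farads, vdd_volts, freq_hz) * seconds
    return seconds, joules


C_EFF = 1e-9      # assumed effective switched capacitance, in farads
CYCLES = 1.2e9    # assumed length of the task, in clock cycles

# Nominal point: 1.0 V at 1.0 GHz. Scaled point: 0.8 V, where the logic is
# assumed (hypothetically) to meet timing only up to 0.6 GHz.
for label, vdd, freq in [("nominal", 1.0, 1.0e9), ("scaled", 0.8, 0.6e9)]:
    seconds, joules = run_task(CYCLES, C_EFF, vdd, freq)
    print(f"{label}: {seconds:.2f} s, {joules:.3f} J")
```

In this model the lower voltage cuts the energy for the task from 1.2 J to about 0.77 J, but the task takes 2.0 s instead of 1.2 s. That longer runtime is the performance cost the paragraph above refers to, and in a real chip leakage accumulating over the extra time would erode some of the saving.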

The utilization of these processes (45nm and 40nm, currently) requires restricted design rules, restricted topologies and limited device size diversity to yield well. These designs are optimized with new RTL and physical libraries, new floor plans, and power routing to highlight the datapath symmetry that is required by the data sets being processed. One example is Samsung’s new 40nm 3D media processor for mobile phones, which uses the Imagination Technologies 3D video and graphics engine and a high-performance, ultra-low-power ARM CPU.

The distributed multicore approach has also been used to lower power in high-performance products. AMD/ATI introduced the Radeon HD 5970 graphics card at the Consumer Electronics Show. The card has two GPUs and is a DirectX 11 product with more than 4.6 TFLOPS of peak performance. The restructuring of the device/cell library, the reliance on proven 40nm bulk CMOS processing and the use of GDDR5 memory allow the product to operate with a peak power of about 300 watts while requiring only 51 watts for nominal operation. The design was optimized for power and for a data control flow that supports the 3200 parallel stream processors and the 160 texture units. Dynamic power is managed according to how many streams and texture units are needed at any time, based on the contents of the data that is being processed on any given cycle.

Most of these new systems are targeting use of Samsung’s low-power DDR3 memory, which operates at 1.3V vs. 1.5V and offers higher densities than DDR2. These higher-density, low-power solutions can reduce the overall power footprint of the design by more than 35% when used with 32nm low-power flash memories in SSD applications rather than rotating media.

The takeaway from CES this year is that architectural engineering and new firmware control methods are now seen as essential to address the functional requirements of the new mobile communication and processing platforms. This is an intelligent shift from recent years, when only feature size reduction and blind tool-based selection of power gating and power routing were in vogue.