Experts At The Table: Power Architecture’s Biggest Challenges
By John Blyler
Low-Power Engineering sat down with key members of the Power.org Power Architecture community to discuss today’s leading processor issues. Power Architecture is the general term that denotes all “Performance Optimization With Enhanced RISC” (POWER), PowerPC and Cell processor technologies. The Power.org interviewees were: Kaveh Massoudian, CTO of Power.org Strategic Alliance at IBM; Vinay Ravuri, vice president and general manager of the processor business unit for Applied Micro (formerly AMCC); Dac Pham, fellow and director of Power Architecture cores and platforms at Freescale; and Christina Rodriguez, director of multicore software at LSI. What follows are excerpts from those interviews.
LPE: This is the era of multicore processors. How is that changing both single- and multi-threaded code development?
Pham: Single-threaded performance is still important for all of the legacy code that cannot be parallelized. If you have more than about 10% of sequential code in your program, then Amdahl’s Law shows us that anything over eight cores has diminishing value. Still, when you design a symmetrical multicore system, you have to make both the sequential and parallel sides happy. In the networking world, this divides roughly into data- and control-plane applications. Switching, such as packet forwarding, is performed in the data plane. Routing of packets is performed in the control plane. The data plane lends itself more to parallelism while the control plane tends to be more sequential.
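Editor's note: Pham's point about diminishing returns can be checked with a few lines of arithmetic. The sketch below is not from the interview; it applies the standard form of Amdahl's Law, speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction and n is the core count. The function name `amdahl_speedup` and the list of core counts are illustrative choices, and the model assumes a fixed workload with no synchronization or memory overhead.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's Law for a fixed workload."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part runs on one core; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    serial = 0.10  # the 10% sequential share mentioned in the interview
    for n in (1, 2, 4, 8, 16, 32, 64):
        s = amdahl_speedup(serial, n)
        # Efficiency is the speedup obtained per core.
        print(f"{n:3d} cores: speedup {s:5.2f}x, efficiency {s / n:6.1%}")
```

With 10% sequential code, eight cores give a speedup of about 4.7x and sixteen cores only 6.4x. No number of cores can exceed 10x under this model, which is the diminishing value Pham describes.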
Ravuri: From a high-level, application-space viewpoint, multicore systems are the future. The industry will not go back to single-core devices. I see potentially multiple threads across multiple cores. For example, consider two cores, where each core could have two threads. This would give you four threads in two cores, which is something new in the embedded space.
Massoudian: In 2000 Power Architecture was the first true multicore platform—dual core on the same die—on the market with the Power4. Today, IBM’s Power7 brought the first eight-core chip into the industry. But multicore itself isn’t as important as multithreading and simultaneous multithreading. Each core in an eight-core Power7 can simultaneously run four independent threads. This means that each core is effectively four cores. Intel’s Nehalem can only handle two simultaneous threads per core. Multithreading introduces many challenges in both the system software (operating-system kernel and device drivers) and higher-level application programs. Multithreading is a challenge for programming in general. Multicore technology just extends the multithreading problem to multiple cores. It is really still the same problem. How do you parallelize software? Algorithm architects and software programmers are all human, which means that they tend to think in a serial fashion. That is why all software tends to be sequential.
LPE: How does Power Architecture enable the implementation of different processor types, from basic embedded to hybrid servers, full servers, and high-end mainframe systems?
Ravuri: Some clarification is needed here, since “server” is a big, generic term. Power Architecture has an embedded instruction-set specification known as Book 3E. Similarly, Book 3X is a server specification used by companies like IBM. It is possible to take the embedded specification and produce a server chip, although that chip would not be strong enough to work as a server. The requirements between an embedded computer and a server are quite different from a power, performance and price perspective. You could not make embedded devices and then say that they would work seamlessly in the server market. Power Architecture companies like Freescale, LSI, and Applied Micro are in the embedded market.
Massoudian: A hybrid server usually refers to a general-purpose computer that has been augmented with accelerator co-processors for different applications ranging from security for encryption and decryption, Java engines, XML parsing, vector instructions, graphics or networking offloaded functions, deep packet inspections, etc. Many things become possible if you process with a dedicated hardware-accelerator engine versus software that runs on a general-purpose processor. This is typically what is meant by a hybrid server. We don’t think that any one processor architecture can really be optimized for every workload. That is why we are proponents of heterogeneity in the data center as well as in computing in general. Power Architecture was a leader in hybrid computing in the early 1990s when Motorola, followed by Freescale, introduced vector unit and co-processors in a multicore approach. In terms of heterogeneity, the Power Architecture-based Cell Broadband Engine was jointly developed by IBM, Toshiba, and Sony. The Cell greatly accelerated multimedia- and vector-processing applications. Recently, IBM announced the world’s first supercomputer to break the petaflop performance mark. This supercomputer combined AMD’s Opteron processor with a Cell, which was an example of the heterogeneity and hybrid type of architecture working together.
Pham: General-purpose processors are used for networking tasks like acceleration. This is one area where Power Architecture has dominated the market. Networking systems are supported by a variety of operating systems (OSs) including proprietary ones like Cisco’s Internetworking OS (IOS), which is used for their routers and switches. A significant design challenge involves matching the right type of performance accelerators to the right applications. While accelerators may present additional coding challenges, they also improve power performance. The most optimized accelerators are hardwired into the chip’s gates by a designer who really understands the target application. A more programmable approach would be software (microcode) written to a small processor engine that can handle micro-programs with a lot of big libraries tailored to a particular type of acceleration. People do that in graphics, security, and packet processing in the wireless baseband architecture.
LPE: Competitors in the processor market are aggressive regarding virtualization. How do you view virtualization in both the server and embedded spaces?
Ravuri: Virtualization is important. Data centers are fully virtualized, though that mainly concerns the server market. However, we are seeing the same virtualization trend in the embedded-appliance space. Virtualization allows you to run multiple OSes, partition resources, and segregate users. There are multiple ways to implement virtualization ranging from software- and hardware-based methods to power-based and pure virtualization. The Power Architecture specification supports virtualization through hypervisor technology.
Rodriguez: We have made the Power Architecture part of our virtual-pipeline platform. The virtual pipeline is a set of on-chip interconnects that links all of the cores. It is a message-passing architecture that provides a logical-layer link and controls the movement of data packets. In essence, it allows designers to have a flexible data pipeline anywhere on the chip. Data packets can be processed by the Power Architecture cores, our processing engines, or hardware accelerators. The idea of the virtual pipeline is a very fast path, so it doesn’t become a gatekeeper between the cores.
LPE: What do you see as the trends for embedded low-power and software design?
Massoudian: In the embedded space, everyone is doing clock and power gating to lower power consumption. Power Architecture is also doing a lot of dynamic power management using voltage and frequency scaling. Software is used to automatically monitor the chip to achieve the best possible performance while using the least amount of power.
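Editor's note: to illustrate why voltage and frequency scaling is effective, the sketch below uses the textbook approximation that dynamic (switching) power is proportional to C·V²·f. It is not a description of any Power Architecture power-management implementation. The function name `relative_dynamic_power` and the scaling factors are hypothetical, and leakage power is ignored.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to nominal, assuming P is proportional to C * V^2 * f.

    Both arguments are ratios against the nominal operating point, so
    (1.0, 1.0) returns 1.0. Switched capacitance C is treated as constant.
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Illustrative operating points only, not measured data.
    for v, f in ((1.0, 1.0), (0.9, 0.9), (0.8, 0.8), (1.0, 0.8)):
        p = relative_dynamic_power(v, f)
        print(f"V x{v:.1f}, f x{f:.1f}: dynamic power {p:6.1%} of nominal")
```

Under this simplified model, lowering the clock alone saves power only linearly, while lowering voltage along with it compounds the saving because voltage enters as a square.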
Ravuri: There is a huge code base that has been written for various processor architectures. In the consumer market, that code base is written for ARM processors. In the enterprise-telecommunications industry, PowerPC processors are dominant. That code base is a big plus for PowerPC because you have large network companies like Cisco with huge amounts of legacy code written around proprietary OSes. These companies don’t want to disturb that code base. But PowerPC needs to embrace newer trends, such as Google’s Android, which is an open-source OS for cell phones, netbooks, tablets, and other devices. In the future, these trends will migrate to other devices that will become the new growth vectors for the embedded market. To embrace these trends will require a processing architecture that is flexible and supports a growing ecosystem. In the short term, it means the Power Architecture must continue to provide a port for Android to compete with non-Power Architecture processors. By port, I mean an optimized, high-performance hardware platform to run Android. Of course, the compiler and related software-development tools must also be available.







