Posts Tagged ‘Software’

User Perspective: Hardware-Software Co-Design

Thursday, October 7th, 2010

By Ann Steffora Mutschler
With software teams today twice as large as hardware teams for any given complex SoC project, there is no doubt it is an ideal time to agree on the best way for these worlds to intersect. And even though the semiconductor industry has been actively discussing hardware-software co-design for at least a decade, a mainstream solution has yet to be commercialized.

Progress is being made, however. Philippe Magarshack, general manager of central CAD and design solutions and design enablement director at STMicroelectronics, shared his thoughts with Low-Power Engineering.

Magarshack pointed out that ST has been leading the methodology paradigm shift away from a sequential set of activities, in which hardware is designed first and software development starts only after silicon is obtained. Now, he said, ST works in a much more parallel fashion where the IP and SoC hardware are modeled earlier. This allows ST’s hardware engineering team to provide very high-level, typically transaction-level, models to the software team, so that software development can start perhaps one quarter after hardware development starts and three or four quarters before silicon.

“The paradigm shift enablement is performed when we proposed to the rest of the industry this level of function called transaction-level modeling–TLM–which is built onto a SystemC syntax. Not only did we propose this new paradigm, but we also structured it initially in a standardization committee body rounding up EDA partners, other system houses and eventually this was transferred into Accellera where it is now in very good hands in terms of evolution,” Magarshack explained.

Another challenge that comes into play here is that IP is very complex and typically a processor core or peripheral IPs have to come equipped with software drivers; engineers developing the IP now need to have this development environment such that it is not only hardware but software. As such, ST is working to standardize the development and verification methodology for IP developers as well as the methodology to stitch together the IPs not only in the hardware world but also the software world, he pointed out.

Low-power challenges
Low-power design brings special challenges when doing hardware/software codesign, Magarshack acknowledged. “That’s definitely a direction that we have been working on for a while—not only in wireless but also in the space of consumer starting from the hardware angle and moving up.”

To address the need for low power, he noted that ST developed process-specific solutions like transistors, capacitors, resistors and power switches that are tuned toward bringing low-power solutions for wireless products. “On top of that we develop hardware IPs that are very much tuned for low-power. For instance specific RAM compilers or analog blocks like PLLs. Also, we eventually build a whole model of the SoC that is not only a functional model but is also a power-aware model. This is at an abstract enough level that we can actually run some software and typically get some indication of what the power consumption is of this particular piece of software on the hardware. By having an elaborate debug methodology we are able to look at where most of the power is consumed in the software and find out what can be done differently either in the software or in hardware to get to better power consumption.”

While not quite yet in production, ST is in prototyping mode in terms of methodology that allows the software developer to at last be aware of the impact of software development before the silicon comes out of the fab.

At a higher level, Magarshack also shared his thoughts on the top three challenges today with hardware-software co-design. “The most difficult challenge I can think of is the fact that culturally, hardware developers don’t function the same way as software developers. That’s really the number one challenge—to put together people that have different ways of working. With software, even after it is released to the customer you can fix a bug in five minutes so bugs are not as important and there is a constant stream of fixes that is possible. With hardware designers, if there is a bug on silicon the cycle time to fix it is three or six months. Just this simple difference brings a whole slew of different methods in different attention to quality. Attention to the tradeoff between quality and innovation and risk is very different. And when you bring those two communities together, we do see that it is a very big challenge.”

He observed that in some sense hardware development is looking more like software because engineers are writing RTL code, and this is very similar to software. While this may be true, “at the end of the day you have to freeze the whole thing out to silicon and if there is a single bug this is typically a three to six month cycle time to fix that.”

The next challenge is that because engineers are used to working sequentially, to suddenly have to work in parallel requires addressing a stream of dependencies that are different. “There is back and forth and haggling that happens, so in terms of project management this is a much more complex scenario that you have to manage,” Magarshack explained. “The good consequence is that at the end of the day it saved two or three quarters.”

Finally, looking at tooling, he believes the next frontier, after reasonably solving the problem of functional co-design and co-verification between hardware and software, is power-aware design: designing hardware and software simultaneously for low power.

Again, while a prototype solution exists within ST, he reiterated that it is not a fully streamlined tool. Still needed is some standardization and definition of the right format and syntax. To this end, ST is working on an extension of the SystemC language to enable that.

But it is safe to say it will be another five years until it reaches the designer’s desktop, Magarshack added.

Estimating Power From Mobile Device Apps

Thursday, September 9th, 2010

By Ann Steffora Mutschler
How do software application developers – even the ones sitting at home on their living room sofas with laptops – measure the power consumption of their application on the target device? This is a big problem today (something that is painfully obvious to owners of iPhones or Blackberries), and it will only get bigger.

Software engineers may think it is not their problem. They can write whatever code they want, then push off the issues to the hardware engineers who, in fact, have limited control.

To be sure, a hardware/software co-design environment is eventually going to be the ‘new frontier’ with models of abstraction used at higher and higher levels so that engineers can emulate certain applications or functions. And, of course, new tools will be needed to take these considerations into account. But from all accounts, those tools may still be years away from the engineers’ workbench, let alone the software development kit of the at-home developer.

Ideally, if high-level models can be created that break through the RTL descriptions of the hardware to the transaction level, hardware information can be captured and brought up to the software applications, whether that includes power consumption, software domains, or the like. Then engineers could see the impact of software and modify hardware accordingly, said Vic Kulkarni, general manager and senior VP of the RTL business unit at Apache Design Solutions. “Today it is the reverse: because you use whatever hardware is available and then software developers they don’t really have knowledge of what that hardware is capable of doing as such.”

Pete Hardee, director of solutions marketing at Cadence Design Systems, noted that today’s smart phones, as convergent devices, contain about as much computing power as stand-alone devices had recently. “A smart phone today can easily contain the same processing power as mainstream PCs or laptops had maybe four or five years ago. They contain video capabilities that would have been set-top boxes just a couple of years ago; high-definition video, and 3- to 5-megapixel cameras. At the same time, while we’ve had enormous leaps in the hardware technology, obviously still following Moore’s Law, the leaps in software productivity have actually outpaced Moore’s Law to make that happen on a mobile device. The thing it hasn’t outpaced is poor old battery technology. So despite all of this going on, we’ve still got lithium-ion batteries. Designers have done a great job to squeeze what they can out of them, but fundamentally we still expect to get through at least a full working day and get home and put the phone on charge.”

Granted, it does depend what you’re doing with the phone, but bottom line is that all of it is under software control. “When you’re analyzing power it’s not just about characterization of the hardware. You have to run with a significant number of system modes that represent the high activity of when I’m busy on all these various applications but also represent the low activity when I’m not busy, and also switching between those system modes so I can work out when it’s worth powering down parts of the device and when it’s not,” he said.

The challenge for many chip companies today is the need to simulate 30 different system modes. In addition, they are painstakingly measuring the bandwidth in all of those modes, in various parts of the chip and working out exactly how the power management system needs to cope: what can be slowed down, what needs to be sped up so it can be shut down for longer. All of these various modes need to be checked out. “Being able to measure the power in response to real system activity running real software becomes a big deal and there are very, very few solutions that can do that,” he said.
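One way to picture the "when is it worth powering down" question is a break-even calculation: shutting a block off only pays if it stays idle long enough to recover the energy spent entering and leaving the low-power state. The sketch below is illustrative only. The `PowerState` fields, the function names and all of the numbers are assumptions made for the example, not figures from Cadence or any chip vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    """Hypothetical power figures for one block of a chip."""
    idle_mw: float        # power drawn while idle but still powered
    sleep_mw: float       # residual power while powered down
    transition_uj: float  # energy spent entering and leaving sleep


def break_even_us(state: PowerState) -> float:
    """Idle time, in microseconds, beyond which powering down saves energy."""
    saved_mw = state.idle_mw - state.sleep_mw
    if saved_mw <= 0:
        return float("inf")  # sleeping never pays off for this block
    # Microjoules divided by milliwatts gives milliseconds; convert to us.
    return state.transition_uj / saved_mw * 1000.0


def worth_powering_down(state: PowerState, idle_us: float) -> bool:
    """True if the expected idle period is longer than the break-even time."""
    return idle_us > break_even_us(state)


# Made-up block: 50 mW idle, 2 mW asleep, 96 uJ per sleep/wake round trip.
video_block = PowerState(idle_mw=50.0, sleep_mw=2.0, transition_uj=96.0)
```

With these invented numbers the break-even point is 2,000 microseconds, so a 5 ms gap between frames would justify powering down and a 0.5 ms gap would not. Repeating that kind of check across every block and every system mode is what makes the analysis so laborious.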

The prevalent thinking of today leans towards virtual platforms to do this measurement, but Hardee believes they are too abstract to be able to measure the effects on power. “As soon as you really need to look at the power scheme that is implemented in the hardware then you need to run at an accuracy which is going to slow down a virtual platform.”

To be fair, Cadence’s approach does include virtual platforms through its transaction-level simulators, and integration with the fast processor models from ARM and various other processor models available, but the company stresses its hardware-based emulation system for power-aware simulation.

Shabtay Matalon, ESL market development manager at Mentor Graphics, believes engineers already are familiar with the notion of abstraction: they started by abstracting gates to RTL and now there is an abstraction of RTL functionality at the higher-level writing using SystemC and transaction-level modeling. “People are aware that you can also abstract timing by creating a model that doesn’t contain all the information but has sufficient information to get the notion of timing. What people may not be aware of is that we can create a model that can be used by the software engineer that contains an abstraction of power all the way up to ESL or TLM.”

This model associates power with the traffic flowing through these transaction-level models. Once those models get created they can be stitched together, Matalon said. The models can be of peripherals, of processors, or of devices, and can be stitched together to create a platform on which applications software can run.
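A rough sketch of the idea, in plain Python rather than SystemC: each block charges a fixed energy cost per transaction, and a platform object stitches the blocks together and attributes the energy to whichever software task generated the traffic. The class names, transaction kinds and nanojoule costs are all hypothetical, and a real power-aware TLM would be considerably more detailed.

```python
from collections import defaultdict


class PoweredBlock:
    """Transaction-level stand-in for a peripheral, processor or device.

    The per-transaction energy costs are made-up numbers, not measurements.
    """

    def __init__(self, name, energy_nj_per_kind):
        self.name = name
        self.energy_nj_per_kind = energy_nj_per_kind
        self.energy_nj = 0.0

    def transact(self, kind, count=1):
        # Charge energy for each transaction that flows through the block.
        self.energy_nj += self.energy_nj_per_kind[kind] * count


class Platform:
    """Stitches blocks together and attributes energy to software tasks."""

    def __init__(self, blocks):
        self.blocks = {block.name: block for block in blocks}
        self.energy_by_task = defaultdict(float)

    def run(self, task, traffic):
        # traffic is an iterable of (block name, transaction kind, count).
        for block_name, kind, count in traffic:
            block = self.blocks[block_name]
            before = block.energy_nj
            block.transact(kind, count)
            self.energy_by_task[task] += block.energy_nj - before


platform = Platform([
    PoweredBlock("dram", {"read": 4.0, "write": 6.0}),
    PoweredBlock("lcd", {"refresh": 50.0}),
])
platform.run("redraw_loop", [("lcd", "refresh", 10), ("dram", "read", 100)])
platform.run("idle_poll", [("dram", "read", 5)])
```

After the two calls, `platform.energy_by_task` holds 900 nJ for `redraw_loop` and 20 nJ for `idle_poll`, which is the kind of per-task breakdown a software engineer could act on without reading RTL.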

Virtual platforms are the way to go at the very high-level, agreed Cary Chin, director of technical marketing for Synopsys’ low-power solutions group. “There are some pretty good ways to hook into the software stack through a virtual platform. But I still think that the connection from the virtual platform on down through to high-level RTL is still a little bit broken because there’s a lot of stuff that needs to happen to connect those environments together.”

The big question to answer here, though, is how much we want the software developer to be controlling the hardware directly, he said. It’s basically directly up against the idea of information hiding. “In a software development environment we try to hide things because there are things we can’t actually decide better at high-level versus a low-level. Those concepts come in exactly when you’re spanning software down into the hardware realm, as well, so it’s very hard to tell. You want to write software that’s really transportable between environments and things like that, but if you’re tied in too closely to a particular hardware platform it makes that very difficult, as well.”

Educating the software developer
“With all of this, it would still be possible to write bad software that is very inefficient in the way data is used—maybe something that unnecessarily continually refreshes the LCD screen, for instance,” said Hardee. “How people get feedback for that really boils down to the application development kits that are provided by either the phone manufacturer or the network operator (Sprint has an application development network). On phones that use Android, there’s a development system. It would be possible to give people feedback in terms of bad optimization, bad memory usage, etc. in those development kits.”

Part of the solution may be an ecosystem or partnership approach, as well. “The idea of [EDA vendors] at some point partnering with somebody like Apple or Google to really extend their development kits down might actually make as much sense as trying to build stuff up from the hardware side because those guys have a lot of resources and they could actually help a lot in terms of meeting in the middle,” Chin added.

But that still doesn’t solve one of the big issues, which is the great divide that exists between the software and hardware worlds. “The chasm between hardware and software is bigger than the chasm between front-end and back-end design. The two worlds are not really well connected today and ultimately, if you think about it from the software development standpoint, there are different levels of abstraction in some sense that one can think about. There are high-level programming languages like C/C++, and then there is the low-level programming which is assembly code,” noted Will Ruby, senior director of product engineering and applications at Apache Design Solutions.

At least some of this can be dealt with in the short term by using models, but some will also require new technology such as smart compilers.

“Assembly is actually closer to hardware but people typically don’t program in assembly unless they are doing embedded programming. Somehow the notion of hardware needs to be transported into a C/C++ or Java-type development environment. That’s where the models come in. We need models to represent the hardware behavior, but I think we would also need something like a smart compiler that can take advantage of some of these hardware hooks and understand that if you’re writing a program for a mobile application, you need to make some tradeoffs during compilation for performance or power consumption. People on the hardware side think about this all the time, but on the software side it’s not easy to do. So compilers may need to evolve in that direction. Compilers need to be hardware-aware and need to understand what hardware is doing,” he concluded.

Experts At The Table: Verification Nightmares

Thursday, May 13th, 2010

By Ed Sperling
Low-Power Engineering sat down with Shabtay Matalon, ESL marketing manager in Mentor Graphics’ Design Creation Division; Bill Neifert, CTO at Carbon Design Systems; Terrill Moore, CEO of MCCI Corp., and Frank Schirrmeister, director of product marketing for system-level solutions at Synopsys. What follows are excerpts of that conversation.

LPE: How important is a high-level model in verification?
Matalon: If you have a reference model that is a TLM and you have a good way to find equivalency between the TLM and RTL, then why not give the TLM to software designers? For many applications TLM without timing will be sufficient. For certain timing-critical tests you also need a TLM to model timing accurately at the approximately timed level. But you can create hundreds or thousands of replicates with a TLM platform that are free. The replication is free, or almost free. For the software guys, that’s a very powerful solution.
Schirrmeister: And that’s the challenge to figure out. We haven’t quite figured out how to do equivalency checking against the TLM.
Neifert: The average SoC has tens to hundreds of blocks. If you’re starting from scratch and want to generate your TLMs, that’s a great approach. But what we’re seeing is that companies are only developing 20% to 40% of this IP internally and the rest they’re getting from outside. Who knows what form that stuff is in.
Moore: Isn’t equivalence checking intrinsically hard?
Schirrmeister: Yes, but from a verification perspective everything we do has to add up to less than what we do today. If adding TLMs to your software isn’t helping you to reduce the time you spend on verification, people will be hesitant.
Matalon: Allow me to disagree. First, I’m not talking about using formal methods to validate equivalency between a TLM and RTL. But inherently when you build an OVM environment to validate a block, you need a reference model. What does it mean to do verification of the RTL? It’s a comparison. We always validate RTL by comparison. You can use simulation techniques to say this TLM is functionally equivalent to the RTL that’s getting implemented. If your TLMs allow you to model the registers correctly and things that maybe in the past weren’t done, we can assemble TLMs. It will be the standard practice of every IP provider to provide a TLM 2.0-compatible model. For the re-used IP, which constitutes 80% of the design, I think this opens the door for replication of the TLM.
Schirrmeister: I agree with the equivalence verification. We’re not quite there with IP providers. But the challenge with TLM models, because you don’t have synthesis and formal techniques, it is not just an ordinary part of every design flow. It’s an additional effort. And what happens at the end is someone changes the RTL before tapeout and people don’t keep the TLM models in sync with what ends up being implemented at the end.
Neifert: They should be generating this automatically from the RTL. Then you solve that bottom end.
Matalon: If the RTL has changed without updating the reference model, then you haven’t validated your RTL in the context of the system. It’s all about bridging between the transaction level and the RTL. To change functionality, you change two lines of code.
Schirrmeister: For us on the TLM side, it’s always an investment decision.
Moore: It all comes down to economics. Why do people do something stupid like changing the RTL without changing the model? It’s because they think they’ll make money by doing so.
Schirrmeister: And if you have a set of hundreds of them it’s hard to keep them in sync.
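
The "validate by comparison" approach Matalon describes can be sketched in a few lines: drive the same stimulus into a reference model and into the implementation, and flag every disagreement. The example below is a toy under stated assumptions. The 8-bit saturating adder, the function names and the random stimulus are invented for illustration, and a real flow would drive an RTL simulator rather than a second Python function. As the panel notes, this is simulation-based comparison, not formal equivalence checking.

```python
import random


def reference_model(a, b):
    """Untimed, TLM-style reference: an 8-bit saturating adder."""
    return min(a + b, 255)


def implementation_under_test(a, b):
    """Stand-in for the RTL; a real flow would call into a simulator here."""
    total = a + b
    return 255 if total > 255 else total


def compare(reference, dut, trials=1000, seed=0):
    """Drive both models with the same stimulus and collect mismatches."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        a, b = rng.randrange(256), rng.randrange(256)
        expected, actual = reference(a, b), dut(a, b)
        if expected != actual:
            mismatches.append((a, b, expected, actual))
    return mismatches
```

An empty list from `compare` means only that no mismatch was found for the stimulus tried. It also illustrates the synchronization problem raised above: if the implementation changes and the reference model does not, every mismatch becomes ambiguous.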

LPE: Let’s talk about economics. Verification used to be 70% of the NRE. Is it going up? Or is it now blurred between what’s verification and what isn’t?
Neifert: It’s getting blurred. It’s as much an integration issue as anything else. You’re obviously spending more money now, but the integration task is taking over some of verification because people are using software to drive some of this. Is it a software budget or a verification budget? I don’t think you can draw that line as definitively anymore because you probably re-use some of that stuff in your software once you verify things work.
Schirrmeister: Verification definitely is going up overall. The question is where it’s done. Hardware verification has gone down and the hardware verification manager is thrilled. But the verification nightmare has shifted to software.
Moore: The classic example is a baseband processor in a cell phone. That processor contains boot code and it has to operate the USB and operate the software during mass production. If that doesn’t work you don’t have a product. And because it’s sitting in the mask, that boot routine has got to be right.
Matalon: Verification doesn’t go down as a whole. How can it go down with increased complexity from multicore and functions that are implementing hardware and software? It really depends on who the verification manager is. If I’m the verification manager and I’m confined to System Verilog and my life is to carve out verification for hardware blocks, my life is easier. Now there are off-the-shelf transactors and you can just add them in. But if you’re the verification manager who has to validate your design is correct and meeting spec at the system level, and also responsible for meeting performance and low power, then the load is not going down. And if you’re not keeping up with advanced methodologies, you will be in trouble.
Moore: And as each node comes along the absolute cost of failure is escalating.
Matalon: If you don’t validate early and catch what you call stupid bugs, or in some cases nasty bugs, you are in trouble. That’s where the shift is happening. The kind of verification people do will shift from the block level to the newer ESL space where there isn’t maturity yet.
Schirrmeister: Verification is never complete. It’s a question of when you are comfortable enough. But the sword of Damocles is always hanging over you. If you mess up the chip, it’s $3 million for a new mask in non-recurring costs. If it’s software, there’s always service pack two. But in the case of Toyota, the impact can be devastating.
Moore: The economics of Moore’s Law were such that shipping fast was imperative. But it’s not just Toyota. There’s a strong suspicion there have been numerous glitches in drive by wire. If you look into it, there are lots of situations where software problems are present. Verification is a hard problem and you have to really set up your workflow so you’re throwing every tool that’s economically justified at it.
Matalon: I don’t see any tool being taken off the table. It’s all methodology and which tool you use when. Not everyone is using the more innovative technologies. For someone used to waiting for the silicon to come back and then sticking it on a board and validating it, this is a huge transition. Validating by writing a model at the SystemC level and dealing with virtualization is much different. You need to know when to use the tool, use the right tool as early as possible, and take advantage of what you develop earlier during the downstream phase. If you use TLMs early, then you re-use those TLMs when you verify. And if you use silicon validation, which is not going away, then use all the tools that you have used before for reference debugging and running system-level scenarios where you replicate some of the problems you see in silicon by validating your original assumptions. If you have to debug a problem on your silicon, that’s very hard. You can use emulation as a reference for debugging. You can use a transaction-level model with timing and power information to compare what you have received, where you made the mistake and how to fix it.
Neifert: That TLM framework can be used throughout the debug cycle, which is where you get the dollars to justify it. Initially you can look at it as an incremental expense. But when you look at how it scales and you realize you don’t need to generate an independent model for this and an independent model for that, that’s where the real value is.

LPE: What are the bugs that are fatal? Are they power? Design?
Neifert: If you look at the stuff that makes the news, it was the division problem in the Pentium. It was a pure hardware bug. If you applied the verification tools of today, that would have been caught. Today it’s not just hardware. Most engineers think there’s some aspect of software.
Schirrmeister: The fatal bugs are the ones that cannot be corrected in software today and which end careers and are not reported on. Those are the ones you never read about.
Moore: The fear for most companies in our space are the bugs that kill companies. What causes those? Mask spins. And what causes those typically are system-level problems. You go to hook it up to a critical system and it doesn’t work. And it’s down at the RTL level and it’s not accessible because of all the protocols that are running too fast to make it accessible.
Matalon: If I look at a functional bug that can be overcome by software, it’s not a fatal bug. A functional bug that cannot be fixed by software is fatal. But the reality is that if you have a performance issue or a power problem, where does it stem from? It’s probably because you’ve validated hardware in isolation, but not in the context of the software. Those are the things that are fatal and scare users away. We have a customer that designed IP and wanted to find out if it would meet performance in the context of the system. They couldn’t simulate at the gate level so they abstracted to the TLM. There are ways to fix functionality sometimes with software. But I’m not aware of any way to fix a design that hasn’t met performance or power in the context of the software by fixing the software.

Software Becomes The Main Differentiating Factor

Wednesday, June 10th, 2009

By Ed Sperling

Software has always been critical in determining what makes one chip different from another, but for the next couple of process nodes it will take on new significance. Rather than just defining function, it also will be one of the key determinants in performance and power consumption.

Behind this change is a bottleneck in lithography, which generally is not something most design engineers even consider. Design for manufacturing tools are generally as close as they come to the manufacturing side, and for many of them, that’s as close as they ever want to get.

All of that will change at 32nm and 22nm, however. With extreme ultraviolet lithography still not ready for prime time—it most likely won’t be ready until at least 15nm, or maybe beyond—double patterning has emerged as the best alternative for building complex SoCs.

The current lithography technology uses a wavelength of 193nm, compared with EUV, which is 13.5nm. The last time the industry encountered a lithography problem of this magnitude was heading down to the 1 micron process node, when many scientists predicted the end of Moore’s Law.

That never happened, of course. The semiconductor industry consistently has managed to skirt problems by using a variety of tricks. The likely workaround over the next couple nodes will include double patterning of some sort, coupled with some highly restrictive design rules.

Those design rules are a formula for creating chips that can be manufactured with reasonable yield and a minimum of re-spins. But they also make one vendor’s chip look very much like another’s, leaving software as the primary—and in some cases the only—differentiator. And that differentiation applies to how much power a chip consumes, how fast it runs, as well as the look and feel of a device.

“As we get into 32nm, there is no other option but double patterning with immersion,” said Joanne Itow, managing director for manufacturing at Semico Research. “Very restrictive rules are the only way to get to manufacturing. At 22nm, there is not much more we can do without either EUV or e-beam (electron beam) technology.”

That helps explain why Intel bought Wind River last week. If the application software can be written for a thin executable layer, then its performance and energy consumption can be tailored to one or more specifically sized cores.

Texas Instruments has been working on the same approach. Srik Gurrapu, TI’s C5000 product marketing manager, said TI’s open multimedia application platform (OMAP) uses real-time operating systems from Wind River to cut power and improve performance.

“We don’t want a single operating system,” Gurrapu said. “Medical, industrial and commercial products all need different operating systems. There are different ways of achieving power savings.”

How that relationship will change with Intel’s acquisition is uncertain. But the trend toward using software to differentiate power consumption and performance already is well established for some specialized processors and microcontrollers.

Chartered Semiconductor, part of the Common Platform triumvirate with IBM and Samsung, has been working much more closely with ARM in recent months. The reason, once again, is the lithography bottleneck.

“Intel is buying its way into solutions, but for the foundry model things are still disaggregated,” said Walter Ng, vice president of design enablement alliances at Chartered. “But down the road, we’ll be sitting at the same table with ARM as an IP provider, an EDA provider and a software provider.”

Ng called it a “natural progression” of the relationship the Common Platform has established with ARM and other software developers. The Common Platform companies have their ecosystem, and ARM has its own. Ng expects the two to begin merging at future process nodes.

“Everyone says there is a chasm between design and manufacturing, but with ARM included there is no chasm. Design expertise, process and manufacturing are all there. Libraries have gone through two to three architectural reviews. The architecture that ARM is implementing to is beneficial to the process,” he said.

TSMC has been working more closely with its software and IP partners, as well. The company has been imposing restrictive design rules since 55nm, according to Itow.

That makes software all the more interesting for the next few years. So how well can you program in C?

Writing Software For Low-Power Systems

Wednesday, April 15th, 2009

By Ed Sperling

Almost any discussion of software in low power systems these days involves some sort of multicore approach.

That is particularly true at 90nm and below. At 65nm, unless there is a very distinct purpose for a low-power single-core device, it probably is utilizing at least two cores, and at 45nm the numbers can continue to rise, depending upon how many functions the chip is being used for and how important processing power will be.

For developers used to working in the symmetric or asymmetric multiprocessing world, where single-core processors are arranged in arrays within the same device and tied together by middleware and very fast connectors, moving everything inside a chip actually makes low-power design simpler. In the SMP or AMP world, it was impossible to turn processors on and off. That’s already standard practice in multicore chips, which is a more controlled environment for running software than the multiprocessing world.

But designing software for multicore devices requires a lot more up-front planning than back-end work-arounds to really save power.

First of all, it’s important to note up front that not all applications can be parallelized to take advantage of multicore, and of those that can, very few can be compiled once and scale to more cores as they become available. It’s a great concept, and multicore chip companies like Intel and IBM say great progress is being made, but there’s a whole other group that will counter with, “Don’t count on it.”

Moreover, multiprocessing was optional for applications. At 65nm and below, multicore chips are the norm. If software can’t utilize more than one core, the other cores are useless.

Second, multicore can mean many things. In a system on chip, it typically involves heterogeneous cores. In a processor, the cores are generally homogeneous. Writing software that takes advantage of many cores requires a multiprocessing operating system and applications that can be run in parallel. In an SoC, the software can be divided up by function using everything from a multiprocessing operating system like Linux to real-time operating systems that are written for a very specific function.

“The real trick is that if you break up an application, you have to do it at the modeling level,” says Irv Badr, Rational senior product marketing manager at IBM. “Breaking it at the source-code level is very difficult. If you break it at the modeling level, it’s as simple as pushing a button. You want the ability to move things around by asking ‘What if?’ That is very important. You also need to make sure when you’re modeling that the software isn’t coupled to the hardware. Some hardware can be used by a lot of software.”

More problems, more tools

A number of tools have been created to help migrate existing software to multicore architectures. The most recent is Prism, which is made by Critical Blue. It allows developers to analyze and explore code changes to take advantage of multicore hardware, doing everything from dependency analysis to recalculation of the scheduler on multiple cores.

“The software guys didn’t ask for multicore,” said David Stewart, Critical Blue’s CEO. “But the only way we’re going to get more performance is if the software guys react.”

Intel, meanwhile, has created its own programming language to migrate applications to multicore architectures. Known as Ct, the language helps to parallelize applications that can run in parallel. The key to working in this type of environment is understanding the application well enough to know what can be split off and run on multiple cores and what cannot—and how much overhead there is in pulling the pieces back together for the user.
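The split-and-recombine pattern described above can be sketched in a few lines. This is not Ct and does not reflect Intel's tooling; it is a generic Python illustration, and every name in it (`split`, `sum_of_squares`, `parallel_sum_of_squares`, the `mapper` parameter) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into `parts` contiguous chunks of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The independent piece of work: no chunk needs another chunk's result."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, parts, mapper=map):
    """Split the work, run each piece through `mapper`, then merge the partials.

    `mapper` defaults to the built-in sequential map; passing an executor's
    map method runs the chunks concurrently instead.
    """
    partials = mapper(sum_of_squares, split(data, parts))
    # The merge step is the overhead of pulling the pieces back together.
    return sum(partials)


if __name__ == "__main__":
    data = list(range(10_000))
    # Threads keep this sketch portable. On CPython, swapping in
    # ProcessPoolExecutor is what actually spreads CPU-bound chunks across cores.
    with ThreadPoolExecutor(max_workers=4) as pool:
        total = parallel_sum_of_squares(data, 4, mapper=pool.map)
    print(total == sum(x * x for x in data))
```

The example works only because each chunk is independent. Code with dependencies between pieces cannot be split this way, which is the kind of judgment the developer still has to make.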

Ct isn’t the first language to attempt to ease the burden of parallel, rather than sequential, software development. Software engineers who have been working in the multiprocessing world for a while say it probably won’t be the last, either.

In Europe, a consortium known as eMuCo, for the embedded Multi-Core Processing for Mobile Communication, is taking a different approach by developing a standard platform for future mobile devices based on multicore architectures. The stated goal is to develop the controller, operating system and application layers. Members include ARM, Infineon, Telelogic, GWT-TUD, as well as four universities.

Promises, promises

If all of this can be made to work, there is enormous upside from both a performance and a low-power perspective. In devices such as a smart phone, for example, cores regularly are put into sleep mode. That can extend the battery life from hours to days, and in some cases even weeks.
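A rough back-of-the-envelope model shows how sleeping cores can stretch battery life from hours to days. All numbers below (battery capacity, active and sleep power, fraction of time awake) are made-up assumptions for illustration, not measurements from any device.

```python
def battery_life_hours(capacity_mwh, active_mw, sleep_mw, active_fraction):
    """Estimate battery life from a simple two-state (active/sleep) power model.

    capacity_mwh: battery capacity in milliwatt-hours (hypothetical).
    active_mw / sleep_mw: power draw in each state in milliwatts (hypothetical).
    active_fraction: share of time the cores are awake (0..1).
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    # Time-weighted average of the two power states.
    average_mw = active_fraction * active_mw + (1.0 - active_fraction) * sleep_mw
    return capacity_mwh / average_mw


if __name__ == "__main__":
    always_on = battery_life_hours(5000, 1000, 10, 1.0)
    mostly_asleep = battery_life_hours(5000, 1000, 10, 0.05)
    print(f"always on:     {always_on:.1f} hours")
    print(f"awake 5% only: {mostly_asleep:.1f} hours ({mostly_asleep / 24:.1f} days)")
```

With these assumed figures, a device that lasts five hours when always active lasts roughly three and a half days when its cores are awake only 5% of the time.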

Already, work is underway that teams up some unlikely partners. ARM’s Cortex controller is being combined with IBM’s Cell processor, for example, in a 60-core deployment on multiple chips, said IBM’s Badr. He said that in the enterprise, multicore can reduce power consumption by a factor of three, which allows blade servers to run three times as long because they run cooler.

But there’s a catch, too. While there’s money attached to making it work right this time, the problem has been studied for decades without major breakthroughs. The jury is still out on just how many cores are enough and how many are too many, and which software will work in what configuration. But given the realities of physics on a piece of silicon, there will be at least some multicore headaches in every programmer’s future.

Writing Application Software Directly To The Metal

Friday, March 13th, 2009

By Ed Sperling

How necessary is an operating system?

That question would have been considered superfluous a decade ago, possibly even blasphemous and career-limiting. But it now is beginning to surface in low-power discussions, particularly in compute-intensive applications where performance and power are both critical. General-purpose operating systems constantly call on the processor for updates, while software written straight into the metal using Verilog or SystemC can be written for specific cores.

Highly parallelized applications such as search, particularly in bioinformatics, already are exploring writing applications directly into FPGAs. And heterogeneous cores may give application developers more reason to write to the chip rather than an operating system application programming interface (API).

For application developers, power is as much a balancing act with performance as it is for hardware developers. While classical scaling before 90nm provided both power and performance benefits at each process node, the choice now is largely one or the other. For every gain in performance, there has to be a corresponding drop in power somewhere on the chip. Otherwise the clock speed cannot be improved without burning up the chip.

That has prompted software developers to look for different solutions. Even Intel, whose success was built almost entirely on tight integration with operating systems (Windows, Mac OS X and Linux), is looking at utilizing some of the cores in its future chips differently.

“There is broad agreement that we need to be able to represent the ability to do parallelism at the application level and not force everything through the operating system,” said Pat Gelsinger, senior vice president in charge of Intel’s Enterprise Group. “Any time you have a call through the operating system to get a resource—whether it’s a thread or an I/O—your application has gone away for thousands of clock cycles. You want to do that when you need something that only the operating system can give you.”

Typically the operating system acts like a layer of middleware. It makes the connections through its APIs that allow applications like Office to work together so that portions of one application can be dragged and dropped into another. But in highly parallel applications, the interactions are largely within the application rather than with other applications.

“There is an active effort to move some of this parallelism to the application level so the application programmer, given the right tools and libraries, can take advantage of that,” Gelsinger said. “Microsoft has taken steps like that recently with networking and the NPI (network programming interface) layer, moving it into the user space. Use the operating system for what you need it for, but allow parallelism to be more lightweight. Those steps are under way, and they will have great benefit. It started out as the HPC (high-performance computing) community, where they were using tens of thousands of threads.”

IBM is likewise experimenting with a thinner operating system layer for its Power architecture. Brad McCredie, chief architect of the new Power 6 chip and an IBM Fellow, said one of the first examples is hardware accelerators, which are being used to speed up applications.

“We’ve already created an architected layer in the Cell processor,” said McCredie. “It’s not exactly writing software into the metal. We gave the software programmers an architected interface, so we hid some of the messiness of the 100 gigaflop accelerator with a new generalized interface, which is OpenCL. We expect to put in multiple types of accelerators in the future.”

At some point, though, even this approach will run out of steam. McCredie said the debate inside IBM right now is when exactly that point will occur. He believes it will happen at 22nm.

“Eventually we’re going to run out of power on a chip,” he said. “The next way will be to design devices to do fewer and fewer things. That trend will happen. The question is whether we will be able to invent a more specific device that can do 80% of the workloads at less power? If it only does 10%, then no one will write a line of code for it. But if it covers 80%, then it will have much better power/performance.