Multicore Programming: The Next Frontier?

By Ed Sperling

From a distance it looks like a game of hot potato. But this version is played by hardware and software engineers, who normally don’t have much to do with each other.

The hardware engineers say you can’t get any more performance out of a single core on a chip without cooking it, so they’ve added more cores and tossed the problem over the wall to the software engineers. But the software engineers say that while they can thread functions across cores, very few applications will actually scale to use more cores unless they are completely rewritten at each new process node.

Companies such as Intel and IBM and most of the computer science departments at major universities are feverishly working on this problem. Unfortunately, they still haven’t come up with a solution, and not because this is a new problem. It’s been festering for four decades, and so far there has been no breakthrough. Programmers think serially, not in parallel, and there is no magic bullet to automate the programming.

David Patterson, the Pardee Professor of Computer Science at UC Berkeley and head of the parallelization effort there, calls multicore programming “the El Dorado of computer science” and refers to parallel computing as “an open research project.”

That may prove to be a polite assessment of the problem. More to the point, if there’s no breakthrough in software there will be no compelling reasons to upgrade computers or even handheld devices such as cell phones. Without performance upgrades, sales cycles will slip and the tech boom of the past 60 years either will begin slowing at an alarming pace or there will be massive shifts in how technology is sold and used.

“There is no killer multiprocessor,” Patterson says. “But programmers needing more performance have no choice except parallel processing.”

Where it works, where it doesn’t

That doesn’t mean parallel processing doesn’t work. Some applications adapt exceptionally well to multiple cores. In the commercial enterprise, databases and search functionality, for example, are showcases for what can be done with multiple cores. The individual tasks can be parsed onto as many cores or processors as are available. Often referred to as embarrassingly parallel tasks, these kinds of applications can scale almost infinitely with minimal tweaking of the application.

The same is true in the simulation world. Mentor Graphics last week introduced a parallel version of its Olympus SoC timing analysis and optimization engine that shows very little performance reduction when parsed onto different cores. The result is that two cores offer almost double the performance of a single core, and four cores roughly quadruple it.
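Near-linear scaling of this kind is what Amdahl's law predicts when almost all of the work can run in parallel. The sketch below, in Python, computes that upper bound. The function name and the example fractions are illustrative assumptions, not figures from Mentor.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction is the share of the single-core run time that can
    be spread across cores; the remainder stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: one almost fully parallel, one half serial.
    for fraction in (0.99, 0.50):
        for cores in (2, 4, 8):
            bound = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores} speedup<={bound:.2f}")
```

A workload that is only half parallel can never reach a 2x speedup, no matter how many cores are added, which is one way to read the gap between the simulation engines and the desktop applications discussed here.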

“The problem is parsing into independent tasks and then bringing it back together again,” said Sudhakar Jilla, director of marketing in Mentor’s place and route group. To no small extent, that means understanding the application and its interaction with the processor so well that it can be broken down into distinct processes.
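The split-then-merge pattern Jilla describes can be sketched in a few lines of Python. This is a generic illustration, not Mentor's implementation: the function names are hypothetical, and the sum of squares is only a stand-in for real per-task work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous, independent slices."""
    if chunk_count < 1:
        raise ValueError("chunk_count must be at least 1")
    size = max(1, -(-len(items) // chunk_count))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Stand-in for real work: touches only its own slice, no shared state."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    # Step 1: break the job into independent tasks.
    chunks = split_into_chunks(items, workers)
    # Step 2: run the tasks concurrently. Threads are used here to keep the
    # sketch simple; in CPython, CPU-bound work usually needs
    # ProcessPoolExecutor (same map interface) to occupy several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Step 3: bring the partial results back together.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))
```

The hard part in practice is the first step: the chunks here are independent by construction, whereas a real application has to be understood well enough to find such boundaries.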

The same will never be true for most personal productivity applications. While you might be able to split some functions off of an Excel spreadsheet or Microsoft Word to take advantage of two cores, the same process would have to be repeated at four cores, eight cores, and so on.

UC Berkeley’s Patterson said people have been trying to achieve automatic parallelization for years. “We see hundreds of cores on a chip seven years out. Today, there is very little software taking advantage of the cores. Cores are idle almost all the time, and there’s plenty of reason for pessimism.”

Back to the drawing board

One solution may be a new language or languages to run on multicore chips. That ultimately may prove to be the best choice, but many people remain skeptical.

Intel has taken a first stab at the problem with a language called Ct. Until now, Ct has worked largely on a shared memory system, but the company is considering whether to use a distributed computing environment approach so that an application can scale to every node on the system.
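The difference between the two approaches can be illustrated with a small Python sketch. This is not Ct code and says nothing about how Ct works internally; the function names are hypothetical, and threads stand in for cores or nodes. In the first version the workers update one shared variable under a lock. In the second they share nothing and only send messages, which is the style that carries over to a distributed system where the queue would be a network channel.

```python
import queue
import threading


def total_with_shared_memory(values, workers=4):
    """Workers update one shared total; a lock guards every update."""
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        for value in part:
            with lock:  # every update contends for the same lock
                total += value

    threads = [
        threading.Thread(target=worker, args=(values[i::workers],))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


def total_with_messages(values, workers=4):
    """Workers share no state; each sends one partial result as a message."""
    results = queue.Queue()

    def worker(part):
        results.put(sum(part))  # the only communication is this message

    threads = [
        threading.Thread(target=worker, args=(values[i::workers],))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Exactly one message per worker is waiting in the queue.
    return sum(results.get() for _ in range(workers))


if __name__ == "__main__":
    data = list(range(1000))
    print(total_with_shared_memory(data), total_with_messages(data))
```

Both functions return the same answer. The shared-memory version depends on every worker being able to reach the same variable, while the message version only depends on a channel between workers.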

All of this will take time, of course. The first step is for libraries and frameworks to be parallel-enabled, which Intel believes will happen in the next one to two years. After that, it could take 5 to 10 years for the development language to become mainstream—something that will require lots of work on the part of Intel, its partners, and research currently being done by universities around the globe.

IBM and Microsoft also are working on their own versions of parallel programming. So far the companies have not released details of their efforts. But the goal in all cases is to “divide and conquer” by breaking down the pieces that can be run in parallel.

Add to that an inherent incompatibility between the future chip strategies of IBM and Intel. IBM has opted for heterogeneous cores in its future chips. Intel is focusing its efforts on homogeneous cores. It’s likely that the two worlds will merge with a mix of homogeneous and heterogeneous cores, but that raises some programming issues that are not yet resolved.

Security Issues

There are other challenges in the multicore world that don’t exist in the single core chip. Security, in particular, is much more of a concern because of the flow of data between cores.

“With multicore, there are new challenges to utilize the individual cores,” says Andrew Sloss, ARM’s liaison to Microsoft. He said the difficulty is controlling communication across cores and avoiding “excessive broadcasting.”

“We define security as hardware protection that makes it too expensive to break into the system,” Sloss says, adding that in all systems important data needs to be isolated.

Business Issues

No matter how big this challenge looks, or how much pessimism accompanies it, most people involved believe the electronics industry has no choice but to solve it or radically change its focus.

While corporate IT will continue to buy servers, the vast majority of electronics these days are sold into the consumer world. Typically, what sells new products is dramatically lower power consumption along with equal or improved performance.

If no one can figure out how to scale programs on multicore chips, or the uptake is limited to the current scientific and highly mathematical applications, then the road map for future chips shifts. Moore’s Law is still feasible, but it may no longer be relevant.

Comments

8 Responses to “Multicore Programming: The Next Frontier?”

  1. Multicore Programming: The Next Frontier? Says:

    [...] Full Story [...]

  2. Peter Robertson Says:

    Multicore is not the answer as it fails for at least three reasons.

    1: It is hoping to make legacy code run faster by some automatic or semi-automatic
    process. The reality is that existing sequential programs are doomed. We either
    waste huge amounts of effort in dead ends that restrict progress by forever
    looking backwards or we realise that new code written to a better model is what
    eventually must dominate.

    2: Processor complexity is growing out of control. It is a sad fact that computing
    is the only area of engineering where “more complex” is considered “better”.
    Multicore is a prime example. Even though a certain complexity is inevitable, it
    should also be invisible. It seems that processor designers add new features with
    little consideration of their effects on software (or if they do, it’s only for
    existing software built to work round existing hardware deficiencies). For example,
    should you really need >100 registers to program a transfer on one serial link
    (sRIO on TI C6000 processors)? Are there no engineers with the knowledge and
    courage to protest about such nonsense when they see it?

    3: Multicore is a shared-resource architecture and as such cannot scale. Even if
    existing multicore architectures were perfect, the needs of tomorrow will require
    greater numbers of processors and we would end up right back where we are now. We
    would be forced to add yet another ad hoc mechanism (such as building clusters of
    multicores) that would lead us right back to rewriting software.

    At some stage we will have no choice but to move to true distributed parallelism that
    does not have these built-in limitations. There will still be hardware challenges, but
    they and their solutions must not be directly visible to the underlying programming model.

    Processors are essentially sequential; parallelism comes from having many of them.
    Given this, existing programming languages, even though dreadful in many ways, are
    widely known and adequate to express what happens on each processor. All that is needed
    is an environment in which explicitly sequential, but independent, components can be
    distributed over numerous processors. We would then have a chance that programs written
    for such systems would survive and match the faster processors of the future.

    Peter S. Robertson
    3L Ltd

  3. John Gross Says:

    Peter Robertson hits the nail on the head when he points out that most code that is being written for multi-core assumes a shared memory model, and because this entails the use of references (to avoid gratuitous movement of data), it will not work in a distributed memory environment. And since it is generally agreed that the shared memory model is unlikely to scale past 8/16 cores, in about 3 or 4 years’ time the industry is looking at another major upheaval.

    Connective Logic’s Blueprint toolset provides a means of developing software that works with both memory models. In the shared memory case data is exchanged by reference, but when the same code runs on a cluster the data is transparently moved and cached. When it’s no longer referenced it’s garbage collected. The developer only ever sees a Single Virtual Process and mapping to actual physical processes is achieved using ‘late-accretion’, which does not impact on the ‘application’ code itself (see http://www.connectivelogic.co.uk).

  4. Gene Bushuyev Says:

    The article presents very good observations on both the technical and programming challenges facing multi-core development. And while the hardware challenges, like memory performance, bus saturation, etc., may be solved in the near future, software problems will remain. The fact that the human brain cannot multitask (medical fact) naturally leads to software solutions that are sequential. Simply giving a software developer a good thread library doesn’t change that; it’s still very hard for a person to reason in an asynchronous, parallel manner. That results in software programs that are slow to develop and debug, and that contain not only logical bugs but also bugs related to unpredictable thread interaction.
    On the positive side, working on our own event-driven GBL designer, I’ve found that partitioning a design into modules, which communicate only through signal interfaces and are not coupled otherwise, leads to quite natural parallelization. Event-driven architecture naturally allows a developer to reason about event-response decisions, not about sequential program flow.
    Designs that are based on an event-driven mechanism show amazingly loose coupling and can be naturally parallelized without any extra effort from a programmer.

  5. ed Says:

    The bigger issue behind all of this is what’s going to be the impetus to buy new hardware–computers or otherwise–if it doesn’t offer better performance.

    Car makers are in a bind now for the same thing. The rising price of gas (even if it has taken a dip, it will go back up) means the real focus is gas mileage. It’s great to step on the accelerator and feel real power, but the biggest selling point is fuel economy. That’s why rental car companies are charging extra money for hybrids.

    Multicore doesn’t offer better energy efficiency and it doesn’t offer better performance for most applications. So what’s the selling point?

  6. sebastien renaud Says:

    Really interesting subject, but as always, I fail to understand why people think a paradigm shift (new programming model/language) is necessary in order to take advantage of it. Humans think in sequential terms and cores execute in a sequential fashion (if you abstract away the multi-ALU / instruction reordering stuff that has been the norm in SINGLE core processors for a long while). Bottom line is: I strongly agree with Gene Bushuyev that you will always gain from structuring your sequential modules with clean and minimized interfaces. Taking advantage of multiple cores with that kind of code is much easier. I also agree with the fact that some applications are more geared towards parallelism than others. However, who really needs to split up Word or Excel? Assuming that 16-core CPUs are not too far off, a single core for each application seems good enough, no?

  7. ed Says:

    One core is plenty for most applications. But dedicating a core to each application, or at least to some applications, is incredibly inefficient. Data centers are now dealing with that problem. Each server is running at a max of 15% utilization because everyone was afraid to load more than one Microsoft app and OS on a server because it wasn’t as robust as Unix or any of the mainframe or minicomputer OSes. Now they’re resorting to virtualization, which brings its own issues. Just lobbing the challenge to the software programmers isn’t helping. They’ve already concluded over four decades that they can’t solve it.

  8. Brian Says:

    [quote]Multicore doesn’t offer better energy efficiency and it doesn’t offer better performance for most applications. So what’s the selling point?[/quote]

    Frag rate and frame rate. Megabucks go into improving the KDR (Kill/Death Ratio). This is gaming lingo, but it applies to military operations and to Wall Street. We use supercomputers because we lack good parallel computers (and the supers are multiple parallel) to model nuclear implosions and explosions; Wall Street does (and will increasingly) do the same to “milk the cow” (exploit the market) to the critical edge of not killing it.
