Posts Tagged ‘IBM’

System Bits: April 3

Tuesday, April 3rd, 2012

Size Matters
IBM is joining forces with the Netherlands Institute for Radio Astronomy on a five-year project to create an extremely fast, low-power exascale computer system that will process data from the world’s largest radio telescope.

The project, called DOME, is named after the protective cover on telescopes and the well-known Swiss mountain. IBM researcher Ton Engbersen said the amount of data that will be collected each day is roughly the equivalent of twice the daily Internet traffic from around the globe. An estimated 300 to 1,500 petabytes of data also will need to be stored after processing.

Radio telescope arrays will be used to glimpse into the past and the future. Source: SKA

At the heart of this gargantuan system is massive computing performance plus huge data-transfer links that are well beyond the most advanced current technology. This gives entirely new meaning to the idea of wide I/O, which likely will filter down in some form to commercial applications.

The radio telescope, also known as the Square Kilometer Array, uses millions of antennas, spread across an expanse about the width of the United States, to collect radio signals. The goal: to explore dark matter, evolving galaxies and the origins of the universe.

Optical Sensors
UC Santa Cruz has won a grant to begin exploring optical sensing technology that can detect single molecules, potentially replacing very expensive equipment with a single chip.

The goal is to create medical devices that are easily developed, inexpensive and portable and which can target disease-related molecules. The chips use standard IC processes that enable light propagation through small amounts of liquid on chips.

Unlike consumer electronics, though, nothing in the medical field moves quickly. Labs on a chip were created nearly a decade ago with great fanfare, but still have only made a small dent in the medical world.

–Ed Sperling

Why PCs And Servers Aren’t Going Away

Thursday, March 22nd, 2012

By Pallab Chatterjee
With the rise of mobile appliances, smart phones and tablets, there has been a lot of discussion about the place for PCs, servers, embedded processors and networks. A number of companies have claimed they will rule the world of computing and there will be no room for others.

Reality seems to be somewhat different, however. The mobile end-point devices (smart phones, tablets and netbooks) are content-consumption devices. They play back content, such as video, music, still images and business data, that already exists.

The majority of business data is created on desktop/laptop PCs that use an x86 processor from Intel or AMD. These have been the dominant platform since the early ’80s and still are the workhorses inside most offices. Tablets are starting to be brought in to supplement the PCs and extend their lifecycle, but they are not displacing the existing machines. The computing power of these larger systems allows them to be used for presentation creation, as opposed to viewing, graphics creation, report writing and calculation. These are in addition to the engineering and scientific uses, which are also compute-intensive.

A common misunderstanding is that the multi-core microcontrollers in tablets and smartphones can perform computational tasks equivalent to those of microprocessors. The applications for these devices, which use power-optimized, in-order-execution controllers with direct-mapped memory, are created on machines with out-of-order-execution microprocessors, which also provide deep, virtualized memory, large data stores, and true multi-user and multi-tasking support. This allows for the creation of memory- and runtime-optimized applications (e.g., Web browsing and games with multiple pre-defined playing levels and performance metrics) whose data and resource extents are both known and minimized for the microcontroller-based players.

As a result, standard development environments on Windows, Linux and Mac OS X using x86 machines are the default basis for application and content creation for the mobile appliances. These are not just created on workstations. They also are created on a server base. Depending on who is quoted, the ratio of server cores to end-point devices is in the range of 1:8 to 1:20. This means that in the worst case for servers, the 1:20 ratio, it would take 2.5 billion x86 cores to address the 50 billion end-point devices forecast for the Internet of Things. Rather than spelling the death of big iron devices, it means massive sales in this market. Based on real applications, the ratio will average out to something closer to 1:12, which brings the number of cores to more than 4 billion.
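
The arithmetic behind these estimates is easy to check. The sketch below is only an illustration: the function name `server_cores_needed` is hypothetical, and the inputs are the 50 billion end-point forecast and the 1:8, 1:12 and 1:20 ratios quoted above.

```python
def server_cores_needed(end_points: int, devices_per_core: int) -> float:
    """Server cores required if one core serves `devices_per_core` end points."""
    return end_points / devices_per_core


END_POINTS = 50_000_000_000  # end-point device forecast quoted in the article

for devices_per_core in (8, 12, 20):
    cores = server_cores_needed(END_POINTS, devices_per_core)
    print(f"1:{devices_per_core} -> {cores / 1e9:.2f} billion cores")
```

The 1:20 case gives 2.5 billion cores. Lower ratios, where each core serves fewer devices, push the total higher.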

The advantage of these machines is the ability to support virtual users (multiple simultaneous clients) using products from Microsoft, IBM, VMware and Citrix, as well as full virtual machines. A virtual machine differs from multi-user approaches in that the I/Os, storage, security and CPU/GPU interaction are also virtualized for each user. This allows direct-attached and tiered storage, including a storage-area network, to be mapped and virtualized for access from the virtual machines.

Currently, the virtualization support for microcontrollers and their associated hypervisors does not provide full virtual machine capabilities. According to discussions with more than 75 enterprise and data center administrators, the need for full storage and memory access, as well as out-of-order execution to support multiple applications from multiple users at once, is preventing microcontrollers from gaining ground in the server space. They have made only limited gains, mostly for targeted applications at the edge of the network for running Web servers and fixed fill-in-form applications that can be crafted the same way that end-point code is created.
This accounts for, at most, 1% of the server environment at a ratio of one core for four users. It also brings with it a cost of development and support that is about four to five times the cost of general-purpose code that has single release capabilities and does not need multiple operating variants to be deployed for support of multiple device platforms and OSes.

Coherency Becomes A Stack Of Issues

Thursday, March 22nd, 2012

By Ed Sperling
As complexity increases and the industry increasingly shifts away from ASICs to SoCs, the concept of coherency is beginning to look more like a stack of issues than a discrete piece of the design.

There are at least five levels of coherency that need to be considered already, with more likely to surface as stacked die become mainstream over the next few years. Perhaps even more mind-numbing, this stack itself will have to take on a level of coherency over the next couple of generations of chips.

Let’s take a closer look.

Cache coherency
The concept of keeping data coherent historically was relegated to processor makers such as IBM, Intel and AMD, which have focused on improving performance through faster access to data. One solution to that improved performance has been multithreading and multiprocessing. Along with that, these vendors have added in various levels of cache memory for faster recall of important data.

More cores also make it harder to effectively use these caches. Data has to be kept consistent, which requires more system overhead in terms of processing and power just to maintain that coherency. And it gets even harder as more cores are added into an SoC, which increasingly are not the same size, do not run at the same frequency, and sometimes do not even connect directly to the main CPU.

“With cache coherency, some of the traffic may be serviced by the cache on another GPU,” said Drew Wingard, CTO at Sonics. “If you’re just using an ARM core, the CPU coherence is sufficient. But the GPU uses its own local memory. You really want it to be fully cache coherent across all of those.”

But even finding the data to maintain consistency may be a problem in a complex SoC.

“You can view what’s in memory, or view it and be able to change what’s in memory, but first you have to find it,” said Kurt Shuler, vice president of marketing at Arteris. “If you have four cores, the most efficient way to hook them up is for each core to have its own cache and graphics to have its own cache. If you change something, you have to snoop in all the caches to make sure it’s consistent.”

But there is also a move in the completely opposite direction—sharing memories among multiple cores—because it reduces the number of components on the bill of materials. The Low-Latency Interface specification from the MIPI Alliance is a case in point, where a memory can be shared between a modem and an applications processor. Intel, meanwhile, has added on-chip graphics that share memory with the CPU.

“The whole design gets more complex,” said Shuler. “You have more traffic beyond the cores, and from a power standpoint the overhead goes up.”

Still, cache coherency is one of the better-understood pieces of this stack. It has been an issue ever since multiprocessing was first employed in the 1960s. “Snooping” has been widely used since that time.
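
The snooping idea described above can be illustrated with a toy model. This is a minimal sketch, not a real protocol: the class names `Cache` and `SnoopingBus` are hypothetical, and it assumes a simple write-through, write-invalidate scheme in which every write is broadcast so that other caches drop their stale copies. Production protocols such as MESI track more states per cache line.

```python
class Cache:
    """A per-core cache holding address -> value copies."""

    def __init__(self, name):
        self.name = name
        self.lines = {}


class SnoopingBus:
    """Toy write-invalidate bus: each write is seen by all other caches."""

    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory

    def read(self, cache, addr):
        if addr not in cache.lines:        # miss: fetch from main memory
            cache.lines[addr] = self.memory[addr]
        return cache.lines[addr]

    def write(self, cache, addr, value):
        for other in self.caches:          # snoop: invalidate stale copies
            if other is not cache:
                other.lines.pop(addr, None)
        cache.lines[addr] = value
        self.memory[addr] = value          # write-through keeps memory current


memory = {0x10: 1}
cpu0, cpu1, gpu = Cache("cpu0"), Cache("cpu1"), Cache("gpu")
bus = SnoopingBus([cpu0, cpu1, gpu], memory)

bus.read(cpu1, 0x10)         # cpu1 caches the old value
bus.write(cpu0, 0x10, 42)    # cpu0 updates it; cpu1's copy is invalidated
print(bus.read(cpu1, 0x10))  # cpu1 misses and re-fetches the new value
```

Even in this toy version, every write touches every other cache, which hints at why the overhead grows as cores are added.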

Software coherency
A newer facet of coherency involves embedded software. Because SoCs now include an increasing amount of software in the design, engineering teams now have to wrestle with coherency issues that previously were dealt with by the operating system.

“Fundamentally you’ve got two combined issues here,” said Andy Meyer, verification architect for Mentor Graphics’ Design Verification Technology Division. “You’ve got cache coherency, where the same data is being viewed in a couple places. And then you’ve got an issue with consistency in the simple code in a uniprocessor that now has to run on a second processor. The ordering of events can change in multiprocessing.”

Those problems crop up regularly in verification, but not always with the expected results. It’s difficult to effectively write the stimulus in a testbench for coherency. What happens, for example, when a core is shut down to save power?

“The scariest part is when there is no OS support,” said Meyer. “There’s also a big problem with heterogeneous cache, such as when you have a CPU working with a GPU.”

Another issue has to do with effective coverage in verification, already a problem for complex SoCs. States frequently are distributed across multiple chips and multiple boards. Timing varies from one state to another, and can be particularly problematic if snooping functions are tied to a state. And parallelism continues to baffle even the most advanced teams.

“Standard coverage methods don’t work well here,” said Meyer. “You have to query in ways you traditionally didn’t have the power to query and ask questions across months of regressions. For instance, ‘Have we been here ever—or in the last two months.’ Until coverage steps up, people with deep knowledge of verification running hundreds of full-time emulator systems are finding out at the last minute that it’s not okay to ship.”
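
The kind of query Meyer describes, asking whether a coverage point has ever been hit or was hit in the last two months, can be sketched against a log of regression results. Everything here is an assumption made for illustration: the log format, the coverage-point names and the helper functions `ever_hit` and `hit_since` are hypothetical and do not correspond to any particular verification tool.

```python
from datetime import date, timedelta

# Hypothetical regression log: (run date, coverage point hit in that run)
hits = [
    (date(2012, 1, 5), "core1_off_while_snooping"),
    (date(2012, 3, 10), "gpu_cache_evict"),
]


def ever_hit(log, point):
    """Has this coverage point been reached in any recorded regression?"""
    return any(p == point for _, p in log)


def hit_since(log, point, today, days):
    """Has this coverage point been reached within the last `days` days?"""
    cutoff = today - timedelta(days=days)
    return any(p == point and d >= cutoff for d, p in log)


today = date(2012, 3, 22)
print(ever_hit(hits, "core1_off_while_snooping"))
print(hit_since(hits, "core1_off_while_snooping", today, days=60))
```

In this made-up log, the power-down scenario was reached once in January but not within the last 60 days, which is the sort of gap such a query is meant to expose.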

I/O coherency
Tied in with both cache coherency and software coherency is I/O coherency. Increased communication on a chip, between chips, and between a chip and the outside world has turned what used to be a relatively straightforward networking issue into a complex jumble of prioritization and synchronization.

“You have to deal with this even in single processors,” said Sonics’ Wingard. “You may have a PCI core streaming data into memory. Today, without I/O coherence, it’s difficult to determine what is coming in. The CPU has no way of knowing what was transferred when it does a copy from non-cache to cache.”

He noted that personal computers had I/O coherency for a long time, particularly with direct memory access. DMA was developed initially to help solve the bottleneck that occurred when a CPU was involved in an I/O transfer. Rather than tie up the CPU with that transfer, the CPU continued running, then accepted an interrupt when the transfer was completed.
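
The DMA pattern described above, in which the CPU keeps working and is interrupted when the transfer completes, can be mimicked in software. This is only an analogy, sketched under assumptions: a worker thread stands in for the DMA engine, a `threading.Event` stands in for the completion interrupt, and the names `dma_transfer`, `source` and `dest` are hypothetical.

```python
import threading


def dma_transfer(source, dest, done):
    """Stand-in for a DMA engine: copies data without the 'CPU' thread."""
    dest[:] = source
    done.set()               # plays the role of the completion interrupt


source = bytearray(b"payload from I/O device")
dest = bytearray(len(source))
done = threading.Event()

engine = threading.Thread(target=dma_transfer, args=(source, dest, done))
engine.start()

work = sum(range(1000))      # the 'CPU' keeps computing during the transfer
done.wait()                  # then it services the 'interrupt'
engine.join()
print(dest.decode(), work)
```

The point of the sketch is the ordering: the main thread does not touch `dest` until the completion signal arrives, which is the guarantee that becomes harder to maintain as more agents move data on a chip.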

But with more of this being moved onto a chip, keeping coherency while moving data back and forth from more places is becoming much more difficult.

Ecosystem coherency
One of the least addressed facets of the coherency stack involves business and communication issues across a supply chain for a particular SoC rather than the actual technology itself. Even where competitive suspicions can be overcome, the very different approaches taken for designing components, IP and software, as well as language barriers, create one of the more difficult and less tangible challenges in the coherency stack.

“The challenge going forward is that you have a bunch of people who may not be that skilled in system development driving the chip and spec for one design, and other supplier trying to orchestrate things,” said Mike Gianfagna, vice president of marketing at Atrenta. “So you bring them together to solve a problem for one customer in 12 weeks and then they move on. You’ve got corporations coming together and bringing all these pieces together almost like the way a movie is done. But is there a coherent way to communicate data and information risks and still provide good visibility from a power/performance/area point of view?”

For decades this task has been handled by IDMs, but in the SoC world there are far fewer IDMs these days. Many of these chips are built using third-party IP such as cores from ARM or MIPS, DSPs from companies such as Tensilica, and standard IP from the Big Three EDA vendors.

Coherency in stacked die
It’s uncertain whether stacking of die, either in 2.5D or 3D configurations, will make coherency easier or harder. The answer is likely to be a little of both.

“With 2.5D and 3D, you’re looking at low-power memory access,” said Arteris’ Shuler. “You put the DRAM closer to the CPU, the addressing is wider and you get rid of some of the latency. But you also need coherency across all of this.”

No one is sure yet how multiple high-speed communication channels between die will affect coherency. If the channel between the cores is wider and shorter, that will improve data speed. But if processors and DRAM are scattered on multiple die, with some of them shut down, some partially shut down, and others fully active, it may be harder to keep track of data and make sure it is all synchronized.

Different Tradeoffs

Thursday, February 23rd, 2012

By Ed Sperling
The push to “smaller, faster and cheaper” hasn’t changed since ICs were first introduced, but the context for those requirements is beginning to shift—with enormous consequences.

What was once done on multiple chips continues to migrate to a single chip or package because of cost, but in some cases the decisions about what goes where go well beyond an individual device to include a network of systems. Power and heat have forced some of those decisions. Others are being driven by shorter market windows that affect business decisions about exactly when to move to smaller, faster and cheaper, and whether to keep a design in two dimensions or move to three. In some cases, it even has evolved into a tradeoff about sharing resources to make up for additional costs elsewhere in a design.

“Form factor is everything in a lot of these cases, and you’re being forced to make tradeoffs involving a lot of different pieces,” said Mike Gianfagna, vice president of marketing at Atrenta. “But that requires you to know exactly what you’re doing. A lot of times you don’t. What happens when you reduce the number of layers? Do you know the impact on the system? You may not. But competitive pressure is also forcing you to rethink everything.”

Rethinking designs
Some of these changes are as fundamental as where the processing gets done. While the concept of cloud computing has been around since the days of time sharing on mainframe computers in the 1960s, the ability to offload processing and storage on the fly—and to load balance across compute farms around the globe—adds a modern twist to it all.

The result is a handheld device with the performance capabilities of a compute farm—but with the design focused far less on local processing and storage and more on communication and battery life.

This is evident with a number of upcoming communications schemes and protocols in the handheld market. LTE Advanced, for example, which is expected to find its way into smart phones and base stations over the next four years, focuses on reducing power while increasing performance. One of the best ways to do that is by shifting what processing is done where.

“One of the key decisions is how much processing and intelligence is in the cell phone versus the cloud,” said Graham Wilson, a product marketing manager at Tensilica. “You also have to understand deeply what cores are being used for. There is no room for fat. We’re also going to see a big shift in infrastructure from homogeneous to heterogeneous.”

That means rather than a giant cell tower on the highest hill or building, smaller boxes will be mounted on houses and strung together in a mesh network. “Every house will have its own femto cell or pico cell box so they’re less reliant on the macro cell and they work off each other,” Wilson said.

That changes what resources can be committed within a design to processing, to communication, to storage, and where it can be done best—whether it’s a central processing unit or lots of smaller processors for individual uses. It also boosts the ability to cut some costs in different places than just by shrinking the process geometries in a design.

The Low-Latency Interface working group of the MIPI Alliance, for example, is currently working on a new standard that allows DRAM to be shared between two chips. NoC technology vendors, in particular, have seen this push because it requires a highly efficient network-on-chip infrastructure.

“The big advantage is that it allows you to get rid of an entire memory chip,” said Kurt Shuler, vice president of marketing at Arteris. “The modem and the application processor are sharing the same memory. You also reduce the number of pins, which is important because it allows you to use those pins for other things.”

He noted there is a very slight performance hit. But the ability to eliminate an entire memory chip can save a couple of dollars in a design. Multiply that by millions of units and the savings are huge, far greater than just shrinking the features on a die.

Rethinking packaging
Stacking die offers another alternative to improving performance and time to market, but the tradeoff will be in cost unless additional components can be eliminated. Adding an interposer layer or TSVs will be expensive—at least initially—even though 2.5D and full 3D stacking hold the promise of dramatically improving performance through shorter distances, bigger pipes for data, and lower power because signals will not have to be driven as far.

While this packaging approach is still under development, foundries report that chips are rolling out using this approach. “This is already happening,” said Luigi Capodieci, R&D Fellow at GlobalFoundries. “It’s mostly a decision of which design processes to use in the chip, and that decision will have to be made by the chip designers.”

Stacked die also allow IP developed at older nodes—particularly analog—to be attached through Wide I/O to other chips developed at more advanced processes. That, at least in theory, substantially reduces the time it takes to design a chip because much of it can be based on what has been previously developed.

“Re-use leads to a reduction in time to market,” said Shrikrishna Gokhale, COO and managing director of Open-Silicon’s India unit. “This opens up the lifecycle of different IP and puts the emphasis on packaging and re-use.”

It also puts greater emphasis on software-hardware co-design, he said, and requires more emphasis on defining partitioning earlier in the architecture phase. In addition, it requires a rethinking of what gets done where. Some portions of the design that used to be in separate locations now have to be co-located in the same place because of the constant need to update models and data for both hardware and software teams.

“The logic front-end design needs to be done at the same location as the software,” he said. “That’s less important at the back end, which is the physical implementation.”

Other tradeoffs are less obvious, though, particularly to design engineers. One involves weight.

“Half the weight of a tablet is the battery,” said Drew Wingard, CTO of Sonics. “You can’t afford to add a bigger battery so you have to do an increasing amount of computation with lower power. That means you look at more efficient ways of doing that computing. One is using the GPU as a general-purpose CPU, which allows you to get a lot of performance at low energy.”

He noted that utilizing the GPU requires it to be easily accessible to software developers. And it requires much better management of clock domains, voltages and on-off functionality within an acceptable power budget. And to be really energy-efficient, users need to be able to easily input their own usage models.

Rethinking manufacturing
Some of the changes that are under way are forcing a major shift in manufacturing, too. Staying on the Moore’s Law road map has always been a given for high-volume digital designs, but with double patterning required at 14nm and the delay in extreme ultraviolet lithography, alternatives are being considered that could have ramifications throughout IC design.

“Double patterning is the biggest issue we’re dealing with right now,” said Jean-Marie Brunet, director of product marketing for model-based DFM and place and route integration at Mentor Graphics. “We’re even looking at triple patterning, but there is no way to have density balance between the layers when you do that.”

Lars Liebmann, an IBM distinguished engineer, said his company has been working on commercializing self-assembly for finFETs because even multi-patterning isn’t sufficient beyond 14nm. That has implications throughout the design chain. For one thing, it can increase the density on existing process nodes. For another, many of the tools for automating design, particularly on the DFM side, will need to be rewritten.

Conclusion
Area, power and performance have always been the standard metrics for tradeoff in any IC design. What’s changing significantly is why those tradeoffs are being made and where the benefits will show up. Changes targeted at an individual chip in the past, or even a block or subsystem, may now be aimed at a much broader level.

The good news is that infrastructure changes (everything from manufacturing approaches to communications networks) evolve much more slowly and deliberately than those made in the individual device or chip. The bad news is that sometimes that evolution is so slow that it can affect what’s done elsewhere in this much broader system. But some change is underway at every level, and managing that change, along with the tradeoffs it will demand, will be much more challenging in the future.

Ambient Computing: Interdependencies Rule

Thursday, January 26th, 2012

By Ann Steffora Mutschler
Ambient computing: Just the concept conjures up images of a Star Trek-like ‘Computer’ that is ever at the ready, awaiting a query at any moment, and which can discern as well as perform significant tasks. While Apple’s Siri gets there partway, it is significant because the concepts that make the technology possible behind the scenes draw upon a multidisciplinary, interdependent approach.

Ambient computing for a human being means whatever they are around—be it a refrigerator or a phone or a watch or sunglasses—all of these places contain computers that are just there and ready. They don’t need to boot up and they can communicate with other devices. “But,” cautioned Kurt Shuler, director of marketing for Arteris, “that’s really challenging for the industry right now.”

One of the reasons, not surprisingly, is power, he said. “From a human interface design standpoint, even though it may be in super sleep mode 99% of the time, when a human being says, ‘Hey, I want to open my fridge,’ it’s got to happen right away. We still haven’t figured out how to do that really well yet.” Shuler suspects this is because chip guys develop independently from software guys who develop independently from device guys, with Apple being one of the few companies that does all three.

But it’s not all science fiction. Cary Chin, director of technical marketing for low-power solutions at Synopsys, observed that we really haven’t been this close on many fronts of ambient computing for a long time and that many things have happened just in the last couple of years.

“This idea of ‘always on’ is just one of the things,” Chin said. “Clearly the idea of ‘always available’ computing is one of the first requirements, and that has a lot to do with all the low power, energy efficiency-related things. This whole idea that the vast majority of systems really should be, by default, ‘off’ but have enough ‘on’ that the rest of the system can be in extremely low power standby and wake up very quickly whenever they’re needed and then go back to sleep. These ideas exactly fit in with the idea of larger system being always available and more recently, within the last few years, integrating that with a mobile solution is another piece.”

Chin sees four requirements for ambient computing to become a reality:

  1. Always Available. He said this is an area where we are doing great. “There’s no doubt that within the next few years more stuff will have happened. This idea of low-power, always-on standby is clearly the way.” His view of the not too distant future is that a lot of devices won’t have on-off switches anymore because it is harder and harder to distinguish between on and off. Most things are always on, but they are not wasting energy when they don’t have to be.
  2. Communications. “There’s a ton of stuff going in there obviously with mobile devices but a lot of it has to be more in the context of extreme low power communications and again, this is an area in the last few years that has taken huge leaps,” Chin asserted. For example, the latest iPhone supports Bluetooth 4.0, which includes a low-energy mode that lets more passive devices run for years on a single battery. Google, meanwhile, is putting its support behind near-field communication (NFC) standards.
  3. Human Interface. This is the man-machine interface, and where Apple’s Siri comes in, which is making great strides toward popularizing a natural language interface. Along with this is the transition to a touch interface, driven by smartphones and tablets. “In this whole human interface thing, we’re kind of in the next revolution and touch is the next piece. I can really envision a combination of a natural language interface combined with either not necessarily even a touch interface, but really this idea of an almost Wii-like interface where you can do these commands in the air because that would make a lot more sense with regard to just entering stuff onto the computer,” he predicted.
  4. Improving Machine Learning and AI. Chin noted that for many years it has been obvious for those in the technology industry that we have gone through this entire generation with the division between humans and computers being pretty much in the same place. “We haven’t really moved that forward. Things move much faster now—computers are way faster, much more storage—but basically the dividing line between what the human is expected to do and process versus what the computer is expected to do hasn’t really changed in the last 30 years pretty much.” Here again, he points to the Siri interface as having made big strides in this area, which is almost a mini version of the IBM Watson computer that plays Jeopardy (http://www-03.ibm.com/innovation/us/watson/index.html). The next step is moving the interface forward to a point where the command interface isn’t based on a command or even on a command and a bunch of aliases. It is interpreting what your intent is and the machine figures out what command, parameters and what engines, etc., are needed.

No more lone wolves
What this means for system architects of the very near future is that they can’t work independently any longer. “It used to be when you had a complex chip design, you’d have your test expert, you’d have your power architect, you’d have your timing closure person, you’d have separate experts that would worry about their axis of the chip and would all work sort of independently to get it done,” said Mike Gianfagna, vice president of marketing for Atrenta. “It doesn’t work that way any more because the minute you lower power you potentially mess up testability, and the minute you change testability, you might mess up your synchronization schemes for the clocks. So everything is interdependent. You can’t have a team of people working independently and somehow get it done. The experts need to be enabled to work collaboratively and understand the implications of what they do on one thing and how it affects something else. This requires more concurrent engineering and requires the various optimization tools to work in concert with each other and concurrently.”

He noted that the industry has talked about concurrent engineering for a very long time but it hasn’t been a need-to-have. Where it really becomes a need-to-have is around 22nm because, “You just can’t get there from here. You’ve got to co-optimize everything or you can’t close the design. Concurrent engineering and the need to balance all these things simultaneously become critical. You still have your experts but the experts need to be able to work more collaboratively and that only works if the tools can give you real-time feedback on if you change timing, what happens to timing, power, area, testability.”

In essence, to make ambient computing truly a reality, all parts of the ecosystem—from device to network to cloud—are completely reliant on each other for success. Realizing ambient computing requires some lateral thinking and reinvention in the entire electronics industry. But this is exactly what we will see in the years to come.

Additional reading:
Ambient Computing Blog: Where the Wild Things Are
A look at Apple’s Siri
How Speech Recognition Will Change the World

Reverse Engineering

Thursday, January 26th, 2012

By Ed Sperling
Fabs and foundries frequently have been the savior of flawed designs, fixing problems such as power and performance, identifying design issues and often developing solutions to those problems.

Over the next couple of process nodes, and in stacked die that will span multiple processes, there will be far fewer saves coming from the back end. Double and triple patterning, stress effects, new materials and the laws of physics are forcing a change in direction. In fact, for the first time design teams will have to make up for a slew of changes and challenges on the manufacturing and packaging side, employing new methodologies, new tools and deeper levels of expertise.

In a keynote speech at the SEMI Industry Strategy Symposium last week, Applied Materials chairman and CEO Mike Splinter sounded the alarm over the changes ahead. “Change is accelerating,” said Splinter. “Compared with the last 15 years, the next five years will have more changes and more inflection points. And it’s not just about complexity. It’s happening at the foundational level of how an IC is made.”

He’s not alone in that assessment. Bernie Meyerson, an IBM fellow, said CMOS is now in “the end game.” While CMOS certainly isn’t going away, there are physical limits for what can be done to extend it. That has spawned extensive research into alternative materials such as silicon on insulator and graphene, new elements for insulation, as well as new structures such as FinFETs and carbon nanotube FETs.

So what does this mean for design at advanced nodes? Lots more work on design for manufacturability, more complexity in achieving the same kinds of boosts in performance and energy efficiency that were taken for granted at older nodes, and much more up-front checking of just about everything.

“From 40nm to 28nm to 20nm, the number of checks for physical verification will grow by leaps and bounds,” said Michael White, director of product marketing for Calibre. “There are almost 1,000 more DRC checks from 40nm to 28nm between early production and volume production. We are also capturing additional context-dependent yield detractors. For example, historically we have had spacing checks. Now we have spacing checks and we need to check all of the other geometries in the neighborhood, including lithography and fill issues. Those are extra constraints.”
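To make the distinction White draws more concrete, here is a minimal sketch that contrasts a classic pairwise spacing check with a simple context-dependent check that looks at the neighborhood around each shape. All names, thresholds and the rectangle model are illustrative assumptions. This is not Calibre rule syntax, and real sign-off decks are far more involved.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle on one layer; coordinates in nanometers (hypothetical model)."""
    x1: float
    y1: float
    x2: float
    y2: float


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0.0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def spacing_violations(shapes, min_space):
    """Classic check: flag every pair of separate shapes closer than min_space."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(shapes), 2)
            if 0.0 < spacing(a, b) < min_space]


def crowded_shapes(shapes, radius, max_neighbors):
    """Context-dependent check: flag shapes with too many neighbors within radius."""
    flagged = []
    for i, a in enumerate(shapes):
        neighbors = sum(1 for j, b in enumerate(shapes)
                        if j != i and spacing(a, b) <= radius)
        if neighbors > max_neighbors:
            flagged.append(i)
    return flagged


# Illustrative layout: two close shapes and one farther away.
shapes = [Rect(0, 0, 10, 10), Rect(15, 0, 25, 10), Rect(60, 0, 70, 10)]
pair_errors = spacing_violations(shapes, min_space=8)
context_errors = crowded_shapes(shapes, radius=40, max_neighbors=1)
```

The second function is the point of the example. A shape can pass every pairwise rule and still be flagged because of what surrounds it, which is why the check count grows so quickly.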

Lithography used to be something design teams never had to consider. But the delay in EUV will require double patterning at 22/20nm and potentially even triple patterning of at least some portions of the chip at 14nm. This becomes particularly challenging for design teams, because one of the approaches under serious consideration is something called spacer-assisted double patterning (SADP). In simple terms, with SADP a polygon in the design may look nothing like what’s on the mask. This is akin to driving a car in reverse using the rearview mirror where nothing that appears in the mirror resembles the road.

Stacking effects
One solution to these issues is stacking of die, whether in 2.5D or 3D configurations. The so-called “More Than Moore” approach bundles technologies together at nodes that make sense for a particular function, rather than trying to fit everything into the most advanced process. So while the logic or memory may be created at 22nm or 14nm, for example, analog may be developed at 130nm.

This all makes sense in theory, but it also adds a new dimension of complexity that ripples back and forth between the design and the manufacturing worlds. It also exposes the design process to the entire supply chain, because problems detected anywhere along the chain can affect multiple other areas, and it’s possible that no single segment can solve them alone.

“Over the next three to five years chips will go vertical,” said Naveed Sherwani, CEO of Open-Silicon. “The question is how we are going to put together 3D ICs and what will go into them. There is a lot that needs to be done in this area.”

Sherwani contends that tools and methodologies should make it easier and quicker to do derivative designs. That’s the goal, and at least part of the solution involves companies learning to use the tools they have more effectively, and to apply some discipline to their methodologies. It’s easy to get blinded by the number of permutations and choices from the growing complexity.

“As process geometries continue to get smaller and the amount of IP used increases, the complexity of the design process becomes a major issue, which puts pressure on the entire development team from a coordination and communication standpoint,” said Simon Butler, CEO of Methodics. “Also, with software elements and power constants, which are really just other types of IP, added to the already very complex mix of things, design teams need better ways to manage the entire SoC development process and synchronize all the moving parts. Internal design organizations already struggle with managing remote design teams. Now, with a disaggregated design chain consisting of separate companies, the need for real-time collaboration and managed data exchange is critical.”

That sentiment is echoed across the industry. Frank Schirrmeister, senior director for the Cadence System Development Suite, said that in principle tools allow engineers to model almost everything they need. “This isn’t a tool problem. It’s a discipline problem. But the other side of this is that in 1993 logic synthesis was pretty simple. Twelve years later, the whole process is no longer understandable by any engineer.”

Margin call
One of the most effective ways to deal with unknowns in the past was guard-banding, the process of building extra safeguards into ICs. That worked until about 65nm, but at advanced nodes it can cause performance degradations or drain batteries more quickly, or both.

“The guard band for synthesis is a smaller percentage at 28nm and it’s even smaller at 20nm,” said Jack Browne, senior vice president of sales and marketing at Sonics. “So you’ve got to be able to interoperate with the right guys. We’re all trying to manage a horrible amount of complexity and simplify it. The problem is there is too much that’s new and not enough experience points so that people can make the safe choices. There are significant unknowns on everyone’s road map.”

One potential solution—and one that’s being considered by a number of large chip and IP companies—is to harden everything into pre-qualified, pre-verified subsystems. While this limits the number of permutations, it does take some of the risk out of using those blocks. But too many hardened subsystems also can limit the ability of companies to differentiate their designs. And while that works well at a company like Apple, it does not work so well at a chip company trying to sell technology to Apple’s competitors.

“With subsystems you’ve closed the black box and given up the chance to turn some of the dials,” Browne said. “We’re seeing this with the TI OMAP team, which has accumulated a significant number of libraries, and with Broadcom. And Toshiba has created video and RF subsystems.”

Caution ahead
All of these issues have raised questions about what needs to be fixed in the design flow, what needs to be extended, and how this will unfold over time. The reality is that changes may be slow because there is serious uncertainty about exactly what problems will erupt, where and when.

“There’s always a risk of getting too far ahead with the tools,” said Steve Smith, senior director of platform marketing at Synopsys. “We will add capabilities to current tools to make them 3D aware, but the goal is to enable engineers to do what they do best. We’re already dealing with multicorner, multimode design, and 3D will be another dimension. We might have coupling effects and we certainly will have a challenge with temperature. But most of the processes are familiar, and changing things in a working flow is always risky.”

Experts At The Table: The Future Of Stacked Die

Thursday, December 15th, 2011

By Ed Sperling
System-Level Design sat down to discuss the future of stacked die with Riko Radojcic, director of engineering at Qualcomm; Prasad Subramaniam, vice president of design technology at eSilicon; Mike Gianfagna, vice president of marketing at Atrenta; and Herb Reiter, 3D/TSV working group chair for the GSA. What follows are excerpts of that conversation.

SLD: Where are we with 2.5D and 3D?
Radojcic: I think 2.5D was a misnomer, because that implies they are sequential. It’s clear that what we call 2.5D and 3D are going to co-exist for a long time. Some things make sense with an interposer and some make sense to be 3D.
Reiter: I agree—2.5D is a parallel effort to 3D. Lots of things will not use 3D because it’s too expensive. In 2.5D we will see production this year. With 3D it will take until next year for the first ones. I would guess computing or networking would be the first.
Radojcic: I would think those guys will pursue 2.5D.
Subramaniam: Memory makers are already offering 3D solutions today. If you look at just the memory chip, to increase the size of the memory rather than the die they’re stacking it vertically. That kind of 3D is already in production. It’s the question of co-mingling logic and memory that will take time. The advantage of 2.5D is that it allows afterthought. It allows you to take an existing design and to create a new set of I/Os and put in a 3D type of application.
Radojcic: I see no value in doing that. You’re creating an expensive solution to something you can do more cheaply. If you add the 3D interposer you’re adding another wafer. That’s cost. We can solve that problem with a flip chip. It’s cheaper.
Subramaniam: I disagree. We’ve done the analysis. It allows us to take an existing design, like an ARM subsystem in 28nm, even though surrounding logic doesn’t have to be at that 28nm process node. It can be 40nm or 65nm. Rather than building a new chip at 28nm, I can take my existing design, use it as one component of my 3D IC, and build a second chip in a cheaper, older technology.
Radojcic: Yes, as long as you’ve architected your chip like that, such that you can partition it.
Subramaniam: You can’t take any design, no. There has to be some partitioning in the architecture and some forethought. It’s not 100% an afterthought, but there is still some afterthought there.
Radojcic: You have to architect for it. If you haven’t done that, taking an existing chip will just cost you more. If you have done that, of course there is an avenue to doing things better and more flexibly.
Subramaniam: There is enough flexibility in designs that allow you to partition it in some manner.
Radojcic: True, but before 3D came along most of us wouldn’t have partitioned. We wouldn’t have architected it that way. To be able to leverage that value proposition, you must have 3D in mind.
Gianfagna: That’s true. It’s a premeditated act. If you don’t think it through way up front it doesn’t work.
Subramaniam: Because the SoC has a well-defined architecture, it lends itself to this type of application.
Radojcic: But only if you plan for it ahead of time.

SLD: Is this true in all cases?
Reiter: That’s the view of a high-volume supplier. I see low-volume solutions where they use an existing die, put it face down on an interposer, and connect memory to it. So for low to medium volume, 2.5D works. You call it an afterthought. I call it a customized solution.
Radojcic: Why wouldn’t you do that in a traditional multichip package?
Subramaniam: Because you don’t get the interconnectivity. The advantage of a silicon interposer is that you get thousands of interconnects.
Radojcic: But you have to design it sufficiently so you can leverage the interconnects from die to die. If you had designed for a traditional design, though, you would say, ‘I can’t have thousands of interconnects so I’m going to make a serial interface with 100 pins.’ If you take that design for a 100-pin interconnect and stick in an interposer it’s an expensive way of doing things.
Subramaniam: You may be able to take some internal signals out, which you are not able to do with a traditional MCM (multi-chip module) approach.

SLD: Let’s do a reality check. How far along are we toward stacking?
Gianfagna: Last year we had a hot-wired 3D system that was 2D with a bunch of scripts and manual effort. The customer base had strange, contrived designs and they were trying to see what they could and couldn’t do, and the foundries didn’t know what they wanted to do. A year later we have native 3D planning capability, the customer base has specific designs for implementation this year and next, and the foundries have a laser-sharp focus on process learning, mostly around 2.5D initially. If that’s a metric, things are clearer this year than last year. From an EDA perspective, I still think the market is two years away. But we still think this is big.
Reiter: If you look at the Atom chip with the FPGA from Altera, that’s basically a 2.5D solution. The FPGA is for customizing things. The Atom chip was not designed for this application.
Radojcic: But why use an interposer? Why not use a substrate and a multichip package?
Reiter: You could do that.

SLD: What’s missing from the tools side to make all this work?
Reiter: The ability to demonstrate what this technology can do is the most important capability. If you look at big corporations, top management is still hesitant to invest in this technology. If we could demonstrate in a credible way what it can do, people will be more successful in getting money to start programs using this technology.
Gianfagna: The way that happens is the early adopters blaze the trail, everyone tries to follow and the market heats up. What’s needed are commercial drivers. The tools aren’t there, but they’re close enough.
Subramaniam: The tools are not the issue. The development needed to support 3D is incremental. It can be done with the existing infrastructure. It’s really the end application.
Radojcic: Other than path-finding, which is hard to do with traditional tools. And the analysis.
Gianfagna: The complexity is higher. We’ve discovered that, too. RTL prototyping for a single chip has a certain set of challenges. When you go to 3D the modeling requirements are much greater, the constraint generation is more complicated. And we need standards. We can generate all the constraints, but we don’t know where to put them and how to express them because there is no agreed upon way to do that.

SLD: Do the standards organizations know where to start with all of this?
Radojcic: Standards are on a good track. We’ve worked with Si2 and Sematech to propose initial blasts for standards so we can feed them into Si2 and the EDA community and accelerate the process. The bits and pieces are moving, and we are on track to have a set of design exchange format standards by early next year.
Reiter: And Wide I/O.
Radojcic: Yes. The standards are channeled and the engine is revving.
Reiter: We have a bunch of players in a 3D enablement center participating. There are 15 companies listed, including Intel, IBM, TSMC, GlobalFoundries, and so on.
Radojcic: The way this was set up was Sematech said we are going to start a 3D enablement center initiative driven by the SIA. All the members of Sematech were mapped into this. Then a number of companies like Qualcomm, LSI and ASE joined.

Collaboration Grows

Thursday, October 20th, 2011

By Ed Sperling
A series of recent announcements by the Big Three EDA vendors and their well-known partners from across the disaggregated SoC ecosystem is lending new credence to the impact of collaboration.

While IDMs such as Apple, Intel, Samsung and IBM continue to blaze their own trail, developing in-house tools, methodologies, processes and chips, fabless companies working with foundries and tools developers are beginning to show some of the same benefits at a much lower cost.

One such effort involves Cadence, ARM and TSMC, which together unveiled a 20nm Cortex-A15 chip. Mike Inglis, executive vice president and general manager of ARM’s processor division, said teams from each company worked closely together to find out what was broken on the process side, then fed that information back into performance optimization and packaging and worked it into the design flow.

“This is how you more easily get to a more optimized solution more quickly,” Inglis said. “It also enables the leading edge and the trailing edge to get to market more quickly.”

This is what IDMs have always done, taking information back and forth between the design teams and the fab and adding tweaks all along the way. But what’s changing is that fabless companies appear to be catching up more quickly than most industry observers believed was possible.

“We’re seeing collaboration that is both horizontal and vertical,” said Lip-Bu Tan, president and CEO of Cadence. “Horizontal involves industry standards among peers and does not differentiate end products. With vertical collaboration, the goal is an end product that is differentiated, whether that involves IP, EDA, the foundry or software.”

Mentor Graphics, meanwhile, rolled out the next version of its Nucleus real-time operating environment that was developed with partners such as Texas Instruments, GCT and Stonestreet One. In a move aimed at conserving power, Mentor has moved some of the power management capabilities such as dynamic voltage and frequency scaling into the kernel of the RTOS, according to Jan Klube, director of the Nucleus product line.

“The software design was built into the application from the beginning versus folding complexity onto the application,” said Klube. “So developers get a simple power management API and a power-aware RTOS.”

One of those developers is TI, which has been working with Mentor as well as ARM for its Stellaris microcontrollers. Miguel Morales, worldwide marketing manager for the MCUs, said the microcontrollers are sold with pre-written software wrapped up in kits.

“Collaboration will have to accelerate,” said Wally Rhines, Mentor’s chairman and CEO, who noted that Mentor is also working with TSMC on “reliability” kits. He added that it will be critical to respond together to new and emerging problems, particularly with stacked die where stress, thermal and parasitic effects will create as-yet unknown issues.

Synopsys, meanwhile, has been working closely with TSMC and ARM to improve yield and deal with process variations.

“As we look ahead, there is the notion that an upstream tool can know what a downstream tool must do,” said Aart de Geus, chairman and CEO of Synopsys. “We need to be able to move forward to place and route before we finish synthesis, and we need to be able to question why we should do all the work if an issue is not resolvable.”

De Geus noted that collaboration is the answer to systemic complexity. “We must be committed, and we will need to collaborate with partners that have competence.” He added that there also is a need for quick compromise, balancing a “great enough” solution against a better one that will take longer to develop.

Betting On Glass TSVs

Thursday, September 22nd, 2011

By Ed Sperling
There are two big issues when it comes to through-silicon vias. One involves cost. The second involves heat—in particular, how to get heat out of a stacked die and what the thermal coefficient of the TSV will be to make sure it expands at a rate consistent with the SoCs in a package.

To address these issues, System-Level Design caught up with Rao Tummala, professor of electrical and computer engineering and material sciences, as well as the director of the 3D Systems Packaging Research Center at Georgia Institute of Technology, where work has been under way for several years to address these issues. What follows are excerpts of that conversation.

SLD: Why use glass?
Tummala: There are a number of reasons. One is that it can be done pinless. A second reason is that it’s highly insulating, with extremely high resistivity, as opposed to silicon. We also know how to handle thin glass for embedded applications. The infrastructure is already available. And we know how to metallize glass. So it’s the best material except for one problem.

SLD: What’s the problem?
Tummala: We have to make holes in glass that are very small, with very high throughput, at very low cost. That’s the main problem we see with glass. But if you solve that problem, then it becomes an ideal material for semiconductor applications.

SLD: So how close are you to solving that?
Tummala: We’ve actually solved most of it. Like everything else, we know how to make glass thin—from 30 to 75 microns in thickness. We developed the process in partnership with the companies we work with to make small holes very fast. We can make more than 1,000 holes in one step. And we know how to metallize. We actually formed an electronic substrate by putting in thin wires and other metal layers, and through-via metallization so we can add components on both sides.

SLD: Who’s behind this effort?
Tummala: We have about 15 companies funding this research. Now we are looking to replace organic packages that are used by companies like Intel, AMD and IBM and almost everyone else. All the smart phones are going to very high-speed images, which will require extremely high logic-to-memory bandwidth. Everyone is moving toward through-silicon vias in every chip. All the semiconductor companies are betting on that technology. I’ve been promoting interposer technology. With glass we think we can substitute for silicon with no TSV in the logic chip and interconnect that with an interposer using extremely high bandwidth. We are looking at other applications, too. I cannot go into the details. But we are running an IEEE workshop here in November on this topic.

SLD: What’s the difference in the thermal coefficient of glass versus silicon?
Tummala: In the case of silicon it’s fixed. It’s 3ppm (parts per million per degree Celsius), plus or minus. In the case of glass, you have options depending on the type of glass you pick. You can go from 3ppm to 9ppm. We think that picking glass at 8ppm, which is between the 3ppm of chips and the organic board at 17 ppm, would put the interposer right in between. That’s the best way to solve that problem.
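To put rough numbers on the "right in between" argument, the following sketch compares the first-order expansion mismatch at each interface, using the coefficients quoted above. The 100-degree temperature swing is a hypothetical value chosen only for illustration, and the estimate ignores stiffness, geometry and underfill, all of which matter in a real package.

```python
# Coefficients of thermal expansion in ppm per degree C, as quoted in the interview.
CTE_PPM = {
    "silicon chip": 3.0,
    "glass interposer": 8.0,
    "organic board": 17.0,
}


def mismatch_strain_ppm(cte_a: float, cte_b: float, delta_t_c: float) -> float:
    """First-order free-expansion mismatch (in ppm) between two materials."""
    return abs(cte_a - cte_b) * delta_t_c


DELTA_T_C = 100.0  # hypothetical temperature swing, not a figure from the interview

chip_to_glass = mismatch_strain_ppm(
    CTE_PPM["silicon chip"], CTE_PPM["glass interposer"], DELTA_T_C)
glass_to_board = mismatch_strain_ppm(
    CTE_PPM["glass interposer"], CTE_PPM["organic board"], DELTA_T_C)
chip_to_board = mismatch_strain_ppm(
    CTE_PPM["silicon chip"], CTE_PPM["organic board"], DELTA_T_C)

print(f"chip to glass:  {chip_to_glass:.0f} ppm")
print(f"glass to board: {glass_to_board:.0f} ppm")
print(f"chip to board:  {chip_to_board:.0f} ppm (no interposer)")
```

Under these assumptions the interposer splits one large mismatch into two smaller steps, which is the effect Tummala describes.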

SLD: Doesn’t that vary depending upon the packaging, as well?
Tummala: Yes, in the case of 3D chips, if you take a lot of real estate with copper vias, you could end up with maybe 6 to 8 ppm for that 3D stack. If you put 5 micron vias on 16 micron centers, which is roughly a third of that area, that’s about 8 ppm.
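One way to see where a figure in that range could come from is a simple linear rule of mixtures. The sketch below is an assumption about the arithmetic, not a method stated in the interview. It takes the copper fraction as the ratio of via diameter to pitch (5/16, roughly a third), uses the 3ppm silicon figure quoted earlier, and assumes a typical textbook value of about 17ppm for copper.

```python
def rule_of_mixtures_cte(fraction_a: float, cte_a: float, cte_b: float) -> float:
    """Linear blend of two expansion coefficients, weighted by the fraction of material A."""
    return fraction_a * cte_a + (1.0 - fraction_a) * cte_b


VIA_DIAMETER_UM = 5.0   # from the interview
VIA_PITCH_UM = 16.0     # from the interview
CTE_COPPER_PPM = 17.0   # assumed typical value, not given in the interview
CTE_SILICON_PPM = 3.0   # from the interview

# Assumption: the "roughly a third" figure is the diameter-to-pitch ratio.
copper_fraction = VIA_DIAMETER_UM / VIA_PITCH_UM

stack_cte = rule_of_mixtures_cte(copper_fraction, CTE_COPPER_PPM, CTE_SILICON_PPM)
print(f"copper fraction: {copper_fraction:.3f}")
print(f"estimated stack CTE: {stack_cte:.2f} ppm")
```

The estimate lands between 7 and 8 ppm, in line with the range Tummala cites. A stiffness-weighted model would give a different number, so this should be read as a sanity check only.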

SLD: Can glass also be a channel for heat or ESD?
Tummala: You can use glass in two ways. One is to isolate, so if you put logic and memory all in one stack you end up heating the memory chips, as well. You don’t want to heat the memory, but you have no choice. If you put logic on one side of glass and memory on the other, the glass works as an insulator. You also can use glass for conductivity. Right now you get rid of heat with heat sinks. We expect our technology of making holes will be so cheap compared with silicon that we should be able to metallize a lot of those holes with copper and be able to use that for thermal conductivity. It will be even better than a silicon chip. In theory we should be able to get very high conductivity locally, if you need it, by having copper vias through glass.

SLD: So the glass becomes an insulator around the copper via?
Tummala: Yes, exactly. You end up with a better signal.

SLD: What’s the timing for glass TSVs?
Tummala: We have demonstrated the technology. We know how to make holes and metallize. Now we’re dealing with some of the liabilities and demonstrating them. I would say a two-year time frame is realistic for it to be commercially available.

SLD: Are all the major foundries and chip companies looking at this?
Tummala: Yes. In the last six months, we have moved all these technologies into glass. The next step will be to use glass for chips. We think wafers are good, but they’re too expensive, so we’re looking at panels that are 700mm to 900mm. That will provide hundreds or thousands of interposers. We started looking at glass for cost, but we’re also seeing performance improvements. You get both with glass.

SLD: Is defect density easier to control in glass than silicon?
Tummala: Glass is super smooth. Unlike silicon, which needs to be polished, glass comes out smooth.

SLD: So you don’t need CMP?
Tummala: No, that’s not necessary.

Tri-Gate’s Fallout

Thursday, May 26th, 2011

By David Lammers
Intel Corp. dropped a rock into the pond of transistor technology when it announced its 22nm tri-gate technology in San Francisco earlier this month. The ripples continue to move out from that event, with impacts on IDMs, foundries, and fabless semiconductor companies being closely studied.

Now that Intel has come out of the closet with its tri-gate technology, “the foundry customers are all going to ask, ‘When am I going to get a FinFET? What does it look like?’” said one source, who asked not to be identified.

What they may find is a transistor that is rather difficult to build, at least for the companies that lack the resources to make the jump from planar to vertical structures. “Intel’s competitors will all be taking that thing (the tri-gate device) apart. They will learn from it. They will catch up, but it is not automatic and takes time. Intel has shown its technology leadership, but of course they have to invest an enormous amount of money to stay ahead and the competitors have to spend a much smaller amount to copy,” the source said.

Opinions differ on how quickly finFETs will enter the SoC space. At the Intel tri-gate rollout, Intel architecture general manager Dadi Perlmutter said Intel’s goal is to achieve “parity,” rolling out MPUs and SoC products on the latest technology at the same time. The lag is declining node by node, he said.

Planar vs. FinFET

Analyst Nathan Brookwood sees Intel introducing tri-gate-based, 22nm, Atom-based SoCs for smartphones and tablets in the fourth quarter of 2012. Those “Silvermont” SoCs would be supplanted in 2014 by the 14nm-based “Airmont” SoCs. If that scenario proves accurate, Intel will be on the market with Atom-based and MPU products at the same time in 2014.

If Intel meets its target, and if TSMC rolls out its finFET technology in 2015 at the 14nm node, at least two companies would be on vertical transistors for SoCs. There is speculation that TSMC might pursue a planar transistor for low-cost applications at the 14nm generation, using finFETs for the high-performance graphics MPUs, FPGAs, and others. And some believe that Intel will be more active in the foundry space, partly as a way to monetize the estimated $2 billion it took to develop the 22nm tri-gate technology.

Dean Freeman, a manufacturing technology analyst at Gartner Inc., said Intel’s tri-gate technology is impressive. “However, the SOI group won’t give up any ground.” The SOI consortium is working closely with ARM to demonstrate lower power consumption, at 1 to 2 GHz performance, for smart phones. But Freeman said most of those smartphone chips are produced on bulk wafers today, and they will be reluctant to spend much on the additional wafer cost represented by UTB-SOI wafers. Even AMD has switched to bulk (non-SOI) technology for its low-cost Fusion products, he noted.

On the other hand, Freeman said the vertical devices require a big change in the design tools, and a complete redesign of a company’s proprietary intellectual property. “Not all devices need 3D. Tri-gate will be used for Intel’s X86 products, and IBM will go 3D for its high-performance devices. Some high-performance ASSPs might need 3D as well. I am not certain about the ARM devices,” he said.

Gary Patton, an IBM vice president who manages the Fishkill Alliance including Samsung, Toshiba, STMicro, and GlobalFoundries, said the alliance is developing several different transistors for the 14nm node. IBM will continue to develop an SOI technology with finFET transistors, adding its on-chip SOI-based embedded DRAM technology. Other members of the alliance need a bulk FinFET, and others, including STMicroelectronics, are pursuing a planar UTB-SOI approach (which IBM refers to as Extremely Thin (ET)-SOI) using back-gate biasing underneath the planar channel to boost performance or reduce power consumption.

“ET-SOI with a back-bias operation is pretty comparable with finFETs for certain applications. FinFETs are pretty complex, and ST Micro is pretty confident in ET-SOI,” Patton said during a brief interview at the Advanced Semiconductor Manufacturing Conference, held in Saratoga Springs, N.Y., this month. Patton said members of the Fishkill Alliance and IBM Albany will give three papers at the upcoming VLSI Symposium, planned for early June, on SOI finFETs, bulk finFETs and ET-SOI.

“FinFETs have some performance advantages, but Intel and others will have to show that they can control the tolerances, including at the source and drain regions. On the other hand, ET-SOI appears to have some resistance problems, so we’ll have to see how it plays out,” Patton said.

Freeman said the Fishkill Alliance has been a huge success, but warned that the shift to a tri-gate transistor “does give Intel a crack at the mobile device market, as the power consumption is very good.”

The Gartner analyst added, “What IBM needs to look out for is an Intel alliance forming. You already have Toshiba and Samsung working with Intel on some transistor technology, so there could be some cracks forming. There is the possibility of two camps, but Intel is so protective of its IP it will be interesting to see how this plays out.”

Chenming Hu, who led a UC Berkeley team that did much of the early work on both finFETs and UTB-SOI a dozen years ago, said he believes both finFETs and UTB-SOI technology will be deployed. Manufacturing finFETs, with the need for a very thin fin at close tolerances, is challenging for all but the largest companies such as Intel and TSMC.

“If the interface with the design team is close, and the resources are large enough, the lure of finFETs is that they can be scaled. But it does take investments. UTB-SOI does not take as much technology development investment,” Hu said.

UC Berkeley's Hu

“I remain steadfast in my belief that both FinFETs and UTB-SOI will be going to manufacturing,” Hu said. “I expect both to go into production. The very large companies, such as Intel and TSMC, will have the resources to go to FinFETs. Some other companies may go to UTB-SOI. ST Microelectronics is probably the closest to using UTB-SOI. FinFETs may be more versatile in performance and power. On the other hand, FinFETs take a lot more development resources, in terms of the manufacturing control, the layouts, and the libraries. In FinFETs, the gate widths are discrete, rather than continuous. And the thickness of the fin needs to be scaled, along with the gate length.”
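Hu's point about discrete gate widths can be illustrated with the usual first-order expression for a tri-gate device, in which each fin contributes two sidewalls plus its top to the effective width. The fin dimensions and the target width below are hypothetical values chosen for illustration. They are not figures from Intel or from this article.

```python
import math

FIN_HEIGHT_NM = 34.0  # hypothetical fin height
FIN_WIDTH_NM = 8.0    # hypothetical fin thickness


def width_per_fin_nm(fin_height_nm: float, fin_width_nm: float) -> float:
    """Effective gate width of one tri-gate fin: two sidewalls plus the top."""
    return 2.0 * fin_height_nm + fin_width_nm


def fins_for_target(target_width_nm: float, per_fin_nm: float) -> int:
    """Smallest whole number of fins that meets or exceeds the target width."""
    return math.ceil(target_width_nm / per_fin_nm)


per_fin = width_per_fin_nm(FIN_HEIGHT_NM, FIN_WIDTH_NM)

# A planar designer could ask for exactly 200nm of width; a finFET designer cannot.
target_nm = 200.0
n_fins = fins_for_target(target_nm, per_fin)
achieved_nm = n_fins * per_fin

print(f"width per fin: {per_fin:.0f} nm")
print(f"target {target_nm:.0f} nm -> {n_fins} fins -> {achieved_nm:.0f} nm")
```

Because only whole fins are available, the achievable widths come in fixed steps, and that is one reason layouts and libraries have to be reworked for finFETs.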

Scott Thompson, a professor at the University of Florida, said the manufacturing challenges of finFETs may provide Intel with a five-year lead, or longer.

“Developing a complex technology like tri-gate requires significant investment in silicon resources and manpower, with development teams of perhaps more than 1,000 people. The complexities for development mean that hundreds of thousands of wafers have to be run to solve the issues. The tri-gate development is at least an order of magnitude more complex than strained silicon at 90nm, or HKMG at 45nm. That is why it took Intel eight years to implement, and why I don’t think anyone else will have it in market for more than five years,” said Thompson, who spent two decades in technology development at Intel’s technology and manufacturing group at Hillsboro, Ore.

Manufacturing perfect fins over billions or trillions of transistors is quite a challenge, Thompson said, adding that “it can be done in a fab that runs a single process, with equipment and settings that are kept constant. The manufacturing flow has unique advantages for high-end processors but does have problems supporting several key features needed for SOCs: multiple threshold voltages, and thin and thick oxides in support of analog.”
