Posts Tagged ‘TSV’

Bigger Pipes, New Priorities

Thursday, July 28th, 2011

By Ann Steffora Mutschler
From the impact of stacking on memory subsystems to advances in computing architecture, Micron Technology is at the forefront in the memory industry. System-Level Design sat down to discuss challenges, as well as some possible solutions, that plague memory subsystem architects with Scott Graham, general manager for Micron’s Hybrid Memory Cube (HMC) and Joe Jeddeloh, whose team developed the logic portion of the HMC. What follows are excerpts from that discussion.

SLD: As the industry moves to employ stacking techniques, what are some of the overall impacts on the memory subsystem?
Jeddeloh: The TSV itself enables a very low power interconnect, and as we move up and down in the Z direction one of the objectives to take the greatest advantage of that is to try not to move in the X and Y direction very much. For us, we create a DRAM architecture that is tiled. Instead of a DRAM die being one large device that has one set of I/Os on it, we break it into, say, 16 separate DRAMs, in essence much like a multicore processor. Each of those DRAMs has its own interface so when you go to access data, you go to a very local area of DRAM. Instead of lighting up a really large DRAM array or page or row, your cache line comes from a more localized area. At the start, it’s a more efficient access from the DRAM itself because we’re not moving bits very far in an X, Y direction. We’re not lighting up an extra number of transistors and capacitors. It’s a more directed access.

Then we take that access and being that it is coming from a tile or partition that is, say, 1/16th the size of a normal DRAM, we then move that down the Z direction on a TSV in a very localized access. There are two big themes. We lit up fewer transistors and we moved it a shorter distance so it becomes a very efficient transfer of that data down that Z pipe.

SLD: What does that do to cache? Can you get rid of some levels of cache?
Jeddeloh: Not necessarily. That will be a conversation on the longer-range implications. Cache hierarchies are created because of not only latencies but bandwidth deficiencies of going to a memory subsystem. When you can have thousands of these TSVs with a very low power profile you can get a tremendous amount of bandwidth now moving in a cube. The cache structures that we rely on today…you think of them differently when there is so much bandwidth that is potentially available so they can begin to be rethought.

SLD: Is Micron supporting Wide I/O?
Graham: Which version? There is a low-power Wide I/O and then there is a Wide I/O derivative that basically spawned from that group that is in the JEDEC task group right now. It is being explored and they are actually calling it high-bandwidth memory, but it is essentially a Wide I/O effort. Micron is actively participating in both of those.

SLD: What is the impact of stacking on performance of memory subsystems?
Jeddeloh: Two aspects to that. One is just raw bandwidth. You can put so many TSVs into that stack that you can generate more than a magnitude increase of bandwidth. But when we start talking about tiling or the partition into concurrent resources, where there’s a traditional memory subsystem, there’s always a bank conflict that has to be managed and there are a limited number of banks. When you think of a DIMM, maybe it has 4, 8, 16 banks in it, and that fundamental access to that bank isn’t going to come down latency-wise a great deal. That just can’t be pushed.

The physics can’t push it that far unless you go to something like an RLDRAM and pay a big die cost to get that. In a traditional DRAM, the banking concept is still going to be there, but once you go into a memory cube where you have these tiles and partitions, each of those has its own bank structure. So instead of 8 banks, you have 128 banks, 256 banks and each of these are put into parallel DRAM structures so you have a tremendous amount of concurrency available. You can think of a many-core processor coming at a many tiled memory system that marries up and can handle a lot of concurrent transactions.
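The banking arithmetic described here can be sketched in a few lines. This is a minimal illustration, not Micron's design: the 16 partitions come from the "say, 16 separate DRAMs" example earlier in the interview, while the 64-byte cache line, the banks-per-partition values, and the interleaving scheme in `map_address` are all assumptions made for the example.

```python
# Hypothetical tiled-DRAM model. The interleaving scheme and every
# constant below are illustrative assumptions, not HMC internals.
CACHE_LINE_BYTES = 64  # assumed cache-line size


def total_banks(partitions: int, banks_per_partition: int) -> int:
    """Banks available for concurrent access across all partitions."""
    return partitions * banks_per_partition


def map_address(addr: int, partitions: int = 16,
                banks_per_partition: int = 8) -> tuple:
    """Map a byte address to (partition, bank).

    Consecutive cache lines go to consecutive partitions first, then to
    the next bank, so a streaming access pattern spreads over every
    partition before it revisits one.
    """
    line = addr // CACHE_LINE_BYTES
    partition = line % partitions
    bank = (line // partitions) % banks_per_partition
    return partition, bank


if __name__ == "__main__":
    print(total_banks(16, 8))    # 128
    print(total_banks(16, 16))   # 256
    print(map_address(0))        # (0, 0)
    print(map_address(64))       # (1, 0)
    print(map_address(64 * 16))  # (0, 1)
```

With 16 partitions of 8 banks each, the first 128 consecutive cache lines each land in a different (partition, bank) pair, which is one way to picture the concurrency a many-core processor could exploit.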

SLD: How do architects make the system design trade-offs in terms of memory subsystems?
Jeddeloh: One is the processor needs to get bandwidth onto it. As we go to more and more cores, it’s becoming more and more bandwidth-hungry. In this generation, you can’t stack the DRAM on top of the processor because the processor is too hot. That means the processor has to go off chip to get that bandwidth, and that just becomes an I/O pin, power and silicon density issue. You need to connect a pipe to that processor that can bring in as much bandwidth at the lowest amount of power or for that investment, and for something like a memory cube you can put more density in a very local area and put that right next to the processor. When you think about that topology, then once again we are reducing distances such that we can create a power advantage. And power is really the No. 1 theme going forward. Once we reduce that power, we can create a smaller, more efficient I/O structure when the processor and the memory system are right next to each other.

As an architect, you start thinking about this not only from a logical perspective but the physical perspective—being able to stack these 3D structures in a very localized area and create a very dense, low-power interconnect. This is also going to mean new materials, perhaps, like MCMs, silicon interposers, but not the traditional 10 inches of FR-4 because it just consumes way too much power to ship the bits over that structure.

As a memory architect, you start thinking about this from many different facets. You have the materials, the locality, and then of course you have a thermal issue. As we move close to that processor, we start looking into the processor’s thermal solutions space.

And, then you start thinking about the concurrency of having resources like this. If you have, say, 8 cubes hooked up to a processor, there’s a tremendous amount of bandwidth and concurrency that can happen in a very small area.

SLD: How do you deal with the heat issue in the HMC?
Jeddeloh: In many of the early instances, we’re going to be part of the processor’s cooling complex. We put a top or lid on it. DRAM doesn’t like heat. It messes up the refresh. If we are not on top of the processor, the heat is manageable. Once you create that low power I/O, which is enabled by changing the locality of the overall system topology—and we’re not creating as much power within the cube itself—then we stack it up and pull the heat out the top.

Source: Micron Technology

SLD: How far out is the HMC from being able to be used in production designs?
Graham: Our plan of record is for production to begin in the second half of 2013.

SLD: What trends do you see happening in computing architecture?
Jeddeloh: It used to be megahertz. Then it was multicore. We believe that the trend is low-power memory bandwidth. That prime piece of real estate is what sits down at the bottom of those TSVs. That’s a very, very valuable piece of real estate and there’s going to be a struggle over how that value proposition is going to be established.

New 3D Stacking Techniques Emerge

Thursday, December 16th, 2010

By Pallab Chatterjee
To take advantage of the capabilities of the new technologies, design and circuit architectures in the future will have to be closely coupled with the basic device creation.

That shift was the subject of a special session at the recent IEDM conference focusing on the confluence of technology and design. One such area under discussion involved 3D ICs. While a lot of discussion has taken place about the use of through-silicon vias (TSVs) for connecting stacked die, this technology is still in its early stages for reliability and design rules. TSVs require additional process steps beyond the standard wafer processing and they have to be spaced away from active circuitry on all sides. The comparatively large size (typically 5um x 5um or larger) on a small process (40nm node), for a size factor of 125x, creates a complete blockage of all routing from the bottom of the TSV to the top of the chip and has an impact on design closure. As there is mechanical compression from the bonds on the TSV to the adjacent die, the TSV cannot be located over all types of circuitry without affecting device operation.
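The 125x figure follows directly from the two dimensions quoted above. A minimal sketch of that arithmetic, with hypothetical function names and the area ratio added purely for illustration:

```python
def size_factor(tsv_side_um: float, node_nm: float) -> float:
    """Ratio of the TSV side length to the process feature size."""
    return (tsv_side_um * 1000.0) / node_nm  # convert um to nm first


def area_factor(tsv_side_um: float, node_nm: float) -> float:
    """TSV footprint expressed in minimum-feature squares."""
    return size_factor(tsv_side_um, node_nm) ** 2


if __name__ == "__main__":
    print(size_factor(5, 40))  # 125.0
    print(area_factor(5, 40))  # 15625.0
```

The linear ratio is the number the article cites. The squared value is only a rough way to picture how much routing area a single TSV blocks on every layer it passes through.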

A solution to these issues is the use of a design technique called through-circuit interconnect (TCI), which is a low power RF solution that does not require any additional processing steps. The technique uses small-field RF antennas that are approximately 100um per side and which are connected by standard high-speed transceivers. The technique has the advantage of being compatible with low-power processes, as the data transfer energy is only 0.01pJ/bit, and it can communicate to multiple die in a multi-die stack. The data transfer has been tested at 11Gb/s/ch and can be aggregated to 8Tb/s over 1000 channels in only 6.4mm-sq. The inductive coupling technique supports a bit error rate (BER) below 10^-14 and a cost reduction of more than 20 cents per chip. Moreover, the technique has been in development, with results presented, since 2004.
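The quoted TCI figures can be combined into a few derived numbers. The sketch below is a back-of-the-envelope check under simple assumptions: the energy figure is taken to cover data transfer only, and the channels are taken to share the bandwidth and area evenly. The function names are hypothetical, and the article does not say how the 11Gb/s/ch test result relates to the 8Tb/s aggregate.

```python
ENERGY_PJ_PER_BIT = 0.01  # from the article
AGGREGATE_TBPS = 8.0      # from the article
CHANNELS = 1000           # from the article
AREA_MM2 = 6.4            # from the article


def link_power_watts(energy_pj_per_bit: float, bandwidth_tbps: float) -> float:
    """pJ/bit times Tb/s: 1e-12 J/bit * 1e12 bit/s cancels to watts."""
    return energy_pj_per_bit * bandwidth_tbps


def per_channel_gbps(aggregate_tbps: float, channels: int) -> float:
    """Average rate per channel if the aggregate is split evenly."""
    return aggregate_tbps * 1000.0 / channels


def area_per_channel_um2(area_mm2: float, channels: int) -> float:
    """Average silicon area per channel (1 sq. mm = 1e6 sq. um)."""
    return area_mm2 * 1e6 / channels


if __name__ == "__main__":
    print(link_power_watts(ENERGY_PJ_PER_BIT, AGGREGATE_TBPS))  # about 0.08 W
    print(per_channel_gbps(AGGREGATE_TBPS, CHANNELS))           # 8.0 Gb/s
    print(area_per_channel_um2(AREA_MM2, CHANNELS))             # about 6400 sq. um
```

At the stated energy per bit, moving 8Tb/s costs on the order of a tenth of a watt for the transfer itself, which is the point behind the low-power claim.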

One of the performance and reliability issues for 3D stacked ICs is the thermal handling of the die. In a normal single-die arrangement there is direct uniform contact between the die and the package, which can have a variety of heat dissipation techniques used to cool the part. In a stacked-die scenario there is a localized thermal issue in the areas the die overlap. A technique was shown that uses inter-tier microchannels etched in the bulk side of the die for pumping a cooling fluid. The technique is very effective, but work is needed to optimize the flow rate control of the micro-pumps in the system to reduce their energy use, and balance the flow rates to the dynamic thermal loads. The preliminary results have shown a 21% improvement in system-level energy vs. air cooling.

On the circuit side, there are changes in the basic P and N devices. New processes have advanced features such as strain engineering, high-K dielectrics, and novel channel materials & device structures. Traditional devices have performance optimization based on pre-determined Vdd and Ioff levels. However, these new devices have power and performance constraints that require simultaneous optimization of Vdd, Ioff, Lg, Tox, and other parameters. Applications such as SRAMs have different noise margins per device with new processes, and as a result the devices have process technology variability sensitivity and a strong sensitivity to the use in circuits and the device configuration.

Preparing For 3D IC Stacking

Thursday, July 22nd, 2010

By David Lammers
Through-silicon vias (TSVs) are in various stages of late development, but design and manufacturing challenges remain before companies can gain the full benefits of the third dimension.

Two camps are pushing hard to introduce TSVs—the design community and the manufacturing equipment companies. The initial goal is to connect graphics memories to graphics processors in mobile systems. Integrated device manufacturers (IDMs) such as Samsung Electronics are racing to use TSVs to couple high-bandwidth DRAMs with processors. Samsung counts Apple as a major customer. Qualcomm and foundry partner TSMC are creating their own design and manufacturing ecosystem for TSV-enhanced mobile IC solutions.

Sesh Ramaswami, senior director of strategy for the TSV program at Applied Materials said Applied has had about 50 people working on TSV-related technologies since 2008, and now has a complete toolset ready. “We are extending all of the knowledge gained from Damascene processing (for copper chip interconnects) to TSVs,” he said.

The equipment and materials companies have gained valuable learning from the early adoption of TSVs in CMOS image sensors, said Didier Louis, a project leader at the Leti R&D consortium in Grenoble, France. Leti worked closely with STMicroelectronics and Nokia to develop a TSV process flow, used to create image sensors in which the TSVs connect the CMOS image sensor and memory. Leti is not yet working on a logic-to-memory TSV solution, but Louis said, “We have in our toolbox all the knowledge. To manage a logic-memory TSV integration it helps if the dice are the same size, and if the manufacturer knows where to drill the vias.”

Fig. 1: Signal and ground TSV architectures may be needed for video applications. (Source: Robert Geer, CNSE)

Getting the bandwidth increases promised by TSVs will require careful interconnect design optimization, said Robert Geer, a professor at the College of Nanoscale Science and Engineering in Albany, N.Y. “Every time a designer uses a TSV, you are losing device area,” Geer said, noting that about 10% of the typical die area is consumed by the vertical interconnects. While performance gains are there to be had, Geer reminded an audience at Semicon West that “as nice as TSVs are, they are still copper, which has a frequency limit of about 1GHz” for a 5µm-diameter TSV. For memory access, bandwidth of 2 terabits per second (Tb/s) is sufficient, but logic-to-logic computation requires 5 to 6 Tb/s, and RF signals need much more bandwidth, 50 to 100 Tb/s.

Fig. 2: Video and RF will require up to 100 Tb/s of bandwidth. (Source: Robert Geer, CNSE)

Power is a critical issue. “You want the signals to go through (from logic to memory) at a femtoJoule rate,” Geer said.
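To see what a femtojoule rate means in practice, energy per bit can be multiplied by the bandwidth targets Geer cites. This is a rough sketch: the 1 fJ/bit and 10 fJ/bit values are assumed sample points, not figures from the talk, and the function name is hypothetical.

```python
def link_power_mw(energy_fj_per_bit: float, bandwidth_tbps: float) -> float:
    """fJ/bit times Tb/s: 1e-15 J/bit * 1e12 bit/s = 1e-3 W, i.e. milliwatts."""
    return energy_fj_per_bit * bandwidth_tbps


if __name__ == "__main__":
    # Bandwidth classes from the article, in Tb/s.
    targets = {"memory access": 2.0, "logic-to-logic": 6.0, "RF": 100.0}
    for name, tbps in targets.items():
        for fj in (1.0, 10.0):  # assumed energy-per-bit sample points
            print(f"{name}: {fj} fJ/bit at {tbps} Tb/s -> "
                  f"{link_power_mw(fj, tbps)} mW")
```

At 1 fJ/bit, a 2 Tb/s memory link spends about 2 mW on signaling, while 100 Tb/s would take about 100 mW. Each factor of ten in energy per bit scales those numbers by the same factor.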

For power-conscious mobile systems, TSVs are the only realistic way to connect a graphics/video processor to several layers of graphics memory, where 12.8 GB/s of bandwidth is needed between the processor and DRAM memory for high-definition video. A conventional (non-TSV) HD video solution would require high-frequency operation over 2,000 I/O pins, a non-starter for any battery-operated system, said Pol Marchal, director of IMEC’s TSV development effort.

Geer and CNSE colleague Wei Wang have studied the interconnect architectures needed for “many-core” SoCs with the processor blocks running at relatively low frequencies. Network-on-chip (NoC) architectures for these TSV-enhanced many-core solutions will be required. For video processing and other high-bandwidth requirements, Geer said a coaxial interconnect design, with each signal TSV surrounded by four ground/buffer TSVs, may be required.
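A rough count of the TSVs implied by these figures can be sketched as follows. The sketch assumes each signal TSV carries one bit per cycle at the roughly 1GHz limit Geer mentions (about 1 Gb/s per TSV), and that the four ground/buffer TSVs in the coaxial arrangement are not shared between neighboring signals. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def signal_tsvs_needed(bandwidth_tbps: float, per_tsv_gbps: float = 1.0) -> int:
    """Signal TSVs required to reach a target bandwidth.

    per_tsv_gbps defaults to 1.0, i.e. one bit per cycle at ~1 GHz
    (an assumption made for this sketch).
    """
    return math.ceil(bandwidth_tbps * 1000.0 / per_tsv_gbps)


def coaxial_tsv_total(signal_tsvs: int, grounds_per_signal: int = 4) -> int:
    """Total TSVs when every signal TSV gets its own ring of ground TSVs."""
    return signal_tsvs * (1 + grounds_per_signal)


if __name__ == "__main__":
    for label, tbps in (("memory", 2), ("logic", 6), ("RF", 100)):
        signals = signal_tsvs_needed(tbps)
        print(label, signals, coaxial_tsv_total(signals))
    # memory 2000 10000
    # logic 6000 30000
    # RF 100000 500000
```

Under these assumptions the coaxial layout multiplies the via count by five, which helps explain why die area lost to TSVs is a design concern.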

While the design community develops the expertise and EDA tools required for TSV-enhanced interconnect architectures, the equipment and materials vendors are ironing out their own challenges. Fusen Chen, executive vice president at Novellus Systems, said TSVs of 5 to 6µm are difficult to fill without voids. Because of the CTE mismatch between copper and silicon, “the copper wants to pump out” from the via, Chen said, adding that for Novellus “the key is our ability to pre-wet in a unique way.”

Keeping the cost of electroplating down, particularly for high-aspect ratio (20:1) vias, is another challenge, Chen said. Novellus introduced its Sabre 3D electroplating system, optimized for TSVs, redistribution layers (RDLs), and other wafer-level packaging applications at Semicon West this month. That sets the stage for an intense electroplating competition between Novellus and Applied Materials, which last year bought electroplating vendor Semitool Inc. (Kalispell, Mont.).

Also at Semicon West, Applied introduced the Avila CVD system for the vias-last TSV process flow, where temperature control is critical. In the vias-last flow, TSVs are formed from the backside after the wafer is thinned. In the vias-middle approach, the TSVs are created in the wafer fab after formation of the contacts.

Stressing Over 3D

Thursday, June 24th, 2010

By David Lammers
Pol Marchal recalls putting a stacked 3D prototype on his desk at IMEC in Leuven, Belgium, last year, which a visitor picked up and examined two months later. “I don’t think this chip will work,” the visitor said, causing Marchal, principal scientist at IMEC’s 3D system integration program, to put the stacked die under a microscope. Sure enough, Marchal found that mechanical stress had relaxed over time and the top die had delaminated.

3D researchers around the world are paying much closer attention to thermal and mechanical stress, particularly as ever-thinner die are stacked and connected. At last week’s 2010 Symposium on VLSI Technology in Honolulu, IMEC researchers described the mismatch in the coefficient of thermal expansion between copper through-silicon vias (TSVs) and the surrounding silicon. Using a 45nm digital-to-analog converter test chip, the IMEC team measured transistor drive currents with TSVs located at various distances from the active circuits. Tensile stress near the TSVs was in the range of megapascals (MPa), declining to zero stress at a distance of about 10 microns from the TSV edge – a distressingly long distance for today’s leading-edge devices.
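A quick estimate shows why a 10-micron stress radius is a concern. The sketch below treats the TSV as a circle and computes the area that would be affected if devices had to stay 10 microns from its edge. The 5-micron TSV diameter is an assumed example value, not a figure from the IMEC work, and the function names are hypothetical.

```python
import math


def via_area_um2(tsv_diameter_um: float) -> float:
    """Cross-sectional area of the TSV itself."""
    return math.pi * (tsv_diameter_um / 2.0) ** 2


def stressed_area_um2(tsv_diameter_um: float, reach_um: float = 10.0) -> float:
    """Area of the disc in which stress is non-zero (TSV included)."""
    return math.pi * (tsv_diameter_um / 2.0 + reach_um) ** 2


if __name__ == "__main__":
    d = 5.0  # assumed TSV diameter in microns
    ratio = stressed_area_um2(d) / via_area_um2(d)
    print(f"stressed area is about {ratio:.0f}x the via area")  # about 25x
```

For this assumed diameter, the silicon affected by stress is roughly 25 times the footprint of the via, so the effective cost of each TSV is far larger than its drawn size.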

That kind of fundamental information (IMEC will deliver a more-complete paper at IEDM in December), Marchal said, is allowing EDA vendors and chip manufacturers to begin creating stress models. “Thermal and mechanical stresses induced by TSVs is a big worry for the community, but I believe there are different ways to mitigate them. A lot depends on what type of devices you use. Stress is not uniform for different types of devices; for example, long channel devices see more impact.”

At its Leuven facility, IMEC is fabricating a series of test chips, named after volcanos, which have sensors positioned at various places on the die to measure thermo-mechanical stress. “We position the smart sensors at the most critical places inside the stack, with the DRAM on top, to study the thermal and mechanical impacts. Then we provide the information to our supply chain partners, including the DRAM makers and packaging houses. Our partners, such as Qualcomm and STMicro, want to gain a head start. When they start RTL-level design they want to know what is feasible and what is not.”

Fig. 1: IMEC is working with several EDA partners to create a pathfinding flow for 3D prototype creation. (Source: IMEC)

While the EDA community is making progress, Marchal worries about the relative absence of the packaging design community. Co-design of the die and package is particularly important as top die are thinned to 50 microns or less, positioned on top of a much thicker die.

“The die are becoming as thin as aluminum foil, and if you have the wrong glue, or different heat cycles, the die do not remain flat. That builds up stress across the die. The EDA and packaging communities need to be more active in how to analyze this,” Marchal said.

These challenges will be solved, particularly as increasingly large investments are being made in 3D TSV technology. At the DAC conference last week, Myung-Soo Jang, a design infrastructure manager at Samsung Electronics, described Samsung’s plan to use TSVs to link a mobile logic device with a 400 MHz DDR3 memory. “Because we are the world’s largest memory maker, which also has systems expertise from our cell phone business, we believe we have an advantage in this area,” Jang said.

Memory manufacturers such as Samsung and Toshiba bring certain advantages, but fabless and foundry vendors are forming their own alliances, including EDA vendors and packaging houses. At DAC, L.C. Lu, director of the design methodology division at TSMC, called 3D TSV “the next killer application” and outlined a 3D design flow that TSMC is developing.

The mobile systems vendors need TSV interconnects to support the bandwidth needed for new data services, which include point-to-point video, a market set to accelerate over the next five years. High-definition video encoding requires about 12.8 GB/s of bandwidth between the processor and DRAM memory. The only way to do that in a mobile system is with a TSV-connected logic-memory solution, using three or four tiers of die thinned to 30 to 40 microns. To achieve the same bandwidth, conventional DRAMs would require the power-hungry GDDR5 DRAM standard, some 2,000 I/O pins, and a frequency that would kill any cell phone battery quickly.
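The trade-off between interface width and clock rate behind the 12.8 GB/s requirement can be sketched with simple arithmetic. The 400 MHz double-data-rate case borrows the clock from the Samsung plan mentioned above, and the 200 MHz single-data-rate case is an assumed example of a slower, wider interface. The article does not connect these numbers itself, and the function name is hypothetical.

```python
def data_pins_needed(bandwidth_mbytes_per_s: int, clock_mhz: int,
                     transfers_per_clock: int = 2) -> int:
    """Data connections needed to carry a bandwidth at a given clock.

    Uses integer math in Mb/s; transfers_per_clock is 2 for double
    data rate and 1 for single data rate.
    """
    total_mbits_per_s = bandwidth_mbytes_per_s * 8
    per_pin_mbits_per_s = clock_mhz * transfers_per_clock
    # Ceiling division without floating point.
    return -(-total_mbits_per_s // per_pin_mbits_per_s)


if __name__ == "__main__":
    # 12.8 GB/s expressed as 12800 MB/s.
    print(data_pins_needed(12800, 400, 2))  # 128
    print(data_pins_needed(12800, 200, 1))  # 512
```

TSVs make hundreds of short, low-capacitance data connections practical, so the same bandwidth can be reached at a modest clock rather than by driving a narrow bus at a very high frequency.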

Sitaram Arkalgud, director of Sematech’s 3D interconnect initiative in Albany, N.Y., said Sematech is focused now on a copper-copper bonding, vias-middle manufacturing flow that will come into use in several years. To tackle the stress challenge, better standards are needed for how to measure and report thermal and mechanical stress levels. At next month’s Semicon West in San Francisco, Sematech and Germany’s Fraunhofer IZFP research center will hold a workshop on 3D stress management, including DFM-like approaches for managing stress, material properties, and measurement techniques.

Arkalgud said Sematech and SEMI are working on a data exchange format for TSV applications, and will hold a meeting on the subject at Semicon West and again at Semicon Europa in the fall.
“Especially as we go to thinner die the TCE and stress issues will be something we need to watch. To do that, we have to agree on how to measure it, so a data exchange specification needs to be there,” Arkalgud said.

3D Integration: Extending Moore’s Law Into The Next Decade

Thursday, August 27th, 2009

By Cheryl Ajluni

At the 46th Design Automation Conference in San Francisco last month, attention turned to a discussion of how to extend the momentum of Moore’s Law into the next decade. One plausible solution, according to Philippe Magarshack, the general manager of Central CAD & Design Solutions at STMicroelectronics, is 3D stacking for complex System-on-Chips (SoCs).

The concept of 3D stacking or integration technology is not new. In fact, 3D stacking of dies has been successfully demonstrated and is currently being commercially employed in some embedded domains (e.g., stacking DRAM memory on CPU cores). A recent 3D IC report from Yole Développement suggests that by 2012, the number of 3D IC-processed wafers could surpass 10 million units, driven in part by handset, wireless and computing applications. Given the intense interest and work going into developing 3D integration technology, this prediction seems just about right—assuming, of course, that a few challenges can first be met.

Exploring the third dimension

Very simply put, 3D integration consists of stacking integrated circuits and connecting them vertically so that they behave as a single device. A 3D chip is therefore just a stack of multiple device layers with direct vertical interconnects tunneling through them. So what’s the big deal about 3D integration?

Today’s semiconductor chips face extreme pressure to achieve increased performance, while reducing their size and accommodating lots of new functionality. When these factors coalesce in traditional 2D chips, longer interconnects result. In SoCs, longer interconnects translate into reduced speed and increased power consumption.

A key benefit of 3D integration is that it can reduce the length of interconnects. Additionally, it provides higher transistor density, faster interconnects and heterogeneous technology integration, with potentially lower power, cost and faster time-to-market. As Matt Nowak, director of engineering in the VLSI technology group of Qualcomm’s CDMA technology division, pointed out in a DAC 2008 presentation, the 3D approach “achieves extremely high densities, allowing us to use heterogeneous technologies and reduce form factor. The key is that it enables the use of new differentiating technologies to build new architectures that cannot be built in existing technologies.”

Eyeing recent developments

Up to this point, most efforts in 3D integration have focused on developing different fabrication techniques for stacking multiple device layers and forming the vertical interconnects. Much of the work has been done through collaborations with academia, industry organizations and government-sponsored laboratories around the world. One of the key technologies to come out of this research is a next-generation interconnect technology known as Through-Silicon Via (TSV). The TSV is a vertical electrical connection that passes completely through a silicon wafer or die to produce multilevel chips with an optimum combination of cost, functionality, performance, and power consumption. By using TSV technology, 3D ICs can pack greater functionality into a smaller footprint and realize shorter critical electrical paths, resulting in faster operation.

Some of the other developments to come out of ongoing 3D integration research were recently recognized at the Electronic Components and Technology Conference. Sandia National Laboratories presented details of its tungsten (W) TSV process, which is said to provide a suitably low-resistance metal with a coefficient of thermal expansion close to that of silicon and a conformal via fill, and which can be readily integrated into IC fabrication. IMEC introduced a novel process for die-to-wafer bonding (using Cu-Cu bonds) of its 3D SIC technology and a scalable TSV technology for 3D wafer-level packaging. Its TSV technology is designed for 3D structures where interconnects are fabricated after standard CMOS processing.

SEMATECH also is focusing its 3D research on TSV technology, particularly for implementation. The industry organization is actively working to bring together partners from across industry—chipmakers, equipment and materials suppliers, assembly and packaging service companies—to make 3D TSV suitable for high-volume manufacturing (Figure 1).

Figure 1. In contrast to the 2D-SoC or 3D System-in-Package, 3D TSVs offer a cost-effective way to achieve high density and performance, while also being able to integrate non-CMOS products with CMOS. The SEMATECH 3D project is based on cost modeling to assure products will be both manufacturable and affordable.

Help: tool support needed!

While ongoing research and development is absolutely critical to the success of 3D integration, perhaps one of the greatest challenges it faces is tool support in terms of design techniques and methodologies. Without it, engineers have virtually no efficient way to exploit the technology’s benefits. Tool support is especially critical when it comes to 3D integration because vertical stacking tends to increase thermal resistances, further exacerbating temperature-induced problems that can negatively affect system reliability, performance and leakage power. The use of 3D also will significantly complicate the typical design flow.

The key, of course, lies in creating a standardized design environment and methodology for physical design of 3D chips that could support a range of different tools. Having the tools integrated in one place would make it easier for designers to explore and make architectural decisions and then, to hand those decisions off to next stages in the design process.

3D IC integration is still in its infancy and, as a result, tools developed today for one specific application (e.g., stacked memory) may not be suitable for heterogeneous integration tomorrow. Nevertheless, there are some tools available now, with more in development. Some of these tools include:

3D PathFinding, Javelin Design Automation

3D PathFinding provides a detailed 3D flow for accurate performance/power/cost estimates that can be used for rapid design exploration and optimization of 3D stacked ICs. Developed in collaboration with IMEC and Qualcomm, the solution extends Javelin’s existing PathFinding methodology and j360 Silicon PathFinder physical design prototype platform to support virtual chip design (Figure 2).

Figure 2. Javelin’s 3D PathFinding solution allows the designer to assess the impact of various 3D interconnect strategies throughout the IC design and fabrication process, in a matter of just a few hours or days. Silicon process engineers can use it to fine-tune their technology to the system architecture specifications.

MAX-3D, R3Integrator, R3CAD, and R3Artist; R3Logic

These tools, developed through work conducted as part of research programs sponsored by the Defense Advanced Research Projects Agency (DARPA), enable 3D IC design and analysis. MAX-3D is a 3D mask layout tool whose technology file includes all properties of the stacking process, wafer orientations, bond materials, and via electrical/material properties, and also incorporates 2D foundry design kits. R3Integrator is used for die/interposer/package co-design with TSVs. R3CAD is a Java-based, multi-platform tool for 3D design research and prototype study, and R3Artist is an embedded 3D layout editor (Figure 3).

R3Logic is currently collaborating with STMicroelectronics and CEA-LETI to develop a full 3D design flow for 3D heterogeneous system and system-in-package design.

Figure 3. R3Artist features single and multiple wafer technologies, integrated material properties database and solid model extraction, including dielectric layers.

3DCACTI

3DCACTI estimates the optimum access times and power dissipation of a cache using 3D IC technology for a given number of active device layers and by partitioning device layers for various technology nodes. Based on the estimation, it searches for the optimized configuration that provides the best delay, power and area efficiency trade-off according to the cost function for a given number of different 3D partitions.
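The kind of search described here, picking the configuration with the best delay, power and area trade-off under a cost function, can be sketched generically. The code below is not 3DCACTI's algorithm or interface. The weighted-sum cost, the `CacheConfig` type, and all numeric values are placeholders invented for illustration, and the values are not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """One candidate partitioning (hypothetical type for this sketch)."""
    name: str
    delay: float  # normalized access time
    power: float  # normalized power dissipation
    area: float   # normalized footprint


def cost(cfg: CacheConfig, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three metrics; lower is better."""
    w_delay, w_power, w_area = weights
    return w_delay * cfg.delay + w_power * cfg.power + w_area * cfg.area


def best_config(candidates, weights=(1.0, 1.0, 1.0)) -> CacheConfig:
    """Exhaustively pick the candidate with the lowest cost."""
    return min(candidates, key=lambda c: cost(c, weights))


if __name__ == "__main__":
    # Placeholder values only, chosen to show how the weights steer the choice.
    candidates = [
        CacheConfig("one layer", 1.0, 1.0, 1.0),
        CacheConfig("two layers", 0.8, 0.9, 0.6),
        CacheConfig("four layers", 0.7, 1.0, 0.4),
    ]
    print(best_config(candidates).name)                           # four layers
    print(best_config(candidates, weights=(0.0, 1.0, 0.0)).name)  # two layers
```

Changing the weights changes the winner, which is the reason a cost function is exposed at all: a power-constrained design and an area-constrained design can prefer different partitions of the same cache.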

3D Magic and PR3D, Massachusetts Institute of Technology

3D Magic is a comprehensive layout methodology for 3D circuit-layout editing and extraction with MAGIC, an open source layout editor developed by UC Berkeley. PR3D is a placement and routing tool for standard cell design in 3D. Both tools were developed through MIT’s Interconnect Focus Center Research Program. MIT also developed SysRel (System-Level IC Reliability) for assessing the interconnect reliability of 3D ICs from a thermal-aware perspective at the circuit-layout level.

Conclusion

With the pressure on traditional 2D chips mounting, 3D integration has begun to establish itself as a viable means of breathing more life into Moore’s Law. It certainly touches on all the hot buttons in the industry today, namely low power, cost and time-to-market. The challenge will be in ensuring that these benefits are realized in a timely and efficient manner. 3D-specific design tools and methodologies are coming to meet this challenge head on. In the meantime, the tools available now and the groundwork for future tools and methodologies being laid by industry organizations, academia and commercial companies alike will go a long way in ensuring 3D integration plays a critical role in the future of the semiconductor industry.