Posts Tagged ‘Samsung’

3D DRAM Makers Inch Closer To Production

Thursday, December 1st, 2011

By Mark LaPedus
DRAM makers have been developing 3D memory chips for years, but commercial products are still some way off because of technical and cost issues.

But the 3D DRAM era could be nearing a turning point, as two memory rivals have separately moved to bring their respective technologies closer to production. In one move, Micron Technology Inc. has disclosed the manufacturing flow for its recently announced Hybrid Memory Cube (HMC) technology, a 3D DRAM scheme geared for high-end servers and networking systems. Under the plan, IBM will manufacture the controller logic portions of the HMC within its own fab. Micron will make the memory portions, as well as assemble and test the HMC devices, within its own operations.

On another and more surprising front, Japanese DRAM maker Elpida Memory apparently has beaten its larger rivals to the punch by announcing the industry’s first commercial Wide I/O DRAMs. The first device from Elpida, dubbed Wide IO Mobile RAM, is a 4 Gbit device based on a 30nm process technology and a 3D structure using through-silicon vias (TSVs). Elpida plans to sample its first Wide I/O DRAM devices this month. The devices are geared for next-generation smartphones and tablets.

Samsung Electronics Co. Ltd. and Hynix Semiconductor Inc. are also separately developing 3D DRAMs. The idea behind a 3D device is to stack existing die and connect them using TSVs, thereby lowering the resistivity and boosting the bandwidths. But the problems with 3D devices based on TSVs involve cost, technical issues and supply-chain headaches.

“There is a lot of attention and engineering resources being thrown at 3D right now by all DRAM developers, including Samsung, Micron, Elpida, and Hynix,” said Mike Howard, senior principal analyst for DRAM and memory at IHS iSuppli. “Wide I/O has yet to really reach a cost level that makes it competitive and we are likely still a few years away from mass adoption. Elpida may very well have a functioning part in the lab and may be able to produce test samples, but I think we’re still a few years away from this being used in anything but the most premium markets.”

Hank Lai, product planning for memory marketing at Samsung Semiconductor Inc., said Wide I/O DRAMs are not expected to gain traction until sometime in 2013. At present, smart phones and tablets are using plain-vanilla, low-power DDR3 DRAMs or mobile DRAMs based on the LPDDR2 interface standard. Before Wide I/O, the mobile market will move from LPDDR2 to the next-generation LPDDR3 interface standard, Lai said.
LPDDR2 has a maximum throughput of 8.5 Gbytes/second. LPDDR3 has a peak throughput of 12.8 Gbytes/second. Samsung claims its new LPDDR3 devices consume 20% less power than LPDDR2.

Elpida’s Wide IO Mobile RAM has 512 I/O pins. The device is said to achieve a data transfer rate of 12.8 Gbytes/second, roughly similar to LPDDR3. But Elpida’s Wide IO Mobile RAM has a height of 1.0mm, compared to 1.4mm with existing mobile DRAMs based on today’s package-on-package (PoP) technology.
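
The throughput figures quoted above can be related to interface width with simple arithmetic. The sketch below is illustrative only: it assumes peak throughput is simply the pin count multiplied by a uniform per-pin data rate, and the helper name `per_pin_rate_mbps` is hypothetical rather than anything published by Elpida or Samsung.

```python
def per_pin_rate_mbps(throughput_gbytes_per_s: float, io_pins: int) -> float:
    """Per-pin data rate in Mbit/s implied by a peak throughput and pin count.

    Assumes every I/O pin carries data at the same rate (an idealization).
    """
    if io_pins <= 0:
        raise ValueError("io_pins must be positive")
    bits_per_second = throughput_gbytes_per_s * 1e9 * 8
    return bits_per_second / io_pins / 1e6


# Figures quoted in the article.
WIDE_IO_GBYTES_PER_S = 12.8
WIDE_IO_PINS = 512
LPDDR2_GBYTES_PER_S = 8.5
LPDDR3_GBYTES_PER_S = 12.8

wide_io_pin_rate = per_pin_rate_mbps(WIDE_IO_GBYTES_PER_S, WIDE_IO_PINS)
lpddr3_gain = LPDDR3_GBYTES_PER_S / LPDDR2_GBYTES_PER_S

print(f"Wide I/O per-pin rate: {wide_io_pin_rate:.0f} Mbit/s")
print(f"LPDDR3 vs. LPDDR2 peak throughput: {lpddr3_gain:.2f}x")
```

Under these assumptions, the 512-pin interface works out to about 200 Mbit/s per pin. That is the basic trade Wide I/O makes: many relatively slow pins rather than a few fast ones.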

Elpida acknowledged that the Wide I/O market will take time to evolve. The 4 Gbit Wide I/O DRAM will sample next month, but production “will take place sometime in the second half of 2012,” according to officials from Elpida. “For volume production, it will be sometime in 2013.”

In March of 2012, Elpida plans to sample a 16-Gbit DRAM, which is based on stacking four 4-Gbit Wide IO Mobile RAM chips. Mass production is due sometime after 2013, according to Elpida.

On the other end of the spectrum, Micron and Samsung are moving full speed ahead with HMC. “This is a slightly different product than Elpida’s and is targeted at server customers. The specs are very promising, but again, this is still a few years from hitting the big time—2013 at the soonest,” Howard said. “Samsung is also a part of the HMC group, lending weight to the product’s chances.”

In October, Samsung and Micron announced the creation of a consortium to develop an open interface specification for HMC. Micron is the actual designer of the HMC technology. Micron and Samsung, as well as Open-Silicon, Altera and Xilinx, are the founding members of the Hybrid Memory Cube Consortium (HMCC).

HMC will incorporate DRAM arrays stacked on a logic chip. The device is connected with 2,000 to 3,000 TSVs. HMC prototypes are said to clock in with bandwidth of 128 Gbytes/second.

It is not a widely known fact, but fabless ASIC house Open-Silicon is developing the controller IP for HMC. Colin Baldwin, director of marketing and business development for Open-Silicon, said the HMC controller will be based on the company’s Interlaken controller IP. Interlaken is a high-speed, chip-to-chip interface protocol that builds on the channelization and per-channel flow control features of SPI4.2. The Interlaken controller will serve as the interface between the memory and physical layer to help “boost the bandwidth” in the device, Baldwin said.

On the manufacturing front, the HMC device itself will go through a two-step process. The controller logic portion of HMC will be manufactured at IBM’s semiconductor fab in East Fishkill, N.Y., using the company’s 32nm, high-k metal gate process technology. IBM also will handle the TSV creation process based on Micron’s specifications.

Micron will develop and make the DRAM arrays in-house based on a 3xnm process within its own fabs, said Mike Black, a technology strategist at Micron. Micron will take the logic controller from IBM, along with the memory arrays made in-house, and then assemble and test the entire HMC device on Micron’s R&D production line in Boise, Idaho, Black said.

Micron is in the qualification stage with the device. “We are feeling pretty good about it,” he said. “Most of the learning is done.”

AMS Challenges Growing

Thursday, December 1st, 2011

By David Lammers
Analog and mixed-signal (AMS) devices will play an ever-increasing role in saving energy, particularly as the “Internet of Things” expands to about 10 billion units per year over the next decade. But as leading-edge design rules scale to 28nm and below, enhanced with high-k/metal gate technologies, it is becoming increasingly challenging to integrate AMS devices on SoCs.

Tyson Tuttle, chief operating officer at Silicon Laboratories, said demand for mixed-signal technology will accelerate in an energy-conscious world. As people turn to electric vehicles, solar energy, and the “Internet of Things,” mixed-signal sales will increase sharply. “People will pay money to save energy,” Tuttle said at a Global Semiconductor Alliance (GSA) event held in Austin, Texas. For example, embedded devices that rely on harvested energy, such as electricity harvested from vibrations, will become a multi-billion-dollar opportunity over the next decade, Tuttle said.

Increasingly, the “digital-centric” approach to AMS functionality is being employed to improve power consumption. AMS “enables new ways of managing energy and resources,” he said, citing smart meters, which deliver real-time monitoring of power consumption, as one example.

Mixed-signal devices are about one-tenth of the $300 billion chip industry now.

Silicon Labs has shipped about a billion radio chips, which provide the FM radio function on handsets, personal media players, and other systems. The company is gaining traction in the integrated CMOS TV tuner market, which Tuttle said “is a hard problem” due to multiple broadcasting standards, noise margins, and other challenges.

While the topic of the event was the confluence of AMS and 28nm technology, few are close to the leading edge for largely AMS products. Silicon Labs uses relatively relaxed design rules (55nm at 2.5V is the most advanced process, and 90nm is common for chips with embedded flash). Tuttle said “it will be a while” before the TV tuner chip goes to 45nm technology, for example, partly because of mask costs. “There is no one application or chip right now that will pay for a 28nm mask set.”

While digital and analog use much different process technologies, digitally-assisted mixed signal hews closer to the leading edge. (Source: Silicon Laboratories)

AMS technology is becoming more challenging at 28nm. At the GSA event, only a few hands went up when the panel moderator—Mahesh Tirupattur, executive vice president of Analog Bits—asked how many people in the audience were designing at 28nm design rules. Jose Alvarez, design collateral manager at Freescale Semiconductor, said his company has several 28nm SoC designs underway.

“There is a significant amount of analog work, a lot more than we originally thought,” Alvarez said during the panel discussion on the challenges of 28nm AMS technology. As SoCs move to fast serial I/O buses, design teams are being challenged. Clocking of the dozen or more phase-locked loops (PLLs) is “very complex,” he said. Packaging is another challenge, he said, noting that 3D stacked ICs may be used to incorporate “high-end analog into low-end SoCs.”

3D and 2.5D (interposer) integration will provide “a boon for MS integration if the thermal issues can be worked out. I think we’ll see a lot more 3D packaging at 28nm and below,” he said.

“Today’s complex SoCs are throttled by power considerations,” Alvarez said. Putting circuits to sleep is one solution, but the design challenge is “putting the hooks in there to make sure those circuits come back to life” in a timely fashion.

Ana Hunter, in charge of Samsung’s U.S. foundry operations, said Samsung has several 28nm SoCs in the prototyping phase now, with 20nm devices headed toward shuttles, all including extensive AMS technology.

The panelists agreed that HK/MG and double patterning provide additional challenges. While HK/MG technology will reduce gate leakage, the metal gates result in increased variability, requiring more stringent circuit simulations. Similarly, double patterning introduces adjacent metals, oftentimes on different masks, which requires improved static timing analysis to ensure that the timing circuits work correctly. Extra margins and restricted layouts may be required.

The result is increased spending on AMS IP. Sanjay Krishnan, a management consultant at Keystone Strategy Inc., said about 40 percent of the intellectual property (IP) owned by foundries is analog mixed signal (AMS), while only 30 percent is digital IP. Foundries such as TSMC and GlobalFoundries are building up their AMS IP libraries in order to “add value like Apple did, creating their own ecosystems.”

“As AMS moves to the foundries, IP becomes more important,” Krishnan said. “The industry is evolving, going to more external IP. It is all part of the disassociation of the supply chain,” he said.

Samsung, Micron Unveil 3D Stacked Memory And Logic

Thursday, October 6th, 2011

By Ed Sperling
Samsung and Micron have joined forces to create 3D stacked memory, a development that has profound implications for manufacturing, packaging, design and power.

The fruit of their joint venture is the Hybrid Memory Cube—a hybrid of memory and logic—that comes in either four- or eight-layer stacks of memory. The logic layer is a memory controller that will work like a hypervisor for testing, routing and optimization.

The new device is a true 3D stack, including between 2,000 and 3,000 through-silicon vias. The die themselves will be manufactured at the 20nm process node or smaller, with an expected jump in throughput that will enable movement of the same amount of data for 70% less power, according to Scott Graham, general manager of DRAM marketing for Micron Technology.

Graham said the consortium will send invitations out to potential partners and that the specification for the HMC will be finalized next year. Still to be worked out is who manufactures the HMC. Both companies are expected to use different manufacturing facilities.

What becomes particularly interesting with 3D memory is the possibility of using the memory much more judiciously with heterogeneous cores so only the resources that are needed are actually used. That can save on power while also reserving enough performance for those applications that require more memory and processing power. These memories can be used both in 3D stacks, as well as 2.5D stacked configurations where the memory is connected through an interposer layer.

Both Graham and Pablo Temprano, director of DRAM and graphics marketing at Samsung Semiconductor, acknowledged there are numerous possible scenarios for using this technology. They noted that some customers also are looking at using 3D stacked memory to replace some of the cache on a chip because moving data in and out of memory can be extremely fast.

Power Bits: Phones And Tablets On A Diet

Friday, September 30th, 2011

By Ed Sperling
In the race to make smart phones more attractive, three conflicting marketing needs are coming into play—and raising some serious challenges for SoC architects and engineers.

First, this new breed of devices needs to last at least a day between charges using more energy-hungry applications such as games, GPS systems and Internet searches. And, incidentally, they also have to receive voice and video calls and text messages, stay connected to multiple signals ranging from 3G to WiFi to LTE, and be able to stream video without dying mid-movie.

Second, the marketing departments have deemed a thinner phone and tablet to be more desirable for consumers. If power engineers had their way these devices would be connected to a battery the size of a brick and none of these challenges would be a problem.

Finally, these devices need to have even better performance than in the past. The trend is toward mobility, and mobility requires the same kind of search speed that’s available on a PC—or at least close enough to it not to be deemed a problem—as well as the ability to stream video without a blip. That usually means more cores that can work in sync when necessary, and which can be put into sleep mode when they’re not being used.

There are examples of this new trend. While the new iPhone is expected to have better battery life in a similar package, the new devices showing up in the Android world look increasingly slick and remarkably thinner. Samsung’s new 1.5GHz dual-ARM-core Exynos processor is expected to fit the bill for more performance, including 3D graphics, using less energy. And Amazon’s new Fire tablet uses a seven-inch touchscreen for up to 7.5 hours per charge of video.

The question now is just how much smaller these devices can get while still delivering the same kind of area-performance-power tradeoffs and meeting these three requirements. Semiconductor engineers always have viewed limitations as a series of new challenges, but at some point they will be up against the dual barriers of physics and cost. And while the physical barriers can be overcome for a price, at some point that price may be too high.

Limits For TSVs In 3D Stacks?

Thursday, September 8th, 2011

By Ed Sperling
Semiconductor design always has been about solving technology issues one node at a time, often in the face of a perpetual barrage of looming problems. In fact, if there is any change at all, it’s in the number of threats that have to be solved now at each node, most of them driven by ever-increasing density and the laws of physics.

Stacking die holds the promise of becoming something of a game changer because it can solve multiple issues at once—power, performance, physical effects such as noise and crosstalk—while creating its own issues such as who’s responsible when two known good die don’t work in a package.

But the surprise among companies working with this packaging approach is that it’s harder to remove the heat from stacked die than anyone initially thought. The generally accepted premise that silicon is a good conductor of heat is true, but apparently not true enough. Early tests show that 3D stacks are showing some limits for through-silicon vias.

“What we found is that you have about a 7 to 10 watt maximum for through-silicon vias using current technology,” said Greg Bartlett, senior vice president of technology and integration engineering at GlobalFoundries. “After that you have to go to an interposer.”
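
Bartlett's rule of thumb can be written down as a simple screening check. The sketch below is only an illustration of the quoted 7 to 10 watt range. The function name, the three-way classification and the example power values are assumptions made here, not GlobalFoundries guidance.

```python
# Range quoted in the article for TSV-based stacks using current technology.
TSV_LIMIT_LOW_W = 7.0
TSV_LIMIT_HIGH_W = 10.0


def stacking_option(stack_power_w: float) -> str:
    """Classify a stack's power against the quoted 7-10 W TSV range.

    Hypothetical helper: real decisions depend on power density, floorplan
    and cooling, none of which are modeled here.
    """
    if stack_power_w < 0:
        raise ValueError("power must be non-negative")
    if stack_power_w <= TSV_LIMIT_LOW_W:
        return "3D TSV stack"
    if stack_power_w <= TSV_LIMIT_HIGH_W:
        return "borderline"
    return "interposer (2.5D)"


# Made-up example power budgets, in watts.
for watts in (3.0, 8.5, 15.0):
    print(f"{watts:>5.1f} W -> {stacking_option(watts)}")
```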

This is somewhat counterintuitive, because most engineers have always assumed that 3D stacking would be the successor to 2.5D stacks. Unless something is done to change the technology, it may be the other way around. This is good news in one sense. It’s cheaper and easier to work with an interposer, which contains TSVs on a separate piece of silicon, than with TSVs running directly through stacked layers of thinner chips. There is less stress to deal with from drilling through a layer of silicon, and yield is higher if those TSVs are run through a thicker piece of silicon.

“The big problem now is that with a dense TSV the heat is trapped,” said Dian Yang, senior vice president of product management at Apache. “You have to use metal to dissipate the heat. People didn’t know the power density would be so high, and that has caused thermal issues that are much more severe.”

In 2.5D stacking, the tradeoff is the footprint. A 3D stack is much smaller and can fit into smaller spaces, which is why it has been of particular interest to companies such as Broadcom and Qualcomm.

It’s not the TSV technology itself that is causing problems. It’s the location of the TSVs. There are still places where TSVs work extremely well, such as inside of interposers and in stacked memory configurations. Memory is particularly attractive because it doesn’t generate heat anywhere near the level of logic. Micron and Samsung are both developing stacked memory configurations using TSVs and claim faster performance, higher density and lower power. This kind of memory can be used in a 2.5D as well as a 3D stack.

Other options are under consideration as well, including different substrate materials and different cooling methods such as microfluidics. But there will either have to be a compelling technology reason, which so far has not been proven, or a major ability to reduce the cost of these approaches before this kind of technology hits the mainstream. Until then, it’s anyone’s guess whether and for how long a pure 3D stacking approach will be successful.

Low Power Drives New Architectures

Thursday, September 8th, 2011

By Pallab Chatterjee
Power became the driving discussion at several major events last month.

The global calls for energy reduction, which have been mainstream at the political level since the early 1970s, have now become economic realities for component and systems suppliers. Chipmakers are finding that lower power makes good economic sense: lower cost of packaging, lower cost of ownership of the products and higher reliability. Most importantly, differentiation in power-reduction methods is lowering the cost of sales for the products because it increases customer retention.

Once a methodology is selected for the chips, it is carried through to the board, then the system and eventually the software that runs on it. This makes the cost of changing the power method very expensive and typically keeps the customer on multiple generations of hardware and components from the same suppliers under the same software umbrella.

The Hot Chips conference featured several dramatic network and multicore server products that all had enhanced power management. The power management formerly was multiple rails (I/Os and cores) and sometimes a thermal shutdown. The new systems are pervasive to the point that architectures are created with equal attention paid to power management and data throughput. The features shown were multiple power supplies, variable power voltages, block-based shutdown and turn-on, new circuits to minimize turn-on/turn-off, alternate clock tree distribution systems, lower power PLLs and clocks, and even new logic methods.

Fulcrum presented a 1 billion packet/second frame processor, which ended up being a case study for the applicability of non-synthesized sequential logic or asynchronous design. The logic structure, while known in the past, has never been implemented in such a large-scale application before, and the results included not only better performance but a power envelope that was task-acceptable.

Similarly, IBM, Intel, Tilera and Cavium presented next-generation many-core designs with performance targeted at application needs over the next 5 to 10 years, but with power profiles at levels similar to chips of many decades back. The general rule is that power per transistor in these designs is less than 100 times what it was five years ago.

On the system side, data centers are the driver. Dell addressed the issue of power reduction for its servers by not just swapping components, but also re-qualifying the systems to work at extended temperature ranges. This means peak air temperature can be as high as 113 degrees Fahrenheit (45C) for its servers without sacrificing performance or warranty. This increase from 80 degrees Fahrenheit means there is no need to provide chilled air to cool the machines. The cost of the environmental air is generally equal to or greater than the cost of the energy to run the servers.

To keep the component power down, these servers use new 30nm DDR3 DRAM from Samsung, which now operates at 1.35V, down from 1.8V. The reduction in the power supply, and the reduction of geometry to make the devices, provides higher performance, higher density and an overall reduced power envelope. Google has noticed that by using virtualized machines and large amounts of DRAM on its servers it can eliminate the power from rotating media and go to mostly high-memory machines. This architecture systematically drops power at the data center level by double-digit percentages and provides an increase in performance. The performance increase allows for the implementation of new features such as “instant search” while a user is inputting the full search field.
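
As a rough illustration of why the supply-voltage drop matters, the sketch below applies the textbook relation that dynamic (switching) power scales with the square of the supply voltage. It assumes switched capacitance, activity and clock frequency stay fixed, and it ignores leakage and I/O termination power. It is a first-order estimate, not a measurement of Samsung's parts, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power after a supply-voltage change.

    Uses P_dyn proportional to C * V^2 * f with C and f held constant
    (an assumption), so the ratio reduces to (v_new / v_old) ** 2.
    """
    if v_new <= 0 or v_old <= 0:
        raise ValueError("voltages must be positive")
    return (v_new / v_old) ** 2


# Supply voltages quoted in the article.
ratio = dynamic_power_ratio(1.35, 1.8)
saving_pct = (1.0 - ratio) * 100.0

print(f"Dynamic power ratio: {ratio:.4f}")
print(f"Estimated dynamic power saving: {saving_pct:.1f}%")
```

Under these assumptions the voltage change alone would cut dynamic power by a little under half. Any benefit from the smaller geometry would come on top of that.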

Facebook, which is new to the game on hardware, took a fresh look at power and started not with the chips, the memory or even the board, but with “how is the power getting to the computers?” It was able to provide a 12% to 15% reduction in power by looking at and redesigning the power supply input (408V to 24V signal path) and eliminating the UPS in its servers. This is a new area of high-power and high-current design that companies need to think about and look at. Facebook also ended up changing the board designs for the base compute server modules. Information on the Facebook approach and other areas to address the power can be found at OpenCompute.

Power as defined by the EDA community, which is “dynamic peak power in active mode,” as well as power in idle mode, multi-mode and transition, and even infrastructure, will all play a key role in next-generation low-power design.

Power Bits: July 15

Friday, July 15th, 2011

By Ed Sperling

Portability Play
Synopsys is working with GlobalFoundries to deliver interoperable process design kits later this year at advanced nodes. iPDKs are particularly important for companies looking to use designs for multiple markets. A general-purpose process, for example, is critical for markets looking for higher performance, while low-power processes are important in applications where battery life is a differentiating factor.

The problem is that many of these designs are not always portable between processes, despite the fact that power and performance are considered tradeoffs in most designs.

The companies said the 65nm G and enhanced low power (LPe) kits are available now. Versions for other process nodes will be available later this year.

Stacked die demo
Imec, the Belgian research organization, demonstrated a stacked die with DRAM on logic at Semicon this week. The chip is a prototype of what is expected to become a mainstream approach as companies seek to re-use existing analog IP and subsystems from previous nodes, as well as to add flexibility and speed to complex designs.

What’s particularly interesting about the prototype is Imec’s description of how heat can be removed from the die. Logic generates a fair amount of heat, but the DRAM die acts as a conductor for some of that heat. Qualcomm observed similar effects in its own stacking research last year.

Imec’s work was done in conjunction with GlobalFoundries, Intel, Micron, Samsung, TSMC, Fujitsu, Sony, Amkor and Qualcomm.

5 Ways To Cut Power

Thursday, June 16th, 2011

By Ed Sperling
Low energy consumption with minimal leakage has emerged as the most competitive element in an IC design, regardless of whether it involves a plug, a battery, or whether it’s powered by a gasoline engine.

While components on an SoC aren’t always power-aware, they’ll have to be in the future as consumers focus first on energy efficiency. With rising fuel costs, a concern over global warming and a steady reminder that smart phones have to be plugged in every night, car companies are shifting their strategy from efficient hybrids to even more efficient plug-in hybrids and electric vehicles, and California has gone so far as to mandate that one-third of all electricity sold in the state by the end of 2020 must come from renewable sources.

This shift in public awareness hasn’t been lost on the chip industry, which has been rolling out some very complex advances well ahead of schedule. Here are some of the most important:

Clouds
The push toward a cloud-based infrastructure is a way of centralizing computing—basically a return to the time-sharing model once perfected by the mainframe and then re-distributed with the advent of the commodity PC server. The data processing world is re-aggregating, but this time with a difference. It’s not just that the computing is being centralized. It’s that the centralization is taking place in proximity of cheap power sources such as hydroelectric power, nuclear plants (for now) and wind farms.

“Cloud leads to big efficiency gains,” said Chris Rowen, chief technology officer at Tensilica. “Now you can put the computing farm where the energy is available. It’s an arbitrage opportunity. It’s not hard to ship bits when you compare that to the difficulty in transporting electricity.”

There’s a clear business case to be made on this front. An estimated 6.5% of electricity is lost in transmission, according to the U.S. Energy Information Administration. That may not seem like a lot until you consider those are high-voltage transmission lines. Bits are cheap, in comparison—even trillions of them—which is why there is talk now of centralizing portions of even base stations. Those parts that do intensive computation with a high degree of redundancy are prime candidates for being located in a data center.
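
To put the transmission-loss figure in concrete terms, the sketch below estimates how much generation is needed to deliver a given load. It treats the 6.5% figure as a uniform fractional loss between plant and load, which is a simplification. The 100 MW data center load is a made-up example, and the function name is hypothetical.

```python
TRANSMISSION_LOSS = 0.065  # fraction lost in transmission, as quoted in the article


def generation_needed_mw(delivered_mw: float,
                         loss_fraction: float = TRANSMISSION_LOSS) -> float:
    """Generation required so that `delivered_mw` arrives after losses.

    Assumes a single uniform loss fraction (a simplification).
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return delivered_mw / (1.0 - loss_fraction)


# Hypothetical 100 MW data center load.
load_mw = 100.0
needed_mw = generation_needed_mw(load_mw)
lost_mw = needed_mw - load_mw

print(f"Generation needed: {needed_mw:.2f} MW")
print(f"Lost in transmission: {lost_mw:.2f} MW")
```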

“There’s a lot of computation needed to reduce noise and create a clean signal,” said Rowen. “But there’s also some computing that has to be done locally because there are tough latency requirements.”

Adaptive Body Biasing
Adaptive body biasing has been under serious discussion for the past five years as a way of reducing current leakage by controlling a device’s body voltage, which in turn increases the threshold voltage. The big advantage here is less switching to the off state. The downside is that this has been difficult stuff to design and manufacture.

“This was not seen as a mainstream approach, but now it’s showing up almost everywhere,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “This was seen as a challenging technique to implement, but now TI and Samsung are using it. If you change the body bias voltage, you impact the threshold voltage. You can increase or decrease leakage, as needed, and boost performance.”

Consultant Bhanu Kapoor, president of Mimasic, noted that for some high-performance applications the alternatives such as power gating may be impractical because it simply takes too long to turn on and off sections of a chip. In those cases, body biasing is the only choice.
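
The link between body bias, threshold voltage and leakage that Sarkar describes can be sketched with the textbook long-channel body-effect equation and an exponential subthreshold model. Every device parameter below (zero-bias threshold, body-effect coefficient, surface potential term, subthreshold swing) is an assumed illustrative value, not data from TI, Samsung or Apache.

```python
import math

# Assumed illustrative device parameters (not from the article).
VT0 = 0.35        # zero-bias threshold voltage, volts
GAMMA = 0.30      # body-effect coefficient, V^0.5
TWO_PHI_F = 0.80  # surface potential term 2*phi_F, volts
SWING = 0.09      # subthreshold swing, volts per decade of current


def threshold_voltage(v_sb: float) -> float:
    """Body-effect equation: Vt = Vt0 + gamma*(sqrt(2phiF + Vsb) - sqrt(2phiF)).

    Positive v_sb is reverse body bias; negative v_sb is forward bias.
    """
    if TWO_PHI_F + v_sb < 0:
        raise ValueError("v_sb too negative for this simple model")
    return VT0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def leakage_factor(v_sb: float) -> float:
    """Subthreshold leakage relative to zero bias: 10 ** (-(Vt - Vt0) / S)."""
    return 10.0 ** (-(threshold_voltage(v_sb) - VT0) / SWING)


for bias in (-0.3, 0.0, 0.5):
    print(f"Vsb={bias:+.1f} V  Vt={threshold_voltage(bias):.3f} V  "
          f"leakage x{leakage_factor(bias):.2f}")
```

In this simplified model, reverse bias raises the threshold and cuts leakage, while forward bias lowers the threshold, trading more leakage for faster switching.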

Atomic-Level Changes
Another technique that has been particularly difficult to master is atomic-level control of channel doping on the manufacturing side. And while most experts don’t expect the process and manufacturing side to offer any huge gains, this one may be the exception.

Scott Thompson, chief technology officer at startup SuVolta, said that by improving the doping technique, both dynamic and static current leakage can be reduced with regular bulk CMOS.

“The problem is that the wall around the channel is leaky and it’s hard to control the shape,” said Thompson. “Strain engineering helps to control the atomic-level analysis. But there has been no other breakthrough other than changing the transistor, and we don’t see a need for that for all architectures.”

At its unveiling last week, SuVolta had lined up support from Fujitsu, Cypress, ARM and Broadcom. The company claims the technology is an alternative to FinFETs, which are more difficult to manufacture.

3D Transistors And Packaging
Nevertheless, the major foundries have committed to building FinFETs at advanced nodes. Intel’s announcement of a Tri-Gate three-dimensional transistor at 22nm has been a major topic in the semiconductor industry. The question is now that Intel has publicly committed to the technology, can it really be manufactured with sufficient yield? And can it be built effectively using the disaggregated foundry model in the near future?

These kinds of questions will remain unanswered at least for the next couple of years. TSMC is planning to use FinFETs at 14nm, and GlobalFoundries has been working on the same technology. Nevertheless, the big advantage of FinFET technology is a sharp reduction in leakage while providing a significant performance boost.

Creating stacks of die also has a huge effect on power, in part because the distances between logic and memory can be shortened significantly. A system-in-package version of stacked die, using interposer technology, is expected to begin widespread production over the next 12 to 18 months, bolstered by the new Wide I/O standard that increases the size of the pipes between logic and memory.

New Materials
Fully depleted SOI, silicon on sapphire, as well as new ways of putting them all together in stacks connected by low-cost interposers that can be made of glass have turned into major research efforts as companies seek to knock costs out of the bill of materials for new chips.

While the FD SOI has been well tested for years by the Common Platform participants, the others have only been used on a very limited basis. One approach now being considered is actually designing chips to run hotter rather than trying to keep the power down. While there are limits to this approach—no one wants to pick up a hot phone—there are times when performance is more important than heat.

Taken as a whole, these changes can yield a significant reduction in power, particularly when coupled with efficient software code, more customized user controls, and end devices that actually use the power-saving technology being built into these chips.

High Performance And Low Power

Thursday, April 14th, 2011

By Pallab Chatterjee
As mobile platforms become a larger part of the component spectrum, their need for optimization beyond low power has moved to the forefront.

Traditionally, standard “line-cord” based products in both the consumer and commercial sectors have used the “G” label processes from semiconductor foundries. These processes had the highest-yielding combination of design rules, device performance and leakage as a tradeoff triad. The “G” processes were then further split into the “HP” and “LP” flows. The “HP” processes are high-performance optimized with the most aggressive design rules, lowest Vt, and support standard to higher operating voltages. The “LP” processes are optimized for low power and feature design rules targeted for the lowest leakage, support lower operating voltages, and tend to have the slowest transistors of the three options.

These process labels have been the industry norm from the 250nm era through the 40nm processes. At 28nm and below, a new process is emerging called the “HPL” or “HPM” process. UMC offers an HPL flow, which is a high performance and low power dual-corner optimized technology. At TSMC, the newly offered HPM flow is for high-performance mobile applications and is also optimized for high performance and low power.

The complexity of SoCs for mobile applications has driven them to use cutting-edge processes. The rise of computing visualization and content playback has forced these products, which must run for long periods on battery power, to embrace multicore architectures with embedded memory as the main design approach. To accommodate these activities, along with high-performance graphics handling, the designs have moved to single-die SoCs, which minimize I/O as a method to reduce power.

These multicore designs also feature advanced power management based on switched power controls and a controlled state-based turn-on/turn-off of the power grid to different power blocks. Power-switch devices, which need to be very large to minimize the “on” resistance, are typically not a good fit for high-performance processes. The new flows allow these devices to be built, along with high-performance processor and graphics cores, with significantly lower leakage than on HP flows.

TSMC announced the new flow earlier this month as a specialized optimization for battery operation, low-operating voltage, low leakage, and high-speed logic and memory access at the 28nm and 22nm nodes. The mobile platforms are driving enough of the wafer volumes to warrant a specialized flow rather than a “mix and match” from the other processes. The driver is not only smart phones, but also netbooks, tablets and other platforms that will consume graphic content. This content is split between gaming applications and video/TV material. The video/TV material has the additional power optimization point of RF for the streaming connection to receive the content. The gaming content tends to reside locally on the platform.

This new process optimization also is driving new IP. The I/Os typically are migrating over from standard LP processes, as there is no major change to the external world. However, high performance IP is not applicable to the new flow. The basis of the new IP is power control and operation in a power envelope. From this constraint, the performance optimization is then imposed.

Companies such as Imagination Technologies, which feature soft IP, will not have any major issues with optimization to the new process offering. However, hard processor cores, cache memories, DSPs, graphical user interfaces and display controllers will have to be redesigned. These blocks will need to incorporate the power-switching logic into their design, and support native multi-voltage blocks.

With UMC and TSMC offering these processes for foundry, and Intel and Samsung having them as new processes for internal use, it won’t be long before GlobalFoundries and the Common Platform bring this new optimization point to market.

Widening The Channels

Thursday, March 17th, 2011

By Ed Sperling
Wide I/O—both as a specific memory standard and as a generic approach for on-chip networking—has been looked at for the past couple of chip generations as a way of improving SoC performance. Increasingly, it also is being used as a key strategy for reducing energy consumption.

Wide I/O refers to a number of different approaches in on-chip networking, ranging from through-silicon vias in 3D stacks to interposers in 2.5D stacking. It also refers to a standard for memory communication being developed by JEDEC, as well as more dedicated channels for signals. In all cases, the added benefit is a reduction in power needed to drive a signal.

The tradeoff typically is between serial I/O and wide I/O. Serial I/O is simpler to design and works over longer distances, but it is far less power efficient. Wide I/O, in contrast, is higher bandwidth with big power savings—Samsung, for example, estimates its new 1Gbit mobile DRAM based on a 50nm process consumes 87% less power—but the technology is also more complicated to use. And in most cases, it’s also more costly.

Eliminating complexity while adding more
The concept of bigger pipes has always been a last resort for chip architects. It’s well known that shortening the distance a signal travels and reducing the resistance can drive down the amount of power needed for a signal. Reducing the overhead of serialization and deserialization can cut the power even further. But ironically, it has taken an explosion in SoC complexity for chip architects to seriously consider simplifying signal paths.

“We always go through this pendulum swing of what’s the optimal physical implementation vs. what’s the simplest way to do it even if it costs more silicon,” said Steve Roddy, vice president of marketing and business development at Tensilica. “So you can do things with 128 wires using serialized I/O, or you can do it with a lot fewer using wide I/O. The serialized I/O requires deserialization, which costs power. With wide I/O, which could simply be a lot of wires connected to the next block, you can lower the frequency and widen the channel.”
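The tradeoff Roddy describes, lowering the frequency while widening the channel, can be sketched with simple arithmetic. The snippet below is only an illustration: it assumes one bit per wire per clock cycle with no encoding overhead, and the 12,800 Mbit/s target and the function name are hypothetical, not figures from any vendor.

```python
def required_clock_mhz(bandwidth_mbps: float, width_bits: int) -> float:
    """Clock rate (MHz) needed to hit a target bandwidth.

    Assumes one bit per wire per clock cycle and no encoding overhead,
    which is a simplification of real serial and wide interfaces.
    """
    if width_bits <= 0:
        raise ValueError("width_bits must be positive")
    return bandwidth_mbps / width_bits


# Hypothetical target bandwidth, chosen only for illustration.
target_mbps = 12_800

narrow_clock = required_clock_mhz(target_mbps, 8)    # few wires, fast clock
wide_clock = required_clock_mhz(target_mbps, 128)    # many wires, slow clock

print(f"8 wires:   {narrow_clock:.0f} MHz")
print(f"128 wires: {wide_clock:.0f} MHz")
```

Under these assumptions the 128-wire channel needs a clock 16 times slower than the 8-wire channel for the same throughput, which is the room a designer gets to trade silicon area for lower frequency.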

In a 2.5D stack, that extra silicon is easier to justify because it doesn’t add significantly to the overall footprint. In a system-in-package or package-on-package it may involve an interposer, which is another piece of silicon. It also can involve a through-silicon via in a 3D stack, which is wide enough to avoid any congestion.

“With a TSV you don’t need a standard I/O, which includes the I/O circuitry, patch and bond wire,” said Tom Quan, deputy director of design methodology and service marketing at TSMC. “So you get rid of all the I/O circuitry, and you have the same area, power and current. That results in a tremendous power savings. You also get a big boost in timing. And if you use an interposer, that’s silicon so it has the same resistance and capacitance of a standard IC. You can simulate them both together and get a predictable result.”

Eliminating bottlenecks
There are many good reasons for using wider pipes. One is that multicore and multiprocessor implementations generally are inefficient. The whole idea behind these implementations was that software would be able to run across multiple cores and multiple processors. That didn’t work out as planned, due to the inability to parallelize many applications, but cores were still designed to share the same memory.

That’s inefficient from a performance and a power perspective. Cores that are not in use should be turned off or powered way down. Moreover, when they need to connect to memory it should be along a clear path with as little congestion as possible and over the shortest distance possible.

“For some years to come we’re going to be seeing systems in package with interposers as the ideal solution,” said Joe Sawicki, vice president and general manager of Mentor Graphics’ Design-To-Silicon Division. “That will involve a lot faster interconnects, mostly to memory, and potentially to homogeneous logic. One of our customers was developing a digital chip and needed Bluetooth. They did it in a digital IC and they also did it in a SiP. The SiP destroyed the SoC in performance and power.”

But the question also is at what cost. While 2.5D approaches are relatively straightforward, the interposer does add some cost and the TSV can add even more.

“We are pursuing full 3D and so are most of the people in the phone business, primarily because of the form factor and cost,” said Riko Radojcic, director of engineering at Qualcomm. “If you think about an interposer, you’re adding another die to the cost. Conceptually an interposer is an elegant solution and it works fine for someone who sells a product for $100. If you throw in a $1 interposer it’s no big deal. But if you’re making a $5 die and you throw in an interposer, it is a big deal.”

The same is true of through-silicon vias, although the ultimate advantages of this approach are expected to become more significant over time.

“TSV is expensive but is a good way of meeting the form factor,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog and MSIP Solutions Group. “You need to optimize for both low power and low cost packages. It’s like buying a $50k hybrid car that gives you 32mpg compared to a $22k 1.2L, 3-cylinder petrol engine car that gives you 50mpg. Everyone is excited about the hybrid car.”

Optimizing the signals
Behind the hubbub about the I/O technology is another often overlooked piece of the equation. The move to multiple processors and multiple cores was done largely as a knee-jerk response to the end of classical scaling at 90nm. What has happened since then is a much more measured response to how to use these cores more effectively, which requires much more granularity in the design process. Not all cores need to be on an ARM or MIPS processor, for example, and not all of them need to be in one place on an SoC—or even on the same die of a SiP or 3D stack.

In addition, not all of those cores or processors need to be the same size or run the same software.

“In addition to wide I/O there are dedicated point-to-point connections to relieve the system congestion,” said Tensilica’s Roddy. “Those can include general purpose memory and processor. When the system architect knows beforehand what’s going to be in the system they can add those connections up front. So you may have a video decoder and buffer and an audio decoder using separate memories, and those may change depending on whether they end up in a cell phone or a set-top box. But there are some things you don’t know at design time and you need the ability to generate system-specific interconnects, which is what’s being sold by companies like Arteris and Sonics.”

And finally, there is a simple mathematical principle behind the push to reduce power.

“The longer a signal has to travel, the more power it takes,” said Qi Wang, technical marketing group marketing director for Cadence Solutions Marketing. “A lot of issues in design come down to power. If you put the memory outside the chip, that takes power. If you want to speed up performance, that takes power.”
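One common first-order way to see this is the standard dynamic switching power relation, P = activity × C × V² × f, where the wire capacitance C grows roughly in proportion to wire length. The sketch below uses that textbook model only. The per-millimeter capacitance, supply voltage, clock rate and activity factor are placeholder assumptions rather than figures from the article, and the model ignores repeaters, I/O drivers and leakage.

```python
def wire_switching_power_mw(
    length_mm: float,
    cap_pf_per_mm: float = 0.2,   # assumed wire capacitance per mm (placeholder)
    vdd: float = 1.2,             # assumed supply voltage in volts (placeholder)
    freq_mhz: float = 200.0,      # assumed clock rate (placeholder)
    activity: float = 0.5,        # assumed fraction of cycles the wire toggles
) -> float:
    """First-order dynamic power of one wire, in milliwatts.

    Uses P = activity * C * V^2 * f with C proportional to length.
    """
    cap_farads = cap_pf_per_mm * length_mm * 1e-12
    power_watts = activity * cap_farads * vdd ** 2 * freq_mhz * 1e6
    return power_watts * 1e3


short_path = wire_switching_power_mw(1.0)    # e.g. a short hop within a stack
long_path = wire_switching_power_mw(10.0)    # e.g. a longer route across a die

print(f"1 mm wire:  {short_path:.4f} mW")
print(f"10 mm wire: {long_path:.4f} mW")
print(f"ratio:      {long_path / short_path:.1f}x")
```

In this model power scales linearly with distance and frequency and with the square of voltage, so a shorter path saves power directly and also leaves headroom to run a wider channel at a lower clock.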

Bigger pipes over shorter distances can help solve that problem, and it’s a solution that is beginning to garner much more attention these days.
