Posts Tagged ‘test’

The Hidden Costs Of Test

Thursday, October 6th, 2011

By Ed Sperling
As complexity grows in SoCs, so does the difficulty of testing them accurately. That helps explain why there are so many different types of tests and so much confusion about what to use to perform those tests, when to test, and where in the flow to include those tests. What’s less well known is that tests done improperly can also give false results, labeling good chips as bad or, in some cases, actually killing a good chip.

Some of the problems occur when testing involves SoCs with multiple power islands and voltage rails. The primary reason in many cases is a bad test design. Use cases for smart phone chips, for example—the so-called worst-case scenarios—have multiple power islands turned on at the same time. In a testing scenario, including built-in self test (BiST), all of those islands may go on at the same time if they’re not carefully scheduled.

“You can burn a chip with a bad schedule,” said Yervant Zorian, chief architect at Synopsys. “There are two different modes of operation. In one, the CPU is doing its own work. Then, you have a memory scan or BiST when it’s idle. Throughout the life of the chip, 90% of the time it will do a normal function and there will be no problems. But during test mode the activity is creating excitement in the chip.”

That excitement sometimes means maximum power, and there are limits to the amount of power a chip can handle. The best way to avoid this problem is to test in sequence so that not all power islands are on at the same time, but this requires up-front planning, and test is frequently an afterthought in many designs. In addition, while testing is good, over-testing can be very bad.
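To make the idea of sequencing concrete, here is a minimal sketch of one way to group power-island tests into sessions so that no session exceeds a power budget. The island names, per-island power figures, and the 600 mW budget are all hypothetical, and the greedy first-fit heuristic is only an illustration, not a method described by anyone quoted here.

```python
def schedule_island_tests(island_power_mw, budget_mw):
    """Greedy first-fit-decreasing packing of island tests into sessions.

    island_power_mw maps a (hypothetical) island name to its assumed
    test-mode power. Islands in the same session are tested concurrently.
    """
    sessions = []  # each entry: list of island names tested together
    loads = []     # running power total of each session
    for name, power in sorted(island_power_mw.items(), key=lambda kv: -kv[1]):
        if power > budget_mw:
            raise ValueError(f"{name} alone exceeds the power budget")
        for i, load in enumerate(loads):
            if load + power <= budget_mw:
                sessions[i].append(name)
                loads[i] += power
                break
        else:
            # No existing session has room, so open a new one.
            sessions.append([name])
            loads.append(power)
    return sessions


# Assumed numbers, for illustration only.
islands = {"cpu": 400, "gpu": 350, "modem": 250, "dsp": 150, "memory_bist": 300}
sessions = schedule_island_tests(islands, budget_mw=600)
for number, group in enumerate(sessions, start=1):
    print(number, group, sum(islands[n] for n in group), "mW")
```

More sessions mean a longer test, which is the cost of staying inside the budget.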

“There are two levels of failure from test,” said Giri Podichetty, technical marketing engineer at Mentor Graphics. “One is at the soft level. The second is in the power supply. You can get a false failure even if the chip is okay. But you also can get a catastrophic failure if there is too much heat or current.”

The flip side of this is that testing itself can be ineffective, allowing bad chips to reach the market. One of the unique challenges at advanced process nodes is that the amount of power needed is not scaling at the same rate as the transistors in the design. That can greatly impact test, said Podichetty, because small voltage swings with high leakage can cause significant problems.

“Power integrity can be localized, but it may not be what you expected,” he said. “You also may have the chip running slower and it may be hotter.”

Mix and match
Another challenge is the so-called mix and match approach of chipmakers. Tools and IP are bought from multiple vendors, with some of that IP developed on different process nodes. In addition, not all methodologies are up-to-date because chipmakers will frequently push older tools and methodologies.

With soft IP, much of this can be factored into synthesis. But with hard IP, testing requires a real understanding of the IP and how it will be used.

“Some designs can be tested in their entirety, but others need to use a partitioned test approach,” said Robert Ruiz, senior product marketing manager for test automation products at Synopsys. “You need to attack the problem with different methods. ATPG (automatic test pattern generation) can model faults. You may need to do dynamic bridges that exist for a moment vs. a static bridge. But the challenge for test engineers is to establish the operating parameters of frequency and voltage range. Basically what you’re doing is creating shmoo plots, where frequency is one axis and voltage is on the other. Then you try to push the devices to the corners and determine what’s in the spec and what’s not in the spec.”
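As a rough illustration of what a shmoo plot captures, the sketch below sweeps frequency against supply voltage and prints a pass/fail grid. The `device_passes` model (maximum frequency rising linearly with voltage) is purely an assumption so the example can run; on a real tester each cell would come from executing the test program at that operating point.

```python
def device_passes(freq_mhz, vdd_mv):
    """Hypothetical pass model: the device works up to (vdd_mv - 500) MHz."""
    return freq_mhz <= vdd_mv - 500


def shmoo(freqs_mhz, voltages_mv):
    """Return text rows, highest voltage first; 'P' = pass, '.' = fail."""
    rows = []
    for vdd_mv in sorted(voltages_mv, reverse=True):
        cells = "".join("P" if device_passes(f, vdd_mv) else "." for f in freqs_mhz)
        rows.append(f"{vdd_mv:5d} mV | {cells}")
    return rows


freqs = [100, 200, 300, 400, 500, 600]   # MHz, left to right
volts = [800, 900, 1000, 1100, 1200]     # mV
for row in shmoo(freqs, volts):
    print(row)
```

The boundary between the pass and fail regions is what gets compared against the spec corners.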

Engineers also need to make sure the test program is more sensitive to the power budget than in the past, he said. If the budget is set too low, there’s a danger of under-testing.

“Low power is not the best way to handle testing,” he said. “Power-aware is the right approach. Low power actually minimizes the power, so you end up under-testing.”

Conclusions
While test can go a long way toward making chips more reliable, test done wrong can also damage chips or provide false results. In a single die, that can be expensive. In a stacked die, it can be multiple times more expensive.

It’s important to note that each chip is different, each organization doing the testing is different, and the number of combinations of what to test, where to test and what to use is increasing. That’s why there is so much activity in test these days, with all three of the big EDA vendors and many of the smaller ones working to secure a stronger position in this area. Where there is pain there frequently is opportunity, and there appear to be plenty of both in this area.

DFT: Essential For Power-Aware Test

Thursday, September 8th, 2011

By Ann Steffora Mutschler
Power-aware test is a major manufacturing consideration due to the problems of increased power dissipation in various test modes, as well as test implications that come up with the usage of various low-power design technologies.

Challenges for test engineers and test tool developers include understanding the various concerns associated with power-aware test and developing power-aware design-for-test (DFT) techniques, automatic test pattern generation (ATPG) techniques, and test power analysis flows, among other issues.

“Power is one of those key drivers in what we’re building today, especially in mobile applications,” said Savita Banerjee, SoC test and verification manager at LSI. “In the past speed, area and cost were the big-ticket items, but power has really come into play.”

Her team specifically works with storage customers where power, cost and performance are all critical. “The two important aspects of power-aware test for us are related to the actual DFT implementation and the impact it has on yield,” she said. “Typically, the SoCs that we are developing have a power architecture and are designing power into what they are building so they can get additional power savings rather than just reducing power supplies and skewing parts and things. They actually have very comprehensive power architectures that allow them to get more power savings than they normally would have if they hadn’t built that into the design.”

From a test angle, it’s important for the DFT architecture to actually align with those power management strategies, Banerjee stressed. “DFT is not really sitting on the sidelines now. It really has to integrate well with the overall design solution. It’s important that when you define a test strategy and a test architecture, you need to make sure you are implementing those DFT structures in such a way that you consider the power architecture and the power needs of that application.”

In the past, this was a manual process, she said. LSI has chosen to work with Synopsys to implement DFT and leverage power-aware constructs. “The most important thing is how power intent is specified and how that gets carried forward in the DFT implementation process.”

She noted that when LSI first engaged with Synopsys more than a year and a half ago, the flow that was advertised out of the box didn’t work as expected. “We spent a good year working very closely with them to help mature the flow and really define what our requirements were and come up with a flow that was a lot more seamless. I think it was one of those examples where the EDA vendors do need to partner with actual users of their tools to be able to deliver what’s needed.”

The intersection of power and test
Power and test intersect in a number of areas. First, DFT must respect the power intent. “When someone puts test into their design, they want to make sure that if it is a low-power design it doesn’t break the low power,” explained Robert Ruiz, senior product marketing manager for test automation products at Synopsys. “As we consider designs that are developed for low power there are often multiple voltage domains and power domains, and the DFT has to be cognizant of that now. In the old days, it was just taking the flip-flops, converting those to scan flops, hooking those up and everything worked well. That can’t be done so easily now.”

As an example, when a scan chain crosses from one voltage domain to another, a level shifter has to be put in. That task becomes even more complex because if the tool just blindly puts in level shifters it will cause a huge area increase. So the challenge is to avoid breaking the low-power design: be aware of when level shifters need to be inserted, how isolation cells are handled, etc.

To do this, a tool typically takes in one of the power format files, either IEEE 1801 or CPF (for now, at least), which describes the power intent. The tool then uses that information to determine where and when to put in a level shifter. That said, different customers always have different goals (sometimes it’s area, sometimes it’s timing) and weight them differently, so there is flexibility within the toolset for the user to make trade-offs.
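A small sketch can show why blind insertion costs area. Assuming a scan chain is just an ordered list of flops, each tagged with a voltage domain, a level shifter is needed wherever two neighboring flops sit at different voltages. The flop names, domain names, and voltages below are hypothetical, and the dictionary is only a stand-in for the power intent a real tool would read from an IEEE 1801 or CPF file.

```python
# Assumed domain voltages in millivolts (stand-in for real power intent).
DOMAIN_MV = {"core": 900, "io": 1800}


def count_level_shifters(chain, domain_mv):
    """Count adjacent scan cells whose domains run at different voltages."""
    return sum(
        1
        for (_, dom_a), (_, dom_b) in zip(chain, chain[1:])
        if domain_mv[dom_a] != domain_mv[dom_b]
    )


# A chain stitched without regard to domains hops back and forth.
interleaved = [("f0", "core"), ("f1", "io"), ("f2", "core"),
               ("f3", "io"), ("f4", "core")]
# Grouping flops by domain before stitching leaves a single crossing.
grouped = sorted(interleaved, key=lambda cell: cell[1])

print(count_level_shifters(interleaved, DOMAIN_MV))  # 4
print(count_level_shifters(grouped, DOMAIN_MV))      # 1
```

Grouping by domain is only one possible ordering goal; as noted above, timing or routing may pull the other way.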

A second area where power and test intersect is in the DFT logic, which itself needs to be power aware. It shouldn’t consume much power or any power at all during the functional mode or mission mode, thereby minimizing the power consumption of the test logic itself.

Stephen Pateras, product marketing director for silicon test at Mentor Graphics, agreed that the test itself should not create power issues. “In other words, when you’re applying the test you’re not increasing the power of the device or the ability of the device to deal with the power levels. You don’t want to increase your average power during test beyond what the design has been architected for. That’s a big issue because test generally tends to exercise a device in a much greater way than is done functionally.”

Another area where power and test merge, and one of importance for users, is the amount of power consumed on the silicon device at test time.

“Even though low-power design has been a big buzz for a while and is in fact a reality, in test, engineers were hit with the power problems very early on. The main reason for that is if the goal of a manufacturing test program is not to just comprehensively test the chip but also to do it in a cost-effective manner, it means test as much of the chip as possible. That corresponds to a lot of activity on the chip, and activity means a lot of power draw, potentially exceeding the power budget,” Ruiz said.

But just how accurate is this testing?

“It’s kind of ironic that your worst-case chip behavior in terms of power is actually during test because you artificially create random activities to exercise your test coverage, but at the same time you keep drawing current,” said Qi Wang, technical marketing group marketing director for Cadence solutions marketing. “This perspective means that you have to overdesign your power distribution network to accommodate for the worst case happening during test. However, in the real case it is overdesign, which kills performance and silicon area.”

Interestingly, he pointed out that the power management functionality already on the silicon can be leveraged to achieve low-power test. “However, this is the most challenging because you have to change your test methodology and enhance your flow. If you want to do this, you need a way to control the domain on/off and isolations by the tester, not by the circuit functionality,” Wang explained.

What users really need
Going forward, LSI’s Banerjee said yield is another aspect of test that could use some enhancement. Having a tighter loop between what the ATPG tool says and having it tied into the power estimation tools to actually see what happens on silicon would be really beneficial.

She said the technology LSI uses has been enhanced and is a lot more power-aware than it had been in the past, and the company is definitely leveraging that. “It’s basically related to the activity of the patterns. Scan tends to exercise a lot more of the design than would normally be exercised in typical mission mode. When that happens, we run the risk of having unnecessary fallout or yield loss because of that increased power consumption, so we could end up throwing away a defect-free part.”

“They’ve got switches in place to address the activity factor,” she continued. “Unfortunately that conflicts with pattern count because typically to reduce your pattern count, you want to run a lot more stuff in parallel, but to reduce activity you want to run less in parallel. So they are kind of conflicting problems, but both need to be solved. So it would be nice if there was a way to be able to iterate and do some exploration between power savings and effectively test time in one go. That way you can get the power savings that you need but not shoot yourself in the foot when it comes to test time. I think they have the hooks in place to do that with all of their capabilities. It’s not something they can do and publish a generic recipe that would work for all designs, but if they could provide a framework for the individual users to do that assessment or exploration, that would help.”
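The kind of exploration Banerjee describes can be pictured with a toy sweep. The sketch below assumes, purely for illustration, a design made of identical blocks that each take the same test time and draw the same power, so test time falls and peak power rises as more blocks run in parallel. None of the numbers or names come from LSI or any tool; a real flow would plug in estimates from ATPG and power analysis.

```python
import math


def explore(num_blocks, block_time_ms, block_power_mw, budget_mw):
    """Return (parallelism, test_time_ms, peak_power_mw, within_budget) rows."""
    rows = []
    for parallel in range(1, num_blocks + 1):
        sessions = math.ceil(num_blocks / parallel)  # sequential passes needed
        peak = parallel * block_power_mw             # blocks active at once
        rows.append((parallel, sessions * block_time_ms, peak, peak <= budget_mw))
    return rows


# Assumed figures for illustration only.
rows = explore(num_blocks=8, block_time_ms=10, block_power_mw=120, budget_mw=500)
for row in rows:
    print(row)

# Shortest test time among the options that stay inside the power budget.
best = min((r for r in rows if r[3]), key=lambda r: r[1])
print("best:", best)
```

Even in this simplified model the two goals pull in opposite directions, and the budget decides where the sweep has to stop.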

Power-aware testing of stacked devices
When it comes to implementing power-aware test on true 3D ICs, there may be additional complexities—or maybe not.

Synopsys’ Ruiz contends that test is actually the easiest issue for 3D IC. “What is a 3D IC stacked die? There’s already the ability to put in DFT to access chips, there’s technology to access multiple chips on a board—that’s called boundary scan IEEE 1149.1. Stacked chips tend to look like chips on a board and there are already existing test standards to do that. There’s already ability to put a boundary scan around the chip and there is ability to generate patterns. There’s ability to access, there’s a standard to describe dealing with multiple chips and there is ability to generate patterns, so all of the core and fundamental technologies are already available and customers are simply using them straightforward,” Ruiz concluded.

Mentor Graphics has similar thinking on this and detailed its strategy for 3D-IC design, verification and testing in March. And in June, Cadence and Imec said they developed an automated test solution for 3D ICs.

3D Stacked Die Create Unique Test Issues

Thursday, December 2nd, 2010

By Ann Steffora Mutschler
While 3D die stacking promises a number of benefits including smaller footprint, faster speed, lower power and possibly lower cost, testing those devices isn’t going to be simple.

There are varying degrees of challenges aligned with varying types of defects that occur throughout the process, from wafer fabrication to package assembly to system-level assembly. And at each level, there are phenomena that impact quality and that lead to potential defects, said Ed Malloy, product manager for Encounter Test at Cadence Design Systems.

At the wafer level are all of the challenges normally seen in a non-3D scenario, including design- and process-related defects: the static defects traditionally called “stuck at” defects, and delay-based defects, which are timing- or process-related. Alongside the “stuck at” or static types of defects at the wafer level, the dynamic defects are typically process-related, where a higher resistance might occur on a trace on the die that slows the signal down.

Both of these challenges become even more complicated by the stacking process. “As you stack these devices, there are thermal impacts that need to be modeled and understood,” Malloy said. “So as we go down the process curve the cell behavior becomes more widely varying depending on your process and your temperature. The temperature fluctuations have a significant impact on the quality of your signal integrity. So if you’re not understanding the thermal characteristics or dynamics that exist between the stacked devices, then this can lead to failure.”

Effective testing of 3D integrated devices starts with having a high-quality ‘known good die’ (KGD) test, explained Greg Aldrich, director of marketing for the Silicon Test Systems group at Mentor Graphics.

“The first step in testing for a 3D package is to be able to test the wafers or the standalone die before any packaging gets started. Typically that will get done at wafer test, so you’re testing multiple wafers that will eventually get packaged together. You have to have a test that is very high quality and tries to detect everything you can possibly detect before you start packaging the individual die.”

One of the biggest challenges with 3D stacking has to do with the number of pins because there may not be access to all of the pins during wafer probe. Many more pins are used in 3D stacking, but a large number will not get packaged. Instead, they will be attached with a through-silicon via (TSV) or in another way, and there may not be big enough probe pads to probe them during wafer test.

“Today’s probe technology is unable to handle the finer pitch and dimensions of TSV tips. The wafer has to be thinned by about 75% for them to be stacked, but what happens is that the tips of the TSVs can be exposed. So as the thinned wafer is contacted by the wafer probe there is a danger of damaging the wafer,” explained Samta Bansal, product marketing for Encounter digital implementation system, and 3D IC guru, at Cadence Design Systems.

“You may have thousands and thousands of internal pins and you are only bonding out a very small subset. So especially after you get the package together, how do you get access to everything and fully test everything from a very small number of pins?” said Mentor’s Aldrich.

Mentor does all testing through a TAP (test access port) interface, under the IEEE boundary scan specification 1149.1 that has a 5-pin interface for test access. For example, all of its BIST is initiated and results gathered through the TAP interface.
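For readers unfamiliar with the TAP, its five pins are TCK, TMS, TDI, TDO and the optional TRST, and everything is sequenced by a 16-state controller that moves on the value of TMS at each TCK edge. The sketch below encodes that standard state diagram as a lookup table. It illustrates the IEEE 1149.1 controller in general, not Mentor’s implementation, and the `walk` helper is a name made up for this example.

```python
# Next state for TMS=0 and TMS=1, following the IEEE 1149.1 state diagram.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK cycle, and return the end state."""
    for bit in tms_bits:
        state = TAP_NEXT[state][bit]
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR, where data moves from TDI to TDO.
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
# Holding TMS high for five clocks returns to reset from any state.
print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
```

Because the whole protocol rides on TMS and TCK, a BIST engine buried deep in a stack can be started and read out through the same few pins.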

A second piece of 3D testing involves the interconnect between the dies in the package, to verify everything that goes through the TSVs, as well as things that can be bonded out and packaged separately, he said. In testing the interconnect there could be two SoCs stacked in the package, for example, and there may be a lot of different types of interfaces between those SoCs. Some may be standard digital, some may be analog signals, and some may be high-speed serial-bus interfaces. Another scenario could be a memory chip, like a DRAM, stacked on top of an SoC. In both cases, the SoCs or the memory must be fully tested once they are sitting in the package.

Aldrich noted that Mentor is leveraging all of its technology in a different methodology. “For example, for testing a memory chip that’s sitting on top of an SoC, we can take our memory BIST and, instead of putting it on the same die where the memory exists (which is what we would typically do for embedded memory), we can put it on the SoC and then interface the BIST for that memory through the normal bus interface between the SoC and the DRAM. So with the memory BIST logic sitting on the SoC we can then test the DRAM chip itself. It’s a different methodology of using some of the BIST logic that we have.”

For testing the die-to-die interconnect, IEEE 1149.1, the boundary scan standard for single-ended digital signals, can be used to test all the signals between SoCs in a stack, and IEEE 1149.6 can be used for AC-coupled differential signals.
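One classic way to use boundary scan for interconnect test is a counting sequence: each net is driven with its own binary code, one bit per pattern, and the receiving die captures what arrives. The sketch below models that idea under simplifying assumptions that are not from the article: shorts behave as a wired-AND, an open reads as constant 0, and the function names are invented for the example.

```python
def net_codes(num_nets):
    """Give net i the code i + 1, so no net is driven with all-0s or all-1s."""
    width = (num_nets + 1).bit_length()  # smallest width with 2**width >= num_nets + 2
    return list(range(1, num_nets + 1)), width


def capture(codes, shorted_pairs=(), open_nets=()):
    """Model what the receiving boundary cells see for each net."""
    received = list(codes)
    for a, b in shorted_pairs:
        received[a] = received[b] = codes[a] & codes[b]  # assumed wired-AND short
    for n in open_nets:
        received[n] = 0                                  # assumed: open reads as 0
    return received


def failing_nets(codes, received):
    """Indices of nets whose captured code differs from the driven code."""
    return [i for i, (sent, got) in enumerate(zip(codes, received)) if sent != got]


codes, width = net_codes(6)
received = capture(codes, shorted_pairs=[(2, 3)], open_nets=[5])
failing = failing_nets(codes, received)
print(width, "patterns;", "failing nets:", failing)
```

Only a handful of patterns are needed even for many nets, since the count grows with the logarithm of the net count. A plain counting sequence can miss one side of a short when one code’s set bits are a subset of the other’s, so richer sequences are often used in practice.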

Managing power during test
As the industry moves down the process nodes, the consumption of power during test mode is impacting signal integrity of the chips, therefore power must be managed during test mode. “Traditionally test mode consumes anywhere between 2X and 5X the power of a normal functional mode. Because of the variations in the lower processes, you can’t afford to have these wider swings of power – they impact not only the signal integrity but the reliability. You can have early defects if you’re not managing power, so they can pass on a tester but fail in the field. By managing power at the wafer level, it mirrors the power consumption during functional mode – that’s an essential part of ensuring that ‘known good die’ are going into the 3D assembly process,” Cadence’s Malloy explained.

Within the 3D package device itself, power consumption must also be managed including memories, which can consume a lot of power, he said. “When they do, it affects temperature, which affects performance, and signal integrity – it’s a spiraling effect. So you have to have the capability to manage your memory tests, your logic tests and all the power. At the same time, you need to be testing areas that would potentially be susceptible to thermal effects: there may be some high risk areas that you would want to test after assembly. An easy one would be to model and test the through silicon vias and interface as well as the communication across chips. Of course you want to check the connectivity but still need to be checking some of the core logic and how it’s behaving once these devices are packaged.”

As part of a 3D stacked die methodology, a test sign-off or test vector sign-off should be included to identify where potential hot spots may be, something power analysis tools can do.

Other companies doing work with 3D ICs include Apache Design Solutions, which in the last several years has been working with a number of leading semiconductor companies, including TSMC, on 3D ICs. In fact, in June, TSMC included Apache’s 3D IC power and noise tools in its Reference Flow 11.0 and Analog/Mixed-Signal (AMS) Reference Flow 1.0.

The drive to reduce power and increase performance demands advanced packaging technologies such as SiP and 3D-IC/TSV. Apache recognized that these technologies pose major power, thermal, and stress challenges due to the coupling of the power delivery network between digital and analog dies and their heat transfer properties. As such, Apache’s tools generate a Chip Power Model (CPM) and a Chip Thermal Model (CTM) as hand-off compact models representing the die power and thermal behaviors, and are extended to use CPM and CTM for multi-die chip-package analysis.

For additional information on this subject there are two blogs of note:

–Samta Bansal’s blog.
–Ed Malloy’s blog.

Rethinking Test

Thursday, February 11th, 2010

By Ann Steffora Mutschler

The responsibility of semiconductor test has long sat solely with the test engineer as the chip designer focused on the functionality of the device. However, particularly in low-power designs, when the device is being tested, much higher power levels are applied than in normal functional operation – sometimes causing the device to fail.

This ‘false failure’ can lead to unnecessary yield loss on the production line and can require significant time and effort to diagnose, because the extra power applied to the device may incorrectly indicate that the device is bad when it is not.

The goal of the test engineer is to reduce the cost to test a device. Therefore, they want their automatic test pattern generation (ATPG) tools to generate a lot of activity and test a lot of the chip. As a result, a lot of power is consumed, typically exceeding the functional power budget by 7x to 10x.

This occurs because the chip is designed with a power budget in functional mode. “If you think about the design of a chip, most chips aren’t operating all parts of the chip at the same time, and ATPG doesn’t look at functionality. It just looks at the structure, and to minimize the cost or minimize the patterns it’s trying to make as much activity happen in the chip as possible in order to test it all simultaneously,” explained Robert Ruiz, senior product marketing manager for test automation products at Synopsys.

In the past, ATPG tools really didn’t need to look at power consumption: the chips were small enough, the power rails were big enough, and low-power designs weren’t prevalent. On top of that, compression techniques weren’t being used. Compression further exacerbates the problem because the goal of a low-power design is to minimize switching activity, while the goal of compression is to maximize it. This is a very big deal for test engineers, but it is not an issue traditionally highlighted in the design community given designers’ focus on functionality, even though designers may take partial ownership of how to implement some of the design-for-test solutions.

Ruiz indicated that approximately three years ago the impact of power on test became a big area of Synopsys’ R&D effort based on feedback from a number of customers. At that time, he said, there were some customers who reported power issues related to test. They did some redesign, which resolved the issues at hand, but believed it could be a problem in the future. “It has certainly evolved to the point where most customers say they definitely have found a power issue during test,” Ruiz said.

Test is tricky for low-power designs
Greg Aldrich, director of marketing for the Silicon Test Systems group at Mentor Graphics Corp., said one of the problems in test is how to create test patterns that have lower power profiles in terms of what data gets shifted in, which is dramatically complicated by the use of on-chip compression and on-chip test structures. Previously, test was performed by shifting data into scan chains, issuing the clock cycle, shifting the data out, and then comparing it to the golden response data, whereby the scan chains were directly connected to the tester.
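The traditional shift, capture, shift-out, compare loop can be sketched in a few lines. The model below is deliberately tiny and hypothetical: a four-flop chain, a made-up block of combinational logic in which each flop captures the XOR of itself and its neighbor, and a made-up stuck-at-0 defect on one flop’s input. It is meant only to show where the golden response comes from and how a mismatch flags a failing device.

```python
def apply_scan_pattern(pattern, logic):
    """Shift a pattern in, pulse one capture clock, and shift the response out."""
    chain = [0] * len(pattern)
    for bit in pattern:              # shift in: new bits enter at position 0
        chain = [bit] + chain[:-1]
    chain = logic(chain)             # capture: flops load the logic outputs
    response = []
    for _ in range(len(chain)):      # shift out: bits leave from the last position
        response.append(chain[-1])
        chain = [0] + chain[:-1]
    return response


def good_logic(state):
    """Hypothetical fault-free logic: each flop captures itself XOR its neighbor."""
    n = len(state)
    return [state[i] ^ state[(i + 1) % n] for i in range(n)]


def faulty_logic(state):
    """Same logic with an assumed stuck-at-0 defect on the input of flop 2."""
    out = good_logic(state)
    out[2] = 0
    return out


pattern = [1, 0, 1, 1]
golden = apply_scan_pattern(pattern, good_logic)      # expected response
observed = apply_scan_pattern(pattern, faulty_logic)  # response from the "device"
print("golden:", golden, "observed:", observed, "pass:", observed == golden)
```

Every shift cycle toggles flops along the whole chain, which is why the data being shifted in matters so much for power.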

However, most designs today use either built-in self test (BIST) or on-chip/embedded compression, which is still a deterministic process. But instead of the tester directly shifting data into the scan chain, the data goes through a decompressor that sits on chip. The tester shifts data into the decompressor, which expands it internally, essentially creating the data on-chip, Aldrich explained.

What complicates the process is that since the data is being created on chip a new on-chip piece of logic must also be created, so Mentor invented a new low-power decompressor that allows the designer to control the stuff on chip, he said. “It’s not as simple as just changing what’s on the tester. You actually have to change some of the embedded test logic on chip to be able to control that. I think that is going to be primarily how switching activity is going to be controlled during the test—by controlling how the test patterns are created and then how the test patterns are loaded.”

Similarly, Synopsys rolled out an ATPG approach that doesn’t require any hardware or DFT change (which no customer really wants to do), Ruiz said. The company’s TetraMAX tool was enhanced about three years ago to allow the user to dial in a budget of the switching activity, which serves as a proxy for power consumption. And, if a customer wants to be more aggressive and active in managing power consumption, there are other hardware techniques including Synopsys’ DFTMAX tool as it puts off the scan chain.

Likewise, Mentor’s Aldrich noted that in terms of innovations both on the design side as well as on the test side to help deal with the impact of power on test, “It’s all focused on how to reduce the switching activity during the test. Historically, a lot of that has been done by partitioning the test and that is still the case especially as you move to designs that have multiple voltage domains or multiple power islands. Being able to just sequence the tests for each one of those allows you to test a smaller piece of the design. That has some implications on the test time and cost that it takes to test the device but that’s one approach.”

Mentor has also added more control into its tools over how much is switching during the test process. For example, in its ATPG tools, users can specify constraints to the test pattern generation tool that indicate how much switching is allowed during a test pattern.

“The more aggressive they are in terms of lowering the amount of switching during the test process, the higher it is in terms of test costs. It’s going to take more test patterns, it’s going to take more compute time to create the test patterns but it is a knob they will have control over now. They really have no other choice other than designing the power structures in the design such that they can handle 50% switching activity—that’s the only other alternative,” Aldrich said.
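To see why a switching budget is a usable knob, consider a test cube in which only a few bits matter for fault detection and the rest are don’t-cares. The sketch below fills those don’t-cares two ways and measures the toggle rate between adjacent scan-in bits, which is only a rough proxy for shift power. The cube, the 10% budget, and the function names are assumptions for illustration and do not reflect any vendor’s algorithm or metric.

```python
import random


def toggle_rate(bits):
    """Fraction of adjacent bit pairs that differ in the scan-in stream."""
    return sum(a != b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)


def fill(cube, mode, rng):
    """Fill don't-care positions (None) randomly or by repeating the previous bit."""
    filled, last = [], 0
    for bit in cube:
        if bit is None:
            bit = rng.randint(0, 1) if mode == "random" else last
        filled.append(bit)
        last = bit
    return filled


# Hypothetical 64-bit cube with four care bits.
cube = [None] * 64
cube[3], cube[20], cube[41], cube[60] = 1, 0, 1, 0

rng = random.Random(0)
random_fill = fill(cube, "random", rng)
adjacent_fill = fill(cube, "adjacent", rng)

budget = 0.10  # assumed switching budget
print("random fill:  ", round(toggle_rate(random_fill), 2))
print("adjacent fill:", round(toggle_rate(adjacent_fill), 2),
      "within budget:", toggle_rate(adjacent_fill) <= budget)
```

Random fill tends toward roughly half of the bit pairs toggling, which is the 50% figure in the quote. Low-toggle fill stays far below that, but it gives up the incidental fault detection that random bits provide, which is one reason a tighter budget tends to mean more patterns.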

In the end, the objective of test is to create the highest coverage in the smallest number of test patterns. From the perspective of the design, that means you want to try to switch everything possible in the design on every cycle on the tester, which is the opposite of the goal of low-power design. That said, a complete rethinking of compression algorithms and other test technology is in order.