Posts Tagged ‘Sonics’


System-Level Models Redefined

Thursday, May 24th, 2012

By Ann Steffora Mutschler
It wasn’t that long ago that the promise of system-level models was an easy implementation path and the ability to then reuse the models in a different design, for a different target application. But how reusable are those models in reality? The answer depends on whom you ask.

First, it is important to define what a system-level model is, noted Frank Schirrmeister, group director of product marketing for system development in the system and software realization group at Cadence. “If a system-level model is defined as a TLM model or something even higher above then by virtue of its abstraction, it’s actually re-usable by definition so to speak. I always compare them to the gate-to-RTL jump. Is the RTL model re-usable? Absolutely, because we have automation underneath to remap it to several technologies. Is the TLM model/the system level model for, let’s say, your high-level synthesis input re-usable? Absolutely, it’s reusable for that particular implementation it does. And then, you have the automation around it to actually get the implementation done.”

Further, are these models reusable in general terms at higher levels of abstraction? From his perspective, they are reusable across different applications and different designs. Otherwise it wouldn’t be commercially feasible for system-level houses or EDA vendors to provide them, he argued.

However, Schirrmeister pointed out, “You need to be precise about what you re-use them for. If you go up from the RTL to the TLM level first, then these models are re-usable for sure when it comes to processor models because they are re-usable for every design that uses the processor model.”

But not so fast, said Drew Wingard, co-founder and chief technology officer at Sonics. “The place where the system models have the bigger challenge is in trying to imagine when I integrate these things together, how is it going to perform? And there we have some challenges.”

The challenges boil down to the fact that for most of these applications, the cost mandate requires that the cheapest DRAM system is used for the SoC. The SoC maker may want to sell its SoC for $10 while the DRAM cost was approximately $8, but if more expensive DRAM is needed it could bump the SoC price to $12. At that point the end user may say they are still willing to buy the SoC but are only willing to pay $7 for it.

“The real challenge of modeling the performance of DRAM with enough accuracy to predict is not the bailiwick of most of the system level modeling initiatives. The virtual platform models don’t give you any real concept of performance and certainly nothing near detailed enough,” he explained.

These cost pressures, combined with design complexity, are changing the perspective on how models should be used.

“The notion of having a seamless path from having a high-level model and synthesizing it to maybe VHDL/Verilog model and onto hardware—I don’t see it happening. It might still be an industry dream of a couple of people and it would help a lot. It would help proliferate virtual prototyping a lot and it would help proliferate platform architecture design a lot, but it just doesn’t seem feasible to really have that seamless flow,” asserted Tom De Schutter, senior product marketing manager for system-level solutions at Synopsys.

Instead of an implementation point of view, he believes the view on re-use of models currently is defined more on a use-case point of view.

Besides developing the use case for creating testbenches from models, De Schutter explained that work is being done on how models can be re-used across different types of use cases for different types of software developers, from OS porting and middleware development to verification of IP blocks and analysis of software performance and energy.

“Because the software is becoming so important, maybe it’s not that important that the model has an implementation path as long as it provides value across the lifecycle of the different stages of software and the different types of software. The value of the model establishes itself, as well,” he said.

Making the models re-usable across different use-case scenarios comes down to defining what a model has to do to be useful for each of those use cases.

“In a lot of cases, the way we as an industry—customers and vendors—looked at it, there was always a notion that you need a lot of accuracy, you need a lot of timing for models to be useful and, of course, the more complex systems become the more that breaks,” De Schutter continued. This becomes clear particularly with the latest approaches to processor design such as ARM’s big.LITTLE approach. “If you look into those systems, just that specific subsystem has up to eight processors, and that’s not taking into account the rest of the system where the baseband, Bluetooth, WiFi, the power management system—everything has cores. So it’s becoming very hard to have very accurate models, and accuracy then defined as timing accuracy, and simulate them in a reasonable simulation speed.”

De Schutter said the current thinking is that the software itself doesn’t need timing accuracy or cycle accuracy to be developed or even optimized. “Again, looking at it from a different point of view rather than from a hardware point of view and an architecture design or an implementation point of view, we are starting to more and more look at it from a software point of view. How can this software help optimize the system? ARM big.LITTLE is actually a perfect example of this,” he added.

In closing, Wingard offered some harsh criticism for some approaches promoted by some vendors today. He believes the models some companies are providing to their customers are not accurate enough to do architectural sign-off exercises.

“While they might help the designer try to get to an intermediate design point, they’re still forcing the development team to go to an emulator to prove whether the architecture is viable or not. They have this additional problem that even if it works on the emulator, it doesn’t mean it will work on the layout of the chip…so that when they get into layout the floorplan changes associated with dealing with the actual layout constraints end up rippling back to their architecture and creating additional substantial performance problems that need to be re-architected. That generates an additional round of problems that basically force the customers to tape out sub-optimal solutions,” he concluded.

EDA’s Cloudy Vision

Thursday, May 24th, 2012

By Ann Steffora Mutschler
Since the dawn of EDA, the industry has largely operated under a traditional software distribution model whereby the customer would run the software it licensed on its own hardware equipment. With the sophistication of advanced IT management techniques as well as education surrounding “The Cloud,” it may be safe to predict that engineers in the not-too-distant future will be designing and verifying SoCs entirely in the cloud.

Right now, however, it’s a different story.

“As an industry, it’s pretty nascent for EDA applications,” observed Dave Desharnais, senior group director for product management in the silicon realization group at Cadence. “We ended up getting permission to use our tools in the cloud with constraints because there really was no formal way to do that in a third party cloud context. It’s really been large-scale companies that have been coming to us for about the last three years, to the tune of probably 10 to 12 a year, asking us because they are Cadence customers, ‘How do we use your tools in the cloud? We want to use your tools in the cloud.’”

A frontrunner in this model, Cadence has offered its Hosted Design Solutions since 2007. It is a turnkey, private cloud offering born out of its design services. “It’s not a driver for our business but it’s certainly an option for a certain class of customers—mostly small and medium sized,” he said. With about 50 customers today, it has seen linear growth since the program began.

Desharnais is quick to point out, “This is really our entrance into the software as a service (SaaS) market because the customer owns the data but we want to provide that pathway in, all the compute resources, it’s turnkey and it’s beyond emulation, it’s everything: custom analog to digital ICs, all the physical and logic verification, emulation.”

Datacenter at Cadence Design Systems (Source: Cadence)

“It’s really no different from a business model in traditional EDA—you buy a license for a certain time. In this case you’d buy a set of licenses for a certain time with some bursting capabilities. That’s the SaaS model,” he said.

An even earlier player in the cloud arena is Sonics. Drew Wingard, co-founder and chief technology officer, said he feels like a grizzled veteran of this topic. He explained that way back in 2001, Sonics introduced a vehicle called SoCworks. Today it would be referred to as SoC- and IP-core-based design in the cloud.

“A challenge we were trying to address in those days, if you look at the basic business transaction model behind selecting IP cores to go onto an SoC, is that one of the big challenges is this horribly long dance that happens at the beginning before the customer can actually do the evaluation,” said Wingard. “It ends up being very messy, and in many cases it’s like six months to get through this process. The idea we had at the time was the reason for a lot of this protectiveness is around worries about the theft of the IP or worries about the pollution of the engineering, and that in the chip field we don’t worry about this as much because the chip that we distribute is pretty obscured. It’s buried inside a package, and you can’t really figure out what’s going on inside. The distributor can ship you a chip for evaluation purposes without all of this stuff. What if we could get to that same level of abstraction around IP cores?”

Sonics built a set of servers, in what would now be called the cloud, that ran the development environment that was part of its solutions. It still is, and it “allowed people to mix and match IP cores from who were then some of the major providers of IP cores—guys like MIPS and Tensilica, the inSilicon part of Synopsys—and actually plug and play them together around our interconnect fabrics, run some basic simulations so they could try before they had to go through all that legal stuff. We were way ahead of the curve on this clearly,” he added.

Mentor Graphics, too, is not new to the private cloud. Michael Buehler-Garcia, senior director of marketing for Calibre design solutions, explained that as part of Calibre, there is a multi-thousand CPU farm on the floor of the building where the Calibre team is located, which is run as a private cloud for its different development teams around the world. “So are we doing cloud? Well, yes, and anybody who runs OPC accesses an SoC cloud because the size of designs requires multiple CPUs to run it.”

Datacenter at Mentor Graphics (Source: Mentor)

Today, the company expands its cloud play with its announcement (http://www.mentor.com/company/news/) of a cloud-based DFM Analysis Service based on the Calibre platform for TSMC 28nm and 40nm foundry customers, which analyzes the customer’s design database to meet the requirements for TSMC’s lithography process checking (LPC) flow. It delivers a results database containing hotspot locations and fixing hints that can be used by routers to perform corrections. Buehler-Garcia expects it to be an attractive option for customers who tape out only a few advanced-node devices per year.

In addition, Synopsys has been offering verification-on-demand in the cloud for a few years.

IaaS Before SaaS
Before engineering teams embrace SaaS, Cadence’s Desharnais believes the infrastructure as a service (IaaS) model will take off as an interim step.

Interestingly, he said, customers are requesting permission to run Cadence software in the cloud…even if they don’t know what to do with it yet.

“What is shocking to me is very large sophisticated mobile telecom-type companies [like those] in the San Diego area—those guys aren’t doing it. Those guys are leading, bleeding edge, and they are the guys you would expect would have it. They want permission but they haven’t pulled the trigger on it yet because they are still trying to figure it out and how they are going to use it,” Desharnais continued.

“What I’m seeing the most, if you look at the semiconductor industry at large, it’s not so much a SaaS model, it’s more of an IaaS model. There’s a huge reluctance to put anything like an RTL on the Web or any sort of physical design that can be effectively pirated or moved so the security pieces scare them. As an EDA vendor, we’re actually not in a position to solve that. There are more systemic kind of things that are in the way. But we see companies taking a baby step in this direction, and this is where they are getting pressure from their CFO or their CIO or their CEO to start moving to more of an operational expense versus a capital expense model. And they say, ‘If we’re doing it for other things in our company (financials, HR) why not do it with EDA tools?’” he suggested.

In the long run, what probably makes the most sense is for EDA tools to be hosted by third party cloud providers such as Osmosix, Plunify and Xuropa, as opposed to private clouds hosted by the EDA vendors since customers won’t want a locked solution.

A newcomer to this scene is SiCAD, which comes out of stealth mode today. What’s different about this company is that it makes no tools of its own, but pulls together a cloud-based multivendor solution for a number of vendors’ products, including all of the Big 3 EDA companies and many point tools. As CEO Jai Iyer tells it, the key was to identify the pain points where peak utilization is extremely high but utilization over a year is low.

“The problem is that peak use for some of these tools lasts two to three months, and the rest of the time they’re sitting around idle,” said Iyer. “When you look at signoff tools for static timing analysis, extraction and DFM, they’ve all got low utilization and they’re expensive.”
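As a rough illustration of the utilization gap Iyer describes, the sketch below computes the fraction of a year a tool is busy if peak use lasts two to three months and the tool is otherwise idle. The function name and the assumption of zero use outside the peak are simplifications made here for illustration, not figures from SiCAD.

```python
def annual_utilization(peak_months: float, months_per_year: int = 12) -> float:
    """Fraction of the year a license is in use, assuming it is idle outside the peak."""
    if not 0 <= peak_months <= months_per_year:
        raise ValueError("peak_months must be between 0 and months_per_year")
    return peak_months / months_per_year


# The two ends of the "two to three months" range quoted above.
for months in (2, 3):
    share = annual_utilization(months)
    print(f"{months} months of peak use -> {share:.0%} annual utilization")
```

Under that simplifying assumption, a license that is fully loaded during its peak still sits unused for roughly three quarters of the year or more.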

It remains to be seen whether a multivendor complete solution will fare any better than previous attempts by established companies. At the end of the day, many will still argue that the business model is not working. But there are still those who believe that—at least someday—it will. The only question is when.

Customer IT Deployment Types (Source: Oracle)

Cloud-Scale SoCs

Wednesday, May 23rd, 2012

Sonics CTO Drew Wingard talks about what’s changing in SoC design as performance ramps up on mobile devices and power is ratcheted down to save battery life.


The Interconnect Game

Thursday, April 26th, 2012

By Ed Sperling
Having a single bus protocol is something most SoC engineers can only dream about. Reality is often a jumble of protocols determined by the IP they use, which can slow down a design’s progress.

The problem stems largely from re-use and legacy IP. While it might be convenient to use only the AXI standard protocol from ARM, most chips are a combination of IP tied to specific protocols that require complex interconnects, add significant time to the verification process, and often have an impact on performance.

“It’s never AMBA, Sonics or Arteris for everything,” said Mike Gianfagna, vice president of marketing for Atrenta. “There are a lot of configurations on a chip. You’ve got crossbar switching and arbitration schemes. The big question, particularly when you get into 3D stacking, is which one you should use. So you come up with half a dozen configurations and you experiment for power, performance and area.”

He said the on-chip interconnect problem is one more complexity issue that has to be ironed out. But it also has some unusual pitfalls. “An IP block is like an amoeba. It can morph in unpredictable ways. You need to be able to analyze that up front.”
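One common way to narrow down "half a dozen configurations" measured for power, performance and area is a Pareto filter, which discards any candidate that another candidate beats on every metric. The sketch below is only an illustration of that general technique, not Atrenta's method; the configuration names and all numbers are placeholders invented here, and latency is used as a stand-in for performance.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    power_mw: float    # lower is better
    latency_ns: float  # lower is better (stand-in for performance)
    area_mm2: float    # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    pairs = list(zip(a[1:], b[1:]))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Placeholder values for illustration only, not measurements.
candidates = [
    Config("crossbar_wide", 120.0, 8.0, 2.4),
    Config("crossbar_narrow", 90.0, 12.0, 1.6),
    Config("shared_bus", 60.0, 30.0, 0.8),
    Config("ring", 95.0, 14.0, 1.9),
    Config("mesh", 130.0, 9.0, 2.6),
    Config("hier_bus", 70.0, 22.0, 1.0),
]

for cfg in pareto_front(candidates):
    print(cfg)
```

The filter does not pick a winner. It only removes options that are worse across the board, leaving the real trade-offs for the architect to weigh.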

How we ended up here
There have been a number of attempts over the past 15 years to avoid this kind of problem. In 1996, when the Virtual Socket Interface Alliance (VSIA) was formed, SoCs were still in their infancy even though more and more chips included some sort of processor. The hot topic at that time was whether to decouple the processor from the chip and isolate components from the interconnect. That gave rise to a handful of ARM standard buses.

“The job of the interconnect fabric is to just make it work,” said Drew Wingard, CTO of Sonics. “But what’s happening in designs is the total level of integration is going through the roof. We’re now seeing chips with more than 100 IP cores, MPEG encoders and decoders and Huffman algorithms, and you need the interconnect in a subsystem to be a good match for what you’re trying to do. The interconnect needs to be optimized for that.”

But within a single design there may be dozens of interconnects from multiple vendors, including some that were internally developed by the chipmaker.

“There will still be custom semiconductor companies doing their own interconnects,” Wingard said. “But for the bulk of the design, the number of interface standards generally is going down and most IP cores are much more latency tolerant than they used to be.”

Past, present and future
To a large extent, SoC developers are suffering from the same kind of backward-compatibility issues as software and processor vendors have been wrestling with for decades. What makes it an issue now is the level of integration and the emphasis on re-use of IP because of cost and time-to-market constraints.

“If you look at the big companies, there is a long legacy of using things so they have a lot more heterogeneous stuff,” said Laurent Moll, CTO at Arteris. “Some of it they got through acquisition. If you were to create a brand new company—and there aren’t many of those these days—with a clean sheet of paper they would most likely pick the IP that is homogeneous. So you might settle on AXI as the dominant protocol, and you might even be able to achieve that today because most commercial IP is available with AXI.”

He said the first reason companies choose a homogeneous interconnect fabric is integration and verification. “It’s easier to have one person be the expert on a team than have to work with a bunch of other experts. It also takes less time to verify, fewer tools, and less time to integrate.”

Also key is performance, but that’s far less of a clear-cut decision because not all IP behaves the same way in different designs. “There are sets of protocols that don’t like to talk with each other,” Moll said. “Even the same protocols sometimes don’t work as well together as you would expect.”

Even more complexity
Just getting these various IP blocks to talk with each other is hard enough. Doing it efficiently is as much art as science. But at the center of any discussion of power there is almost always the interconnect fabric.

“Logically, the longest wires on a chip are in the interconnect,” said Sonics’ Wingard. “You have to get to all four edges of the chip. That’s why interconnect architectures are frequently restructured to decrease the time it takes to get a signal from one side to the other.”

Wide I/O and stacked die are being viewed as a way of dramatically reducing distances on a chip by running them through an interposer. To a large extent, that’s an interconnect problem. With non-uniform memory characteristics, one chip may be one or two ticks closer, which in turn improves throughput and scalability. It also allows designers to load balance data structures and traffic, Wingard said.

The downside of this approach, again, is too many choices.

“The Achilles heel of 3D is too many options,” said Atrenta’s Gianfagna. “You have to reduce the number of choices quickly. So even when you come up with your bus architectures, power domain management is still a big deal.”

Gap Vs. Gap

Thursday, April 26th, 2012

By Ed Sperling
Among tools vendors it’s been standard practice to listen closely to customers but not deliver everything they ask for—or at least not always on the customers’ timetable.

This strategy has worked well enough for both sides in the past, but at 20nm and in stacked die configurations, the level of tension between these two worlds is increasing, and the gaps in the tool chain are becoming more noticeable. Part of the problem is that skyrocketing complexity is forcing more automation, but integration issues, physical effects, process variation and the realities of physics make it more difficult and time-consuming to develop tools to make that complexity more manageable. R&D budgets for EDA companies already are hovering around 30% or more, compared with average R&D investments of about 10% to 20% in other areas of chip development and manufacturing, and betting on the wrong area can have a significant impact on an EDA company’s earnings.

The other part of the problem is that chipmakers’ own internal tools are running out of steam at advanced nodes because of the need to bridge both hardware and software design environments and because old methods of doing things are way too slow and very often ineffective. This is clearly reflected in the fortunes of EDA tools vendors, which have been rising steadily for the past couple of years, with the strongest growth in areas such as ESL, including hardware-software co-design and software prototyping, and emulation.

Add in stacking of die, in both 2.5D and 3D configurations, and the number of issues that have to be dealt with by both chipmakers and tools vendors increases by orders of magnitude. On top of that there are double patterning issues at 20nm, finFETs at 14nm, and potentially 450mm wafers that will require significantly higher yields to be cost-effective, but which may be harder to test in wafer-on-wafer or die-on-wafer configurations.

Where chipmakers see challenges
Riko Radojcic, director of design for silicon initiatives at Qualcomm, breaks the design process down into three areas: design authoring, which is the actual chip design; pathfinding, which includes exploration for how to best build a chip; and tech tuning, which is physical space exploration. Most of the EDA tools have been effective in the design-authoring phase, with some point tools now finding their way into the pathfinding area. But the real challenges are in the tech-tuning area.

Mechanical stress becomes a serious issue in 3D stacks, ranging from the effects of TSVs to die alignment. “You cannot solve the problems with a tool. They have to be solved in a flow,” he said. “It’s a debug nightmare. You need a separate domain that takes external stresses and produces a set of rules. You also need hotspot checking to make sure you have caught all of the interactions.”

Also missing, he said, are EDA tools that understand the materials inside a stacked die, and a standard PDK for all mechanical and thermal properties.

“Thermal is the next frontier,” he said during a presentation at the Electronic Design Process Symposium this month. “You need to manage for hotspots and overall system power. On a global level you have skin temperature and overall system power. On a local level you have to manage hot spots, junction temperatures and power density. And there is also the compounding factor that all advanced systems use some form of thermal management. We need a system-chip co-design methodology and tools to deal with this. We cannot solve thermal issues only at the component level. It must be system and component, and we will need tools for pathfinding thermal issues. We don’t even know where to put our thermal sensors. We need thermally aware floor planning.”

Summing it up, he said it amounts to 3D-aware co-design tools for package, system and thermal, and a flow to integrate everything.

A stacked die of the future; memory on top left, with logic/memory in middle on top and I/O and analog RF blocks on top right. All feed into interposer stack in the middle. Source: Qualcomm.

Altera, which has just developed its first 2.5D stacked FPGA prototype, is encountering similar thermal and mechanical issues. While the company continues to offer scaled down tools for FPGA development, it needs the most advanced tooling for creating those FPGAs in the first place. Topping its list are robust standards for cells, IP and stacked ICs, as well as tools to help quickly identify some of the problems that Altera encountered while developing its prototype stacked die.

Signal integrity issues encountered and addressed by Altera in stacked die using interposers.

“We’re looking for more of a divide and conquer strategy,” said Arif Rahman, product architect at Altera. “Die stacking will be an enabler for a complete solution in the future, but it will not just be an FPGA. It will be an FPGA plus other accompanying functions.”

Where EDA tools vendors see challenges
For the tools vendors, the list of problems that need to be solved is exploding. So many things need to be fixed and solved that it’s imperative just to focus on both what will have the most impact and what will provide the greatest long-term returns.

At least part of that effort involves existing tools, which have to run faster and do more things than in the past. This is particularly true in areas such as emulation, which in the past was used almost exclusively for hardware. It is now becoming the tool of choice for software verification because the complexity of the software makes it far too slow to run using simulation. What takes hours or days in simulation can be measured in seconds in emulation. And given the fact that verification is still the lion’s share of the NRE, anything that can be done to solve this problem is considered a big win.

Mentor Graphics’ announcement this week of enhancements to its emulation tools is a case in point. Recognizing that software engineers are using the emulation tools as much as the hardware engineers, the company has added a virtualization layer that allows a workstation to be a front end—matching the way software engineers work—rather than doing work in a lab the way hardware engineers typically work.

“This allows one workstation per user,” said Jim Kenney, director of marketing for Mentor’s Emulation Division. “We’ve also been working to improve performance and capacity so you have more robust software execution and debug.”

Mentor isn’t alone in this quest. Cadence has been updating its own emulation, and all the EDA vendors have been racing to improve the reach and integration of their tools. Bassilios Petrakis, product marketing director at Cadence, noted that building smaller die that yield better is still a challenge that needs to be solved—particularly before stacking becomes mainstream.

“When you look at multiple die with TSVs, the cons are that the ecosystem is still emerging, there is no volume production yet and there are thermal issues,” Petrakis said.

Samta Bansal, senior product marketing for SoC Realization at Cadence, predicts that stacking memory on logic using an interposer will become mainstream beginning in 2013 to 2014, with TSVs becoming mainstream by 2015. She said work in EDA typically needs to begin three to four years before these efforts, noting that it began in earnest at Cadence in 2009. Synopsys rolled out its 2.5D tool flow last month and is working on a full 3D flow, and Mentor has been working on a variety of areas ranging from test to modeling of stacked die over the past several years.

But EDA vendors also need to pick new areas for the future, and this is where even the best educated guesses become difficult.

“EDA traditionally has been an industry where big companies acquire small companies doing interesting things,” said Wally Rhines, chairman and CEO of Mentor, noting that markets that have shown strong growth include DFM, formal verification, ESL and power analysis. He said the next wave of electrical design challenges includes low-power design at higher levels of abstraction, optimizing embedded software for power, in-circuit emulation, design for test, physical verification, stacked die verification, and system design that extends beyond the PCB.

That concern is echoed by Drew Wingard, chief technology officer at Sonics: “From an EDA perspective, the next layer up in the power hierarchy is how we convince ourselves that the hardware and software are working together correctly. This is a different protocol check than we normally do. You have dependencies, because you can’t turn off one until another turns off. The mix of hardware and software makes it difficult to prove what’s correct. Right now there is not even enough time to test the power management until the second spin.”

He noted that just trying to get software to turn on the power management features in a chip is a challenge. “The thermal/power reduction to be gained by turning on features already in a chip can be significant.”
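The dependency problem Wingard describes, where one block cannot be turned off until another has turned off, can be pictured as an ordering problem on a graph. The sketch below is a minimal illustration using Python's standard `graphlib` module (Python 3.9 or later). The power-domain names and their dependencies are hypothetical, and this is not Sonics' or any EDA vendor's method.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical power domains. Each key may be turned off only after
# every domain in its set has already been turned off.
must_turn_off_first = {
    "memory_controller": {"cpu_cluster", "gpu", "dma"},
    "interconnect": {"memory_controller"},
    "cpu_cluster": set(),
    "gpu": set(),
    "dma": {"gpu"},
}


def power_down_order(deps):
    """Return one safe shutdown sequence, or raise if the rules contradict each other."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular power-down dependency: {err.args[1]}") from err


def is_valid_sequence(sequence, deps):
    """Check that no domain is turned off before the domains it waits on."""
    already_off = set()
    for domain in sequence:
        if not deps.get(domain, set()) <= already_off:
            return False
        already_off.add(domain)
    return True


order = power_down_order(must_turn_off_first)
print(order)
print(is_valid_sequence(order, must_turn_off_first))
```

A checker like `is_valid_sequence` only covers a static ordering. The harder problem raised in the quote is that real sequences are driven by software at run time, which a static check of this kind cannot capture.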

One issue that almost certainly needs attention is derivative designs. Getting them out the door is painful, expensive, and time-consuming.

“A lot of engineering that’s being done is derivative engineering,” said Naveed Sherwani, president and CEO of Open-Silicon. “This is not something that EDA vendors focus on, but it’s something that’s definitely needed. What’s out there is a kluge of methodologies and flows. EDA so far has not woken up to this opportunity. They certainly listen to their customers, but they’re still not close enough. You have to do the work to understand it, and the revisions and changes that are needed are painful. A derivative is almost like a new project. There can be 1 million degrees of improvement here.”

Conclusions
All of this requires tools—notably more and new capabilities built into existing tools—as well as new tools that can integrate all of these pieces. But what gets addressed first is a difficult balance.

While chipmakers at the leading edge are used to developing some of their own tools and methodologies and to dealing with poor yields, their existing development is running out of steam. That means moving forward at advanced nodes and in stacked configurations will require developing entirely new versions of tools and methodologies, an enormous expense by anyone’s calculations.

Qualcomm's proposed tech-tuning flow.

EDA vendors, meanwhile, have their work cut out for them just updating their existing tools, and they are cautious about massive investments in new areas that may not return dividends within an appropriate time frame—or within an immature supply chain when it comes to stacking of die.

“To get ROI back on tools of this complexity you need more than 20 customers,” said Mike Gianfagna, vice president of marketing at Atrenta. “That means you’re going to be negative on that investment for three or four years. So you really have to pick your battles, and small companies probably can’t do this at all.”

Gianfagna noted that for chipmakers the challenge is too many options. “You need a way to prune the solutions space fast. You have to figure out which architectures to choose quickly and which roads to pursue further. The real gap is not in the tech tuning. It’s coming up with the right architecture that supports meaningful decision-making.”

The question now is when the gaps that each side sees will merge, and when it will become profitable enough to take an investment risk.

Blog Review: March 28

Wednesday, March 28th, 2012

By Ed Sperling
Mentor’s Dennis Brophy looks back on the life of the man who first pulled him into the standards world, Don Loughry. It’s a good story and a great eulogy to one of the stars of the standards effort.

Cadence’s Richard Goering examines an all-too-common phenomenon in testing a chip—exploding it. Testing a chip with everything on is a lot different than testing it with the normal functional power. Make sure you check out the photo.

Synopsys’ Navraj Nandra looks at non-volatile memory and why it’s important for smartphones with near-field communications. When you swipe your phone, speed and battery life are critical.

How many TVs can U.S. households hold? Apparently not as many as TV makers would like. IHS iSuppli’s Lisa Hatamiya predicts flat panel shipments will fall for the first time ever this year.

Mentor’s Michael Ford compares the taming of young music students to the orchestration of the chip manufacturing process. Just imagine if Toscanini had been in charge of a 28nm fab.

Cadence’s Adam Sherer digs into verification of power-aware designs and why they should be running low power in every regression test.

In case you’ve wondered where you can augment your verification skills for AMS, Synopsys’ Helene Thibieroz details who’s teaching this summer at UC Santa Cruz. Bring your surfboard.

Mentor’s Mike Jensen rolls out Part 5 of his analog modeling epic, this one focusing on implementation of equations using VHDL-AMS.

And in case you missed the most recent issue of the System-Level Design newsletter, here are some standout blogs:

–Mentor’s Jon McDonald sheds light on cycle-accurate models and why they’re not always necessary or even good.

–Synopsys’ Achim Nohl shares some insights about virtualization and ARM’s big.LITTLE processor.

–Cadence’s Frank Schirrmeister steps back and assesses how many of ESL’s core pieces have moved beyond the early adopter phase.

–Sonics’ Frank Ferro asserts that speed is still the crucial requirement for all SoCs. Damn the torpedoes, full speed ahead.

–Arteris’ Kurt Shuler looks ahead to the coming shakeout in the design industry and who’s going to be affected.

–Atrenta’s Mike Gianfagna compares SoC development to an old video game with much higher stakes.

–eSilicon’s Javier DeLaCruz looks at which companies will be the drivers of TSV packaging.

–Methodics’ Simon Butler expounds on the continuous build approach and why it’s necessary to take SoC design out of the Dark Ages.

SoCs Go Mainstream

Thursday, March 22nd, 2012

By Ed Sperling
The monolithic ASIC, which has been the bread-and-butter of chipmakers for decades, is giving way to systems on a chip among mainstream chipmakers and at mainstream process nodes.

This shift has been overhyped, overpromised and slow to materialize. While SoCs have been common for years in mobile electronics and for high-performance platforms such as gaming consoles, they have always been more expensive to design and manufacture. But at 40nm and beyond (and increasingly even at 65nm and 90nm), physics, an increasing amount of software and the inclusion of more third-party IP are forcing changes in best practices for designing chips. And as the industry heads into 2.5D stacking over the next couple of years, subsystems that can be part of systems in package will add even greater emphasis, as well as some new wrinkles, to the shift.

“It’s happening now and it will continue to happen,” said Tom Lantsch, executive vice president of corporate development at ARM. “We’re seeing application processors that are heterogeneous multicore on the same chip with graphics engines and video engines and they’re now running Symbian instructions. A lot of this shift is based on power. There’s a realization that you can do things other ways more efficiently.”

So what exactly is the difference between an SoC and an ASIC? The common definition is that an SoC includes one or more processors plus software and peripherals, making it a complete system rather than an ASIC, which is suited for a very specific task.

“The ASIC customer used to be the system house,” said Hans Bouwmeester, director of IP at Open-Silicon. “But now the system houses and fabless semiconductor companies are focusing on horizon tasks. It’s not divided by front end and back end anymore. It’s horizontal and vertical, which is re-use or availability of IP and competence. If you look at ARM’s chips, they’re applicable across multiple domains and customers are willing to outsource that development to them.”

This shift hasn’t been lost on Open-Silicon or eSilicon, both of which are shifting from an ASIC to an SoC approach. And both say the SoC world will explode once the industry begins adopting 2.5D stacking over the next couple of years, a move that also may include more emphasis on FPGA platforms as part of the 2.5D stack.

Partition issues
At least part of what an SoC brings to the design table is flexibility. There is an ability to try different things, and at each new process node more room to experiment. But silicon is never free, even if it is available. Shrinking feature sizes creates its own set of problems at each new process node.

The typical method of dealing with these problems is a “divide and conquer” approach. If there are 500 blocks, those blocks can be aggregated according to function, shared resources, or some other scheme. But in an SoC, finding the right line on which to base that partitioning is more difficult. Even worse, it can change, depending upon which market a chip will serve.

“If you do a flat design you always get the best quality,” said Sudhaker Jilla, product marketing director at Mentor Graphics. “But as the chip grows the runtime becomes unbearable. It can go from hours to more than a week. The alternative is to use a hierarchical approach, but then you have a problem of performance. You want the turnaround time of a hierarchical flow, but the quality of a flat one.”

The reality is both are needed for SoCs, but that also means a significant learning curve for the design teams. They need to learn new tools and figure out how to partition their designs, whether by blocks, geography, or IP.

“The key is that companies need to figure out how to divide and conquer,” said Jilla. “Will it be dual-core or quad-core? Or will it be multiple different cores?”

More tools, more IP
For EDA and IP vendors, this is only good news. Selling to the biggest chipmakers has always been lucrative, but continuing to sell to those same customers while also adding incremental business is a big win. FPGA tools have been sufficient, for example, to do basic layout and verification, but put that same FPGA into an SoC or a stacked die configuration, add software and third-party IP, and then try to integrate it all together and the complexity easily outpaces what the typical FPGA tool can do.

“The biggest trend is that people are spending 35% to 40% of their effort writing software,” said John Koeter, vice president of marketing for Synopsys’ solutions group. “When you get down to 28nm or 20nm, companies are spending more than 50% of the time to market developing software. If you look at an SoC today, it’s usually two to four host CPUs, two to four GPUs, and it’s increasingly heterogeneous.”

He said that opens up huge opportunities for linking software to hardware, and virtualizing the hardware and software. It also opens up opportunities for IP, tools to help integrate that IP, exploratory tools that can show the tradeoffs at the architectural stage, and a suite of verification tools and verification IP.

“Just from a verification standpoint you’ve got to tackle this at several levels,” said Pete Heller, senior product line manager at Cadence. “You’ve got to look at it from the subsystem and block level for functional reasons. And you’ve got to look at the full SoC and pump real data through the system so you can get as much real-life validation as you can. Then there’s a third level, which is to put it into the hands of 100,000 people and let them be the guinea pigs after you’ve already worked out all the bugs you can.”

What is a subsystem?
That leads to the next phase of this whole development scheme: fully integrated and tested subsystems, which are expected to begin hitting the market over the next year in preparation for more SoCs and 2.5D stacked die.

“If you look back 10 years when Gartner was tracking design starts, in 2000 there were about 20,000 chip designs a year,” said Drew Wingard, CTO at Sonics. “Now we’re seeing more SoCs because you have processors sprinkled around the chip that may or may not even show up in the bill of materials and that you may or may not have access to.”

Increasingly, those pieces will be combined into fully integrated systems that include IP, possibly processors, and perhaps even shared resources such as memory with standardized interfaces. That approach will become particularly useful when chips can be stacked, either in 2.5D or 3D, and it will completely render the number of design starts meaningless. There will be more design starts, but the final outcome may be subsystems rather than chips—or chips that are part of a stack rather than the fully integrated stack itself.

“A general-purpose processor may not be the most efficient way to accomplish a task,” said Wingard. “This has led to a huge discussion around subsystems. Not everyone believes each function needs a processor. But how independent is a subsystem going to be? You can quickly get into a situation where you have enough performance most of the time, but there may be specific and critical sequences where you don’t have enough.”

There has been a lot of talk about subsystems across the industry lately, and companies are positioning themselves to take advantage of this shift. But the challenges of making this all work are huge.

“This is similar to the challenge embedded companies have faced for a long time,” said Simon Butler, CEO of Methodics. “It’s one thing if you’re dealing with a homogeneous environment where the tools talk together. But when you have to bring all these different pieces together and make sure all the parts are aligned, it’s going to be very difficult.”

Past, present and future
Still, the road to SoCs has been set and it’s gaining momentum. That became very obvious at the Consumer Electronics Show over the past couple of years.

“What’s changed is the user experience is now a combination of hardware and software,” said Mike Gianfagna, vice president of marketing at Atrenta. “We’re seeing the consumerization of electronics. The idea isn’t new. Joe Costello was talking about this a decade ago. But it’s finally happening. The semiconductor content is enabling the user experience.”

That will only increase as future designs allow more choices of IP, software, processors and ultimately subsystems on a chip—and more intelligent tradeoffs to make it all work faster and cheaper while using less energy.

Coherency Becomes A Stack Of Issues

Thursday, March 22nd, 2012

By Ed Sperling
As complexity increases and the industry increasingly shifts away from ASICs to SoCs, the concept of coherency is beginning to look more like a stack of issues than a discrete piece of the design.

There are at least five levels of coherency that need to be considered already, with more likely to surface as stacked die become mainstream over the next few years. Perhaps even more mind-numbing, this stack itself will have to take on a level of coherency over the next couple of generations of chips.

Let’s take a closer look.

Cache coherency
The concept of keeping data coherent historically was relegated to processor makers such as IBM, Intel and AMD, which have focused on improving performance through faster access to data. One solution to that improved performance has been multithreading and multiprocessing. Along with that, these vendors have added in various levels of cache memory for faster recall of important data.

More cores also make it harder to use these caches effectively. Data has to be kept consistent, which requires more system overhead in terms of processing and power just to maintain that coherency. And it gets even harder as more cores are added into an SoC, which increasingly are not the same size, do not run at the same frequency, and sometimes do not even connect directly to the main CPU.

“With cache coherency, some of the traffic may be serviced by the cache on another GPU,” said Drew Wingard, CTO at Sonics. “If you’re just using an ARM core, the CPU coherence is sufficient. But the GPU uses its own local memory. You really want it to be fully cache coherent across all of those.”

But even finding the data to maintain consistency may be a problem in a complex SoC.

“You can view what’s in memory, or view it and be able to change what’s in memory, but first you have to find it,” said Kurt Shuler, vice president of marketing at Arteris. “If you have four cores, the most efficient way to hook them up is for each core to have its own cache and graphics to have its own cache. If you change something, you have to snoop in all the caches to make sure it’s consistent.”

But there is also a move in the completely opposite direction—sharing memories among multiple cores—because it reduces the number of components on the bill of materials. The Low-Latency Interface specification from the MIPI Alliance is a case in point, where a memory can be shared between a modem and an applications processor. Intel, meanwhile, has added on-chip graphics that share memory with the CPU.

“The whole design gets more complex,” said Shuler. “You have more traffic beyond the cores, and from a power standpoint the overhead goes up.”

Still, cache coherency is one of the better-understood pieces of this stack. It has been an issue ever since multiprocessing was first employed in the 1960s. “Snooping” has been widely used since that time.
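The snooping idea Shuler describes can be sketched in a few lines. This is a minimal illustration, assuming a write-through, write-invalidate scheme on a single shared bus; the `Cache` and `Bus` classes and the core names are hypothetical and do not model any particular vendor's protocol.

```python
class Cache:
    """A tiny private cache mapping address -> value (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_invalidate(self, addr):
        # Another core wrote this address: drop our stale copy if we hold one.
        if addr in self.lines:
            del self.lines[addr]
            return True
        return False


class Bus:
    """Shared bus that every cache watches (snoops) for writes."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)
        return cache

    def read(self, cache, addr):
        # On a miss, fill the line from main memory.
        if addr not in cache.lines:
            cache.lines[addr] = self.memory[addr]
        return cache.lines[addr]

    def write(self, cache, addr, value):
        # Snoop every other cache and invalidate stale copies,
        # then update the writer's cache and memory (write-through).
        invalidated = [c.name for c in self.caches
                       if c is not cache and c.snoop_invalidate(addr)]
        cache.lines[addr] = value
        self.memory[addr] = value
        return invalidated


bus = Bus(memory={0x10: 1})
cpu0 = bus.attach(Cache("cpu0"))
cpu1 = bus.attach(Cache("cpu1"))
gpu = bus.attach(Cache("gpu"))

bus.read(cpu0, 0x10)                    # cpu0 and gpu both cache the old value
bus.read(gpu, 0x10)
invalidated = bus.write(cpu1, 0x10, 7)  # cpu1's write must snoop the others
gpu_sees = bus.read(gpu, 0x10)          # gpu misses and refetches the new value
```

Every write touches every other cache, which is the traffic and power overhead the article refers to; real designs use directories or snoop filters to cut that cost.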

Software coherency
A newer facet of coherency involves embedded software. Because SoCs now include an increasing amount of software in the design, engineering teams now have to wrestle with coherency issues that previously were dealt with by the operating system.

“Fundamentally you’ve got two combined issues here,” said Andy Meyer, verification architect for Mentor Graphics’ Design Verification Technology Division. “You’ve got cache coherency, where the same data is being viewed in a couple places. And then you’ve got an issue with consistency in the simple code in a uniprocessor that now has to run on a second processor. The ordering of events can change in multiprocessing.”
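Meyer's point about event ordering can be made concrete with a small sketch. It assumes two cores that each run the same "load, add one, store" sequence on a shared counter, and it simply enumerates every interleaving that keeps each core's own program order. The operation names and the `run` helper are hypothetical, and no real memory model is implied.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving of load/store steps on a shared counter."""
    shared = 0
    regs = {}                       # one private register per core
    for core, op in schedule:
        if op == "load":
            regs[core] = shared     # read the shared value into the register
        else:                       # "store": write back register + 1
            shared = regs[core] + 1
    return shared


# The same "simple code" on each core: counter = counter + 1
core0 = [(0, "load"), (0, "store")]
core1 = [(1, "load"), (1, "store")]

schedules = list(interleavings(core0, core1))
outcomes = sorted({run(s) for s in schedules})
```

On a uniprocessor the two increments run back to back and the counter always ends at 2. Across the six possible interleavings here, the final value can be 1 or 2, which is why code that was correct on one core needs explicit synchronization on two.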

Those problems crop up regularly in verification, but not always with the expected results. It’s difficult to effectively write the stimulus in a testbench for coherency. What happens, for example, when a core is shut down to save power?

“The scariest part is when there is no OS support,” said Meyer. “There’s also a big problem with heterogeneous cache, such as when you have a CPU working with a GPU.”

Another issue has to do with effective coverage in verification, already a problem for complex SoCs. States frequently are distributed across multiple chips and multiple boards. Timing varies from one state to another, and can be particularly problematic if snooping functions are tied to a state. And parallelism continues to baffle even the most advanced teams.

“Standard coverage methods don’t work well here,” said Meyer. “You have to query in ways you traditionally didn’t have the power to query and ask questions across months of regressions. For instance, ‘Have we been here ever—or in the last two months.’ Until coverage steps up, people with deep knowledge of verification running hundreds of full-time emulator systems are finding out at the last minute that it’s not okay to ship.”
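The kind of query Meyer describes ("Have we been here ever, or in the last two months?") might look like the following sketch. It assumes coverage hits are logged per regression run with a date; the run log, the coverage point names and both helper functions are invented for illustration and do not come from any tool.

```python
from datetime import date, timedelta

# Hypothetical regression log: (run date, coverage points hit in that run).
runs = [
    (date(2012, 1, 5), {"snoop_hit_while_core_off", "dma_write_to_cached_line"}),
    (date(2012, 2, 20), {"dma_write_to_cached_line"}),
    (date(2012, 3, 15), {"gpu_reads_dirty_cpu_line"}),
]


def ever_hit(point):
    """Has any regression run ever reached this coverage point?"""
    return any(point in hits for _, hits in runs)


def hit_since(point, today, days):
    """Has the point been reached within the last `days` days?"""
    cutoff = today - timedelta(days=days)
    return any(point in hits and when >= cutoff for when, hits in runs)


today = date(2012, 3, 22)
stale = ever_hit("snoop_hit_while_core_off") and not hit_since(
    "snoop_hit_while_core_off", today, 60)
```

A point that was reached once in January but not since is flagged as stale, the sort of gap that a single-run coverage report would not show.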

I/O coherency
Tied in with both cache coherency and software coherency is I/O coherency. Increased communication on a chip, between chips, and between a chip and the outside world, have turned what used to be a relatively straightforward networking issue into a complex jumble of prioritization and synchronization.

“You have to deal with this even in single processors,” said Sonics’ Wingard. “You may have a PCI core streaming data into memory. Today, without I/O coherence, it’s difficult to determine what is coming in. The CPU has no way of knowing what was transferred when it does a copy from non-cache to cache.”

He noted that personal computers had I/O coherency for a long time, particularly with direct memory access. DMA was developed initially to help solve the bottleneck that occurred when a CPU was involved in an I/O transfer. Rather than tie up the CPU with that transfer, the CPU continued running, then accepted an interrupt when the transfer was completed.

But with more of this being moved onto a chip, keeping coherency while moving data back and forth from more places is becoming much more difficult.

Ecosystem coherency
One of the least addressed facets of the coherency stack involves business and communication issues across a supply chain for a particular SoC rather than the actual technology itself. Even where competitive suspicions can be overcome, the very different approaches taken for designing components, IP and software, as well as language barriers, create one of the more difficult and less tangible challenges in the coherency stack.

“The challenge going forward is that you have a bunch of people who may not be that skilled in system development driving the chip and spec for one design, and other suppliers trying to orchestrate things,” said Mike Gianfagna, vice president of marketing at Atrenta. “So you bring them together to solve a problem for one customer in 12 weeks and then they move on. You’ve got corporations coming together and bringing all these pieces together almost like the way a movie is done. But is there a coherent way to communicate data and information risks and still provide good visibility from a power/performance/area point of view?”

For decades this task has been handled by IDMs, but in the SoC world there are far fewer IDMs these days. Many of these chips are built using third-party IP such as cores from ARM or MIPS, DSPs from companies such as Tensilica, and standard IP from the Big Three EDA vendors.

Coherency in stacked die
It’s uncertain whether stacking of die, either in 2.5D or 3D configurations, will make coherency easier or harder. The answer is likely to be a little of both.

“With 2.5D and 3D, you’re looking at low-power memory access,” said Arteris’ Shuler. “You put the DRAM closer to the CPU, the addressing is wider and you get rid of some of the latency. But you also need coherency across all of this.”

No one is sure yet how multiple high-speed communication channels between die will affect coherency. If the channel between the cores is wider and shorter, that will improve data speed. But if processors and DRAM are scattered on multiple die, with some of them shut down, some partially shut down, and others fully active, it may be harder to keep track of data and make sure it is all synchronized.

Managing Complexity With Advanced Packaging

Thursday, March 22nd, 2012

By Ann Steffora Mutschler
Engineering teams across the globe continue to pound the process geometry treadmill to stay on the curve of Dr. Moore to achieve better speed or lower power or smaller die, and it all adds up to increased complexity in the design and packaging. However, with advanced forms of die stacking such as package-on-package, system-in-package, 2.5D silicon interposer technology and other techniques, engineering teams now have more degrees of freedom around how chips are constructed.

A significant consideration in moving from one process generation to the next is that there are many IP functions that must migrate. “Sometimes it’s too expensive to port it from one generation to the other and you may not need it as far as the speed or as far as the power,” noted Shafy Eltoukhy, vice president of manufacturing operations for Open-Silicon.

This is where advanced die stacking comes into play. The engineering team may consider going to 28nm for one particular aspect of the function—for example, to get a better speed in the ARM processor—while there are a lot of other interfaces for a particular die that may not have to be in that advanced process node. A USB 2.0 or 3.0 does not have to be in 28nm to achieve the requirements—it could be in 90nm or 40nm, he said.

“The whole notion of re-using IP is common, though something not as commonly discussed is the reusability of die. What we’ve been seeing a fair amount of is companies saying, ‘I’m going to use advanced packaging techniques that are available today and I’m going to take this older generation die that I’ve got sitting on the shelf. And I’m going to make a much smaller new chip to complete it or extend it or interface to it. And I’m going to put that all into a multi-chip module, or advanced packaging structure, and circle back and use a lot of the IP that is in actual hardware form and make that available.’ It’s not mainstream, but reusing IP 15 years ago wasn’t mainstream either,” said Jack Harding, president and CEO of eSilicon.

Engineering teams tend to have a certain function they really want to squeeze and go to the next generation, but there are a lot of other functions in the design that don’t have to be in the latest generation, Eltoukhy observed. In advanced SoCs, customers are paying first and foremost for the IP development. “You are paying more dollar-wise per silicon area for a function that does not have to be in 28nm.”

What process node makes sense
Naturally this leads to a discussion about not bringing every single function into the next generation, especially because some analog and RF functions do not scale very well. So why not stay in the previous generation and partition the design in order to leverage older technology where available and not re-invent it?

“What I have to do instead is some kind of interface between this technology and the new technology. I put only the function that I want in the technology that can handle it and leave the other somewhere else,” he noted.

The question then becomes how to connect these together. “You certainly can connect them on the package level, which people used to call MCM (multi-chip module). You can actually get multiple die and bolt them in the substrate of the package and connect them. But the package technology has been way, way behind compared to the silicon technology, and you may end up with much higher power and slow interfaces and so on,” Eltoukhy explained. This has led to the development of silicon interposer technology in order to replace the substrate interconnect or the package interconnect, which is commonly known as 2.5D stacking.

Essentially, silicon interposer technology connects one die to another instead of connecting to a package, thereby reducing power and improving speed. Xilinx already has made its version of 2.5D-stacked technology available with certain product families.

Another use of 2.5D would be in a processor design that needs to talk to a DRAM, he continued. “Most people have a DDR interface and you go through the board to interface with the memory. But this approach is slow and large. Instead of buying a DRAM package from a DRAM vendor, we ask the vendor to sell us a known good die, which can be attached with processors on an interposer so you don’t have to go outside the chip. The DRAM can talk to the processor right away and the form factor will be much, much smaller. So there are multiple applications for that interposer—mixing the process nodes so that you can reduce the cost and so on, and improving the yield or bringing up some known good die from the DRAM to your die.”

“The application processors, which are really only delivered with package-on-package memory, end up with a very easy knob in that system—they can pile on different amounts of DRAM. To them it’s almost the same design and it is the same software. A couple of bits different in the software and suddenly they’ve got a new derivative part,” said Drew Wingard, CTO of Sonics.

“In many cases the die itself has more package attachment or wire bonding sites than the package may have pins, so you may take the same die and put it into a different package with different amounts of I/O resources available, and then sell those chips—even though they are the same fundamental chip design—at different price points. That’s been going on for a long, long time but with some of the more advanced packaging technologies, there are new degrees of freedom there,” he added.

While sounding tantalizing, all of these options are still under development. Complicating widespread deployment are two factions in the industry at odds as to the right path forward. On one side are the semiconductor foundries, which would like to enable customers to use an interposer because, at the end of the day, they want to sell more dies to put on the interposer, Eltoukhy explained. “They say, ‘We can give you the interposer but you buy the dies from us and we can glue it together for you.’”

In another camp are packaging providers such as Amkor and ASE that fear losing business to the foundries and would also like to offer the interposer to their customers so they won’t go and do the interposer with their foundry. “These two camps are fighting now because it requires some investment from a capex point of view,” he added.

Managing complexity, saving dollars
In addition to dealing with complexity, advanced die stacking techniques can save big dollars, eSilicon’s Harding asserted. “You could measure it just in terms of NRE dollars, you could measure it in engineer years of work, you could measure it in terms of time to revenue. By any metric, going down the advanced-package, multi-die solution is better by two orders of magnitude than just actually making a new chip, and I would argue it’s probably better by one order of magnitude than just doing RTL modification, which still has high NRE and a lot of technical risk, albeit you have a product that is closer to being the final product. These decisions are classic risk-reward.”

Blog Review: Feb. 29

Wednesday, February 29th, 2012

By Ed Sperling
Synopsys’ Hezi Saar digs into the future of mobile devices—dropping some hints about what’s still to come. Expect more granularity and customization. This should raise the stress level inside IT departments. Forget going postal. The new phrase may be “going terminal.”

Cadence’s Richard Goering reports on the Accellera town hall meeting for the future of EDA standards. Standards are important. So was this meeting. But can you imagine if these people really comprised an entire town?

Mentor’s Colin Walls is headed to Embedded World in Nuremberg, Germany. We expect a full report. Embedded software, as design teams well know, is becoming a very big deal in IC design.

DeepChip’s John Cooley has compiled a list of Magma tool users voting to save those tools post-merger. Given the fact that the merger was completed last week, the ball is in Synopsys’ court.

TLM Central’s Tom De Schutter interviews Alex Braun of the European SystemC User’s Group. Given the fact that the early adopters of ESL were in Japan and Europe, this is a good sounding board for what’s going on in system-level design.

Cadence’s Frank Schirrmeister looks at virtual divides and fixed subsystems. As customers demand pre-integrated and pre-verified IP, subsystems will grow in importance.

Mentor’s Harry Foster talks about Mentor’s Verification Academy for UVM. With all of the Big Three EDA vendors now on board with UVM, it will be interesting to see where they carve out their differentiation.

Synopsys’ Helene Thibieroz details what happened at the HSPICE special interest group event during DesignCon. If you missed it, here’s the short version.

And in case you didn’t see the most recent System-Level Design newsletter, here are some standout blogs:

–Synopsys’ Achim Nohl digs deep into pre-silicon SoC bring-up and why models make it easier.

–Cadence’s Frank Schirrmeister examines the differences between software and hardware developers and how to bridge the two worlds.

–Sonics’ Frank Ferro predicts we will never stop talking about SoC architectures for mobile devices.

–And Arteris’ Kurt Shuler compares C2C and MIPI LLI on the path to stacking of die.
