Posts Tagged ‘ARM’

Experts At The Table: IP

Friday, March 23rd, 2012

By Ed Sperling
Low-Power Engineering sat down to talk about IP with John Goodenough, vice president of design technology and automation at ARM; Simon Butler, CEO of Methodics; Navraj Nandra, senior director of marketing for DesignWare analog and mixed signal IP at Synopsys, and Neil Hand, product marketing group director at Cadence. What follows are excerpts of that discussion.

LPE: The supply chain needs to function almost like an extended IDM model, right?
Goodenough: Yes, but it’s not a new concept. All of this was done in the automotive industry 25 years ago. The semiconductor industry is transitioning there. People are trying to manage the risk of taking a product out, and they are dependent on a lot of moving parts. Their goal is to understand and manage the risks.
Butler: But it needs to come in a consistent way. You don’t want to be on a plane every week. You need to have an abstraction that gives you the visibility you need without having to have a VPN license.
Hand: You need to set up a hosted design chain for the customer. Everyone is working within that common collaborative environment so that when something goes wrong it can be quickly addressed. As there are new revisions, they automatically drop into that environment and the customer sees them. That’s a trend that’s happening now.
Butler: That might be true if you’re both working on parts of the SoC. But if you’re a systems house and you’re assembling, then it’s a different tool set.
Hand: That’s correct. But the trends in the microcosm of IP are beginning to move into that realm, as well. Once you get into a system context, the EDA/VIP world doesn’t really fit into a system environment with their supply chain. That’s a challenge we have to resolve.

LPE: How does that affect growth of IP?
Hand: It affects everything up and down in the stack. It goes down to integrating into the SoC with RF, RF-like technologies such as optical, data converters, analog—all of that is starting to come as IP instead of standalone chips. The software and firmware stacks are more of an IP area. And once that gets solved, the next thing is how you build that into the system level models and supply chain models that are required for that. But we’re at such a low level on the IP side that there’s a lot of integration that has to happen.
Goodenough: I just came from Linaro. When we look at the new IP, it’s the software IP and analog IP. It’s the next logical thing to do. It’s the up-and-coming thing where people are looking to reduce cost. It’s no longer a real differentiator, so you just outsource it. But then you have to look at managing those software communities. It’s open-source software communities and making sure the platform and the instruction set and the memory maps of the platform architecture are being consistently reflected up to the software community and the operating system guy, so that when you plug those things together they work.
Hand: And in some cases that IP may become standalone and part of a 3D stack, in which case you have to manage that whole supply chain. How do you get that integrated onto the stack? In some cases, because of cost, risk or performance, you may not want to integrate some of this IP natively into the SoC.

LPE: Analog is a classic example of that, right? In 2.5D, you may want a whole separate chip at an older process node.
Nandra: Yes. We do build stuff that integrates into more nodes, but we also have customers that would like to put their analog into a 65nm power management IC, including the rest of the interfaces, and then the rest of the SoC at 20nm or 14nm. At 65nm, the power management is leading-edge technology. There are still some design challenges, although they’re not so difficult if you’ve worked at that node before. The point about stacking and packaging is quite interesting. From a signal integrity perspective, a lot of these things become easier. You don’t have long wires or cables anymore. You just have some communication going through the software and a via. The challenge becomes thermal dissipation between the substrates. You have a substrate at 65nm and one that’s at a smaller node with very different thermal characteristics. Someone has to figure out how to widen the memory lines so they don’t fuse together.
Goodenough: It’s a new context. They’re just wires, but they’re wires done in a different way.
Hand: The other challenge is a business one. Who owns the risk? If you have Wide I/O and a Wide I/O memory chip, the memory chipmaker says this is a known good die but it couldn’t be tested out on the landing pads. So it gets thinned out, stacked on seven other dies and then you finally do a test of the whole stack. It doesn’t work. Who owns the problem? You’ve got eight memory chips and an SoC, and it’s packaged and pinned out. Who owns the cost?
Nandra: From a practical packaging perspective, all of these technologies in 3D IC and Wide I/O are really expensive. We’ve had similar discussions on Wide I/O. It’s a throwback technology with significant performance, but you have to invest heavily in your package. When it comes down to high-volume parts, people aren’t going to pay the money for this.
Hand: It depends. It’s the overall cost of the system that’s important. If you can get the overall power down and performance up, companies will invest. We’ve got customers investing in this now because it’s a way of differentiating. If you’ve got smartphone SoC vendors and they can differentiate with better power and performance and win the socket, they’ll do what they have to.
Nandra: With Wide I/O, that roadmap has been pushed out as people try to make LPDDR meet that requirement. Today, JEDEC is looking at LPDDR 4. That will push out Wide I/O further. From a technical perspective I totally get why big companies are looking at this technology. They’re also looking at fully depleted SOI, for example. But it’s not going to make it into a tablet or smartphone.
Goodenough: It’s a question of when the cost is right.
Hand: For many customers, LPDDR 3 will solve their immediate problems. But if you look at the trend, this is already happening. To get terabit per second performance out of memory you have to go to stacks. It’s not a question of whether the cost equation will work for this. It’s just a matter of whether it’s next year or the year after that.

LPE: We’re looking at a complete bifurcation of the market—those who do massive volume versus those who work in lower volumes.
Goodenough: It’s not so much volume as how much you can recover from your investment in how you make your silicon. Whether that’s a micro on a board, a processor in an FPGA, a custom chip, all that matters is how much profit you’re recovering. If you can only recover a wafer-thin margin you’re not going to be investing in new technologies.
Hand: Then you need volumes of 50 million units a year to get your money back.

LPE: But this does play into subsystems, because you can integrate all the pieces and achieve much greater volume, right?
Goodenough: It’s no different than boards. You’re seeing that happening in SoCs and FPGAs. Whether it’s going onto a board, a hard block in an FPGA, a soft fabric in an FPGA or a custom ASIC, they’re all basically different compile points that end up as a piece of silicon with a different price and a different energy envelope. And if you go to China with a standard part, you can probably turn a board around in two or three days. That’s a big difference from spending 2.5 years doing a custom IC.
Hand: You’ll have much of what was on the board in an SoC itself. Whether that’s integrated into an SoC or a stack or just integrated parts, it depends.
Goodenough: But if I’m a customer I don’t really care about that. I only care about how much power does it use, how much does it cost and what’s the form factor.
Hand: There have been many chips in the telecom world that make no sense to manufacture if you measure them by consumer SoC metrics, such as how many units have shipped. But the value you get out of each of those means they can make the economics work. Going back to context, there’s an economic context that people building a system are operating in. If you’re providing IP, you have to provide the right deliverables in the economic as well as the technical context. That’s what will drive subsystems more than anything else, too—the economic context.

LPE: And time to market is part of that economic equation?
Hand: Yes. If most of your chip is good enough to get the job done and you can do it in a few months of integrating extra pieces, while assuming everything you didn’t touch works perfectly, that’s a compelling argument.
Nandra: We see that with smartphones and tablets. In China, customers are starting to get into the tier-two markets. They’re all about derivatives. The idea is that you do reduce the cost of the IP.
Goodenough: You have to maintain IP, though. The IP may be fixed but the context is evolving. You have to evolve your IP as that context changes from four-layer boards to two-layer boards, or 32nm to 14nm. IP has a long lifetime and you have to anticipate where it’s going to land.
Butler: What’s particularly interesting is the IP view of defect tracking. A defect in IP never really goes away. There’s always somebody using it somewhere, and you need to know. It’s different from software where the lifetime is project-based. What we need is something that tracks bugs in IPs that goes into a system context so you get all your dependencies.
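One way to picture the dependency-aware defect tracking Butler describes is a walk over the design hierarchy: given a defective IP block, find every block or system that instantiates it, directly or indirectly. The sketch below is only an illustration under assumed data; the hierarchy, the block names and the `affected_by` helper are hypothetical and do not come from any tool mentioned in the discussion.

```python
from collections import defaultdict, deque


def affected_by(defective_ip, uses):
    """Return every block that directly or transitively instantiates defective_ip.

    `uses` maps a block name to the list of IP blocks it instantiates.
    """
    # Invert the graph: for each IP, which blocks instantiate it?
    used_by = defaultdict(set)
    for parent, children in uses.items():
        for child in children:
            used_by[child].add(parent)

    # Breadth-first walk upward from the defective IP.
    affected, queue = set(), deque([defective_ip])
    while queue:
        current = queue.popleft()
        for parent in used_by[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical design hierarchy; every name here is invented for illustration.
uses = {
    "phone_soc": ["cpu_subsystem", "usb3_ctrl"],
    "tablet_soc": ["cpu_subsystem", "pcie_ctrl"],
    "usb3_ctrl": ["serdes_phy"],
    "pcie_ctrl": ["serdes_phy"],
    "cpu_subsystem": ["l2_cache"],
}

# A bug filed against the PHY reaches both controllers and both SoCs.
print(sorted(affected_by("serdes_phy", uses)))
```

Because the walk starts from the IP rather than from a project, the defect record stays attached to the block for as long as anything still uses it, which is the lifetime difference from project-based software bug tracking that Butler points to.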

Experts At The Table: IP

Friday, March 16th, 2012

By Ed Sperling
Low-Power Engineering sat down to talk about IP with John Goodenough, vice president of design technology and automation at ARM; Simon Butler, CEO of Methodics; Navraj Nandra, senior director of marketing for DesignWare analog and mixed signal IP at Synopsys, and Neil Hand, product marketing group director at Cadence. What follows are excerpts of that discussion.

LPE: Are we seeing a blurring of the lines between design teams because of the shift to more third-party IP?
Hand: Customers want off-the-shelf IP. But it’s a very big distinction between becoming part of their design team and working with them all the way through, versus working on an outsourced basis. Customers want predictable, proven IP.
Butler: It may take a month to deliver the library, and then there are all kinds of bugs. So you don’t ship the library anymore. You work in a common environment.
Hand: Most customers don’t have to deal with that level. But the changes you make to hard IP should be zero.
Nandra: In terms of the feature sets and configurations, when it comes to physical IP it’s a well defined set. The work is done by the IP vendor. They’re proving it on silicon. And then what we ship is a black box.

LPE: Isn’t all IP a black box?
Nandra: The softer it gets, the more configurations customers expect.
Hand: With soft IP such as a memory compiler, you essentially have a compiler for that memory compiler. It’s still delivered as code to a customer. They have to do the implementation, so it’s more of a gray box.

LPE: But the trend is still away from touching that, right?
Hand: You still need to go through the physical implementation. You need to take their design context and optimize it or you won’t be able to hit the power, performance and area targets.
Goodenough: You need to supply the IP with the recipe to get people to the context. We provide soft IP and we provide the recipe to go through an implementation flow.
Nandra: The customer touches the IP, but the goal is to configure it.
Goodenough: They’re touching it in a carefully constrained way.
Hand: Some of these IP blocks have a huge number of configurations, so no two deliverables will be exactly the same.
Butler: And you don’t want to create part numbers for these things. It’s better to ship the IP with the tool that helps configure the IP.

LPE: One of the evolving trends in design these days is more rationalized use of resources—only what’s needed. A second trend is to do more exploration, comparing different IP and different implementations. Are those trends in sync?
Hand: It depends on the risk associated with configurability. On soft IP they want configurability because there’s a lower perceived risk. There will be some parts where if you touch the core there’s too much risk. There’s always a tradeoff between configurability and risk.
Goodenough: If you look at an ARM core, we’ll let people modify the cache sizes and take a gross functional block and turn a SIMD accelerator on and off. If you go and play with a new bus topology for an SoC, you have to go validate it, but there’s a lower perceived risk around that. One trend we do see, which our customers are pushing, is to provide larger subsystems. It’s not just the core, but a core and a memory subsystem that’s ready to go out with software that is known to run on it, including the big.LITTLE switches. They can adopt the correct risk profile that they want.

LPE: That’s basically managing context internally, right?
Goodenough: Yes. Given the amount of effort they are putting into validation, whether it’s logical validation of systems or signoff, or going through the implementation floorplans, they’re equating time to market with something that is known good.
Hand: Unless there’s a material impact. Your customers want to focus on their differentiation. If you can give them something that’s proven and working and not going to impact their differentiation, they’ll take it as it is. If they can turn a knob and lower power, then they want that knob to turn. If you look at SoCs today, about 80% of them are the same. What’s different is how they’re configured, how they’re balanced and how they’re mapped to their customer application. That’s the secret sauce they bring to the table.

LPE: What happens to the IP industry if we’re pushing into larger and larger subsystems that are more contextually aware?
Hand: There will be increasing consolidation. The cost associated with building IP is going up. You can’t bring a small piece of IP to market that people will bank on for a 28nm or 20nm chip. That will be a natural process of maturing of the industry.
Goodenough: I agree. It’s a scale problem: the number of people you’re trying to supply while staying on the edge of physics and software. Only a company of scale can do that.
Butler: But the cost of entry for startups is down.
Goodenough: Yes, and you can still do an IP model as a one-off for one company because your context is constrained. You can still see a lot of innovation where there is a constrained problem. The question is how you scale from one to many. That’s a challenge for the classic IP industry.
Hand: It’s similar to what happened in core EDA. Everyone said EDA startups were dead because it wasn’t scalable. EDA startups continue to happen, but once it gets to a point of scale, if it’s viable technology then one of the larger EDA companies acquires them. It will be similar for IP. Once it becomes viable and needs scale, either they become a major player with a large investment, something that’s unlikely, or they will be bought up by another company.
Nandra: We see a lot of companies that want to go to a complete chip, and through that process realize they have a lot of valuable IP. They get on the radar of other companies that need that sort of function. Most companies don’t start out as IP companies. They start out doing design services or with aspirations of becoming a chip company, and through that process they build a lot of interesting functions that are valuable to other people.
Goodenough: You see a lot of technical innovation in function.
Hand: If you look at interface IP, that’s one area where there has been a lot of consolidation. Five years ago there were a lot more companies doing standards-based IP. That’s shrinking rapidly. But there are other areas where there isn’t the level of standardization, such as analog front ends. There is a lot of innovation there.
Butler: OpenAccess was a good way of bringing EDA vendors onto a single platform. That drove innovation because it meant startups could be on the same database as Cadence and Synopsys and it was easier to plug their tools in. Will there be something similar in the IP world?
Goodenough: That was the original idea behind IP-XACT, which is a meta-data standard. There are some very successful things being done off the back of that, such as the ability to define register maps and take a lot of pain out of integration. IP-XACT is necessary but not sufficient by itself. You need other standards to glue the Legos together. But you also need to put your glue into a modeling environment. Which one do you use? Which synthesis flow do you use? There is diversity, which is legitimately driven because people are trying to optimize design points and cost structures.
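To make the register-map point concrete, here is a minimal sketch of how machine-readable metadata can take some of the pain out of integration: a single description of a block's registers is turned into C `#define` lines that software can include. The data layout below is a simplified stand-in, not the actual IP-XACT schema, and the `uart0` block, its addresses and the `render_header` helper are all hypothetical.

```python
def render_header(block_name, base_address, registers):
    """Render C #define lines for a block's registers.

    `registers` is a list of (name, offset) pairs, with each offset
    given in bytes relative to base_address.
    """
    prefix = block_name.upper()
    lines = [f"#define {prefix}_BASE 0x{base_address:08X}u"]
    for name, offset in registers:
        lines.append(
            f"#define {prefix}_{name.upper()} ({prefix}_BASE + 0x{offset:02X}u)"
        )
    return "\n".join(lines)


# Hypothetical UART block; names and addresses are invented for illustration.
uart_registers = [("ctrl", 0x00), ("status", 0x04), ("data", 0x08)]
print(render_header("uart0", 0x40001000, uart_registers))
```

The same metadata could just as easily drive documentation or verification checks, which is why a shared description is useful even though, as Goodenough notes, it is not sufficient on its own.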
Hand: And a big piece of that is defining a quality standard. What is an acceptable quality for IP and how do you measure that? If customers can’t quantify something it’s seen as a risk. As you go forward, the lack of a well-defined quality standard for smaller companies makes it hard to prove to their customers that it’s worth buying.
Butler: Quality is an intangible thing. It’s not clear you’ll ever define it.
Goodenough: And there is no standard integration environment. Putting a standard definition of quality is nearly impossible.
Hand: But even if you solve all those modeling and integration problems, there’s still a question—if you’re a smaller player—how do you prove the quality of what you’ve got.
Goodenough: It’s a business-to-business transaction.
Hand: But then you have a small company dealing with a large company, which is putting its whole future up for grabs based on a piece of unproven IP. How do you prove it? As a small company, that’s a big problem.
Goodenough: That’s back to the trust issue. You trust the guy doing the IP because you’ve worked together for 20 years and they’ve done this before. ARM is a trusted company. We’ll stand behind it, no matter what it takes.
Hand: But it’s easier for ARM, Synopsys or Cadence to build that trust than three or four guys doing consulting and working in a shed.
Nandra: Most of the customers willing to take on a bigger risk are driven by cost. The actual cost is always more because something inevitably goes wrong.

LPE: A lot of this started with very standardized pieces. Those are no longer so standard. Are there new markets shaping up for IP?
Nandra: The biggest growth is in smartphones and tablets. These customers are driving the smaller technology nodes, and there’s lots of innovation at the fabs to get devices to work at these small feature sizes. There’s a combination happening between baseband chips and application processors. Customers are looking at combining the two. That makes a huge SoC, and we all have to work in technology that isn’t very friendly.
Butler: We see the blurring of boundaries. When a company is looking to interface with a vendor on the bleeding edge, there are so many revisions and so much churn as they nail down what the IP needs to be that they need to have a different way of interacting with a customer. Just downloading something from the Web site doesn’t work anymore. You need visibility into the customer’s environment and visibility into regressions in the test environment. One vendor has set up a portal to give their customers visibility into the IP that’s being generated. That way the customers can figure out if the vendor is on track to have something within the promised four weeks or six weeks. It’s having a way to bring the two teams together and build trust, without having to be on site or have VPN access, and be able to abstract out the quality and the progress that is happening.
Goodenough: This is trust and concurrent engineering.

Experts At The Table: IP

Thursday, March 8th, 2012

By Ed Sperling
Low-Power Engineering sat down to talk about IP with John Goodenough, vice president of design technology and automation at ARM; Simon Butler, CEO of Methodics; Navraj Nandra, senior director of marketing for DesignWare analog and mixed signal IP at Synopsys, and Neil Hand, product marketing group director at Cadence. What follows are excerpts of that discussion.

LPE: Where are the problems with IP?
Nandra: Customers are asking for IP blocks to be in the leading-edge technologies. You’ve got high-performance requirements for analog to reside in an SoC, which is designed for digital performance. Our customers are asking us to follow all the digital scaling trends without sacrificing performance. On the soft IP, there’s a lot more complexity in the functionality. There are requirements for PCI Express Gen 3 and USB 3.0. The complexity is increasing significantly. Plus, a lot of these things are standards-based but they want differentiated IP for power, area and performance.
Butler: The complexity of the systems is increasing. Assembling the IP and managing all the different interfaces and the various deliverables for the IP is becoming a real challenge. As these complex SoCs begin to integrate third-party IP, as well as the IP developed in-house, there’s no one person who has a full understanding of all the deliverables. You may have a person who understands the analog space but not the RTL requirements. There are a lot of derived views. One of our customers has 108 views for a single IP. When it comes to promoting that IP up to the SoC level they’re asking for automation around that and an integrated verification platform that can gauge whether a particular change fits with the consistency checks across the views.
Hand: One of the big changes is the scope of what is expected to be covered in a piece of IP. As the amount of IP being used and the complexity increases, so does the scope of a particular piece of IP, both in terms of how much functionality it covers and the verification environments. A big part of that is, as you get more IP, you have to move up a level so each piece is more manageable. Otherwise the integration of the SoC becomes an intractable problem.
Goodenough: The main change we’re seeing is that IP is expected to operate on the bleeding edge of physics and software. We see a twin challenge to make sure the IP is validated, packaged and fit for purpose in those two domains. We’re doing that in an environment that now has pace—the level of pace that’s required in engineering with the IP consumers is the key differentiator. You’re concurrently developing the IP with your lead customers, who are on the edge of physics and on the edge of software. IP is becoming less of a nice little box and more of a concurrent engineering process. We see this trend that a lot of activity in IP re-use assumes a stable world, and the world is not stable. Things like change management—managing ECOs, configuration management, managing patch levels—that’s where all our focus is. We can define what RTL is. We can define what a piece of verification IP is. But there is never a stable definition because everything is evolving.

LPE: What you’re talking about is context. The context is more complex, right?
Hand: That’s correct. You may want to explore the PCB environment it’s in and do a signal integrity analysis to make sure it all works. Other customers want it all to fit into a virtual system model to combine with the rest of their IP. Everything is becoming much more concurrent. The good news is it’s driving a lot more of the EDA tools and technologies that have been out there for a while.
Nandra: When your customers are challenged with really short product cycles, they want the IP quickly, even when the technology is not stable. We’ve started designing 28nm and 20nm IP with very early versions of the PDKs. It’s a mini-context. You have to design in an environment where the stuff around your IP isn’t stable. When it gets into an SoC that’s another context, where you have to figure out noise and coupling and skew. And there’s a context above that, at the system level, where people have to figure out how the package and the lead frame relate to the IP and how that relates to the SoC. It’s almost like multi-context. But IP is at the lowest end of the food chain. If there’s a problem, you get the phone call first. A lot of the time we find problems in the cable or the connector or the board, but we’re the ones who have to figure it out. The upside is we learn a lot about cables and connectors and boards, which is critical to our IP business.
Hand: If you look who’s buying IP today, a lot of times it’s customers who never bought IP in the past. Now you’ve got standard interfaces that don’t add value for the customer to build themselves. What’s changed is that in the past the IP market was one step behind. Now it’s at the leading edge.

LPE: But not everything is always on and off. Sometimes it’s somewhere in between. How does that affect context?
Goodenough: One aspect of IP quality is whether it is functionally fit for purpose. The scope of environments you’re trying to validate for scales up. If you take big.LITTLE, you’re validating a multicore system that’s interacting in complex ways with big.LITTLE switches, hypervisors and operating systems on top of that. As an IP provider, you’re now anticipating the environment your IP will be deployed into. Otherwise, everyone will be pointing back to the IP provider if there’s a problem. If you don’t understand the context (complex software and physics environments) you don’t know whether it really is your problem. ARM works in partnership from the applications developers down to the foundry. A key part of IP is being able to understand the context and marshal the ecosystem, not just as it is today but as it’s going to be next year. With a big multicore system running the latest version of Android in someone’s SoC and it’s just fallen over, whose problem is it? We’re putting a lot of emphasis on system debug and system finger-pointing. One of the biggest challenges on schedules is trying to triage the debug and find out where the problem is. It may be an SI problem on the board. It could be in the driver.
Hand: That’s what’s driving a lot of it. If a customer outsources a piece of IP, they’re also outsourcing their core expertise in that area. Who are they going to lean on for their expertise? It will be the IP provider. The IP provider does have to understand the whole concept. You do have to become the expert.
Butler: Yes, you become the fall guy.
Hand: You are the expert and you quickly have to get to the cause of it. If it’s your problem the customer knows you will fix it quickly. If it’s not, the customer knows you’ll determine it’s in a specific area.
Butler: So how do you monetize that kind of expertise?
Hand: It depends on the context of what’s going on. With leading-edge IP, there’s a larger business agreement because you are assisting them with that. But it was no different when verification IP started. When something died, the first assumption was the verification IP was bad. This isn’t different.

LPE: Is IP really being characterized properly?
Butler: No. One of the problems we see when we look at the design methodologies inside big SoC houses is they’re looking for a continuous build approach to hardware design because they have so many software and firmware variants they’re using to make their offering unique. What they’re finding is just doing the validation is a huge problem.
Goodenough: We internally and externally see this as a configuration management problem. At one time when you looked at configuration in an SoC it was all about how to rapidly do X, Y or Z. Now the hardware is pretty much fixed. You’ve turned this piece off, you’ve tied this one off, and now it’s a different software stack in the mobile space.
Butler: And there is so much complexity in all these different levels that people are scared to release blocks because they worry they’re the ones who are going to break it. They don’t have visibility across all the various pieces. The tools are still catching up, particularly when it comes to hardware-software compatibility. It’s kind of a black art.
Nandra: Each customer has a different constraint file set up. You have to ship those unique constraints to that customer. An interesting statistic is that it can take up to a month to download a library. Those databases are getting huge.
Goodenough: The file sizes are terabytes.
Nandra: The corner sets are becoming unique. You have constraints and corner sets and all these environments they’re looking at.

LPE: What’s the solution? Is it to provide more context or more pieces, such as subsystems?
Hand: It’s a combination of both. One part is the pieces will get bigger as a natural evolution. The other is giving people tools to explore the context, whether it’s hardware or software or co-verification. A third part is a way of capturing the metadata that defines that IP within a different context. That way you have a way of exploring the architecture with the metadata that defines this level.
Butler: The barriers are getting blurred and the IP provider is becoming an extension of the design team. It’s starting to sound like an outsourced design environment.
Nandra: The customer is expecting you to be part of the design team until the product gets out the door.

Processor Subject To Change

Thursday, February 9th, 2012

By Ann Steffora Mutschler
With power complexity driving sophisticated management techniques, SoC design engineering teams are turning to a new class of customizable processor architectures from ARM, CEVA, NVIDIA, Qualcomm, Tensilica and others to take advantage of the best in power saving techniques.

While these new architectures are novel approaches, the concepts are not especially new, particularly in mobile applications.

“If you look at what mobile processors have been doing, I would argue they’ve been doing some sort of big.LITTLE for a long time,” explained Nandan Nayampally, director of applications processor marketing in the processor division of ARM. “By that I mean you have microcontrollers taking charge when the big application processor is not working, or you’ve got video engines being separate from the main application processing. The compartmentalization of the activities around the chip has always been a focus for mobile because you will save power any which way you can. That’s a given.”

ARM has observed that what has changed recently is that the main OS needs to be running more and more of the time, because apps such as Twitter feeds and Facebook updates are little apps that constantly run on top of the OS.

As fun and/or useful as they are, these apps are killing battery life.

Nayampally explained the big.LITTLE architecture with an example. “Let’s say I’m doing an MP3 playback in the old days. You’d say, ‘I’m running on the big core, I kick off the task to a little core and then turn off the big processor because the MP3 can run just fine on a microcontroller type device.’ It’s all on the same die. Then suddenly you get a call and it wakes up the big processor and it takes over again. But when you offloaded that MP3 in the olden days (six months or so ago) you actually could have a separate task that wasn’t really run by the OS. Now there are so many more things and services that people are coming to expect that you can’t have them done specifically for targets that are different from the application processor itself and they run on top of the OS. Now you are telling the chip, ‘No, I won’t do these specialized things as separate things for very power-efficient sub-components, they have to be done by the main processor.’ But the main processor also has to become very schizophrenic in the level of performance it requires for the main tasks as well as what it needs for the little tasks.”

Source: ARM

What makes big.LITTLE interesting is that the processors are fully coherent so the software engineer doesn’t have to worry as much about maintaining every piece of data. The coherency in hardware takes care of that. That makes the software development quicker and can actually improve performance and battery life.

Designed to be an extension of DVFS, there are multiple use models in which big.LITTLE can work, with the simplest use meant to be effectively transparent to the OS, Nayampally continued. “The power management software always speaks to a driver that is the right power and performance needed based on what is required. If, for example, you had today’s processor and it was using the lowest performance level it could while doing Twitter update, it just can’t be as efficient as something that was designed to be a fifth smaller or something like that. What if your DVFS had a next step that is more efficient and you can work there for a while? From an OS standpoint, or an application standpoint, it doesn’t matter. It’s just another step in your DVFS. Underneath it what happens is the driver now can do the kick-off to switch the operations from the big core to the little core or from the little core to the big core or cluster in fact.”
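The "just another step in your DVFS" idea can be sketched in a few lines. This is a minimal illustration, not ARM's driver: the operating points, their performance and power figures, and the function names are all hypothetical assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    core: str         # "little" or "big" (hypothetical labels)
    freq_mhz: int
    perf: float       # relative performance delivered (made-up units)
    power_mw: float   # power drawn at this point (made-up numbers)


# Hypothetical DVFS ladder: the little-core points simply extend the
# bottom of the table below the lowest big-core point.
LADDER = [
    OperatingPoint("little", 300, 1.0, 40.0),
    OperatingPoint("little", 600, 2.0, 90.0),
    OperatingPoint("big", 500, 3.0, 250.0),
    OperatingPoint("big", 1000, 6.0, 600.0),
    OperatingPoint("big", 1500, 9.0, 1100.0),
]


def select_operating_point(required_perf, ladder=LADDER):
    """Return the lowest-power point that meets the requested performance."""
    candidates = [p for p in ladder if p.perf >= required_perf]
    if not candidates:
        # Demand exceeds the table: fall back to the fastest point.
        return max(ladder, key=lambda p: p.perf)
    return min(candidates, key=lambda p: p.power_mw)


def needs_core_switch(current, target):
    """True when moving between points also means migrating big <-> little."""
    return current.core != target.core
```

In this sketch the caller only ever asks for a performance level. Whether the answer lands on a little or a big core is hidden behind the table, which is the sense in which the switch is transparent to the OS.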

NVIDIA’s Tegra 3 employs variable symmetric multiprocessing (vSMP) while Qualcomm uses asynchronous symmetrical multiprocessing (aSMP), both of which follow the same principles that govern ARM’s big.LITTLE architecture.

NVIDIA’s Tegra 3, launched last November, is a quad-core mobile processor for smartphones and tablets, currently shipping in the ASUS Transformer Android tablet. A company spokesman explained that behind Tegra 3’s power efficiency is a fifth lower-power “companion” CPU core that goes with the four CPU cores and is specifically targeted at battery savings. Tegra 3’s architecture allows it to provide the best combination of performance and battery life by switching between the four main CPU cores and the fifth core for less demanding tasks and active standby mode.

For CEVA, which licenses DSPs, programmability has always been the name of the game, according to Eran Briman, the company’s vice president of marketing. About seven years ago it became apparent that general-purpose DSPs were not going to make the cut for next-generation designs, particularly in 40nm communications designs. In one of its newest offerings, the CEVA-XC DSP software-defined radio architecture, users can run the complete receive and transmit channels entirely in software, except for a very few hardware engines that simply don’t make sense in software, he said. To accompany this, CEVA recently released a software development kit that includes advanced power management. Looking ahead, Briman believes there will be fully programmable communications units on SoCs.

CEVA isn’t the only company in the DSP space to see this trend.

“Many baseband designs particularly, when they are operating on complex protocols and care a lot about energy have moved to neither completely hard-wired—because that would be too fragile or intolerant of inevitable corrections and improvements—nor completely general-purpose, because a general-purpose processor is generally much less energy-efficient than something that is more specific to the task at hand,” observed Chris Rowen, CTO at Tensilica. “Especially in low-power baseband processing, we’re seeing more and more optimization of programmable engines to do this, where the baseband subsystem might include 6 or 8 or 10 different cores that are programmable. Some of these still may be fairly general-purpose, because you may say in this function though there’s a wide variety of different tasks that I need to do on the data and it is more energy efficient for me to have one that is shared among these different, diverse functions than to have one piece of hardware for every single function. That would make it too big. Having a programmable solution can in some cases also make it a smaller solution. In general, small is good for energy.”

Tensilica offers a range of DSP cores. It also allows users to build their own customized dataplane processors.

Step Away From the Spreadsheet

Thursday, February 9th, 2012

By Ann Steffora Mutschler
Engineers today spend more than a quarter of their time trying to meet power specifications.

A survey of more than 700 engineers by Calypto illustrates just how important and time-consuming power management is today for engineering teams. As consumer devices grow ever more complex, the need to deal with, analyze and optimize power at not just the RTL but at the system level is the next challenge, even if the path to reach that goal is not yet clear.

The opportunities for optimizing a design for power efficiency are greatest at the architectural level of abstraction. The further a design moves downstream the less effective optimization techniques become, noted Yossi Veller, chief scientist for ESL at Mentor Graphics, in a white paper he co-authored for ARM’s IQ Magazine. “Power optimization must begin with architectural analysis, exploration, and optimization of power and timing at the electronic system level (ESL). According to a study by LSI Logic, techniques available at the RTL synthesis phase have the ability to reduce power by 20%; those at the gate level offer a 10% reduction; while those at the layout level can reduce power by only 5%. Waiting until the RTL to begin optimizing for power is a wasted opportunity because power usage can be reduced by 80% at the ESL.”

Fig. 1: The ability to optimize power at the architectural level far exceeds that at lower levels of abstraction.

“Traditional power optimization tools are really working at the lower levels of abstraction,” explained William Ruby, senior director of RTL power product engineering at Apache Design. “If you look at synthesis, if you look at physical design, there are some automated techniques that are available in those tools. But those are in a category of additional refinement-type steps. Once you have the design architecture nailed down, then you can add in some optimizations based on those tools and you can get some additional incremental power savings, but the part that is missing is enabling the true design-for-power efficiency. If you look at modern chip architectures, they are extremely complex and the RTL descriptions of these architectures are even more complex such that RTL in some cases is no longer seen as a viable architectural description language. You want to be able to describe the architecture of the design in a high level of abstraction.”

With this description comes the requirement to be able to analyze power. Today, this is done by synthesizing the design from a high-level description such as C++ down to RTL, where an RTL power analysis tool can run and give feedback into the architectural domain. But what needs to accompany this synthesis-loop-back type of flow, and give some indication of what the power numbers are, is more intelligence in those high-level tools. They need to point out inefficiencies in a design at both the RTL and architectural levels.

Chris Rowen, CTO and co-founder of Tensilica, sees two big challenges for power analysis tools. “One, it is very, very difficult to isolate where the real problem is. It only makes sense to really measure power at the level when you have really synthesized the logic and laid it out and you actually know what the physical design looks like, because the physical design has a huge impact on what the power dissipation of the circuit is.”

By the time it has gone through synthesis and place and route, you have really very little visibility into what was the original logic being questioned. “It all goes into the Cuisinart and all you get is this amorphous mush of gates at the end. So if someone asks you, ‘How much power is being dissipated in my multiplier versus in my divider versus in my register file,’ I don’t know anymore because I have to process them all together in order to get good physical results. But then it all has been aggressively remapped into other logic forms and I can’t isolate the power easily. So you have to work in rather indirect ways to figure out whether the power was being dissipated in one function versus another.”

A second problem, he said, involves system-level tracking of different scenarios. “It is extremely difficult to reach your power goal if you say, ‘Let me use the worst case assumption about each subsystem. I’m going to assume that every piece of my baseband is on, and every piece of my Layer 2 and Layer 3 protocol stack is on, and my image processor is on, and my apps processor is running full out, and all of my RF subsystems are running,’ because of course you’d exceed your power budget by a factor of two or three. Instead people recognize they’re not all on at the same time, the system doesn’t work that way. When you are doing one thing, then you’re typically not doing something else. Therefore, you only have to look at the particular combination of subsystems that is on at that time. However, the software guys have really poor tools to correlate what’s going on in the higher-level operating modes to what’s going on in terms of actual power dissipation in different subsystems. They are completely shooting in the dark where they do not have anything like the kind of accuracy for the modeling of these things.”
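The gap between a worst-case sum and a scenario-based budget can be shown with a small sketch. The subsystem names follow the quote above, but every power figure and every operating mode here is a made-up assumption chosen only to illustrate the arithmetic.

```python
# Hypothetical per-subsystem peak power in milliwatts (made-up numbers).
PEAK_POWER_MW = {
    "baseband": 400,
    "protocol_stack": 150,
    "image_processor": 500,
    "apps_processor": 900,
    "rf": 350,
}

# Hypothetical operating modes: which subsystems are on at the same time.
SCENARIOS = {
    "voice_call": {"baseband", "protocol_stack", "rf"},
    "camera": {"image_processor", "apps_processor"},
    "web_browsing": {"baseband", "protocol_stack", "rf", "apps_processor"},
}


def worst_case_power(peaks):
    """Pessimistic budget: assume every subsystem is on at full power."""
    return sum(peaks.values())


def scenario_power(peaks, active):
    """Budget for one operating mode: only the active subsystems count."""
    return sum(peaks[name] for name in active)


def worst_scenario(peaks, scenarios):
    """Return (name, power) of the most demanding realistic scenario."""
    name = max(scenarios, key=lambda s: scenario_power(peaks, scenarios[s]))
    return name, scenario_power(peaks, scenarios[name])
```

With these invented numbers the all-on sum is 2,300 mW, while the most demanding scenario needs 1,800 mW and a voice call only 900 mW. The harder part, as the quote notes, is getting trustworthy numbers and mode definitions in the first place.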

As a step towards true system-level power analysis, engineering teams are gradually figuring out that they need to build approximate models of power in addition to simulation environments that are fast enough to run realistic scenarios and to capture real activity. “Ironically getting power information is more than anything else probably a function of getting fast enough simulation, because only if you can run realistic size scenarios will you really gain interesting information,” he said.

This has become one of the big drivers of ESL, which until recently has been relatively slow to catch on. But complexity at advanced nodes, including power considerations, has significantly boosted its appeal.

“What the user would like to have at the very early stages, when he has a TLM model of the design, is at least a relative assessment of what architecture decisions will impact the energy in which direction,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “He will also want to know how the software impacts all of that. From a technology perspective, TLM models allow you to do that so it’s fairly straightforward to annotate power-related data into TLM models,” he asserted.

Annotating models with power data, just like annotating them with performance data, is a challenge and can be approached in three ways:

First, he said, “You can start with your assumptions, with your power budget. TLM models and virtual prototypes allow you to then execute your assumptions so you have in your power envelope/power budget. You say, ‘These tasks should take that much power, I know that from past experience,’ and then you execute your virtual platform with those annotated, estimated data or budgeted data. And you get dynamic results depending on what tasks the software ends up calling, how long a cell phone is used for which task in a day, and so forth.”

Second, annotate back from when you have RTL. “At the RTL level you have these switching formats that you can derive from the RTL to get a good idea about the activity,” Schirrmeister continued.

And third, it can be dealt with at the silicon level by taking previous designs, measuring power information and annotating back into TLM models.
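The first of these approaches, executing budgeted numbers against a usage trace, can be sketched as follows. This is not a TLM model or any vendor's tool. The task names, power budgets and trace format are hypothetical, and a real virtual platform would generate the trace from the software actually running.

```python
# Budgeted average power per task in milliwatts (hypothetical estimates,
# the kind of number an architect might carry over from past designs).
TASK_POWER_MW = {
    "idle": 5.0,
    "mp3_playback": 60.0,
    "voice_call": 300.0,
    "video": 800.0,
}


def energy_mj(trace, task_power_mw=TASK_POWER_MW):
    """Sum energy in millijoules over a trace of (task, seconds) pairs."""
    total = 0.0
    for task, seconds in trace:
        total += task_power_mw[task] * seconds  # mW * s = mJ
    return total


def average_power_mw(trace, task_power_mw=TASK_POWER_MW):
    """Average power over the whole trace, in milliwatts."""
    duration = sum(seconds for _, seconds in trace)
    return energy_mj(trace, task_power_mw) / duration


# Example trace: 100 s idle, 50 s of MP3 playback, a 10 s call.
example_trace = [("idle", 100), ("mp3_playback", 50), ("voice_call", 10)]
```

The second and third approaches would keep the same structure and replace the budgeted table with numbers derived from RTL switching activity or measured on previous silicon.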

Design engineers are undoubtedly looking for analysis and optimization at the system level so they can do power analysis and power estimation before RTL is available and before they can do gate-level simulations. But are they truly ready to adopt it?

Achim Nohl, technical marketing manager for Synopsys’ solutions group, pointed out that today, power analysis starts with gate-level simulation. “If you talk to a hardware engineer and tell him, ‘We are going to employ virtual prototyping and high-level models to do power analysis,’ he will certainly look at you a little strange because he thinks, ‘I’m doing all those back-end optimizations and all those specific things to optimize power. How will you ever be able to reflect that in a virtual prototype simulation?’ But that’s not the point. For virtual prototyping, the granularity of a system is very much different. You’re not looking at just the memory controller. You’re looking at the CPU with the memory controller, the buses, the interconnect, the peripherals and how all those things are orchestrated to find out where the different hot spots are and what is the best way to program all those pieces. What is the best scheduling technique? That is the concern at that level.”

When a new chip is architected today, estimates are done to determine whether the chip is feasible at all from a power perspective, he said. “Today, people are using spreadsheets in order to do this analysis, and this can only be a worst case analysis because they don’t know the dynamics and can’t reflect the dynamics of the system in those spreadsheets.”

While the pure architectural level tools don’t exist yet, many users are likely content with high-level synthesis tools for the time being. Apache’s Ruby believes they are good in their own respects but they are not actually meant to give architectural guidance; they are just meant to synthesize the design above the RTL.

One final thought for nervous system architects: The architectural tools of the near future will not replace the actual architect unless they become truly artificial intelligence, which is not likely to happen any time soon, Ruby concluded.

The Next Big Challenge

Thursday, January 12th, 2012

By Ed Sperling
Software is the next big target in the quest to make electronics more energy efficient, but it’s proving a far bigger challenge than most systems architects originally believed it would be.

There are several very large problems to deal with in software. Writing efficient code for small processors isn’t one of them. In fact, the proliferation of small processors across an SoC makes it easier to deal with at least a portion of the software. Code can run directly on the bare metal, some of it can be nothing more than an executable file, and still other code can run on a real-time operating system written for a specific purpose or even on slimmed down versions of operating system code.

But bringing all of this code under the control of an SoC is another matter, despite the fact that this is the best way to manage power and minimize physical effects in a chip. Solving this problem requires integration and coherency across a chip, which in turn requires software architects and system architects to work together up front. This may be a goal among companies, but it certainly isn’t a reality.

“You need coherence to develop a high-end software design,” said Dan Driscoll, Nucleus software architect for Mentor Graphics’ Embedded Software Division. “At this point integration is a large portion of the effort, and the problem has yet to be solved. One thing that helps is a single development environment. If you use multiple profiling tools it’s more difficult to pull that together into a system.”

Devils in the details
Just understanding the interactions between various hardware portions of an SoC has far exceeded human limits in complex SoCs, even at mainstream process nodes. Most companies use a block or subsystem approach to deal with this complexity, working on smaller pieces and then assembling them into the whole and hoping it works as a single system.

Software increases the complexity by orders of magnitude, because an increasing amount of software now controls functionality across the chip. It determines what remains on, what gets turned off, in what sequence, at what speed, and what gets priority. It also determines how much power and memory can be allocated to a given function or logic subsystem—at least in 2D designs. (In stacked die, it may be possible to dedicate portions of memory to logic blocks to minimize this issue).

“This is the job of the controller software for the overall system,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “You tell it to execute this API or put data over here. This is a high-level sequence, and it can do connectivity between different cores of a processor. You also can add up the energy transactions and memory transactions that will trigger.”

Multi-core, many-core, and multiple processors
A second big problem stems from the types of processors being used. The ability to write software applications that can take advantage of multiple cores is an old and well-understood issue—about four decades old, in fact. And while it’s easy for processor makers to add more cores onto a piece of silicon and hand it off to applications developers to deal with, the reality is that most applications cannot be parsed to take advantage of more than eight cores, and in many cases the number is likely to be fewer than four.

Databases, scientific calculations and graphics rendering, where there is extreme redundancy, are the exceptions. Even some games can have functionality parsed across cores. For most other applications, though, the limit is probably two to four cores. And if these cores are running popular general-purpose operating systems such as Windows, Mac OS X or Linux, chances are pretty good that it’s not the most efficient implementation of a function even though it may be the most convenient.

RTOSes have been used by the military for decades as a much more energy-efficient alternative, although most of that work was far less concerned about the energy than about security and performance. Their shift into commercial applications such as mobile phones makes them especially suitable for managing specific functions on separate processor cores in an SoC. It doesn’t make sense, for example, to utilize a multicore general-purpose processor for audio enhancements, and if it isn’t running on a general-purpose processor then it probably doesn’t need a general-purpose OS, either. But those functions still have to work with other parts of the chip without affecting signal integrity or creating hardware proximity effects such as heat, ESD and electromigration.

“The idea of SMP (symmetric multiprocessing) beyond 8 to 16 cores is not realistic for most applications,” said Mentor’s Driscoll. “We’re almost stuck with AMP (asymmetric multiprocessing) as part of large multicore implementations. But we’re seeing cases where you may have a TI OMAP 5, running a dual-core ARM Cortex-A9, an A4 and a DSP. You may have six or seven cores, and a general-purpose operating system going through this part of the system. That operating system may control other DSP interfaces, including RTOSes.”

Verification and testing brain freezes
This approach leads to another problem, though. How do engineering teams verify and test this complex SoC, which now may include multiple types of processors and processor cores, various types of software, and a central software management scheme that probably involves a standard operating system? There may even be middleware making some of the connections, and in homogeneous environments possibly even a virtualization layer that may include hypervisors that can run on bare metal.

“The first thing you have to deal with is a traffic debug issue,” said Cadence’s Schirrmeister. “In many cases, the partitioning may happen by hand. But how you pull this all together may affect your debug strategy. Tensilica presented an extreme example involving a printer design, where they had a block diagram of the functionality and the cores. The printer company used Tensilica cores, which allowed them to replace the functions done in RTL with programmable functions. The connections worked, the memories worked, and the functionality was done in software as bare-metal, low-level software.”

There’s a tradeoff in doing that, however. Driscoll said that pushing functionality down to lower-end processors makes integration more complex. In addition, measuring power consumption becomes more difficult because it means adding up energy transactions that the memory transactions will trigger.

“That means you need data to verify what works at the block level, the subsystem and in the overall system,” Schirrmeister said. “And some chips have processors you can’t access from outside for security reasons. You need flexibility in the software because of security, but you are not allowed to see it from the outside.”

Conclusion
While there has been much attention devoted to finding a common language between hardware and software engineers, the real path forward may be more focused on matching goals at the architectural stage, and then being able to swap information as a design progresses.

Virtual platforms that allow software to be developed earlier in the process help. So do some of the features that are being built into RTOSes these days. In addition, stacked die will help eliminate some issues, while creating new ones. But the real challenges will continue to be integration of hardware and software, and of various types of software with other software—with an eye toward remaining within a power budget and understanding how code affects energy consumed over time.

Power Bits: Last Laptop Standing, Bacteria Power

Friday, January 6th, 2012

Road Warrior Tools
Being able to fly cross-country using a laptop all the way without plugging it in is one thing. Being able to fly across the Pacific Ocean is quite another.

The race is on not just to extend battery life, but to extend it while actually doing something useful on all mobile devices, whether that’s a PC or a smart phone. That requires a significant amount of specialization in both the processor and the software.

Lenovo’s announcement this week of its ThinkPad X1 Hybrid is a case in point. The laptop includes something called Instant Media Mode, which it calls a second PC. Based on a dual-core Qualcomm processor running Linux, this chip can be used to watch videos, listen to music and surf the Internet. That still leaves the regular Intel chip to do the bulk of the heavy lifting, but it’s an interesting approach.

Lenovo isn’t the first company to come up with this idea, of course. Dell introduced a similar device back in 2009. The current iteration, called Latitude ON and available in its lineup, uses an ARM Cortex M3 core running in a Broadcom chip to achieve what ARM claims is multi-day battery life.

This also helps explain why the netbook market segment has largely disappeared overnight, wedged out by tablets on one side and long-life laptops on the other. Interestingly, ARM seems to be the common thread in all of these.

Bacteria In Space
It’s amazing what you can do when you don’t need air to sustain life. The U.S. Naval Research Laboratory is looking at “microrovers,” vehicles that weigh in at about 2.2 pounds (1 kilogram) and are powered by microbes.

The combination of low-power electronics, low energy consumption and microbial fuel cells that continue to generate energy while regenerating themselves is an unusual approach. The anaerobic bacterium (Geobacter sulfurreducens) is expected to have an extremely long lifespan, which will be essential in deep-space exploration.

Electron microscope image of bacteria used for power. Source: U.S. Navy.

The Navy says a portion of the energy generated by the bacteria will be used to maintain electronics, with the rest used to charge a battery or capacitor. From there, the robot will be propelled using a tumbling or hopping motion.

One question, though: Do you need environmental impact statements for this kind of stuff?

–Ed Sperling

One On One: ARM CTO Mike Muller

Thursday, December 1st, 2011

LPE: How far does Moore’s Law extend forward and what are we likely to encounter along the way?
Muller: The good news is there is no known solution for 7nm. That implies that between now and then it’s okay. When I talk to people they seem fairly confident they’re going to get there. Exactly how they don’t know. Will there be any miracles needed? Yes, probably one or two. But 14nm and down close to 7nm will happen. The bad news is that frequency will be flat with constant leakage.

LPE: That’s an interesting perspective.
Muller: Life is full of tradeoffs. People have traditionally taken different tradeoffs on process development. But in the past a lot of that was about getting frequency uplift. You can trade that in lots of different ways, and there is still frequency uplift to be had. But that costs you in terms of leakage, and people worry about that much more than they used to and where those tradeoffs are made.

LPE: Does that leakage continue even with the introduction of FinFETs and other techniques?
Muller: New process techniques like FinFETs help, but they’re one-off advances. You draw your curve, and there are times when you get ahead of the curve. Then you’re on the gradual slope back down again. So there are one-off things that really help with leakage. But once you’ve done that, you’ve still got three impossible things to do before breakfast to get you back down to 7nm. Those steps are part of the solution, but they don’t solve it to the point where leakage is going away and you don’t need to worry about it anymore.

LPE: How about dropping the voltage?
Muller: We’ve always done voltage scaling, and DVS (dynamic voltage scaling) continues. There will be different learnings about how much voltage scaling can you get. If you can do it, voltage is one of the best things you can do for saving dynamic and static power. That will continue, but the margins are getting harder to find.
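Why voltage is such an effective lever can be seen from the standard first-order approximation for CMOS switching power, P = a · C · V² · f. The sketch below uses that textbook formula, which is not something stated in the interview. The function and parameter names are illustrative, and leakage (static power) is deliberately left out.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of new to old switching power for the same circuit and activity."""
    return (v_new / v_old) ** 2 * (f_new / f_old)
```

Under this approximation, dropping the supply by 10% at a fixed frequency cuts switching power to about 81% of its original value, while a 10% frequency reduction alone only reaches 90%.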

LPE: ARM just introduced its big.LITTLE approach. What’s the thinking behind that?
Muller: The idea is that you can crank down the voltage and save power and scale it. There are times when you need performance, which is the ‘big’ part, and there are times when you don’t need that. You cannot build as efficient a microarchitecture for the big cores as you can for the little cores because getting that single-thread performance involves a lot of microarchitecture complexity and speculation, which ultimately costs you power. If you don’t need all of that performance, and your voltage scaling has run out of anywhere to go, the right thing to do is to task migrate onto an identical but smaller core with a simpler microarchitecture. That works wherever you are and on whichever process. It will always be true. You will be able to build much more efficient little cores than big cores.

LPE: How does this affect the overall device architecture?
Muller: This is an OS-level task migration, which happens anyway. You determine how many SMP cores you need to light up. Then you do task migrations. It’s another step to migrate onto a smaller core. That’s something you just build into the OS. You don’t need to add any extra magic. It’s already happening.

LPE: Is this going to apply in stacked die with rightsizing of functions?
Muller: The stacked die is almost an orthogonal issue. It’s happening today with flash and SoCs put into the same package because of packaging constraints. It opens the door to completely different die-to-die memory interfaces, which allow you to build more efficient systems than going off-chip, down-chip to a separately packaged die. It changes some of the memory bandwidth. But it’s just a computer at the end of the day, so main memory bandwidth is one of the fundamental determinants of performance. Stacking allows you to change that. Whether you’re stacking big cores, little cores, or big.LITTLE cores in combination, for different applications you’ll need different combinations. And you exploit that with main memory bandwidth.

LPE: It doesn’t sound like we’ve made much real progress in terms of true multiprocessing software for most jobs.
Muller: When I went to university, which admittedly was a few years ago, I was taught never to trust an MP solution from a hardware guy. That was one of the lectures from a guy who invented the sub-routine. I think he was right, but for low numbers of cores—eight and less—SMP is a fixed problem or a solved problem because you have enough system complexity that you do have a browser and a background task. You don’t have to worry too much about how well you’ve taken an application and threaded it.

LPE: But you’ve split the functions rather than threaded the application, right?
Muller: That’s the first step. And for two, three or four cores you can do that without really having to re-do anything. When you get into re-programming applications like your browser and executing that on multicore, there are a limited number of applications that drive that performance envelope. Your small applet you’re running doesn’t touch it. You’re going to do browsers and virtual reality apps where programmers are willing to go back and figure out how to re-program and rewrite it. It’s true that the general software community is not set up for generating multicore applications. For most applications, you don’t need it. Beyond that, there is database lookup that’s independent of any one application scaling.

LPE: So populating an SoC with small processors is a way of splitting off functions?
Muller: Yes, and heterogeneous isn’t just about big.LITTLE. It’s about having entire subsystems for tasks, which may be a Cortex-A5 running a complicated audio subsystem that might actually be for custom hardware. If you open up an SoC for a mobile phone you’ll find all of those things in there. The challenge is the programming model for that heterogeneous system, let alone programming the multicore apps processor with lots of cores in it.

LPE: And you need coherence across all of that, right?
Muller: Some of it is about system-level coherency, and some of that is in the programming model. There are three or four emerging standards for that. What they address is which computing where. You still come back to manual placement of the different processing elements for different tasks. That’s not a solved problem.

LPE: So as you look forward, is power and/or leakage the big issue?
Muller: If you go back to ARM 1990, we always talked about power/performance/area and the tradeoff between them. I don’t think that’s changed. If it’s all about power, run at a kilohertz, sub-threshold, and you come up with completely different solutions. If it’s only the Internet of things and tiny embedded microcontrollers, you still have to figure out what’s your budget, what’s your power and what’s your performance, and balance between them. In the future we won’t just worry about power.

LPE: But in the future will power become more important in the PPA equation?
Muller: It depends on who’s talking. We’ve always had power up there as a fundamental part of what we do. There is no sudden change of course. Power really matters in system-level integration, whether it’s megawatts in server farms or milliwatts of active power in a small SoC device. We’ve always worried about that. It’s just maturing for more systems, but it’s something we’ve always done.

Power Gating And Power-Centric Programming

Thursday, November 3rd, 2011

By Pallab Chatterjee
SoC design has a number of techniques for power management. One of the more prevalent methods is to use power gating to turn blocks on and off based on the applications being run and on mode controls. Power gating, while supported by the two major EDA power design flows, UPF and CPF, still has some implementation challenges.

The flows have to make sure that the states of the logic at the interface to the blocks being turned off do not get corrupted due to changes on the shared ground/supplies. Basic power gating is well known. However, its use in both multiple power supply systems and multi-logic threshold systems still has some challenges. Power gating requires the outputs of the switched gates to be isolated from the control signals on the inputs, and also that the outputs be clamped at some state: low, high or “last value.”
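The three clamp options can be illustrated with a toy behavioral model. This is a software sketch only, not UPF or CPF syntax and not a real cell library. The class and method names are hypothetical, and `None` stands in for the unknown value a powered-down block would drive.

```python
from enum import Enum


class Clamp(Enum):
    LOW = "low"
    HIGH = "high"
    LAST_VALUE = "last_value"


class IsolationCell:
    """Toy model of an isolation cell on a power-gated block's output."""

    def __init__(self, clamp):
        self.clamp = clamp
        self._last = 0  # value latched while the block was still powered

    def output(self, block_value, isolate):
        """Pass the block's value through, or clamp it when isolated."""
        if not isolate:
            self._last = block_value  # remember the last valid value
            return block_value
        if self.clamp is Clamp.LOW:
            return 0
        if self.clamp is Clamp.HIGH:
            return 1
        return self._last  # Clamp.LAST_VALUE
```

Whatever the gated block drives while it is off, the always-on logic downstream only ever sees a defined 0 or 1.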

The power gating function results in a reduction in the logic level swing due to the IR drop of the “on” device between the logic cell and the power supply/ground. Gate bias and level-shifting to a second set of power rails to drive the gate buffer control logic allows for the power gating devices to have a reduced IR drop to the virtual supply (VVDD). Timing construction for this type of function, however, is transparent to the UPF/CPF design flows.
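As a first-order illustration of that IR drop, the virtual supply can be estimated with Ohm's law. The function name and the numbers in the comment are hypothetical, and the model assumes a constant load current and a purely resistive switch.

```python
def virtual_supply(vdd, load_current_a, switch_on_resistance_ohm):
    """Estimate VVDD seen by the gated logic.

    Assumes a single power switch modeled as a resistor and a constant
    load current; real switches are nonlinear and currents are dynamic.
    """
    ir_drop = load_current_a * switch_on_resistance_ohm
    return vdd - ir_drop


# Hypothetical example: a 1.0 V rail, 0.5 A of load and a 0.1 ohm switch.
example_vvdd = virtual_supply(1.0, 0.5, 0.1)
```

Driving the switch gate harder lowers its on-resistance, which is why a second set of rails for the gate control logic reduces the drop.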

A workaround for the logic that has to interface with power-controlled blocks is to use state retention registers. This solution has quite a bit of area/performance penalty as it requires a formal and powered-on register bank for each I/O-facing logic block in the sub-block. The gate count is expensive for full state coverage, and partial state coverage has validation issues. There is an additional cost of power and latency. The latency is due to the loading and unloading of the software state for save/restore.

To address these issues, designers can use an enhanced DFF with a connection to the always-on retention register power supply. This cell would have to support save, hold, restore and normal operation functions. UPF and CPF do not always work directly with these non-RTL states, which affects the validation flow. A further challenge is the functional planning and implementation of set and reset signals through the retention registers and the impact of those signals on the data being held for the “off” blocks.
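The save, hold, restore and normal functions can be sketched as a behavioral model. This is a simplified assumption of how such a cell behaves, not a description of any vendor's cell; the class and method names are hypothetical.

```python
class RetentionFlop:
    """Toy D flip-flop with a shadow latch on an always-on supply."""

    def __init__(self):
        self.q = 0            # main state, lost when the block is gated
        self.shadow = 0       # retained copy on the always-on rail
        self.powered = True

    def clock(self, d):
        """Normal operation: capture d on a clock edge while powered."""
        if self.powered:
            self.q = d

    def save(self):
        """Save: copy the main state into the always-on shadow latch."""
        if self.powered:
            self.shadow = self.q

    def power_off(self):
        """Hold: main state becomes unknown, shadow keeps its value."""
        self.powered = False
        self.q = None

    def power_on(self):
        """Power returns, but q stays unknown until restore is called."""
        self.powered = True

    def restore(self):
        """Restore: copy the shadow value back into the main state."""
        if self.powered:
            self.q = self.shadow


def demo():
    flop = RetentionFlop()
    flop.clock(1)
    flop.save()
    flop.power_off()
    during_off = flop.q
    flop.power_on()
    flop.restore()
    return during_off, flop.q
```

The model also shows why ordering matters: a power-off without a preceding save restores stale data, which is the kind of case the validation flow has to catch.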

ARM, in the Cortex-M class products, has implemented low-cost state retention using sub-period clocks and secondary power supplies for the retention devices. These sub-period clocks allow Set and Reset functions to occur on an asynchronous basis with the system clock. The logic blocks are generally built using clocks from a DVFS control system.

The challenge for using these blocks is not only to integrate them into the timing flow of the circuits, but also to make sure that the retention registers can safely provide data, at the correct logic level, to the blocks that are on. As application programs gain control of the power gating function, simple state machine-based control for these registers is not sufficient. Programming optimization of the high-level language functions now has more interaction with the data flow per block. As a result, environments such as OpenCL, which sends tasks to both distributed CPUs and GPUs through common and segmented memory controls, have a great deal of impact on when blocks are on or off. Normally, a compute task that has no output view is contained just in the CPU signal path, and the GPU can be powered down. Under OpenCL, it is possible to have this task sent to both the CPU and the many threads of the GPU and then combine the results in central memory. This has an impact on the power control, because to achieve the performance enhancement of the extra computation capability you cannot tolerate the latency of a turn-on, reset or restore, and then store and turn-off cycle of the GPU. This latency is typically longer than the compute cycle.
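The latency tradeoff can be expressed as a simple break-even check. This is a sketch under stated assumptions: the function name and all timing figures are hypothetical, and the wake and sleep overheads are treated as fully serialized with the computation.

```python
def offload_pays_off(cpu_only_us, shared_compute_us, gpu_wake_us, gpu_sleep_us):
    """Return True if waking a gated GPU shortens the task.

    cpu_only_us:       time to run the task on the CPU alone
    shared_compute_us: time to run it split across CPU and GPU
    gpu_wake_us:       turn-on plus reset or restore latency
    gpu_sleep_us:      store plus turn-off latency
    """
    total_with_gpu = gpu_wake_us + shared_compute_us + gpu_sleep_us
    return total_with_gpu < cpu_only_us


# Hypothetical short task: the power cycle dominates, so offload loses.
short_task = offload_pays_off(100, 40, 200, 50)

# Hypothetical long task: the same overhead is amortized, so offload wins.
long_task = offload_pays_off(1000, 400, 200, 50)
```

When the power-cycle latency exceeds the compute cycle, as the text describes, the check fails for every short task, so the GPU either stays on or is not used.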

The design verification is still hampered by the fact that none of the logic verification environments can model these turn-on and turn-off state transitions as the power supplies change under application software control. The simulations are based on timing for the power supply control switch transitions, and estimates based on RC load for the blocks to be either available or not.

Five Important Changes That Will Affect Power

Thursday, November 3rd, 2011

By Ed Sperling
So far most of the energy savings in SoCs have been achieved using two main approaches—turning off most of the chip most of the time, and changing the materials used to insulate against current leakage.

Over the next few years, changes to designs will be more radical, encompass more pieces of a bigger system, and they will be orders of magnitude more effective. From a market standpoint, there is little choice. Computing increasingly is going mobile, and time between charges is a competitive edge. The caveat is that increased battery life has to come with a corresponding increase in functionality. Everything that could be done with a plug now will have to be done without one.

That means rethinking everything from the hardware design to the usage model to the software that runs on those platforms. And it means getting chips out the door at least as quickly, if not more quickly. Here are five trends and approaches that collectively, and sometimes individually, will have a big impact on energy efficiency, power consumption and leakage:

1. Rethinking the basics. Some of the biggest advances in efficiency will come from optimizing existing technology. There is more to turn off, more pieces to improve, and there are more ways of doing it better.

Consider something as basic as the clock, for example. The big focus has been maximizing frequency for nearly five decades. There are even concurrent clocks to make that happen. But having them always on and always running at the same frequency means they use a lot more energy than necessary.

“Design has always centered around the clock being the heartbeat of the system,” said Chi-Ping Su, senior vice president of R&D for Cadence’s Silicon Realization Group. “So people always assume the clock will be on. What we have found, working with ARM and the processor type of design, is that the clock consumes an extremely large percentage of the power. Timing and frequency are based on the clock. So you build a tree to be the ideal clock and you do everything based on that. When we started looking at it, we started asking why clocks need to be balanced at all.”

So how much energy can be saved? Su contends the amount is up to 30% of clock-tree power and up to 50% of dynamic power for the entire system.

He’s not alone in touting these kinds of numbers. Most SoC tools developers believe that dealing with energy/power/leakage at or before RTL can mean significant savings for the overall design.

“All the low-hanging fruit is still available to chip designers,” said Vic Kulkarni, senior vice president and general manager at Apache Design. “We find that even advanced designers are more concerned with meeting functionality and identifying power bugs. What they forget is the relationship between data, clock, reset and enable: the four signals in an SoC.”

2. Reducing distance and resistance. Over the next two years the SoC industry will undergo a radical shift that will continue for years to come. Rather than plotting Moore’s Law linearly, transistors will be placed in three dimensions.

Driven partly by re-use, partly by time-to-market pressures and partly by physical limitations, 2.5D and 3D stacking will have an enormous effect on energy consumption and power. By stacking memory and other components on top of logic, the distance a signal must travel can be shortened significantly, along with the energy necessary to drive that signal.

“Moore’s Law is not a law,” said Wally Rhines, chairman and CEO of Mentor Graphics. “But the easiest way to reduce the cost of a transistor for the last 40 years has been shrinking feature sizes and growing wafer sizes. We are coming into an era where it will be more cost effective to stack die than to shrink feature sizes. We will hit it with memory before logic, but as with all new technologies we will adopt it before it is cost effective because of unique capabilities.”

Whether it’s done with an interposer, package-on-package, or flip-chip bumped die, Rhines said there is a 70% decrease in power dissipation if the memory can be put on top of a processor.

And that’s just for starters. By adding more processors that are sized for a particular function and tying that to just the right amount of memory, rather than a whole memory chip or block, far less power is needed. Companies such as Tensilica and ARM have been making this case for some time. With stacked die, their arguments are likely to receive far more attention.

3. New materials and structures. Calling a material “new” is something of a misnomer in SoC design. Most of the techniques that we consider revolutionary have been around for decades, but they haven’t been developed enough to the point where they are cost effective, both from a yield and materials standpoint.

Through-silicon vias, for example, have been talked about since the late 1950s, and interposers in 2.5D packages are simply a collection of TSVs on a single die. But there are still issues to be worked out. Shang-Yi Chiang, senior vice president of R&D at TSMC, said questions remain about how to integrate a substrate with an interposer, and how to debug it at different phases of development so it can be tested.

“There are a lot of parasitics to deal with in 2.5D,” Chiang said. “And with 3D we need time to make sure we can calibrate it.”

The other kind of 3D—structures such as FinFETs, tunnel FETs and nanowires—have been on the drawing board since the 1990s. All of these structures can lower leakage by controlling the gate at multiple points. FinFETs are planned in volume for 14nm by both GlobalFoundries and TSMC, while Intel may begin using them as early as 22nm.

These structures hold the promise of radically reducing leakage of both static and dynamic power using all modes of operation—at least initially.

“The problem is these are a one-off thing,” said Mike Muller, chief technology officer at ARM. “FinFETs do reduce leakage, but once you’ve done that you’ve still got three impossible things to do before breakfast. Those kinds of steps are part of the solution.”

Muller said combining those with stacking techniques will go even further. “It opens the door to completely different die-to-die memory interfaces which allow you to build more efficient systems than when you go off the chip, down the serial interface to a separately packaged die. It changes the memory bandwidth, and this is just a computer at the end of the day so memory is one of the fundamentals for performance. Stacking allows you to change that.”

4. Lowering the voltage. One of the benefits of 3D structures such as FinFETs and stacking of die is that they make it easier to lower the voltage in certain parts of the chip. The reason is that the minimum voltage for DRAM may be higher just to maintain functionality than it is for logic or I/O. By separating those functions into different die, issues such as state retention and leakage can be confined and dealt with independently—the so-called divide-and-conquer approach.

So how low can the voltage go? Several years ago, researchers at IBM said the minimum voltage for an SoC would be at least 0.7 volts. It now appears it can be as low as 0.1 or 0.2 volts, and research is under way to lower it even further.

“You can get down to 0.3 or 0.2 volts without any problems,” Qi Wang, technical marketing group director at Cadence, said during a recent roundtable. “If you keep the aspect ratio of the depth and the height of a FinFET then you can guarantee the performance, but you do have other physical effects. Nothing is free. But the voltage can go much lower than what the textbooks say.”
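To see why these voltage figures matter, recall the first-order CMOS relation in which dynamic power scales with the square of the supply voltage. The sketch below applies that textbook relation to the voltages quoted above; the function name is hypothetical, and switched capacitance and frequency are assumed to stay constant, which will not hold in practice because achievable frequency falls as voltage drops.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power after a supply change, P proportional to V^2.

    First-order model only: capacitance, activity and frequency are held
    constant, and leakage is ignored.
    """
    return (v_new / v_old) ** 2


# Going from 0.7 V to 0.3 V under these assumptions.
ratio = dynamic_power_ratio(0.3, 0.7)
savings_percent = (1.0 - ratio) * 100.0
```

Under these assumptions the move from 0.7 volts to 0.3 volts cuts dynamic power to roughly 18% of its original value, a reduction of about 82%.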

5. Fixing software. Software is the last piece of the puzzle to fix, and it’s been one of the hardest for a number of reasons.

First of all, software takes longer to create and perfect than hardware. This is evident in all the bug fixes and updates. All three of the top EDA players are involved in this effort. Synopsys is working on software prototyping to allow software to be written even before the hardware is ready. Mentor has been involved in simplifying the creation of RTOSes and embedded software. And Cadence has shifted its design approach so that software and hardware can be done far more concurrently.

But getting software out on time is only a first step. The next step is to make software function more efficiently, an approach that dates back to the RISC vs. CISC wars of the 1990s. Reduced instruction set computing was more efficient than complex instruction set computing, which boosted performance. By taking that approach one step further, it also can reduce the amount of energy consumed by a particular task, and be used to manage the overall power in a system much more efficiently.

Work on symmetric multiprocessing continues, as well. How far that will go is anyone’s guess, but we now seem to be facing a limit on the number of cores that most applications can use effectively. Talk about an unlimited number of cores has given way to limited numbers of cores and unlimited numbers of processors spread throughout a system, most of which are off most of the time.

Taken together, all five of these trends will have a huge effect on efficiency, power and leakage. And now that battery life is a competitive issue, it also is likely to be used by vendors and seen as a value add instead of an unnecessary engineering cost—or worse, a nuisance.
