
Chip Design Magazine



Posts Tagged ‘Altera’

Connected IP Blocks: Busses or Networks?

Tuesday, November 26th, 2013

Gabe Moretti

David Shippy of Altera, Lawrence Loh from Jasper Design, Mentor's Steve Bailey, and Drew Wingard from Sonics got together to discuss the issues inherent in connecting IP blocks, whether in an SoC or in a stacked-die architecture.

SLD: On-chip connectivity uses area and power, and it also generates noise. Have we addressed these issues sufficiently?

Wingard: The communication problem is fundamental to the high degree of integration. A significant portion of the cost benefit of going to a higher level of integration comes from the ability to share critical resources, like off-chip memory or a single control processor, across a wide variety of elements. We cannot make the communication take zero area, and we cannot make it have zero latency, so we must think about how we partition the design so that we can get things done with sufficiently high performance and low power.

Loh: Another dimension that is challenging is the verification aspect. We need to continue to innovate in order to address it.

Shippy: The number of transistors in FPGAs is growing, but I do not see the number of transistors dedicated to connectivity growing at the same pace as in other technologies. At Altera we use a network on chip to efficiently connect the various processing blocks together. It turns out that both the area and the power used by the network are very small compared to the rest of the die. In general, the resources dedicated to interconnect are around 3% to 5% of the die area.

Wingard: Connectivity tends to use a relatively large portion of the “long wires” on the chip, so even if it is a relatively modest part of the die area, it runs the risk of posing a greater challenge in the physical domain.

SLD: When people talk about the future they talk about the “internet of things” or “smart everything.” SoCs are becoming more complex. There are two things we can do: knowing the nature of the IP we have to connect, we could develop standard protocols that use smaller busses, or we could use a network on chip. Which do you think is most promising?

Wingard: I think you cannot assume that you know what type of IP you are going to connect. There will be a wide variety of applications and a number of wireless protocols used. We must start with an open model that is independent of the IP in the architecture. I am strongly in favor of the decoupled network approach.

SLD: What is the overhead we are prepared to pay for a solution?

Shippy: The solution needs to be a distributed system that uses a narrower interface with fewer wires.

SLD: What type of work do we need to do to be ready to have a common, if not standard verification method for this type of connectivity?

Loh: Again, as Drew stated, we can have different types of connectivity, some optimized for power, others for throughput, for example. People are creating architectural-level protocols. When you have a well-defined method to describe things, you can derive a verification methodology.

Bailey: If we are looking for an equivalent to UVM, you start from the protocol. A protocol has various levels. You must consider what type of interconnect is used; then you can verify that the blocks are connected correctly and control aspects like arbitration. Then you can move to the functional level. To do that you must be able to generate traffic, and the only way to do this is to either mimic I/O through files or use a high-level model to generate the traffic, so that we can replicate what will happen under stress and make sure the network can handle the traffic. It all starts with basic verification IP. The protocol used will determine the properties of its various levels of abstraction, and the IP can provide ways to move across these levels to create the required verification method.
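The traffic-generation flow Bailey describes can be sketched at a high level. The model below is purely illustrative (the master count, transaction format, and round-robin arbiter are assumptions for the sketch, not any vendor's verification IP): it produces a seeded, reproducible stream of randomized transactions from several masters and runs them through a simple arbiter to check that nothing is lost under load.

```python
import random

def generate_traffic(num_masters, num_txns, seed=0):
    """Randomized transaction stream, mimicking the high-level traffic
    model Bailey describes as stimulus for interconnect verification."""
    rng = random.Random(seed)  # seeded so stress runs are reproducible
    return [
        {
            "master": rng.randrange(num_masters),
            "addr": rng.randrange(0, 1 << 16),
            "is_write": rng.random() < 0.5,
        }
        for _ in range(num_txns)
    ]

def run_through_arbiter(txns, num_masters):
    """Round-robin arbitration over per-master queues; returns the
    completion order so a checker can confirm no transaction is dropped."""
    queues = [[] for _ in range(num_masters)]
    for t in txns:
        queues[t["master"]].append(t)
    order = []
    m = 0
    while any(queues):
        if queues[m]:
            order.append(queues[m].pop(0))
        m = (m + 1) % num_masters
    return order

txns = generate_traffic(num_masters=4, num_txns=100)
completed = run_through_arbiter(txns, num_masters=4)
assert len(completed) == len(txns)  # every transaction completed under stress
```

A real flow would drive these transactions into an RTL or transaction-level NoC model and check ordering and arbitration properties rather than a toy queue.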

Wingard: Our customers expect that we will provide the verification system that proves to them that we can deliver the type of communication they want to implement. The state of the art today is that there will be surprises when the communication subsystem is connected to the various IP blocks. Even with protocol verification IP, the problems we see today tend to be system-interaction challenges, and we do not yet have a complete way of describing how to gather the appropriate information to verify the entire system.

Bailey: Yes there is a difference between verifying the interconnect and doing system verification. There is still work that needs to be done to capture the stress points of the entire system: it is a big challenge.

SLD: What do you think should be the next step?

Wingard: I think the trend is pretty clear. More and more application areas are reaching the level of complexity where it makes sense to adopt network-based communication solutions. As developers start addressing these issues, they will come to appreciate that there may be a need for system-wide solutions for other things, such as power management. As communication mechanisms become more sophisticated, we need to address the issue of security at the system level as well.

Loh: In order to address the issue of system-level verification, we must understand that the design needs to be analyzed from the very beginning so that we can determine the choices available. Standardization can no longer cover just how things are connected; it must grow to cover how things are defined.

Shippy: One of the challenges is how we are going to build heterogeneous systems that interact correctly while also dealing with all of the physical characteristics of the circuit. With many processors and lots of DSPs and accelerators, we need to figure out a topology to interconnect all of those blocks and, at the same time, deal with the data coming from or going out to the outside environment. The problem is how to verify and optimize the system, not just the communication flow.

Bailey: As the design side of future products evolves, so the verification methods will have to evolve. There has to be a level of coherency covering the functionality of the system, so that designers can understand the level of stress they are simulating for the entire system. Designers also need to be able to isolate a problem when it is found. Where does it originate: an IP subsystem, the connectivity network, or a logic problem in the aggregate circuitry?

Contributors’ Biographies:

David Shippy is currently the Director of System Architecture for Altera Corporation, where he manages System Architecture and Performance Modeling. Prior to that he was Chief Architect for low-power x86 CPUs and SoCs at AMD. Before that he was Vice President of Engineering at Intrinsity, where he led the development of the ARM CPU design for the Apple iPhone 4 and iPad. Prior to that he spent most of his career at IBM leading PowerPC microprocessor designs, including the role of Chief Architect and technical leader of the PowerPC CPUs for the Xbox 360 and PlayStation 3 game machines. His experience designing high-performance microprocessor chips and leading large teams spans more than 30 years. He has over 50 patents in all areas of high-performance, low-power microprocessor technology.

Lawrence Loh, Vice President of Worldwide Applications Engineering, Jasper Design
Lawrence Loh holds overall management responsibility for the company’s applications engineering and methodology development. Loh has been with the company since 2002, and was formerly Jasper’s Director of Application Engineering. He holds four U.S. patents on formal technologies. His prior experience includes verification and emulation engineering for MIPS, and verification manager for Infineon’s successful LAN Business Unit. Loh holds a BSEE from California Polytechnic State University and an MSEE from San Diego State.

Stephen Bailey is the director of emerging technologies in the Design Verification and Test Division of Mentor Graphics. Steve chaired the Accellera and IEEE 1801 working group efforts that resulted in the UPF standard. He has been active in EDA standards for over two decades and has served as technical program chair and conference chair for industry conferences, including DVCon. Steve began his career designing embedded software for avionics systems before moving into the EDA industry in 1990. Since then he has worked in R&D, applications, and technical and product marketing. Steve holds BSCS and MSCS degrees from Chapman University.

Drew Wingard co-founded Sonics in September 1996 and is its chief technical officer, secretary, and board member. Before co-founding Sonics, Drew led the development of advanced circuits and CAD methodology for MicroUnity Systems Engineering. He co-founded and worked at Pomegranate Technology, where he designed an advanced SIMD multimedia processor. He received his BS from the University of Texas at Austin and his MS and PhD from Stanford University, all in electrical engineering.

Architectural Changes Ahead

Thursday, September 13th, 2012

By John Blyler and Staff
For the past couple of process nodes chipmakers have been developing power-saving features that have been largely ignored by OEMs. That’s beginning to change.

The need to do more and faster processing within the same or smaller power budget is forcing significant architectural changes, more efficient software, and new materials into the equation. They are showing up in some of the latest announcements and presentations from companies across the semiconductor industry.

Architectural leaps
David “Dadi” Perlmutter, in his keynote address at the Intel Developer Forum this week, hinted at some architectural changes that will help pave the way for new voice and gesture-recognition interfaces. One involves near-threshold voltage scaling, something he referred to as “versatile performance.” As he put it, “if the platform is not warm enough, you scale down.”

To get to the next steps, Intel will need to add a number of architectural changes. The first will be rolled out next year with a 22nm processor, code-named Haswell, that includes its TriGate or finFET technology. That will be followed by a 14nm chip, which Intel reportedly is already testing.

Intel has been working with a variety of materials, including fully depleted SOI, and it has been experimenting with various gate structures and stacking approaches. But which ones ultimately get used depends on when it becomes economically necessary to change its processes and manufacturing. The company may buy some time just by using bulk CMOS combined with EUV lithography and 450mm wafer technology, in which it has invested heavily over the past few months. Bigger wafers and commercially viable EUV could well pave the way for advances at the next couple of process nodes.

In a speech prior to IDF, Intel Labs’ Gregory Ruhl talked about the energy benefits of Near Threshold Voltage (NTV) computing using Intel’s IA-32, 32nm CMOS processor technology. The so-called “Claremont” prototype chip relies on an ultra-low voltage circuit to greatly reduce energy consumption. This class of processor operates close to the transistor’s turn-on or threshold voltage—hence the NTV name. Threshold voltages vary with transistor type, but are typically low enough to be powered by a postage-stamp sized solar cell.

The other goal for the Claremont prototype was to extend the processor’s dynamic performance—from NTV to higher, more common computing voltages—while maintaining energy efficiency. Ruhl’s results showed that the technology works for ultra-low power applications that require only modest performance, from SoCs and graphics to sensor hubs and many-core CPUs. Reliable NTV operation was achieved using unique, IA-based circuit design techniques for logic and memories.
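The energy argument behind NTV follows from the standard dynamic-energy relation, E = C·V² per switching event, which falls quadratically as the supply voltage approaches threshold. The sketch below uses illustrative numbers (the 1 pF effective capacitance and the 1.0 V and 0.45 V operating points are assumptions, not Claremont's measured figures):

```python
def dynamic_energy_per_op(c_eff, vdd):
    """Dynamic switching energy per operation: E = C_eff * Vdd^2."""
    return c_eff * vdd ** 2

# Illustrative values: 1 pF effective switched capacitance
nominal = dynamic_energy_per_op(1e-12, 1.0)   # nominal 1.0 V supply
ntv = dynamic_energy_per_op(1e-12, 0.45)      # near-threshold, roughly 0.45 V
savings = nominal / ntv  # quadratic: (1.0 / 0.45)^2, roughly 4.9x per operation
```

The quadratic term is why NTV designs tolerate the lower clock frequencies that come with reduced supply voltage: each operation costs several times less energy.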

Further developments are needed to create standard NTV circuit libraries for common, low-voltage CAD methodologies. Apparently, such NTV designs require a re-characterized, constrained standard-cell library to achieve such low corner voltages.

Rethinking standard approaches
Rethinking standard approaches
Michael Parker, senior technical marketing manager at Altera, began a session at the recent Hot Chips conference by highlighting advances in the floating-point accuracy of FPGA devices. FPGAs are inherently better at fixed-point calculations, in part due to their routing architecture. Conversely, accurate floating-point calculations depend on multiplier density for the extensive use of adders, multipliers, and trigonometric functions. Often these functions are pulled from libraries, resulting in an inefficient multiplier implementation.

According to Parker, Altera took a different approach, using a new floating-point fused-datapath implementation instead of the existing IEEE-based method. The datapath approach removes the normalization and de-normalization steps typically required in the multiplier-based IEEE representation. However, it only achieves this high floating-point accuracy on smaller matrix functions (like FFTs), where GFLOPS-per-watt efficiency and low latency (thanks to sufficient on-chip memory) are the primary requirements.
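The benefit of skipping per-operation normalization can be mimicked in software: round each intermediate result to a reduced mantissa (the IEEE-style path) versus accumulating at full precision and rounding once at the end (the fused-datapath style). This is a conceptual sketch only; the 10-bit mantissa and the cancellation-prone input values are arbitrary assumptions, not Altera's implementation.

```python
import math

def round_to_bits(x, bits=10):
    """Crude model of a reduced-precision float: keep `bits` bits of mantissa."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))  # exponent of the leading bit
    scale = 2.0 ** (e - bits)
    return round(x / scale) * scale

vals = [1.0, 1e-3, -1.0, 1e-3]  # cancellation-prone sum; exact answer is 2e-3

# IEEE-style path: normalize (round) after every addition
acc_ieee = 0.0
for v in vals:
    acc_ieee = round_to_bits(acc_ieee + v)

# Fused-datapath style: keep full precision internally, round once at the end
acc_fused = round_to_bits(sum(vals))
```

Because the 1.0 and -1.0 terms cancel, the per-operation rounding of the IEEE-style path leaves a larger residual error than the single final rounding, which is the accuracy advantage the fused approach trades structure for.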

New materials
Robert Rogenmoser, senior vice president of product development and engineering at SuVolta, a semiconductor company focused on reducing CMOS power consumption, discussed ways to reduce transistor variability for low-power, high-performance chips.

Transistor variability at today’s lower process geometries comes from the typical sources of wafer yield variations and local transistor-to-transistor differences. Such variability has forced the semiconductor industry to look at new transistor technologies, especially for lower power chips.

What is the solution? Rogenmoser, in his Hot Chips presentation, discussed the pros and cons of three transistor alternatives: finFET or TriGate; fully depleted silicon-on-insulator (FD-SOI); and deeply depleted channel (DDC) transistors. FinFET or TriGate technology promises high drive current but faces manufacturing, cost, and intellectual-property challenges. The latter point refers to the IP changes required to support the new 3D transistor gate structures.

According to Rogenmoser, FD-SOI transistor technology enjoys the benefits of undoped channels, but it lacks multi-voltage capability and has a limited supply chain (a point that FD-SOI supporters say has already changed). Still, SuVolta favors deeply depleted channel transistors. This process offers straightforward insertion into bulk planar CMOS, especially from 90nm to 20nm and below. Equally important is the ease of migrating existing IP to the DDC process, he explained.

Rogenmoser concluded by explaining how DDC technology can bring common low-power techniques back to lower nodes, such as dynamic voltage and frequency scaling, body biasing, and low-voltage operation.
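Dynamic voltage and frequency scaling, the first technique Rogenmoser lists, trades supply voltage and clock rate together against switching power, P = C·V²·f. The toy calculation below (the capacitance and the 1.0 V and 0.7 V operating points are illustrative assumptions) shows why scaling both beats cutting frequency alone:

```python
def switching_power(c_eff, vdd, freq):
    """Dynamic switching power: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq

full_speed = switching_power(1e-9, 1.0, 1.0e9)  # 1 W at 1.0 V, 1 GHz
freq_only  = switching_power(1e-9, 1.0, 0.5e9)  # halve f only: 2x savings
dvfs       = switching_power(1e-9, 0.7, 0.5e9)  # halve f AND drop Vdd to 0.7 V
# DVFS saves roughly 4x: the V^2 term (0.49) compounds with the frequency cut (0.5)
```

This compounding is exactly why DVFS only works on processes that support reliable low-voltage operation, which is the capability DDC is claimed to restore.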

Stacking die
Going vertical, or even horizontal through an interposer, is one of the most significant and physically observable architectural changes in the history of semiconductors. By shortening the wires and increasing the size of the data pipes, power can be reduced and performance can be increased significantly.

But how real is stacking? According to Sunil Patel, principal member of the technical staff for package technology at GlobalFoundries, it’s very real. “For 2.5D, 2014 will be a very interesting year,” said Patel. “By the end of 2013 the capability will be in place. Designs already are being considered and tried out. 3D mainly depends on memory standards and memory adoption. We’ll see a package-on-package and memory-on-logic configuration first. 3D memory has its own route, which is ahead of that. 3D memory on logic could be late 2014.”

He’s not alone in this belief. Steve Pateras, product marketing director for test at Mentor Graphics, said that from a tapeout point of view—the only window EDA companies have into architectural changes—2.5D already is happening. “We have customers taping out 2.5D. For 3D, we’re seeing design activity for memory on logic. Next year we’ll see some tapeouts.”

And Thorsten Matthias, business development director at EVGroup, said equipment is being sold to foundries right now to make this happen. “By the end of next year we believe all the major players will have production capacity for both 2.5D and 3D,” he said. “That’s probably not 20,000 to 50,000 wafers per month, but there will be production capacity at every player that wants to take a leading role. By the end of next year there will be a supply chain for 2.5D and 3D, although probably at a lower volume and for high-end products.”

Anatomy Of An Acquisition

Thursday, December 15th, 2011

By John Blyler
Lattice Semiconductor’s proposed acquisition of FPGA start-up SiliconBlue Technologies for $62 million in cash is the latest signal that the smart-phone market may be showing signs of overcrowding.

While researchers are quick to point out the growth rates of smart-phone sales versus computers, there also are an unprecedented number of companies vying for a stake in that market. Lattice’s push into adjacent markets is a hedge against that overcrowding.

Lattice until now has focused on the high end of the smart phone market. Silicon Blue targets mid-range players such as watch companies.

Doug Hunter, vice president of marketing at Lattice, said both companies occupy complementary spaces in the mobile consumer market. Silicon Blue offers a reduced feature set at lower power and with a one-time programmable (OTP) memory technology that it licensed exclusively from Kilopass. “This will allow us to go into customers with both a simpler and smaller or bigger and more fully featured suite of products,” explained Hunter.

By far the larger company, Lattice has more than $250 million in cash on the balance sheet and a solid track record, said Hunter. The company also has a much wider distribution and sales network than start-up Silicon Blue, which should help it win sales from customers that are reluctant to deal with a start-up company.

Still, Lattice has had its share of challenges in recent times, including numerous CEOs over the last six years and loss of market share to giants such as Xilinx and Altera. Hunter acknowledged these challenges, but highlighted the company’s current strategy of finding niches to “differentiate, duck, bob and weave” against the two industry giants.

The acquisition of Silicon Blue fits that strategy. In addition to its mid-range handset sales, Silicon Blue recently won a design in an unusual ultra-low-power niche market. Watchmaker giant Citizen Watch selected SiliconBlue’s extremely low-power FPGA device for use in its new Eco-Drive Satellite Wave watch. Citizen claims that this is the world’s first solar-powered GPS-synchronized watch.

One key element in Citizen’s selection was the ultra-low power of the company’s 8,000-logic-cell FPGA, based on TSMC’s 65nm low-power standard CMOS process. The other key factor was the tiny 4×5 mm footprint of the wafer-level chip package, in which the ball-grid array (BGA) is placed directly on the wafer. This ensures a very thin package, essentially the same size as the die.

Silicon Blue optimizes its designs for ultra-low power by using transistors with very fast switching speeds in critical areas of the design like clock trees. Additionally, their design makes use of the default “off” state inherent in FPGAs. “The network is only switched on when it is being used,” explained a company spokesman.

This move by Citizen to incorporate greater electronic functionality in its watches represents an interesting convergence between the worlds of traditionally mechanical-digital systems and fully electronic systems. Citizen’s Eco-Watch is a traditionally high-end timepiece that incorporates modern GPS technology. On the other side of the convergence are fully electronic systems like Apple’s Nano, a multimedia player with Wi-Fi connectivity that now incorporates a digital watch display.

Worst-Case Power Varies With Geometry

Thursday, July 8th, 2010

By John Blyler
When designing for low-power operation, engineers are constrained by the worst-case (highest power) ratings for the silicon. But the power distribution characteristics of silicon can vary significantly from wafer lot to wafer lot at the latest, lowest process geometries. How can designers deal with worst-case power ratings in their low-power, high-volume FPGA designs?

First, let’s consider the process. To establish the power distribution range for their products, FPGA vendors start with a target yield. This yield provides the initial cost structure and allows them to publish numbers based on characterization over a statistically meaningful number of wafer lots, notes Christian Plante, director of marketing for low-power and mixed-signal FPGAs at Actel. “We characterize our silicon over many lots. Thus, it can take us a little while to put worst-case numbers (for the latest geometries) into our software modeling tools.” The reason for this delay is that the latest process nodes are less tamed than the older, more established nodes.

Characterizing worst-case conditions at higher nodes like 130nm isn’t a big problem. The manufacturing processes at these geometries are well known. Thus, the power distribution curves are much tighter, with less variation.

It’s at the lower geometries, like Xilinx’s and Altera’s 28nm processes, that the power distribution between wafer lots varies the most. And while this variation will tighten up as the process matures, that will take some time.

Process variations during manufacturing also can worsen the effects of static power leakage, notes Michael Kendrick, product planning manager for Lattice Semiconductor. “As we move forward with geometries, the voltage threshold decreases, which in turn causes static power leakage to increase relative to dynamic power.” This results in a wider distribution of static power consumption over time, increasing the worst-case power constraints for FPGA designers.
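Kendrick's observation follows from the exponential dependence of subthreshold leakage on threshold voltage: with the transistor nominally off, leakage current scales roughly as exp(-Vt/(n·kT/q)). A rough model makes the sensitivity concrete (the subthreshold slope factor n and the 450 mV to 350 mV threshold drop are assumed, textbook-typical values, not any vendor's data):

```python
import math

def relative_leakage(vt, n=1.5, v_thermal=0.026):
    """Subthreshold leakage at Vgs = 0, up to a constant factor:
    exp(-Vt / (n * kT/q)), with kT/q about 26 mV at room temperature."""
    return math.exp(-vt / (n * v_thermal))

# Dropping Vt from 450 mV to 350 mV, as happens when geometries shrink
ratio = relative_leakage(0.35) / relative_leakage(0.45)
# roughly 13x more static leakage for a 100 mV threshold reduction
```

Because leakage is exponential in Vt, even modest lot-to-lot threshold variation translates into a wide spread of static power, which is the distribution-widening effect the article describes.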

Engineers are not without options. There are several techniques to mitigate the effects of static power leakage. For example, designers can be more careful about the mix of high-speed transistors used, since these transistors have higher leakage, says Kendrick. There are also process improvements that reduce leakage at 28nm.

The uncertainty of exact worst-case low-power conditions at lower geometries, like 28nm, may give vendors of higher-node FPGAs an advantage. After all, the power distribution at higher nodes is more fully understood. Less variation in the power distribution of well-known, higher-node geometries should translate to less variation in worst-case power ranges.
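The effect described above, where a wider lot-to-lot power distribution inflates the published worst-case rating even when typical power is unchanged, can be sketched with a small Monte Carlo model. Everything here is an illustrative assumption (the Gaussian shape, the 2 W mean, the sigmas, and the 3-sigma cutoff), not vendor characterization data:

```python
import random

def worst_case_rating(mean_w, sigma_w, n=100_000, seed=1):
    """Approximate a ~3-sigma worst-case power rating from a simulated
    lot-to-lot distribution (a Gaussian spread is assumed, not measured)."""
    rng = random.Random(seed)
    samples = sorted(rng.gauss(mean_w, sigma_w) for _ in range(n))
    return samples[round(0.9987 * n)]  # ~99.87th percentile, i.e. +3 sigma

mature_node = worst_case_rating(2.0, 0.1)  # tight distribution, established node
new_node    = worst_case_rating(2.0, 0.4)  # wide distribution, immature node
# same 2 W typical power, but the new node's worst-case rating is far higher
```

Designers budget against the rating, not the mean, so the wider distribution of an immature node costs margin even when the average part is no hungrier.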

But Actel’s Plante adds a note of caution, explaining that if the power distribution strays too far outside of customer expectations then the FPGA vendors can’t sell those chips—except to a customer that will accept the additional power consumption.

Further, FPGA vendors at the lower process nodes, with parts like Xilinx’s new 28nm Virtex-7 and Altera’s Stratix V product lines, offer the lower power that is inherent in the move to smaller process geometries. Xilinx also emphasizes the power benefits of scalability in its new 28nm offerings. Both its lower-end, higher-volume and high-end, higher-performance FPGA families are built on the same underlying architecture, which may help mitigate the effects of wafer power-distribution variations at the newer node.

The move to new process geometries always brings new challenges. Fully understanding the variation of power distributions within the silicon is just one of the challenges FPGA designers must grasp when designing to worst-case power conditions.