Chip Design Magazine

Posts Tagged ‘Xilinx’


Are Best Practices Resulting in a Verification Gap?

Tuesday, March 4th, 2014

By John Blyler, Chief Content Officer

A panel of experts from Cadence, Mentor, NXP, Synopsys and Xilinx debate the reality and causes of the apparently widening verification gap in chip design.

Does a verification gap exist in the design of complex systems-on-chip (SoCs)? That question is the focus of a panel of experts at DVCon 2014, which will include Janick Bergeron, Fellow at Synopsys; Jim Caravella, VP of Engineering at NXP; Harry Foster, Chief Verification Technologist at Mentor Graphics; John Goodenough, VP at ARM; Bill Grundmann, a Fellow at Xilinx; and Mike Stellfox, a Fellow at Cadence. JL Gray, a Senior Architect at Cadence, organized the panel. What follows are position statements from the panelists in preparation for this discussion. – JB

Panel Description: “Did We Create the Verification Gap?”

According to industry experts, the “Verification Gap” between what we need to do and what we’re actually able to do to verify large designs is growing worse each year. These same experts argue that we must improve our verification methods and tools before verification tasks consume the entire project schedule.

But what if the Verification Gap is actually a result of the continued adoption of industry-standard methods? Are we blindly following industry best practices while losing sight of the actual point of our efforts: to create a product with as few bugs as possible, as opposed to simply finding as many bugs as we can?

Are we blindly following industry best practices …

Panelists will explore how verification teams interact with broader project teams and examine the characteristics of a typical verification effort, including the wall between design and verification, verification involvement (or lack thereof) in the design and architecture phase, and reliance on constrained-random techniques in the absence of robust planning and prioritization, to determine the reasons behind today’s Verification Gap.

Panelist Responses:

Grundmann: Here are my key points:

  • Methodologies and tools for constructing and implementing hardware have dramatically improved, while verification processes appear not to have kept pace.  As hardware construction is simplified, there is a trend toward fewer resources building hardware but the same or more resources performing verification.  Design teams with a 3X ratio of verification to hardware-design engineers are not unrealistic, and that ratio is trending higher.

    … we have to expect to provide a means to make in-field changes …

  • As it gets easier to build hardware, hardware verification is approaching software-development levels of resourcing in a project.
  • As of now, it is very easy to quickly construct various hardware “crap”, but very hard to prove that any of it is what you want.
  • It is possible that we can never be thoroughly verification-“clean”, and will instead deliver some version of the product with a reasonable quality level of verification.  This may mean we have to provide a means to make in-field changes to the products through software-like patches.

Stellfox: Most chips are developed today based on highly configurable modular IP cores with many embedded CPUs and a large amount of embedded SW content, and I think a big part of the “verification gap” is due to the fact that most development flows have not been optimized with this in mind.  To address the verification gap, design and verification teams need to focus more on the following:

  • IP teams need to develop and deliver IP in a way that is more optimized for SoC HW and SW integration.  While the IP cores need to be high quality, delivering high-quality IP is not sufficient on its own, since much of the work today is spent integrating the IP and enabling earlier SW bring-up and validation.

    There needs to be more focus on integrating and verifying the SW modules with HW blocks …

  • There needs to be more focus on integrating and verifying the SW modules with HW blocks early and often, moving from the IP level through subsystem to SoC.  After all, the SW APIs largely determine how the HW can be used in a given application, so time might be wasted “over-verifying” designs for use cases that do not apply to a specific product.
  • Much of the work in developing a chip is about integrating design IPs, VIPs, and SW, but most companies do not have a systematic, automated approach with supporting infrastructure for this type of development work.

Foster: No, the industry as a whole did not create the verification challenge.  To say so reflects a lack of understanding of the problem.  While design grows at a Moore’s Law rate, verification grows at a double-exponential rate. Compounding the complexity driven by Moore’s Law are the additional dimensions of hardware-software interaction validation, complex power-management schemes, and other physical effects that now directly affect functional correctness.  Emerging solutions such as constrained-random, formal property checking, and emulation didn’t appear because they were just cool ideas.  They emerged to address specific problems. Many design teams are looking for a single hammer that they can use to address today’s verification challenges. Unfortunately, we are dealing with an NP-hard problem, which means that there will never be a single solution that solves all classes of problems.

Many design teams are looking for a single hammer that they can use to address today’s verification challenges.

Historically, the industry has always addressed complexity through abstraction (e.g., the move from transistors to gates, the move from gates to RTL, etc.). Strategically, the industry will be forced to move up in abstraction to address today’s challenges. However, there is still a lot of work to be done (in terms of research and tool development) to make this shift in design and verification a reality.

Caravella: The verification gap is a broad topic so I’m not exactly sure what you’re looking for, but here’s a good guess.

Balancing resource and budget for a product must be done across much more than just verification.

Verification is only a portion of the total effort, resources and investment required to develop products and release them to market. Balancing resources and budget for a product must be done across much more than just verification. Bringing a chip to market (and hence revenue) requires design, validation, test, DFT, qualification and yield optimization. Given this, and the insatiable need for more pre-tapeout verification, what is the best balance? I would say that the chip does not need to be perfect when it comes to verification and bugs; it must be “good enough”. Spending 2x the resources and budget to identify bugs that do not impact the system or the customer is a waste. Those resources could be better spent elsewhere in the product-development food chain, or used to do more products and grow the business. The main challenge is how best to quantify the risk to maximize the ROI of any verification effort.

Jasper: [Editor’s Note: Although not part of the panel, Jasper provided an additional perspective on the verification gap.]

  • Customers are realizing that UVM is very “heavy” for IP verification.  Specifically, writing and debugging a UVM testbench for block- and unit-level IP is a very time-consuming task in and of itself, and it incurs an ongoing overhead in regressions when the UVCs are effectively “turned off” and/or simply used as passive monitors for system-level verification.  Increasingly, we see customers ditching the low-level UVM testbench and exhaustively verifying their IPs with formal techniques.  In this way, users can focus on system-integration verification and not have to deal with bugs that should have been caught much sooner.

    UVM is very “heavy” for IP verification.

  • Speaking of system-level verification: we see customers applying formal at this level as well.  In addition to the now-familiar SoC connectivity and register-validation flows, we see formal replacing simulation in architectural design and analysis.  In short, even without any RTL or SystemC, customers can feed an architectural spec into formal under the hood to exhaustively verify that a given architecture or protocol is correct by construction, won’t deadlock, etc.
  • The need for sharing coverage data between multiple vendors’ tool chains is increasing, yet companies appear to be ignoring the UCIS interoperability API.  This is creating a big gap in customers’ verification closure processes because it’s a challenge to compare verification metrics across multi-vendor flows, and they are none too happy about it.

Blog Review – Feb. 24 2014

Monday, February 24th, 2014

By Caroline Hayes, Senior Editor

ARM prepares for this week’s Embedded World in Nuremberg; Duo Security looks at embedded security; Xilinx focuses on LTE IP; Ansys rejoices in fluent meshing and Imagination strives to define graphic cores for comparison.

Equipped with a German phrase app on his (ARM-based) smartphone, Philippe Bressy is looking forward to Embedded World 2014, held in Nuremberg this week. His blog has some handy tips for tackling the show and why it is worth visiting the company’s stand and technical conference programme.

Anticipating his presentation at 2014 RSA Conference, Mark Stanislav, Duo Security, shares some insight into Internet of Things security. In the New Deal of Internet-Device Security, he explores security features in a mobile society for individuals, companies and governments.

Another exhibition happening in Europe this week is Mobile World Congress, where Xilinx’s Steve Leibson looks at 4G and LTE proliferation and the latest IP from the company to support point-to-point and multipoint line-of-sight communications for 60GHz and 80GHz radio backhaul.

The virtues, even joys, of fluent meshing are put under the spotlight by Andy Wade, Ansys, in his blog. He considers the trends in CAE simulation, including innovations such as 3D and more complex geometries. There is also a link to the company’s top tech tips.

An interesting blog from Imagination Technologies attempts to compare graphics processors accurately. Rys Sommefeldt sets out what and how cores can be combined and used, and most importantly, how to compare like with like.

Blog Review January 06

Monday, January 6th, 2014

Happy 2014! This week the blog review is full of optimism for what the next 12 months has in store. By Caroline Hayes, senior editor.

A new year promises new hopes and, surely, some new technology. In Technology and Electronics Design Innovation: Big Things, Small Packages, Cadence’s Brian Fuller looks at some technology that caught his eye and some of the challenges and even moral dilemmas they may pose.

More predictions for 2014, as Intel’s Doug Davis looks forward to CES (Consumer Electronics Show). In Internet of Things: Transforming Consumer Culture and Business he looks ahead to CEO Brian Krzanich’s fireside chat (Fireside? In Vegas- really?) and urges visitors to grab a coffee from a vending machine. (Intel are at CES booth 7252.)

In a case of “What were they thinking?”, Dustin Todd laments recent government action that he believes has increased, not reduced, the threat of counterfeit chips. The SIA (Semiconductor Industry Association) feels thwarted and this debate is likely to go on for some time.

Still with CES, but with one eye on the TV schedules, Steve Leibson at Xilinx delves into the Vanguard Video H.265/HEVC codec that was used in the 4K trailer for the Emmy-winning House of Cards. Vanguard is demonstrating the codec this week in Las Vegas.

Blog Review – Dec 09

Monday, December 9th, 2013

Google encourages the world to wish the Queen of Software a happy birthday and prompts a revival of interest in her – and perhaps the role of women in technology; there is more news on the value of FinFET vs FDSOI and ARM looks back and looks ahead at DSP support and plays with RFduino.

If you googled anything today, you will have seen the graphic celebrating Grace Hopper’s 107th birthday. However, if you visit Harvard professor Harry Lewis’s blog you will be charmed by a video link there showing the lady herself interviewed by a (very young) David Letterman. I thought she was taking out her knitting at one point, but it is actually a visualization of a nanosecond!

Sandy Adam is also a Grace Hopper fan. His blog celebrates “the queen of software” before looking to the next generation to take the crown as he introduces the Hour of Code project, part of Computer Science Education Week.

Drew Barbier’s ARM blog has fun with a Kickstarter project called RFduino. As the name implies, RFduino is an Arduino shrunk to the size of a fingertip with added wireless. Cool!

In the blink of an eye, Steve Leibson’s Xilinx blog caught my attention as the author explains the mechanics behind the Zynq SoC driving an LED connected as a GPIO.

Mentor’s Arvind Narayanan poses some tough questions about FDSOI versus FinFET: does FDSOI offer better performance and power than bulk; will FDSOI at 20nm bridge the gap to 16nm FinFET; and what about cost? The war of words is well illustrated and may challenge your perceptions.

Jeffrey Gallagher, Cadence, suggests getting crafty to maximize the potential of vias earlier in the flow, to save time and energy – what’s not to like?

There is a great showreel at the blog by Aurelien, Dassault Systèmes, showcasing the company’s acquisition of German 3D visualization company, RealTime Technology.

Looking back and also looking ahead, Richard York, ARM, proudly relates his finest DSP moment, with the announcement that MATLAB’s Simulink environment will generate optimised code for ARM through calls to the CMSIS DSP libraries. Looking ahead, the demo will be running at Embedded World in Nuremberg, in February.

Blog Review – Dec 02

Monday, December 2nd, 2013

By Caroline Hayes, senior editor

While everyone else hits the shops, one blogger wants to see presents fall from the sky. This week there is also a sleek design project, new uses for SoCs, an Arduino project (with video), a hypervisor discussion and a summit review.

The power of the blog – it just keeps giving, as Xilinx’s Steve Leibson demonstrates. He references Dave Jones, proprietor of the EEVblog video blog, who discusses “Automated PCB Panel Testing”, which set the author to thinking about how Zynq All Programmable SoCs can be used in a new way – as ATE ports.

Sleek good looks are not just the preserve of the fashion world. Take a look at bleu, the CATIA design showcar that was developed with Dassault Systèmes’ technologies and with CATIA designers and engineers working in concert to produce a symphony of aesthetically pleasing, aerodynamic design to a very tight schedule. Arnaud’s blog has video that shows off the sleek car while revealing the design process.

A less conventional form of transport is occupying Eric Bantegnie, Ansys, who is getting excited about online retailer Amazon using drones, called octocopters, to deliver products to customers 30 minutes after they click the ‘buy’ button. Sadly, drone delivery is a few years away from being a reality. Until the technology arrives, Bantegnie will have to traipse around the shopping malls for presents this year, like everyone else!

Talking of presents and new toys, Drew Barbier is reflecting on what to do with his RFduino – the module based on a Nordic Semiconductor nRF51822, with ARM Cortex-M0 and Bluetooth 4.0 LE support. Making up for the time lost when he first contemplated the project, he seems to have had great fun shrinking the Arduino to the size of a fingertip and adding wireless.

Another generous blogger is Colin Walls, Mentor, who continues the Embedded Hypervisor discussion with a mixture of compliments: quoting a colleague’s view on virtualization, but questioning his culinary skills…

Finally, Richard Goering, Cadence, reports from the Signoff Summit, reviewing the technology behind the Tempus Timing Signoff Solution, and offering some insight into the challenges in static timing analysis.

Network on Chip Solutions Are Gaining Market Share

Thursday, November 14th, 2013

by Gabe Moretti, Contributing Editor

It is important to note how thoroughly network-on-chip (NoC) architectures have established themselves as the preferred method of connectivity among IP blocks.  What I found lacking are tools and methods that help architects explore the chip topology in order to minimize the use of interconnect structures and to evaluate bus-versus-network tradeoffs.


Of course there are busses for SoC designs, most of which have been used in designs for years.  The most popular is the AMBA bus, first introduced in 1996.  To date there are five versions of the AMBA specification.  The latest, introduced this year, is AMBA 5 CHI (Coherent Hub Interface), which offers a new high-speed transport layer and features aimed at reducing congestion.

Accellera offers the OCP bus developed by OCP-IP before it merged with Accellera.  It is an openly licensed protocol that allows the definition of cores ready for system integration that can be reused together with their respective test benches without rework.

The OpenCores open source hardware community offers the Wishbone bus.  I found it very difficult to find much information about Wishbone on the website, with the exception of three references to implementations using this protocol.  Wishbone is not a complete bus definition, since it has no physical definitions.  It is a logic protocol described in terms of signals and their states and clock cycles.

Other bus definitions are proprietary.  Among them designers can find Quick Path from Intel, Hyper Transport from AMD, and IPBus from IDT.

IBM has defined and supports the Core Connect bus that is used in its Power Architecture products and is also used with Xilinx’s MicroBlaze cores.

Finally, Altera uses its own Avalon bus for its Nios II product line.

Clearly the use of busses is still quite pervasive.  With the exception of proprietary busses, designers have the ability to choose both physical and protocol characteristics that are best suited for their design.

Network on Chip

There are two major vendors of NoC solutions: Arteris and Sonics.  Arteris is a ten-year-old company headquartered in Sunnyvale but with an engineering center near Paris, France.  Its technology is derived from computer-networking solutions, modified to the requirements of SoC realization.  Its products deal with on-chip as well as die-to-die and multi-chip connectivity.

Sonics was founded in 1996.  In addition to network-on-chip products it also offers memory subsystems, tools for performance analysis, and development tools for SoC realization.  It offers six products in the NoC market covering many degrees of sophistication, depending on designers’ requirements.  SonicsGN is its most sophisticated product: a high-performance network for the transport of packetized data, utilizing routers as the fundamental switching elements.  SonicsExpress, on the other hand, can be used as a bridge between two clock domains with optional voltage-domain isolation.  It supports the AXI and OCP protocols and thus can be integrated into those bus environments.

After the panel discussion on IP Blocks Connectivity, which covered mostly NoC topics, I spoke with Avi Behar, Product Marketing Director at Cadence.  Cadence had wanted to participate in the discussion, but their request came too late to include them in the panel.  Information is important, however, and scheduling matters should not become an obstacle, so I decided to publish their contribution in this article.

The first question I asked was: on-chip connectivity uses area and power and also generates noise.  Have we addressed these issues sufficiently?

Avi: A common tendency among on-chip network designers is to over-design. While it’s better to be on the safe side – under-designing will lead to starvation of bandwidth-hungry IPs and failure of latency-critical IPs – over-designing has a cost in gate count and, as a result, in power consumption. To make the right design decisions, it is crucial that design engineers run cycle-accurate performance-analysis simulations (by performance I primarily mean data bandwidth and transaction latency) with various configurations applied to their network. By changing settings like outstanding transactions, buffer depth, bus width, QoS settings and the switching architecture of the network, and running the same realistic traffic scenarios, designers can get to the configuration that meets the performance requirements defined by the architects without resorting to over-design. This iterative process is time consuming and error prone, and this is where the just-launched Cadence Interconnect Workbench (IWB) steps in. By combining the ability to generate a correct-by-construction test bench tuned for performance benchmarking (the RTL for the network is usually generated by tools provided by the network IP provider) with a powerful performance-analysis GUI that allows side-by-side analysis of different RTL configurations, IWB greatly speeds this iterative process while mitigating the risks associated with manual creation of the required test benches.
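The configuration sweep Avi describes can be sketched in a few lines. The following Python toy is purely illustrative – it is not Cadence IWB, and the latency model and parameter ranges are invented assumptions, not real tool APIs. It scores each combination of outstanding transactions and bus width against a latency budget and keeps the cheapest configuration that meets it:

```python
from itertools import product

def estimate_latency(arrival_rate, service_rate, outstanding, bus_width):
    # Toy M/M/1-style model: effective service rate scales with bus width
    # and the number of outstanding transactions the network can absorb.
    mu = service_rate * bus_width * outstanding
    if arrival_rate >= mu:
        return float("inf")           # saturated: latency is unbounded
    return 1.0 / (mu - arrival_rate)  # mean transaction latency

def cheapest_config(arrival_rate, latency_budget):
    # Cost proxy: wider buses and more outstanding transactions cost gates.
    best = None
    for outstanding, width in product([1, 2, 4, 8], [1, 2, 4]):
        lat = estimate_latency(arrival_rate, 1.0, outstanding, width)
        cost = outstanding * width
        if lat <= latency_budget and (best is None or cost < best[0]):
            best = (cost, outstanding, width, lat)
    return best

print(cheapest_config(arrival_rate=3.0, latency_budget=0.5))
```

In a real flow, `estimate_latency` would be replaced by cycle-accurate simulation results per configuration; the point is only that the selection loop itself is mechanical once those numbers exist.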

What type of work do we need to do to be ready to have a common, if not standard, verification method for a network type of connectivity?

Avi: There are two aspects to the verification of networks-on-chip: functional verification and ‘performance verification’. Functional verification needs to be addressed at two levels: first, make sure that all the ports connected to the network are compliant with the protocol (say AMBA3 AXI or AMBA4 ACE) that they implement; second, verify that the network correctly channels data between all the master and slave nodes connected to it. As for performance verification, while the EDA industry has been focusing on serving the SoC architect community with virtual prototyping tools that utilize SoC models for early-stage architectural exploration, building cycle-accurate models of the on-chip network that capture all the configuration options mentioned above is impractical. As the RTL for the connectivity network is usually available before the rest of the IP blocks, it is the best vehicle for performing cycle-accurate performance analysis. Cadence’s IWB, as described in the previous answer, can generate a test bench tuned for running realistic traffic scenarios and capturing performance metrics. IWB can also generate a functional verification testbench that addresses the two aspects I mentioned earlier – protocol compliance at the port level and connectivity across the on-chip network.
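The connectivity half of the functional check Avi describes is conceptually a diff between the intended connectivity map and the routes actually observed in simulation. As a hedged sketch (a toy set comparison, not a real verification flow; the master and slave names are invented):

```python
def check_connectivity(spec, observed):
    # spec: intended master -> set-of-slaves map from the architecture.
    # observed: (master, slave) pairs actually exercised in simulation.
    intended = {(m, s) for m, slaves in spec.items() for s in slaves}
    missing = intended - observed   # legal routes never exercised/working
    illegal = observed - intended   # routes that should not exist
    return sorted(missing), sorted(illegal)

spec = {"cpu": {"ddr", "sram"}, "dma": {"ddr"}}
observed = {("cpu", "ddr"), ("dma", "ddr"), ("dma", "sram")}
print(check_connectivity(spec, observed))
```

A production testbench would of course drive real transactions to populate `observed`; the comparison at the end, however, is exactly this simple.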

What do you think should be the next step?

Avi: Many of our big SoC designing customers have dedicated network-on-a-chip verification teams who are struggling to get not only the functionality right, but ever more importantly, get the best performance while removing unnecessary logic. We expect this trend to intensify, and we at Cadence are looking forward to serving this market with the right methodologies and tools.

The contribution from Cadence reinforced the points of view expressed by the panelists.  It is clear that engineers have many options for enabling communication among IP blocks and among chips and dies.  What no one mentioned was the need to explore the topology of a die with a view to developing the best possible interconnect architecture in terms of speed, reliability, and cost.

Research Review Nov.13

Tuesday, November 12th, 2013

By Caroline Hayes, Senior Editor

Polymers snap to it, when triggered by light; algorithmic memory is developed for ASICs and SoCs; UDP/IP engines target Stratix V and Virtex 7 FPGAs and supercapacitors stretch their limits.

Maybe not faster than a beam of light, but influenced by it: researchers at the University of Pittsburgh Swanson School of Engineering have found polymers that snap when triggered by a beam of light, converting light energy into mechanical action. It is like a Venus flytrap, mused M Ravi Shankar, associate professor of industrial engineering, who conducted the research in collaboration with Timothy J White of the Air Force Research Laboratory at Wright-Patterson Air Force Base and Matthew Smith, assistant professor of engineering at Hope College, Holland, Michigan (Early Edition of the Proceedings of the National Academy of Sciences – PNAS). Although the underlying mechanism of a Venus flytrap is slow, the unsuspecting prey is caught because the trap uses elastic instability, which snaps it shut – tight. The project examined polymeric materials’ actuation rates and output. A handheld laser provided the light, allowing the polymers to convert it into mechanical action without a power source or wiring. Specific functions were pre-programmed into the material so that it could be controlled by changing the character of the light. The technology could eliminate traditional machine components and, speculates Dr Shankar, could also lead to biomedical devices that are adaptive and easily controlled.

STMicroelectronics and Memoir Systems have collaborated to create Algorithmic Memory Technology for embedded memories.
Integrated into ASICs (application-specific integrated circuits) and SoCs (systems on chips) manufactured in the former’s FD-SOI (fully-depleted silicon-on-insulator) process technology, the memories are able to exploit the process’s power and performance advantages while combining a low soft-error rate with ultra-low leakage currents, claims the company. This makes them particularly suitable for mission-critical applications, such as transportation, medical, and aerospace programs. The soft-error rate is 50 to 100 times better than equivalent bulk technology, measured below 10FIT/Mbit (Failure-in-Time, or failures per billion chip-hours). Devices produced in FD-SOI are also found to deliver as much as 30% faster performance and as much as 30% greater energy efficiency when compared with the same products manufactured in bulk technology.
According to STMicroelectronics, FD-SOI process technology produces ASICs and SoCs that run faster and cooler than devices built in alternative process technologies. Adding third-party IP (intellectual property) from Memoir Systems demonstrates how simple porting is, says the company.
The European semiconductor supplier is the first to make FD-SOI process technology available. It extends and simplifies existing planar, bulk-silicon manufacturing; an FD-SOI transistor operates at higher frequencies than an equivalent transistor manufactured in bulk CMOS because of improved transistor electrostatic characteristics and a shorter channel length, says the company.
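To put the quoted soft-error figure in perspective: FIT counts failures per billion device-hours, so it converts directly to a mean time between failures. The sketch below does the arithmetic; the 64 Mbit memory size is an illustrative assumption, not a figure from the article.

```python
def fit_to_mtbf_years(fit_per_mbit, mbits):
    # FIT = failures per 1e9 device-hours, so MTBF = 1e9 / total_FIT hours.
    total_fit = fit_per_mbit * mbits
    mtbf_hours = 1e9 / total_fit
    return mtbf_hours / (24 * 365)

# FD-SOI memory at 10 FIT/Mbit vs. bulk at the claimed worst case of
# 100x more (1000 FIT/Mbit), both for a hypothetical 64 Mbit array.
print(round(fit_to_mtbf_years(10, 64), 1))
print(round(fit_to_mtbf_years(1000, 64), 1))
```

The two prints land at roughly 178 years versus under 2 years of soft-error MTBF for this array size, which is why the FD-SOI numbers matter for mission-critical parts.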

Second-generation UDP/IP (user datagram protocol/internet protocol) off-load engines were demonstrated at the Low Latency Summit in New York.
Digital Blocks released the DB-UDP-IP-HFT IP core hardware stack / UDP Off-Load Engine (UOE) targeting Altera Stratix V and Xilinx Virtex 7 FPGAs on network adapter cards with one or more 10/40Gbit Ethernet network links. The engine targets trading systems with sub-microsecond packet transfers between network wire and host. The embedded DMAC (direct memory access controller) controls low-latency, parallel packet-payload transfers between memory and the company’s UDP/IP packet engines in, for example, financial trading companies.

Researchers at the University of Delaware have developed a compact, stretchable, wire-shaped supercapacitor based on continuous carbon nanotube fibers. Wire-shaped supercapacitors could bring the advantages of recharging in seconds, a longer life span than conventional batteries, and reliable, robust storage to wearable devices.
University of Delaware professors Tsu-Wei Chou and Bingqing Wei developed the supercapacitor using a prestraining-then-buckling approach, with a Spandex fiber as the substrate, a polyvinyl alcohol-sulfuric acid gel as the solid electrolyte, and carbon nanotube fibers as the active electrodes.
When subjected to a tensile strain of 100% over 10,000 charge/discharge cycles, the carbon nanotube supercapacitor’s electrochemical performance actually improved, to 108% of its initial value, demonstrating its electrochemical stability.
Wei explains that the network of individual carbon nanotubes and their bundles endows the fibers with the capacity to withstand large deformation without sacrificing electrical conductivity or mechanical and electrochemical properties.
The professors published their findings in Advanced Energy Materials.

Pictured: Tsu-Wei Chou (left) with visiting scholar, and first author on the report, Ping Xu. Photo – Ambre Alexander

Using Power Aware IBIS v5.0 Behavioral IO Models To Simulate Simultaneous Switching Noise

Thursday, April 25th, 2013

Typically, simultaneous switching noise (SSN) transient simulations require significant CPU and RAM resources. A prominent factor affecting both is the number of MOSFET models included in the post-layout extracted IO netlists. By replacing the IO netlists with power-aware IBIS v5.0 behavioral models, both the CPU and RAM requirements are dramatically reduced. A comparison of several SSN transient simulations, in which the aggressor frequency is swept across a wide range, is shown. The resultant victim waveforms clearly demonstrate that each SSN transient simulation using post-layout extracted IO netlists requires days to run, compared with mere minutes using power-aware IBIS v5.0 behavioral models. Most notably, there is no significant loss in accuracy. In fact, in many cases there is an increase in accuracy, because the convergence issues associated with post-layout extracted IO netlists are avoided. The power-aware IBIS v5.0 behavioral models offer both dramatically faster transient simulation times and lower memory requirements. Improvements to these two key metrics, without sacrificing accuracy, allow more aggressive and accurate signal- and power-integrity analysis than has previously been possible.

To view this paper, click here.

Fundamental Laws of (FPGA) Nature: Similar, Yet Different

Monday, January 14th, 2013

Lattice and Xilinx muse on parallelism, partial reconfigurability, and the state-of-the-art in IP and EDA tools.

Most hardware and software designers end up dealing with FPGAs in some way or another. Either the system they’re working on incorporates one or more FPGAs and they have to write code or create logic to deal with them, or they simulate hardware behavior using a functionally-accurate simulator based upon FPGA reprogrammable logic. Because of this familiarity, many taken-for-granted FPGA truisms – let’s call them “laws of FPGA nature” – go unchallenged. We’re going to debunk a few of them here.

For example, designers assume that FPGAs always get bigger, denser and more expensive. Or that coding one up requires a mystical knowledge of C, HSPICE, HDL, RTL, and TLC finesse. It’s also a given that FPGAs are power hogs and are incapable of being used in low power designs like mobile handsets or tablet computers or the ultimate mobile device – your car. On the other hand, FPGAs are so flexible – essentially a blank sea of gates canvas – that low levels of abstraction (LUTs, MUXes, crossbars, NAND gates and so on) are fundamental building blocks that take huge effort to form into complex logic like processors, interface drivers, or MPEG decoders.

To answer these questions and more for this issue's Roundtable Q&A, we turned to two of the biggest names in the business: Lattice and Xilinx. While it might seem a better match would be found between Altera and Xilinx, everyone lumps A and X together.  Let's face it, they play leapfrog all the time and their product lines are materially similar at the high-density end of the market. Lattice, on the other hand, is more PLD-like and focuses on the cost-effective end of the market (Figure 1). However, Lattice remains surprisingly similar in capability to companies like Xilinx in hard logic integration, IP, EDA tool suites, and target markets.  In fact, Lattice probably has a better chance of deploying FPGAs in smartphones, while Xilinx is really close to shipping Zynq-7000 SoCs into cars.


Lattice and Xilinx weigh in on the same set of questions, and their answers are at times in lockstep (IP, tools) or at opposite ends of the market (partial reconfiguration). Together, our experts offer a fabulous overview of the market from small- to high-density FPGAs.

EECatalog: Let’s face it, designing FPGAs is difficult and requires special knowledge, tools, and a mindset different from either coding or hardware layout.  Yet the FPGA, PLD, and EDA vendors are improving tool suites all the time.  What are some of the latest advances and what are some of the ones designers still are clamoring for?


According to Mike Kendrick, Director of Software Marketing, Lattice Semiconductor: There have been solid advances in providing designers pre-built functional blocks that speed up their design entry, design verification and timing closure tasks.  For the foreseeable future, the HDL design flow continues to be the best alternative for users engaged in lower density programmable logic designs, as it gives them the control they need to hit their aggressive cost and performance targets.  In larger density designs, HW/SW co-design flows, where functionality can be moved easily between SW and HW, have the promise of moving system cost/performance to an entirely new level.  However, these flows will take a long time to perfect, and will require users to acquire new skills.  The more immediate need, where the processor is integrated on-chip with the FPGA, is a new class of cross-domain debugging tools to provide the visibility and control that embedded designers expect from their current discrete processor solutions.


Responds David Myron, Xilinx Director of Platform Technical Marketing: In a word, productivity. Productivity lowers our customers' costs and enables them to get their end products to market faster. Next-generation design tools are focusing on what we consider the two pillars of productivity: integration and implementation.

The first pillar entails integrating a variety of IP from multiple domains, like algorithmic IP written in C/C++ and SystemC, RTL-level IP, DSP blocks, and connectivity IP. Not only must this IP be integrated successfully, but it must also be verified quickly—as individual blocks and as an entire system. For integration of differing types of IP, for example, the latest integration solutions provide an interactive environment to graphically connect cores provided by third parties or in-house IP using interconnect standards such as AMBA-AXI4. With easy drag-and-drop integration at the interface level, these solutions can guarantee that the system is structurally correct by construction through DRC checks.
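The "correct by construction" idea behind interface-level DRC checks can be sketched in a few lines: before any implementation runs, every master-to-slave link is checked for protocol compatibility and illegal fan-out. This is a toy model, not any vendor's actual tool API; the netlist representation and rule set here are hypothetical.

```python
# Toy interface-level structural DRC: each link carries the protocol claimed
# by its master and slave side; mismatches and multiply-driven masters are
# flagged before implementation. All names are illustrative only.

def check_connections(connections):
    """Return DRC violations for (master, slave, m_protocol, s_protocol) links."""
    violations = []
    seen_masters = {}
    for master, slave, m_proto, s_proto in connections:
        if m_proto != s_proto:
            violations.append(f"protocol mismatch on {master}->{slave}: "
                              f"{m_proto} vs {s_proto}")
        seen_masters.setdefault(master, []).append(slave)
    for master, slaves in seen_masters.items():
        if len(slaves) > 1:   # point-to-point links only in this toy model
            violations.append(f"{master} drives multiple slaves: {slaves}")
    return violations

links = [
    ("cpu.m_axi", "ddr.s_axi",  "AXI4", "AXI4"),       # legal link
    ("dma.m_axi", "uart.s_axi", "AXI4", "AXI4-Lite"),  # protocol mismatch
]
print(check_connections(links))
```

The point is that such checks are cheap and run at the interface level, long before a structural error would surface in simulation.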

The second pillar involves the capability of implementing multi-million logic cell designs for optimal quality of results in the shortest time possible. Because designs continue to increase in size and complexity, next-generation solutions now use a single, scalable data model throughout implementation, giving users insight into design metrics such as timing, power, resource utilization, and routing congestion early in the implementation process. With up to a 4x productivity advantage over traditional development environments, the Xilinx Vivado Design Suite attacks the major bottlenecks in programmable systems integration and implementation.


For instance, design changes are inevitable but schedules are often inflexible. Tools now allow small changes to be processed quickly by re-implementing only the affected parts of the design, making each iteration faster. The latest tools can take a placed-and-routed design and let a designer make ECO changes such as moving instances, rerouting nets, or tapping registers to primary outputs for debug—all without going back through synthesis and implementation.
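The incremental-ECO idea described above boils down to dependency tracking: cache what was last implemented, and re-run only blocks whose source changed. A minimal sketch of that flow, with made-up block names and a trivial hash in place of a real place-and-route run:

```python
# Minimal model of incremental (ECO) reimplementation: only blocks whose
# source changed since the last run are re-implemented; unchanged blocks
# reuse cached results. Illustrative only, not any vendor's flow.

cache = {}  # block name -> hash of the source that was last implemented

def implement(blocks):
    """Return the sorted list of blocks that actually had to be re-run."""
    rerun = []
    for name, src in blocks.items():
        h = hash(src)
        if cache.get(name) != h:   # changed, or never built
            cache[name] = h        # stand-in for re-running place & route
            rerun.append(name)
    return sorted(rerun)

design = {"cpu": "rtl v1", "dsp": "rtl v1", "io": "rtl v1"}
print(implement(design))   # first pass: every block is implemented
design["dsp"] = "rtl v2"   # small ECO change to one block
print(implement(design))   # only the changed block re-runs
```

The second call re-implements only `dsp`, which is why iteration time scales with the size of the change rather than the size of the design.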

EECatalog: Partial reconfiguration on-the-fly is something major FPGA vendors have been talking about for a while. What’s new?

David Myron, Xilinx: Partial reconfiguration technology allows dynamic modification of FPGA logic by downloading partial bit files without interrupting the operation of the remaining logic. Designers can reduce system cost and power consumption by fitting sophisticated applications into the smallest possible device. This has been particularly useful with our customers developing space applications, software defined radio, communications, video and automotive markets. Using space systems as an example, ‘upgrades’ via partial reconfiguration reduce non-volatile rad-hard memory requirements—an expensive and limited resource on in-flight systems. Partial reconfiguration is available in the full line of 7 series FPGAs and Zynq-7000 SoCs, with new capabilities including dedicated encryption support and partial bitfile integrity checks.
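The host-side sequence Myron describes, integrity-check a partial bitfile and then load it into one region while the rest of the device keeps running, can be sketched as follows. Real devices perform the check in dedicated hardware; the CRC trailer format, function names, and region model here are all hypothetical.

```python
# Hypothetical host-side partial reconfiguration sequence: verify the
# partial bitfile's integrity, then swap only the targeted region's
# configuration. The CRC scheme and region model are illustrative.
import zlib

def make_partial_bitfile(payload: bytes) -> bytes:
    """Append a CRC32 trailer so the loader can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def load_partial(region_state, region, bitfile: bytes):
    payload, crc = bitfile[:-4], int.from_bytes(bitfile[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("partial bitfile failed integrity check")
    region_state[region] = payload   # only this region's logic changes
    return region_state

regions = {"A": b"filter_v1", "B": b"codec_v1"}
load_partial(regions, "A", make_partial_bitfile(b"filter_v2"))
print(regions)  # region A updated; region B keeps running untouched
```

A corrupted bitfile is rejected before it can disturb the device, which is the point of the integrity-check capability mentioned above.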

Kendrick, Lattice: PROTF (Partial Reconfiguration On the Fly) has been an interesting area of research for many years.  The latest advances by certain FPGA vendors, while showing solid progress, still leave a lot of issues unresolved.

The primary obstacles to PROTF have always been more “design-flow” oriented than “silicon enablement” oriented.  The “silicon enablement” challenge has been largely understood, and solved, for many years; however, it carries a significant silicon area overhead and so is not economically viable unless the customer’s designs actually leverage the PROTF capabilities.  On the other hand, the “design-flow” challenges are quite substantial, and remain unsolved.  As one of many examples, users will need a method to simulate (and debug) their design functioning during reconfiguration to ensure that their system level design is operating correctly.  While certain vendors have recently demonstrated design flows that deploy PROTF when targeting a very narrow set of highly algorithmic, computationally intense problems, no one has demonstrated any capability to deliver such benefits to the design flow for “typical” digital logic systems.

EECatalog: FPGAs get bigger, denser, and more SoC-like.  What is doable today that was unheard of only 3 years ago?

Kendrick, Lattice: Not all FPGAs are getting bigger, and the market for lower density devices is growing.  For example, while the breadth of densities that Lattice offers is increasing, we are more focused on creating the lowest cost, lowest power solution at a given density.  For instance, our MachXO2 FPGA, despite its low cost and low power, includes hard logic for commonly used interfaces, including SPI and I2C.  Our mixed signal Platform Manager product integrates analog circuits with programmable logic specifically to reduce the cost of power management within more complex systems. Our iCE40 FPGA uses an extremely small (and unique) non-volatile programming cell combined with an innovative programming architecture to enable a new low cost standard for programmable logic.

Myron, Xilinx: Access to “bigger” devices is a natural customer requirement. The “denser” devices, particularly All Programmable 3D FPGAs, open more opportunities in test, measurement and emulation markets.  The density and integration of the fabric—including CLBs, Block RAM and DSP blocks—allow performance levels that are not available in multi-chip solutions because of chip-to-chip delay.

SoC [FPGA] architectures such as Zynq alleviate multi-chip solutions, and have opened up new markets requiring high speed signal processing and real-time responsiveness. Having the complete processing system linked to the FPGA fabric allows architects to partition their design into software in the processing sub-system or accelerators in the FPGA fabric, all on one integrated chip.

EECatalog: The fastest growing markets on the planet deal with wireless connectivity.  FPGAs have a strong play in the infrastructure—but what’s required to get their power down enough to be deployed in the actual battery-powered embedded device?  Does this affect other markets/systems as well?

Kendrick, Lattice: There are at least two distinct markets: the bandwidth-driven wireless infrastructure market and the power-driven mobile device market.

First, to answer whether FPGA power can be sufficiently reduced, it already has been.  Our iCE40 and MachXO2 FPGA families achieve both mobile-friendly static power levels (~10-50µW) and consumer market-friendly costs (~$1.00 ASP).
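A back-of-the-envelope check shows why static power in the quoted 10-50 µW range is mobile-friendly. The battery figures below are assumptions (a typical CR2032 coin cell), not numbers from the interview:

```python
# Rough battery-life estimate for an FPGA leaking ~50 uW (the upper end of
# the range quoted above). Idealized: static draw only, no switching
# activity, perfect battery. Battery parameters are assumed.
battery_mah   = 225        # typical CR2032 coin cell capacity (assumption)
battery_volts = 3.0
static_uw     = 50.0       # upper end of the quoted 10-50 uW range

energy_j = battery_mah / 1000 * 3600 * battery_volts   # mAh -> joules
seconds  = energy_j / (static_uw * 1e-6)
print(f"{seconds / 86400 / 365:.1f} years of static operation")
```

Roughly a year and a half on a coin cell from leakage alone, which is the kind of margin a duty-cycled mobile design needs.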

Yes, there are significant tradeoffs required at every level of the ecosystem in order to develop products for one market versus the other.  Fundamentally, one ecosystem is driven by high-speed switching, while the other is driven by low-power operation.  With that in mind, the following tradeoffs must be made:

  1. Speed/Power Process Tradeoff: The types of processes that are used to design bandwidth-driven infrastructure FPGAs have far too much static leakage power to also support mobile devices, while the processes that can support mobile devices with very low static leakage power have slightly slower transistors.
  2. Design Tradeoff: Today many FPGAs are designed using NMOS pass gates in the routing fabric (for cost and speed), while low power mobile FPGAs must employ full CMOS pass gates in the routing fabric.  One design cannot effectively support both markets.
  3. Interface Standards: The infrastructure market demands very high-performance IOs – from high-speed SERDES (PCIe, etc.) to high speed memory interfaces (such as DDR3).  The mobile market has a very different set of interface standards; for example, the MIPI Alliance is driving a new set of very low power IO interfaces such as D-PHY and M-PHY.  So, the infrastructure and mobile ecosystems have very different IO interface requirements and one design cannot effectively support both markets.
  4. Package Requirements: The infrastructure market demands very high IO counts (typically ~400-800), which drive very large and expensive packages (currently flip-chip is the technology of choice while, most recently, 3D/TSV package technology is being developed).  The mobile ecosystem is at the opposite end of the spectrum, where size and board space are at a premium.  As a result, the focus here is on small packages (typically 2mm x 2mm) with fewer IOs (typically ~20-40) and aggressive ball pitch (typically 0.4mm) in order to maximize IO count while minimizing board footprint.

These two unique markets drive two fundamentally different FPGA solutions – and the differences exist at every level.
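The package numbers in the tradeoff list above are internally consistent, as a quick calculation shows: a 2 mm x 2 mm package at 0.4 mm ball pitch supports roughly a 5 x 5 ball grid, in line with the "~20-40 IOs" figure once power and ground balls are subtracted.

```python
# Sanity check of the mobile package figures quoted above: how many balls
# fit on a 2 mm x 2 mm package at 0.4 mm pitch?
package_mm = 2.0
pitch_mm   = 0.4

balls_per_side = round(package_mm / pitch_mm)   # 5 balls per edge
total_balls    = balls_per_side ** 2
print(f"{balls_per_side} x {balls_per_side} grid = {total_balls} balls")
```

Twenty-five balls total, several of which must carry power and ground, leaves an IO count at the low end of the quoted range.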

EECatalog: The two biggest features of FPGAs are parallelism and raw bandwidth/throughput.  What’s new in these areas at the chip- and system-level?

Kendrick, Lattice: FPGAs certainly give designers the ability to implement parallel algorithms, and thus increase a system's throughput when this is applied to a bottleneck.  Lattice, for example, provides a complete system-building solution with our LatticeMico System Builder and, uniquely in the industry, a choice of both a 32-bit microprocessor and an 8-bit microcontroller.  So designers can quickly build custom platforms that have parallel engines, and marry that to the amount of serial processing power they need.

Myron, Xilinx: Communication protocols continue to require higher line rates and throughput from generation to generation. The latest devices provide up to 28 Gb/s transceivers, and soon we'll see 32+ Gb/s and 56 Gb/s transceivers to support next-generation protocols and beyond. Yet with higher line rates comes the challenge of ensuring high channel quality in the context of the system.  As signals travel across a printed circuit board (PCB), the high-speed components of the signal are attenuated. This is why auto-adaptive equalization is imperative for transceivers—to automatically compensate for any channel-driven signal distortion.  As an example, network line cards can be moved from slot to slot on a system's backplane while still maintaining high signal integrity, despite the fact that the channel lengths have changed.  These auto-adaptive equalization solutions are already available in the Xilinx 7 series FPGAs and will be optimized further in our next-generation devices.
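The principle behind auto-adaptive equalization, taps that adjust themselves until they undo whatever distortion the channel introduces, can be illustrated with a toy LMS (least mean squares) equalizer. Real SerDes adaptation runs in dedicated analog/digital hardware at line rate; the channel model, step size, and training scheme here are illustrative only.

```python
# Toy LMS adaptive equalizer: the FIR taps adapt on the fly until they
# invert the channel's inter-symbol interference. Purely illustrative of
# the adaptation principle, not a SerDes implementation.
import random

random.seed(0)
channel = [1.0, 0.6]        # simple ISI: each bit leaks into the next
taps    = [0.0, 0.0, 0.0]   # 3-tap equalizer FIR, adapted sample by sample
mu      = 0.05              # LMS step size

bits = [random.choice([-1.0, 1.0]) for _ in range(2000)]
# received signal: rx[n] = sum(channel[k] * bits[n-k])
rx = [sum(channel[k] * bits[n - k] for k in range(len(channel)) if n >= k)
      for n in range(len(bits))]

errs = []
for n in range(len(taps) - 1, len(bits)):
    window = rx[n - len(taps) + 1 : n + 1][::-1]   # most recent sample first
    out = sum(t * w for t, w in zip(taps, window))
    err = bits[n] - out                            # error vs. known training bit
    taps = [t + mu * err * w for t, w in zip(taps, window)]
    errs.append(err * err)

print(f"mean sq error, first 100 samples: {sum(errs[:100]) / 100:.3f}")
print(f"mean sq error, last 100 samples:  {sum(errs[-100:]) / 100:.3f}")
```

The error shrinks as the taps converge toward the channel's inverse, and crucially the loop never needed to know the channel in advance, which is what lets a line card keep working after being moved to a slot with a different trace length.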

Higher incoming data flow requires greater parallelism and wider data busses inside the FPGA.  Current FPGAs at 28nm handle today's most aggressive requirements. To support next-generation serial bandwidth requirements, improvements in both the silicon fabric and the tools are needed. The silicon fabric will need to be optimized across many architectural blocks, along with improvements in the routing architecture to support as much as 90% device utilization, which is a challenge in the industry today. Furthermore, design tools need to be “co-optimized” with devices to ensure designers get maximum value. Next-generation routing architectures in the silicon, for example, have to be coupled with advances in the routing algorithms in the tools.
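The link between serial line rate and internal bus width is simple arithmetic: the fabric clock runs orders of magnitude slower than the serial lane, so the data must be widened proportionally. The fabric clock frequency below is an assumed, plausible figure, not one quoted in the interview:

```python
# Why high line rates force wide internal busses: serial rate divided by
# fabric clock gives the minimum parallel datapath width. The 350 MHz
# fabric clock is an assumed, illustrative figure.
import math

line_rate_gbps = 28.0   # per-lane serial rate, as quoted above
fabric_mhz     = 350.0  # assumed internal clock for the parallel datapath

min_bus_bits = line_rate_gbps * 1e9 / (fabric_mhz * 1e6)
bus_bits     = math.ceil(min_bus_bits / 8) * 8   # round up to a byte boundary
print(f"{line_rate_gbps} Gb/s at {fabric_mhz} MHz needs a {bus_bits}-bit bus")
```

An 80-bit bus per 28 Gb/s lane, multiplied across dozens of lanes, is what drives the routing-utilization pressure described above.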


Chris A. Ciufo is senior editor for embedded content at Extension Media, which includes the EECatalog print and digital publications and website, Embedded Intel® Solutions, and other related blogs and embedded channels. He has 29 years of embedded technology experience, and has degrees in electrical engineering, and in materials science, emphasizing solid state physics. He can be reached at

New Kinds Of Hybrid Chips

Thursday, June 28th, 2012

By John Blyler and Staff
Crack open any SoC today and it will contain a variety of third-party memory, processor cores, internally and externally developed software and analog. In fact, the main challenge of most chip designs today is integration and software development rather than developing the chip from scratch.

By that definition, almost any chip is a hybrid. But the definition is about to expand significantly over the next few years, as Moore’s Law becomes increasingly difficult to follow and more of the chip is developed in discrete pieces that may go together horizontally, vertically, and sometimes even virtually.

Stacking of die, notably 2.5D configurations, is merely the first step in this process. Going vertical with 2.5D and full 3D versions will likely create a market for subsystems that are silicon-hardened. This makes good sense from a business standpoint, because not every part of the chip needs to be manufactured using the latest process node. In fact, analog developed and verified at older process nodes will likely work fine with a processor core developed at 20nm.

“In general, the trends are toward it being harder and harder from a process technology standpoint for foundries to create a process that is good,” said Hans Bouwmeester, director of IP at Open-Silicon. “That’s true for digital CMOS, for analog, for embedded DRAM and for embedded flash. We’re going to see a lot more heterogeneous die in a package, each in its own process technology. So you’ll have the CPU die in digital using a low-power/high-performance process, a high-speed I/O die with high-speed SerDes, and then you’ll have specialized RF, DRAM, and flash.”

FPGAs could well become part of the stack, as well. In fact, both Xilinx and Altera have created 2.5D planar chips and have commented publicly that they can be used in stacked configurations with other die.

“What you’ll see is that one side will become more specialized,” said Bouwmeester. “The other side will be everything in a package, which opens up enormous possibilities.”

At least part of this is being made possible by software. Getting to tape-out is still a big problem, but it’s certainly not the only one—and maybe not even the biggest. Software development has become a huge challenge. Recent IBS data (see Figure 3) agrees with other evidence that software has become the big driver of cost and schedule. What is unique to the IBS data is confirmation that this trend accelerates at each smaller silicon process node. Like chip hardware, software—including firmware, operating systems, middleware and even applications—becomes more complex with each generation of Moore’s Law.

Software tends to be the main product differentiator, in large part because hardware has become a commodity—a trend that will likely continue as subsystems and processor platforms become too expensive for most companies to develop. In a “fast market” such as mobile handsets, manufacturers that miss the market by as little as 9 to 12 months may lose $50M to $100M in potential revenue. This revenue loss combined with the extra development time required by software is one reason why software and hardware co-design approaches are so important. In addition, it explains the rise in popularity of virtual and FPGA-based prototype systems and emulation platforms.

Chips also can be built with the assumption that they’re part of a broader communication and storage scheme. Apple’s iCloud is one example of this, where at least some of the processing is done externally, allowing devices to behave almost like thin clients at times, and as fully functional processors at others. This virtualization allows a whole new set of tradeoffs in design, putting as much or more emphasis on the I/O as on the processor and memory.

Mixing and matching
All of these considerations are the result of a big speed bump in IC design, which has forced the semiconductor industry to look elsewhere for gains in performance and efficiency. Double patterning at 20nm has greatly increased the cost of manufacturing, and it will increase further still at 14nm if EUV isn’t commercially viable. The key sticking point there is how many wafers per hour can be processed using EUV. It currently is far too slow to be a viable replacement for 193nm immersion lithography.

But there also is a possibility of double patterning only part of a chip, and developing the rest on the same planar die in an older node. Luigi Capodieci, R&D fellow at GlobalFoundries, noted this is a very real possibility for reducing development costs in the future. But so are new techniques such as directed self-assembly, which can supplement multi-patterning and potentially help keep the cost down.

Still, cost isn’t the only issue that has to be considered. Heat is difficult to remove from chips that are packaged together. While some of that can be programmed away, running a processor at maximum speed for a short period of time and then shutting down, some of it also has to be engineered out with new structures such as FinFETs and new materials such as silicon on insulator (SOI), which can reduce current leakage that causes heat in the first place. What’s new here is that chips may be a combination of all of these things, with companies investing more money in certain portions of a chip—or a die within a package—and reducing costs in other areas. So areas that don’t generate much heat, or functions that aren’t used as often, won’t require as much engineering or the latest process technology and presumably can be done using single patterning.

IC design and manufacturing have been largely evolutionary. After decades of slicing costs at every new process node, it’s difficult to give up on a model that has worked well. The move to 450mm wafers will help boost efficiency even further, providing that yields are reasonable.

However, there is also a growing recognition that not all parts of a chip will continue down the Moore’s Law path at 20nm and beyond. Some portions of an SoC will remain on that path, others will not. But they may all be part of the same aggregate solution, packaged together in unique ways that can actually improve performance, lower power consumption, and get to market on time and with minimal risk of failure.

Next Page »