Lattice and Xilinx muse on parallelism, partial reconfigurability, and the state-of-the-art in IP and EDA tools.
Most hardware and software designers end up dealing with FPGAs in one way or another. Either the system they’re working on incorporates one or more FPGAs and they have to write code or create logic to deal with them, or they simulate hardware behavior using a functionally accurate simulator built on FPGA reprogrammable logic. Because of this familiarity, many taken-for-granted FPGA truisms – let’s call them “laws of FPGA nature” – go unchallenged. We’re going to debunk a few of them here.
For example, designers assume that FPGAs always get bigger, denser, and more expensive. Or that coding one up requires a mystical knowledge of C, HSPICE, HDL, RTL, and TLC finesse. It’s also taken as a given that FPGAs are power hogs, incapable of being used in low-power designs like mobile handsets, tablet computers, or the ultimate mobile device – your car. On the other hand, FPGAs are so flexible – essentially a blank sea-of-gates canvas – that their low-level building blocks (LUTs, MUXes, crossbars, NAND gates, and so on) take huge effort to form into complex logic like processors, interface drivers, or MPEG decoders.
To challenge these assumptions and more for this issue’s Roundtable Q&A, we turned to two of the biggest names in the business: Lattice and Xilinx. While it might seem a better match would be Altera versus Xilinx, everyone lumps A and X together. Let’s face it, they play leapfrog all the time and their product lines are materially similar at the high-density end of the market. Lattice, on the other hand, is more PLD-like and focuses on the cost-effective end of the market (Figure 1). Yet Lattice remains surprisingly similar in capability to companies like Xilinx in hard logic integration, IP, EDA tool suites, and target markets. In fact, Lattice probably has a better chance of deploying FPGAs in smartphones, while Xilinx is close to shipping Zynq-7000 SoCs into cars.
Lattice and Xilinx weigh in on the same set of questions, and their answers are at times in lockstep (IP, tools) or at opposite ends of the market (partial reconfiguration). Together, our experts offer a fabulous overview of the market from small- to high-density FPGAs.
EECatalog: Let’s face it, designing FPGAs is difficult and requires special knowledge, tools, and a mindset different from either coding or hardware layout. Yet the FPGA, PLD, and EDA vendors are improving tool suites all the time. What are some of the latest advances and what are some of the ones designers still are clamoring for?
According to Mike Kendrick, Director of Software Marketing, Lattice Semiconductor: There have been solid advances in providing designers pre-built functional blocks that speed up their design entry, design verification and timing closure tasks. For the foreseeable future, the HDL design flow continues to be the best alternative for users engaged in lower density programmable logic designs, as it gives them the control they need to hit their aggressive cost and performance targets. In larger density designs, HW/SW co-design flows, where functionality can be moved easily between SW and HW, have the promise of moving system cost/performance to an entirely new level. However, these flows will take a long time to perfect, and will require users to acquire new skills. The more immediate need, where the processor is integrated on-chip with the FPGA, is a new class of cross-domain debugging tools to provide the visibility and control that embedded designers expect from their current discrete processor solutions.
Responds David Myron, Xilinx director of Platform Technical Marketing: In a word…productivity. Productivity lowers our customers’ costs and enables them to get their end products to market faster. Next-generation design tools are focusing on what we consider the two pillars of productivity: integration and implementation.
The first pillar entails integrating a variety of IP from multiple domains, like algorithmic IP written in C/C++ and SystemC, RTL-level IP, DSP blocks, and connectivity IP. Not only must this IP be integrated successfully, but it must be verified quickly—as individual blocks and as an entire system. For integration of differing types of IP, for example, the latest integration solutions provide an interactive environment to graphically connect cores provided by third parties or in-house IP using interconnect standards such as AMBA AXI4. With easy drag-and-drop integration at the interface level, these solutions can guarantee through DRC checks that the system is structurally correct by construction.
The second pillar involves the capability of implementing multi-million logic cell designs for optimal quality-of-results in the shortest time possible. Because designs continue to increase in size and complexity, next-generation solutions are now using single, scalable data models throughout implementation to give users insight into design metrics such as timing, power, resource utilization, and routing congestion early in the implementation process. With up to a 4x productivity advantage over traditional development environments, the Xilinx Vivado Design Suite attacks these major bottlenecks in programmable systems integration and implementation.
For instance, design changes are inevitable but schedules are often inflexible. Tools now allow small changes to be quickly processed by re-implementing only the affected parts of the design, making iterations faster after each change. The latest tools can take a placed-and-routed design and let a designer make ECO changes such as moving instances, rerouting nets, or tapping registers to primary outputs for debug—all without going back through synthesis and implementation.
EECatalog: Partial reconfiguration on-the-fly is something major FPGA vendors have been talking about for a while. What’s new?
David Myron, Xilinx: Partial reconfiguration technology allows dynamic modification of FPGA logic by downloading partial bit files without interrupting the operation of the remaining logic. Designers can reduce system cost and power consumption by fitting sophisticated applications into the smallest possible device. This has been particularly useful for our customers in the space, software-defined radio, communications, video, and automotive markets. Using space systems as an example, ‘upgrades’ via partial reconfiguration reduce non-volatile rad-hard memory requirements—an expensive and limited resource on in-flight systems. Partial reconfiguration is available in the full line of 7 series FPGAs and Zynq-7000 SoCs, with new capabilities including dedicated encryption support and partial bitfile integrity checks.
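The mechanism Myron describes – replacing one region’s logic while the rest of the device keeps running – can be sketched abstractly. The `Fpga` class and region names below are illustrative inventions for this article, not any vendor’s API:

```python
# Conceptual sketch of partial reconfiguration: one region's logic is
# swapped while another region keeps operating. The Fpga class and the
# region names are illustrative, not a real vendor API.

class Fpga:
    def __init__(self):
        self.regions = {}  # region name -> currently loaded logic (a callable)

    def configure(self, region, logic):
        """Load logic into one region without touching any other region
        (a full bitstream would replace every region at once)."""
        self.regions[region] = logic

    def run(self, region, value):
        return self.regions[region](value)

fpga = Fpga()
fpga.configure("static", lambda x: x + 1)    # static logic keeps running
fpga.configure("dynamic", lambda x: x * 2)   # initial partial "bitfile"

before = fpga.run("dynamic", 10)
# Partially reconfigure only the "dynamic" region on the fly...
fpga.configure("dynamic", lambda x: x * 3)
after = fpga.run("dynamic", 10)
# ...while the static region is unaffected:
untouched = fpga.run("static", 10)
```

The point of the sketch is the isolation: swapping the “dynamic” region never disturbs the state or behavior of the “static” one, which is what lets in-flight systems upgrade themselves.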
Kendrick, Lattice: PROTF (Partial Reconfiguration On the Fly) has been an interesting area of research for many years. The latest advances by certain FPGA vendors, while showing solid progress, still leave a lot of issues unresolved.
The primary obstacles to PROTF have always been more “design-flow” oriented than “silicon enablement” oriented. The “silicon enablement” challenge has been largely understood, and solved, for many years; however, it carries a significant silicon area overhead and so is not economically viable unless the customer’s designs actually leverage the PROTF capabilities. On the other hand, the “design-flow” challenges are quite substantial, and remain unsolved. As one of many examples, users will need a method to simulate (and debug) their design’s behavior during reconfiguration to ensure that their system-level design is operating correctly. While certain vendors have recently demonstrated design flows that deploy PROTF when targeting a very narrow set of highly algorithmic, computationally intense problems, no one has demonstrated any capability to deliver such benefits to the design flow for “typical” digital logic systems.
EECatalog: FPGAs get bigger, denser, and more SoC-like. What is doable today that was unheard of only three years ago?
Kendrick, Lattice: Not all FPGAs are getting bigger, and the market for lower density devices is growing. For example, while the breadth of densities that Lattice offers is increasing, we are more focused on creating the lowest cost, lowest power solution at a given density. For instance, our MachXO2 FPGA, despite its low cost and low power, includes hard logic for commonly used interfaces, including SPI and I2C. Our mixed signal Platform Manager product integrates analog circuits with programmable logic specifically to reduce the cost of power management within more complex systems. Our iCE40 FPGA uses an extremely small (and unique) non-volatile programming cell combined with an innovative programming architecture to enable a new low cost standard for programmable logic.
Myron, Xilinx: Access to “bigger” devices is a natural customer requirement. The “denser” devices, particularly All Programmable 3D FPGAs, open more opportunities in test, measurement and emulation markets. The density and integration of the fabric—including CLBs, Block RAM and DSP blocks—allow performance levels that are not available in multi-chip solutions because of chip-to-chip delay.
SoC [FPGA] architectures such as Zynq alleviate multi-chip solutions, and have opened up new markets requiring high speed signal processing and real-time responsiveness. Having the complete processing system linked to the FPGA fabric allows architects to partition their design into software in the processing sub-system or accelerators in the FPGA fabric, all on one integrated chip.
EECatalog: The fastest growing markets on the planet deal with wireless connectivity. FPGAs have a strong play in the infrastructure—but what’s required to get their power down enough to be deployed in the actual battery-powered embedded device? Does this affect other markets/systems as well?
Kendrick, Lattice: There are at least two distinct markets: the bandwidth-driven wireless infrastructure market and the power-driven mobile device market.
First, to answer whether FPGA power can be sufficiently reduced, it already has been. Our iCE40 and MachXO2 FPGA families achieve both mobile-friendly static power levels (~10-50µW) and consumer market-friendly costs (~$1.00 ASP).
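Kendrick’s static power figure translates into multi-year operation on a coin cell. A rough back-of-the-envelope check – the CR2032 capacity (225 mAh at a nominal 3.0 V) is an assumption for illustration; only the 10-50 µW range comes from his answer:

```python
# Back-of-the-envelope battery life at the quoted static power levels.
# The 225 mAh CR2032 cell and 3.0 V nominal voltage are assumptions for
# illustration; only the 10-50 uW range comes from the article.

capacity_mah = 225
voltage = 3.0
energy_j = capacity_mah / 1000 * voltage * 3600   # cell energy in joules

for static_uw in (10, 50):
    seconds = energy_j / (static_uw * 1e-6)
    years = seconds / (3600 * 24 * 365)
    print(f"{static_uw} uW static -> ~{years:.1f} years on one coin cell")
```

Even at the top of the quoted range, static draw alone would not exhaust the cell for over a year – which is why these families are plausible in always-on mobile roles.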
Yes, there are significant tradeoffs required at every level of the ecosystem in order to develop products for one market versus the other. Fundamentally, one ecosystem is driven by high-speed switching, while the other is driven by low-power operation. With that in mind, the following tradeoffs must be made:
- Speed/Power Process Tradeoff: The types of processes that are used to design bandwidth-driven infrastructure FPGAs have far too much static leakage power to also support mobile devices, while the processes that can support mobile devices with very low static leakage power have slightly slower transistors.
- Design Tradeoff: Today many FPGAs are designed using NMOS pass gates in the routing fabric (for cost and speed), while low power mobile FPGAs must employ full CMOS pass gates in the routing fabric. One design cannot effectively support both markets.
- Interface Standards: The infrastructure market demands very high-performance IOs – from high-speed SERDES (PCIe, etc.) to high speed memory interfaces (such as DDR3). The mobile market has a very different set of interface standards; for example, the MIPI Alliance is driving a new set of very low power IO interfaces such as D-PHY and M-PHY. So, the infrastructure and mobile ecosystems have very different IO interface requirements and one design cannot effectively support both markets.
- Package Requirements: The infrastructure market demands very high IO counts (typically ~400-800), which drive very large and expensive packages (currently flip-chip is the technology of choice while, most recently, 3D/TSV package technology is being developed). The mobile ecosystem is at the opposite end of the spectrum, where size and board space is at a premium. As a result, the focus here is on small packages (typically 2mm x 2mm) with fewer IOs (typically ~20-40) and aggressive ball pitch (typically 0.4mm) in order to maximize IO count while minimizing board footprint.
These two unique markets drive two fundamentally different FPGA solutions – and the differences exist at every level.
EECatalog: The two biggest features of FPGAs are parallelism and raw bandwidth/throughput. What’s new in these areas at the chip- and system-level?
Kendrick, Lattice: FPGAs certainly give designers the ability to implement parallel algorithms, and thus increase a system’s throughput when that parallelism is applied to a bottleneck. Lattice, for example, provides a complete system-building solution with our LatticeMico System Builder and – unique in the industry – a choice of both a 32-bit microprocessor and an 8-bit microcontroller. So, designers can quickly build custom platforms that have parallel engines, and marry that to the amount of serial processing power they need.
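The payoff of the partitioning Kendrick describes – parallel engines for the hot loop, a soft processor for the serial remainder – is governed by Amdahl’s law. A quick sketch; the 90% parallel fraction and engine counts are illustrative numbers, not Lattice data:

```python
# Amdahl's law: overall speedup when the parallelizable fraction p of a
# workload is offloaded to n parallel FPGA engines while the serial
# remainder stays on the soft processor. Numbers here are illustrative.

def amdahl_speedup(p, n):
    """Speedup when fraction p of the work runs n times faster."""
    return 1.0 / ((1.0 - p) + p / n)

for engines in (2, 8, 32):
    print(f"{engines} engines: {amdahl_speedup(0.9, engines):.2f}x speedup")
```

Note how quickly the serial 10% dominates: past a handful of engines, adding fabric buys little unless the serial portion shrinks too – which is exactly why pairing parallel engines with an adequate processor matters.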
Myron, Xilinx: Communication protocols continue to require higher line rates and throughput from generation to generation. The latest devices provide up to 28 Gb/s transceivers, and soon we’ll see 32+ Gb/s and 56 Gb/s transceivers to support next-generation protocols and beyond. Yet with higher line rates comes the challenge of ensuring high channel quality in the context of the system. As signals travel across a printed circuit board (PCB), the high-speed components of the signal get attenuated. This is why auto-adaptive equalization is imperative for transceivers—to automatically compensate for any channel-driven signal distortion. As an example, network line cards can be moved from slot to slot on a system’s backplane while still maintaining high signal integrity – despite the fact that the channel lengths have changed. These auto-adaptive equalization solutions are already available in the Xilinx 7 series FPGAs and will be optimized further in our next-generation devices.
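The adaptation Myron mentions can be illustrated with a minimal least-mean-squares (LMS) filter that learns tap weights to undo a channel’s inter-symbol interference. This toy model – the three-tap channel, filter length, and step size are invented for illustration – is a conceptual stand-in for the equalizer adaptation inside real transceivers, not Xilinx firmware:

```python
# Minimal LMS adaptive equalizer: learn tap weights that undo a known
# channel's inter-symbol interference. A conceptual stand-in for
# transceiver equalizer adaptation; all parameters are illustrative.
import random

random.seed(0)
channel = [1.0, 0.4, 0.2]   # attenuating, ISI-inducing channel taps
taps = [0.0] * 5            # equalizer tap weights, adapted online
mu = 0.02                   # LMS step size

# Random +/-1 symbols stand in for the transmitted bit stream.
tx = [random.choice((-1.0, 1.0)) for _ in range(5000)]

# Channel model: convolve the symbols with the channel response.
rx = [sum(channel[k] * tx[i - k] for k in range(len(channel)) if i - k >= 0)
      for i in range(len(tx))]

errs = []
for i in range(len(taps), len(rx)):
    window = rx[i - len(taps) + 1:i + 1][::-1]       # newest sample first
    y = sum(w * x for w, x in zip(taps, window))     # equalizer output
    e = tx[i] - y                                    # error vs. sent symbol
    taps = [w + mu * e * x for w, x in zip(taps, window)]  # LMS update
    errs.append(e * e)

# Squared error should fall sharply once the taps have adapted.
print(sum(errs[:100]) / 100, sum(errs[-100:]) / 100)
```

The same principle lets a line card re-adapt after being moved to a different backplane slot: the channel changes, the error signal grows, and the taps converge to the new channel’s inverse.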
Higher incoming data flow requires greater parallelism and wider data buses inside the FPGA. Current FPGAs at 28nm handle the most aggressive requirements of today. To support next-generation serial bandwidth requirements, improvements in both the silicon fabric and the tools are needed. The silicon fabric will need to be optimized across many architectural blocks, along with improvements in routing architecture to support as much as 90% device utilization, which is a challenge in the industry today. Furthermore, design tools need to be “co-optimized” with devices to ensure designers get maximum value. Next-generation routing architectures in the silicon, for example, have to be coupled with advancements in routing algorithms in the tools.
Chris A. Ciufo is senior editor for embedded content at Extension Media, which includes the EECatalog print and digital publications and website, Embedded Intel® Solutions, and other related blogs and embedded channels. He has 29 years of embedded technology experience, and has degrees in electrical engineering, and in materials science, emphasizing solid state physics. He can be reached at email@example.com.