Published on August 24th, 2009
The increasing need for format agile applications plays directly to the strength of programmable logic-based systems. While programmability improves time-to-market, it also allows hardware designs to be format agile like software applications. This means that hardware features can be customized on-the-fly for changing protocols or varying geographical standards in mobile applications. It also facilitates delivery of enhancements and fixes after product deployment.
Most users of programmable logic know this. However, using programmability as a system feature is rare because mechanisms for integrating it into a design tend to be highly customized and complex. This article surveys some techniques for managing multiple bitstream images and safely updating these bitstream images in the field, thereby enabling the format agility capability in a hardware design.
Because device manufacturers provide different levels of support for this functionality (even from device family to device family), we will examine how a selection of devices from the two largest programmable logic manufacturers, Altera and Xilinx, enable multiple bitstream support and safe bitstream image update. In addition, we will examine the use of a manufacturer-independent bitstream management solution called SystemBIST provided by Intellitech Corp.
The components of a bitstream management system are best described by the functional steps that ensure a viable solution, illustrated in Figure 1. Those steps are:
[Figure 1. Block Diagram of the Nine Functional Steps]
Most devices have features that assist in the implementation of these functional steps, but ultimately some infrastructure, be it hardware or software, is required. In the next section, we will examine a number of popular device families and survey the support they provide for multiple bitstream management.
Although there are many programmable logic vendors, the two market leaders are Altera and Xilinx. We will review a selection of the newer device families from each of these vendors. Xilinx provides two main device families: Virtex and Spartan. Broadly, the Spartan family targets high volume, low cost markets and the Virtex family targets the high end, high speed, complex functionality sector. Each device has slightly different support for multiple bitstream management.
For the Xilinx Spartan-3A family, end users may either instantiate access to the internal configuration access port (ICAP) and have their design trigger a reboot to a new device image, or the address can be loaded as part of the initial bitstream. In either case, the trigger for the loading of the new image is under control of either an end-user supplied pin on their design or a special instruction loaded into the boundary-scan (JTAG) test access port. The Spartan-3A has a watchdog timer that checks for valid bitstream loading. If loading does not complete, it signals failure on the INIT pin and loads the bitstream at address 0x0. Multiboot is available only for parallel (BPI) modes for this device.
For the Xilinx Virtex-5, Virtex-6 and Spartan-6 families, end users may either instantiate access to the internal configuration access port (ICAP) and have their design trigger a reboot to a new device image, or the address can be loaded as part of the initial bitstream. In either case, the trigger for the loading of the new image is under control of either an end-user supplied pin on their design or a special instruction loaded into the boundary-scan (JTAG) test access port. For the Xilinx Virtex-5 and Virtex-6, a watchdog timer checks for valid bitstream loading. If loading does not complete, it signals failure on the INIT pin and loads the bitstream at the address determined by the user value applied to the two revision select (RS) pins, with the lower address bits set to 0x0. The RS pins are used to segment the memory space and are connected to the two higher-order address pins. Note that this results in a coarse level of memory segmentation that may leave significant gaps in your memory map. For instance, a 70Mbit bitstream would have to fit in a 128Mbit address space, so four 128Mbit areas of flash (512Mbit total) would be used instead of the 280Mbit that four contiguously stored images would need.
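The segmentation arithmetic in that example can be checked with a short sketch. It assumes, as the example does, that each image occupies one of four equal segments whose size is a power of two because the RS pins drive the upper address bits. The function name and its parameters are illustrative only and do not come from any vendor tool.

```python
from typing import Tuple


def rs_segmented_flash_mbit(bitstream_mbit: int, images: int = 4) -> Tuple[int, int]:
    """Return (segment size, total flash) in Mbit for RS-pin style segmentation.

    Assumption: every image starts on a power-of-two boundary, so the segment
    is the bitstream size rounded up to the next power of two.
    """
    segment = 1
    while segment < bitstream_mbit:
        segment *= 2  # round up to the next power of two
    return segment, segment * images


segment, total = rs_segmented_flash_mbit(70)
packed = 70 * 4  # the same four images stored back to back
print(segment, total, packed)  # 128 512 280
```

The gap between the segmented total and the packed total is the flash that address-pin segmentation leaves unused.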
For the Spartan-6, a watchdog timer checks for valid bitstream load completion and signals any failure on the INIT pin. If loading does not complete, the initial bitstream is retried 3 times. If that still fails, a user-defined “fallback” bitstream at a user-specified address is attempted 3 times. If that, too, fails, the bitstream at address 0x0 is loaded. Multiboot is supported in both parallel (BPI) and serial (SPI) modes for these devices.
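The Spartan-6 retry order described above can be modeled in a few lines. This is only a behavioral sketch of the sequence as stated in the text, not device code: `try_load` is a hypothetical callback standing in for a configuration attempt, and the addresses in the usage lines are made up.

```python
from typing import Callable, List, Optional


def spartan6_style_boot(try_load: Callable[[int], bool],
                        initial_addr: int,
                        fallback_addr: int,
                        retries: int = 3) -> Optional[int]:
    """Return the address that configured successfully, or None if all fail."""
    # Order from the text: initial image (3 tries), fallback image (3 tries),
    # then the image at address 0x0.
    attempts: List[int] = [initial_addr] * retries + [fallback_addr] * retries + [0x0]
    for addr in attempts:
        if try_load(addr):
            return addr
    return None


# Hypothetical scenario: the image at 0x400000 is corrupt, the fallback is good.
tried: List[int] = []


def fake_load(addr: int) -> bool:
    tried.append(addr)
    return addr == 0x200000


booted = spartan6_style_boot(fake_load, initial_addr=0x400000, fallback_addr=0x200000)
print(hex(booted), len(tried))
```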
Altera has divided the programmable logic market into three segments, covering the high end with the Stratix family, the mid-range with the Arria family and the high volume market with the Cyclone family.
The Altera Stratix IV and Arria II devices provide a methodology based on the use of end-user logic and a mega-function that grants access to on-chip hard logic to control system updates. This mega-function is provided as part of the Altera design kit and must be in the bitstream image located at address 0x0. The mega-function sets up the image address, sets the watchdog timer and returns status information. Control and management of image selection and booting is always initiated using the logic located at address 0x0. Like Xilinx’s approach, the expiration of the watchdog timer or an error condition can automatically reload the configuration image at location 0x0 or at page 1 (corresponding to address 0x100000). The methodology is only supported in serial (SPI) mode.
The Altera Cyclone III device is similar but has a slight variation in its default addressing scheme because it is supported in both active serial and active parallel modes. The active serial mode functions identically to the Stratix IV and Arria II. In active parallel mode, the initial address is set to 0x100000 but can be modified to any value using a special boundary-scan instruction.
All vendors support a variety of configuration modes with parallel modes providing the fastest device configuration times. The parallel modes also require the largest number of interface pins (up to 48) to connect the address, data and control signals of the associated parallel NOR flash. While it is possible to reuse these pins for mission functions post configuration, it is not recommended by the FPGA vendors in format agile applications because of the implementation complexity. The NOR Flash mode incurs a significant cost to mission-available IO.
The different vendors provide different levels of support for managing multiple device configuration images. There is significant variation from device family to device family, and support is often available only for specific configuration modes.
The ability to utilize these device features relies on having an appropriate hardware, software and programmable hardware infrastructure in your system. In the next sections, we will examine the requirements for those elements.
The common requirement for all FPGA devices is the need for an external configuration controller to manage the requests for new bitstream data, the receipt of said data and the programming of this data to the appropriate place in the configuration store. This same controller might initiate download to the target device and control the collection of operation status. The controller will also likely manage error handling. To accomplish this effectively, the controller needs to be tailored to the capabilities of the FPGAs and manage them appropriately. This customization would be a requirement of the system software.
In order to facilitate programming of the flash memories attached to FPGAs, vendors provide special IP. This special IP provides a pathway from the boundary-scan TAP to the memory bus. This allows programming of the attached flash memory to be controlled via the boundary-scan interface. The boundary-scan interface is common and available and re-using it for this purpose is practical and cost-effective. The IP is either included as part of every end-user design or loaded into the FPGA in place of your mission logic while programming is executed. The former case is used when system up-time needs to be maximized or when memory access and updates are high frequency events. The latter case requires cessation of FPGA mission operation while the flash memory is reprogrammed. While either of these approaches is manageable, they require thoughtful system design and represent a significant ongoing cost to the operation of the system.
It is important to note that in using this IP you are increasing the size of flash memory required because in addition to the actual mission logic, you need to store the programming logic bitstream.
The configuration controller required above may also utilize some sophisticated software to manage the data request, reception and validation algorithms. The software will likely have to be updated to accommodate the variety of features supported in each FPGA. It will also be necessary to incorporate error handling into this code. The software will likely ensure smooth switchover to new device images and controlled rollback when errors occur.
In some cases, software implements the flash-memory programming algorithm. In other approaches, the algorithm is encoded in a hardware state machine. In the former case, the system needs to include support software for the utilized flash devices. For instance, Xilinx requires compilation of the IP image, flash programming algorithm and flash data into SVF, XSVF or ACE files. You download these files to the FPGA device to configure the attached flash memory. Altera allows utilization of STAPL files for this purpose. The reason for this is that both vendors use the boundary-scan TAP to access the FPGA and load IP into the FPGA to gain access to and program the attached flash memory. This then requires the inclusion of some hardware (either a processor or a complex state machine) in the system to read and interpret the SVF, XSVF, ACE or STAPL file into the appropriate boundary-scan TAP signals.
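To give a sense of what "read and interpret" involves, the sketch below performs only the first step an SVF player needs: stripping comments and splitting the text into semicolon-terminated statements. It relies on two properties of SVF (statements end with `;`, and `!` or `//` start a comment that runs to the end of the line). It does not generate TAP signals, and the sample fragment is hypothetical rather than taken from a vendor-generated file.

```python
from typing import List


def split_svf_statements(text: str) -> List[List[str]]:
    """Split SVF text into statements, each a list of whitespace-separated tokens."""
    cleaned = []
    for line in text.splitlines():
        # Drop comments, which run from '!' or '//' to the end of the line.
        for marker in ("!", "//"):
            pos = line.find(marker)
            if pos != -1:
                line = line[:pos]
        cleaned.append(line)
    statements = []
    # A statement may span several lines, so join first and split on ';'.
    for chunk in " ".join(cleaned).split(";"):
        tokens = chunk.split()
        if tokens:
            statements.append(tokens)
    return statements


sample = """
! hypothetical fragment for illustration only
SIR 8 TDI (05);
SDR 16 TDI (ABCD)
       TDO (0000) MASK (0000);
RUNTEST 100 TCK;
"""
for statement in split_svf_statements(sample):
    print(statement[0], statement[1:])
```

A real player would go on to map each statement to TMS/TDI/TCK sequences and compare captured TDO data against the expected values, which is where most of the hardware or processor cost mentioned above arises.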
End-users can always code native implementations for flash programming rather than make use of the vendor-provided solutions. As indicated previously, the vendor-provided solutions utilize the FPGA itself to program its attached flash memory and this results in significant system down time while flash programming is completed.
First, let us distinguish between the mission-mode function or logic and everything else. The mission-mode function or logic refers to the end-user valued functionality. That is, the product is a communications switch or a set top box, and that is its mission. Test, verification, debug and diagnosis are not considered mission-mode functionality. These functions are valuable to the designer or manufacturer but of secondary or limited value to the end-user. All the functionality described in this article is therefore not mission-mode logic.
After understanding the building blocks that are required, the composition of the complete solution becomes clearer and is illustrated in Figure 2. At the heart of the system is some sort of controller, possibly a microcontroller, that acts as the master. It would likely tend to the external communications interface, initiate requests for new configuration data (in a polling environment) and process received configuration data. This controller would also be responsible for setting up the flash programming infrastructure (if it is programmable hardware), usually through the boundary-scan TAP of the target FPGA but possibly through functionality incorporated into the mission-mode logic. Prior to the instantiation of the flash programming infrastructure, the controller should take the system off-line. This is especially critical when the FPGA-to-flash connections are re-used for mission-mode functionality. Taking the system off-line ensures that operation is not disturbed and that the system generates no spurious signals during the update. Then the controller can program the new image into the flash through the boundary-scan TAP or specially developed mission-mode logic of the target FPGA. The controller is also responsible for validating the flash contents after programming and then initiating configuration of the FPGA(s) using the technique dictated by the targeted device. This might involve asserting a signal on an external FPGA pin or sending a series of instructions into the FPGA’s boundary-scan TAP or a side channel. This process is repeated for each FPGA in the system that is to be updated. After the FPGAs are configured, the controller checks the status of the configuration operation and applies device-based functional vectors to ensure the configuration completed fully and successfully. Then, if necessary, the controller should restart the system or otherwise synchronize the updated image operation with the rest of the system.
If the controller has access to the boundary-scan TAP then this would likely involve loading a series of boundary-scan instructions into all the devices in the system to either reset the system or synchronize it. After ensuring the system operation is as expected by applying functional vectors using INTEST and EXTEST, the controller should reconnect the system to the outside world. If using separate side channels then there are significant challenges associated with developing a workable interface to accomplish this that would likely have to be customized for each FPGA family used.
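The controller sequence described above can be summarized as a sketch. Every method called on `board` is a hypothetical hook invented for this illustration (no vendor API is implied), and `SimulatedBoard` merely records calls so the ordering can be inspected. Rollback is deliberately left out: on any failure the sketch simply keeps the system off-line.

```python
from typing import Dict, List


def update_fpgas(board, images: Dict[str, bytes]) -> List[str]:
    """Run the update flow for each FPGA; return the names of FPGAs that failed."""
    failed: List[str] = []
    board.take_offline()                      # isolate the system before touching flash
    for fpga, image in images.items():
        board.load_flash_programmer(fpga)     # e.g. programming IP loaded through the TAP
        board.program_flash(fpga, image)
        if not board.verify_flash(fpga, image):
            failed.append(fpga)
            continue                          # never configure from unverified flash
        board.trigger_configuration(fpga)     # pin assertion or TAP instruction
        if not (board.configuration_ok(fpga) and board.run_functional_vectors(fpga)):
            failed.append(fpga)
    if not failed:
        board.resynchronize()                 # restart or synchronize the system
        board.go_online()                     # reconnect to the outside world
    return failed


class SimulatedBoard:
    """Stand-in that records hook calls; real hooks would drive hardware."""

    def __init__(self, bad_flash=()):
        self.log: List[str] = []
        self.bad_flash = set(bad_flash)

    def __getattr__(self, name):
        def hook(*args):
            self.log.append(name)
            # Only flash verification can fail in this simulation.
            return not (name == "verify_flash" and args[0] in self.bad_flash)
        return hook


board = SimulatedBoard()
print(update_fpgas(board, {"fpga0": b"\x00"}), board.log)
```

Even in this reduced form, each hook hides device-specific work, which is the maintenance burden the following paragraphs discuss.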
[Figure 2. Custom Solution Block Diagram]
Obviously, this approach places significant demands on the configuration controller. Even though the configuration controller is likely used only occasionally, the computational demands may be such that its functionality cannot be shared and a separate controller is needed. In addition, the software IP developed will likely have to incorporate specialized operations tuned to the specific capabilities of the FPGAs utilized in the system. This means that the software and potentially programmable hardware would need to be updated and maintained as new FPGA devices are introduced. As a result, the development time and expenses associated with a bitstream management system of this sort are ongoing and significant.
A custom approach allows you to leverage all the features included in the FPGAs you are targeting. In a system where such features are identical, this can be advantageous. In a mixed device environment where different FPGA devices may be utilized this could add complexity.
You can continually update your custom approach to take advantage of newer device capabilities to ensure an optimal implementation. Conversely, such customization ensures a continuing, measurable effort spent on maintaining the bitstream management system.
In many ways, a vendor independent and stable approach is beneficial. That is, regardless of the additional features the FPGAs provide with each generation, the heterogeneity of the devices creates a chaotic environment in which to provide a solution. The problem of ongoing maintenance of and enhancement to your management solution results in a continual expense in your system development. The need to accommodate new or otherwise modified features results in valuable engineering time wasted in the development and debug of functionality that is generally not a system value-add. That is to say, that the benefit of multiple bitstreams and format agility is primarily to the manufacturer who is able to use it to reduce inventory and maintenance costs. Because the perceived customer value is nil, they will not pay a premium for this functionality.
An off-the-shelf, pre-engineered solution that delivers the flexibility end-users need with a level of functionality that encapsulates and supports all the functional components of a bitstream management system would be beneficial. Such a solution promotes re-use and minimizes enhancements and maintenance associated with the introduction of and obsolescence of FPGA-specific features. There are two popular configuration PROMs that have built-in support for bitstream versioning. These devices, the Xilinx XCFP family and the Altera EPC4/8/16 devices, provide a versioning infrastructure but beyond that, version selection and update flow management is left as an exercise to the user. For this reason, we will focus on a more sophisticated and complete pre-engineered configuration solution in this paper.
The SystemBIST device from Intellitech is an example of a pre-engineered solution that incorporates the required intelligence for a system in a single configuration device providing a portable, transparent and secure solution.
SystemBIST is able to offload all operations related to bitstream management from the configuration controller of the previous section. SystemBIST provides a mechanism for FPGA programming, flash programming, bitstream management and update control. SystemBIST’s SPI interface is used for updates and for control from a CPU. It allows you to develop a solution that is forward and backward compatible across vendors and across vendor families, and it utilizes a single parallel NOR flash as its bitstream store. Any differences in support available in devices, whether now or in the future, are masked from your implementation.
The basic operation of the SystemBIST device is simple. The designer assembles a series of bitstream collections and device operations that are classified as tasks or suites related to: system power-up operations; mission mode functionality; default (fail-safe) functionality; failure recovery; and watchdog time-out handling.
[Other possibilities for these suites exist but these operations are essential for a complete bitstream management solution]
The Intellitech Eclipse software application constructs these suites and their internal sequencing without requiring the developer to have intimate knowledge of the system’s mission functionality or architecture. This could allow already burdened embedded software teams to offload this task. During construction, the software examines all data in all suites and re-uses and compresses data where possible to ensure efficient memory usage. Bitstreams and their related operations are stored in memory contiguously rather than selected by address pins, reducing memory cost compared to the FPGA-to-NOR flash method. SystemBIST supports FPGA vendor SVF files, FPGA vendor parallel bitstream files and third party boundary-scan SVF files. SystemBIST includes both JTAG and 8-bit wide native FPGA configuration. The 8-bit parallel mode can be used without requiring dedicated use of those pins for configuration.
The SystemBIST device itself processes the packaged suites as they are delivered.
The system power-up operations describe the steps to initialize the system. This can include driving specific pins to halt processors, three-stating busses, erasing sections of flash memories and anything else that puts the system into a known state. The mission mode functionality is a suite of bitstreams that defines the core functionality of the system. This is downloaded to the system’s FPGA devices. This may also include suites of functional vectors to test and verify the devices and the system as a whole. The default (fail-safe) functionality is a suite of bitstreams that puts the system in a known safe and useful state when any mission mode download or update fails. The failure recovery suite is called when functional tests fail, to collect additional failure information and potentially shut down the system or recover it to a safer state. The watchdog time-out handling suite is used when the watchdog timer implemented in the SystemBIST device expires, indicating that, say, a bitstream download (or any operation) did not complete in the expected time. The order of execution of these suites is under user control using the software GUI, with flow specifications allowing branch-on-condition operations. For example, if the mission mode suite succeeds, you can specify to proceed to a verification suite; if it fails, you can specify to proceed to the failure suite. This information is encoded in the package of suites file (called a SystemBIST image file) generated by the PC software, then downloaded to and interpreted by the SystemBIST device.
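The branch-on-condition idea can be illustrated with a small table-driven sketch. This is not the SystemBIST image format or the Eclipse software's interface, neither of which is described here. The suite names mirror the ones above, while the `flow` table, `run_flow` and the `run_suite` callback are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# suite name -> (next suite on success, next suite on failure); None ends the flow
Flow = Dict[str, Tuple[Optional[str], Optional[str]]]


def run_flow(flow: Flow, run_suite: Callable[[str], bool], start: str,
             max_steps: int = 32) -> List[str]:
    """Execute suites in branch-on-condition order; return the suites that ran."""
    executed: List[str] = []
    current: Optional[str] = start
    # max_steps guards against a flow table that accidentally loops forever.
    while current is not None and len(executed) < max_steps:
        executed.append(current)
        on_pass, on_fail = flow[current]
        current = on_pass if run_suite(current) else on_fail
    return executed


flow: Flow = {
    "power_up": ("mission", "fail_safe"),
    "mission": ("verify", "failure_recovery"),
    "verify": (None, "failure_recovery"),
    "failure_recovery": ("fail_safe", None),
    "fail_safe": (None, None),
}

# Hypothetical run in which the mission mode download fails.
print(run_flow(flow, lambda suite: suite != "mission", "power_up"))
```

Keeping the flow as data rather than code is what lets it be generated on a PC and interpreted by a separate device.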
[Figure 3. SystemBIST-based System Block Diagram]
After your initial system image is committed to flash memory, SystemBIST controls all updates of the flash memory with configuration images and controls download of those images to the on-board FPGAs. The flash memory updates occur without utilizing resources of the system’s FPGAs. No bitstreams are downloaded to the FPGA and the FPGA operation is not disturbed. The SystemBIST device tends to the update while the whole system continues to operate in its mission mode. The defined suites can include verification of data as well as verification of device and system functionality and a user-definable behavior if any step in the process fails. Since the flash memory update happens under control of the SystemBIST device, your system downtime is minimized. The updated configuration images processed by SystemBIST represent only the incremental changes from the original SystemBIST image.
In a system that incorporates SystemBIST, illustrated in Figure 3, the system controller (likely a shared microcontroller) needs only to tend to the external communications interface, initiate requests for new configuration data (in a polling environment) and process received configuration data. Upon receiving updated configuration data, the controller forwards the data to the SystemBIST’s SPI port and can then continue other operations, checking with the SystemBIST device only to collect status information. The system controller can monitor the progress of the SystemBIST device update and decide when to take the system off-line (if necessary). The SystemBIST device is responsible for programming the flash memory. The data update packets in the SystemBIST image specifically indicate the locations to be updated in the flash memory. The update packets contain address information and specify only incremental data change information to ensure efficient operation with minimal system down time, if any. The SystemBIST-associated NOR flash is updated without disturbing the mission logic while it is operating. After verifying the programming operation, the SystemBIST device initiates configuration of the FPGA(s). The addressing of the appropriate configuration image is driven by SystemBIST independent of the features available (or not available) in the target FPGA. After the FPGA configuration, SystemBIST checks the status of the configuration operation and applies device-based functional vectors incorporated as part of the mission mode suite to ensure the configuration completed fully and successfully. Then, if necessary, SystemBIST can be directed to execute the system power-up suite to restart the system or otherwise synchronize the updated image operation with the rest of the system. If the system controller took the system off-line, it should reconnect the system to the outside world.
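The idea of address-tagged incremental update packets can be sketched as follows. The packet layout (an address paired with replacement bytes) and the CRC-32 check are assumptions chosen for illustration. The actual SystemBIST packet format and verification method are not described here, and real NOR flash would additionally require sector erase handling.

```python
import zlib
from typing import Iterable, Tuple


def apply_update(flash: bytearray,
                 packets: Iterable[Tuple[int, bytes]],
                 expected_crc: int) -> bool:
    """Apply (address, data) packets to a flash model; commit only if the CRC matches."""
    staged = bytearray(flash)  # work on a copy so a bad update leaves flash untouched
    for addr, data in packets:
        if addr < 0 or addr + len(data) > len(staged):
            return False       # packet falls outside the flash address range
        staged[addr:addr + len(data)] = data
    if zlib.crc32(staged) != expected_crc:
        return False           # resulting image is not the one the sender intended
    flash[:] = staged          # commit the verified image in place
    return True


# Hypothetical 16-byte flash in which only two bytes change.
flash = bytearray(b"\xff" * 16)
target = bytearray(flash)
target[4:6] = b"\x12\x34"
ok = apply_update(flash, [(4, b"\x12\x34")], zlib.crc32(target))
print(ok, flash == target)
```

Only the changed bytes travel over the update channel, while the check still covers the whole resulting image.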
There are clear benefits to adopting a pre-engineered solution. You are isolated from device capabilities and develop your solution once – regardless of the target FPGA devices involved. Conversely, you cannot use any new FPGA device features released in support of bitstream management and format agility. The question then becomes one of a classic engineering trade-off – is design time better spent re-architecting the bitstream management system to exploit this new functionality or on a customer-visible feature?
In the examination of the development of the infrastructure for bitstream management systems for format agility, the system architect or designer faces significant decisions. New FPGA device functionality seems to be developed and obsoleted with every family from every manufacturer with each new generation. Does one commit to revisit the bitstream management system to exploit every new feature for every new application? Is there better value in adopting a device-neutral approach using pre-engineered solutions that guarantee forward compatibility and re-use? This paper argues that a general purpose, device manufacturer and device family independent solution that provides a flexible approach to multiple bitstream management and system update is the better choice. This mechanism affords designers the safety and security of a portable reusable solution that eliminates the development risk associated with bringing new devices and systems online and allows designers the opportunity to focus on revenue producing system features rather than system infrastructure tasks.
Neil G. Jacobson (email@example.com) is Principal at Formidable Engineering Consultants and has been involved in the world of programmable logic for more than 14 years. During that time, he was a Principal Software Engineer at Xilinx, Inc responsible for the design and development of Xilinx’s configuration solutions. His work has resulted in 28 patents. He is chair of the IEEE STD 1532 Working Group on In-System Configuration and is responsible for the development of the 1532 standard that covers in-system configuration of programmable devices. He has published numerous technical papers and is the author of the textbook “The In-System Configuration Handbook: A Designer’s Guide to ISC” (Kluwer Academic Publishers. 2004.)