Maximize DDR DRAM Efficiency with a CAM-Based DDR Controller Architecture
System-on-Chip (SoC) designs are integrating more systems and functions to meet market demands for improved user experience. More often than not, these systems and functions require access to off-chip DRAM to support the required increase in compute performance. As not all of the SoC systems access DRAM in the same fashion, with the same traffic pattern, or with the same bandwidth requirement, designers must find the right technique to efficiently balance off-chip DRAM access for all of the systems on the SoC.
Multiple systems requesting data, sometimes simultaneously, can lead to random traffic, but accessing DRAM randomly is inefficient. While a properly architected DDR controller will execute traffic requests out of order to improve traffic efficiency, not all DDR DRAM controllers are designed with an adequate architecture.
This article will compare traditional first-in, first-out (FIFO)-based bank queue DDR controller architectures with a unique real-time traffic management architecture that leverages a fully associative content-addressable memory (CAM). The article will demonstrate how the CAM-based DDR controller architecture maximizes DDR DRAM efficiency associated with complex traffic. Only a CAM-based architecture can meet the system bandwidth requirements of multifunction SoCs.
Access to off-chip DDR SDRAM has been an integral part of SoC design for many years. When considering DDR interface IP, SoC designers have a choice of whether to make the IP themselves or to license it from a third-party IP provider. In making this choice, they need to consider many business-related as well as technical factors. For discussion purposes, this article will focus on the technical criteria for selecting DDR interface IP.
As with most IP, the criteria for selecting DDR interface IP include area, power, features and performance. As more features and applications are integrated into SoCs, the required processing power of the CPU and other processing functions must increase as well. The increase in processing power typically goes hand-in-hand with an increase in DDR bandwidth requirements, making performance the most important criterion in DDR IP selection. Because higher frequency is the most common contributor to higher performance, performance is often thought of in terms of MHz. However, with a DDR SDRAM subsystem, increasing performance involves a lot more than simply increasing frequency.
Originally, DDR SDRAM was architected with one primary purpose, which shaped the DDR SDRAM architecture we are familiar with today: to create an off-chip memory that was low cost in terms of size and pin count. SDRAM memory cells are very small, each constructed of a single pass-gate transistor and a capacitor to store the charge. The interface uses few pins by sharing the same bus for reads and writes and by sharing the address pins for column and row. While the primary goal of achieving low cost was met, it led to an off-chip memory architecture that cannot be accessed efficiently with random address requests. To compensate for the inefficiencies in writing data to or reading data from the SDRAM, SoCs require a controller to manage access to the DDR SDRAM. Most memory controllers today take these inefficiencies into account and attempt to group commands sent to the DDR SDRAM to minimize dead cycles and increase the efficiency (bandwidth) of the SoC's accesses to the SDRAM.
DDR SDRAM controllers traditionally reorder traffic using a FIFO architecture to evaluate the traffic request queue (often referred to as a look-ahead). Later technical advances have introduced a DDR controller that leverages a content addressable memory (CAM) look-ahead architecture to reorder traffic. This article will demonstrate how the CAM look-ahead architecture provides higher efficiency than the FIFO look-ahead used in today’s DDR SDRAM controllers.
DDR SDRAM Access Characteristics
Fully comprehending efficiency requires a keen understanding of key DDR SDRAM access characteristics. If every access to SDRAM memory took the same amount of time, it would be easy to determine the efficiency of a memory subsystem interface. The construction of DDR SDRAM memory devices and subsystems, however, leads to different access durations for different types of requests to the DRAM.
DDR SDRAM chips contain multiple independent memory banks, typically eight, as shown in Figure 1. A bank is idle, active, or changing from one state to the other. An “activate” command “opens” an idle bank and reads the specified row into an array of sense amplifiers, where the data is held during all read and write operations.
|Figure 1: DRAM with 8 banks, including row, column and sense amplifiers|
This process takes some time, adding overhead before reading data from any given row. Once the row is held in the sense amplifiers, however, access to its data is faster. Each read or write command uses a column address to access the data within the row.
When the memory controller wants to access a different row, it must first return that bank's sense amplifiers to an idle state, ready to sense the next row. This is known as a "pre-charge" command, or "closing" the row. A minimum time must elapse before the bank becomes fully idle and can receive another activate command.
The hierarchy of access times, from longest to shortest, is as follows:
- Accessing a row when a different row is open (requiring the open row to be closed and a new row to be opened)
- Accessing a closed row (requiring that the row be opened)
- Accessing a currently open row
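The three cases in this hierarchy can be expressed as a small classifier over per-bank state. The following is a minimal sketch, not a controller implementation: the names `AccessKind`, `classify` and `access` are hypothetical, and the cycle counts are made-up placeholders chosen only to preserve the ordering of the list above, not timing values from any DDR specification.

```python
from enum import Enum


class AccessKind(Enum):
    ROW_HIT = "row hit"            # requested row is already open
    ROW_CLOSED = "row closed"      # bank is idle: an activate is needed
    ROW_CONFLICT = "row conflict"  # another row is open: pre-charge, then activate


# Hypothetical cycle costs (assumption): only their relative order matters here.
COST_CYCLES = {
    AccessKind.ROW_HIT: 4,
    AccessKind.ROW_CLOSED: 15,
    AccessKind.ROW_CONFLICT: 26,
}


def classify(open_rows, bank, row):
    """open_rows maps bank -> currently open row, or None when the bank is idle."""
    current = open_rows.get(bank)
    if current is None:
        return AccessKind.ROW_CLOSED
    if current == row:
        return AccessKind.ROW_HIT
    return AccessKind.ROW_CONFLICT


def access(open_rows, bank, row):
    """Classify the request, then leave the requested row open in its bank."""
    kind = classify(open_rows, bank, row)
    open_rows[bank] = row
    return kind, COST_CYCLES[kind]


if __name__ == "__main__":
    banks = {b: None for b in range(8)}  # eight idle banks
    for bank, row in [(0, 5), (0, 5), (0, 9)]:
        kind, cycles = access(banks, bank, row)
        print(f"bank {bank} row {row}: {kind.value}, {cycles} cycles")
```

Issuing the same bank and row twice and then a different row in that bank walks through all three cases in turn: closed row, open row, and a different row open.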
In addition to access times, there are a variety of other timing considerations in memory controller implementations (e.g., refresh, power down, and initialization). For example, switching the memory subsystem from a read to a write, or from a write to a read, causes delay associated with turning the interface bus around. If this happens too often, it reduces the overall efficiency of the data being transferred to and from the SDRAM.
Data transfer efficiency is a measure of the amount of the usable data transfer bandwidth available through a memory interface. Efficiency is commonly represented as a percentage of the theoretical maximum memory transfer bandwidth achieved by a particular memory interface implementation.
For example, if a DDR3 SDRAM is 8 bits wide, operating with an 800 MHz clock, the theoretical maximum transfer rate is 1,600 MBps (two one-byte transfers per clock cycle). If the SDRAM achieves an average transfer rate of 800 MBps, the memory controller efficiency is 50%. Typical efficiencies for various memory controller implementations can vary from 25% to more than 90%. It is easy to see that an inefficient implementation may significantly impact key system characteristics, increasing overall solution cost.
In some cases, high data transfer efficiency is difficult to achieve because of the random traffic patterns originating from the requesters in the SoC. Because open rows have much faster access times, if memory requests stay within an open row a high percentage of the time, the theoretical maximum bandwidth can be achieved during those accesses. If memory accesses are scattered, accesses to the same row may seldom happen; the controller must then repeatedly close and open rows, which raises the average access time and lowers overall data transfer efficiency.
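The arithmetic in this example can be checked with a few lines of code. This is a small sketch under the assumptions stated in the text (data transferred on both clock edges, 1 MBps taken as 10^6 bytes per second); the function names are illustrative only.

```python
def peak_bandwidth_mbps(clock_mhz, width_bits):
    """Theoretical peak in MBps for a double-data-rate interface."""
    transfers_per_microsecond = clock_mhz * 2  # DDR: two transfers per clock
    return transfers_per_microsecond * width_bits / 8


def efficiency(achieved_mbps, clock_mhz, width_bits):
    """Fraction of the theoretical peak that is actually delivered."""
    return achieved_mbps / peak_bandwidth_mbps(clock_mhz, width_bits)


if __name__ == "__main__":
    peak = peak_bandwidth_mbps(clock_mhz=800, width_bits=8)
    eff = efficiency(achieved_mbps=800, clock_mhz=800, width_bits=8)
    print(f"peak = {peak:.0f} MBps, efficiency = {eff:.0%}")
```

For the 8-bit, 800 MHz case this gives a 1,600 MBps peak and 50% efficiency at 800 MBps, matching the figures above.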
Clearly, a memory controller that can evaluate traffic patterns and identify possibilities for sequencing operations in a more efficient manner—such as grouping accesses to the same memory rows together instead of just executing them as the memory access request comes in, or providing quick access to high-priority data—can mitigate the effects of inefficient traffic patterns. A DDR memory controller that has the capabilities and features to adeptly manage random traffic can significantly improve efficiency.
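To make the benefit of grouping same-row accesses concrete, here is a toy model, offered as a sketch rather than a description of any real controller. It assumes hypothetical cycle costs (`BURST`, `ACTIVATE`, `PRECHARGE`), serves one request at a time, and ignores bank-level parallelism, read/write turnaround, refresh, priorities and request aging. The `lookahead` parameter is the number of oldest pending requests the scheduler may choose from; a value of 1 means strictly in-order execution.

```python
BURST = 4       # data cycles per request (hypothetical)
ACTIVATE = 11   # extra cycles to open a row (hypothetical)
PRECHARGE = 11  # extra cycles to close a row (hypothetical)


def cost(open_rows, bank, row):
    """Cycles needed to serve one request given the current open rows."""
    current = open_rows.get(bank)
    if current == row:
        return BURST                         # open row
    if current is None:
        return BURST + ACTIVATE              # closed row
    return BURST + PRECHARGE + ACTIVATE      # different row open


def simulate(requests, lookahead):
    """Return (total_cycles, efficiency) for a list of (bank, row) requests."""
    pending = list(requests)
    open_rows = {}
    total_cycles = 0
    while pending:
        window = pending[:lookahead]
        # Prefer the oldest request in the window that hits an open row;
        # fall back to the oldest request overall.
        index = next(
            (i for i, (bank, row) in enumerate(window)
             if open_rows.get(bank) == row),
            0,
        )
        bank, row = pending.pop(index)
        total_cycles += cost(open_rows, bank, row)
        open_rows[bank] = row
    return total_cycles, BURST * len(requests) / total_cycles


if __name__ == "__main__":
    traffic = [(0, 1), (0, 2), (0, 1), (0, 2)]  # alternating rows in bank 0
    for depth in (1, 4):
        cycles, eff = simulate(traffic, lookahead=depth)
        print(f"lookahead={depth}: {cycles} cycles, efficiency {eff:.0%}")
```

With these made-up costs, the alternating-row stream takes 93 cycles in order and 49 cycles when the scheduler can see all four requests, because the two accesses to each row are served back to back. A real controller would also have to bound how long a bypassed request can wait.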
DDR Controller Efficiency Comparison
Figure 2 illustrates efficiency results generated from a FIFO-based DDR controller that has led several market benchmark studies. The example patterns represent three types of traffic, labeled pattern_80_20, pattern_50_50 and pattern_20_80. The nomenclature of the label indicates the type of pattern: The first number in the label indicates the percentage of traffic that is sequential or incremental, and the second number is the percentage of traffic that is random. As the percentage of the random component increases from 20% to 80%, the efficiency decreases as expected. The sequential component of the pattern is a traffic request to an open page, which is the most desirable condition, delivering the highest efficiency. The random portion of the traffic is either an access to a closed page or an access request to a bank with a different page open. Looking more closely, pattern_20_80 yields about 55% efficiency, pattern_50_50 yields about 60% efficiency and pattern_80_20 yields about 75% efficiency.
|Figure 2: Efficiency of a benchmark-leading FIFO DDR controller for random and sequential patterns|
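The article does not describe how the benchmark patterns were generated, so the following is only a guess at what a pattern such as pattern_80_20 could look like in a test bench. In this sketch, `make_pattern` is a hypothetical helper: a "sequential" request reuses the previous bank and row (an open-page access), and a "random" request jumps to an arbitrary bank and row. The bank count, row count and seed are assumptions.

```python
import random


def make_pattern(n, sequential_pct, banks=8, rows=1024, seed=0):
    """Return n (bank, row) requests, about sequential_pct percent of which
    repeat the previous bank and row; the rest jump to a random location."""
    rng = random.Random(seed)  # fixed seed keeps the pattern reproducible
    bank, row = rng.randrange(banks), rng.randrange(rows)
    requests = []
    for _ in range(n):
        if rng.random() * 100 >= sequential_pct:
            # Random component: move to an arbitrary bank and row.
            bank, row = rng.randrange(banks), rng.randrange(rows)
        requests.append((bank, row))
    return requests


if __name__ == "__main__":
    for label, pct in [("pattern_80_20", 80), ("pattern_50_50", 50), ("pattern_20_80", 20)]:
        reqs = make_pattern(1000, pct)
        repeats = sum(a == b for a, b in zip(reqs, reqs[1:]))
        print(f"{label}: {repeats} of {len(reqs) - 1} requests stay on the same page")
```

Feeding such streams to a controller model is one way to reproduce the trend reported here, where efficiency falls as the random share grows.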
Figure 3 illustrates the efficiency results for a CAM-based DDR controller executing the same examples of three patterns. The results for the CAM with 32 entries are greater than or equal to the results from the FIFO-based DDR controller, and the efficiency results for the 64-entry CAM are significantly higher. The efficiency results for the 64-entry CAM-based DDR controller are approximately 98% for pattern_80_20, 80% for pattern_50_50 and 65% for pattern_20_80, demonstrating that the CAM-based architecture delivers a significant improvement in efficiency compared to the FIFO-based controller—which translates to improved bandwidth.
|Figure 3: Efficiency results from CAM-based DDR controller|
Because of the bank architecture of DRAMs, designers have had to work very hard at allocating memory access to DDR SDRAMs so that the SoC cycles through the eight available banks. Cycling through banks in patterns allows the controller to work within the confines of the bank architecture to deliver reasonable efficiency. However, some SoC systems do not send regularly cycling traffic between the different banks to the DDR controller, and that is where a CAM-based architecture excels. A CAM-based architecture can leverage the entire command queue to reorder even the most random traffic patterns for very high efficiency.
Pattern_random1 and pattern_random2 are two different kinds of very random traffic that do not cycle banks. Pattern_random/sequential combines random patterns with sequential patterns similar to those discussed in the earlier examples. In Figure 4, the efficiency of the FIFO-based controller (FIFO_CTL) increases for pattern_random/sequential, but the CAM-based controller delivers higher efficiency than the FIFO_CTL for every pattern on the chart.
|Figure 4: Efficiency results for random traffic comparing CAM architecture to FIFO architecture.|
This article demonstrates how the CAM-based DDR SDRAM architecture can improve performance for different traffic pattern types. The CAM-based DDR controller architecture has demonstrated the capability to deliver higher efficiency than the FIFO architecture for sequential traffic patterns as well as very random traffic patterns. The CAM architecture is ideal for applications where the traffic profiles are difficult to predict because there are multiple masters requesting access to DRAM. As SoCs grow in complexity and integrate more applications that require access to DDR SDRAM, their combined requests to DRAM tend to become more random, which makes the CAM-based DDR SDRAM controller architecture ideal for complex SoC designs.
Luigi Ternullo serves as senior product marketing manager for DDR controllers at Synopsys. Prior to joining Synopsys in 2010, Mr. Ternullo served as director of product marketing for the DDR interface IP product portfolio at Virage Logic, and previously held technical marketing and engineering management positions at Agere Systems, Vanguard International Semiconductor, and IBM. His range of experience includes SRAM and DRAM development as well as memory and logic built-in self-test (MBIST and LBIST). Mr. Ternullo has over 18 years of industry experience, holds over 25 patents in BIST and memory design, and has authored several technical papers. He holds a BS and MS in Electrical Engineering from Rochester Institute of Technology and an MBA from Lehigh University.