Published in issue of Chip Design Magazine

Caution: Clock Crossing

A prescription for uncontaminated data across clock domains

In today's ASIC design environment, the goal is to create a robust synchronous design. In many designs, however, it's not possible to run the whole system on one clock, so blocks contain registers and logic running in different clock domains. Sending signals across these clock boundaries can cause data integrity problems if the signals are not synchronized properly. In this article, I will describe many of the problems that a designer will encounter in sending signals across clock domains, and I will offer some solutions that are currently in use to address them. Finally, I will describe a usable methodology that integrates and implements these solutions to create a robust synchronous design.

There are two parts to the methodology presented here. The first limits the number of signals that cross clock boundaries, and the second ensures data integrity when signals do cross them. The goal of this article is to show a designer unfamiliar with clock boundary crossings how to accomplish both.

A designer always faces some inherent pitfalls when crossing clock boundaries, and ASIC designers today use synchronizers to avoid them. A synchronizer takes a signal that is asynchronous to a particular clock and passes it through a series of registers clocked by that clock, so that the synchronized signal changes only on that clock's edges. The biggest problem in sending a signal across a clock domain is metastability.

Metastability is a condition in which the output of a register is in an unknown, fluctuating state, somewhere between a valid low and a valid high, for an unpredictable time before it settles. Clocking a signal into a flip-flop without meeting the flip-flop's setup and hold times can cause metastability. A synchronizer makes a circuit immune to metastability problems. For the purposes of this article, the term "immune" means that there is an acceptably small probability of synchronizer failure.
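The "acceptably small probability" is usually expressed as a mean time between failures (MTBF). A commonly used first-order model is MTBF = exp(t_r / tau) / (T_W * f_clk * f_data), where t_r is the time a metastable output is allowed to settle, tau is the flip-flop's regeneration time constant, T_W is its metastability window, f_clk is the sampling clock frequency and f_data is the rate at which the asynchronous input changes. The Python sketch below evaluates that model; the parameter values are hypothetical placeholders, not figures for any real process or library.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order synchronizer MTBF model, in seconds.

    t_resolve: time a metastable DEMET output has to settle before it is used (s)
    tau:       regeneration time constant of the flip-flop (s)
    t_window:  metastability window of the flip-flop (s)
    f_clk:     frequency of the clock doing the sampling (Hz)
    f_data:    rate at which the asynchronous input changes (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical parameters, for illustration only.
TAU = 20e-12          # 20 ps
T_WINDOW = 30e-12     # 30 ps
F_CLK = 200e6         # 200 MHz destination clock
F_DATA = 10e6         # input changes at 10 MHz
PERIOD = 1 / F_CLK    # 5 ns

# Each extra synchronizer stage adds roughly one clock period of settling time
# (setup and clock-to-Q times are ignored here), which multiplies the MTBF by
# exp(PERIOD / TAU).
mtbf_one_period = synchronizer_mtbf(PERIOD, TAU, T_WINDOW, F_CLK, F_DATA)
mtbf_two_periods = synchronizer_mtbf(2 * PERIOD, TAU, T_WINDOW, F_CLK, F_DATA)
print(f"{mtbf_one_period:.3e} s vs {mtbf_two_periods:.3e} s")
```

The exponential term is why adding settling time (another register stage, or a slower sampling clock) is so effective at making failures rare.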

The two-stage synchronizer

There are a number of different kinds of synchronizers, and the most common is the two-stage synchronizer. Although useful in many situations, the two-stage synchronizer has limitations: it's only effective when the signal being passed to another clock domain stays in the high state long enough to be caught. (see Figure 1)

Figure 1
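As a rough illustration, the two-stage synchronizer can be modeled at the cycle level in Python. This is a sketch under stated assumptions: the input is represented only by its level at each rising edge of the destination clock, metastability is not modeled (DEMET is assumed to resolve to the sampled level), and the name `sync` for the second register is hypothetical.

```python
from typing import List

def two_stage_sync(samples: List[int]) -> List[int]:
    """Cycle-level model of a two-stage synchronizer (DEMET -> sync).

    samples[i] is the level of the asynchronous input seen at the i-th rising
    edge of the destination clock. Returns the synchronized output after each edge.
    """
    demet = 0
    sync = 0
    out = []
    for level in samples:
        # Both registers update on the same edge: sync takes DEMET's old value.
        demet, sync = level, demet
        out.append(sync)
    return out

print(two_stage_sync([0, 1, 1, 1, 0, 0, 0]))   # [0, 0, 1, 1, 1, 0, 0]
```

The output follows the input one destination edge after DEMET captures it, which is why the input must still be high when a destination edge arrives.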

In addition, we may not always have the luxury of passing a signal from a slower clock domain to a faster one. Problems can occur when a signal from a faster clock domain needs to be synchronized into a slower clock domain, or even when the two clocks are running at the same speed. The designer needs to be sure that the signal being passed will be caught on a clock edge of the destination clock domain. If a signal in the faster clock domain is only one clock cycle wide, however, and happens to assert and de-assert between two edges of the slower clock, the synchronizer never samples it and the signal is missed. The designer must be sure that a signal is asserted or de-asserted long enough for the synchronizer to catch it. (see Figure 2)

Figure 2

In Figure 2, the DEMET flip-flop does not propagate the active Signal_to_sync to the other clock domain because it never sees Signal_to_sync as high on a rising edge of Clock_one. This happens either because the Signal_to_sync pulse rose after one rising clock edge and fell before the next, or because Signal_to_sync had slow rise or fall times.
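The missed pulse can be reproduced with a small timing sketch. It assumes hypothetical clocks (a 10 ns source period and a 25 ns Clock_one period), integer times in nanoseconds, and ignores setup, hold and slow edges; it only checks whether any Clock_one edge lands while Signal_to_sync is high.

```python
def sampled_levels(pulse_start, pulse_end, dst_period, horizon):
    """Level of Signal_to_sync at each rising edge of the destination clock.

    The pulse is high during [pulse_start, pulse_end); destination edges occur
    at 0, dst_period, 2 * dst_period, ... below horizon. Times are integers (ns).
    """
    return [1 if pulse_start <= t < pulse_end else 0
            for t in range(0, horizon, dst_period)]

# Hypothetical clocks: 10 ns source (fast), 25 ns destination (Clock_one).
one_cycle_pulse = sampled_levels(12, 22, 25, 100)   # one source cycle wide
stretched_pulse = sampled_levels(12, 42, 25, 100)   # three source cycles wide
print(one_cycle_pulse)   # [0, 0, 0, 0]: DEMET never sees the pulse
print(stretched_pulse)   # [0, 1, 0, 0]: caught at the 25 ns edge
```

In this idealized model, a pulse at least one destination period wide always contains an edge; real designs need extra margin for setup, hold and metastability.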

Similarly, when the two clocks are running at the same speed, there is no guarantee that the rise and fall times are small enough to be ignored. If the DEMET flip-flop has a higher switching threshold, there is no guarantee that it will register a rising Signal_to_sync as high. If the signal then falls to zero before the next rising clock edge, DEMET will not propagate the active signal to the other clock domain. If the signal is not active (or inactive) long enough to be caught by the two-stage synchronizer, the designer can use a capture synchronizer instead.

The capture synchronizer works regardless of the combination of clock speeds. Initially, the flip-flops CAP, DEMET, SYNC1, and SYNC2 are all low. The capture flip-flop, CAP, is clocked by Signal_to_be_CAP, the signal we want to capture. On each rising edge of Signal_to_be_CAP, CAP loads the inverse of its current value; in other words, it toggles.

The output of CAP feeds the DEMET flip-flop, whose value changes on the next rising edge of Clock_one. The output of DEMET then goes to SYNC1 and is captured there on the following rising edge of Clock_one. The output of SYNC1 goes to two places: the SYNC2 flip-flop and one of the inputs of the XOR gate.

Small Changes

For one clock cycle, the values of SYNC1 and SYNC2 differ, so the output of the XOR gate is a pulse one period wide of Clock_one, the clock the signal is being synchronized to. SYNC2 then changes to match SYNC1, and the output of the XOR gate goes low again. The designer therefore needs to design the receiving logic so that a single-cycle pulse is all it requires from the synchronizer output.
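A cycle-level Python sketch of the capture synchronizer follows the description above: CAP toggles on each rising edge of Signal_to_be_CAP, DEMET, SYNC1 and SYNC2 are clocked by Clock_one, and the output is SYNC1 XOR SYNC2. The 25 ns Clock_one period and the edge times are hypothetical, and metastability and setup/hold times are not modeled.

```python
def cap_level(t, rise_times):
    """CAP toggles on every rising edge of Signal_to_be_CAP before time t."""
    return sum(1 for r in rise_times if r < t) % 2

def capture_sync(rise_times, clk_period, n_edges):
    """Cycle-level model of the capture synchronizer.

    rise_times: times of rising edges of Signal_to_be_CAP (integers, e.g. ns).
    Returns the XOR output (SYNC1 ^ SYNC2) after each rising edge of Clock_one.
    """
    demet = sync1 = sync2 = 0
    out = []
    for k in range(n_edges):
        t = k * clk_period
        # DEMET, SYNC1 and SYNC2 all update on the same Clock_one edge.
        demet, sync1, sync2 = cap_level(t, rise_times), demet, sync1
        out.append(sync1 ^ sync2)
    return out

# A pulse rising at 12 ns, even one far narrower than the 25 ns Clock_one period.
print(capture_sync([12], 25, 6))   # [0, 0, 1, 0, 0, 0]
```

Only the rising edge of Signal_to_be_CAP matters in this model, so the pulse width no longer has to cover a Clock_one edge; each captured edge produces exactly one single-cycle output pulse.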

To catch and propagate an inverted (low) pulse instead of a regular high pulse, the capture synchronizer needs a couple of small changes. Signal_to_be_CAP must be inverted before it clocks CAP, so that CAP changes when Signal_to_be_CAP transitions from high to low. And the outputs of SYNC1 and SYNC2 must go to an XNOR gate instead of an XOR gate, so that the output idles high and pulses low.
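Under the same hypothetical assumptions as a cycle-level model (25 ns Clock_one, integer times, no metastability), the low-pulse variant only changes which edge toggles CAP and which gate combines SYNC1 and SYNC2:

```python
def capture_sync_active_low(fall_times, clk_period, n_edges):
    """Low-pulse variant of a cycle-level capture synchronizer model.

    Inverting Signal_to_be_CAP before it clocks CAP makes CAP toggle on each
    high-to-low transition (fall_times). Using XNOR instead of XOR makes the
    output idle high and drop low for one Clock_one cycle per captured pulse.
    """
    demet = sync1 = sync2 = 0
    out = []
    for k in range(n_edges):
        t = k * clk_period
        cap = sum(1 for f in fall_times if f < t) % 2
        demet, sync1, sync2 = cap, demet, sync1
        out.append(1 - (sync1 ^ sync2))   # XNOR of SYNC1 and SYNC2
    return out

print(capture_sync_active_low([12], 25, 6))   # [1, 1, 0, 1, 1, 1]
```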

One thing a designer needs to be aware of is that the capture flip-flop changes state only on the rising edge of Signal_to_be_CAP. If Signal_to_be_CAP is high for two pulses in a row (it stays high across two consecutive cycles with no low period in between), there is only one rising edge, so the capture flip-flop clocks only once. The design must be created so that this scenario will not cause problems.
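The merging effect can be checked by counting rising edges. In this hypothetical sketch, Signal_to_be_CAP is given as one level per source-clock cycle and is assumed to start low; since CAP toggles only on 0-to-1 transitions, the count is also the number of output pulses.

```python
def cap_toggles(levels):
    """Number of times CAP toggles for a Signal_to_be_CAP waveform given as
    one level per source-clock cycle (assumed to start low)."""
    toggles = 0
    prev = 0
    for level in levels:
        if level == 1 and prev == 0:   # only a rising edge clocks CAP
            toggles += 1
        prev = level
    return toggles

print(cap_toggles([0, 1, 1, 0]))      # 1: two back-to-back pulses merge into one
print(cap_toggles([0, 1, 0, 1, 0]))   # 2: a low cycle between pulses keeps them apart
```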

Using capture synchronizers will, in general, make the design more robust and require less redesign as new clock speeds are introduced, which in turn makes the design better suited for reuse. The capture synchronizer is larger than the two-stage synchronizer, however, so it's left up to the designer to decide which to use when either will work in a design.

The capture flip-flops are not on the regular clock tree, because they are clocked by the output of another register. This can be a problem when inserting scan; however, these testability issues are beyond the scope of this article.

Glitch problems

When consolidating (combining) signals in logic before they cross a clock boundary, a designer has to be aware that problems can arise from glitches. This problem can be approached in two ways. The first is to make the design able to handle an accidental glitch, if the synchronizer does capture one, and still maintain data integrity. This can be difficult and risky. The second and better solution is to make sure that no glitches cross clock boundaries at all.

A designer can eliminate glitches crossing clock boundaries in two ways. The first is by registering the signals before they are synchronized. The second is by making sure that the signals switch at times that will not cause a glitch. (see Figure 3)

Figure 3

Signals A and B in Figure 3 change very close to each other: both change on the same edge of the source clock, Clock_two, which is different from Clock_one. Due to timing delays, however, signal A changes a small but perceptible moment after signal B. This causes a glitch on Signal_to_sync, which is generated from A and B, and the glitch can be as narrow as tens of picoseconds. Signal_to_sync crosses a clock boundary through a synchronizer, so if a rising edge of Clock_one occurs during the glitch, the synchronizer could capture a false active Signal_to_sync and propagate it through. The results could be disastrous to the circuit.
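The glitch can be sketched in Python under a hypothetical assumption about the logic in Figure 3: here Signal_to_sync = A AND B, A falls while B rises in response to the same Clock_two edge, and A arrives 30 ps after B. Times are integer picoseconds. Before and after the change the correct value is 0, yet it is 1 for a brief window.

```python
def level_at(t, initial, change_time):
    """A signal that starts at `initial` and flips once at change_time."""
    return initial if t < change_time else 1 - initial

def signal_to_sync(t, a_change=1030, b_change=1000):
    """Hypothetical combinational logic: Signal_to_sync = A AND B.

    A falls (1 -> 0) and B rises (0 -> 1) in response to the same Clock_two
    edge, but A arrives 30 ps after B. Times are integers in picoseconds.
    """
    a = level_at(t, 1, a_change)
    b = level_at(t, 0, b_change)
    return a & b

glitch = [t for t in range(900, 1100) if signal_to_sync(t)]
print(glitch[0], glitch[-1], len(glitch))   # 1000 1029 30
```

A Clock_one edge landing anywhere in that 30 ps window, at 1010 ps for example, would sample a false active Signal_to_sync.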

To fix this problem, the designer should register the Signal_to_sync signal before it's synchronized. This register needs to be clocked on the same clock that both A and B are changing on. (see Figure 4)

Figure 4

Registering eliminates the glitches, because a register's output changes only once, just after its clock edge, and then stays stable for the rest of the cycle; a glitch on its input that settles before the next edge never reaches its output. It has become common design practice to register outputs before they cross clock boundaries and before sending them to another block. This way, another designer who may end up using your signals can be sure that those signals are glitch free.

The register adds one clock cycle of delay in the starting clock domain, Clock_two, but no glitches will propagate through.
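The effect of the register can be sketched with the same kind of hypothetical model: Signal_to_sync = A AND B, a Clock_two period of 1000 ps, and A and B responding to the Clock_two edge at 1000 ps after clock-to-Q delays of 50 ps (B) and 80 ps (A). A flip-flop clocked by Clock_two only takes on the value present at its clock edge, so the glitch between edges never reaches its output.

```python
CLK2_PERIOD = 1000   # hypothetical Clock_two period, in ps

def level_at(t, initial, change_time):
    """A signal that starts at `initial` and flips once at change_time."""
    return initial if t < change_time else 1 - initial

def comb_signal(t):
    """Unregistered Signal_to_sync = A AND B (hypothetical logic). A falls and
    B rises after the Clock_two edge at 1000 ps; A is 30 ps later than B."""
    return level_at(t, 1, 1080) & level_at(t, 0, 1050)

def registered_signal(t):
    """Flip-flop clocked by Clock_two with comb_signal on its D input. Its
    output only changes on a Clock_two edge, holding the value sampled there."""
    last_edge = (t // CLK2_PERIOD) * CLK2_PERIOD
    return comb_signal(last_edge)

raw = [comb_signal(t) for t in range(0, 3000)]
reg = [registered_signal(t) for t in range(0, 3000)]
print(sum(raw), sum(reg))   # 30 0
```

The price is the latency noted above: any real change on A AND B appears at the register output only at the next Clock_two edge.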

Additional Options

The second option, which avoids this delay, is to make sure that signals A and B never change close together, so that they can never cause a glitch. This option requires the designer to list every possible circumstance in which these signals might change and make certain that none of them will cause a glitch, which can be very difficult to do.

Glitch problems inherent in the design can show up in gate-level simulations, and they can also appear in silicon because of layout and timing delays. It's good design practice, therefore, for the designer to check every signal that crosses a clock boundary for glitch problems.

We have already seen the difficulties in crossing clock boundaries, and hopefully the previous discussion has impressed upon you why crossing them should be avoided if possible. The most important thing a designer can do to limit the signals crossing clock boundaries is to plan the design with this goal in mind. Start by figuring out, before the design is created, exactly where crossing a boundary is necessary and where it can be safely avoided.

Before placing a state machine in a clock domain, the designer should know which clock domain (or domains) each of the signals that interface with the state machine comes from or goes to. The state machine should then be placed in the clock domain it interfaces with most frequently. (see Figure 5)

Figure 5

This reduces the number of synchronizers needed (in other words, it saves die area) and limits the time that passes before operations complete. When sending signals across clock boundaries, always remember that it takes time for them to cross into the other clock domain.
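The placement rule can be written as a small helper. This is a hypothetical sketch: it assumes one synchronizer per interface signal that lives in a different domain, and the domain names are illustrative only.

```python
from collections import Counter

def place_state_machine(interface_domains):
    """Pick the clock domain for a state machine.

    interface_domains lists the clock domain of every signal the state machine
    reads or drives. Placing it in the most common domain minimizes the number
    of signals that must cross a boundary (assuming one synchronizer each).
    Returns (domain, number_of_crossings).
    """
    counts = Counter(interface_domains)
    domain, local = counts.most_common(1)[0]
    return domain, len(interface_domains) - local

# Hypothetical interface: five signals on clk_host, two on clk_disk.
print(place_state_machine(["clk_host"] * 5 + ["clk_disk"] * 2))   # ('clk_host', 2)
```

Placing the same state machine in clk_disk would need five crossings instead of two.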

A Single Domain

It's also important to keep all of the "smarts" in one clock domain. Status registers are difficult to share across clock domains: either the status is updated late in the other domain, or the reader has to wait before it can read the status register. A delay in loading a status register can cause problems if that status is required immediately, because a read that occurs before the newly loaded value has crossed the boundary returns a false (stale) status. Therefore, it's best to keep all status register updates in the clock domain in which the register will be read and written.

Sometimes it's necessary to send data across a clock boundary. The most obvious solution is to synchronize every data bit, but this may require a lot of registers and could make the design too large. Additionally, there is no guarantee that all of the synchronized data bits will arrive on the same clock edge; some bits may arrive a cycle later than others, perhaps because of the slow rise or fall times discussed earlier. A better approach is to send across a single valid signal that indicates when the data is valid, which cuts the number of registers required. The designer just needs to make sure that the data stays stable from before the valid signal is asserted until the other clock domain has captured it. (see Figure 6)

Figure 6
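A destination-side sketch of the valid-signal scheme follows, under hypothetical assumptions: only `valid` goes through a two-stage synchronizer, the bus is sampled when the synchronized valid is first seen high, and the source holds the bus stable for the whole window. The function and signal names and the 0xA5/0x3C values are illustrative only.

```python
def receive_bus(data_at, valid_at, n_edges):
    """Destination-side model of a bus transfer qualified by a valid signal.

    data_at(k) and valid_at(k) give the source's data bus and valid flag as
    seen at destination clock edge k. Only `valid` is synchronized; the bus is
    sampled directly when the synchronized valid rises, which is safe only
    because the source holds the bus stable throughout that window.
    """
    demet = sync = prev_sync = 0
    captured = []
    for k in range(n_edges):
        demet, sync, prev_sync = valid_at(k), demet, sync
        if sync == 1 and prev_sync == 0:   # rising edge of synchronized valid
            captured.append(data_at(k))
    return captured

# Hypothetical transfer: the source drives 0xA5 from edge 2 and raises valid at
# edge 3, holding both until edge 10. Outside that window the bus is garbage.
def data_at(k):
    return 0xA5 if 2 <= k < 10 else 0x3C

def valid_at(k):
    return 1 if 3 <= k < 10 else 0

print([hex(d) for d in receive_bus(data_at, valid_at, 12)])   # ['0xa5']
```

One synchronizer replaces eight in this example, and the receiver never sees a mix of old and new bits as long as the stability rule holds.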

Another problem is signal arrival time. When a signal is sent across a clock boundary through a synchronizer, there is no guarantee on which rising clock edge it will arrive in the other clock domain. In other words, if more than one signal is being synchronized across a clock boundary, there is no guarantee as to which signal will arrive first, or whether they will arrive at the same time. Consolidating these signals before they are sent reduces the number of registers and eliminates the question of which signal arrives first. When planning the design, it's important to eliminate arrival-order dependencies like these. Because consolidating signals usually means combining them in logic, always remember to check the consolidated signals carefully for glitches. (see Figure 7)

Figure 7
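The arrival-order problem can be enumerated in a short sketch. The model is an assumption: each synchronizer's latency is 2 or 3 destination cycles depending on how metastability resolves, and in the consolidated case both bits are decoded from one synchronized signal, so they share a single latency.

```python
from itertools import product

def observed_states(latencies_a, latencies_b, horizon=6):
    """Signals a and b assert at the same source edge (time 0) and each goes
    through its own synchronizer, whose latency (in destination cycles) may be
    any value in its list. Returns every (a, b) pair the destination could see."""
    states = set()
    for la, lb in product(latencies_a, latencies_b):
        for t in range(horizon):
            states.add((int(t >= la), int(t >= lb)))
    return states

def observed_states_consolidated(latencies, horizon=6):
    """Both bits are decoded from one synchronized signal, so they share a latency."""
    return {(int(t >= lat), int(t >= lat)) for lat in latencies for t in range(horizon)}

print(sorted(observed_states([2, 3], [2, 3])))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(observed_states_consolidated([2, 3])))    # [(0, 0), (1, 1)]
```

With separate synchronizers, the receiver can briefly see only one of two signals that were asserted together; after consolidation that intermediate state cannot occur.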

Methodology

Let's review the information up to this point, and put it together to establish a step-by-step design methodology to help ensure data integrity.

A -- Identify the different clock domains that are required. Make a list of the different clock domains that will be in your design first. Once you know this, it will be easier to see which clock domain to locate logic in.

B -- Identify the clock combinations that are possible for this design. Each clock domain may be able to run at more than one speed, which increases the number of clock combinations that can occur. This will be important when deciding what type of synchronizer to use.

C -- Check the inputs and outputs of the block you are working in and list which clock domain each comes from and goes to.

D -- Partition the logic that will satisfy the design requirements into smaller blocks, each with a specific function. We now know which clock domain each input comes from and which clock domain each output goes to, and we know all of the possible clock combinations. So we can start creating the functionality of the design and locating logic in the proper clock domains.

E -- Begin the design with the goal of limiting the number of signals that need to cross clock boundaries. As you create logic, ask whether there is any way to reduce the number of signals that must cross clock boundaries.

F -- Locate state machines and status registers in the clock domain in which they are used.

This will save a lot of registers in your design and make the design more efficient. If you avoid crossing clock domains too many times, you can also save a lot of waiting (dead time) in your design.

G -- Make a list of all of the signals that still need to cross clock boundaries.

H -- Check these signals to see if any can be combined.

I -- Check the signals to see if any are buses that can have a valid signal passed across instead of synchronizing all of the data bits.

J -- Check to see if any of the logic needs to be transferred to a different clock domain in order to reduce the number of synchronizers.

K -- Check the combined signals to see if they can be registered, and make sure that any that are not registered will not cause glitch problems. This step can cause problems if you are not careful in your analysis. It's better practice to register signals before they cross clock domains, in order to avoid glitch problems entirely.

L -- Check the signals to be synchronized to see what type of synchronizer is required. This is where the clock combinations become important. Check all of the clock combinations to make sure that the signal will not get lost as it crosses clock domains.

M -- If a capture synchronizer is required, make sure that the logic using the synchronized version of the signal needs only a single-cycle pulse, as described above.
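Step L can be turned into a mechanical check over every clock combination. The sketch below is hypothetical: the `Crossing` record and its fields are illustrative names, and the threshold (a signal must stay at each level for at least 1.5 destination clock periods before a two-stage synchronizer is trusted) is an assumed rule of thumb, not a figure from this article.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass
class Crossing:
    name: str
    src_periods_ns: List[float]   # every period the source clock can run at
    dst_periods_ns: List[float]   # every period the destination clock can run at
    min_width_src_cycles: int     # shortest assertion/de-assertion, in source cycles

def needs_capture_sync(c: Crossing, margin: float = 1.5) -> bool:
    """True if, for some clock combination, the signal may be too short for a
    two-stage synchronizer. Assumed rule: each level must last at least
    `margin` destination clock periods."""
    for src, dst in product(c.src_periods_ns, c.dst_periods_ns):
        if c.min_width_src_cycles * src < margin * dst:
            return True
    return False

# Hypothetical crossings.
req = Crossing("req", [10.0, 20.0], [8.0, 25.0], 2)
mode = Crossing("mode", [10.0], [8.0], 4)
print(needs_capture_sync(req), needs_capture_sync(mode))   # True False
```

Here `req` is safe with an 8 ns destination clock but not with a 25 ns one, which is exactly the kind of case that only shows up when all clock combinations from step B are checked.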

In summary

This article has discussed the circumstances in which a synchronizer is needed and how it works. Hopefully the reader now understands why it's important to avoid crossing clock boundaries if at all possible. The discussion also explored a methodology to help designers avoid crossing clock boundaries and better ensure data integrity when it is necessary.

When planning the design, make sure that identifying the proper number of synchronizers is part of that planning. It's much easier to implement the correct number of synchronizers in a design in the first place than it is to add them later in the design cycle and possibly compromise data integrity.

The benefits of this methodology include area savings from limiting the number of synchronizers used, and the ability to ensure data integrity. Evaluating synchronization problems up front in the design process will also limit the number of last-minute changes found to be necessary during exhaustive testing. Making changes after the RTL has been completed can cripple a design; the methodology described here helps to limit those changes, and also limits the number of re-spins required when avoidable mistakes slip through the cracks.

Acknowledgments

The author would like to thank Greg Moller, Ajaz Siraj, and Jeff Ware for their guidance and support through a difficult design project. Their knowledge of crossing clock domains and willingness to share this information made the finished product possible and provided the impetus for this article.

Roy H. Parker is an ASIC design engineer for the Personal Storage Group at Seagate Technology, Inc. (Scotts Valley, CA)

©2014 Extension Media. All Rights Reserved.