Tricking Haskell into state: how Clash's Signal type works


I recently came across a question on /r/haskell, where /u/netj_nsh asked whether Clash supports asynchronous circuit designs. They went on to ask whether designing with multiple, synchronous clock domains is possible. While the (very) short answers are no and yes, respectively, I figured I’d write a blog post clarifying these concepts and how they relate to Clash. Mostly though, I just wanted an excuse to write about Clash’s simple yet clever trick that makes it tick: Signal.

Combinational logic

In digital design, combinational logic is a type of logic circuit that has no memory. It’s a circuit that computes its output based solely on its inputs. This is very similar to the concept of a pure function in functional programming: a function that always produces the same output for the same input, with no side effects. For any function in Haskell we therefore know that we can translate it to a digital design if:

  1. The function is pure: it has no side effects. In Haskell this is enforced by the type system: the absence of IO in its type signature indicates it is pure1.
  2. All arguments and the return value are of a type that can be represented as a finite number of bits.
  3. Any recursion, if present, should terminate at compile time. I.e., no data dependent recursion.
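As a minimal illustration (a sketch of mine, not code from Clash itself), here is a function that meets all three criteria once instantiated at a concrete, finitely representable type:

```haskell
-- A 2-to-1 multiplexer: pure, non-recursive combinational logic.
-- At a concrete type such as Bool or a fixed-width word, all three
-- criteria above are satisfied, so this translates to hardware.
mux2 :: Bool -> a -> a -> a
mux2 sel x y = if sel then x else y
```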

Because Haskell is an excellent language to describe pure functions (subjective, I know), it should also be an excellent language to describe combinational logic. This leaves out a big chunk of hardware designs: most realistic designs have lots of local state. In fact, local state is often the only way to reach realistic clock speeds. To have any shot at becoming a good hardware description language, Clash should therefore be able to describe non-combinational (sequential) logic too. Still, you won’t find Haskell’s usual way of keeping state (e.g., the State monad) in a typical Clash design.

The digital, sequential abstraction

Digital hardware designs introduce state through memory elements called flip-flops. These elements take a clock signal and a data signal. At the clock edge they’re sensitive to, they latch the signal on their data input (D) and propagate it to their output (Q). Though composed of more elemental components, they are often represented as a single box:

A basic flip-flop diagram with a clock input, a data input (D), and a data output (Q)

To reliably capture data, the incoming signal should be stable both before and after the active clock edge. These periods are called the setup and hold time and can vary wildly per fabrication process. The following picture marks the “unstable” period with Xs: data is allowed to change in this region, and its width corresponds to the maximum propagation delay of the signal.

A timing diagram of a clock and a data signal, with the unstable period around the active clock edge marked with Xs

If we chain these together and mark the stable periods with a label, we get the following picture:

A timing diagram of chained clock cycles, with each stable period marked by a label

This is very close to how digital designers think: they think in terms of stable periods. Hence, an often-heard phrase is: at clock cycle x, the value of the signal was y, even though y might have flipped between many values while stabilizing. Condensing the previous diagram to its essentials, we get a waveform diagram (in this case generated by wavedrom):

This abstraction is called the digital, sequential abstraction. Digital, because we don’t consider any signal levels between 0 and 1. Sequential, because flip-flops can introduce memory/state and we consider progressing time to be a sequence of stable values rather than a continuous phenomenon.

Signals

Using the digital, sequential abstraction, signals over time can be modeled as linked lists, where each element represents the stable value at one clock tick. The default list ([]) implementation in Haskell already gets us pretty close:

-- Conceptual definition of Haskell's builtin list type
data List a
  = []           -- the empty list
  | a : List a   -- an element prepended to a list

This allows us to write:

>>> 1 : 2 : 3 : 4 : []
[1, 2, 3, 4]
>>> [1, 2, 3, 4] -- syntactic sugar
[1, 2, 3, 4]

This is very close to what we want from a signal, but we’re missing two things:

  • Signals don’t really have an “end”
  • Signals have a clock domain associated with them

Hence, if we look at the definition of Clash’s Signal we see:

data Signal (dom :: Domain) (a :: Type) = a :- Signal dom a

We can use this definition in simulation:

>>> signal = pure 1 :: Signal System Int
>>> signal
1 :- 1 :- 1 :- 1 :- 1 :-
^C

To save some time, I pressed interrupt (^C) to prevent Haskell from printing indefinitely. Like lists, we can map over it:

>>> fmap (+10) signal -- More typically written as `(+10) <$> signal`
11 :- 11 :- 11 :- 11 :- 11 :-
^C

We can also combine multiple signals and perform some computation on them2:

>>> liftA2 (+) signal signal -- More typically written as `(+) <$> signal <*> signal`
2 :- 2 :- 2 :- 2 :- 2 :-
^C

As discussed in the combinational logic section, Clash shouldn’t be able to translate the definitions of fmap and liftA2: after all, they act on infinitely recurring data types! In practice it can, because pure and these two operations are marked as primitive: Clash will not try to translate the definitions of primitives; instead, it inserts hardcoded bits of HDL.
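To get a feel for how these primitives behave in simulation, here is a toy re-implementation. The names (Sig, sampleN) are illustrative, and the domain parameter is dropped for brevity; this is not Clash’s actual code:

```haskell
import Control.Applicative (liftA2)

infixr 5 :-

-- Toy version of Signal, without the domain tag.
data Sig a = a :- Sig a

instance Functor Sig where
  fmap f (a :- as) = f a :- fmap f as

instance Applicative Sig where
  pure a = a :- pure a                      -- a constant, infinite signal
  (f :- fs) <*> (a :- as) = f a :- (fs <*> as)

-- Observe the first n stable values of a signal.
sampleN :: Int -> Sig a -> [a]
sampleN n _ | n <= 0 = []
sampleN n (a :- as)  = a : sampleN (n - 1) as
```

With this in scope, `sampleN 3 (liftA2 (+) (pure 1) (pure 1))` evaluates to `[2,2,2]`, matching the GHCi session above.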

Signals 🤝 state

So far we’ve only seen combinational operations: we haven’t introduced any state yet. Worse, with the introduced primitives we cannot add any state either. Let’s take a step back and see what it would have to look like:

That is, the output signal Q is a shifted version of D. In this diagram, the initial state of the register is marked init. A fourth primitive Clash offers is therefore register:

-- | Simplified version of Clash's `register`: the actual version has more control signals.
register :: Clock dom -> a -> Signal dom a -> Signal dom a
register _clk init signal = init :- signal

Again, when Clash sees this primitive being used, it generates a hardcoded bit of HDL. Note that because register operates on a single domain, it does not need to be aware of any properties of the clock; the abstraction works the same for clocks running on 1 MHz or 100 MHz. The generated HDL does need clock lines to be routed though, so the primitive still needs the clock argument.

And just like that, we’ve introduced state. We can now implement a simple counter:

counter = register clock 0 (fmap (+1) counter)

-- Because `Signal` has a `Num` instance, we can also write:
-- counter = register clock 0 (counter + 1)

From an HDL compilation point of view, we can see how this might work. If Clash encounters the special primitive register, it will not try to translate the definition, but it will insert a predefined bit of HDL that implements the register.

From a simulation perspective it is less clear why this works, as we’re using the result of a function call as its own argument. In most other languages this would either be rejected at compile time or get stuck at runtime. Not so in Haskell, as it is a lazy language: definitions get “unfolded” as we go. The evaluation process roughly3 looks like:

counter 
  = register clock 0 (counter + 1)                     -- <=> inline register
  =                let tmp = 0       :- tmp + 1 in tmp -- <=> inline tmp
  = 0 :-           let tmp = (0 + 1) :- tmp + 1 in tmp -- <=> eval (+)
  = 0 :-           let tmp = 1       :- tmp + 1 in tmp -- <=> inline tmp
  = 0 :- 1 :-      let tmp = (1 + 1) :- tmp + 1 in tmp -- <=> eval (+)
  = 0 :- 1 :-      let tmp = 2       :- tmp + 1 in tmp -- <=> inline tmp
  = 0 :- 1 :- 2 :- let tmp = (2 + 1) :- tmp + 1 in tmp -- <=> eval (+)
  = ..
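The unfolding above can be run directly in GHC with a stripped-down signal type (no domain tag, no clock argument; regToy and counterToy are illustrative names, not Clash’s):

```haskell
infixr 5 :-
data Sig a = a :- Sig a

-- Toy version of `register`: prepend the initial value.
regToy :: a -> Sig a -> Sig a
regToy ini sig = ini :- sig

mapSig :: (a -> b) -> Sig a -> Sig b
mapSig f (a :- as) = f a :- mapSig f as

-- The feedback loop: `counterToy` is defined in terms of itself.
-- This only terminates because (:-) is lazy in its tail.
counterToy :: Sig Int
counterToy = regToy 0 (mapSig (+ 1) counterToy)

-- Observe the first n stable values.
sampleN :: Int -> Sig a -> [a]
sampleN n _ | n <= 0 = []
sampleN n (a :- as)  = a : sampleN (n - 1) as
```

Here `sampleN 5 counterToy` evaluates to `[0,1,2,3,4]`, exactly the sequence traced in the unfolding above.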

Signals on multiple domains

One of the questions /u/netj_nsh asked was:

Does Clash [..] support [..] synchronous design with clock domain crossing?

With the primitives introduced we cannot construct any clock domain crossing. In fact, the Haskell compiler will stop us before we even get a chance to run it:

>>> a = pure 3 :: Signal A Int
>>> b = pure 5 :: Signal B Int
>>> liftA2 (+) a b
*** Type error: could not deduce A ~ B
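The machinery behind this error is an ordinary phantom type parameter: dom never appears on the right-hand side of the data declaration, so it exists only to keep domains apart at the type level. A self-contained sketch (toy names, not Clash’s actual definitions):

```haskell
-- Hypothetical domain tags; they carry no values.
data A
data B

infixr 5 :-

-- `dom` is a phantom parameter: it only lives in the type.
data Sig dom a = a :- Sig dom a

-- Combining two signals forces them onto the same domain.
zipSigWith :: (a -> b -> c) -> Sig dom a -> Sig dom b -> Sig dom c
zipSigWith f (a :- as) (b :- bs) = f a b :- zipSigWith f as bs

-- A constant signal, polymorphic in its domain.
rep :: a -> Sig dom a
rep x = x :- rep x

sampleN :: Int -> Sig dom a -> [a]
sampleN n _ | n <= 0 = []
sampleN n (a :- as)  = a : sampleN (n - 1) as

-- `zipSigWith (+) (rep 3 :: Sig A Int) (rep 5 :: Sig B Int)` is rejected:
-- GHC cannot unify A with B. Same-domain combinations are fine:
same :: Sig A Int
same = zipSigWith (+) (rep 3) (rep 5)
```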

And this is great! It means you can never accidentally cross clock domains in Clash. Of course, sometimes you do want to cross clock domains, which can be done with yet another primitive: unsafeSynchronizer:

unsafeSynchronizer :: Clock dom1 -> Clock dom2 -> Signal dom1 a -> Signal dom2 a

This will reinterpret a signal on domain dom1 as if it were a signal on dom2. To do so, it tracks a quantity called relativeTime: the time difference between the next two edges of the two clocks, i.e., the clock of the consumed signal and the clock of the produced signal. Concretely, t = 0 points at the next edge of the consumed signal, and t = relativeTime points at the next edge of the produced signal. If relativeTime is less than zero, an element should now be produced. If it is greater than zero, an element should now be consumed. If relativeTime equals zero, both should happen. While the real definition in Clash is more involved, the computation boils down to this code:

-- `t1` and `t2` denote the clock period of `dom1` and `dom2`
go relativeTime (a :- as)
  | relativeTime < 0  = a :- go (relativeTime + t2)      (a :- as)
  | relativeTime == 0 = a :- go (relativeTime - t1 + t2) as
  | otherwise         =      go (relativeTime - t1)      as

Note that you almost never want to use unsafeSynchronizer directly. Though it synchronizes a Signal from one domain to another in simulation, it doesn’t do anything in the generated hardware. Clock domain crossing can be tricky and hardware designers — including Clash designers — should prefer vendor primitives such as Xilinx’s XPM where possible.
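To see the bookkeeping in action, the loop above can be made runnable with a toy signal type and hypothetical clock periods t1 and t2 passed as arguments (all names here are illustrative):

```haskell
infixr 5 :-
data Sig a = a :- Sig a

-- Reinterpret a signal with clock period t1 as one with clock period t2,
-- starting with a relative time of 0 (edges aligned).
sync :: Int -> Int -> Sig a -> Sig a
sync t1 t2 = go 0
 where
  go relativeTime (a :- as)
    | relativeTime < 0  = a :- go (relativeTime + t2)      (a :- as)
    | relativeTime == 0 = a :- go (relativeTime - t1 + t2) as
    | otherwise         =      go (relativeTime - t1)      as

-- A signal counting up every clock cycle, and a helper to observe it.
nats :: Sig Int
nats = go 0 where go n = n :- go (n + 1)

sampleN :: Int -> Sig a -> [a]
sampleN n _ | n <= 0 = []
sampleN n (a :- as)  = a : sampleN (n - 1) as
```

Sampling `sync 10 20 nats` (a period-10 domain consumed into a period-20 one) gives `[0,2,4,6,…]`: every other element is dropped, as expected when moving to a domain running at half the speed.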

Below you’ll find an interactive simulator that steps through a synchronization:


If you’ve played around with the simulator a little bit you might have seen something strange. It can produce diagrams like these:

In this diagram, the produced signal B seems to time travel: it takes on a value of 2 before the sampled signal does. While you would never see an event-based simulator produce diagrams like these, it makes sense for Signal: elements of signals model the stable value at each clock tick. A more intuitive way of drawing this diagram would be:

Though Signal can never represent this: it must have exactly one value representing a clock cycle.

Signals: a leaky abstraction

In the presence of multiple domains, signals become a leaky abstraction. For example, synchronizing a fast domain to a slower one loses information, and resynchronizing to the fast domain does not necessarily give you back the same signal:

In the experience of yours truly, this is not a problem in practice: synchronization with unsafeSynchronizer is almost always backed by a memory element, sidestepping the problem entirely.

Recap

To recap, Signal works for HDL translation because its implementation is hidden from the public API. Still, to make the data type useful, primitives are offered:

  • pure: construct a signal
  • <$> or fmapSignal: apply a pure function/combinational circuit to a signal
  • <*> or appSignal: merge multiple signals on the same domain
  • register: introduce state

For simulation it works because:

  • Lazy evaluation means we can create feedback loops without compilation or runtime errors.
  • We can “synchronize” signals on multiple domains by calculating their relative clock offsets and constructing a new signal.

Footnotes

1This isn’t entirely true: Haskell provides “escape hatches” such as unsafePerformIO that allow you to break purity.

2These three operations effectively implement the Applicative and Functor typeclasses.

3The “unfolding” shown roughly follows Haskell’s evaluation if the signal would be forced.

About Martijn Bastiaan
Martijn Bastiaan has been a software engineer at QBayLogic since 2017. He is an active contributor to the Clash compiler itself.