Planet Haskell

January 17, 2025

Well-Typed.Com

Tracing foreign function invocations

When profiling Haskell programs, time spent in foreign functions (functions defined in C) does not show up on normal time profiles, which can be problematic when debugging or optimizing performance of code that makes heavy use of the foreign function interface (FFI). In this blog post we present a new compiler plugin called trace-foreign-calls, which makes this time visible and available for analysis.

The trace-foreign-calls plugin as well as a simple analysis tool ghc-events-util are both available on GitHub.

Overview

Consider a C function

long slow_add(long a, long b) {
  while(b--) {
    a++;
  }
  return a;
}

with corresponding Haskell import

foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO :: CLong -> CLong -> IO CLong

and an application that invokes it, via a thin wrapper slowAddIO around c_slowAddIO (the wrapper shows up as a cost centre in the profiles below)

main :: IO ()
main = do
    print =<< slowAddIO a b
    print =<< slowAddIO b a
  where
    a = 1_000_000_000
    b = 2_000_000_000

When we compile the application with the trace-foreign-calls plugin enabled, run it, and then look at the generated .eventlog using ghc-events-util, we will see:

 607.16ms   607.16ms       1922573  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.26ms     0.00ms     609077635  cap 1  trace-foreign-calls: return c_slowAddIO

 302.02ms   302.02ms     609336093  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.01ms     0.00ms     911353269  cap 1  trace-foreign-calls: return c_slowAddIO

The important column here is the first one, which for each event reports the time from that event to the next; in this case, from the call of the foreign function to its return. Perhaps ghc-events-util could be given a mode designed specifically for trace-foreign-calls to make this output a bit more readable, but for now this general purpose output suffices.1

If we additionally compile with profiling enabled, we get an additional event for each foreign call, recording the cost-centre callstack:

   0.00ms     0.00ms       4643217  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 856.37ms   856.37ms       4643327  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.65ms     0.00ms     861018010  cap 1  trace-foreign-calls: return c_slowAddIO

   0.00ms     0.00ms     861672464  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 426.79ms   426.79ms     861672834  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.06ms     0.00ms    1288461103  cap 1  trace-foreign-calls: return c_slowAddIO

Note that we are abusing the “heap profile sample” event to record the cost-centre callstack to the foreign function (see “Conclusions and future work”, below).2

Dependencies

Suppose in example-pkg-A we have

foreign import capi "cbits.h xkcdRandomNumber"
  someFunInA :: IO CInt

and we use this function in example-pkg-B

main :: IO ()
main = do
    randomNumber <- someFunInA
    let bs = compress (BS.Char8.pack $ show randomNumber)
    print $ BS.Word8.unpack bs

where compress is from zlib. Although we are running main from example-pkg-B, in order to get information about someFunInA we need to enable the plugin when compiling example-pkg-A; the README.md describes how to enable the plugin for all dependencies. Indeed, when we do this, we see calls to libz as well:

   0.00ms     0.00ms        414047  cap 0  trace-foreign-calls: call someFunInA (capi safe "cbits.h xkcdRandomNumber")
   0.00ms     0.00ms        414607  cap 0  trace-foreign-calls: return someFunInA

(..)

   0.00ms     0.00ms        493076  cap 0  trace-foreign-calls: call c_zlibVersion (capi unsafe "zlib.h zlibVersion")
   0.00ms     0.00ms        493866  cap 0  trace-foreign-calls: return c_zlibVersion

Indeed, if we are willing to do a custom build of ghc, we can even enable the plugin on the boot libraries, which (amongst other things) makes the final print also visible:

   0.00ms     0.00ms        609576  cap 0  trace-foreign-calls: call unsafe_fdReady (ccall unsafe "fdReady")
   0.00ms     0.00ms        611846  cap 0  trace-foreign-calls: return unsafe_fdReady
   0.01ms     0.00ms        612286  cap 0  trace-foreign-calls: call c_write (capi unsafe "HsBase.h write")
   0.23ms     0.00ms        618236  cap 0  trace-foreign-calls: return c_write

Conclusions and future work

The trace-foreign-calls compiler plugin can be used to generate eventlog events for foreign function invocations, so that the time spent in foreign functions becomes visible; the ghc-events-util tool can be used to inspect these eventlogs.

The plugin works by renaming each foreign function import of foo to foo_uninstrumented, and then introducing a new wrapper function foo which emits some custom events to the eventlog before and after calling foo_uninstrumented. Since we want the plugin to work even on the GHC boot libraries, the wrapper tries to use only functionality from GHC.Prim, which limits what we can do. One consequence is that because the plugin reuses “heap profile sample” events to record the cost centre stacks for foreign functions, it is not currently possible to record both regular heap profile samples (that is, run the code with one of the +RTS -h flags) and enable the plugin at the same time.
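
To make this concrete, here is a hand-written sketch of what the generated code amounts to for the c_slowAddIO import from the overview. This is only an illustration: the real plugin emits its events via GHC.Prim primops rather than Debug.Trace, and the _uninstrumented name is ours, not necessarily what the plugin generates.

{-# LANGUAGE CApiFFI #-}

import Debug.Trace (traceEventIO)
import Foreign.C.Types (CLong)

-- The original import, renamed.
foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO_uninstrumented :: CLong -> CLong -> IO CLong

-- The wrapper that takes over the original name and brackets the
-- foreign call with custom eventlog events.
c_slowAddIO :: CLong -> CLong -> IO CLong
c_slowAddIO a b = do
  traceEventIO "trace-foreign-calls: call c_slowAddIO (capi unsafe \"test_cbits.h slow_add\")"
  r <- c_slowAddIO_uninstrumented a b
  traceEventIO "trace-foreign-calls: return c_slowAddIO"
  return r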

A better solution would be to add support for profiling foreign functions to ghc itself. This would involve creating new eventlog event types, and then upgrading existing time profiling tools to interpret these new events. Until then, however, time profiling of foreign function invocations is now at least possible.


  1. The first three columns are the time from each event to the next visible event (some events might be filtered out), the time from each event to the next actual event, and the time of the event since the start of the program.↩︎

  2. The sample field will always be 0; the residency field is used to record the capability. The latter allows us to correlate events of concurrent foreign function invocations; the --res-is-cap command line option to ghc-events-util makes it understand this convention.↩︎

by edsko, zubin, matthew at January 17, 2025 12:00 AM

January 16, 2025

Michael Snoyman

The Paradox of Necessary Force

Humans want the resources of other humans. I want the food that the supermarket owns so that I can eat it. Before buying it, I wanted the house that I now own. And before that, someone wanted to build a house on that plot of land, which was owned by someone else first. Most of the activities we engage in during our lifetime revolve around extracting something from someone else.

There are two basic modalities to getting the resources of someone else. The first, the simplest, and the one that has dominated the majority of human history, is force. Conquer people, kill them, beat them up and take their stuff, force them into slavery and make them do your work. It’s a somewhat effective strategy. This can also be more subtle, by using coercive and fraudulent methods to trick people into giving you their resources. Let’s call this modality the looter approach.

The second is trade. In the world of trade, I can only extract resources from someone else when they willingly give them to me in exchange for something else of value. This can be barter of value for value, payment in money, built-up goodwill, favors, charity (exchanging resources for the benefit you receive for helping someone else), and more. In order to participate in this modality, you need to create your own valuable resources that other people want to trade for. Let’s call this the producer approach.

The producer approach is better for society in every conceivable way. The looter approach causes unnecessary destruction, pushes production into ventures that don’t directly help anyone (like making more weapons), and rewards people for their ability to inflict harm. By contrast, the producer approach rewards the ability to meet the needs of others and causes resources to end up in the hands of those who value them the most.

Looter philosophy is rooted in the concept of the zero sum game, the mistaken belief that I can only have more if someone else has less. By contrast, the producer philosophy correctly identifies the fact that we can all end up better by producing more goods in more efficient ways. We live in our modern world of relatively widespread luxury because producers have made technological leaps—for their own self-serving motives—that have improved everyone’s ability to produce more goods going forward. Think of the steam engine, electricity, computing power, and more.

A producer-only world

It would be wonderful to live in a world in which there are no looters. We all produce, we all trade, everyone receives more value than they give, and there is no wasted energy or destruction from the use of force.

Think about how wonderful it could be! We wouldn’t need militaries, allowing a massive amount of productive capacity to be channeled into things that make everyone’s lives better. We wouldn’t need police. Not only would that free up more resources, but would remove the threat of improper use of force by the state against citizens. The list goes on and on.

I believe many economists—especially Austrian economists—are cheering for that world. I agree with them on the cheering. It’s why things like Donald Trump’s plans for tariffs are so horrific in their eyes. Tariffs introduce an artificial barrier between nations, impeding trade, preventing the peaceful transfer of resources, and leading to a greater likelihood of armed conflict.

There’s only one problem with this vision, and it’s also based in economics: game theory.

Game theory and looters

Imagine I’m a farmer. I’m a great farmer, I have a large plot of land, I run my operations efficiently, and I produce huge amounts of food. I sell that food into the marketplace, and with that money I’m able to afford great resources from other people, who willingly trade them to me because they value the money more than their own resources. For example, how many T-shirts does the clothing manufacturer need? Instead of his 1,000th T-shirt, he’d rather sell it for $5 and buy some food.

While I’m really great as a farmer, I’m not very good as a fighter. I have no weapons training, I keep no weapons on my property, and I dislike violence.

And finally, there’s a strong, skilled, unethical person down the street. He could get a job with me on the farm. For back-breaking work 8 hours a day, I’ll pay him 5% of my harvest. Or, by contrast, he could act like the mafia, demand a “protection fee” of 20%, and either beat me up, beat up my family, or cause harm to my property, if I don’t pay it.

In other words, he could be a producer and get 5% in exchange for hard work, or be a looter and get 20% in exchange for easy (and, likely for him, fun) work. As described, the game theoretic choice is clear.

So how do we stop a producer world from devolving back into a looter world?

Deterrence

There’s only one mechanism I’m aware of for this, and it’s deterrence. As the farmer, I made a mistake. I should get weapons training. I should keep weapons on my farm. I should be ready to defend myself and my property. Because if I don’t, game theory ultimately predicts that all trade will collapse, and society as we know it will crumble.

I don’t necessarily have to have the power of deterrence myself. I could hire a private security company, once again allowing the producer world to work out well. I trade something of lesser value (some money) for something I value more (the protection afforded by private security). If I’m lucky, that security company will never need to do anything, because the mere threat of their presence is sufficient.

And in modern society, we generally hope to rely on the government police force to provide this protection.

There are easy ways to defeat the ability of deterrence to protect our way of life. The simplest is to defang it. Decriminalize violent and destructive acts, for example. Remove the consequences for bad, looter behavior, and you will incentivize looting. This is far from a theoretical discussion. We’ve seen the clear outcome in California, which has decriminalized theft under $950, resulting—in a completely predictable way—in more theft, stores closing, and an overall erosion of producer philosophy.

And in California, this is even worse. Those who try to be their own deterrence, by arming themselves and protecting their rights, are often the targets of government force instead of the looters.

I’m guessing this phrasing has now split my reading audience into three groups. Group A agrees wholly with what I’m saying. Group B believes what I’ve just written is pure evil and garbage. Group C initially disagreed with my statements, but has an open mind and is willing to consider a different paradigm. The next section is targeted at groups A and C. Group B: good luck with the broken world you’re advocating.

Global scale

This concept of deterrence applies at a global scale too. I would love to live in a world where all nations exchange value for value and never use force against others. In fact, I believe the ultimate vision for this kind of a world ends with anarcho-capitalism (though I don’t know enough about the topic to be certain). There ends up being no need for any force against anyone else. It’s a beautiful vision for a unified world, where there are no borders, there is no destruction, there is only unity through trade. I love it.

But game theory destroys this too. If the entire world disarmed, it would take just one person who thinks he can do better through looter tactics to destroy the system. The only way to defeat that is to have a realistic threat of force to disincentivize someone from acting like a looter.

And this is the paradox. In order to live in our wonderful world of production, prosperity, health, and happiness, we always need to have our finger near enough to the trigger to respond to looters with force. I know of no other approach that allows production to happen. (And I am very interested in other theoretical solutions to this problem, if anyone wants to share reading material.)

Peace through strength

This line of thinking leads to the concept of peace through strength. When those tempted to use violence see the overwhelming strength of their potential victims, they will be disincentivized to engage in violent behavior. It’s the story of the guy who wants to rob my farm. Or the roaming army in the ancient world that bypassed the well fortified walled city and attacked its unprotected neighbor.

There are critics of this philosophy. As put by Andrew Bacevich, "'Peace through strength' easily enough becomes 'peace through war.'" I don’t disagree at all with that analysis, and it’s something we must remain vigilant against. But disarming is not the answer, as it will, of course, necessarily lead to the victory of those willing to use violence on others.

In other words, my thesis here is that the threat of violence must be present to keep society civilized. But the cost of using that violence must be high enough that neither side is incentivized to initiate it.

Israel

I’d been thinking of writing a blog post on this topic for a few months now, but finally decided to today. Israel just agreed to a hostage deal with Hamas. In exchange for the release of 33 hostages taken in the October 7 massacre, Israel will hand over 1,000 terrorists in Israeli prisons.

I have all the sympathy in the world for the hostages and their families. I also have great sympathy for the Palestinian civilians who have been harmed, killed, displaced, and worse by this war. And I have empathy (as one of the victims) for all of the Israeli citizens who have lived under threat of rocket attacks, had our lives disrupted, and for those who have been killed by this war. War is hell, full stop.

My message here is to those who have been pushing the lie of “peace through negotiations.” Or peace through capitulation. Or anything else. These tactics are the reason the war has continued. As long as the incentive structure makes initiating a war a positive, wars will continue to be initiated. Hamas has made its stance on the matter clear: it has sworn for the eradication of all Jews within the region, and considers civilian casualties on the Palestinian side not only acceptable, but advantageous.

Gaza Chief's Brutal Calculation: Civilian Bloodshed Will Help Hamas

I know that many people who criticize Israel and put pressure on us to stop the war in Gaza believe they are doing so for noble reasons. (For the record, I also believe many people have less altruistic reasons for their stance.) I know people like to point to the list of atrocities they believe Israel has committed. And, by contrast, the pro-Israel side is happy to respond with corresponding atrocities from the other side.

I honestly believe this is all far beyond irrelevant. The only question people should be asking is: how do we disincentivize the continuation of hostilities? And hostage deals that result in the release of terrorists, allow “aid” to come in (which, if history is any indication, will be used to further the construction of tunnels and other sources for attack on Israel), and give Hamas an opportunity to rearm, only incentivize the continuation of the war.

In other words, if you care about the innocent people on either side, you should be opposed to this kind of capitulation. Whatever you think about the morality of each side, more people will suffer with this approach.

Skin in the game

It’s easy to say things like that when your life isn’t on the line. I also don’t think that matters much. Either the philosophical, political, and economic analysis is correct, or it isn’t. Nonetheless, I do have skin in the game here. I still live in a warzone. I am less than 15 kilometers from the Lebanese border. We’ve had Hezbollah tunnels reaching into our surrounding cities. My family had to lock ourselves inside when Hezbollah paratroopers attempted to land in our city.

My wife (Miriam) and I have discussed this situation at length, many times, over the course of this war. If I’m ever taken hostage, I hope the Israeli government bombs the hell out of wherever I am being held. I say this not only because I believe it is the right, just, moral, ethical, and strategically correct thing to do. I say this because I am selfish:

  • I would rather die than be tortured by our enemies.
  • I would rather die than be leveraged to make my family and country less safe.
  • I would rather die than live the rest of my life a shell of my former self, haunted not only by the likely torture inflicted on me, but by the guilt of the harm to others resulting from my spared life.

I don’t know why this hostage deal went through now. I don’t know what pressures have been brought to bear on the leaders in Israel. I don’t know if they are good people trying to protect their citizens, nefarious power hungry cretins looking to abuse both the Israeli and Palestinian populace to stay in control, weak-willed toadies who do what they’re told by others, or simply stupid. But my own stance is clear.

But what about the Palestinians?

I said it above, and I’ll say it again: I truly do feel horrible for the trauma that the Palestinian people are going through. Not for the active terrorists mind you, I feel no qualms about those raising arms against us being destroyed. But everyone else, even those who wish me and my fellow Israelis harm. (And, if polling is to be believed, that’s the majority of Palestinians.) I would much rather that they not be suffering now, and that eventually through earned trust on both sides, everyone’s lots are improved.

But the framework being imposed by those who “love” peace isn’t allowing that to happen. Trust cannot be built when there’s a greater incentive to return to the use of force. I was strongly opposed to the 2005 disengagement from Gaza. But once it happened, it could have been one of those trust-building starting points. Instead, I saw many people justify further violence by Hamas—such as non-stop rocket attacks on the south of Israel—because Israel hadn’t done enough yet.

Notice how fundamentally flawed this mentality is, just from an incentives standpoint! Israel gives up control of land, something against its own overall interests and something desired by Palestinians, and is punished for it with increased violence against citizens. Hamas engaged in a brutal destruction of all of its opponents within the Palestinian population, launched attacks on Israel, and when Israel did respond with force, Israel was blamed for having not done enough to appease Hamas.

I know people will want to complicate this story by bringing up the laundry list of past atrocities, of assigning negative motivations to Israel and its leaders, and a million other evasions that are used to avoid actually solving this conflict. Instead, I beg everyone to just use basic logic.

The violence will continue as long as the violence gets results.

January 16, 2025 12:00 AM

January 15, 2025

Well-Typed.Com

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

Today, 2025-01-15, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 38th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

CUDA is an extension of C for programming NVIDIA GPUs. In this episode of the Haskell Unfolder we show how to set up a CUDA library so that we can link to it from a Haskell application, how we can call CUDA functions from Haskell, and how we can use QuickCheck to find subtle bugs in our CUDA code. On the CUDA side, we show how to implement simple concurrent versions of map, fold and scan. No familiarity with CUDA will be assumed, but of course we will only be able to give a taste of CUDA programming.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at January 15, 2025 12:00 AM

January 13, 2025

Michael Snoyman

Incentives Determine Outcomes

My blog posts and reading material have both been on a decidedly economics-heavy slant recently. The topic today, incentives, squarely falls into the category of economics. However, when I say economics, I’m not talking about “analyzing supply and demand curves.” I’m talking about the true basis of economics: understanding how human beings make decisions in a world of scarcity.

A fair definition of incentive is “a reward or punishment that motivates behavior to achieve a desired outcome.” When most people think about economic incentives, they’re thinking of money. If I offer my son $5 if he washes the dishes, I’m incentivizing certain behavior. We can’t guarantee that he’ll do what I want him to do, but we can agree that the incentive structure itself will guide and ultimately determine what outcome will occur.

The great thing about monetary incentives is how easy they are to talk about and compare. “Would I rather make $5 washing the dishes or $10 cleaning the gutters?” But much of the world is incentivized in non-monetary ways too. For example, using the “punishment” half of the definition above, I might threaten my son with losing Nintendo Switch access if he doesn’t wash the dishes. No money is involved, but I’m still incentivizing behavior.

And there are plenty of incentives beyond our direct control! My son is also incentivized to not wash dishes because it’s boring, or because he has some friends over that he wants to hang out with, or dozens of other things. Ultimately, the conflicting array of incentive structures placed on him will determine what actions he chooses to take.

Why incentives matter

A phrase I see often in discussions—whether they are political, parenting, economic, or business—is “if they could just do…” Each time I see that phrase, I cringe a bit internally. Usually, the underlying assumption of the statement is “if people would behave contrary to their incentivized behavior then things would be better.” For example:

  • If my kids would just go to bed when I tell them, they wouldn’t be so cranky in the morning.
  • If people would just use the recycling bin, we wouldn’t have such a landfill problem.
  • If people would just stop being lazy, our team would deliver our project on time.

In all these cases, the speakers are seemingly flummoxed as to why the people in question don’t behave more rationally. The problem is: each group is behaving perfectly rationally.

  • The kids have a high time preference, and care more about the joy of staying up now than the crankiness in the morning. Plus, they don’t really suffer the consequences of morning crankiness, their parents do.
  • No individual suffers much from their individual contribution to a landfill. If they stopped growing the size of the landfill, it would make an insignificant difference versus the amount of effort they need to engage in to properly recycle.
  • If a team doesn’t properly account for the productivity of individuals on a project, each individual receives less harm from their own inaction. Sure, the project may be delayed, company revenue may be down, and they may even risk losing their job when the company goes out of business. But their laziness individually won’t determine the entirety of that outcome. By contrast, they greatly benefit from being lazy by getting to relax at work, go on social media, read a book, or do whatever else they do when they’re supposed to be working.

Free Candy!

My point here is that, as long as you ignore the reality of how incentives drive human behavior, you’ll fail at getting the outcomes you want.

If everything I wrote up until now made perfect sense, you understand the premise of this blog post. The rest of it will focus on a bunch of real-world examples to hammer home the point, and demonstrate how versatile this mental model is.

Running a company

Let’s say I run my own company, with myself as the only employee. My personal revenue will be 100% determined by my own actions. If I decide to take Tuesday afternoon off and go fishing, I’ve chosen to lose that afternoon’s revenue. Implicitly, I’ve decided that the enjoyment I get from an afternoon of fishing is greater than the potential revenue. You may think I’m being lazy, but it’s my decision to make. In this situation, the incentive–money–is perfectly aligned with my actions.

Compare this to a typical company/employee relationship. I might have a bank of Paid Time Off (PTO) days, in which case once again my incentives are relatively aligned. I know that I can take off 15 days throughout the year, and I’ve chosen to use half a day for the fishing trip. All is still good.

What about unlimited time off? Suddenly incentives are starting to misalign. I don’t directly pay a price for not showing up to work on Tuesday. Or Wednesday as well, for that matter. I might ultimately be fired for not doing my job, but that will take longer to work its way through the system than simply not making any money for the day taken off.

Compensation overall falls into this misaligned incentive structure. Let’s forget about taking time off. Instead, I work full time on a software project I’m assigned. But instead of using the normal toolchain we’re all used to at work, I play around with a new programming language. I get the fun and joy of playing with new technology, and potentially get to pad my resume a bit when I’m ready to look for a new job. But my current company gets slower results, less productivity, and is forced to subsidize my extracurricular learning.

When a CEO has a bonus structure based on profitability, he’ll do everything he can to make the company profitable. This might include things that actually benefit the company, like improving product quality, reducing internal red tape, or finding cheaper vendors. But it might also include destructive practices, like slashing the R&D budget to show massive profits this year, in exchange for a catastrophe next year when the next version of the product fails to ship.

Golden Parachute CEO

Or my favorite example. My parents owned a business when I was growing up. They had a back office where they ran operations like accounting. All of the furniture was old couches from our house. After all, any money they spent on furniture came right out of their paychecks! But in a large corporate environment, each department is generally given a budget for office furniture, a budget which doesn’t roll over year-to-year. The result? Executives make sure to spend the entire budget each year, often buying furniture far more expensive than they would choose if it was their own money.

There are plenty of details you can quibble with above. It’s in a company’s best interest to give people downtime so that they can come back recharged. Having good ergonomic furniture can in fact increase productivity in excess of the money spent on it. But overall, the picture is pretty clear: in large corporate structures, you’re guaranteed to have mismatches between the company’s goals and the incentive structure placed on individuals.

Using our model from above, we can lament how lazy, greedy, and unethical the employees are for doing what they’re incentivized to do instead of what’s right. But that’s simply ignoring the reality of human nature.

Moral hazard

Moral hazard is a situation where one party is incentivized to take on more risk because another party will bear the consequences. Suppose I tell my son when he turns 21 (or whatever legal gambling age is) that I’ll cover all his losses for a day at the casino, but he gets to keep all the winnings.

What do you think he’s going to do? The most logical course of action is to place the largest possible bets for as long as possible, asking me to cover each time he loses, and taking money off the table and into his bank account each time he wins.

Heads I win, tails you lose

But let’s look at a slightly more nuanced example. I go to a bathroom in the mall. As I’m leaving, I wash my hands. It will take me an extra 1 second to turn off the water when I’m done washing. That’s a trivial price to pay. If I don’t turn off the water, the mall will have to pay for many liters of wasted water, benefiting no one. But I won’t suffer any consequences at all.

This is also a moral hazard, but most people will still turn off the water. Why? Usually due to some combination of other reasons such as:

  1. We’re so habituated to turning off the water that we don’t even consider not turning it off. Put differently, the mental effort needed to not turn off the water is more expensive than the 1 second of time to turn it off.
  2. Many of us have been brought up with a deep guilt about wasting resources like water. We have an internal incentive structure that makes the 1 second to turn off the water much less costly than the mental anguish of the waste we created.
  3. We’re afraid we’ll be caught by someone else and face some kind of social repercussions. (Or maybe more than social. Are you sure there isn’t a law against leaving the water tap on?)

Even with all that in place, you may notice that many public bathrooms use automatic water dispensers. Sure, there’s a sanitation reason for that, but it’s also to avoid this moral hazard.

A common denominator in both of these is that the person taking the action that causes the liability (either the gambling or leaving the water on) is not the person who bears the responsibility for that liability (the father or the mall owner). Generally speaking, the closer together the person making the decision and the person incurring the liability are, the smaller the moral hazard.

It’s easy to demonstrate that by extending the casino example a bit. I said it was the father who was covering the losses of the gambler. Many children (though not all) would want to avoid totally bankrupting their parents, or at least financially hurting them. Instead, imagine that someone from the IRS shows up at your door, hands you a credit card, and tells you you can use it at a casino all day, taking home all the chips you want. The money is coming from the government. How many people would put any restriction on how much they spend?

And since we’re talking about the government already…

Government moral hazards

As I was preparing to write this blog post, the California wildfires hit. The discussions around those wildfires gave a huge number of examples of moral hazards. I decided to cherry-pick a few for this post.

The first and most obvious one: California is asking for disaster relief funds from the federal government. That sounds wonderful. These fires were a natural disaster, so why shouldn’t the federal government pitch in and help take care of people?

The problem is, once again, a moral hazard. In the case of the wildfires, California and Los Angeles both had ample actions they could have taken to mitigate the destruction of this fire: better forest management, larger fire department, keeping the water reservoirs filled, and probably much more that hasn’t come to light yet.

If the federal government bails out California, it will be a clear message for the future: your mistakes will be fixed by others. You know what kind of behavior that incentivizes? More risky behavior! Why spend state funds on forest management and extra firefighters—activities that don’t win politicians a lot of votes in general—when you could instead spend it on a football stadium, higher unemployment payments, or anything else, and then let the feds cover the cost of screw-ups.

You may notice that this is virtually identical to the 2008 “too big to fail” bail-outs. Wall Street took insanely risky behavior, reaped huge profits for years, and when they eventually got caught with their pants down, the rest of us bailed them out. “Privatizing profits, socializing losses.”

Too big to fail

And here’s the absolute best part of this: I can’t even truly blame either California or Wall Street. (I mean, I do blame them, I think their behavior is reprehensible, but you’ll see what I mean.) In a world where the rules of the game implicitly include the bail-out mentality, you would be harming your citizens/shareholders/investors if you didn’t engage in that risky behavior. Since everyone is on the hook for those socialized losses, your best bet is to maximize those privatized profits.

There’s a lot more to government and moral hazard, but I think these two cases demonstrate the crux pretty solidly. But let’s leave moral hazard behind for a bit and get to general incentivization discussions.

Non-monetary competition

At least 50% of the economics knowledge I have comes from the very first econ course I took in college. That professor was amazing, and had some very colorful stories. I can’t vouch for the veracity of the two I’m about to share, but they definitely drive the point home.

In the 1970s, the US had an oil shortage. To “fix” this problem, they instituted price caps on gasoline, which of course resulted in insufficient gasoline. To “fix” this problem, they instituted policies where, depending on your license plate number, you could only fill up gas on certain days of the week. (Irrelevant detail for our point here, but this just resulted in people filling up their tanks more often, no reduction in gas usage.)

Anyway, my professor’s wife had a friend. My professor described in great detail how attractive this woman was. I’ll skip those details here since this is a PG-rated blog. In any event, she never had any trouble filling up her gas tank any day of the week. She would drive up, be told she couldn’t fill up gas today, bat her eyes at the attendant, explain how helpless she was, and was always allowed to fill up gas.

This is a demonstration of non-monetary compensation. Most of the time in a free market, capitalist economy, people are compensated through money. When price caps come into play, there’s a limit to how much monetary compensation someone can receive. And in that case, people find other ways of competing. Like this woman’s case: through using flirtatious behavior to compensate the gas station workers to let her cheat the rules.

The other example was much more insidious. Santa Monica had a problem: it was predominantly wealthy and white. They wanted to fix this problem, and decided to put in place rent controls. After some time, they discovered that Santa Monica had become wealthier and whiter, the exact opposite of their desired outcome. Why would that happen?

Someone investigated, and ended up interviewing a landlady who demonstrated the reason. She was an older white woman, and admittedly racist. Prior to the rent controls, she would list her apartments in the newspaper, and would be legally obligated to rent to anyone who could afford it. Once rent controls were in place, she took a different tack. She knew that she would only get a certain amount for the apartment, and that the demand for apartments was higher than the supply. That meant she could be picky.

She ended up finding tenants through friends-of-friends. Since it wasn’t an official advertisement, she wasn’t legally required to rent it out if someone could afford to pay. Instead, she got to interview people individually and then make them an offer. Normally, that would have resulted in receiving a lower rental price, but not under rent controls.

So who did she choose? A young, unmarried, wealthy, white woman. It made perfect sense. Women were less intimidating and more likely to maintain the apartment better. Wealthy people, she determined, would be better tenants. (I have no idea if this is true in practice or not, I’m not a landlord myself.) Unmarried, because no kids running around meant less damage to the property. And, of course, white. Because she was racist, and her incentive structure made her prefer whites.

You can deride her for being racist, I won’t disagree with you. But it’s simply the reality. Under the non-rent-control scenario, her profit motive for money outweighed her racism motive. But under rent control, the monetary competition was removed, and she was free to play into her racist tendencies without facing any negative consequences.

Bureaucracy

These were the two examples I remember for that course. But non-monetary compensation pops up in many more places. One highly pertinent example is bureaucracies. Imagine you have a government office, or a large corporation’s acquisition department, or the team that apportions grants at a university. In all these cases, you have a group of people making decisions about handing out money that has no monetary impact on them. If they give to the best qualified recipients, they receive no raises. If they spend the money recklessly on frivolous projects, they face no consequences.

Under such an incentivization scheme, there’s little to encourage the bureaucrats to make intelligent funding decisions. Instead, they’ll be incentivized to spend the money where they recognize non-monetary benefits. This is why it’s so common to hear about expensive meals, gift bags at conferences, and even more inappropriate ways of trying to curry favor with those that hold the purse strings.

Compare that ever so briefly with the purchases made by a small mom-and-pop store like my parents owned. Could my dad take a bribe to buy from a vendor who’s ripping him off? Absolutely he could! But he’d lose more on the deal than he’d make on the bribe, since he’s directly incentivized by the deal itself. It would make much more sense for him to go with the better vendor, save $5,000 on the deal, and then treat himself to a lavish $400 meal to celebrate.

Government incentivized behavior

This post is getting longer than I’d intended, so I’ll finish off with this section and make it a bit briefer. Beyond all the methods mentioned above, government has another mechanism for modifying behavior: directly changing incentives via legislation, regulation, and monetary policy. Let’s see some examples:

  • Artificial modification of interest rates encourages people to take on more debt than they would in a free capital market, leading to malinvestment and a consumer debt crisis, and causing the boom-bust cycle we all painfully experience.
  • Going along with that, giving tax breaks on interest payments further artificially incentivizes people to take on debt that they wouldn’t otherwise.
  • During COVID-19, at some points unemployment benefits were greater than minimum wage, incentivizing people to stay home and not work rather than get a job, leading to reduced overall productivity in the economy and more printed dollars for benefits. In other words, it was a perfect recipe for inflation.
  • The tax code gives deductions to “help” people. That might be true, but the real impact is incentivizing people to make decisions they wouldn’t have otherwise. For example, giving out tax deductions on children encourages having more kids. Tax deductions on childcare and preschools incentivizes dual-income households. Whether or not you like the outcomes, it’s clear that it’s government that’s encouraging these outcomes to happen.
  • Tax incentives cause people to engage in behavior they wouldn’t otherwise (daycare+working mother, for example).
  • Inflation means that the value of your money goes down over time, which encourages people to spend more today, when their money has a larger impact. (Milton Friedman described this as high living.)

Conclusion

The idea here is simple, and fully encapsulated in the title: incentives determine outcomes. If you want to know how to get a certain outcome from others, incentivize them to want that to happen. If you want to understand why people act in seemingly irrational ways, check their incentives. If you’re confused why leaders (and especially politicians) seem to engage in destructive behavior, check their incentives.

We can bemoan these realities all we want, but they are realities. While there are some people who have a solid internal moral and ethical code, and that internal code incentivizes them to behave against their externally-incentivized interests, those people are rare. And frankly, those people are self-defeating. People should take advantage of the incentives around them. Because if they don’t, someone else will.

(If you want a literary example of that last comment, see the horse in Animal Farm.)

How do we improve the world under these conditions? Make sure the incentives align well with the overall goals of society. To me, it’s a simple formula:

  • Focus on free trade, value for value, as the basis of a society. In that system, people are always incentivized to provide value to other people.
  • Reduce the size of bureaucracies and large groups of all kinds. The larger an organization becomes, the farther the consequences of decisions are from those who make them.
  • And since the nature of human beings will be to try and create areas where they can control the incentive systems to their own benefits, make that as difficult as possible. That comes in the form of strict limits on government power, for example.

And even if you don’t want to buy in to this conclusion, I hope the rest of the content was educational, and maybe a bit entertaining!

January 13, 2025 12:00 AM

January 12, 2025

Sandy Maguire

Read the Code, Not the Profile

At work a few weeks back, I found myself digging into profile reports, trying to determine why our program was running so slowly. Despite having the extremely obvious-in-retrospect data in front of me, I wasted a lot of time speeding up code that turned out to not move the needle at all.

Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.

Some Context

I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.

But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the realm of constant factors, and it was clear that something was asymptotically wrong.

So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file lying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.

Without further ado, here’s what our profile looked like:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%

Well, that’s not very helpful. Of course MAIN takes 100% of the time. So I expanded that, and saw:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%

No clearer. Opening up main:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%

Sheesh.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%

OH MY GOD. JUST TELL ME SOMETHING ALREADY.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%

Fast forwarding for quite a while, I opened up the entire stack until I got to something that didn’t take 100% of the program’s runtime:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%
        └ makeTest . . . . . . . . . . . . . . . . . 100%
          └ makeTest.\ . . . . . . . . . . . . . . . 100%
            └ compileProgram . . . . . . . . . . . . 100%
              └ evalAppT . . . . . . . . . . . . . . 100%
                └ runAppT  . . . . . . . . . . . . . 100%
                  └ runAppT' . . . . . . . . . . . . 100%
                    └ withLogging  . . . . . . . . . 100%
                      └ transformSSA . . . . . . . . 100%
                        └ >>=  . . . . . . . . . . . 100%
                          └ >>>= . . . . . . . . . . 100%
                            └ ibind  . . . . . . . . 100%
                              └ ibind.\  . . . . . . 100%
                                └ ibind.\.\  . . . . 100%
                                  ├ toSSA  . . . . . 15%
                                  ├ transform1 . . . 15%
                                  ├ transform2 . . . 10%
                                  ├ transform3 . . . 10%
                                  ├ transform4 . . . 20%
                                  └ collectGarbage . 30%

Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.

A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.

Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.

Retrospective

So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.

Unwinding the story stack, the most salient sign that I had not actually solved the problem was that I had reduced the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.

That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.

It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of \(10^{11}\) instructions, while 9s is \(10^{10}\). How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.

To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
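
Spelling that estimate out (assuming a program on the order of 100 instructions and a single core at roughly 3 GHz):

\[
100 \text{ instructions} \times 10^{6} \,\tfrac{\text{cycles}}{\text{instruction}} = 10^{8} \text{ cycles},
\qquad
\frac{10^{8} \text{ cycles}}{3 \times 10^{9} \text{ cycles/s}} \approx 0.03\,\text{s}
\]

against the roughly \(10^{11}\) instructions actually observed, which is where the factor of 1000 comes from.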

So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.

The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.

After expanding a few obviously-not-the-problem cost centres, each of which was 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.

This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.

Digging through transformSSA made it very obvious the problem was an algorithmic one—we were running an unbounded loop that terminated on convergence, where each step took \(O(n^2)\) work. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
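
Schematically, the offending loop had the familiar fixpoint shape below (a hand-written sketch, not the actual transformSSA code; step here stands in for one \(O(n^2)\) rewriting pass):

-- Iterate an expensive rewrite step until nothing changes any more.
-- The number of iterations is unbounded, so the total cost can blow
-- up on larger programs even though each step looks harmless.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint step x
  | x' == x   = x
  | otherwise = fixpoint step x'
  where
    x' = step x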

The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.

January 12, 2025 03:29 PM

January 09, 2025

Edward Z. Yang

New Years resolutions for PyTorch in 2025

In my previous two posts "Ways to use torch.compile" and "Ways to use torch.export", I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some... just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!

Improving torch.compile

A programming model for PT2. A programming model is an abstract description of the system that is both simple (so anyone can understand it and keep it in their head all at once) and can be used to predict the system's behavior. The torch.export programming model is an example of such a description. Beyond export, we would like to help users understand why all aspects of PT2 behave the way they do (e.g., via improved error messages), and give simple, predictable tools for working around problems when they arise. The programming model helps us clearly define the intrinsic complexity of our compiler, which we must educate users about. This is a big effort involving many folks on the PyTorch team and I hope we can share more about this effort soon.

Pre-compilation: beyond single graph export. Whenever someone realizes that torch.compile compilation is taking a substantial amount of time on expensive cluster machines, the first thing they ask is, "Why don't we just compile it in advance?" Supporting precompilation with the torch.compile API exactly as it is today is not so easy; unlike a traditional compiler which gets the source program directly as input, users of torch.compile must actually run their Python program to hit the regions of code that are intended to be compiled. Nor can these regions be trivially enumerated and then compiled: not only must we know the metadata of all input tensors flowing into a region, but a user might not even know what the compiled graphs are if a model has graph breaks.

OK, but why not just run the model, dump all the compiled products, and then reuse them later? This works! Here is a POC from Nikita Shulga where a special decorator aot_compile_sticky_cache swaps between exporting a graph and running the exported product. Zhengxu Chen used a similar idea to export Whisper as a few distinct graphs, which he then manually stitched together in C++ to get a Python-free version of Whisper. If you want training to work, you can more directly integrate AOTInductor as an Inductor backend, e.g., as seen in this POC. We are a stone's throw away from working precompilation, which can guarantee no compilation at runtime; we just need to put the pieces together!

Improving caching further. There are some gaps with caching which we hope to address in the near future: (1) loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup (James Wu is on this), (2) if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request at the beginning (Oguz Ulgen recently landed this), (3) AOTAutograd cache is not fully rolled out yet (James Wu again). These collectively should be worth a 2x speedup or even more on warm cache time.

Fix multithreading. We should just make sure multithreading works, doing the testing and fiddly thread safety auditing needed to make it work. Here's a list of multithreading related issues.

Improving torch.export

Draft mode export. Export requires a lot of upfront work to even get an exported artifact in the first place. Draft mode export capitalizes on the idea that it's OK to generate an unsound "draft" graph early in the export, because even an incorrect graph is useful for kicking the tires on the downstream processing that happens after export. A draft export gives you a graph, and it also gives you a report describing what potential problems need to be fixed to get some guarantees about the correctness of the export. You can then chip away on the problems in the report until everything is green. One of the biggest innovations of draft-mode export is pervasive use of real tensor propagation when doing export: you run the export with actual tensors, so you can always trace through code, even if it is doing spicy things like data-dependent control flow.

Libtorch-free AOTInductor. AOTInductor generated binaries have a relatively small ABI surface that needs to be implemented. This hack from the most recent CUDA Mode meetup shows that you can just create an alternate implementation of the ABI that has no dependence on libtorch. This makes your deployed binary size much smaller!

Support for bundling CUDA kernels into AOTInductor. AOTInductor already supports directly bundling Triton kernels into the generated binary, but traditional CUDA kernels cannot be bundled in this way. There's no reason this has to be the case, though: all we're doing is bundling cubins in both cases. If we have the ability to bundle traditional CUDA kernels into AOTInductor, you could potentially embed custom operators directly into AOTInductor binaries, which is nice because then those operators no longer have to be provided by the runtime (especially if you're frequently iterating on these kernels!).

Export multigraphs. Export's standard model is to give you a single graph that you call unconditionally. But it's easy to imagine a level of indirection on top of these graphs, where we can dispatch between multiple graphs depending on some arguments to the model. For example, if you have a model that optionally takes an extra Tensor argument, you can simply have two graphs, one for when the Tensor is absent, and one for when it is present.
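
To make the idea concrete, here is a minimal sketch of what such hand-rolled dispatch could look like today (M and run are made-up names; torch.export currently has no built-in multigraph support, so the dispatcher below is rolled by hand):

import torch

class M(torch.nn.Module):
    # hypothetical model with an optional Tensor argument
    def forward(self, x, extra=None):
        if extra is None:
            return x * 2
        return x + extra

x = torch.randn(4)
# Export one graph per call signature.
ep_without = torch.export.export(M(), (x,))
ep_with = torch.export.export(M(), (x, torch.randn(4)))

def run(x, extra=None):
    # trivial hand-rolled dispatch between the two exported graphs
    if extra is None:
        return ep_without.module()(x)
    return ep_with.module()(x, extra)

print(run(x))                  # uses the "no extra Tensor" graph
print(run(x, torch.ones(4)))   # uses the "with extra Tensor" graph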

ABI stable PyTorch extensions. It's hard work being a third-party PyTorch extension with native code, because whenever there's a new release of Python or PyTorch you have to rebuild all of your wheels. If there were a limited ABI that you could build your extension against, one that didn't expose CPython and relied only on a small, stable ABI of PyTorch functions, your binary packaging situation would be much simpler! And if an extension relied on a small ABI, it could even be bundled with an AOTInductor binary, letting these export products be truly package agnostic (one of the lessons we learned with torch.package is that picking the split between "what is packaged" and "what is not" is very difficult, and people would much rather just have everything be packaged). Jane Xu is investigating how to do this, and separately, Scott Wolchok has been refactoring headers in libtorch so that a small set of headers can be used independently of the rest of libtorch.

by Edward Z. Yang at January 09, 2025 08:50 PM

January 05, 2025

Manuel M T Chakravarty

Functional Programming in Swift

When people talk about functional programming in modern multi-paradigm languages, they usually mention Rust, Scala, or Kotlin. You rarely hear Swift being mentioned. This is odd, as one might argue that, of these languages, Swift places the strongest emphasis on functional programming.

In this talk, I will explain the core functional programming features of Swift, including its expressive type system, value types, and mutability control. Furthermore, I will discuss how Swift’s language design is influenced by the desire to create a language that addresses the whole spectrum from low-level systems programming up to high-level applications with sophisticated graphical user interfaces. Beyond the core language itself, functional programming also permeates Swift’s rich ecosystem of libraries. To support this point, I will outline some FP-inspired core libraries, covering concepts from functional data structures through functional reactive programming to declarative user interfaces.

Finally, I will briefly summarise practical considerations for using Swift in your own projects. This includes the cross-platform toolchain, the package manager, and interoperability with other languages.

January 05, 2025 07:45 PM

Abhinav Sarkar

Solving Advent of Code “Seating System” with Comonads and Stencils

In this post, we solve the Advent of Code 2020 “Seating System” challenge in Haskell using comonads and stencils.

This post was originally published on abhinavsarkar.net.

The Challenge

Here’s a quick summary of the challenge:

The seat layout fits on a grid. Each position is either floor (.), an empty seat (L), or an occupied seat (#). For example, the initial seat layout might look like this:

L.LL.LL.LL
LLLLLLL.LL
L.L.L..L..
LLLL.LL.LL
L.LL.LL.LL
L.LLLLL.LL
..L.L.....
LLLLLLLLLL
L.LLLLLL.L
L.LLLLL.LL

All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).

The following rules are applied to every seat simultaneously:

  • If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
  • If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
  • Otherwise, the seat’s state does not change.
Floor (.) never changes; seats don’t move, and nobody sits on the floor.

This is a classic Cellular Automaton problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and returns the final number of occupied seats. Let’s solve this in Haskell.

The Cellular Automaton

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Comonad (Comonad (..))
import Data.Function (on)
import Data.List (intercalate, nubBy)
import Data.Massiv.Array (Ix2 (..))
import Data.Massiv.Array qualified as A
import Data.Massiv.Array.Unsafe qualified as AU
import Data.Proxy (Proxy (..))
import Data.Vector.Generic qualified as VG
import Data.Vector.Generic.Mutable qualified as VGM
import Data.Vector.Unboxed qualified as VU
import System.Environment (getArgs, getProgName)

We use the GHC2021 extension here, which enables a lot of useful GHC extensions by default. Our non-base imports come from the comonad, massiv and vector libraries.

Quoting the Wikipedia page on Cellular Automaton (CA):

  • A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.
  • For each cell, a set of cells called its neighborhood is defined relative to the specified cell.
  • An initial state is selected by assigning a state for each cell.
  • A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.

Let’s model the automaton of the challenge using Haskell:

newtype Cell = Cell Char deriving (Eq)

pattern Empty, Occupied, Floor :: Cell
pattern Empty = Cell 'L'
pattern Occupied = Cell '#'
pattern Floor = Cell '.'
{-# COMPLETE Empty, Occupied, Floor #-}

parseCell :: Char -> Cell
parseCell = \case
  'L' -> Empty
  '#' -> Occupied
  '.' -> Floor
  c -> error $ "Invalid character: " <> show c

rule :: Cell -> [Cell] -> Cell
rule cell neighbours =
  let occupiedNeighboursCount = length $ filter (== Occupied) neighbours
   in case cell of
        Empty | occupiedNeighboursCount == 0 -> Occupied
        Occupied | occupiedNeighboursCount >= 4 -> Empty
        _ -> cell

A cell in the grid can be in empty, occupied or floor state. We encode this with the pattern synonyms Empty, Occupied and Floor over the Cell newtype over Char1.

The parseCell function parses a character to a Cell. The rule function implements the automaton rule.

The Solution

We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.

class (Eq a) => Grid a where
  fromLists :: [[Cell]] -> a
  step :: a -> a
  toLists :: a -> [[Cell]]

solve :: forall a. (Grid a) => Proxy a -> [[Cell]] -> Int
solve _ =
  fromLists @a
    >>> fix step
    >>> toLists
    >>> fmap (filter (== Occupied) >>> length)
    >>> sum
  where
    fix f x = let x' = f x in if x == x' then x else fix f x'

We solve the challenge using the Grid typeclass that all our different solutions implement. A grid is specified by three functions:

  1. fromLists: converts a list of lists of cells to the grid.
  2. step: runs one step of the CA simulation.
  3. toLists: converts the grid back to a list of lists of cells.

The solve function calculates the final number of occupied seats for any instance of the Grid typeclass by running the simulation till it converges2.

Now, we use solve to solve the challenge in three ways depending on the command line argument supplied:

main :: IO ()
main = do
  progName <- getProgName
  getArgs >>= \case
    [gridType, fileName] ->
      readFile fileName
        >>= (lines >>> map (map parseCell) >>> solve' gridType >>> print)
    _ -> putStrLn $ "Usage: " <> progName <> " -(z|a|s) <input_file>"
  where
    solve' = \case
      "-z" -> solve $ Proxy @(ZGrid Cell)
      "-a" -> solve $ Proxy @(AGrid Cell)
      "-s" -> solve $ Proxy @(SGrid Cell)
      _ -> error "Invalid grid type"

We have set up the top (main) and the bottom (rule) of our solutions. Now let’s work on the middle part.

The Zipper

To simulate a CA, we need to focus on each cell of the automaton grid, and run the rule for the cell. What is the first thing that comes to the minds of functional programmers when we want to focus on a part of a data structure? Zippers!

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.

The first way to solve the challenge uses the zipper for once-nested lists. Let’s start by creating the zipper for a simple list:

data Zipper a = Zipper [a] a [a] deriving (Eq, Functor)

zPosition :: Zipper a -> Int
zPosition (Zipper left _ _) = length left

zLength :: Zipper a -> Int
zLength (Zipper left _ right) = length left + 1 + length right

listToZipper :: [a] -> Zipper a
listToZipper = \case
  [] -> error "Cannot create Zipper from empty list"
  (x : xs) -> Zipper [] x xs

zipperToList :: Zipper a -> [a]
zipperToList (Zipper left focus right) = reverse left <> (focus : right)

pShowZipper :: (Show a) => Zipper a -> String
pShowZipper (Zipper left focus right) =
  unwords $
    map show (reverse left) <> (("[" <> show focus <> "]") : map show right)

zLeft :: Zipper a -> Zipper a
zLeft z@(Zipper left focus right) = case left of
  [] -> z
  x : xs -> Zipper xs x (focus : right)

zRight :: Zipper a -> Zipper a
zRight z@(Zipper left focus right) = case right of
  [] -> z
  x : xs -> Zipper (focus : left) x xs

A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:

  • zPosition returns the zero-indexed position of the focus in the zipper.
  • zLength returns the length of the zipper.
  • listToZipper and zipperToList do conversions between lists and zippers.
  • pShowZipper pretty-prints a zipper, highlighting the focus.
  • zLeft and zRight move the zipper’s focus to the left and right respectively.

Let’s see it all in action:

> z = listToZipper [1..7]
> putStrLn $ pShowZipper z
[1] 2 3 4 5 6 7
> z' = zRight $ zRight $ zLeft $ zRight $ zRight z
> putStrLn $ pShowZipper z'
1 2 3 [4] 5 6 7
> zPosition z'
3
> zLength z'
7
> zipperToList z'
[1,2,3,4,5,6,7]

Great! Now, what is the zipper for a once-nested list? A once-nested zipper, of course:

newtype ZGrid a = ZGrid (Zipper (Zipper a)) deriving (Eq, Functor)

zgPosition :: ZGrid a -> (Int, Int)
zgPosition (ZGrid rows@(Zipper _ focus _)) = (zPosition rows, zPosition focus)

zgSize :: ZGrid a -> (Int, Int)
zgSize (ZGrid rows@(Zipper _ focus _)) = (zLength rows, zLength focus)

listsToZGrid :: [[a]] -> ZGrid a
listsToZGrid rows =
  let (first : rest) = fmap listToZipper rows
   in ZGrid $ Zipper [] first rest

zGridToLists :: ZGrid a -> [[a]]
zGridToLists (ZGrid (Zipper left focus right)) =
  reverse (fmap zipperToList left)
    <> (zipperToList focus : fmap zipperToList right)

pShowZGrid :: (Show a) => ZGrid a -> String
pShowZGrid (ZGrid (Zipper left focus right)) =
  intercalate "\n" $ pShowRows left <> (pShowZipper focus : pShowRows right)
  where
    pShowRows = map pShowZipper'
    pShowZipper' =
      zipperToList
        >>> splitAt (zPosition focus)
        >>> \ ~(left', focus' : right') ->
          unwords $
            map show left' <> ((" " <> show focus' <> " ") : map show right')

ZGrid is a newtype over a zipper of zippers. It has functions similar to Zipper for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.

Next, the functions to move the focus in the grid:

zgUp :: ZGrid a -> ZGrid a
zgUp (ZGrid rows) = ZGrid $ zLeft rows

zgDown :: ZGrid a -> ZGrid a
zgDown (ZGrid rows) = ZGrid $ zRight rows

zgLeft :: ZGrid a -> ZGrid a
zgLeft (ZGrid rows) = ZGrid $ fmap zLeft rows

zgRight :: ZGrid a -> ZGrid a
zgRight (ZGrid rows) = ZGrid $ fmap zRight rows

Let’s check them out in GHCi:

> zg = listsToZGrid $ replicate 7 $ [1..7]
> putStrLn $ pShowZGrid zg
[1] 2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
> zg' = zgDown $ zgRight $ zgDown $ zgRight zg
> putStrLn $ pShowZGrid zg'
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2 [3] 4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
> zgPosition zg'
(2,2)
> zgSize zg'
(7,7)
> zGridToLists zg'
[[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7]]

It works as expected. Now, how do we use this to simulate a CA?

The Comonad

A CA requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. A Haskell abstraction that neatly fits this requirement is Comonad.

Comonads are duals of Monads3. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating a CA:

class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
  {-# MINIMAL extract, (duplicate | extend) #-}

Assuming we can make ZGrid a comonad instance, the signatures for the above functions for ZGrid Cell would be:

class Comonad ZGrid where
  extract :: ZGrid Cell -> Cell
  duplicate :: ZGrid Cell -> ZGrid (ZGrid Cell)
  extend :: (ZGrid Cell -> Cell) -> ZGrid Cell -> ZGrid Cell

For ZGrid as a CA comonad:

  • The extract function would return the current focus of the grid.
  • The duplicate function would return a grid of grids, one inner grid for each possible focus of the input grid.
  • The extend function would apply the automata rule to each possible focus of the grid, and return a new grid.

The nice part is, we need to implement only the extract and duplicate functions, and the generation of the new grid is taken care of automatically by the default implementation of the extend function. Let’s write the comonad instance for ZGrid.

First, we write the comonad instance for Zipper:

instance Comonad Zipper where
  extract (Zipper _ focus _) = focus
  duplicate zipper = Zipper left zipper right
    where
      pos = zPosition zipper
      left = iterateN pos zLeft $ zLeft zipper
      right = iterateN (zLength zipper - pos - 1) zRight $ zRight zipper

iterateN :: Int -> (a -> a) -> a -> [a]
iterateN n f = take n . iterate f

extract for Zipper simply returns the input zipper’s focus element.

duplicate returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variations of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:

> z = listToZipper [1..7] :: Zipper Int
> :t duplicate z
duplicate z :: Zipper (Zipper Int)
> mapM_ (putStrLn . pShowZipper) $ zipperToList $ duplicate z
[1] 2 3 4 5 6 7
1 [2] 3 4 5 6 7
1 2 [3] 4 5 6 7
1 2 3 [4] 5 6 7
1 2 3 4 [5] 6 7
1 2 3 4 5 [6] 7
1 2 3 4 5 6 [7]

Great! Now we use a similar construction to write the comonad instance for ZGrid:

instance Comonad ZGrid where
  extract (ZGrid grid) = extract $ extract grid
  duplicate grid = ZGrid $ Zipper left focus right
    where
      (focusRowPos, focusColPos) = zgPosition grid
      (rowCount, colCount) = zgSize grid

      focus = Zipper focusLeft grid focusRight
      focusLeft = iterateN focusColPos zgLeft $ zgLeft grid
      focusRight =
        iterateN (colCount - focusColPos - 1) zgRight $ zgRight grid

      left = iterateN focusRowPos (fmap zgUp) $ fmap zgUp focus
      right =
        iterateN (rowCount - focusRowPos - 1) (fmap zgDown) $ fmap zgDown focus

It works in a similar fashion:

> zg = listsToZGrid $ replicate 4 $ [0..3] :: ZGrid Int
> putStrLn $ pShowZGrid zg
[0] 1 2 3
 0  1 2 3
 0  1 2 3
 0  1 2 3
> :t duplicate zg
duplicate zg :: ZGrid (ZGrid Int)
> :t mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg
mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg :: IO ()

I’ve rearranged the output of running the last line of the code above for clarity:

Output of duplicate for ZGrid

We can see a grid of grids, with one inner grid focussed at each possible focus of the input grid. Now we finally implement the automaton:

zGridNeighbours :: ZGrid a -> [a]
zGridNeighbours grid =
  map snd . nubBy ((==) `on` fst) $
    [ (pos, extract grid')
      | move <- moves,
        let grid' = move grid,
        let pos = zgPosition grid',
        pos /= zgPosition grid
    ]
  where
    moves =
      [ zgUp, zgDown, zgRight, zgLeft,
        zgUp >>> zgLeft, zgUp >>> zgRight,
        zgDown >>> zgLeft, zgDown >>> zgRight
      ]

stepZGrid :: ZGrid Cell -> ZGrid Cell
stepZGrid = extend $ \grid -> rule (extract grid) (zGridNeighbours grid)

instance Grid (ZGrid Cell) where
  fromLists = listsToZGrid
  step = stepZGrid
  toLists = zGridToLists

zGridNeighbours returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also make sure to return unique cells by their position.

stepZGrid implements one step of the CA using the extend function of the Comonad typeclass. We call extend with a function that takes the current grid, and returns the result of running the CA rule on its focus and the neighbours of the focus.

Finally, we plug in our functions into the ZGrid Cell instance of Grid.

That’s it! Let’s compile and run the code4:

❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main             ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
        2.72 real         2.68 user         0.02 sys

I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?

The Array

The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them, like length, are \(O(n)\). They are also lazy in both spine and value, and build up thunks. We could switch to a different list-like data structure5, or cache the grid size and the neighbour indices for each index to make it run faster. Or we could try an entirely different approach.

Let’s think about it for a bit. Zippers intermix two things: the data in the grid, and the focus. When running a step of the CA, the grid data does not change as we move over all possible focuses; only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.

Let’s model the grid as a combination of a 2D array and an index into the array. We use the arrays from the massiv library.

data AGrid a = AGrid {aGrid :: A.Array A.B A.Ix2 a, aGridFocus :: A.Ix2}
  deriving (Eq, Functor)

A.Ix2 is massiv’s way of representing an index into a 2D array, and is essentially the same as a two-tuple of Ints. A.Array A.B A.Ix2 a here means a 2D boxed array of as. massiv uses representation strategies to decide how arrays are actually represented in memory, among which are boxed, unboxed, primitive, storable, delayed, etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the Functor instance of A.Array exists only for boxed and delayed arrays, and boxed ones are the faster of the two for our purpose.

It is actually massively6 easier to write the Comonad instance for AGrid:

instance Comonad AGrid where
  extract (AGrid grid focus) = grid A.! focus
  extend f (AGrid grid focus) =
    AGrid (A.compute $ A.imap (\pos _ -> f $ AGrid grid pos) grid) focus

The extract implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement duplicate because it is easier to implement extend directly. We map with index (A.imap) over the grid, calling the function f for the variation of the grid with the index as the focus.

Next, we write the CA step:

listsToAGrid :: [[Cell]] -> AGrid Cell
listsToAGrid = A.fromLists' A.Seq >>> flip AGrid (0 :. 0)

aGridNeighbours :: AGrid a -> [a]
aGridNeighbours (AGrid grid (x :. y)) =
  [ grid A.! (x + i :. y + j)
    | i <- [-1, 0, 1],
      j <- [-1, 0, 1],
      (x + i, y + j) /= (x, y),
      validIndex (x + i, y + j)
  ]
  where
    A.Sz (rowCount :. colCount) = A.size grid
    validIndex (a, b) = and [a >= 0, b >= 0, a < rowCount, b < colCount]

stepAGrid :: AGrid Cell -> AGrid Cell
stepAGrid = extend $ \grid -> rule (extract grid) (aGridNeighbours grid)

instance Grid (AGrid Cell) where
  fromLists = listsToAGrid
  step = stepAGrid
  toLists = aGrid >>> A.toLists

listsToAGrid converts a list of lists of cells into an AGrid focussed at (0,0). aGridNeighbours finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices into the array. stepAGrid calls extract and aGridNeighbours to implement the CA step, much like the ZGrid case. And finally, we create the AGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
        0.10 real         0.09 user         0.00 sys

Woah! It takes only 0.1 second this time. Can we do even better?

The Stencil

massiv has a construct called Stencil that can be used for simulating CA:

Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.

That sounds like exactly what we need. Let’s try it out next.

With stencils, we do not need the instance of Comonad for the grid. So we can switch to the faster unboxed array representation:

newtype instance VU.MVector s Cell = MV_Char (VU.MVector s Char)
newtype instance VU.Vector Cell = V_Char (VU.Vector Char)
deriving instance VGM.MVector VU.MVector Cell
deriving instance VG.Vector VU.Vector Cell
instance VU.Unbox Cell

type SGrid a = A.Array A.U A.Ix2 a

The first five lines make Cell an instance of the Unbox typeclass. We chose to make Cell a newtype wrapper over Char because Char has an Unbox instance.

Then we define a new grid type SGrid that is a 2D unboxed array.

Now, we define the stencil and the step function for our CA:

ruleStencil :: A.Stencil A.Ix2 Cell Cell
ruleStencil = AU.makeUnsafeStencil (A.Sz (3 :. 3)) (1 :. 1) $ \_ get ->
  rule (get (0 :. 0)) $ map get neighbourIndexes
  where
    neighbourIndexes =
      [ -1 :. -1, -1 :. 0, -1 :. 1,
         0 :. -1,           0 :. 1,
         1 :. -1,  1 :. 0,  1 :. 1
      ]

stepSGrid :: SGrid Cell -> SGrid Cell
stepSGrid = A.mapStencil (A.Fill Floor) ruleStencil >>> A.computeP

instance Grid (SGrid Cell) where
  fromLists = A.fromLists' A.Seq
  step = stepSGrid
  toLists = A.toLists

We make a stencil of size 3-by-3, where the focus is at index (1,1) relative to the stencil’s top-left cell. In the callback function, we use the supplied get function to get the neighbours of the focus by using indices relative to the focus, and call rule with the cells at focus and neighbour indices.

Then we write the step function stepSGrid that maps the stencil over the grid. Finally we put everything together in the SGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
        0.08 real         0.07 user         0.00 sys

It is only a bit faster than the previous solution. But this time we have another trick up our sleeve. Did you notice the A.computeP we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile it with the right options and run it again:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
        0.04 real         0.11 user         0.05 sys

The -threaded option enables multithreading, and the +RTS -N option makes the process use all CPU cores7. We get a nice speedup of 2x over the single-threaded version.

Bonus Round: Simulation Visualization

Since you’ve read the entire post, here is a bonus visualization of the CA simulation for you (warning: lots of fast blinking):

Play the simulation

That’s it for this post! I hope you enjoyed it and took something away from it. If you have any questions or comments, please leave a comment below. If you liked this post, please share it with your friends. Thanks for reading!

The full code for this post is available here.


  1. The reason for using a newtype instead of a data is explained in the Stencil section.↩︎

  2. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  3. This short post by Bartosz Milewski explains how comonads and monads are related.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. I did try a variation with Data.Sequence.Seq instead of lists, and it was twice as fast.↩︎

  6. Pun very much intended.↩︎

  7. I tried running the process with different values of N and found that N4 gave the fastest results. So, Amdahl’s law applies here.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 05, 2025 12:00 AM

January 04, 2025

Philip Wadler

Telnaes quits The Washington Post



Cartoonist Ann Telnaes has quit the Washington Post, after they refused to publish one of her cartoons, depicting Mark Zuckerberg (Meta), Sam Altman (Open AI), Patrick Soon-Shiong (LA Times), the Walt Disney Company (ABC News), and Jeff Bezos (Amazon & Washington Post). All that exists is her preliminary sketch, above. Why is this important? See her primer below. (Spotted via Boing Boing.)





 

by Philip Wadler (noreply@blogger.com) at January 04, 2025 09:41 PM

December 24, 2024

Edward Z. Yang

Ways to use torch.export

Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile's API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.

Enable end-to-end C++ CPU/GPU Inference

Scenario: Like before, suppose you want to deploy your model for inference. However, now you have more stringent runtime requirements: perhaps you need to do inference from a CPython-less environment (because your QPS requirements require GIL-less multithreading; alternatively, CPython execution overhead is unacceptable but you cannot use CUDA graphs, e.g., due to CPU inference or dynamic shapes requirements). Or perhaps your production environment requires hermetic deploy artifacts (for example, in a monorepo setup, where infrastructure code must be continually pushed but model code should be frozen). But like before, you would prefer not to have to rewrite your model; you would like the existing model to serve as the basis for your Python-less inference binary.

What to do: Use torch.export targeting AOTInductor. This will compile the model into a self-contained shared library which can then be directly invoked from a C++ runtime. This shared library contains all of the compiler generated Triton kernels as precompiled cubins and is guaranteed not to need any runtime compilation; furthermore, it relies only on a small runtime ABI (with no CPython dependency), so the binaries can be used across versions of libtorch. AOTInductor's multithreading capability and low runtime overhead also make it a good match for CPU inference!

You don't have to go straight to C++ CPU/GPU inference: you can start with using torch.compile on your code before investing in torch.export. There are four primary extra requirements export imposes: (1) your model must compile with fullgraph=True (though you can sometimes bypass missing Dynamo functionality by using non-strict export; sometimes, it is easier to do non-strict torch.export than it is to torch.compile!), (2) your model's inputs/outputs must only be in torch.export's supported set of argument types (think Tensors in pytrees), (3) your model must never recompile--specifically, you must specify what inputs have dynamic shapes, and (4) the top-level of your model must be an nn.Module (so that export can keep track of all of the parameters your model has).
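
To make requirements (2)-(4) concrete, here is a minimal sketch (TinyModel and the batch dimension name are made up for illustration):

import torch
from torch.export import Dim, export

class TinyModel(torch.nn.Module):  # requirement (4): top level is an nn.Module
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):  # requirement (2): Tensor-only inputs and outputs
        return torch.relu(self.linear(x))

example = torch.randn(32, 16)
# requirement (3): declare up front which dimensions may vary, instead of
# relying on recompilation the way torch.compile would
batch = Dim("batch")
ep = export(TinyModel(), (example,), dynamic_shapes={"x": {0: batch}})
# requirement (1): the result is a single full graph (an ExportedProgram)
print(ep)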

Some tips:

  • Check out the torch.export programming model. The torch.export programming model is an upcoming doc which aims to help set expectations on what can and cannot be exported. It talks about things like "Tensors are the only inputs that can actually vary at runtime" and common mistakes such as module code which modifies NN modules (not supported!) or optional input types (you will end up with an export that takes in that input or not; there is no runtime optionality).
  • Budget time for getting a model to export. With torch.compile for Python inference, you could just slap it on your model and see what happens. For torch.export, you have to actually finish exporting your entire model before you can even consider running the rest of the pipeline. For some of the more complicated models we have exported, there were often dozens of issues that had to be worked around in one way or another. And that doesn't even account for all of the post-export work you have to do, like validating the numerics of the exported model.
  • Intermediate value debugging. AOTInductor has an option to add dumps of intermediate tensor values in the compiled C++ code. This is good for determining, e.g., the first time where a NaN shows up, in case you are suspecting a miscompilation.

Open source examples: Among other things, torchchat has an example end-to-end AOTInductor setup for server-side LLM inference, which you can view in run.cpp.

torch.export specific downsides:

  • No built-in support for guard-based dispatch (multiple compilations). Earlier, I mentioned that an exported model must not have any recompiles. This leads to some fairly common patterns of code not being directly supported by torch.export: you can't export a single model that takes an enum as input, or has an optional Tensor argument, or accepts two distinct tensor shapes that need to be compiled individually. Now, technically, we could support this: you could imagine a package that contains multiple exported artifacts and dispatches between them depending on some conditions (e.g., the value of the enum, whether or not the optional Tensor argument was provided, the shape of the input tensor). But you're on your own: torch.compile will do this for you, but torch.export will not.
  • No built-in support for models that are split into multiple graphs. Similarly, we've mentioned that an exported model must be a single graph. This is in contrast to torch.compile, which will happily insert graph breaks and compile distinct islands of code that can be glued together with Python eager code. Now, technically, you can do this with export too: you can carve out several distinct subnets of your model, export them individually, and then glue them together with some custom written code on the other end (in fact, Meta's internal recommendation systems do this), but there's no built-in support for this workflow.
  • The extra requirements often don't cover important components of real world models. I've mentioned this previously as the extra restrictions export places on you, but it's worth reiterating some of the consequences of this. Take an LLM inference application: obviously, there is a core model that takes in tokens and produces logit predictions--this part of the model is exportable. But there are also important other pieces such as the tokenizer and sampling strategy which are not exportable (tokenizer because it operates on strings, not tensors; sampling because it involves complicated control flow). Arguably, it would be much better if all of these things could be directly bundled with the model itself; in practice, end-to-end applications should just expect to directly implement these in native code (e.g., as is done in torchchat). Our experience with TorchScript taught us that we don't really want to be in the business of designing a general purpose programming language that is portable across all of export's targets; better to just bet that the tokenizer doesn't change that often and eat the cost of natively integrating it by hand.

AOTInductor specific downsides:

  • You still need libtorch to actually run the model. Although AOTInductor binaries bundle most of their compiled kernel implementation, they still require a minimal runtime that can offer basic necessities such as tensor allocation and access to custom operators. There is not yet an official offering of an alternative, lightweight implementation of the stable ABI AOTInductor binaries depends on, so if you do want to deploy AOTInductor binaries you will typically have to also bring libtorch along. This is usually not a big deal server side, but it can be problematic if you want to do client side deployments!
  • No CUDA graphs support. This one is not such a big deal since you are much less likely to be CPU bound when the host side logic is all compiled C++, but there's no support for CUDA graphs in AOTInductor. (Funnily enough, this is also something you technically can orchestrate from outside of AOTInductor.)

Edge deployment

Scenario: You need to deploy your PyTorch model to edge devices (e.g., a mobile phone or a wearable device) where computational resources are limited. You have requirements that are a bit different from the server side: you care a lot more about minimizing binary size and startup time. Traditional PyTorch deployment with full libtorch won't work. The device you're deploying to might also have some strange extra processors, like a DSP or NPU, that you want your model to target.

What to do: Use torch.export targeting Executorch. Among other things, Executorch offers a completely separate runtime for exported PyTorch programs (i.e., it has no dependency on libtorch, except perhaps there are a few headers which we share between the projects) which was specifically designed for edge deployment. (Historical note: we spent a long time trying to directly ship a stripped down version of libtorch to mobile devices, but it turns out it's really hard to write code that is portable on server and client, so it's better to only share when absolutely necessary.) Quantization is also a pretty important part of deployment to Edge, and Executorch incorporates this into the end-to-end workflow.

Open source examples: torchchat also has an Executorch integration letting you run an LLM on your Android phone.

Downsides. All of the export related downsides described previously apply here. But here's something to know specifically about Executorch:

  • The edge ecosystem is fragmented. At time of writing, there are seven distinct backends Executorch can target. This is not really Executorch's fault; it comes with the territory--but I want to call it out because it stands in stark contrast to NVIDIA's server-side hegemony. Yes, AMD GPUs are a thing, and various flavors of CPU are real, but it really is a lot easier to be focused on server side because NVIDIA GPUs come first.

Pre-compiled kernels for eager mode

Scenario: You need a new function or self-contained module with an efficient kernel implementation. However, you would prefer not to have to write the CUDA (or even Triton) by hand; the kernel is something that torch.compile can generate from higher level PyTorch implementation. At the same time, however, you cannot tolerate just-in-time compilation at all (perhaps you are doing a massive training job, and any startup latency makes it more likely that one of your nodes will fail during startup and then you make no progress at all; or maybe you just find it annoying when PyTorch goes out to lunch when you cache miss).

What to do: Use torch.export targeting AOTInductor, and then load and run the AOTInductor generated binary from Python.
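
As a rough sketch of this workflow, assuming a recent PyTorch (roughly 2.5 or later) where the packaging helpers torch._inductor.aoti_compile_and_package and aoti_load_package are available; the exact entry points and signatures have shifted between releases, so treat the calls below as an assumption to check against the AOTInductor docs for your version:

import torch

class Kernel(torch.nn.Module):
    # hypothetical self-contained module we want an ahead-of-time binary for
    def forward(self, x):
        return torch.nn.functional.gelu(x) * x

x = torch.randn(1024, 1024)
ep = torch.export.export(Kernel(), (x,))

# Compile ahead of time into a package on disk. The helper names are assumed
# to exist on recent releases; the signature may differ on yours.
pkg = torch._inductor.aoti_compile_and_package(ep)

# Later, possibly in a different process: load and run it, with no JIT
# compilation at call time.
compiled = torch._inductor.aoti_load_package(pkg)
print(compiled(x).shape)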

Downsides. So, we know this use case works, because we have internally used this to unblock people who wanted to use Triton kernels but could not tolerate Triton's just-in-time compilation. But there's not much affordance in our APIs for this use case; for example, guard-based dispatch is often quite useful for compiled functions, but you'll have to roll that by hand. More generally, when compiling a kernel, you have to make tradeoffs about how static versus dynamic the kernel should be (for example, will you force the inputs to be evenly divisible by eight? Or would you have a separate kernel for the divisible and not divisible cases?) Once again, you're on your own for making the call there.

An exchange format across systems

Scenario: In an ideal world, you would have a model, you could export it to an AOTInductor binary, and then be all done. In reality, maybe this export process needs to be a multi-stage process, where it has to be processed to some degree on one machine, and then finish processing on another machine. Or perhaps you need to shift the processing over time: you want to export a model to freeze it (so it is no longer tied to its original source code), and then repeatedly run the rest of the model processing pipeline on this exported program (e.g., because you are continuously updating its weights and then reprocessing the model). Maybe you want to export the model and then train it from Python later, committing to a distributed training strategy only when you know how many nodes you are running. The ability to hermetically package a model and then process it later is one of the big value propositions of TorchScript and torch.package.

What to do: Use torch.export by itself, potentially using pre-dispatch if you need to support training use-cases. torch.export produces an ExportedProgram which has a clean intermediate representation that you can do processing on, or just serialize and then do processing on later.
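
Concretely, the freeze-now-process-later round trip looks something like this minimal sketch (Net is a made-up stand-in for your model; torch.export.save and torch.export.load are the serialization entry points):

import torch
from torch.export import export, load, save

class Net(torch.nn.Module):
    # made-up module; stands in for the model you want to freeze
    def forward(self, x):
        return x.sin() + x.cos()

ep = export(Net(), (torch.randn(8),))

# The ExportedProgram carries a clean FX graph you can inspect or transform...
ep.graph.print_tabular()

# ...and it can be serialized now and post-processed later, on another machine.
save(ep, "net.pt2")
ep_again = load("net.pt2")
print(ep_again.module()(torch.randn(8)))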

Downsides:

  • Custom operators are not packaged. A custom operator typically refers to some native code which was linked with PyTorch proper. There's no way to extract out this kernel and embed it into the exported program so that there is no dependence; instead, you're expected to ensure the eventual runtime relinks with the same custom operator. Note that this problem doesn't apply to user defined Triton kernels, as export can simply compile it and package the binary directly into the exported product. (Technically, this applies to AOTInductor too, but this tends to be much more of a problem for use cases which are primarily about freezing rapidly evolving model code, as opposed to plain inference where you would simply just expect people to not be changing custom operators willy nilly.)
  • Choose your own decompositions. Export produces IR that only contains operators from a canonical operator set. However, the default choice is sometimes inappropriate for use cases (e.g., some users want aten.upsample_nearest2d.vec to be decomposed while others do not), so in practice for any given target you may have a bespoke operator set that is appropriate for that use case. Unfortunately, it can be fiddly getting your operator set quite right, and while we've talked about ideas like a "build your own operator set interactive tool" these have not been implemented yet. (See the sketch after this list for what post-hoc decomposition looks like.)
  • Annoyingly large FC/BC surface. Something I really like about AOTInductor is that it has a very small FC/BC surface: I only need to make sure I don't make breaking changes to the C ABI, and I'm golden. With export IR, the FC/BC surface is all of the operators produced by export. Even a decomposition is potentially BC breaking: a downstream pass could be expecting to see an operator that no longer exists because I've decomposed it into smaller pieces. Matters get worse in pre-dispatch export, since the scope of APIs used inside export IR expands to include autograd control operators (e.g., torch.no_grad) as well as tensor subclasses (since Tensor subclasses cannot be desugared if we have not yet eliminated autograd). We will not break your AOTInductor blobs. We can't as easily give the same guarantee for the IR here.
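
To illustrate the decomposition point above, here is a small sketch using ExportedProgram.run_decompositions, which is available in recent releases (Up is a made-up module):

import torch
from torch.export import export

class Up(torch.nn.Module):
    # made-up module using an op (interpolate -> upsample_nearest2d) whose
    # decomposition you may or may not want, depending on the target
    def forward(self, x):
        return torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")

ep = export(Up(), (torch.randn(1, 3, 8, 8),))

# Lower to the default decomposed operator set after the fact; passing your
# own decomposition table here lets you keep or split specific operators.
ep_core = ep.run_decompositions()
print(ep_core.graph)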

Next time: What's missing, and what we're doing about it

by Edward Z. Yang at December 24, 2024 04:28 AM

December 23, 2024

Michael Snoyman

A secure Bitcoin self custody strategy

Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:

  • Buy a hardware wallet
  • Put the seed phrase on steel plates
  • Secure those steel plates somewhere on my property

But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:

  1. Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
  2. Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.

I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!

Goals

  • Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
  • Full access I always maintain full access to my funds, without relying on any external party.
  • Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
  • Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recover my funds.
  • Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.

Multisig

The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.

The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.

For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.

For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.

Software wallet

I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.

Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.

I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.

Let's see how this stacks up against the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
  • Physical destruction resilient Check, Bitwarden lives beyond my machines
  • Will survive me Check thanks to the dead-man switch

Hardware wallet

Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Check, the private keys never leave the hardware device
  • Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
  • Will survive me Check, anyone can take my plates and recover the wallet

SLIP39

This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).
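
For intuition, here is the textbook construction behind Shamir's scheme (a sketch of the general idea, not the exact SLIP39 parameters): to split a secret \(s\) with threshold \(t\), pick a random polynomial of degree \(t-1\) over a finite field with constant term \(s\),

\[ f(x) = s + a_1 x + a_2 x^2 + \dots + a_{t-1} x^{t-1}, \]

and hand shard holder \(i\) the point \((i, f(i))\). Any \(t\) points determine \(f\) uniquely via Lagrange interpolation, so \(s = f(0)\) can be recovered; any \(t-1\) or fewer points reveal nothing about \(s\).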

The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask for their shards, they should send them to me so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.

In any event, the idea is that if enough of these people agree, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how this security mechanism stands up:

  • Self custody Fail, I'm totally reliant on others.
  • Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
  • Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
  • Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
  • Will survive me Check, by design

Comparison against goals

We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.

Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.

How-to guide

OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)

  1. Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
  2. Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
  3. Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.

Set up a new wallet. Enter a name (doesn't matter what) and click next.

New wallet

Choose a multisig wallet and click next.

Multisig

Choose 3 cosigners and require 2 signatures.

Signer count

Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.

Hardware

After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.

segwit

This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:

Zpub75J9cLwa3iX1zB2oiTdvGDf4EyHWN1ZYs5gVt6JSM9THA6XLUoZhA4iZwyruCKHpw8BFf54wbAK6XdgtMLa2TgbDcftdsietCuKQ6eDPyi6

You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security. Backing up in Bitwarden with your software wallet is a good idea, and potentially simply sending them to some friends to back up just in case.

And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.

However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.

Add cosigner 2

And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!

Cannot continue

The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.

BIP39

Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.

SLIP39

And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!

Addresses

Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.

If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.

If you made it this far, congratulations! You're the proud owner of a new 2of3 multisig wallet.

Conclusion

I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!

December 23, 2024 12:00 AM

December 22, 2024

Haskell Interlude

60: Tom Ellis

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware.  In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable! 

by Haskell Podcast at December 22, 2024 06:00 PM

December 21, 2024

Philip Wadler

Please submit to Lambda Days

 


I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!

Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!

🎯 Submission Deadline: 9 February 2025
🎙️ Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.

Submit your talk and share your wisdom with the FP community.

👉 https://www.lambdadays.org/lambdadays2025#call-for-talks

by Philip Wadler (noreply@blogger.com) at December 21, 2024 07:56 PM

December 19, 2024

Tweag I/O

The Developer Experience Upgrade: From Create React App to Vite

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, and share how switching to Vite transformed our development workflow, with concrete numbers and benchmarks to demonstrate the dramatic improvements in build times, startup speed, and overall developer experience.

Create React App: A Historical Context

Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.

However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.

Let’s dive into some of the most noticeable limitations.

Performance Issues

CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.

Here’s a simplified version of what happens every time you make a code change:

  1. CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph
  2. It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand
  3. Finally, it bundles everything together into a single package that can be served to your browser

Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files, both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.

Security Issues

When running npx create-react-app <project-directory>, after waiting for a while, a long list of deprecation warnings (23 packages at the time of writing) will be shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) will appear. In other words, create-react-app relies on packages with known security vulnerabilities.

Support Issues

The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last version was published on npm 3 years ago.

Instead, React’s official documentation now includes Vite in its recommendations for both starting new projects and adding React to existing projects.

While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.

Introducing Vite

Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.

Vite was created in 2020 by Evan You, the author of Vue, to address the complexity, slowness, and heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that rated it Most Loved Library Overall, No. 1 Most Adopted (+30%), and No. 2 Highest Retention (98%) in the State of JS 2024 Developer Survey.

In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.

Why Vite is Faster

Vite applies several modern web technologies to improve the development experience:

1. Native ES Modules (ESM)

During development, Vite serves source code over native ES modules, essentially letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as the browser imports it, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.

2. Efficient Hot Module Replacement (HMR)

By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.

3. Optimized Build Tooling

Even though ESM are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.

When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.

While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.

By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.

Practical Results

The Migration Process

The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.

Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:

  1. Removing CRA-related dependencies
  2. Installing Vite and its React plugin
  3. Moving index.html to the root directory
  4. Creating a Vite configuration file
  5. Adding a type declaration file
  6. Updating the npm scripts in package.json
  7. Adjusting tsconfig.json to align with Vite’s requirements

All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.

Most challenges encountered were related to environment variables and path aliases, which were easily resolved using Vite’s documentation, and its vibrant community has produced extensive resources, guides, and solutions for even the most specialized setups.

Build Time

The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.

[Build time comparison between CRA and Vite showing 3.2x improvement]

This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.

Dev Server Startup Time

The speed at which the development server starts can greatly impact the development workflow, especially in large projects.

The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.

[Development server startup time comparison showing 12.9x improvement]

This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.

HMR Update Time

While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.

This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.

Bundle Size

Another essential factor is the size of the final bundled application, which affects load times and overall performance.

[Bundle size comparison between CRA and Vite showing 27.5% reduction in raw bundle size and 9.3% reduction in gzipped size]

This represents a 27.5% reduction in raw bundle size and a 9.3% reduction in gzipped size. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.

The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.

The Hidden Advantage: Reduced Context Switching

One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.

By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.

Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.

The Future of Vite is Bright

Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.

Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.

These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.

Early benchmarks show impressive performance improvements:

  • OXC Parser is 3x faster than SWC
  • OXC Resolver is 28x faster than enhanced-resolve
  • OXC TypeScript transformer is 4x faster than SWC
  • OXLint is 50-100x faster than ESLint

With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.

Conclusion

Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.

However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.

The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.

December 19, 2024 12:00 AM

December 18, 2024

Michael Snoyman

Normal People Shouldn't Invest

The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:

  1. Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
  2. Try to beat inflation by investing in the stock market and other risk-on investments.
  3. Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.

(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)

High living and melting ice cubes

Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.

But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. With those, there’s little to no risk of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.

Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.

The investment option

With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is the hope that, between price movements and dividends, the value of their investment will grow faster than anything else they could choose.

You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?

Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, and doing so by providing actual value.

So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.

Michael (Midas) Mulligan

Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.

He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.

And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.

No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.

That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.

Michael (Crappy Investor) Snoyman

I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.

I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.

So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.

I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.

And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.

Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than gambling at a casino.

And yes, I ultimately lost a massive amount of money through this.

Normal people shouldn’t invest

Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.

(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficient to justify the returns people anticipate from making “good” gambles.)

The only reason most people ever consider this is because they feel forced into it; otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between the risk, stress, and time investment, their lives will often get worse.

One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.

Option 4: deflation

The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system is so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.

If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.

And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.

So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.

Can Bitcoin go to 0? Certainly, though year by year that outcome becomes less and less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.

I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.

Further reading

December 18, 2024 12:00 AM

December 17, 2024

Michael Snoyman

Hello Nostr

This blog post is in the style of my previous blog post on Matrix. I'm reviewing a new technology and sharing my onboarding experience. I'm sharing in the hopes that it will help others become aware of this new technology, understand what it can do, and, if people are intrigued, have a more pleasant onboarding experience. Just keep in mind: I'm in no way an expert on this. PRs welcome to improve the content here!

What is Nostr? Why Nostr?

I’d describe Nostr as decentralized social media. It’s a protocol for people to identify themselves via public key cryptography (thus no central identity service), publish various kinds of information, access through any compatible client, and interact with anyone else. At its simplest, it’s a Twitter/X clone, but it offers much more functionality than that (though I’ve barely scratched the surface).

Nostr has a high overlap with the Bitcoin ecosystem, including built-in micropayments (zaps) via the Lightning Network, an instantaneous peer-to-peer payment layer built on top of Bitcoin.

I'll start off by saying: right now, Nostr's user experience is not on a par with centralized services like X. But I can see a lot of promise. The design of the protocol encourages widespread innovation, as demonstrated by the plethora of clients and other tools to access the protocol. Decentralized/federated services are more difficult to make work flawlessly, but the advantages in terms of freedom of expression, self-custody of your data, censorship resistance, and ability to build more featureful tools on top of it make me excited.

I was skeptical (to say the least) about the idea of micropayments built into social media. But I'm beginning to see the appeal. Firstly, getting away from an advertiser-driven business model fixes the age-old problem of "if you're not paying for a service, you're the product." But I see a deeper social concept here too. I intend to blog more in the future on the topic of non-monetary competition and compensation. But in short, in the context of social media: every social network ends up making its own version of imaginary internet points (karma, moderator privileges, whatever you want to call it). Non-monetary compensation has a lot of downsides, which I won't explore here. Instead, basing the credit system on money with real-world value has the promise to vastly improve social media interactions.

Did that intrigue you enough to want to give this a shot? Awesome! Let me give you an overview of the protocol, and then we'll dive into my recommendation on getting started.

Protocol overview

The basics of the protocol can be broken down into:

  • Relays
  • Events
  • Identities
  • Clients

As a decentralized protocol, Nostr relies on public key cryptography for identities. That means, when you interact on the network, you'll use a private key (represented as an nsec value) to sign your messages, and will be identified by your public key (represented as an npub value). Anyone familiar with Bitcoin or cryptocurrency will be familiar with the keys vs wallet divide, and it lines up here too. Right off the bat, we see the first major advantage of Nostr: no one controls your identity except you.

Clients are how a user will interact with the protocol. You'll provide your identity to the client in one of a few ways:

  • Directly entering your nsec. This is generally frowned upon since it opens you up to exploits, though most mobile apps work by direct nsec entry.
  • Getting a view-only experience in clients that support it by entering your npub.
  • Using a signing tool to perform the signing on behalf of the client without giving away your private keys to everyone. (This matches most web3 interactions that rely on a wallet browser extension.)

Events are a general-purpose concept, and are the heart of Nostr interaction. Events can represent a note (similar to a Tweet), articles, likes, reposts, profile updates, and more. Anything you do on the protocol involves creating and signing an event. This is also the heart of Nostr's extensibility: new events can be created to support new kinds of interactions.

Finally there are relays. Relays are the servers of the Nostr world, and are where you broadcast your events to. Clients will typically configure multiple relays, broadcast your events to those relays, and query relays for relevant events for you (such as notes from people you follow, likes on your posts, and more).
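
To make the event concept a bit more concrete, here is a rough, Haskell-flavoured sketch of the shape of a NIP-01 event. This is just an illustration for orientation; it is not code from any client or from the spec itself.

import Data.Text (Text)

-- Rough sketch of the NIP-01 event shape, for orientation only.
data Event = Event
  { eventId   :: Text     -- hex-encoded sha256 hash of the serialised event
  , pubkey    :: Text     -- the author's public key (hex form of the npub)
  , createdAt :: Integer  -- unix timestamp, in seconds
  , kind      :: Int      -- e.g. 0 = profile metadata, 1 = short text note
  , tags      :: [[Text]] -- references to other events, pubkeys, hashtags, ...
  , content   :: Text     -- the note text (or other kind-specific payload)
  , sig       :: Text     -- Schnorr signature over the event id
  }

Every interaction, from posting a note to updating your profile to liking someone else's post, boils down to signing one of these records and publishing it to your relays.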

Getting started

This is where the suboptimal experience really exists for Nostr. It took me a few days to come up with a setup that worked reliably. I'm going to share what worked best for me, but keep in mind that there are many other options. I'm a novice; other guides may give different recommendations and you may not like my selection of tools. My best recommendation: don't end up in shell shock like I did. Set up any kind of a Nostr profile, introduce yourself with the #introductions hashtag, and ask for help. I've found the community unbelievably welcoming.

Alright, so here are the different pieces you're going to need for a full experience:

  • Browser extension for signing
  • Web client
  • Mobile client
  • Lightning wallet
  • A Nostr address

I'm going to give you a set of steps that I hope both provides easy onboarding while still leaving you with the ability to more directly control your Nostr experience in the future.

Lightning wallet: coinos

First, you're going to set up a Lightning wallet. There are a lot of options here, and there are a lot of considerations between ease-of-use, self-custody, and compatibility with other protocols. I tried a bunch. My recommendation: use coinos. It's a custodial wallet (meaning: they control your money and you're trusting them), so don't put any large sums into it. But coinos is really easy to use, and supports Nostr Wallet Connect (NWC). After you set up your account, click on the gear icon, and then click on "Reveal Connection String." You'll want to use that when setting up your clients. Also, coinos gives you a Lightning address, which will be <username>@coinos.io. You'll need that for setting up your profile on Nostr.

Web client: YakiHonne

I tried a bunch of web clients and had problems with almost all of them. I later realized most of my problems seemed to be caused by incorrectly set relays, which we'll discuss below. In any event, I ultimately chose YakiHonne. It also has mobile iOS and Android clients, so you can have a consistent experience. (I also used the Damus iOS client, which is also wonderful.)

Go to the homepage, click on the Login button in the bottom-left, and then choose "Create an account." You can add a profile picture, banner image, choose a display name, and add a short description. In the signup wizard, you'll see an option to let YakiHonne set up a wallet (meaning a Lightning wallet) for you. I chose not to rely on this and used coinos instead to keep more flexibility for swapping clients in the future. If you want to simplify, however, you can just use the built-in wallet.

Before going any further, make sure you back up your nsec secret key!!! Click on your username in the bottom-left, then settings, and then "Your keys." I recommend saving both your nsec and npub values in your password manager.

YakiHonne keys

No, this isn't my actual set of keys; this was a test profile I set up while writing this post.

Within that settings page, click on "wallets," then click on the plus sign next to add wallets, and choose "Nostr wallet connect." Paste the NWC string you got from coinos, and you'll be able to zap people money!

Next, go back to settings and choose "Edit Profile." Under "Lightning address," put your <username>@coinos.io address. Now you'll also be able to receive zaps from others.

Another field you'll notice on the profile is NIP-05. That's your Nostr address. Let's talk about getting that set up.

NIP-05 Nostr address

Remembering a massive npub address is a pain. Instead, you'll want to set up a NIP-05 address. (NIP stands for Nostr Implementation Possibilities; you can see NIP-05 on GitHub.) There are many services—both paid and free—to get a NIP-05 address. You can see a set of services on AwesomeNostr. Personally, I decided to set up an identifier on my own domain. You can see my live nostr.json file, which at the time of writing supports:

  • michael@snoyman.com and an alias snoyberg@snoyman.com
  • The special _@snoyman.com, which actually means "the entire domain itself"
  • And an identifier for my wife as well, miriam@snoyman.com

If you host this file yourself, keep in mind these two requirements:

  • You cannot have any redirects at that URL! If your identifier is name@domain, the URL https://domain/.well-known/nostr.json?name=<name> must resolve directly to this file.
  • You need to set CORS headers appropriately to allow for web client access, specifically the response header access-control-allow-origin: *.
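
Since this is just a static file with a couple of header requirements, any web server or static host will do. Purely as an illustration (and not something from the original post), here is a minimal Haskell sketch using the wai and warp packages that serves .well-known/nostr.json directly, without redirects and with the CORS header web clients need. The port and the on-disk file path are assumptions.

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Types (status200, status404)
import Network.Wai (Application, pathInfo, responseFile, responseLBS)
import Network.Wai.Handler.Warp (run)

-- Serve nostr.json directly (no redirects) with a permissive CORS header.
app :: Application
app req respond =
  case pathInfo req of
    [".well-known", "nostr.json"] ->
      respond $ responseFile status200
        [ ("Content-Type", "application/json")
        , ("Access-Control-Allow-Origin", "*")  -- required for web client access
        ]
        "nostr.json"  -- hypothetical path to the file on disk
        Nothing
    _ ->
      respond $ responseLBS status404 [("Content-Type", "text/plain")] "not found"

main :: IO ()
main = run 8080 app

The ?name=<name> query string can simply be ignored here; returning the full file for every lookup is what static hosting does anyway.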

Once you have that set up, add your Nostr address to your profile on YakiHonne. Note that you'll need the hex version of your npub, which you can generate by using the Nostr Army Knife:

Nostr Army Knife

BONUS I decided to also set up my own Lightning wallet address, by rehosting the Lightning config file from https://coinos.io/.well-known/lnurlp/snoyberg on my domain at https://snoyman.com/.well-known/lnurlp/michael.

Signer

As far as I can tell, Alby provides the most popular Nostr signing browser extension. The only problem I had with it was confusion about all the different things it does. Alby provides a custodial lightning wallet via Alby Hub, plus a mobile Alby Go app for accessing it, plus a browser extension for Nostr signing, and that browser extension supports using both the Alby Hub wallet and some other lightning wallets. I did get it all to work together, and it's a pleasant experience.

nos2x

However, to keep things a bit more simple and single-task focused, I'll recommend trying out the nos2x extension first. It's not pretty, but it handles the signer piece very well. Install the extension, enter the nsec you got from YakiHonne, click save, and you're good to go. If you go to another Nostr client, like noStrudel, you should be able to "sign in with extension."

You may also notice that there's an entry area for "preferred relays." We'll discuss relays next. Feel free to come back to this step and add those relays. (And, after you've done that, you can also use a nostr.json generator to help you self-host your NIP-05 address if you're so inclined.)

Final note: once you've done the initial setup, it's not clear how to get back to the nos2x settings page. Right-click the extension, click manage extension, and then choose "extension options." At least those were the steps in Brave; it may be slightly different in other browsers.

Relays

This has been my biggest pain point with Nostr so far. Everything you do with Nostr needs to be sent to relays or received from relays. You want to have a consistent and relatively broad set of relays to make sure your view of the world is consistent. If you don't, you'll likely end up with things like mismatched profiles across relays, messages that seem to disappear, and more. This was probably my biggest stumbling block when first working with Nostr.

There seem to be three common ways to set the list of relays:

  • Manually entering the relays in the client's settings.
  • Getting the list of relays from the signer extension (covered by NIP-07).
  • Getting the list of relays from your NIP-05 file.

Unfortunately, it looks like most clients don't support the latter two methods, so any time you start using a new client, you should check the relay list and manually sync it with the list of relays you maintain.

You can look at my nostr.json file for my own list of relays. One relay in particular that I was recommended to use is wss://hist.nostr.land. This relay keeps track of your profile and follow-list updates. As I mentioned, it's easy to accidentally partially override your profile information through inconsistent relay lists, and apparently lots of new users (myself included) end up doing this. If you go to hist.nostr.land you can sign in, find your historical updates, and restore old versions.

Mobile

You're now set up on your web experience. For mobile, download any mobile app and set it up similarly to what I described for web. The major difference will be that you'll likely be entering your nsec directly into the mobile app.

I've used both Damus and YakiHonne. I had better luck with YakiHonne for getting zaps working reliably, but that may simply be because I'd tried Damus before I'd gotten set up with coinos. I'll probably try out Damus some more soon.

Note on Damus: I had trouble initially with sending Zaps on Damus, but apparently that's because of Apple rules. You can enable Zaps by visiting this site on your device: https://zap.army/. Thanks to William Cassarin for the guidance and the great app.

Introductions

You should now be fully set up to start interacting on Nostr! As a final step, I recommend you start off by sending an introduction note. This is a short note telling the world a bit about yourself, with the #introductions hashtag. For comparison, here's my introduction note (or a Nostr-native URL).

And in addition, feel free to @ me in a note as well, I'd love to meet other people on Nostr who joined up after reading this post. My identifier is michael@snoyman.com. You can also check out my profile page on njump, which is a great service to become acquainted with.

And now that you're on Nostr, let me share my experiences with the platform so far.

My experience

I'm definitely planning to continue using Nostr. The community has a different feel to my other major social media hub, X, which isn't surprising. There's a lot more discussion of Bitcoin and economics, which I love. There's also, at least subjectively, more of a sense of having fun versus X. I described it as joyscrolling versus doomscrolling.

Nostr is a free speech haven. It's quite literally impossible to fully silence someone. People can theoretically be banned from specific relays, but a banned user could always just use other relays or continue to create new keys. There's no KYC process to stop them. I've only found one truly vile account so far, and it was easy enough to just ignore. This fits very well with my own personal ethos. I'd rather people have a public forum to express any opinion, especially the opinions I most strongly disagree with, including calls to violence. I believe the world is better for allowing these opinions to be shared, debated, and (hopefully) exposed as vapid.

The process of zapping is surprisingly engaging. The amount of money people send around isn't much. The most common zap amount is 21 satoshis, which at the current price of Bitcoin is just about 2 US cents. Unless you become massively popular, you're not going to retire on zaps. But it's far more meaningful to receive a zap than a like; it means someone parted with something of actual value because you made their day just a little bit better. And likewise, zapping someone else has the same feeling. It's also possible to tip providers of clients and other tools, which is a fundamental shift from the advertiser-driven web of today.

I'd love to hear from others about their own experiences! Please reach out with your own findings. Hopefully we'll all be able to push social media into a more open, healthy, and fun direction.

December 17, 2024 12:00 AM

December 16, 2024

GHC Developer Blog

GHC 9.12.1 is now available

GHC 9.12.1 is now available

Zubin Duggal - 2024-12-16

The GHC developers are very pleased to announce the release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns, allowing you to combine multiple pattern clauses into one (see the brief sketch after this list).

  • The MultilineStrings language extension, allowing you to more easily write strings spanning multiple lines in your source code (also shown in the sketch after this list).

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced, allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which greatly improves the determinism of builds (though does not yet make them fully deterministic) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • Experimental support for the RISC-V platform with the native code generator.

  • … and many more
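
To give a flavour of the first two extensions, here is a small, hedged sketch of the kind of code they let you write; consult the GHC 9.12 user's guide for the authoritative syntax and semantics.

{-# LANGUAGE OrPatterns, MultilineStrings #-}

data Direction = North | South | East | West

-- OrPatterns: one clause covers several patterns.
isVertical :: Direction -> Bool
isVertical (North; South) = True
isVertical _              = False

-- MultilineStrings: string literals that span several source lines.
usage :: String
usage =
  """
  usage: tool [--help]
         tool FILE
  """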

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at December 16, 2024 12:00 AM

December 13, 2024

Well-Typed.Com

GHC activities report: September–November 2024

This is the twenty-fifth edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of September to November 2024. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We are delighted to offer Haskell Ecosystem Support Packages to provide commercial users with access to Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem. Clients will both fund the work described in this report and support the Haskell Foundation. If your company is using Haskell, read more about our offer, or get in touch with us today, so we can help you get the most out of the toolchain. We need more funding to continue our essential maintenance work!

Many thanks to our existing sponsors who make this work possible: Anduril and Juspay. In addition, we are grateful to Mercury for funding specific work on improved performance for developer tools on large codebases.

Team

The GHC team at Well-Typed currently consists of Andreas Klebinger, Ben Gamari, Matthew Pickering, Rodrigo Mesquita, Sam Derbyshire and Zubin Duggal. Adam Gundry acts as Secretary to the GHC Steering Committee. Cabal maintenance is undertaken by Mikolaj Konarski, and HLS maintenance by Hannes Siebenhandl and Zubin Duggal. In addition, many others within Well-Typed are contributing to GHC more occasionally.

GHC Releases

Cabal

Cabal-3.14.0.0 was released in September, adding initial support for the new hooks build-type we implemented to replace custom setup scripts, as part of our work for the Sovereign Tech Fund on Cabal long-term maintainability. Corresponding new versions of cabal-install and the release of the Cabal-hooks library are due soon, at which point it will become easier for users to explore replacing their custom setup scripts with the hooks feature. Related to this effort:

  • Sam amended the Cabal-hooks version to match the Cabal library version (#10579).

  • Rodrigo made progress on Sam’s work to simplify the way cabal-install uses the Cabal library to build packages. This will make the code easier to work with in the future, and should improve build performance (#9871).

  • Rodrigo made cabal-install invoke git clone concurrently when downloading from git repositories for source-repository-package stanzas or cabal get, and switched it to use shallow clones by default (#10254). This speeds up the cloning step significantly.

  • Rodrigo’s work on private dependencies (#9743) is slowly making progress, thanks to recent work by Kristen Kozak.

  • Rodrigo and Matthew fixed various minor Cabal bugs, improved documentation and future-proofed against future core libraries changes (#10311, #10404, #10415 and #10433).

HLS

  • Hannes finished off and merged support for a new “jump to instance definition” feature in HLS (#4392), which will make it easier for users to understand which typeclass instance is in use in a particular expression.

GHC

Exception backtraces

Rodrigo worked on improving several facets of the exception backtraces story:

  • Improved the rendering of uncaught exceptions so the default output is much clearer and easier to understand (CLC proposal #285, !13301). This included reformatting the output, reducing duplication, and avoiding exposing internal implementation details in the call stacks.

  • Changed functions such as catch to propagate the original cause if another exception is subsequently thrown (CLC proposal #202).

  • Landed a patch by Ben so that the HasCallStack-based backtrace for an error call is more informative (!12620, #24807).

Overall, exception backtraces will be much more useful in GHC 9.12 and later, making it easier to debug Haskell applications.

Frontend

  • Andreas added new primops, is[Mutable]ByteArrayWeaklyPinned#, which allow checking whether a bytearray can be moved by the RTS, as per CLC proposal #283.

  • Sam fixed a GHC panic involving out-of-scope pattern synonyms (#25056, !13092).

  • Sam augmented the -fdiagnostics-as-json output to include the reason an error or warning was emitted (!13577, #25403).

  • Matthew and Rodrigo deprecated the unused -Wcompat-unqualified-imports warning (!12755, #24904, !13349, #25330).

  • Ben improved the parsing and parser errors for sizes in GHC RTS flags (!12384, #20201).

  • Matthew fixed a bug in the interaction of -working-dir and foreign files (!13196, #25150).

  • Zubin bumped the Haddock binary interface version to 46, to improve errors when there are mismatched interface files (!13342).

SIMD in the NCG backend

  • Sam and Andreas teamed up to finish the mega-MR adding SIMD support to GHC’s X86 native code generator backend (!12860, and see our previous report for more background). This also fixed critical correctness bugs that affected SIMD support in the existing LLVM backend, such as #25062 and #25169.

  • Sam followed this up with user’s guide documentation for the feature (!13380) and a couple of additional fixes for bugs that have been reported since (!13561, !13612).

LLVM backend

  • Sam implemented several fixes relating to the LLVM backend, in collaboration with GHC contributor @aratamizuki:

    • fix bugs involving fltused to ensure that GHC can use the LLVM backend on Windows once more (#22487, !13183),
    • use +sse4.2 instead of +sse42 (#25019),
    • make SSE4.2 imply +popcnt (#25353).
  • Matthew bumped the LLVM upper bound to allow GHC to use LLVM 19 (!13311, #25295).

RISC-V backend

  • Andreas added support for floating-point min/max operations in the RISC-V NCG backend (!13325).

  • Matthew fixed some issues to do with the fact that the RISC-V backend does not yet support SIMD vectors (!13327, #25314, #13327).

Object code determinism

  • Rodrigo merged !12680, which goes 95% of the way towards ensuring GHC produces fully deterministic object code (#12935).

  • Rodrigo made the unique generation used by the LLVM backend deterministic (!13307, #25274), thus making GHC 96% object-code deterministic.

  • Rodrigo ensured that re-exports did not spoil determinism of interface files in !13316 (#25304).

Compiler performance

  • Matthew, Rodrigo and Adam published a proposal for Explicit Level Imports. The proposed language feature will allow users of Template Haskell to communicate more precise dependencies for quotes and splices, which can unlock significant compile-time performance improvements and is a step towards better cross-compilation support.

  • Rodrigo improved the performance of module reachability queries (!13593), which can significantly reduce compile times and memory usage for projects with very large numbers of modules.

  • Andreas introduced a new flag, -fmax-forced-spec-args, to control the maximum size of specialised functions introduced when using the SPEC keyword (!13184, #25197). This avoids a potential compile-time performance cliff caused by specialisations with excessively large numbers of arguments.

  • Matthew greatly reduced the memory footprint of linking with the Javascript backend by making some parts of the compiler more lazy (!13346).

  • Zubin’s proposal to address libc compatibility issues when using semaphores for build parallelism has been making progress through the proposal process.

Runtime system

  • Ben fixed the encoding of breakpoint instructions in the RTS to account for a recent addition of support for inlining breakpoints (!13423, #25374).

  • Ben tightened up the documentation and invariants in the bytecode interpreter (!13565).

  • Ben allowed GNU-style non-executable stack notes to be used on FreeBSD (!13587, #25475).

  • Ben fixed an incorrect EINTR check in the timerfd ticker (!13588, #25477).

  • Zubin ensured that any new cost centres added by freshly-loaded objects are correctly included in the eventlog (!13114, #24148).

  • Ben increased the gen_workspace alignment in the RTS from 64 bytes to 128 bytes in order to prevent false sharing on Apple’s ARMv8 implementation, which uses a cache-line size of 128 bytes (!13594, #25459).

  • Ben removed some incorrect platform-dependent pointer casts in the RTS (!13597).

  • Zubin fixed a segfault when using the non-moving GC with profiling (!13271, #25232).

  • Ben fixed a stack overrun error with i386 adjustors (!13599, #25485).

  • Ben introduced a convenience printIPE debugging function for printing info-provenance table entries (!13614).

  • Andreas fixed a crash that could happen if an exception and the compacting GC aligned in specific ways (#24791, !13640).

Documentation

  • Ben fleshed out missing documentation of the eventlog format in the user’s guide (!13398, #25296).

  • Ben documented the :where GHCi command (!13399, #24509).

  • Andreas clarified the documentation of -fexpose-overloaded-unfoldings (!13286, #24844).

  • Ben documented that GHC coalesces adjacent sub-word size fields of data constructors (!13397).

  • Ben improved the documentation of equality constraints to mention which language extensions are required to use them (!12395, #24127).

Codebase improvements

  • Ben fixed a few warnings in the runtime system, including some FreeBSD specific warnings (!13586).

  • Ben fixed (!13394, #25362) some incomplete pattern matches in GHC.Internal.IO.Windows.Handle that came to light in !13308, a refactor of the desugarer. Andreas also chipped in, squashing some warnings in ghc-heap (!13510).

  • Matthew refactored the partitionByWorkerSize function to avoid spurious pattern-match warning bugs when compiling with -g3 (!13359, #25338).

  • Matthew removed the hs-boot file for Language.Haskell.Syntax.ImpExp and introduced one for GHC.Hs.Doc, which better reflects the intended modular hierarchy of the modules (!13406).

GHC API

  • We are pleased to see that the Haskell Foundation and Tweag are resuming efforts aimed at defining a stable API for GHC.

  • Ben added lookupTHName, a more convenient way to look up a name in a GHC plugin that re-uses Template Haskell lookup functions (!12432, #24741).

  • Zubin made sure that the driverPlugin is correctly run for static plugins (!13199, #25217).

Libraries

  • Andreas allowed unknown FD device types in the setNonBlockingMode function (!13204, #25199), as per CLC proposal #282. This fixes a regression in the hinotify package on GHC 9.10.

  • Andreas made sure that all primops are re-exported in the ghc-experimental package via the module GHC.PrimOps (!13245), and changed the versioning scheme of ghc-experimental to follow GHC versions (!13344, #25289).

  • Ben fixed a performance regression in throw by judicious insertion of noinline (!13275, #25066), as discussed in CLC proposal #290.

  • Matthew unwired the base package to pave the way for a reinstallable base package (!13200).

  • Matthew upgraded GHC’s Unicode support to Unicode 16 (!13514).

  • Rodrigo removed BCO primops from the GHC.Exts re-export, as per CLC proposal #212 (!13211, #25110).

Profiling

  • Matthew enabled late cost-centres by default when the libraries distributed with GHC are built for profiling (!10930, #21732). This greatly improves the resolution of cost centre stacks when profiling.

  • Andreas fixed a bug in which profiling could affect program behaviour by allowing profiling ticks to move past unsafeCoerce# in Core (!13413, #25212).

Build system

  • Ben fixed the configure script incorrectly reporting that subsections-via-symbols is enabled on AArch64/Darwin (!12834, #24962).

  • The first alpha pre-release of 9.12 incorrectly had a version number of 9.12.20241014 instead of 9.12.0.20241014, which broke the expected lexicographic ordering of GHC releases. Ben added a check for the validity of a release GHC version number to prevent such issues in the future (!13456).

  • Matthew allowed GHC to build with happy-2.0.2 (!13318, #25276), and Ben with happy-2.1.2 (!13532, #25438).

  • Andreas made the Hadrian progress messages include the working directory, to allow quickly distinguishing builds when multiple builds are in progress (!13353, #25335).

  • Ben allowed Haddock options to be passed as Hadrian key-value settings (!11006).

Testsuite

  • Ben improved the reporting of certain errors in the testsuite (!13332).

  • Ben ensured performance metrics are collected even when there are untracked files (!13579, #25471).

  • Ben ensured performance metrics are properly pushed for WebAssembly jobs (!13312).

  • Zubin made several improvements to the testsuite, in particular regarding normalisation of tests and fixing a Haddock bug involving files with the same modification time (!13418, !13522).

CI

  • Matthew added a i386 validation job that can be triggered by adding the i386 label on an MR (!13352).

  • Matthew added a mechanism for only triggering certain individual jobs in CI by using the ONLY_JOBS variable (!13350).

  • Matthew fixed some issues of variable inheritance in the ghcup-metadata testing job (!13306).

  • Matthew added Ubuntu 22.04 jobs to CI (!13335).

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at December 13, 2024 12:00 AM

December 12, 2024

Stackage Blog

LTS 23 release for ghc-9.8 and Nightly now on ghc-9.10

Stackage LTS 23 has been released

The Stackage team is happy to announce that Stackage LTS version 23 was finally released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series, which was the longest-lived LTS major release to date (with probable final snapshot lts-22.43).

We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of broad Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.

LTS 23 includes many package changes, and almost 3200 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen. (The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)

If your package is missing from LTS 23 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-23-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.10.1

At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (64 commits).

Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled. You can see all the changes made relative to the last 9.8 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to the 9.10 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.10.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in the Stack and Stackage Matrix room (#haskell-stack:matrix.org) or the Slack channel.

December 12, 2024 07:00 AM

December 11, 2024

Haskell Interlude

Episode 59: Harry Goldstein

Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.

by Haskell Podcast at December 11, 2024 02:00 PM

Philip Wadler

John Longley's Informatics Lecturer Song

From my colleague, John Longley, a treat. 

‘Informatics Lecturer Song 

(Based on Gilbert and Sullivan’s ‘Major General song’) 

John Longley 

I am the very model of an Informatics lecturer,
For educating students you will never find a betterer.
I teach them asymptotics with a rigour that’s impeccable,
I’ll show them how to make their proofs mechanically checkable.
On parsing algorithms I can hold it with the best of them,
With LL(1) and CYK and Earley and the rest of them.
I’ll teach them all the levels of the Chomsky hierarchy…
With a nod towards that Natural Language Processing malarkey.

I’ll summarize the history of the concept of a function,
And I’ll tell them why their Haskell code is ‘really an adjunction’.
In matters mathematical and logical, etcetera,
I am the very model of an Informatics lecturer.

For matters of foundations I’m a genuine fanaticker:
I know by heart the axioms of Principia Mathematica,
I’m quite au fait with Carnap and with Wittgenstein’s Tractatus,
And I’ll dazzle you with Curry, Church and Turing combinators.
I’ll present a proof by Gödel with an algebraic seasoning,
I’ll instantly detect a step of non-constructive reasoning.
I’ll tell if you’re a formalist or logicist or Platonist…
For I’ll classify your topos by the kinds of objects that exist.

I’ll scale the heights of cardinals from Mahlo to extendible,
I’ll find your favourite ordinals and stick them in an n-tuple.
In matters philosophical, conceptual, etcetera,
I am the very essence of an Informatics lecturer.

And right now I’m getting started on my personal computer,
I’ve discovered how to get it talking to the Wifi router.
In Internet and World Wide Web I’ve sometimes had my finger dipped,
And once I wrote a line of code in HTML/Javascript.
[Sigh.] I know I have a way to go to catch up with my students,
But I try to face each lecture with a dash of common prudence.
When it comes to modern tech: if there’s a way to get it wrong, I do!
But that seems to be forgiven if I ply them with a song or two.

So… although my present IT skills are rather rudimentary,
And my knowledge of computing stops around the nineteenth century,
Still, with help from all my colleagues and my audience, etcetera…
I’ll be the very model of an Informatics lecturer.


by Philip Wadler (noreply@blogger.com) at December 11, 2024 11:52 AM

December 10, 2024

Chris Smith 2

When is a call stack not a call stack?

Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.

Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.

CodeWorld Recap

If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.

In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:

  • A few primitive pictures: circles, rectangles, and the like
  • Transformations to rotate, translate, scale, clip, and recolor an image
  • Compositions to overlay and combine multiple pictures into a more complex picture.

Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.
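
For a concrete flavour of this, here is a small hedged sketch using names from CodeWorld's Haskell-mode API (drawingOf, translated, colored, solidCircle, solidRectangle and the & combinator); treat the exact signatures as assumptions rather than a definitive reference:

import CodeWorld

-- The scene is composed declaratively: nothing is rendered until drawingOf
-- draws the final, fully combined Picture value.
scene :: Picture
scene = translated 0 4 (colored red (solidCircle 1))
      & solidRectangle 10 2

main :: IO ()
main = drawingOf scene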

Debugging in CodeWorld

CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.

There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.

  • Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
  • There's also an "inspect" mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it's an overlay of three other pictures: a background and two characters, each of those is transformed in some way, the base picture for the transformation is some other overlay of multiple parts, and so on…). This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts.

Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.

[Embedded video demonstration: https://medium.com/media/7f09408e8411d852516bedb5aab2601c/href]

The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.

This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.
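
To make that idea concrete, here is a minimal sketch of a picture type that captures a call stack at every constructor; this is my own simplified illustration, not CodeWorld's actual Picture definition:

import GHC.Stack (CallStack, HasCallStack, callStack)

-- Hypothetical, simplified picture type: each node remembers the call stack
-- of the code that created it, so an inspector can map any part of the
-- picture back to a source location.
data Pic
  = Circle CallStack Double    -- a circle of a given radius
  | Overlay CallStack Pic Pic  -- one picture drawn on top of another

circle :: HasCallStack => Double -> Pic
circle = Circle callStack

overlay :: HasCallStack => Pic -> Pic -> Pic
overlay = Overlay callStack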

While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.

by Chris Smith at December 10, 2024 10:50 PM

Christopher Allen

Two memory issues from the last two weeks

Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.

by Unknown at December 10, 2024 12:00 AM

December 06, 2024

Well-Typed.Com

Debugging your Haskell application with debuggable

In this blog post we will introduce a new open source Haskell library called debuggable, which provides various utilities designed to make it easier to debug your applications. Some of these are intended for use during actual debugging, others are designed to be a regular part of your application, ready to be used if and when necessary.

Non-interleaved output

Ever see output like this when debugging concurrent applications?

ATnhdi st hiiss  ai sm eas smaegses afgreo mf rtohme  tfhier sste ctohnrde atdh
read
AndT htihsi si si sa  am emsessasgaeg ef rformo mt hteh ef isrescto ntdh rteharde
ad
TAhnids  tihsi sa  imse sas amgees sfargoem  ftrhoem  ftihres ts etchorneda dt
hread

The problem is that concurrent calls to putStrLn can result in interleaved output. To solve this problem, debuggable offers Debug.NonInterleavedIO, which provides variants of putStrLn and friends, as well as trace and its variants, all of which can safely be called concurrently without ever resulting in interleaved output. For example:

import Debug.NonInterleavedIO qualified as NIIO

useDebuggable :: IO ()
useDebuggable = do
    concurrently_
      ( replicateM_ 10 $ do
          NIIO.putStrLn "This is a message from the first thread"
          threadDelay 100_000
      )
      ( replicateM_ 10 $ do
          NIIO.putStrLn "And this is a message from the second thread"
          threadDelay 100_000
      )

If we run this as-is, we will only see

niio output to /tmp/niio2418318-0

on the terminal; inspecting /tmp/niio2418318-0 will show

And this is a message from the second thread
This is a message from the first thread
And this is a message from the second thread
This is a message from the first thread
...

If you want to send the output to a specific file (or /dev/stdout for output to the terminal), you can set the NIIO_OUTPUT environment variable.

Provenance

Provenance is about tracking what was called, when, and where.

Call-sites

Consider the following example:

f1 :: IO ()
f1 = f2

f2 :: HasCallStack => IO ()
f2 = f3

f3 :: HasCallStack => IO ()
f3 = putStrLn $ prettyCallStack callStack

The callstack we get from this example looks something like this:

CallStack (from HasCallStack):
  f3, called at Demo/Provenance.hs:15:6 in ..
  f2, called at Demo/Provenance.hs:12:6 in ..

Callstacks are awesome, and a huge help during debugging, but there are some minor issues with this example:

  • Personally, this has always felt a bit “off by one” to me: the first entry tells us that we are in f3, but we were called from line 15, which is f2; likewise, the second entry in the stack tells us that we are in f2, but we were called from line 12, which is f1. Not a huge deal, but arguably a bit confusing. (See also GHC ticket #25546: Make HasCallStack include the caller.)
  • Somewhat relatedly, when we are in f3, and ask for a CallStack, being told that we are in f3 is not particularly helpful (we knew that already).
  • Finally, it is sometimes useful to have just the “first” entry in the callstack; “we were called from line such and such, which is function so and so”.

For this reason, Debug.Provenance provides a CallSite abstraction

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = g3

g3 :: HasCallStack => IO ()
g3 = print callSite

This outputs:

g2 -> g3 (Demo/CallSite.hs:31:6)

where line 31 is the call to g3 in g2. Due to the (alleged) “off-by-one”, both g2 and g3 must be given a HasCallStack constraint, otherwise we get

{unknown} -> g3 (Demo/CallSite.hs:31:6)

when g2 lacks the constraint, or

{unknown} -> {unknown} ()

when g3 does. There is also a variant callSiteWithLabel, which results in output such as

g2 -> g3 (Demo/CallSite.hs:31:6, "foo")

Invocations

Sometimes we are not so much interested in where we are called from, but how often a certain line in the source is called. Debug.Provenance offers “invocations” to track this:

g1 :: IO ()
g1 = replicateM_ 2 g2

g2 :: HasCallStack => IO ()
g2 = do
    print =<< newInvocation
    replicateM_ 2 g3

g3 :: HasCallStack => IO ()
g3 = print =<< newInvocation

This results in output such as

g2 (Demo/Invocation.hs:30:15) #1
g3 (Demo/Invocation.hs:34:16) #1
g3 (Demo/Invocation.hs:34:16) #2
g2 (Demo/Invocation.hs:30:15) #2
g3 (Demo/Invocation.hs:34:16) #3
g3 (Demo/Invocation.hs:34:16) #4

We see the first call to g2, then the first and second calls to g3, then the second call to g2, and finally the third and fourth calls to g3.

When debugging problems such as deadlocks, it is often useful to insert putStrLn statements like this:

f4 :: IO ()
f4 = do
    putStrLn "f4:1"
    -- f4 does something ..
    putStrLn "f4:2"
    -- f4 does something else ..
    putStrLn "f4:3"

This pattern too can be made a bit simpler by using invocations:

g4 :: HasCallStack => IO ()
g4 = do
    print =<< newInvocation
    -- f4 does something ..
    print =<< newInvocation
    -- f4 does something else ..
    print =<< newInvocation

Resulting in output such as

g4 (Demo/Invocation.hs:48:15) #1
g4 (Demo/Invocation.hs:50:15) #1
g4 (Demo/Invocation.hs:52:15) #1

Scope

The definition of g4 above is still a little clunky, especially if we also want to include other output than just the invocation itself. We can do better:

import Debug.NonInterleavedIO.Scoped qualified as Scoped

g4 :: HasCallStack => IO ()
g4 = do
    Scoped.putStrLn "start"
    -- f4 does something ..
    Scoped.putStrLn "middle"
    -- f4 does something else ..
    Scoped.putStrLn "end"

outputs

[g4 (Demo/Scope.hs:21:5) #1] start
[g4 (Demo/Scope.hs:23:5) #1] middle
[g4 (Demo/Scope.hs:25:5) #1] end

As the name suggests, though, there is more going on here than simply a more convenient API: Debug.Provenance.Scope offers a combinator called scoped for scoping invocations:

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = scoped g3

g3 :: HasCallStack => IO ()
g3 = scoped g4

This results in

[g4 (Demo/Scope.hs:29:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] start
[g4 (Demo/Scope.hs:31:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] middle
[g4 (Demo/Scope.hs:33:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] end

Threads

The counters that are part of an Invocation can be very useful to cross-reference output messages from multiple threads. Continuing with the g4 example we introduced in the section on Scope, suppose we have

concurrent :: IO ()
concurrent = concurrently_ g4 g4

we might get output like this:

[g4 (Demo/Scope.hs:32:5) #1] start
[g4 (Demo/Scope.hs:32:5) #2] start
[g4 (Demo/Scope.hs:34:5) #1] middle
[g4 (Demo/Scope.hs:34:5) #2] middle
[g4 (Demo/Scope.hs:36:5) #1] end
[g4 (Demo/Scope.hs:36:5) #2] end

(where the scheduling between the two threads might be different, of course).

Scope is always thread local, but debuggable provides a way to explicitly inherit the scope of a parent thread in a child thread:

h1 :: IO ()
h1 = h2

h2 :: HasCallStack => IO ()
h2 = scoped h3

h3 :: HasCallStack => IO ()
h3 = scoped $ do
    tid <- myThreadId
    concurrently_
      (inheritScope tid >> g4)
      (inheritScope tid >> g4)

results in

[g4 (Demo/Scope.hs:34:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:34:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:36:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:36:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:38:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end
[g4 (Demo/Scope.hs:38:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end

Callbacks

Suppose we have some functions which take another function, a callback, as argument, and invoke that callback at some point:

f1 :: HasCallStack => (Int -> IO ()) -> IO ()
f1 k = f2 k

f2 :: HasCallStack => (Int -> IO ()) -> IO ()
f2 k = scoped $ k 1

Let’s use this example callback function:

g1 :: HasCallStack => Int -> IO ()
g1 n = g2 n

g2 :: HasCallStack => Int -> IO ()
g2 n = Scoped.putStrLn $ "n = " ++ show n ++ " at " ++ prettyCallStack callStack

and invoke f1 as follows:

withoutDebuggable :: HasCallStack => IO ()
withoutDebuggable = f1 g1

This outputs:

[g2 (Demo/Callback.hs:26:8) #1, f2 (Demo/Callback.hs:20:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:29:24 in ..
    withoutDebuggable, called at Demo.hs:25:36 in ..

Confusingly, this callstack does not include any calls to f1 or f2. This happens because the call to k in f2 does not pass the current CallStack; instead we see the CallStack as it was when we defined g1.

For callbacks like this it is often useful to have two pieces of information: the CallStack that shows how the callback is actually invoked, and the CallSite where the callback was defined. Debug.Provenance.Callback provides a Callback abstraction that does exactly this. A Callback m a b is essentially a function a -> m b, modulo treatment of the CallStack. Let's change f1 and f2 to take a Callback instead:

h1 :: HasCallStack => Callback IO Int () -> IO ()
h1 k = h2 k

h2 :: HasCallStack => Callback IO Int () -> IO ()
h2 k = scoped $ invokeCallback k 1

If we now use this top-level function

useDebuggable :: HasCallStack => IO ()
useDebuggable = h1 (callback g1)

we get a much more useful CallStack:

[g2 (Demo/Callback.hs:26:8) #1, h2 (Demo/Callback.hs:39:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:42:30 in ..
    callbackFn, called at src/Debug/Provenance/Callback.hs:57:48 in ..
    invoking callback defined at useDebuggable (Demo/Callback.hs:42:21), called at ..
    h2, called at Demo/Callback.hs:36:8 in ..
    h1, called at Demo/Callback.hs:42:17 in ..
    useDebuggable, called at Demo.hs:26:36 in ..

Alternative: profiling backtraces

In addition to HasCallStack-style backtraces, there may also be other types of backtraces available, depending on how we build and how we run the code (we discuss some of these in the context of exception handling in episode 29 of the Haskell Unfolder). The most important of these is probably the profiling (cost centre) backtrace.

We can request the “current” callstack with currentCallStack, and the callstack attached to an object (“where was this created”) using whoCreated. This allows us to make distinctions similar to those we made with Callback, for example:

f1 :: (Int -> IO ()) -> IO ()
f1 k = do
    cs <- whoCreated k
    putStrLn $ "f1: invoking callback defined at " ++ show (cs)
    f2 k

f2 :: (Int -> IO ()) -> IO ()
f2 k = k 1

g1 :: Int -> IO ()
g1 n = g2 n

g2 :: Int -> IO ()
g2 n = do
    cs <- currentCallStack
    putStrLn $ "n = " ++ show n ++ " at " ++ show cs

This does require the code to be compiled with profiling enabled. The profiling callstacks are sometimes more useful than HasCallStack callstacks, and sometimes worse; for example, in

demo :: Maybe Int -> IO ()
demo Nothing  = f1 (\x -> g1 x)
demo (Just i) = f1 (\x -> g1 (x + i))

the function defined in the Just case will have a useful profiling callstack, but since the function defined in the Nothing case is entirely static (does not depend on any runtime info), its callstack is reported as

["MAIN.DONT_CARE (<built-in>)"]

It would be useful to extend debuggable with support for both types of backtraces in a future release.

Performance considerations

Adding permanent HasCallStack constraints to functions does come at a slight cost, since they correspond to additional arguments that must be passed at runtime. For most functions this is not a huge deal; personally, I consider some well-placed HasCallStack constraints part of designing with debugging in mind. That said, you will probably want to avoid adding HasCallStack constraints to functions that get called repeatedly in tight inner loops; similar considerations also apply to the use of the Callback abstraction.

Conclusions

Although debuggable is a small library, it offers some functionality that has proven quite useful in debugging applications, especially concurrent ones. We can probably extend it over time to cover more use cases; “design for debuggability” is an important principle, and is made easier with proper library support. Contributions and comments are welcome!

As a side note, the tracing infrastructure of debuggable can also be combined with the recover-rtti package, which implements some dark magic to recover runtime type information by looking at the heap; in particular, it offers

anythingToString :: forall a. a -> String

which can be used to print objects without having a Show a instance available (though this is not the only use of recover-rtti). The only reason that debuggable doesn’t provide explicit support for this is that the dependency footprint of recover-rtti is a bit larger.
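
For instance, a minimal hedged sketch (assuming anythingToString is imported from Debug.RecoverRTTI; the exact rendering of the output may differ):

import Debug.RecoverRTTI (anythingToString)

-- A type that deliberately has no Show instance.
data Config = Config Int Bool

main :: IO ()
main = putStrLn (anythingToString (Config 3 True))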

by edsko at December 06, 2024 12:00 AM

December 04, 2024

Well-Typed.Com

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

Today, 2024-12-04, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 37th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

In this episode of the Haskell Unfolder, we are going to try solving the latest problem of this year’s Advent of Code live.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at December 04, 2024 12:00 AM

December 02, 2024

GHC Developer Blog

GHC 9.8.4 is now available

GHC 9.8.4 is now available

Ben Gamari - 2024-12-02

The GHC developers are happy to announce the availability of GHC 9.8.4. Binary distributions, source distributions, and documentation are available on the release page.

This release is a small release fixing a few issues noted in 9.8.3, including:

  • Update the filepath submodule to avoid a misbehavior of splitFileName under Windows.

  • Update the unix submodule to fix a compilation issue on musl platforms

  • Fix a potential source of miscompilation when building large projects on 32-bit platforms

  • Fix unsound optimisation of prompt# uses

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness, users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Happy compiling!

  • Ben

by ghc-devs at December 02, 2024 12:00 AM

December 01, 2024

Magnus Therning

Servant and a weirdness in Keycloak

When writing a small tool to interface with Keycloak I found an endpoint that requires the content type to be application/json while the body should be plain text. (The details are in the issue.) Since servant assumes that the content type and the content match (I know, I'd always thought that was a safe assumption to make too) it doesn't work with ReqBody '[JSON] Text. Instead I had to create a custom type that's a combination of JSON and PlainText, something that turned out to require surprisingly little code:

data KeycloakJSON deriving (Typeable)

instance Accept KeycloakJSON where
    contentType _ = "application" // "json"

instance MimeRender KeycloakJSON Text where
    mimeRender _ = fromStrict . encodeUtf8
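
For illustration, here is a hedged sketch of how such a content type could appear in a servant API type; the endpoint name and response type below are made up for the example and are not Keycloak's actual API (the sketch assumes the KeycloakJSON type defined above is in scope):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Text (Text)
import Servant.API

-- A made-up endpoint that advertises application/json but carries a plain-text body.
type ExampleAPI =
  "example" :> ReqBody '[KeycloakJSON] Text :> Post '[JSON] NoContent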

The bug has already been fixed in Keycloak, but I'm sure there are other APIs with similar weirdness so maybe this will be useful to someone else.

December 01, 2024 10:00 PM

Christopher Allen

Rebuilding Rust (Leptos) apps quickly

I'm working on a side project that is written in Rust on the backend and the frontend. The frontend component is in Leptos. Our app is about 20kLOC in total, so it takes a little time.

by Unknown at December 01, 2024 12:00 AM

November 29, 2024

Mark Jason Dominus

A complex bug with a ⸢simple⸣ fix

Last month I did a fairly complex piece of systems programming that worked surprisingly well. But it had one big bug that took me a day to track down.

One reason I find the bug interesting is that it exemplifies the sort of challenges that come up in systems programming. The essence of systems programming is that your program is dealing with the state of a complex world, with many independent agents it can't control, all changing things around. Often one can write a program that puts down a wrench and then picks it up again without looking. In systems programming, the program may have to be prepared for the possibility that someone else has come along and moved the wrench.

The other reason the bug is interesting is that although it was a big bug, fixing it required only a tiny change. I often struggle to communicate to nonprogrammers just how finicky and fussy programming is. Nonprogrammers, even people who have taken a programming class or two, are used to being harassed by crappy UIs (or by the compiler) about missing punctuation marks and trivially malformed inputs, and they think they understand how fussy programming is. But they usually do not. The issue is much deeper, and I think this is a great example that will help communicate the point.

The job of my program, called sync-spam, was to move several weeks of accumulated email from system S to system T. Each message was probably spam, but its owner had not confirmed that yet, and the message was not yet old enough to be thrown away without confirmation.

The probably-spam messages were stored on system S in a directory hierarchy with paths like this:

    /spam/2024-10-18/…

where 2024-10-18 was the date the message had been received. Every message system S had received on October 18 was somewhere under /spam/2024-10-18.

One directory, the one for the current date, was "active", and new messages were constantly being written to it by some other programs not directly related to mine. The directories for the older dates never changed. Once sync-spam had dealt with the backlog of old messages, it would continue to run, checking periodically for new messages in the active directory.

The sync-spam program had a database that recorded, for each message, whether it had successfully sent that message from S to T, so that it wouldn't try to send the same message again.

The program worked like this:

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
      1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
      2. Query the database for the list of messages for date D that have not yet been sent to T
      3. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

Okay, very good. The program would first attempt to deal with all the accumulated messages in roughly chronological order, processing the large backlog. Let's say that on November 1 it got around to scanning the active 2024-11-01 directory for the first time. There are many messages, and scanning takes several minutes, so by the time it finishes scanning, some new messages will be in the active directory that it hasn't seen. That's okay. The program will attempt to send the messages that it has seen. The next time it comes around to 2024-11-01 it will re-scan the directory and find the new messages that have appeared since the last time around.

But scanning a date directory takes several minutes, so we would prefer not to do it if we don't have to. Since only the active directory ever changes, if the program is running on November 1, it can be sure that none of the directories from October will ever change again, so there is no point in its rescanning them. In fact, once we have located the messages in a date directory and recorded them in the database, there is no point in scanning it again unless it is the active directory, the one for today's date.

So sync-spam had an elaboration that made it much more efficient. It was able to put a mark on a date directory that meant "I have completely scanned this directory and I know it will not change again". The algorithm was just as I said above, except with these elaborations.

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
        • If the directory for D is marked as having already been scanned, we already know exactly what messages are in it, since they are already recorded in the database.
        • Otherwise:
          1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
          2. If D is not today's date, mark the directory for D as having been scanned completely, because we need not scan it again.
      1. Query the database for the list of messages for date D that have not yet been sent to T
      2. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

It's important to not mark the active directory as having been completely scanned, because new messages are continually being deposited into it until the end of the day.

I implemented this, we started it up, and it looked good. For several days it processed the backlog of unsent messages from September and October, and it successfully sent most of them. It eventually caught up to the active directory for the current date, 2024-11-01, scanned it, and sent most of the messages. Then it went back and started over again with the earliest date, attempting to send any messages that it hadn't sent the first time.

But a couple of days later, we noticed that something was wrong. Directories 2024-11-02 and 2024-11-03 had been created and were well-stocked with the messages that had been received on those dates. The program had found the directories for those dates and had marked them as having been scanned, but there were no messages from those dates in its database.

Now why do you suppose that is?

(Spoilers will follow the horizontal line.)

I investigated this in two ways. First, I made sync-spam's logging more detailed and looked at the results. While I was waiting for more logs to accumulate, I built a little tool that would generate a small, simulated spam directory on my local machine, and then I ran sync-spam against the simulated messages, to make sure it was doing what I expected.

In the end, though, neither of these led directly to my solving the problem; I just had a sudden inspiration. This is very unusual for me. Still, I probably wouldn't have had the sudden inspiration if the information from the logging and the debugging hadn't been percolating around my head. Fortune favors the prepared mind.


The problem was this: some other agent was creating the 2024-11-02 directory a bit prematurely, say at 11:55 PM on November 1.

Then sync-spam came along in the last minutes of November 1 and started its main loop. It scanned the spam directory for available dates, and found 2024-11-02. It processed the unsent messages from the directories for earlier dates, then looked at 2024-11-02 for the first time. And then, at around 11:58, as per above it would:

  1. Scan the directory for 2024-11-02 and find the messages in it. Add to the database any messages not already recorded there.

There weren't any yet, because it was still 11:58 on November 1.

  2. If 2024-11-02 is not today's date, mark the directory as having been scanned completely, because we need not scan it again.

Since the 2024-11-02 directory was not the one for today's date — it was still 11:58 on November 1 — sync-spam recorded that it had scanned that directory completely and need not scan it again.

Five minutes later, at 00:03 on November 2, there would be new messages in the 2024-11-02 directory, which was now the active directory, but sync-spam wouldn't look for them, because it had already marked 2024-11-02 as having been scanned completely.

This complex problem in this large program was completely fixed by changing:

        if ($date ne $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

to:

        if ($date lt $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

(ne and lt are Perl-speak for "not equal to" and "less than".)

Many organizations have their own version of a certain legend, which tells how a famous person from the past was once called out of retirement to solve a technical problem that nobody else could understand. I first heard the General Electric version of the legend, in which Charles Proteus Steinmetz was called out of retirement to figure out why a large complex of electrical equipment was not working.

In the story, Steinmetz walked around the room, looking briefly at each of the large complicated machines. Then, without a word, he took a piece of chalk from his pocket, marked one of the panels, and departed. When the puzzled engineers removed that panel, they found a failed component, and when that component was replaced, the problem was solved.

Steinmetz's consulting bill for $10,000 arrived the following week. Shocked, the bean-counters replied that $10,000 seemed an exorbitant fee for making a single chalk mark, and, hoping to embarrass him into reducing the fee, asked him to itemize the bill.

Steinmetz returned the itemized bill:

One chalk mark $1.00
Knowing where to put it $9,999.00
TOTAL $10,000.00

This felt like one of those times. Any day when I can feel a connection with Charles Proteus Steinmetz is a good day.

This episode also makes me think of the following variation on an old joke:

A: Ask me what is the most difficult thing about systems programming.

B: Okay, what is the most difficult thing ab—

A: TIMING!

by Mark Dominus (mjd@plover.com) at November 29, 2024 03:11 PM

GHC Developer Blog

GHC 9.12.1-rc1 is now available

GHC 9.12.1-rc1 is now available

Zubin Duggal - 2024-11-29

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC's release status, including planned future releases, can be found on the GHC Wiki's status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 29, 2024 12:00 AM

November 28, 2024

Christopher Allen

The cost of hosting is too damn high

I recently migrated a side project from DigitalOcean to some dedicated servers. I thought that I would offer some context and examples for why.

by Unknown at November 28, 2024 12:00 AM

November 27, 2024

Brent Yorgey

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Posted on November 27, 2024

Suppose we have a list of items of length \(n\), and we want to consider windows (i.e. contiguous subsequences) of width \(w\) within the list.

A list of numbers, with contiguous size-3 windows highlighted

We can compute the sum of each window by brute force in \(O(nw)\) time, by simply generating the list of all the windows and then summing each. But, of course, we can do better: keep track of the sum of the current window; every time we slide the window one element to the right we can add the new element that enters the window on the right and subtract the element that falls off the window to the left. Using this “sliding window” technique, we can compute the sum of every window in only \(O(n)\) total time instead of \(O(nw)\).
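
Here is a minimal sketch of that sliding-sum idea (my own illustration, not code from this post; slidingSums is a name introduced here, and the sketch assumes the list has at least \(w\) elements):

-- O(n) sliding-window sums: keep a running sum, adding the element that
-- enters the window on the right and subtracting the one that leaves on the left.
slidingSums :: Num a => Int -> [a] -> [a]
slidingSums w xs = scanl step (sum start) (zip xs rest)
  where
    (start, rest) = splitAt w xs
    step s (old, new) = s - old + new

For example, slidingSums 3 [1,4,2,8,9,4,4,6] evaluates to [7,14,19,21,17,14].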

How about finding the maximum of every window? Of course the brute force \(O(nw)\) algorithm still works, but doing it in only \(O(n)\) is considerably trickier! We can’t use the same trick as we did for sums since there’s no way to “subtract” the element falling off the left. This really comes down to the fact that addition forms a group (i.e. a monoid-with-inverses), but max does not. So more generally, the question is: how can we compute a monoidal summary for every window in only \(O(n)\) time?

Today I want to show you how to solve this problem using one of my favorite competitive programming tricks, which fits beautifully in a functional context. Along the way we’ll also see how to implement simple yet efficient functional queues.

Stacks

Before we get to queues, we need to take a detour through stacks. Stacks in Haskell are pretty boring. We can just use a list, with the front of the list corresponding to the top of the stack. However, to make things more interesting—and because it will come in very handy later—we’re going to implement monoidally-annotated stacks. Every element on the stack will have a measure, which is a value from some monoid m. We then want to be able to query any stack for the total of all the measures in \(O(1)\). For example, perhaps we want to always be able to find the sum or max of all the elements on a stack.

If we wanted to implement stacks annotated by a group, we could just do something like this:

data GroupStack g a = GroupStack (a -> g) !g [a]

That is, a GroupStack stores a measure function, which assigns to each element of type a a measure of type g (which is intended to be a Group); a value of type g representing the sum (via the group operation) of measures of all elements on the stack; and the actual stack itself. To push, we would just compute the measure of the new element and add it to the cached g value; to pop, we subtract the measure of the element being popped, something like this:

push :: a -> GroupStack g a -> GroupStack g a
push a (GroupStack f g as) = GroupStack f (f a <> g) (a:as)

pop :: GroupStack g a -> Maybe (a, GroupStack g a)
pop (GroupStack f g as) = case as of
  [] -> Nothing
  (a:as') -> Just (a, GroupStack f (inv (f a) <> g) as')

But this won’t work for a monoid, of course. The problem is pop, where we can’t just subtract the measure for the element being popped. Instead, we need to be able to restore the measure of a previous stack. Hmmm… sounds like we might be able to use… a stack! We could just store a stack of measures alongside the stack of elements; even better is to store a stack of pairs. That is, each element on the stack is paired with an annotation representing the sum of all the measures at or below it. Here, then, is our representation of monoidally-annotated stacks:

{-# LANGUAGE BangPatterns #-}

module Stack where

data Stack m a = Stack (a -> m) !Int [(m, a)]

A Stack m a stores three things:

  1. A measure function of type a -> m. (Incidentally, what if we want to be able to specify an arbitrary measure for each element, and even give different measures to the same element at different times? Easy: just use (m,a) pairs as elements, and use fst as the measure function.)

  2. An Int representing the size of the stack. This is not strictly necessary, especially since one could always just use a monoidal annotation to keep track of the size; but wanting the size is so ubiquitous that it seems convenient to just include it as a special case.

  3. The aforementioned stack of (annotation, element) pairs.

Note that we cannot write a Functor instance for Stack m, since a occurs contravariantly in (a -> m). But this makes sense: if we change all the a values, the cached measures would no longer be valid.

When creating a new, empty stack, we have to specify the measure function; to get the measure of a stack, we just look up the measure on top, or return mempty for an empty stack.

new :: (a -> m) -> Stack m a
new f = Stack f 0 []

size :: Stack m a -> Int
size (Stack _ n _) = n

measure :: Monoid m => Stack m a -> m
measure (Stack _ _ as) = case as of
  [] -> mempty
  (m, _) : _ -> m

Now let’s implement push and pop. Both are relatively straightforward.

push :: Monoid m => a -> Stack m a -> Stack m a
push a s@(Stack f n as) = Stack f (n + 1) ((f a <> measure s, a) : as)

pop :: Stack m a -> Maybe (a, Stack m a)
pop (Stack f n as) = case as of
  [] -> Nothing
  (_, a) : as' -> Just (a, Stack f (n - 1) as')

Note that if we care about using non-commutative monoids, in the implementation of push we have a choice to make between f a <> measure s and measure s <> f a. The former seems nicer to me, since it keeps the measures “in the same order” as the list representing the stack. For example, if we push a list of elements onto a stack via foldr, using the measure function (:[]) that injects each element into the monoid of lists, the resulting measure is just the original list:

measure . foldr push (new (:[])) == id

And more generally, for any measure function f, we have

measure . foldr push (new f) == foldMap f
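
For example, with the list measure we can spot-check the first of these laws (my own example; the result follows directly from the law stated above):

measure (foldr push (new (:[])) [1,2,3])  -- evaluates to [1,2,3]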

Finally, we are going to want a function to reverse a stack, which is a one-liner:

reverse :: Monoid m => Stack m a -> Stack m a
reverse (Stack f _ as) = foldl' (flip push) (new f) (map snd as)

That is, to reverse a stack, we extract the elements and then use foldl' to push the elements one at a time onto a new stack using the same measure function.

There is a bit more code you can find on GitHub, such as Show and Eq instances.

Queues

Now that we have monoidally-annotated stacks under our belt, let’s turn to queues. And here’s where my favorite trick is revealed: we can implement a queue out of two stacks, so that enqueue and dequeue run in \(O(1)\) amortized time; and if we use monoidally-annotated stacks, we get monoidally-annotated queues for free!

First, some imports.

{-# LANGUAGE ImportQualifiedPost #-}

module Queue where

import Data.Bifunctor (second)
import Stack (Stack)
import Stack qualified as Stack

A Queue m a just consists of two stacks, one for the front and one for the back. To create a new queue, we just create two new stacks; to get the size of a queue, we just add the sizes of the stacks; to get the measure of a queue, we just combine the measures of the stacks. Easy peasy.

type CommutativeMonoid = Monoid

data Queue m a = Queue {getFront :: Stack m a, getBack :: Stack m a}
  deriving (Show, Eq)

new :: (a -> m) -> Queue m a
new f = Queue (Stack.new f) (Stack.new f)

size :: Queue m a -> Int
size (Queue front back) = Stack.size front + Stack.size back

measure :: CommutativeMonoid m => Queue m a -> m
measure (Queue front back) = Stack.measure front <> Stack.measure back

Note the restriction to commutative monoids, since the queue elements are stored in different orders in the front and back stacks. If we really cared about making this work with non-commutative monoids, we would have to make two different push methods for the front and back stacks, to combine the measures in opposite orders. That just doesn’t seem worth it. But if you have a good example requiring the use of a queue annotated by a non-commutative monoid, I’d love to hear it!

Now, to enqueue, we just push the new element on the back:

enqueue :: CommutativeMonoid m => a -> Queue m a -> Queue m a
enqueue a (Queue front back) = Queue front (Stack.push a back)

Dequeueing is the magic bit that makes everything work. If there are any elements in the front stack, we can just pop from there. Otherwise, we need to first reverse the back stack into the front stack. This means dequeue may occasionally take \(O(n)\) time, but it's still \(O(1)\) amortized. (The easiest way to see this is to note that every element is touched exactly three times: once when it is pushed on the back; once when it is transferred from the back to the front; and once when it is popped from the front. So, overall, we do \(O(1)\) work per element.)

dequeue :: CommutativeMonoid m => Queue m a -> Maybe (a, Queue m a)
dequeue (Queue front back)
  | Stack.size front == 0 && Stack.size back == 0 = Nothing
  | Stack.size front == 0 = dequeue (Queue (Stack.reverse back) front)
  | otherwise = second (\front' -> Queue front' back) <$> Stack.pop front

Finally, for convenience, we can make a function drop1 which just dequeues an item from the front of a queue and throws it away.

drop1 :: CommutativeMonoid m => Queue m a -> Queue m a
drop1 q = case dequeue q of
  Nothing -> q
  Just (_, q') -> q'

This “banker’s queue” method of building a queue out of two stacks is discussed in Purely Functional Data Structures by Okasaki, though I don’t think he was the first to come up with the idea. It’s also possible to use some clever tricks to make both enqueue and dequeue take \(O(1)\) time in the worst case. In a future post I’d like to do some benchmarking to compare various queue implementations (i.e. banker’s queues, Data.Sequence, circular array queues built on top of STArray). At least anecdotally, in solving some sliding window problems, banker’s queues seem quite fast so far.

Sliding windows

I hope you can see how this solves the initial motivating problem: to find e.g. the max of a sliding window, we can just put the elements in a monoidally-annotated queue, enqueueing and dequeueing one element every time we slide the window over. (More generally, of course, it doesn't even matter if the left and right ends of the window stay exactly in sync; we can enqueue and dequeue as many times as we want.)

The following windows function computes the monoidal sum foldMap f window for each window of width \(w\), in only \(O(n)\) time overall.

windows :: CommutativeMonoid m => Int -> (a -> m) -> [a] -> [m]
windows w f as = go startQ rest
 where
  (start, rest) = splitAt w as
  startQ = foldl' (flip enqueue) (new f) start

  go q as =
    measure q : case as of
      [] -> []
      a : as -> go (enqueue a (drop1 q)) as

“But…maximum and minimum do not form monoids, only semigroups!” I hear you cry. Well, we can just adjoin special positive or negative infinity elements as needed, like so:

data Max a = NegInf | Max a deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  NegInf <> a = a
  a <> NegInf = a
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = NegInf

data Min a = Min a | PosInf deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Min a) where
  PosInf <> a = a
  a <> PosInf = a
  Min a <> Min b = Min (min a b)

instance Ord a => Monoid (Min a) where
  mempty = PosInf

Now we can write, for example, windows 3 Max [1,4,2,8,9,4,4,6] which yields [Max 4, Max 8, Max 9, Max 9, Max 9, Max 6], the maximums of each 3-element window.

Challenges

If you’d like to try solving some problems using the techniques from this blog post, I can recommend the following (generally in order of difficulty):

In a future post I'll walk through my solution to Hockey Fans. And here are another couple of problems along similar lines; unlike the previous problems, I am not so sure how to solve these in a nice way. I may write about them in the future.


by Brent Yorgey at November 27, 2024 12:00 AM

November 21, 2024

Tweag I/O

GHC's wasm backend now supports Template Haskell and ghci

Two years ago I wrote a blog post to announce that the GHC wasm backend had been merged upstream. I’ve been too lazy to write another blog post about the project since then, but rest assured, the project hasn’t stagnated. A lot of improvements have happened after the initial merge, including but not limited to:

  • Many, many bugfixes in the code generator and runtime, witnessed by the full GHC testsuite for the wasm backend in upstream GHC CI pipelines. The GHC wasm backend is much more robust these days compared to the GHC-9.6 era.
  • The GHC wasm backend can be built and tested on macOS and aarch64-linux hosts as well.
  • Earlier this year, I landed the JSFFI feature for wasm. This lets you call JavaScript from Haskell and vice versa, with seamless integration of JavaScript async computation and Haskell’s green threading concurrency model. This allows us to support Haskell frontend frameworks like reflex & miso, and we have an example repo to demonstrate that.

And…the GHC wasm backend finally supports Template Haskell and ghci!

Show me the code!

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Or if you prefer the non-Nix workflow:

$ curl https://gitlab.haskell.org/haskell-wasm/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/terrorjack/.ghc-wasm.
Run 'source /home/terrorjack/.ghc-wasm/env' to add tools to your PATH.
$ . ~/.ghc-wasm/env
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Both the Nix and non-Nix installation methods default to GHC HEAD, for which binary artifacts for Linux and macOS hosts, for both x86_64 and aarch64, are provided. The Linux binaries are statically linked so they should work across a wide range of Linux distros.

If you take a look at htop, you'll notice wasm32-wasi-ghc spawns a node child process. That's the “external interpreter” process that runs our Template Haskell (TH) splice code as well as ghci bytecode. We'll get to what this “external interpreter” is about later; just keep in mind that whatever code is typed into this ghci session is executed on the wasm side, not on the native side.

Now let’s run some code. It’s been six years since I published the first blog post when I joined Tweag and worked on a prototype compiler codenamed “Asterius”; the first Haskell program I managed to compile to wasm was fib, time to do that again:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib 0 = 0
ghci| fib 1 = 1
ghci| fib n = fib (n - 2) + fib (n - 1)
ghci| :}
ghci> fib 10
55

It works, though with O(2^n) time complexity. It's easy to do an O(n) version, using the canonical Haskell fib implementation based on a lazy infinite list:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib = (fibs !!)
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> fib 32
2178309

That's still boring, isn't it? Now buckle up, we're gonna do an O(1) implementation… using Template Haskell!

ghci> import Language.Haskell.TH
ghci> :{
ghci| genFib :: Int -> Q Exp
ghci| genFib n =
ghci|   pure $
ghci|     LamCaseE
ghci|       [ Match (LitP $ IntegerL $ fromIntegral i) (NormalB $ LitE $ IntegerL r) []
ghci|       | (i, r) <- zip [0 .. n] fibs
ghci|       ]
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> :set -XTemplateHaskell
ghci> :{
ghci| fib :: Int -> Int
ghci| fib = $(genFib 32)
ghci| :}
ghci> fib 32
2178309

Joking aside, the real point is not about how to implement fib, but rather to demonstrate that the GHC wasm backend indeed supports Template Haskell and ghci now.

Here’s a quick summary of wasm’s TH/ghci support status:

  • The patch has landed in the GHC master branch and will be present in upstream release branches starting from ghc-9.12. I also maintain non-official backport branches in my fork, and wasm TH/ghci has been backported to 9.10 as well. The GHC release branch bindists packaged by ghc-wasm-meta are built from my branches.
  • TH splices that involve only pure computation (e.g. generating class instances) work. Simple file I/O also works, so file-embed works. Side effects are limited to those supported by WASI, so packages like gitrev won’t work because you can’t spawn subprocesses in WASI. The same restrictions apply to ghci.
  • Our wasm dynamic linker can load bytecode and compiled code, but the only form of compiled code it can load are wasm shared libraries. If you’re using wasm32-wasi-ghc directly to compile code that involves TH, make sure to pass -dynamic-too to ensure the dynamic flavour of object code is also generated. If you’re using wasm32-wasi-cabal, make sure shared: True is present in the global config file ~/.ghc-wasm/.cabal/config.
  • The wasm TH/ghci feature requires at least cabal-3.14 to work (the wasm32-wasi-cabal shipped in ghc-wasm-meta is based on the correct version).
  • Our novel JSFFI feature also works in ghci! You can type foreign import javascript declarations directly into a ghci session, use that to import sync/async JavaScript functions, and even export Haskell functions as JavaScript ones (a hedged sketch follows this list).
  • If you have c-sources/cxx-sources in a cabal package, those can be linked and run in TH/ghci out of the box. However, more complex forms of C/C++ foreign library dependencies like pkgconfig-depends, extra-libraries, etc. will require special care to build both static and dynamic flavours of those libraries.
  • For ghci, hot reloading and basic REPL functionality works, but the ghci debugger doesn’t work yet.
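
To give a flavour of the JSFFI point above, here is a hedged sketch of what such a ghci session might look like; the module providing the marshalling helpers (assumed here to be GHC.Wasm.Prim with JSString and toJSString) and the exact details may differ depending on your GHC version:

ghci> import GHC.Wasm.Prim  -- assumed home of JSString/toJSString
ghci> :{
ghci| foreign import javascript unsafe "console.log($1)"
ghci|   js_log :: JSString -> IO ()
ghci| :}
ghci> js_log (toJSString "hello from Haskell on wasm")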

What happens under the hood?

For the curious mind, -opti-v can be passed to wasm32-wasi-ghc. This tells GHC to pass -v to the external interpreter, so the external interpreter will print all messages passed between it and the host GHC process:

$ wasm32-wasi-ghc --interactive -opti-v
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
GHC iserv starting (in: {handle: <file descriptor: 2147483646>}; out: {handle: <file descriptor: 2147483647>})
[             dyld.so] reading pipe...
[             dyld.so] discardCtrlC
...
[             dyld.so] msg: AddLibrarySearchPath ...
...
[             dyld.so] msg: LoadDLL ...
...
[             dyld.so] msg: LookupSymbol "ghczminternal_GHCziInternalziBase_thenIO_closure"
[             dyld.so] writing pipe: Just (RemotePtr 2950784)
...
[             dyld.so] msg: CreateBCOs ...
[             dyld.so] writing pipe: [RemoteRef (RemotePtr 33)]
...
[             dyld.so] msg: EvalStmt (EvalOpts {useSandboxThread = True, singleStep = False, breakOnException = False, breakOnError = False}) (EvalApp (EvalThis (RemoteRef (RemotePtr 34))) (EvalThis (RemoteRef (RemotePtr 33))))
4
[             dyld.so] writing pipe: EvalComplete 15248 (EvalSuccess [RemoteRef (RemotePtr 36)])
...

Why is any message passing involved in the first place? There's a past blog post which contains an overview of cross-compilation issues in Template Haskell; most of the points still hold today, and apply to both TH and ghci. To summarise:

  • When GHC cross compiles and evaluates a TH splice, it has to load and run code that’s compiled for the target platform. Compiling both host/target code and running host code for TH is never officially supported by GHC/Cabal.
  • The “external interpreter” runs on the target platform and handles target code. Messages are passed between the host GHC and the external interpreter, so GHC can tell the external interpreter to load stuff, and the external interpreter can send queries back to GHC when running TH splices.

In the case of wasm, the core challenge is dynamic linking: to be able to interleave code loading and execution at run-time, all while sharing the same program state. Back when I worked on Asterius, it could only link a self-contained wasm module that wasn’t able to share any code/data with other Asterius-linked wasm modules at run-time.

So I went with a hack: when compiling each single TH splice, just link a temporary wasm module and run it, get the serialized result and throw it away! That completely bypasses the need to make a wasm dynamic linker. Needless to say, it’s horribly slow and doesn’t support cross-splice state or ghci. Though it is indeed sufficient to support compiling many packages that use TH.

Now it’s 2024, time to do it the right way: implement our own wasm dynamic linker! Some other toolchains like emscripten also support dynamic linking of wasm, but there’s really no code to borrow here: each wasm dynamic linker is tailored to that toolchain’s specific needs, and we have JSFFI-related custom sections in our wasm code that can’t be handled by other linkers anyway.

Our wasm dynamic linker supports loading exactly one kind of wasm module: wasm shared libraries. This is something that you get by compiling C with wasm32-wasi-clang -shared, which enables generation of position-independent code. Such machine code can be placed anywhere in the address space, making it suitable for run-time code loading. A wasm shared library is yet another wasm module; it imports the linear memory and function table, and you can specify any base address for memory data and functions.

So I rolled up my sleeves and got to work. Below is a summary of the journey I took towards full TH & ghci support in the GHC wasm backend:

  • Step one was to have a minimal NodeJS script to load libc.so: it is the bottom of all shared library dependencies, the first and most important one to be loaded. It took me many cans of energy drink to debug mysterious memory corruptions! But finally I could invoke any libc function and do malloc/free, etc. from the NodeJS REPL, with the wasm instance state properly persisted.
  • Then came loading multiple shared libraries, up to libc++.so, and running simple C++ snippets compiled to .so. Dependency management logic for shared libraries was added at this step: the dynamic linker traverses the dependency tree of a .so, spawns async WebAssembly.compile tasks, then sequentially loads the dynamic libraries in topological order.
  • Then I had to figure out a way to emit wasm position-independent code from the GHC wasm backend’s native code generator. The native code generator emits a .s assembly file for the target platform, and while the assembly format for x86_64, aarch64, etc. is widely taught, there’s really no tutorial or blog post to teach me the assembly syntax for wasm! Luckily, learning from Godbolt output examples was easy enough and I quickly figured out how the position-independent entities are represented in the assembly syntax.
  • The dynamic linker can now load the Haskell ghci shared library! It contains the default implementation of the external interpreter; it almost worked out of the box, though the linker needed some special logic to handle the message piping across wasm/JS and the host GHC process.
  • In ghci, the logic to load libraries, look up symbols, etc. calls into the RTS linker on other platforms. Since for wasm all of that logic lives on the JS side instead of in C, these code paths are patched to call back into the linker using JSFFI imports.
  • The GHC build system and driver needed quite a few adjustments, to ensure that shared libraries are generated for the wasm target when TH/ghci is involved. Thanks to Matthew Pickering for his patient and constructive review of my patch, I was able to replace many hacks in the GHC driver with more principled approaches.
  • The GHC driver also needed to learn to handle the wasm flavour of the external interpreter. Thanks to the prior work of the JS backend team here, my life was a lot easier when adding the wasm external interpreter logic.
  • The GHC testsuite also needed quite a bit of work. In the end, there are over 1000 new test case passes after flipping on TH/ghci support for the wasm target.

What comes next?

The GHC wasm backend TH/ghci feature is way faster and more robust than what I hacked together in Asterius back then. One nice example I’d like to show off here is pandoc-wasm: it’s finally possible to compile our beloved pandoc tool to wasm again, something that hasn’t been possible since Asterius was deprecated.

The new pandoc-wasm is more performant not only at run-time, but also at compile-time. On a GitHub-hosted runner with just 4 CPU cores and 16 GB of memory, it takes around 16 min to compile pandoc from scratch, and that time can even be halved on my own laptop, with peak memory usage at around 10.8 GB. I wouldn’t be surprised if time and memory usage tripled or more when compiling the same codebase with legacy GHC-based compilers like Asterius or GHCJS!

The work on wasm TH/ghci is not fully finished yet. I do have some things in mind to work on next:

  • Support running the wasm external interpreter in the browser via puppeteer. So your ghci session can connect to the browser, all your Haskell code runs in the browser main thread, and all JSFFI logic in your code can access the browser’s window context. This would allow you to do Haskell frontend livecoding using ghci.
  • Support running an interactive ghci session within the browser, which would mean a truly client-side Haskell playground. It’ll only support in-memory bytecode, since it can’t invoke compiler processes to do any heavy lifting, but it’s still good for teaching purposes.
  • Maybe make it even faster? Performance isn’t my concern right now, though I haven’t done any serious profiling and optimization in the wasm dynamic linker either, so we’ll see.
  • Fix ghci debugger support.

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend. Do get in touch if you feel it is useful to your project!

November 21, 2024 12:00 AM

November 18, 2024

Haskell Interlude

58: ICFP 2024

In this episode, Matti and Sam traveled to the International Conference on Functional Programming (ICFP 2024) in Milan, Italy, and recorded snippets with various participants, including keynote speakers, Haskell legends, and organizers.

by Haskell Podcast at November 18, 2024 04:00 PM

Brent Yorgey

Competitive Programming in Haskell: Union-Find, part II

Posted on November 18, 2024

In my previous post I explained how to implement a reasonably efficient union-find data structure in Haskell, and challenged you to solve a couple Kattis problems. In this post, I will (1) touch on a few generalizations brought up in the comments of my last post, (2) go over my solutions to the two challenge problems, and (3) briefly discuss generalizing the second problem’s solution to finding max-edge decompositions of weighted trees.

Generalizations

Before going on to explain my solutions to those problems, I want to highlight some things from a comment by Derek Elkins and a related blog post by Philip Zucker. The first is that instead of (or in addition to) annotating each set with a value from a commutative semigroup, we can also annotate the edges between nodes with elements from a group (or, more generally, a groupoid). The idea is that each edge records some information about, or evidence for, the relationship between the endpoints of the edge. To compute information about the relationship between two arbitrary nodes in the same set, we can compose elements along the path between them. This is a nifty idea—I have never personally seen it used for a competitive programming problem, but it probably has been at some point. (It kind of makes me want to write such a problem!) And of course it has “real” applications beyond competitive programming as well. I have not actually generalized my union-find code to allow edge annotations; I leave it as an exercise for the reader.

The other idea to highlight is that instead of thinking in terms of disjoint sets, what we are really doing is building an equivalence relation, which partitions the elements into disjoint equivalence classes. In particular, we do this by incrementally building a relation \(R\), where the union-find structure represents the reflexive, transitive, symmetric closure of \(R\). We start with the empty relation \(R\) (whose reflexive, transitive, symmetric closure is the discrete equivalence relation, with every element in its own equivalence class); every \(\mathit{union}(x,y)\) operation adds \((x,y)\) to \(R\); and the \(\mathit{find}(x)\) operation computes a canonical representative of the equivalence class of \(x\). In other words, given some facts about which things are related to which other things (possibly along with some associated evidence), the union-find structure keeps track of everything we can infer from the given facts and the assumption that the relation is an equivalence.

Finally, through the comments I also learned about other potentially-faster-in-practice schemes for doing path compression such as Rem’s Algorithm; I leave it for future me to try these out and see if they speed things up.

Now, on to the solutions!

Duck Journey

In Duck Journey, we are essentially given a graph with edges labelled by bitstrings, where edges along a path are combined using bitwise OR. We are then asked to find the greatest possible value of a path between two given vertices, assuming that we are allowed to retrace our steps as much as we want. (Incidentally, if we are not allowed to retrace our steps, this problem probably becomes NP-hard.)

If we can retrace our steps, then on our way from A to B we might as well visit every edge in the entire connected component, so this problem is not really about path-finding at all. It boils down to two things: (1) being able to quickly test whether two given vertices are in the same connected component or not, and (2) computing the bitwise OR of all the edge labels in each connected component.

One way to solve this would be to first use some kind of graph traversal, like DFS, to find the connected components and build a map from vertices to component labels; then partition the edges by component and take the bitwise OR of all the edge weights in each component. To answer queries we could first look up the component label of the two vertices; if the labels are the same then we look up the total weight for that component.

This works, and is in some sense the most “elementary” solution, but it requires building some kind of graph data structure, storing all the edges in memory, doing the component labelling via DFS and building another map, and so on. An alternative solution is to use a union-find structure with a bitstring annotation for each set: as we read in the edges from the input, we simply union the endpoints of each edge, and then update the bitstring for the resulting equivalence class with the bitstring for the edge. If we take a union-find library as given, this solution seems simpler to me.

First, some imports and the top-level main function. (See here for the ScannerBS module.)

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Control.Category ((>>>))
import Control.Monad.ST
import Data.Bits
import Data.ByteString.Lazy.Char8 (ByteString)
import Data.ByteString.Lazy.Char8 qualified as BS

import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner tc >>> solve >>> format

format :: [Maybe Int] -> ByteString
format = map (maybe "-1" (show >>> BS.pack)) >>> BS.unlines

Next, some data types to represent the input, and a Scanner to read it.

-- Each edge is a "filter" represented as a bitstring stored as an Int.
newtype Filter = Filter Int
  deriving (Eq, Show)

instance Semigroup Filter where
  Filter x <> Filter y = Filter (x .|. y)

filterSize :: Filter -> Int
filterSize (Filter f) = popCount f

data Channel = Channel UF.Node UF.Node Filter deriving (Eq, Show)
data TC = TC {n :: !Int, channels :: [Channel], queries :: [(Int, Int)]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  m <- int
  q <- int
  channels <- m >< (Channel <$> int <*> int <*> (Filter <$> int))
  queries <- q >< pair int int
  return TC {..}

Finally, here’s the solution itself: process each channel with a union-find structure, then process queries. The annoying thing, of course, is that this all has to be in the ST monad, but other than that it’s quite straightforward.

solve :: TC -> [Maybe Int]
solve TC {..} = runST $ do
  uf <- UF.new (n + 1) (Filter 0)
  mapM_ (addChannel uf) channels
  mapM (answer uf) queries

addChannel :: UF.UnionFind s Filter -> Channel -> ST s ()
addChannel uf (Channel a b f) = do
  UF.union uf a b
  UF.updateAnn uf a f

answer :: UF.UnionFind s Filter -> (Int, Int) -> ST s (Maybe Int)
answer uf (a, b) = do
  c <- UF.connected uf a b
  case c of
    False -> pure Nothing
    True -> Just . filterSize <$> UF.getAnn uf a

Inventing Test Data

In Inventing Test Data, we are given a tree \(T\) with integer weights on its edges, and asked to find the minimum possible weight of a complete graph for which \(T\) is the unique minimum spanning tree (MST).



Let \(e = (x,y)\) be some edge which is not in \(T\). There must be a unique path between \(x\) and \(y\) in \(T\) (so adding \(e\) to \(T\) would complete a cycle); let \(m\) be the maximum weight of the edges along this path. Then I claim that we must give edge \(e\) weight \(m+1\):

  • On the one hand, this ensures \(e\) can never be in any MST, since an edge which is strictly the largest edge in some cycle can never be part of an MST (this is often called the “cycle property”).
  • Conversely, if \(e\) had a weight less than or equal to \(m\), then \(T\) would not be a MST (or at least not uniquely): we could remove any edge in the path from \(x\) to \(y\) through \(T\) and replace it with \(e\), resulting in a spanning tree with a lower (or equal) weight.

Hence, every edge not in \(T\) must be given a weight one more than the largest weight in the unique \(T\)-path connecting its endpoints; these are the minimum weights that ensure \(T\) is a unique MST.

A false start

At first, I thought what we needed was a way to quickly compute this max weight along any path in the tree (where by “quickly” I mean something like “faster than linear in the length of the path”). There are indeed ways to do this, for example, using a heavy-light decomposition and then putting a data structure on each heavy path that allows us to query subranges of the path quickly. (If we use a segment tree on each path we can even support operations to update the edge weights quickly.)

All this is fascinating, and something I may very well write about later. But it doesn’t actually help! Even if we could find the max weight along any path in \(O(1)\), there are still \(O(V^2)\) edges to loop over, which is too big. There can be up to \(V = 15\,000\) nodes in the tree, so \(V^2 = 2.25 \times 10^8\). A good rule of thumb is \(10^8\) operations per second, and there are likely to be very high constant factors hiding in whatever complex data structures we use to query paths efficiently.

So we need a way to somehow process many edges at once. As usual, a change in perspective is helpful; to get there we first need to take a slight detour.

Kruskal’s Algorithm

It helps to be familiar with Kruskal’s Algorithm, which is the simplest algorithm I know for finding minimum spanning trees:

  • Sort the edges from smallest to biggest weight.
  • Initialize \(T\) to an empty set of edges.
  • For each edge \(e\) in order from smallest to biggest:
    • If \(e\) does not complete a cycle with the other edges already in \(T\), add \(e\) to \(T\).

To efficiently check whether \(e\) completes a cycle with the other edges in \(T\), we can use a union-find, of course: we maintain equivalence classes of vertices under the “is connected to” equivalence relation; adding \(e\) would complete a cycle if and only if the endpoints of \(e\) are already connected to each other in \(T\). If we do add an edge \(e\), we can just \(\mathit{union}\) its endpoints to properly maintain the relation.
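
To make the connection to code concrete, here is a minimal sketch of Kruskal’s algorithm in Haskell. This is not taken from the solutions below; it assumes the un-annotated UnionFind interface used later in this post (UF.new, UF.connected, UF.union), and the tuple-based edge type is introduced just for this sketch.

{-# LANGUAGE ImportQualifiedPost #-}

import Control.Monad.ST
import Data.List (sortOn)
import UnionFind qualified as UF

-- (a, b, w): an undirected edge between vertices a and b with weight w.
type WEdge = (Int, Int, Int)

-- Kruskal's algorithm: scan the edges in order of increasing weight, keeping
-- an edge only if its endpoints are not yet connected.
kruskal :: Int -> [WEdge] -> [WEdge]
kruskal n es = runST $ do
  uf <- UF.new (n + 1)
  let go acc []                   = pure (reverse acc)
      go acc (e@(a, b, _) : rest) = do
        c <- UF.connected uf a b
        if c
          then go acc rest                          -- would complete a cycle: skip
          else UF.union uf a b >> go (e : acc) rest
  go [] (sortOn (\(_, _, w) -> w) es)

Note that the solutions below never need the spanning tree itself; they only reuse the union-find bookkeeping that Kruskal’s algorithm performs along the way.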

A change of perspective

So how does this help us solve “Inventing Test Data”? After all, we are not being directly asked to find a minimum spanning tree. However, it’s still helpful to think about the process Kruskal’s Algorithm would go through, in order to choose edge weights that will force it to do what we want (i.e. pick all the edges in \(T\)). That is, instead of thinking about each individual edge not in \(T\), we can instead think about the edges that are in \(T\), and what must be true to force Kruskal’s algorithm to pick each one.

Suppose we are part of the way through running Kruskal’s algorithm, and that it is about to consider a given edge \(e = (x,y) \in T\) which has weight \(w_e\). At this point it has already considered any edges with smaller weight, and (we shall assume) chosen all the smaller-weight edges in \(T\). So let \(X\) be the set of vertices reachable from \(x\) by edges in \(T\) with weight less than or equal to \(w_e\), and similarly let \(Y\) be those reachable from \(y\). Kruskal’s algorithm will pick edge \(e\) after checking that \(X\) and \(Y\) are disjoint.



Think about all the other edges from \(X\) to \(Y\): all of them must have weight greater than \(w_e\), because otherwise Kruskal’s algorithm would have already considered them earlier, and used one of them to connect \(X\) and \(Y\). In fact, all of these edges must have weight \(w_e + 1\), as we argued earlier, since \(e\) is the largest-weight edge on the \(T\)-path between their endpoints (all the other edges on these paths were already chosen earlier and hence have smaller weight). The number of such edges is just \(|X| |Y| - 1\) (there is an edge for every pair of vertices, but we do not want to count \(e\) itself). Hence they contribute a total of \((|X||Y| - 1)(w_e + 1)\) to the sum of edge weights.

Hopefully the solution is now becoming clear: we process the edges of \(T\) in order from smallest to biggest, using a union-find to keep track of the equivalence classes of connected vertices so far. For each edge \((x,y)\) we look up the sizes of the equivalence classes of \(x\) and \(y\), add \((|X||Y| - 1)(w_e + 1)\) to a running total, and union them. This accounts for all the edges not in \(T\); finally, we must also add the weights of the edges in \(T\) themselves.

First some standard pragmas and imports, along with some data types and a Scanner to parse the input. Note the custom Ord instance for Edge, so we can sort edges by weight.

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Category ((>>>))
import Control.Monad.ST
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List (sort)
import Data.Ord (comparing)
import Data.STRef
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner (numberOf tc) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines

data Edge = Edge {a :: !Int, b :: !Int, w :: !Integer}
  deriving (Eq, Show)

instance Ord Edge where
  compare = comparing w

data TC = TC {n :: !Int, edges :: [Edge]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  edges <- (n - 1) >< (Edge <$> int <*> int <*> integer)
  return TC {..}

Finally, the (remarkably short) solution proper: we sort the edges and process them from smallest to biggest; for each edge we update an accumulator according to the formula discussed above. Since we’re already tied to the ST monad anyway, we might as well keep the accumulator in a mutable STRef cell.

solve :: TC -> Integer
solve TC {..} = runST $ do
  uf <- UF.new (n + 1)
  total <- newSTRef (0 :: Integer)
  mapM_ (processEdge uf total) (sort edges)
  readSTRef total

processEdge :: UF.UnionFind s -> STRef s Integer -> Edge -> ST s ()
processEdge uf total (Edge a b w) = do
  modifySTRef' total (+ w)
  sa <- UF.size uf a
  sb <- UF.size uf b
  modifySTRef' total (+ (fromIntegral sa * fromIntegral sb - 1) * (w + 1))
  UF.union uf a b

Max-edge decomposition



Incidentally, there’s something a bit more general going on here: for a given nonempty weighted tree \(T\), a max-edge decomposition of \(T\) is a binary tree defined as follows:

  • The max-edge decomposition of a trivial single-vertex tree is a single vertex.
  • Otherwise, the max-edge decomposition of \(T\) consists of a root node with two children, which are the max-edge decompositions of the two trees that result from deleting a largest-weight edge from \(T\).

Any max-edge decomposition of a tree \(T\) with \(n\) vertices will have \(n\) leaf nodes and \(n-1\) internal nodes. Typically we think of the leaf nodes of the decomposition as being labelled by the vertices of \(T\), and the internal nodes as being labelled by the edges of \(T\).

An alternative way to think of the max-edge decomposition is as the binary tree of union operations performed by Kruskal’s algorithm while building \(T\), starting with each vertex in a singleton leaf and then merging two trees into one with every union operation. Thinking about, or even explicitly building, this max-edge decomposition occasionally comes in handy. For example, see Veður and Toll Roads.
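
As a concrete (and purely illustrative) way to write this down in Haskell, the shape of a max-edge decomposition could be captured by a small data type like the following; this is a sketch, not code from the post.

-- Leaves are labelled by the vertices of T, internal nodes by its edges.
data MaxEdgeDecomp v e
  = Leaf v
  | Node e (MaxEdgeDecomp v e) (MaxEdgeDecomp v e)
  deriving Show

Building such a tree during Kruskal’s algorithm then amounts to keeping one decomposition per union-find class and combining two of them with a Node labelled by the uniting edge each time a union is performed.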

Incidentally, I can’t remember whether I got the term “max-edge decomposition” from somewhere else or if I made it up myself; in any case, regardless of what it is called, I think I first learned of it from this blog post by Petr Mitrichev.


by Brent Yorgey at November 18, 2024 12:00 AM

November 17, 2024

Eric Kidd

9½ years of Rust in production (and elsewhere)

The first stable release of Rust was on May 15, 2015, just about 9½ years ago. My first “production” Rust code was a Slack bot, which talked to GoCD to control the rollout of a web app. This was utterly reliable. And so new bits of Rust started popping up.

I’m only going to talk about open source stuff here. This will be mostly production projects, with a couple of weekend projects thrown in. Each project will ideally get its own post over the next couple of months.

Planned posts

Here are some of the tools I’d like to talk about:

  1. Moving tables easily between many databases (dbcrossbar)
  2. 700-CPU batch jobs
  3. Geocoding 60,000 addresses per second
  4. Interlude: Neural nets from scratch in Rust
  5. Lots of CSV munging
  6. Interlude: Language learning using subtitles, Anki, Whisper and ChatGPT
  7. Transpiling BigQuery SQL for Trino (a work in progress)

I’ll update this list to link to the posts. Note that I may not get to all of these!

Maintaining Rust & training developers

One of the delightful things about Rust is the low rate of “bit rot”. If something worked 5 years ago—and if it wasn’t linked against the C OpenSSL libraries—then it probably works unchanged today. And if it doesn’t, you can usually fix it in 20 minutes. This is largely thanks to Rust’s “stability without stagnation” policy, the Edition system, and the Crater tool, which is used to test new Rust releases against the entire ecosystem.

The more interesting questions are (1) when should you use Rust, and (2) how do you make sure your team can use it?

Read more…

November 17, 2024 04:08 PM

November 14, 2024

Gabriella Gonzalez

The Haskell inlining and specialization FAQ

This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.

What is inlining?

“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:

module Example where

x :: Int
x = 5

y :: Int
y = x + 1

… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):

y :: Int
y = 5 + 1

… which then allows the compiler to further simplify the code to:

y :: Int
y = 6

In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… which will produce this output:

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 20, types: 7, coercions: 0, joins: 0/0}

x = I# 5#

$trModule4 = "main"#

$trModule3 = TrNameS $trModule4

$trModule2 = "Example"#

$trModule1 = TrNameS $trModule2

$trModule = Module $trModule3 $trModule1

y = I# 6#

… which we can squint at a little bit and read as:

x = 5

y = 6

… and ignore the other stuff.

A slightly more interesting example of inlining is a function call, like this one:

f :: Int -> Int
f x = x + 1

y :: Int
y = f 5

The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):

y :: Int
y = 5 + 1

… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:

y = I# 6#

What is specialization?

“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:

-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:

example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)

Notice that example and example2 are defined in the same way, but they are not exactly the same function:

  • example is more flexible and works on strictly more type constructors

    example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

  • example and example2 compile to very different core representations

In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if we dump the core output (I’ve tidied up the output a lot for clarity):

example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

The two extra function arguments are:

  • @f: This represents the type variable f

    Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

  • $Functor: This represents the Functor instance for f

    Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.

Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.

In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example

This compiles to the exact same code as before (you can check for yourself if you don’t believe me).

Here we would say that example2 is “example specialized to the Maybe type constructor”. When we write something like this:

example2 :: Maybe Int -> Maybe Int
example2 = example

… what’s actually happening under the hood is that the compiler is doing something like this:

example2 = example @Maybe $FunctorMaybe

In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).

In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:

$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs

This outputs (among other things):

…

example2 = example @Maybe $FunctorMaybe
…

And when we enable optimizations (with -O2):

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:

  • specialization simplifies function calls with “type-level” arguments

    By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

  • inlining simplifies function calls with “term-level” arguments

    By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love

Does GHC always inline or specialize code?

NO. GHC does not always inline or specialize code, for two main reasons:

  • Inlining is not always an optimization

    Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

  • Inlining a function requires access to the function’s source code

    In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.

To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:

  • a .o file containing object code (e.g. Example.o)

    This object code is what is linked into the final executable to generate a runnable program.

  • a .hi file containing (among other things) source code

    The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.

However, the compiler does not always save the source code for all functions that it compiles, because there are downsides to storing source code for functions:

  • this slows down compilation

    This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

  • this makes the .hi file bigger

    The .hi file gets bigger because it’s storing the source code of the function.

  • this can also make the object code larger, too

    Inlining a function multiple times can lead to duplicating the corresponding object code for that function.

This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.

You can override the compiler’s heuristic, though, using …

Compiler directives

There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:

  • INLINABLE
  • INLINE
  • NOINLINE
  • SPECIALIZE

My general rule of thumb for these compiler directives is:

  • don’t use any compiler directive until you benchmark your code to show that it helps
  • if you do use a compiler directive, INLINABLE is probably the one you should pick

I’ll still explain what all the compiler directives mean, though.

INLINABLE

INLINABLE is a compiler directive that you use like this:

f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}

The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).

INLINE

INLINE is a compiler directive that you use in a similar manner as INLINABLE:

f :: Int -> Int
f x = x + 1
{-# INLINE f #-}

INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler’s own judgment about whether or not to inline the function.

I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.

NOINLINE

If you mark a function as NOINLINE:

f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}

… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:

Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
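
For reference, here is a minimal sketch of the kind of code that recommendation has in mind (a hypothetical global counter, not an example taken from this post):

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- A "global variable" created with unsafePerformIO. Without the NOINLINE
-- pragma, GHC could inline the right-hand side at each use site and end up
-- creating more than one IORef.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}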

SPECIALIZE

SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}

This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialization does and does not happen.

In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.

by Gabriella Gonzalez (noreply@blogger.com) at November 14, 2024 04:58 PM

GHC Developer Blog

GHC 9.12.1-alpha3 is now available

Zubin Duggal - 2024-11-14

The GHC developers are very pleased to announce the availability of the third alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code (a short sketch follows this list).

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more
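
To make one of the items above more concrete, here is a rough sketch of what the MultilineStrings extension looks like in use. This is an illustration based on the accepted proposal rather than an excerpt from the release notes; consult the GHC user’s guide for the exact whitespace-stripping rules.

{-# LANGUAGE MultilineStrings #-}

-- A string literal delimited by triple quotes may span several lines;
-- common leading indentation is stripped, so this is roughly
-- "SELECT name\nFROM users".
query :: String
query =
  """
  SELECT name
  FROM users
  """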

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki’s status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 14, 2024 12:00 AM

November 08, 2024

Donnacha Oisín Kidney

POPL Paper—Formalising Graph Algorithms with Coinduction

Posted on November 8, 2024

New paper: “Formalising Graph Algorithms with Coinduction”, by myself and Nicolas Wu, will be published at POPL 2025.

The preprint is available here.

The paper is about representing graphs (especially in functional languages). We argue in the paper that graphs are naturally coinductive, rather than inductive, and that many of the problems with graphs in functional languages go away once you give up on induction and pattern-matching, and embrace the coinductive way of doing things.

Of course, coinduction comes with its own set of problems, especially when working in a total language or proof assistant. Another big focus of the paper was figuring out a representation that was amenable to formalisation (we formalised the paper in Cubical Agda). Picking a good representation for formalisation is a tricky thing: often a design decision you make early on only looks like a mistake after a few thousand lines of proofs, and modern formal proofs tend to be brittle, meaning that it’s difficult to change an early definition without also having to change everything that depends on it. On top of this, we decided to use quotients for an important part of the representation, and (as anyone who’s worked with quotients and coinduction will tell you) productivity proofs in the presence of quotients can be a real pain.

All that said, I think the representation we ended up with in the paper is quite nice. We start with a similar representation to the one we had in our ICFP paper in 2021: a graph over vertices of type a is simply a function a -> [a] that returns the neighbours of a supplied vertex (this is the same representation as in this post). Despite the simplicity, it turns out that this type is enough to implement a decent number of search algorithms. The really interesting thing is that the arrow methods (from Control.Arrow) work on this type, and they define an algebra on graphs similar to the one from Mokhov (2017). For example, the <+> operator is the same as the overlay operation in Mokhov (2017).
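
As a rough illustration of that representation, here is a minimal sketch (not code from the paper) of the type together with a naive depth-first reachability search over a finite graph:

import qualified Data.Set as Set

-- A graph over vertices of type a: each vertex maps to its list of neighbours.
type Graph a = a -> [a]

-- A tiny example graph: a directed cycle on the vertices 0..3.
cycle4 :: Graph Int
cycle4 n = [(n + 1) `mod` 4]

-- Depth-first reachability from a starting vertex (terminates on finite graphs).
dfs :: Ord a => Graph a -> a -> [a]
dfs g start = go Set.empty [start]
  where
    go _ [] = []
    go seen (v : vs)
      | v `Set.member` seen = go seen vs
      | otherwise           = v : go (Set.insert v seen) (g v ++ vs)

-- dfs cycle4 0 == [0,1,2,3]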

That simple type gets expanded upon and complicated: eventually, we represent a possibly-infinite collection as a function that takes a depth and then returns everything in the search space up to that depth. It’s a little like representing an infinite list as the partial application of the take function. The paper spends a lot of time picking an algebra that properly represents the depth, and figuring out coherency conditions etc.

One thing I’m especially proud of is that all the Agda code snippets in the paper are hyperlinked to a rendered html version of the code. Usually, when I want more info on some code snippet in a paper, I don’t really want to spend an hour or so downloading some artefact, installing a VM, etc. What I actually want is just to see all of the definitions the snippet relies on, and the 30 or so lines of code preceding it. With this paper, that’s exactly what you get: if you click on any Agda code in the paper, you’re brought to the source of that code block, and every definition is clickable so you can browse without having to install anything.

I think the audience for this paper is anyone who is interested in graphs in functional languages. It should be especially interesting to people who have dabbled in formalising some graphs, but who might have been stung by an uncooperative proof assistant. The techniques in the second half of the paper might help you to convince Agda (or Idris, or Rocq) to accept your coinductive and quotient-heavy arguments.

Mokhov, Andrey. 2017. “Algebraic Graphs with Class (Functional Pearl).” In Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell, 2–13. Haskell 2017. New York, NY, USA: ACM. doi:10.1145/3122955.3122956.

by Donnacha Oisín Kidney at November 08, 2024 12:00 AM

November 07, 2024

Tweag I/O

Exploring Effect in TypeScript: Simplifying Async and Error Handling

Effect is a powerful library for TypeScript developers that brings functional programming techniques into managing effects and errors. It aims to be a comprehensive utility library for TypeScript, offering a range of tools that could potentially replace specialized libraries like Lodash, Zod, Immer, or RxJS.

In this blog post, we will introduce you to Effect by creating a simple weather widget app. This app will allow users to search for weather information by city name, making it a good example as it involves API data fetching, user input handling, and error management. We will implement this project in both vanilla TypeScript and using Effect to demonstrate the advantages Effect brings in terms of code readability and maintainability.

What is Effect?

Effect promises to improve TypeScript code by providing a set of modules and functions that are composable with maximum type-safety. The term “effect” refers to an effect system, which provides a declarative approach to handling side effects. Side effects are operations that have observable consequences in the real world, like logging, network requests, database operations, etc. The library revolves around the Effect<Success, Error, Requirements> type, which can be used to represent an immutable value that lazily describes a workflow or job. Effects are not functions themselves; they are descriptions of what should be done. They can be composed with other effects, and they can be interpreted by the Effect runtime system. Before we dive into the project we will build, let’s look at some basic concepts of Effect.

Creating effects

We can create an effect based on a value using the Effect.succeed and Effect.fail functions:

const success: Effect.Effect<number, never, never> = Effect.succeed(42)

const fail: Effect.Effect<never, Error, never> = Effect.fail(
  new Error("Something went wrong")
)

  • An effect with never as the Error means it never fails
  • An effect with never as the Success means it never produces a successful value.
  • An effect with never as the Requirements means it doesn’t require any context to run.

With the functions above, we can create effects like this:

const divide = (a: number, b: number): Effect.Effect<number, Error, never> =>
  b === 0
    ? Effect.fail(new Error("Cannot divide by zero"))
    : Effect.succeed(a / b)

To create an effect based on a function, we can use the Effect.sync and Effect.promise for synchronous and asynchronous functions that can’t fail, respectively, and Effect.try and Effect.tryPromise for synchronous and asynchronous functions that can fail.

// Synchronous function that can't fail
const log = (message: string): Effect.Effect<void, never, never> =>
  Effect.sync(() => console.log(message))

// Asynchronous function that can't fail
const delay = (message: string): Effect.Effect<string, never, never> =>
  Effect.promise<string>(
    () =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve(message)
        }, 2000)
      })
  )

// Synchronous function that can fail
const parse = (input: string): Effect.Effect<any, Error, never> =>
  Effect.try({
    // JSON.parse may throw for bad input
    try: () => JSON.parse(input),
    // remap the error
    catch: _unknown => new Error(`something went wrong while parsing the JSON`),
  })

// Asynchronous function that can fail
const getTodo = (id: number): Effect.Effect<Response, Error, never> =>
  Effect.tryPromise({
    // fetch can throw for network errors
    try: () => fetch(`https://jsonplaceholder.typicode.com/todos/${id}`),
    // remap the error
    catch: unknown => new Error(`something went wrong ${unknown}`),
  })

For more details about creating effects you can check the Effect documentation.

Running effects

In order to run an effect, we need to use the appropriate function depending on the effect type. In our application we’ll use the Effect.runPromise function, which is used for effects that are asynchronous and can’t fail:

Effect.runPromise(delay("Hello, World!")).then(console.log)
// -> Hello, World! (after 2 seconds)

You can read about other ways to run effects, and what happens when you don’t use the correct function, in the “Running Effects” page of the Effect documentation.

Pipe

When writing a program using Effect, we usually need to run a sequence of operations, and we can use the pipe function to compose them:

const double = (n: number) => n * 2

const divide =
  (b: number) =>
  (a: number): Effect.Effect<number, Error> =>
    b === 0
      ? Effect.fail(new Error("Cannot divide by zero"))
      : Effect.succeed(a / b)

const increment = (n: number) => Effect.succeed(n + 1)

const result = pipe(
  42,
  // Here we have an Effect.Effect<number, Error> with the value 21
  divide(2),
  // To run a function over the value changing the effect's value, we use Effect.map
  Effect.map(double),
  // To run a function over the value without changing the effect's value, we use Effect.tap
  Effect.tap(n => console.log(`The double is ${n}`)),
  // To run a function that returns a new effect, we use Effect.andThen
  Effect.andThen(increment),
  Effect.tap(n => console.log(`The incremented value is ${n}`))
)

Effect.runSync(result)
// -> The double is 42
// -> The incremented value is 43

If you want to know more about the pipe function, you can check this page on the Effect documentation.

The project

Now that we have a basic understanding of Effect, we can start the project! We will build a simple weather app in which the user types the name of a city, selects the desired one from a list of suggestions, and then the app shows the current weather in that city.
The project will have three main components: the input field, the list of suggestions, and the weather information.

We will use the Open-Meteo API to get the weather information as it doesn’t require an API key.

Setup

We begin by creating a new TypeScript project:

mkdir weather-app
cd weather-app
npm init -y

Next, we install the dependencies. We will use Parcel to bundle the project as it works without any configuration:

npm install --save-dev parcel

Now we create the project structure:

mkdir src
touch src/index.html
touch src/styles.scss
touch src/index.ts

The index.html file contains a main element with sections: one with a text input for city input and another for displaying weather information.

You can check the HTML and SCSS code in the GitHub repository.

In order to run the project, we need to add the following keys to the package.json file:

{
  "source": "./src/index.html",
  "scripts": {
    "dev": "parcel",
    "build": "parcel build"
  }
}

Now we can run the project:

npm run dev

Server running at http://localhost:1234
✨ Built in 8ms

By accessing the URL, you should see the application, but it won’t work yet.

Figure 1. Application's initial state

Let’s write the TypeScript code!

Without Effect

All the following code examples should be placed in the src/index.ts file.

First, we query the elements from the DOM:

// The field input
const cityElement = document.querySelector<HTMLInputElement>("#city")
// The list of suggestions
const citiesElement = document.querySelector<HTMLUListElement>("#cities")
// The weather information
const weatherElement = document.querySelector<HTMLDivElement>("#weather")

Next, we’ll define the types for the data we’ll fetch from the API.
To validate the data, we’ll use a library called Zod. Zod is a TypeScript-first schema declaration and validation library.

npm install zod

First, we define the schema by using z.object and, for each property, we use z.string, z.number and other functions to define its type:

import { z } from "zod"

// ...

const CityResponse = z.object({
  name: z.string(),
  country_code: z.string().length(2),
  latitude: z.number(),
  longitude: z.number(),
})

const GeocodingResponse = z.object({
  results: z.array(CityResponse),
})

With the schema defined, we can use the z.infer utility type to infer the type of the data based on the schema:

type CityResponse = z.infer<typeof CityResponse>

type GeocodingResponse = z.infer<typeof GeocodingResponse>

Now, we create the function to fetch the cities from the Open-Meteo API. It fetches the cities that match the given name and returns a list of suggestions. In order to validate the API response, we use the safeParse method that our GeocodingResponse Zod schema provides. This method returns an object with two key properties:

  1. success: A boolean indicating if the parsing succeeded.
  2. data: The parsed data if successful, matching our defined schema.

const getCity = async (city: string): Promise<CityResponse[]> => {
  try {
    const response = await fetch(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    )

    // Convert the response to JSON
    const geocoding = await response.json()

    // Parse the response using the GeocodingResponse schema
    const parsedGeocoding = GeocodingResponse.safeParse(geocoding)

    if (!parsedGeocoding.success) {
      return []
    }

    return parsedGeocoding.data.results
  } catch (error) {
    console.error("Error:", error)
    return []
  }
}

To make the input field work, we need to attach an event listener to it to call the getCity function:

const getCities = async function (input: HTMLInputElement) {
  const { value } = input

  // Check if the HTML element exists
  if (citiesElement) {
    // Clear the list of suggestions
    citiesElement.innerHTML = ""
  }

  // Check if the input is empty
  if (!value) {
    return
  }

  // Fetch the cities
  const results = await getCity(value)

  renderCitySuggestions(results)
}

cityElement?.addEventListener("input", function (_event) {
  getCities(this)
})

Next, we create the renderCitySuggestions function to render the list of suggestions or display an error message if there are no suggestions:

const renderCitySuggestions = (cities: CityResponse[]) => {
  // If there are cities, populate the suggestions
  if (cities.length > 0) {
    populateSuggestions(cities)
    return
  }

  // Otherwise, show a message that the city was not found
  if (weatherElement) {
    const search = cityElement?.value || "searched"
    weatherElement.innerHTML = `<p>City ${search} not found</p>`
  }
}

The populateSuggestions function is very simple - it creates a list item for each city:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    const li = document.createElement("li")
    li.innerText = `${city.name} - ${city.country_code}`
    citiesElement?.appendChild(li)
  })

Now if we type a city name in the input field, we should see the list of suggestions:

Figure 2. City suggestions

Great!

The next step is to implement the selectCity function that fetches the weather information of a city and displays it:

const selectCity = async (result: CityResponse) => {
  // If the HTML element doesn't exist, return
  if (!weatherElement) {
    return
  }

  try {
    const data = await getWeather(result)

    if (data.tag === "error") {
      throw data.value
    }

    const {
      temperature_2m,
      apparent_temperature,
      relative_humidity_2m,
      precipitation,
    } = data.value.current

    weatherElement.innerHTML = `
 <h2>${result.name}</h2>
 <p>Temperature: ${temperature_2m}°C</p>
 <p>Feels like: ${apparent_temperature}°C</p>
 <p>Humidity: ${relative_humidity_2m}%</p>
 <p>Precipitation: ${precipitation}mm</p>
 `
  } catch (error) {
    weatherElement.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`
  }
}

Then we call it in the populateSuggestions function:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    // ...
    li.addEventListener("click", () => selectCity(city))
    citiesElement?.appendChild(li)
  })

The last piece of the puzzle is the getWeather function. Once again, we’ll use Zod to create the schema and the type for the weather information.

type WeatherResult =
  | { tag: "ok"; value: WeatherResponse }
  | { tag: "error"; value: unknown }

const WeatherResponse = z.object({
  current_units: z.object({
    temperature_2m: z.string(),
    relative_humidity_2m: z.string(),
    apparent_temperature: z.string(),
    precipitation: z.string(),
  }),
  current: z.object({
    temperature_2m: z.number(),
    relative_humidity_2m: z.number(),
    apparent_temperature: z.number(),
    precipitation: z.number(),
  }),
})

type WeatherResponse = z.infer<typeof WeatherResponse>

const getWeather = async (result: CityResponse): Promise<WeatherResult> => {
  try {
    const response = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    )

    // Convert the response to JSON
    const weather = await response.json()

    // Parse the response using the WeatherResponse schema
    const parsedWeather = WeatherResponse.safeParse(weather)

    if (!parsedWeather.success) {
      return { tag: "error", value: parsedWeather.error }
    }

    return { tag: "ok", value: parsedWeather.data }
  } catch (error) {
    return { tag: "error", value: error }
  }
}

We have a type WeatherResult for error handling; it can be ok or error. The getWeather function fetches the weather information based on the latitude and longitude of a city and returns the result. We are passing some parameters to the API to get the current temperature, humidity, apparent temperature, and precipitation. If you want to know more about these parameters, you can check the API documentation.

One last thing we need to do is to use a debounce function to avoid making too many requests to the API while the user is typing. To do that, we’ll install Lodash which provides many useful functions for everyday programming.

npm install lodash
npm install --save-dev @types/lodash

We’ll wrap the getCities function with the debounce function:

import { debounce } from "lodash"

// ...

const getCities = debounce(async function (input: HTMLInputElement) {
  // The same code as before
}, 500)

This way, the getCities function will be called only after the user stops typing for 500 milliseconds.

Our small weather app is now complete: when we type a city name in the input field, a list of suggestions is displayed, and when we click on one of them, we can see the weather information for that city.

Figure 3. Weather information

While our current code works and handles errors well, let’s explore how using Effect can potentially improve its robustness and simplicity.

With Effect

To get started with Effect, we need to install it:

npm install effect

We will start by refactoring the functions in the order we implemented them in the previous section.

First, we refactor the querySelector calls. We'll use the Option type from Effect: it represents a value that may or may not exist. If the value exists, it's a Some; if it doesn't, it's a None.

import { Option } from "effect"

// The field input
const cityElement = Option.fromNullable(
  document.querySelector<HTMLInputElement>("#city")
)
// The list of suggestions
const citiesElement = Option.fromNullable(
  document.querySelector<HTMLUListElement>("#cities")
)
// The weather information
const weatherElement = Option.fromNullable(
  document.querySelector<HTMLDivElement>("#weather")
)

Using the Option type, we can chain operations without worrying about null or undefined values. This approach simplifies our code by eliminating the need for explicit null checks. We can use functions like Option.map and Option.andThen to handle the transformations and checks in a more elegant way. To know more about the Option type, take a look at the page about it in the documentation.

Now, let's move to the getCity function. We'll use Schema.Struct to define the types of the CityResponse and GeocodingResponse objects. These schemas will be used to validate the response from the API. This is the same thing we did before with Zod, but this time we don't need a separate validation library: we can use the Schema module that Effect provides.

import { /* ... */, Effect, Scope, pipe } from "effect";
import { Schema } from "@effect/schema"
import {
  FetchHttpClient,
  HttpClient,
  HttpClientResponse,
  HttpClientError
} from "@effect/platform";

// ...

const CityResponse = Schema.Struct({
  name: Schema.String,
  country_code: pipe(Schema.String, Schema.length(2)),
  latitude: Schema.Number,
  longitude: Schema.Number,
})

type CityResponse = Schema.Schema.Type<typeof CityResponse>

const GeocodingResponse = Schema.Struct({
  results: Schema.Array(CityResponse),
})

type GeocodingResponse = Schema.Schema.Type<typeof GeocodingResponse>

const getRequest = (url: string): Effect.Effect<HttpClientResponse.HttpClientResponse, HttpClientError.HttpClientError, Scope.Scope> =>
  pipe(
    HttpClient.HttpClient,
    // Using `Effect.andThen` to get the client from the `HttpClient.HttpClient` tag and then make the request
    Effect.andThen(client => client.get(url)),
    // We don't need to send the tracing headers to the API to avoid CORS errors
    HttpClient.withTracerPropagation(false),
    // Providing the HTTP client to the effect
    Effect.provide(FetchHttpClient.layer)
  )

const getCity = (city: string): Effect.Effect<readonly CityResponse[], never, never> =>
  pipe(
    getRequest(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    ),
    // Validating the response using the `GeocodingResponse` schema
    Effect.andThen(HttpClientResponse.schemaBodyJson(GeocodingResponse)),
    // Providing a default value in case of failure
    Effect.orElseSucceed<GeocodingResponse>(() => ({ results: [] })),
    // Extracting the `results` array from the `GeocodingResponse` object
    Effect.map(geocoding => geocoding.results),
    // Providing a scope to the effect
    Effect.scoped
  )

Here we already have some interesting things happening!

The getRequest function sets up the HTTP client. While we could use the built-in fetch API as our HTTP client, Effect provides a solution called HttpClient in the @effect/platform package. It’s important to note that this package is currently in beta, as mentioned in the official documentation. Despite its beta status, we’ll be using it to explore more of Effect’s capabilities and showcase how it integrates with the broader Effect ecosystem. This choice allows us to demonstrate Effect’s approach to HTTP requests and error handling in a more idiomatic way. HttpClient.HttpClient is something called a “tag” that we can use to get the HTTP client from the context. To do that, we use the Effect.andThen function.
After that, we’re setting withTracerPropagation to false to avoid sending the tracing headers to the API and getting a CORS error.

Since we're using the HttpClient service, it is a requirement of our effect (remember the Effect<Success, Error, Requirements> type?), and we need to provide this requirement in order to run the effect.
With the Effect.provide function we can add a layer to the effect that provides the HttpClient service. For more information about Effect.provide and how it works, take a look at the runtime page in the Effect documentation.

In the getCity function, we call the getRequest function to get the response from the API. Then we validate the response using the HttpClientResponse.schemaBodyJson function, which validates the response body using the GeocodingResponse schema.
In the last line of the function, we use the Effect.scoped function to provide a scope to the effect; this is a requirement of the HttpClient service that we're using in the getRequest function. The scope ensures that if the program is interrupted, any in-flight request will be aborted, preventing memory leaks. getCity returns an Effect.Effect<readonly CityResponse[], never, never>: the two nevers mean that it never fails (we're providing a default value in case of failure) and that it doesn't require any context to run.

Next, we refactor the getCities function:

import { /* ... */, Effect, Option, pipe } from "effect";

// ...

const getCities = (search: string): Effect.Effect<Option.Option<void>, never, never> => {
  Option.map(citiesElement, citiesEl => (citiesEl.innerHTML = ""))

  return pipe(
    getCity(search),
    Effect.map(renderCitySuggestions),
    // Check if the input is empty
    Effect.when(() => Boolean(search))
  )
}

We’re using the Option.map function to access the actual citiesElement and clear the list of suggestions. After that, it’s pretty straightforward: we call the getCity function with the search term, then we map the renderCitySuggestions function over the successful value, and finally, we apply a condition that makes the effect run only if the search term is not empty.

Here is how we add the event listener to the input field:

import { /* ... */, Effect, Option, pipe, Stream, Chunk, StreamEmit } from "effect";

// ...

Option.map(cityElement, cityEl => {
  const stream = Stream.async(
    (emit: StreamEmit.Emit<never, never, string, void>) =>
      cityEl.addEventListener("input", function (_event) {
        emit(Effect.succeed(Chunk.of(this.value)))
      })
  )

  pipe(
    stream,
    Stream.debounce(500),
    Stream.runForEach(getCities),
    Effect.runPromise
  )
})

Actually, we’re doing more than just adding an event listener. The debounce function that we had to import from Lodash before is now part of Effect as the Stream.debounce function. In order to use this function, we need to create a Stream.
A Stream has the type Stream<A, E, R> and is a program description that, when executed, can emit zero or more values of type A, handle errors of type E, and operate within a context of type R. There are a couple of ways to create a Stream, which are detailed in the page about streams in the documentation. In this case, we're using the Stream.async function because it receives a callback that emits values to the stream.

After creating the Stream and assigning it to the stream variable, we use a pipe to build a pipeline where we debounce the stream by 500 milliseconds, run the getCities function whenever the stream gets a value (that is, when we emit a value), and finally run the effect with Effect.runPromise.

Let’s move on to the renderCitySuggestions function:

import { /* ... */, Array, Option, pipe } from "effect";

// ...

const renderCitySuggestions = (cities: readonly CityResponse[]): void | Option.Option<void> =>
  // If there are multiple cities, populate the suggestions
  // Otherwise, show a message that the city was not found
  pipe(
    cities,
    Array.match({
      onNonEmpty: populateSuggestions,
      onEmpty: () => {
        const search = Option.match(cityElement, {
          onSome: (cityEl) => cityEl.value,
          onNone: () => "searched",
        });

        Option.map(
          weatherElement,
          (weatherEl) =>
            (weatherEl.innerHTML = `<p>City ${search} not found</p>`),
        );
      },
    }),
  );

Instead of manually checking the length of the cities array, we’re using the Array.match function to handle that. If the array is empty, it calls the callback defined in the onEmpty property, and if the array is not empty, it calls the callback defined in the onNonEmpty property.

The populateSuggestions function remains almost the same. The only change is that we now wrap the forEach operation in an Option.map to safely handle the optional cities element. This ensures we only attempt to populate suggestions when the element exists.

The selectCity function is simpler now:

import { /* ... */, Option, pipe } from "effect";

// ...

const selectCity = (result: CityResponse): Option.Option<Promise<string>> =>
  Option.map(weatherElement, weatherEl =>
    pipe(
      result,
      getWeather,
      Effect.match({
        onFailure: error =>
          (weatherEl.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`),
        onSuccess: (weatherData: WeatherResponse) =>
          (weatherEl.innerHTML = `
<h2>${result.name}</h2>
<p>Temperature: ${weatherData.current.temperature_2m}°C</p>
<p>Feels like: ${weatherData.current.apparent_temperature}°C</p>
<p>Humidity: ${weatherData.current.relative_humidity_2m}%</p>
<p>Precipitation: ${weatherData.current.precipitation}mm</p>
`),
      }),
      Effect.runPromise
    )
  )

There is no checking of data.tag any more: we use the Effect.match function to handle both cases, success and failure, and we no longer throw anything.

Finally, the getWeather function:

import { /* ... */, Effect, pipe } from "effect";
import { Schema, ParseResult } from "@effect/schema";
import { /* ... */, HttpClientResponse, HttpClientError } from "@effect/platform";

// ...

const WeatherResponse = Schema.Struct({
  current_units: Schema.Struct({
    temperature_2m: Schema.String,
    relative_humidity_2m: Schema.String,
    apparent_temperature: Schema.String,
    precipitation: Schema.String,
  }),
  current: Schema.Struct({
    temperature_2m: Schema.Number,
    relative_humidity_2m: Schema.Number,
    apparent_temperature: Schema.Number,
    precipitation: Schema.Number,
  }),
})

type WeatherResponse = Schema.Schema.Type<typeof WeatherResponse>

const getWeather = (
  result: CityResponse,
): Effect.Effect<WeatherResponse, HttpClientError.HttpClientError | ParseResult.ParseError, never> =>
  pipe(
    getRequest(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    ),
    Effect.andThen(HttpClientResponse.schemaBodyJson(WeatherResponse)),
    Effect.scoped
  )

We're again using Schema.Struct to define the WeatherResponse type. However, we no longer need a WeatherResult, as the Effect type already handles the success and failure cases.

After this refactoring, the app works the same way it did before, but now we have the confidence that our code is more robust and type-safe. Let's look at the benefits of Effect compared with the code written without it.

Conclusion

Now that we have the two versions of the application, we can analyze them and highlight the pros and cons of using Effect:

Pros

  • Type-safety: Effect provides a way to handle errors and requirements in a type-safe way and using it increases the overall type safety of our app.
  • Error handling: The Effect type has built-in error handling, making the code more robust.
  • Validation: We don't need a library like Zod to validate responses; we can use the Schema module instead.
  • Utility functions: We don't need a library like Lodash for utility functions; we can use the Array, Option, Stream, and other modules.
  • Declarative style: Writing code with Effect means we’re using a more declarative approach: we’re describing “what” we want our program to do, rather than “how” we want it to do it.

Cons

  • Complexity: The code is more complex than the one without Effect; it may be hard to understand for people who are not familiar with the library.
  • Learning curve: You need to learn how to use the library - it’s not as simple as writing plain TypeScript code.
  • Documentation: The documentation is good, but could be better. Some parts are not clear.

While the code written with Effect may initially appear more complex to those unfamiliar with the library, its benefits far outweigh the initial learning curve. Effect offers powerful tools for maximum type-safety, error handling, asynchronous operations, streams and more, all within a single library that is incrementally adoptable. In our project, we used two separate libraries (Zod and Lodash) to achieve what Effect accomplishes on its own.

While plain TypeScript may be adequate for small projects, we believe Effect can truly shine in larger, more complex applications. Its robust handling of side-effects and comprehensive error management have the potential to make it a game changer for taming complexity and maintaining code quality at scale.

November 07, 2024 12:00 AM

Donnacha Oisín Kidney

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Posted on November 7, 2024

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

by Donnacha Oisín Kidney at November 07, 2024 12:00 AM

November 05, 2024

Edward Z. Yang

Ways to use torch.compile

On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't. By the way, some of these gaps are either served by export, or by missing features we are actively working on; those will be some other posts!

Improve training efficiency on a small-medium scale

Scenario: You have a model in PyTorch that you want to train at a small-medium scale (e.g., below 1K GPUs--at the 1K point there is a phase change in behavior that deserves its own section). You would like it to train faster. Locally, it's nice to get a trained model faster than you would have otherwise. But globally, the faster everyone's models train, the less GPU hours they use, which means you can run more jobs in a given time window with a fixed cluster. If your supply of GPUs is inelastic (lol), efficiency improvement means you can support more teams and use cases for the same amount of available GPUs. At a capacity planning level, this can be a pretty big deal even if you are GPU rich.

What to do: In some sense, this is the reason we built torch.compile. (When we were initially planning torch.compile, we were trying to assess if we were going after inference; but inference compilers are a much more crowded space than training compilers, and we reasoned that if we did a good job building a training compiler, inference would work too--which it did!) The dream which we sold with torch.compile is that you could slap it on the top of your model and get a speed up. This turns out to... not quite be true? But the fact remains that if you're willing to put in some work, there is almost always performance waiting at the end of the road for you. Some tips:

  • Compile only the modules you need. You don't have to compile the entire model; there might be specific modules which are easy to compile and which will give you most of the benefit. For example, in recommendation systems, there is not much compute improvement to be had from optimizing the embedding lookups, and their model parallelism is often quite hard to handle in the compiler, so torch.compiler.disable them. NB: This doesn't apply if you want to do some global graph optimization which needs the whole model: in that case, pass fullgraph=True to torch.compile and ganbatte!
  • Read the missing manual. The missing manual is full of guidance on working with the compiler, with a particular emphasis on working on training.

Open source examples: torchtune and torchtitan are two first party libraries which are intended to showcase modern PyTorch using torch.compile in a training context. There's also some training in torchao.

Downsides:

  • The compiler is complicated. One of the things we've slowly been coming to terms with is that, uh, maybe promising you could just slap torch.compile on a model and have it run faster was overselling the feature a teensy bit? There seems to be some irreducible complexity with compilers that any user bringing their own model to torch.compile has to grapple with. So yes, you are going to spend some of your complexity budget on torch.compile, in hopes that the payoff is worth it (we think it is!). One ameliorating factor is that the design of torch.compile (graph breaks) means it is very easy to incrementally introduce torch.compile into a codebase, without having to do a ton of upfront investment.
  • Compile time can be long. The compiler is not a straightforward unconditional win. Even if the compiler doesn't slow down your code (which it can, in pathological cases), you have to spend some amount of time compiling your model (investment), which you then have to make back by training the model more quickly (return). For very small experimentation jobs, or jobs that are simply crashing, the time spent compiling is just dead weight, increasing the overall time your job takes to run. (teaser: async compilation aims to solve this.) To make matters worse, if you are scheduling your job on systems that have preemption, you might end up repeatedly compiling over and over again every time your job gets rescheduled (teaser: caching aims to solve this.) But even when you do spend some time training, it is not obvious without an A/B test whether or not you are actually getting a good ROI. In an ideal world, everyone using torch.compile would actually verify this ROI calculation, but it doesn't happen automatically (teaser: automatic ROI calculation) and in large organizations we see people running training runs without even realizing torch.compile is enabled.
  • Numerics divergence from eager. Unfortunately, the compiler does not guarantee exact bitwise equivalence with eager code; we reserve the right to do things like select different matrix multiply algorithms with different numerics or eliminate unnecessary downcast/upcasts when fusing half precision compute together. The compiler is also complicated and can have bugs that can cause loss not to converge. Expect to also have to evaluate whether or not application of torch.compile affects accuracy. Fortunately, for most uses of compiler for training efficiency, the baseline is the eager model, so you can just run an ablation to figure out who is actually causing the accuracy problem. (This won't be true in a later use case when the compiler is load bearing, see below!)

Improve Python inference efficiency

Scenario: You've finished training your model and you want to deploy it for inference. Here, you want to improve the efficiency of inference to improve response latency or reduce the overall resource requirements of the system, so you can use less GPUs to serve the traffic you are receiving. Admittedly, it is fairly common to just use some other, more inference friendly systems (which I will decline to name by name lol) to serve the model. But let's say you can't rewrite the model in a more serving friendly language (e.g., because the model authors are researchers and they keep changing the model, or there's a firehose of models and you don't have the money to keep continuously porting each of them, or you depend on an ecosystem of libraries that are only available in CPython).

What to do: If Python can keep up with the CPU-side QPS requirements, a way of getting good performance without very much work is taking the Python model, applying torch.compile on it in the same way as you did in training and directly using this as your inference solution. Some tips that go beyond training:

  • Autotuning makes the most sense for inference. In training runs, you have a limited window (the lifetime of the training job) to get return on the investment you spent optimizing the model. In the serving regime, you can amortize over the entire lifetime of your model in inference, which is typically much longer. Therefore, expensive optimization modes like mode="max-autotune" are more likely to pay off!
  • Warmup inference processes before serving traffic to them. Because torch.compile is a just-in-time compiler, you will spend quite a bit of time compiling (even if you cache hit) at startup. If you have latency requirements, you will want to warmup a fresh process with a representative set of inputs so that you can make sure you trigger all of the compilation paths you need to hit. Caching will reduce compile time but not eliminate it.
  • Try skip_guard_eval_unsafe to reduce guard overhead. Dynamo guard overhead can be material in the inference case. If this is a problem, get a nightly and try skip_guard_eval_unsafe.

Open source examples: LLM serving on torch.compile is quite popular: vllm, sglang, tensorrt-llm, gpt-fast (this is technically not an E2E serving solution, but one of its primary reasons for existing is to serve as a starting point so you can build your own torch.compile based LLM inference stack on top of it). Stable diffusion models are also notable beneficiaries of torch.compile, e.g., diffusers.

Downsides:

  • Just in time compilation is a more complicated operational model. It would be better if you didn't have to warmup inference processes before serving traffic to them. Here, torch.compile has traded operational simplicity for ease of getting started. If you wanted to guarantee that compilation had already happened ahead of time, you have to instead commit to some sort of export-based flow (e.g., C++ GPU/CPU inference) below.
  • Model and dependency packaging in Python is unaddressed. You need to somehow package and deploy the actual Python code (and all its dependencies) which constitute the model; torch.compile doesn't address this problem at all (while torch.export does). If you are running a monorepo and do continuous pushes of your infra code, it can be organizationally complicated to ensure people don't accidentally break model code that is being shipped to production--it's very common to be asked if there's a way to "freeze" your model code so that the monorepo can move on. But with Python inference you have to solve this problem yourself, whether the solution is torch.package, Docker images, or something else.
  • Caches are not guaranteed to hit. Do you have to recompile the model every time you restart the inference process? Well, no, we have an Inductor and Triton (and an in-progress AOTAutograd) cache which in principle can cache all of the cubins that are generated by torch.compile. Most of the time, you can rely on this to reduce startup cost to Dynamo tracing the model only. However, the caches are not guaranteed to hit: there are rarer cases where we don't know how to compute the cache key for some feature a model is using, or the compiler is nondeterministic in a way that means the cache doesn't hit. You should file bugs for all of these issues as we are interested in fixing them, but we don't give a categorical guarantee that after you've compiled your inference program once, you won't have to compile it again. (And indeed, under torch.compile's user model, we can't, because the user code might be the root cause of the nondeterminism--imagine a model that is randomly sampling to decide what version of a model to run.)
  • Multithreading is currently buggy. It should, in principle, be possible to run torch.compile'd code from multiple threads in Python and get a speedup, especially when CUDA graphs or CPP wrapper is used. (Aside: Inductor's default compile target is "Python wrapper", where Inductor's individually generated Triton kernels are called from Python. In this regime, you may get in trouble due to the GIL; CUDA graphs and CPP wrapper, however, can release the GIL when the expensive work is being done.) However, it doesn't work. Track the issue at https://github.com/pytorch/pytorch/issues/136833

Like above, but the compiler is load bearing

Scenario: In both the cases above, we assumed that we had a preexisting eager model that worked, and we just wanted to make it faster. But you can also use the compiler in a load bearing way, where the model does not work without the compiler. Here are two common cases where this can occur:

  1. Performance: A compiler optimization that results in an asymptotic or large constant-factor improvement in performance can make a naive eager implementation, which would otherwise have been hopelessly slow, perform well. For example, SimpleFSDP chooses to apply no optimizations to the distributed collectives it issues, instead relying on the compiler to bucket and prefetch them for acceptable performance.
  2. Memory: A compiler optimization that reduces the memory usage of a model can allow you to fit a model or batch size that would otherwise OOM. Although we don't publicly expose APIs for doing so, you can potentially use the compiler to do things like force a certain memory budget when doing activation checkpointing, without requiring the user to manually specify what needs to be checkpointed.

What to do: Unlike the previous cases, where you took a preexisting model and slapped torch.compile on it, this sort of use of the compiler is more likely to arise from a codevelopment approach, where you use torch.compile while you build your model, and are constantly checking what the compiler does to the code you write. Some tips:

  • Don't be afraid to write your own optimization pass. Inductor supports custom FX optimization passes. torch.compile has done the work of getting your model into an optimizable form; you can take advantage of this to apply domain specific optimizations that Inductor may not support natively.

Open source examples: SimpleFSDP, as mentioned above. VLLM uses torch.compile to apply custom optimization passes. Although its implementation is considerably more involved than what you might reasonably expect a third party to implement, FlexAttention is a good example of a non-compiler feature that relies on the compiler in a load-bearing way for performance.

Downsides: Beyond the ones mentioned above:

  • You can no longer (easily) use eager as a baseline. This is not always true; for example, FlexAttention has an eager mode that runs everything unfused which can still be fast enough for small experiments. But if you have an accuracy problem, it may be hard to compare against an eager baseline if you OOM in that case! It turns out that it's really, really useful to have access to an eager implementation, so it's worth working harder to make sure that the eager implementation works, even if it is slow. (It's less clear how to do that with, e.g., a fancy global optimization based activation checkpointing strategy.)

Next time: Ways to use torch.export

by Edward Z. Yang at November 05, 2024 03:11 PM

Jeremy Gibbons

Alan Jeffrey, 1967–2024

My friend Alan Jeffrey passed away earlier this year. I described his professional life at a Celebration in Oxford on 2nd November 2024. This post is a slightly revised version of what I said.

Edinburgh, 1983–1987

I’ve known Alan for over 40 years—my longest-standing friend. We met at the University of Edinburgh in 1983, officially as computer science freshers together, but really through the clubs for science fiction and for role-playing games. Alan was only 16: like many in Scotland, he skipped the final school year for an earlier start at university. It surely helped that his school had no computers, so he wasted no time in transferring to a university that did. His brother David says that it also helped that he would then be able to get into the student union bars.

Oxford, 1987–1991

After Edinburgh, Alan and I wound up together again as freshers at the University of Oxford. We didn’t coordinate this; we independently and simultaneously applied to the same DPhil programme (Oxford’s name for the PhD). We were officemates for those 4 years, and shared a terraced hovel on St Mary’s Road in bohemian East Oxford with three other students for most of that time. He was clever, funny, kind, and serially passionate about all sorts of things. It was a privilege and a pleasure to have known him.

Alan had a career that spanned academia and industry, and he excelled at both. He described himself as a “semanticist”: using mathematics instead of English for precise descriptions of programming languages. He had already set out in that direction with his undergraduate project on concurrency under Robin Milner at Edinburgh; and he continued to work on concurrency for his DPhil under Bill Roscoe at Oxford, graduating in 1992.

Chalmers, 1991–1992

Alan spent the last year of his DPhil as a postdoc working for K V S Prasad at Chalmers University in Sweden. While there, he was assigned to host fellow Edinburgh alumna Carolyn Brown, visiting for an interview; Carolyn came bearing a bottle of malt whisky, as one does, which she and Alan proceeded to polish off together that evening.

Sussex, 1992–1999

Carolyn’s interview was successful; but by the time she arrived at Chalmers, Alan had left for a second postdoc under Matthew Hennessy at the University of Sussex. They worked together again when Carolyn was in turn hired as a lecturer at Sussex. In particular, they showed in 1994 that “string diagrams”—due to Roger Penrose and Richard Feynman in physics—provide a “fully abstract” calculus for hardware circuits, meaning that everything true of the diagrams is true of the hardware, and vice versa. This work foreshadowed a hot topic in the field of Applied Category Theory today.

Matthew essentially left Alan to his own devices: as Matthew put it, “something I was very happy with as he was an exceptional researcher”. Alan was soon promoted to a lectureship himself. He collaborated closely with Julian Rathke, then Matthew’s PhD student and later postdoc, on the Full Abstraction Factory project, developing a bunch more full abstraction results for concurrent and object-oriented languages. That fruitful collaboration continued even after Alan left Sussex.

DePaul, 1999–2004

Alan presented a paper A Fully Abstract Semantics for a Nondeterministic Functional Language with Monadic Types at the 1995 conference on Mathematical Foundations of Programming Semantics in New Orleans. I believe that this is where he met Karen Bernstein, who also had a paper. One thing led to another, and Alan took a one-year visiting position at DePaul University in Chicago in 1998, then formally left Sussex in 1999 for a regular Associate Professor position at DePaul. He lived in Chicago for the rest of his life.

Alan established the Foundations of Programming Languages research group at DePaul, attracting Radha Jagadeesan from Loyola, James Riely from Sussex, and Corin Pitcher from Oxford, working among other things on “relaxed memory”—modern processors don’t actually read and write their multiple levels of memory in the order you tell them to, when they can find quicker ways to do things concurrently and sometimes out of order.

James remembers showing Alan his first paper on relaxed memory, co-authored with Radha. Alan thought their approach was an “appalling idea”; the proper way was to use “event structures”, an idea from the 1980s. This turned in 2016 into a co-authored paper at LICS (Alan’s favourite conference), and what James considers his own best ever talk—an on-stage reenactment of the to and fro of their collaboration, sadly not recorded for posterity.

James was Alan’s most frequent collaborator over the years, with 14 joint papers. Their modus operandi was that, having identified a problem together, Alan would go off by himself and do some Alanny things, eventually coalescing on a solution, and choose an order of exposition, tight and coherent; this is about 40% of the life of the paper. But then there are various tweaks, extensions, corrections… Alan would never look at the paper again, and would be surprised years later to learn what was actually in it. However, Alan was always easy to work with: interested only in the truth, although it must be beautiful. He had a curious mix of modesty and egocentricity: always convinced he was right (and usually right that he was right). Still, he had no patience for boring stuff, especially university admin.

While at DePaul, Alan also had a significant collaboration with Andy Gordon from Microsoft on verifying security protocols. Their 2001 paper Authenticity by Typing for Security Protocols won a Test Of Time Award this year at the Symposium on Computer Security Foundations, “given to outstanding papers with enduring significance and impact”—recognition which happily Alan lived to see.

Bell Labs, 2004–2015

After the dot com crash in 2000, things got more difficult at DePaul, and Alan left in 2004 for Bell Labs, nominally as a member of technical staff in Naperville but actually part of a security group based at HQ in Murray Hill NJ. He worked on XPath, “a modal logic for XML”, with Michael Benedikt, now my databases colleague at Oxford. They bonded because only Alan and Michael lived in Chicago rather than the suburbs. Michael had shown Alan a recent award-winning paper in which Alan quickly spotted an error in a proof—an “obvious” and unproven lemma that turned out to be false—which led to their first paper together.

(A recurring pattern. Andy Gordon described Alan’s “uncanny ability to find bugs in arguments”: he found a type unsoundness bug in a released draft specification for Java, and ended up joining the standards committee to help fix it. And as a PhD examiner he “shockingly” found a subtle bug that unpicked the argument of half of the dissertation, necessitating major corrections: it took a brave student to invite Alan as examiner—or a very confident one.)

Michael describes Alan as an “awesome developer”. They once had an intern; it didn’t take long after the intern had left for Alan to discard the intern’s code and rewrite it from scratch. Alan was unusual in being able to combine Euro “abstract nonsense” and US engineering. Glenn Bruns, another Bell Labs colleague, said that “I think Alan was the only person I’ve met who could do theory and also low-level hackery”.

At Bell Labs Alan also worked with Peter Danielsen on the Web InterFace Language, WIFL for short: a way of embedding API descriptions in HTML. Peter recalls: “We spent a few months working together on the conceptual model. In the early stages of software development, however, Alan looked at what I’d written and said, “I wouldn’t do it that way at all!”, throwing it all away and starting over. The result was much better; and he inadvertently taught me a new way to think in JavaScript (including putting //Sigh… comments before unavoidable tedious code.)”

Mozilla Research, 2015–2020

The Bell Labs group dissolved in 2015, and Alan moved to Mozilla Research as a staff research engineer to work on Servo, a new web rendering engine in the under-construction programming language Rust.

For one of Alan’s projects at Mozilla, he took a highly under-specified part of the HTML specification about how web links and the back and forwards browser buttons should interact, created a formal model in Agda based on the existing specification, identified gaps in it as well as ways that major browsers did not match the model, then wrote it all up as a paper. Alan’s manager Josh Matthews recalls the editors of the HTML standard being taken aback by Alan’s work suddenly being dropped in their laps, but quickly appreciated how much more confidently they could make changes based on it.

Josh also recalled: “Similarly, any time other members of the team would talk about some aspect of the browser engine being safe as long as some property was upheld by the programmer, Alan would get twitchy. He had a soft spot for bad situations that looked like they could be made impossible by clever enough application of static types.”

In 2017 Alan made a rather surprising switch to working on augmented reality for the web, partly driven by internal politics at Mozilla. He took the lead on integrating Servo into the Magic Leap headset; the existing browser was clunky, the only way to interact with pages being via an awkward touchpad on the hand controller. This was not good enough for Alan: after implementing the same behaviour for Servo and finding it frustrating, he had several email exchanges with the Magic Leap developers, figured out how to access some interfaces that weren’t technically public but also were not actually private, and soon he proudly showed off a more natural laser pointer-style means of interacting with pages in augmented reality in Servo—to much acclaim from the team and testers.

Roblox, 2020–2024

Then in 2020, Mozilla’s funding stream got a lot more constrained, and Alan moved to the game platform company Roblox. Alan was a principal software engineer, and the language owner of Luau, “a fast, small, safe, gradually typed embeddable scripting language derived from Lua”, working on making the language easier to use, teach, and learn. Roblox supports more than two million “content creators”, mostly kids, creating millions of games a year; Alan’s goal was to empower them to build larger games with more characters.

The Luau product manager Bryan Nealer says that “people loved Alan”. Roblox colleagues appreciated his technical contributions: “Alan was meticulous in what he built and wrote at Roblox. He would stress not only the substance of his work, but also the presentation. His attention to detail inspired the rest of us!”; “One of the many wonderful things Alan did for us was to be the guy who could read the most abstruse academic research imaginable and translate it into something simple, useful, interesting, and even fun.” They also appreciated the more personal contributions: Alan led an internal paper reading group, meeting monthly to study some paper on programming or networking, but he also established the Roblox Book Club: “He was always thoughtful when discussing books, and challenged us to think about the text more deeply. He also had an encyclopedic knowledge of scifi. He recommended Iain M. Banks’s The Culture series to me, which has become my favorite scifi series. I think about him every time I pick up one of those books.”

Envoi

From my own perspective, one of the most impressive things about Alan is that he was impossible to pigeonhole: like Dr Who, he was continually regenerating. He explained to me that he got bored quickly with one area, and moved on to another. As well as his academic abilities, he was a talented and natural cartoonist: I still have a couple of the tiny fanzine comics he produced as a student.

Of course he did some serious science for his DPhil and later career: but he also took a strong interest in typography and typesetting. He digitized some beautiful Japanese crests for the chapter title pages of his DPhil dissertation. Alan dragged me into typography with him, a distraction I have enjoyed ever since. Among other projects, Alan and I produced a font containing some extra symbols so that we could use them in our papers, and named it St Mary's Road after our Oxford digs. And Alan produced a full blackboard bold font, complete with lowercase letters and punctuation: you can see some of it in the order of service. But Alan was not satisfied with merely creating these things; he went to all the trouble to package them up properly and get them included in standard software distributions, so that they would be available for everyone: Alan loved to build things for people to use. These two fonts are still in regular use 35 years later, and I'm sure they will be reminding us of him for a long time to come.

by jeremygibbons at November 05, 2024 01:13 PM

GHC Developer Blog

GHC 9.12.1-alpha2 is now available

Zubin Duggal - 2024-11-05

The GHC developers are very pleased to announce the availability of the second alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC's release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 05, 2024 12:00 AM

November 04, 2024

in Code

Functors to Monads: A Story of Shapes

For many years now I’ve been using a mental model and intuition that has guided me well for understanding and teaching and using functors, applicatives, monads, and other related Haskell abstractions, as well as for approaching learning new ones. Sometimes when teaching Haskell I talk about this concept and assume everyone already has heard it, but I realize that it’s something universal yet easy to miss depending on how you’re learning it. So, here it is: how I understand the Functor and other related abstractions and free constructions in Haskell.

The crux is this: instead of thinking about what fmap changes, ask: what does fmap keep constant?

This isn’t a rigorous understanding and isn’t going to explain every aspect about every Functor, and will probably only be useful if you already know a little bit about Functors in Haskell. But it’s a nice intuition trick that has yet to majorly mislead me.

The Secret of Functors

First of all, what is a Functor? A capital-F Functor, that is, the Haskell typeclass and abstraction. Ask a random Haskeller on the street and they’ll tell you that it’s something that can be “mapped over”, like a list or an optional. Maybe some of those random Haskellers will feel compelled to mention that this mapping should follow some laws…they might even list the laws. Ask them why these laws are so important and maybe you’ll spend a bit of time on this rhetorical street of Haskellers before finding one confident enough to give an answer.

So I’m going to make a bit of a tautological leap: a Functor gives you a way to “map over” values in a way that preserves shape. And what is “shape”? A shape is the thing that fmap preserves.

The Functor typeclass is simple enough: for Functor f, you have a function fmap :: (a -> b) -> f a -> f b, along with fmap id = id and fmap f . fmap g = fmap (f . g). Cute things you can drop into quickcheck to prove for your instance, but it seems like those laws are hiding some sort of deeper, fundamental truth.
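
As an aside, here is a minimal QuickCheck sketch of those two laws, specialized to the list instance (the property names and the concrete functions are made up for illustration):

import Test.QuickCheck

-- fmap id == id, checked at the list instance
prop_fmapIdentity :: [Int] -> Bool
prop_fmapIdentity xs = fmap id xs == id xs

-- fmap (f . g) == fmap f . fmap g, checked for two concrete functions
prop_fmapComposition :: [Int] -> Bool
prop_fmapComposition xs =
  fmap ((+ 1) . (* 2)) xs == (fmap (+ 1) . fmap (* 2)) xs

main :: IO ()
main = do
  quickCheck prop_fmapIdentity
  quickCheck prop_fmapComposition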

The more Functors you learn about, the more you see that fmap seems to always preserve “something”:

  • For lists, fmap preserves length and relative orderings.
  • For optionals (Maybe), fmap preserves presence (the fact that something is there or not). It cannot flip a Just to a Nothing or vice versa.
  • For Either e, fmap preserves the error (if it exists) or the fact that it was successful.
  • For Map k, fmap preserves the keys: which keys exist, how many there are, their relative orderings, etc.
  • For IO, fmap preserves the IO effect. Every bit of external I/O that an IO action represents is unchanged by an fmap, as well as exceptions.
  • For Writer w or (,) w, fmap preserves the “logged” w value, leaving it unchanged. Same for Const w.
  • For Tree, fmap preserves the tree structure: how many layers, how big they are, how deep they are, etc.
  • For State s, fmap preserves what happens to the input state s. How a State s transforms a state value s is unchanged by fmap.
  • For ConduitT i o m from conduit, fmap preserves what the conduit pulls upstream and what it yields downstream. fmap will not cause the conduit to yield more or different objects, nor cause it to consume/pull more or less.
  • For parser-combinator Parser, fmap preserves what input is consumed or would fail to be consumed. fmap cannot change whether an input string would fail or succeed, and it cannot change how much it consumes.
  • For optparse-applicative Parsers, fmap preserves the command line arguments available. It leaves the --help message of your program unchanged.

It seems like as soon as you define a Functor instance, or as soon as you find out that some type has a Functor instance, it magically induces some sort of … “thing” that must be preserved.1 A conserved quantity must exist. It reminds me a bit of Noether’s Theorem in Physics, where any continuous symmetry “induces” a conserved quantity (like how translation symmetry “causes” conservation of momentum). In Haskell, every lawful Functor instance induces a conserved quantity. I don’t know if there is a canonical name for this conserved quantity, but I like to call it “shape”.
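
To make a couple of those conserved quantities concrete, here is a quick GHCi check, using length for lists and isJust for Maybe:

ghci> length (fmap negate [1,2,3])
3
ghci> import Data.Maybe (isJust)
ghci> isJust (fmap negate (Just 5))
True
ghci> isJust (fmap negate Nothing)
False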

A Story of Shapes

The word “shape” is chosen to be as devoid of external baggage/meaning as possible while still having some. The word isn’t important as much as saying that there is some “thing” preserved by fmap, and not exactly the nature of that “thing”. The nature of that thing changes a lot from Functor to Functor, where we might better call it an “effect” or a “structure” specifically, but that some “thing” exists is almost universal.

Of course, the value of this “thing” having a canonical name at all is debatable. If I were to coin a completely new term I might call it a “conserved charge” or “gauge” in allusion to physics. But the most useful name probably would be shape.

For some Functor instances, the word shape is more literal than others. For trees, for instance, you have the literal shape of the tree preserved. For lists, the “length” could be considered a literal shape. Map k’s shape is also fairly literal: it describes the structure of keys that exist in the map. But for Writer w and Const w, shape can be interpreted as some information outside of the values you are mapping that is left unchanged by mapping. For Maybe and Either e shape also considers if there has been any short-circuiting. For State s and IO and Parser, “shape” involves some sort of side-computation or consumption that is left unchanged by fmap, often called an effect. For optparse-applicative, “shape” involves some sort of inspectable and observable static aspects of a program. “Shape” comes in all forms.

But, this intuition of “looking for that conserved quantity” is very helpful for learning new Functors. If you stumble onto a new type that you know is a Functor instance, you can immediately ask “What shape is this fmap preserving?”, and it will almost always yield insight into that type.

This viewpoint also sheds light on why Set.map isn’t a good candidate for fmap for Data.Set: What “thing” does Set.map f preserve? Not size, for sure. In a hypothetical world where we had ordfmap :: Ord b => (a -> b) -> f a -> f b, we would still need Set.map to preserve something for it to be useful as an “Ord-restricted Functor”.2
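
A quick GHCi session makes the point: Set.map does not even preserve size, let alone any finer structure.

ghci> import qualified Data.Set as Set
ghci> Set.size (Set.fromList [1,2,3,4])
4
ghci> Set.size (Set.map (`mod` 2) (Set.fromList [1,2,3,4]))
2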

A Result

Before we move on, let’s look at another related and vague concept that is commonly used when discussing functors: fmap is a way to map a function that preserves the shape and changes the result.

If shape is the thing that is preserved by fmap, result is the thing that is changed by it. fmap cleanly splits the two.

Interestingly, most introductions to Functors begin by describing functor values as having a result and fmap as the thing that changes it, in some way. Ironically, though it’s a more common term, it’s by far the more vague and hard-to-intuit concept.

For something like Maybe, “result” is easy enough: it’s the value present if it exists. For parser-combinator Parsers too it’s relatively simple: the “shape” is the input consumed but the “result” is the Haskell value you get as a result of the consumption. For an optparse-applicative parser, it’s the actual parsed command line arguments given by the user at runtime. But sometimes it’s more complicated: for the technical List functor, the “non-determinism” functor, the “shape” is the number of options to choose from and the order you get them in, and the “result” (to use precise semantics) is the non-deterministic choice that you eventually pick or iterate over.

So, the “result” can become a bit confusing to generalize. In my mind, I usually reduce the definitions to:

  • Shape: the “thing” that fmap preserves: the f in f a
  • Result: the “thing” that fmap changes: the a in f a

With this you could “derive” the Functor laws:

  • fmap id == id: fmap leaves the shape unchanged, id leaves the result unchanged. So entire thing must remain unchanged!
  • fmap f . fmap g == fmap (f . g). In both cases the shape remains unchanged, but one changes the result by f after g, and the other changes the result by f . g. They must be the same transformation!

All neat and clean, right? So, maybe the big misdirection is focusing too much on the “result” when learning Functors, when we should really be focusing more on the “shape”, or at least the two together.

Once you internalize “Functor gives you shape-preservation”, this helps you understand the value of the other common typeclass abstractions in Haskell as well, and how they function based on how they manipulate “shape” and “result”.

Traversable

For example, what does the Traversable typeclass give us? Well, if Functor gives us a way to map pure functions and preserve shape, then Traversable gives us a way to map effectful functions and preserve shape.

Whenever someone asks me about my favorite Traversable instance, I always say it’s the Map k traversable:

traverse :: Applicative f => (a -> f b) -> Map k a -> f (Map k b)

Notice how it has no constraints on k? Amazing, isn’t it? It lets us map an (a -> f b) over the values at each key in a map and collect the results under the key the a was originally under.

In essence, you can be assured that the result map has the same keys as the original map, perfectly preserving the “shape” of the map. The Map k instance is the epitome of beautiful Traversable instances. We can recognize this by identifying the “shape” that traverse is forced to preserve.
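
Here is a small GHCi sketch of that key preservation, using a made-up validate function that fails on negative values:

ghci> import qualified Data.Map as Map
ghci> let validate v = if v >= 0 then Just v else Nothing
ghci> traverse validate (Map.fromList [("a",1),("b",2)])
Just (fromList [("a",1),("b",2)])
ghci> traverse validate (Map.fromList [("a",1),("b",-2)])
Nothing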

Applicative

What does the Applicative typeclass give us? It has ap and pure, but its laws are infamously difficult to understand.

But, look at liftA2 (,):

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

It lets us take “two things” and combine their shapes. And, more importantly, it combines the shapes without considering the results.

  • For Writer w, <*> lets us combine the two logged values using mappend while ignoring the actual a/b results.
  • For list, <*> (the cartesian product) lets us multiply the lengths of the input lists together. The length of the new list ignores the actual contents of the list.
  • For State s, <*> lets you compose the s -> s state functions together, ignoring the a/bs.
  • For Parser, <*> lets you sequence input consumption in a way that doesn’t depend on the actual values you parse: it’s “context-free” in a sense, aside from some caveats.
  • For optparse-applicative, <*> lets you combine your command line argument specs together, without depending on the actual values provided at runtime by the caller.

The key takeaway is that the “final shape” only depends on the input shapes, and not the results. You can know the length of <*>-ing two lists together with only knowing the length of the input lists, and you can also know the relative ordering of inputs to outputs. Within the specific context of the semantics of IO, you can know what “effect” <*>-ing two IO actions would produce only knowing the effects of the input IO actions3. You can know what command line arguments <*>-ing two optparse-applicative parsers would have only knowing the command line arguments in the input parsers. You can know what strings <*>-ing two parser-combinator parsers would consume or reject, based only on the consumption/rejection of the input parsers. You can know the final log of <*>-ing two Writer w actions together by only knowing the logs of the input writer actions.

And hey…some of these combinations feel “monoidal”, don’t they?

  • Writer w sequences using mappend
  • List lengths sequence by multiplication
  • State s functions sequence by composition

You can also imagine “no-op” actions:

  • Writer w’s no-op action would log mempty, the identity of mappend
  • List’s no-op action would have a length 1, the identity of multiplication
  • State s’s no-op action would be id, the identity of function composition

That might sound familiar — these are all pure from the Applicative typeclass!

So, the Applicative typeclass laws aren’t that mysterious at all. If you understand the “shape” that a Functor induces, Applicative gives you a monoid on that shape! This is why Applicative is often called the “higher-kinded” Monoid.

This intuition takes you pretty far, I believe. Look at the examples above where we clearly identify specific Applicative instances with specific Monoid instances (Monoid w, Monoid (Product Int), Monoid (Endo s)).

Put in code:

-- A part of list's shape is its length and the monoid is (*, 1)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1

-- Maybe's shape is isJust and the monoid is (&&, True)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True

-- State's shape is execState and the monoid is (flip (.), id)
execState (sx <*> sy) == execState sy . execState sx
execState (pure r) == id

-- Writer's shape is execWriter and the monoid is (<>, mempty)
execWriter (wx <*> wy) == execWriter wx <> execWriter wy
execWriter (pure r) == mempty

We can also extend this to non-standard Applicative instances: the ZipList newtype wrapper gives us an Applicative instance for lists where <*> is zipWith. These two have the same Functor instances, so their “shape” (length) is the same. And for both the normal Applicative and the ZipList Applicative, you can know the length of the result based on the lengths of the input, but ZipList combines shapes using the Min monoid, instead of the Product monoid. And the identity of Min is positive infinity, so pure for ZipList is an infinite list.

-- A part of ZipList's shape is length and its monoid is (min, infinity)
length (xs <*> ys) == length xs `min` length ys
length (pure r) == infinity

The “know-the-shape-without-knowing-the-results” property is actually leveraged by many libraries. It’s how optparse-applicative can give you --help output: the shape of the optparse-applicative parser (the command line arguments list) can be computed without knowing the results (the actual arguments themselves at runtime). You can list out what arguments are expecting without ever getting any input from the user.
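
As a rough sketch of that idea (the program and its options below are invented for illustration, not taken from any real project), the parser’s shape fully determines its --help text before the user supplies anything:

import Options.Applicative

data Opts = Opts { name :: String, loud :: Bool }

-- The "shape" of this parser (its options and help strings) is known
-- statically, independent of whatever the user eventually passes in.
opts :: Parser Opts
opts =
  Opts
    <$> strOption (long "name" <> metavar "NAME" <> help "Who to greet")
    <*> switch (long "loud" <> help "Shout the greeting")

main :: IO ()
main = do
  o <- execParser (info (opts <**> helper) fullDesc)
  putStrLn ((if loud o then "HELLO, " else "hello, ") <> name o)

Running this with --help lists --name and --loud (with their help strings) without the parser ever producing a “result”.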

This is also leveraged by the async library to give us the Concurrently Applicative instance. Normally <*> for IO gives us sequential combination of IO effects. But, <*> for Concurrently gives us parallel combination of IO effects. We can launch all of the IO effects in parallel at the same time because we know what the IO effects are before we actually have to execute them to get the results. If we needed to know the results, this wouldn’t be possible.
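For instance, here is a minimal sketch using Concurrently from the async package; fetchA and fetchB stand in for any two independent IO actions:

import Control.Concurrent.Async (Concurrently (..))

-- run two IO actions at the same time and pair up their results
fetchBoth :: IO String -> IO String -> IO (String, String)
fetchBoth fetchA fetchB =
  runConcurrently $ (,) <$> Concurrently fetchA <*> Concurrently fetchB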

This also gives some insight into the Backwards Applicative wrapper — because the final shape does not depend on the result of either input, we are free to combine the shapes in whatever order we want. In the same way that every monoid gives rise to a “backwards” monoid:

ghci> "hello" <> "world"
"helloworld"
ghci> getDual $ Dual "hello" <> Dual "world"
"worldhello"

Every Applicative gives rise to a “backwards” Applicative that does the shape “mappending” in reverse order:

ghci> putStrLn "hello" *> putStrLn "world"
hello
world
ghci> forwards $ Backwards (putStrLn "hello") *> Backwards (putStrLn "world")
world
hello

The monoidal nature of Applicative with regards to shapes and effects is the heart of the original intent, and I’ve discussed this in earlier blog posts.

Alternative

The main function of the Alternative typeclass is <|>:

(<|>) :: Alternative f => f a -> f a -> f a

At first this might look a lot like <*> or liftA2 (,):

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

Both of them take two f a values and squish them into a single one. Both of these are also monoidal on the shape, independent of the result. The shapes, however, combine under a different monoid for <|> than for <*>:

-- A part of list's shape is its length:
-- the Ap monoid is (*, 1), the Alt monoid is (+, 0)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1
length (xs <|> ys) == length xs + length ys
length empty == 0

-- Maybe's shape is isJust:
-- The Ap monoid is (&&, True), the Alt monoid is (||, False)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True
isJust (mx <|> my) == isJust mx || isJust my
isJust empty == False

If we understand that functors have a “shape”, Applicative implies that the shapes are monoidal, and Alternative implies that the shapes are a “double-monoid”. The exact nature of how the two monoids relate to each other, however, is not universally agreed upon. For many instances it does happen to form a semiring, where empty “annihilates” via empty <*> x == empty, and <*> distributes over <|> like (x <|> y) <*> z == (x <*> z) <|> (y <*> z). But this is not universal.
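For example, a quick ghci check of those two identities for lists (assuming Control.Applicative is imported for empty and <|>):

ghci> import Control.Applicative
ghci> (empty :: [Int -> Int]) <*> [1,2,3]
[]
ghci> ([(+1)] <|> [(*10)]) <*> [5,6]
[6,7,50,60]
ghci> ([(+1)] <*> [5,6]) <|> ([(*10)] <*> [5,6])
[6,7,50,60]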

However, what does Alternative bring to our shape/result dichotomy that Applicative did not? Notice the subtle difference between the two:

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
(<|>) :: Alternative f => f a -> f a -> f a

For Applicative, the “result” comes from the results of both inputs. For Alternative, the “result” could come from one or the other input. So, this introduces a fundamental data dependency for the results:

  • Applicative: Shapes merge monoidally independent of the results, but to get the result of the final, you need to produce the results of both of the two inputs in the general case.
  • Alternative: Shapes merge monoidally independent of the results, but to get the result of the final, you need the results of one or the other input in the general case.

This also implies that the choice of combination method for shapes in Applicative vs Alternative isn’t arbitrary: the former has to be “conjoint” in a sense, and the latter has to be “disjoint”.
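A quick ghci illustration with Maybe: <*> needs both results (a Nothing on either side wipes out the result), while <|> only needs one of them (again assuming Control.Applicative is imported for <|>):

ghci> Just (+1) <*> (Nothing :: Maybe Int)
Nothing
ghci> Just 1 <|> (Nothing :: Maybe Int)
Just 1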

See again that clearly separating the shape and the result gives us the vocabulary to say precisely what the different data dependencies are.

Monad

Understanding shapes and results also helps us appreciate the sheer power that Monad gives us. Look at >>=:

(>>=) :: Monad m => m a -> (a -> m b) -> m b

Using >>= means that the shape of the final action is allowed to depend on the result of the first action! We are no longer in the Applicative/Alternative world where shape only depends on shape.

Now we can write things like:

greet :: IO ()
greet = do
  putStrLn "What is your name?"
  n <- getLine
  putStrLn ("Hello, " ++ n ++ "!")

Remember that for “IO”, the shape is the IO effects (in this case, what exactly gets sent to the terminal) and the “result” is the Haskell value computed from the execution of that IO effect. In our case, the action of the result (what values are printed) depends on the result of the intermediate action (the getLine). You can no longer know in advance what action the program will have without actually running it and getting the results.

The same thing happens when you start sequencing parser-combinator parsers: you can’t know what counts as a valid parse or how much a parser will consume until you actually start parsing and getting your intermediate parse results.
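As a concrete sketch, here is a tiny ReadP parser (ReadP ships with base) whose consumption depends on an intermediate result: it reads a digit n and then consumes exactly n more characters, so you can’t know how much it will consume without running it:

import Data.Char (digitToInt, isDigit)
import Text.ParserCombinators.ReadP

-- consume a digit n, then exactly n more characters
lengthPrefixed :: ReadP String
lengthPrefixed = do
  n <- digitToInt <$> satisfy isDigit
  count n get

ghci> readP_to_S lengthPrefixed "3abcdef"
[("abc","def")]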

Monad is also what makes guard and co. useful. Consider the purely Applicative:

evenProducts :: [Int] -> [Int] -> [Bool]
evenProducts xs ys = (\x y -> even (x * y)) <$> xs <*> ys

If you passed in a list of 100 items and a list of 200 items, you can know that the result has 100 * 200 = 20000 items, without actually knowing any of the items in the list.

But, consider an alternative formulation where we are allowed to use Monad operations:

evenProducts :: [Int] -> [Int] -> [(Int, Int)]
evenProducts xs ys = do
  x <- xs
  y <- ys
  guard (even (x * y))
  pure (x, y)

Now, even if you knew the lengths of the input lists, you cannot know the length of the output list without actually knowing what’s inside your lists. You need to actually start “sampling”.
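A quick check, using the monadic evenProducts above: both calls below get inputs of length 2 and 2, but the output lengths differ depending on the values inside the lists:

ghci> length (evenProducts [1,2] [3,4])
3
ghci> length (evenProducts [1,3] [5,7])
0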

That’s why there is no Monad instance for Backwards or for optparse-applicative parsers. Backwards doesn’t work because we’ve now introduced an asymmetry (the m b depends on the a of the m a) that can’t be reversed. For optparse-applicative, it’s because we want to be able to inspect the shape without knowing the results at runtime (so we can show a useful --help without getting any actual arguments): but, with Monad, we can’t know the shape without knowing the results!

In a way, Monad simply “is” the way to combine Functor shapes together where the final shape is allowed to depend on the results. Hah, I tricked you into reading a monad tutorial!

Free Structures

I definitely write way too much about free structures on this blog. But this “shapeful” way of thinking also sheds light on why free structures are so compelling and interesting to work with in Haskell.

Before, we were describing shapes of Functors and Applicatives and Monads that already existed. We had this Functor, what was its shape?

However, what if we had a shape that we had in mind, and wanted to create an Applicative or Monad that manipulated that shape?

For example, let’s roll our own version of optparse-applicative that only supports --myflag somestring options. We could say that the “shape” is the list of supported options and their parsers. So a single element of this shape would be the specification of a single option:

data Option a = Option { optionName :: String, optionParse :: String -> Maybe a }
  deriving Functor

The “shape” here is the name and also what values it would parse, essentially. fmap won’t affect the name of the option and won’t affect what would succeed or fail.

Now, to create a full-fledged multi-argument parser, we can use Ap from the free library:

type Parser = Ap Option

We specified the shape we wanted, now we get the Applicative of that shape for free! We can now combine our shapes monoidally using the <*> instance, and then use runAp_ to inspect it:

data Args = Args { myStringOpt :: String, myIntOpt :: Int }

parseTwo :: Parser Args
parseTwo = Args <$> liftAp stringOpt <*> liftAp intOpt
  where
    stringOpt = Option "string-opt" Just
    intOpt = Option "int-opt" readMaybe

getAllOptions :: Parser a -> [String]
getAllOptions = runAp_ (\o -> [optionName o])

ghci> getAllOptions parseTwo
["string-opt", "int-opt"]

Remember that Applicative is like a “monoid” for shapes, so Ap gives you a free “monoid” on your custom shape: you can now create list-like “sequences” of your shape that merge via concatenation through <*>. You can also know that fmap on Ap Option will not add or remove options: it’ll leave the actual options unchanged. It’ll also not affect what options would fail or succeed to parse.
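A quick check of that last claim, reusing parseTwo from above (fmap can’t add or remove options):

ghci> getAllOptions (fmap myStringOpt parseTwo)
["string-opt", "int-opt"]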

You could write a parser combinator library this way too! Remember that the “shape” of a parser combinator Parser is the string that it consumes or rejects. The single element might be a parser that consumes or rejects a single Char:

newtype Single a = Single { satisfies :: Char -> Maybe a }
  deriving Functor

The “shape” is whether or not it consumes or rejects a char. Notice that fmap for this cannot change whether or not a char is rejected or accepted: it can only change the Haskell result value a. fmap can’t flip the Maybe between Just and Nothing.

Now we can create a full monadic parser combinator library by using Free from the free library:

type Parser = Free Single

Again, we specified the shape we wanted, and now we have a Monad for that shape! For more information on using this, I’ve written a blog post in the past. Ap gives you a free “monoid” on your shapes, but in a way Free gives you a “tree” for your shapes, where the sequence of shapes depends on which way you go down their results. And, again, fmap won’t ever change what would or would not be parsed.
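For instance, here is a sketch of primitive parsers built by lifting Single into the free monad (assuming liftF from Control.Monad.Free in the free package):

-- lift a single-character check into the free monad
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = liftF (Single (\c -> if p c then Just c else Nothing))

char :: Char -> Parser Char
char c = satisfy (== c)

anyChar :: Parser Char
anyChar = satisfy (const True)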

How do we know what free structure to pick? Well, we ask questions about what we want to be able to do with our shape. If we want to inspect the shape without knowing the results, we’d use the free Applicative or free Alternative. As discussed earlier, using the free Applicative means that our final result must require producing all of the input results, but using the free Alternative means it doesn’t. If we wanted to allow the shape to depend on the results (like for a context-sensitive parser), we’d use the free Monad. Understanding the concept of the “shape” makes this choice very intuitive.

The Shape of You

Next time you encounter a new Functor, I hope these insights can be useful. Ask yourself, what is fmap preserving? What is fmap changing? And from there, its secrets will unfold before you. Emmy Noether would be proud.

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)


  1. There are some exceptions, especially degenerate cases like Writer () aka Identity which add no meaningful structure. So for these this mental model isn’t that useful.↩︎

  2. Incidentally, Set.map does preserve one thing: non-emptiness. You can’t Set.map an empty set into a non-empty one and vice versa. So, maybe if we recontextualized Set as a “search for at least one result” Functor or Monad where you could only ever observe a single value, Set.map would work for Ord-restricted versions of those abstractions, assuming lawful Ord instances.↩︎

  3. That is, if we take the sum consideration of all input-output with the outside world, independent of what happens within the Haskell results, we can say the combination of effects is deterministic.↩︎

by Justin Le at November 04, 2024 07:44 PM

November 03, 2024

Haskell Interlude

57: Gabriele Keller

Gabriele Keller, professor at Utrecht University, is interviewed by Andres and Joachim. We follow her journey around the world as well as through programming languages, learn why Haskell is the best environment for embedding languages, how the desire to implement parallel programming sparked the development of type families in Haskell, and that teaching functional programming works better with graphics.

by Haskell Podcast at November 03, 2024 08:00 PM

November 02, 2024

Brent Yorgey

Competitive Programming in Haskell: Union-Find

Posted on November 2, 2024

Union-find

A union-find data structure (also known as a disjoint set data structure) keeps track of a collection of disjoint sets, typically with elements drawn from \(\{0, \dots, n-1\}\). For example, we might have the sets

\(\{1,3\}, \{0, 4, 2\}, \{5, 6, 7\}\)

A union-find structure must support three basic operations:

  • We can \(\mathit{create}\) a union-find structure with \(n\) singleton sets \(\{0\}\) through \(\{n-1\}\). (Alternatively, we could support two operations: creating an empty union-find structure, and adding a new singleton set; occasionally this more fine-grained approach is useful, but we will stick with the simpler \(\mathit{create}\) API for now.)

  • We can \(\mathit{find}\) a given \(x \in \{0, \dots, n-1\}\), returning some sort of “name” for the set \(x\) is in. It doesn’t matter what these names are; the only thing that matters is that for any \(x\) and \(y\), \(\mathit{find}(x) = \mathit{find}(y)\) if and only if \(x\) and \(y\) are in the same set. The most important application of \(\mathit{find}\) is therefore to check whether two given elements are in the same set or not.

  • We can \(\mathit{union}\) two elements, so the sets that contain them become one set. For example, if we \(\mathit{union}(2,5)\) then we would have

    \(\{1,3\}, \{0, 4, 2, 5, 6, 7\}\)

Note that \(\mathit{union}\) is a one-way operation: once two sets have been unioned together, there’s no way to split them apart again. (If both merging and splitting are required, one can use a link/cut tree, which is very cool—and possibly something I will write about in the future—but much more complex.) However, these three operations are enough for union-find structures to have a large number of interesting applications!

In addition, we can annotate each set with a value taken from some commutative semigroup. When creating a new union-find structure, we must specify the starting value for each singleton set; when unioning two sets, we combine their annotations via the semigroup operation.

  • For example, we could annotate each set with its size; singleton sets always start out with size 1, and every time we union two sets we add their sizes.
  • We could also annotate each set with the sum, product, maximum, or minimum of all its elements.
  • Of course there are many more exotic examples as well.

We typically use a commutative semigroup, as in the examples above; this guarantees that a given set always has a single well-defined annotation value, regardless of the sequence of union-find operations that were used to create it. However, we can actually use any binary operation at all (i.e. any magma), in which case the annotations on a set may reflect the precise tree of calls to \(\mathit{union}\) that were used to construct it; this can occasionally be useful.

  • For example, we could annotate each set with a list of values, and combine annotations using list concatenation; the order of elements in the list associated to a given set will depend on the order of arguments to \(\mathit{union}\).

  • We could also annotate each set with a binary tree storing values at the leaves. Each singleton set is annotated with a single leaf; to combine two trees we create a new branch node with the two trees as its children. Then each set ends up annotated with the precise tree of calls to \(\mathit{union}\) that were used to create it.

Implementing union-find

My implementation is based on one by Kwang Yul Seo, but I have modified it quite a bit. The code is also available in my comprog-hs repository. This blog post is not intended to be a comprehensive union-find tutorial, but I will explain some things as we go.

{-# LANGUAGE RecordWildCards #-}

module UnionFind where

import Control.Monad (when)
import Control.Monad.ST
import Data.Array.ST

Let’s start with the definition of the UnionFind type itself. UnionFind has two type parameters: s is a phantom type parameter used to limit the scope to a given ST computation; m is the type of the arbitrary annotations. Note that the elements are also sometimes called “nodes”, since, as we will see, they are organized into trees.

type Node = Int
data UnionFind s m = UnionFind {

The basic idea is to maintain three mappings:

  • First, each element is mapped to a parent (another element). There are no cycles, except that some elements can be their own parent. This means that the elements form a forest of rooted trees, with the self-parenting elements as roots. We store the parent mapping as an STUArray (see here for another post where we used STUArray) for efficiency.
  parent :: !(STUArray s Node Node),
  • Each element is also mapped to a size. We maintain the invariant that for any element which is a root (i.e. any element which is its own parent), we store the size of the tree rooted at that element. The size associated to other, non-root elements does not matter.

    (Many implementations store the height of each tree instead of the size, but it does not make much practical difference, and the size seems more generally useful.)

  sz :: !(STUArray s Node Int),
  • Finally, we map each element to a custom annotation value; again, we only care about the annotation values for root nodes.
  ann :: !(STArray s Node m) }

To \(\mathit{create}\) a new union-find structure, we need a size and a function mapping each element to an initial annotation value. Every element starts as its own parent, with a size of 1. For convenience, we can also make a variant of createWith that gives every element the same constant annotation value.

createWith :: Int -> (Node -> m) -> ST s (UnionFind s m)
createWith n m =
  UnionFind
    <$> newListArray (0, n - 1) [0 .. n - 1]    -- Every node is its own parent
    <*> newArray (0, n - 1) 1                   -- Every node has size 1
    <*> newListArray (0, n - 1) (map m [0 .. n - 1])

create :: Int -> m -> ST s (UnionFind s m)
create n m = createWith n (const m)

To perform a \(\mathit{find}\) operation, we keep following parent references up the tree until reaching a root. We can also do a cool optimization known as path compression: after finding a root, we can directly update the parent of every node along the path we just traversed to be the root. This means \(\mathit{find}\) can be very efficient, since it tends to create trees that are extremely wide and shallow.

find :: UnionFind s m -> Node -> ST s Node
find uf@(UnionFind {..}) x = do
  p <- readArray parent x
  if p /= x
    then do
      r <- find uf p
      writeArray parent x r
      pure r
    else pure x

connected :: UnionFind s m -> Node -> Node -> ST s Bool
connected uf x y = (==) <$> find uf x <*> find uf y

Finally, to implement \(\mathit{union}\), we find the roots of the given nodes; if they are not the same we make the root with the smaller tree the child of the other root, combining sizes and annotations as appropriate.

union :: Semigroup m => UnionFind s m -> Node -> Node -> ST s ()
union uf@(UnionFind {..}) x y = do
  x <- find uf x
  y <- find uf y
  when (x /= y) $ do
    sx <- readArray sz x
    sy <- readArray sz y
    mx <- readArray ann x
    my <- readArray ann y
    if sx < sy
      then do
        writeArray parent x y
        writeArray sz y (sx + sy)
        writeArray ann y (mx <> my)
      else do
        writeArray parent y x
        writeArray sz x (sx + sy)
        writeArray ann x (mx <> my)

Note the trick of writing x <- find uf x: this looks kind of like an imperative statement that updates the value of a mutable variable x, but really it just makes a new variable x which shadows the old one.

Finally, a few utility functions. First, one to get the size of the set containing a given node:

size :: UnionFind s m -> Node -> ST s Int
size uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray sz x

Also, we can provide functions to update and fetch the custom annotation value associated to the set containing a given node.

updateAnn :: Semigroup m => UnionFind s m -> Node -> m -> ST s ()
updateAnn uf@(UnionFind {..}) x m = do
  x <- find uf x
  old <- readArray ann x
  writeArray ann x (old <> m)
  -- We could use modifyArray above, but the version of the standard library
  -- installed on Kattis doesn't have it

getAnn :: UnionFind s m -> Node -> ST s m
getAnn uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray ann x
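As a small usage sketch (not from the original post), here is one way to count the connected components of an n-node graph with the API above, using a trivial () annotation:

import Control.Monad.ST (runST)
import Data.List (nub)

-- count connected components of an n-node graph given its edges
components :: Int -> [(Node, Node)] -> Int
components n edges = runST $ do
  uf <- create n ()
  mapM_ (uncurry (union uf)) edges
  length . nub <$> mapM (find uf) [0 .. n - 1]

For example, components 5 [(0,1),(1,2),(3,4)] evaluates to 2.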

Challenge

Here are a couple of problems I challenge you to solve for next time:


by Brent Yorgey at November 02, 2024 12:00 AM

October 31, 2024

Abhinav Sarkar

Going REPLing with Haskeline

So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a REPL! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.

This post was originally published on abhinavsarkar.net.

The Demo

First a short demo:

[Play demo — interactive recording available in the original post]

That is a pretty good REPL, isn’t it? You can even try it online1, running entirely in your browser.

Dawn of a New Language

Let’s assume that we have created a new small Lisp2, just large enough to be able to conveniently write and run the Fibonacci function that returns the nth Fibonacci number. That’s it, nothing more. This lets us focus on the features of the REPL3, not the language.

We have a parser to parse the code from text to an AST, and an interpreter that evaluates an AST and returns a value. We are not going into the details of the parser and the interpreter, just listing the type signatures of the functions they provide is enough for this post.

Let’s start with the AST:

module Language.FiboLisp.Types where

import Data.Text qualified as Text
import Data.Text.Lazy qualified as LText
import Text.Pretty.Simple qualified as PS
import Text.Printf (printf)

type Ident = String

data Expr
  = Num_ Integer
  | Bool_ Bool
  | Var Ident
  | BinaryOp Op Expr Expr
  | If Expr Expr Expr
  | Apply Ident [Expr]
  deriving (Show)

data Op = Add | Sub | LessThan
  deriving (Show, Enum)

data Def = Def {defName :: Ident, defParams :: [Ident], defBody :: Expr}

data Program = Program [Def] [Expr]
  deriving (Show)

carKeywords :: [String]
carKeywords = ["def", "if", "+", "-", "<"]

instance Show Def where
  show Def {..} =
    printf "(Def %s [%s] (%s))" defName (unwords defParams) (show defBody)

showProgram :: Program -> String
showProgram =
  Text.unpack
    . LText.toStrict
    . PS.pShowOpt
      ( PS.defaultOutputOptionsNoColor
          { PS.outputOptionsIndentAmount = 2,
            PS.outputOptionsCompact = True,
            PS.outputOptionsCompactParens = True
          }
      )

That’s right! We named our little language FiboLisp.

FiboLisp is expression oriented; everything is an expression. So naturally, we have an Expr AST. Writing the Fibonacci function does not require many syntactic facilities. In FiboLisp we have:

  • integer numbers,
  • booleans,
  • variables,
  • addition, subtraction, and less-than binary operations on numbers,
  • conditional if expressions, and
  • function calls by name.

We also have function definitions, captured by Def, which records the function name, its parameter names, and its body as an expression.

And finally we have Programs, which are a bunch of function definitions to define, and another bunch of expressions to evaluate.

Short and simple. We don’t need anything more4. This is how the Fibonacci function looks in FiboLisp:

(def fibo [n]
  (if (< n 2)
    n
    (+ (fibo (- n 1)) (fibo (- n 2)))))

We can see all the AST types in use here. Note that FiboLisp is lexically scoped.

The module also lists a bunch of keywords (carKeywords) that can appear in the car5 position of a Lisp expression, that we use later for auto-completion in the REPL, and some functions to convert the AST types to nice looking strings.

For the parser, we have this pared-down code:

module Language.FiboLisp.Parser (ParsingError(..), parse) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import GHC.Generics (Generic)
import Language.FiboLisp.Types

parse :: String -> Either ParsingError Program

data ParsingError = ParsingError String | EndOfStreamError
  deriving (Show, Generic, NFData)

instance Exception ParsingError

The essential function is parse, which takes the code as a string, and returns either a ParsingError on failure, or a Program on success. If the parser detects that an S-expression is not properly closed, it returns an EndOfStreamError error.

We also have this pretty-printer module that converts function ASTs back to pretty Lisp code:

module Language.FiboLisp.Printer (prettyShowDef) where

import Language.FiboLisp.Types

prettyShowDef :: Def -> String

Finally, the last thing before we hit the real topic of this post, the FiboLisp interpreter:

module Language.FiboLisp.Interpreter
  (Value, RuntimeError, interpret, builtinFuncs, builtinVals) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import Data.Map.Strict qualified as Map
import GHC.Generics (Generic)
import Language.FiboLisp.Types

interpret :: (String -> IO ()) -> Program -> IO (Either RuntimeError Value)

newtype RuntimeError = RuntimeError String
  deriving (Show, Generic, NFData)

instance Exception RuntimeError

data Value = ...
  deriving (Show, Generic, NFData)

builtinFuncs :: Map.Map String Value

builtinVals :: [Value]

We have elided the details again. All that matters to us is the interpret function that takes a program, and returns either a runtime error or a value. Value is the runtime representation of the values of FiboLisp expressions, and all we care about is that it can be shown and fully evaluated via NFData6. interpret also takes a String -> IO () function, that’ll be demystified when we get into implementing the REPL.

Lastly, we have a map of built-in functions and a list of built-in values. We expose them so that they can be treated specially in the REPL.

If you want, you can go ahead and fill in the missing code using your favourite parsing and pretty-printing libraries7, and the method of writing interpreters. For this post, those implementation details are not necessary.

Let’s package all this functionality into a module for ease of importing:

module Language.FiboLisp
  ( module Language.FiboLisp.Types,
    module Language.FiboLisp.Parser,
    module Language.FiboLisp.Printer,
    module Language.FiboLisp.Interpreter,
  )
where

import Language.FiboLisp.Interpreter
import Language.FiboLisp.Parser
import Language.FiboLisp.Printer
import Language.FiboLisp.Types

Now, with all the preparations done, we can go REPLing.

A REPL of Our Own

The main functionality that a REPL provides is entering expressions and definitions, one at a time, that it Reads, Evaluates, and Prints, and then Loops back, letting us do the same again. This can be accomplished with a simple program that prompts the user for an input and does all these with it. However, such a REPL will be quite lackluster.

These days programming languages come with advanced REPLs like IPython and nREPL, which provide many functionalities beyond simple REPLing. We want FiboLisp to have a great REPL too.

You may have already noticed some advanced features that our REPL provides in the demo. Let’s state them here:

  1. Commands starting with colon:
    1. to set and unset settings: :set and :unset,
    2. to load files into the REPL: :load,
    3. to show the source code of functions: :source,
    4. to show a help message: :help.
  2. Settings to enable/disable:
    1. dumping of parsed ASTs: dump,
    2. showing program execution times: time.
  3. Multiline expressions and functions, with correct indentation.
  4. Colored output and messages.
  5. Auto-completion of commands, code and file names.
  6. Safety checks when loading files.
  7. Readline-like navigation through the history of previous inputs.

Haskeline — the Haskell library that we use to create the REPL — provides only basic functionalities, upon which we build to provide these features. Let’s begin.
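For contrast, here is roughly what a bare-bones Haskeline loop looks like before we add any of the features above (a sketch using only defaultSettings and getInputLine; it just echoes input back):

import System.Console.Haskeline

basicRepl :: IO ()
basicRepl = runInputT defaultSettings loop
  where
    loop :: InputT IO ()
    loop = do
      mInput <- getInputLine "> "
      case mInput of
        Nothing -> outputStrLn "Goodbye."
        Just input -> outputStrLn ("You said: " <> input) >> loop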

State and Settings

As usual, we start the module with many imports8:

{-# LANGUAGE TemplateHaskell #-}

module Language.FiboLisp.Repl (run) where

import Control.DeepSeq qualified as DS
import Control.Exception (Exception (..), evaluate)
import Control.Lens.Basic qualified as Lens
import Control.Monad (when)
import Control.Monad.Catch qualified as Catch
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Identity (IdentityT (..))
import Control.Monad.Reader (MonadReader, ReaderT (runReaderT))
import Control.Monad.Reader qualified as Reader
import Control.Monad.State.Strict (MonadState, StateT (runStateT))
import Control.Monad.State.Strict qualified as State
import Control.Monad.Trans (MonadTrans, lift)
import Data.Char qualified as Char
import Data.Functor ((<&>))
import Data.List
  (dropWhileEnd, foldl', isPrefixOf, isSuffixOf, nub, sort, stripPrefix)
import Data.Map.Strict qualified as Map
import Data.Maybe (fromJust)
import Data.Set qualified as Set
import Data.Time (NominalDiffTime, diffUTCTime, getCurrentTime)
import Language.FiboLisp qualified as L
import System.Console.Haskeline qualified as H
import System.Console.Terminfo qualified as Term
import System.Directory (canonicalizePath, doesFileExist, getCurrentDirectory)

Notice that we import the previously shown Language.FiboLisp module qualified as L, and Haskeline as H. Another important library that we use here is terminfo, which helps us do colored output.

A REPL must preserve the context through a session. In the case of FiboLisp, this means we should be able to define a function9 as one input, and then use it later in the session, one or many times10. The REPL should also respect the REPL settings through the session till they are unset.

Additionally, the REPL has to remember whether it is in the middle of a multiline input. To support multiline input, the REPL also needs to remember the previous indentation, and the input entered on the previous lines of the multiline input. Together these form the ReplState:

data ReplState = ReplState
  { _replDefs :: Defs,
    _replSettings :: Settings,
    _replLineMode :: LineMode,
    _replIndent :: Int,
    _replSeenInput :: String
  }

type Defs = Map.Map L.Ident L.Def
type Settings = Set.Set Setting
data Setting = Dump | MeasureTime deriving (Eq, Ord, Enum)
data LineMode = SingleLine | MultiLine deriving (Eq)

instance Show Setting where
  show = \case
    Dump -> "dump"
    MeasureTime -> "time"

Let’s deal with settings first. We set and unset settings using the :set and :unset commands. So, we write the code to parse setting the settings:

data SettingMode = Set | Unset deriving (Eq, Enum)

instance Show SettingMode where
  show = \case
    Set -> ":set"
    Unset -> ":unset"

parseSetting :: String -> Maybe Setting
parseSetting = \case
  "dump" -> Just Dump
  "time" -> Just MeasureTime
  _ -> Nothing

parseSettingMode :: String -> Maybe SettingMode
parseSettingMode = \case
  ":set" -> Just Set
  ":unset" -> Just Unset
  _ -> Nothing

parseSettingCommand :: String -> Either String (SettingMode, Setting)
parseSettingCommand command = case words command of
  [modeStr, settingStr] -> case parseSettingMode modeStr of
    Just mode -> case parseSetting settingStr of
      Just setting -> Right (mode, setting)
      Nothing -> Left $ "Unknown setting: " <> settingStr
    Nothing -> Left $ "Unknown command: " <> command
  [modeStr]
    | Just _ <- parseSettingMode modeStr -> Left "No setting specified"
  _ -> Left $ "Unknown command: " <> command

Nothing fancy here, just splitting the input into words and going through them to make sure they are valid.

The REPL is a monad that wraps over ReplState:

newtype Repl a = Repl
  { runRepl_ :: StateT ReplState (ReaderT AddColor IO) a
  }
  deriving
    ( Functor,
      Applicative,
      Monad,
      MonadIO,
      MonadState ReplState,
      MonadReader AddColor,
      Catch.MonadThrow,
      Catch.MonadCatch,
      Catch.MonadMask
    )

type AddColor = Term.Color -> String -> String

runRepl :: AddColor -> Repl a -> IO a
runRepl addColor =
  fmap fst
    . flip runReaderT addColor
    . flip runStateT (ReplState Map.empty Set.empty SingleLine 0 "")
    . runRepl_

Repl also lets us do IO — is it really a REPL if you can’t do printing — and deal with exceptions. Additionally, we have a read-only state that is a function, which will be explained soon. The REPL starts in the single line mode, with no indentation, function definitions, settings, or previously seen input.

REPLing Down the Prompt

Let’s go top-down. We write the run function that is the entry point of this module:

run :: IO ()
run = do
  term <- Term.setupTermFromEnv
  let addColor =
        case Term.getCapability term $ Term.withForegroundColor @String of
          Just fc -> fc
          Nothing -> \_ s -> s
  runRepl addColor . H.runInputT settings $ do
    H.outputStrLn $ addColor promptColor "FiboLisp REPL"
    H.outputStrLn $ addColor infoColor "Press <TAB> to start"
    repl
  where
    settings =
      H.setComplete doCompletions $
        H.defaultSettings {H.historyFile = Just ".fibolisp"}

This sets up Haskeline to run our REPL using the functions we provide in the later sections: repl and doCompletions. This also demystifies the read-only state of the REPL: a function that adds colors to our output strings, depending on the capabilities of the terminal in which our REPL is running. We also set up a history file to remember the previous REPL inputs.

When the REPL starts, we output some messages in nice colors, which are defined as:

promptColor, printColor, outputColor, errorColor, infoColor :: Term.Color
promptColor = Term.Green
printColor = Term.White
outputColor = Term.Green
errorColor = Term.Red
infoColor = Term.Cyan

Off we go repling now:

type Prompt = H.InputT Repl

repl :: Prompt ()
repl = do
  replLineMode .= SingleLine
  replIndent .= 0
  replSeenInput .= ""
  Catch.handle (\H.Interrupt -> repl) . H.withInterrupt $
    readInput >>= \case
      EndOfInput -> outputWithColor promptColor "Goodbye."
      input -> evalAndPrint input >> repl

outputWithColor :: Term.Color -> String -> Prompt ()
outputWithColor color text = do
  addColor <- getAddColor
  H.outputStrLn $ addColor color text

getAddColor :: Prompt AddColor
getAddColor = lift Reader.ask

We infuse our Repl with the powers of Haskeline by wrapping it with Haskeline’s InputT monad transformer, and call it the Prompt type. In the repl function, we readInput, evalAndPrint it, and repl again.

We also deal with the user quitting the REPL (the EndOfInput case), and hitting Ctrl + C to interrupt typing or a running evaluation (the handling for H.Interrupt).

Wait a minute! What is that imperative looking .= doing in our Haskell code? That’s right, we are looking through some lenses!

type Lens' s a = Lens.Lens s s a a

replDefs :: Lens' ReplState Defs
replDefs = $(Lens.field '_replDefs)

replSettings :: Lens' ReplState Settings
replSettings = $(Lens.field '_replSettings)

replLineMode :: Lens' ReplState LineMode
replLineMode = $(Lens.field '_replLineMode)

replIndent :: Lens' ReplState Int
replIndent = $(Lens.field '_replIndent)

replSeenInput :: Lens' ReplState String
replSeenInput = $(Lens.field '_replSeenInput)

use :: (MonadTrans t, MonadState s m) => Lens' s a -> t m a
use l = lift . State.gets $ Lens.view l

(.=) :: (MonadTrans t, MonadState s m) => Lens' s a -> a -> t m ()
l .= a = lift . State.modify' $ Lens.set l a

(%=) :: (MonadTrans t, MonadState s m) => Lens' s a -> (a -> a) -> t m ()
l %= f = lift . State.modify' $ Lens.over l f

If you’ve never encountered lenses before, you can think of them as pairs of setters and getters. The repl* lenses above are for setting and getting the corresponding fields from the ReplState data type11. The use, .=, and %= functions are for getting, setting and modifying respectively the state in the State monad using lenses. We see them in action at the beginning of the repl function when we use .= to set the various fields of ReplState to their initial values in the State monad.

All that is left now is actually reading the input, evaluating it and printing the results.

Reading the Input

Haskeline gives us functions to read the user’s input as text. However, being Haskellers, we prefer some structure around it:

data Input
  = Setting (SettingMode, Setting)
  | Load FilePath
  | Source String
  | Help
  | Program L.Program
  | BadInputError String
  | EndOfInput

We’ve got all previously mentioned cases covered with the Input data type. We also do some input validation and capture errors for the failure cases with the BadInputError constructor. EndOfInput is used for when the user quits the REPL.

Here is how we read the input:

readInput :: Prompt Input
readInput = do
  addColor <- getAddColor
  lineMode <- use replLineMode
  prevIndent <- use replIndent

  let promptSym = case lineMode of SingleLine -> "λ"; _ -> "|"
      prompt = addColor promptColor $ promptSym <> "> "

  mInput <- H.getInputLineWithInitial prompt (replicate prevIndent ' ', "")
  let currentIndent = maybe 0 (length . takeWhile (== ' ')) mInput

  case trimStart . trimEnd <$> mInput of
    Nothing -> return EndOfInput
    Just input | null input -> do
      replIndent .= case lineMode of
        SingleLine -> prevIndent
        MultiLine -> currentIndent
      readInput
    Just input@(':' : _) -> parseCommand input
    Just input -> parseCode input currentIndent

trimStart :: String -> String
trimStart = dropWhile Char.isSpace

trimEnd :: String -> String
trimEnd = dropWhileEnd Char.isSpace

We use the getInputLineWithInitial function provided by Haskeline to show a prompt and read the user’s input as a string. The prompt shown depends on the LineMode of the REPL state. In the SingleLine mode we show λ>, whereas in the MultiLine mode we show |>.

If there is no input, that means the user has quit the REPL. In that case we return EndOfInput, which is handled in the repl function. If the input is empty, we read more input, preserving the previous indentation (prevIndent) in the MultiLine mode.

If the input starts with :, we parse it for various commands:

parseCommand :: String -> Prompt Input
parseCommand input
  | ":help" `isPrefixOf` input = return Help
  | ":load" `isPrefixOf` input =
      checkFilePath . trimStart . fromJust $ stripPrefix ":load" input
  | ":source" `isPrefixOf` input = do
      return . Source . trimStart . fromJust $ stripPrefix ":source" input
  | input == ":" = return $ BadInputError "No command specified"
  | otherwise = case parseSettingCommand input of
      Right setting -> return $ Setting setting
      Left err -> return $ BadInputError err

checkFilePath :: String -> Prompt Input
checkFilePath file
  | null file = return $ BadInputError "No file specified"
  | otherwise =
      isSafeFilePath file <&> \case
        True -> Load file
        False -> BadInputError $ "Cannot access file: " <> file

isSafeFilePath :: (MonadIO m) => FilePath -> m Bool
isSafeFilePath fp =
  liftIO $ isPrefixOf <$> getCurrentDirectory <*> canonicalizePath fp

The :help and :source cases are straightforward. In case of :load, we make sure to check that the file asked to be loaded is located somewhere inside the current directory of the REPL or its recursive subdirectories. Otherwise, we deny loading by returning a BadInputError. We parse the settings using the parseSettingCommand function we wrote earlier.

If the input is not a command, we parse it as code:

parseCode :: String -> Int -> Prompt Input
parseCode currentInput indent = do
  seenInput <- use replSeenInput
  let input = seenInput <> " " <> currentInput
  case L.parse input of
    Left L.EndOfStreamError -> do
      replLineMode .= MultiLine
      replIndent .= indent
      replSeenInput .= input
      readInput
    Left err ->
      return $ BadInputError $ "ERROR: " <> displayException err
    Right program -> return $ Program program

We append the previously seen input (in case of multiline input) with the current input and parse it using the parse function provided by the Language.FiboLisp module. If parsing fails with an EndOfStreamError, it means that the input is incomplete. In that case, we set the REPL line mode to MultiLine, the REPL indentation to the current indentation, and the seen input to the previously seen input appended with the current input, and then read more input. If it is some other error, we return a BadInputError with it.

If the result of parsing is a program, we return it as a Program input.

That’s it for reading the user input. Next, we evaluate it.

Evaluating the Input

Recall that the repl function calls the evalAndPrint function with the read input:

evalAndPrint :: Input -> Prompt ()
evalAndPrint = \case
  EndOfInput -> return ()
  BadInputError err -> outputWithColor errorColor err
  Help -> H.outputStr helpMessage
  Setting (Set, setting) -> replSettings %= Set.insert setting
  Setting (Unset, setting) -> replSettings %= Set.delete setting
  Source ident -> showSource ident
  Load fp -> loadAndEvalFile fp
  Program program -> interpretAndPrint program
  where
    helpMessage =
      unlines
        [ "Available commands",
          ":set/:unset dump       Dumps the program AST",
          ":set/:unset time       Shows the program execution time",
          ":load <file>           Loads a source file",
          ":source <func_name>    Prints the source code of a function",
          ":help                  Shows this help"
        ]

The cases of EndOfInput, BadInputError and Help are straightforward. For settings, we insert or remove the setting from the REPL settings, depending on it being set or unset. For the other cases, we call the respective helper functions.

For a :source command, we check if the requested identifier maps to a user-defined or builtin function, and if so, print its source. Otherwise we print an error.

showSource :: L.Ident -> Prompt ()
showSource ident = do
  defs <- use replDefs
  case Map.lookup ident defs of
    Just def -> outputWithColor infoColor $ L.prettyShowDef def
    Nothing -> case Map.lookup ident L.builtinFuncs of
      Just func -> outputWithColor infoColor $ show func
      Nothing ->
        outputWithColor errorColor $ "No such function: " <> ident

For a :load command, we check if the requested file exists. If so, we read and parse it, and interpret the resultant program. In case of any errors in reading or parsing the file, we catch and print them.

loadAndEvalFile :: FilePath -> Prompt ()
loadAndEvalFile fp =
  liftIO (doesFileExist fp) >>= \case
    False -> outputWithColor errorColor $ "No such file: " <> fp
    True -> Catch.handleAll outputError $ do
      code <- liftIO $ readFile fp
      outputWithColor infoColor $ "Loaded " <> fp
      case L.parse code of
        Left err -> outputError err
        Right program -> interpretAndPrint program

outputError :: (Exception e) => e -> Prompt ()
outputError err =
  outputWithColor errorColor $ "ERROR: " <> displayException err

Finally, we come to the workhorse of the REPL: the interpretation of the user provided program:

interpretAndPrint :: L.Program -> Prompt ()
interpretAndPrint (L.Program pDefs exprs) =
  Catch.handleAll outputError $ do
    defs <- use replDefs
    settings <- use replSettings

    let defs' =
          foldl' (\ds d -> Map.insert (L.defName d) d ds) defs pDefs
        program = L.Program (Map.elems defs') exprs
    when (Dump `Set.member` settings) $
      outputWithColor infoColor (L.showProgram program)

    addColor <- getAddColor
    extPrint <- H.getExternalPrint

    (execTime, val) <- liftIO . measureElapsedTime $ do
      val <- L.interpret (extPrint . addColor printColor) program
      evaluate $ DS.force val

    case val of
      Left err -> outputError err
      Right v -> do
        let output = show v
        if null output
          then return ()
          else outputWithColor outputColor $ "=> " <> output

    when (MeasureTime `Set.member` settings) $
      outputWithColor infoColor $
        "(Execution time: " <> show execTime <> ")"

    replDefs .= defs'

measureElapsedTime :: IO a -> IO (NominalDiffTime, a)
measureElapsedTime f = do
  start <- getCurrentTime
  ret <- f
  end <- getCurrentTime
  return (diffUTCTime end start, ret)

We start by merging the user-defined functions in the current input with the functions previously defined in the session, such that the current functions override previous functions with the same names. At this point, if the dump setting is set, we print the program AST.

Then we invoke the interpret function provided by the Language.FiboLisp module. Recall that the interpret function takes the program to interpret and a function of type String -> IO (). This function is a color-adding wrapper over the function returned by the Haskeline function getExternalPrint12, which allows non-REPL code to safely print to the Haskeline-driven REPL without garbling the output. We pass it to interpret so that the interpreter can invoke it when the user code calls the builtin print function or similar.

We make sure to force and evaluate the value returned by the interpreter so that any lazy values or errors are fully evaluated13, and the measured elapsed time is correct.

If the interpreter returns an error, we print it. Otherwise we convert the value to a string, and if it is not empty14, we print it.

Finally, we print the execution time if the time setting is set, and set the REPL defs to the current program defs.

That’s all! We have completed our REPL. But wait, I think we forgot one thing …

Doing the Completions

The REPL would work fine with this much code, but it would not be a good experience for the user, because they’d have to type everything without any help from the REPL. To make it convenient for the user, we provide contextual auto-completion functionality while typing. Haskeline lets us plug in our custom completion logic by setting a completion function, which we did way back at the start. Now we need to implement it.

doCompletions :: H.CompletionFunc Repl
doCompletions =
  fmap runIdentityT . H.completeWordWithPrev Nothing " " $ \leftRev word -> do
    defs <- use replDefs
    lineMode <- use replLineMode
    settings <- use replSettings
    let funcs = nub $ Map.keys defs <> Map.keys L.builtinFuncs
        vals = map show L.builtinVals
    case (word, lineMode) of
      ('(' : rest, _) ->
        pure
          [ H.Completion ('(' : hint) hint True
            | hint <- nub . sort $ L.carKeywords <> funcs,
              rest `isPrefixOf` hint
          ]
      (_, SingleLine) -> case word of
        "" | null leftRev ->
          pure [H.Completion "" s True | s <- commands <> funcs <> vals]
        ':' : _ | null leftRev ->
          pure [H.simpleCompletion c | c <- commands, word `isPrefixOf` c]
        _
          | "tes:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `notElem` settings, word `isPrefixOf` show s
              ]
          | "tesnu:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `elem` settings, word `isPrefixOf` show s
              ]
          | "daol:" `isSuffixOf` leftRev ->
            isSafeFilePath word >>= \case
              True -> H.listFiles word
              False -> pure []
          | "ecruos:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion ident
                | ident <- funcs,
                  ident `Map.notMember` L.builtinFuncs,
                  word `isPrefixOf` ident
              ]
          | otherwise ->
            pure [H.simpleCompletion c | c <- funcs <> vals, word `isPrefixOf` c]
      _ -> pure []
  where
    commands = ":help" : ":load" : ":source" : map show [Set ..]

Haskeline provides us the completeWordWithPrev function to easily create our own completion function. It takes a callback function that it calls with the current word being completed (the word immediately to the left of the cursor), and the content of the line before that word (to the left of the word), reversed. That is why the code above matches suffixes like "tes:", "daol:" and "ecruos:" (they are ":set", ":load" and ":source" reversed). We use these to return different completion lists of strings.

Going case by case:

  1. If the word starts with (, it means we are in the middle of writing FiboLisp code. So we return the carKeywords and the user-defined and builtin function names that start with the current word sans the initial (. This happens regardless of the current line mode. The rest of the cases below apply only in the SingleLine mode.
  2. If the entire line is empty, we return the names of all commands, functions, and builtin values.
  3. If the word starts with :, and is at the beginning of the line, we return the commands that start with the word.
  4. If the line starts with
    1. :set, we return the not set settings
    2. :unset, we return the set settings
    3. :load, we return the names of the files and directories in the current directory
    4. :source, we return the names of the user-defined functions
    that start with the word.
  5. Otherwise we return no completions.

This covers all cases, and provides helpful completions, while avoiding bad ones. And this completes the implementation of our wonderful REPL.

Conclusion

I wrote this REPL while implementing a Lisp that I wrote15 while going through the Essentials of Compilation book, which I thoroughly recommend for getting started with compilers. It started as a basic REPL, and gathered a lot of nice functionalities over time. So I decided to extract and share it here. I hope that this Haskeline tutorial helps you in creating beautiful and useful REPLs. Here is the complete code for the REPL.


  1. The online demo is rather slow to load and to run, and works only on Firefox and Chrome. Even though I managed to put it together somehow, I don’t actually know how it exactly works, and I’m unable to fix the issues with it.↩︎

  2. Lisps are awesome and I absolutely recommend creating one or more of them as an amateur PL implementer. Some resources I recommend are: the Build Your Own Lisp book, and the Make-A-Lisp tutorial.↩︎

  3. REPLs are wonderful for doing interactive and exploratory programming where you try out small snippets of code in the REPL, and put your program together piece-by-piece. They are also good for debugging because they let you inspect the state of running programs from within. I still fondly remember the experience of connecting (or jacking in) to running productions systems written in Clojure over REPL, and figuring out issues by dumping variables.↩︎

  4. We don’t even need let. We can, and have to, define variables by creating functions, with parameters serving the role of variables. In fact, we can’t even assign or reassign variables. Functions are the only scoping mechanism in FiboLisp, much like old-school JavaScript with its IIFEs.↩︎

  5. car is obviously Contents of the Address part of the Register, the first expression in a list form in a Lisp.↩︎

  6. You may be wondering about why we need the NFData instances for the errors and values. This will become clear when we write the REPL.↩︎

  7. I recommend the sexp-grammar library, which provides both parsing and printing facilities for S-expressions based languages. Or you can write something by yourself using the parsing and pretty-printing libraries like megaparsec and prettyprinter.↩︎

  8. We assume that our project’s Cabal file sets the default-language to GHC2021, and the default-extensions to LambdaCase, OverloadedStrings, RecordWildCards, and StrictData.↩︎

  9. Recall that there is no way to define variables in FiboLisp.↩︎

  10. If the interpreter allows mutually recursive function definitions, functions can be called before defining them.↩︎

  11. We are using the basic-lens library here, which is the tiniest lens library, and provides only the five functions and types we see used here.↩︎

  12. Using the function returned from getExternalPrint is not necessary in our case because the REPL blocks when it invokes the interpreter. That means, nothing but the interpreter can print anything while it is running. So the interpreter can actually print directly to stdout and nothing will go wrong.

    However, imagine a case in which our code starts a background thread that needs to print to the REPL. In such case, we must use the Haskeline provided print function instead of printing directly. When printing to the REPL using it, Haskeline coordinates the prints so that the output in the terminal is not garbled.↩︎

  13. Now we see why we derive NFData instances for errors and Value.↩︎

  14. Returned value could be of type void with no textual representation, in which case we would not print it.↩︎

  15. I wrote the original REPL code almost three years ago. I refactored, rewrote and improved a lot of it in the course of writing this post. As they say, writing is thinking.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at October 31, 2024 12:00 AM

October 25, 2024

Derek Elkins

Classical First-Order Logic from the Perspective of Categorical Logic

Introduction

Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.

This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.

My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean first-order hyperdoctrine”. Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggests a distinct axis along which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.

Boolean

The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary first-order hyperdoctrines would give us a form of intuitionistic first-order logic.

It is fairly rare that the categories categorists are interested in are Boolean. For example, most toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the topos. For example, a category of sheaves on a topological space is Boolean if and only if that space is a Stone space. These are certainly interesting spaces, but they are also totally disconnected unlike virtually every non-discrete topological space one would typically mention.

First-Order

The next term is the term “first-order”. As the name suggests, a first-order hyperdoctrine has the necessary structure to interpret first-order logic. The question, then, is what kind of categories have this structure and only this structure. The answer, as far as I’m aware, is not many.

Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have additional structure as well that it seems odd to ignore. The most notable and interesting example is toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically do support the necessary structure for first-order logic, the natural morphisms of Grothendieck toposes, geometric morphisms, do not preserve that structure, unlike the logical functors between elementary toposes.

The natural internal logic for Grothendieck toposes turns out to be geometric logic. This is a logic that lacks universal quantification and implication (and thus negation) but does have infinitary disjunction. This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely related logics are regular logic and coherent logic which are sub-logics of both geometric logic and first-order logic.

We see, then, just from the examples of the natural logics of toposes, that none of them are first-order logic, and that we get examples that are more powerful, less powerful, and incomparable to first-order logic. Other common classes of categories give other natural logics, such as cartesian logic from left exact categories, while monoidal categories give rise to (ordered) linear logics. We get the simply typed lambda calculus from cartesian closed categories, which leads to the next topic.

Hyperdoctrine

A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it a poset of “predicates” on that object. In many cases, this takes the form of the Sub functor assigning to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure on the source category, these posets, and/or the functor itself to interpret various logical connectives. For example, a regular hyperdoctrine requires the source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise to monotonic functions with left adjoints satisfying certain properties. This notion of hyperdoctrines is suitable for regular logic.
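
To make the shape of this data concrete, here is a rough Haskell sketch (my own illustration, not from the article; it elides the adjointness laws and the Frobenius and Beck-Chevalley conditions a real definition requires) of a posetal, regular-flavoured hyperdoctrine over a base category of types and functions:

{-# LANGUAGE RankNTypes #-}

-- One fibre of "predicates" over a type a: a poset with finite meets.
data Fibre pred a = Fibre
  { entails :: pred a -> pred a -> Bool  -- the partial order of the fibre
  , top     :: pred a
  , conj    :: pred a -> pred a -> pred a
  }

-- A posetal hyperdoctrine (regular flavour): a fibre of predicates for each
-- type, substitution ("reindexing") along functions, and existential
-- quantification along projections, which should be left adjoint to
-- substitution along fst.
data Hyperdoctrine pred = Hyperdoctrine
  { fibre  :: forall a. Fibre pred a
  , subst  :: forall a b. (b -> a) -> pred a -> pred b
  , exists :: forall a b. pred (a, b) -> pred a
  }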

It’s very easy to recognize that these functors are essentially indexed |(0,1)|-categories. This immediately suggests that we should consider higher categorical versions or at the very least normal indexed categories.

What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now have potentially multiple ways a “predicate” could “entail” another “predicate”. We can present the simply typed lambda calculus in this indexed category manner. This naturally leads/connects to the categorical semantics of type theories.

Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to talk about an entailment holding for distinct but “equivalent” reasons.

Summary

Moving in all three of these directions simultaneously leads pretty naturally to something like Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory aimed at being an internal language for |(\infty, 1)|-toposes.

Why Classical FOL?

Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a higher-order logic wasn’t considered at the time.

Classical versus Intuitionistic logic was debated at the time, but it was primarily a philosophical argument, and the defense of Intuitionism was not very compelling (to me, and evidently to people at the time). The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.

Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two main approaches: Henkin semantics and full (or standard) semantics. Henkin semantics keeps the nice properties of (classical) FOL but fails to gain the distinctive benefits, namely the categoricity properties, of second-order logic. This isn’t surprising, as Henkin semantics can be encoded into first-order logic; it’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate sorts is given by power sets of (cartesian products of) the domain1. This leads to massive completeness problems, as our metalogical set theory has many, many ways of building subsets of the domain. There are metatheoretic results stating that there is no computable set of logical axioms that would give us a sound and complete theory for second-order logic with respect to full semantics. This aspect is also philosophically problematic, because we don’t want to need set theory to understand the very formulation of set theory. Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.

On the more positive and (meta-)mathematical side, we have results like Lindström’s theorem which states that classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem and compactness. There’s also a syntactic result by Lindström which characterizes first-order logic as the only logic having a recursively enumerable set of tautologies and satisfying Löwenheim-Skolem2.

The Catch

There’s one big caveat to the above. All of the above results are formulated in traditional model theory which means there are various assumptions built in to their statements. In the language of categorical logic, these assumptions can basically be summed up in the statement that the only category of semantics that traditional model theory considers is Set.

This is an utterly bizarre thing to do from the standpoint of categorical logic.

The issues with full semantics follow directly from this choice. If, as categorical logic would have us do, we considered every category with sufficient structure as a potential category of semantics, then our theory would not be forced to follow every nook and cranny of Set’s notion of subset to be complete. Valid formulas would need to be true not only in Set but in wildly different categories, e.g. every (Boolean) topos.

These traditional results are also often very specific to classical FOL. Dropping this constraint of classical logic would lead to an even broader class of models.

Categorical Perspective on Classical First-Order Logic

A Boolean category is just a coherent category where every subobject has a complement. Since coherent functors preserve complements, we have that the category of Boolean categories is a full subcategory of the category of coherent categories.

One nice thing about, specifically, classical first-order logic from the perspective of category theory is the following. First, coherent logic is the fragment of geometric logic restricted to finitary disjunction. Via Morleyization, we can encode classical first-order logic into coherent logic such that the categories of models of each are equivalent. This implies that a classical FOL formula is valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using the tools of classifying toposes. On the one hand, this once again suggests the importance of coherent logic, but on the other it means that we can use categorical tools with classical FOL.

Conclusion

There are certain things that I and, I believe, most logicians take as table stakes for a (foundational) logic3. For example, checking a proof should be computably decidable. For these reasons, I am in complete accord with early (formal) logicians that classical second-order logic with full semantics is an unacceptably worse alternative to classical first-order logic.

However, when it comes to statements about the specialness of FOL, a lot of them seem to be more statements about traditional model theory than FOL itself, and also statements about the philosophical predilections of the time. I feel that philosophical attitudes among logicians and mathematicians have shifted a decent amount since the beginning of the 20th century. We have different philosophical predilections today than then, but they are informed by another hundred years of thought, and they are more relevant to what is being done today.

Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical and metalogical justifications. I mention this to point out actual cases of foundational frameworks that a (very) superficial reading of traditional model theory results would seem to have “ruled out”. Even if one thinks that FOL+ZFC (or whatever) is the better foundation, I think it is unreasonable to assert that MLTT derivatives are unworkable as foundations.


  1. It’s worth mentioning that this is exactly what categorical logic would suggest: our syntactic power objects should be mapped to semantic power objects.↩︎

  2. While nice, it’s not clear that compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for classical FOL, but it doesn’t shut the door on considering alternatives even in the context of traditional model theory.↩︎

  3. I’m totally fine thinking about logics that lack these properties, but I would never put any of them forward as an acceptable foundational logic.↩︎

October 25, 2024 12:55 AM

October 17, 2024

Tweag I/O

Introducing rules_gcs

At Tweag, we are constantly striving to improve the developer experience by contributing tools and utilities that streamline workflows. We recently completed a project with IMAX, where we learned that they had developed a way to simplify and optimize the process of integrating Google Cloud Storage (GCS) with Bazel. Seeing value in this tool for the broader community, we decided to publish it together under an open source license. In this blog post, we’ll dive into the features, installation, and usage of rules_gcs, and how it provides you with access to private resources.

What is rules_gcs?

rules_gcs is a Bazel ruleset that facilitates the downloading of files from Google Cloud Storage. It is designed to be a drop-in replacement for Bazel’s http_file and http_archive rules, with features that make it particularly suited for GCS. With rules_gcs, you can efficiently fetch large amounts of data, leverage Bazel’s repository cache, and handle private GCS buckets with ease.

Key Features

  • Drop-in Replacement: rules_gcs provides gcs_file and gcs_archive rules that can directly replace http_file and http_archive. They take a gs://bucket_name/object_name URL and internally translate this to an HTTPS URL. This makes it easy to transition to GCS-specific rules without major changes to your existing Bazel setup.

  • Lazy Fetching with gcs_bucket: For projects that require downloading multiple objects from a GCS bucket, rules_gcs includes a gcs_bucket module extension. This feature allows for lazy fetching, meaning objects are only downloaded as needed, which can save time and bandwidth, especially in large-scale projects.

  • Private Bucket Support: Accessing private GCS buckets is seamlessly handled by rules_gcs. The ruleset supports credential management through a credential helper, ensuring secure access without the need to hardcode credentials or use gsutil for downloading.

  • Bazel’s Downloader Integration: rules_gcs uses Bazel’s built-in downloader and repository cache, optimizing the download process and ensuring that files are cached efficiently across builds, even across multiple Bazel workspaces on your local machine.

  • Small footprint: Apart from the gcloud CLI tool (for obtaining authentication tokens), rules_gcs requires no additional dependencies or Bazel modules. This minimalistic approach reduces setup complexity and potential conflicts with other tools.

Understanding Bazel Repositories and Efficient Object Fetching with rules_gcs

Before we dive into the specifics of rules_gcs, it’s important to understand some key concepts about Bazel repositories and repository rules, as well as the challenges of efficiently managing large collections of objects from a Google Cloud Storage (GCS) bucket.

Bazel Repositories and Repository Rules

In Bazel, external dependencies are managed using repositories, which are declared in your WORKSPACE or MODULE.bazel file. Each repository corresponds to a package of code, binaries, or other resources that Bazel fetches and makes available for your build. Repository rules, such as http_archive or git_repository, and module extensions define how Bazel should download and prepare these external dependencies.

However, when dealing with a large number of objects, such as files stored in a GCS bucket, using a single repository to download all objects can be highly inefficient. This is because Bazel’s repository rules typically operate in an “eager” manner—they fetch all the specified files as soon as any target of the repository is needed. For large buckets, this means downloading potentially gigabytes of data even if only a few files are actually needed for the build. This eager fetching can lead to unnecessary network usage, increased build times, and larger disk footprints.

The rules_gcs Approach: Lazy Fetching with a Hub Repository

rules_gcs addresses this inefficiency by introducing a more granular approach to downloading objects from GCS. Instead of downloading all objects at once into a single repository, rules_gcs uses a module extension that creates a “hub” repository, which then manages individual sub-repositories for each GCS object.

How It Works
  1. Hub Repository: The hub repository acts as a central point of reference, containing metadata about the individual GCS objects. This follows the “hub-and-spoke” paradigm with a central repository (the bucket) containing references to a large number of small repositories for each object. This architecture is commonly used by Bazel module extensions to manage dependencies for different language ecosystems (including Python and Rust).

  2. Individual Repositories per GCS Object: For each GCS object specified in the lockfile, rules_gcs creates a separate repository using the gcs_file rule. This allows Bazel to fetch each object lazily—downloading only the files that are actually needed for the current build.

  3. Methods of Fetching: Users can choose between different methods in the gcs_bucket module extension. The default method of creating symlinks is efficient while preserving the file structure set in the lockfile. If you need to access objects as regular files, choose one of the other methods.

    • Symlink: Creates a symlink from the hub repo pointing to a file in its object repo, ensuring the object repo and symlink pointing to it are created only when the file is accessed.
    • Alias: Similar to symlink, but uses Bazel’s aliasing mechanism to reference the file. No files are created in the hub repo.
    • Copy: Creates a copy of a file in the hub repo when accessed.
    • Eager: Downloads all specified objects upfront into a single repository.

This modular approach is particularly beneficial for large-scale projects where only a subset of the data is needed for most builds. By fetching objects lazily, rules_gcs minimizes unnecessary data transfer and reduces build times.

Integrating with Bazel’s Credential Helper Protocol

Another critical aspect of rules_gcs is its seamless integration with Bazel’s credential management system. Accessing private GCS buckets securely requires proper authentication, and Bazel uses a credential helper protocol to handle this.

How Bazel’s Credential Helper Protocol Works

Bazel’s credential helper protocol is a mechanism that allows Bazel to fetch authentication credentials dynamically when accessing private resources, such as a GCS bucket. The protocol is designed to be simple and secure, ensuring that credentials are only used when necessary and are never hardcoded into build files.

When Bazel’s downloader prepares a request and a credential helper is configured, it invokes the credential helper with the command get. Additionally, the request URI is passed to the helper’s standard input encoded as JSON. The helper is expected to return a JSON object containing HTTP headers, including the necessary Authorization token, which Bazel will then include in its requests.

Here’s a breakdown of how the credential_helper script used in rules_gcs works:

  1. Authentication Token Retrieval: The script uses the gcloud CLI tool to obtain an access token via gcloud auth application-default print-access-token. This token is tied to the user’s current authentication context and can be used to fetch any objects the user is allowed to access.

  2. Output Format: The script outputs the token in a JSON format that Bazel can directly use:

    {
      "headers": {
        "Authorization": ["Bearer ${TOKEN}"]
      }
    }

    This JSON object includes the Authorization header, which Bazel uses to authenticate its requests to the GCS bucket.

  3. Integration with Bazel: To use this credential helper, you need to configure Bazel by specifying the helper in the .bazelrc file:

    common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper

    This line tells Bazel to use the specified credential_helper script whenever it needs to access resources from storage.googleapis.com. If a request returns an error code or unexpected content, credentials are invalidated and the helper is invoked again.
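
To illustrate the shape of this protocol, here is a hypothetical credential helper written as a small Haskell program (an illustrative sketch using the aeson and process packages, not the actual credential_helper script shipped with rules_gcs):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Aeson (encode, object, (.=))
import System.Process (readProcess)

main :: IO ()
main = do
  -- Bazel invokes the helper with the command "get" and writes the request
  -- (including the URI) as JSON to standard input.
  _request <- getContents
  -- Obtain a short-lived token from the user's existing gcloud credentials.
  token <- readProcess "gcloud" ["auth", "application-default", "print-access-token"] ""
  let bearer = "Bearer " ++ takeWhile (/= '\n') token
  -- Respond with the headers Bazel should attach to its request.
  BL.putStrLn (encode (object ["headers" .= object ["Authorization" .= [bearer]]]))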

How rules_gcs Hooks Into the Credential Helper Protocol

rules_gcs leverages this credential helper protocol to manage access to private GCS buckets securely and efficiently. By providing a pre-configured credential helper script, rules_gcs ensures that users can easily set up secure access without needing to manage tokens or authentication details manually.

Moreover, by limiting the scope of the credential helper to the GCS domain (storage.googleapis.com), rules_gcs reduces the risk of credentials being misused or accidentally exposed. The helper script is designed to be lightweight, relying on existing gcloud credentials, and integrates seamlessly into the Bazel build process.

Installing rules_gcs

Adding rules_gcs to your Bazel project is straightforward. The latest version is available on the Bazel Central Registry. To install, simply add the following to your MODULE.bazel file:

bazel_dep(name = "rules_gcs", version = "1.0.0")

You will also need to include the credential helper script in your repository:

mkdir -p tools
wget -O tools/credential-helper https://raw.githubusercontent.com/tweag/rules_gcs/main/tools/credential-helper
chmod +x tools/credential-helper

Next, configure Bazel to use the credential helper by adding the following lines to your .bazelrc:

common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
# optional setting to make rules_gcs more efficient
common --experimental_repository_cache_hardlinks

These settings ensure that Bazel uses the credential helper specifically for GCS requests. Additionally, the setting --experimental_repository_cache_hardlinks allows Bazel to hardlink files from the repository cache instead of copying them into a repository. This saves time and storage space, but requires the repository cache to be located on the same filesystem as the output base.

Using rules_gcs in Your Project

rules_gcs provides three primary rules: gcs_bucket, gcs_file, and gcs_archive. Here’s a quick overview of how to use each:

  • gcs_bucket: When dealing with multiple files from a GCS bucket, the gcs_bucket module extension offers a powerful and efficient way to manage these dependencies. You define the objects in a JSON lockfile, and gcs_bucket handles the rest.

    gcs_bucket = use_extension("@rules_gcs//gcs:extensions.bzl", "gcs_bucket")
    
    gcs_bucket.from_file(
        name = "trainingdata",
        bucket = "my_org_assets",
        lockfile = "@//:gcs_lock.json",
    )
  • gcs_file: Use this rule to download a single file from GCS. It’s particularly useful for pulling in assets or binaries needed during your build or test processes. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_file = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")
    
    gcs_file(
        name = "my_testdata",
        url = "gs://my_org_assets/testdata.bin",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    )
  • gcs_archive: This rule downloads and extracts an archive from GCS, making it ideal for pulling in entire repositories or libraries that your project depends on. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_archive = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_archive")
    
    gcs_archive(
        name = "magic",
        url = "gs://my_org_code/libmagic.tar.gz",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        build_file = "@//:magic.BUILD",
    )

Try it Out

rules_gcs is a versatile and simple solution for integrating Google Cloud Storage with Bazel. We invite you to try out rules_gcs in your projects and contribute to its development. As always, we welcome feedback and look forward to seeing how this tool enhances your workflows. Check out the full example to get started!

Thanks to IMAX for sharing their initial implementation of rules_gcs and allowing us to publish the code under an open source license.

October 17, 2024 12:00 AM

October 15, 2024

Philip Wadler

You can help Cards Against Humanity pay "blue leaning" nonvoters $100 to vote


How is this not illegal??? Cards Against Humanity is PAYING people who didn't vote in 2020 to apologize, make a voting plan, and post #DonaldTrumpIsAHumanToilet—up to $100 for blue-leaning people in swing states. I helped by getting a 2024 Election Pack: checkout.giveashit.lol. Spotted via BoingBoing. More info at The Register. (Only American citizens and residents can participate. If, like me, you are an American citizen but non-resident, you will need a VPN.)

by Philip Wadler (noreply@blogger.com) at October 15, 2024 08:11 AM

October 14, 2024

Edward Z. Yang

Tensor programming for databases, with first class dimensions

Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are somewhat similar to tables in a database, which raises a natural question: could you set up a tensor-like API to do the queries on databases that would normally be done with SQL? We have two challenges:

  • Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
  • JOINs in SQL can be thought of as performing outer joins, which is not a very common operation in tensor computation.

However, we have a secret weapon: first class dimensions were primarily designed as a new frontend syntax that made it easy to express einsum, batching and tensor indexing expressions. They might be good for SQL too.

Representing the database. First, how do we represent a database? A simple model, following columnar databases, is to have every column be a distinct 1D tensor, where all columns that are part of the same table share a consistent indexing scheme. For simplicity, we'll assume that we support rich dtypes for the tensors (e.g., so I can have a tensor of strings). So if we consider our classic customer database of (id, name, email), we would represent this as:

customers_id: int64[C]
customers_name: str[C]
customers_email: str[C]

Where C is the number of entries in the customer database. Our tensor type is written as dtype[DIM0, DIM1, ...], where I reuse the name that I will use for the first class dimension that represents it. Let's suppose that the index into C does not coincide with id (which is good, because if they did coincide, you would have a very bad time if you ever wanted to delete an entry from the database!)

This gives us an opportunity for baby's first query: let's implement this query:

SELECT c.name, c.email FROM customers c WHERE c.id = 1000

Notice that the result of this operation is data-dependent: it may contain zero or one rows depending on whether the id is in the database. Here is a naive implementation in standard PyTorch:

mask = customers_id == 1000
return (customers_name[mask], customers_email[mask])

Here, we use boolean masking to perform the data-dependent filtering operation. This implementation in eager mode is a bit inefficient; we materialize a full boolean mask that is then fed into the subsequent operations, whereas you would prefer a compiler to fuse the masking and indexing together. First class dimensions don't really help with this example, but we do need to introduce some new extensions to first class dimensions. First, here is what we can do:

C = dims(1)
c_id = customers_id[C]  # {C} => int64[]
c_name = customers_name[C]  # {C} => str[]
c_email = customers_email[C]  # {C} => str[]
c_mask = c_id == 1000  # {C} => bool[]

Here, a tensor with first class dimensions has a more complicated type {DIM0, DIM1, ...} => dtype[DIM2, DIM3, ...]. The first class dimensions are all reported in the curly braces to the left of the double arrow; curly braces are used to emphasize the fact that first class dimensions are unordered.

What next? The problem is that now we want to do something like torch.where(c_mask, c_name, ???) but we are now in a bit of trouble, because we don't want anything in the false branch of where: we want to provide something like "null" and collapse the tensor to a smaller number of elements, much like how boolean masking did it without first class dimensions. To express this, we'll introduce a binary version of torch.where that does exactly this, as well as returning the newly allocated FCD for the new, data-dependent dimension:

C2, c2_name = torch.where(c_mask, c_name)  # {C2} => str[]
_C2, c2_email = torch.where(c_mask, c_email)  # {C2} => str[], n.b. C2 == _C2
return c2_name, c2_email

Notice that torch.where introduces a new first-class dimension. I've chosen that this FCD gets memoized with c_mask, so whenever we do more torch.where invocations we still get consistently the same new FCD.

Having to type out all the columns can be a bit tiresome. If we assume all elements in a table have the same dtype (let's call it dyn, short for dynamic type), we can more compactly represent the table as a 2D tensor, where the first dimension is the indexing as before, and the second dimension is the columns of the database. For clarity, we'll support using the string name of the column as a shorthand for the numeric index of the column. If the tensor is contiguous, this gives a more traditional row-wise database. The new database can be conveniently manipulated with FCDs, as we can handle all of the columns at once instead of typing them out individually:

customers:  dyn[C, C_ATTR]
C = dims(1)
c = customers[C]  # {C} => dyn[C_ATTR]
C2, c2 = torch.where(c["id"] == 1000, c)  # {C2} => dyn[C_ATTR]
return c2[["name", "email"]].order(C2)  # dyn[C2, ["name", "email"]]

We'll use this for the rest of the post, but the examples should be interconvertible.

Aggregation. What's the average age of all customers, grouped by the country they live in?

SELECT AVG(c.age) FROM customers c GROUP BY c.country;

PyTorch doesn't natively support this grouping operation, but essentially what is desired here is a conversion into a nested tensor, where the jagged dimension is the country (each of which will have a varying number of customers). Let's hallucinate a torch.groupby analogous to its Pandas equivalent:

customers: dyn[C, C_ATTR]
customers_by_country = torch.groupby(customers, "country")  # dyn[COUNTRY, JC, C_ATTR]
COUNTRY, JC = dims(2)
c = customers_by_country[COUNTRY, JC]  # {COUNTRY, JC} => dyn[C_ATTR]
return c["age"].mean(JC).order(COUNTRY)  # f32[COUNTRY]

Here, I gave the generic indexing dimension the name JC, to emphasize that it is a jagged dimension. But everything proceeds like we expect: after we've grouped the tensor and rebound its first class dimensions, we can take the field of interest and explicitly specify a reduction on the dimension we care about.

In SQL, aggregations have to operate over the entirety of groups specified by GROUP BY. However, because FCDs explicitly specify what dimensions we are reducing over, we can potentially decompose a reduction into a series of successive reductions on different columns, without having to specify subqueries to progressively perform the reductions we are interested in.

Joins. Given an order table, join it with the customer referenced by the customer id:

SELECT o.id, c.name, c.email FROM orders o JOIN customers c ON o.customer_id = c.id

First class dimensions are great at doing outer products (although, as with filtering, a naive implementation will expensively materialize the entire outer product!)

customers: dyn[C, C_ATTR]
orders: dyn[O, O_ATTR]
C, O = dims(2)
c = customers[C]  # {C} => dyn[C_ATTR]
o = orders[O]  # {O} => dyn[O_ATTR]
mask = o["customer_id"] == c["id"]  # {C, O} => bool[]
outer_product = torch.cat(o[["id"]], c[["name", "email"]])  # {C, O} => dyn[["id", "name", "email"]]
CO, co = torch.where(mask, outer_product)  # {CO} => dyn[["id", "name", "email"]]
return co.order(CO)  # dyn[CO, ["id", "name", "email"]]

What's the point. There are a few reasons why we might be interested in the correspondence here. First, we might be interested in applying SQL ideas to the Tensor world: a lot of things people want to do in preprocessing are similar to what you do in traditional relational databases, and SQL can teach us what optimizations and what use cases we should think about. Second, we might be interested in applying Tensor ideas to the SQL world: in particular, I think first class dimensions are a really intuitive frontend for SQL which can be implemented entirely embedded in Python without necessitating the creation of a dedicated DSL. Also, this might be the push needed to get TensorDict into core.

by Edward Z. Yang at October 14, 2024 05:07 AM

Brent Yorgey

MonadRandom: major or minor version bump?

Posted on October 14, 2024

tl;dr: a fix to the MonadRandom package may cause fromListMay and related functions to extremely rarely output different results than they used to. This could only possibly affect anyone who is using fixed seed(s) to generate random values and is depending on the specific values being produced, e.g. a unit test where you use a specific seed and test that you get a specific result. Do you think this should be a major or minor version bump?


The Fix

Since 2013 I have been the maintainer of MonadRandom, which defines a monad and monad transformer for generating random values, along with a number of related utilities.

Recently, Toni Dietze pointed out a rare situation that could cause the fromListMay function to crash (as well as the other functions which depend on it: fromList, weighted, weightedMay, uniform, and uniformMay). This function is supposed to draw a weighted random sample from a list of values decorated with weights. I’m not going to explain the details of the issue here; suffice it to say that it has to do with conversions between Rational (the type of the weights) and Double (the type that was being used internally for generating random numbers).

Even though this could only happen in rare and/or strange circumstances, fixing it definitely seemed like the right thing to do. After a bit of discussion, Toni came up with a good suggestion for a fix: we should no longer use Double internally for generating random numbers, but rather Word64, which avoids conversion and rounding issues.

In fact, Word64 is already used internally in the generation of random Double values, so we can emulate the behavior of the Double instance (which was slightly tricky to figure out) so that we make exactly the same random choices as before, but without actually converting to Double.
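
To give a flavour of the idea (this is my own simplified sketch, not the actual MonadRandom code, which additionally emulates the old Double-based generation so that the resulting choices almost always match the old ones), a weighted choice can be driven by a single Word64 draw using only exact Rational arithmetic:

import Data.List (find)
import Data.Ratio ((%))
import Data.Word (Word64)

-- Pick a weighted element by scaling a uniformly random Word64 into an exact
-- Rational target in [0, total); no conversion to Double is involved.
pickWeighted :: Word64 -> [(a, Rational)] -> Maybe a
pickWeighted w xs
  | total <= 0 = Nothing
  | otherwise  = fst <$> find ((> target) . snd) (zip vals (scanl1 (+) weights))
  where
    (vals, weights) = unzip xs
    total  = sum weights
    target = total * (fromIntegral w % (2 ^ (64 :: Int)))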

The Change

…well, not exactly the same random choices as before, and therein lies the rub! If fromListMay happens to pick a random value which is extremely close to a boundary between choices, it’s possible that the value will fall on one side of the boundary when using exact calculations with Word64 and Rational, whereas before it would have fallen on the other side of the boundary after converting to Double due to rounding. In other words, it will output the same results almost all the time, but for a list of \(n\) weighted choices there is something like an \(n/2^{64}\) chance (or less) that any given random choice will be different from what it used to be. I have never observed this happening in my tests, and indeed, I do not expect to ever observe it! If we generated one billion random samples per second continuously for a thousand years, we might expect to see it happen once or twice. I am not even sure how to engineer a test scenario to force it to happen, because we would have to pick an initial PRNG seed that forces a certain Word64 value to be generated.

To PVP or not to PVP?

Technically, a function exported by MonadRandom has changed behavior, so according to the Haskell PVP specification this should be a major version bump (i.e. 0.6 to 0.7). Actually, I am not even 100% clear on this. The decision tree on the PVP page says that changing the behavior of an exported function necessitates a major version bump; but the actual specification does not refer to behavior at all. As I read it, it is exclusively concerned with API compatibility, i.e. whether things will still compile.

But there seem to be some good arguments for doing just a minor version bump (i.e. 0.6 to 0.6.1).

  • Arguments in favor of a minor version bump:

    • A major version bump would cause a lot of (probably unnecessary) breakage! MonadRandom has 149 direct reverse dependencies, and about 3500 distinct transitive reverse dependencies. Forcing all those packages to update their upper bound on MonadRandom would be a lot of churn.

    • What exactly constitutes the “behavior” of a function to generate random values? It depends on your point of view. If we view the function as a pure mathematical function which takes a PRNG state as input and produces some value as output, then its behavior is defined precisely by which outputs it returns for which input seeds, and its behavior has changed. However, if we think of it in more effectful terms, we could say its “behavior” is just to output random values according to a certain distribution, in which case its behavior has not changed.

    • It’s extremely unlikely that this change will cause any breakage; moreover, as argued by Boyd Stephen Smith, anyone who cares enough about reproducibility to be relying on specific outputs for specific seeds is probably already pinning all their package versions.

  • Arguments in favor of a major version bump:

    • It’s what the PVP specifies; what’s the point of having a specification if we don’t follow it?

    • In the unlikely event that this change does cause any breakage, it could be extremely difficult for package maintainers to track down. If the behavior of a random generation function completely changes, the source of the issue is obvious. But if it only changes for very rare inputs, you might reasonably think the problem is something else. A major version bump will force maintainers to read the changelog for MonadRandom and assess whether this is a change that could possibly affect them.

So, do you have opinions on this? Would the release affect you one way or the other? Feel free to leave a comment here, or send me an email with your thoughts. Note there has already been a bit of discussion on Mastodon as well.


by Brent Yorgey at October 14, 2024 12:00 AM

October 05, 2024

Lysxia's blog

Unicode shenanigans: Martine écrit en UTF-8

An old French meme
Martine écrit en UTF-8 (parody cover of the Martine series of French children's books)

On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):

What’s different this time? LLM edition

Yesterday I decided to do something about it.

Locating the problem

Tracing back where it came from, that title was sent already broken by Planet Haskell, which is itself a feed aggregator for blogs. The blog originally produces the good not broken title. Therefore the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, stuff I wouldn’t deal with myself so I won’t ask someone else to.

ASCII diagram of how a blog title travels through the relevant parties
      Blog
       |
       | What’s
       v
 Planet Haskell
       | 
       | What’s
       v
haskell.pl-a.net (my site)
       |
       | What’s
       v
  Your screen

In any case, this mistake can be fixed after the fact. Mis-encoded text is such a ubiquitous issue that there are nicely packaged solutions out there, like ftfy.

ftfy has been used as a data processing step in major NLP research, including OpenAI’s original GPT.

But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.

Explaining the problem

This is the typical situation where a program is assuming the wrong text encoding.

Text encodings

A quick summary for those who don’t know about text encodings.

Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send it over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display them on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.

Multiple encodings are possible, but it’s not always obvious which encoding to use to decode a given byte string. There are good and bad reasons for this, but the net effect is that many text-processing programs arbitrarily guess and assume the encoding in use, and sometimes they assume wrong.

Back to the problem

UTF-8 is the most prevalent encoding nowadays.1 I’d be surprised if one of the Planet Haskell blogs doesn’t use it, which is ironic considering the issue we’re dealing with.

  1. A blog using UTF-8 encodes the right single quote2 " ’ " as three consecutive bytes (226, 128, 153) in its RSS or Atom feed.
  2. The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding different from UTF-8 where each byte corresponds to one character.
  3. It did some transformation to the decoded text (extract the title and body and put it on a webpage with other blogs).
  4. It encoded the final result in UTF-8.
ASCII diagram of how text gets encoded and decoded (wrongly)
      What the blog sees →       '’'
                                  |
                                  | UTF-8 encode (one character into three bytes)
                                  v
                             226 128 153
                                  |
                                  | ??? decode (not UTF-8)
                                  v
What Planet Haskell sees →   'â' '€' '™'
                                  |
                                  | UTF-8 encode
                                  v
                                (...)
                                  |
                                  | UTF-8 decode
                                  v
            What you see →   'â' '€' '™'

The final encoding doesn’t really matter, as long as everyone else downstream agrees with it. The point is that Planet Haskell outputs the three characters “â€™” in place of the right single quote " ’ ", all because UTF-8 represents " ’ " with three bytes.

In spite of their differences, most encodings in practice agree at least about ASCII characters, in the range 0-127, which is sufficient to contain the majority of English language writing if you can compromise on details such as confusing the apostrophe and the single quotes. That’s why in the title “What’s different this time?” everything but one character was transferred fine.

Solving the problem

The fix is simple: replace “â€™” with " ’ ". Of course, we also want to do that with all other characters that are mis-encoded the same way: those are exactly all the non-ASCII Unicode characters. The more general fix is to invert Planet Haskell’s decoding logic. Thank the world that this mistake can be reversed to begin with. If information had been lost by mis-encoding, I might have been forced to use one of those dreadful LLMs to reconstruct titles.3

  1. Decode Planet Haskell’s output in UTF-8.
  2. Encode each character as a byte to recover the original output from the blog.
  3. Decode the original output correctly, in UTF-8.

There is one missing detail: what encoding to use in step 2? I first tried the naive thing: each character is canonically a Unicode code point, which is a number between 0 and 1114111, and I just hoped that those which did occur would fit in the range 0-255. That amounts to making the hypothesis that Planet Haskell is decoding blog posts in Latin-1. That seems likely enough, but you will have guessed correctly that the naive thing did not reconstruct the right single quote in this case. The Latin-1 hypothesis was proven false.

As it turns out, the euro sign “€” and the trademark symbol “™” are not in the Latin-1 alphabet. They are code points numbers 8364 and 8482 in Unicode, which are not in the range 0-255. Planet Haskell has to be using an encoding that features these two symbols. I needed to find which one.

Faffing about, I came across the Wikipedia article on Western Latin character sets which lists a comparison table. How convenient. I looked up the two symbols to find what encoding had them, if any. There were two candidates: Windows-1252 and Macintosh. Flip a coin. It was Windows-1252.

Windows-1252 differs from Latin-1 (and thus Unicode) in 27 positions, those whose byte starts with 8 or 9 in hexadecimal (27 valid characters + 5 unused positions): that’s 27 characters that I had to map manually to the range 0-255 according to the Windows-1252 encoding, and the remaining characters would be mapped for free by Unicode. This data entry task was autocompleted halfway through by Copilot, because of course GPT-* knows Windows-1252 by heart.

let windows1252_hack (c : Uchar.t) : int =
  let c = Uchar.to_int c in
  if      c = 0x20AC then 0x80
  else if c = 0x201A then 0x82
  else if c = 0x0192 then 0x83
  else if c = 0x201E then 0x84
  else if c = 0x2026 then 0x85
  else if c = 0x2020 then 0x86
  else if c = 0x2021 then 0x87
  else if c = 0x02C6 then 0x88
  else if c = 0x2030 then 0x89
  else if c = 0x0160 then 0x8A
  else if c = 0x2039 then 0x8B
  else if c = 0x0152 then 0x8C
  else if c = 0x017D then 0x8E
  else if c = 0x2018 then 0x91
  else if c = 0x2019 then 0x92
  else if c = 0x201C then 0x93
  else if c = 0x201D then 0x94
  else if c = 0x2022 then 0x95
  else if c = 0x2013 then 0x96
  else if c = 0x2014 then 0x97
  else if c = 0x02DC then 0x98
  else if c = 0x2122 then 0x99
  else if c = 0x0161 then 0x9A
  else if c = 0x203A then 0x9B
  else if c = 0x0153 then 0x9C
  else if c = 0x017E then 0x9E
  else if c = 0x0178 then 0x9F
  else c

And that’s how I restored the quotes, apostrophes, guillemets, accents, et autres in my feed.
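
For the curious, the same three-step repair looks roughly like this in Haskell (a hypothetical sketch using the text and bytestring packages, not the code actually running on haskell.pl-a.net; it assumes every mis-encoded character came from a single Windows-1252 byte):

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Char (ord)
import Data.Word (Word8)

-- Step 1: decode the UTF-8 we received; step 2: map each character back to
-- the Windows-1252 byte it was decoded from; step 3: decode those bytes as
-- the UTF-8 the blog originally sent.
fixMojibake :: BS.ByteString -> T.Text
fixMojibake = TE.decodeUtf8 . BS.pack . map toWindows1252Byte . T.unpack . TE.decodeUtf8

-- Invert Windows-1252 decoding: 27 characters differ from Latin-1, everything
-- else maps straight back through its Unicode code point.
toWindows1252Byte :: Char -> Word8
toWindows1252Byte c = case ord c of
  0x20AC -> 0x80; 0x201A -> 0x82; 0x0192 -> 0x83; 0x201E -> 0x84
  0x2026 -> 0x85; 0x2020 -> 0x86; 0x2021 -> 0x87; 0x02C6 -> 0x88
  0x2030 -> 0x89; 0x0160 -> 0x8A; 0x2039 -> 0x8B; 0x0152 -> 0x8C
  0x017D -> 0x8E; 0x2018 -> 0x91; 0x2019 -> 0x92; 0x201C -> 0x93
  0x201D -> 0x94; 0x2022 -> 0x95; 0x2013 -> 0x96; 0x2014 -> 0x97
  0x02DC -> 0x98; 0x2122 -> 0x99; 0x0161 -> 0x9A; 0x203A -> 0x9B
  0x0153 -> 0x9C; 0x017E -> 0x9E; 0x0178 -> 0x9F
  n      -> fromIntegral n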


See also


Update: When Planet Haskell picked up this post, it fixed the intentional mojibake in the title.

Screenshot of Planet Haskell with a correctly displayed diacritic. October 05, 2024. Lysxia's blog. Unicode shenanigans: Martine écrit en UTF-8

There is no room for this in my mental model. Planet Haskell is doing something wild to parse blog titles.


  1. As of September 2024, UTF-8 is used by 98.3% of surveyed web sites.↩︎

  2. The Unicode right single quote is sometimes used as an apostrophe, to much disapproval.↩︎

  3. Or I could just query the blogs directly for their titles.↩︎

by Lysxia at October 05, 2024 12:00 AM

Christopher Allen

Routines in caring for children

I have 4 children aged 4, 3, almost 2, and 19 weeks. Parents are increasingly isolated from each other socially so it's harder to compare tactics and strategies for caregiving. I want to share a run-down of how my wife and I care for our children and what has seemed to work and what has not.

by Unknown at October 05, 2024 12:00 AM

October 04, 2024

Derek Elkins

Global Rebuilding, Coroutines, and Defunctionalization

Introduction

In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.

Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.
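
As a concrete sketch of that scheme (my own illustration, not code from Okasaki; a real implementation would cache the size and live counts instead of recomputing them), deletions only mark nodes and a batched rebuild kicks in once half the tree is dead:

data Tree a = Leaf | Node Bool (Tree a) a (Tree a)  -- the Bool marks a deleted node

-- Weak update: deletion only marks the node, leaving the tree's shape intact.
delete :: Ord a => a -> Tree a -> Tree a
delete _ Leaf = Leaf
delete x (Node d l y r)
  | x < y     = Node d (delete x l) y r
  | x > y     = Node d l y (delete x r)
  | otherwise = Node True l y r

size, live :: Tree a -> Int
size Leaf = 0
size (Node _ l _ r) = 1 + size l + size r
live Leaf = 0
live (Node d l _ r) = (if d then 0 else 1) + live l + live r

-- Batched rebuild: once at least half the nodes are marked deleted, rebuild a
-- balanced tree from the in-order list of live elements.
rebuildIfNeeded :: Tree a -> Tree a
rebuildIfNeeded t
  | 2 * live t <= size t = fromOrderedList (toLiveList t)
  | otherwise            = t
  where
    toLiveList Leaf = []
    toLiveList (Node d l x r) = toLiveList l ++ (if d then [] else [x]) ++ toLiveList r
    fromOrderedList [] = Leaf
    fromOrderedList xs = let (l, x:r) = splitAt (length xs `div` 2) xs
                         in Node False (fromOrderedList l) x (fromOrderedList r)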

The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.

More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.

Coroutines

Okasaki passingly mentions that global rebuilding “can be usefully viewed as running the rebuilding transformation as a coroutine”. Also, the situation described above is quite reminiscent of garbage collection. There the classic half-space stop-the-world copying collector is naturally the batched rebuilding version. More incremental versions often have read or write barriers and break the garbage collection into incremental steps. Garbage collection is also often viewed as two processes coroutining.

The goal of this article is to derive global rebuilding-based data structures from an expression of them as two coroutining processes. Ideally, we should be able to take a data structure implemented via batched rebuilding and simply run the batch rebuilding step as a coroutine. Modifying the data structure’s operations and the rebuilding step should, in theory, just be a matter of inserting appropriate yield statements. Of course, it won’t be that easy since the batched version of rebuilding doesn’t need to worry about concurrent updates to the original data structure.

In theory, such a representation would be a perfectly effective way of articulating the global rebuilding version of the data structure. That said, I will be using the standard power move of CPS transforming and defunctionalizing to get a more data structure-like result.

I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with continuations. In that context, a “process” written in continuation-passing style “yields” to the scheduler by passing its continuation to a scheduling function. Normally, the scheduler would place that continuation at the end of a work queue and then pick up a continuation from the front of the work queue and invoke it resuming the previously suspended “process”. In our case, we only have two “processes” so our “work queue” can just be a single mutable cell. When one “process” yields, it just swaps its continuation into the cell and the other “process’” out and invokes the continuation it read.

Since the rebuilding process is always driven by the main process, the pattern is a bit more like generators. This has the benefit that only the rebuilding process needs to be written in continuation-passing style. The following is a very quick and dirty set of functions for this.

module Coroutine ( YieldFn, spawn ) where
import Control.Monad ( join )
import Data.IORef ( IORef, newIORef, readIORef, writeIORef )

type YieldFn = IO () -> IO ()

yield :: IORef (IO ()) -> IO () -> IO ()
yield = writeIORef

resume :: IORef (IO ()) -> IO ()
resume = join . readIORef

terminate :: IORef (IO ()) -> IO ()
terminate yieldRef = writeIORef yieldRef (ioError $ userError "Subprocess completed")

spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn process = do
    yieldRef <- newIORef undefined
    writeIORef yieldRef $ process (yield yieldRef) (terminate yieldRef)
    return (resume yieldRef)

A simple example of usage is:

process :: YieldFn -> Int -> IO () -> IO ()
process     _ 0 k = k
process yield i k = do
    putStrLn $ "Subprocess: " ++ show i
    yield $ process yield (i-1) k

example :: IO ()
example = do
    resume <- spawn $ \yield -> process yield 10
    forM_ [(1 :: Int) .. 10] $ \i -> do
        putStrLn $ "Main process: " ++ show i
        resume
    putStrLn "Main process done"

with output:

Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done

Queues

I’ll use queues since they are very simple and Purely Functional Data Structures describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global rebuilding. We’ll end up with something quite similar which could be made more similar by changing the rebuilding code. Indeed, the differences are just an artifact of specific, easily changed details of the rebuilding coroutine, as we’ll see.

The examples I’ll present are mostly imperative, not purely functional. There are two reasons for this. First, I’m not focused on purely functional data structures and the technique works fine for imperative data structures. Second, it is arguably more natural to talk about coroutines in an imperative context. In this case, it’s easy to adapt the code to a purely functional version since it’s not much more than a purely functional data structure stuck in an IORef.

For a more imperative structure with mutable linked structure and/or in-place array updates, it would be more challenging to produce a purely functional version. The techniques here could still be used, though there are more “concurrency” concerns. While I don’t include the code here, I did a similar exercise for a random-access stack (a fancy way of saying a growable array). There the “concurrency” concern is that the elements you are copying to the new array may be popped and potentially overwritten before you switch to the new array. In this case, it’s easy to solve, since if the head pointer of the live version reaches the source offset for copy, you can just switch to the new array immediately.

Nevertheless, I can easily imagine scenarios where it may be beneficial, if not necessary, for the coroutines to communicate more and/or for there to be multiple “rebuild” processes. The approach used here could be easily adapted to that. It’s also worth mentioning that even in simpler cases, non-constant-time operations will either need to invoke resume multiple times or need more coordination with the “rebuild” process to know when it can do more than a constant amount of work. This could be accomplished by the “rebuild” process simply recognizing this from the data structure state, or some state could be explicitly set to indicate this, or the techniques described earlier could be used, e.g. a different process for non-constant-time operations.

The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.

Batched Rebuilding Implementation

We start with the straightforward, amortized constant-time queues where we push to a stack representing the back of the queue and pop from a stack representing the front. When the front stack is empty, we need to expensively reverse the back stack to make a new front stack.

I intentionally separate out the reverse step as an explicit rebuild function.

module BatchedRebuildingQueue ( Queue, new, enqueue, dequeue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )

data Queue a = Queue {
    queueRef :: IORef ([a], [a])
}

new :: IO (Queue a)
new = do
    queueRef <- newIORef ([], [])
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q@(Queue { .. }) = do
    (front, back) <- readIORef queueRef
    case front of
        (x:front') -> do
            writeIORef queueRef (front', back)
            return (Just x)
        [] -> case back of
                [] -> return Nothing
                _ -> rebuild q >> dequeue q

enqueue :: a -> Queue a -> IO ()
enqueue x (Queue { .. }) =
    modifyIORef queueRef (\(front, back) -> (front, x:back))

rebuild :: Queue a -> IO ()
rebuild (Queue { .. }) =
    modifyIORef queueRef (\([], back) -> (reverse back, []))

Global Rebuilding Implementation

This step is where a modicum of thought is needed. We need to make the rebuild step from the batched version incremental. This is straightforward, if tedious, given the coroutine infrastructure. In this case, we incrementalize the reverse by reimplementing reverse in CPS with some yield calls inserted. Then we need to incrementalize append. Since we’re not waiting until front is empty, we’re actually computing front ++ reverse back. Incrementalizing append is hard, so we actually reverse front and then use an incremental reverseAppend (which is basically what the incremental reverse does anyway1).

One of the first things to note about this code is that the actual operations are largely unchanged other than inserting calls to resume. In fact, dequeue is even simpler than in the batched version as we can just assume that front is always populated when the queue is not empty. dequeue is freed from the responsibility of deciding when to trigger a rebuild. Most of the bulk of this code is from reimplementing a reverseAppend function (twice).

The parts of this code that require some deeper thought are 1) knowing when a rebuild should begin, 2) knowing how “fast” the incremental operations should go2 (e.g. incrementalReverse does two steps at a time and the Hood-Melville implementation has an explicit exec2 that does two steps at a time), and 3) dealing with “concurrent” changes.

For the last, Overmars describes a queue of deferred operations to perform on the shadow copy once it finishes rebuilding. This kind of suggests a situation where the “rebuild” process can reference some “snapshot” of the data structure. In our case, that is the situation we’re in, since our data structures are essentially immutable data structures in an IORef. However, it can easily not be the case, e.g. the random-access stack. Also, this operation queue approach can easily be inefficient and inelegant. None of the implementations below will have this queue of deferred operations. It is easier, more efficient, and more elegant to just not copy over parts of the queue that have been dequeued, rather than have an extra phase of the rebuilding that just pops off the elements of the front stack that we just pushed. A similar situation happens for the random-access stack.

The use of drop could probably be easily eliminated. (I’m not even sure it’s still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.

{-# LANGUAGE BangPatterns, RecordWildCards #-}
module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )

data Queue a = Queue {
    resume :: IO (),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k where
  go k = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse back [] $ \rback ->
            incrementalReverse front [] $ \rfront ->
                incrementalRevAppend rfront rback 0 backCount k
      else do
        yield k

  incrementalReverse [] acc k = k acc
  incrementalReverse [x] acc k = k (x:acc)
  incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k

  incrementalRevAppend [] front !movedCount backCount' k = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    yield k
  incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        yield k
      else if null rfront then
        incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
      else
        yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k

Defunctionalized Global Rebuilding Implementation

This step is completely mechanical.

There’s arguably no reason to defunctionalize. It produces a result that is more data-structure-like, but, unless you need the code to work in a first-order language, there’s nothing really gained by doing this. It does lead to a result that is more directly comparable to other implementations.

For some data structures, having the continuation be analyzable would provide a simple means for the coroutines to communicate. The main process could directly look at the continuation to determine its state, e.g. whether a rebuild is in progress at all. The main process could also directly manipulate the stored continuation to change the “rebuild” process’ behavior. That said, doing this would mean that we’re not deriving the implementation. Still, the opportunity for additional optimizations and simplifications is nice.
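
For example, here is a tiny sketch of my own (hypothetical, and not part of the module below; it would have to live inside DefunctionalizedQueue since resumeRef and Kont are not exported) of how the main process could check whether a rebuild is in flight simply by inspecting the stored continuation:

-- Hypothetical helper: report whether the "rebuild" coroutine still has
-- outstanding work, by pattern matching on the stored continuation.
rebuildInProgress :: Queue a -> IO Bool
rebuildInProgress q = do
    kont <- readIORef (resumeRef q)
    return $ case kont of
        IDLE -> False
        _    -> True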

As a minor aside, while it is, of course, obvious from looking at the previous version of the code, it’s neat how the Kont data type implies that the call stack is bounded and that most calls are tail calls. REVERSE_STEP is the only constructor that contains a Kont argument, but its type means that that argument can’t itself be a REVERSE_STEP. Again, I just find it neat how defunctionalization makes this concrete and explicit.

{-# LANGUAGE BangPatterns, GADTs, RecordWildCards #-}
module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse q back [] $ REVERSE_FRONT front backCount
      else do
        writeIORef resumeRef IDLE

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        writeIORef resumeRef IDLE
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'

resume :: Queue a -> IO ()
resume q = do
    kont <- readIORef (resumeRef q)
    applyKont q kont ()

data Queue a = Queue {
    resumeRef :: IORef (Kont a ()),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resumeRef <- newIORef IDLE
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q  = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

Functional Defunctionalized Global Rebuilding Implementation

This is just a straightforward reorganization of the previous code into purely functional code. This produces a persistent queue with worst-case constant time operations.

It is, of course, far uglier and more ad hoc than Okasaki’s extremely elegant real-time queues, but the methodology to derive it was simple-minded. The result is also quite similar to the Hood-Melville queues, even though I did not set out to achieve that. That said, I’m pretty confident you could derive pretty much exactly the Hood-Melville queues with just minor modifications to the Global Rebuilding Implementation.

{-# LANGUAGE BangPatterns, GADTs, RecordWildCards #-}
module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
    if backCount > frontCount then
        let q' = q { back = [], backCount = 0 } in
        incrementalReverse q' back [] $ REVERSE_FRONT front backCount
      else
        q { resumeKont = IDLE }

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
    q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
    if frontCount q <= movedCount then
        -- This drop count should be bounded by a constant.
        let !front = drop (movedCount - frontCount q) acc in
        q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }

resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()

data Queue a = Queue {
    resumeKont :: !(Kont a ()),
    front :: [a],
    back :: [a],
    frontCount :: !Int,
    backCount :: !Int
}

empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q =
    case front of
        [] -> (Nothing, q)
        (x:front') ->
            (Just x, q' { front = front', frontCount = frontCount - 1 })
  where q'@(Queue { .. }) = resume q

enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })
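
As a small illustration of the purely functional interface (a usage sketch of my own, not from the original post), persistence means an older version of the queue remains valid after further enqueues:

import qualified FunctionalQueue as Q

-- Repeatedly dequeue until the queue reports empty.
drain :: Q.Queue a -> [a]
drain q = case Q.dequeue q of
    (Nothing, _) -> []
    (Just x, q') -> x : drain q'

main :: IO ()
main = do
    let q3 = foldl (flip Q.enqueue) Q.empty [1, 2, 3 :: Int]
        q4 = Q.enqueue 4 q3
    print (drain q3)  -- [1,2,3]: unaffected by the later enqueue
    print (drain q4)  -- [1,2,3,4]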

Hood-Melville Implementation

This is just the Haskell code from Purely Functional Data Structures adapted to the interface of the other examples.

This code is mostly here for comparison. The biggest difference, other than some code structuring differences, is that the front and back lists are reversed in parallel, while my code reverses them sequentially. As mentioned before, getting a structure like that would simply be a matter of defining a parallel incremental reverse back in the Global Rebuilding Implementation.

Again, Okasaki’s real-time queue, which can be seen as an application of the lazy rebuilding and scheduling techniques described in his thesis and book, is a better implementation than this in pretty much every way.

{-# LANGUAGE BangPatterns #-}
module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
    case exec (exec state) of
        Done newf -> Queue lenf newf Idle lenr r
        newstate -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
    if lenr <= lenf then exec2 lenf f state lenr r
    else let newstate = Reversing 0 f [] r []
         in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
    let !q' = check (lenf-1) f' (invalidate state) lenr r in
    (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)

Okasaki’s Real-Time Queues

Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is in the first clause of exec, and there we discard its elements. In other words, we only care about the length of s. s gets “decremented” on each enqueue and dequeue until it’s empty, at which point we rotate r to f in the second clause of exec. The key thing is that f and s are initialized to the same value in that clause. That means each time we “decrement” s we are also forcing a bit of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an incremental version of xs ++ reverse ys ++ a (where we use the invariant length ys = 1 + length xs for the base case).

Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically force the thunks incrementally rather than all at once. Here s is a schedule telling us when to force parts of f. (As mentioned, s also serves as a counter telling us when to perform a rebuild.)

{-# LANGUAGE BangPatterns #-}
module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r [] = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s 

It’s instructive to compare the above to the following implementation which doesn’t use a schedule. This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple dequeues via rotate. The following implementation performs extremely well in my benchmark, but the operations are subtly not constant-time. Specifically, after a long series of enqueues, a dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f will be a nested series of rotate calls, one for every doubling of the length of the queue. Even if we change let f' to let !f', that will only make the first dequeue cheap. The second will still be expensive.

{-# LANGUAGE BangPatterns #-}
module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1) 
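
Both of the last two modules share the same rotate function. To make the claim that rotate xs ys a incrementally computes xs ++ reverse ys ++ a (under the invariant length ys = 1 + length xs) concrete, here is a small property-test sketch of my own; it assumes the QuickCheck package, and rotate is copied out since neither module exports it:

import Test.QuickCheck

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

-- Under the invariant length ys == 1 + length xs, rotate xs ys a
-- produces xs ++ reverse ys ++ a, one cons at a time.
prop_rotate :: [Int] -> [Int] -> Property
prop_rotate xs a =
    forAll (vector (length xs + 1)) $ \ys ->
        rotate xs ys a === xs ++ reverse ys ++ a

main :: IO ()
main = quickCheck prop_rotate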

Empirical Evaluation

I won’t reproduce the evaluation code as it’s not very sophisticated or interesting. It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce an enqueue over a dequeue so that the queues would grow. It measured the average time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.
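
For concreteness, here is a rough sketch of my own of the kind of driver described above; it is not the actual evaluation code, it assumes the random package, it exercises only the IO-based batched queue, and it records only the maximum single-dequeue time:

import Control.Monad (foldM)
import GHC.Clock (getMonotonicTimeNSec)
import System.Random (randomRIO)
import qualified BatchedRebuildingQueue as Q

-- Run a random workload that is 80% enqueues and report the maximum
-- time (in nanoseconds) spent in any single dequeue.
main :: IO ()
main = do
    q <- Q.new
    worst <- foldM (step q) 0 [1 .. 1000000 :: Int]
    putStrLn ("max dequeue time (ns): " ++ show worst)
  where
    step q worst i = do
        roll <- randomRIO (1, 10 :: Int)
        if roll <= 8
          then Q.enqueue i q >> return worst
          else do
              t0 <- getMonotonicTimeNSec
              _ <- Q.dequeue q
              t1 <- getMonotonicTimeNSec
              return (max worst (t1 - t0))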

The main thing I wanted to see was relatively stable average enqueue and dequeue times with only the batched implementation having a growing maximum dequeue time. This is indeed what I saw, though it took about 1,000,000 operations (or really a queue of a couple hundred thousand elements) for the numbers to stabilize.

The results were mostly unsurprising. In overall time, the batched implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s a good chance my measurement of its average enqueue time was largely a measurement of the timer’s resolution.) The operations’ average times were stable, illustrating their constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue time to the average stabilized around 7000 to 1, except, of course, for the batched version, whose ratio grew linearly, reaching millions to 1 at queue sizes of tens of millions of elements. This illustrates the worst-case time complexity of all the other implementations, and the merely amortized time complexity of the batched one.

While the batched version was best in overall time, the difference wasn’t that great: the worst implementations were still less than 1.4x slower. All the worst-case optimal implementations performed roughly the same, but there were still some clear winners and losers. Okasaki’s real-time queue is almost on par with the batched implementation in overall time and handily beats the other implementations in average enqueue and dequeue times. The main surprise for me was that the loser was the Hood-Melville queue. My guess is that this is due to invalidate, which seems like it would do more work and produce more garbage than the approach taken in my functional version.

Conclusion

The point of this article was to illustrate the process of deriving a deamortized data structure from an amortized one utilizing batched rebuilding by explicitly modeling global rebuilding as a coroutine.

The point wasn’t to produce the fastest queue implementation, though I am pretty happy with the results. While this is an extremely simple example, it was still nice that each step was very easy and natural. It’s especially nice that this derivation approach produced a better result than the Hood-Melville queue.

Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue with worst-case constant-time operations.


  1. This code could definitely be refactored to leverage this similarity to reduce code. Alternatively, one could refunctionalize the Hood-Melville implementation at the end.↩︎

  2. Going “too fast”, so long as it’s still a constant amount of work for each step, isn’t really an issue asymptotically, so you can just crank the knobs if you don’t want to think too hard about it. That said, going faster than you need to will likely give you worse worst-case constant factors. In some cases, going faster than necessary could reduce constant factors, e.g. by better utilizing caches and disk I/O buffers.↩︎

October 04, 2024 08:24 AM

Edward Z. Yang

What’s different this time? LLM edition

One of the things that I learned in grad school is that even if you've picked an important and unsolved problem, you need some reason to believe it is solvable--especially if people have tried to solve it before! In other words, "What's different this time?" This is perhaps a dreary way of shooting down otherwise promising research directions, but you can flip it around: when the world changes, you can ask, "What can I do now that I couldn't do before?"

This post is a list of problems in areas that I care about (half of this is PL flavor, since that's what I did my PhD in), where I suspect something has changed with the advent of LLMs. It's not a list of recipes; there is still hard work to figure out how exactly an LLM can be useful (for most of these, just feeding the entire problem into ChatGPT usually doesn't work). But I often talk to people who want to get started on something, anything, but have no idea where to start. Try here!

Static analysis. The chasm between academic static analysis work and real world practice is the scaling problems that come with trying to apply the technique to a full size codebase. Asymptotics strike as LOC goes up, language focused techniques flounder in polyglot codebases, and "Does anyone know how to write cmake?" But this is predicated on the idea that static analysis has to operate on a whole program. It doesn't; humans can do perfectly good static analysis on fragments of code without having to hold the entire codebase in their head, without needing access to a build system. They make assumptions about APIs and can do local reasoning. LLMs can play a key role in drafting these assumptions so that local reasoning can occur. What if the LLM gets it wrong? Well, if an LLM could get it wrong, an inattentive junior developer might get it wrong too--maybe there is a problem in the API design. LLMs already do surprisingly well if you one-shot prompt them to find bugs in code; with more traditional static analysis support, maybe they can do even better.

DSL purgatory. Consider a problem that can be solved with code in a procedural way, but only by writing lots of tedious, error prone boilerplate (some examples: drawing diagrams, writing GUIs, SQL queries, building visualizations, scripting website/mobile app interactions, end to end testing). The PL dream is to design a sweet compositional DSL that raises the level of abstraction so that you can render a Hilbert curve in seven lines of code. But history also abounds with cases where the DSL did not solve the problems, or maybe it did solve the problem but only after years of grueling work, and so there are still many problems that feel like there ought to be a DSL that should solve them but there isn't. The promise of LLMs is that they are extremely good at regurgitating low level procedural actions that could conceivably be put together in a DSL. A lot of the best successes of LLMs today are in putting coding power in the hands of domain experts who otherwise do not know how to code; could they also help in putting domain expertise in the hands of people who can code?

I am especially interested in these domains:

  • SQL - Its strange syntax purportedly makes it easier for non-software engineers to understand, whereas many (myself included) would often prefer a more functional syntax ala LINQ/list comprehensions. It's pretty hard to make an alternate SQL syntax take off though, because SQL is not one language, but many many dialects everywhere with no obvious leverage point. That sounds like an LLM opportunity. Or heck, just give me one of those AI editor environments but specifically fine tuned for SQL/data visualization, don't even bother with general coding.
  • End to end testing - This is https://momentic.ai/ but personally I'm not going to rely on a proprietary product for testing in my OSS projects. There's definitely an OSS opportunity here.
  • Scripting website/mobile app interactions - The website scraping version of this is https://reworkd.ai/ but I am also pretty interested in this from the browser extension angle: to some extent I can take back control of my frontend experience with browser extensions; can I go further with LLMs? And we typically don't imagine that I can do the same with a mobile app... but maybe I can??

OSS bread and butter. Why is Tesseract still the number one OSS library for OCR? Why is smooth and beautiful text to voice not ubiquitous? Why is the voice control on my Tesla so bad? Why is the wake word on my Android device so unreliable? Why isn't the screenshot parser on a fansite for my favorite mobage able to parse out icons? The future has arrived, but it is not uniformly distributed.

Improving the pipeline from ephemeral to durable stores of knowledge. Many important sources of knowledge are trapped in "ephemeral" stores, like Discord servers, private chat conversations, Reddit posts, Twitter threads, blog posts, etc. In an ideal world, there would be a pipeline of this knowledge into more durable, indexable forms for the benefit of all, but actually doing this is time consuming. Can LLMs help? Note that the dream of LLMs is that you can just feed all of this data into the model and ask it questions. I'm OK with something a little bit more manual; we don't have to solve RAG first.

by Edward Z. Yang at October 04, 2024 04:30 AM

October 02, 2024

Ken T Takusagawa

[mlzpqxqu] import with type signature

proposal for a Haskell language extension: when importing a function from another module, one may optionally also specify a type signature for the imported function.  this would be helpful for code understanding.  the reader would have immediately available the type of the imported symbol, not having to go track down the type in the source module (which may be many steps away when modules re-export symbols, and the source module might not even have a type annotation), nor use a tool such as ghci to query it.  (maybe the code currently fails to compile for other reasons, so ghci is not available.)

if a function with the specified type signature is not exported by an imported module, the compiler can offer suggestions of other functions exported by the module which do have, or unify with, the imported type signature.  maybe the function got renamed in a new version of the module.

or, the compiler can do what Hoogle does and search among all modules in its search path for functions with the given signature.  maybe the function got moved to a different module.

the specified type signature may be narrower than how the function was originally defined.  this can limit some of the insanity caused by the Foldable Traversable Proposal (FTP):

import Prelude(length :: [a] -> Int) -- prevent length from being called on tuples and Maybe

various potentially tricky issues:

  1. a situation similar to the diamond problem (multiple inheritance) in object-oriented programming: module A defines a polymorphic function f, imported then re-exported by modules B and C.  module D imports both B and C, unqualified.  B imports and re-exports f from A with a type signature more narrow than originally defined in A.  C does not change the type signature.  what is the type of f as seen by D?  which version of f, which path through B or C, does D see?  the solution might be simple: if the versions of f seen through the different paths are not identical, then the user has to qualify.

  2. the following tries to make List.length available only for lists, and Foldable.length available for anything else.  is this asking for trouble?

    import Prelude hiding(length);
    import qualified Prelude(length :: [a] -> Int) as List;
    import qualified Prelude(length) as Foldable;

by Unknown (noreply@blogger.com) at October 02, 2024 12:43 AM

October 01, 2024

Haskell Interlude

56: Satnam Singh

Today on the Haskell Interlude, Matti and Sam are joined by Satnam Singh. Satnam has been a lecturer at Glasgow, and Software Engineer at Google, Meta, and now Groq. He talks about convincing people to use Haskell, laying out circuits and why community matters.

PS: After the recording, it was important to Satnam to clarify that his advice to “not be afraid to lose your job” was specifically meant to encourage quitting jobs that are not good for you, if possible, but he acknowledges that unfortunately not everybody can afford that risk.

by Haskell Podcast at October 01, 2024 05:00 PM

Brent Yorgey

Retiring BlogLiterately

Posted on October 1, 2024

Way back in 2012 I took over maintainership of the BlogLiterately tool from Robert Greayer, its initial author. I used it for many years to post to my Wordpress blog, added a bunch of features, solved some fun bugs, and created the accompanying BlogLiterately-diagrams plugin for embedding diagrams code in blog posts. However, now that I have fled Wordpress and rebuilt my blog with hakyll, I don’t use BlogLiterately any more (there is even a diagrams-pandoc package which does the same thing BlogLiterately-diagrams used to do). So, as of today I am officially declaring BlogLiterately unsupported.

The fact is, I haven’t actually updated BlogLiterately since March of last year. It currently only builds on GHC 9.4 or older, and no one has complained, which I take as strong evidence that no one else is using it either! However, if anyone out there is actually using it, and would like to take over as maintainer, I would be very happy to pass it along to you.

I do plan to continue maintaining HaXml and haxr, at least for now; unlike BlogLiterately, I know they are still in use, especially HaXml. However, BlogLiterately was really the only reason I cared about these packages personally, so I would be happy to pass them along as well; please get in touch if you would be willing to take over maintaining one or both packages.


by Brent Yorgey at October 01, 2024 12:00 AM