Planet Haskell

January 17, 2025

Well-Typed.Com

Tracing foreign function invocations

When profiling Haskell programs, time spent in foreign functions (functions defined in C) does not show up on normal time profiles, which can be problematic when debugging or optimizing performance of code that makes heavy use of the foreign function interface (FFI). In this blog post we present a new compiler plugin called trace-foreign-calls, which makes this time visible and available for analysis.

The trace-foreign-calls plugin as well as a simple analysis tool ghc-events-util are both available on GitHub.

Overview

Consider a C function

long slow_add(long a, long b) {
  while(b--) {
    a++;
  }
  return a;
}

with corresponding Haskell import

foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO :: CLong -> CLong -> IO CLong

and an application that invokes it, via a thin wrapper slowAddIO around c_slowAddIO (the wrapper shows up as a cost centre in the profiles below)

main :: IO ()
main = do
    print =<< slowAddIO a b
    print =<< slowAddIO b a
  where
    a = 1_000_000_000
    b = 2_000_000_000

When we compile the application with the trace-foreign-calls plugin enabled, run it, and then look at the generated .eventlog using ghc-events-util, we will see:

 607.16ms   607.16ms       1922573  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.26ms     0.00ms     609077635  cap 1  trace-foreign-calls: return c_slowAddIO

 302.02ms   302.02ms     609336093  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.01ms     0.00ms     911353269  cap 1  trace-foreign-calls: return c_slowAddIO

The important column here is the first one, which for each event reports the time from that event to the next; in this case, from the call of the foreign function to its return. Perhaps ghc-events-util could be given a mode designed specifically for trace-foreign-calls to make this output a bit more readable, but for now this general purpose output suffices.1

If we additionally compile with profiling enabled, we get an additional event for each foreign call, recording the cost-centre callstack:

   0.00ms     0.00ms       4643217  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 856.37ms   856.37ms       4643327  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.65ms     0.00ms     861018010  cap 1  trace-foreign-calls: return c_slowAddIO

   0.00ms     0.00ms     861672464  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 426.79ms   426.79ms     861672834  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.06ms     0.00ms    1288461103  cap 1  trace-foreign-calls: return c_slowAddIO

Note that we are abusing the “heap profile sample” event to record the cost-centre callstack to the foreign function (see “Conclusions and future work”, below).2

Dependencies

Suppose in example-pkg-A we have

foreign import capi "cbits.h xkcdRandomNumber"
  someFunInA :: IO CInt

and we use this function in example-pkg-B

main :: IO ()
main = do
    randomNumber <- someFunInA
    let bs = compress (BS.Char8.pack $ show randomNumber)
    print $ BS.Word8.unpack bs

where compress is from zlib. Although we are running main from example-pkg-B, in order to get information about someFunInA we need to enable the plugin when compiling example-pkg-A; the README.md describes how to enable the plugin for all dependencies. Indeed, when we do this, we see calls to libz as well:

   0.00ms     0.00ms        414047  cap 0  trace-foreign-calls: call someFunInA (capi safe "cbits.h xkcdRandomNumber")
   0.00ms     0.00ms        414607  cap 0  trace-foreign-calls: return someFunInA

(..)

   0.00ms     0.00ms        493076  cap 0  trace-foreign-calls: call c_zlibVersion (capi unsafe "zlib.h zlibVersion")
   0.00ms     0.00ms        493866  cap 0  trace-foreign-calls: return c_zlibVersion

Indeed, if we are willing to do a custom build of ghc, we can even enable the plugin on the boot libraries, which (amongst other things) makes the final print also visible:

   0.00ms     0.00ms        609576  cap 0  trace-foreign-calls: call unsafe_fdReady (ccall unsafe "fdReady")
   0.00ms     0.00ms        611846  cap 0  trace-foreign-calls: return unsafe_fdReady
   0.01ms     0.00ms        612286  cap 0  trace-foreign-calls: call c_write (capi unsafe "HsBase.h write")
   0.23ms     0.00ms        618236  cap 0  trace-foreign-calls: return c_write

Conclusions and future work

The trace-foreign-calls compiler plugin can be used to generate eventlog events for foreign function invocations, so that the time spent in foreign functions becomes visible; the ghc-events-util tool can be used to inspect these eventlogs.

The plugin works by renaming each foreign function import of foo to foo_uninstrumented, and then introducing a new wrapper function foo which emits some custom events to the eventlog before and after calling foo_uninstrumented. Since we want the plugin to work even on the GHC boot libraries, the wrapper tries to use only functionality from GHC.Prim, which limits what we can do. One consequence is that because the plugin reuses “heap profile sample” events to record the cost centre stacks for foreign functions, it is not currently possible to record both regular heap profile samples (that is, run the code with one of the +RTS -h flags) and enable the plugin at the same time.
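
To make this concrete, here is a hand-written sketch of what the generated code amounts to for the c_slowAddIO import from the overview. This is only an illustration: the real plugin emits its events via GHC.Prim primops rather than Debug.Trace, and the _uninstrumented name is ours, not necessarily what the plugin generates.

{-# LANGUAGE CApiFFI #-}

import Debug.Trace (traceEventIO)
import Foreign.C.Types (CLong)

-- The original import, renamed.
foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO_uninstrumented :: CLong -> CLong -> IO CLong

-- The wrapper that takes over the original name and brackets the
-- foreign call with custom eventlog events.
c_slowAddIO :: CLong -> CLong -> IO CLong
c_slowAddIO a b = do
  traceEventIO "trace-foreign-calls: call c_slowAddIO (capi unsafe \"test_cbits.h slow_add\")"
  r <- c_slowAddIO_uninstrumented a b
  traceEventIO "trace-foreign-calls: return c_slowAddIO"
  return r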

A better solution would be to add support for profiling foreign functions to ghc itself. This would involve creating new eventlog event types, and then upgrading existing time profiling tools to interpret these new events. Until then, however, time profiling of foreign function invocations is now at least possible.


  1. The first three columns are the time from each event to the next visible event (some events might be filtered out), the time from each event to the next actual event, and the time of the event since the start of the program.↩︎

  2. The sample field will always be 0; the residency field is used to record the capability. The latter allows us to correlate events of concurrent foreign function invocations; the --res-is-cap command line option to ghc-events-util makes it understand this convention.↩︎

by edsko, zubin, matthew at January 17, 2025 12:00 AM

January 16, 2025

Michael Snoyman

The Paradox of Necessary Force

Humans want the resources of other humans. I want the food that the supermarket owns so that I can eat it. Before buying it, I wanted the house that I now own. And before that, someone wanted to build a house on that plot of land, which was owned by someone else first. Most of the activities we engage in during our lifetime revolve around extracting something from someone else.

There are two basic modalities to getting the resources of someone else. The first, the simplest, and the one that has dominated the majority of human history, is force. Conquer people, kill them, beat them up and take their stuff, force them into slavery and make them do your work. It’s a somewhat effective strategy. This can also be more subtle, by using coercive and fraudulent methods to trick people into giving you their resources. Let’s call this modality the looter approach.

The second is trade. In the world of trade, I can only extract resources from someone else when they willingly give them to me in exchange for something else of value. This can be barter of value for value, payment in money, built-up goodwill, favors, charity (exchanging resources for the benefit you receive for helping someone else), and more. In order to participate in this modality, you need to create your own valuable resources that other people want to trade for. Let’s call this the producer approach.

The producer approach is better for society in every conceivable way. The looter approach causes unnecessary destruction, pushes production into ventures that don’t directly help anyone (like making more weapons), and rewards people for their ability to inflict harm. By contrast, the producer approach rewards the ability to meet the needs of others and causes resources to end up in the hands of those who value them the most.

Looter philosophy is rooted in the concept of the zero sum game, the mistaken belief that I can only have more if someone else has less. By contrast, the producer philosophy correctly identifies the fact that we can all end up better by producing more goods in more efficient ways. We live in our modern world of relatively widespread luxury because producers have made technological leaps—for their own self-serving motives—that have improved everyone’s ability to produce more goods going forward. Think of the steam engine, electricity, computing power, and more.

A producer-only world

It would be wonderful to live in a world in which there are no looters. We all produce, we all trade, everyone receives more value than they give, and there is no wasted energy or destruction from the use of force.

Think about how wonderful it could be! We wouldn’t need militaries, allowing a massive amount of productive capacity to be channeled into things that make everyone’s lives better. We wouldn’t need police. Not only would that free up more resources, but would remove the threat of improper use of force by the state against citizens. The list goes on and on.

I believe many economists—especially Austrian economists—are cheering for that world. I agree with them on the cheering. It’s why things like Donald Trump’s plans for tariffs are so horrific in their eyes. Tariffs introduce an artificial barrier between nations, impeding trade, preventing the peaceful transfer of resources, and leading to a greater likelihood of armed conflict.

There’s only one problem with this vision, and it’s also based in economics: game theory.

Game theory and looters

Imagine I’m a farmer. I’m a great farmer, I have a large plot of land, I run my operations efficiently, and I produce huge amounts of food. I sell that food into the marketplace, and with that money I’m able to afford great resources from other people, who willingly trade them to me because they value the money more than their own resources. For example, how many T-shirts does the clothing manufacturer need? Instead of his 1,000th T-shirt, he’d rather sell it for $5 and buy some food.

While I’m really great as a farmer, I’m not very good as a fighter. I have no weapons training, I keep no weapons on my property, and I dislike violence.

And finally, there’s a strong, skilled, unethical person down the street. He could get a job with me on the farm. For back-breaking work 8 hours a day, I’ll pay him 5% of my harvest. Or, by contrast, he could act like the mafia, demand a “protection fee” of 20%, and either beat me up, beat up my family, or cause harm to my property, if I don’t pay it.

In other words, he could be a producer and get 5% in exchange for hard work, or be a looter and get 20% in exchange for easy (and, likely for him, fun) work. As described, the game theoretic choice is clear.

So how do we stop a producer world from devolving back into a looter world?

Deterrence

There’s only one mechanism I’m aware of for this, and it’s deterrence. As the farmer, I made a mistake. I should get weapons training. I should keep weapons on my farm. I should be ready to defend myself and my property. Because if I don’t, game theory ultimately predicts that all trade will collapse, and society as we know it will crumble.

I don’t necessarily have to have the power of deterrence myself. I could hire a private security company, once again allowing the producer world to work out well. I trade something of lesser value (some money) for something I value more (the protection afforded by private security). If I’m lucky, that security company will never need to do anything, because the mere threat of their presence is sufficient.

And in modern society, we generally hope to rely on the government police force to provide this protection.

There are easy ways to defeat the ability of deterrence to protect our way of life. The simplest is to defang it. Decriminalize violent and destructive acts, for example. Remove the consequences for bad, looter behavior, and you will incentivize looting. This is far from a theoretical discussion. We’ve seen the clear outcome in California, which has decriminalized theft under $950, resulting—in a completely predictable way—in more theft, stores closing, and an overall erosion of producer philosophy.

And in California, this is even worse. Those who try to be their own deterrence, by arming themselves and protecting their rights, are often the targets of government force instead of the looters.

I’m guessing this phrasing has now split my reading audience into three groups. Group A agrees wholly with what I’m saying. Group B believes what I’ve just written is pure evil and garbage. Group C initially disagreed with my statements, but has an open mind and is willing to consider a different paradigm. The next section is targeted at groups A and C. Group B: good luck with the broken world you’re advocating.

Global scale

This concept of deterrence applies at a global scale too. I would love to live in a world where all nations exchange value for value and never use force against others. In fact, I believe the ultimate vision for this kind of a world ends with anarcho-capitalism (though I don’t know enough about the topic to be certain). There ends up being no need for any force against anyone else. It’s a beautiful vision for a unified world, where there are no borders, there is no destruction, there is only unity through trade. I love it.

But game theory destroys this too. If the entire world disarmed, it would take just one person who thinks he can do better through looter tactics to destroy the system. The only way to defeat that is to have a realistic threat of force to disincentivize someone from acting like a looter.

And this is the paradox. In order to live in our wonderful world of production, prosperity, health, and happiness, we always need to have our finger near enough to the trigger to respond to looters with force. I know of no other approach that allows production to happen. (And I am very interested in other theoretical solutions to this problem, if anyone wants to share reading material.)

Peace through strength

This line of thinking leads to the concept of peace through strength. When those tempted to use violence see the overwhelming strength of their potential victims, they will be disincentivized to engage in violent behavior. It’s the story of the guy who wants to rob my farm. Or the roaming army in the ancient world that bypassed the well fortified walled city and attacked its unprotected neighbor.

There are critics of this philosophy. As put by Andrew Bacevich, "'Peace through strength' easily enough becomes 'peace through war.'" I don’t disagree at all with that analysis, and it’s something we must remain vigilant against. But disarming is not the answer, as it will, of course, necessarily lead to the victory of those willing to use violence on others.

In other words, my thesis here is that the threat of violence must be present to keep society civilized. But the cost of using that violence must be high enough that neither side is incentivized to initiate it.

Israel

I’d been thinking of writing a blog post on this topic for a few months now, but finally decided to today. Israel just agreed to a hostage deal with Hamas. In exchange for the release of 33 hostages taken in the October 7 massacre, Israel will hand over 1,000 terrorists in Israeli prisons.

I have all the sympathy in the world for the hostages and their families. I also have great sympathy for the Palestinian civilians who have been harmed, killed, displaced, and worse by this war. And I have empathy (as one of the victims) for all of the Israeli citizens who have lived under threat of rocket attacks, had our lives disrupted, and for those who have been killed by this war. War is hell, full stop.

My message here is to those who have been pushing the lie of “peace through negotiations.” Or peace through capitulation. Or anything else. These tactics are the reason the war has continued. As long as the incentive structure makes initiating a war a positive, wars will continue to be initiated. Hamas has made its stance on the matter clear: it has sworn for the eradication of all Jews within the region, and considers civilian casualties on the Palestinian side not only acceptable, but advantageous.

Gaza Chief's Brutal Calculation: Civilian Bloodshed Will Help Hamas

I know that many people who criticize Israel and put pressure on us to stop the war in Gaza believe they are doing so for noble reasons. (For the record, I also believe many people have less altruistic reasons for their stance.) I know people like to point to the list of atrocities they believe Israel has committed. And, by contrast, the pro-Israel side is happy to respond with corresponding atrocities from the other side.

I honestly believe this is all far beyond irrelevant. The only question people should be asking is: how do we disincentivize the continuation of hostilities? And hostage deals that result in the release of terrorists, allow “aid” to come in (which, if history is any indication, will be used to further the construction of tunnels and other sources for attack on Israel), and give Hamas an opportunity to rearm, only incentivize the continuation of the war.

In other words, if you care about the innocent people on either side, you should be opposed to this kind of capitulation. Whatever you think about the morality of each side, more people will suffer with this approach.

Skin in the game

It’s easy to say things like that when your life isn’t on the line. I also don’t think that matters much. Either the philosophical, political, and economic analysis is correct, or it isn’t. Nonetheless, I do have skin in the game here. I still live in a warzone. I am less than 15 kilometers from the Lebanese border. We’ve had Hezbollah tunnels reaching into our surrounding cities. My family had to lock ourselves inside when Hezbollah paratroopers attempted to land in our city.

My wife (Miriam) and I have discussed this situation at length, many times, over the course of this war. If I’m ever taken hostage, I hope the Israeli government bombs the hell out of wherever I am being held. I say this not only because I believe it is the right, just, moral, ethical, and strategically correct thing to do. I say this because I am selfish:

  • I would rather die than be tortured by our enemies.
  • I would rather die than be leveraged to make my family and country less safe.
  • I would rather die than live the rest of my life a shell of my former self, haunted not only by the likely torture inflicted on me, but by the guilt of the harm to others resulting from my spared life.

I don’t know why this hostage deal went through now. I don’t know what pressures have been brought to bear on the leaders in Israel. I don’t know if they are good people trying to protect their citizens, nefarious power hungry cretins looking to abuse both the Israeli and Palestinian populace to stay in control, weak-willed toadies who do what they’re told by others, or simply stupid. But my own stance is clear.

But what about the Palestinians?

I said it above, and I’ll say it again: I truly do feel horrible for the trauma that the Palestinian people are going through. Not for the active terrorists mind you, I feel no qualms about those raising arms against us being destroyed. But everyone else, even those who wish me and my fellow Israelis harm. (And, if polling is to be believed, that’s the majority of Palestinians.) I would much rather that they not be suffering now, and that eventually through earned trust on both sides, everyone’s lots are improved.

But the framework being imposed by those who “love” peace isn’t allowing that to happen. Trust cannot be built when there’s a greater incentive to return to the use of force. I was strongly opposed to the 2005 disengagement from Gaza. But once it happened, it could have been one of those trust-building starting points. Instead, I saw many people justify further violence by Hamas—such as non-stop rocket attacks on the south of Israel—because Israel hadn’t done enough yet.

Notice how fundamentally flawed this mentality is, just from an incentives standpoint! Israel gives up control of land, something against its own overall interests and something desired by Palestinians, and is punished for it with increased violence against citizens. Hamas engaged in a brutal destruction of all of its opponents within the Palestinian population, launched attacks on Israel, and when Israel did respond with force, Israel was blamed for having not done enough to appease Hamas.

I know people will want to complicate this story by bringing up the laundry list of past atrocities, of assigning negative motivations to Israel and its leaders, and a million other evasions that are used to avoid actually solving this conflict. Instead, I beg everyone to just use basic logic.

The violence will continue as long as the violence gets results.

January 16, 2025 12:00 AM

January 15, 2025

Well-Typed.Com

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

Today, 2025-01-15, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 38th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

CUDA is an extension of C for programming NVIDIA GPUs. In this episode of the Haskell Unfolder we show how to set up a CUDA library so that we can link to it from a Haskell application, how we can call CUDA functions from Haskell, and how we can use QuickCheck to find subtle bugs in our CUDA code. On the CUDA side, we show how to implement simple concurrent versions of map, fold and scan. No familiarity with CUDA will be assumed, but of course we will only be able to give a taste of CUDA programming.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at January 15, 2025 12:00 AM

January 13, 2025

Michael Snoyman

Incentives Determine Outcomes

My blog posts and reading material have both been on a decidedly economics-heavy slant recently. The topic today, incentives, squarely falls into the category of economics. However, when I say economics, I’m not talking about “analyzing supply and demand curves.” I’m talking about the true basis of economics: understanding how human beings make decisions in a world of scarcity.

A fair definition of incentive is “a reward or punishment that motivates behavior to achieve a desired outcome.” When most people think about economic incentives, they’re thinking of money. If I offer my son $5 if he washes the dishes, I’m incentivizing certain behavior. We can’t guarantee that he’ll do what I want him to do, but we can agree that the incentive structure itself will guide and ultimately determine what outcome will occur.

The great thing about monetary incentives is how easy they are to talk about and compare. “Would I rather make $5 washing the dishes or $10 cleaning the gutters?” But much of the world is incentivized in non-monetary ways too. For example, using the “punishment” half of the definition above, I might threaten my son with losing Nintendo Switch access if he doesn’t wash the dishes. No money is involved, but I’m still incentivizing behavior.

And there are plenty of incentives beyond our direct control! My son is also incentivized to not wash dishes because it’s boring, or because he has some friends over that he wants to hang out with, or dozens of other things. Ultimately, the conflicting array of incentive structures placed on him will determine what actions he chooses to take.

Why incentives matter

A phrase I see often in discussions—whether they are political, parenting, economic, or business—is “if they could just do…” Each time I see that phrase, I cringe a bit internally. Usually, the underlying assumption of the statement is “if people would behave contrary to their incentivized behavior then things would be better.” For example:

  • If my kids would just go to bed when I tell them, they wouldn’t be so cranky in the morning.
  • If people would just use the recycling bin, we wouldn’t have such a landfill problem.
  • If people would just stop being lazy, our team would deliver our project on time.

In all these cases, the speakers are seemingly flummoxed as to why the people in question don’t behave more rationally. The problem is: each group is behaving perfectly rationally.

  • The kids have a high time preference, and care more about the joy of staying up now than the crankiness in the morning. Plus, they don’t really suffer the consequences of morning crankiness, their parents do.
  • No individual suffers much from their individual contribution to a landfill. If they stopped growing the size of the landfill, it would make an insignificant difference versus the amount of effort they need to engage in to properly recycle.
  • If a team doesn’t properly account for the productivity of individuals on a project, each individual receives less harm from their own inaction. Sure, the project may be delayed, company revenue may be down, and they may even risk losing their job when the company goes out of business. But their laziness individually won’t determine the entirety of that outcome. By contrast, they greatly benefit from being lazy by getting to relax at work, go on social media, read a book, or do whatever else they do when they’re supposed to be working.

Free Candy!

My point here is that, as long as you ignore the reality of how incentives drive human behavior, you’ll fail at getting the outcomes you want.

If everything I wrote up until now made perfect sense, you understand the premise of this blog post. The rest of it will focus on a bunch of real-world examples to hammer home the point, and demonstrate how versatile this mental model is.

Running a company

Let’s say I run my own company, with myself as the only employee. My personal revenue will be 100% determined by my own actions. If I decide to take Tuesday afternoon off and go fishing, I’ve chosen to lose that afternoon’s revenue. Implicitly, I’ve decided that the enjoyment I get from an afternoon of fishing is greater than the potential revenue. You may think I’m being lazy, but it’s my decision to make. In this situation, the incentive–money–is perfectly aligned with my actions.

Compare this to a typical company/employee relationship. I might have a bank of Paid Time Off (PTO) days, in which case once again my incentives are relatively aligned. I know that I can take off 15 days throughout the year, and I’ve chosen to use half a day for the fishing trip. All is still good.

What about unlimited time off? Suddenly incentives are starting to misalign. I don’t directly pay a price for not showing up to work on Tuesday. Or Wednesday as well, for that matter. I might ultimately be fired for not doing my job, but that will take longer to work its way through the system than simply not making any money for the day taken off.

Compensation overall falls into this misaligned incentive structure. Let’s forget about taking time off. Instead, I work full time on a software project I’m assigned. But instead of using the normal toolchain we’re all used to at work, I play around with a new programming language. I get the fun and joy of playing with new technology, and potentially get to pad my resume a bit when I’m ready to look for a new job. But my current company gets slower results, less productivity, and is forced to subsidize my extracurricular learning.

When a CEO has a bonus structure based on profitability, he’ll do everything he can to make the company profitable. This might include things that actually benefit the company, like improving product quality, reducing internal red tape, or finding cheaper vendors. But it might also include destructive practices, like slashing the R&D budget to show massive profits this year, in exchange for a catastrophe next year when the next version of the product fails to ship.

Golden Parachute CEO

Or my favorite example. My parents owned a business when I was growing up. They had a back office where they ran operations like accounting. All of the furniture was old couches from our house. After all, any money they spent on furniture came right out of their paychecks! But in a large corporate environment, each department is generally given a budget for office furniture, a budget which doesn’t roll over year-to-year. The result? Executives make sure to spend the entire budget each year, often buying furniture far more expensive than they would choose if it was their own money.

There are plenty of details you can quibble with above. It’s in a company’s best interest to give people downtime so that they can come back recharged. Having good ergonomic furniture can in fact increase productivity in excess of the money spent on it. But overall, the picture is pretty clear: in large corporate structures, you’re guaranteed to have mismatches between the company’s goals and the incentive structure placed on individuals.

Using our model from above, we can lament how lazy, greedy, and unethical the employees are for doing what they’re incentivized to do instead of what’s right. But that’s simply ignoring the reality of human nature.

Moral hazard

Moral hazard is a situation where one party is incentivized to take on more risk because another party will bear the consequences. Suppose I tell my son when he turns 21 (or whatever legal gambling age is) that I’ll cover all his losses for a day at the casino, but he gets to keep all the winnings.

What do you think he’s going to do? The most logical course of action is to place the largest possible bets for as long as possible, asking me to cover each time he loses, and taking money off the table and into his bank account each time he wins.

Heads I win, tails you lose

But let’s look at a slightly more nuanced example. I go to a bathroom in the mall. As I’m leaving, I wash my hands. It will take me an extra 1 second to turn off the water when I’m done washing. That’s a trivial price to pay. If I don’t turn off the water, the mall will have to pay for many liters of wasted water, benefiting no one. But I won’t suffer any consequences at all.

This is also a moral hazard, but most people will still turn off the water. Why? Usually due to some combination of other reasons such as:

  1. We’re so habituated to turning off the water that we don’t even consider not turning it off. Put differently, the mental effort needed to not turn off the water is more expensive than the 1 second of time to turn it off.
  2. Many of us have been brought up with a deep guilt about wasting resources like water. We have an internal incentive structure that makes the 1 second to turn off the water much less costly than the mental anguish of the waste we created.
  3. We’re afraid we’ll be caught by someone else and face some kind of social repercussions. (Or maybe more than social. Are you sure there isn’t a law against leaving the water tap on?)

Even with all that in place, you may notice that many public bathrooms use automatic water dispensers. Sure, there’s a sanitation reason for that, but it’s also to avoid this moral hazard.

A common denominator in both of these is that the person taking the action that causes the liability (either the gambling or leaving the water on) is not the person who bears the responsibility for that liability (the father or the mall owner). Generally speaking, the closer together the person making the decision and the person incurring the liability are, the smaller the moral hazard.

It’s easy to demonstrate that by extending the casino example a bit. I said it was the father who was covering the losses of the gambler. Many children (though not all) would want to avoid totally bankrupting their parents, or at least financially hurting them. Instead, imagine that someone from the IRS shows up at your door, hands you a credit card, and tells you you can use it at a casino all day, taking home all the chips you want. The money is coming from the government. How many people would put any restriction on how much they spend?

And since we’re talking about the government already…

Government moral hazards

As I was preparing to write this blog post, the California wildfires hit. The discussions around those wildfires gave a huge number of examples of moral hazards. I decided to cherry-pick a few for this post.

The first and most obvious one: California is asking for disaster relief funds from the federal government. That sounds wonderful. These fires were a natural disaster, so why shouldn’t the federal government pitch in and help take care of people?

The problem is, once again, a moral hazard. In the case of the wildfires, California and Los Angeles both had ample actions they could have taken to mitigate the destruction of this fire: better forest management, larger fire department, keeping the water reservoirs filled, and probably much more that hasn’t come to light yet.

If the federal government bails out California, it will be a clear message for the future: your mistakes will be fixed by others. You know what kind of behavior that incentivizes? More risky behavior! Why spend state funds on forest management and extra firefighters—activities that don’t win politicians a lot of votes in general—when you could instead spend it on a football stadium, higher unemployment payments, or anything else, and then let the feds cover the cost of screw-ups.

You may notice that this is virtually identical to the 2008 “too big to fail” bail-outs. Wall Street took insanely risky behavior, reaped huge profits for years, and when they eventually got caught with their pants down, the rest of us bailed them out. “Privatizing profits, socializing losses.”

Too big to fail

And here’s the absolute best part of this: I can’t even truly blame either California or Wall Street. (I mean, I do blame them, I think their behavior is reprehensible, but you’ll see what I mean.) In a world where the rules of the game implicitly include the bail-out mentality, you would be harming your citizens/shareholders/investors if you didn’t engage in that risky behavior. Since everyone is on the hook for those socialized losses, your best bet is to maximize those privatized profits.

There’s a lot more to government and moral hazard, but I think these two cases demonstrate the crux pretty solidly. But let’s leave moral hazard behind for a bit and get to general incentivization discussions.

Non-monetary competition

At least 50% of the economics knowledge I have comes from the very first econ course I took in college. That professor was amazing, and had some very colorful stories. I can’t vouch for the veracity of the two I’m about to share, but they definitely drive the point home.

In the 1970s, the US had an oil shortage. To “fix” this problem, they instituted price caps on gasoline, which of course resulted in insufficient gasoline. To “fix” this problem, they instituted policies where, depending on your license plate number, you could only fill up gas on certain days of the week. (Irrelevant detail for our point here, but this just resulted in people filling up their tanks more often, no reduction in gas usage.)

Anyway, my professor’s wife had a friend. My professor described in great detail how attractive this woman was. I’ll skip those details here since this is a PG-rated blog. In any event, she never had any trouble filling up her gas tank any day of the week. She would drive up, be told she couldn’t fill up gas today, bat her eyes at the attendant, explain how helpless she was, and was always allowed to fill up gas.

This is a demonstration of non-monetary compensation. Most of the time in a free market, capitalist economy, people are compensated through money. When price caps come into play, there’s a limit to how much monetary compensation someone can receive. And in that case, people find other ways of competing. Like this woman’s case: through using flirtatious behavior to compensate the gas station workers to let her cheat the rules.

The other example was much more insidious. Santa Monica had a problem: it was predominantly wealthy and white. They wanted to fix this problem, and decided to put in place rent controls. After some time, they discovered that Santa Monica had become wealthier and whiter, the exact opposite of their desired outcome. Why would that happen?

Someone investigated, and ended up interviewing a landlady who demonstrated the reason. She was an older white woman, and admittedly racist. Prior to the rent controls, she would list her apartments in the newspaper, and would be legally obligated to rent to anyone who could afford it. Once rent controls were in place, she took a different tack. She knew that she would only get a certain amount for the apartment, and that the demand for apartments was higher than the supply. That meant she could be picky.

She ended up finding tenants through friends-of-friends. Since it wasn’t an official advertisement, she wasn’t legally required to rent it out if someone could afford to pay. Instead, she got to interview people individually and then make them an offer. Normally, that would have resulted in receiving a lower rental price, but not under rent controls.

So who did she choose? A young, unmarried, wealthy, white woman. It made perfect sense. Women were less intimidating and more likely to maintain the apartment better. Wealthy people, she determined, would be better tenants. (I have no idea if this is true in practice or not, I’m not a landlord myself.) Unmarried, because no kids running around meant less damage to the property. And, of course, white. Because she was racist, and her incentive structure made her prefer whites.

You can deride her for being racist, I won’t disagree with you. But it’s simply the reality. Under the non-rent-control scenario, her profit motive for money outweighed her racism motive. But under rent control, the monetary competition was removed, and she was free to play into her racist tendencies without facing any negative consequences.

Bureaucracy

These were the two examples I remember for that course. But non-monetary compensation pops up in many more places. One highly pertinent example is bureaucracies. Imagine you have a government office, or a large corporation’s acquisition department, or the team that apportions grants at a university. In all these cases, you have a group of people making decisions about handing out money that has no monetary impact on them. If they give to the best qualified recipients, they receive no raises. If they spend the money recklessly on frivolous projects, they face no consequences.

Under such an incentivization scheme, there’s little to encourage the bureaucrats to make intelligent funding decisions. Instead, they’ll be incentivized to spend the money where they recognize non-monetary benefits. This is why it’s so common to hear about expensive meals, gift bags at conferences, and even more inappropriate ways of trying to curry favor with those that hold the purse strings.

Compare that ever so briefly with the purchases made by a small mom-and-pop store like my parents owned. Could my dad take a bribe to buy from a vendor who’s ripping him off? Absolutely he could! But he’d lose more on the deal than he’d make on the bribe, since he’s directly incentivized by the deal itself. It would make much more sense for him to go with the better vendor, save $5,000 on the deal, and then treat himself to a lavish $400 meal to celebrate.

Government incentivized behavior

This post is getting longer than I’d intended, so I’ll finish off with this section and make it a bit briefer. Beyond all the methods mentioned above, government has another mechanism for modifying behavior: directly changing incentives via legislation, regulation, and monetary policy. Let’s see some examples:

  • Artificial modification of interest rates encourages people to take on more debt than they would in a free capital market, leading to malinvestment and a consumer debt crisis, and causing the boom-bust cycle we all painfully experience.
  • Going along with that, giving tax breaks on interest payments further artificially incentivizes people to take on debt that they wouldn’t otherwise.
  • During COVID-19, at some points unemployment benefits were greater than minimum wage, incentivizing people to stay home and not work rather than get a job, leading to reduced overall productivity in the economy and more printed dollars for benefits. In other words, it was a perfect recipe for inflation.
  • The tax code gives deductions to “help” people. That might be true, but the real impact is incentivizing people to make decisions they wouldn’t have otherwise. For example, giving out tax deductions on children encourages having more kids. Tax deductions on childcare and preschools incentivizes dual-income households. Whether or not you like the outcomes, it’s clear that it’s government that’s encouraging these outcomes to happen.
  • Tax incentives cause people to engage in behavior they wouldn’t otherwise (daycare+working mother, for example).
  • Inflation means that the value of your money goes down over time, which encourages people to spend more today, when their money has a larger impact. (Milton Friedman described this as high living.)

Conclusion

The idea here is simple, and fully encapsulated in the title: incentives determine outcomes. If you want to know how to get a certain outcome from others, incentivize them to want that to happen. If you want to understand why people act in seemingly irrational ways, check their incentives. If you’re confused why leaders (and especially politicians) seem to engage in destructive behavior, check their incentives.

We can bemoan these realities all we want, but they are realities. While there are some people who have a solid internal moral and ethical code, and that internal code incentivizes them to behave against their externally-incentivized interests, those people are rare. And frankly, those people are self-defeating. People should take advantage of the incentives around them. Because if they don’t, someone else will.

(If you want a literary example of that last comment, see the horse in Animal Farm.)

How do we improve the world under these conditions? Make sure the incentives align well with the overall goals of society. To me, it’s a simple formula:

  • Focus on free trade, value for value, as the basis of a society. In that system, people are always incentivized to provide value to other people.
  • Reduce the size of bureaucracies and large groups of all kinds. The larger an organization becomes, the farther the consequences of decisions are from those who make them.
  • And since the nature of human beings will be to try and create areas where they can control the incentive systems to their own benefits, make that as difficult as possible. That comes in the form of strict limits on government power, for example.

And even if you don’t want to buy in to this conclusion, I hope the rest of the content was educational, and maybe a bit entertaining!

January 13, 2025 12:00 AM

January 12, 2025

Sandy Maguire

Read the Code, Not the Profile

At work a few weeks back, I found myself digging into profile reports, trying to determine why our program was running so slowly. Despite having the extremely obvious-in-retrospect data in front of me, I wasted a lot of time speeding up code that turned out to not move the needle at all.

Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.

Some Context

I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.

But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the realm of constant factors, and it was clear that something was asymptotically wrong.

So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file lying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.

Without further ado, here’s what our profile looked like:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%

Well, that’s not very helpful. Of course MAIN takes 100% of the time. So I expanded that, and saw:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%

No clearer. Opening up main:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%

Sheesh.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%

OH MY GOD. JUST TELL ME SOMETHING ALREADY.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%

Fast forwarding for quite a while, I opened up the entire stack until I got to something that didn’t take 100% of the program’s runtime:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%
        └ makeTest . . . . . . . . . . . . . . . . . 100%
          └ makeTest.\ . . . . . . . . . . . . . . . 100%
            └ compileProgram . . . . . . . . . . . . 100%
              └ evalAppT . . . . . . . . . . . . . . 100%
                └ runAppT  . . . . . . . . . . . . . 100%
                  └ runAppT' . . . . . . . . . . . . 100%
                    └ withLogging  . . . . . . . . . 100%
                      └ transformSSA . . . . . . . . 100%
                        └ >>=  . . . . . . . . . . . 100%
                          └ >>>= . . . . . . . . . . 100%
                            └ ibind  . . . . . . . . 100%
                              └ ibind.\  . . . . . . 100%
                                └ ibind.\.\  . . . . 100%
                                  ├ toSSA  . . . . . 15%
                                  ├ transform1 . . . 15%
                                  ├ transform2 . . . 10%
                                  ├ transform3 . . . 10%
                                  ├ transform4 . . . 20%
                                  └ collectGarbage . 30%

Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.

A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.

Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.

Retrospective

So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.

Unwinding the story stack, the most salient sign that I had not actually solved the problem was that I had reduced the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.

That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.

It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of \(10^{11}\) instructions, while 9s is \(10^{10}\). How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.

To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
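
Spelling that estimate out (assuming a program on the order of 100 instructions and a single core at roughly 3 GHz):

\[
100 \text{ instructions} \times 10^{6} \,\tfrac{\text{cycles}}{\text{instruction}} = 10^{8} \text{ cycles},
\qquad
\frac{10^{8} \text{ cycles}}{3 \times 10^{9} \text{ cycles/s}} \approx 0.03\,\text{s}
\]

against the roughly \(10^{11}\) instructions actually observed, which is where the factor of 1000 comes from.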

So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.

The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.

After expanding a few obviously-not-the-problem cost centres, each of which was 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.

This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.

Digging through transformSSA made it very obvious the problem was an algorithmic one—we were running an unbounded loop that terminated on convergence, where each step took \(O(n^2)\) work. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
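
Schematically, the offending loop had the familiar fixpoint shape below (a hand-written sketch, not the actual transformSSA code; step here stands in for one \(O(n^2)\) rewriting pass):

-- Iterate an expensive rewrite step until nothing changes any more.
-- The number of iterations is unbounded, so the total cost can blow
-- up on larger programs even though each step looks harmless.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint step x
  | x' == x   = x
  | otherwise = fixpoint step x'
  where
    x' = step x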

The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.

January 12, 2025 03:29 PM

January 09, 2025

Edward Z. Yang

New Years resolutions for PyTorch in 2025

In my previous two posts "Ways to use torch.compile" and "Ways to use torch.export", I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some... just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!

Improving torch.compile

A programming model for PT2. A programming model is an abstract description of the system that is both simple (so anyone can understand it and keep it in their head all at once) and can be used to predict the system's behavior. The torch.export programming model is an example of such a description. Beyond export, we would like to help users understand why all aspects of PT2 behave the way they do (e.g., via improved error messages), and give simple, predictable tools for working around problems when they arise. The programming model helps us clearly define the intrinsic complexity of our compiler, which we must educate users about. This is a big effort involving many folks on the PyTorch team and I hope we can share more about this effort soon.

Pre-compilation: beyond single graph export. Whenever someone realizes that torch.compile compilation is taking a substantial amount of time on expensive cluster machines, the first thing they ask is, "Why don't we just compile it in advance?" Supporting precompilation with the torch.compile API exactly as it is today is not so easy; unlike a traditional compiler which gets the source program directly as input, users of torch.compile must actually run their Python program to hit the regions of code that are intended to be compiled. Nor can these regions be trivially enumerated and then compiled: not only must we know the metadata of all input tensors flowing into a region, but a user might not even know what the compiled graphs are if a model has graph breaks.

OK, but why not just run the model, dump all the compiled products, and then reuse them later? This works! Here is a POC from Nikita Shulga where a special decorator aot_compile_sticky_cache swaps between exporting a graph and running the exported product. Zhengxu Chen used a similar idea to export Whisper as a few distinct graphs, which he then manually stitched together in C++ to get a Python-free version of Whisper. If you want training to work, you can more directly integrate AOTInductor as an Inductor backend, e.g., as seen in this POC. We are a stone's throw away from working precompilation, which can guarantee no compilation at runtime; we just need to put the pieces together!

Improving caching further. There are some gaps with caching which we hope to address in the near future: (1) loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup (James Wu is on this), (2) if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request at the beginning (Oguz Ulgen recently landed this), (3) AOTAutograd cache is not fully rolled out yet (James Wu again). These collectively should be worth a 2x speedup or even more on warm cache time.

Fix multithreading. We should just make sure multithreading works, doing the testing and fiddly thread safety auditing needed to make it work. Here's a list of multithreading related issues.

Improving torch.export

Draft mode export. Export requires a lot of upfront work to even get an exported artifact in the first place. Draft mode export capitalizes on the idea that it's OK to generate an unsound "draft" graph early in the export, because even an incorrect graph is useful for kicking the tires on the downstream processing that happens after export. A draft export gives you a graph, and it also gives you a report describing what potential problems need to be fixed to get some guarantees about the correctness of the export. You can then chip away on the problems in the report until everything is green. One of the biggest innovations of draft-mode export is pervasive use of real tensor propagation when doing export: you run the export with actual tensors, so you can always trace through code, even if it is doing spicy things like data-dependent control flow.

Libtorch-free AOTInductor. AOTInductor generated binaries have a relatively small ABI surface that needs to be implemented. This hack from the most recent CUDA Mode meetup shows that you can just create an alternate implementation of the ABI that has no dependence on libtorch. This makes your deployed binary size much smaller!

Support for bundling CUDA kernels into AOTInductor. AOTInductor already supports directly bundling Triton kernels into the generated binary, but traditional CUDA kernels cannot be bundled in this way. There's no reason this has to be the case, though: all we're doing is bundling cubins in both cases. If we have the ability to bundle traditional CUDA kernels into AOTInductor, you could potentially embed custom operators directly into AOTInductor binaries, which is nice because then those operators no longer have to be provided by the runtime (especially if you're frequently iterating on these kernels!).

Export multigraphs. Export's standard model is to give you a single graph that you call unconditionally. But it's easy to imagine a level of indirection on top of these graphs, where we can dispatch between multiple graphs depending on some arguments to the model. For example, if you have a model that optionally takes an extra Tensor argument, you can simply have two graphs, one for when the Tensor is absent, and one for when it is present.
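
To make the idea concrete, here is a minimal sketch of what such hand-rolled dispatch could look like today (M and run are made-up names; torch.export currently has no built-in multigraph support, so the dispatcher below is rolled by hand):

import torch

class M(torch.nn.Module):
    # hypothetical model with an optional Tensor argument
    def forward(self, x, extra=None):
        if extra is None:
            return x * 2
        return x + extra

x = torch.randn(4)
# Export one graph per call signature.
ep_without = torch.export.export(M(), (x,))
ep_with = torch.export.export(M(), (x, torch.randn(4)))

def run(x, extra=None):
    # trivial hand-rolled dispatch between the two exported graphs
    if extra is None:
        return ep_without.module()(x)
    return ep_with.module()(x, extra)

print(run(x))                  # uses the "no extra Tensor" graph
print(run(x, torch.ones(4)))   # uses the "with extra Tensor" graph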

ABI stable PyTorch extensions. It's hard work being a third-party PyTorch extension with native code, because whenever there's a new release of Python or PyTorch you have to rebuild all of your wheels. If there were a limited ABI that you could build your extension against, one that didn't expose CPython and relied only on a small, stable ABI of PyTorch functions, your binary packaging situation would be much simpler! And if an extension relied on a small ABI, it could even be bundled with an AOTInductor binary, letting these export products be truly package agnostic (one of the lessons we learned with torch.package is that picking the split between "what is packaged" and "what is not" is very difficult, and people would much rather just have everything be packaged). Jane Xu is investigating how to do this, and separately, Scott Wolchok has been refactoring headers in libtorch so that a small set of headers can be used independently of the rest of libtorch.

by Edward Z. Yang at January 09, 2025 08:50 PM

January 05, 2025

Manuel M T Chakravarty

Functional Programming in Swift

When people talk about functional programming in modern multi-paradigm languages, they usually mention Rust, Scala, or Kotlin. You rarely hear Swift being mentioned. This is odd, as one might argue that, of these languages, Swift places the strongest emphasis on functional programming.

In this talk, I will explain the core functional programming features of Swift, including its expressive type system, value types, and mutability control. Furthermore, I will discuss how Swift’s language design is influenced by the desire to create a language that addresses the whole spectrum from low-level systems programming up to high-level applications with sophisticated graphical user interfaces. Beyond the core language itself, functional programming also permeates Swift’s rich ecosystem of libraries. To support this point, I will outline some FP-inspired core libraries, covering concepts from functional data structures through functional reactive programming to declarative user interfaces.

Finally, I will briefly summarise practical considerations for using Swift in your own projects. This includes the cross-platform toolchain, the package manager, and interoperability with other languages.

January 05, 2025 07:45 PM

Abhinav Sarkar

Solving Advent of Code “Seating System” with Comonads and Stencils

In this post, we solve the Advent of Code 2020 “Seating System” challenge in Haskell using comonads and stencils.

This post was originally published on abhinavsarkar.net.

The Challenge

Here’s a quick summary of the challenge:

The seat layout fits on a grid. Each position is either floor (.), an empty seat (L), or an occupied seat (#). For example, the initial seat layout might look like this:

L.LL.LL.LL
LLLLLLL.LL
L.L.L..L..
LLLL.LL.LL
L.LL.LL.LL
L.LLLLL.LL
..L.L.....
LLLLLLLLLL
L.LLLLLL.L
L.LLLLL.LL

All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).

The following rules are applied to every seat simultaneously:

  • If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
  • If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
  • Otherwise, the seat’s state does not change.
Floor (.) never changes; seats don’t move, and nobody sits on the floor.

This is a classic Cellular Automaton problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and returns the final number of occupied seats. Let’s solve this in Haskell.

The Cellular Automaton

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Comonad (Comonad (..))
import Data.Function (on)
import Data.List (intercalate, nubBy)
import Data.Massiv.Array (Ix2 (..))
import Data.Massiv.Array qualified as A
import Data.Massiv.Array.Unsafe qualified as AU
import Data.Proxy (Proxy (..))
import Data.Vector.Generic qualified as VG
import Data.Vector.Generic.Mutable qualified as VGM
import Data.Vector.Unboxed qualified as VU
import System.Environment (getArgs, getProgName)

We use the GHC2021 extension here, which enables a lot of useful GHC extensions by default. Our non-base imports come from the comonad, massiv and vector libraries.

Quoting the Wikipedia page on Cellular Automaton (CA):

  • A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.
  • For each cell, a set of cells called its neighborhood is defined relative to the specified cell.
  • An initial state is selected by assigning a state for each cell.
  • A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.

Let’s model the automaton of the challenge using Haskell:

newtype Cell = Cell Char deriving (Eq)

pattern Empty, Occupied, Floor :: Cell
pattern Empty = Cell 'L'
pattern Occupied = Cell '#'
pattern Floor = Cell '.'
{-# COMPLETE Empty, Occupied, Floor #-}

parseCell :: Char -> Cell
parseCell = \case
  'L' -> Empty
  '#' -> Occupied
  '.' -> Floor
  c -> error $ "Invalid character: " <> show c

rule :: Cell -> [Cell] -> Cell
rule cell neighbours =
  let occupiedNeighboursCount = length $ filter (== Occupied) neighbours
   in case cell of
        Empty | occupiedNeighboursCount == 0 -> Occupied
        Occupied | occupiedNeighboursCount >= 4 -> Empty
        _ -> cell

A cell in the grid can be in empty, occupied or floor state. We encode this with the pattern synonyms Empty, Occupied and Floor over the Cell newtype over Char1.

The parseCell function parses a character to a Cell. The rule function implements the automaton rule.

The Solution

We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.

class (Eq a) => Grid a where
  fromLists :: [[Cell]] -> a
  step :: a -> a
  toLists :: a -> [[Cell]]

solve :: forall a. (Grid a) => Proxy a -> [[Cell]] -> Int
solve _ =
  fromLists @a
    >>> fix step
    >>> toLists
    >>> fmap (filter (== Occupied) >>> length)
    >>> sum
  where
    fix f x = let x' = f x in if x == x' then x else fix f x'

We solve the challenge using the Grid typeclass that all our different solutions implement. A grid is specified by three functions:

  1. fromLists: converts a list of lists of cells to the grid.
  2. step: runs one step of the CA simulation.
  3. toLists: converts the grid back to a list of lists of cells.

The solve function calculates the final number of occupied seats for any instance of the Grid typeclass by running the simulation till it converges2.

Now, we use solve to solve the challenge in three ways depending on the command line argument supplied:

main :: IO ()
main = do
  progName <- getProgName
  getArgs >>= \case
    [gridType, fileName] ->
      readFile fileName
        >>= (lines >>> map (map parseCell) >>> solve' gridType >>> print)
    _ -> putStrLn $ "Usage: " <> progName <> " -(z|a|s) <input_file>"
  where
    solve' = \case
      "-z" -> solve $ Proxy @(ZGrid Cell)
      "-a" -> solve $ Proxy @(AGrid Cell)
      "-s" -> solve $ Proxy @(SGrid Cell)
      _ -> error "Invalid grid type"

We have set up the top (main) and the bottom (rule) of our solutions. Now let’s work on the middle part.

The Zipper

To simulate a CA, we need to focus on each cell of the automaton grid, and run the rule for the cell. What is the first thing that comes to the minds of functional programmers when we want to focus on a part of a data structure? Zippers!

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.

The first way to solve the challenge uses the zipper for once-nested lists. Let’s start by creating the zipper for a simple list:

data Zipper a = Zipper [a] a [a] deriving (Eq, Functor)

zPosition :: Zipper a -> Int
zPosition (Zipper left _ _) = length left

zLength :: Zipper a -> Int
zLength (Zipper left _ right) = length left + 1 + length right

listToZipper :: [a] -> Zipper a
listToZipper = \case
  [] -> error "Cannot create Zipper from empty list"
  (x : xs) -> Zipper [] x xs

zipperToList :: Zipper a -> [a]
zipperToList (Zipper left focus right) = reverse left <> (focus : right)

pShowZipper :: (Show a) => Zipper a -> String
pShowZipper (Zipper left focus right) =
  unwords $
    map show (reverse left) <> (("[" <> show focus <> "]") : map show right)

zLeft :: Zipper a -> Zipper a
zLeft z@(Zipper left focus right) = case left of
  [] -> z
  x : xs -> Zipper xs x (focus : right)

zRight :: Zipper a -> Zipper a
zRight z@(Zipper left focus right) = case right of
  [] -> z
  x : xs -> Zipper (focus : left) x xs

A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:

  • zPosition returns the zero-indexed position of the focus in the zipper.
  • zLength returns the length of the zipper.
  • listToZipper and zipperToList do conversions between lists and zippers.
  • pShowZipper pretty-prints a zipper, highlighting the focus.
  • zLeft and zRight move the zipper’s focus to the left and right respectively.

Let’s see it all in action:

> z = listToZipper [1..7]
> putStrLn $ pShowZipper z
[1] 2 3 4 5 6 7
> z' = zRight $ zRight $ zLeft $ zRight $ zRight z
> putStrLn $ pShowZipper z'
1 2 3 [4] 5 6 7
> zPosition z'
3
> zLength z'
7
> zipperToList z'
[1,2,3,4,5,6,7]

Great! Now, what is the zipper for a once-nested list? A once-nested zipper, of course:

newtype ZGrid a = ZGrid (Zipper (Zipper a)) deriving (Eq, Functor)

zgPosition :: ZGrid a -> (Int, Int)
zgPosition (ZGrid rows@(Zipper _ focus _)) = (zPosition rows, zPosition focus)

zgSize :: ZGrid a -> (Int, Int)
zgSize (ZGrid rows@(Zipper _ focus _)) = (zLength rows, zLength focus)

listsToZGrid :: [[a]] -> ZGrid a
listsToZGrid rows =
  let (first : rest) = fmap listToZipper rows
   in ZGrid $ Zipper [] first rest

zGridToLists :: ZGrid a -> [[a]]
zGridToLists (ZGrid (Zipper left focus right)) =
  reverse (fmap zipperToList left)
    <> (zipperToList focus : fmap zipperToList right)

pShowZGrid :: (Show a) => ZGrid a -> String
pShowZGrid (ZGrid (Zipper left focus right)) =
  intercalate "\n" $ pShowRows left <> (pShowZipper focus : pShowRows right)
  where
    pShowRows = map pShowZipper'
    pShowZipper' =
      zipperToList
        >>> splitAt (zPosition focus)
        >>> \ ~(left', focus' : right') ->
          unwords $
            map show left' <> ((" " <> show focus' <> " ") : map show right')

ZGrid is a newtype over a zipper of zippers. It has functions similar to Zipper for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.

Next, the functions to move the focus in the grid:

zgUp :: ZGrid a -> ZGrid a
zgUp (ZGrid rows) = ZGrid $ zLeft rows

zgDown :: ZGrid a -> ZGrid a
zgDown (ZGrid rows) = ZGrid $ zRight rows

zgLeft :: ZGrid a -> ZGrid a
zgLeft (ZGrid rows) = ZGrid $ fmap zLeft rows

zgRight :: ZGrid a -> ZGrid a
zgRight (ZGrid rows) = ZGrid $ fmap zRight rows

Let’s check them out in GHCi:

> zg = listsToZGrid $ replicate 7 $ [1..7]
> putStrLn $ pShowZGrid zg
[1] 2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
> zg' = zgDown $ zgRight $ zgDown $ zgRight zg
> putStrLn $ pShowZGrid zg'
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2 [3] 4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
> zgPosition zg'
(2,2)
> zgSize zg'
(7,7)
> zGridToLists zg'
[[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7]]

It works as expected. Now, how do we use this to simulate a CA?

The Comonad

A CA requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. A Haskell abstraction that neatly fits this requirement is Comonad.

Comonads are duals of Monads3. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating a CA:

class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
  {-# MINIMAL extract, (duplicate | extend) #-}

Assuming we can make ZGrid a comonad instance, the signatures for the above functions for ZGrid Cell would be:

class Comonad ZGrid where
  extract :: ZGrid Cell -> Cell
  duplicate :: ZGrid Cell -> ZGrid (ZGrid Cell)
  extend :: (ZGrid Cell -> Cell) -> ZGrid Cell -> ZGrid Cell

For ZGrid as a CA comonad:

  • The extract function would return the current focus of the grid.
  • The duplicate function would return a grid of grids, one inner grid for each possible focus of the input grid.
  • The extend function would apply the automata rule to each possible focus of the grid, and return a new grid.

The nice part is, we need to implement only the extract and duplicate functions, and the generation of the new grid is taken care of automatically by the default implementation of the extend function. Let’s write the comonad instance for ZGrid.

First, we write the comonad instance for Zipper:

instance Comonad Zipper where
  extract (Zipper _ focus _) = focus
  duplicate zipper = Zipper left zipper right
    where
      pos = zPosition zipper
      left = iterateN pos zLeft $ zLeft zipper
      right = iterateN (zLength zipper - pos - 1) zRight $ zRight zipper

iterateN :: Int -> (a -> a) -> a -> [a]
iterateN n f = take n . iterate f

extract for Zipper simply returns the input zipper’s focus element.

duplicate returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variations of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:

> z = listToZipper [1..7] :: Zipper Int
> :t duplicate z
duplicate z :: Zipper (Zipper Int)
> mapM_ (putStrLn . pShowZipper) $ zipperToList $ duplicate z
[1] 2 3 4 5 6 7
1 [2] 3 4 5 6 7
1 2 [3] 4 5 6 7
1 2 3 [4] 5 6 7
1 2 3 4 [5] 6 7
1 2 3 4 5 [6] 7
1 2 3 4 5 6 [7]

Great! Now we use a similar construction to write the comonad instance for ZGrid:

instance Comonad ZGrid where
  extract (ZGrid grid) = extract $ extract grid
  duplicate grid = ZGrid $ Zipper left focus right
    where
      (focusRowPos, focusColPos) = zgPosition grid
      (rowCount, colCount) = zgSize grid

      focus = Zipper focusLeft grid focusRight
      focusLeft = iterateN focusColPos zgLeft $ zgLeft grid
      focusRight =
        iterateN (colCount - focusColPos - 1) zgRight $ zgRight grid

      left = iterateN focusRowPos (fmap zgUp) $ fmap zgUp focus
      right =
        iterateN (rowCount - focusRowPos - 1) (fmap zgDown) $ fmap zgDown focus

It works in a similar fashion:

> zg = listsToZGrid $ replicate 4 $ [0..3] :: ZGrid Int
> putStrLn $ pShowZGrid zg
[0] 1 2 3
 0  1 2 3
 0  1 2 3
 0  1 2 3
> :t duplicate zg
duplicate zg :: ZGrid (ZGrid Int)
> :t mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg
mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg :: IO ()

I’ve rearranged the output of running the last line of the code above for clarity:

Output of duplicate for ZGrid

We can see a grid of grids, with one inner grid focussed at each possible focus of the input grid. Now we finally implement the automaton:

zGridNeighbours :: ZGrid a -> [a]
zGridNeighbours grid =
  map snd . nubBy ((==) `on` fst) $
    [ (pos, extract grid')
      | move <- moves,
        let grid' = move grid,
        let pos = zgPosition grid',
        pos /= zgPosition grid
    ]
  where
    moves =
      [ zgUp, zgDown, zgRight, zgLeft,
        zgUp >>> zgLeft, zgUp >>> zgRight,
        zgDown >>> zgLeft, zgDown >>> zgRight
      ]

stepZGrid :: ZGrid Cell -> ZGrid Cell
stepZGrid = extend $ \grid -> rule (extract grid) (zGridNeighbours grid)

instance Grid (ZGrid Cell) where
  fromLists = listsToZGrid
  step = stepZGrid
  toLists = zGridToLists

zGridNeighbours returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also make sure to return unique cells by their position.

stepZGrid implements one step of the CA using the extend function of the Comonad typeclass. We call extend with a function that takes the current grid, and returns the result of running the CA rule on its focus and the neighbours of the focus.

Finally, we plug in our functions into the ZGrid Cell instance of Grid.

That’s it! Let’s compile and run the code4:

❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main             ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
        2.72 real         2.68 user         0.02 sys

I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?

The Array

The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them, like length, are \(O(n)\). They are also lazy in both spine and value, and build up thunks. We could switch to a different list-like data structure5, or cache the grid size and the neighbour indices for each index to make it run faster. Or we could try an entirely different approach.

Let’s think about it for a bit. Zippers intermix two things: the data in the grid, and the focus. When running a step of the CA, the grid data does not change as we move over all possible focuses; only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.

Let’s model the grid as a combination of a 2D array and an index into the array. We use the arrays from the massiv library.

data AGrid a = AGrid {aGrid :: A.Array A.B A.Ix2 a, aGridFocus :: A.Ix2}
  deriving (Eq, Functor)

A.Ix2 is massiv’s way of representing an index into a 2D array, and is essentially the same as a two-tuple of Ints. A.Array A.B A.Ix2 a here means a 2D boxed array of as. massiv uses representation strategies to decide how arrays are actually represented in memory, among which are boxed, unboxed, primitive, storable, delayed, etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the Functor instance of A.Array exists only for boxed and delayed arrays, and boxed ones are the faster of the two for our purpose.

It is actually massively6 easier to write the Comonad instance for AGrid:

instance Comonad AGrid where
  extract (AGrid grid focus) = grid A.! focus
  extend f (AGrid grid focus) =
    AGrid (A.compute $ A.imap (\pos _ -> f $ AGrid grid pos) grid) focus

The extract implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement duplicate because it is easier to implement extend directly. We map with index (A.imap) over the grid, calling the function f for the variation of the grid with the index as the focus.

Next, we write the CA step:

listsToAGrid :: [[Cell]] -> AGrid Cell
listsToAGrid = A.fromLists' A.Seq >>> flip AGrid (0 :. 0)

aGridNeighbours :: AGrid a -> [a]
aGridNeighbours (AGrid grid (x :. y)) =
  [ grid A.! (x + i :. y + j)
    | i <- [-1, 0, 1],
      j <- [-1, 0, 1],
      (x + i, y + j) /= (x, y),
      validIndex (x + i, y + j)
  ]
  where
    A.Sz (rowCount :. colCount) = A.size grid
    validIndex (a, b) = and [a >= 0, b >= 0, a < rowCount, b < colCount]

stepAGrid :: AGrid Cell -> AGrid Cell
stepAGrid = extend $ \grid -> rule (extract grid) (aGridNeighbours grid)

instance Grid (AGrid Cell) where
  fromLists = listsToAGrid
  step = stepAGrid
  toLists = aGrid >>> A.toLists

listsToAGrid converts a list of lists of cells into an AGrid focussed at (0,0). aGridNeighbours finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices into the array. stepAGrid calls extract and aGridNeighbours to implement the CA step, much like the ZGrid case. And finally, we create the AGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
        0.10 real         0.09 user         0.00 sys

Woah! It takes only 0.1 second this time. Can we do even better?

The Stencil

massiv has a construct called Stencil that can be used for simulating CA:

Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.

That sounds like exactly what we need. Let’s try it out next.

With stencils, we do not need the instance of Comonad for the grid. So we can switch to the faster unboxed array representation:

newtype instance VU.MVector s Cell = MV_Char (VU.MVector s Char)
newtype instance VU.Vector Cell = V_Char (VU.Vector Char)
deriving instance VGM.MVector VU.MVector Cell
deriving instance VG.Vector VU.Vector Cell
instance VU.Unbox Cell

type SGrid a = A.Array A.U A.Ix2 a

The first five lines make Cell an instance of the Unbox typeclass. We chose to make Cell a newtype wrapper over Char because Char has an Unbox instance.

Then we define a new grid type SGrid that is a 2D unboxed array.

Now, we define the stencil and the step function for our CA:

ruleStencil :: A.Stencil A.Ix2 Cell Cell
ruleStencil = AU.makeUnsafeStencil (A.Sz (3 :. 3)) (1 :. 1) $ \_ get ->
  rule (get (0 :. 0)) $ map get neighbourIndexes
  where
    neighbourIndexes =
      [ -1 :. -1, -1 :. 0, -1 :. 1,
         0 :. -1,           0 :. 1,
         1 :. -1,  1 :. 0,  1 :. 1
      ]

stepSGrid :: SGrid Cell -> SGrid Cell
stepSGrid = A.mapStencil (A.Fill Floor) ruleStencil >>> A.computeP

instance Grid (SGrid Cell) where
  fromLists = A.fromLists' A.Seq
  step = stepSGrid
  toLists = A.toLists

We make a stencil of size 3-by-3, where the focus is at index (1,1) relative to the stencil’s top-left cell. In the callback function, we use the supplied get function to get the neighbours of the focus by using indices relative to the focus, and call rule with the cells at focus and neighbour indices.

Then we write the step function stepSGrid that maps the stencil over the grid. Finally we put everything together in the SGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
        0.08 real         0.07 user         0.00 sys

It is only a bit faster than the previous solution. But this time we have another trick up our sleeve. Did you notice the A.computeP we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile it with the right options and run it again:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
        0.04 real         0.11 user         0.05 sys

The -threaded option enables multithreading, and the +RTS -N option makes the process use all CPU cores7. We get a nice speedup of 2x over the single-threaded version.

Bonus Round: Simulation Visualization

Since you’ve read the entire post, here is a bonus visualization of the CA simulation for you (warning: lots of fast blinking):

Play the simulation

That’s it for this post! I hope you enjoyed it and took something away from it. If you have any questions or comments, please leave a comment below. If you liked this post, please share it with your friends. Thanks for reading!

The full code for this post is available here.


  1. The reason for using a newtype instead of a data is explained in the Stencil section.↩︎

  2. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  3. This short post by Bartosz Milewski explains how comonads and monads are related.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. I did try a variation with Data.Sequence.Seq instead of lists, and it was twice as fast.↩︎

  6. Pun very much intended.↩︎

  7. I tried running the process with different values of N and found that N4 gave the fastest results. So, Amdahl’s law applies here.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 05, 2025 12:00 AM

January 04, 2025

Philip Wadler

Telnaes quits The Washington Post



Cartoonist Ann Telnaes has quit the Washington Post, after they refused to publish one of her cartoons, depicting Mark Zuckerberg (Meta), Sam Altman (Open AI), Patrick Soon-Shiong (LA Times), the Walt Disney Company (ABC News), and Jeff Bezos (Amazon & Washington Post). All that exists is her preliminary sketch, above. Why is this important? See her primer below. (Spotted via Boing Boing.)





 

by Philip Wadler (noreply@blogger.com) at January 04, 2025 09:41 PM

December 24, 2024

Edward Z. Yang

Ways to use torch.export

Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile's API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.

Enable end-to-end C++ CPU/GPU Inference

Scenario: Like before, suppose you want to deploy your model for inference. However, now you have more stringent runtime requirements: perhaps you need to do inference from a CPython-less environment (because your QPS requirements require GIL-less multithreading; alternatively, CPython execution overhead is unacceptable but you cannot use CUDA graphs, e.g., due to CPU inference or dynamic shapes requirements). Or perhaps your production environment requires hermetic deploy artifacts (for example, in a monorepo setup, where infrastructure code must be continually pushed but model code should be frozen). But like before, you would prefer not to have to rewrite your model; you would like the existing model to serve as the basis for your Python-less inference binary.

What to do: Use torch.export targeting AOTInductor. This will compile the model into a self-contained shared library which can then be directly invoked from a C++ runtime. This shared library contains all of the compiler generated Triton kernels as precompiled cubins and is guaranteed not to need any runtime compilation; furthermore, it relies only on a small runtime ABI (with no CPython dependency), so the binaries can be used across versions of libtorch. AOTInductor's multithreading capability and low runtime overhead also make it a good match for CPU inference!

You don't have to go straight to C++ CPU/GPU inference: you can start with using torch.compile on your code before investing in torch.export. There are four primary extra requirements export imposes: (1) your model must compile with fullgraph=True (though you can sometimes bypass missing Dynamo functionality by using non-strict export; sometimes, it is easier to do non-strict torch.export than it is to torch.compile!), (2) your model's inputs/outputs must only be in torch.export's supported set of argument types (think Tensors in pytrees), (3) your model must never recompile--specifically, you must specify what inputs have dynamic shapes, and (4) the top-level of your model must be an nn.Module (so that export can keep track of all of the parameters your model has).
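
To make requirements (2)-(4) concrete, here is a minimal sketch (TinyModel and the batch dimension name are made up for illustration):

import torch
from torch.export import Dim, export

class TinyModel(torch.nn.Module):  # requirement (4): top level is an nn.Module
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):  # requirement (2): Tensor-only inputs and outputs
        return torch.relu(self.linear(x))

example = torch.randn(32, 16)
# requirement (3): declare up front which dimensions may vary, instead of
# relying on recompilation the way torch.compile would
batch = Dim("batch")
ep = export(TinyModel(), (example,), dynamic_shapes={"x": {0: batch}})
# requirement (1): the result is a single full graph (an ExportedProgram)
print(ep)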

Some tips:

  • Check out the torch.export programming model. The torch.export programming model is an upcoming doc which aims to help set expectations on what can and cannot be exported. It talks about things like "Tensors are the only inputs that can actually vary at runtime" and common mistakes such as module code which modifies NN modules (not supported!) or optional input types (you will end up with an export that takes in that input or not; there is no runtime optionality).
  • Budget time for getting a model to export. With torch.compile for Python inference, you could just slap it on your model and see what happens. For torch.export, you have to actually finish exporting your entire model before you can even consider running the rest of the pipeline. For some of the more complicated models we have exported, there were often dozens of issues that had to be worked around in one way or another. And that doesn't even account for all of the post-export work you have to do, like validating the numerics of the exported model.
  • Intermediate value debugging. AOTInductor has an option to add dumps of intermediate tensor values in the compiled C++ code. This is good for determining, e.g., the first time where a NaN shows up, in case you are suspecting a miscompilation.

Open source examples: Among other things, torchchat has an example end-to-end AOTInductor setup for server-side LLM inference, which you can view in run.cpp.

torch.export specific downsides:

  • No built-in support for guard-based dispatch (multiple compilations). Earlier, I mentioned that an exported model must not have any recompiles. This leads to some fairly common patterns of code not being directly supported by torch.export: you can't export a single model that takes an enum as input, or has an optional Tensor argument, or accepts two distinct tensor shapes that need to be compiled individually. Now, technically, we could support this: you could imagine a package that contains multiple exported artifacts and dispatches between them depending on some conditions (e.g., the value of the enum, whether or not the optional Tensor argument was provided, the shape of the input tensor). But you're on your own: torch.compile will do this for you, but torch.export will not.
  • No built-in support for models that are split into multiple graphs. Similarly, we've mentioned that an exported model must be a single graph. This is in contrast to torch.compile, which will happily insert graph breaks and compile distinct islands of code that can be glued together with Python eager code. Now, technically, you can do this with export too: you can carve out several distinct subnets of your model, export them individually, and then glue them together with some custom written code on the other end (in fact, Meta's internal recommendation systems do this), but there's no built-in support for this workflow.
  • The extra requirements often don't cover important components of real world models. I've mentioned this previously as the extra restrictions export places on you, but it's worth reiterating some of the consequences of this. Take an LLM inference application: obviously, there is a core model that takes in tokens and produces logit predictions--this part of the model is exportable. But there are also important other pieces such as the tokenizer and sampling strategy which are not exportable (tokenizer because it operates on strings, not tensors; sampling because it involves complicated control flow). Arguably, it would be much better if all of these things could be directly bundled with the model itself; in practice, end-to-end applications should just expect to directly implement these in native code (e.g., as is done in torchchat). Our experience with TorchScript taught us that we don't really want to be in the business of designing a general purpose programming language that is portable across all of export's targets; better to just bet that the tokenizer doesn't change that often and eat the cost of natively integrating it by hand.

AOTInductor specific downsides:

  • You still need libtorch to actually run the model. Although AOTInductor binaries bundle most of their compiled kernel implementation, they still require a minimal runtime that can offer basic necessities such as tensor allocation and access to custom operators. There is not yet an official offering of an alternative, lightweight implementation of the stable ABI AOTInductor binaries depends on, so if you do want to deploy AOTInductor binaries you will typically have to also bring libtorch along. This is usually not a big deal server side, but it can be problematic if you want to do client side deployments!
  • No CUDA graphs support. This one is not such a big deal since you are much less likely to be CPU bound when the host side logic is all compiled C++, but there's no support for CUDA graphs in AOTInductor. (Funnily enough, this is also something you technically can orchestrate from outside of AOTInductor.)

Edge deployment

Scenario: You need to deploy your PyTorch model to edge devices (e.g., a mobile phone or a wearable device) where computational resources are limited. You have requirements that are a bit different from the server side: you care a lot more about minimizing binary size and startup time. Traditional PyTorch deployment with full libtorch won't work. The device you're deploying to might also have some strange extra processors, like a DSP or NPU, that you want your model to target.

What to do: Use torch.export targeting Executorch. Among other things, Executorch offers a completely separate runtime for exported PyTorch programs (i.e., it has no dependency on libtorch, except perhaps there are a few headers which we share between the projects) which was specifically designed for edge deployment. (Historical note: we spent a long time trying to directly ship a stripped down version of libtorch to mobile devices, but it turns out it's really hard to write code that is portable on server and client, so it's better to only share when absolutely necessary.) Quantization is also a pretty important part of deployment to Edge, and Executorch incorporates this into the end-to-end workflow.

Open source examples: torchchat also has an Executorch integration letting you run an LLM on your Android phone.

Downsides. All of the export related downsides described previously apply here. But here's something to know specifically about Executorch:

  • The edge ecosystem is fragmented. At time of writing, there are seven distinct backends Executorch can target. This is not really Executorch's fault; it comes with the territory--but I want to call it out because it stands in stark contrast to NVIDIA's server-side hegemony. Yes, AMD GPUs are a thing, and various flavors of CPU are real, but it really is a lot easier to be focused on server side because NVIDIA GPUs come first.

Pre-compiled kernels for eager mode

Scenario: You need a new function or self-contained module with an efficient kernel implementation. However, you would prefer not to have to write the CUDA (or even Triton) by hand; the kernel is something that torch.compile can generate from higher level PyTorch implementation. At the same time, however, you cannot tolerate just-in-time compilation at all (perhaps you are doing a massive training job, and any startup latency makes it more likely that one of your nodes will fail during startup and then you make no progress at all; or maybe you just find it annoying when PyTorch goes out to lunch when you cache miss).

What to do: Use torch.export targeting AOTInductor, and then load and run the AOTInductor generated binary from Python.
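
As a rough sketch of this workflow, assuming a recent PyTorch (roughly 2.5 or later) where the packaging helpers torch._inductor.aoti_compile_and_package and aoti_load_package are available; the exact entry points and signatures have shifted between releases, so treat the calls below as an assumption to check against the AOTInductor docs for your version:

import torch

class Kernel(torch.nn.Module):
    # hypothetical self-contained module we want an ahead-of-time binary for
    def forward(self, x):
        return torch.nn.functional.gelu(x) * x

x = torch.randn(1024, 1024)
ep = torch.export.export(Kernel(), (x,))

# Compile ahead of time into a package on disk. The helper names are assumed
# to exist on recent releases; the signature may differ on yours.
pkg = torch._inductor.aoti_compile_and_package(ep)

# Later, possibly in a different process: load and run it, with no JIT
# compilation at call time.
compiled = torch._inductor.aoti_load_package(pkg)
print(compiled(x).shape)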

Downsides. So, we know this use case works, because we have internally used this to unblock people who wanted to use Triton kernels but could not tolerate Triton's just-in-time compilation. But there's not much affordance in our APIs for this use case; for example, guard-based dispatch is often quite useful for compiled functions, but you'll have to roll that by hand. More generally, when compiling a kernel, you have to make tradeoffs about how static versus dynamic the kernel should be (for example, will you force the inputs to be evenly divisible by eight? Or would you have a separate kernel for the divisible and not divisible cases?) Once again, you're on your own for making the call there.

An exchange format across systems

Scenario: In an ideal world, you would have a model, you could export it to an AOTInductor binary, and then be all done. In reality, maybe this export process needs to be a multi-stage process, where it has to be processed to some degree on one machine, and then finish processing on another machine. Or perhaps you need to shift the processing over time: you want to export a model to freeze it (so it is no longer tied to its original source code), and then repeatedly run the rest of the model processing pipeline on this exported program (e.g., because you are continuously updating its weights and then reprocessing the model). Maybe you want to export the model and then train it from Python later, committing to a distributed training strategy only when you know how many nodes you are running. The ability to hermetically package a model and then process it later is one of the big value propositions of TorchScript and torch.package.

What to do: Use torch.export by itself, potentially using pre-dispatch if you need to support training use-cases. torch.export produces an ExportedProgram which has a clean intermediate representation that you can do processing on, or just serialize and then do processing on later.
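
Concretely, the freeze-now-process-later round trip looks something like this minimal sketch (Net is a made-up stand-in for your model; torch.export.save and torch.export.load are the serialization entry points):

import torch
from torch.export import export, load, save

class Net(torch.nn.Module):
    # made-up module; stands in for the model you want to freeze
    def forward(self, x):
        return x.sin() + x.cos()

ep = export(Net(), (torch.randn(8),))

# The ExportedProgram carries a clean FX graph you can inspect or transform...
ep.graph.print_tabular()

# ...and it can be serialized now and post-processed later, on another machine.
save(ep, "net.pt2")
ep_again = load("net.pt2")
print(ep_again.module()(torch.randn(8)))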

Downsides:

  • Custom operators are not packaged. A custom operator typically refers to some native code which was linked with PyTorch proper. There's no way to extract out this kernel and embed it into the exported program so that there is no dependence; instead, you're expected to ensure the eventual runtime relinks with the same custom operator. Note that this problem doesn't apply to user defined Triton kernels, as export can simply compile it and package the binary directly into the exported product. (Technically, this applies to AOTInductor too, but this tends to be much more of a problem for use cases which are primarily about freezing rapidly evolving model code, as opposed to plain inference where you would simply just expect people to not be changing custom operators willy nilly.)
  • Choose your own decompositions. Export produces IR that only contains operators from a canonical operator set. However, the default choice is sometimes inappropriate for use cases (e.g., some users want aten.upsample_nearest2d.vec to be decomposed while others do not), so in practice for any given target you may have a bespoke operator set that is appropriate for that use case. Unfortunately, it can be fiddly getting your operator set quite right, and while we've talked about ideas like a "build your own operator set interactive tool" these have not been implemented yet. (See the sketch after this list for what post-hoc decomposition looks like.)
  • Annoyingly large FC/BC surface. Something I really like about AOTInductor is that it has a very small FC/BC surface: I only need to make sure I don't make breaking changes to the C ABI, and I'm golden. With export IR, the FC/BC surface is all of the operators produced by export. Even a decomposition is potentially BC breaking: a downstream pass could be expecting to see an operator that no longer exists because I've decomposed it into smaller pieces. Matters get worse in pre-dispatch export, since the scope of APIs used inside export IR expands to include autograd control operators (e.g., torch.no_grad) as well as tensor subclasses (since Tensor subclasses cannot be desugared if we have not yet eliminated autograd). We will not break your AOTInductor blobs. We can't as easily give the same guarantee for the IR here.
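
To illustrate the decomposition point above, here is a small sketch using ExportedProgram.run_decompositions, which is available in recent releases (Up is a made-up module):

import torch
from torch.export import export

class Up(torch.nn.Module):
    # made-up module using an op (interpolate -> upsample_nearest2d) whose
    # decomposition you may or may not want, depending on the target
    def forward(self, x):
        return torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")

ep = export(Up(), (torch.randn(1, 3, 8, 8),))

# Lower to the default decomposed operator set after the fact; passing your
# own decomposition table here lets you keep or split specific operators.
ep_core = ep.run_decompositions()
print(ep_core.graph)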

Next time: What's missing, and what we're doing about it

by Edward Z. Yang at December 24, 2024 04:28 AM

December 23, 2024

Michael Snoyman

A secure Bitcoin self custody strategy

Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:

  • Buy a hardware wallet
  • Put the seed phrase on steel plates
  • Secure those steel plates somewhere on my property

But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:

  1. Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
  2. Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.

I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!

Goals

  • Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
  • Full access I always maintain full access to my funds, without relying on any external party.
  • Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
  • Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recover my funds.
  • Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.

Multisig

The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.

The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.

For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.

For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.

Software wallet

I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.

Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.

I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.

Let's see how this stacks up against the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
  • Physical destruction resilient Check, Bitwarden lives beyond my machines
  • Will survive me Check thanks to the dead-man switch

Hardware wallet

Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Check, the private keys never leave the hardware device
  • Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
  • Will survive me Check, anyone can take my plates and recover the wallet

SLIP39

This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).
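
For intuition, here is the textbook construction behind Shamir's scheme (a sketch of the general idea, not the exact SLIP39 parameters): to split a secret \(s\) with threshold \(t\), pick a random polynomial of degree \(t-1\) over a finite field with constant term \(s\),

\[ f(x) = s + a_1 x + a_2 x^2 + \dots + a_{t-1} x^{t-1}, \]

and hand shard holder \(i\) the point \((i, f(i))\). Any \(t\) points determine \(f\) uniquely via Lagrange interpolation, so \(s = f(0)\) can be recovered; any \(t-1\) or fewer points reveal nothing about \(s\).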

The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask for their shards, they should send them to me so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.

In any event, the idea is that if enough of these people agree, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how this security mechanism stands up:

  • Self custody Fail, I'm totally reliant on others.
  • Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
  • Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
  • Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
  • Will survive me Check, by design

Comparison against goals

We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.

Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.

How-to guide

OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)

  1. Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
  2. Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
  3. Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.

Set up a new wallet. Enter a name (doesn't matter what) and click next.

New wallet

Choose a multisig wallet and click next.

Multisig

Choose 3 cosigners and require 2 signatures.

Signer count

Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.

Hardware

After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.

segwit

This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:

Zpub75J9cLwa3iX1zB2oiTdvGDf4EyHWN1ZYs5gVt6JSM9THA6XLUoZhA4iZwyruCKHpw8BFf54wbAK6XdgtMLa2TgbDcftdsietCuKQ6eDPyi6

You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security. Backing up in Bitwarden with your software wallet is a good idea, and potentially simply sending them to some friends to back up just in case.

And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.

However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.

Add cosigner 2

And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!

Cannot continue

The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.

BIP39

Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.

SLIP39

And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!

Addresses

Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.

If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.

If you made it this far, congratulations! You're the proud owner of a new 2of3 multisig wallet.

Conclusion

I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!

December 23, 2024 12:00 AM

December 22, 2024

Haskell Interlude

60: Tom Ellis

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware.  In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable! 

by Haskell Podcast at December 22, 2024 06:00 PM

December 21, 2024

Philip Wadler

Please submit to Lambda Days

 


I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!

Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!

🎯 Submission Deadline: 9 February 2025
🎙️ Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.

Submit your talk and share your wisdom with the FP community.

👉 https://www.lambdadays.org/lambdadays2025#call-for-talks

by Philip Wadler (noreply@blogger.com) at December 21, 2024 07:56 PM

December 19, 2024

Tweag I/O

The Developer Experience Upgrade: From Create React App to Vite

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, and share how switching to Vite transformed our development workflow, with concrete numbers and benchmarks to demonstrate the dramatic improvements in build times, startup speed, and overall developer experience.

Create React App: A Historical Context

Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.

However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.

Let’s dive into some of the most noticeable limitations.

Performance Issues

CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.

Here’s a simplified version of what happens every time you make a code change:

  1. CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph
  2. It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand
  3. Finally, it bundles everything together into a single package that can be served to your browser

Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files, both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.

Security Issues

When running npx create-react-app <project-directory>, after waiting for a while, a long list of deprecation warnings (23 packages at the time of writing) will be shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) will appear. In other words, create-react-app relies on packages with known security vulnerabilities.

Support Issues

The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last version was published on npm 3 years ago.

Instead, React’s official documentation now includes Vite in its recommendations for both starting new projects and adding React to existing projects.

While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.

Introducing Vite

Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.

Vite was created in 2020 by Evan You, the author of Vue, to address the complexity, slowness, and heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that rated it Most Loved Library Overall, No. 1 Most Adopted (+30%), and No. 2 Highest Retention (98%) in the State of JS 2024 Developer Survey.

In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.

Why Vite is Faster

Vite applies several modern web technologies to improve the development experience:

1. Native ES Modules (ESM)

During development, Vite serves source code over native ES modules, essentially letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as the browser imports it, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.

2. Efficient Hot Module Replacement (HMR)

By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.

3. Optimized Build Tooling

Even though ESM are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.

When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.

While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.

By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.

Practical Results

The Migration Process

The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.

Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:

  1. Removing CRA-related dependencies
  2. Installing Vite and its React plugin
  3. Moving index.html to the root directory
  4. Creating a Vite configuration file
  5. Adding a type declaration file
  6. Updating the npm scripts in package.json
  7. Adjusting tsconfig.json to align with Vite’s requirements

All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.

Most challenges encountered were related to environment variables and path aliases, which were easily resolved using Vite’s documentation, and its vibrant community has produced extensive resources, guides, and solutions for even the most specialized setups.

Build Time

The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.

[Build time comparison between CRA and Vite showing 3.2x improvement]

This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.

Dev Server Startup Time

The speed at which the development server starts can greatly impact the development workflow, especially in large projects.

The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.

[Development server startup time comparison showing 12.9x improvement]

This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.

HMR Update Time

While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.

This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.

Bundle Size

Another essential factor is the size of the final bundled application, which affects load times and overall performance.

[Bundle size comparison between CRA and Vite showing 27.5% reduction in raw bundle size and 9.3% reduction in gzipped size]

This represents a 27.5% reduction in raw bundle size and a 9.3% reduction in gzipped size. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.

The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.

The Hidden Advantage: Reduced Context Switching

One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.

By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.

Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.

The Future of Vite is Bright

Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.

Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.

These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.

Early benchmarks show impressive performance improvements:

  • OXC Parser is 3x faster than SWC
  • OXC Resolver is 28x faster than enhanced-resolve
  • OXC TypeScript transformer is 4x faster than SWC
  • OXLint is 50-100x faster than ESLint

With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.

Conclusion

Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.

However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.

The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.

December 19, 2024 12:00 AM

December 18, 2024

Michael Snoyman

Normal People Shouldn't Invest

The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:

  1. Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
  2. Try to beat inflation by investing in the stock market and other risk-on investments.
  3. Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.

(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)

High living and melting ice cubes

Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.

But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. With those, there’s little to no risk of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.

Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.

The investment option

With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is the hope that, between price movements and dividends, the value of their investment will grow faster than anything else they could choose.

You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?

Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, and doing so by providing actual value.

So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.

Michael (Midas) Mulligan

Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.

He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.

And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.

No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.

That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.

Michael (Crappy Investor) Snoyman

I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.

I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.

So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.

I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.

And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.

Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than gambling at a casino.

And yes, I ultimately lost a massive amount of money through this.

Normal people shouldn’t invest

Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.

(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficient to justify the returns people anticipate from making “good” gambles.)

The only reason most people ever consider this is because they feel forced into it; otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between the risk, stress, and time investment, their lives will often get worse.

One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.

Option 4: deflation

The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system is so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.

If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.

And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.

So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.

Can Bitcoin go to 0? Certainly, though year by year that outcome becomes less and less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.

I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.

Further reading

December 18, 2024 12:00 AM

December 17, 2024

Michael Snoyman

Hello Nostr

This blog post is in the style of my previous blog post on Matrix. I'm reviewing a new technology and sharing my onboarding experience. I'm sharing in the hopes that it will help others become aware of this new technology, understand what it can do, and, if people are intrigued, have a more pleasant onboarding experience. Just keep in mind: I'm in no way an expert on this. PRs welcome to improve the content here!

What is Nostr? Why Nostr?

I’d describe Nostr as decentralized social media. It’s a protocol for people to identify themselves via public key cryptography (thus no central identity service), publish various kinds of information, access through any compatible client, and interact with anyone else. At its simplest, it’s a Twitter/X clone, but it offers much more functionality than that (though I’ve barely scratched the surface).

Nostr has a high overlap with the Bitcoin ecosystem, including built-in micropayments (zaps) via the Lightning Network, an instantaneous peer-to-peer payment layer built on top of Bitcoin.

I'll start off by saying: right now, Nostr's user experience is not on a par with centralized services like X. But I can see a lot of promise. The design of the protocol encourages widespread innovation, as demonstrated by the plethora of clients and other tools to access the protocol. Decentralized/federated services are more difficult to make work flawlessly, but the advantages in terms of freedom of expression, self-custody of your data, censorship resistance, and ability to build more featureful tools on top of it make me excited.

I was skeptical (to say the least) about the idea of micropayments built into social media. But I'm beginning to see the appeal. Firstly, getting away from an advertiser-driven business model fixes the age-old problem of "if you're not paying for a service, you're the product." But I see a deeper social concept here too. I intend to blog more in the future on the topic of non-monetary competition and compensation. But in short, in the context of social media: every social network ends up making its own version of imaginary internet points (karma, moderator privileges, whatever you want to call it). Non-monetary compensation has a lot of downsides, which I won't explore here. Instead, basing the credit system on money with real-world value has the promise to vastly improve social media interactions.

Did that intrigue you enough to want to give this a shot? Awesome! Let me give you an overview of the protocol, and then we'll dive into my recommendation on getting started.

Protocol overview

The basics of the protocol can be broken down into:

  • Relays
  • Events
  • Identities
  • Clients

As a decentralized protocol, Nostr relies on public key cryptography for identities. That means, when you interact on the network, you'll use a private key (represented as an nsec value) to sign your messages, and will be identified by your public key (represented as an npub value). Anyone familiar with Bitcoin or cryptocurrency will be familiar with the keys vs wallet divide, and it lines up here too. Right off the bat, we see the first major advantage of Nostr: no one controls your identity except you.

Clients are how a user will interact with the protocol. You'll provide your identity to the client in one of a few ways:

  • Directly entering your nsec. This is generally frowned upon since it opens you up to exploits, though most mobile apps work by direct nsec entry.
  • Getting a view-only experience in clients that support it by entering your npub.
  • Using a signing tool to perform the signing on behalf of the client without giving away your private keys to everyone. (This matches most web3 interactions that rely on a wallet browser extension.)

Events are a general-purpose concept, and are the heart of Nostr interaction. Events can represent a note (similar to a Tweet), articles, likes, reposts, profile updates, and more. Anything you do on the protocol involves creating and signing an event. This is also the heart of Nostr's extensibility: new events can be created to support new kinds of interactions.

Finally there are relays. Relays are the servers of the Nostr world, and are where you broadcast your events to. Clients will typically configure multiple relays, broadcast your events to those relays, and query relays for relevant events for you (such as notes from people you follow, likes on your posts, and more).
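
To make the event concept a bit more concrete, here is a rough, Haskell-flavoured sketch of the shape of a NIP-01 event. This is just an illustration for orientation; it is not code from any client or from the spec itself.

import Data.Text (Text)

-- Rough sketch of the NIP-01 event shape, for orientation only.
data Event = Event
  { eventId   :: Text     -- hex-encoded sha256 hash of the serialised event
  , pubkey    :: Text     -- the author's public key (hex form of the npub)
  , createdAt :: Integer  -- unix timestamp, in seconds
  , kind      :: Int      -- e.g. 0 = profile metadata, 1 = short text note
  , tags      :: [[Text]] -- references to other events, pubkeys, hashtags, ...
  , content   :: Text     -- the note text (or other kind-specific payload)
  , sig       :: Text     -- Schnorr signature over the event id
  }

Every interaction, from posting a note to updating your profile to liking someone else's post, boils down to signing one of these records and publishing it to your relays.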

Getting started

This is where the suboptimal experience really exists for Nostr. It took me a few days to come up with a setup that worked reliably. I'm going to share what worked best for me, but keep in mind that there are many other options. I'm a novice; other guides may give different recommendations and you may not like my selection of tools. My best recommendation: don't end up in shell shock like I did. Set up any kind of a Nostr profile, introduce yourself with the #introductions hashtag, and ask for help. I've found the community unbelievably welcoming.

Alright, so here are the different pieces you're going to need for a full experience:

  • Browser extension for signing
  • Web client
  • Mobile client
  • Lightning wallet
  • A Nostr address

I'm going to give you a set of steps that I hope both provides easy onboarding while still leaving you with the ability to more directly control your Nostr experience in the future.

Lightning wallet: coinos

First, you're going to set up a Lightning wallet. There are a lot of options here, and there are a lot of considerations between ease-of-use, self-custody, and compatibility with other protocols. I tried a bunch. My recommendation: use coinos. It's a custodial wallet (meaning: they control your money and you're trusting them), so don't put any large sums into it. But coinos is really easy to use, and supports Nostr Wallet Connect (NWC). After you set up your account, click on the gear icon, and then click on "Reveal Connection String." You'll want to use that when setting up your clients. Also, coinos gives you a Lightning address, which will be <username>@coinos.io. You'll need that for setting up your profile on Nostr.

Web client: YakiHonne

I tried a bunch of web clients and had problems with almost all of them. I later realized most of my problems seemed to be caused by incorrectly set relays, which we'll discuss below. In any event, I ultimately chose YakiHonne. It also has mobile iOS and Android clients, so you can have a consistent experience. (I also used the Damus iOS client, which is also wonderful.)

Go to the homepage, click on the Login button in the bottom-left, and then choose "Create an account." You can add a profile picture, banner image, choose a display name, and add a short description. In the signup wizard, you'll see an option to let YakiHonne set up a wallet (meaning a Lightning wallet) for you. I chose not to rely on this and used coinos instead to keep more flexibility for swapping clients in the future. If you want to simplify, however, you can just use the built-in wallet.

Before going any further, make sure you back up your nsec secret key!!! Click on your username in the bottom-left, then settings, and then "Your keys." I recommend saving both your nsec and npub values in your password manager.

YakiHonne keys

No, this isn't my actual set of keys; this was a test profile I set up while writing this post.

Within that settings page, click on "wallets," then click on the plus sign next to add wallets, and choose "Nostr wallet connect." Paste the NWC string you got from coinos, and you'll be able to zap people money!

Next, go back to settings and choose "Edit Profile." Under "Lightning address," put your <username>@coinos.io address. Now you'll also be able to receive zaps from others.

Another field you'll notice on the profile is NIP-05. That's your Nostr address. Let's talk about getting that set up.

NIP-05 Nostr address

Remembering a massive npub address is a pain. Instead, you'll want to set up a NIP-05 address. (NIP stands for Nostr Implementation Possibilities; you can see NIP-05 on GitHub.) There are many services—both paid and free—to get a NIP-05 address. You can see a set of services on AwesomeNostr. Personally, I decided to set up an identifier on my own domain. You can see my live nostr.json file, which at the time of writing supports:

  • michael@snoyman.com and an alias snoyberg@snoyman.com
  • The special _@snoyman.com, which actually means "the entire domain itself"
  • And an identifier for my wife as well, miriam@snoyman.com

If you host this file yourself, keep in mind these two requirements:

  • You cannot have any redirects at that URL! If your identifier is name@domain, the URL https://domain/.well-known/nostr.json?name=<name> must resolve directly to this file.
  • You need to set CORS headers appropriately to allow for web client access, specifically the response header access-control-allow-origin: *.
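
Since this is just a static file with a couple of header requirements, any web server or static host will do. Purely as an illustration (and not something from the original post), here is a minimal Haskell sketch using the wai and warp packages that serves .well-known/nostr.json directly, without redirects and with the CORS header web clients need. The port and the on-disk file path are assumptions.

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Types (status200, status404)
import Network.Wai (Application, pathInfo, responseFile, responseLBS)
import Network.Wai.Handler.Warp (run)

-- Serve nostr.json directly (no redirects) with a permissive CORS header.
app :: Application
app req respond =
  case pathInfo req of
    [".well-known", "nostr.json"] ->
      respond $ responseFile status200
        [ ("Content-Type", "application/json")
        , ("Access-Control-Allow-Origin", "*")  -- required for web client access
        ]
        "nostr.json"  -- hypothetical path to the file on disk
        Nothing
    _ ->
      respond $ responseLBS status404 [("Content-Type", "text/plain")] "not found"

main :: IO ()
main = run 8080 app

The ?name=<name> query string can simply be ignored here; returning the full file for every lookup is what static hosting does anyway.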

Once you have that set up, add your Nostr address to your profile on YakiHonne. Note that you'll need the hex version of your npub, which you can generate by using the Nostr Army Knife:

Nostr Army Knife

BONUS I decided to also set up my own Lightning wallet address, by rehosting the Lightning config file from https://coinos.io/.well-known/lnurlp/snoyberg on my domain at https://snoyman.com/.well-known/lnurlp/michael.

Signer

As far as I can tell, Alby provides the most popular Nostr signing browser extension. The only problem I had with it was confusion about all the different things it does. Alby provides a custodial lightning wallet via Alby Hub, plus a mobile Alby Go app for accessing it, plus a browser extension for Nostr signing, and that browser extension supports using both the Alby Hub wallet and some other lightning wallets. I did get it all to work together, and it's a pleasant experience.

nos2x

However, to keep things a bit more simple and single-task focused, I'll recommend trying out the nos2x extension first. It's not pretty, but it handles the signer piece very well. Install the extension, enter the nsec you got from YakiHonne, click save, and you're good to go. If you go to another Nostr client, like noStrudel, you should be able to "sign in with extension."

You may also notice that there's an entry area for "preferred relays." We'll discuss relays next. Feel free to come back to this step and add those relays. (And, after you've done that, you can also use a nostr.json generator to help you self-host your NIP-05 address if you're so inclined.)

Final note: once you've done the initial setup, it's not clear how to get back to the nos2x settings page. Right-click the extension, click manage extension, and then choose "extension options." At least those were the steps in Brave; it may be slightly different in other browsers.

Relays

This has been my biggest pain point with Nostr so far. Everything you do with Nostr needs to be sent to relays or received from relays. You want to have a consistent and relatively broad set of relays to make sure your view of the world is consistent. If you don't, you'll likely end up with things like mismatched profiles across relays, messages that seem to disappear, and more. This was probably my biggest stumbling block when first working with Nostr.

There seem to be three common ways to set the list of relays:

  • Manually entering the relays in the client's settings.
  • Getting the list of relays from the signer extension (covered by NIP-07).
  • Getting the list of relays from your NIP-05 file.

Unfortunately, it looks like most clients don't support the latter two methods, so any time you start using a new client, you should check the relay list and manually sync it with the list of relays you maintain.

You can look at my nostr.json file for my own list of relays. One relay in particular that I was recommended to use is wss://hist.nostr.land. This relay keeps track of your profile and follow-list updates. As I mentioned, it's easy to accidentally partially override your profile information through inconsistent relay lists, and apparently lots of new users (myself included) end up doing this. If you go to hist.nostr.land you can sign in, find your historical updates, and restore old versions.

Mobile

You're now set up on your web experience. For mobile, download any mobile app and set it up similarly to what I described for web. The major difference will be that you'll likely be entering your nsec directly into the mobile app.

I've used both Damus and YakiHonne. I had better luck with YakiHonne for getting zaps working reliably, but that may simply be because I'd tried Damus before I'd gotten set up with coinos. I'll probably try out Damus some more soon.

Note on Damus: I had trouble initially with sending Zaps on Damus, but apparently that's because of Apple rules. You can enable Zaps by visiting this site on your device: https://zap.army/. Thanks to William Cassarin for the guidance and the great app.

Introductions

You should now be fully set up to start interacting on Nostr! As a final step, I recommend you start off by sending an introduction note. This is a short note telling the world a bit about yourself, with the #introductions hashtag. For comparison, here's my introduction note (or a Nostr-native URL).

And in addition, feel free to @ me in a note as well, I'd love to meet other people on Nostr who joined up after reading this post. My identifier is michael@snoyman.com. You can also check out my profile page on njump, which is a great service to become acquainted with.

And now that you're on Nostr, let me share my experiences with the platform so far.

My experience

I'm definitely planning to continue using Nostr. The community has a different feel to my other major social media hub, X, which isn't surprising. There's a lot more discussion of Bitcoin and economics, which I love. There's also, at least subjectively, more of a sense of having fun versus X. I described it as joyscrolling versus doomscrolling.

Nostr is a free speech haven. It's quite literally impossible to fully silence someone. People can theoretically be banned from specific relays, but a banned user could always just use other relays or continue to create new keys. There's no KYC process to stop them. I've only found one truly vile account so far, and it was easy enough to just ignore. This fits very well with my own personal ethos. I'd rather people have a public forum to express any opinion, especially the opinions I most strongly disagree with, including calls to violence. I believe the world is better for allowing these opinions to be shared, debated, and (hopefully) exposed as vapid.

The process of zapping is surprisingly engaging. The amount of money people send around isn't much. The most common zap amount is 21 satoshis, which at the current price of Bitcoin is just about 2 US cents. Unless you become massively popular, you're not going to retire on zaps. But it's far more meaningful to receive a zap than a like; it means someone parted with something of actual value because you made their day just a little bit better. And likewise, zapping someone else has the same feeling. It's also possible to tip providers of clients and other tools, which is a fundamental shift from the advertiser-driven web of today.

I'd love to hear from others about their own experiences! Please reach out with your own findings. Hopefully we'll all be able to push social media into a more open, healthy, and fun direction.

December 17, 2024 12:00 AM

December 16, 2024

GHC Developer Blog

GHC 9.12.1 is now available

GHC 9.12.1 is now available

Zubin Duggal - 2024-12-16

The GHC developers are very pleased to announce the release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns, allowing you to combine multiple pattern clauses into one (see the brief sketch after this list).

  • The MultilineStrings language extension, allowing you to more easily write strings spanning multiple lines in your source code (also shown in the sketch after this list).

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced, allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which greatly improves the determinism of builds (though does not yet make them fully deterministic) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • Experimental support for the RISC-V platform with the native code generator.

  • … and many more
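
To give a flavour of the first two extensions, here is a small, hedged sketch of the kind of code they let you write; consult the GHC 9.12 user's guide for the authoritative syntax and semantics.

{-# LANGUAGE OrPatterns, MultilineStrings #-}

data Direction = North | South | East | West

-- OrPatterns: one clause covers several patterns.
isVertical :: Direction -> Bool
isVertical (North; South) = True
isVertical _              = False

-- MultilineStrings: string literals that span several source lines.
usage :: String
usage =
  """
  usage: tool [--help]
         tool FILE
  """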

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at December 16, 2024 12:00 AM

December 13, 2024

Well-Typed.Com

GHC activities report: September–November 2024

This is the twenty-fifth edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of September to November 2024. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We are delighted to offer Haskell Ecosystem Support Packages to provide commercial users with access to Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem. Clients will both fund the work described in this report and support the Haskell Foundation. If your company is using Haskell, read more about our offer, or get in touch with us today, so we can help you get the most out of the toolchain. We need more funding to continue our essential maintenance work!

Many thanks to our existing sponsors who make this work possible: Anduril and Juspay. In addition, we are grateful to Mercury for funding specific work on improved performance for developer tools on large codebases.

Team

The GHC team at Well-Typed currently consists of Andreas Klebinger, Ben Gamari, Matthew Pickering, Rodrigo Mesquita, Sam Derbyshire and Zubin Duggal. Adam Gundry acts as Secretary to the GHC Steering Committee. Cabal maintenance is undertaken by Mikolaj Konarski, and HLS maintenance by Hannes Siebenhandl and Zubin Duggal. In addition, many others within Well-Typed are contributing to GHC more occasionally.

GHC Releases

Cabal

Cabal-3.14.0.0 was released in September, adding initial support for the new hooks build-type we implemented to replace custom setup scripts, as part of our work for the Sovereign Tech Fund on Cabal long-term maintainability. Corresponding new versions of cabal-install and the release of the Cabal-hooks library are due soon, at which point it will become easier for users to explore replacing their custom setup scripts with the hooks feature. Related to this effort:

  • Sam amended the Cabal-hooks version to match the Cabal library version (#10579).

  • Rodrigo made progress on Sam’s work to simplify the way cabal-install uses the Cabal library to build packages. This will make the code easier to work with in the future, and should improve build performance (#9871).

  • Rodrigo made cabal-install invoke git clone concurrently when downloading from git repositories for source-repository-package stanzas or cabal get, and switched it to use shallow clones by default (#10254). This speeds up the cloning step significantly.

  • Rodrigo’s work on private dependencies (#9743) is slowly making progress, thanks to recent work by Kristen Kozak.

  • Rodrigo and Matthew fixed various minor Cabal bugs, improved documentation and future-proofed against future core libraries changes (#10311, #10404, #10415 and #10433).

HLS

  • Hannes finished off and merged support for a new “jump to instance definition” feature in HLS (#4392), which will make it easier for users to understand which typeclass instance is in use in a particular expression.

GHC

Exception backtraces

Rodrigo worked on improving several facets of the exception backtraces story:

  • Improved the rendering of uncaught exceptions so the default output is much clearer and easier to understand (CLC proposal #285, !13301). This included reformatting the output, reducing duplication, and avoiding exposing internal implementation details in the call stacks.

  • Changed functions such as catch to propagate the original cause if another exception is subsequently thrown (CLC proposal #202).

  • Landed a patch by Ben so that the HasCallStack-based backtrace for an error call is more informative (!12620, #24807).

Overall, exception backtraces will be much more useful in GHC 9.12 and later, making it easier to debug Haskell applications.

Frontend

  • Andreas added new primops, is[Mutable]ByteArrayWeaklyPinned#, which allow checking whether a bytearray can be moved by the RTS, as per CLC proposal #283.

  • Sam fixed a GHC panic involving out-of-scope pattern synonyms (#25056, !13092).

  • Sam augmented the -fdiagnostics-as-json output to include the reason an error or warning was emitted (!13577, #25403).

  • Matthew and Rodrigo deprecated the unused -Wcompat-unqualified-imports warning (!12755, #24904, !13349, #25330).

  • Ben improved the parsing and parser errors for sizes in GHC RTS flags (!12384, #20201).

  • Matthew fixed a bug in the interaction of -working-dir and foreign files (!13196, #25150).

  • Zubin bumped the Haddock binary interface version to 46, to improve errors when there are mismatched interface files (!13342).

SIMD in the NCG backend

  • Sam and Andreas teamed up to finish the mega-MR adding SIMD support to GHC’s X86 native code generator backend (!12860, and see our previous report for more background). This also fixed critical correctness bugs that affected SIMD support in the existing LLVM backend, such as #25062 and #25169.

  • Sam followed this up with user’s guide documentation for the feature (!13380) and a couple of additional fixes for bugs that have been reported since (!13561, !13612).

LLVM backend

  • Sam implemented several fixes relating to the LLVM backend, in collaboration with GHC contributor @aratamizuki:

    • fix bugs involving fltused to ensure that GHC can use the LLVM backend on Windows once more (#22487, !13183),
    • use +sse4.2 instead of +sse42 (#25019),
    • make SSE4.2 imply +popcnt (#25353).
  • Matthew bumped the LLVM upper bound to allow GHC to use LLVM 19 (!13311, #25295).

RISC-V backend

  • Andreas added support for floating-point min/max operations in the RISC-V NCG backend (!13325).

  • Matthew fixed some issues to do with the fact that the RISC-V backend does not yet support SIMD vectors (!13327, #25314, #13327).

Object code determinism

  • Rodrigo merged !12680, which goes 95% of the way towards ensuring GHC produces fully deterministic object code (#12935).

  • Rodrigo made the unique generation used by the LLVM backend deterministic (!13307, #25274), thus making GHC 96% object-code deterministic.

  • Rodrigo ensured that re-exports did not spoil determinism of interface files in !13316 (#25304).

Compiler performance

  • Matthew, Rodrigo and Adam published a proposal for Explicit Level Imports. The proposed language feature will allow users of Template Haskell to communicate more precise dependencies for quotes and splices, which can unlock significant compile-time performance improvements and is a step towards better cross-compilation support.

  • Rodrigo improved the performance of module reachability queries (!13593), which can significantly reduce compile times and memory usage for projects with very large numbers of modules.

  • Andreas introduced a new flag, -fmax-forced-spec-args, to control the maximum size of specialised functions introduced when using the SPEC keyword (!13184, #25197). This avoids a potential compile-time performance cliff caused by specialisations with excessively large numbers of arguments.

  • Matthew greatly reduced the memory footprint of linking with the Javascript backend by making some parts of the compiler more lazy (!13346).

  • Zubin’s proposal to address libc compatibility issues when using semaphores for build parallelism has been making progress through the proposal process.

Runtime system

  • Ben fixed the encoding of breakpoint instructions in the RTS to account for a recent addition of support for inlining breakpoints (!13423, #25374).

  • Ben tightened up the documentation and invariants in the bytecode interpreter (!13565).

  • Ben allowed GNU-style non-executable stack notes to be used on FreeBSD (!13587, #25475).

  • Ben fixed an incorrect EINTR check in the timerfd ticker (!13588, #25477).

  • Zubin ensured that any new cost centres added by freshly-loaded objects are correctly included in the eventlog (!13114, #24148).

  • Ben increased the gen_workspace alignment in the RTS from 64 bytes to 128 bytes in order to prevent false sharing on Apple’s ARMv8 implementation, which uses a cache-line size of 128 bytes (!13594, #25459).

  • Ben removed some incorrect platform-dependent pointer casts in the RTS (!13597).

  • Zubin fixed a segfault when using the non-moving GC with profiling (!13271, #25232).

  • Ben fixed a stack overrun error with i386 adjustors (!13599, #25485).

  • Ben introduced a convenience printIPE debugging function for printing info-provenance table entries (!13614).

  • Andreas fixed a crash that could happen if an exception and the compacting GC aligned in specific ways (#24791, !13640).

Documentation

  • Ben fleshed out missing documentation of the eventlog format in the user’s guide (!13398, #25296).

  • Ben documented the :where GHCi command (!13399, #24509).

  • Andreas clarified the documentation of -fexpose-overloaded-unfoldings (!13286, #24844).

  • Ben documented that GHC coalesces adjacent sub-word size fields of data constructors (!13397).

  • Ben improved the documentation of equality constraints to mention which language extensions are required to use them (!12395, #24127).

Codebase improvements

  • Ben fixed a few warnings in the runtime system, including some FreeBSD specific warnings (!13586).

  • Ben fixed (!13394, #25362) some incomplete pattern matches in GHC.Internal.IO.Windows.Handle that came to light in !13308, a refactor of the desugarer. Andreas also chipped in, squashing some warnings in ghc-heap (!13510).

  • Matthew refactored the partitionByWorkerSize function to avoid spurious pattern-match warning bugs when compiling with -g3 (!13359, #25338).

  • Matthew removed the hs-boot file for Language.Haskell.Syntax.ImpExp and introduced one for GHC.Hs.Doc, which better reflects the intended modular hierarchy of the modules (!13406).

GHC API

  • We are pleased to see that the Haskell Foundation and Tweag are resuming efforts aimed at defining a stable API for GHC.

  • Ben added lookupTHName, a more convenient way to look up a name in a GHC plugin that re-uses Template Haskell lookup functions (!12432, #24741).

  • Zubin made sure that the driverPlugin is correctly run for static plugins (!13199, #25217).

Libraries

  • Andreas allowed unknown FD device types in the setNonBlockingMode function (!13204, #25199), as per CLC proposal #282. This fixes a regression in the hinotify package on GHC 9.10.

  • Andreas made sure that all primops are re-exported in the ghc-experimental package via the module GHC.PrimOps (!13245), and changed the versioning scheme of ghc-experimental to follow GHC versions (!13344, #25289).

  • Ben fixed a performance regression in throw by judicious insertion of noinline (!13275, #25066), as discussed in CLC proposal #290.

  • Matthew unwired the base package to pave the way for a reinstallable base package (!13200).

  • Matthew upgraded GHC’s Unicode support to Unicode 16 (!13514).

  • Rodrigo removed BCO primops from the GHC.Exts re-export, as per CLC proposal #212 (!13211, #25110).

Profiling

  • Matthew enabled late cost-centres by default when the libraries distributed with GHC are built for profiling (!10930, #21732). This greatly improves the resolution of cost centre stacks when profiling.

  • Andreas fixed a bug in which profiling could affect program behaviour by allowing profiling ticks to move past unsafeCoerce# in Core (!13413, #25212).

Build system

  • Ben fixed the configure script incorrectly reporting that subsections-via-symbols is enabled on AArch64/Darwin (!12834, #24962).

  • The first alpha pre-release of 9.12 incorrectly had a version number of 9.12.20241014 instead of 9.12.0.20241014, which broke the expected lexicographic ordering of GHC releases. Ben added a check for the validity of a release GHC version number to prevent such issues in the future (!13456).

  • Matthew allowed GHC to build with happy-2.0.2 (!13318, #25276), and Ben with happy-2.1.2 (!13532, #25438).

  • Andreas made the Hadrian progress messages include the working directory, to allow quickly distinguishing builds when multiple builds are in progress (!13353, #25335).

  • Ben allowed Haddock options to be passed as Hadrian key-value settings (!11006).

Testsuite

  • Ben improved the reporting of certain errors in the testsuite (!13332).

  • Ben ensured performance metrics are collected even when there are untracked files (!13579, #25471).

  • Ben ensured performance metrics are properly pushed for WebAssembly jobs (!13312).

  • Zubin made several improvements to the testsuite, in particular regarding normalisation of tests and fixing a Haddock bug involving files with the same modification time (!13418, !13522).

CI

  • Matthew added a i386 validation job that can be triggered by adding the i386 label on an MR (!13352).

  • Matthew added a mechanism for only triggering certain individual jobs in CI by using the ONLY_JOBS variable (!13350).

  • Matthew fixed some issues of variable inheritance in the ghcup-metadata testing job (!13306).

  • Matthew added Ubuntu 22.04 jobs to CI (!13335).

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at December 13, 2024 12:00 AM

December 12, 2024

Stackage Blog

LTS 23 release for ghc-9.8 and Nightly now on ghc-9.10

Stackage LTS 23 has been released

The Stackage team is happy to announce that Stackage LTS version 23 was finally released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series, which was the longest-lived LTS major release to date (with probable final snapshot lts-22.43).

We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of broad Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.

LTS 23 includes many package changes, and almost 3200 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen. (The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)

If your package is missing from LTS 23 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-23-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.10.1

At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (64 commits).

Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled. You can see all the changes made relative to the last 9.8 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to the 9.10 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.10.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in the Stack and Stackage Matrix room (#haskell-stack:matrix.org) or the Slack channel.

December 12, 2024 07:00 AM

December 11, 2024

Haskell Interlude

Episode 59: Harry Goldstein

Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.

by Haskell Podcast at December 11, 2024 02:00 PM

Philip Wadler

John Longley's Informatics Lecturer Song

From my colleague, John Longley, a treat. 

‘Informatics Lecturer Song 

(Based on Gilbert and Sullivan’s ‘Major General song’) 

John Longley 

I am the very model of an Informatics lecturer,
For educating students you will never find a betterer.
I teach them asymptotics with a rigour that’s impeccable,
I’ll show them how to make their proofs mechanically checkable.
On parsing algorithms I can hold it with the best of them,
With LL(1) and CYK and Earley and the rest of them.
I’ll teach them all the levels of the Chomsky hierarchy…
With a nod towards that Natural Language Processing malarkey.

I’ll summarize the history of the concept of a function,
And I’ll tell them why their Haskell code is ‘really an adjunction’.
In matters mathematical and logical, etcetera,
I am the very model of an Informatics lecturer.

For matters of foundations I’m a genuine fanaticker:
I know by heart the axioms of Principia Mathematica,
I’m quite au fait with Carnap and with Wittgenstein’s Tractatus,
And I’ll dazzle you with Curry, Church and Turing combinators.
I’ll present a proof by Gödel with an algebraic seasoning,
I’ll instantly detect a step of non-constructive reasoning.
I’ll tell if you’re a formalist or logicist or Platonist…
For I’ll classify your topos by the kinds of objects that exist.

I’ll scale the heights of cardinals from Mahlo to extendible,
I’ll find your favourite ordinals and stick them in an n-tuple.
In matters philosophical, conceptual, etcetera,
I am the very essence of an Informatics lecturer.

And right now I’m getting started on my personal computer,
I’ve discovered how to get it talking to the Wifi router.
In Internet and World Wide Web I’ve sometimes had my finger dipped,
And once I wrote a line of code in HTML/Javascript.
[Sigh.] I know I have a way to go to catch up with my students,
But I try to face each lecture with a dash of common prudence.
When it comes to modern tech: if there’s a way to get it wrong, I do!
But that seems to be forgiven if I ply them with a song or two.

So… although my present IT skills are rather rudimentary,
And my knowledge of computing stops around the nineteenth century,
Still, with help from all my colleagues and my audience, etcetera…
I’ll be the very model of an Informatics lecturer.


by Philip Wadler (noreply@blogger.com) at December 11, 2024 11:52 AM

December 10, 2024

Chris Smith 2

When is a call stack not a call stack?

Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.

Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.

CodeWorld Recap

If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.

In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:

  • A few primitive pictures: circles, rectangles, and the like
  • Transformations to rotate, translate, scale, clip, and recolor an image
  • Compositions to overlay and combine multiple pictures into a more complex picture.

Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.
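
For a concrete flavour of this, here is a small hedged sketch using names from CodeWorld's Haskell-mode API (drawingOf, translated, colored, solidCircle, solidRectangle and the & combinator); treat the exact signatures as assumptions rather than a definitive reference:

import CodeWorld

-- The scene is composed declaratively: nothing is rendered until drawingOf
-- draws the final, fully combined Picture value.
scene :: Picture
scene = translated 0 4 (colored red (solidCircle 1))
      & solidRectangle 10 2

main :: IO ()
main = drawingOf scene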

Debugging in CodeWorld

CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.

There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.

  • Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
  • There's also an "inspect" mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it's an overlay of three other pictures: a background and two characters, each of those is transformed in some way, the base picture for the transformation is some other overlay of multiple parts, and so on…). This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts.

Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.

[Embedded video demonstration: https://medium.com/media/7f09408e8411d852516bedb5aab2601c/href]

The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.

This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.
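
To make that idea concrete, here is a minimal sketch of a picture type that captures a call stack at every constructor; this is my own simplified illustration, not CodeWorld's actual Picture definition:

import GHC.Stack (CallStack, HasCallStack, callStack)

-- Hypothetical, simplified picture type: each node remembers the call stack
-- of the code that created it, so an inspector can map any part of the
-- picture back to a source location.
data Pic
  = Circle CallStack Double    -- a circle of a given radius
  | Overlay CallStack Pic Pic  -- one picture drawn on top of another

circle :: HasCallStack => Double -> Pic
circle = Circle callStack

overlay :: HasCallStack => Pic -> Pic -> Pic
overlay = Overlay callStack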

While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.

by Chris Smith at December 10, 2024 10:50 PM

Christopher Allen

Two memory issues from the last two weeks

Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.

by Unknown at December 10, 2024 12:00 AM

December 06, 2024

Well-Typed.Com

Debugging your Haskell application with debuggable

In this blog post we will introduce a new open source Haskell library called debuggable, which provides various utilities designed to make it easier to debug your applications. Some of these are intended for use during actual debugging, others are designed to be a regular part of your application, ready to be used if and when necessary.

Non-interleaved output

Ever see output like this when debugging concurrent applications?

ATnhdi st hiiss  ai sm eas smaegses afgreo mf rtohme  tfhier sste ctohnrde atdh
read
AndT htihsi si si sa  am emsessasgaeg ef rformo mt hteh ef isrescto ntdh rteharde
ad
TAhnids  tihsi sa  imse sas amgees sfargoem  ftrhoem  ftihres ts etchorneda dt
hread

The problem is that concurrent calls to putStrLn can result in interleaved output. To solve this problem, debuggable offers Debug.NonInterleavedIO, which provides variants of putStrLn and friends, as well as trace and its variants, all of which can safely be called concurrently without ever resulting in interleaved output. For example:

import Debug.NonInterleavedIO qualified as NIIO

useDebuggable :: IO ()
useDebuggable = do
    concurrently_
      ( replicateM_ 10 $ do
          NIIO.putStrLn "This is a message from the first thread"
          threadDelay 100_000
      )
      ( replicateM_ 10 $ do
          NIIO.putStrLn "And this is a message from the second thread"
          threadDelay 100_000
      )

If we run this as-is, we will only see

niio output to /tmp/niio2418318-0

on the terminal; inspecting /tmp/niio2418318-0 will show

And this is a message from the second thread
This is a message from the first thread
And this is a message from the second thread
This is a message from the first thread
...

If you want to send the output to a specific file (or /dev/stdout for output to the terminal), you can set the NIIO_OUTPUT environment variable.

Provenance

Provenance is about tracking what was called, when, and where.

Call-sites

Consider the following example:

f1 :: IO ()
f1 = f2

f2 :: HasCallStack => IO ()
f2 = f3

f3 :: HasCallStack => IO ()
f3 = putStrLn $ prettyCallStack callStack

The callstack we get from this example looks something like this:

CallStack (from HasCallStack):
  f3, called at Demo/Provenance.hs:15:6 in ..
  f2, called at Demo/Provenance.hs:12:6 in ..

Callstacks are awesome, and a huge help during debugging, but there are some minor issues with this example:

  • Personally, this has always felt a bit “off by one” to me: the first entry tells us that we are in f3, but we were called from line 15, which is f2; likewise, the second entry in the stack tells us that we are in f2, but we were called from line 12, which is f1. Not a huge deal, but arguably a bit confusing. (See also GHC ticket #25546: Make HasCallStack include the caller.)
  • Somewhat relatedly, when we are in f3, and ask for a CallStack, being told that we are in f3 is not particularly helpful (we knew that already).
  • Finally, it is sometimes useful to have just the “first” entry in the callstack; “we were called from line such and such, which is function so and so”.

For this reason, Debug.Provenance provides a CallSite abstraction

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = g3

g3 :: HasCallStack => IO ()
g3 = print callSite

This outputs:

g2 -> g3 (Demo/CallSite.hs:31:6)

where line 31 is the call to g3 in g2. Due to the (alleged) “off-by-one”, both g2 and g3 must be given a HasCallStack constraint, otherwise we get

{unknown} -> g3 (Demo/CallSite.hs:31:6)

when g2 lacks the constraint, or

{unknown} -> {unknown} ()

when g3 does. There is also a variant callSiteWithLabel, which results in output such as

g2 -> g3 (Demo/CallSite.hs:31:6, "foo")

Invocations

Sometimes we are not so much interested in where we are called from, but how often a certain line in the source is called. Debug.Provenance offers “invocations” to track this:

g1 :: IO ()
g1 = replicateM_ 2 g2

g2 :: HasCallStack => IO ()
g2 = do
    print =<< newInvocation
    replicateM_ 2 g3

g3 :: HasCallStack => IO ()
g3 = print =<< newInvocation

This results in output such as

g2 (Demo/Invocation.hs:30:15) #1
g3 (Demo/Invocation.hs:34:16) #1
g3 (Demo/Invocation.hs:34:16) #2
g2 (Demo/Invocation.hs:30:15) #2
g3 (Demo/Invocation.hs:34:16) #3
g3 (Demo/Invocation.hs:34:16) #4

We see the first call to g2, then the first and second calls to g3, then the second call to g2, and finally the third and fourth calls to g3.

When debugging problems such as deadlocks, it is often useful to insert putStrLn statements like this:

f4 :: IO ()
f4 = do
    putStrLn "f4:1"
    -- f4 does something ..
    putStrLn "f4:2"
    -- f4 does something else ..
    putStrLn "f4:3"

This pattern too can be made a bit simpler by using invocations:

g4 :: HasCallStack => IO ()
g4 = do
    print =<< newInvocation
    -- f4 does something ..
    print =<< newInvocation
    -- f4 does something else ..
    print =<< newInvocation

Resulting in output such as

g4 (Demo/Invocation.hs:48:15) #1
g4 (Demo/Invocation.hs:50:15) #1
g4 (Demo/Invocation.hs:52:15) #1

Scope

The definition of g4 above is still a little clunky, especially if we also want to include other output than just the invocation itself. We can do better:

import Debug.NonInterleavedIO.Scoped qualified as Scoped

g4 :: HasCallStack => IO ()
g4 = do
    Scoped.putStrLn "start"
    -- f4 does something ..
    Scoped.putStrLn "middle"
    -- f4 does something else ..
    Scoped.putStrLn "end"

outputs

[g4 (Demo/Scope.hs:21:5) #1] start
[g4 (Demo/Scope.hs:23:5) #1] middle
[g4 (Demo/Scope.hs:25:5) #1] end

As the name suggests, though, there is more going on here than simply a more convenient API: Debug.Provenance.Scope offers a combinator called scoped for scoping invocations:

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = scoped g3

g3 :: HasCallStack => IO ()
g3 = scoped g4

This results in

[g4 (Demo/Scope.hs:29:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] start
[g4 (Demo/Scope.hs:31:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] middle
[g4 (Demo/Scope.hs:33:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] end

Threads

The counters that are part of an Invocation can be very useful to cross-reference output messages from multiple threads. Continuing with the g4 example we introduced in the section on Scope, suppose we have

concurrent :: IO ()
concurrent = concurrently_ g4 g4

we might get output like this:

[g4 (Demo/Scope.hs:32:5) #1] start
[g4 (Demo/Scope.hs:32:5) #2] start
[g4 (Demo/Scope.hs:34:5) #1] middle
[g4 (Demo/Scope.hs:34:5) #2] middle
[g4 (Demo/Scope.hs:36:5) #1] end
[g4 (Demo/Scope.hs:36:5) #2] end

(where the scheduling between the two threads might be different, of course).

Scope is always thread local, but debuggable provides a way to explicitly inherit the scope of a parent thread in a child thread:

h1 :: IO ()
h1 = h2

h2 :: HasCallStack => IO ()
h2 = scoped h3

h3 :: HasCallStack => IO ()
h3 = scoped $ do
    tid <- myThreadId
    concurrently_
      (inheritScope tid >> g4)
      (inheritScope tid >> g4)

results in

[g4 (Demo/Scope.hs:34:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:34:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:36:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:36:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:38:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end
[g4 (Demo/Scope.hs:38:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end

Callbacks

Suppose we have some functions which take another function, a callback, as argument, and invoke that callback at some point:

f1 :: HasCallStack => (Int -> IO ()) -> IO ()
f1 k = f2 k

f2 :: HasCallStack => (Int -> IO ()) -> IO ()
f2 k = scoped $ k 1

Let’s use this example callback function:

g1 :: HasCallStack => Int -> IO ()
g1 n = g2 n

g2 :: HasCallStack => Int -> IO ()
g2 n = Scoped.putStrLn $ "n = " ++ show n ++ " at " ++ prettyCallStack callStack

and invoke f1 as follows:

withoutDebuggable :: HasCallStack => IO ()
withoutDebuggable = f1 g1

This outputs:

[g2 (Demo/Callback.hs:26:8) #1, f2 (Demo/Callback.hs:20:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:29:24 in ..
    withoutDebuggable, called at Demo.hs:25:36 in ..

Confusingly, this callstack does not include any calls to f1 or f2. This happens because the call to k in f2 does not pass the current CallStack; instead we see the CallStack as it was when we defined g1.

For callbacks like this it is often useful to have two pieces of information: the CallStack that shows how the callback is actually invoked, and the CallSite where the callback was defined. Debug.Provenance.Callback provides a Callback abstraction that does exactly this. A Callback m a b is essentially a function a -> m b, modulo treatment of the CallStack. Let's change f1 and f2 to take a Callback instead:

h1 :: HasCallStack => Callback IO Int () -> IO ()
h1 k = h2 k

h2 :: HasCallStack => Callback IO Int () -> IO ()
h2 k = scoped $ invokeCallback k 1

If we now use this top-level function

useDebuggable :: HasCallStack => IO ()
useDebuggable = h1 (callback g1)

we get a much more useful CallStack:

[g2 (Demo/Callback.hs:26:8) #1, h2 (Demo/Callback.hs:39:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:42:30 in ..
    callbackFn, called at src/Debug/Provenance/Callback.hs:57:48 in ..
    invoking callback defined at useDebuggable (Demo/Callback.hs:42:21), called at ..
    h2, called at Demo/Callback.hs:36:8 in ..
    h1, called at Demo/Callback.hs:42:17 in ..
    useDebuggable, called at Demo.hs:26:36 in ..

Alternative: profiling backtraces

In addition to HasCallStack-style backtraces, there may also be other types of backtraces available, depending on how we build and how we run the code (we discuss some of these in the context of exception handling in episode 29 of the Haskell Unfolder). The most important of these is probably the profiling (cost centre) backtrace.

We can request the “current” callstack with currentCallStack, and the callstack attached to an object (“where was this created”) using whoCreated. This allows us to make distinctions similar to those we made with Callback, for example:

f1 :: (Int -> IO ()) -> IO ()
f1 k = do
    cs <- whoCreated k
    putStrLn $ "f1: invoking callback defined at " ++ show (cs)
    f2 k

f2 :: (Int -> IO ()) -> IO ()
f2 k = k 1

g1 :: Int -> IO ()
g1 n = g2 n

g2 :: Int -> IO ()
g2 n = do
    cs <- currentCallStack
    putStrLn $ "n = " ++ show n ++ " at " ++ show cs

This does require the code to be compiled with profiling enabled. The profiling callstacks are sometimes more useful than HasCallStack callstacks, and sometimes worse; for example, in

demo :: Maybe Int -> IO ()
demo Nothing  = f1 (\x -> g1 x)
demo (Just i) = f1 (\x -> g1 (x + i))

the function defined in the Just case will have a useful profiling callstack, but since the function defined in the Nothing case is entirely static (does not depend on any runtime info), its callstack is reported as

["MAIN.DONT_CARE (<built-in>)"]

It would be useful to extend debuggable with support for both types of backtraces in a future release.

Performance considerations

Adding permanent HasCallStack constraints to functions does come at a slight cost, since they correspond to additional arguments that must be passed at runtime. For most functions this is not a huge deal; personally, I consider some well-placed HasCallStack constraints part of designing with debugging in mind. That said, you will probably want to avoid adding HasCallStack constraints to functions that get called repeatedly in tight inner loops; similar considerations also apply to the use of the Callback abstraction.

Conclusions

Although debuggable is a small library, it offers some functionality that has proven quite useful in debugging applications, especially concurrent ones. We can probably extend it over time to cover more use cases; “design for debuggability” is an important principle, and is made easier with proper library support. Contributions and comments are welcome!

As a side note, the tracing infrastructure of debuggable can also be combined with the recover-rtti package, which implements some dark magic to recover runtime type information by looking at the heap; in particular, it offers

anythingToString :: forall a. a -> String

which can be used to print objects without having a Show a instance available (though this is not the only use of recover-rtti). The only reason that debuggable doesn’t provide explicit support for this is that the dependency footprint of recover-rtti is a bit larger.
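
For instance, a minimal hedged sketch (assuming anythingToString is imported from Debug.RecoverRTTI; the exact rendering of the output may differ):

import Debug.RecoverRTTI (anythingToString)

-- A type that deliberately has no Show instance.
data Config = Config Int Bool

main :: IO ()
main = putStrLn (anythingToString (Config 3 True))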

by edsko at December 06, 2024 12:00 AM

December 04, 2024

Well-Typed.Com

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

Today, 2024-12-04, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 37th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

In this episode of the Haskell Unfolder, we are going to try solving the latest problem of this year’s Advent of Code live.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at December 04, 2024 12:00 AM

December 02, 2024

GHC Developer Blog

GHC 9.8.4 is now available

GHC 9.8.4 is now available

Ben Gamari - 2024-12-02

The GHC developers are happy to announce the availability of GHC 9.8.4. Binary distributions, source distributions, and documentation are available on the release page.

This release is a small release fixing a few issues noted in 9.8.3, including:

  • Update the filepath submodule to avoid a misbehavior of splitFileName under Windows.

  • Update the unix submodule to fix a compilation issue on musl platforms

  • Fix a potential source of miscompilation when building large projects on 32-bit platforms

  • Fix unsound optimisation of prompt# uses

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness, users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Happy compiling!

  • Ben

by ghc-devs at December 02, 2024 12:00 AM

December 01, 2024

Magnus Therning

Servant and a weirdness in Keycloak

When writing a small tool to interface with Keycloak I found an endpoint that requires the content type to be application/json while the body should be plain text. (The details are in the issue.) Since servant assumes that the content type and the content match (I know, I'd always thought that was a safe assumption to make too) it doesn't work with ReqBody '[JSON] Text. Instead I had to create a custom type that's a combination of JSON and PlainText, something that turned out to require surprisingly little code:

data KeycloakJSON deriving (Typeable)

instance Accept KeycloakJSON where
    contentType _ = "application" // "json"

instance MimeRender KeycloakJSON Text where
    mimeRender _ = fromStrict . encodeUtf8
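
For illustration, here is a hedged sketch of how such a content type could appear in a servant API type; the endpoint name and response type below are made up for the example and are not Keycloak's actual API (the sketch assumes the KeycloakJSON type defined above is in scope):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Text (Text)
import Servant.API

-- A made-up endpoint that advertises application/json but carries a plain-text body.
type ExampleAPI =
  "example" :> ReqBody '[KeycloakJSON] Text :> Post '[JSON] NoContent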

The bug has already been fixed in Keycloak, but I'm sure there are other APIs with similar weirdness so maybe this will be useful to someone else.

December 01, 2024 10:00 PM

Christopher Allen

Rebuilding Rust (Leptos) apps quickly

I'm working on a side project that is written in Rust on the backend and the frontend. The frontend component is in Leptos. Our app is about 20kLOC in total, so it takes a little time.

by Unknown at December 01, 2024 12:00 AM

November 29, 2024

Mark Jason Dominus

A complex bug with a ⸢simple⸣ fix

Last month I did a fairly complex piece of systems programming that worked surprisingly well. But it had one big bug that took me a day to track down.

One reason I find the bug interesting is that it exemplifies the sort of challenges that come up in systems programming. The essence of systems programming is that your program is dealing with the state of a complex world, with many independent agents it can't control, all changing things around. Often one can write a program that puts down a wrench and then picks it up again without looking. In systems programming, the program may have to be prepared for the possibility that someone else has come along and moved the wrench.

The other reason the bug is interesting is that although it was a big bug, fixing it required only a tiny change. I often struggle to communicate to nonprogrammers just how finicky and fussy programming is. Nonprogrammers, even people who have taken a programming class or two, are used to being harassed by crappy UIs (or by the compiler) about missing punctuation marks and trivially malformed inputs, and they think they understand how fussy programming is. But they usually do not. The issue is much deeper, and I think this is a great example that will help communicate the point.

The job of my program, called sync-spam, was to move several weeks of accumulated email from system S to system T. Each message was probably spam, but its owner had not confirmed that yet, and the message was not yet old enough to be thrown away without confirmation.

The probably-spam messages were stored on system S in a directory hierarchy with paths like this:

    /spam/2024-10-18/…

where 2024-10-18 was the date the message had been received. Every message system S had received on October 18 was somewhere under /spam/2024-10-18.

One directory, the one for the current date, was "active", and new messages were constantly being written to it by some other programs not directly related to mine. The directories for the older dates never changed. Once sync-spam had dealt with the backlog of old messages, it would continue to run, checking periodically for new messages in the active directory.

The sync-spam program had a database that recorded, for each message, whether it had successfully sent that message from S to T, so that it wouldn't try to send the same message again.

The program worked like this:

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
      1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
      2. Query the database for the list of messages for date D that have not yet been sent to T
      3. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

Okay, very good. The program would first attempt to deal with all the accumulated messages in roughly chronological order, processing the large backlog. Let's say that on November 1 it got around to scanning the active 2024-11-01 directory for the first time. There are many messages, and scanning takes several minutes, so by the time it finishes scanning, some new messages will be in the active directory that it hasn't seen. That's okay. The program will attempt to send the messages that it has seen. The next time it comes around to 2024-11-01 it will re-scan the directory and find the new messages that have appeared since the last time around.

But scanning a date directory takes several minutes, so we would prefer not to do it if we don't have to. Since only the active directory ever changes, if the program is running on November 1, it can be sure that none of the directories from October will ever change again, so there is no point in its rescanning them. In fact, once we have located the messages in a date directory and recorded them in the database, there is no point in scanning it again unless it is the active directory, the one for today's date.

So sync-spam had an elaboration that made it much more efficient. It was able to put a mark on a date directory that meant "I have completely scanned this directory and I know it will not change again". The algorithm was just as I said above, except with these elaborations.

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
        • If the directory for D is marked as having already been scanned, we already know exactly what messages are in it, since they are already recorded in the database.
        • Otherwise:
          1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
          2. If D is not today's date, mark the directory for D as having been scanned completely, because we need not scan it again.
      1. Query the database for the list of messages for date D that have not yet been sent to T
      2. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

It's important to not mark the active directory as having been completely scanned, because new messages are continually being deposited into it until the end of the day.

I implemented this, we started it up, and it looked good. For several days it processed the backlog of unsent messages from September and October, and it successfully sent most of them. It eventually caught up to the active directory for the current date, 2024-11-01, scanned it, and sent most of the messages. Then it went back and started over again with the earliest date, attempting to send any messages that it hadn't sent the first time.

But a couple of days later, we noticed that something was wrong. Directories 2024-11-02 and 2024-11-03 had been created and were well-stocked with the messages that had been received on those dates. The program had found the directories for those dates and had marked them as having been scanned, but there were no messages from those dates in its database.

Now why do you suppose that is?

(Spoilers will follow the horizontal line.)

I investigated this in two ways. First, I made sync-spam's logging more detailed and looked at the results. While I was waiting for more logs to accumulate, I built a little tool that would generate a small, simulated spam directory on my local machine, and then I ran sync-spam against the simulated messages, to make sure it was doing what I expected.

In the end, though, neither of these led directly to my solving the problem; I just had a sudden inspiration. This is very unusual for me. Still, I probably wouldn't have had the sudden inspiration if the information from the logging and the debugging hadn't been percolating around my head. Fortune favors the prepared mind.


The problem was this: some other agent was creating the 2024-11-02 directory a bit prematurely, say at 11:55 PM on November 1.

Then sync-spam came along in the last minutes of November 1 and started its main loop. It scanned the spam directory for available dates, and found 2024-11-02. It processed the unsent messages from the directories for earlier dates, then looked at 2024-11-02 for the first time. And then, at around 11:58, as per above it would:

  1. Scan the directory for 2024-11-02 and find the messages in it. Add to the database any messages not already recorded there.

There weren't any yet, because it was still 11:58 on November 1.

  2. If 2024-11-02 is not today's date, mark the directory as having been scanned completely, because we need not scan it again.

Since the 2024-11-02 directory was not the one for today's date — it was still 11:58 on November 1 — sync-spam recorded that it had scanned that directory completely and need not scan it again.

Five minutes later, at 00:03 on November 2, there would be new messages in the 2024-11-02 directory, which was now the active directory, but sync-spam wouldn't look for them, because it had already marked 2024-11-02 as having been scanned completely.

This complex problem in this large program was completely fixed by changing:

        if ($date ne $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

to:

        if ($date lt $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

(ne and lt are Perl-speak for "not equal to" and "less than".)

Many organizations have their own version of a certain legend, which tells how a famous person from the past was once called out of retirement to solve a technical problem that nobody else could understand. I first heard the General Electric version of the legend, in which Charles Proteus Steinmetz was called out of retirement to figure out why a large complex of electrical equipment was not working.

In the story, Steinmetz walked around the room, looking briefly at each of the large complicated machines. Then, without a word, he took a piece of chalk from his pocket, marked one of the panels, and departed. When the puzzled engineers removed that panel, they found a failed component, and when that component was replaced, the problem was solved.

Steinmetz's consulting bill for $10,000 arrived the following week. Shocked, the bean-counters replied that $10,000 seemed an exorbitant fee for making a single chalk mark, and, hoping to embarrass him into reducing the fee, asked him to itemize the bill.

Steinmetz returned the itemized bill:

One chalk mark $1.00
Knowing where to put it $9,999.00
TOTAL $10,000.00

This felt like one of those times. Any day when I can feel a connection with Charles Proteus Steinmetz is a good day.

This episode also makes me think of the following variation on an old joke:

A: Ask me what is the most difficult thing about systems programming.

B: Okay, what is the most difficult thing ab—

A: TIMING!

by Mark Dominus (mjd@plover.com) at November 29, 2024 03:11 PM

GHC Developer Blog

GHC 9.12.1-rc1 is now available

GHC 9.12.1-rc1 is now available

Zubin Duggal - 2024-11-29

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC's release status, including planned future releases, can be found on the GHC Wiki's status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 29, 2024 12:00 AM

November 28, 2024

Christopher Allen

The cost of hosting is too damn high

I recently migrated a side project from DigitalOcean to some dedicated servers. I thought that I would offer some context and examples for why.

by Unknown at November 28, 2024 12:00 AM

November 27, 2024

Brent Yorgey

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Posted on November 27, 2024

Suppose we have a list of items of length \(n\), and we want to consider windows (i.e. contiguous subsequences) of width \(w\) within the list.

A list of numbers, with contiguous size-3 windows highlighted

We can compute the sum of each window by brute force in \(O(nw)\) time, by simply generating the list of all the windows and then summing each. But, of course, we can do better: keep track of the sum of the current window; every time we slide the window one element to the right we can add the new element that enters the window on the right and subtract the element that falls off the window to the left. Using this “sliding window” technique, we can compute the sum of every window in only \(O(n)\) total time instead of \(O(nw)\).
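
Here is a minimal sketch of that sliding-sum idea (my own illustration, not code from this post; slidingSums is a name introduced here, and the sketch assumes the list has at least \(w\) elements):

-- O(n) sliding-window sums: keep a running sum, adding the element that
-- enters the window on the right and subtracting the one that leaves on the left.
slidingSums :: Num a => Int -> [a] -> [a]
slidingSums w xs = scanl step (sum start) (zip xs rest)
  where
    (start, rest) = splitAt w xs
    step s (old, new) = s - old + new

For example, slidingSums 3 [1,4,2,8,9,4,4,6] evaluates to [7,14,19,21,17,14].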

How about finding the maximum of every window? Of course the brute force \(O(nw)\) algorithm still works, but doing it in only \(O(n)\) is considerably trickier! We can’t use the same trick as we did for sums since there’s no way to “subtract” the element falling off the left. This really comes down to the fact that addition forms a group (i.e. a monoid-with-inverses), but max does not. So more generally, the question is: how can we compute a monoidal summary for every window in only \(O(n)\) time?

Today I want to show you how to solve this problem using one of my favorite competitive programming tricks, which fits beautifully in a functional context. Along the way we’ll also see how to implement simple yet efficient functional queues.

Stacks

Before we get to queues, we need to take a detour through stacks. Stacks in Haskell are pretty boring. We can just use a list, with the front of the list corresponding to the top of the stack. However, to make things more interesting—and because it will come in very handy later—we’re going to implement monoidally-annotated stacks. Every element on the stack will have a measure, which is a value from some monoid m. We then want to be able to query any stack for the total of all the measures in \(O(1)\). For example, perhaps we want to always be able to find the sum or max of all the elements on a stack.

If we wanted to implement stacks annotated by a group, we could just do something like this:

data GroupStack g a = GroupStack (a -> g) !g [a]

That is, a GroupStack stores a measure function, which assigns to each element of type a a measure of type g (which is intended to be a Group); a value of type g representing the sum (via the group operation) of measures of all elements on the stack; and the actual stack itself. To push, we would just compute the measure of the new element and add it to the cached g value; to pop, we subtract the measure of the element being popped, something like this:

push :: a -> GroupStack g a -> GroupStack g a
push a (GroupStack f g as) = GroupStack f (f a <> g) (a:as)

pop :: GroupStack g a -> Maybe (a, GroupStack g a)
pop (GroupStack f g as) = case as of
  [] -> Nothing
  (a:as') -> Just (a, GroupStack f (inv (f a) <> g) as')

But this won’t work for a monoid, of course. The problem is pop, where we can’t just subtract the measure for the element being popped. Instead, we need to be able to restore the measure of a previous stack. Hmmm… sounds like we might be able to use… a stack! We could just store a stack of measures alongside the stack of elements; even better is to store a stack of pairs. That is, each element on the stack is paired with an annotation representing the sum of all the measures at or below it. Here, then, is our representation of monoidally-annotated stacks:

{-# LANGUAGE BangPatterns #-}

module Stack where

data Stack m a = Stack (a -> m) !Int [(m, a)]

A Stack m a stores three things:

  1. A measure function of type a -> m. (Incidentally, what if we want to be able to specify an arbitrary measure for each element, and even give different measures to the same element at different times? Easy: just use (m,a) pairs as elements, and use fst as the measure function.)

  2. An Int representing the size of the stack. This is not strictly necessary, especially since one could always just use a monoidal annotation to keep track of the size; but wanting the size is so ubiquitous that it seems convenient to just include it as a special case.

  3. The aforementioned stack of (annotation, element) pairs.

Note that we cannot write a Functor instance for Stack m, since a occurs contravariantly in (a -> m). But this makes sense: if we change all the a values, the cached measures would no longer be valid.

When creating a new, empty stack, we have to specify the measure function; to get the measure of a stack, we just look up the measure on top, or return mempty for an empty stack.

new :: (a -> m) -> Stack m a
new f = Stack f 0 []

size :: Stack m a -> Int
size (Stack _ n _) = n

measure :: Monoid m => Stack m a -> m
measure (Stack _ _ as) = case as of
  [] -> mempty
  (m, _) : _ -> m

Now let’s implement push and pop. Both are relatively straightforward.

push :: Monoid m => a -> Stack m a -> Stack m a
push a s@(Stack f n as) = Stack f (n + 1) ((f a <> measure s, a) : as)

pop :: Stack m a -> Maybe (a, Stack m a)
pop (Stack f n as) = case as of
  [] -> Nothing
  (_, a) : as' -> Just (a, Stack f (n - 1) as')

Note that if we care about using non-commutative monoids, in the implementation of push we have a choice to make between f a <> measure s and measure s <> f a. The former seems nicer to me, since it keeps the measures “in the same order” as the list representing the stack. For example, if we push a list of elements onto a stack via foldr, using the measure function (:[]) that injects each element into the monoid of lists, the resulting measure is just the original list:

measure . foldr push (new (:[])) == id

And more generally, for any measure function f, we have

measure . foldr push (new f) == foldMap f
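
For example, with the list measure we can spot-check the first of these laws (my own example; the result follows directly from the law stated above):

measure (foldr push (new (:[])) [1,2,3])  -- evaluates to [1,2,3]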

Finally, we are going to want a function to reverse a stack, which is a one-liner:

reverse :: Monoid m => Stack m a -> Stack m a
reverse (Stack f _ as) = foldl' (flip push) (new f) (map snd as)

That is, to reverse a stack, we extract the elements and then use foldl' to push the elements one at a time onto a new stack using the same measure function.

There is a bit more code you can find on GitHub, such as Show and Eq instances.

Queues

Now that we have monoidally-annotated stacks under our belt, let’s turn to queues. And here’s where my favorite trick is revealed: we can implement a queue out of two stacks, so that enqueue and dequeue run in \(O(1)\) amortized time; and if we use monoidally-annotated stacks, we get monoidally-annotated queues for free!

First, some imports.

{-# LANGUAGE ImportQualifiedPost #-}

module Queue where

import Data.Bifunctor (second)
import Stack (Stack)
import Stack qualified as Stack

A Queue m a just consists of two stacks, one for the front and one for the back. To create a new queue, we just create two new stacks; to get the size of a queue, we just add the sizes of the stacks; to get the measure of a queue, we just combine the measures of the stacks. Easy peasy.

type CommutativeMonoid = Monoid

data Queue m a = Queue {getFront :: Stack m a, getBack :: Stack m a}
  deriving (Show, Eq)

new :: (a -> m) -> Queue m a
new f = Queue (Stack.new f) (Stack.new f)

size :: Queue m a -> Int
size (Queue front back) = Stack.size front + Stack.size back

measure :: CommutativeMonoid m => Queue m a -> m
measure (Queue front back) = Stack.measure front <> Stack.measure back

Note the restriction to commutative monoids, since the queue elements are stored in different orders in the front and back stacks. If we really cared about making this work with non-commutative monoids, we would have to make two different push methods for the front and back stacks, to combine the measures in opposite orders. That just doesn’t seem worth it. But if you have a good example requiring the use of a queue annotated by a non-commutative monoid, I’d love to hear it!

Now, to enqueue, we just push the new element on the back:

enqueue :: CommutativeMonoid m => a -> Queue m a -> Queue m a
enqueue a (Queue front back) = Queue front (Stack.push a back)

Dequeueing is the magic bit that makes everything work. If there are any elements in the front stack, we can just pop from there. Otherwise, we need to first reverse the back stack into the front stack. This means dequeue may occasionally take \(O(n)\) time, but it's still \(O(1)\) amortized. (The easiest way to see this is to note that every element is touched exactly three times: once when it is pushed on the back; once when it is transferred from the back to the front; and once when it is popped from the front. So, overall, we do \(O(1)\) work per element.)

dequeue :: CommutativeMonoid m => Queue m a -> Maybe (a, Queue m a)
dequeue (Queue front back)
  | Stack.size front == 0 && Stack.size back == 0 = Nothing
  | Stack.size front == 0 = dequeue (Queue (Stack.reverse back) front)
  | otherwise = second (\front' -> Queue front' back) <$> Stack.pop front

Finally, for convenience, we can make a function drop1 which just dequeues an item from the front of a queue and throws it away.

drop1 :: CommutativeMonoid m => Queue m a -> Queue m a
drop1 q = case dequeue q of
  Nothing -> q
  Just (_, q') -> q'

This “banker’s queue” method of building a queue out of two stacks is discussed in Purely Functional Data Structures by Okasaki, though I don’t think he was the first to come up with the idea. It’s also possible to use some clever tricks to make both enqueue and dequeue take \(O(1)\) time in the worst case. In a future post I’d like to do some benchmarking to compare various queue implementations (i.e. banker’s queues, Data.Sequence, circular array queues built on top of STArray). At least anecdotally, in solving some sliding window problems, banker’s queues seem quite fast so far.

Sliding windows

I hope you can see how this solves the initial motivating problem: to find e.g. the max of a sliding window, we can just put the elements in a monoidally-annotated queue, enqueueing and dequeueing one element every time we slide the window over. (More generally, of course, it doesn't even matter if the left and right ends of the window stay exactly in sync; we can enqueue and dequeue as many times as we want.)

The following windows function computes the monoidal sum foldMap f window for each window of width \(w\), in only \(O(n)\) time overall.

windows :: CommutativeMonoid m => Int -> (a -> m) -> [a] -> [m]
windows w f as = go startQ rest
 where
  (start, rest) = splitAt w as
  startQ = foldl' (flip enqueue) (new f) start

  go q as =
    measure q : case as of
      [] -> []
      a : as -> go (enqueue a (drop1 q)) as

“But…maximum and minimum do not form monoids, only semigroups!” I hear you cry. Well, we can just adjoin special positive or negative infinity elements as needed, like so:

data Max a = NegInf | Max a deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  NegInf <> a = a
  a <> NegInf = a
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = NegInf

data Min a = Min a | PosInf deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Min a) where
  PosInf <> a = a
  a <> PosInf = a
  Min a <> Min b = Min (min a b)

instance Ord a => Monoid (Min a) where
  mempty = PosInf

Now we can write, for example, windows 3 Max [1,4,2,8,9,4,4,6] which yields [Max 4, Max 8, Max 9, Max 9, Max 9, Max 6], the maximums of each 3-element window.

Challenges

If you’d like to try solving some problems using the techniques from this blog post, I can recommend the following (generally in order of difficulty):

In a future post I'll walk through my solution to Hockey Fans. And here are another couple of problems along similar lines; unlike the previous problems, I am not so sure how to solve these in a nice way. I may write about them in the future.


by Brent Yorgey at November 27, 2024 12:00 AM

November 21, 2024

Tweag I/O

GHC's wasm backend now supports Template Haskell and ghci

Two years ago I wrote a blog post to announce that the GHC wasm backend had been merged upstream. I’ve been too lazy to write another blog post about the project since then, but rest assured, the project hasn’t stagnated. A lot of improvements have happened after the initial merge, including but not limited to:

  • Many, many bugfixes in the code generator and runtime, witnessed by the full GHC testsuite for the wasm backend in upstream GHC CI pipelines. The GHC wasm backend is much more robust these days compared to the GHC-9.6 era.
  • The GHC wasm backend can be built and tested on macOS and aarch64-linux hosts as well.
  • Earlier this year, I landed the JSFFI feature for wasm. This lets you call JavaScript from Haskell and vice versa, with seamless integration of JavaScript async computation and Haskell’s green threading concurrency model. This allows us to support Haskell frontend frameworks like reflex & miso, and we have an example repo to demonstrate that.

And…the GHC wasm backend finally supports Template Haskell and ghci!

Show me the code!

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Or if you prefer the non-Nix workflow:

$ curl https://gitlab.haskell.org/haskell-wasm/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/terrorjack/.ghc-wasm.
Run 'source /home/terrorjack/.ghc-wasm/env' to add tools to your PATH.
$ . ~/.ghc-wasm/env
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Both the Nix and non-Nix installation methods default to GHC HEAD, for which binary artifacts for Linux and macOS hosts, for both x86_64 and aarch64, are provided. The Linux binaries are statically linked so they should work across a wide range of Linux distros.

If you take a look at htop, you'll notice wasm32-wasi-ghc spawns a node child process. That's the “external interpreter” process that runs our Template Haskell (TH) splice code as well as ghci bytecode. We'll get to what this “external interpreter” is about later; just keep in mind that whatever code is typed into this ghci session is executed on the wasm side, not on the native side.

Now let’s run some code. It’s been six years since I published the first blog post when I joined Tweag and worked on a prototype compiler codenamed “Asterius”; the first Haskell program I managed to compile to wasm was fib, time to do that again:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib 0 = 0
ghci| fib 1 = 1
ghci| fib n = fib (n - 2) + fib (n - 1)
ghci| :}
ghci> fib 10
55

It works, though with O(2^n) time complexity. It's easy to do an O(n) version, using the canonical Haskell fib implementation based on a lazy infinite list:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib = (fibs !!)
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> fib 32
2178309

That's still boring, isn't it? Now buckle up, we're gonna do an O(1) implementation… using Template Haskell!

ghci> import Language.Haskell.TH
ghci> :{
ghci| genFib :: Int -> Q Exp
ghci| genFib n =
ghci|   pure $
ghci|     LamCaseE
ghci|       [ Match (LitP $ IntegerL $ fromIntegral i) (NormalB $ LitE $ IntegerL r) []
ghci|       | (i, r) <- zip [0 .. n] fibs
ghci|       ]
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> :set -XTemplateHaskell
ghci> :{
ghci| fib :: Int -> Int
ghci| fib = $(genFib 32)
ghci| :}
ghci> fib 32
2178309

Joking aside, the real point is not about how to implement fib, but rather to demonstrate that the GHC wasm backend indeed supports Template Haskell and ghci now.

Here’s a quick summary of wasm’s TH/ghci support status:

  • The patch has landed in the GHC master branch and will be present in upstream release branches starting from ghc-9.12. I also maintain non-official backport branches in my fork, and wasm TH/ghci has been backported to 9.10 as well. The GHC release branch bindists packaged by ghc-wasm-meta are built from my branches.
  • TH splices that involve only pure computation (e.g. generating class instances) work. Simple file I/O also works, so file-embed works. Side effects are limited to those supported by WASI, so packages like gitrev won’t work because you can’t spawn subprocesses in WASI. The same restrictions apply to ghci.
  • Our wasm dynamic linker can load bytecode and compiled code, but the only form of compiled code it can load are wasm shared libraries. If you’re using wasm32-wasi-ghc directly to compile code that involves TH, make sure to pass -dynamic-too to ensure the dynamic flavour of object code is also generated. If you’re using wasm32-wasi-cabal, make sure shared: True is present in the global config file ~/.ghc-wasm/.cabal/config.
  • The wasm TH/ghci feature requires at least cabal-3.14 to work (the wasm32-wasi-cabal shipped in ghc-wasm-meta is based on the correct version).
  • Our novel JSFFI feature also works in ghci! You can type foreign import javascript declarations directly into a ghci session, use that to import sync/async JavaScript functions, and even export Haskell functions as JavaScript ones (a hedged sketch follows this list).
  • If you have c-sources/cxx-sources in a cabal package, those can be linked and run in TH/ghci out of the box. However, more complex forms of C/C++ foreign library dependencies like pkgconfig-depends, extra-libraries, etc. will require special care to build both static and dynamic flavours of those libraries.
  • For ghci, hot reloading and basic REPL functionality works, but the ghci debugger doesn’t work yet.
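
To give a flavour of the JSFFI point above, here is a hedged sketch of what such a ghci session might look like; the module providing the marshalling helpers (assumed here to be GHC.Wasm.Prim with JSString and toJSString) and the exact details may differ depending on your GHC version:

ghci> import GHC.Wasm.Prim  -- assumed home of JSString/toJSString
ghci> :{
ghci| foreign import javascript unsafe "console.log($1)"
ghci|   js_log :: JSString -> IO ()
ghci| :}
ghci> js_log (toJSString "hello from Haskell on wasm")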

What happens under the hood?

For the curious mind, -opti-v can be passed to wasm32-wasi-ghc. This tells GHC to pass -v to the external interpreter, so the external interpreter will print all messages passed between it and the host GHC process:

$ wasm32-wasi-ghc --interactive -opti-v
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
GHC iserv starting (in: {handle: <file descriptor: 2147483646>}; out: {handle: <file descriptor: 2147483647>})
[             dyld.so] reading pipe...
[             dyld.so] discardCtrlC
...
[             dyld.so] msg: AddLibrarySearchPath ...
...
[             dyld.so] msg: LoadDLL ...
...
[             dyld.so] msg: LookupSymbol "ghczminternal_GHCziInternalziBase_thenIO_closure"
[             dyld.so] writing pipe: Just (RemotePtr 2950784)
...
[             dyld.so] msg: CreateBCOs ...
[             dyld.so] writing pipe: [RemoteRef (RemotePtr 33)]
...
[             dyld.so] msg: EvalStmt (EvalOpts {useSandboxThread = True, singleStep = False, breakOnException = False, breakOnError = False}) (EvalApp (EvalThis (RemoteRef (RemotePtr 34))) (EvalThis (RemoteRef (RemotePtr 33))))
4
[             dyld.so] writing pipe: EvalComplete 15248 (EvalSuccess [RemoteRef (RemotePtr 36)])
...

Why is any message passing involved in the first place? There's a past blog post which contains an overview of cross-compilation issues in Template Haskell; most of the points still hold today, and apply to both TH and ghci. To summarise:

  • When GHC cross compiles and evaluates a TH splice, it has to load and run code that’s compiled for the target platform. Compiling both host/target code and running host code for TH is never officially supported by GHC/Cabal.
  • The “external interpreter” runs on the target platform and handles target code. Messages are passed between the host GHC and the external interpreter, so GHC can tell the external interpreter to load stuff, and the external interpreter can send queries back to GHC when running TH splices.

In the case of wasm, the core challenge is dynamic linking: to be able to interleave code loading and execution at run-time, all while sharing the same program state. Back when I worked on Asterius, it could only link a self-contained wasm module that wasn’t able to share any code/data with other Asterius-linked wasm modules at run-time.

So I went with a hack: when compiling each single TH splice, just link a temporary wasm module and run it, get the serialized result and throw it away! That completely bypasses the need to make a wasm dynamic linker. Needless to say, it’s horribly slow and doesn’t support cross-splice state or ghci. Though it is indeed sufficient to support compiling many packages that use TH.

Now it’s 2024, time to do it the right way: implement our own wasm dynamic linker! Some other toolchains like emscripten also support dynamic linking of wasm, but there’s really no code to borrow here: each wasm dynamic linker is tailored to that toolchain’s specific needs, and we have JSFFI-related custom sections in our wasm code that can’t be handled by other linkers anyway.

Our wasm dynamic linker supports loading exactly one kind of wasm module: wasm shared libraries. This is something that you get by compiling C with wasm32-wasi-clang -shared, which enables generation of position-independent code. Such machine code can be placed anywhere in the address space, making it suitable for run-time code loading. A wasm shared library is yet another wasm module; it imports the linear memory and function table, and you can specify any base address for memory data and functions.

So I rolled up my sleeves and got to work. Below is a summary of the journey I took towards full TH & ghci support in the GHC wasm backend:

  • Step one was to have a minimal NodeJS script to load libc.so: it is the bottom of all shared library dependencies, the first and most important one to be loaded. It took me many cans of energy drink to debug mysterious memory corruptions! But finally I could invoke any libc function and do malloc/free, etc. from the NodeJS REPL, with the wasm instance state properly persisted.
  • Then came loading multiple shared libraries, up to libc++.so, and running simple C++ snippets compiled to .so. Dependency management logic for shared libraries was added at this step: the dynamic linker traverses the dependency tree of a .so, spawns async WebAssembly.compile tasks, then sequentially loads the dynamic libraries in topological order.
  • Then I had to figure out a way to emit wasm position-independent code from the GHC wasm backend’s native code generator. The native code generator emits a .s assembly file for the target platform, and while the assembly format for x86_64, aarch64, etc. is widely taught, there’s really no tutorial or blog post to teach me the assembly syntax for wasm! Luckily, learning from Godbolt output examples was easy enough and I quickly figured out how the position-independent entities are represented in the assembly syntax.
  • The dynamic linker can now load the Haskell ghci shared library! It contains the default implementation of the external interpreter; it almost worked out of the box, though the linker needed some special logic to handle the message piping across wasm/JS and the host GHC process.
  • In ghci, the logic to load libraries, look up symbols, etc. calls into the RTS linker on other platforms. Since for wasm all of that logic lives on the JS side instead of in C, these code paths are patched to call back into the linker using JSFFI imports.
  • The GHC build system and driver needed quite a few adjustments, to ensure that shared libraries are generated for the wasm target when TH/ghci is involved. Thanks to Matthew Pickering for his patient and constructive review of my patch, I was able to replace many hacks in the GHC driver with more principled approaches.
  • The GHC driver also needed to learn to handle the wasm flavour of the external interpreter. Thanks to the prior work of the JS backend team here, my life was a lot easier when adding the wasm external interpreter logic.
  • The GHC testsuite also needed quite a bit of work. In the end, there are over 1000 new test case passes after flipping on TH/ghci support for the wasm target.

What comes next?

The GHC wasm backend TH/ghci feature is way faster and more robust than what I hacked together in Asterius back then. One nice example I’d like to show off here is pandoc-wasm: it’s finally possible to compile our beloved pandoc tool to wasm again, something that hasn’t been possible since Asterius was deprecated.

The new pandoc-wasm is more performant not only at run-time, but also at compile-time. On a GitHub-hosted runner with just 4 CPU cores and 16 GB of memory, it takes around 16 min to compile pandoc from scratch, and that time can even be halved on my own laptop, with peak memory usage at around 10.8 GB. I wouldn’t be surprised if time and memory usage tripled or more when compiling the same codebase with legacy GHC-based compilers like Asterius or GHCJS!

The work on wasm TH/ghci is not fully finished yet. I do have some things in mind to work on next:

  • Support running the wasm external interpreter in the browser via puppeteer. So your ghci session can connect to the browser, all your Haskell code runs in the browser main thread, and all JSFFI logic in your code can access the browser’s window context. This would allow you to do Haskell frontend livecoding using ghci.
  • Support running an interactive ghci session within the browser, which would mean a truly client-side Haskell playground. It’ll only support in-memory bytecode, since it can’t invoke compiler processes to do any heavy lifting, but it’s still good for teaching purposes.
  • Maybe make it even faster? Performance isn’t my concern right now, though I haven’t done any serious profiling and optimization in the wasm dynamic linker either, so we’ll see.
  • Fix ghci debugger support.

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend. Do get in touch if you feel it is useful to your project!

November 21, 2024 12:00 AM

November 18, 2024

Haskell Interlude

58: ICFP 2024

In this episode, Matti and Sam traveled to the International Conference on Functional Programming (ICFP 2024) in Milan, Italy, and recorded snippets with various participants, including keynote speakers, Haskell legends, and organizers.

by Haskell Podcast at November 18, 2024 04:00 PM

Brent Yorgey

Competitive Programming in Haskell: Union-Find, part II

Posted on November 18, 2024

In my previous post I explained how to implement a reasonably efficient union-find data structure in Haskell, and challenged you to solve a couple Kattis problems. In this post, I will (1) touch on a few generalizations brought up in the comments of my last post, (2) go over my solutions to the two challenge problems, and (3) briefly discuss generalizing the second problem’s solution to finding max-edge decompositions of weighted trees.

Generalizations

Before going on to explain my solutions to those problems, I want to highlight some things from a comment by Derek Elkins and a related blog post by Philip Zucker. The first is that instead of (or in addition to) annotating each set with a value from a commutative semigroup, we can also annotate the edges between nodes with elements from a group (or, more generally, a groupoid). The idea is that each edge records some information about, or evidence for, the relationship between the endpoints of the edge. To compute information about the relationship between two arbitrary nodes in the same set, we can compose elements along the path between them. This is a nifty idea—I have never personally seen it used for a competitive programming problem, but it probably has been at some point. (It kind of makes me want to write such a problem!) And of course it has “real” applications beyond competitive programming as well. I have not actually generalized my union-find code to allow edge annotations; I leave it as an exercise for the reader.

The other idea to highlight is that instead of thinking in terms of disjoint sets, what we are really doing is building an equivalence relation, which partitions the elements into disjoint equivalence classes. In particular, we do this by incrementally building a relation \(R\), where the union-find structure represents the reflexive, transitive, symmetric closure of \(R\). We start with the empty relation \(R\) (whose reflexive, transitive, symmetric closure is the discrete equivalence relation, with every element in its own equivalence class); every \(\mathit{union}(x,y)\) operation adds \((x,y)\) to \(R\); and the \(\mathit{find}(x)\) operation computes a canonical representative of the equivalence class of \(x\). In other words, given some facts about which things are related to which other things (possibly along with some associated evidence), the union-find structure keeps track of everything we can infer from the given facts and the assumption that the relation is an equivalence.

Finally, through the comments I also learned about other potentially-faster-in-practice schemes for doing path compression such as Rem’s Algorithm; I leave it for future me to try these out and see if they speed things up.

Now, on to the solutions!

Duck Journey

In Duck Journey, we are essentially given a graph with edges labelled by bitstrings, where edges along a path are combined using bitwise OR. We are then asked to find the greatest possible value of a path between two given vertices, assuming that we are allowed to retrace our steps as much as we want. (Incidentally, if we are not allowed to retrace our steps, this problem probably becomes NP-hard.)

If we can retrace our steps, then on our way from A to B we might as well visit every edge in the entire connected component, so this problem is not really about path-finding at all. It boils down to two things: (1) being able to quickly test whether two given vertices are in the same connected component or not, and (2) computing the bitwise OR of all the edge labels in each connected component.

One way to solve this would be to first use some kind of graph traversal, like DFS, to find the connected components and build a map from vertices to component labels; then partition the edges by component and take the bitwise OR of all the edge weights in each component. To answer queries we could first look up the component label of the two vertices; if the labels are the same then we look up the total weight for that component.

This works, and is in some sense the most “elementary” solution, but it requires building some kind of graph data structure, storing all the edges in memory, doing the component labelling via DFS and building another map, and so on. An alternative solution is to use a union-find structure with a bitstring annotation for each set: as we read in the edges from the input, we simply union the endpoints of each edge, and then update the bitstring for the resulting equivalence class with the bitstring for the edge. If we take a union-find library as given, this solution seems simpler to me.

First, some imports and the top-level main function. (See here for the ScannerBS module.)

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Control.Category ((>>>))
import Control.Monad.ST
import Data.Bits
import Data.ByteString.Lazy.Char8 (ByteString)
import Data.ByteString.Lazy.Char8 qualified as BS

import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner tc >>> solve >>> format

format :: [Maybe Int] -> ByteString
format = map (maybe "-1" (show >>> BS.pack)) >>> BS.unlines

Next, some data types to represent the input, and a Scanner to read it.

-- Each edge is a "filter" represented as a bitstring stored as an Int.
newtype Filter = Filter Int
  deriving (Eq, Show)

instance Semigroup Filter where
  Filter x <> Filter y = Filter (x .|. y)

filterSize :: Filter -> Int
filterSize (Filter f) = popCount f

data Channel = Channel UF.Node UF.Node Filter deriving (Eq, Show)
data TC = TC {n :: !Int, channels :: [Channel], queries :: [(Int, Int)]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  m <- int
  q <- int
  channels <- m >< (Channel <$> int <*> int <*> (Filter <$> int))
  queries <- q >< pair int int
  return TC {..}

Finally, here’s the solution itself: process each channel with a union-find structure, then process queries. The annoying thing, of course, is that this all has to be in the ST monad, but other than that it’s quite straightforward.

solve :: TC -> [Maybe Int]
solve TC {..} = runST $ do
  uf <- UF.new (n + 1) (Filter 0)
  mapM_ (addChannel uf) channels
  mapM (answer uf) queries

addChannel :: UF.UnionFind s Filter -> Channel -> ST s ()
addChannel uf (Channel a b f) = do
  UF.union uf a b
  UF.updateAnn uf a f

answer :: UF.UnionFind s Filter -> (Int, Int) -> ST s (Maybe Int)
answer uf (a, b) = do
  c <- UF.connected uf a b
  case c of
    False -> pure Nothing
    True -> Just . filterSize <$> UF.getAnn uf a

Inventing Test Data

In Inventing Test Data, we are given a tree \(T\) with integer weights on its edges, and asked to find the minimum possible weight of a complete graph for which \(T\) is the unique minimum spanning tree (MST).



Let \(e = (x,y)\) be some edge which is not in \(T\). There must be a unique path between \(x\) and \(y\) in \(T\) (so adding \(e\) to \(T\) would complete a cycle); let \(m\) be the maximum weight of the edges along this path. Then I claim that we must give edge \(e\) weight \(m+1\):

  • On the one hand, this ensures \(e\) can never be in any MST, since an edge which is strictly the largest edge in some cycle can never be part of an MST (this is often called the “cycle property”).
  • Conversely, if \(e\) had a weight less than or equal to \(m\), then \(T\) would not be a MST (or at least not uniquely): we could remove any edge in the path from \(x\) to \(y\) through \(T\) and replace it with \(e\), resulting in a spanning tree with a lower (or equal) weight.

Hence, every edge not in \(T\) must be given a weight one more than the largest weight in the unique \(T\)-path connecting its endpoints; these are the minimum weights that ensure \(T\) is a unique MST.

A false start

At first, I thought what we needed was a way to quickly compute this max weight along any path in the tree (where by “quickly” I mean something like “faster than linear in the length of the path”). There are indeed ways to do this, for example, using a heavy-light decomposition and then putting a data structure on each heavy path that allows us to query subranges of the path quickly. (If we use a segment tree on each path we can even support operations to update the edge weights quickly.)

All this is fascinating, and something I may very well write about later. But it doesn’t actually help! Even if we could find the max weight along any path in \(O(1)\), there are still \(O(V^2)\) edges to loop over, which is too big. There can be up to \(V = 15\,000\) nodes in the tree, so \(V^2 = 2.25 \times 10^8\). A good rule of thumb is \(10^8\) operations per second, and there are likely to be very high constant factors hiding in whatever complex data structures we use to query paths efficiently.

So we need a way to somehow process many edges at once. As usual, a change in perspective is helpful; to get there we first need to take a slight detour.

Kruskal’s Algorithm

It helps to be familiar with Kruskal’s Algorithm, which is the simplest algorithm I know for finding minimum spanning trees:

  • Sort the edges from smallest to biggest weight.
  • Initialize \(T\) to an empty set of edges.
  • For each edge \(e\) in order from smallest to biggest:
    • If \(e\) does not complete a cycle with the other edges already in \(T\), add \(e\) to \(T\).

To efficiently check whether \(e\) completes a cycle with the other edges in \(T\), we can use a union-find, of course: we maintain equivalence classes of vertices under the “is connected to” equivalence relation; adding \(e\) would complete a cycle if and only if the endpoints of \(e\) are already connected to each other in \(T\). If we do add an edge \(e\), we can just \(\mathit{union}\) its endpoints to properly maintain the relation.
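
To make the connection to code concrete, here is a minimal sketch of Kruskal’s algorithm in Haskell. This is not taken from the solutions below; it assumes the un-annotated UnionFind interface used later in this post (UF.new, UF.connected, UF.union), and the tuple-based edge type is introduced just for this sketch.

{-# LANGUAGE ImportQualifiedPost #-}

import Control.Monad.ST
import Data.List (sortOn)
import UnionFind qualified as UF

-- (a, b, w): an undirected edge between vertices a and b with weight w.
type WEdge = (Int, Int, Int)

-- Kruskal's algorithm: scan the edges in order of increasing weight, keeping
-- an edge only if its endpoints are not yet connected.
kruskal :: Int -> [WEdge] -> [WEdge]
kruskal n es = runST $ do
  uf <- UF.new (n + 1)
  let go acc []                   = pure (reverse acc)
      go acc (e@(a, b, _) : rest) = do
        c <- UF.connected uf a b
        if c
          then go acc rest                          -- would complete a cycle: skip
          else UF.union uf a b >> go (e : acc) rest
  go [] (sortOn (\(_, _, w) -> w) es)

Note that the solutions below never need the spanning tree itself; they only reuse the union-find bookkeeping that Kruskal’s algorithm performs along the way.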

A change of perspective

So how does this help us solve “Inventing Test Data”? After all, we are not being directly asked to find a minimum spanning tree. However, it’s still helpful to think about the process Kruskal’s Algorithm would go through, in order to choose edge weights that will force it to do what we want (i.e. pick all the edges in \(T\)). That is, instead of thinking about each individual edge not in \(T\), we can instead think about the edges that are in \(T\), and what must be true to force Kruskal’s algorithm to pick each one.

Suppose we are part of the way through running Kruskal’s algorithm, and that it is about to consider a given edge \(e = (x,y) \in T\) which has weight \(w_e\). At this point it has already considered any edges with smaller weight, and (we shall assume) chosen all the smaller-weight edges in \(T\). So let \(X\) be the set of vertices reachable from \(x\) by edges in \(T\) with weight less than or equal to \(w_e\), and similarly let \(Y\) be those reachable from \(y\). Kruskal’s algorithm will pick edge \(e\) after checking that \(X\) and \(Y\) are disjoint.



Think about all the other edges from \(X\) to \(Y\): all of them must have weight greater than \(w_e\), because otherwise Kruskal’s algorithm would have already considered them earlier, and used one of them to connect \(X\) and \(Y\). In fact, all of these edges must have weight \(w_e + 1\), as we argued earlier, since \(e\) is the largest-weight edge on the \(T\)-path between their endpoints (all the other edges on these paths were already chosen earlier and hence have smaller weight). The number of such edges is just \(|X| |Y| - 1\) (there is an edge for every pair of vertices, but we do not want to count \(e\) itself). Hence they contribute a total of \((|X||Y| - 1)(w_e + 1)\) to the sum of edge weights.

Hopefully the solution is now becoming clear: we process the edges of \(T\) in order from smallest to biggest, using a union-find to keep track of the equivalence classes of connected vertices so far. For each edge \((x,y)\) we look up the sizes of the equivalence classes of \(x\) and \(y\), add \((|X||Y| - 1)(w_e + 1)\) to a running total, and union them. This accounts for all the edges not in \(T\); finally, we must also add the weights of the edges in \(T\) themselves.

First some standard pragmas and imports, along with some data types and a Scanner to parse the input. Note the custom Ord instance for Edge, so we can sort edges by weight.

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Category ((>>>))
import Control.Monad.ST
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List (sort)
import Data.Ord (comparing)
import Data.STRef
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner (numberOf tc) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines

data Edge = Edge {a :: !Int, b :: !Int, w :: !Integer}
  deriving (Eq, Show)

instance Ord Edge where
  compare = comparing w

data TC = TC {n :: !Int, edges :: [Edge]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  edges <- (n - 1) >< (Edge <$> int <*> int <*> integer)
  return TC {..}

Finally, the (remarkably short) solution proper: we sort the edges and process them from smallest to biggest; for each edge we update an accumulator according to the formula discussed above. Since we’re already tied to the ST monad anyway, we might as well keep the accumulator in a mutable STRef cell.

solve :: TC -> Integer
solve TC {..} = runST $ do
  uf <- UF.new (n + 1)
  total <- newSTRef (0 :: Integer)
  mapM_ (processEdge uf total) (sort edges)
  readSTRef total

processEdge :: UF.UnionFind s -> STRef s Integer -> Edge -> ST s ()
processEdge uf total (Edge a b w) = do
  modifySTRef' total (+ w)
  sa <- UF.size uf a
  sb <- UF.size uf b
  modifySTRef' total (+ (fromIntegral sa * fromIntegral sb - 1) * (w + 1))
  UF.union uf a b

Max-edge decomposition



Incidentally, there’s something a bit more general going on here: for a given nonempty weighted tree \(T\), a max-edge decomposition of \(T\) is a binary tree defined as follows:

  • The max-edge decomposition of a trivial single-vertex tree is a single vertex.
  • Otherwise, the max-edge decomposition of \(T\) consists of a root node with two children, which are the max-edge decompositions of the two trees that result from deleting a largest-weight edge from \(T\).

Any max-edge decomposition of a tree \(T\) with \(n\) vertices will have \(n\) leaf nodes and \(n-1\) internal nodes. Typically we think of the leaf nodes of the decomposition as being labelled by the vertices of \(T\), and the internal nodes as being labelled by the edges of \(T\).

An alternative way to think of the max-edge decomposition is as the binary tree of union operations performed by Kruskal’s algorithm while building \(T\), starting with each vertex in a singleton leaf and then merging two trees into one with every union operation. Thinking about, or even explicitly building, this max-edge decomposition occasionally comes in handy. For example, see Veður and Toll Roads.
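
As a concrete (and purely illustrative) way to write this down in Haskell, the shape of a max-edge decomposition could be captured by a small data type like the following; this is a sketch, not code from the post.

-- Leaves are labelled by the vertices of T, internal nodes by its edges.
data MaxEdgeDecomp v e
  = Leaf v
  | Node e (MaxEdgeDecomp v e) (MaxEdgeDecomp v e)
  deriving Show

Building such a tree during Kruskal’s algorithm then amounts to keeping one decomposition per union-find class and combining two of them with a Node labelled by the uniting edge each time a union is performed.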

Incidentally, I can’t remember whether I got the term “max-edge decomposition” from somewhere else or if I made it up myself; in any case, regardless of what it is called, I think I first learned of it from this blog post by Petr Mitrichev.


by Brent Yorgey at November 18, 2024 12:00 AM

November 17, 2024

Eric Kidd

9½ years of Rust in production (and elsewhere)

The first stable release of Rust was on May 15, 2015, just about 9½ years ago. My first “production” Rust code was a Slack bot, which talked to GoCD to control the rollout of a web app. This was utterly reliable. And so new bits of Rust started popping up.

I’m only going to talk about open source stuff here. This will be mostly production projects, with a couple of weekend projects thrown in. Each project will ideally get its own post over the next couple of months.

Planned posts

Here are some of the tools I’d like to talk about:

  1. Moving tables easily between many databases (dbcrossbar)
  2. 700-CPU batch jobs
  3. Geocoding 60,000 addresses per second
  4. Interlude: Neural nets from scratch in Rust
  5. Lots of CSV munging
  6. Interlude: Language learning using subtitles, Anki, Whisper and ChatGPT
  7. Transpiling BigQuery SQL for Trino (a work in progress)

I’ll update this list to link to the posts. Note that I may not get to all of these!

Maintaining Rust & training developers

One of the delightful things about Rust is the low rate of “bit rot”. If something worked 5 years ago—and if it wasn’t linked against the C OpenSSL libraries—then it probably works unchanged today. And if it doesn’t, you can usually fix it in 20 minutes. This is largely thanks to Rust’s “stability without stagnation” policy, the Edition system, and the Crater tool, which is used to test new Rust releases against the entire ecosystem.

The more interesting questions are (1) when should you use Rust, and (2) how do you make sure your team can use it?

Read more…

November 17, 2024 04:08 PM

November 14, 2024

Gabriella Gonzalez

The Haskell inlining and specialization FAQ

This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.

What is inlining?

“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:

module Example where

x :: Int
x = 5

y :: Int
y = x + 1

… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):

y :: Int
y = 5 + 1

… which then allows the compiler to further simplify the code to:

y :: Int
y = 6

In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… which will produce this output:

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 20, types: 7, coercions: 0, joins: 0/0}

x = I# 5#

$trModule4 = "main"#

$trModule3 = TrNameS $trModule4

$trModule2 = "Example"#

$trModule1 = TrNameS $trModule2

$trModule = Module $trModule3 $trModule1

y = I# 6#

… which we can squint at a little bit and read as:

x = 5

y = 6

… and ignore the other stuff.

A slightly more interesting example of inlining is a function call, like this one:

f :: Int -> Int
f x = x + 1

y :: Int
y = f 5

The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):

y :: Int
y = 5 + 1

… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:

y = I# 6#

What is specialization?

“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:

-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:

example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)

Notice that example and example2 are defined in the same way, but they are not exactly the same function:

  • example is more flexible and works on strictly more type constructors

    example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

  • example and example2 compile to very different core representations

In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if we dump the core output (I’ve tidied up the output a lot for clarity):

example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

The two extra function arguments are:

  • @f: This represents the type variable f

    Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

  • $Functor: This represents the Functor instance for f

    Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.

Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.

In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example

This compiles to the exact same code as before (you can check for yourself if you don’t believe me).

Here we would say that example2 is “example specialized to the Maybe type constructor”. When we write something like this:

example2 :: Maybe Int -> Maybe Int
example2 = example

… what’s actually happening under the hood is that the compiler is doing something like this:

example2 = example @Maybe $FunctorMaybe

In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).

In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:

$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs

This outputs (among other things):

…

example2 = example @Maybe $FunctorMaybe
…

And when we enable optimizations (with -O2):

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:

  • specialization simplifies function calls with “type-level” arguments

    By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

  • inlining simplifies function calls with “term-level” arguments

    By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love

Does GHC always inline or specialize code?

NO. GHC does not always inline or specialize code, for two main reasons:

  • Inlining is not always an optimization

    Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

  • Inlining a function requires access to the function’s source code

    In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.

To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:

  • a .o file containing object code (e.g. Example.o)

    This object code is what is linked into the final executable to generate a runnable program.

  • a .hi file containing (among other things) source code

    The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.

However, the compiler does not always save the source code for all functions that it compiles, because there are downsides to storing source code for functions:

  • this slows down compilation

    This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

  • this makes the .hi file bigger

    The .hi file gets bigger because it’s storing the source code of the function.

  • this can also make the object code larger, too

    Inlining a function multiple times can lead to duplicating the corresponding object code for that function.

This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.

You can override the compiler’s heuristic, though, using …

Compiler directives

There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:

  • INLINABLE
  • INLINE
  • NOINLINE
  • SPECIALIZE

My general rule of thumb for these compiler directives is:

  • don’t use any compiler directive until you benchmark your code to show that it helps
  • if you do use a compiler directive, INLINABLE is probably the one you should pick

I’ll still explain what all the compiler directives mean, though.

INLINABLE

INLINABLE is a compiler directive that you use like this:

f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}

The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).

INLINE

INLINE is a compiler directive that you use in a similar manner as INLINABLE:

f :: Int -> Int
f x = x + 1
{-# INLINE f #-}

INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler’s own judgment about whether or not to inline the function.

I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.

NOINLINE

If you mark a function as NOINLINE:

f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}

… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:

Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
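
For reference, here is a minimal sketch of the kind of code that recommendation has in mind (a hypothetical global counter, not an example taken from this post):

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- A "global variable" created with unsafePerformIO. Without the NOINLINE
-- pragma, GHC could inline the right-hand side at each use site and end up
-- creating more than one IORef.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}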

SPECIALIZE

SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}

This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialization does and does not happen.

In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.

by Gabriella Gonzalez (noreply@blogger.com) at November 14, 2024 04:58 PM

GHC Developer Blog

GHC 9.12.1-alpha3 is now available

Zubin Duggal - 2024-11-14

The GHC developers are very pleased to announce the availability of the third alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code (a short sketch follows this list).

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more
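
To make one of the items above more concrete, here is a rough sketch of what the MultilineStrings extension looks like in use. This is an illustration based on the accepted proposal rather than an excerpt from the release notes; consult the GHC user’s guide for the exact whitespace-stripping rules.

{-# LANGUAGE MultilineStrings #-}

-- A string literal delimited by triple quotes may span several lines;
-- common leading indentation is stripped, so this is roughly
-- "SELECT name\nFROM users".
query :: String
query =
  """
  SELECT name
  FROM users
  """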

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki’s status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 14, 2024 12:00 AM

November 08, 2024

Donnacha Oisín Kidney

POPL Paper—Formalising Graph Algorithms with Coinduction

Posted on November 8, 2024

New paper: “Formalising Graph Algorithms with Coinduction”, by myself and Nicolas Wu, will be published at POPL 2025.

The preprint is available here.

The paper is about representing graphs (especially in functional languages). We argue in the paper that graphs are naturally coinductive, rather than inductive, and that many of the problems with graphs in functional languages go away once you give up on induction and pattern-matching, and embrace the coinductive way of doing things.

Of course, coinduction comes with its own set of problems, especially when working in a total language or proof assistant. Another big focus of the paper was figuring out a representation that was amenable to formalisation (we formalised the paper in Cubical Agda). Picking a good representation for formalisation is a tricky thing: often a design decision you make early on only looks like a mistake after a few thousand lines of proofs, and modern formal proofs tend to be brittle, meaning that it’s difficult to change an early definition without also having to change everything that depends on it. On top of this, we decided to use quotients for an important part of the representation, and (as anyone who’s worked with quotients and coinduction will tell you) productivity proofs in the presence of quotients can be a real pain.

All that said, I think the representation we ended up with in the paper is quite nice. We start with a similar representation to the one we had in our ICFP paper in 2021: a graph over vertices of type a is simply a function a -> [a] that returns the neighbours of a supplied vertex (this is the same representation as in this post). Despite the simplicity, it turns out that this type is enough to implement a decent number of search algorithms. The really interesting thing is that the arrow methods (from Control.Arrow) work on this type, and they define an algebra on graphs similar to the one from Mokhov (2017). For example, the <+> operator is the same as the overlay operation in Mokhov (2017).
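
As a rough illustration of that representation, here is a minimal sketch (not code from the paper) of the type together with a naive depth-first reachability search over a finite graph:

import qualified Data.Set as Set

-- A graph over vertices of type a: each vertex maps to its list of neighbours.
type Graph a = a -> [a]

-- A tiny example graph: a directed cycle on the vertices 0..3.
cycle4 :: Graph Int
cycle4 n = [(n + 1) `mod` 4]

-- Depth-first reachability from a starting vertex (terminates on finite graphs).
dfs :: Ord a => Graph a -> a -> [a]
dfs g start = go Set.empty [start]
  where
    go _ [] = []
    go seen (v : vs)
      | v `Set.member` seen = go seen vs
      | otherwise           = v : go (Set.insert v seen) (g v ++ vs)

-- dfs cycle4 0 == [0,1,2,3]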

That simple type gets expanded upon and complicated: eventually, we represent a possibly-infinite collection as a function that takes a depth and then returns everything in the search space up to that depth. It’s a little like representing an infinite list as the partial application of the take function. The paper spends a lot of time picking an algebra that properly represents the depth, and figuring out coherency conditions etc.

One thing I’m especially proud of is that all the Agda code snippets in the paper are hyperlinked to a rendered html version of the code. Usually, when I want more info on some code snippet in a paper, I don’t really want to spend an hour or so downloading some artefact, installing a VM, etc. What I actually want is just to see all of the definitions the snippet relies on, and the 30 or so lines of code preceding it. With this paper, that’s exactly what you get: if you click on any Agda code in the paper, you’re brought to the source of that code block, and every definition is clickable so you can browse without having to install anything.

I think the audience for this paper is anyone who is interested in graphs in functional languages. It should be especially interesting to people who have dabbled in formalising some graphs, but who might have been stung by an uncooperative proof assistant. The techniques in the second half of the paper might help you to convince Agda (or Idris, or Rocq) to accept your coinductive and quotient-heavy arguments.

Mokhov, Andrey. 2017. “Algebraic Graphs with Class (Functional Pearl).” In Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell, 2–13. Haskell 2017. New York, NY, USA: ACM. doi:10.1145/3122955.3122956.

by Donnacha Oisín Kidney at November 08, 2024 12:00 AM

November 07, 2024

Tweag I/O

Exploring Effect in TypeScript: Simplifying Async and Error Handling

Effect is a powerful library for TypeScript developers that brings functional programming techniques into managing effects and errors. It aims to be a comprehensive utility library for TypeScript, offering a range of tools that could potentially replace specialized libraries like Lodash, Zod, Immer, or RxJS.

In this blog post, we will introduce you to Effect by creating a simple weather widget app. This app will allow users to search for weather information by city name, making it a good example as it involves API data fetching, user input handling, and error management. We will implement this project in both vanilla TypeScript and using Effect to demonstrate the advantages Effect brings in terms of code readability and maintainability.

What is Effect?

Effect promises to improve TypeScript code by providing a set of modules and functions that are composable with maximum type-safety. The term “effect” refers to an effect system, which provides a declarative approach to handling side effects. Side effects are operations that have observable consequences in the real world, like logging, network requests, database operations, etc. The library revolves around the Effect<Success, Error, Requirements> type, which can be used to represent an immutable value that lazily describes a workflow or job. Effects are not functions themselves; they are descriptions of what should be done. They can be composed with other effects, and they can be interpreted by the Effect runtime system. Before we dive into the project we will build, let’s look at some basic concepts of Effect.

Creating effects

We can create an effect based on a value using the Effect.succeed and Effect.fail functions:

const success: Effect.Effect<number, never, never> = Effect.succeed(42)

const fail: Effect.Effect<never, Error, never> = Effect.fail(
  new Error("Something went wrong")
)

  • An effect with never as the Error means it never fails
  • An effect with never as the Success means it never produces a successful value.
  • An effect with never as the Requirements means it doesn’t require any context to run.

With the functions above, we can create effects like this:

const divide = (a: number, b: number): Effect.Effect<number, Error, never> =>
  b === 0
    ? Effect.fail(new Error("Cannot divide by zero"))
    : Effect.succeed(a / b)

To create an effect based on a function, we can use the Effect.sync and Effect.promise for synchronous and asynchronous functions that can’t fail, respectively, and Effect.try and Effect.tryPromise for synchronous and asynchronous functions that can fail.

// Synchronous function that can't fail
const log = (message: string): Effect.Effect<void, never, never> =>
  Effect.sync(() => console.log(message))

// Asynchronous function that can't fail
const delay = (message: string): Effect.Effect<string, never, never> =>
  Effect.promise<string>(
    () =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve(message)
        }, 2000)
      })
  )

// Synchronous function that can fail
const parse = (input: string): Effect.Effect<any, Error, never> =>
  Effect.try({
    // JSON.parse may throw for bad input
    try: () => JSON.parse(input),
    // remap the error
    catch: _unknown => new Error(`something went wrong while parsing the JSON`),
  })

// Asynchronous function that can fail
const getTodo = (id: number): Effect.Effect<Response, Error, never> =>
  Effect.tryPromise({
    // fetch can throw for network errors
    try: () => fetch(`https://jsonplaceholder.typicode.com/todos/${id}`),
    // remap the error
    catch: unknown => new Error(`something went wrong ${unknown}`),
  })

For more details about creating effects you can check the Effect documentation.

Running effects

In order to run an effect, we need to use the appropriate function depending on the effect type. In our application we’ll use the Effect.runPromise function, which is used for effects that are asynchronous and can’t fail:

Effect.runPromise(delay("Hello, World!")).then(console.log)
// -> Hello, World! (after 2 seconds)

You can read about other ways to run effects, and what happens when you don’t use the correct function, in the “Running Effects” page of the Effect documentation.

Pipe

When writing a program using Effect, we usually need to run a sequence of operations, and we can use the pipe function to compose them:

const double = (n: number) => n * 2

const divide =
  (b: number) =>
  (a: number): Effect.Effect<number, Error> =>
    b === 0
      ? Effect.fail(new Error("Cannot divide by zero"))
      : Effect.succeed(a / b)

const increment = (n: number) => Effect.succeed(n + 1)

const result = pipe(
  42,
  // Here we have an Effect.Effect<number, Error> with the value 21
  divide(2),
  // To run a function over the value changing the effect's value, we use Effect.map
  Effect.map(double),
  // To run a function over the value without changing the effect's value, we use Effect.tap
  Effect.tap(n => console.log(`The double is ${n}`)),
  // To run a function that returns a new effect, we use Effect.andThen
  Effect.andThen(increment),
  Effect.tap(n => console.log(`The incremented value is ${n}`))
)

Effect.runSync(result)
// -> The double is 42
// -> The incremented value is 43

If you want to know more about the pipe function, you can check this page on the Effect documentation.

The project

Now that we have a basic understanding of Effect, we can start the project! We will build a simple weather app in which the user types the name of a city, selects the desired one from a list of suggestions, and then the app shows the current weather in that city.
The project will have three main components: the input field, the list of suggestions, and the weather information.

We will use the Open-Meteo API to get the weather information as it doesn’t require an API key.

Setup

We begin by creating a new TypeScript project:

mkdir weather-app
cd weather-app
npm init -y

Next, we install the dependencies. We will use Parcel to bundle the project as it works without any configuration:

npm install --save-dev parcel

Now we create the project structure:

mkdir src
touch src/index.html
touch src/styles.scss
touch src/index.ts

The index.html file contains a main element with sections: one with a text input for city input and another for displaying weather information.

You can check the HTML and SCSS code in the GitHub repository.

In order to run the project, we need to add the following keys to the package.json file:

{
  "source": "./src/index.html",
  "scripts": {
    "dev": "parcel",
    "build": "parcel build"
  }
}

Now we can run the project:

npm run dev

Server running at http://localhost:1234
✨ Built in 8ms

By accessing the URL, you should see the application, but it won’t work yet.

Figure 1. Application's initial state

Let’s write the TypeScript code!

Without Effect

All the following code examples should be placed in the src/index.ts file.

First, we query the elements from the DOM:

// The field input
const cityElement = document.querySelector<HTMLInputElement>("#city")
// The list of suggestions
const citiesElement = document.querySelector<HTMLUListElement>("#cities")
// The weather information
const weatherElement = document.querySelector<HTMLDivElement>("#weather")

Next, we’ll define the types for the data we’ll fetch from the API.
To validate the data, we’ll use a library called Zod. Zod is a TypeScript-first schema declaration and validation library.

npm install zod

First, we define the schema by using z.object and, for each property, we use z.string, z.number and other functions to define its type:

import { z } from "zod"

// ...

const CityResponse = z.object({
  name: z.string(),
  country_code: z.string().length(2),
  latitude: z.number(),
  longitude: z.number(),
})

const GeocodingResponse = z.object({
  results: z.array(CityResponse),
})

With the schema defined, we can use the z.infer utility type to infer the type of the data based on the schema:

type CityResponse = z.infer<typeof CityResponse>

type GeocodingResponse = z.infer<typeof GeocodingResponse>

Now, we create the function to fetch the cities from the Open-Meteo API. It fetches the cities that match the given name and returns a list of suggestions. In order to validate the API response, we use the safeParse method that our GeocodingResponse Zod schema provides. This method returns an object with two key properties:

  1. success: A boolean indicating if the parsing succeeded.
  2. data: The parsed data if successful, matching our defined schema.

const getCity = async (city: string): Promise<CityResponse[]> => {
  try {
    const response = await fetch(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    )

    // Convert the response to JSON
    const geocoding = await response.json()

    // Parse the response using the GeocodingResponse schema
    const parsedGeocoding = GeocodingResponse.safeParse(geocoding)

    if (!parsedGeocoding.success) {
      return []
    }

    return parsedGeocoding.data.results
  } catch (error) {
    console.error("Error:", error)
    return []
  }
}

To make the input field work, we need to attach an event listener to it to call the getCity function:

const getCities = async function (input: HTMLInputElement) {
  const { value } = input

  // Check if the HTML element exists
  if (citiesElement) {
    // Clear the list of suggestions
    citiesElement.innerHTML = ""
  }

  // Check if the input is empty
  if (!value) {
    return
  }

  // Fetch the cities
  const results = await getCity(value)

  renderCitySuggestions(results)
}

cityElement?.addEventListener("input", function (_event) {
  getCities(this)
})

Next, we create the renderCitySuggestions function to render the list of suggestions or display an error message if there are no suggestions:

const renderCitySuggestions = (cities: CityResponse[]) => {
  // If there are cities, populate the suggestions
  if (cities.length > 0) {
    populateSuggestions(cities)
    return
  }

  // Otherwise, show a message that the city was not found
  if (weatherElement) {
    const search = cityElement?.value || "searched"
    weatherElement.innerHTML = `<p>City ${search} not found</p>`
  }
}

The populateSuggestions function is very simple - it creates a list item for each city:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    const li = document.createElement("li")
    li.innerText = `${city.name} - ${city.country_code}`
    citiesElement?.appendChild(li)
  })

Now if we type a city name in the input field, we should see the list of suggestions:

Figure 2. City suggestions

Great!

The next step is to implement the selectCity function that fetches the weather information of a city and displays it:

const selectCity = async (result: CityResponse) => {
  // If the HTML element doesn't exist, return
  if (!weatherElement) {
    return
  }

  try {
    const data = await getWeather(result)

    if (data.tag === "error") {
      throw data.value
    }

    const {
      temperature_2m,
      apparent_temperature,
      relative_humidity_2m,
      precipitation,
    } = data.value.current

    weatherElement.innerHTML = `
 <h2>${result.name}</h2>
 <p>Temperature: ${temperature_2m}°C</p>
 <p>Feels like: ${apparent_temperature}°C</p>
 <p>Humidity: ${relative_humidity_2m}%</p>
 <p>Precipitation: ${precipitation}mm</p>
 `
  } catch (error) {
    weatherElement.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`
  }
}

Then we call it in the populateSuggestions function:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    // ...
    li.addEventListener("click", () => selectCity(city))
    citiesElement?.appendChild(li)
  })

The last piece of the puzzle is the getWeather function. Once again, we’ll use Zod to create the schema and the type for the weather information.

type WeatherResult =
  | { tag: "ok"; value: WeatherResponse }
  | { tag: "error"; value: unknown }

const WeatherResponse = z.object({
  current_units: z.object({
    temperature_2m: z.string(),
    relative_humidity_2m: z.string(),
    apparent_temperature: z.string(),
    precipitation: z.string(),
  }),
  current: z.object({
    temperature_2m: z.number(),
    relative_humidity_2m: z.number(),
    apparent_temperature: z.number(),
    precipitation: z.number(),
  }),
})

type WeatherResponse = z.infer<typeof WeatherResponse>

const getWeather = async (result: CityResponse): Promise<WeatherResult> => {
  try {
    const response = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    )

    // Convert the response to JSON
    const weather = await response.json()

    // Parse the response using the WeatherResponse schema
    const parsedWeather = WeatherResponse.safeParse(weather)

    if (!parsedWeather.success) {
      return { tag: "error", value: parsedWeather.error }
    }

    return { tag: "ok", value: parsedWeather.data }
  } catch (error) {
    return { tag: "error", value: error }
  }
}

We have a type WeatherResult for error handling; it can be ok or error. The getWeather function fetches the weather information based on the latitude and longitude of a city and returns the result. We are passing some parameters to the API to get the current temperature, humidity, apparent temperature, and precipitation. If you want to know more about these parameters, you can check the API documentation.

One last thing we need to do is to use a debounce function to avoid making too many requests to the API while the user is typing. To do that, we’ll install Lodash which provides many useful functions for everyday programming.

npm install lodash
npm install --save-dev @types/lodash

We’ll wrap the getCities function with the debounce function:

import { debounce } from "lodash"

// ...

const getCities = debounce(async function (input: HTMLInputElement) {
  // The same code as before
}, 500)

This way, the getCities function will be called only after the user stops typing for 500 milliseconds.

Our small weather app is now complete: when we type a city name in the input field, a list of suggestions is displayed, and when we click on one of them, we can see the weather information for that city.

Figure 3. Weather information

While our current code works and handles errors well, let’s explore how using Effect can potentially improve its robustness and simplicity.

With Effect

To get started with Effect, we need to install it:

npm install effect

We will start by refactoring the functions in the order we implemented them in the previous section.

First, we refactor the querySelector calls. We'll use the Option type from Effect: it represents a value that may or may not exist. If the value exists, it's a Some; if it doesn't, it's a None.

import { Option } from "effect"

// The field input
const cityElement = Option.fromNullable(
  document.querySelector<HTMLInputElement>("#city")
)
// The list of suggestions
const citiesElement = Option.fromNullable(
  document.querySelector<HTMLUListElement>("#cities")
)
// The weather information
const weatherElement = Option.fromNullable(
  document.querySelector<HTMLDivElement>("#weather")
)

Using the Option type, we can chain operations without worrying about null or undefined values. This approach simplifies our code by eliminating the need for explicit null checks. We can use functions like Option.map and Option.andThen to handle the transformations and checks in a more elegant way. To know more about the Option type, take a look at the page about it in the documentation.

Now, let's move to the getCity function. We'll use Schema.Struct to define the types of the CityResponse and GeocodingResponse objects. These schemas will be used to validate the response from the API. This is the same thing we did before with Zod, but this time we don't need a separate validation library: we can use the Schema module that Effect provides.

import { /* ... */, Effect, Scope, pipe } from "effect";
import { Schema } from "@effect/schema"
import {
  FetchHttpClient,
  HttpClient,
  HttpClientResponse,
  HttpClientError
} from "@effect/platform";

// ...

const CityResponse = Schema.Struct({
  name: Schema.String,
  country_code: pipe(Schema.String, Schema.length(2)),
  latitude: Schema.Number,
  longitude: Schema.Number,
})

type CityResponse = Schema.Schema.Type<typeof CityResponse>

const GeocodingResponse = Schema.Struct({
  results: Schema.Array(CityResponse),
})

type GeocodingResponse = Schema.Schema.Type<typeof GeocodingResponse>

const getRequest = (url: string): Effect.Effect<HttpClientResponse.HttpClientResponse, HttpClientError.HttpClientError, Scope.Scope> =>
  pipe(
    HttpClient.HttpClient,
    // Using `Effect.andThen` to get the client from the `HttpClient.HttpClient` tag and then make the request
    Effect.andThen(client => client.get(url)),
    // We don't need to send the tracing headers to the API to avoid CORS errors
    HttpClient.withTracerPropagation(false),
    // Providing the HTTP client to the effect
    Effect.provide(FetchHttpClient.layer)
  )

const getCity = (city: string): Effect.Effect<readonly CityResponse[], never, never> =>
  pipe(
    getRequest(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    ),
    // Validating the response using the `GeocodingResponse` schema
    Effect.andThen(HttpClientResponse.schemaBodyJson(GeocodingResponse)),
    // Providing a default value in case of failure
    Effect.orElseSucceed<GeocodingResponse>(() => ({ results: [] })),
    // Extracting the `results` array from the `GeocodingResponse` object
    Effect.map(geocoding => geocoding.results),
    // Providing a scope to the effect
    Effect.scoped
  )

Here we already have some interesting things happening!

The getRequest function sets up the HTTP client. While we could use the built-in fetch API as our HTTP client, Effect provides a solution called HttpClient in the @effect/platform package. It’s important to note that this package is currently in beta, as mentioned in the official documentation. Despite its beta status, we’ll be using it to explore more of Effect’s capabilities and showcase how it integrates with the broader Effect ecosystem. This choice allows us to demonstrate Effect’s approach to HTTP requests and error handling in a more idiomatic way. HttpClient.HttpClient is something called a “tag” that we can use to get the HTTP client from the context. To do that, we use the Effect.andThen function.
After that, we’re setting withTracerPropagation to false to avoid sending the tracing headers to the API and getting a CORS error.

Since we're using the HttpClient service, it is a requirement of our effect (remember the Effect<Success, Error, Requirements> type?), and we need to provide this requirement in order to run the effect.
With the Effect.provide function we can add a layer to the effect that provides the HttpClient service. For more information about Effect.provide and how it works, take a look at the runtime page in the Effect documentation.

In the getCity function, we call the getRequest function to get the response from the API. Then we validate the response using the HttpClientResponse.schemaBodyJson function, which validates the response body using the GeocodingResponse schema.
In the last line of the function, we use the Effect.scoped function to provide a scope to the effect; this is a requirement of the HttpClient service that we're using in the getRequest function. The scope ensures that if the program is interrupted, any in-flight request will be aborted, preventing memory leaks. getCity returns an Effect.Effect<readonly CityResponse[], never, never>: the two nevers mean that it never fails (we're providing a default value in case of failure) and that it doesn't require any context to run.

Next, we refactor the getCities function:

import { /* ... */, Effect, Option, pipe } from "effect";

// ...

const getCities = (search: string): Effect.Effect<Option.Option<void>, never, never> => {
  Option.map(citiesElement, citiesEl => (citiesEl.innerHTML = ""))

  return pipe(
    getCity(search),
    Effect.map(renderCitySuggestions),
    // Check if the input is empty
    Effect.when(() => Boolean(search))
  )
}

We’re using the Option.map function to access the actual citiesElement and clear the list of suggestions. After that, it’s pretty straightforward: we call the getCity function with the search term, then we map the renderCitySuggestions function over the successful value, and finally, we apply a condition that makes the effect run only if the search term is not empty.

Here is how we add the event listener to the input field:

import { /* ... */, Effect, Option, pipe, Stream, Chunk, StreamEmit } from "effect";

// ...

Option.map(cityElement, cityEl => {
  const stream = Stream.async(
    (emit: StreamEmit.Emit<never, never, string, void>) =>
      cityEl.addEventListener("input", function (_event) {
        emit(Effect.succeed(Chunk.of(this.value)))
      })
  )

  pipe(
    stream,
    Stream.debounce(500),
    Stream.runForEach(getCities),
    Effect.runPromise
  )
})

Actually, we’re doing more than just adding an event listener. The debounce function that we had to import from Lodash before is now part of Effect as the Stream.debounce function. In order to use this function, we need to create a Stream.
A Stream has the type Stream<A, E, R> and is a program description that, when executed, can emit zero or more values of type A, handle errors of type E, and operate within a context of type R. There are a couple of ways to create a Stream, which are detailed in the page about streams in the documentation. In this case, we're using the Stream.async function because it receives a callback that emits values to the stream.

After creating the Stream and assigning it to the stream variable, we use a pipe to build a pipeline where we debounce the stream by 500 milliseconds, run the getCities function whenever the stream gets a value (that is, when we emit a value), and finally run the effect with Effect.runPromise.

Let’s move on to the renderCitySuggestions function:

import { /* ... */, Array, Option, pipe } from "effect";

// ...

const renderCitySuggestions = (cities: readonly CityResponse[]): void | Option.Option<void> =>
  // If there are multiple cities, populate the suggestions
  // Otherwise, show a message that the city was not found
  pipe(
    cities,
    Array.match({
      onNonEmpty: populateSuggestions,
      onEmpty: () => {
        const search = Option.match(cityElement, {
          onSome: (cityEl) => cityEl.value,
          onNone: () => "searched",
        });

        Option.map(
          weatherElement,
          (weatherEl) =>
            (weatherEl.innerHTML = `<p>City ${search} not found</p>`),
        );
      },
    }),
  );

Instead of manually checking the length of the cities array, we’re using the Array.match function to handle that. If the array is empty, it calls the callback defined in the onEmpty property, and if the array is not empty, it calls the callback defined in the onNonEmpty property.

The populateSuggestions function remains almost the same. The only change is that we now wrap the forEach operation in an Option.map to safely handle the optional cities element. This ensures we only attempt to populate suggestions when the element exists.

The selectCity function is simpler now:

import { /* ... */, Option, pipe } from "effect";

// ...

const selectCity = (result: CityResponse): Option.Option<Promise<string>> =>
  Option.map(weatherElement, weatherEl =>
    pipe(
      result,
      getWeather,
      Effect.match({
        onFailure: error =>
          (weatherEl.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`),
        onSuccess: (weatherData: WeatherResponse) =>
          (weatherEl.innerHTML = `
<h2>${result.name}</h2>
<p>Temperature: ${weatherData.current.temperature_2m}°C</p>
<p>Feels like: ${weatherData.current.apparent_temperature}°C</p>
<p>Humidity: ${weatherData.current.relative_humidity_2m}%</p>
<p>Precipitation: ${weatherData.current.precipitation}mm</p>
`),
      }),
      Effect.runPromise
    )
  )

There is no checking of data.tag any more: we use the Effect.match function to handle both cases, success and failure, and we no longer throw anything.

Finally, the getWeather function:

import { /* ... */, Effect, pipe } from "effect";
import { Schema, ParseResult } from "@effect/schema";
import { /* ... */, HttpClientResponse, HttpClientError } from "@effect/platform";

// ...

const WeatherResponse = Schema.Struct({
  current_units: Schema.Struct({
    temperature_2m: Schema.String,
    relative_humidity_2m: Schema.String,
    apparent_temperature: Schema.String,
    precipitation: Schema.String,
  }),
  current: Schema.Struct({
    temperature_2m: Schema.Number,
    relative_humidity_2m: Schema.Number,
    apparent_temperature: Schema.Number,
    precipitation: Schema.Number,
  }),
})

type WeatherResponse = Schema.Schema.Type<typeof WeatherResponse>

const getWeather = (
  result: CityResponse,
): Effect.Effect<WeatherResponse, HttpClientError.HttpClientError | ParseResult.ParseError, never> =>
  pipe(
    getRequest(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    ),
    Effect.andThen(HttpClientResponse.schemaBodyJson(WeatherResponse)),
    Effect.scoped
  )

We're again using Schema.Struct to define the WeatherResponse type. However, we no longer need a WeatherResult, as the Effect type already handles the success and failure cases.

After this refactoring, the app works the same way it did before, but now we have the confidence that our code is more robust and type-safe. Let's look at the benefits of Effect compared with the code written without it.

Conclusion

Now that we have the two versions of the application, we can analyze them and highlight the pros and cons of using Effect:

Pros

  • Type-safety: Effect provides a way to handle errors and requirements in a type-safe way and using it increases the overall type safety of our app.
  • Error handling: The Effect type has built-in error handling, making the code more robust.
  • Validation: We don't need a library like Zod to validate responses; we can use the Schema module instead.
  • Utility functions: We don't need a library like Lodash for utility functions; we can use the Array, Option, Stream, and other modules.
  • Declarative style: Writing code with Effect means we’re using a more declarative approach: we’re describing “what” we want our program to do, rather than “how” we want it to do it.

Cons

  • Complexity: The code is more complex than the one without Effect; it may be hard to understand for people who are not familiar with the library.
  • Learning curve: You need to learn how to use the library - it’s not as simple as writing plain TypeScript code.
  • Documentation: The documentation is good, but could be better. Some parts are not clear.

While the code written with Effect may initially appear more complex to those unfamiliar with the library, its benefits far outweigh the initial learning curve. Effect offers powerful tools for maximum type-safety, error handling, asynchronous operations, streams and more, all within a single library that is incrementally adoptable. In our project, we used two separate libraries (Zod and Lodash) to achieve what Effect accomplishes on its own.

While plain TypeScript may be adequate for small projects, we believe Effect can truly shine in larger, more complex applications. Its robust handling of side-effects and comprehensive error management have the potential to make it a game changer for taming complexity and maintaining code quality at scale.

November 07, 2024 12:00 AM

Donnacha Oisín Kidney

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Posted on November 7, 2024

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

by Donnacha Oisín Kidney at November 07, 2024 12:00 AM

November 05, 2024

Edward Z. Yang

Ways to use torch.compile

On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't. By the way, some of these gaps are either served by export, or by missing features we are actively working on; those will be some other posts!

Improve training efficiency on a small-medium scale

Scenario: You have a model in PyTorch that you want to train at a small-medium scale (e.g., below 1K GPUs--at the 1K point there is a phase change in behavior that deserves its own section). You would like it to train faster. Locally, it's nice to get a trained model faster than you would have otherwise. But globally, the faster everyone's models train, the less GPU hours they use, which means you can run more jobs in a given time window with a fixed cluster. If your supply of GPUs is inelastic (lol), efficiency improvement means you can support more teams and use cases for the same amount of available GPUs. At a capacity planning level, this can be a pretty big deal even if you are GPU rich.

What to do: In some sense, this is the reason we built torch.compile. (When we were initially planning torch.compile, we were trying to assess if we were going after inference; but inference compilers are a much more crowded space than training compilers, and we reasoned that if we did a good job building a training compiler, inference would work too--which it did!) The dream which we sold with torch.compile is that you could slap it on the top of your model and get a speed up. This turns out to... not quite be true? But the fact remains that if you're willing to put in some work, there is almost always performance waiting at the end of the road for you. Some tips:

  • Compile only the modules you need. You don't have to compile the entire model; there might be specific modules which are easy to compile and which will give you most of the benefit. For example, in recommendation systems, there is not much compute improvement to be had from optimizing the embedding lookups, and their model parallelism is often quite hard to handle in the compiler, so torch.compiler.disable them. NB: This doesn't apply if you want to do some global graph optimization which needs the whole model: in that case, pass fullgraph=True to torch.compile and ganbatte!
  • Read the missing manual. The missing manual is full of guidance on working with the compiler, with a particular emphasis on working on training.

Open source examples: torchtune and torchtitan are two first party libraries which are intended to showcase modern PyTorch using torch.compile in a training context. There's also some training in torchao.

Downsides:

  • The compiler is complicated. One of the things we've slowly been coming to terms with is that, uh, maybe promising you could just slap torch.compile on a model and have it run faster was overselling the feature a teensy bit? There seems to be some irreducible complexity with compilers that any user bringing their own model to torch.compile has to grapple with. So yes, you are going to spend some of your complexity budget on torch.compile, in hopes that the payoff is worth it (we think it is!). One ameliorating factor is that the design of torch.compile (graph breaks) means it is very easy to incrementally introduce torch.compile into a codebase, without having to do a ton of upfront investment.
  • Compile time can be long. The compiler is not a straightforward unconditional win. Even if the compiler doesn't slow down your code (which it can, in pathological cases), you have to spend some amount of time compiling your model (investment), which you then have to make back by training the model more quickly (return). For very small experimentation jobs, or jobs that are simply crashing, the time spent compiling is just dead weight, increasing the overall time your job takes to run. (teaser: async compilation aims to solve this.) To make matters worse, if you are scheduling your job on systems that have preemption, you might end up repeatedly compiling over and over again every time your job gets rescheduled (teaser: caching aims to solve this.) But even when you do spend some time training, it is not obvious without an A/B test whether or not you are actually getting a good ROI. In an ideal world, everyone using torch.compile would actually verify this ROI calculation, but it doesn't happen automatically (teaser: automatic ROI calculation) and in large organizations we see people running training runs without even realizing torch.compile is enabled.
  • Numerics divergence from eager. Unfortunately, the compiler does not guarantee exact bitwise equivalence with eager code; we reserve the right to do things like select different matrix multiply algorithms with different numerics or eliminate unnecessary downcast/upcasts when fusing half precision compute together. The compiler is also complicated and can have bugs that can cause loss not to converge. Expect to also have to evaluate whether or not application of torch.compile affects accuracy. Fortunately, for most uses of compiler for training efficiency, the baseline is the eager model, so you can just run an ablation to figure out who is actually causing the accuracy problem. (This won't be true in a later use case when the compiler is load bearing, see below!)

Improve Python inference efficiency

Scenario: You've finished training your model and you want to deploy it for inference. Here, you want to improve the efficiency of inference to improve response latency or reduce the overall resource requirements of the system, so you can use less GPUs to serve the traffic you are receiving. Admittedly, it is fairly common to just use some other, more inference friendly systems (which I will decline to name by name lol) to serve the model. But let's say you can't rewrite the model in a more serving friendly language (e.g., because the model authors are researchers and they keep changing the model, or there's a firehose of models and you don't have the money to keep continuously porting each of them, or you depend on an ecosystem of libraries that are only available in CPython).

What to do: If Python can keep up with the CPU-side QPS requirements, a way of getting good performance without very much work is taking the Python model, applying torch.compile on it in the same way as you did in training and directly using this as your inference solution. Some tips that go beyond training:

  • Autotuning makes the most sense for inference. In training runs, you have a limited window (the lifetime of the training job) to get return on the investment you spent optimizing the model. In the serving regime, you can amortize over the entire lifetime of your model in inference, which is typically much longer. Therefore, expensive optimization modes like mode="max-autotune" are more likely to pay off!
  • Warmup inference processes before serving traffic to them. Because torch.compile is a just-in-time compiler, you will spend quite a bit of time compiling (even if you cache hit) at startup. If you have latency requirements, you will want to warmup a fresh process with a representative set of inputs so that you can make sure you trigger all of the compilation paths you need to hit. Caching will reduce compile time but not eliminate it.
  • Try skip_guard_eval_unsafe to reduce guard overhead. Dynamo guard overhead can be material in the inference case. If this is a problem, get a nightly and try skip_guard_eval_unsafe.

Open source examples: LLM serving on torch.compile is quite popular: vllm, sglang, tensorrt-llm, gpt-fast (this is technically not an E2E serving solution, but one of its primary reasons for existing is to serve as a starting point so you can build your own torch.compile based LLM inference stack on top of it). Stable diffusion models are also notable beneficiaries of torch.compile, e.g., diffusers.

Downsides:

  • Just in time compilation is a more complicated operational model. It would be better if you didn't have to warmup inference processes before serving traffic to them. Here, torch.compile has traded operational simplicity for ease of getting started. If you wanted to guarantee that compilation had already happened ahead of time, you have to instead commit to some sort of export-based flow (e.g., C++ GPU/CPU inference) below.
  • Model and dependency packaging in Python is unaddressed. You need to somehow package and deploy the actual Python code (and all its dependencies) which constitute the model; torch.compile doesn't address this problem at all (while torch.export does). If you are running a monorepo and do continuous pushes of your infra code, it can be organizationally complicated to ensure people don't accidentally break model code that is being shipped to production--it's very common to be asked if there's a way to "freeze" your model code so that the monorepo can move on. But with Python inference you have to solve this problem yourself, whether the solution is torch.package, Docker images, or something else.
  • Caches are not guaranteed to hit. Do you have to recompile the model every time you restart the inference process? Well, no, we have an Inductor and Triton (and an in-progress AOTAutograd) cache which in principle can cache all of the cubins that are generated by torch.compile. Most of the time, you can rely on this to reduce startup cost to Dynamo tracing the model only. However, the caches are not guaranteed to hit: there are rarer cases where we don't know how to compute the cache key for some feature a model is using, or the compiler is nondeterministic in a way that means the cache doesn't hit. You should file bugs for all of these issues as we are interested in fixing them, but we don't give a categorical guarantee that after you've compiled your inference program once, you won't have to compile it again. (And indeed, under torch.compile's user model, we can't, because the user code might be the root cause of the nondeterminism--imagine a model that is randomly sampling to decide what version of a model to run.)
  • Multithreading is currently buggy. It should, in principle, be possible to run torch.compile'd code from multiple threads in Python and get a speedup, especially when CUDA graphs or CPP wrapper is used. (Aside: Inductor's default compile target is "Python wrapper", where Inductor's individually generated Triton kernels are called from Python. In this regime, you may get in trouble due to the GIL; CUDA graphs and CPP wrapper, however, can release the GIL when the expensive work is being done.) However, it doesn't work. Track the issue at https://github.com/pytorch/pytorch/issues/136833

Like above, but the compiler is load bearing

Scenario: In both the cases above, we assumed that we had a preexisting eager model that worked, and we just wanted to make it faster. But you can also use the compiler in a load bearing way, where the model does not work without the compiler. Here are two common cases where this can occur:

  1. Performance: A compiler optimization that results in an asymptotic or large constant-factor improvement in performance can make a naive eager implementation, which would otherwise have been hopelessly slow, perform well. For example, SimpleFSDP chooses to apply no optimizations to the distributed collectives it issues, instead relying on the compiler to bucket and prefetch them for acceptable performance.
  2. Memory: A compiler optimization that reduces the memory usage of a model can allow you to fit a model or batch size that would otherwise OOM. Although we don't publicly expose APIs for doing so, you can potentially use the compiler to do things like force a certain memory budget when doing activation checkpointing, without requiring the user to manually specify what needs to be checkpointed.

What to do: Unlike the previous cases, where you took a preexisting model and slapped torch.compile on it, this sort of use of the compiler is more likely to arise from a codevelopment approach, where you use torch.compile while you build your model, and are constantly checking what the compiler does to the code you write. Some tips:

  • Don't be afraid to write your own optimization pass. Inductor supports custom FX optimization passes. torch.compile has done the work of getting your model into an optimizable form; you can take advantage of this to apply domain specific optimizations that Inductor may not support natively.

Open source examples: SimpleFSDP, as mentioned above. VLLM uses torch.compile to apply custom optimization passes. Although its implementation is considerably more involved than what you might reasonably expect a third party to implement, FlexAttention is a good example of a non-compiler feature that relies on the compiler in a load-bearing way for performance.

Downsides: Beyond the ones mentioned above:

  • You can no longer (easily) use eager as a baseline. This is not always true; for example, FlexAttention has an eager mode that runs everything unfused which can still be fast enough for small experiments. But if you have an accuracy problem, it may be hard to compare against an eager baseline if you OOM in that case! It turns out that it's really, really useful to have access to an eager implementation, so it's worth working harder to make sure that the eager implementation works, even if it is slow. (It's less clear how to do that with, e.g., a fancy global optimization based activation checkpointing strategy.)

Next time: Ways to use torch.export

by Edward Z. Yang at November 05, 2024 03:11 PM

Jeremy Gibbons

Alan Jeffrey, 1967–2024

My friend Alan Jeffrey passed away earlier this year. I described his professional life at a Celebration in Oxford on 2nd November 2024. This post is a slightly revised version of what I said.

Edinburgh, 1983–1987

I’ve known Alan for over 40 years—my longest-standing friend. We met at the University of Edinburgh in 1983, officially as computer science freshers together, but really through the clubs for science fiction and for role-playing games. Alan was only 16: like many in Scotland, he skipped the final school year for an earlier start at university. It surely helped that his school had no computers, so he wasted no time in transferring to a university that did. His brother David says that it also helped that he would then be able to get into the student union bars.

Oxford, 1987–1991

After Edinburgh, Alan and I wound up together again as freshers at the University of Oxford. We didn’t coordinate this; we independently and simultaneously applied to the same DPhil programme (Oxford’s name for the PhD). We were officemates for those 4 years, and shared a terraced hovel on St Mary’s Road in bohemian East Oxford with three other students for most of that time. He was clever, funny, kind, and serially passionate about all sorts of things. It was a privilege and a pleasure to have known him.

Alan had a career that spanned academia and industry, and he excelled at both. He described himself as a “semanticist”: using mathematics instead of English for precise descriptions of programming languages. He had already set out in that direction with his undergraduate project on concurrency under Robin Milner at Edinburgh; and he continued to work on concurrency for his DPhil under Bill Roscoe at Oxford, graduating in 1992.

Chalmers, 1991–1992

Alan spent the last year of his DPhil as a postdoc working for K V S Prasad at Chalmers University in Sweden. While there, he was assigned to host fellow Edinburgh alumna Carolyn Brown, visiting for an interview; Carolyn came bearing a bottle of malt whisky, as one does, which she and Alan proceeded to polish off together that evening.

Sussex, 1992–1999

Carolyn’s interview was successful; but by the time she arrived at Chalmers, Alan had left for a second postdoc under Matthew Hennessy at the University of Sussex. They worked together again when Carolyn was in turn hired as a lecturer at Sussex. In particular, they showed in 1994 that “string diagrams”—due to Roger Penrose and Richard Feynman in physics—provide a “fully abstract” calculus for hardware circuits, meaning that everything true of the diagrams is true of the hardware, and vice versa. This work foreshadowed a hot topic in the field of Applied Category Theory today.

Matthew essentially left Alan to his own devices: as Matthew put it, “something I was very happy with as he was an exceptional researcher”. Alan was soon promoted to a lectureship himself. He collaborated closely with Julian Rathke, then Matthew’s PhD student and later postdoc, on the Full Abstraction Factory project, developing a bunch more full abstraction results for concurrent and object-oriented languages. That fruitful collaboration continued even after Alan left Sussex.

DePaul, 1999–2004

Alan presented a paper A Fully Abstract Semantics for a Nondeterministic Functional Language with Monadic Types at the 1995 conference on Mathematical Foundations of Programming Semantics in New Orleans. I believe that this is where he met Karen Bernstein, who also had a paper. One thing led to another, and Alan took a one-year visiting position at DePaul University in Chicago in 1998, then formally left Sussex in 1999 for a regular Associate Professor position at DePaul. He lived in Chicago for the rest of his life.

Alan established the Foundations of Programming Languages research group at DePaul, attracting Radha Jagadeesan from Loyola, James Riely from Sussex, and Corin Pitcher from Oxford, working among other things on “relaxed memory”—modern processors don’t actually read and write their multiple levels of memory in the order you tell them to, when they can find quicker ways to do things concurrently and sometimes out of order.

James remembers showing Alan his first paper on relaxed memory, co-authored with Radha. Alan thought their approach was an “appalling idea”; the proper way was to use “event structures”, an idea from the 1980s. This turned in 2016 into a co-authored paper at LICS (Alan’s favourite conference), and what James considers his own best ever talk—an on-stage reenactment of the to and fro of their collaboration, sadly not recorded for posterity.

James was Alan’s most frequent collaborator over the years, with 14 joint papers. Their modus operandi was that, having identified a problem together, Alan would go off by himself and do some Alanny things, eventually coalescing on a solution, and choose an order of exposition, tight and coherent; this is about 40% of the life of the paper. But then there are various tweaks, extensions, corrections… Alan would never look at the paper again, and would be surprised years later to learn what was actually in it. However, Alan was always easy to work with: interested only in the truth, although it must be beautiful. He had a curious mix of modesty and egocentricity: always convinced he was right (and usually right that he was right). Still, he had no patience for boring stuff, especially university admin.

While at DePaul, Alan also had a significant collaboration with Andy Gordon from Microsoft on verifying security protocols. Their 2001 paper Authenticity by Typing for Security Protocols won a Test Of Time Award this year at the Symposium on Computer Security Foundations, “given to outstanding papers with enduring significance and impact”—recognition which happily Alan lived to see.

Bell Labs, 2004–2015

After the dot com crash in 2000, things got more difficult at DePaul, and Alan left in 2004 for Bell Labs, nominally as a member of technical staff in Naperville but actually part of a security group based at HQ in Murray Hill NJ. He worked on XPath, “a modal logic for XML”, with Michael Benedikt, now my databases colleague at Oxford. They bonded because only Alan and Michael lived in Chicago rather than the suburbs. Michael had shown Alan a recent award-winning paper in which Alan quickly spotted an error in a proof—an “obvious” and unproven lemma that turned out to be false—which led to their first paper together.

(A recurring pattern. Andy Gordon described Alan’s “uncanny ability to find bugs in arguments”: he found a type unsoundness bug in a released draft specification for Java, and ended up joining the standards committee to help fix it. And as a PhD examiner he “shockingly” found a subtle bug that unpicked the argument of half of the dissertation, necessitating major corrections: it took a brave student to invite Alan as examiner—or a very confident one.)

Michael describes Alan as an “awesome developer”. They once had an intern; it didn’t take long after the intern had left for Alan to discard the intern’s code and rewrite it from scratch. Alan was unusual in being able to combine Euro “abstract nonsense” and US engineering. Glenn Bruns, another Bell Labs colleague, said that “I think Alan was the only person I’ve met who could do theory and also low-level hackery”.

At Bell Labs Alan also worked with Peter Danielsen on the Web InterFace Language, WIFL for short: a way of embedding API descriptions in HTML. Peter recalls: “We spent a few months working together on the conceptual model. In the early stages of software development, however, Alan looked at what I’d written and said, “I wouldn’t do it that way at all!”, throwing it all away and starting over. The result was much better; and he inadvertently taught me a new way to think in JavaScript (including putting //Sigh… comments before unavoidable tedious code.)”

Mozilla Research, 2015–2020

The Bell Labs group dissolved in 2015, and Alan moved to Mozilla Research as a staff research engineer to work on Servo, a new web rendering engine in the under-construction programming language Rust.

For one of Alan’s projects at Mozilla, he took a highly under-specified part of the HTML specification about how web links and the back and forwards browser buttons should interact, created a formal model in Agda based on the existing specification, identified gaps in it as well as ways that major browsers did not match the model, then wrote it all up as a paper. Alan’s manager Josh Matthews recalls the editors of the HTML standard being taken aback by Alan’s work suddenly being dropped in their laps, but quickly appreciated how much more confidently they could make changes based on it.

Josh also recalled: “Similarly, any time other members of the team would talk about some aspect of the browser engine being safe as long as some property was upheld by the programmer, Alan would get twitchy. He had a soft spot for bad situations that looked like they could be made impossible by clever enough application of static types.”

In 2017 Alan made a rather surprising switch to working on augmented reality for the web, partly driven by internal politics at Mozilla. He took the lead on integrating Servo into the Magic Leap headset; the existing browser was clunky, the only way to interact with pages being via an awkward touchpad on the hand controller. This was not good enough for Alan: after implementing the same behaviour for Servo and finding it frustrating, he had several email exchanges with the Magic Leap developers, figured out how to access some interfaces that weren’t technically public but also were not actually private, and soon he proudly showed off a more natural laser pointer-style means of interacting with pages in augmented reality in Servo—to much acclaim from the team and testers.

Roblox, 2020–2024

Then in 2020, Mozilla’s funding stream got a lot more constrained, and Alan moved to the game platform company Roblox. Alan was a principal software engineer, and the language owner of Luau, “a fast, small, safe, gradually typed embeddable scripting language derived from Lua”, working on making the language easier to use, teach, and learn. Roblox supports more than two million “content creators”, mostly kids, creating millions of games a year; Alan’s goal was to empower them to build larger games with more characters.

The Luau product manager Bryan Nealer says that “people loved Alan”. Roblox colleagues appreciated his technical contributions: “Alan was meticulous in what he built and wrote at Roblox. He would stress not only the substance of his work, but also the presentation. His attention to detail inspired the rest of us!”; “One of the many wonderful things Alan did for us was to be the guy who could read the most abstruse academic research imaginable and translate it into something simple, useful, interesting, and even fun.” They also appreciated the more personal contributions: Alan led an internal paper reading group, meeting monthly to study some paper on programming or networking, but he also established the Roblox Book Club: “He was always thoughtful when discussing books, and challenged us to think about the text more deeply. He also had an encyclopedic knowledge of scifi. He recommended Iain M. Banks’s The Culture series to me, which has become my favorite scifi series. I think about him every time I pick up one of those books.”

Envoi

From my own perspective, one of the most impressive things about Alan is that he was impossible to pigeonhole: like Dr Who, he was continually regenerating. He explained to me that he got bored quickly with one area, and moved on to another. As well as his academic abilities, he was a talented and natural cartoonist: I still have a couple of the tiny fanzine comics he produced as a student.

Of course he did some serious science for his DPhil and later career: but he also took a strong interest in typography and typesetting. He digitized some beautiful Japanese crests for the chapter title pages of his DPhil dissertation. Alan dragged me into typography with him, a distraction I have enjoyed ever since. Among other projects, Alan and I produced a font containing some extra symbols so that we could use them in our papers, and named it St Mary's Road after our Oxford digs. And Alan produced a full blackboard bold font, complete with lowercase letters and punctuation: you can see some of it in the order of service. But Alan was not satisfied with merely creating these things; he went to all the trouble to package them up properly and get them included in standard software distributions, so that they would be available for everyone: Alan loved to build things for people to use. These two fonts are still in regular use 35 years later, and I'm sure they will be reminding us of him for a long time to come.

by jeremygibbons at November 05, 2024 01:13 PM

GHC Developer Blog

GHC 9.12.1-alpha2 is now available

Zubin Duggal - 2024-11-05

The GHC developers are very pleased to announce the availability of the second alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC's release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 05, 2024 12:00 AM

November 04, 2024

in Code

Functors to Monads: A Story of Shapes

For many years now I’ve been using a mental model and intuition that has guided me well for understanding and teaching and using functors, applicatives, monads, and other related Haskell abstractions, as well as for approaching learning new ones. Sometimes when teaching Haskell I talk about this concept and assume everyone already has heard it, but I realize that it’s something universal yet easy to miss depending on how you’re learning it. So, here it is: how I understand the Functor and other related abstractions and free constructions in Haskell.

The crux is this: instead of thinking about what fmap changes, ask: what does fmap keep constant?

This isn’t a rigorous understanding and isn’t going to explain every aspect about every Functor, and will probably only be useful if you already know a little bit about Functors in Haskell. But it’s a nice intuition trick that has yet to majorly mislead me.

The Secret of Functors

First of all, what is a Functor? A capital-F Functor, that is, the Haskell typeclass and abstraction. Ask a random Haskeller on the street and they’ll tell you that it’s something that can be “mapped over”, like a list or an optional. Maybe some of those random Haskellers will feel compelled to mention that this mapping should follow some laws…they might even list the laws. Ask them why these laws are so important and maybe you’ll spend a bit of time on this rhetorical street of Haskellers before finding one confident enough to give an answer.

So I’m going to make a bit of a tautological leap: a Functor gives you a way to “map over” values in a way that preserves shape. And what is “shape”? A shape is the thing that fmap preserves.

The Functor typeclass is simple enough: for Functor f, you have a function fmap :: (a -> b) -> f a -> f b, along with fmap id = id and fmap f . fmap g = fmap (f . g). Cute things you can drop into quickcheck to prove for your instance, but it seems like those laws are hiding some sort of deeper, fundamental truth.
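
As an aside, here is a minimal QuickCheck sketch of those two laws, specialized to the list instance (the property names and the concrete functions are made up for illustration):

import Test.QuickCheck

-- fmap id == id, checked at the list instance
prop_fmapIdentity :: [Int] -> Bool
prop_fmapIdentity xs = fmap id xs == id xs

-- fmap (f . g) == fmap f . fmap g, checked for two concrete functions
prop_fmapComposition :: [Int] -> Bool
prop_fmapComposition xs =
  fmap ((+ 1) . (* 2)) xs == (fmap (+ 1) . fmap (* 2)) xs

main :: IO ()
main = do
  quickCheck prop_fmapIdentity
  quickCheck prop_fmapComposition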

The more Functors you learn about, the more you see that fmap seems to always preserve “something”:

  • For lists, fmap preserves length and relative orderings.
  • For optionals (Maybe), fmap preserves presence (the fact that something is there or not). It cannot flip a Just to a Nothing or vice versa.
  • For Either e, fmap preserves the error (if it exists) or the fact that it was successful.
  • For Map k, fmap preserves the keys: which keys exist, how many there are, their relative orderings, etc.
  • For IO, fmap preserves the IO effect. Every bit of external I/O that an IO action represents is unchanged by an fmap, as well as exceptions.
  • For Writer w or (,) w, fmap preserves the “logged” w value, leaving it unchanged. Same for Const w.
  • For Tree, fmap preserves the tree structure: how many layers, how big they are, how deep they are, etc.
  • For State s, fmap preserves what happens to the input state s. How a State s transforms a state value s is unchanged by fmap.
  • For ConduitT i o m from conduit, fmap preserves what the conduit pulls upstream and what it yields downstream. fmap will not cause the conduit to yield more or different objects, nor cause it to consume/pull more or less.
  • For parser-combinator Parser, fmap preserves what input is consumed or would fail to be consumed. fmap cannot change whether an input string would fail or succeed, and it cannot change how much it consumes.
  • For optparse-applicative Parsers, fmap preserves the command line arguments available. It leaves the --help message of your program unchanged.

It seems like as soon as you define a Functor instance, or as soon as you find out that some type has a Functor instance, it magically induces some sort of … “thing” that must be preserved.1 A conserved quantity must exist. It reminds me a bit of Noether’s Theorem in Physics, where any continuous symmetry “induces” a conserved quantity (like how translation symmetry “causes” conservation of momentum). In Haskell, every lawful Functor instance induces a conserved quantity. I don’t know if there is a canonical name for this conserved quantity, but I like to call it “shape”.
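
To make a couple of those conserved quantities concrete, here is a quick GHCi check, using length for lists and isJust for Maybe:

ghci> length (fmap negate [1,2,3])
3
ghci> import Data.Maybe (isJust)
ghci> isJust (fmap negate (Just 5))
True
ghci> isJust (fmap negate Nothing)
False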

A Story of Shapes

The word “shape” is chosen to be as devoid of external baggage/meaning as possible while still having some. The word isn’t important as much as saying that there is some “thing” preserved by fmap, and not exactly the nature of that “thing”. The nature of that thing changes a lot from Functor to Functor, where we might better call it an “effect” or a “structure” specifically, but that some “thing” exists is almost universal.

Of course, the value of this “thing” having a canonical name at all is debatable. If I were to coin a completely new term I might call it a “conserved charge” or “gauge” in allusion to physics. But the most useful name probably would be shape.

For some Functor instances, the word shape is more literal than others. For trees, for instance, you have the literal shape of the tree preserved. For lists, the “length” could be considered a literal shape. Map k’s shape is also fairly literal: it describes the structure of keys that exist in the map. But for Writer w and Const w, shape can be interpreted as some information outside of the values you are mapping that is left unchanged by mapping. For Maybe and Either e shape also considers if there has been any short-circuiting. For State s and IO and Parser, “shape” involves some sort of side-computation or consumption that is left unchanged by fmap, often called an effect. For optparse-applicative, “shape” involves some sort of inspectable and observable static aspects of a program. “Shape” comes in all forms.

But, this intuition of “looking for that conserved quantity” is very helpful for learning new Functors. If you stumble onto a new type that you know is a Functor instance, you can immediately ask “What shape is this fmap preserving?”, and it will almost always yield insight into that type.

This viewpoint also sheds light on why Set.map isn’t a good candidate for fmap for Data.Set: What “thing” does Set.map f preserve? Not size, for sure. In a hypothetical world where we had ordfmap :: Ord b => (a -> b) -> f a -> f b, we would still need Set.map to preserve something for it to be useful as an “Ord-restricted Functor”.2
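
A quick GHCi session makes the point: Set.map does not even preserve size, let alone any finer structure.

ghci> import qualified Data.Set as Set
ghci> Set.size (Set.fromList [1,2,3,4])
4
ghci> Set.size (Set.map (`mod` 2) (Set.fromList [1,2,3,4]))
2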

A Result

Before we move on, let’s look at another related and vague concept that is commonly used when discussing functors: fmap is a way to map a function that preserves the shape and changes the result.

If shape is the thing that is preserved by fmap, result is the thing that is changed by it. fmap cleanly splits the two.

Interestingly, most introductions to Functors begin by describing functor values as having a result and fmap as the thing that changes it, in some way. Ironically, though it’s a more common term, it’s by far the more vague and hard-to-intuit concept.

For something like Maybe, “result” is easy enough: it’s the value present if it exists. For parser-combinator Parsers too it’s relatively simple: the “shape” is the input consumed but the “result” is the Haskell value you get as a result of the consumption. For an optparse-applicative parser, it’s the actual parsed command line arguments given by the user at runtime. But sometimes it’s more complicated: for the technical List functor, the “non-determinism” functor, the “shape” is the number of options to choose from and the order you get them in, and the “result” (to use precise semantics) is the non-deterministic choice that you eventually pick or iterate over.

So, the “result” can become a bit confusing to generalize. In my mind, I usually reduce the definitions to:

  • Shape: the “thing” that fmap preserves: the f in f a
  • Result: the “thing” that fmap changes: the a in f a

With this you could “derive” the Functor laws:

  • fmap id == id: fmap leaves the shape unchanged, id leaves the result unchanged. So entire thing must remain unchanged!
  • fmap f . fmap g == fmap (f . g). In both cases the shape remains unchanged, but one changes the result by f after g, and the other changes the result by f . g. They must be the same transformation!

All neat and clean, right? So, maybe the big misdirection is focusing too much on the “result” when learning Functors, when we should really be focusing more on the “shape”, or at least the two together.

Once you internalize “Functor gives you shape-preservation”, this helps you understand the value of the other common typeclass abstractions in Haskell as well, and how they function based on how they manipulate “shape” and “result”.

Traversable

For example, what does the Traversable typeclass give us? Well, if Functor gives us a way to map pure functions and preserve shape, then Traversable gives us a way to map effectful functions and preserve shape.

Whenever someone asks me about my favorite Traversable instance, I always say it’s the Map k traversable:

traverse :: Applicative f => (a -> f b) -> Map k a -> f (Map k b)

Notice how it has no constraints on k? Amazing, isn’t it? It lets us map an (a -> f b) over the values at each key in a map and collect the results under the key the a was originally under.

In essence, you can be assured that the result map has the same keys as the original map, perfectly preserving the “shape” of the map. The Map k instance is the epitome of beautiful Traversable instances. We can recognize this by identifying the “shape” that traverse is forced to preserve.
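
Here is a small GHCi sketch of that key preservation, using a made-up validate function that fails on negative values:

ghci> import qualified Data.Map as Map
ghci> let validate v = if v >= 0 then Just v else Nothing
ghci> traverse validate (Map.fromList [("a",1),("b",2)])
Just (fromList [("a",1),("b",2)])
ghci> traverse validate (Map.fromList [("a",1),("b",-2)])
Nothing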

Applicative

What does the Applicative typeclass give us? It has ap and pure, but its laws are infamously difficult to understand.

But, look at liftA2 (,):

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

It lets us take “two things” and combine their shapes. And, more importantly, it combines the shapes without considering the results.

  • For Writer w, <*> lets us combine the two logged values using mappend while ignoring the actual a/b results.
  • For list, <*> (the cartesian product) lets us multiply the lengths of the input lists together. The length of the new list ignores the actual contents of the list.
  • For State s, <*> lets you compose the s -> s state functions together, ignoring the a/bs.
  • For Parser, <*> lets you sequence input consumption in a way that doesn’t depend on the actual values you parse: it’s “context-free” in a sense, aside from some caveats.
  • For optparse-applicative, <*> lets you combine your command line argument specs together, without depending on the actual values provided at runtime by the caller.

The key takeaway is that the “final shape” only depends on the input shapes, and not the results. You can know the length of <*>-ing two lists together with only knowing the length of the input lists, and you can also know the relative ordering of inputs to outputs. Within the specific context of the semantics of IO, you can know what “effect” <*>-ing two IO actions would produce only knowing the effects of the input IO actions3. You can know what command line arguments <*>-ing two optparse-applicative parsers would have only knowing the command line arguments in the input parsers. You can know what strings <*>-ing two parser-combinator parsers would consume or reject, based only on the consumption/rejection of the input parsers. You can know the final log of <*>-ing two Writer w actions together by only knowing the logs of the input writer actions.

And hey…some of these combinations feel “monoidal”, don’t they?

  • Writer w sequences using mappend
  • List lengths sequence by multiplication
  • State s functions sequence by composition

You can also imagine “no-op” actions:

  • Writer w’s no-op action would log mempty, the identity of mappend
  • List’s no-op action would have a length 1, the identity of multiplication
  • State s’s no-op action would be id, the identity of function composition

That might sound familiar — these are all pure from the Applicative typeclass!

So, the Applicative typeclass laws aren’t that mysterious at all. If you understand the “shape” that a Functor induces, Applicative gives you a monoid on that shape! This is why Applicative is often called the “higher-kinded” Monoid.

This intuition takes you pretty far, I believe. Look at the examples above where we clearly identify specific Applicative instances with specific Monoid instances (Monoid w, Monoid (Product Int), Monoid (Endo s)).

Put in code:

-- A part of list's shape is its length and the monoid is (*, 1)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1

-- Maybe's shape is isJust and the monoid is (&&, True)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True

-- State's shape is execState and the monoid is (flip (.), id)
execState (sx <*> sy) == execState sy . execState sx
execState (pure r) == id

-- Writer's shape is execWriter and the monoid is (<>, mempty)
execWriter (wx <*> wy) == execWriter wx <> execWriter wy
execWriter (pure r) == mempty

We can also extend this to non-standard Applicative instances: the ZipList newtype wrapper gives us an Applicative instance for lists where <*> is zipWith. These two have the same Functor instances, so their “shape” (length) is the same. And for both the normal Applicative and the ZipList Applicative, you can know the length of the result based on the lengths of the input, but ZipList combines shapes using the Min monoid, instead of the Product monoid. And the identity of Min is positive infinity, so pure for ZipList is an infinite list.

-- A part of ZipList's shape is length and its monoid is (min, infinity)
length (xs <*> ys) == length xs `min` length ys
length (pure r) == infinity

The “know-the-shape-without-knowing-the-results” property is actually leveraged by many libraries. It’s how optparse-applicative can give you --help output: the shape of the optparse-applicative parser (the command line arguments list) can be computed without knowing the results (the actual arguments themselves at runtime). You can list out what arguments are expecting without ever getting any input from the user.
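
As a rough sketch of that idea (the program and its options below are invented for illustration, not taken from any real project), the parser’s shape fully determines its --help text before the user supplies anything:

import Options.Applicative

data Opts = Opts { name :: String, loud :: Bool }

-- The "shape" of this parser (its options and help strings) is known
-- statically, independent of whatever the user eventually passes in.
opts :: Parser Opts
opts =
  Opts
    <$> strOption (long "name" <> metavar "NAME" <> help "Who to greet")
    <*> switch (long "loud" <> help "Shout the greeting")

main :: IO ()
main = do
  o <- execParser (info (opts <**> helper) fullDesc)
  putStrLn ((if loud o then "HELLO, " else "hello, ") <> name o)

Running this with --help lists --name and --loud (with their help strings) without the parser ever producing a “result”.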

This is also leveraged by the async library to give us the Concurrently Applicative instance. Normally <*> for IO gives us sequential combination of IO effects. But, <*> for Concurrently gives us parallel combination of IO effects. We can launch all of the IO effects in parallel at the same time because we know what the IO effects are before we actually have to execute them to get the results. If we needed to know the results, this wouldn’t be possible.
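For instance, here is a minimal sketch using Concurrently from the async package; fetchA and fetchB stand in for any two independent IO actions:

import Control.Concurrent.Async (Concurrently (..))

-- run two IO actions at the same time and pair up their results
fetchBoth :: IO String -> IO String -> IO (String, String)
fetchBoth fetchA fetchB =
  runConcurrently $ (,) <$> Concurrently fetchA <*> Concurrently fetchB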

This also gives some insight into the Backwards Applicative wrapper — because the final shape does not depend on the result of either input, we are free to combine the shapes in whatever order we want. In the same way that every monoid gives rise to a “backwards” monoid:

ghci> "hello" <> "world"
"helloworld"
ghci> getDual $ Dual "hello" <> Dual "world"
"worldhello"

Every Applicative gives rise to a “backwards” Applicative that does the shape “mappending” in reverse order:

ghci> putStrLn "hello" *> putStrLn "world"
hello
world
ghci> forwards $ Backwards (putStrLn "hello") *> Backwards (putStrLn "world")
world
hello

The monoidal nature of Applicative with regards to shapes and effects is the heart of the original intent, and I’ve discussed this in earlier blog posts.

Alternative

The main function of the Alternative typeclass is <|>:

(<|>) :: Alternative f => f a -> f a -> f a

At first this might look a lot like <*> or liftA2 (,):

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

Both of them take two f a values and squish them into a single one. Both of these are also monoidal on the shape, independent of the result. The shapes, however, combine under a different monoid for <|> than for <*>:

-- A part of list's shape is its length:
-- the Ap monoid is (*, 1), the Alt monoid is (+, 0)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1
length (xs <|> ys) == length xs + length ys
length empty == 0

-- Maybe's shape is isJust:
-- The Ap monoid is (&&, True), the Alt monoid is (||, False)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True
isJust (mx <|> my) == isJust mx || isJust my
isJust empty == False

If we understand that functors have a “shape”, Applicative implies that the shapes are monoidal, and Alternative implies that the shapes are a “double-monoid”. The exact nature of how the two monoids relate to each other, however, is not universally agreed upon. For many instances it does happen to form a semiring, where empty “annihilates” via empty <*> x == empty, and <*> distributes over <|> like (x <|> y) <*> z == (x <*> z) <|> (y <*> z). But this is not universal.
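For example, a quick ghci check of those two identities for lists (assuming Control.Applicative is imported for empty and <|>):

ghci> import Control.Applicative
ghci> (empty :: [Int -> Int]) <*> [1,2,3]
[]
ghci> ([(+1)] <|> [(*10)]) <*> [5,6]
[6,7,50,60]
ghci> ([(+1)] <*> [5,6]) <|> ([(*10)] <*> [5,6])
[6,7,50,60]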

However, what does Alternative bring to our shape/result dichotomy that Applicative did not? Notice the subtle difference between the two:

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
(<|>) :: Alternative f => f a -> f a -> f a

For Applicative, the “result” comes from the results of both inputs. For Alternative, the “result” could come from one or the other input. So, this introduces a fundamental data dependency for the results:

  • Applicative: Shapes merge monoidally independent of the results, but to get the result of the final, you need to produce the results of both of the two inputs in the general case.
  • Alternative: Shapes merge monoidally independent of the results, but to get the result of the final, you need the results of one or the other input in the general case.

This also implies that the choice of combination method for shapes in Applicative vs Alternative isn’t arbitrary: the former has to be “conjoint” in a sense, and the latter has to be “disjoint”.
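A quick ghci illustration with Maybe: <*> needs both results (a Nothing on either side wipes out the result), while <|> only needs one of them (again assuming Control.Applicative is imported for <|>):

ghci> Just (+1) <*> (Nothing :: Maybe Int)
Nothing
ghci> Just 1 <|> (Nothing :: Maybe Int)
Just 1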

See again that clearly separating the shape and the result gives us the vocabulary to say precisely what the different data dependencies are.

Monad

Understanding shapes and results also helps us appreciate the sheer power that Monad gives us. Look at >>=:

(>>=) :: Monad m => m a -> (a -> m b) -> m b

Using >>= means that the shape of the final action is allowed to depend on the result of the first action! We are no longer in the Applicative/Alternative world where shape only depends on shape.

Now we can write things like:

greet :: IO ()
greet = do
  putStrLn "What is your name?"
  n <- getLine
  putStrLn ("Hello, " ++ n ++ "!")

Remember that for “IO”, the shape is the IO effects (in this case, what exactly gets sent to the terminal) and the “result” is the Haskell value computed from the execution of that IO effect. In our case, the action of the result (what values are printed) depends on the result of the intermediate action (the getLine). You can no longer know in advance what action the program will have without actually running it and getting the results.

The same thing happens when you start sequencing parser-combinator parsers: you can’t know what counts as a valid parse or how much a parser will consume until you actually start parsing and getting your intermediate parse results.
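As a concrete sketch, here is a tiny ReadP parser (ReadP ships with base) whose consumption depends on an intermediate result: it reads a digit n and then consumes exactly n more characters, so you can’t know how much it will consume without running it:

import Data.Char (digitToInt, isDigit)
import Text.ParserCombinators.ReadP

-- consume a digit n, then exactly n more characters
lengthPrefixed :: ReadP String
lengthPrefixed = do
  n <- digitToInt <$> satisfy isDigit
  count n get

ghci> readP_to_S lengthPrefixed "3abcdef"
[("abc","def")]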

Monad is also what makes guard and co. useful. Consider the purely Applicative:

evenProducts :: [Int] -> [Int] -> [Bool]
evenProducts xs ys = (\x y -> even (x * y)) <$> xs <*> ys

If you passed in a list of 100 items and a list of 200 items, you can know that the result has 100 * 200 = 20000 items, without actually knowing any of the items in the list.

But, consider an alternative formulation where we are allowed to use Monad operations:

evenProducts :: [Int] -> [Int] -> [(Int, Int)]
evenProducts xs ys = do
  x <- xs
  y <- ys
  guard (even (x * y))
  pure (x, y)

Now, even if you knew the lengths of the input lists, you cannot know the length of the output list without actually knowing what’s inside your lists. You need to actually start “sampling”.
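A quick check, using the monadic evenProducts above: both calls below get inputs of length 2 and 2, but the output lengths differ depending on the values inside the lists:

ghci> length (evenProducts [1,2] [3,4])
3
ghci> length (evenProducts [1,3] [5,7])
0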

That’s why there is no Monad instance for Backwards or for optparse-applicative parsers. Backwards doesn’t work because we’ve now introduced an asymmetry (the m b depends on the a of the m a) that can’t be reversed. For optparse-applicative, it’s because we want to be able to inspect the shape without knowing the results at runtime (so we can show a useful --help without getting any actual arguments): but, with Monad, we can’t know the shape without knowing the results!

In a way, Monad simply “is” the way to combine Functor shapes together where the final shape is allowed to depend on the results. Hah, I tricked you into reading a monad tutorial!

Free Structures

I definitely write way too much about free structures on this blog. But this “shapeful” way of thinking also sheds light on why free structures are so compelling and interesting to work with in Haskell.

Before, we were describing shapes of Functors and Applicatives and Monads that already existed. We had this Functor, what was its shape?

However, what if we had a shape that we had in mind, and wanted to create an Applicative or Monad that manipulated that shape?

For example, let’s roll our own version of optparse-applicative that only supports --myflag somestring options. We could say that the “shape” is the list of supported options and their parsers. So a single element of this shape would be the specification of a single option:

data Option a = Option { optionName :: String, optionParse :: String -> Maybe a }
  deriving Functor

The “shape” here is the name and also what values it would parse, essentially. fmap won’t affect the name of the option and won’t affect what would succeed or fail.

Now, to create a full-fledged multi-argument parser, we can use Ap from the free library:

type Parser = Ap Option

We specified the shape we wanted, now we get the Applicative of that shape for free! We can now combine our shapes monoidally using the <*> instance, and then use runAp_ to inspect it:

data Args = Args { myStringOpt :: String, myIntOpt :: Int }

parseTwo :: Parser Args
parseTwo = Args <$> liftAp stringOpt <*> liftAp intOpt
  where
    stringOpt = Option "string-opt" Just
    intOpt = Option "int-opt" readMaybe

getAllOptions :: Parser a -> [String]
getAllOptions = runAp_ (\o -> [optionName o])

ghci> getAllOptions parseTwo
["string-opt", "int-opt"]

Remember that Applicative is like a “monoid” for shapes, so Ap gives you a free “monoid” on your custom shape: you can now create list-like “sequences” of your shape that merge via concatenation through <*>. You can also know that fmap on Ap Option will not add or remove options: it’ll leave the actual options unchanged. It’ll also not affect what options would fail or succeed to parse.
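A quick check of that last claim, reusing parseTwo from above (fmap can’t add or remove options):

ghci> getAllOptions (fmap myStringOpt parseTwo)
["string-opt", "int-opt"]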

You could write a parser combinator library this way too! Remember that the “shape” of a parser combinator Parser is the string that it consumes or rejects. The single element might be a parser that consumes or rejects a single Char:

newtype Single a = Single { satisfies :: Char -> Maybe a }
  deriving Functor

The “shape” is whether or not it consumes or rejects a char. Notice that fmap for this cannot change whether or not a char is rejected or accepted: it can only change the Haskell result value a. fmap can’t flip the Maybe between Just and Nothing.

Now we can create a full monadic parser combinator library by using Free from the free library:

type Parser = Free Single

Again, we specified the shape we wanted, and now we have a Monad for that shape! For more information on using this, I’ve written a blog post in the past. Ap gives you a free “monoid” on your shapes, but in a way Free gives you a “tree” for your shapes, where the sequence of shapes depends on which way you go down their results. And, again, fmap won’t ever change what would or would not be parsed.
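For instance, here is a sketch of primitive parsers built by lifting Single into the free monad (assuming liftF from Control.Monad.Free in the free package):

-- lift a single-character check into the free monad
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = liftF (Single (\c -> if p c then Just c else Nothing))

char :: Char -> Parser Char
char c = satisfy (== c)

anyChar :: Parser Char
anyChar = satisfy (const True)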

How do we know what free structure to pick? Well, we ask questions about what we want to be able to do with our shape. If we want to inspect the shape without knowing the results, we’d use the free Applicative or free Alternative. As discussed earlier, using the free Applicative means that our final result must require producing all of the input results, but using the free Alternative means it doesn’t. If we wanted to allow the shape to depend on the results (like for a context-sensitive parser), we’d use the free Monad. Understanding the concept of the “shape” makes this choice very intuitive.

The Shape of You

Next time you encounter a new Functor, I hope these insights can be useful. Ask yourself, what is fmap preserving? What is fmap changing? And from there, its secrets will unfold before you. Emmy Noether would be proud.

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)


  1. There are some exceptions, especially degenerate cases like Writer () aka Identity which add no meaningful structure. So for these this mental model isn’t that useful.↩︎

  2. Incidentally, Set.map does preserve one thing: non-emptiness. You can’t Set.map an empty set into a non-empty one and vice versa. So, maybe if we recontextualized Set as a “search for at least one result” Functor or Monad where you could only ever observe a single value, Set.map would work for Ord-restricted versions of those abstractions, assuming lawful Ord instances.↩︎

  3. That is, if we take the sum consideration of all input-output with the outside world, independent of what happens within the Haskell results, we can say the combination of effects is deterministic.↩︎

by Justin Le at November 04, 2024 07:44 PM

November 03, 2024

Haskell Interlude

57: Gabriele Keller

Gabriele Keller, professor at Utrecht University, is interviewed by Andres and Joachim. We follow her journey around the world as well as through programming languages, learn why Haskell is the best environment for embedding languages, how the desire to implement parallel programming sparked the development of type families in Haskell, and that teaching functional programming works better with graphics.

by Haskell Podcast at November 03, 2024 08:00 PM

November 02, 2024

Brent Yorgey

Competitive Programming in Haskell: Union-Find

Posted on November 2, 2024

Union-find

A union-find data structure (also known as a disjoint set data structure) keeps track of a collection of disjoint sets, typically with elements drawn from \(\{0, \dots, n-1\}\). For example, we might have the sets

\(\{1,3\}, \{0, 4, 2\}, \{5, 6, 7\}\)

A union-find structure must support three basic operations:

  • We can \(\mathit{create}\) a union-find structure with \(n\) singleton sets \(\{0\}\) through \(\{n-1\}\). (Alternatively, we could support two operations: creating an empty union-find structure, and adding a new singleton set; occasionally this more fine-grained approach is useful, but we will stick with the simpler \(\mathit{create}\) API for now.)

  • We can \(\mathit{find}\) a given \(x \in \{0, \dots, n-1\}\), returning some sort of “name” for the set \(x\) is in. It doesn’t matter what these names are; the only thing that matters is that for any \(x\) and \(y\), \(\mathit{find}(x) = \mathit{find}(y)\) if and only if \(x\) and \(y\) are in the same set. The most important application of \(\mathit{find}\) is therefore to check whether two given elements are in the same set or not.

  • We can \(\mathit{union}\) two elements, so the sets that contain them become one set. For example, if we \(\mathit{union}(2,5)\) then we would have

    \(\{1,3\}, \{0, 4, 2, 5, 6, 7\}\)

Note that \(\mathit{union}\) is a one-way operation: once two sets have been unioned together, there’s no way to split them apart again. (If both merging and splitting are required, one can use a link/cut tree, which is very cool—and possibly something I will write about in the future—but much more complex.) However, these three operations are enough for union-find structures to have a large number of interesting applications!

In addition, we can annotate each set with a value taken from some commutative semigroup. When creating a new union-find structure, we must specify the starting value for each singleton set; when unioning two sets, we combine their annotations via the semigroup operation.

  • For example, we could annotate each set with its size; singleton sets always start out with size 1, and every time we union two sets we add their sizes.
  • We could also annotate each set with the sum, product, maximum, or minimum of all its elements.
  • Of course there are many more exotic examples as well.

We typically use a commutative semigroup, as in the examples above; this guarantees that a given set always has a single well-defined annotation value, regardless of the sequence of union-find operations that were used to create it. However, we can actually use any binary operation at all (i.e. any magma), in which case the annotations on a set may reflect the precise tree of calls to \(\mathit{union}\) that were used to construct it; this can occasionally be useful.

  • For example, we could annotate each set with a list of values, and combine annotations using list concatenation; the order of elements in the list associated to a given set will depend on the order of arguments to \(\mathit{union}\).

  • We could also annotate each set with a binary tree storing values at the leaves. Each singleton set is annotated with a single leaf; to combine two trees we create a new branch node with the two trees as its children. Then each set ends up annotated with the precise tree of calls to \(\mathit{union}\) that were used to create it.

Implementing union-find

My implementation is based on one by Kwang Yul Seo, but I have modified it quite a bit. The code is also available in my comprog-hs repository. This blog post is not intended to be a comprehensive union-find tutorial, but I will explain some things as we go.

{-# LANGUAGE RecordWildCards #-}

module UnionFind where

import Control.Monad (when)
import Control.Monad.ST
import Data.Array.ST

Let’s start with the definition of the UnionFind type itself. UnionFind has two type parameters: s is a phantom type parameter used to limit the scope to a given ST computation; m is the type of the arbitrary annotations. Note that the elements are also sometimes called “nodes”, since, as we will see, they are organized into trees.

type Node = Int
data UnionFind s m = UnionFind {

The basic idea is to maintain three mappings:

  • First, each element is mapped to a parent (another element). There are no cycles, except that some elements can be their own parent. This means that the elements form a forest of rooted trees, with the self-parenting elements as roots. We store the parent mapping as an STUArray (see here for another post where we used STUArray) for efficiency.
  parent :: !(STUArray s Node Node),
  • Each element is also mapped to a size. We maintain the invariant that for any element which is a root (i.e. any element which is its own parent), we store the size of the tree rooted at that element. The size associated to other, non-root elements does not matter.

    (Many implementations store the height of each tree instead of the size, but it does not make much practical difference, and the size seems more generally useful.)

  sz :: !(STUArray s Node Int),
  • Finally, we map each element to a custom annotation value; again, we only care about the annotation values for root nodes.
  ann :: !(STArray s Node m) }

To \(\mathit{create}\) a new union-find structure, we need a size and a function mapping each element to an initial annotation value. Every element starts as its own parent, with a size of 1. For convenience, we can also make a variant of createWith that gives every element the same constant annotation value.

createWith :: Int -> (Node -> m) -> ST s (UnionFind s m)
createWith n m =
  UnionFind
    <$> newListArray (0, n - 1) [0 .. n - 1]    -- Every node is its own parent
    <*> newArray (0, n - 1) 1                   -- Every node has size 1
    <*> newListArray (0, n - 1) (map m [0 .. n - 1])

create :: Int -> m -> ST s (UnionFind s m)
create n m = createWith n (const m)

To perform a \(\mathit{find}\) operation, we keep following parent references up the tree until reaching a root. We can also do a cool optimization known as path compression: after finding a root, we can directly update the parent of every node along the path we just traversed to be the root. This means \(\mathit{find}\) can be very efficient, since it tends to create trees that are extremely wide and shallow.

find :: UnionFind s m -> Node -> ST s Node
find uf@(UnionFind {..}) x = do
  p <- readArray parent x
  if p /= x
    then do
      r <- find uf p
      writeArray parent x r
      pure r
    else pure x

connected :: UnionFind s m -> Node -> Node -> ST s Bool
connected uf x y = (==) <$> find uf x <*> find uf y

Finally, to implement \(\mathit{union}\), we find the roots of the given nodes; if they are not the same we make the root with the smaller tree the child of the other root, combining sizes and annotations as appropriate.

union :: Semigroup m => UnionFind s m -> Node -> Node -> ST s ()
union uf@(UnionFind {..}) x y = do
  x <- find uf x
  y <- find uf y
  when (x /= y) $ do
    sx <- readArray sz x
    sy <- readArray sz y
    mx <- readArray ann x
    my <- readArray ann y
    if sx < sy
      then do
        writeArray parent x y
        writeArray sz y (sx + sy)
        writeArray ann y (mx <> my)
      else do
        writeArray parent y x
        writeArray sz x (sx + sy)
        writeArray ann x (mx <> my)

Note the trick of writing x <- find uf x: this looks kind of like an imperative statement that updates the value of a mutable variable x, but really it just makes a new variable x which shadows the old one.

Finally, a few utility functions. First, one to get the size of the set containing a given node:

size :: UnionFind s m -> Node -> ST s Int
size uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray sz x

Also, we can provide functions to update and fetch the custom annotation value associated to the set containing a given node.

updateAnn :: Semigroup m => UnionFind s m -> Node -> m -> ST s ()
updateAnn uf@(UnionFind {..}) x m = do
  x <- find uf x
  old <- readArray ann x
  writeArray ann x (old <> m)
  -- We could use modifyArray above, but the version of the standard library
  -- installed on Kattis doesn't have it

getAnn :: UnionFind s m -> Node -> ST s m
getAnn uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray ann x
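As a small usage sketch (not from the original post), here is one way to count the connected components of an n-node graph with the API above, using a trivial () annotation:

import Control.Monad.ST (runST)
import Data.List (nub)

-- count connected components of an n-node graph given its edges
components :: Int -> [(Node, Node)] -> Int
components n edges = runST $ do
  uf <- create n ()
  mapM_ (uncurry (union uf)) edges
  length . nub <$> mapM (find uf) [0 .. n - 1]

For example, components 5 [(0,1),(1,2),(3,4)] evaluates to 2.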

Challenge

Here are a couple of problems I challenge you to solve for next time:


by Brent Yorgey at November 02, 2024 12:00 AM

October 31, 2024

Abhinav Sarkar

Going REPLing with Haskeline

So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a REPL! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.

This post was originally published on abhinavsarkar.net.

The Demo

First a short demo:

[Play demo — interactive recording available in the original post]

That is a pretty good REPL, isn’t it? You can even try it online1, running entirely in your browser.

Dawn of a New Language

Let’s assume that we have created a new small Lisp2, just large enough to be able to conveniently write and run the Fibonacci function that returns the nth Fibonacci number. That’s it, nothing more. This lets us focus on the features of the REPL3, not the language.

We have a parser to parse the code from text to an AST, and an interpreter that evaluates an AST and returns a value. We are not going into the details of the parser and the interpreter, just listing the type signatures of the functions they provide is enough for this post.

Let’s start with the AST:

module Language.FiboLisp.Types where

import Data.Text qualified as Text
import Data.Text.Lazy qualified as LText
import Text.Pretty.Simple qualified as PS
import Text.Printf (printf)

type Ident = String

data Expr
  = Num_ Integer
  | Bool_ Bool
  | Var Ident
  | BinaryOp Op Expr Expr
  | If Expr Expr Expr
  | Apply Ident [Expr]
  deriving (Show)

data Op = Add | Sub | LessThan
  deriving (Show, Enum)

data Def = Def {defName :: Ident, defParams :: [Ident], defBody :: Expr}

data Program = Program [Def] [Expr]
  deriving (Show)

carKeywords :: [String]
carKeywords = ["def", "if", "+", "-", "<"]

instance Show Def where
  show Def {..} =
    printf "(Def %s [%s] (%s))" defName (unwords defParams) (show defBody)

showProgram :: Program -> String
showProgram =
  Text.unpack
    . LText.toStrict
    . PS.pShowOpt
      ( PS.defaultOutputOptionsNoColor
          { PS.outputOptionsIndentAmount = 2,
            PS.outputOptionsCompact = True,
            PS.outputOptionsCompactParens = True
          }
      )

That’s right! We named our little language FiboLisp.

FiboLisp is expression oriented; everything is an expression. So naturally, we have an Expr AST. Writing the Fibonacci function does not require many syntactic facilities. In FiboLisp we have:

  • integer numbers,
  • booleans,
  • variables,
  • addition, subtraction, and less-than binary operations on numbers,
  • conditional if expressions, and
  • function calls by name.

We also have function definitions, captured by Def, which records the function name, its parameter names, and its body as an expression.

And finally we have Programs, which are a bunch of function definitions to define, and another bunch of expressions to evaluate.

Short and simple. We don’t need anything more4. This is how the Fibonacci function looks in FiboLisp:

(def fibo [n]
  (if (< n 2)
    n
    (+ (fibo (- n 1)) (fibo (- n 2)))))

We can see all the AST types in use here. Note that FiboLisp is lexically scoped.

The module also lists a bunch of keywords (carKeywords) that can appear in the car5 position of a Lisp expression, that we use later for auto-completion in the REPL, and some functions to convert the AST types to nice looking strings.

For the parser, we have this pared-down code:

module Language.FiboLisp.Parser (ParsingError(..), parse) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import GHC.Generics (Generic)
import Language.FiboLisp.Types

parse :: String -> Either ParsingError Program

data ParsingError = ParsingError String | EndOfStreamError
  deriving (Show, Generic, NFData)

instance Exception ParsingError

The essential function is parse, which takes the code as a string, and returns either a ParsingError on failure, or a Program on success. If the parser detects that an S-expression is not properly closed, it returns an EndOfStreamError error.

We also have this pretty-printer module that converts function ASTs back to pretty Lisp code:

module Language.FiboLisp.Printer (prettyShowDef) where

import Language.FiboLisp.Types

prettyShowDef :: Def -> String

Finally, the last thing before we hit the real topic of this post, the FiboLisp interpreter:

module Language.FiboLisp.Interpreter
  (Value, RuntimeError, interpret, builtinFuncs, builtinVals) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import Data.Map.Strict qualified as Map
import GHC.Generics (Generic)
import Language.FiboLisp.Types

interpret :: (String -> IO ()) -> Program -> IO (Either RuntimeError Value)

newtype RuntimeError = RuntimeError String
  deriving (Show, Generic, NFData)

instance Exception RuntimeError

data Value = ...
  deriving (Show, Generic, NFData)

builtinFuncs :: Map.Map String Value

builtinVals :: [Value]

We have elided the details again. All that matters to us is the interpret function that takes a program, and returns either a runtime error or a value. Value is the runtime representation of the values of FiboLisp expressions, and all we care about is that it can be shown and fully evaluated via NFData6. interpret also takes a String -> IO () function, that’ll be demystified when we get into implementing the REPL.

Lastly, we have a map of built-in functions and a list of built-in values. We expose them so that they can be treated specially in the REPL.

If you want, you can go ahead and fill in the missing code using your favourite parsing and pretty-printing libraries7, and the method of writing interpreters. For this post, those implementation details are not necessary.

Let’s package all this functionality into a module for ease of importing:

module Language.FiboLisp
  ( module Language.FiboLisp.Types,
    module Language.FiboLisp.Parser,
    module Language.FiboLisp.Printer,
    module Language.FiboLisp.Interpreter,
  )
where

import Language.FiboLisp.Interpreter
import Language.FiboLisp.Parser
import Language.FiboLisp.Printer
import Language.FiboLisp.Types

Now, with all the preparations done, we can go REPLing.

A REPL of Our Own

The main functionality that a REPL provides is entering expressions and definitions, one at a time, that it Reads, Evaluates, and Prints, and then Loops back, letting us do the same again. This can be accomplished with a simple program that prompts the user for an input and does all these with it. However, such a REPL will be quite lackluster.

These days programming languages come with advanced REPLs like IPython and nREPL, which provide many functionalities beyond simple REPLing. We want FiboLisp to have a great REPL too.

You may have already noticed some advanced features that our REPL provides in the demo. Let’s state them here:

  1. Commands starting with colon:
    1. to set and unset settings: :set and :unset,
    2. to load files into the REPL: :load,
    3. to show the source code of functions: :source,
    4. to show a help message: :help.
  2. Settings to enable/disable:
    1. dumping of parsed ASTs: dump,
    2. showing program execution times: time.
  3. Multiline expressions and functions, with correct indentation.
  4. Colored output and messages.
  5. Auto-completion of commands, code and file names.
  6. Safety checks when loading files.
  7. Readline-like navigation through the history of previous inputs.

Haskeline — the Haskell library that we use to create the REPL — provides only basic functionalities, upon which we build to provide these features. Let’s begin.
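For contrast, here is roughly what a bare-bones Haskeline loop looks like before we add any of the features above (a sketch using only defaultSettings and getInputLine; it just echoes input back):

import System.Console.Haskeline

basicRepl :: IO ()
basicRepl = runInputT defaultSettings loop
  where
    loop :: InputT IO ()
    loop = do
      mInput <- getInputLine "> "
      case mInput of
        Nothing -> outputStrLn "Goodbye."
        Just input -> outputStrLn ("You said: " <> input) >> loop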

State and Settings

As usual, we start the module with many imports8:

{-# LANGUAGE TemplateHaskell #-}

module Language.FiboLisp.Repl (run) where

import Control.DeepSeq qualified as DS
import Control.Exception (Exception (..), evaluate)
import Control.Lens.Basic qualified as Lens
import Control.Monad (when)
import Control.Monad.Catch qualified as Catch
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Identity (IdentityT (..))
import Control.Monad.Reader (MonadReader, ReaderT (runReaderT))
import Control.Monad.Reader qualified as Reader
import Control.Monad.State.Strict (MonadState, StateT (runStateT))
import Control.Monad.State.Strict qualified as State
import Control.Monad.Trans (MonadTrans, lift)
import Data.Char qualified as Char
import Data.Functor ((<&>))
import Data.List
  (dropWhileEnd, foldl', isPrefixOf, isSuffixOf, nub, sort, stripPrefix)
import Data.Map.Strict qualified as Map
import Data.Maybe (fromJust)
import Data.Set qualified as Set
import Data.Time (NominalDiffTime, diffUTCTime, getCurrentTime)
import Language.FiboLisp qualified as L
import System.Console.Haskeline qualified as H
import System.Console.Terminfo qualified as Term
import System.Directory (canonicalizePath, doesFileExist, getCurrentDirectory)

Notice that we import the previously shown Language.FiboLisp module qualified as L, and Haskeline as H. Another important library that we use here is terminfo, which helps us do colored output.

A REPL must preserve the context through a session. In the case of FiboLisp, this means we should be able to define a function9 as one input, and then use it later in the session, one or many times10. The REPL should also respect the REPL settings through the session till they are unset.

Additionally, the REPL has to remember whether it is in the middle of a multiline input. To support multiline input, the REPL also needs to remember the previous indentation, and the input entered on the previous lines of the multiline input. Together these form the ReplState:

data ReplState = ReplState
  { _replDefs :: Defs,
    _replSettings :: Settings,
    _replLineMode :: LineMode,
    _replIndent :: Int,
    _replSeenInput :: String
  }

type Defs = Map.Map L.Ident L.Def
type Settings = Set.Set Setting
data Setting = Dump | MeasureTime deriving (Eq, Ord, Enum)
data LineMode = SingleLine | MultiLine deriving (Eq)

instance Show Setting where
  show = \case
    Dump -> "dump"
    MeasureTime -> "time"

Let’s deal with settings first. We set and unset settings using the :set and :unset commands. So, we write the code to parse setting the settings:

data SettingMode = Set | Unset deriving (Eq, Enum)

instance Show SettingMode where
  show = \case
    Set -> ":set"
    Unset -> ":unset"

parseSetting :: String -> Maybe Setting
parseSetting = \case
  "dump" -> Just Dump
  "time" -> Just MeasureTime
  _ -> Nothing

parseSettingMode :: String -> Maybe SettingMode
parseSettingMode = \case
  ":set" -> Just Set
  ":unset" -> Just Unset
  _ -> Nothing

parseSettingCommand :: String -> Either String (SettingMode, Setting)
parseSettingCommand command = case words command of
  [modeStr, settingStr] -> case parseSettingMode modeStr of
    Just mode -> case parseSetting settingStr of
      Just setting -> Right (mode, setting)
      Nothing -> Left $ "Unknown setting: " <> settingStr
    Nothing -> Left $ "Unknown command: " <> command
  [modeStr]
    | Just _ <- parseSettingMode modeStr -> Left "No setting specified"
  _ -> Left $ "Unknown command: " <> command

Nothing fancy here, just splitting the input into words and going through them to make sure they are valid.

The REPL is a monad that wraps over ReplState:

newtype Repl a = Repl
  { runRepl_ :: StateT ReplState (ReaderT AddColor IO) a
  }
  deriving
    ( Functor,
      Applicative,
      Monad,
      MonadIO,
      MonadState ReplState,
      MonadReader AddColor,
      Catch.MonadThrow,
      Catch.MonadCatch,
      Catch.MonadMask
    )

type AddColor = Term.Color -> String -> String

runRepl :: AddColor -> Repl a -> IO a
runRepl addColor =
  fmap fst
    . flip runReaderT addColor
    . flip runStateT (ReplState Map.empty Set.empty SingleLine 0 "")
    . runRepl_

Repl also lets us do IO — is it really a REPL if you can’t do printing — and deal with exceptions. Additionally, we have a read-only state that is a function, which will be explained soon. The REPL starts in the single line mode, with no indentation, function definitions, settings, or previously seen input.

REPLing Down the Prompt

Let’s go top-down. We write the run function that is the entry point of this module:

run :: IO ()
run = do
  term <- Term.setupTermFromEnv
  let addColor =
        case Term.getCapability term $ Term.withForegroundColor @String of
          Just fc -> fc
          Nothing -> \_ s -> s
  runRepl addColor . H.runInputT settings $ do
    H.outputStrLn $ addColor promptColor "FiboLisp REPL"
    H.outputStrLn $ addColor infoColor "Press <TAB> to start"
    repl
  where
    settings =
      H.setComplete doCompletions $
        H.defaultSettings {H.historyFile = Just ".fibolisp"}

This sets up Haskeline to run our REPL using the functions we provide in the later sections: repl and doCompletions. This also demystifies the read-only state of the REPL: a function that adds colors to our output strings, depending on the capabilities of the terminal in which our REPL is running. We also set up a history file to remember the previous REPL inputs.

When the REPL starts, we output some messages in nice colors, which are defined as:

promptColor, printColor, outputColor, errorColor, infoColor :: Term.Color
promptColor = Term.Green
printColor = Term.White
outputColor = Term.Green
errorColor = Term.Red
infoColor = Term.Cyan

Off we go repling now:

type Prompt = H.InputT Repl

repl :: Prompt ()
repl = do
  replLineMode .= SingleLine
  replIndent .= 0
  replSeenInput .= ""
  Catch.handle (\H.Interrupt -> repl) . H.withInterrupt $
    readInput >>= \case
      EndOfInput -> outputWithColor promptColor "Goodbye."
      input -> evalAndPrint input >> repl

outputWithColor :: Term.Color -> String -> Prompt ()
outputWithColor color text = do
  addColor <- getAddColor
  H.outputStrLn $ addColor color text

getAddColor :: Prompt AddColor
getAddColor = lift Reader.ask

We infuse our Repl with the powers of Haskeline by wrapping it with Haskeline’s InputT monad transformer, and call it the Prompt type. In the repl function, we readInput, evalAndPrint it, and repl again.

We also deal with the user quitting the REPL (the EndOfInput case), and hitting Ctrl + C to interrupt typing or a running evaluation (the handling for H.Interrupt).

Wait a minute! What is that imperative looking .= doing in our Haskell code? That’s right, we are looking through some lenses!

type Lens' s a = Lens.Lens s s a a

replDefs :: Lens' ReplState Defs
replDefs = $(Lens.field '_replDefs)

replSettings :: Lens' ReplState Settings
replSettings = $(Lens.field '_replSettings)

replLineMode :: Lens' ReplState LineMode
replLineMode = $(Lens.field '_replLineMode)

replIndent :: Lens' ReplState Int
replIndent = $(Lens.field '_replIndent)

replSeenInput :: Lens' ReplState String
replSeenInput = $(Lens.field '_replSeenInput)

use :: (MonadTrans t, MonadState s m) => Lens' s a -> t m a
use l = lift . State.gets $ Lens.view l

(.=) :: (MonadTrans t, MonadState s m) => Lens' s a -> a -> t m ()
l .= a = lift . State.modify' $ Lens.set l a

(%=) :: (MonadTrans t, MonadState s m) => Lens' s a -> (a -> a) -> t m ()
l %= f = lift . State.modify' $ Lens.over l f

If you’ve never encountered lenses before, you can think of them as pairs of setters and getters. The repl* lenses above are for setting and getting the corresponding fields from the ReplState data type11. The use, .=, and %= functions are for getting, setting and modifying respectively the state in the State monad using lenses. We see them in action at the beginning of the repl function when we use .= to set the various fields of ReplState to their initial values in the State monad.

All that is left now is actually reading the input, evaluating it and printing the results.

Reading the Input

Haskeline gives us functions to read the user’s input as text. However, being Haskellers, we prefer some structure around it:

data Input
  = Setting (SettingMode, Setting)
  | Load FilePath
  | Source String
  | Help
  | Program L.Program
  | BadInputError String
  | EndOfInput

We’ve got all previously mentioned cases covered with the Input data type. We also do some input validation and capture errors for the failure cases with the BadInputError constructor. EndOfInput is used for when the user quits the REPL.

Here is how we read the input:

readInput :: Prompt Input
readInput = do
  addColor <- getAddColor
  lineMode <- use replLineMode
  prevIndent <- use replIndent

  let promptSym = case lineMode of SingleLine -> "λ"; _ -> "|"
      prompt = addColor promptColor $ promptSym <> "> "

  mInput <- H.getInputLineWithInitial prompt (replicate prevIndent ' ', "")
  let currentIndent = maybe 0 (length . takeWhile (== ' ')) mInput

  case trimStart . trimEnd <$> mInput of
    Nothing -> return EndOfInput
    Just input | null input -> do
      replIndent .= case lineMode of
        SingleLine -> prevIndent
        MultiLine -> currentIndent
      readInput
    Just input@(':' : _) -> parseCommand input
    Just input -> parseCode input currentIndent

trimStart :: String -> String
trimStart = dropWhile Char.isSpace

trimEnd :: String -> String
trimEnd = dropWhileEnd Char.isSpace

We use the getInputLineWithInitial function provided by Haskeline to show a prompt and read the user’s input as a string. The prompt shown depends on the LineMode of the REPL state. In the SingleLine mode we show λ>, whereas in the MultiLine mode we show |>.

If there is no input, that means the user has quit the REPL. In that case we return EndOfInput, which is handled in the repl function. If the input is empty, we read more input, preserving the previous indentation (prevIndent) in the MultiLine mode.

If the input starts with :, we parse it for various commands:

parseCommand :: String -> Prompt Input
parseCommand input
  | ":help" `isPrefixOf` input = return Help
  | ":load" `isPrefixOf` input =
      checkFilePath . trimStart . fromJust $ stripPrefix ":load" input
  | ":source" `isPrefixOf` input = do
      return . Source . trimStart . fromJust $ stripPrefix ":source" input
  | input == ":" = return $ BadInputError "No command specified"
  | otherwise = case parseSettingCommand input of
      Right setting -> return $ Setting setting
      Left err -> return $ BadInputError err

checkFilePath :: String -> Prompt Input
checkFilePath file
  | null file = return $ BadInputError "No file specified"
  | otherwise =
      isSafeFilePath file <&> \case
        True -> Load file
        False -> BadInputError $ "Cannot access file: " <> file

isSafeFilePath :: (MonadIO m) => FilePath -> m Bool
isSafeFilePath fp =
  liftIO $ isPrefixOf <$> getCurrentDirectory <*> canonicalizePath fp

The :help and :source cases are straightforward. In case of :load, we make sure to check that the file asked to be loaded is located somewhere inside the current directory of the REPL or its recursive subdirectories. Otherwise, we deny loading by returning a BadInputError. We parse the settings using the parseSettingCommand function we wrote earlier.

If the input is not a command, we parse it as code:

parseCode :: String -> Int -> Prompt Input
parseCode currentInput indent = do
  seenInput <- use replSeenInput
  let input = seenInput <> " " <> currentInput
  case L.parse input of
    Left L.EndOfStreamError -> do
      replLineMode .= MultiLine
      replIndent .= indent
      replSeenInput .= input
      readInput
    Left err ->
      return $ BadInputError $ "ERROR: " <> displayException err
    Right program -> return $ Program program

We append the previously seen input (in case of multiline input) with the current input and parse it using the parse function provided by the Language.FiboLisp module. If parsing fails with an EndOfStreamError, it means that the input is incomplete. In that case, we set the REPL line mode to MultiLine, the REPL indentation to the current indentation, and the seen input to the previously seen input appended with the current input, and then read more input. If it is some other error, we return a BadInputError with it.

If the result of parsing is a program, we return it as a Program input.

That’s it for reading the user input. Next, we evaluate it.

Evaluating the Input

Recall that the repl function calls the evalAndPrint function with the read input:

evalAndPrint :: Input -> Prompt ()
evalAndPrint = \case
  EndOfInput -> return ()
  BadInputError err -> outputWithColor errorColor err
  Help -> H.outputStr helpMessage
  Setting (Set, setting) -> replSettings %= Set.insert setting
  Setting (Unset, setting) -> replSettings %= Set.delete setting
  Source ident -> showSource ident
  Load fp -> loadAndEvalFile fp
  Program program -> interpretAndPrint program
  where
    helpMessage =
      unlines
        [ "Available commands",
          ":set/:unset dump       Dumps the program AST",
          ":set/:unset time       Shows the program execution time",
          ":load <file>           Loads a source file",
          ":source <func_name>    Prints the source code of a function",
          ":help                  Shows this help"
        ]

The cases of EndOfInput, BadInputError and Help are straightforward. For settings, we insert or remove the setting from the REPL settings, depending on it being set or unset. For the other cases, we call the respective helper functions.

For a :source command, we check if the requested identifier maps to a user-defined or builtin function, and if so, print its source. Otherwise we print an error.

showSource :: L.Ident -> Prompt ()
showSource ident = do
  defs <- use replDefs
  case Map.lookup ident defs of
    Just def -> outputWithColor infoColor $ L.prettyShowDef def
    Nothing -> case Map.lookup ident L.builtinFuncs of
      Just func -> outputWithColor infoColor $ show func
      Nothing ->
        outputWithColor errorColor $ "No such function: " <> ident

For a :load command, we check if the requested file exists. If so, we read and parse it, and interpret the resultant program. In case of any errors in reading or parsing the file, we catch and print them.

loadAndEvalFile :: FilePath -> Prompt ()
loadAndEvalFile fp =
  liftIO (doesFileExist fp) >>= \case
    False -> outputWithColor errorColor $ "No such file: " <> fp
    True -> Catch.handleAll outputError $ do
      code <- liftIO $ readFile fp
      outputWithColor infoColor $ "Loaded " <> fp
      case L.parse code of
        Left err -> outputError err
        Right program -> interpretAndPrint program

outputError :: (Exception e) => e -> Prompt ()
outputError err =
  outputWithColor errorColor $ "ERROR: " <> displayException err

Finally, we come to the workhorse of the REPL: the interpretation of the user provided program:

interpretAndPrint :: L.Program -> Prompt ()
interpretAndPrint (L.Program pDefs exprs) =
  Catch.handleAll outputError $ do
    defs <- use replDefs
    settings <- use replSettings

    let defs' =
          foldl' (\ds d -> Map.insert (L.defName d) d ds) defs pDefs
        program = L.Program (Map.elems defs') exprs
    when (Dump `Set.member` settings) $
      outputWithColor infoColor (L.showProgram program)

    addColor <- getAddColor
    extPrint <- H.getExternalPrint

    (execTime, val) <- liftIO . measureElapsedTime $ do
      val <- L.interpret (extPrint . addColor printColor) program
      evaluate $ DS.force val

    case val of
      Left err -> outputError err
      Right v -> do
        let output = show v
        if null output
          then return ()
          else outputWithColor outputColor $ "=> " <> output

    when (MeasureTime `Set.member` settings) $
      outputWithColor infoColor $
        "(Execution time: " <> show execTime <> ")"

    replDefs .= defs'

measureElapsedTime :: IO a -> IO (NominalDiffTime, a)
measureElapsedTime f = do
  start <- getCurrentTime
  ret <- f
  end <- getCurrentTime
  return (diffUTCTime end start, ret)

We start by merging the user-defined functions in the current input with the functions previously defined in the session, such that the current functions override previous functions with the same names. At this point, if the dump setting is set, we print the program AST.

Then we invoke the interpret function provided by the Language.FiboLisp module. Recall that the interpret function takes the program to interpret and a function of type String -> IO (). This function is a color-adding wrapper over the function returned by the Haskeline function getExternalPrint12, which allows non-REPL code to safely print to the Haskeline-driven REPL without garbling the output. We pass it to interpret so that the interpreter can invoke it when the user code calls the builtin print function or similar.

We make sure to force and evaluate the value returned by the interpreter so that any lazy values or errors are fully evaluated13, and the measured elapsed time is correct.

If the interpreter returns an error, we print it. Otherwise we convert the value to a string, and if it is not empty14, we print it.

Finally, we print the execution time if the time setting is set, and set the REPL defs to the current program defs.

That’s all! We have completed our REPL. But wait, I think we forgot one thing …

Doing the Completions

The REPL would work fine with this much code, but it would not be a good experience for the user, because they’d have to type everything without any help from the REPL. To make it convenient for the user, we provide contextual auto-completion functionality while typing. Haskeline lets us plug in our custom completion logic by setting a completion function, which we did way back at the start. Now we need to implement it.

doCompletions :: H.CompletionFunc Repl
doCompletions =
  fmap runIdentityT . H.completeWordWithPrev Nothing " " $ \leftRev word -> do
    defs <- use replDefs
    lineMode <- use replLineMode
    settings <- use replSettings
    let funcs = nub $ Map.keys defs <> Map.keys L.builtinFuncs
        vals = map show L.builtinVals
    case (word, lineMode) of
      ('(' : rest, _) ->
        pure
          [ H.Completion ('(' : hint) hint True
            | hint <- nub . sort $ L.carKeywords <> funcs,
              rest `isPrefixOf` hint
          ]
      (_, SingleLine) -> case word of
        "" | null leftRev ->
          pure [H.Completion "" s True | s <- commands <> funcs <> vals]
        ':' : _ | null leftRev ->
          pure [H.simpleCompletion c | c <- commands, word `isPrefixOf` c]
        _
          | "tes:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `notElem` settings, word `isPrefixOf` show s
              ]
          | "tesnu:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `elem` settings, word `isPrefixOf` show s
              ]
          | "daol:" `isSuffixOf` leftRev ->
            isSafeFilePath word >>= \case
              True -> H.listFiles word
              False -> pure []
          | "ecruos:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion ident
                | ident <- funcs,
                  ident `Map.notMember` L.builtinFuncs,
                  word `isPrefixOf` ident
              ]
          | otherwise ->
            pure [H.simpleCompletion c | c <- funcs <> vals, word `isPrefixOf` c]
      _ -> pure []
  where
    commands = ":help" : ":load" : ":source" : map show [Set ..]

Haskeline provides us the completeWordWithPrev function to easily create our own completion function. It takes a callback function that it calls with the current word being completed (the word immediately to the left of the cursor), and the content of the line before that word (to the left of the word), reversed. That is why the code above matches suffixes like "tes:", "daol:" and "ecruos:" (they are ":set", ":load" and ":source" reversed). We use these to return different completion lists of strings.

Going case by case:

  1. If the word starts with (, it means we are in the middle of writing FiboLisp code. So we return the carKeywords and the user-defined and builtin function names that start with the current word sans the initial (. This happens regardless of the current line mode. The rest of the cases below apply only in the SingleLine mode.
  2. If the entire line is empty, we return the names of all commands, functions, and builtin values.
  3. If the word starts with :, and is at the beginning of the line, we return the commands that start with the word.
  4. If the line starts with
    1. :set, we return the not set settings
    2. :unset, we return the set settings
    3. :load, we return the names of the files and directories in the current directory
    4. :source, we return the names of the user-defined functions
    that start with the word.
  5. Otherwise we return no completions.

This covers all cases, and provides helpful completions, while avoiding bad ones. And this completes the implementation of our wonderful REPL.

Conclusion

I wrote this REPL while implementing a Lisp that I wrote15 while going through the Essentials of Compilation book, which I thoroughly recommend for getting started with compilers. It started as a basic REPL, and gathered a lot of nice functionalities over time. So I decided to extract and share it here. I hope that this Haskeline tutorial helps you in creating beautiful and useful REPLs. Here is the complete code for the REPL.


  1. The online demo is rather slow to load and to run, and works only on Firefox and Chrome. Even though I managed to put it together somehow, I don’t actually know how it exactly works, and I’m unable to fix the issues with it.↩︎

  2. Lisps are awesome and I absolutely recommend creating one or more of them as an amateur PL implementer. Some resources I recommend are: the Build Your Own Lisp book, and the Make-A-Lisp tutorial.↩︎

  3. REPLs are wonderful for doing interactive and exploratory programming where you try out small snippets of code in the REPL, and put your program together piece-by-piece. They are also good for debugging because they let you inspect the state of running programs from within. I still fondly remember the experience of connecting (or jacking in) to running productions systems written in Clojure over REPL, and figuring out issues by dumping variables.↩︎

  4. We don’t even need let. We can, and have to, define variables by creating functions, with parameters serving the role of variables. In fact, we can’t even assign or reassign variables. Functions are the only scoping mechanism in FiboLisp, much like old-school JavaScript with its IIFEs.↩︎

  5. car is obviously Contents of the Address part of the Register, the first expression in a list form in a Lisp.↩︎

  6. You may be wondering about why we need the NFData instances for the errors and values. This will become clear when we write the REPL.↩︎

  7. I recommend the sexp-grammar library, which provides both parsing and printing facilities for S-expressions based languages. Or you can write something by yourself using the parsing and pretty-printing libraries like megaparsec and prettyprinter.↩︎

  8. We assume that our project’s Cabal file sets the default-language to GHC2021, and the default-extensions to LambdaCase, OverloadedStrings, RecordWildCards, and StrictData.↩︎

  9. Recall that there is no way to define variables in FiboLisp.↩︎

  10. If the interpreter allows mutually recursive function definitions, functions can be called before defining them.↩︎

  11. We are using the basic-lens library here, which is the tiniest lens library, and provides only the five functions and types we see used here.↩︎

  12. Using the function returned from getExternalPrint is not necessary in our case because the REPL blocks when it invokes the interpreter. That means, nothing but the interpreter can print anything while it is running. So the interpreter can actually print directly to stdout and nothing will go wrong.

    However, imagine a case in which our code starts a background thread that needs to print to the REPL. In such case, we must use the Haskeline provided print function instead of printing directly. When printing to the REPL using it, Haskeline coordinates the prints so that the output in the terminal is not garbled.↩︎

  13. Now we see why we derive NFData instances for errors and Value.↩︎

  14. Returned value could be of type void with no textual representation, in which case we would not print it.↩︎

  15. I wrote the original REPL code almost three years ago. I refactored, rewrote and improved a lot of it in the course of writing this post. As they say, writing is thinking.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at October 31, 2024 12:00 AM

October 25, 2024

Derek Elkins

Classical First-Order Logic from the Perspective of Categorical Logic

Introduction

Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.

This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.

My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean first-order hyperdoctrine”. Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggests a distinct axis along which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.

Boolean

The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary first-order hyperdoctrines would give us a form of intuitionistic first-order logic.

It is fairly rare that the categories categorists are interested in are Boolean. For example, most toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the topos. For example, a category of sheaves on a topological space is Boolean if and only if that space is a Stone space. These are certainly interesting spaces, but they are also totally disconnected unlike virtually every non-discrete topological space one would typically mention.

First-Order

The next term is the term “first-order”. As the name suggests, a first-order hyperdoctrine has the necessary structure to interpret first-order logic. The question, then, is what kind of categories have this structure and only this structure. The answer, as far as I’m aware, is not many.

Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have additional structure as well that it seems odd to ignore. The most notable and interesting example is toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically do support the necessary structure for first-order logic, the natural morphisms of Grothendieck toposes, geometric morphisms, do not preserve that structure, unlike the logical functors between elementary toposes.

The natural internal logic for Grothendieck toposes turns out to be geometric logic. This is a logic that lacks universal quantification and implication (and thus negation) but does have infinitary disjunction. This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely related logics are regular logic and coherent logic which are sub-logics of both geometric logic and first-order logic.

We see, then, just from the examples of the natural logics of toposes, that none of them are first-order logic, and that we get examples that are more powerful, less powerful, and incomparable to first-order logic. Other common classes of categories give other natural logics, such as cartesian logic from left exact categories, while monoidal categories give rise to (ordered) linear logics. We get the simply typed lambda calculus from cartesian closed categories, which leads to the next topic.

Hyperdoctrine

A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it a poset of “predicates” on that object. In many cases, this takes the form of the Sub functor assigning to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure on the source category, these posets, and/or the functor itself to interpret various logical connectives. For example, a regular hyperdoctrine requires the source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise to monotonic functions with left adjoints satisfying certain properties. This notion of hyperdoctrines is suitable for regular logic.
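
To make the shape of this data concrete, here is a rough Haskell sketch (my own illustration, not from the article; it elides the adjointness laws and the Frobenius and Beck-Chevalley conditions a real definition requires) of a posetal, regular-flavoured hyperdoctrine over a base category of types and functions:

{-# LANGUAGE RankNTypes #-}

-- One fibre of "predicates" over a type a: a poset with finite meets.
data Fibre pred a = Fibre
  { entails :: pred a -> pred a -> Bool  -- the partial order of the fibre
  , top     :: pred a
  , conj    :: pred a -> pred a -> pred a
  }

-- A posetal hyperdoctrine (regular flavour): a fibre of predicates for each
-- type, substitution ("reindexing") along functions, and existential
-- quantification along projections, which should be left adjoint to
-- substitution along fst.
data Hyperdoctrine pred = Hyperdoctrine
  { fibre  :: forall a. Fibre pred a
  , subst  :: forall a b. (b -> a) -> pred a -> pred b
  , exists :: forall a b. pred (a, b) -> pred a
  }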

It’s very easy to recognize that these functors are essentially indexed |(0,1)|-categories. This immediately suggests that we should consider higher categorical versions or at the very least normal indexed categories.

What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now have potentially multiple ways a “predicate” could “entail” another “predicate”. We can present the simply typed lambda calculus in this indexed category manner. This naturally leads/connects to the categorical semantics of type theories.

Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to talk about an entailment holding for distinct but “equivalent” reasons.

Summary

Moving in all three of these directions simultaneously leads pretty naturally to something like Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory aimed at being an internal language for |(\infty, 1)|-toposes.

Why Classical FOL?

Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a higher-order logic wasn’t considered at the time.

Classical versus Intuitionistic logic was debated at the time, but it was primarily a philosophical argument, and the defense of Intuitionism was not very compelling (to me, and evidently to people at the time). The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.

Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two main approaches: Henkin semantics and full (or standard) semantics. Henkin semantics keeps the nice properties of (classical) FOL but fails to gain the distinctive benefits, namely the categoricity properties, of second-order logic. This isn’t surprising, as Henkin semantics can be encoded into first-order logic; it’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate sorts is given by power sets of (cartesian products of) the domain1. This leads to massive completeness problems, as our metalogical set theory has many, many ways of building subsets of the domain. There are metatheoretic results stating that there is no computable set of logical axioms that would give us a sound and complete theory for second-order logic with respect to full semantics. This aspect is also philosophically problematic, because we don’t want to need set theory to understand the very formulation of set theory. Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.

On the more positive and (meta-)mathematical side, we have results like Lindström’s theorem which states that classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem and compactness. There’s also a syntactic result by Lindström which characterizes first-order logic as the only logic having a recursively enumerable set of tautologies and satisfying Löwenheim-Skolem2.

The Catch

There’s one big caveat to the above. All of the above results are formulated in traditional model theory which means there are various assumptions built in to their statements. In the language of categorical logic, these assumptions can basically be summed up in the statement that the only category of semantics that traditional model theory considers is Set.

This is an utterly bizarre thing to do from the standpoint of categorical logic.

The issues with full semantics follow directly from this choice. If, as categorical logic would have us do, we considered every category with sufficient structure as a potential category of semantics, then our theory would not be forced to follow every nook and cranny of Set’s notion of subset to be complete. Valid formulas would need to be true not only in Set but in wildly different categories, e.g. every (Boolean) topos.

These traditional results are also often very specific to classical FOL. Dropping this constraint of classical logic would lead to an even broader class of models.

Categorical Perspective on Classical First-Order Logic

A Boolean category is just a coherent category where every subobject has a complement. Since coherent functors preserve complements, we have that the category of Boolean categories is a full subcategory of the category of coherent categories.

One nice thing about, specifically, classical first-order logic from the perspective of category theory is the following. First, coherent logic is the fragment of geometric logic restricted to finitary disjunction. Via Morleyization, we can encode classical first-order logic into coherent logic such that the categories of models of each are equivalent. This implies that a classical FOL formula is valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using the tools of classifying toposes. On the one hand, this once again suggests the importance of coherent logic, but on the other it means that we can use categorical tools with classical FOL.

Conclusion

There are certain things that I and, I believe, most logicians take as table stakes for a (foundational) logic3. For example, checking a proof should be computably decidable. For these reasons, I am in complete accord with early (formal) logicians that classical second-order logic with full semantics is an unacceptably worse alternative to classical first-order logic.

However, when it comes to statements about the specialness of FOL, a lot of them seem to be more statements about traditional model theory than FOL itself, and also statements about the philosophical predilections of the time. I feel that philosophical attitudes among logicians and mathematicians have shifted a decent amount since the beginning of the 20th century. We have different philosophical predilections today than then, but they are informed by another hundred years of thought, and they are more relevant to what is being done today.

Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical and metalogical justifications. I mention this to point out actual cases of foundational frameworks that a (very) superficial reading of traditional model theory results would seem to have “ruled out”. Even if one thinks that FOL+ZFC (or whatever) is the better foundation, I think it is unreasonable to assert that MLTT derivatives are unworkable as foundations.


  1. It’s worth mentioning that this is exactly what categorical logic would suggest: our syntactic power objects should be mapped to semantic power objects.↩︎

  2. While nice, it’s not clear that compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for classical FOL, but it doesn’t shut the door on considering alternatives even in the context of traditional model theory.↩︎

  3. I’m totally fine thinking about logics that lack these properties, but I would never put any of them forward as an acceptable foundational logic.↩︎

October 25, 2024 12:55 AM

October 17, 2024

Tweag I/O

Introducing rules_gcs

At Tweag, we are constantly striving to improve the developer experience by contributing tools and utilities that streamline workflows. We recently completed a project with IMAX, where we learned that they had developed a way to simplify and optimize the process of integrating Google Cloud Storage (GCS) with Bazel. Seeing value in this tool for the broader community, we decided to publish it together under an open source license. In this blog post, we’ll dive into the features, installation, and usage of rules_gcs, and how it provides you with access to private resources.

What is rules_gcs?

rules_gcs is a Bazel ruleset that facilitates the downloading of files from Google Cloud Storage. It is designed to be a drop-in replacement for Bazel’s http_file and http_archive rules, with features that make it particularly suited for GCS. With rules_gcs, you can efficiently fetch large amounts of data, leverage Bazel’s repository cache, and handle private GCS buckets with ease.

Key Features

  • Drop-in Replacement: rules_gcs provides gcs_file and gcs_archive rules that can directly replace http_file and http_archive. They take a gs://bucket_name/object_name URL and internally translate this to an HTTPS URL. This makes it easy to transition to GCS-specific rules without major changes to your existing Bazel setup.

  • Lazy Fetching with gcs_bucket: For projects that require downloading multiple objects from a GCS bucket, rules_gcs includes a gcs_bucket module extension. This feature allows for lazy fetching, meaning objects are only downloaded as needed, which can save time and bandwidth, especially in large-scale projects.

  • Private Bucket Support: Accessing private GCS buckets is seamlessly handled by rules_gcs. The ruleset supports credential management through a credential helper, ensuring secure access without the need to hardcode credentials or use gsutil for downloading.

  • Bazel’s Downloader Integration: rules_gcs uses Bazel’s built-in downloader and repository cache, optimizing the download process and ensuring that files are cached efficiently across builds, even across multiple Bazel workspaces on your local machine.

  • Small footprint: Apart from the gcloud CLI tool (for obtaining authentication tokens), rules_gcs requires no additional dependencies or Bazel modules. This minimalistic approach reduces setup complexity and potential conflicts with other tools.

Understanding Bazel Repositories and Efficient Object Fetching with rules_gcs

Before we dive into the specifics of rules_gcs, it’s important to understand some key concepts about Bazel repositories and repository rules, as well as the challenges of efficiently managing large collections of objects from a Google Cloud Storage (GCS) bucket.

Bazel Repositories and Repository Rules

In Bazel, external dependencies are managed using repositories, which are declared in your WORKSPACE or MODULE.bazel file. Each repository corresponds to a package of code, binaries, or other resources that Bazel fetches and makes available for your build. Repository rules, such as http_archive or git_repository, and module extensions define how Bazel should download and prepare these external dependencies.

However, when dealing with a large number of objects, such as files stored in a GCS bucket, using a single repository to download all objects can be highly inefficient. This is because Bazel’s repository rules typically operate in an “eager” manner—they fetch all the specified files as soon as any target of the repository is needed. For large buckets, this means downloading potentially gigabytes of data even if only a few files are actually needed for the build. This eager fetching can lead to unnecessary network usage, increased build times, and larger disk footprints.

The rules_gcs Approach: Lazy Fetching with a Hub Repository

rules_gcs addresses this inefficiency by introducing a more granular approach to downloading objects from GCS. Instead of downloading all objects at once into a single repository, rules_gcs uses a module extension that creates a “hub” repository, which then manages individual sub-repositories for each GCS object.

How It Works
  1. Hub Repository: The hub repository acts as a central point of reference, containing metadata about the individual GCS objects. This follows the “hub-and-spoke” paradigm with a central repository (the bucket) containing references to a large number of small repositories for each object. This architecture is commonly used by Bazel module extensions to manage dependencies for different language ecosystems (including Python and Rust).

  2. Individual Repositories per GCS Object: For each GCS object specified in the lockfile, rules_gcs creates a separate repository using the gcs_file rule. This allows Bazel to fetch each object lazily—downloading only the files that are actually needed for the current build.

  3. Methods of Fetching: Users can choose between different methods in the gcs_bucket module extension. The default method of creating symlinks is efficient while preserving the file structure set in the lockfile. If you need to access objects as regular files, choose one of the other methods.

    • Symlink: Creates a symlink from the hub repo pointing to a file in its object repo, ensuring the object repo and symlink pointing to it are created only when the file is accessed.
    • Alias: Similar to symlink, but uses Bazel’s aliasing mechanism to reference the file. No files are created in the hub repo.
    • Copy: Creates a copy of a file in the hub repo when accessed.
    • Eager: Downloads all specified objects upfront into a single repository.

This modular approach is particularly beneficial for large-scale projects where only a subset of the data is needed for most builds. By fetching objects lazily, rules_gcs minimizes unnecessary data transfer and reduces build times.

Integrating with Bazel’s Credential Helper Protocol

Another critical aspect of rules_gcs is its seamless integration with Bazel’s credential management system. Accessing private GCS buckets securely requires proper authentication, and Bazel uses a credential helper protocol to handle this.

How Bazel’s Credential Helper Protocol Works

Bazel’s credential helper protocol is a mechanism that allows Bazel to fetch authentication credentials dynamically when accessing private resources, such as a GCS bucket. The protocol is designed to be simple and secure, ensuring that credentials are only used when necessary and are never hardcoded into build files.

When Bazel’s downloader prepares a request and a credential helper is configured, it invokes the credential helper with the command get. Additionally, the request URI is passed to the helper’s standard input encoded as JSON. The helper is expected to return a JSON object containing HTTP headers, including the necessary Authorization token, which Bazel will then include in its requests.

Here’s a breakdown of how the credential_helper script used in rules_gcs works:

  1. Authentication Token Retrieval: The script uses the gcloud CLI tool to obtain an access token via gcloud auth application-default print-access-token. This token is tied to the user’s current authentication context and can be used to fetch any objects the user is allowed to access.

  2. Output Format: The script outputs the token in a JSON format that Bazel can directly use:

    {
      "headers": {
        "Authorization": ["Bearer ${TOKEN}"]
      }
    }

    This JSON object includes the Authorization header, which Bazel uses to authenticate its requests to the GCS bucket.

  3. Integration with Bazel: To use this credential helper, you need to configure Bazel by specifying the helper in the .bazelrc file:

    common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper

    This line tells Bazel to use the specified credential_helper script whenever it needs to access resources from storage.googleapis.com. If a request returns an error code or unexpected content, credentials are invalidated and the helper is invoked again.
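
To illustrate the shape of this protocol, here is a hypothetical credential helper written as a small Haskell program (an illustrative sketch using the aeson and process packages, not the actual credential_helper script shipped with rules_gcs):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Aeson (encode, object, (.=))
import System.Process (readProcess)

main :: IO ()
main = do
  -- Bazel invokes the helper with the command "get" and writes the request
  -- (including the URI) as JSON to standard input.
  _request <- getContents
  -- Obtain a short-lived token from the user's existing gcloud credentials.
  token <- readProcess "gcloud" ["auth", "application-default", "print-access-token"] ""
  let bearer = "Bearer " ++ takeWhile (/= '\n') token
  -- Respond with the headers Bazel should attach to its request.
  BL.putStrLn (encode (object ["headers" .= object ["Authorization" .= [bearer]]]))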

How rules_gcs Hooks Into the Credential Helper Protocol

rules_gcs leverages this credential helper protocol to manage access to private GCS buckets securely and efficiently. By providing a pre-configured credential helper script, rules_gcs ensures that users can easily set up secure access without needing to manage tokens or authentication details manually.

Moreover, by limiting the scope of the credential helper to the GCS domain (storage.googleapis.com), rules_gcs reduces the risk of credentials being misused or accidentally exposed. The helper script is designed to be lightweight, relying on existing gcloud credentials, and integrates seamlessly into the Bazel build process.

Installing rules_gcs

Adding rules_gcs to your Bazel project is straightforward. The latest version is available on the Bazel Central Registry. To install, simply add the following to your MODULE.bazel file:

bazel_dep(name = "rules_gcs", version = "1.0.0")

You will also need to include the credential helper script in your repository:

mkdir -p tools
wget -O tools/credential-helper https://raw.githubusercontent.com/tweag/rules_gcs/main/tools/credential-helper
chmod +x tools/credential-helper

Next, configure Bazel to use the credential helper by adding the following lines to your .bazelrc:

common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
# optional setting to make rules_gcs more efficient
common --experimental_repository_cache_hardlinks

These settings ensure that Bazel uses the credential helper specifically for GCS requests. Additionally, the setting --experimental_repository_cache_hardlinks allows Bazel to hardlink files from the repository cache instead of copying them into a repository. This saves time and storage space, but requires the repository cache to be located on the same filesystem as the output base.

Using rules_gcs in Your Project

rules_gcs provides three primary rules: gcs_bucket, gcs_file, and gcs_archive. Here’s a quick overview of how to use each:

  • gcs_bucket: When dealing with multiple files from a GCS bucket, the gcs_bucket module extension offers a powerful and efficient way to manage these dependencies. You define the objects in a JSON lockfile, and gcs_bucket handles the rest.

    gcs_bucket = use_extension("@rules_gcs//gcs:extensions.bzl", "gcs_bucket")
    
    gcs_bucket.from_file(
        name = "trainingdata",
        bucket = "my_org_assets",
        lockfile = "@//:gcs_lock.json",
    )
  • gcs_file: Use this rule to download a single file from GCS. It’s particularly useful for pulling in assets or binaries needed during your build or test processes. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_file = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")
    
    gcs_file(
        name = "my_testdata",
        url = "gs://my_org_assets/testdata.bin",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    )
  • gcs_archive: This rule downloads and extracts an archive from GCS, making it ideal for pulling in entire repositories or libraries that your project depends on. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_archive = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_archive")
    
    gcs_archive(
        name = "magic",
        url = "gs://my_org_code/libmagic.tar.gz",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        build_file = "@//:magic.BUILD",
    )

Try it Out

rules_gcs is a versatile and simple solution for integrating Google Cloud Storage with Bazel. We invite you to try out rules_gcs in your projects and contribute to its development. As always, we welcome feedback and look forward to seeing how this tool enhances your workflows. Check out the full example to get started!

Thanks to IMAX for sharing their initial implementation of rules_gcs and allowing us to publish the code under an open source license.

October 17, 2024 12:00 AM

October 15, 2024

Philip Wadler

You can help Cards Against Humanity pay "blue leaning" nonvoters $100 to vote


How is this not illegal??? Cards Against Humanity is PAYING people who didn't vote in 2020 to apologize, make a voting plan, and post #DonaldTrumpIsAHumanToilet—up to $100 for blue-leaning people in swing states. I helped by getting a 2024 Election Pack: checkout.giveashit.lol. Spotted via BoingBoing. More info at The Register. (Only American citizens and residents can participate. If, like me, you are an American citizen but non-resident, you will need a VPN.)

by Philip Wadler (noreply@blogger.com) at October 15, 2024 08:11 AM

October 14, 2024

Edward Z. Yang

Tensor programming for databases, with first class dimensions

Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are somewhat similar to tables in a database, which raises a natural question: could you set up a tensor-like API to do the queries on databases that would normally be done with SQL? We have two challenges:

  • Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
  • JOINs in SQL can be thought of as performing outer joins, which is not a very common operation in tensor computation.

However, we have a secret weapon: first class dimensions were primarily designed as a new frontend syntax that made it easy to express einsum, batching and tensor indexing expressions. They might be good for SQL too.

Representing the database. First, how do we represent a database? A simple model, following columnar databases, is to have every column be a distinct 1D tensor, where all columns that are part of the same table share a consistent indexing scheme. For simplicity, we'll assume that we support rich dtypes for the tensors (e.g., so I can have a tensor of strings). So if we consider our classic customer database of (id, name, email), we would represent this as:

customers_id: int64[C]
customers_name: str[C]
customers_email: str[C]

Where C is the number of entries in the customer database. Our tensor type is written as dtype[DIM0, DIM1, ...], where I reuse the name that I will use for the first class dimension that represents it. Let's suppose that the index into C does not coincide with id (which is good, because if they did coincide, you would have a very bad time if you ever wanted to delete an entry from the database!)

This gives us an opportunity for baby's first query: let's implement this query:

SELECT c.name, c.email FROM customers c WHERE c.id = 1000

Notice that the result of this operation is data-dependent: it may contain zero or one rows depending on whether the id is in the database. Here is a naive implementation in standard PyTorch:

mask = customers_id == 1000
return (customers_name[mask], customers_email[mask])

Here, we use boolean masking to perform the data-dependent filtering operation. This implementation in eager mode is a bit inefficient; we materialize a full boolean mask that is then fed into the subsequent operations, whereas you would prefer a compiler to fuse the masking and indexing together. First class dimensions don't really help with this example, but we do need to introduce some new extensions to first class dimensions. First, here is what we can do:

C = dims(1)
c_id = customers_id[C]  # {C} => int64[]
c_name = customers_name[C]  # {C} => str[]
c_email = customers_email[C]  # {C} => str[]
c_mask = c_id == 1000  # {C} => bool[]

Here, a tensor with first class dimensions has a more complicated type {DIM0, DIM1, ...} => dtype[DIM2, DIM3, ...]. The first class dimensions are all reported in the curly braces to the left of the double arrow; curly braces are used to emphasize the fact that first class dimensions are unordered.

What next? The problem is that now we want to do something like torch.where(c_mask, c_name, ???) but we are now in a bit of trouble, because we don't want anything in the false branch of where: we want to provide something like "null" and collapse the tensor to a smaller number of elements, much like how boolean masking did it without first class dimensions. To express this, we'll introduce a binary version of torch.where that does exactly this, as well as returning the newly allocated FCD for the new, data-dependent dimension:

C2, c2_name = torch.where(c_mask, c_name)  # {C2} => str[]
_C2, c2_email = torch.where(c_mask, c_email)  # {C2} => str[], n.b. C2 == _C2
return c2_name, c2_email

Notice that torch.where introduces a new first-class dimension. I've chosen that this FCD gets memoized with c_mask, so whenever we do more torch.where invocations we still get consistently the same new FCD.

Having to type out all the columns can be a bit tiresome. If we assume all elements in a table have the same dtype (let's call it dyn, short for dynamic type), we can more compactly represent the table as a 2D tensor, where the first dimension is the indexing as before, and the second dimension is the columns of the database. For clarity, we'll support using the string name of the column as a shorthand for the numeric index of the column. If the tensor is contiguous, this gives a more traditional row-wise database. The new database can be conveniently manipulated with FCDs, as we can handle all of the columns at once instead of typing them out individually:

customers:  dyn[C, C_ATTR]
C = dims(1)
c = customers[C]  # {C} => dyn[C_ATTR]
C2, c2 = torch.where(c["id"] == 1000, c)  # {C2} => dyn[C_ATTR]
return c2[["name", "email"]].order(C2)  # dyn[C2, ["name", "email"]]

We'll use this for the rest of the post, but the examples should be interconvertible.

Aggregation. What's the average age of all customers, grouped by the country they live in?

SELECT AVG(c.age) FROM customers c GROUP BY c.country;

PyTorch doesn't natively support this grouping operation, but essentially what is desired here is a conversion into a nested tensor, where the jagged dimension is the country (each of which will have a varying number of customers). Let's hallucinate a torch.groupby analogous to its Pandas equivalent:

customers: dyn[C, C_ATTR]
customers_by_country = torch.groupby(customers, "country")  # dyn[COUNTRY, JC, C_ATTR]
COUNTRY, JC = dims(2)
c = customers_by_country[COUNTRY, JC]  # {COUNTRY, JC} => dyn[C_ATTR]
return c["age"].mean(JC).order(COUNTRY)  # f32[COUNTRY]

Here, I gave the generic indexing dimension the name JC, to emphasize that it is a jagged dimension. But everything proceeds like we expect: after we've grouped the tensor and rebound its first class dimensions, we can take the field of interest and explicitly specify a reduction on the dimension we care about.

In SQL, aggregations have to operate over the entirety of groups specified by GROUP BY. However, because FCDs explicitly specify what dimensions we are reducing over, we can potentially decompose a reduction into a series of successive reductions on different columns, without having to specify subqueries to progressively perform the reductions we are interested in.

Joins. Given an order table, join it with the customer referenced by the customer id:

SELECT o.id, c.name, c.email FROM orders o JOIN customers c ON o.customer_id = c.id

First class dimensions are great at doing outer products (although, as with filtering, a naive implementation will expensively materialize the entire outer product!)

customers: dyn[C, C_ATTR]
orders: dyn[O, O_ATTR]
C, O = dims(2)
c = customers[C]  # {C} => dyn[C_ATTR]
o = orders[O]  # {O} => dyn[O_ATTR]
mask = o["customer_id"] == c["id"]  # {C, O} => bool[]
outer_product = torch.cat(o[["id"]], c[["name", "email"]])  # {C, O} => dyn[["id", "name", "email"]]
CO, co = torch.where(mask, outer_product)  # {CO} => dyn[["id", "name", "email"]]
return co.order(CO)  # dyn[CO, ["id", "name", "email"]]

What's the point. There are a few reasons why we might be interested in the correspondence here. First, we might be interested in applying SQL ideas to the Tensor world: a lot of things people want to do in preprocessing are similar to what you do in traditional relational databases, and SQL can teach us what optimizations and what use cases we should think about. Second, we might be interested in applying Tensor ideas to the SQL world: in particular, I think first class dimensions are a really intuitive frontend for SQL which can be implemented entirely embedded in Python without necessitating the creation of a dedicated DSL. Also, this might be the push needed to get TensorDict into core.

by Edward Z. Yang at October 14, 2024 05:07 AM

Brent Yorgey

MonadRandom: major or minor version bump?

Posted on October 14, 2024

tl;dr: a fix to the MonadRandom package may cause fromListMay and related functions to extremely rarely output different results than they used to. This could only possibly affect anyone who is using fixed seed(s) to generate random values and is depending on the specific values being produced, e.g. a unit test where you use a specific seed and test that you get a specific result. Do you think this should be a major or minor version bump?


The Fix

Since 2013 I have been the maintainer of MonadRandom, which defines a monad and monad transformer for generating random values, along with a number of related utilities.

Recently, Toni Dietze pointed out a rare situation that could cause the fromListMay function to crash (as well as the other functions which depend on it: fromList, weighted, weightedMay, uniform, and uniformMay). This function is supposed to draw a weighted random sample from a list of values decorated with weights. I’m not going to explain the details of the issue here; suffice it to say that it has to do with conversions between Rational (the type of the weights) and Double (the type that was being used internally for generating random numbers).

Even though this could only happen in rare and/or strange circumstances, fixing it definitely seemed like the right thing to do. After a bit of discussion, Toni came up with a good suggestion for a fix: we should no longer use Double internally for generating random numbers, but rather Word64, which avoids conversion and rounding issues.

In fact, Word64 is already used internally in the generation of random Double values, so we can emulate the behavior of the Double instance (which was slightly tricky to figure out) so that we make exactly the same random choices as before, but without actually converting to Double.
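
To give a flavour of the idea (this is my own simplified sketch, not the actual MonadRandom code, which additionally emulates the old Double-based generation so that the resulting choices almost always match the old ones), a weighted choice can be driven by a single Word64 draw using only exact Rational arithmetic:

import Data.List (find)
import Data.Ratio ((%))
import Data.Word (Word64)

-- Pick a weighted element by scaling a uniformly random Word64 into an exact
-- Rational target in [0, total); no conversion to Double is involved.
pickWeighted :: Word64 -> [(a, Rational)] -> Maybe a
pickWeighted w xs
  | total <= 0 = Nothing
  | otherwise  = fst <$> find ((> target) . snd) (zip vals (scanl1 (+) weights))
  where
    (vals, weights) = unzip xs
    total  = sum weights
    target = total * (fromIntegral w % (2 ^ (64 :: Int)))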

The Change

…well, not exactly the same random choices as before, and therein lies the rub! If fromListMay happens to pick a random value which is extremely close to a boundary between choices, it’s possible that the value will fall on one side of the boundary when using exact calculations with Word64 and Rational, whereas before it would have fallen on the other side of the boundary after converting to Double due to rounding. In other words, it will output the same results almost all the time, but for a list of \(n\) weighted choices there is something like an \(n/2^{64}\) chance (or less) that any given random choice will be different from what it used to be. I have never observed this happening in my tests, and indeed, I do not expect to ever observe it! If we generated one billion random samples per second continuously for a thousand years, we might expect to see it happen once or twice. I am not even sure how to engineer a test scenario to force it to happen, because we would have to pick an initial PRNG seed that forces a certain Word64 value to be generated.

To PVP or not to PVP?

Technically, a function exported by MonadRandom has changed behavior, so according to the Haskell PVP specification this should be a major version bump (i.e. 0.6 to 0.7). Actually, I am not even 100% clear on this. The decision tree on the PVP page says that changing the behavior of an exported function necessitates a major version bump; but the actual specification does not refer to behavior at all. As I read it, it is exclusively concerned with API compatibility, i.e. whether things will still compile.

But there seem to be some good arguments for doing just a minor version bump (i.e. 0.6 to 0.6.1).

  • Arguments in favor of a minor version bump:

    • A major version bump would cause a lot of (probably unnecessary) breakage! MonadRandom has 149 direct reverse dependencies, and about 3500 distinct transitive reverse dependencies. Forcing all those packages to update their upper bound on MonadRandom would be a lot of churn.

    • What exactly constitutes the “behavior” of a function to generate random values? It depends on your point of view. If we view the function as a pure mathematical function which takes a PRNG state as input and produces some value as output, then its behavior is defined precisely by which outputs it returns for which input seeds, and its behavior has changed. However, if we think of it in more effectful terms, we could say its “behavior” is just to output random values according to a certain distribution, in which case its behavior has not changed.

    • It’s extremely unlikely that this change will cause any breakage; moreover, as argued by Boyd Stephen Smith, anyone who cares enough about reproducibility to be relying on specific outputs for specific seeds is probably already pinning all their package versions.

  • Arguments in favor of a major version bump:

    • It’s what the PVP specifies; what’s the point of having a specification if we don’t follow it?

    • In the unlikely event that this change does cause any breakage, it could be extremely difficult for package maintainers to track down. If the behavior of a random generation function completely changes, the source of the issue is obvious. But if it only changes for very rare inputs, you might reasonably think the problem is something else. A major version bump will force maintainers to read the changelog for MonadRandom and assess whether this is a change that could possibly affect them.

So, do you have opinions on this? Would the release affect you one way or the other? Feel free to leave a comment here, or send me an email with your thoughts. Note there has already been a bit of discussion on Mastodon as well.


by Brent Yorgey at October 14, 2024 12:00 AM

October 05, 2024

Lysxia's blog

Unicode shenanigans: Martine écrit en UTF-8

An old French meme
Martine écrit en UTF-8 (parody cover of the Martine series of French children's books)

On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):

What’s different this time? LLM edition

Yesterday I decided to do something about it.

Locating the problem

Tracing back where it came from, that title was sent already broken by Planet Haskell, which is itself a feed aggregator for blogs. The blog originally produces the good not broken title. Therefore the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, stuff I wouldn’t deal with myself so I won’t ask someone else to.

ASCII diagram of how a blog title travels through the relevant parties
      Blog
       |
       | What’s
       v
 Planet Haskell
       | 
       | What’s
       v
haskell.pl-a.net (my site)
       |
       | What’s
       v
  Your screen

In any case, this mistake can be fixed after the fact. Mis-encoded text is such a ubiquitous issue that there are nicely packaged solutions out there, like ftfy.

ftfy has been used as a data processing step in major NLP research, including OpenAI’s original GPT.

But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.

Explaining the problem

This is the typical situation where a program is assuming the wrong text encoding.

Text encodings

A quick summary for those who don’t know about text encodings.

Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send it over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display them on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.

Multiple encodings are possible, but it’s not always obvious which encoding to use to decode a given byte string. There are good and bad reasons for this, but the net effect is that many text-processing programs arbitrarily guess and assume the encoding in use, and sometimes they assume wrong.

Back to the problem

UTF-8 is the most prevalent encoding nowadays.1 I’d be surprised if one of the Planet Haskell blogs doesn’t use it, which is ironic considering the issue we’re dealing with.

  1. A blog using UTF-8 encodes the right single quote2 " ’ " as three consecutive bytes (226, 128, 153) in its RSS or Atom feed.
  2. The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding different from UTF-8 where each byte corresponds to one character.
  3. It did some transformation to the decoded text (extract the title and body and put it on a webpage with other blogs).
  4. It encoded the final result in UTF-8.
ASCII diagram of how text gets encoded and decoded (wrongly)
      What the blog sees →       '’'
                                  |
                                  | UTF-8 encode (one character into three bytes)
                                  v
                             226 128 153
                                  |
                                  | ??? decode (not UTF-8)
                                  v
What Planet Haskell sees →   'â' '€' '™'
                                  |
                                  | UTF-8 encode
                                  v
                                (...)
                                  |
                                  | UTF-8 decode
                                  v
            What you see →   'â' '€' '™'

The final encoding doesn’t really matter, as long as everyone else downstream agrees with it. The point is that Planet Haskell outputs the three characters “â€™” in place of the right single quote " ’ ", all because UTF-8 represents " ’ " with three bytes.

In spite of their differences, most encodings in practice agree at least about ASCII characters, in the range 0-127, which is sufficient to contain the majority of English language writing if you can compromise on details such as confusing the apostrophe and the single quotes. That’s why in the title “What’s different this time?” everything but one character was transferred fine.

Solving the problem

The fix is simple: replace “â€™” with " ’ ". Of course, we also want to do that with all other characters that are mis-encoded the same way: those are exactly all the non-ASCII Unicode characters. The more general fix is to invert Planet Haskell’s decoding logic. Thank the world that this mistake can be reversed to begin with. If information had been lost by mis-encoding, I might have been forced to use one of those dreadful LLMs to reconstruct titles.3

  1. Decode Planet Haskell’s output in UTF-8.
  2. Encode each character as a byte to recover the original output from the blog.
  3. Decode the original output correctly, in UTF-8.

There is one missing detail: what encoding to use in step 2? I first tried the naive thing: each character is canonically a Unicode code point, which is a number between 0 and 1114111, and I just hoped that those which did occur would fit in the range 0-255. That amounts to making the hypothesis that Planet Haskell is decoding blog posts in Latin-1. That seems likely enough, but you will have guessed correctly that the naive thing did not reconstruct the right single quote in this case. The Latin-1 hypothesis was proven false.

As it turns out, the euro sign “€” and the trademark symbol “™” are not in the Latin-1 alphabet. They are code points numbers 8364 and 8482 in Unicode, which are not in the range 0-255. Planet Haskell has to be using an encoding that features these two symbols. I needed to find which one.

Faffing about, I came across the Wikipedia article on Western Latin character sets which lists a comparison table. How convenient. I looked up the two symbols to find what encoding had them, if any. There were two candidates: Windows-1252 and Macintosh. Flip a coin. It was Windows-1252.

Windows-1252 differs from Latin-1 (and thus Unicode) in 27 positions, those whose byte starts with 8 or 9 in hexadecimal (27 valid characters + 5 unused positions): that’s 27 characters that I had to map manually to the range 0-255 according to the Windows-1252 encoding, and the remaining characters would be mapped for free by Unicode. This data entry task was autocompleted halfway through by Copilot, because of course GPT-* knows Windows-1252 by heart.

let windows1252_hack (c : Uchar.t) : int =
  let c = Uchar.to_int c in
  if      c = 0x20AC then 0x80
  else if c = 0x201A then 0x82
  else if c = 0x0192 then 0x83
  else if c = 0x201E then 0x84
  else if c = 0x2026 then 0x85
  else if c = 0x2020 then 0x86
  else if c = 0x2021 then 0x87
  else if c = 0x02C6 then 0x88
  else if c = 0x2030 then 0x89
  else if c = 0x0160 then 0x8A
  else if c = 0x2039 then 0x8B
  else if c = 0x0152 then 0x8C
  else if c = 0x017D then 0x8E
  else if c = 0x2018 then 0x91
  else if c = 0x2019 then 0x92
  else if c = 0x201C then 0x93
  else if c = 0x201D then 0x94
  else if c = 0x2022 then 0x95
  else if c = 0x2013 then 0x96
  else if c = 0x2014 then 0x97
  else if c = 0x02DC then 0x98
  else if c = 0x2122 then 0x99
  else if c = 0x0161 then 0x9A
  else if c = 0x203A then 0x9B
  else if c = 0x0153 then 0x9C
  else if c = 0x017E then 0x9E
  else if c = 0x0178 then 0x9F
  else c

And that’s how I restored the quotes, apostrophes, guillemets, accents, et autres in my feed.
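
For the curious, the same three-step repair looks roughly like this in Haskell (a hypothetical sketch using the text and bytestring packages, not the code actually running on haskell.pl-a.net; it assumes every mis-encoded character came from a single Windows-1252 byte):

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Char (ord)
import Data.Word (Word8)

-- Step 1: decode the UTF-8 we received; step 2: map each character back to
-- the Windows-1252 byte it was decoded from; step 3: decode those bytes as
-- the UTF-8 the blog originally sent.
fixMojibake :: BS.ByteString -> T.Text
fixMojibake = TE.decodeUtf8 . BS.pack . map toWindows1252Byte . T.unpack . TE.decodeUtf8

-- Invert Windows-1252 decoding: 27 characters differ from Latin-1, everything
-- else maps straight back through its Unicode code point.
toWindows1252Byte :: Char -> Word8
toWindows1252Byte c = case ord c of
  0x20AC -> 0x80; 0x201A -> 0x82; 0x0192 -> 0x83; 0x201E -> 0x84
  0x2026 -> 0x85; 0x2020 -> 0x86; 0x2021 -> 0x87; 0x02C6 -> 0x88
  0x2030 -> 0x89; 0x0160 -> 0x8A; 0x2039 -> 0x8B; 0x0152 -> 0x8C
  0x017D -> 0x8E; 0x2018 -> 0x91; 0x2019 -> 0x92; 0x201C -> 0x93
  0x201D -> 0x94; 0x2022 -> 0x95; 0x2013 -> 0x96; 0x2014 -> 0x97
  0x02DC -> 0x98; 0x2122 -> 0x99; 0x0161 -> 0x9A; 0x203A -> 0x9B
  0x0153 -> 0x9C; 0x017E -> 0x9E; 0x0178 -> 0x9F
  n      -> fromIntegral n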


See also


Update: When Planet Haskell picked up this post, it fixed the intentional mojibake in the title.

Screenshot of Planet Haskell with a correctly displayed diacritic. October 05, 2024. Lysxia's blog. Unicode shenanigans: Martine écrit en UTF-8

There is no room for this in my mental model. Planet Haskell is doing something wild to parse blog titles.


  1. As of September 2024, UTF-8 is used by 98.3% of surveyed web sites.↩︎

  2. The Unicode right single quote is sometimes used as an apostrophe, to much disapproval.↩︎

  3. Or I could just query the blogs directly for their titles.↩︎

by Lysxia at October 05, 2024 12:00 AM

Christopher Allen

Routines in caring for children

I have 4 children aged 4, 3, almost 2, and 19 weeks. Parents are increasingly isolated from each other socially so it's harder to compare tactics and strategies for caregiving. I want to share a run-down of how my wife and I care for our children and what has seemed to work and what has not.

by Unknown at October 05, 2024 12:00 AM

October 04, 2024

Derek Elkins

Global Rebuilding, Coroutines, and Defunctionalization

Introduction

In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.

Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.
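
As a concrete sketch of that scheme (my own illustration, not code from Okasaki; a real implementation would cache the size and live counts instead of recomputing them), deletions only mark nodes and a batched rebuild kicks in once half the tree is dead:

data Tree a = Leaf | Node Bool (Tree a) a (Tree a)  -- the Bool marks a deleted node

-- Weak update: deletion only marks the node, leaving the tree's shape intact.
delete :: Ord a => a -> Tree a -> Tree a
delete _ Leaf = Leaf
delete x (Node d l y r)
  | x < y     = Node d (delete x l) y r
  | x > y     = Node d l y (delete x r)
  | otherwise = Node True l y r

size, live :: Tree a -> Int
size Leaf = 0
size (Node _ l _ r) = 1 + size l + size r
live Leaf = 0
live (Node d l _ r) = (if d then 0 else 1) + live l + live r

-- Batched rebuild: once at least half the nodes are marked deleted, rebuild a
-- balanced tree from the in-order list of live elements.
rebuildIfNeeded :: Tree a -> Tree a
rebuildIfNeeded t
  | 2 * live t <= size t = fromOrderedList (toLiveList t)
  | otherwise            = t
  where
    toLiveList Leaf = []
    toLiveList (Node d l x r) = toLiveList l ++ (if d then [] else [x]) ++ toLiveList r
    fromOrderedList [] = Leaf
    fromOrderedList xs = let (l, x:r) = splitAt (length xs `div` 2) xs
                         in Node False (fromOrderedList l) x (fromOrderedList r)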

The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.

More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.

Coroutines

Okasaki passingly mentions that global rebuilding “can be usefully viewed as running the rebuilding transformation as a coroutine”. Also, the situation described above is quite reminiscent of garbage collection. There the classic half-space stop-the-world copying collector is naturally the batched rebuilding version. More incremental versions often have read or write barriers and break the garbage collection into incremental steps. Garbage collection is also often viewed as two processes coroutining.

The goal of this article is to derive global rebuilding-based data structures from an expression of them as two coroutining processes. Ideally, we should be able to take a data structure implemented via batched rebuilding and simply run the batch rebuilding step as a coroutine. Modifying the data structure’s operations and the rebuilding step should, in theory, just be a matter of inserting appropriate yield statements. Of course, it won’t be that easy since the batched version of rebuilding doesn’t need to worry about concurrent updates to the original data structure.

In theory, such a representation would be a perfectly effective way of articulating the global rebuilding version of the data structure. That said, I will be using the standard power move of CPS transforming and defunctionalizing to get a more data structure-like result.

I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with continuations. In that context, a “process” written in continuation-passing style “yields” to the scheduler by passing its continuation to a scheduling function. Normally, the scheduler would place that continuation at the end of a work queue and then pick up a continuation from the front of the work queue and invoke it resuming the previously suspended “process”. In our case, we only have two “processes” so our “work queue” can just be a single mutable cell. When one “process” yields, it just swaps its continuation into the cell and the other “process’” out and invokes the continuation it read.

Since the rebuilding process is always driven by the main process, the pattern is a bit more like generators. This has the benefit that only the rebuilding process needs to be written in continuation-passing style. The following is a very quick and dirty set of functions for this.

module Coroutine ( YieldFn, spawn ) where
import Control.Monad ( join )
import Data.IORef ( IORef, newIORef, readIORef, writeIORef )

type YieldFn = IO () -> IO ()

yield :: IORef (IO ()) -> IO () -> IO ()
yield = writeIORef

resume :: IORef (IO ()) -> IO ()
resume = join . readIORef

terminate :: IORef (IO ()) -> IO ()
terminate yieldRef = writeIORef yieldRef (ioError $ userError "Subprocess completed")

spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn process = do
    yieldRef <- newIORef undefined
    writeIORef yieldRef $ process (yield yieldRef) (terminate yieldRef)
    return (resume yieldRef)

A simple example of usage is:

process :: YieldFn -> Int -> IO () -> IO ()
process     _ 0 k = k
process yield i k = do
    putStrLn $ "Subprocess: " ++ show i
    yield $ process yield (i-1) k

example :: IO ()
example = do
    resume <- spawn $ \yield -> process yield 10
    forM_ [(1 :: Int) .. 10] $ \i -> do
        putStrLn $ "Main process: " ++ show i
        resume
    putStrLn "Main process done"

with output:

Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done

Queues

I’ll use queues since they are very simple and Purely Functional Data Structures describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global rebuilding. We’ll end up with something quite similar which could be made more similar by changing the rebuilding code. Indeed, the differences are just an artifact of specific, easily changed details of the rebuilding coroutine, as we’ll see.

The examples I’ll present are mostly imperative, not purely functional. There are two reasons for this. First, I’m not focused on purely functional data structures and the technique works fine for imperative data structures. Second, it is arguably more natural to talk about coroutines in an imperative context. In this case, it’s easy to adapt the code to a purely functional version since it’s not much more than a purely functional data structure stuck in an IORef.

For a more imperative structure with mutable linked structure and/or in-place array updates, it would be more challenging to produce a purely functional version. The techniques here could still be used, though there are more “concurrency” concerns. While I don’t include the code here, I did a similar exercise for a random-access stack (a fancy way of saying a growable array). There the “concurrency” concern is that the elements you are copying to the new array may be popped and potentially overwritten before you switch to the new array. In this case, it’s easy to solve, since if the head pointer of the live version reaches the source offset for copy, you can just switch to the new array immediately.

Nevertheless, I can easily imagine scenarios where it may be beneficial, if not necessary, for the coroutines to communicate more and/or for there to be multiple “rebuild” processes. The approach used here could be easily adapted to that. It’s also worth mentioning that even in simpler cases, non-constant-time operations will either need to invoke resume multiple times or need more coordination with the “rebuild” process to know when it can do more than a constant amount of work. This could be accomplished by the “rebuild” process simply recognizing this from the data structure state, or some state could be explicitly set to indicate this, or the techniques described earlier could be used, e.g. a different process for non-constant-time operations.

The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.

Batched Rebuilding Implementation

We start with the straightforward, amortized constant-time queues where we push to a stack representing the back of the queue and pop from a stack representing the front. When the front stack is empty, we need to expensively reverse the back stack to make a new front stack.

I intentionally separate out the reverse step as an explicit rebuild function.

module BatchedRebuildingQueue ( Queue, new, enqueue, dequeue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )

data Queue a = Queue {
    queueRef :: IORef ([a], [a])
}

new :: IO (Queue a)
new = do
    queueRef <- newIORef ([], [])
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q@(Queue { .. }) = do
    (front, back) <- readIORef queueRef
    case front of
        (x:front') -> do
            writeIORef queueRef (front', back)
            return (Just x)
        [] -> case back of
                [] -> return Nothing
                _ -> rebuild q >> dequeue q

enqueue :: a -> Queue a -> IO ()
enqueue x (Queue { .. }) =
    modifyIORef queueRef (\(front, back) -> (front, x:back))

rebuild :: Queue a -> IO ()
rebuild (Queue { .. }) =
    modifyIORef queueRef (\([], back) -> (reverse back, []))

Global Rebuilding Implementation

This step is where a modicum of thought is needed. We need to make the rebuild step from the batched version incremental. This is straightforward, if tedious, given the coroutine infrastructure. In this case, we incrementalize the reverse by reimplementing reverse in CPS with some yield calls inserted. Then we need to incrementalize append. Since we’re not waiting until front is empty, we’re actually computing front ++ reverse back. Incrementalizing append is hard, so we actually reverse front and then use an incremental reverseAppend (which is basically what the incremental reverse does anyway1).

One of the first things to note about this code is that the actual operations are largely unchanged other than inserting calls to resume. In fact, dequeue is even simpler than in the batched version as we can just assume that front is always populated when the queue is not empty. dequeue is freed from the responsibility of deciding when to trigger a rebuild. Most of the bulk of this code is from reimplementing a reverseAppend function (twice).

The parts of this code that require some deeper thought are 1) knowing when a rebuild should begin, 2) knowing how “fast” the incremental operations should go2 (e.g. incrementalReverse does two steps at a time and the Hood-Melville implementation has an explicit exec2 that does two steps at a time), and 3) dealing with “concurrent” changes.

For the last, Overmars describes a queue of deferred operations to perform on the shadow copy once it finishes rebuilding. This kind of suggests a situation where the “rebuild” process can reference some “snapshot” of the data structure. In our case, that is the situation we’re in, since our data structures are essentially immutable data structures in an IORef. However, it can easily not be the case, e.g. the random-access stack. Also, this operation queue approach can easily be inefficient and inelegant. None of the implementations below will have this queue of deferred operations. It is easier, more efficient, and more elegant to just not copy over parts of the queue that have been dequeued, rather than have an extra phase of the rebuilding that just pops off the elements of the front stack that we just pushed. A similar situation happens for the random-access stack.

The use of drop could probably be easily eliminated. (I’m not even sure it’s still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.

{-# LANGUAGE BangPatterns, RecordWildCards #-}
module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )

data Queue a = Queue {
    resume :: IO (),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k where
  go k = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse back [] $ \rback ->
            incrementalReverse front [] $ \rfront ->
                incrementalRevAppend rfront rback 0 backCount k
      else do
        yield k

  incrementalReverse [] acc k = k acc
  incrementalReverse [x] acc k = k (x:acc)
  incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k

  incrementalRevAppend [] front !movedCount backCount' k = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    yield k
  incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        yield k
      else if null rfront then
        incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
      else
        yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k

Defunctionalized Global Rebuilding Implementation

This step is completely mechanical.

There’s arguably no reason to defunctionalize. It produces a result that is more data-structure-like, but, unless you need the code to work in a first-order language, there’s nothing really gained by doing this. It does lead to a result that is more directly comparable to other implementations.

For some data structures, having the continuation be analyzable would provide a simple means for the coroutines to communicate. The main process could directly look at the continuation to determine its state, e.g. whether a rebuild is in progress at all. The main process could also directly manipulate the stored continuation to change the “rebuild” process’ behavior. That said, doing this would mean that we’re not deriving the implementation. Still, the opportunity for additional optimizations and simplifications is nice.
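
For example, here is a tiny sketch of my own (hypothetical, and not part of the module below; it would have to live inside DefunctionalizedQueue since resumeRef and Kont are not exported) of how the main process could check whether a rebuild is in flight simply by inspecting the stored continuation:

-- Hypothetical helper: report whether the "rebuild" coroutine still has
-- outstanding work, by pattern matching on the stored continuation.
rebuildInProgress :: Queue a -> IO Bool
rebuildInProgress q = do
    kont <- readIORef (resumeRef q)
    return $ case kont of
        IDLE -> False
        _    -> True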

As a minor aside, while it is, of course, obvious from looking at the previous version of the code, it’s neat how the Kont data type implies that the call stack is bounded and that most calls are tail calls. REVERSE_STEP is the only constructor that contains a Kont argument, but its type means that that argument can’t itself be a REVERSE_STEP. Again, I just find it neat how defunctionalization makes this concrete and explicit.

{-# LANGUAGE BangPatterns, GADTs, RecordWildCards #-}
module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse q back [] $ REVERSE_FRONT front backCount
      else do
        writeIORef resumeRef IDLE

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        writeIORef resumeRef IDLE
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'

resume :: Queue a -> IO ()
resume q = do
    kont <- readIORef (resumeRef q)
    applyKont q kont ()

data Queue a = Queue {
    resumeRef :: IORef (Kont a ()),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resumeRef <- newIORef IDLE
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q  = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

Functional Defunctionalized Global Rebuilding Implementation

This is just a straightforward reorganization of the previous code into purely functional code. This produces a persistent queue with worst-case constant time operations.

It is, of course, far uglier and more ad hoc than Okasaki’s extremely elegant real-time queues, but the methodology to derive it was simple-minded. The result is also quite similar to the Hood-Melville queues, even though I did not set out to achieve that. That said, I’m pretty confident you could derive pretty much exactly the Hood-Melville queues with just minor modifications to the Global Rebuilding Implementation.

{-# LANGUAGE BangPatterns, GADTs, RecordWildCards #-}
module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
    if backCount > frontCount then
        let q' = q { back = [], backCount = 0 } in
        incrementalReverse q' back [] $ REVERSE_FRONT front backCount
      else
        q { resumeKont = IDLE }

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
    q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
    if frontCount q <= movedCount then
        -- This drop count should be bounded by a constant.
        let !front = drop (movedCount - frontCount q) acc in
        q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }

resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()

data Queue a = Queue {
    resumeKont :: !(Kont a ()),
    front :: [a],
    back :: [a],
    frontCount :: !Int,
    backCount :: !Int
}

empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q =
    case front of
        [] -> (Nothing, q)
        (x:front') ->
            (Just x, q' { front = front', frontCount = frontCount - 1 })
  where q'@(Queue { .. }) = resume q

enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })
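
As a small illustration of the purely functional interface (a usage sketch of my own, not from the original post), persistence means an older version of the queue remains valid after further enqueues:

import qualified FunctionalQueue as Q

-- Repeatedly dequeue until the queue reports empty.
drain :: Q.Queue a -> [a]
drain q = case Q.dequeue q of
    (Nothing, _) -> []
    (Just x, q') -> x : drain q'

main :: IO ()
main = do
    let q3 = foldl (flip Q.enqueue) Q.empty [1, 2, 3 :: Int]
        q4 = Q.enqueue 4 q3
    print (drain q3)  -- [1,2,3]: unaffected by the later enqueue
    print (drain q4)  -- [1,2,3,4]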

Hood-Melville Implementation

This is just the Haskell code from Purely Functional Data Structures adapted to the interface of the other examples.

This code is mostly here for comparison. The biggest difference, other than some code structuring differences, is that the front and back lists are reversed in parallel, while my code reverses them sequentially. As mentioned before, getting a structure like that would simply be a matter of defining a parallel incremental reverse back in the Global Rebuilding Implementation.

Again, Okasaki’s real-time queue, which can be seen as an application of the lazy rebuilding and scheduling techniques described in his thesis and book, is a better implementation than this in pretty much every way.

{-# LANGUAGE BangPatterns #-}
module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
    case exec (exec state) of
        Done newf -> Queue lenf newf Idle lenr r
        newstate -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
    if lenr <= lenf then exec2 lenf f state lenr r
    else let newstate = Reversing 0 f [] r []
         in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
    let !q' = check (lenf-1) f' (invalidate state) lenr r in
    (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)

Okasaki’s Real-Time Queues

Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is in the first clause of exec, and there we discard its elements. In other words, we only care about the length of s. s gets “decremented” on each enqueue and dequeue until it’s empty, at which point we rotate r to f in the second clause of exec. The key thing is that f and s are initialized to the same value in that clause. That means each time we “decrement” s we are also forcing a bit of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an incremental version of xs ++ reverse ys ++ a (where we use the invariant length ys = 1 + length xs for the base case).

Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically force the thunks incrementally rather than all at once. Here s is a schedule telling us when to force parts of f. (As mentioned, s also serves as a counter telling us when to perform a rebuild.)

{-# LANGUAGE BangPatterns #-}
module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r [] = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s 

It’s instructive to compare the above to the following implementation which doesn’t use a schedule. This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple dequeues via rotate. The following implementation performs extremely well in my benchmark, but the operations are subtly not constant-time. Specifically, after a long series of enqueues, a dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f will be a nested series of rotate calls, one for every doubling of the length of the queue. Even if we change let f' to let !f', that will only make the first dequeue cheap. The second will still be expensive.

{-# LANGUAGE BangPatterns #-}
module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1) 
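
Both of the last two modules share the same rotate function. To make the claim that rotate xs ys a incrementally computes xs ++ reverse ys ++ a (under the invariant length ys = 1 + length xs) concrete, here is a small property-test sketch of my own; it assumes the QuickCheck package, and rotate is copied out since neither module exports it:

import Test.QuickCheck

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

-- Under the invariant length ys == 1 + length xs, rotate xs ys a
-- produces xs ++ reverse ys ++ a, one cons at a time.
prop_rotate :: [Int] -> [Int] -> Property
prop_rotate xs a =
    forAll (vector (length xs + 1)) $ \ys ->
        rotate xs ys a === xs ++ reverse ys ++ a

main :: IO ()
main = quickCheck prop_rotate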

Empirical Evaluation

I won’t reproduce the evaluation code as it’s not very sophisticated or interesting. It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce an enqueue over a dequeue so that the queues would grow. It measured the average time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.
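
For concreteness, here is a rough sketch of my own of the kind of driver described above; it is not the actual evaluation code, it assumes the random package, it exercises only the IO-based batched queue, and it records only the maximum single-dequeue time:

import Control.Monad (foldM)
import GHC.Clock (getMonotonicTimeNSec)
import System.Random (randomRIO)
import qualified BatchedRebuildingQueue as Q

-- Run a random workload that is 80% enqueues and report the maximum
-- time (in nanoseconds) spent in any single dequeue.
main :: IO ()
main = do
    q <- Q.new
    worst <- foldM (step q) 0 [1 .. 1000000 :: Int]
    putStrLn ("max dequeue time (ns): " ++ show worst)
  where
    step q worst i = do
        roll <- randomRIO (1, 10 :: Int)
        if roll <= 8
          then Q.enqueue i q >> return worst
          else do
              t0 <- getMonotonicTimeNSec
              _ <- Q.dequeue q
              t1 <- getMonotonicTimeNSec
              return (max worst (t1 - t0))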

The main thing I wanted to see was relatively stable average enqueue and dequeue times with only the batched implementation having a growing maximum dequeue time. This is indeed what I saw, though it took about 1,000,000 operations (or really a queue of a couple hundred thousand elements) for the numbers to stabilize.

The results were mostly unsurprising. In overall time, the batched implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s a good chance my measurement of its average enqueue time was largely a measurement of the timer’s resolution.) The operations’ average times were stable, illustrating their constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue time to the average stabilized around 7000 to 1, except, of course, for the batched version, whose ratio grew linearly, reaching millions to 1 at queue sizes of tens of millions of elements. This illustrates the worst-case time complexity of all the other implementations, and the merely amortized time complexity of the batched one.

While the batched version was best in overall time, the difference wasn’t that great: the worst implementations were still less than 1.4x slower. All the worst-case optimal implementations performed roughly the same, but there were still some clear winners and losers. Okasaki’s real-time queue is almost on par with the batched implementation in overall time and handily beats the other implementations in average enqueue and dequeue times. The main surprise for me was that the loser was the Hood-Melville queue. My guess is that this is due to invalidate, which seems like it would do more work and produce more garbage than the approach taken in my functional version.

Conclusion

The point of this article was to illustrate the process of deriving a deamortized data structure from an amortized one utilizing batched rebuilding by explicitly modeling global rebuilding as a coroutine.

The point wasn’t to produce the fastest queue implementation, though I am pretty happy with the results. While this is an extremely simple example, it was still nice that each step was very easy and natural. It’s especially nice that this derivation approach produced a better result than the Hood-Melville queue.

Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue with worst-case constant-time operations.


  1. This code could definitely be refactored to leverage this similarity to reduce code. Alternatively, one could refunctionalize the Hood-Melville implementation at the end.↩︎

  2. Going “too fast”, so long as it’s still a constant amount of work for each step, isn’t really an issue asymptotically, so you can just crank the knobs if you don’t want to think too hard about it. That said, going faster than you need to will likely give you worse worst-case constant factors. In some cases, going faster than necessary could reduce constant factors, e.g. by better utilizing caches and disk I/O buffers.↩︎

October 04, 2024 08:24 AM

Edward Z. Yang

What’s different this time? LLM edition

One of the things that I learned in grad school is that even if you've picked an important and unsolved problem, you need some reason to believe it is solvable--especially if people have tried to solve it before! In other words, "What's different this time?" This is perhaps a dreary way of shooting down otherwise promising research directions, but you can flip it around: when the world changes, you can ask, "What can I do now that I couldn't do before?"

This post is a list of problems in areas that I care about (half of this is PL flavor, since that's what I did my PhD in), where I suspect something has changed with the advent of LLMs. It's not a list of recipes; there is still hard work to figure out how exactly an LLM can be useful (for most of these, just feeding the entire problem into ChatGPT usually doesn't work). But I often talk to people who want to get started on something, anything, but have no idea where to start. Try here!

Static analysis. The chasm between academic static analysis work and real world practice is the scaling problems that come with trying to apply the technique to a full size codebase. Asymptotics strike as LOC goes up, language focused techniques flounder in polyglot codebases, and "Does anyone know how to write cmake?" But this is predicated on the idea that static analysis has to operate on a whole program. It doesn't; humans can do perfectly good static analysis on fragments of code without having to hold the entire codebase in their head, without needing access to a build system. They make assumptions about APIs and can do local reasoning. LLMs can play a key role in drafting these assumptions so that local reasoning can occur. What if the LLM gets it wrong? Well, if an LLM could get it wrong, an inattentive junior developer might get it wrong too--maybe there is a problem in the API design. LLMs already do surprisingly well if you one-shot prompt them to find bugs in code; with more traditional static analysis support, maybe they can do even better.

DSL purgatory. Consider a problem that can be solved with code in a procedural way, but only by writing lots of tedious, error prone boilerplate (some examples: drawing diagrams, writing GUIs, SQL queries, building visualizations, scripting website/mobile app interactions, end to end testing). The PL dream is to design a sweet compositional DSL that raises the level of abstraction so that you can render a Hilbert curve in seven lines of code. But history also abounds with cases where the DSL did not solve the problems, or maybe it did solve the problem but only after years of grueling work, and so there are still many problems that feel like there ought to be a DSL that should solve them but there isn't. The promise of LLMs is that they are extremely good at regurgitating low level procedural actions that could conceivably be put together in a DSL. A lot of the best successes of LLMs today are in putting coding power in the hands of domain experts who otherwise do not know how to code; could they also help in putting domain expertise in the hands of people who can code?

I am especially interested in these domains:

  • SQL - Its strange syntax purportedly makes it easier for non-software engineers to understand, whereas many (myself included) would often prefer a more functional syntax ala LINQ/list comprehensions. It's pretty hard to make an alternate SQL syntax take off though, because SQL is not one language, but many many dialects everywhere with no obvious leverage point. That sounds like an LLM opportunity. Or heck, just give me one of those AI editor environments but specifically fine tuned for SQL/data visualization, don't even bother with general coding.
  • End to end testing - This is https://momentic.ai/ but personally I'm not going to rely on a proprietary product for testing in my OSS projects. There's definitely an OSS opportunity here.
  • Scripting website/mobile app interactions - The website scraping version of this is https://reworkd.ai/ but I am also pretty interested in this from the browser extension angle: to some extent I can take back control of my frontend experience with browser extensions; can I go further with LLMs? And we typically don't imagine that I can do the same with a mobile app... but maybe I can??

OSS bread and butter. Why is Tesseract still the number one OSS library for OCR? Why is smooth and beautiful text to voice not ubiquitous? Why is the voice control on my Tesla so bad? Why is the wake word on my Android device so unreliable? Why isn't the screenshot parser on a fansite for my favorite mobage able to parse out icons? The future has arrived, but it is not uniformly distributed.

Improving the pipeline from ephemeral to durable stores of knowledge. Many important sources of knowledge are trapped in "ephemeral" stores, like Discord servers, private chat conversations, Reddit posts, Twitter threads, blog posts, etc. In an ideal world, there would be a pipeline of this knowledge into more durable, indexable forms for the benefit of all, but actually doing this is time consuming. Can LLMs help? Note that the dream of LLMs is that you can just feed all of this data into the model and ask it questions. I'm OK with something a little bit more manual; we don't have to solve RAG first.

by Edward Z. Yang at October 04, 2024 04:30 AM

October 02, 2024

Ken T Takusagawa

[mlzpqxqu] import with type signature

proposal for a Haskell language extension: when importing a function from another module, one may optionally also specify a type signature for the imported function.  this would be helpful for code understanding.  the reader would have immediately available the type of the imported symbol, not having to go track down the type in the source module (which may be many steps away when modules re-export symbols, and the source module might not even have a type annotation), nor use a tool such as ghci to query it.  (maybe the code currently fails to compile for other reasons, so ghci is not available.)

if a function with the specified type signature is not exported by an imported module, the compiler can offer suggestions of other functions exported by the module which do have, or unify with, the imported type signature.  maybe the function got renamed in a new version of the module.

or, the compiler can do what Hoogle does and search among all modules in its search path for functions with the given signature.  maybe the function got moved to a different module.

the specified type signature may be narrower than how the function was originally defined.  this can limit some of the insanity caused by the Foldable Traversable Proposal (FTP):

import Prelude(length :: [a] -> Int) -- prevent length from being called on tuples and Maybe

various potentially tricky issues:

  1. a situation similar to the diamond problem (multiple inheritance) in object-oriented programming: module A defines a polymorphic function f, imported then re-exported by modules B and C.  module D imports both B and C, unqualified.  B imports and re-exports f from A with a type signature more narrow than originally defined in A.  C does not change the type signature.  what is the type of f as seen by D?  which version of f, which path through B or C, does D see?  the solution might be simple: if the versions of f seen through the different paths are not identical, then the user has to qualify.

  2. the following tries to make List.length available only for lists, and Foldable.length available for anything else.  is this asking for trouble?

    import Prelude hiding(length);
    import qualified Prelude(length :: [a] -> Int) as List;
    import qualified Prelude(length) as Foldable;

by Unknown (noreply@blogger.com) at October 02, 2024 12:43 AM

October 01, 2024

Haskell Interlude

56: Satnam Singh

Today on the Haskell Interlude, Matti and Sam are joined by Satnam Singh. Satnam has been a lecturer at Glasgow, and Software Engineer at Google, Meta, and now Groq. He talks about convincing people to use Haskell, laying out circuits and why community matters.

PS: After the recording, it was important to Satnam to clarify that his advice to “not be afraid to lose your job” was specifically meant to encourage quitting jobs that are not good for you, if possible, but he acknowledges that unfortunately not everybody can afford that risk.

by Haskell Podcast at October 01, 2024 05:00 PM

Brent Yorgey

Retiring BlogLiterately

Posted on October 1, 2024

Way back in 2012 I took over maintainership of the BlogLiterately tool from Robert Greayer, its initial author. I used it for many years to post to my Wordpress blog, added a bunch of features, solved some fun bugs, and created the accompanying BlogLiterately-diagrams plugin for embedding diagrams code in blog posts. However, now that I have fled Wordpress and rebuilt my blog with hakyll, I don’t use BlogLiterately any more (there is even a diagrams-pandoc package which does the same thing BlogLiterately-diagrams used to do). So, as of today I am officially declaring BlogLiterately unsupported.

The fact is, I haven’t actually updated BlogLiterately since March of last year. It currently only builds on GHC 9.4 or older, and no one has complained, which I take as strong evidence that no one else is using it either! However, if anyone out there is actually using it, and would like to take over as maintainer, I would be very happy to pass it along to you.

I do plan to continue maintaining HaXml and haxr, at least for now; unlike BlogLiterately, I know they are still in use, especially HaXml. However, BlogLiterately was really the only reason I cared about these packages personally, so I would be happy to pass them along as well; please get in touch if you would be willing to take over maintaining one or both packages.


by Brent Yorgey at October 01, 2024 12:00 AM