I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!
Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!
Submission Deadline: 9 February 2025. Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.
Submit your talk and share your wisdom with the FP community.
We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, and share how switching to Vite transformed our development workflow, with concrete numbers and benchmarks demonstrating the dramatic improvements in build times, startup speed, and overall developer experience.
Create React App: A Historical Context
Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.
However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.
Let’s dive into some of the most noticeable limitations.
Performance Issues
CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.
Here’s a simplified version of what happens every time you make a code change:
CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph
It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand
Finally, it bundles everything together into a single package that can be served to your browser
Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files, both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.
Security Issues
When running npx create-react-app <project-directory>, after waiting for a while, a long list of deprecation warnings (23 packages as of this writing) is shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) appears. In other words, create-react-app relies on packages with known security vulnerabilities.
Support Issues
The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last version was published to npm 3 years ago.
While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.
Introducing Vite
Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.
Vite was created by Evan You, author of Vue, in 2020 to solve the complexity, slowness and the heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that has rated it as the Most Loved Library Overall, No.1 Most Adopted (+30%) and No.2 Highest Retention (98%) in the State of JS 2024 Developer Survey.
In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.
Why Vite is Faster
Vite applies several modern web technologies to improve the development experience:
1. Native ES Modules (ESM)
In development, Vite serves source code over native ES modules, essentially letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as it is imported by the browser, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.
2. Efficient Hot Module Replacement (HMR)
By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.
3. Optimized Build Tooling
Even though ES modules are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.
When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.
While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.
By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.
Practical Results
The Migration Process
The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.
Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:
Adjusting tsconfig.json to align with Vite’s requirements
All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.
Most challenges we encountered were related to environment variables and path aliases. These were easily resolved using Vite’s documentation, and its vibrant community has produced extensive resources, guides, and solutions for even the most specialized setups.
Build Time
The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.
This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.
Dev Server Startup Time
The speed at which the development server starts can greatly impact the development workflow, especially in large projects.
The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.
This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.
HMR Update Time
While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.
This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.
Bundle Size
Another essential factor is the size of the final bundled application, which affects load times and overall performance.
The migration reduced the raw bundle size by 27.5% and the gzipped size by 9.3%. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.
The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.
The Hidden Advantage: Reduced Context Switching
One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.
By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.
Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.
The Future of Vite is Bright
Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.
Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.
These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.
Early benchmarks already show impressive performance improvements.
With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.
Conclusion
Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.
However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.
The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.
The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:
Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
Try to beat inflation by investing in the stock market and other risk-on investments.
Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.
(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)
High living and melting ice cubes
Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.
But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. With those, there’s little to no risk of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.
Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.
The investment option
With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is the hope that, between price movements and dividends, the value of their investment will grow faster than anything else they can choose.
You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?
Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, but by providing actual value.
So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.
Michael (Midas) Mulligan
Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.
He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.
And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.
No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.
That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.
Michael (Crappy Investor) Snoyman
I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.
I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.
So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.
I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.
And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.
Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than gambling at a casino.
And yes, I ultimately lost a massive amount of money through this.
Normal people shouldn’t invest
Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.
(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not enough to justify the returns people anticipate from making “good” gambles.)
The only reason most people ever consider this is that they feel forced into it; otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between risk, stress, and time investment, their lives will often get worse.
One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.
Option 4: deflation
The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system exists so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.
If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.
And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.
So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.
Can Bitcoin go to 0? Certainly, though year by year that becomes less and less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.
I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.
This blog post is in the style of my previous blog post on Matrix. I'm reviewing and sharing my onboarding experience with a new technology. I'm sharing in the hopes that it will help others become aware of this new technology, understand what it can do, and, if they're intrigued, have a more pleasant onboarding experience. Just keep in mind: I'm in no way an expert on this. PRs welcome to improve the content here!
I’d describe Nostr as decentralized social media. It’s a protocol for people to identify themselves via public key cryptography (thus no central identity service), publish various kinds of information, access it through any compatible client, and interact with anyone else. At its simplest, it’s a Twitter/X clone, but it offers much more functionality than that (though I’ve barely scratched the surface).
Nostr has a high overlap with the Bitcoin ecosystem, including built-in micropayments (zaps) via the Lightning Network, an instantaneous peer-to-peer payment layer built on top of Bitcoin.
I'll start off by saying: right now, Nostr's user experience is not on a par with centralized services like X. But I can see a lot of promise. The design of the protocol encourages widespread innovation, as demonstrated by the plethora of clients and other tools to access the protocol. Decentralized/federated services are more difficult to make work flawlessly, but the advantages in terms of freedom of expression, self-custody of your data, censorship resistance, and ability to build more featureful tools on top of it make me excited.
I was skeptical (to say the least) about the idea of micropayments built into social media. But I'm beginning to see the appeal. Firstly, getting away from an advertiser-driven business model fixes the age old problem of "if you're not paying for a service, you're the product." But I see a deeper social concept here too. I intend to blog more in the future on the topic of non-monetary competition and compensation. But in short, in the context of social media: every social network ends up making its own version of imaginary internet points (karma, moderator privileges, whatever you want to call it). Non-monetary compensation has a lot of downsides, which I won't explore here. Instead, making the credit system based on money with real-world value has the promise to vastly improve social media interactions.
Did that intrigue you enough to want to give this a shot? Awesome! Let me give you an overview of the protocol, and then we'll dive into my recommendation on getting started.
Protocol overview
The basics of the protocol can be broken down into:
Relays
Events
Identities
Clients
As a decentralized protocol, Nostr relies on public key cryptography for identities. That means, when you interact on the network, you'll use a private key (represented as an nsec value) to sign your messages, and will be identified by your public key (represented as an npub value). Anyone familiar with Bitcoin or cryptocurrency will be familiar with the keys vs wallet divide, and it lines up here too. Right off the bat, we see the first major advantage of Nostr: no one controls your identity except you.
Clients are how a user will interact with the protocol. You'll provide your identity to the client in one of a few ways:
Directly entering your nsec. This is generally frowned upon since it opens you up to exploits, though most mobile apps work by direct nsec entry.
Getting a view-only experience in clients that support it by entering your npub.
Using a signing tool to perform the signing on behalf of the client without giving away your private keys to everyone. (This matches most web3 interactions that rely on a wallet browser extension.)
Events are a general-purpose concept, and are the heart of Nostr interaction. Events can represent a note (similar to a Tweet), articles, likes, reposts, profile updates, and more. Anything you do on the protocol involves creating and signing an event. This is also the heart of Nostr's extensibility: new events can be created to support new kinds of interactions.
Finally there are relays. Relays are the servers of the Nostr world, and are where you broadcast your events to. Clients will typically configure multiple relays, broadcast your events to those relays, and query relays for relevant events for you (such as notes from people you follow, likes on your posts, and more).
Getting started
This is where the suboptimal experience really exists for Nostr. It took me a few days to come up with a setup that worked reliably. I'm going to share what worked best for me, but keep in mind that there are many other options. I'm a novice; other guides may give different recommendations and you may not like my selection of tools. My best recommendation: don't end up in shell shock like I did. Set up any kind of a Nostr profile, introduce yourself with the #introductions hashtag, and ask for help. I've found the community unbelievably welcoming.
Alright, so here are the different pieces you're going to need for a full experience:
Browser extension for signing
Web client
Mobile client
Lightning wallet
A Nostr address
I'm going to give you a set of steps that I hope both provides easy onboarding while still leaving you with the ability to more directly control your Nostr experience in the future.
Lightning wallet: coinos
First, you're going to set up a Lightning wallet. There are a lot of options here, and there are a lot of considerations between ease-of-use, self-custody, and compatibility with other protocols. I tried a bunch. My recommendation: use coinos. It's a custodial wallet (meaning: they control your money and you're trusting them), so don't put any large sums into it. But coinos is really easy to use, and supports Nostr Wallet Connect (NWC). After you set up your account, click on the gear icon, and then click on "Reveal Connection String." You'll want to use that when setting up your clients. Also, coinos gives you a Lightning address, which will be <username>@coinos.io. You'll need that for setting up your profile on Nostr.
Web client: YakiHonne
I tried a bunch of web clients and had problems with almost all of them. I later realized most of my problems seemed to be caused by incorrectly set relays, which we'll discuss below. In any event, I ultimately chose YakiHonne. It also has mobile iOS and Android clients, so you can have a consistent experience. (I also used the Damus iOS client, which is also wonderful.)
Go to the homepage, click on the Login button in the bottom-left, and then choose "Create an account." You can add a profile picture and banner image, choose a display name, and add a short description. In the signup wizard, you'll see an option to let YakiHonne set up a wallet (meaning a Lightning wallet) for you. I chose not to rely on this and used coinos instead to keep more flexibility for swapping clients in the future. If you want to simplify, however, you can just use the built-in wallet.
Before going any further, make sure you back up your nsec secret key!!! Click on your username in the bottom-left, then settings, and then "Your keys." I recommend saving both your nsec and npub values in your password manager.
No, this isn't my actual set of keys, this was a test profile I set up while writing this post.
Within that settings page, click on "wallets," then click on the plus sign next to add wallets, and choose "Nostr wallet connect." Paste the NWC string you got from coinos, and you'll be able to zap people money!
Next, go back to settings and choose "Edit Profile." Under "Lightning address," put your <username>@coinos.io address. Now you'll also be able to receive zaps from others.
Another field you'll notice on the profile is NIP-05. That's your Nostr address. Let's talk about getting that set up.
NIP-05 Nostr address
Remembering a massive npub address is a pain. Instead, you'll want to set up a NIP-05 address. (NIP stands for Nostr Implementation Possibilities, you can see NIP-05 on GitHub.) There are many services—both paid and free—to get a NIP-05 address. You can see a set of services on AwesomeNostr. Personally, I decided to set up an identifier on my own domain. You can see my live nostr.json file, which at time of writing supports:
michael@snoyman.com and an alias snoyberg@snoyman.com
The special _@snoyman.com, which actually means "the entire domain itself"
And an identifier for my wife as well, miriam@snoyman.com
If you host this file yourself, keep in mind these two requirements:
You cannot have any redirects at that URL! If your identifier is name@domain, the URL https://domain/.well-known/nostr.json?name=<name> must resolve directly to this file.
You need to set CORS headers appropriately to allow for web client access, specifically the response header access-control-allow-origin: *.
Once you have that set up, add your Nostr address to your profile on YakiHonne. Note that you'll need the hex version of your npub, which you can generate using the Nostr Army Knife.
Bonus: I decided to also set up my own Lightning wallet address, by rehosting the Lightning config file from https://coinos.io/.well-known/lnurlp/snoyberg on my domain at https://snoyman.com/.well-known/lnurlp/michael.
Signer
As far as I can tell, Alby provides the most popular Nostr signing browser extension. The only problem I had with it was confusion about all the different things it does. Alby provides a custodial lightning wallet via Alby Hub, plus a mobile Alby Go app for accessing it, plus a browser extension for Nostr signing, and that browser extension supports using both the Alby Hub wallet and some other lightning wallets. I did get it all to work together, and it's a pleasant experience.
However, to keep things a bit simpler and single-task focused, I'll recommend trying out the nos2x extension first. It's not pretty, but it handles the signer piece very well. Install the extension, enter the nsec you got from YakiHonne, click save, and you're good to go. If you go to another Nostr client, like noStrudel, you should be able to "sign in with extension."
You may also notice that there's an entry area for "preferred relays." We'll discuss relays next. Feel free to come back to this step and add those relays. (And, after you've done that, you can also use a nostr.json generator to help you self-host your NIP-05 address if you're so inclined.)
Final note: once you've done the initial setup, it's not clear how to get back to the nos2x settings page. Right-click the extension, click manage extension, and then choose "extension options." At least those were the steps in Brave; it may be slightly different in other browsers.
Relays
This has been my biggest pain point with Nostr so far. Everything you do with Nostr needs to be sent to relays or received from relays. You want to have a consistent and relatively broad set of relays to make sure your view of the world is consistent. If you don't, you'll likely end up with things like mismatched profiles across relays, messages that seem to disappear, and more. This was probably my biggest stumbling block when first working with Nostr.
There seem to be three common ways to set the list of relays:
Manually entering the relays in the client's settings.
Getting the list of relays from the signer extension (covered by NIP-07).
Getting the list of relays from your NIP-05 file.
Unfortunately, it looks like most clients don't support the latter two methods. So in practice, any time you start using a new client, you should check the relay list and manually sync it up with the list of relays you maintain.
You can look at my nostr.json file for my own list of relays. One relay in particular I was recommended to use is wss://hist.nostr.land. This relay will keep track of your profile and follow list updates. As I mentioned, it's easy to accidentally partially-override your profile information through inconsistent relay lists, and apparently lots of new users (myself included) end up doing this. If you go to hist.nostr.land you can sign in, find your historical updates, and restore old versions.
Mobile
You're now set up on your web experience. For mobile, download any mobile app and set it up similarly to what I described for web. The major difference will be that you'll likely be entering your nsec directly into the mobile app.
I've used both Damus and YakiHonne. I had better luck with YakiHonne for getting zaps working reliably, but that may simply be because I tried Damus before I'd gotten set up with coinos. I'll probably try out Damus some more soon.
Note on Damus: I had trouble initially with sending Zaps on Damus, but apparently that's because of Apple rules. You can enable Zaps by visiting this site on your device: https://zap.army/. Thanks to William Casarin for the guidance and the great app.
Introductions
You should now be fully set up to start interacting on Nostr! As a final step, I recommend you start off by sending an introduction note. This is a short note telling the world a bit about yourself, with the #introductions hashtag. For comparison, here's my introduction note (or a Nostr-native URL).
And in addition, feel free to @ me in a note as well, I'd love to meet other people on Nostr who joined up after reading this post. My identifier is michael@snoyman.com. You can also check out my profile page on njump, which is a great service to become acquainted with.
And now that you're on Nostr, let me share my experiences with the platform so far.
My experience
I'm definitely planning to continue using Nostr. The community has a different feel to my other major social media hub, X, which isn't surprising. There's a lot more discussion of Bitcoin and economics, which I love. There's also, at least subjectively, more of a sense of having fun versus X. I described it as joyscrolling versus doomscrolling.
Nostr is a free speech haven. It's quite literally impossible to fully silence someone. People can theoretically be banned from specific relays, but a banned user could always just use other relays or create new keys. There's no KYC process to stop them. I've only found one truly vile account so far, and it was easy enough to just ignore. This fits very well with my own personal ethos. I'd rather people have a public forum to express any opinion, especially the opinions I most strongly disagree with, including calls to violence. I believe the world is better for allowing these opinions to be shared, debated, and (hopefully) exposed as vapid.
The process of zapping is surprisingly engaging. The amount of money people send around isn't much. The most common zap amount is 21 satoshis, which at the current price of Bitcoin is just about 2 US cents. Unless you become massively popular, you're not going to retire on zaps. But it's far more meaningful to receive a zap than a like; it means someone parted with something of actual value because you made their day just a little bit better. And likewise, zapping someone else has the same feeling. It's also possible to tip providers of clients and other tools, which is a fundamental shift from the advertiser-driven web of today.
I'd love to hear from others about their own experiences! Please reach out with your own findings. Hopefully we'll all be able to push social media into a more open, healthy, and fun direction.
The GHC developers are very pleased to announce the release of GHC 9.12.1.
Binary distributions, source distributions, and documentation are available at
downloads.haskell.org.
We hope to have this release available via ghcup shortly.
GHC 9.12 will bring a number of new features and improvements, including:
The new language extension OrPatterns allowing you to combine multiple
pattern clauses into one.
The MultilineStrings language extension to allow you to more easily write
strings spanning multiple lines in your source code (see the sketch after this list).
Improvements to the OverloadedRecordDot extension, allowing the built-in
HasField class to be used for records with fields of non-lifted representations.
The NamedDefaults language extension has been introduced allowing you to
define defaults for typeclasses other than Num.
More deterministic object code output, controlled by the -fobject-determinism
flag, which improves determinism of builds a lot (though does not fully do so)
at the cost of some compiler performance (1-2%). See #12935 for the details.
GHC now accepts type syntax in expressions as part of GHC Proposal #281.
The WASM backend now has support for TemplateHaskell.
Experimental support for the RISC-V platform with the native code generator.
… and many more
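To give a flavour of two of the new extensions, here is a minimal sketch of MultilineStrings and NamedDefaults based on the accepted GHC proposals rather than on the release notes themselves; the exact details are best checked against the user’s guide shipped with 9.12:

```haskell
{-# LANGUAGE MultilineStrings  #-}
{-# LANGUAGE NamedDefaults     #-}
{-# LANGUAGE OverloadedStrings #-}

module Sketch where

import Data.String (IsString)
import Data.Text   (Text)

-- NamedDefaults: a default declaration naming a class other than Num,
-- so ambiguous IsString-constrained literals resolve to Text.
default IsString (Text)

-- MultilineStrings: triple-quoted literals may span several lines,
-- with the common leading indentation stripped.
banner :: Text
banner =
  """
  GHC 9.12.1 is out.
  See the release notes for the full list of changes.
  """
```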
A full accounting of changes can be found in the release notes.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status page.
We would like to thank GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
This is the twenty-fifth edition of our GHC activities report, which describes
the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain.
The current edition covers roughly the months of September to November 2024.
You can find the previous editions collected under the
ghc-activities-report tag.
Sponsorship
We are delighted to offer Haskell Ecosystem Support Packages to provide
commercial users with access to Well-Typed’s experts, while investing in the
Haskell community and its technical ecosystem. Clients will both fund the work described in this report
and support the Haskell Foundation. If your company is using Haskell,
read more about our offer, or
get in touch with us today, so we can help you get
the most out of the toolchain. We need more funding to continue our essential maintenance work!
Many thanks to our existing sponsors who make this work possible:
Anduril and Juspay. In
addition, we are grateful to Mercury for funding
specific work on improved performance for developer tools on large codebases.
Ben released GHC 9.8.3 and
has been working towards the upcoming release of GHC 9.8.4, which will make
some packaging improvements and update libraries distributed with GHC.
Zubin has been preparing for the release of GHC 9.12 series by releasing
9.12-alpha1,
9.12-alpha2
and
9.12-alpha3.
The release candidate and final release are expected soon.
Cabal-3.14.0.0 was released in September, adding initial support for the new
hooks build-type we implemented to replace custom setup scripts, as part of our
work for the Sovereign Tech Fund on Cabal long-term
maintainability.
Corresponding new versions of cabal-install and the release of the
Cabal-hooks library are due soon, at which point it will become easier for
users to explore replacing their custom setup scripts with the hooks feature.
Related to this effort:
Sam amended the Cabal-hooks version to match
the Cabal library version (#10579).
Rodrigo made progress on Sam’s work to simplify the way cabal-install uses
the Cabal library to build packages. This will make the code easier to work
with in the future, and should improve build performance
(#9871).
Rodrigo made cabal-install invoke git clone concurrently when downloading
from git repositories for source-repository-package stanzas or cabal get,
and switched it to use shallow clones by default
(#10254).
This speeds up the cloning step significantly.
Rodrigo’s work on private dependencies
(#9743) is slowly making progress,
thanks to recent work by Kristen Kozak.
Rodrigo and Matthew fixed various minor Cabal bugs, improved documentation and
future-proofed against future core libraries changes
(#10311,
#10404,
#10415 and
#10433).
HLS
Hannes finished off and merged support for a new “jump to instance definition”
feature in HLS
(#4392),
which will make it easier for users to understand which typeclass instance is in
use in a particular expression.
GHC
Exception backtraces
Rodrigo worked on improving several facets of the exception backtraces story:
Improved the rendering of uncaught exceptions so the default output is much
clearer and easier to understand (CLC proposal
#285,
!13301). This included reformatting the output, avoiding exposing internal
implementation details in the call stacks, and reducing duplication.
Changed functions such as catch to propagate the original cause
if another exception is subsequently thrown
(CLC proposal #202).
Landed a patch by Ben so that the HasCallStack-based backtrace for an error call is more informative (!12620, #24807).
Overall, exception backtraces will be much more useful in GHC 9.12 and later, making it easier to debug Haskell applications.
Frontend
Andreas added new primops, is[Mutable]ByteArrayWeaklyPinned#, which allow
checking whether a bytearray can be moved by the RTS, as per CLC proposal #283.
Sam fixed a GHC panic involving out-of-scope pattern synonyms (#25056, !13092).
Sam augmented the -fdiagnostics-as-json output to include the reason an error or warning was emitted (!13577, #25403).
Matthew and Rodrigo deprecated the unused -Wcompat-unqualified-imports warning
(!12755, #24904, !13349, #25330).
Ben improved the parsing and parser errors for sizes in GHC RTS flags
(!12384, #20201).
Matthew fixed a bug in the interaction of -working-dir and foreign files
(!13196, #25150).
Zubin bumped the Haddock binary interface version to 46, to improve errors
when there are mismatched interface files (!13342).
SIMD in the NCG backend
Sam and Andreas teamed up to finish the mega-MR adding SIMD support to GHC’s
X86 native code generator backend (!12860, and see our previous report for
more background).
This also fixed critical correctness bugs that affected SIMD support in the
existing LLVM backend, such as #25062 and #25169.
Sam followed this up with user’s guide documentation for the feature (!13880)
and a couple of additional fixes for bugs that have been reported since
(!13561, !13612).
LLVM backend
Sam implemented several fixes relating to the LLVM backend, in collaboration
with GHC contributor @aratamizuki:
fix bugs involving fltused to ensure that GHC can use the LLVM backend
on Windows once more (#22487, !13183).
Matthew bumped the LLVM upper bound to allow GHC to use LLVM 19 (!13311, #25295).
RISC-V backend
Andreas added support for floating-point min/max operations in the RISC-V
NCG backend (!13325).
Matthew fixed some issues to do with the fact that
the RISC-V backend does not yet support SIMD vectors (!13327, #25314, #13327).
Object code determinism
Rodrigo merged !12680, which goes 95% of the way towards ensuring GHC produces
fully deterministic object code (#12935).
Rodrigo made the unique generation used by the LLVM backend deterministic
(!13307, #25274), thus making GHC 96% object-code deterministic.
Rodrigo ensured that re-exports did not spoil determinism of interface files
in !13316 (#25304).
Compiler performance
Matthew, Rodrigo and Adam published a proposal for Explicit Level
Imports. The
proposed language feature will allow users of Template Haskell to communicate
more precise dependencies for quotes and splices, which can unlock significant
compile-time performance improvements and is a step towards better
cross-compilation support.
Rodrigo improved the performance of module reachability queries (!13593),
which can significantly reduce compile times and memory usage for projects
with very large numbers of modules.
Andreas introduced a new flag, -fmax-forced-spec-args, to control the
maximum size of specialised functions introduced when using the SPEC keyword
(!13184, #25197). This avoids a potential compile-time performance cliff
caused by specialisations with excessively large numbers of arguments.
Matthew greatly reduced the memory footprint of linking with the Javascript
backend by making some parts of the compiler more lazy (!13346).
Ben fixed the encoding of breakpoint instructions in the RTS to account for
a recent addition of support for inlining breakpoints (!13423, #25374).
Ben tightened up the documentation and invariants in the bytecode interpreter
(!13565).
Ben allowed GNU-style non-executable stack notes to be used on FreeBSD
(!13587, #25475).
Ben fixed an incorrect EINTR check in the timerfd ticker (!13588, #25477).
Zubin ensured that any new cost centres added by freshly-loaded objects are
correctly included in the eventlog (!13114, #24148).
Ben increased the gen_workspace alignment in the RTS from 64 bytes to 128
bytes in order to prevent false sharing on Apple’s ARMv8 implementation, which
uses a cache-line size of 128 bytes (!13594, #25459).
Ben removed some incorrect platform-dependent pointer casts in the RTS
(!13597).
Zubin fixed a segfault when using the non-moving GC with profiling
(!13271, #25232).
Ben fixed a stack overrun error with i386 adjustors (!13599, #25485).
Ben introduced a convenience printIPE debugging function for printing
info-provenance table entries (!13614).
Andreas fixed a crash that could happen if an exception and the
compacting GC aligned in specific ways (#24791, !13640).
Documentation
Ben fleshed out missing documentation of the eventlog format
in the user’s guide (!13398, #25296).
Ben documented the :where GHCi command (!13399, #24509).
Andreas clarified the documentation of -fexpose-overloaded-unfoldings
(!13286, #24844).
Ben documented that GHC coalesces adjacent sub-word size fields of
data constructors (!13397).
Ben improved the documentation of equality constraints to mention which
language extensions are required to use them (!12395, #24127).
Codebase improvements
Ben fixed a few warnings in the runtime system, including some FreeBSD specific
warnings (!13586).
Ben fixed (!13394, #25362) some incomplete pattern matches in GHC.Internal.IO.Windows.Handle
that came to light in !13308, a refactor of the desugarer.
Andreas also chipped in, squashing some warnings in ghc-heap (!13510).
Matthew refactored the partitionByWorkerSize function to avoid spurious
pattern-match warning bugs when compiling with -g3 (!13359, #25338).
Matthew removed the hs-boot file for Language.Haskell.Syntax.ImpExp and
introduced one for GHC.Hs.Doc, which better reflects the intended modular
hierarchy of the modules (!13406).
GHC API
We are pleased to see that the Haskell Foundation and Tweag are resuming
efforts aimed at defining a stable API for GHC.
Ben added lookupTHName, a more convenient way to look up a name in a
GHC plugin that re-uses Template Haskell lookup functions (!12432, #24741).
Zubin made sure that the driverPlugin is correctly run for static plugins
(!13199, #25217).
Libraries
Andreas allowed unknown FD device types in the setNonBlockingMode function
(!13204, #25199), as per CLC proposal #282. This fixes a regression in the hinotify
package on GHC 9.10.
Andreas made sure that all primops are re-exported in the ghc-experimental
package via the module GHC.PrimOps (!13245), and changed the versioning
scheme of ghc-experimental to follow GHC versions (!13344, #25289).
Ben fixed a performance regression in throw by judicious insertion of
noinline (!13275, #25066), as discussed in CLC proposal #290.
Matthew unwired the base package to pave the way for a reinstallable base
package (!13200).
Matthew upgraded GHC’s Unicode support to Unicode 16 (!13514).
Matthew enabled late cost-centres by default when the libraries distributed
with GHC are built for profiling (!10930, #21732). This greatly improves the
resolution of cost centre stacks when profiling.
Andreas fixed a bug in which profiling could affect program behaviour by
allowing profiling ticks to move past unsafeCoerce# in Core (!13413, #25212).
Build system
Ben fixed the configure script incorrectly reporting that subsections-via-symbols
is enabled on AArch64/Darwin (!12834, #24962).
The first alpha pre-release of 9.12 incorrectly had a version number of
9.12.20241014 instead of 9.12.0.20241014, which broke the expected
lexicographic ordering of GHC releases. Ben added a check for the validity of
a release GHC version number to prevent such issues in the future (!13456).
Matthew allowed GHC to build with happy-2.0.2 (!13318, #25276), and
Ben with happy-2.1.2 (!13532, #25438).
Andreas made the Hadrian progress messages include the working directory,
to allow quickly distinguishing builds when multiple builds are in progress
(!13353, #25335).
Ben allowed Haddock options to be passed as Hadrian key-value settings (!11006).
Testsuite
Ben improved the reporting of certain errors in the testsuite (!13332).
Ben ensured performance metrics are collected even when there are untracked
files (!13579, #25471).
Ben ensured performance metrics are properly pushed for WebAssembly jobs
(!13312).
Zubin made several improvements to the testsuite, in particular regarding
normalisation of tests and fixing a Haddock bug involving files with the same
modification time (!13418, !13522).
CI
Matthew added an i386 validation job that can be triggered by adding the
i386 label on an MR (!13352).
Matthew added a mechanism for only triggering certain individual jobs in
CI by using the ONLY_JOBS variable (!13350).
Matthew fixed some issues of variable inheritance in the ghcup-metadata
testing job (!13306).
The Stackage team is happy to announce that Stackage LTS version 23 was finally released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series, which was the longest-lived LTS major release to date (with probable final snapshot lts-22.43).
We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of wide Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.
LTS 23 includes many package changes, and almost 3200 packages!
Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen.
(The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)
At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this.
This initial release build was made by Jens Petersen (64 commits).
Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled.
You can see all the changes made relative to the preceding last 9.8 nightly snapshot.
Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section.
We also have some tracking issues still open related to 9.10 core boot libraries.
Thank you to all those who have already done work updating their packages for ghc-9.10.
Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.
(Based on Gilbert and Sullivan’s ‘Major General song’)
John Longley
I am the very model of an Informatics lecturer, For educating students you will never find a betterer. I teach them asymptotics with a rigour that’s impeccable, I’ll show them how to make their proofs mechanically checkable. On parsing algorithms I can hold it with the best of them, With LL(1) and CYK and Earley and the rest of them. I’ll teach them all the levels of the Chomsky hierarchy… With a nod towards that Natural Language Processing malarkey.
I’ll summarize the history of the concept of a function, And I’ll tell them why their Haskell code is ‘really an adjunction’. In matters mathematical and logical, etcetera, I am the very model of an Informatics lecturer.
For matters of foundations I’m a genuine fanaticker: I know by heart the axioms of Principia Mathematica, I’m quite au fait with Carnap and with Wittgenstein’s Tractatus, And I’ll dazzle you with Curry, Church and Turing combinators. I’ll present a proof by Gödel with an algebraic seasoning, I’ll instantly detect a step of non-constructive reasoning. I’ll tell if you’re a formalist or logicist or Platonist… For I’ll classify your topos by the kinds of objects that exist.
I’ll scale the heights of cardinals from Mahlo to extendible, I’ll find your favourite ordinals and stick them in an n-tuple. In matters philosophical, conceptual, etcetera, I am the very essence of an Informatics lecturer.
And right now I’m getting started on my personal computer, I’ve discovered how to get it talking to the Wifi router. In Internet and World Wide Web I’ve sometimes had my finger dipped, And once I wrote a line of code in HTML/Javascript. [Sigh.] I know I have a way to go to catch up with my students, But I try to face each lecture with a dash of common prudence. When it comes to modern tech: if there’s a way to get it wrong, I do! But that seems to be forgiven if I ply them with a song or two.
So… although my present IT skills are rather rudimentary, And my knowledge of computing stops around the nineteenth century, Still, with help from all my colleagues and my audience, etcetera… I’ll be the very model of an Informatics lecturer.
Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.
Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.
CodeWorld Recap
If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.
In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:
A few primitive pictures: circles, rectangles, and the like
Transformations to rotate, translate, scale, clip, and recolor an image
Compositions to overlay and combine multiple pictures into a more complex picture.
Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.
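As a rough illustration (my sketch, not from the post, using the codeworld-api style names as I remember them, which may differ slightly): the picture is just a value, composed declaratively, and nothing is drawn until the whole thing is handed to drawingOf.

import CodeWorld

-- A red circle sitting on top of a rectangle outline.
scene :: Picture
scene = colored red (translated 0 2 (solidCircle 1)) & rectangle 8 4

main :: IO ()
main = drawingOf scene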
Debugging in CodeWorld
CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.
There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.
Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
There’s also an “inspect” mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it’s an overlay of three other pictures: a background and two characters, each transformed in some way, with the base picture for each transformation itself being some other overlay of multiple parts…). This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts.
Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.
The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.
This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.
While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.
Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.
In this blog post we will introduce a new open source Haskell library called
debuggable, which provides various utilities designed to
make it easier to debug your applications. Some of these are intended for use
during actual debugging, others are designed to be a regular part of your
application, ready to be used if and when necessary.
Non-interleaved output
Ever see output like this when debugging concurrent applications?
ATnhdi st hiiss ai sm eas smaegses afgreo mf rtohme tfhier sste ctohnrde atdh
read
AndT htihsi si si sa am emsessasgaeg ef rformo mt hteh ef isrescto ntdh rteharde
ad
TAhnids tihsi sa imse sas amgees sfargoem ftrhoem ftihres ts etchorneda dt
hread
The problem is that concurrent calls to putStrLn can result in interleaved
output. To solve this problem, debuggable offers
Debug.NonInterleavedIO, which provides variants of
putStrLn and friends, as well as trace and its variants, all of which can
safely be called concurrently without ever resulting in interleaved output.
For example:
import Debug.NonInterleavedIO qualified as NIIO

useDebuggable :: IO ()
useDebuggable = do
    concurrently_
      ( replicateM_ 10 $ do
          NIIO.putStrLn "This is a message from the first thread"
          threadDelay 100_000
      )
      ( replicateM_ 10 $ do
          NIIO.putStrLn "And this is a message from the second thread"
          threadDelay 100_000
      )
If we run this as-is, we will only see
niio output to /tmp/niio2418318-0
on the terminal; inspecting /tmp/niio2418318-0 will show
And this is a message from the second thread
This is a message from the first thread
And this is a message from the second thread
This is a message from the first thread
...
If you want to send the output to a specific file (or /dev/stdout for output
to the terminal), you can set the NIIO_OUTPUT environment variable.
Provenance
Provenance is about tracking what was called, when, and where.
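The example itself is not preserved in this excerpt; presumably it is a small chain of calls along these lines, each with a HasCallStack constraint (a sketch, not the author's exact code):

import GHC.Stack (HasCallStack, callStack, prettyCallStack)

f1 :: HasCallStack => IO ()
f1 = f2    -- line 12: f1 calls f2

f2 :: HasCallStack => IO ()
f2 = f3    -- line 15: f2 calls f3

f3 :: HasCallStack => IO ()
f3 = putStrLn $ prettyCallStack callStack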
The callstack we get from this example looks something like this:
CallStack (from HasCallStack):
f3, called at Demo/Provenance.hs:15:6 in ..
f2, called at Demo/Provenance.hs:12:6 in ..
Callstacks are awesome, and a huge help during debugging, but there are some
minor issues with this example:
Personally, this has always felt a bit “off by one” to me: the first entry
tells us that we are in f3, but we were called from line 15, which is
f2; likewise, the second entry in the stack tells us that we are in f2,
but we were called from line 12, which is f1. Not a huge deal, but
arguably a bit confusing.
(See also GHC ticket #25546: Make HasCallStack include the caller.)
Somewhat relatedly, when we are in f3, and ask for a CallStack, being
told that we are in f3 is not particularly helpful (we knew that already).
Finally, it is sometimes useful to have just the “first” entry in the
callstack; “we were called from line such and such, which is function so and
so”.
For this reason, Debug.Provenance provides a CallSite
abstraction
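The listing and sample output are missing from this excerpt; here is a hedged reconstruction, assuming Debug.Provenance exports callSite and prettyCallSite (the former inferred from the callSiteWithLabel variant mentioned below):

import Debug.Provenance (callSite, prettyCallSite)

g2 :: HasCallStack => IO ()
g2 = g3                                  -- line 31: the call to g3 in g2

g3 :: HasCallStack => IO ()
g3 = putStrLn $ prettyCallSite callSite  -- prints something like "g2 -> g3 (Demo/CallSite.hs:31:6)"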
where line 31 is the call to g3 in g2. Due to the (alleged) “off-by-one”,
both g2 and g3 must be given a HasCallStack constraint; otherwise we get
{unknown} -> g3 (Demo/CallSite.hs:31:6)
when g2 lacks the constraint, or
{unknown} -> {unknown} ()
when g3 does. There is also a variant callSiteWithLabel, which results in
output such as
g2 -> g3 (Demo/CallSite.hs:31:6, "foo")
Invocations
Sometimes we are not so much interested in where we are called from, but how
often a certain line in the source is called.
Debug.Provenance offers “invocations” to track this:
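The definition of g4 is missing from this excerpt; a sketch of what it plausibly looks like, assuming Debug.Provenance exposes newInvocation and prettyInvocation (names assumed, not taken from the post):

import Debug.Provenance (newInvocation, prettyInvocation)

g4 :: HasCallStack => IO ()
g4 = do
    i <- newInvocation             -- captures the source location plus a per-site counter
    putStrLn $ prettyInvocation i  -- e.g. "g4 (Demo/Invocations.hs:8:10) #1"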
The definition of g4 above is still a little clunky, especially if we also
want to include other output than just the invocation itself. We can do better:
import Debug.NonInterleavedIO.Scoped qualified as Scoped

g4 :: HasCallStack => IO ()
g4 = do
    Scoped.putStrLn "start"
    -- g4 does something ..
    Scoped.putStrLn "middle"
    -- g4 does something else ..
    Scoped.putStrLn "end"
As the name suggests, though, there is more going on here than simply a more
convenient API: Debug.Provenance.Scope offers a combinator
called scoped for scoping invocations:
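The accompanying listing isn't preserved here; a minimal sketch of the idea, assuming scoped simply wraps an action and pushes the caller's invocation onto a scope that Scoped.putStrLn prefixes to its output:

import Debug.Provenance.Scope (scoped)

f4 :: HasCallStack => IO ()
f4 = scoped $ do
    -- everything g4 prints via Scoped.putStrLn is now prefixed with
    -- the invocation of f4 that established this scope
    g4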
The counters that are part of an Invocation can be very useful to
cross-reference output messages from multiple threads. Continuing with the g4
example we introduced in the section on Scope, suppose we have
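The concrete listing is also missing from this excerpt; presumably it is something along these lines, where the #1, #2, … counters in each invocation let us tell the two threads' otherwise identical messages apart (a sketch, not the author's code):

import Control.Concurrent.Async (concurrently_)
import Control.Monad (replicateM_)

multipleThreads :: IO ()
multipleThreads =
    concurrently_
      (replicateM_ 2 g4)  -- thread 1
      (replicateM_ 2 g4)  -- thread 2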
Suppose we have some functions which take another function, a callback,
as argument, and invoke that callback at some point:
f1 :: HasCallStack => (Int -> IO ()) -> IO ()
f1 k = f2 k

f2 :: HasCallStack => (Int -> IO ()) -> IO ()
f2 k = scoped $ k 1
Let’s use this example callback function:
g1 :: HasCallStack => Int -> IO ()
g1 n = g2 n

g2 :: HasCallStack => Int -> IO ()
g2 n = Scoped.putStrLn $ "n = " ++ show n ++ " at " ++ prettyCallStack callStack
and invoke f1 as follows:
withoutDebuggable :: HasCallStack => IO ()
withoutDebuggable = f1 g1
This outputs:
[g2 (Demo/Callback.hs:26:8) #1, f2 (Demo/Callback.hs:20:8) #1]
n = 1 at CallStack (from HasCallStack):
g2, called at Demo/Callback.hs:23:8 in ..
g1, called at Demo/Callback.hs:29:24 in ..
withoutDebuggable, called at Demo.hs:25:36 in ..
Confusingly, this callstack does not include any calls to f1 or f2. This
happens because the call to k in f2 does not pass the current CallStack;
instead we see the CallStack as it was when we defined g1.
For callbacks like this it is often useful to have two pieces of information:
the CallStack that shows how the callback is actually invoked, and the
CallSite where the callback was defined.
Debug.Provenance.Callback provides a Callback
abstraction that does exactly this. A Callback m a b is essentially a function
a -> m b, modulo treatment of the CallStack. Let’s change f1 and f2 to
take a Callback instead:
h1 :: HasCallStack => Callback IO Int () -> IO ()
h1 k = h2 k

h2 :: HasCallStack => Callback IO Int () -> IO ()
h2 k = scoped $ invokeCallback k 1
[g2 (Demo/Callback.hs:26:8) #1, h2 (Demo/Callback.hs:39:8) #1]
n = 1 at CallStack (from HasCallStack):
g2, called at Demo/Callback.hs:23:8 in ..
g1, called at Demo/Callback.hs:42:30 in ..
callbackFn, called at src/Debug/Provenance/Callback.hs:57:48 in ..
invoking callback defined at useDebuggable (Demo/Callback.hs:42:21), called at ..
h2, called at Demo/Callback.hs:36:8 in ..
h1, called at Demo/Callback.hs:42:17 in ..
useDebuggable, called at Demo.hs:26:36 in ..
Alternative: profiling backtraces
In addition to HasCallStack-style backtraces, there may also be other types of
backtraces available, depending on how we build and how we run the code (we
discuss some of these in the context of exception handling in episode 29 of the
Haskell Unfolder). The most important of these is probably the
profiling (cost centre) backtrace.
We can request the “current” callstack with currentCallStack, and the
callstack attached to an object (“where was this created”) using whoCreated.
This allows us to make similar distinctions that we made in Callback, for
example:
f1 :: (Int -> IO ()) -> IO ()
f1 k = do
    cs <- whoCreated k
    putStrLn $ "f1: invoking callback defined at " ++ show cs
    f2 k

f2 :: (Int -> IO ()) -> IO ()
f2 k = k 1

g1 :: Int -> IO ()
g1 n = g2 n

g2 :: Int -> IO ()
g2 n = do
    cs <- currentCallStack
    putStrLn $ "n = " ++ show n ++ " at " ++ show cs
This does require the code to be compiled with profiling enabled. The profiling
callstacks are sometimes more useful than HasCallStack callstacks, and sometimes
worse; for example, in
demo :: Maybe Int -> IO ()
demo Nothing  = f1 (\x -> g1 x)
demo (Just i) = f1 (\x -> g1 (x + i))
the function defined in the Just case will have a useful profiling callstack,
but since the function defined in the Nothing case is entirely static (does
not depend on any runtime info), its callstack is reported as
["MAIN.DONT_CARE (<built-in>)"]
It would be useful to extend debuggable with support for both types of
backtraces in a future release.
Performance considerations
Adding permanent HasCallStack constraints to functions does come at a slight
cost, since they correspond to additional arguments that must be passed at
runtime. For most functions this is not a huge deal; personally, I consider some
well-placed HasCallStack constraints part of designing with debugging in mind.
That said, you will probably want to avoid adding HasCallStack constraints to
functions that get called repeatedly in tight inner loops; similar
considerations also apply to the use of the Callback abstraction.
Conclusions
Although debuggable is a small library, it offers some functionality that has
proven quite useful in debugging applications, especially concurrent ones.
We can probably extend it over time to cover more use cases; “design for
debuggability” is an important principle, and is made easier with proper
library support. Contributions and comments are welcome!
As a side note, the tracing infrastructure of debuggable can also be combined
with the recover-rtti package, which implements some dark magic
to recover runtime type information by looking at the heap; in particular, it
offers
anythingToString :: forall a. a -> String
which can be used to print objects without having a Show a instance available
(though this is not the only use of recover-rtti). The only reason that
debuggable doesn’t provide explicit support for this is that the dependency
footprint of recover-rtti is a bit larger.
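For illustration (my example, not from the post; the Config type is made up), anythingToString lets you print a value that has no Show instance:

import Debug.RecoverRTTI (anythingToString)

data Config = Config { host :: String, port :: Int }  -- deliberately no Show instance

main :: IO ()
main = putStrLn $ anythingToString (Config "localhost" 8080)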
Today, 2024-12-04, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 37th episode of the Haskell Unfolder live on YouTube.
In this episode of the Haskell Unfolder, we are going to try solving the latest problem of this year’s Advent of Code live.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
The GHC developers are happy to announce the availability of GHC 9.8.4. Binary
distributions, source distributions, and documentation are available on the
release page.
This release is a small release fixing a few issues noted in 9.8.3, including:
Update the filepath submodule to avoid a misbehavior of splitFileName under Windows.
Update the unix submodule to fix a compilation issue on musl platforms
Fix a potential source of miscompilation when building large projects on 32-bit platforms
Fix unsound optimisation of prompt# uses
A full accounting of changes can be found in the release notes. As
some of the fixed issues do affect correctness, users are encouraged to
upgrade promptly.
We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and
other anonymous contributors whose on-going financial and in-kind support has
facilitated GHC maintenance and release management over the years. Finally,
this release would not have been possible without the hundreds of open-source
contributors whose work comprise this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
When writing a small tool to interface with Keycloak I found an endpoint that
requires the content type to be application/json while the body should be plain
text. (The details are in the issue.) Since servant assumes that the content
type and the content match (I know, I'd always thought that was a safe
assumption to make too) it doesn't work with ReqBody '[JSON] Text. Instead I
had to create a custom type that's a combination of JSON and PlainText,
something that turned out to require surprisingly little code:
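The code itself is not included in this excerpt; the following is a sketch of the general shape such a content type can take (the JSONText name and the exact instances are my guesses, not the author's code):

{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings     #-}

import Data.Bifunctor (first)
import qualified Data.ByteString.Lazy as LBS
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8', encodeUtf8)
import Network.HTTP.Media ((//))
import Servant.API (Accept (..), MimeRender (..), MimeUnrender (..))

-- Advertises application/json as its content type, but renders and parses
-- the request body as plain text.
data JSONText

instance Accept JSONText where
  contentType _ = "application" // "json"

instance MimeRender JSONText Text where
  mimeRender _ = LBS.fromStrict . encodeUtf8

instance MimeUnrender JSONText Text where
  mimeUnrender _ = first show . decodeUtf8' . LBS.toStrict

It can then be used in the API type as ReqBody '[JSONText] Text.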
I'm working on a side project that is written in Rust on the backend and the frontend. The frontend component is in Leptos. Our app is about 20kLOC in total, so it takes a little time.
Last month I did a fairly complex piece of systems programming that
worked surprisingly well. But it had one big bug that took me a day
to track down.
One reason I find the bug interesting is that it exemplifies the sort
of challenges that come up in systems programming. The essence of
systems programming is that your program is dealing with the state of
a complex world, with many independent agents it can't control, all
changing things around. Often one can write a program that puts down
a wrench and then picks it up again without looking. In systems
programming, the program may have to be prepared for the possibility
that someone else has come along and moved the wrench.
The other reason the bug is interesting is that although it was a big
bug, fixing it required only a tiny change. I often struggle to
communicate to nonprogrammers just how finicky and fussy programming
is. Nonprogrammers, even people who have taken a programming class or
two, are used to being harassed by crappy UIs (or by the compiler)
about missing punctuation marks and trivially malformed inputs, and
they think they understand how fussy programming is. But they usually
do not. The issue is much deeper, and I think this is a great example
that will help communicate the point.
The job of my program, called sync-spam, was to move several weeks of
accumulated email from system S to system T. Each message was
probably spam, but its owner had not confirmed that yet, and the
message was not yet old enough to be thrown away without confirmation.
The probably-spam messages were stored on system S in
a directory hierarchy with paths like this:
/spam/2024-10-18/…
where 2024-10-18 was the date the message had been received. Every
message system S had received on October 18 was somewhere under
/spam/2024-10-18.
One directory, the one for the current date, was "active", and new
messages were constantly being written to it by some other programs
not directly related to mine. The directories for the older dates
never changed. Once sync-spam had dealt with the backlog of old messages, it
would continue to run, checking periodically for new messages in the
active directory.
The sync-spam program had a database that recorded, for each message, whether
it had successfully sent that message from S to T, so that it
wouldn't try to send the same message again.
The program worked like this:
Repeat forever:
    Scan the top-level spam directory for the available dates
    For each date D:
        Scan the directory for D and find the messages in it.
        Add to the database any messages not already recorded there.
        Query the database for the list of messages for date D that
            have not yet been sent to T
        For each such message:
            Attempt to send the message
            If the attempt was successful, record that in the database
    Wait some appropriate amount of time and continue.
Okay, very good. The program would first attempt to deal with all the
accumulated messages in roughly chronological order, processing the
large backlog. Let's say that on November 1 it got around to scanning
the active 2024-11-01 directory for the first time. There are many
messages, and scanning takes several minutes, so by the time it
finishes scanning, some new messages will be in the active directory that
it hasn't seen. That's okay. The program will attempt to send the
messages that it has seen. The next time it comes around to
2024-11-01 it will re-scan the directory and find the new messages
that have appeared since the last time around.
But scanning a date directory takes several minutes, so we would
prefer not to do it if we don't have to. Since only the active
directory ever changes, if the program is running on November 1, it
can be sure that none of the directories from October will ever change
again, so there is no point in its rescanning them. In fact, once we
have located the messages in a date directory and recorded them in the
database, there is no point in scanning it again unless it is the
active directory, the one for today's date.
So sync-spam had an elaboration that made it much more efficient. It was
able to put a mark on a date directory that meant "I have completely
scanned this directory and I know it will not change again". The
algorithm was just as I said above, except with these elaborations.
Repeat forever:
    Scan the top-level spam directory for the available dates
    For each date D:
        If the directory for D is marked as having already been scanned, we
            already know exactly what messages are in it, since they are
            already recorded in the database.
        Otherwise:
            Scan the directory for D and find the messages in it.
            Add to the database any messages not already recorded there.
            If D is not today's date, mark the directory for D as having
                been scanned completely, because we need not scan it again.
        Query the database for the list of messages for date D that
            have not yet been sent to T
        For each such message:
            Attempt to send the message
            If the attempt was successful, record that in the database
    Wait some appropriate amount of time and continue.
It's important to not mark the active directory as having been
completely scanned, because new messages are continually being
deposited into it until the end of the day.
I implemented this, we started it up, and it looked good. For several
days it processed the backlog of unsent messages from
September and October, and it successfully sent most of them. It
eventually caught up to the active directory for the current date, 2024-11-01, scanned
it, and sent most of the messages. Then it went back and started over
again with the earliest date, attempting to send any messages that it
hadn't sent the first time.
But a couple of days later, we noticed that something was wrong.
Directories 2024-11-02 and 2024-11-03 had been created and were
well-stocked with the messages that had been received on those dates. The
program had found the directories for those dates and had marked them as having been
scanned, but there were no messages from those dates in its database.
Now why do you suppose that is?
(Spoilers will follow the horizontal line.)
I investigated this in two ways. First, I made sync-spam's logging more
detailed and looked at the results. While I was waiting for more logs
to accumulate, I built a little tool that would generate a small,
simulated spam directory on my local machine, and then I ran sync-spam
against the simulated messages, to make sure it was doing what I
expected.
In the end, though, neither of these led directly to my solving the
problem; I just had a sudden inspiration. This is very unusual for
me. Still, I probably wouldn't have had the sudden inspiration if the
information from the logging and the debugging hadn't been percolating
around my head. Fortune favors the prepared mind.
The problem was this: some other agent was creating the 2024-11-02
directory a bit prematurely, say at 11:55 PM on November 1.
Then sync-spam came along in the last minutes of November 1 and started its
main loop. It scanned the spam directory for available dates, and
found 2024-11-02. It processed the unsent messages from the
directories for earlier dates, then looked at 2024-11-02 for the
first time. And then, at around 11:58, as per above it would:
Scan the directory for 2024-11-02 and find the messages in it.
Add to the database any messages not already recorded there.
    (There weren't any yet, because it was still 11:58 on November 1.)
If 2024-11-02 is not today's date, mark the directory as having been
    scanned completely, because we need not scan it again.
Since the 2024-11-02 directory was not the one for today's date —
it was still 11:58 on November 1 — sync-spam recorded that it had
scanned that directory completely and need
not scan it again.
Five minutes later, at 00:03 on November 2, there would be new
messages in the 2024-11-02 directory, which was now the active one, but
sync-spam wouldn't look for them, because it had already marked 2024-11-02
as having been scanned completely.
This complex problem in this large program was completely fixed by changing:
if ($date ne $self->current_date) {
    $self->mark_this_date_fully_scanned($date_dir);
}
to:
if ($date lt $self->current_date) {
    $self->mark_this_date_fully_scanned($date_dir);
}
(ne and lt are Perl-speak for "not equal to" and "less than".)
Many organizations have their own version of a certain legend,
which tells how a famous
person from the past was once called out of retirement to solve a
technical problem that nobody else could understand. I first heard
the General Electric version of the legend, in which
Charles Proteus Steinmetz
was called out of retirement to figure out why a large complex of
electrical equipment was not working.
In the story, Steinmetz walked around the room, looking briefly at
each of the large complicated machines. Then, without a word, he took
a piece of chalk from his pocket, marked one of the panels, and
departed. When the puzzled engineers removed that panel, they found a
failed component, and when that component was replaced, the problem
was solved.
Steinmetz's consulting bill for $10,000 arrived the following week.
Shocked, the bean-counters replied that $10,000 seemed an exorbitant fee
for making a single chalk mark, and, hoping to embarrass him into
reducing the fee, asked him to itemize the bill.
Steinmetz returned the itemized bill:
One chalk mark ..................... $1.00
Knowing where to put it ........ $9,999.00
TOTAL ......................... $10,000.00
This felt like one of those times. Any day when I can feel a
connection with Charles Proteus Steinmetz is a good day.
This episode also makes me think of the following variation on an old joke:
A: Ask me what is the most difficult thing about systems programming.
The GHC developers are very pleased to announce the availability
of the release candidate for GHC 9.12.1. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org.
We hope to have this release available via ghcup shortly.
GHC 9.12 will bring a number of new features and improvements, including:
The new language extension OrPatterns allowing you to combine multiple
pattern clauses into one.
The MultilineStrings language extension to allow you to more easily write
strings spanning multiple lines in your source code.
Improvements to the OverloadedRecordDot extension, allowing the built-in
HasField class to be used for records with fields of non-lifted representations.
The NamedDefaults language extension has been introduced allowing you to
define defaults for typeclasses other than Num.
More deterministic object code output, controlled by the -fobject-determinism
flag, which improves determinism of builds a lot (though does not fully do so)
at the cost of some compiler performance (1-2%). See #12935 for the details.
GHC now accepts type syntax in expressions as part of GHC Proposal #281.
The WASM backend now has support for TemplateHaskell.
… and many more
A full accounting of changes can be found in the release notes.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status.
We would like to thank GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprise this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
Suppose we have a list of items of length \(n\), and we want to
consider windows (i.e. contiguous subsequences) of width \(w\)
within the list.
A list of numbers, with contiguous size-3 windows highlighted
We can compute the sum of each window by brute
force in \(O(nw)\) time, by simply generating the list of all the
windows and then summing each. But, of course, we can do better: keep
track of the sum of the current window; every time we slide the window
one element to the right we can add the new element that enters the
window on the right and subtract the element that falls off the window
to the left. Using this “sliding window” technique, we can compute the
sum of every window in only \(O(n)\) total time instead of \(O(nw)\).
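As a quick illustration (my sketch, not from the post), here is the running-sum version of that idea, where each new sum is the previous one plus the entering element minus the leaving one:

windowSums :: Int -> [Int] -> [Int]
windowSums w xs = scanl (+) (sum front) (zipWith (-) rest xs)
  where
    -- front is the first window; rest pairs each entering element with
    -- the element leaving w positions earlier (via zipWith against xs)
    (front, rest) = splitAt w xs

-- windowSums 3 [1,4,2,8,9] == [7,14,19]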
How about finding the maximum of every window? Of course the brute
force \(O(nw)\) algorithm still works, but doing it in only \(O(n)\) is
considerably trickier! We can’t use the same trick as we did for sums
since there’s no way to “subtract” the element falling off the left.
This really comes down to the fact that addition forms a group
(i.e. a monoid-with-inverses), but max does not. So more
generally, the question is: how can we compute a monoidal summary
for every window in only \(O(n)\) time?
Today I want to show you how to solve this problem using one of my
favorite competitive programming tricks, which fits beautifully in a
functional context. Along the way we’ll also see how to implement
simple yet efficient functional queues.
Stacks
Before we get to queues, we need to take a detour through stacks.
Stacks in Haskell are pretty boring. We can just use a list, with the
front of the list corresponding to the top of the stack. However, to
make things more interesting—and because it will come in very handy
later—we’re going to implement monoidally-annotated stacks. Every
element on the stack will have a measure, which is a value from some
monoid m. We then want to be able to query any stack for the total
of all the measures in \(O(1)\). For example, perhaps we want to always
be able to find the sum or max of all the elements on a stack.
If we wanted to implement stacks annotated by a group, we could just
do something like this:
data GroupStack g a = GroupStack (a -> g) !g [a]
That is, a GroupStack stores a measure function, which assigns to
each element of type a a measure of type g (which is intended to
be a Group); a value of type g representing the sum (via the group
operation) of measures of all elements on the stack; and the actual
stack itself. To push, we would just compute the measure of the new element
and add it to the cached g value; to pop, we subtract the measure of
the element being popped, something like this:
push :: a -> GroupStack g a -> GroupStack g a
push a (GroupStack f g as) = GroupStack f (f a <> g) (a : as)

pop :: GroupStack g a -> Maybe (a, GroupStack g a)
pop (GroupStack f g as) = case as of
  []        -> Nothing
  (a : as') -> Just (a, GroupStack f (inv (f a) <> g) as')
But this won’t work for a monoid, of course. The problem is pop, where
we can’t just subtract the measure for the element being
popped. Instead, we need to be able to restore the measure of a
previous stack. Hmmm… sounds like we might be able to use… a stack! We
could just store a stack of measures alongside the stack of elements;
even better is to store a stack of pairs. That is, each element on
the stack is paired with an annotation representing the sum of all the
measures at or below it. Here, then, is our representation of
monoidally-annotated stacks:
{-# LANGUAGE BangPatterns #-}

module Stack where

data Stack m a = Stack (a -> m) !Int [(m, a)]
A Stack m a stores three things:
A measure function of type a -> m. Incidentally, what if we want
to be able to specify an arbitrary measure for each element, and
even give different measures to the same element at different
times? Easy: just use (m,a) pairs as elements, and use fst as
the measure function.
An Int representing the size of the stack. This is not strictly
necessary, especially since one could always just use a monoidal
annotation to keep track of the size; but wanting the size is so
ubiquitous that it seems convenient to just include it as a special
case.
The aforementioned stack of (annotation, element) pairs.
Note that we cannot write a Functor instance for Stack m, since
a occurs contravariantly in (a -> m). But this makes sense: if we
change all the a values, the cached measures would no longer be valid.
When creating a new, empty stack, we have to specify the measure
function; to get the measure of a stack, we just look up the measure
on top, or return mempty for an empty stack.
new :: (a -> m) -> Stack m a
new f = Stack f 0 []

size :: Stack m a -> Int
size (Stack _ n _) = n

measure :: Monoid m => Stack m a -> m
measure (Stack _ _ as) = case as of
  []         -> mempty
  (m, _) : _ -> m
Now let’s implement push and pop. Both are relatively
straightforward.
push :: Monoid m => a -> Stack m a -> Stack m a
push a s@(Stack f n as) = Stack f (n + 1) ((f a <> measure s, a) : as)

pop :: Stack m a -> Maybe (a, Stack m a)
pop (Stack f n as) = case as of
  []           -> Nothing
  (_, a) : as' -> Just (a, Stack f (n - 1) as')
Note that if we care about using non-commutative monoids,
in the implementation of push we have a choice to make between f a <> measure s and measure s <> f a. The former seems nicer to me,
since it keeps the measures “in the same order” as the list
representing the stack. For example, if we push a list of elements
onto a stack via foldr, using the measure function (:[]) that injects
each element into the monoid of lists, the resulting measure is just
the original list:
measure . foldr push (new (:[])) == id
And more generally, for any measure function f, we have
measure . foldr push (new f) == foldMap f
Finally, we are going to want a function to reverse a stack, which
is a one-liner:
reverse :: Monoid m => Stack m a -> Stack m a
reverse (Stack f _ as) = foldl' (flip push) (new f) (map snd as)
That is, to reverse a stack, we extract the elements and then use
foldl' to push the elements one at a time onto a new stack using the
same measure function.
Now that we have monoidally-annotated stacks under our belt, let’s
turn to queues. And here’s where my favorite trick is revealed: we
can implement a queue out of two stacks, so that enqueue and dequeue
run in \(O(1)\) amortized time; and if we use monoidally-annotated
stacks, we get monoidally-annotated queues for free!
First, some imports.
{-# LANGUAGE ImportQualifiedPost #-}

module Queue where

import Data.Bifunctor (second)
import Stack (Stack)
import Stack qualified as Stack
A Queue m a just consists of two stacks, one for the front and one
for the back. To create a new queue, we just create two new stacks;
to get the size of a queue, we just add the sizes of the stacks; to
get the measure of a queue, we just combine the measures of the
stacks. Easy peasy.
type CommutativeMonoid = Monoid

data Queue m a = Queue {getFront :: Stack m a, getBack :: Stack m a}
  deriving (Show, Eq)

new :: (a -> m) -> Queue m a
new f = Queue (Stack.new f) (Stack.new f)

size :: Queue m a -> Int
size (Queue front back) = Stack.size front + Stack.size back

measure :: CommutativeMonoid m => Queue m a -> m
measure (Queue front back) = Stack.measure front <> Stack.measure back
Note the restriction to commutative monoids, since the queue
elements are stored in different orders in the front and back stacks.
If we really cared about making this work with non-commutative
monoids, we would have to make two different push methods for the
front and back stacks, to combine the measures in opposite orders.
That just doesn’t seem worth it. But if you have a good example
requiring the use of a queue annotated by a non-commutative monoid,
I’d love to hear it!
Now, to enqueue, we just push the new element on the back:
enqueue :: CommutativeMonoid m => a -> Queue m a -> Queue m a
enqueue a (Queue front back) = Queue front (Stack.push a back)
Dequeueing is the magic bit that makes everything work. If there are
any elements in the front stack, we can just pop from there.
Otherwise, we need to first reverse the back stack into the front
stack. This means dequeue may occasionally take \(O(n)\) time, but it’s
still \(O(1)\) amortized. The easiest way to see this is to note that
every element is touched exactly three times: once when it is pushed
on the back; once when it is transferred from the back to the front;
and once when it is popped from the front. So, overall, we do \(O(1)\)
work per element.
dequeue :: CommutativeMonoid m => Queue m a -> Maybe (a, Queue m a)
dequeue (Queue front back)
  | Stack.size front == 0 && Stack.size back == 0 = Nothing
  | Stack.size front == 0 = dequeue (Queue (Stack.reverse back) front)
  | otherwise = second (\front' -> Queue front' back) <$> Stack.pop front
Finally, for convenience, we can make a function drop1 which just
dequeues an item from the front of a queue and throws it away.
drop1 :: CommutativeMonoid m => Queue m a -> Queue m a
drop1 q = case dequeue q of
  Nothing      -> q
  Just (_, q') -> q'
This “banker’s queue” method of building a queue out of two stacks is
discussed in Purely Functional Data Structures by Okasaki, though I
don’t think he was the first to come up with the idea. It’s also
possible to use some clever tricks to make both enqueue and
dequeue take \(O(1)\) time in the worst
case.
In a future post I’d like to do some benchmarking to compare various
queue implementations (i.e. banker’s queues, Data.Sequence,
circular array queues built on top of STArray). At least
anecdotally, in solving some sliding window problems, banker’s queues
seem quite fast so far.
Sliding windows
I hope you can see how this solves the initial motivating problem: to
find e.g. the max of a sliding window, we can just put the elements
in a monoidally-annotated queue, enqueueing and dequeueing one element
every time we slide the window over. More generally, of course, it
doesn’t even matter if the left and right ends of the window stay
exactly in sync; we can enqueue and dequeue as many times as we want.
The following windows function computes the monoidal sum foldMap f window for each window of width \(w\), in only \(O(n)\) time
overall.
windows :: CommutativeMonoid m => Int -> (a -> m) -> [a] -> [m]
windows w f as = go startQ rest
  where
    (start, rest) = splitAt w as
    startQ = foldl' (flip enqueue) (new f) start

    go q as = measure q : case as of
      []     -> []
      a : as -> go (enqueue a (drop1 q)) as
“But…maximum and minimum do not form monoids, only semigroups!”
I hear you cry. Well, we can just adjoin special positive or negative
infinity elements as needed, like so:
data Max a = NegInf | Max a deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  NegInf <> a = a
  a <> NegInf = a
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = NegInf

data Min a = Min a | PosInf deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Min a) where
  PosInf <> a = a
  a <> PosInf = a
  Min a <> Min b = Min (min a b)

instance Ord a => Monoid (Min a) where
  mempty = PosInf
Now we can write, for example, windows 3 Max [1,4,2,8,9,4,4,6] which
yields [Max 4, Max 8, Max 9, Max 9, Max 9, Max 6], the maximums of
each 3-element window.
Challenges
If you’d like to try solving some problems using the techniques from this
blog post, I can recommend the following (generally in order of difficulty):
In a future post I’ll walk through my solution to Hockey
Fans. And here’s another
couple problems along similar lines; unlike the previous problems I am
not so sure how to solve these in a nice way. I may write about them
in the future.
UPDATE A few days after posting this article, I saw a video on X that gives (IMO) a better argument in favor of tariffs than I came up with here. If you're interested in the topic, I'd recommend giving it a watch, it's only about 11 minutes.
I’m a believer in the idea of free markets. The principle is simple: with less regulation and freedom of individuals to engage in trade and their own price discovery, we’ll end up with optimal price points and maximizing production, making everyone’s life better.
Tariffs fly in the face of this by introducing unnecessary and artificial barriers to trade. A classic example is sugar. The US has an import tariff on sugar, which makes it artificially more expensive to use sugar in products. Corn syrup, on the other hand, is produced from domestically grown corn and faces no such penalty. It is therefore artificially cheaper than sugar, and ends up being used in products. The results:
More corn is produced than is actually needed, preventing agriculture from focusing on higher value production for society
Consumers receive inferior products made out of corn syrup instead of sugar
Consumers pay more for these goods than they would without the tariff
Sugar producers outside the US make smaller profits
There’s an even worse aspect to tariffs though: they can kick off devastating battles between countries. Tariffs are essentially economic warfare, harming citizens of another country to help your own citizens. Once one country starts tariffs, it can snowball into a crippling domino effect that impedes all global trade.
Donald Trump has said he’s going to use tariffs, because we’re “losing on trade” by having a trade deficit. But that statement is bonkers. A trade deficit means a country receives more goods than it sends out. In other words, citizens do better with a trade deficit.
Based on all this, it seems pretty straightforward that economists would oppose Trump’s tariff plan, and would balk at his explanations. In this post, I want to steelman the position: give the best argument in favor of Trump’s plan that I can think of. (I won’t bother trying to defend the “losing on trade” comment though, it’s factually wrong, but seems like a good rallying cry for a policy from a political standpoint.)
To set the stage, we’re going to start by discussing two ideas, and then bringing them together: negative externalities and granularity of competition.
Negative externalities
In economics, a negative externality is when some activity has a negative impact on others. This essentially transfers some of the costs of an activity to others, while keeping all the benefits for the actor. A great example is pollution. A factory can either spend a million dollars a year cleaning up its waste, or it could dump its pollution into the lake. The business gets no benefit from an unpolluted lake, but normal people will lose the ability to use the lake. The business has externalized a cost onto society.
Programmers may already be familiar with another term for this concept: the tragedy of the commons.
One method to address externalities like this is through regulation: make it illegal for companies to pollute in the lake. But economics offers another approach to this as well: assign ownership rights on the lake. An owner can decide whether or not to allow pollution based on any criteria they want. Being a rational actor (one of the largest assumptions in economics, often violated), the new owner is incentivized to set up an auction for usage rights to the lake. The polluting business can compete against companies offering leisure activities on the lake. Then the free market can determine if the million dollars of cost savings is more valuable than the benefits people can take from a clean lake.
Instead of assigning the property rights to the lake to a private entity, the government can engage in this activity through open auction as well. This results in increased tax revenue, which will decrease the overall tax burden for everyone, flipping the tragedy of the commons into a benefit for all.
You may not like this solution, because you believe that the free market can’t properly price in the true value of a non-polluted lake, or because you don’t believe people will act rationally, or any other reason. That’s not terribly relevant for this discussion. The point is the definition of a negative externality, and the fact that the government can extract money from economic actors while increasing public good.
Granularity of competition
All of economics is a story of competition over scarce goods. Generally, we talk about the competition of individuals or private entities. In other words: people and businesses competing with each other. Note that the competition isn’t between buyers and sellers, as is often believed. Instead, buyers are competing with other buyers, pushing prices up, while sellers are competing with other sellers, pushing prices down.
One of the underlying assumptions of capitalism is that there’s a fair playing field. All actors should be treated equally. In practice, this fails fairly often. For example, in crony capitalism, select businesses receive special handouts from the government. Monopolies are another arguable example, where a company can leverage its overwhelming market share in one industry to subsidize the destruction of competitors in another industry.
As mentioned above, tariffs hurt people by creating an uneven playing field between individuals. But viewed at the national level, tariffs are a method of competition between different governments. In other words, leveraging tariffs may end up hurting competition at the granular level, but might end up serving the interests of a government policy which is at odds with “make all goods cheaper through more competition.”
Protecting workers
In this sense, tariffs are not at all unique. Differences in regulations between countries, laws about fair labor practices, environmental impact, local tax structures, and manipulation of currency exchange rates are all part of the competition between different nations. As a simple example, suppose country A has strict labor laws, demanding safe working conditions and health coverage for all workers, while country B does not. Country B will be able to outcompete country A for new businesses, because it’s relatively cheaper to produce in country B.
Tariffs in such a scenario can be a method for country A to make production in country B less attractive. If country A imposes a 20% tariff on country B imports due to human rights violations, it is in essence making production in country A more competitive again. Without this kind of change, countries seeking to attract investment and new businesses may be incentivized to pass laws that hurt their citizens just to bring down production costs.
National security
Another topic is national security. Let’s take countries C and D, who are not on the best of terms. Both countries are stocking up on weapons in case war breaks out. Both countries locally source weapons production, ordering lots of tanks from domestic producers.
Firstly, why domestic? Because it would be crazy to put your national defense in the hands of a potential enemy!
But suppose there’s completely free international trade, with no embargos and no tariffs. Country C is a major exporter of steel, which is obviously an important input to tank production. Country C can engage in some economic warfare of its own against country D:
Subsidize local production of steel
Cheaper steel exports prevent domestic production of steel from ramping up in country D
Under normal circumstances, this is great! It’s what we would term specialization, allowing citizens in country D to focus on what they’re relatively better at.
In the long run, the strategy would be unprofitable for country C’s government, and it will eventually either go bankrupt or have to halt the policy.
However, since we’re discussing national security…
When war breaks out, country C can simply block all steel exports
Country D will face a supply chain crisis. Its domestic steel production is low, it hasn’t invested in better tools and technology for steel production, and it will have to quickly and inefficiently produce enough steel to keep up with the war effort.
Tying it together
The best argument I can pull together from these points is that tariffs can be used to place the United States in a stronger position for future competition with geopolitical competitors. Instead of allowing poor labor practices in other countries to drag down the standard of living for Americans, tariffs will artificially inflate the price of incoming goods. Instead of rewarding other countries that pollute, tariffs will extract a penalty from those countries, properly allocating the costs of the negative externalities to those polluting nations. And finally, by incentivizing an increase in domestic production across the board, the US is set up for more autonomy in the case of escalation (through either embargos or full-on warfare), protecting its interests.
It’s the best argument I can make. Others can probably critique my points as well as provide better justifications than I have. I’d love to see those in the comments. The real question is: do the arguments in favor of these tariffs justify the costs many of us anticipate seeing: higher product costs, trade wars, decreased international trade, and isolationism of the US?
Personally, I don’t think the arguments add up. I’m mostly on the side of the mainstream for once. I wouldn’t have proposed tariffs in the current world environment, I wouldn’t vote in favor of them, and I wouldn’t speak out in support of them. And especially given that the proposal seems to be a flat tariff on most countries (with a higher tariff on China), it doesn’t seem to address the “negative externalities” bit at all, which would do better from targeted tariffs attempting to incentivize specific changes (like carbon emission reduction or improvement to labor laws).
Which leaves me with only one final argument in favor of the tariffs: they could be a great bluff. I think many people in the world believe that Trump would be willing to pull the trigger and enact such a policy. That gives him a really great negotiating position for whatever trade deals and other foreign policy objectives he has.
My prediction
I’m a software developer who watches politics and studies economics. My prediction on topics like this isn’t particularly informed, and is likely to be completely wrong. But I may as well put my thoughts in writing so everyone can remind me how wrong I was in the future!
I think Trump will continue to talk about tariffs in his new administration. He’ll spend more time discussing them with the press and foreign leaders than with the Republicans in Congress. There will always be a convenient reason why the tariffs aren’t proposed. And eventually, Trump will get some concessions from other nations, and will eventually “fail” to pass tariffs. The media will have a field day with it, and Trump will accuse someone somewhere of being the reason why “the greatest tax plan of all time” failed.
And maybe it’s my prediction just because I’m hoping it’s what becomes reality. Other outcomes I can foresee are much less rosy.
Two years ago I wrote a blog post to announce that the
GHC wasm backend had been merged upstream. I’ve been too lazy to write
another blog post about the project since then, but rest assured, the
project hasn’t stagnated. A lot of improvements have happened after the
initial merge, including but not limited to:
Many, many bugfixes in the code generator and runtime, witnessed by
the full GHC testsuite for the wasm backend in
upstream GHC CI pipelines. The GHC wasm backend is much more robust
these days compared to the GHC-9.6 era.
The GHC wasm backend can be built and tested on macOS and
aarch64-linux hosts as well.
Earlier this year, I landed the JSFFI feature for
wasm. This lets you call JavaScript from Haskell and vice versa,
with seamless integration of JavaScript async computation and
Haskell’s green threading concurrency model. This allows us to
support Haskell frontend frameworks like reflex & miso, and we have
an example repo to demonstrate that.
$ curl https://gitlab.haskell.org/haskell-wasm/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/terrorjack/.ghc-wasm.
Run 'source /home/terrorjack/.ghc-wasm/env' to add tools to your PATH.
$ . ~/.ghc-wasm/env
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/ :? for help
ghci>
Both the Nix and non-Nix installation methods default to GHC HEAD, for
which binary artifacts for Linux and macOS hosts, for both x86_64 and aarch64, are
provided. The Linux binaries are statically linked so they should work
across a wide range of Linux distros.
If you take a look at htop, you’ll notice wasm32-wasi-ghc spawns
a node child process. That’s the “external interpreter” process that
runs our Template Haskell (TH) splice code as well as ghci bytecode. We’ll get to what
this “external interpreter” is about later; just keep in mind that
whatever code is typed into this ghci session is executed on the wasm
side, not on the native side.
Now let’s run some code. It’s been six years since I published the
first blog post when I joined Tweag and worked on a
prototype compiler codenamed “Asterius”; the first Haskell program I
managed to compile to wasm was fib, time to do that again:
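The session itself isn't reproduced in this excerpt; presumably the definition typed into ghci is the classic doubly recursive one, something like:

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)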
It works, though with \(O(2^n)\) time complexity. It’s easy to do an \(O(n)\)
version, using the canonical Haskell fib implementation based on a
lazy infinite list:
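Again the listing isn't preserved here; the canonical lazy-list definition referred to is presumably:

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib n = fibs !! n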
That’s still boring, isn’t it? Now buckle up, we’re gonna do an \(O(1)\)
implementation… using Template Haskell!
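The original snippet isn't included in this excerpt; one way to get an \(O(1)\) fib with Template Haskell is to splice a precomputed constant in at compile time, roughly as below (a sketch; the Fibs module name is made up, and the lazy-list fibs has to live in a separate module because of the stage restriction):

{-# LANGUAGE TemplateHaskell #-}

import Fibs (fibs)  -- hypothetical module containing the lazy-list fibs from above
import Language.Haskell.TH (integerL, litE)

-- The splice runs at compile time (inside the wasm external interpreter),
-- so at run time fib100 is just a literal.
fib100 :: Integer
fib100 = $(litE (integerL (fibs !! 100)))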
Joking aside, the real point is not about how to implement fib, but
rather to demonstrate that the GHC wasm backend indeed supports
Template Haskell and ghci now.
Here’s a quick summary of wasm’s TH/ghci support status:
The patch has landed in the GHC master branch and will
be present in upstream release branches starting from ghc-9.12. I
also maintain non-official backport branches in my
fork, and wasm TH/ghci has been backported to 9.10
as well. The GHC release branch bindists packaged by
ghc-wasm-meta are built from my branches.
TH splices that involve only pure computation (e.g. generating class
instances) work. Simple file I/O also works, so
file-embed works. Side effects are limited to those
supported by WASI, so packages like gitrev won’t
work because you can’t spawn subprocesses in WASI. The same restrictions
apply to ghci.
Our wasm dynamic linker can load bytecode and compiled code, but the
only form of compiled code it can load are wasm shared
libraries. If you’re using wasm32-wasi-ghc directly to
compile code that involves TH, make sure to pass -dynamic-too to
ensure the dynamic flavour of object code is also generated. If
you’re using wasm32-wasi-cabal, make sure shared: True is
present in the global config file ~/.ghc-wasm/.cabal/config.
The wasm TH/ghci feature requires at least cabal-3.14 to work (the
wasm32-wasi-cabal shipped in ghc-wasm-meta is
based on the correct version).
Our novel JSFFI feature also works in ghci! You
can type foreign import javascript declarations directly into a
ghci session, use that to import sync/async JavaScript functions, and
even export Haskell functions as JavaScript ones.
If you have c-sources/cxx-sources in a cabal package, those can
be linked and run in TH/ghci out of the box. However, more complex
forms of C/C++ foreign library dependencies like pkgconfig-depends,
extra-libraries, etc. will require special care to build both
static and dynamic flavours of those libraries.
For ghci, hot reloading and basic REPL functionality works, but the
ghci debugger doesn’t work yet.
What happens under the hood?
For the curious mind, -opti-v can be passed to wasm32-wasi-ghc.
This tells GHC to pass -v to the external interpreter, so the
external interpreter will print all messages passed between it and the
host GHC process:
Why is any message passing involved in the first place? There’s a past
blog post which contains an overview of cross
compilation issues in Template Haskell, most of the points still hold
today, and apply to both TH as well as ghci. To summarise:
When GHC cross compiles and evaluates a TH splice, it has to load
and run code that’s compiled for the target platform. Compiling both
host/target code and running host code for TH is never officially
supported by GHC/Cabal.
The “external interpreter” runs on the target platform and handles
target code. Messages are passed between the host GHC and the external
interpreter, so GHC can tell the external interpreter to load stuff,
and the external interpreter can send queries back to GHC when
running TH splices.
In the case of wasm, the core challenge is dynamic linking: to be able to
interleave code loading and execution at run-time, all while sharing the
same program state. Back when I worked on Asterius, it could only link
a self-contained wasm module that wasn’t able to share any code/data with other
Asterius-linked wasm modules at run-time.
So I went with a hack: when compiling each single TH splice, just link
a temporary wasm module and run it, get the serialized result and
throw it away! That completely bypasses the need to make a wasm
dynamic linker. Needless to say, it’s horribly slow and doesn’t support cross-splice state or ghci, though it was sufficient to compile many packages that use TH.
Now it’s 2024, time to do it the right way: implement our own wasm
dynamic linker! Some other toolchains like emscripten
also support dynamic linking of wasm, but there’s really no code to
borrow here: each wasm dynamic linker is tailored to that toolchain’s
specific needs, and we have JSFFI-related custom sections in our wasm
code that can’t be handled by other linkers anyway.
Our wasm dynamic linker supports loading exactly one kind of wasm
module: wasm shared libraries. This is something that you
get by compiling C with wasm32-wasi-clang -shared, which enables generation of
position-independent code. Such machine code can be placed
anywhere in the address space, making it suitable for run-time code loading. A
wasm shared library is yet another wasm module; it imports the linear
memory and function table, and you can specify any base address for
memory data and functions.
So I rolled up my sleeves and got to work. Below is a summary of the journey I took towards
full TH & ghci support in the GHC wasm backend:
Step one was a minimal NodeJS script to load libc.so: it sits at the bottom of all shared library dependencies, the first and most important one to be loaded. It took me many cans of energy drink to
debug mysterious memory corruptions! But finally I could invoke any libc
function and do malloc/free, etc. from the NodeJS REPL, with the
wasm instance state properly persisted.
Then I loaded multiple shared libraries, up to libc++.so, and ran simple C++ snippets compiled to .so. Dependency management for shared libraries was added at this step: the dynamic linker traverses the dependency tree of a .so, spawns async WebAssembly.compile tasks, then sequentially loads the dynamic libraries in topological order.
Then I had to figure out a way to emit wasm position-independent code from the GHC wasm backend’s native code generator. The GHC native code generator emits a .s assembly file for the target platform, and while the assembly format for x86_64, aarch64, etc. is widely taught, there’s really no tutorial or blog post that could teach me the assembly syntax for wasm! Luckily, learning from Godbolt output examples was easy enough, and I quickly figured out how the position-independent entities are represented in the assembly syntax.
The dynamic linker can now load the Haskell ghci shared library!
It contains the default implementation of the external interpreter;
it almost worked out of the box, though the linker needed some special handling for the piping between wasm/JS and the host GHC process.
In ghci, the logic to load libraries, look up symbols, etc. calls into the RTS linker on other platforms. Since for wasm all of that logic lives on the JS side rather than in C, those code paths are patched to call back into our linker using JSFFI imports.
The GHC build system and driver needed quite a few adjustments, to
ensure that shared libraries are generated for the wasm target when
TH/ghci is involved. Thanks to Matthew Pickering for his patient and
constructive review of my patch, I was able to replace many hacks
in the GHC driver with more principled approaches.
The GHC driver also needed to learn how to handle the wasm flavour of the external interpreter. Thanks to the prior work of the JS backend team here, my life was a lot easier when adding the wasm external interpreter logic.
The GHC testsuite also needed quite a bit of work. In the end, there are over 1000 new test case passes after flipping on TH/ghci support for the wasm target.
What comes next?
The GHC wasm backend TH/ghci feature is way faster and more robust
than what I hacked in Asterius back then. One nice example I’d like to
show off here is pandoc-wasm: it’s finally possible to compile our beloved pandoc tool to wasm again, for the first time since Asterius was deprecated.
The new pandoc-wasm is more performant not only at run-time, but
also at compile-time. On a GitHub-hosted runner with just 4 CPU cores
and 16 GB of memory, it takes around 16min to compile pandoc from
scratch, and the time consumption can even be halved on my own laptop
with peak memory usage at around 10.8GB. I wouldn’t be surprised if time/memory usage tripled or more when compiling the same codebase with legacy GHC-based compilers like Asterius or GHCJS!
The work on wasm TH/ghci is not fully finished yet. I do have some
things in mind to work on next:
Support running the wasm external interpreter in the browser via
puppeteer. So your ghci session can connect to the browser, all
your Haskell code runs in the browser main thread, and all
JSFFI logic in your code can access the browser’s
window context. This would allow you to do Haskell frontend livecoding
using ghci.
Support running an interactive ghci session within the browser, which would mean a truly client-side Haskell playground.
It’ll only support in-memory bytecode, since it can’t invoke
compiler processes to do any heavy lifting, but it’s still good for
teaching purposes.
Maybe make it even faster? Performance isn’t my concern right
now, though I haven’t done any serious profiling and optimization in
the wasm dynamic linker either, so we’ll see.
Fix ghci debugger support.
You’re welcome to join the Haskell wasm Matrix room
to chat about the GHC wasm backend. Do get in touch if you feel it is
useful to your project!
Today, 2024-11-20, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 36th episode of the Haskell Unfolder live on YouTube.
There are two primary ways to import C functions in Haskell: “unsafe” and “safe”. We will first briefly recap what this means: unsafe functions are fast but cannot call back into Haskell, safe functions are much slower but can. As we will see in this episode, however, there are many more differences between unsafe and safe functions, especially in a concurrent setting. In particular, safe functions are not always safer!
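For readers unfamiliar with the two flavours, here is a minimal sketch of the difference (a standard example, not taken from the episode): the only syntactic change is the safety annotation on the import.

{-# LANGUAGE ForeignFunctionInterface #-}

-- Both import C's sin; only the safety annotation differs.
foreign import ccall unsafe "math.h sin"
  c_sin_unsafe :: Double -> Double  -- fast, but must not call back into Haskell

foreign import ccall safe "math.h sin"
  c_sin_safe :: Double -> Double    -- slower, but may call back into Haskell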
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
In this episode, Matti and Sam traveled to the International Conference on Functional Programming (ICFP 2024) in Milan, Italy, and recorded snippets with various participants, including keynote speakers, Haskell legends, and organizers.
Since deciding to write more blog posts again, I’ve drafted and thrown away a few different versions of this blog post. Originally, I was going to try to explain how Bitcoin works, or motivate to others why they should care about it. But the reality is that there is already much better material out there than I can produce. So instead, I’ve decided to make this much more personal: my own journey, why I ultimately changed my opinion and decided to embrace Bitcoin, and then answer some questions and comments I’ve received from others.
If you really do want to learn the best arguments in favor of Bitcoin (as opposed to “price goes up” style arguments), here is my list of top resources. I’ll try to keep this list up to date over time:
My worldview heavily shapes this discussion, so I need to give you the CliffsNotes version of my outlook. I come from a financially conservative background. I grew up believing that dollars were king, fixed income savings were good, and the stock market was little more than a casino. I ran most of my financial investments like that for most of my life, either investing in property (i.e., the house I own, not extra investment properties) or keeping cash, US treasuries, and Certificates of Deposit (CDs). When I went truly crazy, I would occasionally invest in the S&P 500 index, the least gambly of gambling. (I’ve commented on that previously.)
I studied actuarial science in school, and worked as an actuary for a few years before moving to Israel. Being an actuary definitely put me in the world of finance, but on a much more theoretical as opposed to practical side. As a small example, I learned all the financial mathematics involved in pricing of options, but never learned any real-life strategy for trading options. And that suited me just fine: options are even more risky than normal stock market gambling!
What that theoretical background gave me was two relevant bodies of knowledge: statistics for risk analysis, and economics. Overall I loved my economics courses. The simple concepts of scarcity and competition fit so perfectly with human psychology and perfectly describe so much of the world around us. (Side note: I don’t just mean in financial matters, I strongly recommend everyone learns the basics of economics to better understand the world at large.)
One minor note about studying economics. My courses were roughly broken down into micro and macro economics. Microeconomics clicked with me from day 1. Things like “price caps lead to shortages” are so simple to understand and evidently true that I can explain them to my 7 year old without a problem.
It was macroeconomics that I couldn’t understand. Why did all the rules of microeconomics (for instance, that intervention prevents free markets from discovering equilibrium) go out the door when you went to the macro level? Why was it that the government needed to intervene by printing money and spurring the market into action by increasing spending?
I was just a math student larping as an economist, not a true economist, so I incorrectly assumed that this was all just beyond my feeble understanding. And I happily lived my life for about 15 years as an actuary/programmer who enjoyed some economics and lived a fiscally conservative lifestyle.
First hints of Bitcoin
I heard about Bitcoin relatively early in its existence. My wife and I even discussed buying some when it was still under a dollar. To our chagrin, we didn’t. This fit in nicely with my “avoid casinos” approach. Bitcoin was simply a get-rich-quick scam, a Ponzi scheme, fake internet money, you name it. To be completely fair, I didn’t fully come to those conclusions at the time, but it was more-or-less what I thought.
What’s amazing is that I had just lived through the Great Financial Crisis of 2008. I was painfully aware of what a financial disaster we had. I had already graduated at the time, but I was still close to many friends from UCLA, many in economics and financial majors. I remember discussing how ridiculous “too big to fail” was. The phrase I learned much later, Privatizing Profits and Socializing Losses, was exactly how we all felt, and I knew it was setting up incorrect incentive structures. But I didn’t spend enough time to recognize that there was any connection between that disaster and this new Bitcoin fake money scam.
I spent close to 7 years having virtually nothing to do with blockchain and cryptocurrency. At some point around 2016, through my work in the Haskell space, I ended up consulting on a few blockchain projects, including building blockchains for others. I also ended up on a lot of sales calls as a sales engineer.
I won’t call out any specific projects. But suffice it to say that I walked away completely believing that all of “crypto” was a scam. And to quote a phrase from Jewish literature, אוי לרשע אוי לשכנו (woe to the evil one, woe to his neighbor). I must have opened up a Binance account at some point in this, and probably bought some crypto to get a feel for how it all worked. But I had no interest in being part of the space. Blockchain seemed like a cool technology on its own early in its hype cycle. But cryptocurrency itself wasn’t for me.
COVID-19, money printer goes brrrr
When COVID-19 began kicking off, many of us saw the huge amount of money printing, paired with forced lack of productivity due to various COVID restrictions. The combination meant that, simultaneously, true productivity in the economy slowed down (meaning: less goods) and more money was chasing those goods. A lot more money. Money printer goes brrr.
As much as I’d buried my head in the sand during the 2008 financial crisis, things were different this time.
I was older (and hopefully more mature), more financially aware, and had more money at risk.
In 2008, I was a young, newly married man dealing with my first job and learning how to raise my first child. I didn’t have a lot of free thought cycles to spare. While I wasn’t exactly lounging around in 2020, I had more time available to think about the problem.
Thanks to developments at work, I ended up working in the cryptocurrency space again.
Putting these things together, I paid attention, did some level of research, and decided inflation terrified me. I could easily see the value of all my savings go down dramatically. After some soul searching, I decided it was time to start abandoning my highly-risk-averse savings strategy, and begin embracing a more diversified investment strategy. This meant dividing among:
Cash, CDs, and treasury bonds. Higher interest rates certainly made this pretty attractive to me at the time.
Buying into the S&P 500 index. While I still had my original financial outlook of the stock market being little more than a gamble, I also understood risk and empirical data, and the index would–on average, and assuming markets continued operating the same way as in the past–continue to go up. Hopefully somewhere in line with the officially reported inflation numbers, if you believe those.
And the scariest and riskiest of all: crypto.
I want to be clear that this was not a purchase of me saying “I’m a total believer in crypto, this thing is going to skyrocket.” It was simple risk hedging. I was afraid of the fiat currency system, i.e. dollars, losing a huge amount of their value. That made stock investments far less risky in comparison. But my faith in the stock market wasn’t exactly stalwart. With all the financial system changes coming as a result of COVID-19 restrictions and money printing, I was looking for any kind of safety net.
I invested in crypto like I would invest in stocks: I chose a basket of the top performers at the time, diversified my money into them, and hoped for the best.
Terra collapse
This is a side note almost not worth including, but some people may be comforted by this part of the story.
The work I’d been doing in crypto at the time had been on the Terra blockchain. For those who aren’t aware, Terra had a systemic collapse in May 2022 due to the depegging of the TerraUSD (a.k.a. UST) stablecoin. Or, as the jokes correctly put it, not-so-stablecoin.
I lost a significant amount of money on that. With my fiscally conservative background, this was essentially a worst-case-scenario making all of my greatest fears of risky investment come true.
At this point, however, I had learned enough about the crypto boom/bust cycles. Miriam (my wife) and I spent a lot of time discussing, and decided we’d ride out the bear market, not simply run away screaming.
The point of the inclusion of this here: I basically went through the worst possible financial outcome I could imagine. And it wasn’t nearly as bad as I thought it would be. I lost money. That’s never fun. But the true moral of the story I’m telling is that everything in the financial world is a massive jumble of different risks these days. There isn’t a single safe haven, at least not like gold was in the 1800s.
The important part for everyone: don’t fall into the sunk cost fallacy! If you’ve taken losses on an investment, try to stay calm, look at it rationally, and make the best decision you possibly can with current knowledge and no emotional input.
The Bitcoin maxi path
I’m almost embarrassed by this next sentence. Despite working in the crypto space off-and-on for about 8 years now, and despite spending 3 years full-time working on crypto and Decentralized Finance products, I only recently understood the connection between the beginning of this story (the Great Financial Crisis of 2008) and Bitcoin.
You see, I shouldn’t have stuck my head in the sand back then. Had I had more time and more curiosity, I could have discovered a few truths. Firstly, Bitcoin is directly targeted at addressing the unsound system that led to the Great Recession. Secondly, Bitcoin and crypto–at least in general–are very different beasts. My guess is that, like me, most people are more afraid of Bitcoin because they have it mentally associated with the rest of the crypto world and all the fun casino-like-games it contains. And finally, I discovered that I knew both more and less about economics than I thought I did.
You’ll see people in the Bitcoin world talk about “doing your 100 hours” before you understand Bitcoin. I think I’ve only completed that in the past few months. It’s from watching a lot of random YouTube videos, chatting with people on X and Reddit, reading blog posts, and most powerfully and most recently: reading The Bitcoin Standard. I haven’t even finished it yet, I’m looking forward to the rest. But already, it’s snapped a lot of my economics understanding into focus.
In particular, it’s given me a much better understanding of the concept of money than I ever received in my university economics courses. (Or I’m simply listening better this time around.) It’s also helped me understand why macroeconomics never clicked with me. Anyone who’s read the book will know that it’s not exactly gentle in its treatment of Keynesian and Monetarist economic theories. Getting a clear breakdown of the different schools of thought, how they compare to Austrian economics, and the author’s very opinionated views on them, has been one of the most intellectually stimulating things I’ve done since learning monads and the borrow checker.
Side note: I wish more texts included the authors’ direct and unapologetic opinions like this. If someone has a similarly direct take-down of the arguments in The Bitcoin Standard, I would love to read it, please pass it along.
Putting that all together: a monetary system which allows for no inflation and grants no party central control is far more powerful than I originally understood. My opinion evolved somewhat slowly from “magical internet money” to “potentially good risk hedge in a balanced portfolio.” It went really quickly from that to “oh, I get it, this is the hardest money that exists, it will store value for the foreseeable future.”
I no longer look at my investment in Bitcoin as a risk. I’ve switched worldviews completely. Every other asset I hold is the risk. I can be hurt significantly by this of course. If Bitcoin has another bear market and I need to buy groceries, I’ll be taking a huge loss on the Bitcoin I need to liquidate. (I discuss how I address this in my buying Bitcoin or selling dollars post.)
Aren’t I scared?
Yes. I think we’re living in some of the scariest financial times of our lives. But my viewpoint now is that everything is risky; there’s no escaping it. It’s not a matter of a safe haven in dollars versus potentially big gains in other assets. Everything is on a precipice right now. And my honest belief is that, for the long haul, Bitcoin is the least risky of all potential stores of value.
Am I right? We’ll know in 20 years.
My recommendation to others
This is my personal journey. This story shouldn’t convince anyone to do anything with their money. I haven’t presented any true arguments in favor of Bitcoin here, just some comments and references to larger ideas.
If you take anything away from this, I hope it’s this: there’s some guy out there who’s really scared of risky investments, has at least some formal training in economics and finance, wanted nothing to do with Bitcoin, and then decided he was completely wrong and embraced it.
I hope that motivates those of you who are opposed to Bitcoin to do a bit more research. Challenge the ideas you read. Challenge the ideas you already hold. Ask questions. Get in debates. Treat this seriously. Because whichever way you decide, for or against Bitcoin, will likely have a major impact on the rest of your life.
Random Q&A
I received some questions previously on my Bitcoin vs gold blog post and haven’t had a chance to answer them yet. Instead of writing a dedicated post, this seems like a good time to address them.
Bitcoin was originally marketed as anonymous. Nowadays we consider it pseudonymous at best. Do you believe this is an important feature for money? Do you believe it is actually *desirable*?
Yes, I do. The economy works best by all people being completely free to make whatever financial decisions they believe are best for themselves. Having non-anonymous financial transactions will add friction to that system. Each time I make a purchase, I’ll wonder if others will judge me unfavorably for it.
Having the Bitcoin chain work as it does today with all transactions being publicly visible is great for transparency. Publicly held companies publishing their wallet addresses is great too. But there needs to be some room for truly anonymous interactions. And I believe we’ll see that space expand over time as Bitcoin moves from “great speculative asset” to “money.” The Lightning Network already does a pretty great job at this.
If anonymity is important, why not some more modern cryptocurrency?
Because it isn’t needed. Bitcoin has one thing few other cryptocurrencies have, and one thing none of them have:
Bitcoin is a scarce resource. If you look at the tokenomics of most other cryptocurrencies, they are massively complex and usually inflationary in some way. None of those can ever act as a true store of value over time.
Bitcoin is first. That’s obviously not entirely true; there are plenty of other attempts at digital money that predate it, arguably the modern dollar is also a “digital asset,” etc. But for this new class of “algorithmically scarce, decentralized, digital money,” Bitcoin is the first on the scene. As a result, it has major network effects, and will be essentially impossible to dethrone until someone comes up with a fundamentally better approach. No one has so far.
How would/should society be organized in a world where a lot of modern taxation would likely(?) be very hard due to completely unobservable cash flows? Currently I think a lot of this relies on cash being local, and the sources and sinks of it being relatively traceable.
Exposing my politics a bit more, I’m totally in favor of a society based far less on taxing the populace.
That said, I don’t think Bitcoin will fundamentally change this. People who have wanted to evade taxes have found ways in the past, and the government has always found ways to increase its surveillance apparatus to keep up with them. It’s an arms race that will continue.
If we get to a point where all money is Bitcoin and everyone transacts daily in Lightning wallets, I have no doubt that the local supermarket will fully comply with all reporting rules to the government on purchases, property purchases will require justification of where your funds came from, and other such things that will ensure the majority of financial transactions stay in the legal, taxable world.
Modern economic theory indeed often aims at a ~2% annual inflation (not more, not less), as an incentive to keep investing in ways that presumably benefit the society instead of sitting on money. Deflation is considered dangerous for the same reason. It seems to me, perhaps naively, that zero inflation would make investment a zero sum game, which seemingly removes a lot of the incentives. What do you think of this?
I’m still coming to terms on inflation vs deflation. Until recently, I had naively assumed that everyone believed deflation was awesome because all of society benefits from getting more stuff, and inflation was the penalty for letting the government print money. Apparently that is far from the modern well-accepted outlook.
I may be making a mistake in my interpretation, but based on what I’ve read in The Bitcoin Standard, here’s an answer that I hope is accurate. Whenever you have money, you can do one of two things with it: use it for immediate consumption, or defer its usage to later. The real interest rate (nominal interest - inflation rate) gives an indication of how much you’ll be rewarded for deferring usage of your money till later.
With deflation (i.e., negative inflation), you end up with a high real interest. This means “if you hold off on consumption you’ll get even more stuff in the future.” This incentivizes a low time preference: I don’t value immediate consumption very much versus later consumption.
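To put illustrative numbers on it (mine, not from the book): with a 5% nominal rate and 8% inflation the real rate is roughly 5% − 8% = −3%, so deferring consumption loses purchasing power; with 2% deflation it is roughly 5% − (−2%) = 7%, so deferring consumption is rewarded.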
Coming back to your question: the “sitting on money” idea comes directly from Keynesian economics, where the focus is on spending, consumerism, and short term money flows. An Austrian approach is completely different. Sitting on money means that I allow productive capability today to be put into capital goods, things which will increase production in the future, thus making more stuff available in the future at lower prices, thus further fueling deflation.
Note that another way of looking at interest rates is the cost of capital. It’s a price just like any other price in the economy. If you have the government set the price of apples, you’ll either have shortages or surpluses. So what happens to capital when the central bank gets to set the price of capital by controlling interest rates?
There’s a lot more to this topic, and I’m pretty fresh to it, so I’ll call it there. I’d definitely recommend The Bitcoin Standard for more on the topic.
In my previous
post
I explained how to implement a reasonably efficient union-find data
structure in Haskell, and challenged you to solve a couple Kattis
problems. In this post, I will (1) touch on a few generalizations
brought up in the comments of my last post, (2) go over my solutions
to the two challenge problems, and (3) briefly discuss generalizing
the second problem’s solution to finding max-edge decompositions of
weighted trees.
Generalizations
Before going on to explain my solutions to those problems, I want to
highlight some things from a comment by Derek
Elkins
and a related blog post by Philip
Zucker. The first
is that instead of (or in addition to) annotating each set with a
value from a commutative semigroup, we can also annotate the edges
between nodes with elements from a
group (or, more
generally, a groupoid). The
idea is that each edge records some information about, or evidence
for, the relationship between the endpoints of the edge. To compute
information about the relationship between two arbitrary nodes in the
same set, we can compose elements along the path between them. This
is a nifty idea—I have never personally seen it used for a
competitive programming problem, but it probably has been at some
point. (It kind of makes me want to write such a problem!) And of
course it has “real” applications beyond competitive programming as
well. I have not actually generalized my union-find code to allow
edge annotations; I leave it as an exercise for the reader.
The other idea to highlight is that instead of thinking in terms of
disjoint sets, what we are really doing is building an equivalence
relation, which
partitions the elements into disjoint equivalence classes. In
particular, we do this by incrementally building a relation \(R\), where
the union-find structure represents the reflexive, transitive,
symmetric closure of \(R\). We start with the empty relation \(R\) (whose
reflexive, transitive, symmetric closure is the discrete equivalence
relation, with every element in its own equivalence class); every
\(\mathit{union}(x,y)\) operation adds \((x,y)\) to \(R\); and the \(\mathit{find}(x)\)
operation computes a canonical representative of the equivalence class
of \(x\). In other words, given some facts about which things are
related to which other things (possibly along with some associated
evidence), the union-find structure keeps track of everything we can
infer from the given facts and the assumption that the relation is an
equivalence.
Finally, through the comments I also learned about other
potentially-faster-in-practice schemes for doing path compression such
as Rem’s
Algorithm;
I leave it for future me to try these out and see if they speed things up.
Now, on to the solutions!
Duck Journey
In Duck Journey, we are
essentially given a graph with edges labelled by bitstrings, where
edges along a path are combined using bitwise OR. We are then asked
to find the greatest possible value of a path between two given
vertices, assuming that we are allowed to retrace our steps as much as
we want. (Incidentally, if we are not allowed to retrace our steps, this problem probably becomes NP-hard.)
If we can retrace our steps,
then on our way from A to B we might as well visit every edge in the
entire connected component, so this problem is not really
about path-finding at all. It boils down to two things: (1) being
able to quickly test whether two given vertices are in the same
connected component or not, and (2) computing the bitwise OR of all
the edge labels in each connected component.
One way to solve this would be to first use some kind of graph
traversal, like DFS, to find the connected components and build a map
from vertices to component labels; then partition the edges by
component and take the bitwise OR of all the edge weights in each
component. To answer queries we could first look up the component
label of the two vertices; if the labels are the same then we look up
the total weight for that component.
This works, and is in some sense the most “elementary” solution, but
it requires building some kind of graph data structure, storing all
the edges in memory, doing the component labelling via DFS and
building another map, and so on. An alternative solution is to use a
union-find structure with a bitstring annotation for each set: as we
read in the edges in the input, we simply union the endpoints of the
edge, and then update the bitstring for the resulting equivalence
class with the bitstring for the edge. If we take a union-find library
as given, this solution seems simpler to me.
{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Control.Category ((>>>))
import Control.Monad.ST
import Data.Bits
import Data.ByteString.Lazy.Char8 (ByteString)
import Data.ByteString.Lazy.Char8 qualified as BS
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner tc >>> solve >>> format

format :: [Maybe Int] -> ByteString
format = map (maybe "-1" (show >>> BS.pack)) >>> BS.unlines
Next, some data types to represent the input, and a Scanner to read
it.
-- Each edge is a "filter" represented as a bitstring stored as an Int.
newtype Filter = Filter Int
  deriving (Eq, Show)

instance Semigroup Filter where
  Filter x <> Filter y = Filter (x .|. y)

filterSize :: Filter -> Int
filterSize (Filter f) = popCount f

data Channel = Channel UF.Node UF.Node Filter
  deriving (Eq, Show)

data TC = TC {n :: !Int, channels :: [Channel], queries :: [(Int, Int)]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  m <- int
  q <- int
  channels <- m >< (Channel <$> int <*> int <*> (Filter <$> int))
  queries <- q >< pair int int
  return TC {..}
Finally, here’s the solution itself: process each channel with a
union-find structure, then process queries. The annoying thing, of
course, is that this all has to be in the ST monad, but other than
that it’s quite straightforward.
solve :: TC -> [Maybe Int]
solve TC {..} = runST $ do
  uf <- UF.new (n + 1) (Filter 0)
  mapM_ (addChannel uf) channels
  mapM (answer uf) queries

addChannel :: UF.UnionFind s Filter -> Channel -> ST s ()
addChannel uf (Channel a b f) = do
  UF.union uf a b
  UF.updateAnn uf a f

answer :: UF.UnionFind s Filter -> (Int, Int) -> ST s (Maybe Int)
answer uf (a, b) = do
  c <- UF.connected uf a b
  case c of
    False -> pure Nothing
    True -> Just . filterSize <$> UF.getAnn uf a
Inventing Test Data
In Inventing Test Data,
we are given a tree \(T\) with integer weights on its edges, and asked
to find the minimum possible weight of a complete graph for which \(T\)
is the unique minimum spanning
tree (MST).
Let \(e = (x,y)\) be some edge which is not in \(T\). There must be a
unique path between \(x\) and \(y\) in \(T\) (so adding \(e\) to \(T\) would
complete a cycle); let \(m\) be the maximum weight of the edges along
this path. Then I claim that we must give edge \(e\) weight \(m+1\):
On the one hand, this ensures \(e\) can never be in any MST, since an
edge which is strictly the largest edge in some cycle can never be
part of an MST (this is often called the “cycle property”).
Conversely, if \(e\) had a weight less than or equal to \(m\), then \(T\)
would not be an MST (or at least not the unique one): we
could remove any edge in the path from \(x\) to \(y\) through \(T\) and
replace it with \(e\), resulting in a spanning tree with a lower (or
equal) weight.
Hence, every edge not in \(T\) must be given a weight one more than the
largest weight in the unique \(T\)-path connecting its endpoints; these
are the minimum weights that ensure \(T\) is a unique MST.
A false start
At first, I thought what we needed was a way to quickly compute this
max weight along any path in the tree (where by “quickly” I mean
something like “faster than linear in the length of the path”). There
are indeed ways to do this, for example, using a heavy-light
decomposition and then putting a data structure on each heavy path
that allows us to query subranges of the path quickly. (If we use a
segment tree on each path we can even support operations to update
the edge weights quickly.)
All this is fascinating, and something I
may very well write about later. But it doesn’t actually help! Even
if we could find the max weight along any path in \(O(1)\), there are
still \(O(V^2)\) edges to loop over, which is too big. There can be up
to \(V = 15\,000\) nodes in the tree, so \(V^2 = 2.25 \times 10^8\). A
good rule of thumb is \(10^8\) operations per second, and there are
likely to be very high constant factors hiding in whatever complex
data structures we use to query paths efficiently.
So we need a way to somehow process many edges at once. As usual, a
change in perspective is helpful; to get there we first need to take a
slight detour.
Kruskal’s Algorithm
It helps to be familiar with Kruskal’s
Algorithm, which
is the simplest algorithm I know for finding minimum spanning
trees:
Sort the edges from smallest to biggest weight.
Initialize \(T\) to an empty set of edges.
For each edge \(e\) in order from smallest to biggest:
If \(e\) does not complete a cycle with the other edges already in
\(T\), add \(e\) to \(T\).
To efficiently check whether \(e\) completes a cycle with the other
edges in \(T\), we can use a union-find, of course: we maintain
equivalence classes of vertices under the “is connected to”
equivalence relation; adding \(e\) would complete a cycle if and only if
the endpoints of \(e\) are already connected to each other in \(T\). If
we do add an edge \(e\), we can just \(\mathit{union}\) its endpoints to properly
maintain the relation.
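To make that concrete, here is a minimal sketch (mine, not from the post) of Kruskal’s algorithm, written against the same assumed UnionFind interface (UF.new, UF.connected, UF.union) and the Edge type that appear in the solution code later in this post:

-- Assumes the imports used in the solutions below (Control.Monad.ST,
-- Data.List (sort), UnionFind qualified as UF) and the Edge type with
-- its weight-based Ord instance.
kruskal :: Int -> [Edge] -> [Edge]
kruskal n es = runST $ do
  uf <- UF.new (n + 1)
  let go acc [] = pure (reverse acc)
      go acc (e@(Edge x y _) : rest) = do
        cyc <- UF.connected uf x y                  -- would e complete a cycle?
        if cyc
          then go acc rest                          -- skip e
          else UF.union uf x y >> go (e : acc) rest -- keep e in T
  go [] (sort es)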
A change of perspective
So how does this help us solve “Inventing Test Data”? After all, we
are not being directly asked to find a minimum spanning tree.
However, it’s still helpful to think about the process Kruskal’s
Algorithm would go through, in order to choose edge weights that
will force it to do what we want (i.e. pick all the edges in \(T\)).
That is, instead of thinking about each individual edge not in \(T\),
we can instead think about the edges that are in \(T\), and what must
be true to force Kruskal’s algorithm to pick each one.
Suppose we are part of the way through running Kruskal’s algorithm,
and that it is about to consider a given edge \(e = (x,y) \in T\) which
has weight \(w_e\). At this point it has already considered any edges
with smaller weight, and (we shall assume) chosen all the
smaller-weight edges in \(T\). So let \(X\) be the set of vertices
reachable from \(x\) by edges in \(T\) with weight less than or equal to
\(w_e\), and similarly let \(Y\) be those reachable from \(y\). Kruskal’s
algorithm will pick edge \(e\) after checking that \(X\) and \(Y\) are
disjoint.
Think about all the other edges from \(X\) to \(Y\): all of them must
have weight greater than \(w_e\), because otherwise Kruskal’s algorithm
would have already considered them earlier, and used one of them to
connect \(X\) and \(Y\). In fact, all of these edges must have weight
\(w_e + 1\), as we argued earlier, since \(e\) is the largest-weight edge
on the \(T\)-path between their endpoints (all the other edges on these
paths were already chosen earlier and hence have smaller weight). The
number of such edges is just \(|X| |Y| - 1\) (there is an edge for every
pair of vertices, but we do not want to count \(e\) itself). Hence they
contribute a total of \((|X||Y| - 1)(w_e + 1)\) to the sum of edge
weights.
Hopefully the solution is now becoming clear: we process the edges of
\(T\) in order from smallest to biggest, using a union-find to keep track of the equivalence classes of connected vertices so far. For each edge \((x,y)\) we look up the sizes of the equivalence classes of \(x\) and \(y\), add \((|X||Y| - 1)(w_e + 1)\) to a running total, and union them. This
accounts for all the edges not in \(T\); finally we must also add the
weights of the edges in \(T\) themselves.
First some standard pragmas and imports, along with some data types
and a Scanner to parse the input. Note the custom Ord instance
for Edge, so we can sort edges by weight.
{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Category ((>>>))
import Control.Monad.ST
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List (sort)
import Data.Ord (comparing)
import Data.STRef
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner (numberOf tc) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines

data Edge = Edge {a :: !Int, b :: !Int, w :: !Integer}
  deriving (Eq, Show)

instance Ord Edge where
  compare = comparing w

data TC = TC {n :: !Int, edges :: [Edge]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  edges <- (n - 1) >< (Edge <$> int <*> int <*> integer)
  return TC {..}
Finally, the (remarkably short) solution proper: we sort the edges
and process them from smallest to biggest; for each edge we update an
accumulator according to the formula discussed above. Since we’re
already tied to the ST monad anyway, we might as well keep the
accumulator in a mutable STRef cell.
solve :: TC -> Integer
solve TC {..} = runST $ do
  uf <- UF.new (n + 1)
  total <- newSTRef (0 :: Integer)
  mapM_ (processEdge uf total) (sort edges)
  readSTRef total

processEdge :: UF.UnionFind s -> STRef s Integer -> Edge -> ST s ()
processEdge uf total (Edge a b w) = do
  modifySTRef' total (+ w)
  sa <- UF.size uf a
  sb <- UF.size uf b
  modifySTRef' total (+ (fromIntegral sa * fromIntegral sb - 1) * (w + 1))
  UF.union uf a b
Max-edge decomposition
Incidentally, there’s something a bit more general going on here: for
a given nonempty weighted tree \(T\), a max-edge decomposition of
\(T\) is a binary tree defined as follows:
The max-edge decomposition of a trivial single-vertex tree is a
single vertex.
Otherwise, the max-edge decomposition of \(T\) consists of a root node
with two children, which are the max-edge decompositions of the two
trees that result from deleting a largest-weight edge from \(T\).
Any max-edge decomposition of a tree \(T\) with \(n\) vertices will have
\(n\) leaf nodes and \(n-1\) internal nodes. Typically we think of the
leaf nodes of the decomposition as being labelled by the vertices of
\(T\), and the internal nodes as being labelled by the edges of \(T\).
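As a small sketch (my own naming, not from the post), one possible Haskell representation is:

-- Leaves are labelled by vertices of T, internal nodes by (largest-weight) edges of T.
data MaxEdgeDecomp v e
  = Leaf v
  | Branch e (MaxEdgeDecomp v e) (MaxEdgeDecomp v e)
  deriving Show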
An alternative way to think of the max-edge decomposition is as the
binary tree of union operations performed by Kruskal’s algorithm while
building \(T\), starting with each vertex in a singleton leaf and then
merging two trees into one with every union operation. Thinking
about, or even explicitly building, this max-edge decomposition
occasionally comes in handy. For example, see
Veður and Toll
Roads.
Incidentally, I can’t remember whether I got the term “max-edge
decomposition” from somewhere else or if I made it up myself; in any
case, regardless of what it is called, I think I first learned of it
from this blog post by Petr
Mitrichev.
The first stable release of Rust was on May 15, 2015, just about 9½ years ago. My first “production” Rust code was a Slack bot, which talked to GoCD to control the rollout of a web app. This was utterly reliable. And so new bits of Rust started popping up.
I’m only going to talk about open source stuff here. This will be mostly production projects, with a couple of weekend projects thrown in. Each project will ideally get its own post over the next couple of months.
Planned posts
Here are some of the tools I’d like to talk about:
Moving tables easily between many databases (dbcrossbar)
700-CPU batch jobs
Geocoding 60,000 addresses per second
Interlude: Neural nets from scratch in Rust
Lots of CSV munging
Interlude: Language learning using subtitles, Anki, Whisper and ChatGPT
Transpiling BigQuery SQL for Trino (a work in progress)
I’ll update this list to link to the posts. Note that I may not get to all of these!
Maintaining Rust & training developers
One of the delightful things about Rust is the low rate of “bit rot”. If something worked 5 years ago—and if it wasn’t linked against the C OpenSSL libraries—then it probably works unchanged today. And if it doesn’t, you can usually fix it in 20 minutes. This is largely thanks to Rust’s “stability without stagnation” policy, the Edition system, and the Crater tool, which is used to test new Rust releases against the entire ecosystem.
The more interesting questions are (1) when should you use Rust, and (2) how do you make sure your team can use it?
This post is an FAQ answering the most common questions people
ask me related to inlining and specialization. I’ve also structured it
as a blog post that you can read from top to bottom.
What is inlining?
“Inlining” means a compiler substituting a function call or a
variable with its definition when compiling code. A really simple
example of inlining is if you write code like this:
module Example where

x :: Int
x = 5

y :: Int
y = x + 1
… then at compile time the Haskell compiler can (and will) substitute
the last occurrence of x with its definition
(i.e. 5):
y :: Int
y = 5 + 1
… which then allows the compiler to further simplify the code to:
y :: Int
y = 6
In fact, we can verify that for ourselves by having the compiler dump
its intermediate “core” representation like this:
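One way to do this, for example, is to compile with the -ddump-simpl and -dsuppress-all flags:

ghc -O2 -ddump-simpl -dsuppress-all Example.hs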
… which we can squint at a little bit and read as:
x = 5
y = 6
… and ignore the other stuff.
A slightly more interesting example of inlining is a function call,
like this one:
f :: Int -> Int
f x = x + 1

y :: Int
y = f 5
The compiler will be smart enough to inline f by
replacing f 5 with 5 + 1 (here x
is 5):
y :: Int
y = 5 + 1
… and just like before the compiler will simplify that further to
y = 6, which we can verify from the core output:
y = I# 6#
What is specialization?
“Specialization” means replacing a “polymorphic” function with a
“monomorphic” function. A “polymorphic” function is a function whose
type has a type variable, like this one:
-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+1)
… and a “monomorphic” version of the same function replaces the type
variable with a specific (concrete) type or type constructor:
example2 :: Maybe Int -> Maybe Int
example2 = fmap (+1)
Notice that example and example2 are
defined in the same way, but they are not exactly the same function:
example is more flexible and works on strictly more
type constructors
example works on any type constructor f
that implements Functor, whereas example2 only
works on the Maybe type constructor (which implements
Functor).
example and example2 compile to very
different core representations
In fact, they don’t even have the same “shape” as far as GHC’s core
representation is concerned. Under the hood, the example
function takes two extra “hidden” function arguments compared to
example2, which we can see if you dump the core output (and
I’ve tidied up the output a lot for clarity):
example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
The two extra function arguments are:
@f: This represents the type variable
f
Yes, the type variable that shows up in the type signature
also shows up at the term level in the GHC core representation.
If you want to learn more about this you might be interested in my Polymorphism
for Dummies post.
$Functor: This represents the Functor
instance for f
Yes, the Functor instance for a type like f
is actually a first-class value passed around within the GHC core
representation. If you want to learn more about this you might be
interested in my Scrap
your Typeclasses post.
Notice how the compiler cannot optimize example as well
as it can optimize example2 because the compiler doesn’t
(yet) know which type constructor f we’re going to call
example on and also doesn’t (yet) know which
Functor f instance we’re going to use. However, once the
compiler does know which type constructor we’re using it can
optimize a lot more.
In fact, we can see this for ourselves by changing our code a little
bit to simply define example2 in terms of
example:
example :: Functor f => f Int -> f Int
example = fmap (+1)

example2 :: Maybe Int -> Maybe Int
example2 = example
This compiles to the exact same code as before (you can check for
yourself if you don’t believe me).
Here we would say that example2 is “example
specialized to the Maybe type constructor”. When we write
something like this:
example2 :: Maybe Int -> Maybe Int
example2 = example
… what’s actually happening under the hood is that the compiler is doing something like this:
example2 = example @Maybe $FunctorMaybe
In other words, the compiler is taking the more general
example function (which works on any type constructor
f and any Functor f instance) and then
“applying” it to a specific type constructor (@Maybe) and
the corresponding Functor instance
($FunctorMaybe).
In fact, we can see this for ourselves if we generate core output
with optimization disabled (-O0 instead of
-O2) and if we remove the -dsuppress-all
flag:
… whereas with optimizations enabled (-O2), GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:
example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
In fact, specialization is essentially the same thing as inlining
under the hood (I’m oversimplifying a bit, but they are morally the same
thing). The main distinction between inlining and specialization is:
specialization simplifies function calls with “type-level”
arguments
By “type-level” arguments I mean (hidden) function arguments that are
types, type constructors, and type class instances
inlining simplifies function calls with “term-level”
arguments
By “term-level” arguments I mean the “ordinary” (visible) function
arguments you know and love
Does GHC always
inline or specialize code?
NO. GHC does not always inline or specialize code,
for two main reasons:
Inlining is not always an optimization
Inlining can sometimes make code slower. In particular, it can often
be better to not inline a function with a large implementation
because then the corresponding CPU instructions can be cached.
Inlining a function requires access to the function’s source
code
In particular, if the function is defined in a different module from
where the function is used (a.k.a. the “call site”) then the call site
does not necessarily have access to the function’s source code.
To expand on the latter point, Haskell modules are compiled
separately (in other words, each module is a separate “compilation
unit”), and the compiler generates two outputs when compiling a
module:
a .o file containing object code
(e.g. Example.o)
This object code is what is linked into the final executable to
generate a runnable program.
a .hi file containing (among other things) source
code
The compiler can optionally store the source code for any compiled
functions inside this .hi file so that it can inline those
functions when compiling other modules.
However, the compiler does not always save the source code for all functions that it compiles, because there are downsides to storing source code for functions:
this slows down compilation
This slows down compilation both for the “upstream” module (the
module defining the function we might want to inline) and the
“downstream” module (the module calling the function we might want to
inline). The upstream module takes longer to compile because now the
full body of the function needs to be saved in the .hi file
and the downstream module takes longer to compile because inlining isn’t
free (all optimizations, including inlining, generate more work for the
compiler).
this makes the .hi file bigger
The .hi file gets bigger because it’s storing the source
code of the function.
this can also make the object code larger, too
Inlining a function multiple times can lead to duplicating the
corresponding object code for that function.
This is why by default the compiler uses its own heuristic to decide
which functions are worth storing in the .hi file. The
compiler does not indiscriminately save the source code
for all functions.
You can override the compiler’s heuristic, though, using …
Compiler directives
There are a few compiler directives (a.k.a. “pragmas”) related to
inlining and specialization that we’ll cover here:
INLINABLE
INLINE
NOINLINE
SPECIALIZE
My general rule of thumb for these compiler directives is:
don’t use any compiler directive until you benchmark your code to
show that it helps
if you do use a compiler directive, INLINABLE is
probably the one you should pick
I’ll still explain what all the compiler directives mean,
though.
INLINABLE
INLINABLE is a compiler directive that you use like
this:
f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}
The INLINABLE directive tells the compiler to save the
function’s source code in the .hi file in order to make
that function available for inlining downstream.
HOWEVER, INLINABLE does
NOT force the compiler to inline that function. The
compiler will still use its own judgment to decide whether or not the
function should be inlined (and the compiler’s judgment tends to be
fairly good).
INLINE
INLINE is a compiler directive that you use in a similar
manner as INLINABLE:
f :: Int -> Int
f x = x + 1
{-# INLINE f #-}
INLINE behaves like INLINABLE except that
it also heavily biases the compiler in favor of inlining the
function. There are still some cases where the compiler will refuse to
fully inline the function (for example, if the function is recursive),
but generally speaking the INLINE directive overrides the
compiler’s own judgment for whether or not to inline the function.
I would argue that you usually should prefer the
INLINABLE pragma over the INLINE pragma
because the compiler’s judgment for whether or not to inline things is
usually good. If you override the compiler’s judgment there’s a good
chance you’re making things worse unless you have benchmarks showing
otherwise.
NOINLINE
If you mark a function as NOINLINE:
f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}
… then the compiler will refuse to inline that function. It’s pretty
rare to see people use a NOINLINE annotation for
performance reasons (although there are circumstances where
NOINLINE can be an optimization). It’s far, far,
far more common to see people use NOINLINE in
conjunction with unsafePerformIO because that’s what the unsafePerformIO
documentation recommends:
Use {-# NOINLINE foo #-} as a pragma on any function
foo that calls unsafePerformIO. If the call is inlined,
the I/O may be performed more than once.
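The typical shape of that pattern is a top-level cell created with unsafePerformIO (a standard example, not specific to this post):

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global counter. Without NOINLINE, the unsafePerformIO call could be
-- inlined and duplicated, creating more than one IORef.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}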
SPECIALIZE
SPECIALIZE lets you hint to the compiler that it should
compile a polymorphic function for a monomorphic type ahead of time. For
example, if we define a polymorphic function like this:
example :: Functor f => f Int -> f Int
example = fmap (+1)
… we can tell the compiler to go ahead and specialize the
example function for the special case where f
is Maybe, like this:
example :: Functor f => f Int -> f Int
example = fmap (+1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
This tells the compiler to go ahead and compile the more specialized
version, too, because we expect some other module to use that more
specialized version. This is nice if we want to get the benefits of
specialization without exporting the function’s source code (so we don’t
bloat the .hi file) or if we want more precise control over
when specialization does and does not happen.
In practice, though, I find that most Haskell programmers don’t want
to go to the trouble of anticipating and declaring all possible
specializations, which is why I endorse INLINABLE as the
more ergonomic alternative to SPECIALIZE.
The GHC developers are very pleased to announce the availability
of the third alpha release of GHC 9.12.1. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org.
We hope to have this release available via ghcup shortly.
GHC 9.12 will bring a number of new features and improvements, including:
The new language extension OrPatterns allowing you to combine multiple
pattern clauses into one.
The MultilineStrings language extension to allow you to more easily write
strings spanning multiple lines in your source code.
Improvements to the OverloadedRecordDot extension, allowing the built-in
HasField class to be used for records with fields of non-lifted representations.
The NamedDefaults language extension has been introduced allowing you to
define defaults for typeclasses other than Num.
More deterministic object code output, controlled by the -fobject-determinism flag, which significantly improves build determinism (though does not fully guarantee it) at the cost of some compiler performance (1-2%). See #12935 for the details.
GHC now accepts type syntax in expressions as part of GHC Proposal #281.
The WASM backend now has support for TemplateHaskell.
… and many more
A full accounting of changes can be found in the release notes.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status page.
We would like to thank GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
The paper is about representing graphs (especially in functional
languages). We argue in the paper that graphs are naturally
coinductive, rather than inductive, and that many of the
problems with graphs in functional languages go away once you give up on
induction and pattern-matching, and embrace the coinductive way of doing
things.
Of course, coinduction comes with its own set of problems, especially
when working in a total language or proof assistant. Another big focus
of the paper was figuring out a representation that was amenable to
formalisation (we formalised the paper in Cubical Agda). Picking a good
representation for formalisation is a tricky thing: often a design
decision you make early on only looks like a mistake after a few
thousand lines of proofs, and modern formal proofs tend to be brittle,
meaning that it’s difficult to change an early definition without also
having to change everything that depends on it. On top of this, we
decided to use quotients for an important part of the representation,
and (as anyone who’s worked with quotients and coinduction will tell
you) productivity proofs in the presence of quotients can be a real
pain.
All that said, I think the representation we ended up with in the
paper is quite nice. We start with a similar representation to the one
we had in our ICFP
paper in 2021: a graph over vertices of type a is
simply a function a -> [a] that returns the neighbours
of a supplied vertex (this is the same representation as in this post). Despite the
simplicity, it turns out that this type is enough to implement a decent
number of search algorithms. The really interesting thing is that the
arrow methods (from Control.Arrow)
work on this type, and they define an algebra on graphs similar to the
one from Mokhov (2017). For example, the
<+> operator is the same as the overlay
operation in Mokhov (2017).
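Concretely, that simple representation can be written down in a couple of lines. The names below are illustrative rather than the paper's, and the second definition just spells out the Control.Arrow connection mentioned above, using the Kleisli wrapper for the list monad:

import Control.Arrow (Kleisli (..), (<+>))

-- A graph over vertices of type a: each vertex maps to its neighbours.
type Graph a = a -> [a]

-- Overlaying two graphs: a vertex's neighbours come from either graph.
overlay :: Graph a -> Graph a -> Graph a
overlay g h v = g v ++ h v

-- The same operation via the ArrowPlus interface: <+> on Kleisli [] is overlay.
overlay' :: Graph a -> Graph a -> Graph a
overlay' g h = runKleisli (Kleisli g <+> Kleisli h)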
That simple type gets expanded upon and complicated: eventually, we
represent a possibly-infinite collection as a function that takes a
depth and then returns everything in the search space up to that depth.
It’s a little like representing an infinite list as the partial
application of the take function. The paper spends a lot of
time picking an algebra that properly represents the depth, and figuring
out coherency conditions etc.
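As a very rough sketch of that idea (an analogy for intuition only, not the paper's actual definitions):

-- A possibly-infinite search space, observed only up to a given depth,
-- much like representing an infinite list by partially applying take.
newtype UpTo a = UpTo (Int -> [a])

fromList :: [a] -> UpTo a
fromList xs = UpTo (\depth -> take depth xs)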
One thing I’m especially proud of is that all the Agda code snippets
in the paper are hyperlinked to a
rendered html version of the code. Usually, when I want more info on
some code snippet in a paper, I don’t really want to spend an
hour or so downloading some artefact, installing a VM, etc. What I
actually want is just to see all of the definitions the snippet relies
on, and the 30 or so lines of code preceding it. With this paper, that’s
exactly what you get: if you click on any Agda code in the paper, you’re
brought to the source of that code block, and every definition is
clickable so you can browse without having to install
anything.
I think the audience for this paper is anyone who is interested in
graphs in functional languages. It should be especially interesting to
people who have dabbled in formalising some graphs, but who might have
been stung by an uncooperative proof assistant. The techniques in the
second half of the paper might help you to convince Agda (or Idris, or
Rocq) to accept your coinductive and quotient-heavy arguments.
Mokhov, Andrey. 2017. “Algebraic Graphs with
Class (Functional Pearl).” In
Proceedings of the 10th ACM SIGPLAN International
Symposium on Haskell, 2–13. Haskell 2017. New
York, NY, USA: ACM. doi:10.1145/3122955.3122956.
Effect is a powerful library for TypeScript developers that brings functional programming techniques into managing effects and errors. It aims to be a comprehensive utility library for TypeScript, offering a range of tools that could potentially replace specialized libraries like Lodash, Zod, Immer, or RxJS.
In this blog post, we will introduce you to Effect by creating a simple weather widget app. This app will allow users to search for weather information by city name, making it a good example as it involves API data fetching, user input handling, and error management. We will implement this project in both vanilla TypeScript and using Effect to demonstrate the advantages Effect brings in terms of code readability and maintainability.
What is Effect?
Effect promises to improve TypeScript code by providing a set of modules and functions that are composable with maximum type-safety.
The term “effect” refers to an effect system, which provides a declarative approach to handling side effects. Side effects are operations that have observable consequences in the real world, like logging, network requests, database operations, etc.
The library revolves around the Effect<Success, Error, Requirements> type, which can be used to represent an immutable value that lazily describes a workflow or job.
Effects are not functions themselves; they are descriptions of what should be done. They can be composed with other effects, and they can be interpreted by the Effect runtime system.
Before we dive into the project we will build, let’s look at some basic concepts of Effect.
Creating effects
We can create an effect based on a value using the Effect.succeed and Effect.fail functions:
const success: Effect.Effect<number, never, never> = Effect.succeed(42)
const fail: Effect.Effect<never, Error, never> = Effect.fail(new Error("Something went wrong"))
An effect with never as the Error means it never fails
An effect with never as the Success means it never produces a successful value.
An effect with never as the Requirements means it doesn’t require any context to run.
With the functions above, we can create effects like this:
const divide = (a: number, b: number): Effect.Effect<number, Error, never> =>
  b === 0
    ? Effect.fail(new Error("Cannot divide by zero"))
    : Effect.succeed(a / b)
To create an effect based on a function, we can use Effect.sync and Effect.promise for synchronous and asynchronous functions that can't fail, respectively, and Effect.try and Effect.tryPromise for synchronous and asynchronous functions that can fail.
// Synchronous function that can't fail
const log = (message: string): Effect.Effect<void, never, never> =>
  Effect.sync(() => console.log(message))

// Asynchronous function that can't fail
const delay = (message: string): Effect.Effect<string, never, never> =>
  Effect.promise<string>(
    () =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve(message)
        }, 2000)
      })
  )

// Synchronous function that can fail
const parse = (input: string): Effect.Effect<any, Error, never> =>
  Effect.try({
    // JSON.parse may throw for bad input
    try: () => JSON.parse(input),
    // remap the error
    catch: _unknown => new Error(`something went wrong while parsing the JSON`),
  })

// Asynchronous function that can fail
const getTodo = (id: number): Effect.Effect<Response, Error, never> =>
  Effect.tryPromise({
    // fetch can throw for network errors
    try: () => fetch(`https://jsonplaceholder.typicode.com/todos/${id}`),
    // remap the error
    catch: unknown => new Error(`something went wrong ${unknown}`),
  })
In order to run an effect, we need to use the appropriate function depending on the effect type. In our application we'll use the Effect.runPromise function, which runs effects that are asynchronous and can't fail.
When writing a program using Effect, we usually need to run a sequence of operations, and we can use the pipe function to compose them:
const double = (n: number) => n * 2

const divide =
  (b: number) =>
  (a: number): Effect.Effect<number, Error> =>
    b === 0
      ? Effect.fail(new Error("Cannot divide by zero"))
      : Effect.succeed(a / b)

const increment = (n: number) => Effect.succeed(n + 1)

const result = pipe(
  42,
  divide(2),
  // Here we have an Effect.Effect<number, Error> with the value 21
  // To run a function over the value changing the effect's value, we use Effect.map
  Effect.map(double),
  // To run a function over the value without changing the effect's value, we use Effect.tap
  Effect.tap(n => console.log(`The double is ${n}`)),
  // To run a function that returns a new effect, we use Effect.andThen
  Effect.andThen(increment),
  Effect.tap(n => console.log(`The incremented value is ${n}`))
)

Effect.runSync(result)
// -> The double is 42
// -> The incremented value is 43
Now that we have a basic understanding of Effect, we can start the project! We will build a simple weather app in which the user types the name of a city, selects the desired one from a list of suggestions, and then the app shows the current weather in that city.
The project will have three main components: the input field, the list of suggestions, and the weather information.
We will use the Open-Meteo API to get the weather information as it doesn’t require an API key.
Setup
We begin by creating a new TypeScript project:
mkdir weather-app
cd weather-app
npm init -y
Next, we install the dependencies. We will use Parcel to bundle the project, as it works without any configuration. With everything in place, we can start the development server:
npm run dev
Server running at http://localhost:1234
✨ Built in 8ms
By accessing the URL, you should see the application, but it won’t work yet.
Let’s write the TypeScript code!
Without Effect
All the following code examples should be placed in the src/index.ts file.
First, we query the elements from the DOM:
// The field input
const cityElement = document.querySelector<HTMLInputElement>("#city")
// The list of suggestions
const citiesElement = document.querySelector<HTMLUListElement>("#cities")
// The weather information
const weatherElement = document.querySelector<HTMLDivElement>("#weather")
Next, we’ll define the types for the data we’ll fetch from the API.
To validate the data, we’ll use a library called Zod. Zod is a TypeScript-first schema declaration and validation library.
npm install zod
First, we define the schema using z.object; for each property, we use z.string, z.number, and other functions to define its type.
Now, we create the function to fetch the cities from the Open-Meteo API. It fetches the cities that match the given name and returns a list of suggestions. In order to validate the API response, we use the safeParse method that our GeocodingResponse Zod schema provides. This method returns an object with two key properties:
success: A boolean indicating if the parsing succeeded.
data: The parsed data if successful, matching our defined schema.
const getCity = async (city: string): Promise<CityResponse[]> => {
  try {
    const response = await fetch(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    )
    // Convert the response to JSON
    const geocoding = await response.json()
    // Parse the response using the GeocodingResponse schema
    const parsedGeocoding = GeocodingResponse.safeParse(geocoding)

    if (!parsedGeocoding.success) {
      return []
    }

    return parsedGeocoding.data.results
  } catch (error) {
    console.error("Error:", error)
    return []
  }
}
To make the input field work, we need to attach an event listener to it to call the getCity function:
const getCities = async function (input: HTMLInputElement) {
  const { value } = input

  // Check if the HTML element exists
  if (citiesElement) {
    // Clear the list of suggestions
    citiesElement.innerHTML = ""
  }

  // Check if the input is empty
  if (!value) {
    return
  }

  // Fetch the cities
  const results = await getCity(value)
  renderCitySuggestions(results)
}

cityElement?.addEventListener("input", function (_event) {
  getCities(this)
})
Next, we create the renderCitySuggestions function to render the list of suggestions or display an error message if there are no suggestions:
const renderCitySuggestions = (cities: CityResponse[]) => {
  // If there are cities, populate the suggestions
  if (cities.length > 0) {
    populateSuggestions(cities)
    return
  }

  // Otherwise, show a message that the city was not found
  if (weatherElement) {
    const search = cityElement?.value || "searched"
    weatherElement.innerHTML = `<p>City ${search} not found</p>`
  }
}
The populateSuggestions function is very simple: it creates a list item for each city.
We have a type WeatherResult for error handling; it can be ok or error.
The getWeather function fetches the weather information based on the latitude and longitude of a city and returns the result.
We are passing some parameters to the API to get the current temperature, humidity, apparent temperature, and precipitation.
If you want to know more about these parameters, you can check the API documentation.
One last thing we need to do is to use a debounce function to avoid making too many requests to the API while the user is typing.
To do that, we’ll install Lodash which provides many useful functions for everyday programming.
We’ll wrap the getCities function with the debounce function:
import { debounce } from "lodash"

// ...

const getCities = debounce(async function (input: HTMLInputElement) {
  // The same code as before
}, 500)
This way, the getCities function will be called only after the user stops typing for 500 milliseconds.
Our small weather app is now complete: when we type a city name in the input field, a list of suggestions is displayed, and when we click on one of them, we can see the weather information for that city.
While our current code works and handles errors well, let’s explore how using Effect can potentially improve its robustness and simplicity.
With Effect
To get started with Effect, we need to install it:
npm install effect
We will start by refactoring the functions in the order we implemented them in the previous section.
First, we refactor the querySelector calls. We'll use the Option type from Effect: it represents a value that may or may not exist. If the value exists, it's a Some; if it doesn't, it's a None.
import { Option } from "effect"

// The field input
const cityElement = Option.fromNullable(
  document.querySelector<HTMLInputElement>("#city")
)
// The list of suggestions
const citiesElement = Option.fromNullable(
  document.querySelector<HTMLUListElement>("#cities")
)
// The weather information
const weatherElement = Option.fromNullable(
  document.querySelector<HTMLDivElement>("#weather")
)
Using the Option type, we can chain operations without worrying about null or undefined values. This approach simplifies our code by eliminating the need for explicit null checks. We can use functions like Option.map and Option.andThen to handle the transformations and checks in a more elegant way. To know more about the Option type, take a look at the page about it in the documentation.
Now, let’s move to the getCity function. We’ll use the Schema.Struct to define the types of the CityResponse and GeocodingResponse objects. Those schemas will be used to validate the response from the API. This is the same thing we did before with Zod, but this time we don’t have to install any library. Instead, we can just use the Schema module that Effect provides.
import { /* ... */ Effect, Scope, pipe } from "effect"
import { Schema } from "@effect/schema"
import {
  FetchHttpClient,
  HttpClient,
  HttpClientResponse,
  HttpClientError
} from "@effect/platform"

// ...

const CityResponse = Schema.Struct({
  name: Schema.String,
  country_code: pipe(Schema.String, Schema.length(2)),
  latitude: Schema.Number,
  longitude: Schema.Number,
})
type CityResponse = Schema.Schema.Type<typeof CityResponse>

const GeocodingResponse = Schema.Struct({
  results: Schema.Array(CityResponse),
})
type GeocodingResponse = Schema.Schema.Type<typeof GeocodingResponse>

const getRequest = (
  url: string
): Effect.Effect<
  HttpClientResponse.HttpClientResponse,
  HttpClientError.HttpClientError,
  Scope.Scope
> =>
  pipe(
    HttpClient.HttpClient,
    // Using `Effect.andThen` to get the client from the `HttpClient.HttpClient` tag and then make the request
    Effect.andThen(client => client.get(url)),
    // We don't need to send the tracing headers to the API to avoid CORS errors
    HttpClient.withTracerPropagation(false),
    // Providing the HTTP client to the effect
    Effect.provide(FetchHttpClient.layer)
  )

const getCity = (
  city: string
): Effect.Effect<readonly CityResponse[], never, never> =>
  pipe(
    getRequest(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    ),
    // Validating the response using the `GeocodingResponse` schema
    Effect.andThen(HttpClientResponse.schemaBodyJson(GeocodingResponse)),
    // Providing a default value in case of failure
    Effect.orElseSucceed<GeocodingResponse>(() => ({ results: [] })),
    // Extracting the `results` array from the `GeocodingResponse` object
    Effect.map(geocoding => geocoding.results),
    // Providing a scope to the effect
    Effect.scoped
  )
Here we already have some interesting things happening!
The getRequest function sets up the HTTP client. While we could use the built-in fetch API as our HTTP client, Effect provides a solution called HttpClient in the @effect/platform package. It’s important to note that this package is currently in beta, as mentioned in the official documentation. Despite its beta status, we’ll be using it to explore more of Effect’s capabilities and showcase how it integrates with the broader Effect ecosystem. This choice allows us to demonstrate Effect’s approach to HTTP requests and error handling in a more idiomatic way. HttpClient.HttpClient is something called a “tag” that we can use to get the HTTP client from the context. To do that, we use the Effect.andThen function.
After that, we’re setting withTracerPropagation to false to avoid sending the tracing headers to the API and getting a CORS error.
Since we’re using the HttpClient service, it’s a requirement to our effect (remember the Effect<Success, Error, Requirements> type?) and we need to provide this requirement in order to run the effect.
With the Effect.provide function we can add a layer to the effect that provides the HttpClient service. For more information about the Effect.provide function and how it works, take a look at the runtime page on the Effect documentation.
In the getCity function, we call the getRequest function to get the response from the API.
Then we validate the response using the HttpClientResponse.schemaBodyJson function, which validates the response body using the GeocodingResponse schema.
In the last line of the function, we use the Effect.scoped function to provide a scope to the effect; this is a requirement of the HttpClient service that we're using in the getRequest function. The scope ensures that if the program is interrupted, any request will be aborted, preventing memory leaks.
getCity returns an Effect.Effect<readonly CityResponse[], never, never>: the two nevers mean that it never fails (we're providing a default value in case of failure) and that it doesn't require any context to run.
Next, we refactor the getCities function:
import { /* ... */ Effect, Option, pipe } from "effect"

// ...

const getCities = (search: string): Effect.Effect<Option.Option<void>, never, never> => {
  Option.map(citiesElement, citiesEl => (citiesEl.innerHTML = ""))

  return pipe(
    getCity(search),
    Effect.map(renderCitySuggestions),
    // Check if the input is empty
    Effect.when(() => Boolean(search))
  )
}
We’re using the Option.map function to access the actual citiesElement and clear the list of suggestions.
After that, it’s pretty straightforward: we call the getCity function with the search term, then we map the renderCitySuggestions function over the successful value, and finally, we apply a condition that makes the effect run only if the search term is not empty.
Here is how we add the event listener to the input field.
Actually, we’re doing more than just adding an event listener.
The debounce function that we had to import from Lodash before is now part of Effect as the Stream.debounce function. In order to use this function, we need to create a Stream.
A Stream has the type Stream<A, E, R>: it's a program description that, when executed, can emit zero or more values of type A, handle errors of type E, and operate within a context of type R. There are a couple of ways to create a Stream, which are detailed in the page about streams in the documentation. In this case, we're using the Stream.async function, as it receives a callback that emits values to the stream.
After creating the Stream and assigning it to the stream variable, we use a pipe to build a pipeline where we debounce the stream by 500 milliseconds, run the getCities function whenever the stream gets a value (that is, when we emit a value), and finally run the effect with Effect.runPromise.
Let’s move on to the renderCitySuggestions function:
import { /* ... */ Array, Option, pipe } from "effect"

// ...

const renderCitySuggestions = (
  cities: readonly CityResponse[]
): void | Option.Option<void> =>
  // If there are multiple cities, populate the suggestions
  // Otherwise, show a message that the city was not found
  pipe(
    cities,
    Array.match({
      onNonEmpty: populateSuggestions,
      onEmpty: () => {
        const search = Option.match(cityElement, {
          onSome: (cityEl) => cityEl.value,
          onNone: () => "searched",
        })

        Option.map(
          weatherElement,
          (weatherEl) => (weatherEl.innerHTML = `<p>City ${search} not found</p>`)
        )
      },
    })
  )
Instead of manually checking the length of the cities array, we’re using the Array.match function to handle that. If the array is empty, it calls the callback defined in the onEmpty property, and if the array is not empty, it calls the callback defined in the onNonEmpty property.
The populateSuggestions function remains almost the same. The only change is that we now wrap the forEach operation in an Option.map to safely handle the optional cities element. This ensures we only attempt to populate suggestions when the element exists.
There is no checking of the data.tag any more: we're using the Effect.match function to handle both cases, success and failure, and we no longer throw anything.
We're again using Schema.Struct, this time to define the WeatherResponse type. However, we no longer need a WeatherResult type, as the Effect type already handles the success and failure cases.
After this refactoring, the app works the same way it did before, but now we have the confidence that our code is more robust and type-safe. Let's look at the benefits of Effect compared to the code without it.
Conclusion
Now that we have the two versions of the application, we can analyze them and highlight the pros and cons of using Effect:
Pros
Type-safety: Effect provides a way to handle errors and requirements in a type-safe way and using it increases the overall type safety of our app.
Error handling: The Effect type has built-in error handling, making the code more robust.
Validation: We don't need a separate library like Zod to validate responses; we can use the Schema module instead.
Utility functions: We don’t need to use a library like Lodash to use utility functions. Instead, we can use the Array, Option, Stream, and other modules.
Declarative style: Writing code with Effect means we’re using a more declarative approach: we’re describing “what” we want our program to do, rather than “how” we want it to do it.
Cons
Complexity: The code is more complex than the one without Effect; it may be hard to understand for people who are not familiar with the library.
Learning curve: You need to learn how to use the library - it’s not as simple as writing plain TypeScript code.
Documentation: The documentation is good, but could be better. Some parts are not clear.
While the code written with Effect may initially appear more complex to those unfamiliar with the library, its benefits far outweigh the initial learning curve.
Effect offers powerful tools for maximum type-safety, error handling, asynchronous operations, streams and more, all within a single library that is incrementally adoptable. In our project, we used two separate libraries (Zod and Lodash) to achieve what Effect accomplishes on its own.
While plain TypeScript may be adequate for small projects, we believe Effect can truly shine in larger, more complex applications. Its robust handling of side-effects and comprehensive error management have the potential to make it a game changer for taming complexity and maintaining code quality at scale.
Today, 2024-11-06, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 35th episode of the Haskell Unfolder live on YouTube.
We’re going to look at two somewhat more exotic type classes in the Haskell library ecosystem: Distributive and Representable. The former allows you to distribute one functor over another, the latter provides you with a notion of an index to access the elements. As an example, we’ll return once more to the grids used in Episodes 32 and 33 to describe the tic-tac-toe game, and we’ll see how some operations we used can be made more elegant in terms of these type classes. This episode is, however, self-contained; having seen the previous episodes is not required.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't. By the way, some of these gaps are either served by export, or by missing features we are actively working on, those will be some other posts!
Improve training efficiency on a small-medium scale
Scenario: You have a model in PyTorch that you want to train at a small-medium scale (e.g., below 1K GPUs--at the 1K point there is a phase change in behavior that deserves its own section). You would like it to train faster. Locally, it's nice to get a trained model faster than you would have otherwise. But globally, the faster everyone's models train, the less GPU hours they use, which means you can run more jobs in a given time window with a fixed cluster. If your supply of GPUs is inelastic (lol), efficiency improvement means you can support more teams and use cases for the same amount of available GPUs. At a capacity planning level, this can be a pretty big deal even if you are GPU rich.
What to do: In some sense, this is the reason we built torch.compile. (When we were initially planning torch.compile, we were trying to assess if we were going after inference; but inference compilers are a much more crowded space than training compilers, and we reasoned that if we did a good job building a training compiler, inference would work too--which it did!) The dream which we sold with torch.compile is that you could slap it on the top of your model and get a speed up. This turns out to... not quite be true? But the fact remains that if you're willing to put in some work, there is almost always performance waiting at the end of the road for you. Some tips:
Compile only the modules you need. You don't have to compile the entire model; there might be specific modules which are easy to compile which will give you the most of the benefit. For example, in recommendation systems, there is not much compute improvement to be had from optimizing the embedding lookups, and their model parallelism is often quite hard to handle in the compiler, so torch.compiler.disable them. NB: This doesn't apply if you want to do some global graph optimization which needs the whole model: in that case, pass fullgraph=True to torch.compile and ganbatte!
Read the missing manual. The missing manual is full of guidance on working with the compiler, with a particular emphasis on working on training.
Open source examples: torchtune and torchtitan are two first-party libraries which are intended to showcase modern PyTorch using torch.compile in a training context. There's also some training in torchao.
Downsides:
The compiler is complicated. One of the things we've slowly been coming to terms with is that, uh, maybe promising you could just slap torch.compile on a model and have it run faster was overselling the feature a teensy bit? There seems to be some irreducible complexity with compilers that any user bringing their own model to torch.compile has to grapple with. So yes, you are going to spend some of your complexity budget on torch.compile, in hopes that the payoff is worth it (we think it is!) One ameliorating factor is that the design of torch.compile (graph breaks) means it is very easy to incrementally introduce torch.compile into a codebase, without having to do a ton of upfront investment.
Compile time can be long. The compiler is not a straightforward unconditional win. Even if the compiler doesn't slow down your code (which it can, in pathological cases), you have to spend some amount of time compiling your model (investment), which you then have to make back by training the model more quickly (return). For very small experimentation jobs, or jobs that are simply crashing, the time spent compiling is just dead weight, increasing the overall time your job takes to run. (teaser: async compilation aims to solve this.) To make matters worse, if you are scheduling your job on systems that have preemption, you might end up repeatedly compiling over and over again every time your job gets rescheduled (teaser: caching aims to solve this.) But even when you do spend some time training, it is not obvious without an A/B test whether or not you are actually getting a good ROI. In an ideal world, everyone using torch.compile would actually verify this ROI calculation, but it doesn't happen automatically (teaser: automatic ROI calculation) and in large organizations we see people running training runs without even realizing torch.compile is enabled.
Numerics divergence from eager. Unfortunately, the compiler does not guarantee exact bitwise equivalence with eager code; we reserve the right to do things like select different matrix multiply algorithms with different numerics or eliminate unnecessary downcast/upcasts when fusing half precision compute together. The compiler is also complicated and can have bugs that can cause loss not to converge. Expect to also have to evaluate whether or not application of torch.compile affects accuracy. Fortunately, for most uses of compiler for training efficiency, the baseline is the eager model, so you can just run an ablation to figure out who is actually causing the accuracy problem. (This won't be true in a later use case when the compiler is load bearing, see below!)
Improve Python inference efficiency
Scenario: You've finished training your model and you want to deploy it for inference. Here, you want to improve the efficiency of inference to improve response latency or reduce the overall resource requirements of the system, so you can use less GPUs to serve the traffic you are receiving. Admittedly, it is fairly common to just use some other, more inference friendly systems (which I will decline to name by name lol) to serve the model. But let's say you can't rewrite the model in a more serving friendly language (e.g., because the model authors are researchers and they keep changing the model, or there's a firehose of models and you don't have the money to keep continuously porting each of them, or you depend on an ecosystem of libraries that are only available in CPython).
What to do: If Python can keep up with the CPU-side QPS requirements, a way of getting good performance without very much work is taking the Python model, applying torch.compile on it in the same way as you did in training and directly using this as your inference solution. Some tips that go beyond training:
Autotuning makes the most sense for inference. In training runs, you have a limited window (the lifetime of the training job) to get return on the investment you spent optimizing the model. In the serving regime, you can amortize over the entire lifetime of your model in inference, which is typically much longer. Therefore, expensive optimization modes like mode="max-autotune" are more likely to pay off!
Warmup inference processes before serving traffic to them. Because torch.compile is a just-in-time compiler, you will spend quite a bit of time compiling (even if you cache hit) at startup. If you have latency requirements, you will want to warmup a fresh process with a representative set of inputs so that you can make sure you trigger all of the compilation paths you need to hit. Caching will reduce compile time but not eliminate it.
Try skip_guard_eval_unsafe to reduce guard overhead. Dynamo guard overhead can be material in the inference case. If this is a problem, get a nightly and try skip_guard_eval_unsafe.
Open source examples: LLM serving on torch.compile is quite popular: vllm, sglang, tensorrt-llm, gpt-fast (this is technically not an E2E serving solution, but one of its primary reasons for existing is to serve as a starting point so you can build your own torch.compile based LLM inference stack on top of it). Stable diffusion models are also notable beneficiaries of torch.compile, e.g., diffusers.
Downsides:
Just in time compilation is a more complicated operational model. It would be better if you didn't have to warmup inference processes before serving traffic to them. Here, torch.compile has traded operational simplicity for ease of getting started. If you wanted to guarantee that compilation had already happened ahead of time, you have to instead commit to some sort of export-based flow (e.g., C++ GPU/CPU inference) below.
Model and dependency packaging in Python is unaddressed. You need to somehow package and deploy the actual Python code (and all its dependencies) which constitute the model; torch.compile doesn't address this problem at all (while torch.export does). If you are running a monorepo and do continuous pushes of your infra code, it can be organizationally complicated to ensure people don't accidentally break model code that is being shipped to production--it's very common to be asked if there's a way to "freeze" your model code so that the monorepo can move on. But with Python inference you have to solve this problem yourself, whether the solution is torch.package, Docker images, or something else.
Caches are not guaranteed to hit. Do you have to recompile the model every time you restart the inference process? Well, no, we have an Inductor and Triton (and an in-progress AOTAutograd) cache which in principle can cache all of the cubin's that are generated by torch.compile. Most of the time, you can rely on this to reduce startup cost to Dynamo tracing the model only. However, the caches are not guaranteed to hit: there are rarer cases where we don't know how to compute the cache key for some feature a model is using, or the compiler is nondeterministic in a way that means the cache doesn't hit. You should file bugs for all of these issues as we are interested in fixing them, but we don't give a categorical guarantee that after you've compiled your inference program once, you won't have to compile it again. (And indeed, under torch.compile's user model, we can't, because the user code might be the root cause of the nondeterminism--imagine a model that is randomly sampling to decide what version of a model to run.)
Multithreading is currently buggy. It should, in principle, be possible to run torch.compile'd code from multiple threads in Python and get a speedup, especially when CUDA graphs or CPP wrapper is used. (Aside: Inductor's default compile target is "Python wrapper", where Inductor's individually generated Triton kernels are called from Python. In this regime, you may get in trouble due to the GIL; CUDA graphs and CPP wrapper, however, can release the GIL when the expensive work is being done.) However, it doesn't work. Track the issue at https://github.com/pytorch/pytorch/issues/136833
Like above, but the compiler is load bearing
Scenario: In both the cases above, we assumed that we had a preexisting eager model that worked, and we just wanted to make it faster. But you can also use the compiler in a load bearing way, where the model does not work without the compiler. Here are two common cases where this can occur:
Performance: A compiler optimization that results in an asymptotic or large constant-factor improvement in performance can make a naive eager implementation that would otherwise have been hopelessly slow perform well. For example, SimpleFSDP chooses to apply no optimizations to the distributed collectives it issues, instead relying on the compiler to bucket and prefetch them for acceptable performance.
Memory: A compiler optimization that reduces the memory usage of a model can allow you to fit a model or batch size that would otherwise OOM. Although we don't publicly expose APIs for doing so, you can potentially use the compiler to do things like force a certain memory budget when doing activation checkpointing, without requiring the user to manually specify what needs to be checkpointed.
What to do: Unlike in the previous cases, where you took a preexisting model and slapped torch.compile on it, this sort of use of the compiler is more likely to arise from a codevelopment approach, where you use torch.compile while you build your model and are constantly checking what the compiler does to the code you write. Some tips:
Don't be afraid to write your own optimization pass. Inductor supports custom FX optimization passes. torch.compile has done the work of getting your model into an optimizable form; you can take advantage of this to apply domain specific optimizations that Inductor may not support natively.
Open source examples. SimpleFSDP as mentioned above. VLLM uses torch.compile to apply custom optimization passes. Although its implementation is considerably more involved than what you might reasonably expect a third party to implement, FlexAttention is a good example of a non-compiler feature that relies on the compiler in a load-bearing way for performance.
Downsides: Beyond the ones mentioned above:
You can no longer (easily) use eager as a baseline. This is not always true; for example, FlexAttention has an eager mode that runs everything unfused which can still be fast enough for small experiments. But if you have an accuracy problem, it may be hard to compare against an eager baseline if you OOM in that case! It turns out that it's really, really useful to have access to an eager implementation, so it's worth working harder to make sure that the eager implementation works, even if it is slow. (It's less clear how to do that with, e.g., a fancy global optimization based activation checkpointing strategy.)
My friend Alan Jeffrey passed away earlier this year. I described his professional life at a Celebration in Oxford on 2nd November 2024. This post is a slightly revised version of what I said.
Edinburgh, 1983–1987
I’ve known Alan for over 40 years—my longest-standing friend. We met at the University of Edinburgh in 1983, officially as computer science freshers together, but really through the clubs for science fiction and for role-playing games. Alan was only 16: like many in Scotland, he skipped the final school year for an earlier start at university. It surely helped that his school had no computers, so he wasted no time in transferring to a university that did. His brother David says that it also helped that he would then be able to get into the student union bars.
Oxford, 1987–1991
After Edinburgh, Alan and I wound up together again as freshers at the University of Oxford. We didn’t coordinate this; we independently and simultaneously applied to the same DPhil programme (Oxford’s name for the PhD). We were officemates for those 4 years, and shared a terraced hovel on St Mary’s Road in bohemian East Oxford with three other students for most of that time. He was clever, funny, kind, and serially passionate about all sorts of things. It was a privilege and a pleasure to have known him.
Alan had a career that spanned academia and industry, and he excelled at both. He described himself as a “semanticist”: using mathematics instead of English for precise descriptions of programming languages. He had already set out in that direction with his undergraduate project on concurrency under Robin Milner at Edinburgh; and he continued to work on concurrency for his DPhil under Bill Roscoe at Oxford, graduating in 1992.
Chalmers, 1991–1992
Alan spent the last year of his DPhil as a postdoc working for K V S Prasad at Chalmers University in Sweden. While there, he was assigned to host fellow Edinburgh alumnus Carolyn Brown visiting for an interview; Carolyn came bearing a bottle of malt whisky, as one does, which she and Alan proceeded to polish off together that evening.
Sussex, 1992–1999
Carolyn’s interview was successful; but by the time she arrived at Chalmers, Alan had left for a second postdoc under Matthew Hennessy at the University of Sussex. They worked together again when Carolyn was in turn hired as a lecturer at Sussex. In particular, they showed in 1994 that “string diagrams”—due to Roger Penrose and Richard Feynman in physics—provide a “fully abstract” calculus for hardware circuits, meaning that everything true of the diagrams is true of the hardware, and vice versa. This work foreshadowed a hot topic in the field of Applied Category Theory today.
Matthew essentially left Alan to his own devices: as Matthew put it, “something I was very happy with as he was an exceptional researcher”. Alan was soon promoted to a lectureship himself. He collaborated closely with Julian Rathke, then Matthew’s PhD student and later postdoc, on the Full Abstraction Factory project, developing a bunch more full abstraction results for concurrent and object-oriented languages. That fruitful collaboration continued even after Alan left Sussex.
DePaul, 1999–2004
Alan established the Foundations of Programming Languages research group at DePaul, attracting Radha Jagadeesan from Loyola, James Riely from Sussex, and Corin Pitcher from Oxford, working among other things on “relaxed memory”—modern processors don’t actually read and write their multiple levels of memory in the order you tell them to, when they can find quicker ways to do things concurrently and sometimes out of order.
James remembers showing Alan his first paper on relaxed memory, co-authored with Radha. Alan thought their approach was an “appalling idea”; the proper way was to use “event structures”, an idea from the 1980s. This turned in 2016 into a co-authored paper at LICS (Alan’s favourite conference), and what James considers his own best ever talk—an on-stage reenactment of the to and fro of their collaboration, sadly not recorded for posterity.
James was Alan’s most frequent collaborator over the years, with 14 joint papers. Their modus operandi was that, having identified a problem together, Alan would go off by himself and do some Alanny things, eventually coalescing on a solution and choosing an order of exposition, tight and coherent; this was about 40% of the life of the paper. But then there would be various tweaks, extensions, corrections… Alan would never look at the paper again, and would be surprised years later to learn what was actually in it. However, Alan was always easy to work with: interested only in the truth, although it must be beautiful. He had a curious mix of modesty and egocentricity: always convinced he was right (and usually right that he was right). Still, he had no patience for boring stuff, especially university admin.
Bell Labs, 2004–2015
After the dot com crash in 2000, things got more difficult at DePaul, and Alan left in 2004 for Bell Labs, nominally as a member of technical staff in Naperville but actually part of a security group based at HQ in Murray Hill NJ. He worked on XPath, “a modal logic for XML”, with Michael Benedikt, now my databases colleague at Oxford. They bonded because only Alan and Michael lived in Chicago rather than the suburbs. Michael had shown Alan a recent award-winning paper in which Alan quickly spotted an error in a proof—an “obvious” and unproven lemma that turned out to be false—which led to their first paper together.
(A recurring pattern. Andy Gordon described Alan’s “uncanny ability to find bugs in arguments”: he found a type unsoundness bug in a released draft specification for Java, and ended up joining the standards committee to help fix it. And as a PhD examiner he “shockingly” found a subtle bug that unpicked the argument of half of the dissertation, necessitating major corrections: it took a brave student to invite Alan as examiner—or a very confident one.)
Michael describes Alan as an “awesome developer”. They once had an intern; it didn’t take long after the intern had left for Alan to discard the intern’s code and rewrite it from scratch. Alan was unusual in being able to combine Euro “abstract nonsense” and US engineering. Glenn Bruns, another Bell Labs colleague, said that “I think Alan was the only person I’ve met who could do theory and also low-level hackery”.
At Bell Labs Alan also worked with Peter Danielsen on the Web InterFace Language, WIFL for short: a way of embedding API descriptions in HTML. Peter recalls: “We spent a few months working together on the conceptual model. In the early stages of software development, however, Alan looked at what I’d written and said, “I wouldn’t do it that way at all!”, throwing it all away and starting over. The result was much better; and he inadvertently taught me a new way to think in JavaScript (including putting //Sigh… comments before unavoidable tedious code.)”
Mozilla Research, 2015–2020
The Bell Labs group dissolved in 2015, and Alan moved to Mozilla Research as a staff research engineer to work on Servo, a new web rendering engine in the under-construction programming language Rust.
For one of Alan’s projects at Mozilla, he took a highly under-specified part of the HTML specification about how web links and the back and forwards browser buttons should interact, created a formal model in Agda based on the existing specification, identified gaps in it as well as ways that major browsers did not match the model, then wrote it all up as a paper. Alan’s manager Josh Matthews recalls the editors of the HTML standard being taken aback by Alan’s work suddenly being dropped in their laps, but quickly appreciated how much more confidently they could make changes based on it.
Josh also recalled: “Similarly, any time other members of the team would talk about some aspect of the browser engine being safe as long as some property was upheld by the programmer, Alan would get twitchy. He had a soft spot for bad situations that looked like they could be made impossible by clever enough application of static types.”
In 2017 Alan made a rather surprising switch to working on augmented reality for the web, partly driven by internal politics at Mozilla. He took the lead on integrating Servo into the Magic Leap headset; the existing browser was clunky, the only way to interact with pages being via an awkward touchpad on the hand controller. This was not good enough for Alan: after implementing the same behaviour for Servo and finding it frustrating, he had several email exchanges with the Magic Leap developers, figured out how to access some interfaces that weren’t technically public but also were not actually private, and soon he proudly showed off a more natural laser pointer-style means of interacting with pages in augmented reality in Servo—to much acclaim from the team and testers.
Roblox, 2020–2024
Then in 2020, Mozilla’s funding stream got a lot more constrained, and Alan moved to the game platform company Roblox. Alan was a principal software engineer, and the language owner of Luau, “a fast, small, safe, gradually typed embeddable scripting language derived from Lua”, working on making the language easier to use, teach, and learn. Roblox supports more than two million “content creators”, mostly kids, creating millions of games a year; Alan’s goal was to empower them to build larger games with more characters.
The Luau product manager Bryan Nealer says that “people loved Alan”. Roblox colleagues appreciated his technical contributions: “Alan was meticulous in what he built and wrote at Roblox. He would stress not only the substance of his work, but also the presentation. His attention to detail inspired the rest of us!”; “One of the many wonderful things Alan did for us was to be the guy who could read the most abstruse academic research imaginable and translate it into something simple, useful, interesting, and even fun.” They also appreciated the more personal contributions: Alan led an internal paper reading group, meeting monthly to study some paper on programming or networking, but he also established the Roblox Book Club: “He was always thoughtful when discussing books, and challenged us to think about the text more deeply. He also had an encyclopedic knowledge of scifi. He recommended Iain M. Banks’s The Culture series to me, which has become my favorite scifi series. I think about him every time I pick up one of those books.”
Envoi
From my own perspective, one of the most impressive things about Alan is that he was impossible to pigeonhole: like Dr Who, he was continually regenerating. He explained to me that he got bored quickly with one area, and moved on to another. As well as his academic abilities, he was a talented and natural cartoonist: I still have a couple of the tiny fanzine comics he produced as a student.
Of course he did some serious science for his DPhil and later career; but he also took a strong interest in typography and typesetting. He digitized some beautiful Japanese crests for the chapter title pages of his DPhil dissertation. Alan dragged me into typography with him, a distraction I have enjoyed ever since. Among other projects, Alan and I produced a font containing some extra symbols so that we could use them in our papers, and named it St Mary’s Road after our Oxford digs. And Alan produced a full blackboard bold font, complete with lowercase letters and punctuation: you can see some of it in the order of service. But Alan was not satisfied with merely creating these things; he went to all the trouble to package them up properly and get them included in standard software distributions, so that they would be available for everyone: Alan loved to build things for people to use. These two fonts are still in regular use 35 years later, and I’m sure they will be reminding us of him for a long time to come.
The GHC developers are very pleased to announce the availability
of the second alpha release of GHC 9.12.1. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org.
We hope to have this release available via ghcup shortly.
GHC 9.12 will bring a number of new features and improvements, including:
The new language extension OrPatterns allowing you to combine multiple
pattern clauses into one.
The MultilineStrings language extension to allow you to more easily write
strings spanning multiple lines in your source code.
Improvements to the OverloadedRecordDot extension, allowing the built-in
HasField class to be used for records with fields of non-lifted representations.
The NamedDefaults language extension has been introduced allowing you to
define defaults for typeclasses other than Num.
More deterministic object code output, controlled by the -fobject-determinism
flag, which greatly (though not yet fully) improves build determinism at the
cost of some compiler performance (1-2%). See #12935 for the details.
GHC now accepts type syntax in expressions as part of GHC Proposal #281.
The WASM backend now has support for TemplateHaskell.
… and many more
A full accounting of changes can be found in the release notes.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status page.
We would like to thank GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
For many years now I’ve been using a mental model and intuition that has
guided me well for understanding and teaching and using functors, applicatives,
monads, and other related Haskell abstractions, as well as for approaching
learning new ones. Sometimes when teaching Haskell I talk about this concept and
assume everyone already has heard it, but I realize that it’s something
universal yet easy to miss depending on how you’re learning it. So, here it is:
how I understand the Functor and other related abstractions and free
constructions in Haskell.
The crux is this: instead of thinking about what fmap changes,
ask: what does fmap keep constant?
This isn’t a rigorous understanding and isn’t going to explain every
aspect about every Functor, and will probably only be useful if you
already know a little bit about Functors in Haskell. But it’s a nice intuition
trick that has yet to majorly mislead me.
The Secret of Functors
First of all, what is a Functor? A capital-F Functor, that is, the
Haskell typeclass and abstraction. Ask a random Haskeller on the street and
they’ll tell you that it’s something that can be “mapped over”, like a list or
an optional. Maybe some of those random Haskellers will feel compelled to
mention that this mapping should follow some laws…they might even list the laws.
Ask them why these laws are so important and maybe you’ll spend a bit of time on
this rhetorical street of Haskellers before finding one confident enough to give
an answer.
So I’m going to make a bit of a tautological leap: a Functor gives
you a way to “map over” values in a way that preserves shape. And what
is “shape”? A shape is the thing that fmap preserves.
The Functor typeclass is simple enough: for Functor f, you have
a function fmap :: (a -> b) -> f a -> f b, along with
fmap id = id and fmap f . fmap g = fmap (f . g). Cute
things you can drop into quickcheck to prove for your instance, but it seems
like those laws are hiding some sort of deeper, fundamental truth.
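For example, a minimal QuickCheck sketch of those two laws at a concrete type might look like this (the Fun modifier generates arbitrary functions for us):

import Test.QuickCheck

prop_fmapIdentity :: [Int] -> Bool
prop_fmapIdentity xs = fmap id xs == xs

prop_fmapComposition :: Fun Int Int -> Fun Int Int -> [Int] -> Bool
prop_fmapComposition (Fn f) (Fn g) xs =
  (fmap f . fmap g) xs == fmap (f . g) xs

main :: IO ()
main = do
  quickCheck prop_fmapIdentity
  quickCheck prop_fmapComposition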
The more Functors you learn about, the more you see that fmap
seems to always preserve “something” (a few of these are made concrete in the short example after this list):
For lists, fmap preserves length and relative orderings.
For optionals (Maybe), fmap preserves
presence (the fact that something is there or not). It cannot flip a
Just to a Nothing or vice versa.
For Either e, fmap preserves the error
(if it exists) or the fact that it was successful.
For Map k, fmap preserves the keys: which
keys exist, how many there are, their relative orderings, etc.
For IO, fmap preserves the IO effect.
Every bit of external I/O that an IO action represents is unchanged by an
fmap, as well as exceptions.
For Writer w or (,) w, fmap preserves
the “logged” w value, leaving it unchanged. Same for
Const w.
For Tree, fmap preserves the tree
structure: how many layers, how big they are, how deep they are, etc.
For State s, fmap preserves what happens to the
input state s. How a State s transforms a state value
s is unchanged by fmap.
For ConduitT i o m from conduit,
fmap preserves what the conduit pulls upstream and what it yields
downstream. fmap will not cause the conduit to yield more or
different objects, nor cause it to consume/pull more or less.
For parser-combinator Parser, fmap preserves what
input is consumed or would fail to be consumed. fmap cannot change
whether an input string would fail or succeed, and it cannot change how much it
consumes.
For optparse-applicative Parsers, fmap preserves the command line arguments
available. It leaves the --help message of your program
unchanged.
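To see a few of these conserved shapes directly, here is a short GHCi session (lists keep their length, Maybe keeps its presence, and the logged value of a pair is untouched):

ghci> fmap (*2) [1,2,3]            -- length 3 in, length 3 out
[2,4,6]
ghci> fmap (+1) (Just 5)           -- a Just stays a Just
Just 6
ghci> fmap (+1) Nothing            -- a Nothing stays a Nothing
Nothing
ghci> fmap length ("log", "hello") -- the "log" component is preserved
("log",5)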
It seems like as soon as you define a Functor instance, or as
soon as you find out that some type has a Functor instance, it
magically induces some sort of … “thing” that must be preserved.1 A
conserved quantity must exist. It reminds me a bit of Noether’s Theorem
in Physics, where any continuous symmetry “induces” a conserved quantity (like
how translation symmetry “causes” conservation of momentum). In Haskell, every
lawful Functor instance induces a conserved quantity. I don’t know
if there is a canonical name for this conserved quantity, but I like to call it
“shape”.
A Story of Shapes
The word “shape” is chosen to be as devoid of external baggage/meaning as
possible while still having some. The word isn’t important as much as
saying that there is some “thing” preserved by fmap, and
not exactly the nature of that “thing”. The nature of that thing
changes a lot from Functor to Functor, where we might better call it an “effect”
or a “structure” specifically, but that some “thing” exists is almost
universal.
Of course, the value of this “thing” having a canonical name at all is
debatable. If I were to coin a completely new term I might call it a “conserved
charge” or “gauge” in allusion to physics. But the most useful name probably
would be shape.
For some Functor instances, the word shape is more literal than
others. For trees, for instance, you have the literal shape of the tree
preserved. For lists, the “length” could be considered a literal shape.
Map k’s shape is also fairly literal: it describes the structure of
keys that exist in the map. But for Writer w and
Const w, shape can be interpreted as some information outside of
the values you are mapping that is left unchanged by mapping. For
Maybe and Either e, shape also captures whether any short-circuiting
has happened. For State s and IO and
Parser, “shape” involves some sort of side-computation or
consumption that is left unchanged by fmap, often called an effect.
For optparse-applicative, “shape” involves some sort of inspectable and
observable static aspects of a program. “Shape” comes in all forms.
But, this intuition of “looking for that conserved quantity” is very helpful
for learning new Functors. If you stumble onto a new type that you know
is a Functor instance, you can immediately ask “What shape
is this fmap preserving?”, and it will almost always yield insight
into that type.
This viewpoint also sheds insight onto why Set.map isn’t a good
candidate for fmap for Data.Set:
What “thing” does Set.map f preserve? Not size, for sure. In a
hypothetical world where we had
ordfmap :: Ord b => (a -> b) -> f a -> f b, we would
still need Set.map to preserve something for it to be
useful as an “Ord-restricted Functor”.2
A Result
Before we move on, let’s look at another related and vague concept that is
commonly used when discussing functors: fmap is a way to map a
function that preserves the shape and changes the result.
If shape is the thing that is preserved by
fmap, result is the thing that is changed by it.
fmap cleanly splits the two.
Interestingly, most introductions to Functors begin by describing functor
values as having a result and fmap as the thing that changes it, in
some way. Ironically, though “result” is the more common term, it’s by far the more vague
and hard-to-intuit concept.
For something like Maybe, “result” is easy enough: it’s the
value present if it exists. For parser-combinator Parsers too it’s
relatively simple: the “shape” is the input consumed but the “result” is the
Haskell value you get as a result of the consumption. For
optparse-applicative parser, it’s the actual parsed command line
arguments given by the user at runtime. But sometimes it’s more complicated: for
the technical List functor, the “non-determinism” functor, the “shape” is the
number of options to choose from and the order you get them in, and the “result”
(to use precise semantics) is the non-deterministic choice that you eventually
pick or iterate over.
So, the “result” can become a bit confusing to generalize. In my mind, I
usually reduce the definitions to:
Shape: the “thing” that fmap preserves: the
f in f a
Result: the “thing” that fmap changes: the
a in f a
With this you could “derive” the Functor laws:
fmap id == id: fmap leaves the shape unchanged,
id leaves the result unchanged. So the entire thing must remain
unchanged!
fmap f . fmap g == fmap (f . g). In both cases the shape
remains unchanged, but one changes the result by f after g, and the other
changes the result by f . g. They must be the same
transformation!
All neat and clean, right? So, maybe the big misdirection is focusing too
much on the “result” when learning Functors, when we should really be
focusing more on the “shape”, or at least the two together.
Once you internalize “Functor gives you shape-preservation”,
this helps you understand the value of the other common typeclass abstractions
in Haskell as well, and how they function based on how they manipulate “shape”
and “result”.
Traversable
For example, what does the Traversable typeclass give us? Well,
if Functor gives us a way to map pure functions and
preserve shape, then Traversable gives us a way to map
effectful functions and preserve shape.
Whenever someone asks me about my favorite Traversable instance,
I always say it’s the Map k traversable:
traverse :: Applicative f => (a -> f b) -> Map k a -> f (Map k b)
Notice how it has no constraints on k? Amazing, isn’t it?
traverse for Map k lets us map an (a -> f b) over the values
at each key in a map, and collect the results under the key the a
was originally under.
In essence, you can be assured that the result map has the same keys
as the original map, perfectly preserving the “shape” of the map. The
Map k instance is the epitome of beautiful Traversable
instances. We can recognize this by identifying the “shape” that
traverse is forced to preserve.
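As a quick illustration (my own example, not from the original post, assuming Data.Map imported qualified as M), traversing a Map with a Maybe-producing function either fails as a whole or returns a map with exactly the same keys:

import qualified Data.Map as M

-- Validate every value in the map; the set of keys can never change.
validateAges :: M.Map String Int -> Maybe (M.Map String Int)
validateAges = traverse (\age -> if age >= 0 then Just age else Nothing)

-- validateAges (M.fromList [("alice", 30), ("bob", -1)]) == Nothing
-- validateAges (M.fromList [("alice", 30), ("bob", 25)])
--   == Just (M.fromList [("alice", 30), ("bob", 25)])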
Applicative
What does the Applicative typeclass give us? It has
ap and pure, but its
laws are infamously difficult to understand.
But, look at liftA2 (,):
liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
It lets us take “two things” and combine their shapes. And, more
importantly, it combines the shapes without considering the
results.
For Writer w, <*> lets us combine the two
logged values using mappend while ignoring the actual
a/b results.
For list, <*> (the cartesian product) lets us multiply
the lengths of the input lists together. The length of the new list
ignores the actual contents of the list.
For State s, <*> lets you compose
the s -> s state functions together, ignoring the
a/bs.
For Parser, <*> lets you sequence input
consumption in a way that doesn’t depend on the actual values you parse: it’s
“context-free” in a sense, aside from some
caveats.
For optparse-applicative, <*> lets you combine
your command line argument specs together, without depending on the actual
values provided at runtime by the caller.
The key takeaway is that the “final shape” only depends on the input
shapes, and not the results. You can know the length of
<*>-ing two lists together with only knowing the length of
the input lists, and you can also know the relative ordering of inputs to
outputs. Within the specific context of the semantics of IO, you
can know what “effect” <*>-ing two IO actions would produce
only knowing the effects of the input IO actions3. You can
know what command line arguments <*>-ing two
optparse-applicative parsers would have only knowing the command line
arguments in the input parsers. You can know what strings
<*>-ing two parser-combinator parsers would consume or
reject, based only on the consumption/rejection of the input parsers. You can
know the final log of <*>-ing two Writer w actions
together by only knowing the logs of the input writer actions.
And hey…some of these combinations feel “monoidal”, don’t they?
Writer w sequences using mappend
List lengths sequence by multiplication
State s functions sequence by composition
You can also imagine “no-op” actions:
Writer w’s no-op action would log mempty, the
identity of mappend
List’s no-op action would have a length 1, the identity of
multiplication
State s’s no-op action would be id, the identity
of function composition
That might sound familiar — these are all pure from the
Applicative typeclass!
So, the Applicative typeclass laws aren’t that mysterious at all. If you
understand the “shape” that a Functor induces, Applicative gives
you a monoid on that shape! This is why Applicative is
often called the “higher-kinded” Monoid.
This intuition takes you pretty far, I believe. Look at the examples above
where we clearly identify specific Applicative instances with
specific Monoid instances (Monoid w,
Monoid (Product Int), Monoid (Endo s)).
Put in code:
-- A part of list's shape is its length and the monoid is (*, 1)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1

-- Maybe's shape is isJust and the monoid is (&&, True)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True

-- State's shape is execState and the monoid is (flip (.), id)
execState (sx <*> sy) == execState sy . execState sx
execState (pure r) == id

-- Writer's shape is execWriter and the monoid is (<>, mempty)
execWriter (wx <*> wy) == execWriter wx <> execWriter wy
execWriter (pure r) == mempty
We can also extend this to non-standard Applicative instances:
the ZipList newtype wrapper gives us an Applicative
instance for lists where <*> is zipWith. These
two have the same Functor instances, so their “shape” (length) is
the same. And for both the normal Applicative and the
ZipList Applicative, you can know the length of the
result based on the lengths of the input, but ZipList combines
shapes using the Min monoid, instead of the Product
monoid. And the identity of Min is positive infinity, so
pure for ZipList is an infinite list.
-- A part of ZipList's shape is length and its monoid is (min, infinity)
length (xs <*> ys) == length xs `min` length ys
length (pure r) == infinity
The “know-the-shape-without-knowing-the-results” property is actually
leveraged by many libraries. It’s how optparse-applicative can give you
--help output: the shape of the optparse-applicative
parser (the command line arguments list) can be computed without knowing the
results (the actual arguments themselves at runtime). You can list out what
arguments are expected without ever getting any input from the user.
This is also leveraged by the async library to give
us the Concurrently Applicative instance. Normally
<*> for IO gives us sequential combination of IO effects.
But, <*> for Concurrently gives us
parallel combination of IO effects. We can launch all of the IO effects
in parallel at the same time because we know what the IO effects are
before we actually have to execute them to get the results. If we needed to know
the results, this wouldn’t be possible.
This also gives some insight into the Backwards
Applicative wrapper — because the final shape does not depend on the
result of either input, we are free to combine the shapes in whatever order
we want, in the same way that every monoid gives rise to a
“backwards” monoid:
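As a rough sketch of that analogy (these mirror Dual from Data.Monoid and Backwards from the transformers package’s Control.Applicative.Backwards; the real library definitions differ only in detail):

newtype Dual a = Dual { getDual :: a }

instance Semigroup a => Semigroup (Dual a) where
  Dual x <> Dual y = Dual (y <> x)        -- combine in the opposite order

newtype Backwards f a = Backwards { forwards :: f a }

instance Functor f => Functor (Backwards f) where
  fmap g (Backwards x) = Backwards (fmap g x)

instance Applicative f => Applicative (Backwards f) where
  pure = Backwards . pure
  Backwards f <*> Backwards x =
    Backwards (flip ($) <$> x <*> f)      -- shapes/effects combined in reverse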
The monoidal nature of Applicative with regards to shapes and effects is at the
heart of the abstraction’s original intent, and I’ve discussed this in earlier
blog posts.
Alternative
The main function of the Alternative
typeclass is <|>:
(<|>) :: Alternative f => f a -> f a -> f a
At first this might look a lot like <*> or
liftA2 (,):
liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
Both of them take two functorial values and squish them into a single
one. Both of them are also monoidal on the shape, independent of the result,
but the monoidal action for <|> is different from the one for
<*>:
-- A part of list's shape is its length:
-- the Ap monoid is (*, 1), the Alt monoid is (+, 0)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1
length (xs <|> ys) == length xs + length ys
length empty == 0

-- Maybe's shape is isJust:
-- The Ap monoid is (&&, True), the Alt monoid is (||, False)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) == True
isJust (mx <|> my) == isJust mx || isJust my
isJust empty == False
If we understand that functors have a “shape”, Applicative
implies that the shapes are monoidal, and Alternative implies that
the shapes are a “double-monoid”. The exact nature of how the two monoids relate
to each other, however, is not universally agreed upon. For many instances,
it does happen to form a semiring, where
empty “annihilates” via empty <*> x == empty,
and <*> distributes over <|> like
x <*> (y <|> z) == (x <*> y) <|> (x <*> z).
But this is not universal.
However, what does Alternative bring to our shape/result
dichotomy that Applicative did not? Notice the subtle difference
between the two:
liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
(<|>)      :: Alternative f => f a -> f a -> f a
For Applicative, the “result” comes from the results of both
inputs. For Alternative, the “result” could come from one or the
other input. So, this introduces a fundamental data dependency for the
results:
Applicative: Shapes merge monoidally independent of the results, but to get
the result of the final, you need to produce the results of both of the two
inputs in the general case.
Alternative: Shapes merge monoidally independent of the results, but to get
the result of the final, you need the results of one or the other input in the
general case.
This also implies that the choice of combination method for shapes in
Applicative vs Alternative isn’t arbitrary: the
former has to be “conjoint” in a sense, and the latter has to be “disjoint”.
See again that clearly separating the shape and the result gives us the
vocabulary to say precisely what the different data dependencies are.
Monad
Understanding shapes and results also helps us appreciate the sheer
power that Monad gives us. Look at >>=:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Using >>= means that the shape of the final action is
allowed to depend on the result of the first action! We are no longer
in the Applicative/Alternative world where shape only depends on shape.
Now we can write things like:
greet = do
  putStrLn "What is your name?"
  n <- getLine
  putStrLn ("Hello, " ++ n ++ "!")
Remember that for IO, the shape is the IO effect (in this case, what
exactly gets sent to the terminal) and the “result” is the Haskell value
computed from the execution of that IO effect. In our case, the shape
of the final action (what values are printed) depends on the result of the
intermediate actions (the getLine). You can no longer know in
advance what effect the program will have without actually running it and
getting the results.
The same thing happens when you start sequencing parser-combinator parsers:
you can’t know what counts as a valid parse or how much a parser will consume
until you actually start parsing and getting your intermediate parse
results.
Monad is also what makes guard and co. useful.
Consider the purely Applicative version:
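A minimal sketch of such a purely Applicative combination (the helper name pairs is my own):

pairs :: [a] -> [b] -> [(a, b)]
pairs xs ys = (,) <$> xs <*> ys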
If you passed in a list of 100 items and a list of 200 items, you can know
that the result has 100 * 200 = 20000 items, without actually knowing any of the
items in the list.
But, consider an alternative formulation where we are allowed to use Monad
operations:
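A sketch of such a monadic formulation (names are my own; guard comes from Control.Monad):

import Control.Monad (guard)

matchingPairs :: Eq a => [a] -> [a] -> [(a, a)]
matchingPairs xs ys = do
  x <- xs
  y <- ys
  guard (x == y)    -- whether this pair survives depends on the actual values
  pure (x, y)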
Now, even if you knew the lengths of the input lists, you cannot
know the length of the output list without actually knowing what’s
inside your lists. You need to actually start “sampling”.
That’s why there is no Monad instance for Backwards
or optparse-applicative parsers. Backwards doesn’t
work because we’ve now introduced an asymmetry (the m b depends on
the a of the m a) that can’t be reversed. For
optparse-applicative, it’s because we want to be able to inspect the
shape without knowing the results at runtime (so we can show a useful
--help without getting any actual arguments): but, with
Monad, we can’t know the shape without knowing the results!
In a way, Monad simply “is” the way to combine Functor shapes
together where the final shape is allowed to depend on the results. Hah, I
tricked you into reading a monad tutorial!
Free Structures
I definitely write way too much about
free structures on this blog. But this “shapeful” way of thinking also gives
rise to why free structures are so compelling and interesting to work with in
Haskell.
Before, we were describing shapes of Functors and Applicatives and Monads
that already existed. We had this Functor; what was
its shape?
However, what if we had a shape that we had in mind, and wanted to
create an Applicative or Monad that
manipulated that shape?
For example, let’s roll our own version of optparse-applicative that
only supported --myflag somestring options. We could say that the
“shape” is the list of supported options and their parsers. So a single element of this
shape would be the specification of a single option:
data Option a = Option
  { optionName  :: String
  , optionParse :: String -> Maybe a
  }
  deriving Functor
The “shape” here is the name and also what values it would parse,
essentially. fmap won’t affect the name of the option and won’t
affect what would succeed or fail.
We specified the shape we wanted, now we get the Applicative of
that shape for free! We can now combine our shapes monoidally using the
<*> instance, and then use runAp_ to inspect
it:
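Here is one way that might look, assuming Ap, liftAp, and runAp_ from the free package’s Control.Applicative.Free; the helper names and the specific options are my own:

import Control.Applicative.Free (Ap, liftAp, runAp_)

opt :: String -> (String -> Maybe a) -> Ap Option a
opt name p = liftAp (Option name p)

-- Combine two option specs applicatively; the "shape" is just the two specs.
settingsParser :: Ap Option (String, String)
settingsParser =
  (,) <$> opt "myflag" Just
      <*> opt "othername" Just

-- Inspect the shape without running anything: list the option names.
optionNames :: Ap Option a -> [String]
optionNames = runAp_ (\(Option name _) -> [name])
-- optionNames settingsParser == ["myflag", "othername"]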
Remember that Applicative is like a “monoid” for shapes, so
Ap gives you a free “monoid” on your custom shape: you can now
create list-like “sequences” of your shape that merge via concatenation through
<*>. You can also know that fmap on
Ap Option will not add or remove options: it’ll leave the actual
options unchanged. It’ll also not affect what options would fail or succeed to
parse.
You could write a parser combinator library this way too! Remember that
the “shape” of a parser combinator Parser is the string that it
consumes or rejects. The single element might be a parser that consumes and
rejects a single Char:
newtype Single a = Single { satisfies :: Char -> Maybe a }
  deriving Functor
The “shape” is whether or not it consumes or rejects a char. Notice that
fmap for this cannot change whether or not a char is
rejected or accepted: it can only change the Haskell result a
value. fmap can’t flip the Maybe from a
Just to a Nothing or vice versa.
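As a sketch of that step (helper names are my own; Free and liftF come from the free package’s Control.Monad.Free):

import Control.Monad.Free (Free, liftF)

type Parser = Free Single

-- Lift the single-character shape into the free monad.
satisfy :: (Char -> Maybe a) -> Parser a
satisfy = liftF . Single

char :: Char -> Parser Char
char c = satisfy (\x -> if x == c then Just x else Nothing)

-- Sequencing is now monadic: later shapes may depend on earlier results.
ab :: Parser (Char, Char)
ab = do
  x <- char 'a'
  y <- char 'b'
  pure (x, y)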
Again, we specified the shape we wanted, and now we have a Monad for that
shape! For more information on using this, I’ve written a blog post in the
past. Ap gives you a free “monoid” on your shapes, but in a way
Free gives you a “tree” for your shapes, where the sequence of
shapes depends on which way you go down their results. And, again,
fmap won’t ever change what would or would not be parsed.
How do we know what free structure to pick? Well, we ask questions about what
we want to be able to do with our shape. If we want to inspect the shape without
knowing the results, we’d use the free Applicative or free Alternative. As
discussed earlier, using the free Applicative means that our final result must
require producing all of the input results, but using the free Alternative means
it doesn’t. If we wanted to allow the shape to depend on the results (like for a
context-sensitive parser), we’d use the free Monad. Understanding the concept of
the “shape” makes this choice very intuitive.
The Shape of You
Next time you encounter a new Functor, I hope these insights can be useful.
Ask yourself, what is fmap preserving? What is fmap
changing? And from there, its secrets will unfold before you. Emmy Noether would
be proud.
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
There are some exceptions, especially degenerate cases
like Writer () aka Identity which add no meaningful
structure. So for these, this mental model isn’t that useful.↩︎
Incidentally, Set.map does preserve one
thing: non-emptiness. You can’t Set.map an empty set into a
non-empty one and vice versa. So, maybe if we recontextualized Set
as a “search for at least one result” Functor or Monad
where you could only ever observe a single value, Set.map would
work for Ord-restricted versions of those abstractions, assuming lawful
Ord instances.↩︎
That is, if we take the sum consideration of all input-output
with the outside world, independent of what happens within the Haskell results,
we can say the combination of effects is deterministic.↩︎
Gabriele Keller, professor at Utrecht University, is interviewed by Andres and Joachim. We follow her journey around the world as well as through programming languages, learn why Haskell is the best environment for embedding languages, how the desire to implement parallel programming sparked the development of type families in Haskell, and why teaching functional programming works better with graphics.
A union-find data
structure
(also known as a disjoint set data structure) keeps track of a
collection of disjoint sets, typically with elements drawn from
\(\{0, \dots, n-1\}\). For example, we might have the sets
\(\{1,3\}, \{0, 4, 2\}, \{5, 6, 7\}\).
A union-find structure must support three basic operations:
We can \(\mathit{create}\) a union-find structure with \(n\) singleton sets
\(\{0\}\) through \(\{n-1\}\). (Alternatively, we could support two
operations: creating an empty union-find structure, and adding a new
singleton set; occasionally this more fine-grained approach is
useful, but we will stick with the simpler \(\mathit{create}\) API for now.)
We can \(\mathit{find}\) a given \(x \in \{0, \dots, n-1\}\), returning some sort
of “name” for the set \(x\) is in. It doesn’t matter what these names
are; the only thing that matters is that for any \(x\) and \(y\),
\(\mathit{find}(x) = \mathit{find}(y)\) if and only if \(x\) and \(y\) are in the same
set. The most important application of \(\mathit{find}\) is therefore to check
whether two given elements are in the same set or not.
We can \(\mathit{union}\) two elements, so the sets that contain them become
one set. For example, if we \(\mathit{union}(2,5)\) then we would have
\(\{1,3\}, \{0, 4, 2, 5, 6, 7\}\).
Note that \(\mathit{union}\) is a one-way operation: once two sets have been
unioned together, there’s no way to split them apart again. (If both
merging and splitting are required, one can use a link/cut
tree, which is very cool—and possibly something I will write
about in the future—but much more complex.) However, these three
operations are enough for union-find structures to have a large number
of interesting applications!
In addition, we can annotate each set with a value taken from some
commutative semigroup. When creating a new union-find structure, we
must specify the starting value for each singleton set; when unioning
two sets, we combine their annotations via the semigroup operation.
For example, we could annotate each set with its size; singleton
sets always start out with size 1, and every time we union two sets
we add their sizes.
We could also annotate each set with the sum, product, maximum, or
minimum of all its elements.
Of course there are many more exotic examples as well.
We typically use a commutative semigroup, as in the examples above;
this guarantees that a given set always has a single well-defined
annotation value, regardless of the sequence of union-find operations
that were used to create it. However, we can actually use any binary
operation at all (i.e. any magma), in which case the annotations
on a set may reflect the precise tree of calls to \(\mathit{union}\) that were
used to construct it; this can occasionally be useful.
For example, we could annotate each set with a list of values, and
combine annotations using list concatenation; the order of elements
in the list associated to a given set will depend on the order of
arguments to \(\mathit{union}\).
We could also annotate each set with a binary tree storing values at
the leaves. Each singleton set is annotated with a single leaf; to
combine two trees we create a new branch node with the two trees as
its children. Then each set ends up annotated with the precise tree
of calls to \(\mathit{union}\) that were used to create it.
Implementing union-find
My implementation is based on one by Kwang Yul
Seo,
but I have modified it quite a bit. The code is also available in my
comprog-hs
repository. This
blog post is not intended to be a comprehensive union-find tutorial,
but I will explain some things as we go.
{-# LANGUAGE RecordWildCards #-}

module UnionFind where

import Control.Monad (when)
import Control.Monad.ST
import Data.Array.ST
Let’s start with the definition of the UnionFind type itself.
UnionFind has two type parameters: s is a phantom type parameter
used to limit the scope to a given ST computation; m is the type
of the arbitrary annotations. Note that the elements are also
sometimes called “nodes”, since, as we will see, they are organized
into trees.
type Node = Int

data UnionFind s m = UnionFind {
The basic idea is to maintain three mappings:
First, each element is mapped to a parent (another element).
There are no cycles, except that some elements can be their own
parent. This means that the elements form a forest of rooted
trees, with the self-parenting elements as roots. We
store the parent mapping as an STUArray (see here for another post where we used STUArray) for
efficiency.
parent :: !(STUArray s Node Node),
Each element is also mapped to a size. We maintain the
invariant that for any element which is a root (i.e. any element
which is its own parent), we store the size of the tree rooted at
that element. The size associated to other, non-root elements
does not matter.
(Many implementations store the height of each tree instead of
the size, but it does not make much practical difference, and the
size seems more generally useful.)
sz :: !(STUArray s Node Int),
Finally, we map each element to a custom annotation value; again,
we only care about the annotation values for root nodes.
ann :: !(STArray s Node m) }
To \(\mathit{create}\) a new union-find structure, we need a size and a
function mapping each element to an initial annotation value. Every
element starts as its own parent, with a size of 1. For convenience,
we can also make a variant of createWith that gives every element
the same constant annotation value.
createWith :: Int -> (Node -> m) -> ST s (UnionFind s m)
createWith n m = UnionFind
  <$> newListArray (0, n - 1) [0 .. n - 1]  -- Every node is its own parent
  <*> newArray (0, n - 1) 1                 -- Every node has size 1
  <*> newListArray (0, n - 1) (map m [0 .. n - 1])

create :: Int -> m -> ST s (UnionFind s m)
create n m = createWith n (const m)
To perform a \(\mathit{find}\) operation, we keep following parent
references up the tree until reaching a root. We can also do a cool
optimization known as path compression: after finding a
root, we can directly update the parent of every node along the path
we just traversed to be the root. This means \(\mathit{find}\) can be very
efficient, since it tends to create trees that are extremely wide and
shallow.
find :: UnionFind s m -> Node -> ST s Node
find uf@(UnionFind {..}) x = do
  p <- readArray parent x
  if p /= x
    then do
      r <- find uf p
      writeArray parent x r
      pure r
    else pure x

connected :: UnionFind s m -> Node -> Node -> ST s Bool
connected uf x y = (==) <$> find uf x <*> find uf y
Finally, to implement \(\mathit{union}\), we find the roots of the given nodes;
if they are not the same we make the root with the smaller tree the
child of the other root, combining sizes and annotations as
appropriate.
union :: Semigroup m => UnionFind s m -> Node -> Node -> ST s ()
union uf@(UnionFind {..}) x y = do
  x <- find uf x
  y <- find uf y
  when (x /= y) $ do
    sx <- readArray sz x
    sy <- readArray sz y
    mx <- readArray ann x
    my <- readArray ann y
    if sx < sy
      then do
        writeArray parent x y
        writeArray sz y (sx + sy)
        writeArray ann y (mx <> my)
      else do
        writeArray parent y x
        writeArray sz x (sx + sy)
        writeArray ann x (mx <> my)
Note the trick of writing x <- find uf x: this looks kind of like an
imperative statement that updates the value of a mutable variable x,
but really it just makes a new variable x which shadows the old
one.
Finally, a few utility functions. First, one to get the size of
the set containing a given node:
size :: UnionFind s m -> Node -> ST s Int
size uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray sz x
Also, we can provide functions to update and fetch the custom
annotation value associated to the set containing a given node.
updateAnn :: Semigroup m => UnionFind s m -> Node -> m -> ST s ()
updateAnn uf@(UnionFind {..}) x m = do
  x <- find uf x
  old <- readArray ann x
  writeArray ann x (old <> m)
  -- We could use modifyArray above, but the version of the standard library
  -- installed on Kattis doesn't have it

getAnn :: UnionFind s m -> Node -> ST s m
getAnn uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray ann x
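As a small usage sketch (my own example, not from the original post), we can annotate each set with its size via the Sum monoid and query it after a few unions:

import Control.Monad.ST (runST)
import Data.Monoid (Sum (..))

demo :: (Bool, Int)
demo = runST $ do
  uf <- create 8 (Sum 1)            -- eight singleton sets, each of size 1
  union uf 0 4
  union uf 4 2
  union uf 2 5
  sameSet <- connected uf 0 5       -- True: 0 and 5 are now in one set
  szAnn   <- getAnn uf 0            -- combined annotation of 0's set
  pure (sameSet, getSum szAnn)      -- (True, 4)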
Challenge
Here are a couple of problems I challenge you to solve for next time:
So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a REPL! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.
[Demo: a recording of the FiboLisp REPL in action]
That is a pretty good REPL, isn’t it? You can even try it online1, running entirely in your browser.
Dawn of a New Language
Let’s assume that we have created a new small Lisp2, just large enough to be able to conveniently write and run the Fibonacci function that returns the nth Fibonacci number. That’s it, nothing more. This lets us focus on the features of the REPL3, not the language.
We have a parser to parse the code from text to an AST, and an interpreter that evaluates an AST and returns a value. We are not going into the details of the parser and the interpreter, just listing the type signatures of the functions they provide is enough for this post.
That’s right! We named our little language FiboLisp.
FiboLisp is expression oriented; everything is an expression. So naturally, we have an Expr AST. Writing the Fibonacci function requires not many syntactic facilities. In FiboLisp we have:
integer numbers,
booleans,
variables,
addition, subtraction, and less-than binary operations on numbers,
conditional if expressions, and
function calls by name.
We also have function definitions, captured by Def, which records the function name, its parameter names, and its body as an expression.
And finally we have Programs, which are a bunch of function definitions to define, and another bunch of expressions to evaluate.
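Here is a hedged sketch of what these AST types might look like; the constructor and field names are my own guesses rather than the post’s actual definitions:

data Expr
  = Num Integer             -- integer numbers
  | Bool Bool               -- booleans
  | Var String              -- variables
  | BinOp Op Expr Expr      -- addition, subtraction, less-than
  | If Expr Expr Expr       -- conditional expressions
  | Call String [Expr]      -- function calls by name
  deriving (Show)

data Op = Add | Sub | Lt deriving (Show)

data Def = Def
  { defName   :: String
  , defParams :: [String]
  , defBody   :: Expr
  } deriving (Show)

data Program = Program [Def] [Expr] deriving (Show)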
Short and simple. We don’t need anything more4. This is how the Fibonacci function looks in FiboLisp:
(def fibo [n] (if (< n 2) n (+ (fibo (- n 1)) (fibo (- n 2)))))
We can see all the AST types in use here. Note that FiboLisp is lexically scoped.
The module also lists a bunch of keywords (carKeywords) that can appear in the car5 position of a Lisp expression, that we use later for auto-completion in the REPL, and some functions to convert the AST types to nice looking strings.
The essential function is parse, which takes the code as a string, and returns either a ParsingError on failure, or a Program on success. If the parser detects that an S-expression is not properly closed, it returns an EndOfStreamError error.
We also have this pretty-printer module that converts function ASTs back to pretty Lisp code:
We have elided the details again. All that matters to us is the interpret function that takes a program, and returns either a runtime error or a value. Value is the runtime representation of the values of FiboLisp expressions, and all we care about is that it can be shown and fully evaluated via NFData6. interpret also takes a String -> IO () function, which will be demystified when we get into implementing the REPL.
Lastly, we have a map of built-in functions and a list of built-in values. We expose them so that they can be treated specially in the REPL.
If you want, you can go ahead and fill in the missing code using your favourite parsing and pretty-printing libraries7, and the method of writing interpreters. For this post, those implementation details are not necessary.
Let’s package all this functionality into a module for ease of importing:
Now, with all the preparations done, we can go REPLing.
A REPL of Our Own
The main functionality that a REPL provides is entering expressions and definitions, one at a time, that it Reads, Evaluates, and Prints, and then Loops back, letting us do the same again. This can be accomplished with a simple program that prompts the user for an input and does all these with it. However, such a REPL will be quite lackluster.
These days programming languages come with advanced REPLs like IPython and nREPL, which provide many functionalities beyond simple REPLing. We want FiboLisp to have a great REPL too.
You may have already noticed some advanced features that our REPL provides in the demo. Let’s state them here:
Commands starting with colon:
to set and unset settings: :set and :unset,
to load files into the REPL: :load,
to show the source code of functions: :source,
to show a help message: :help.
Settings to enable/disable:
dumping of parsed ASTs: dump,
showing program execution times: time.
Multiline expressions and functions, with correct indentation.
Colored output and messages.
Auto-completion of commands, code and file names.
Safety checks when loading files.
Readline-like navigation through the history of previous inputs.
Haskeline — the Haskell library that we use to create the REPL — provides only basic functionalities, upon which we build to provide these features. Let’s begin.
Notice that we import the previously shown Language.FiboLisp module qualified as L, and Haskeline as H. Another important library that we use here is terminfo, which helps us do colored output.
A REPL must preserve the context through a session. In the case of FiboLisp, this means we should be able to define a function9 as one input, and then use it later in the session, one or many times10. The REPL should also respect the REPL settings through the session till they are unset.
Additionally, the REPL has to remember whether it is in the middle of writing a multiline input. To support multiline input, the REPL also needs to remember the previous indentation, and the input entered in previous lines of a multiline input. Together these form the ReplState:
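A sketch of ReplState, reconstructed from how it is used below (the field names match the lenses defined later; the Setting type’s Show instance and other details are assumptions):

type Defs = Map.Map String L.Def
type Settings = Set.Set Setting

data Setting = Dump | MeasureTime
  deriving (Eq, Ord, Enum, Bounded)
  -- the post presumably gives Setting a custom Show instance
  -- rendering "dump" and "time"

data LineMode = SingleLine | MultiLine

data ReplState = ReplState
  { _replDefs      :: Defs      -- functions defined so far in the session
  , _replSettings  :: Settings  -- currently enabled settings
  , _replLineMode  :: LineMode  -- single- or multi-line input mode
  , _replIndent    :: Int       -- indentation carried across multiline input
  , _replSeenInput :: String    -- input accumulated so far in multiline mode
  }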
Parsing the setting commands (the parseSettingCommand helper referenced later) is nothing fancy: we just split the input into words and go through them to make sure they are valid.
The REPL is a monad that wraps over ReplState:
newtype Repl a = Repl
  { runRepl_ :: StateT ReplState (ReaderT AddColor IO) a }
  deriving
    ( Functor
    , Applicative
    , Monad
    , MonadIO
    , MonadState ReplState
    , MonadReader AddColor
    , Catch.MonadThrow
    , Catch.MonadCatch
    , Catch.MonadMask
    )

type AddColor = Term.Color -> String -> String

runRepl :: AddColor -> Repl a -> IO a
runRepl addColor =
  fmap fst
    . flip runReaderT addColor
    . flip runStateT (ReplState Map.empty Set.empty SingleLine 0 "")
    . runRepl_
Repl also lets us do IO — is it really a REPL if you can’t do printing — and deal with exceptions. Additionally, we have a read-only state that is a function, which will be explained soon. The REPL starts in the single line mode, with no indentation, function definitions, settings, or previously seen input.
REPLing Down the Prompt
Let’s go top-down. We write the run function that is the entry point of this module:
run :: IO ()
run = do
  term <- Term.setupTermFromEnv
  let addColor = case Term.getCapability term $ Term.withForegroundColor @String of
        Just fc -> fc
        Nothing -> \_ s -> s
  runRepl addColor . H.runInputT settings $ do
    H.outputStrLn $ addColor promptColor "FiboLisp REPL"
    H.outputStrLn $ addColor infoColor "Press <TAB> to start"
    repl
  where
    settings = H.setComplete doCompletions $ H.defaultSettings {H.historyFile = Just ".fibolisp"}
This sets up Haskeline to run our REPL using the functions we provide in the later sections: repl and doCompletions. This also demystifies the read-only state of the REPL: a function that adds colors to our output strings, depending on the capabilities of the terminal in which our REPL is running in. We also set up a history file to remember the previous REPL inputs.
When the REPL starts, we output some messages in nice colors, which are defined as:
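A plausible set of definitions (the specific color choices are my guesses; Term.Color comes from the terminfo package):

promptColor, infoColor, errorColor, outputColor, printColor :: Term.Color
promptColor = Term.Cyan
infoColor   = Term.Yellow
errorColor  = Term.Red
outputColor = Term.Green
printColor  = Term.Magenta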
We infuse our Repl with the powers of Haskeline by wrapping it with Haskeline’s InputT monad transformer, and call it the Prompt type. In the repl function, we readInput, evalAndPrint it, and repl again.
We also deal with the user quitting the REPL (the EndOfInput case), and hitting Ctrl + C to interrupt typing or a running evaluation (the handling for H.Interrupt).
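A minimal sketch of the Prompt type and the loop (the actual repl also handles H.Interrupt, which is omitted here):

type Prompt = H.InputT Repl

repl :: Prompt ()
repl = do
  replLineMode .= SingleLine            -- reset the multiline bookkeeping
  replIndent .= 0
  replSeenInput .= ""
  input <- readInput
  case input of
    EndOfInput -> pure ()               -- the user quit the REPL
    _          -> evalAndPrint input >> repl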
Wait a minute! What is that imperative looking .= doing in our Haskell code? That’s right, we are looking through some lenses!
type Lens' s a = Lens.Lens s s a a

replDefs :: Lens' ReplState Defs
replDefs = $(Lens.field '_replDefs)

replSettings :: Lens' ReplState Settings
replSettings = $(Lens.field '_replSettings)

replLineMode :: Lens' ReplState LineMode
replLineMode = $(Lens.field '_replLineMode)

replIndent :: Lens' ReplState Int
replIndent = $(Lens.field '_replIndent)

replSeenInput :: Lens' ReplState String
replSeenInput = $(Lens.field '_replSeenInput)

use :: (MonadTrans t, MonadState s m) => Lens' s a -> t m a
use l = lift . State.gets $ Lens.view l

(.=) :: (MonadTrans t, MonadState s m) => Lens' s a -> a -> t m ()
l .= a = lift . State.modify' $ Lens.set l a

(%=) :: (MonadTrans t, MonadState s m) => Lens' s a -> (a -> a) -> t m ()
l %= f = lift . State.modify' $ Lens.over l f
If you’ve never encountered lenses before, you can think of them as pairs of setters and getters. The repl* lenses above are for setting and getting the corresponding fields from the ReplState data type11. The use, .=, and %= functions are for getting, setting and modifying respectively the state in the State monad using lenses. We see them in action at the beginning of the repl function when we use .= to set the various fields of ReplState to their initial values in the State monad.
All that is left now is actually reading the input, evaluating it and printing the results.
Reading the Input
Haskeline gives us functions to read the user’s input as text. However, being Haskellers, we prefer some structure around it:
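A sketch of the Input type, reconstructed from the cases evalAndPrint matches on later; the exact constructor shapes are assumptions:

data SetUnset = Set | Unset
  deriving (Eq, Enum, Bounded)
  -- presumably with a custom Show instance rendering ":set" and ":unset"

data Input
  = EndOfInput                  -- the user quit the REPL
  | BadInputError String        -- invalid command or unsafe file path
  | Help                        -- :help
  | Setting (SetUnset, Setting) -- :set / :unset <setting>
  | Source String               -- :source <func_name>
  | Load FilePath               -- :load <file>
  | Program L.Program           -- actual FiboLisp code to run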
We’ve got all previously mentioned cases covered with the Input data type. We also do some input validation and capture errors for the failure cases with the BadInputError constructor. EndOfInput is used for when the user quits the REPL.
We use the getInputLineWithInitial function provided by Haskeline to show a prompt and read the user’s input as a string. The prompt shown depends on the LineMode of the REPL state. In the SingleLine mode we show λ>, whereas in the MultiLine mode we show |>.
If there is no input, that means the user has quit the REPL. In that case we return EndOfInput, which is handled in the repl function. If the input is empty, we read more input, preserving the previous indentation (prevIndent) in the MultiLine mode.
If the input starts with :, we parse it for various commands:
The :help and :source cases are straightforward. In case of :load, we make sure to check that the file asked to be loaded is located somewhere inside the current directory of the REPL or its recursive subdirectories. Otherwise, we deny loading by returning a BadInputError. We parse the settings using the parseSettingCommand function we wrote earlier.
If the input is not a command, we parse it as code:
We append the previously seen input (in case of multiline input) with the current input and parse it using the parse function provided by the Language.FiboLisp module. If parsing fails with an
EndOfStreamError, it means that the input is incomplete. In that case, we set the REPL line mode to MultiLine, the REPL indentation to the current indentation, and the seen input to the previously seen input appended with the current input, and then read more input. If it is some other error, we return a BadInputError with it.
If the result of parsing is a program, we return it as a Program input.
That’s it for reading the user input. Next, we evaluate it.
Evaluating the Input
Recall that the repl function calls the evalAndPrint function with the read input:
evalAndPrint :: Input -> Prompt ()
evalAndPrint = \case
  EndOfInput -> return ()
  BadInputError err -> outputWithColor errorColor err
  Help -> H.outputStr helpMessage
  Setting (Set, setting) -> replSettings %= Set.insert setting
  Setting (Unset, setting) -> replSettings %= Set.delete setting
  Source ident -> showSource ident
  Load fp -> loadAndEvalFile fp
  Program program -> interpretAndPrint program
  where
    helpMessage =
      unlines
        [ "Available commands"
        , ":set/:unset dump     Dumps the program AST"
        , ":set/:unset time     Shows the program execution time"
        , ":load <file>         Loads a source file"
        , ":source <func_name>  Prints the source code of a function"
        , ":help                Shows this help"
        ]
The cases of EndOfInput, BadInputError and Help are straightforward. For settings, we insert or remove the setting from the REPL settings, depending on it being set or unset. For the other cases, we call the respective helper functions.
For a :source command, we check if the requested identifier maps to a user-defined or builtin function, and if so, print its source. Otherwise we print an error.
For a :load command, we check if the requested file exists. If so, we read and parse it, and interpret the resultant program. In case of any errors in reading or parsing the file, we catch and print them.
Finally, we come to the workhorse of the REPL: the interpretation of the user provided program:
interpretAndPrint :: L.Program -> Prompt ()
interpretAndPrint (L.Program pDefs exprs) = Catch.handleAll outputError $ do
  defs <- use replDefs
  settings <- use replSettings
  let defs' = foldl' (\ds d -> Map.insert (L.defName d) d ds) defs pDefs
      program = L.Program (Map.elems defs') exprs
  when (Dump `Set.member` settings) $
    outputWithColor infoColor (L.showProgram program)
  addColor <- getAddColor
  extPrint <- H.getExternalPrint
  (execTime, val) <- liftIO . measureElapsedTime $ do
    val <- L.interpret (extPrint . addColor printColor) program
    evaluate $ DS.force val
  case val of
    Left err -> outputError err
    Right v -> do
      let output = show v
      if null output
        then return ()
        else outputWithColor outputColor $ "=> " <> output
  when (MeasureTime `Set.member` settings) $
    outputWithColor infoColor $ "(Execution time: " <> show execTime <> ")"
  replDefs .= defs'

measureElapsedTime :: IO a -> IO (NominalDiffTime, a)
measureElapsedTime f = do
  start <- getCurrentTime
  ret <- f
  end <- getCurrentTime
  return (diffUTCTime end start, ret)
We start by combining the user-defined functions in the current input with the previously defined functions in the session, such that the current functions override previous functions with the same names. At this point, if the dump setting is set, we print the program AST.
Then we invoke the interpret function provided by the Language.FiboLisp module. Recall that the interpret function takes the program to interpret and a function of type String -> IO (). This function is a color-adding wrapper over the function returned by the Haskeline function getExternalPrint12. This function allows non-REPL code to safely print to the Haskeline driven REPL without garbling the output. We pass it to the interpret function so that the interpreter can invoke it when the user code invokes the builtin print function or similar.
We make sure to force and evaluate the value returned by the interpreter so that any lazy values or errors are fully evaluated13, and the measured elapsed time is correct.
If the interpreter returns an error, we print it. Otherwise we convert the value to a string, and if it is not empty14, we print it.
Finally, we print the execution time if the time setting is set, and set the REPL defs to the current program defs.
That’s all! We have completed our REPL. But wait, I think we forgot one thing …
Doing the Completions
The REPL would work fine with this much code, but it would not be a good experience for the user, because they’d have to type everything without any help from the REPL. To make it convenient for the user, we provide contextual auto-completion functionality while typing. Haskeline lets us plug in our custom completion logic by setting a completion function, which we did way back at the start. Now we need to implement it.
doCompletions :: H.CompletionFunc Repl
doCompletions = fmap runIdentityT . H.completeWordWithPrev Nothing " " $ \leftRev word -> do
  defs <- use replDefs
  lineMode <- use replLineMode
  settings <- use replSettings
  let funcs = nub $ Map.keys defs <> Map.keys L.builtinFuncs
      vals = map show L.builtinVals
  case (word, lineMode) of
    ('(' : rest, _) ->
      pure
        [ H.Completion ('(' : hint) hint True
        | hint <- nub . sort $ L.carKeywords <> funcs
        , rest `isPrefixOf` hint
        ]
    (_, SingleLine) -> case word of
      "" | null leftRev ->
        pure [H.Completion "" s True | s <- commands <> funcs <> vals]
      ':' : _ | null leftRev ->
        pure [H.simpleCompletion c | c <- commands, word `isPrefixOf` c]
      _
        | "tes:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
              | s <- [Dump ..], s `notElem` settings, word `isPrefixOf` show s
              ]
        | "tesnu:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
              | s <- [Dump ..], s `elem` settings, word `isPrefixOf` show s
              ]
        | "daol:" `isSuffixOf` leftRev ->
            isSafeFilePath word >>= \case
              True -> H.listFiles word
              False -> pure []
        | "ecruos:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion ident
              | ident <- funcs
              , ident `Map.notMember` L.builtinFuncs
              , word `isPrefixOf` ident
              ]
        | otherwise ->
            pure [H.simpleCompletion c | c <- funcs <> vals, word `isPrefixOf` c]
    _ -> pure []
  where
    commands = ":help" : ":load" : ":source" : map show [Set ..]
Haskeline provides us the completeWordWithPrev function to easily create our own completion function. It takes a callback function that it calls with the current word being completed (the word immediately to the left of the cursor), and the content of the line before the word (to the left of the word), reversed. We use these to return different completion lists of strings.
Going case by case:
If the word starts with (, it means we are in the middle of writing FiboLisp code. So we return the carKeywords and the user-defined and builtin function names that start with the current word sans the initial (. This happens regardless of the current line mode. The rest of the cases below apply only in the SingleLine mode.
If the entire line is empty, we return the names of all commands, functions, and builtin values.
If the word starts with :, and is at the beginning of the line, we return the commands that start with the word.
If the line starts with
:set, we return the not set settings
:unset, we return the set settings
:load, we return the names of the files and directories in the current directory
:source, we return the names of the user-defined functions
that start with the word.
Otherwise we return no completions.
This covers all cases, and provides helpful completions, while avoiding bad ones. And this completes the implementation of our wonderful REPL.
Conclusion
I wrote this REPL while implementing a Lisp that I wrote15 while going through the Essentials of Compilation book, which I thoroughly recommend for getting started with compilers. It started as a basic REPL, and gathered a lot of nice functionalities over time. So I decided to extract and share it here. I hope that this Haskeline tutorial helps you in creating beautiful and useful REPLs. Here is the complete code for the REPL.
The online demo is rather slow to load and to run, and works only on Firefox and Chrome. Even though I managed to put it together somehow, I don’t actually know how it exactly works, and I’m unable to fix the issues with it.↩︎
Lisps are awesome and I absolutely recommend creating one or more of them as an amateur PL implementer. Some resources I recommend are: the Build Your Own Lisp book, and the Make-A-Lisp tutorial.↩︎
REPLs are wonderful for doing interactive and exploratory programming where you try out small snippets of code in the REPL, and put your program together piece-by-piece. They are also good for debugging because they let you inspect the state of running programs from within. I still fondly remember the experience of connecting (or jacking in) to running productions systems written in Clojure over REPL, and figuring out issues by dumping variables.↩︎
We don’t even need let. We can, and have to, define variables by creating functions, with parameters serving the role of variables. In fact, we can’t even assign or reassign variables. Functions are the only scoping mechanism in FiboLisp, much like old-school JavaScript with its IIFEs.↩︎
You may be wondering why we need the NFData instances for the errors and values. This will become clear when we write the REPL.↩︎
I recommend the sexp-grammar library, which provides both parsing and printing facilities for S-expressions based languages. Or you can write something by yourself using the parsing and pretty-printing libraries like megaparsec and prettyprinter.↩︎
We assume that our project’s Cabal file sets the default-language to GHC2021, and the default-extensions to LambdaCase, OverloadedStrings, RecordWildCards, and StrictData.↩︎
Recall that there is no way to define variables in FiboLisp.↩︎
If the interpreter allows mutually recursive function definitions, functions can be called before defining them.↩︎
We are using the basic-lens library here, which is the tiniest lens library, and provides only the five functions and types we see used here.↩︎
Using the function returned from getExternalPrint is not necessary in our case because the REPL blocks when it invokes the interpreter. That means, nothing but the interpreter can print anything while it is running. So the interpreter can actually print directly to stdout and nothing will go wrong.
However, imagine a case in which our code starts a background thread that needs to print to the REPL. In such case, we must use the Haskeline provided print function instead of printing directly. When printing to the REPL using it, Haskeline coordinates the prints so that the output in the terminal is not garbled.↩︎
Now we see why we derive NFData instances for errors and Value.↩︎
The returned value could be of type void with no textual representation, in which case we would not print it.↩︎
I wrote the original REPL code almost three years ago. I refactored, rewrote and improved a lot of it in the course of writing this post. As they say, writing is thinking.↩︎
Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional
logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself
often taken as the foundation of mathematics, is built. When classical FOL was being established
there was a lot of study and debate around alternative options. There are a variety of philosophical
and metatheoretic reasons supporting classical FOL as The Right Choice.
This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s
eyes, and when type theory was taking its first stumbling steps.
My focus in this article is on what classical FOL looks like to a modern categorical logician.
This can be neatly summarized as “classical FOL is the internal logic of
a Boolean First-Order Hyperdoctrine”.
Each of the three terms, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggests
a distinct axis along which to vary the (class of categorical models of the) logic. All of them
have compelling categorical motivations to be varied.
Boolean
The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics
of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary
first-order hyperdoctrines would give us a form of intuitionistic first-order logic.
It is fairly rare that the categories categorists are interested in are Boolean. For example, most
toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they
are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the
topos. For example, a category of sheaves on a topological space is Boolean if and only if that space is
a Stone space. These are certainly interesting spaces,
but they are also totally disconnected unlike virtually every non-discrete topological space one
would typically mention.
First-Order
The next term is the term “first-order”. As the name suggests, a first-order hyperdoctrine has the
necessary structure to interpret first-order logic. The question, then, is what kind of categories
have this structure and only this structure. The answer, as far as I’m aware, is not many.
Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have
additional structure as well that it seems odd to ignore. The most notable and interesting example is
toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give
rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a
higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically
do support the necessary structure for first-order logic, the natural morphisms of Grothendieck
toposes, geometric morphisms, do not preserve that structure, unlike the logical functors between
elementary toposes.
The natural internal logic for Grothendieck toposes turns out to
be geometric logic. This is a logic that lacks
universal quantification and implication (and thus negation) but does have infinitary disjunction.
This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely
related logics are regular logic and coherent logic which are sub-logics of both geometric
logic and first-order logic.
We see, then, just from the examples of the natural logics of toposes, none of them are first-order
logic, and we get examples that are more powerful, less powerful, and incomparable to first-order
logic. Other common classes of categories give other natural logics, such as the cartesian logic
from left exact categories, and monoidal categories give rise to (ordered) linear logics.
We get the simply typed lambda calculus from cartesian closed categories which leads
to the next topic.
Hyperdoctrine
A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it
a poset of “predicates” on that object. In many cases, this takes the form of the Sub functor assigning
to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure
on the source category, these posets, and/or the functor itself to interpret various logical connectives.
For example, a regular hyperdoctrine requires the
source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise
to monotonic functions with left adjoints satisfying certain properties. This notion of hyperdoctrines is
suitable for regular logic.
It’s very easy to recognize that these functors are essentially indexed |(0,1)|-categories. This immediately
suggests that we should consider higher categorical versions or at the very least normal indexed categories.
What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now
have potentially multiple ways a “predicate” could “entail” another “predicate”. We can
present the simply typed lambda calculus in this indexed category manner.
This naturally leads/connects to the categorical semantics of type theories.
Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to
talk about an entailment holding for distinct but “equivalent” reasons.
Summary
Moving in all three of these directions simultaneously leads pretty naturally to something like
Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory
aimed at being an internal language for |(\infty, 1)|-toposes.
Why Classical FOL?
Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a
higher-order logic wasn’t considered at the time.
Classical versus Intuitionistic was debated at the time, but at that time it was primarily a philosophical
argument, and the defense of Intuitionism was not very compelling (to me and, evidently, to people at the time).
The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.
Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two
main approaches: Henkin-semantics and full (or standard) semantics. Henkin-semantics keeps the nice
properties of (classical) FOL but fails to get the nice properties, namely categoricity properties,
of second-order logic. This isn’t surprising as Henkin-semantics can be encoded into first-order logic.
It’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate
sorts is power sets of (cartesian products of) the domain1. This leads to massive completeness problems as our metalogical set theory has many, many
ways of building subsets of the domain. There are metatheoretic results
that state that there is no computable set of logical axioms that would give us a sound and
complete theory for second-order logic with respect to full semantics. This aspect is also philosophically
problematic, because we don’t want to need set theory to understand the very formulation of set theory.
Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.
On the more positive and (meta-)mathematical side, we have results like
Lindström’s theorem which states that
classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem
and compactness. There’s also a syntactic
result by Lindström which characterizes first-order logic as the only logic having a recursively
enumerable set of tautologies and satisfying Löwenheim-Skolem2.
The Catch
There’s one big caveat to the above. All of the above results are formulated in traditional model
theory which means there are various assumptions built in to their statements. In the language of
categorical logic, these assumptions can basically be summed up in the statement that the only
category of semantics that traditional model theory considers is Set.
This is an utterly bizarre thing to do from the standpoint of categorical logic.
The issues with full semantics follow directly from this choice. If, as categorical logic would
have us do, we considered every category with sufficient structure as a potential category of
semantics, then our theory would not be forced to follow every nook and cranny of Set’s
notion of subset to be complete. Valid formulas would need to be true not only in Set but
in wildly different categories, e.g. every (Boolean) topos.
These traditional results are also often very specific to classical FOL. Dropping this constraint
of classical logic would lead to an even broader class of models.
Categorical Perspective on Classical First-Order Logic
A Boolean category is just a coherent category
where every object has a complement. Since coherent functors
preserve complements, we have that the category of Boolean categories is a full subcategory of the
category of coherent categories.
One nice thing about, specifically, classical first-order logic from the perspective of category
theory is the following. First, coherent logic
is a sub-logic of geometric logic restricted to finitary disjunction. Via Morleyization,
we can encode classical first-order logic into coherent logic such that the
categories of models of each are equivalent. This implies that a classical FOL formula is
valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using
the tools of classifying toposes. On the one hand, this once again suggests the importance
of coherent logic, but it also means that we can use categorical tools with classical FOL.
Conclusion
There are certain things that I and, I believe, most logicians take as table stakes for a
(foundational) logic³.
For example, checking a proof should be computably decidable. For these reasons, I am in
complete accord with early (formal) logicians that classical second-order logic with full semantics
is an unacceptably worse alternative to classical first-order logic.
However, when it comes to statements about the specialness of FOL, a lot of them seem to be more
statements about traditional model theory than FOL itself, and also statements about the philosophical
predilections of the time. I feel that philosophical attitudes among logicians and mathematicians
have shifted a decent amount since the beginning of the 20th century. We have different philosophical
predilections today than then, but they are informed by another hundred years of thought, and they are
more relevant to what is being done today.
Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical
and metalogical justifications. I mention this to point out actual cases of foundational frameworks
that a (very) superficial reading of traditional model theory results would seem to have “ruled
out”. Even if one thinks that FOL+ZFC (or whatever) is the better foundation, I think it is unreasonable
to assert that MLTT derivatives are unworkable as a foundation.
It’s worth mentioning that this is exactly
what categorical logic would suggest: our syntactic power objects should be mapped to semantic power
objects.↩︎
While nice, it’s not clear that
compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling
to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for
classical FOL, but it doesn’t shut the door on considering alternatives even in the context of
traditional model theory.↩︎
I’m totally fine thinking about logics that lack these properties, but
I would never put any of them forward as an acceptable foundational logic.↩︎
At Tweag, we are constantly striving to improve the developer experience by contributing tools and utilities that streamline workflows.
We recently completed a project with IMAX, where we learned that they had developed a way to simplify and optimize the process of integrating Google Cloud Storage (GCS) with Bazel. Seeing value in this tool for the broader community, we decided to publish it together under an open source license. In this blog post, we’ll dive into the features, installation, and usage of rules_gcs, and how it provides you with access to private resources.
What is rules_gcs?
rules_gcs is a Bazel ruleset that facilitates the downloading of files from Google Cloud Storage. It is designed to be a drop-in replacement for Bazel’s http_file and http_archive rules, with features that make it particularly suited for GCS. With rules_gcs, you can efficiently fetch large amounts of data, leverage Bazel’s repository cache, and handle private GCS buckets with ease.
Key Features
Drop-in Replacement: rules_gcs provides gcs_file and gcs_archive rules that can directly replace http_file and http_archive. They take a gs://bucket_name/object_name URL and internally translate this to an HTTPS URL. This makes it easy to transition to GCS-specific rules without major changes to your existing Bazel setup.
Lazy Fetching with gcs_bucket: For projects that require downloading multiple objects from a GCS bucket, rules_gcs includes a gcs_bucket module extension. This feature allows for lazy fetching, meaning objects are only downloaded as needed, which can save time and bandwidth, especially in large-scale projects.
Private Bucket Support: Accessing private GCS buckets is seamlessly handled by rules_gcs. The ruleset supports credential management through a credential helper, ensuring secure access without the need to hardcode credentials or use gsutil for downloading.
Bazel’s Downloader Integration: rules_gcs uses Bazel’s built-in downloader and repository cache, optimizing the download process and ensuring that files are cached efficiently across builds, even across multiple Bazel workspaces on your local machine.
Small footprint: Apart from the gcloud CLI tool (for obtaining authentication tokens), rules_gcs requires no additional dependencies or Bazel modules. This minimalistic approach reduces setup complexity and potential conflicts with other tools.
Understanding Bazel Repositories and Efficient Object Fetching with rules_gcs
Before we dive into the specifics of rules_gcs, it’s important to understand some key concepts about Bazel repositories and repository rules, as well as the challenges of efficiently managing large collections of objects from a Google Cloud Storage (GCS) bucket.
Bazel Repositories and Repository Rules
In Bazel, external dependencies are managed using repositories, which are declared in your WORKSPACE or MODULE.bazel file. Each repository corresponds to a package of code, binaries, or other resources that Bazel fetches and makes available for your build. Repository rules, such as http_archive or git_repository, and module extensions define how Bazel should download and prepare these external dependencies.
However, when dealing with a large number of objects, such as files stored in a GCS bucket, using a single repository to download all objects can be highly inefficient. This is because Bazel’s repository rules typically operate in an “eager” manner—they fetch all the specified files as soon as any target of the repository is needed. For large buckets, this means downloading potentially gigabytes of data even if only a few files are actually needed for the build. This eager fetching can lead to unnecessary network usage, increased build times, and larger disk footprints.
The rules_gcs Approach: Lazy Fetching with a Hub Repository
rules_gcs addresses this inefficiency by introducing a more granular approach to downloading objects from GCS. Instead of downloading all objects at once into a single repository, rules_gcs uses a module extension that creates a “hub” repository, which then manages individual sub-repositories for each GCS object.
How It Works
Hub Repository: The hub repository acts as a central point of reference, containing metadata about the individual GCS objects. This follows the “hub-and-spoke” paradigm with a central repository (the bucket) containing references to a large number of small repositories for each object. This architecture is commonly used by Bazel module extensions to manage dependencies for different language ecosystems (including Python and Rust).
Individual Repositories per GCS Object: For each GCS object specified in the lockfile, rules_gcs creates a separate repository using the gcs_file rule. This allows Bazel to fetch each object lazily—downloading only the files that are actually needed for the current build.
Methods of Fetching: Users can choose between different methods in the gcs_bucket module extension. The default method of creating symlinks is efficient while preserving the file structure set in the lockfile. If you need to access objects as regular files, choose one of the other methods.
Symlink: Creates a symlink from the hub repo pointing to a file in its object repo, ensuring the object repo and symlink pointing to it are created only when the file is accessed.
Alias: Similar to symlink, but uses Bazel’s aliasing mechanism to reference the file. No files are created in the hub repo.
Copy: Creates a copy of a file in the hub repo when accessed.
Eager: Downloads all specified objects upfront into a single repository.
This modular approach is particularly beneficial for large-scale projects where only a subset of the data is needed for most builds. By fetching objects lazily, rules_gcs minimizes unnecessary data transfer and reduces build times.
Integrating with Bazel’s Credential Helper Protocol
Another critical aspect of rules_gcs is its seamless integration with Bazel’s credential management system. Accessing private GCS buckets securely requires proper authentication, and Bazel uses a credential helper protocol to handle this.
How Bazel’s Credential Helper Protocol Works
Bazel’s credential helper protocol is a mechanism that allows Bazel to fetch authentication credentials dynamically when accessing private resources, such as a GCS bucket. The protocol is designed to be simple and secure, ensuring that credentials are only used when necessary and are never hardcoded into build files.
When Bazel’s downloader prepares a request and a credential helper is configured, it invokes the credential helper with the get command. Additionally, the request URI is passed to the helper’s standard input, encoded as JSON.
The helper is expected to return a JSON object containing HTTP headers, including the necessary Authorization token, which Bazel will then include in its requests.
Here’s a breakdown of how the credential_helper script used in rules_gcs works:
Authentication Token Retrieval: The script uses the gcloud CLI tool to obtain an access token via gcloud auth application-default print-access-token. This token is tied to the user’s current authentication context and can be used to fetch any objects the user is allowed to access.
Output Format: The script outputs the token in a JSON format that Bazel can directly use:
{"headers":{"Authorization":["Bearer ${TOKEN}"]}}
This JSON object includes the Authorization header, which Bazel uses to authenticate its requests to the GCS bucket.
Integration with Bazel: To use this credential helper, you need to configure Bazel by specifying the helper in the .bazelrc file:
common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
This line tells Bazel to use the specified credential_helper script whenever it needs to access resources from storage.googleapis.com. If a request returns an error code or unexpected content, credentials are invalidated and the helper is invoked again.
How rules_gcs Hooks Into the Credential Helper Protocol
rules_gcs leverages this credential helper protocol to manage access to private GCS buckets securely and efficiently. By providing a pre-configured credential helper script, rules_gcs ensures that users can easily set up secure access without needing to manage tokens or authentication details manually.
Moreover, by limiting the scope of the credential helper to the GCS domain (storage.googleapis.com), rules_gcs reduces the risk of credentials being misused or accidentally exposed. The helper script is designed to be lightweight, relying on existing gcloud credentials, and integrates seamlessly into the Bazel build process.
Installing rules_gcs
Adding rules_gcs to your Bazel project is straightforward. The latest version is available on the Bazel Central Registry. To install, simply add the following to your MODULE.bazel file:
bazel_dep(name = "rules_gcs", version = "1.0.0")
You will also need to include the credential helper script in your repository:
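For instance, a minimal helper along these lines might look like the following Python sketch (the helper actually shipped with rules_gcs is equivalent in spirit but may differ in detail); save it as tools/credential-helper and mark it executable:

#!/usr/bin/env python3
# Sketch of a Bazel credential helper for GCS; assumes the gcloud CLI is installed
# and that application-default credentials have been set up.
import json
import subprocess
import sys

def main() -> None:
    # Bazel invokes the helper as `<helper> get` and writes a JSON request
    # (containing the request URI) to standard input.
    if len(sys.argv) < 2 or sys.argv[1] != "get":
        sys.exit(1)
    json.load(sys.stdin)  # the URI is not needed here: one token covers all of GCS
    token = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Respond with the headers Bazel should attach to its requests.
    json.dump({"headers": {"Authorization": ["Bearer " + token]}}, sys.stdout)

if __name__ == "__main__":
    main()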
Next, configure Bazel to use the credential helper by adding the following lines to your .bazelrc:
common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
# optional setting to make rules_gcs more efficient
common --experimental_repository_cache_hardlinks
These settings ensure that Bazel uses the credential helper specifically for GCS requests. Additionally, the setting --experimental_repository_cache_hardlinks allows Bazel to hardlink files from the repository cache instead of copying them into a repository. This saves time and storage space, but requires the repository cache to be located on the same filesystem as the output base.
Using rules_gcs in Your Project
rules_gcs provides three primary rules: gcs_bucket, gcs_file, and gcs_archive. Here’s a quick overview of how to use each:
gcs_bucket: When dealing with multiple files from a GCS bucket, the gcs_bucket module extension offers a powerful and efficient way to manage these dependencies. You define the objects in a JSON lockfile, and gcs_bucket handles the rest.
gcs_bucket = use_extension("@rules_gcs//gcs:extensions.bzl", "gcs_bucket")
gcs_bucket.from_file(
    name = "trainingdata",
    bucket = "my_org_assets",
    lockfile = "@//:gcs_lock.json",
)
gcs_file: Use this rule to download a single file from GCS. It’s particularly useful for pulling in assets or binaries needed during your build or test processes. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).
gcs_file = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")
gcs_file(
    name = "my_testdata",
    url = "gs://my_org_assets/testdata.bin",
    sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
)
gcs_archive: This rule downloads and extracts an archive from GCS, making it ideal for pulling in entire repositories or libraries that your project depends on. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).
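For example (the attribute set here is assumed to mirror gcs_file and http_archive; the bucket, object, hash, and prefix are placeholders):

gcs_archive = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_archive")
gcs_archive(
    name = "my_library",
    url = "gs://my_org_assets/libfoo-1.2.3.tar.gz",
    sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    strip_prefix = "libfoo-1.2.3",
)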
rules_gcs is a versatile and simple solution for integrating Google Cloud Storage with Bazel. We invite you to try out rules_gcs in your projects and contribute to its development. As always, we welcome feedback and look forward to seeing how this tool enhances your workflows. Check out the full example to get started!
Thanks to IMAX for sharing their initial implementation of rules_gcs and allowing us to publish the code under an open source license.
How is this not illegal??? Cards Against Humanity is PAYING people who didn't vote in 2020 to apologize, make a voting plan, and post #DonaldTrumpIsAHumanToilet—up to $100 for blue-leaning people in swing states. I helped by getting a 2024 Election Pack: checkout.giveashit.lol. Spotted via BoingBoing. More info at The Register. (Only American citizens and residents can participate. If, like me, you are an American citizen but non-resident, you will need a VPN.)
Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are kind of similar to tables in a database, which raises a natural question: could you set up a Tensor-like API to run the queries on a database that would normally be written in SQL? We have two challenges:
Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
JOINs in SQL can be thought of as performing a data-dependent outer product, which is not a very common operation in tensor computation.
However, we have a secret weapon: first class dimensions were primarily designed as a new frontend syntax that makes it easy to express einsum, batching, and tensor indexing expressions. They might be good for SQL too.
Representing the database. First, how do we represent a database? A simple model, following columnar databases, is to have every column be a distinct 1D tensor, where all columns belonging to the same table share a consistent indexing scheme. For simplicity, we'll assume that we support rich dtypes for the tensors (e.g., so I can have a tensor of strings). So if we consider our classic customer database of (id, name, email), we would represent this as:
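Something like this, in a type notation explained in a moment (the column names are placeholders):

customers_id: int64[C]
customers_name: str[C]
customers_email: str[C]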
Here C is the number of entries in the customer database. Our tensor type is written as dtype[DIM0, DIM1, ...], where each DIM is also the name I will use for the first class dimension that represents it. Let's suppose that the index into C does not coincide with id (which is good, because if they did coincide, you would have a very bad time if you ever wanted to delete an entry from the database!)
This gives us an opportunity for baby's first query; let's implement the following:
SELECT c.name, c.email FROM customers c WHERE c.id = 1000
Notice that the result of this operation is data-dependent: it may contain zero or one rows, depending on whether the id is in the database. Here is a naive implementation in standard PyTorch:
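A sketch, assuming the three column tensors from above (and remembering that we are pretending PyTorch has a string dtype):

c_mask = customers_id == 1000    # bool[C], True where the id matches
c_name = customers_name[c_mask]  # keeps only the matching rows
c_email = customers_email[c_mask]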
Here, we use boolean masking to perform the data-dependent filtering operation. This implementation in eager mode is a bit inefficient: we materialize a full boolean mask that is then fed into the subsequent operations, whereas you would prefer a compiler to fuse the masking and indexing together. First class dimensions don't really help with this example yet, and we will need to introduce some new extensions to them. First, here is what we can already do:
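Roughly, with a torchdim-style frontend (treat the exact constructor for a first class dimension as an assumption):

C = dims(1)                   # allocate a first class dimension
c_id = customers_id[C]        # {C} => int64[]
c_name = customers_name[C]    # {C} => str[]
c_email = customers_email[C]  # {C} => str[]
c_mask = c_id == 1000         # {C} => bool[]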
Here, a tensor with first class dimensions has a more complicated type {DIM0, DIM1, ...} => dtype[DIM2, DIM3, ...]. The first class dimensions are all reported in the curly braces to the left of the double arrow; curly braces are used to emphasize the fact that first class dimensions are unordered.
What next? The problem is that now we want to do something like torch.where(c_mask, c_name, ???) but we are now in a bit of trouble, because we don't want anything in the false branch of where: we want to provide something like "null" and collapse the tensor to a smaller number of elements, much like how boolean masking did it without first class dimensions. To express this, we'll introduce a binary version of torch.where that does exactly this, as well as returning the newly allocated FCD for the new, data-dependent dimension:
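In hypothetical code (no such two-argument torch.where exists today):

C2, d_name = torch.where(c_mask, c_name)     # {C2} => str[]; C2 is a fresh, data-dependent FCD
C2_, d_email = torch.where(c_mask, c_email)  # memoized on c_mask, so C2_ is the same dimension as C2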
Notice that torch.where introduces a new first-class dimension. I've chosen that this FCD gets memoized with c_mask, so whenever we do more torch.where invocations we still get consistently the same new FCD.
Having to type out all the columns can be a bit tiresome. If we assume all elements in a table have the same dtype (let's call it dyn, short for dynamic type), we can more compactly represent the table as a 2D tensor, where the first dimension is the indexing as before, and the second dimension is the columns of the database. For clarity, we'll support using the string name of the column as a shorthand for the numeric index of the column. If the tensor is contiguous, this gives a more traditional row-wise database. The new database can be conveniently manipulated with FCDs, as we can handle all of the columns at once instead of typing them out individually:
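Redoing the running query in this representation (still hypothetical code):

C = dims(1)
c = customers[C]                 # {C} => dyn[3], one row of the table
c_mask = c["id"] == 1000         # {C} => bool[]
C2, c2 = torch.where(c_mask, c)  # {C2} => dyn[3], just the matching rows
result = (c2["name"], c2["email"])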
We'll use this for the rest of the post, but the examples should be interconvertible.
Aggregation. What's the average age of all customers, grouped by the country they live in?
SELECT AVG(c.age) FROM customers c GROUP BY c.country;
PyTorch doesn't natively support this grouping operation, but essentially what is desired here is a conversion into a nested tensor grouped by country, where the jagged dimension ranges over the customers in each country (each country will have a varying number of customers). Let's hallucinate a torch.groupby analogous to its Pandas equivalent:
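One way the hallucinated API might look:

G, JC, grouped = torch.groupby(customers, "country")  # G indexes the countries; JC is the jagged per-country dimension
avg_age = grouped["age"].mean(JC)                     # {G} => dyn[], one average per country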
Here, I gave the generic indexing dimension the name JC, to emphasize that it is a jagged dimension. But everything proceeds like we expect: after we've grouped the tensor and rebound its first class dimensions, we can take the field of interest and explicitly specify a reduction on the dimension we care about.
In SQL, aggregations have to operate over the entirety of groups specified by GROUP BY. However, because FCDs explicitly specify what dimensions we are reducing over, we can potentially decompose a reduction into a series of successive reductions on different columns, without having to specify subqueries to progressively perform the reductions we are interested in.
Joins. Given an order table, join it with the customer referenced by the customer id:
SELECT o.id, c.name, c.email FROM orders o JOIN customers c ON o.customer_id = c.id
First class dimensions are great at doing outer products (although, like with filtering, a naive implementation will expensively materialize the entire outer product!):
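A sketch of the join, reusing the hypothetical two-argument torch.where:

C, O = dims(2)
c = customers[C]                      # {C} => dyn[...]
o = orders[O]                         # {O} => dyn[...]
j_mask = o["customer_id"] == c["id"]  # {C, O} => bool[], the full outer product of comparisons
J, j_o = torch.where(j_mask, o)       # J collapses to just the matching (C, O) pairs
_, j_c = torch.where(j_mask, c)       # memoized on j_mask, so the same J
result = (j_o["id"], j_c["name"], j_c["email"])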
What's the point. There are a few reasons why we might be interested in the correspondence here. First, we might be interested in applying SQL ideas to the Tensor world: a lot of things people want to do in preprocessing are similar to what you do in traditional relational databases, and SQL can teach us what optimizations and what use cases we should think about. Second, we might be interested in applying Tensor ideas to the SQL world: in particular, I think first class dimensions are a really intuitive frontend for SQL which can be implemented entirely embedded in Python without necessitating the creation of a dedicated DSL. Also, this might be the push needed to get TensorDict into core.
tl;dr: a fix to the MonadRandom package may cause fromListMay
and related functions to extremely rarely output different results than
they used to. This could only possibly affect anyone who is using
fixed seed(s) to generate random values and is depending on the
specific values being produced, e.g. a unit test where you use a
specific seed and test that you get a specific result. Do you think
this should be a major or minor version bump?
The Fix
Since 2013 I have been the maintainer of
MonadRandom,
which defines a monad and monad transformer for generating random
values, along with a number of related utilities.
Recently, Toni Dietze pointed out a rare
situation that could cause the fromListMay function to
crash (as well as
the other functions which depend on it: fromList, weighted,
weightedMay, uniform, and uniformMay). This function is
supposed to draw a weighted random sample from a list of values
decorated with weights. I’m not going to explain the details of the
issue here; suffice it to say that it has to do with conversions
between Rational (the type of the weights) and Double (the type
that was being used internally for generating random numbers).
Even though this could only happen in rare and/or strange
circumstances, fixing it definitely seemed like the right thing to
do. After a bit of discussion, Toni came up with a good suggestion
for a fix: we should no longer use Double internally for generating
random numbers, but rather Word64, which avoids conversion and
rounding issues.
In fact, Word64 is already used internally in the generation of
random Double values, so we can emulate the behavior of the Double
instance (which was slightly
tricky
to figure out)
so that we make exactly the same random choices as before, but without
actually converting to Double.
The Change
…well, not exactly the same random choices as before, and therein
lies the rub! If fromListMay happens to pick a random value which
is extremely close to a boundary between choices, it’s possible that
the value will fall on one side of the boundary when using exact
calculations with Word64 and Rational, whereas before it would
have fallen on the other side of the boundary after converting to
Double due to rounding. In other words, it will output the
same results almost all the time, but for a list of \(n\) weighted
choices there is something like an \(n/2^{64}\) chance (or less) that
any given random choice will be different from what it used to be. I
have never observed this happening in my tests, and indeed, I do not
expect to ever observe it! If we generated one billion random samples
per second continuously for a thousand years, we might expect to see
it happen once or twice. I am not even sure how to engineer a test
scenario to force it to happen, because we would have to pick an
initial PRNG seed that forces a certain Word64 value to be
generated.
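As a back-of-the-envelope check: one billion samples per second for a thousand years is about \(10^9 \times 3.15 \times 10^{7} \times 10^3 \approx 3.2 \times 10^{19}\) samples, while \(2^{64} \approx 1.8 \times 10^{19}\); so, taking the per-sample probability to be roughly \(1/2^{64}\), we would expect about \(3.2 / 1.8 \approx 1.7\) occurrences.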
To PVP or not to PVP?
Technically, a function exported by MonadRandom has changed
behavior, so according to the Haskell PVP
specification this should be a major
version bump (i.e. 0.6 to 0.7). (Actually, I am not even 100% clear on this. The decision
tree on the PVP page says that changing the behavior of an exported function necessitates a
major version bump; but the actual specification does not refer to behavior at all—as I read
it, it is exclusively concerned with API compatibility, i.e. whether things will still compile.)
But there seem to be some good arguments for doing just a minor version bump (i.e. 0.6 to 0.6.1):
What exactly constitutes the “behavior” of a function to
generate random values? It depends on your point of view. If
we view the function as a pure mathematical function which
takes a PRNG state as input and produces some value as
output, then its behavior is defined precisely by which outputs
it returns for which input seeds, and its behavior has changed.
However, if we think of it in more effectful terms, we could say
its “behavior” is just to output random values according to a
certain distribution, in which case its behavior has not
changed.
It’s extremely unlikely that this change will cause any
breakage; moreover, as argued by Boyd Stephen Smith, anyone who cares enough about
reproducibility to be relying on specific outputs for specific
seeds is probably already pinning all their package versions.
Arguments in favor of a major version bump:
It’s what the PVP specifies; what’s the point of having a
specification if we don’t follow it?
In the unlikely event that this change does cause any
breakage, it could be extremely difficult for package
maintainers to track down. If the behavior of a random
generation function completely changes, the source of the issue
is obvious. But if it only changes for very rare inputs, you
might reasonably think the problem is something else. A major
version bump will force maintainers to read the changelog for
MonadRandom and assess whether this is a change that could
possibly affect them.
So, do you have opinions on this? Would the release affect you one
way or the other? Feel free to leave a comment here, or send me an
email with your thoughts. Note there has already been a bit of discussion on
Mastodon as well.
The act of trading means that both sides give up one good for something they value more. When I go to the supermarket, I’m giving the supermarket dollars (or euros, or shekels) in exchange for food. I value the food more than the money. The supermarket values the money more than the food. Everyone walks away happy with a successful trade.
But we don’t normally talk about going to the supermarket and trading for food. We generally say we’re buying food. Buying is simply a trade where you give money. Similarly, the supermarket is selling food, where selling is a trade where you receive money.
Now let’s say I’m going on a trip to Europe and need some cash. I have US dollars, and I need Euros. Both of those are money. So am I buying Euros, or am I selling dollars? We generally use the term exchange in that case.
You may notice, all of these acts are really identical to trading, it’s just a matter of nomenclature. The terms we use represent how we view the assets at play.
Which brings me to the point of this post: buying Bitcoin.
Buying Bitcoin
I come from a fairly traditional, if very conservative, financial background. I was raised in a house that believes putting money in the stock market is essentially reckless gambling, and then my university education included a lot of economics and finance courses, which gave me a broader view. I’m still fairly conservative in my investments, and was very crypto-wary for a while. I care more about long-term security, not short term gains. Investing in Bitcoin seemed foolish.
At some point in the past 5 years, I changed my opinion on this slightly. I began to see Bitcoin as a prudent hedge against risks in other asset classes. From that world view, I began to buy Bitcoin. Dollars are the real money, and Bitcoin is the risk asset that I’m speculatively investing in and hoping for a return. Meaning: I ultimately intend to sell that Bitcoin for more dollars than I spent to get it. Much like I would treat stock.
As those 5 years have trudged along, I’ve become more confident in Bitcoin, and simultaneously less confident in fiat currency. Like many others, the rampant money printing and high levels of inflation have me worried about staking my future on fiat currencies. Investing in stocks would be the traditional inflation protection hedge, but I’m coming around more to a Bitcoin maxi-style belief that fixed total supply is the most important feature of anything we use for long term storage.
All of this led to the question that kicked off this blog post:
Am I buying Bitcoin, or selling dollars?
Remember that buying and selling are both the same thing as trading. There’s no difference between the act of buying Bitcoin with dollars, or selling dollars for Bitcoin. It’s just a difference in what you view as the real money. Most people in the world would consider the dollar to be the real money in the equation.
I have some background in Talmudic study, and one of the common phrases we use in studying Talmud is מאי נפקא מינה, pronounced “my nafka meena,” or “what is the practical difference between these two?” There’s no point having a pure debate about terminology. Is there any practical difference in how I relate to the world whether I’m buying Bitcoin or selling dollars? And after some thinking, I realized what it is.
Entering a trade
Forget Bitcoin entirely. I wake up one morning, and go to my brokerage account. I’ve got $50,000 in cash sitting there, waiting to be invested. Let’s say that represents half of my net worth. I start looking at the charts, doing some research, and I strongly believe that a company’s stock is undervalued and is about to go up significantly. What do I do?
Well, most likely I’m going to buy some of that stock. Am I going to put in the entire $50k? Probably not, I’m very risk averse, and I like to hedge my risks. Investing half my net worth in one stock, based on the price on one day, is too dangerous for my taste. (Others invest differently, and there’s certainly value in being more aggressive, just sharing my own views.) Buying the stock is called entering a trade.
Similarly, if two weeks later, that stock has gone up 20%, I’m sitting on a bunch of profits, and I hear some news that may negatively impact that stock, I may decide to sell the stock or exit the trade.
But let’s change things a bit. Let’s say I’m not that confident the stock will go up at the beginning of this story. Am I going to buy in? Probably not. For those familiar, this may sound like status-quo bias: the bias to stick to whatever we’re currently doing barring additional information. But I think there’s something more subtle going on here as well.
Let’s say I did buy the stock, it did go up 20%, and now I’m nervous it’s about to tank. I’m not confident at all, just a hunch. Depending on the strength of that hunch, I’m going to sell. My overall confidence threshold for buying in is much higher than selling out. And the reason for this is simple: risk. Overall, I view the dollar as the stable asset, and the stock as the risk asset.
By selling early, I risk losing out on further potential gains. Economically, that’s equivalent to losing money when you view things as opportunity costs. But the risks of losing value, to someone fiscally conservative and risk averse like me, outweigh the potential gains.
The price of Bitcoin, the price of the dollar
Alright, back to Bitcoin. My practical difference absolutely applies here. Let’s say (for simplicity of numbers) that the current price of Bitcoin is $50,000. I’m sitting on 1 BTC and $50,000 cash. I have three options:
Trade my dollars to get more Bitcoin
Do nothing
Trade my Bitcoin to get more dollars
But there’s a problem with this framing. By quoting the price of Bitcoin in dollars, I’ve already injected a bias into the analysis. I’m implicitly viewing dollars as money, and Bitcoin as the risk asset. We can equivalently view the current price as 0.00002 BTC per dollar. And, since playing with numbers like that is painful, we can talk about uBTC (micro-BTC, or a millionth of a Bitcoin) instead, and say the current price of a dollar is 20 uBTC.
(Side note: personally, I think the unit ksat, or thousand satoshis, or a one-hundred-thousandth of a Bitcoin, is a good unit for discussing prices, but I’ve never seen anyone else use it, so I’ll stick to uBTC.)
Anyway, let’s come back to the case in point. We have two different world views, and three different cases for each world view:
Bitcoin is priced at $50,000
I think the price will go up, so I should buy Bitcoin
I think the price will go down, so I should sell Bitcoin
I don’t know the direction the price will take
The dollar is priced at 20 uBTC
I think the price will go up, so I should buy dollars
I think the price will go down, so I should sell dollars
I don’t know the direction the price will take
You may notice that cases 1a and 2b are equivalent: the price of Bitcoin going up is the same as the price of the dollar going down. The same with cases 1b and 2a. And more obviously, cases 1c and 2c are the same: in both cases, I don’t know where I think the prices will go.
Risk-averse defaults
This is where risk aversion should come into play. Put simply: what is the least risky asset to hold? In our stock case, it was clearly the dollar. And if you asked me 5 years ago, I absolutely would have said holding onto dollars is far less risky than holding onto Bitcoin.
And this is where I think I begin down the path of the Bitcoin Maxi. I started seriously considering Bitcoin as an investment due to rampant money printing and inflation. It started as a simple hedge, throwing in yet another risky asset with others. But I’ve realized my viewpoint on the matter is changing over time. As many others have put it before me, fiat currency goes to 0 over time as more printing occurs. It’s not a question of “will the dollar lose value,” there’s a guarantee that the dollar will lose value over time, unless monetary policy is significantly altered. And there’s no reason to believe it will be.
I understand and completely respect the viewpoint that Bitcoin is imaginary internet money with no inherent value. I personally disagree, at least today, though it was my dominant view 5 years ago. Assuming sufficient people continue to believe Bitcoin is more than a ponzi scheme and is instead a scarce asset providing a true store of value with no long-term devaluation through money printing, Bitcoin will continue to go up, not down, over time.
In other words, as I stared at this argument, I came to a clear conclusion: my worldview is that the risk-averse asset to hold these days is Bitcoin, not dollars. But this bothered me even more.
Tzvei dinim
OK, I’m a full-on Bitcoin Maxi. I should liquidate all my existing investments and convert them to Bitcoin. Every time I get a paycheck, I should convert the full value into Bitcoin. I’ll never touch a dollar again. Right?
Well, no. Using my framework above, there’s no reason to avoid investing in stocks, fiat, metals, or anything else that you believe will go up in value. It’s a question of the safe default. But even so, I haven’t gone ahead with taking every dollar I have and buying up Bitcoin with it. I still leave my paycheck in dollars and only buy up some Bitcoin when I have a sufficient balance. This felt like cognitive dissonance to me, and I needed to figure out why I was behaving inconsistently!
And fortunately another Talmudic study philosophy came into play. Tzvei dinim is a Yiddish phrase that means “two laws,” and it indicates that two cases have different outcomes because the situations are different. And for me, the answer is that money (and investments in general) have two radically different purposes:
Short-term usage for living. This includes paying rent, buying groceries, and a rainy day fund. Depending on how risk-averse you are, that rainy day fund could be to cover 1 month of expenses while you look for another job, or years of savings in case your entire industry is destroyed by AI.
Long-term store of value.
What’s great about this breakdown is that I’ve lived my entire adult life knowing it, and I bet many of you have too! We’ve all heard phrases around the stock market like “don’t invest more than you can afford to lose.” The point of this is that the price of stocks can fluctuate significantly, and you don’t want to be forced to sell at a low point to cover grocery bills. Keep enough funds for short-term usage, and only invest what you have for long-term store of value.
This significantly assuaged my feelings of cognitive dissonance. And it allows me to answer my question above pretty well about whether I’d buy/sell Bitcoin or dollars:
Keep enough money in dollars to cover expected expenses in the near term
Invest money speculatively based on strong beliefs about where asset prices are heading
And beyond that, keep the rest of the money in Bitcoin, not dollars. Over time, the dollar will decrease in value, and Bitcoin will increase in value. I’d rather have my default exposure be to the asset that’s going up, not down.
Conclusion
Thanks for going on this journey with me. The point here isn’t to evangelize anything in particular. As I said, I understand and respect the hesitancy to buy into a new asset class. I’ve been working in the blockchain field for close to a decade now, and I've only recently come around to this way of thinking. And it’s entirely possible that I’m completely wrong, Bitcoin will turn out to be a complete scam asset and go to 0, and I’ll bemoan my stupid view of the world I’m sharing in this post. If so, please don’t point and laugh when you see me.
My point in this post is primarily to solidify my own viewpoint for myself. And since I do that best by writing up a blog post as a form of rubber ducking, I decided to do so. As I’m writing this, I still don’t know if I’ll even publish it!
And if I did end up publishing this and you’re reading it now, here’s my secondary point: helping others gain a new perspective. I think it’s always valuable to challenge your assumptions. If you’ve been looking at “cryptobros” as crazy investors hoping to make 10,000% returns on a GIF, I’m hoping this post gives you a different perspective of viewing Bitcoin as a better store of value than traditional assets. Feel free to disagree with me! But I hope you at least give the ideas some time to percolate.
Appendix 1: risk aversion
I’m sure plenty of people will read this and think I’m lying to myself. I claim to be risk averse, but I’m gambling on a new and relatively untested asset class. Putting money into the stock market is a far more well-established mechanism for providing inflation protection, and investing in indices like the S&P 500 provides good hedging of risks. So why would I buy into Bitcoin instead?
This is another contradiction that can be resolved by the tzvei dinim approach. You can evaluate risk either based on empirical data (meaning past performance), or by looking at fundamental principles and mechanisms. The stock market is demonstrably a good performer by empirical standards, delivering reliable returns.
Some people might try to claim that Bitcoin has the same track record: it’s gone up in value stupendously during its existence. I don’t actually believe that at all. Yes, Bitcoin has appreciated a lot, but the short time frame means I don’t really care about its track record, definitely not as much as I do the stock market’s.
Instead, when I look at Bitcoin, I’m more persuaded by the mechanism, which simply put is fixed supply. There will never be more than 21,000,000 BTC. If there was a hard fork of the network that started increasing that supply, I’d lose faith in Bitcoin completely and likely sell out of it. I’m a believer in the mechanism of a deflationary currency. And there is no better asset I can think of for fixed supply than Bitcoin. (Though gold comes very close… if people are interested, I may follow up later with a Bitcoin vs gold blog post.)
By contrast, the underlying mechanism for the stock market going up over time is less clear. Some of that is inherent by dint of money printing: more money being printed will flow into stocks, because that’s where people park their newly printed money. My main concern with the stock market is that most people aren’t following any fundamental valuation technique, and are instead treating it as a Ponzi scheme. Said differently, I want to analyze the value of a stock based on my expected future revenues from dividends (or some equivalent objective measure). Instead, stocks are mostly traded based on how much you think someone else will value it in the future.
My views on the stock market are somewhat extreme and colored by the extremely risk-averse viewpoint I received growing up. Others will likely disagree completely that the stock market is pure speculation. And they’d also probably laugh at the idea that Bitcoin has more inherent value than the way stocks are traded. It’s still my stance.
Appendix 2: cryptobros
I mentioned cryptobros above, and made a reference to NFTs. Before getting deeper into the space, I had–like many others–believed “Bitcoin” and “crypto” were more or less synonymous. True believers in Bitcoin, and I’m slowly coming to admit that I’m one of them, disagree completely. Bitcoin is a new monetary system based on fixed supply, no centralized control, censorship resistance, and pseudo-anonymity. Crypto in many of its forms is little more than get-rich-quick schemes.
I don’t believe that’s true across the board for all crypto assets. I do believe that was true for much of the NFT hype and for meme coins. Ethereum to me has intrinsic value, because the ability to have your financial transaction logged on the most secure blockchain in the world is valuable in its own right.
So just keep in mind, crypto does not necessarily mean the same thing as Bitcoin.
Appendix 3: drei dinim
I mentioned “tzvei dinim” above, meaning “two laws.” I want to introduce a drei dinim, meaning three laws. (And if I mistransliterated Yiddish, my apologies, I don’t actually speak the language at all.) I described short-term vs long-term above. In reality, I think there are really three different ideas at play:
Short-term money holding for expenses
Long-term store of value
Speculative investments because you think an asset will outperform the safe asset
My view is that, due to the inflationary nature of fiat currency, groups (2) and (3) have been unfairly lumped together for most people. Want to store value for the next 30 years? Don’t keep it in dollars, you better buy stocks! I don’t like that view of the world. The skill of choosing what to invest in is not universal, it requires work, and many people lose their shirts trying to buy into the right stock. (Side note, that’s why many people recommended investing in indices, specifically to avoid those kinds of concerns.)
I want a world where there’s an asset that retains its value over time, regardless of inflation and money printing. Bitcoin is designed to do just that. But if you really think a stock is going to go up 75% in a week, category (3) still gives plenty of room to do speculative investment, without violating the rest of the cognitive framework I’ve described.
Appendix 4: why specifically Bitcoin?
The arguments I’ve given above just argue for a currency that has a fixed maximum supply. You could argue decentralization is a necessary feature too, since it’s what guarantees the supply won’t be changed. So why is Bitcoin in particular the thing we go with? To go to the absurd, why doesn’t each person on the planet make their own coin (e.g. my Snoycoin) and use that as currency?
This isn’t just a theoretical idea. One of the strongest (IMO) arguments against Bitcoin is exactly this: anyone can create a new one, so the fixed supply is really just a lie. There’s an infinite supply of made-up internet money, even if each individual token may have a fixed supply.
To me, this comes down to the question of competition, as does virtually everything else in economics. Bitcoin is a direct competitor to the dollar. The dollar has strengths over Bitcoin: institutional support, clear regulatory framework, requirement for US citizens to pay taxes with dollars, requirement of US business to accept dollars for payment. Bitcoin is competing with the strengths I’ve described above.
I believe that, ultimately, the advantages of Bitcoin will continue to erode the strength of the dollar. That’s why I’m buying into it, literally and figuratively.
However, new coins don’t have the same competitive power. If I make Snoycoin, it’s worse in every way imaginable to Bitcoin. It simply won’t take off. And it shouldn’t, despite all the money I’d make from it.
There is an argument to be made that Ethereum is a better currency than Bitcoin, since it allows for execution of more complex smart contracts. I personally don’t see Ethereum (or other digital assets) dethroning Bitcoin as king of the hill any time soon.
Tracing back where it came from, that title was sent already broken by
Planet Haskell, which is itself a
feed aggregator for blogs.
The blog itself produces a good, unbroken title.
Therefore the blame lies with Planet Haskell.
It’s probably a misconfigured locale. Maybe someone will fix it.
It seems to be running archaic software on an old machine,
stuff I wouldn’t deal with myself so I won’t ask someone else to.
In any case, this mistake can be fixed after the fact. Mis-encoded text is such
a ubiquitous issue that there are nicely packaged solutions out there, like
ftfy.
ftfy has been used as a data processing step in major NLP research, including
OpenAI’s original GPT.
But my hobby site is written in OCaml and I would rather have fun solving
this encoding problem than figure out how to install a Python program and call
it from OCaml.
Explaining the problem
This is the typical situation where a program is assuming the wrong text encoding.
Text encodings
A quick summary for those who don’t know about text encodings.
Humans read and write sequences of characters,
while computers talk to each other using sequences of bytes.
If Alice writes a blog, and Bob wants to read it from across the world,
the characters that Alice writes must be encoded into bytes so
her computer can send it over the internet to Bob’s computer,
and Bob’s computer must decode those bytes to display them on his screen.
The mapping between sequences of characters and sequences of bytes
is called an encoding.
Multiple encodings are possible, but it’s not always obvious which encoding
to use to decode a given byte string.
There are good and bad reasons for this, but the net effect is that
many text-processing programs arbitrarily guess and assume the encoding in use,
and sometimes they assume wrong.
Back to the problem
UTF-8 is the most prevalent encoding nowadays.¹ I’d be surprised if one of the
Planet Haskell blogs doesn’t use it, which is ironic considering the issue
we’re dealing with.
A blog using UTF-8 encodes the right single quote² “’” as three consecutive bytes
(226, 128, 153) in its RSS or Atom feed.
The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding
different from UTF-8 where each byte corresponds to one character.
It did some transformation to the decoded text
(extract the title and body and put it on a webpage with other blogs).
It encoded the final result in UTF-8.
The final encoding doesn’t really matter, as long as everyone else downstream
agrees with it.
The point is that Planet Haskell outputs the three characters “â€™” in place of the
right single quote “’”, all because UTF-8 represents “’” with three bytes.
In spite of their differences, most encodings in practice agree at least about
ASCII characters, in the range 0-127, which is sufficient to contain the majority of
English language writing if you can compromise on details such as confusing the
apostrophe and the single quotes. That’s why in the title “What’s different
this time?” everything but one character was transferred fine.
Solving the problem
The fix is simple: replace “â€™” with “’”. Of course, we also want to do that
with all other characters that are mis-encoded the same way:
those are exactly all the non-ASCII Unicode characters.
The more general fix is to invert Planet Haskell’s decoding logic.
Thank the world that this mistake can be reversed to begin with.
If information had been lost by mis-encoding, I may have been forced to use one
of those dreadful LLMs to reconstruct titles.³
1. Decode Planet Haskell’s output in UTF-8.
2. Encode each character as a byte to recover the original output from the blog.
3. Decode the original output correctly, in UTF-8.
There is one missing detail: what encoding to use in step 2?
I first tried the naive thing: each character is canonically a Unicode code point,
which is a number between 0 and 1114111, and I just hoped that those which did
occur would fit in the range 0-255.
That amounts to making the hypothesis that Planet Haskell is decoding blog
posts in Latin-1.
That seems likely enough, but you will have guessed correctly that the
naive thing did not reconstruct the right single quote in this case.
The Latin-1 hypothesis was proven false.
As it turns out, the euro sign “€” and the trademark symbol “™” are not
in the Latin-1 alphabet. They are code points 8364 and 8482 in Unicode,
which are not in the range 0-255.
Planet Haskell has to be using an encoding that features these two symbols.
I needed to find which one.
Faffing about, I came across the Wikipedia article on Western Latin character
sets which lists a comparison table. How convenient.
I looked up the two symbols to find what encoding had them, if any.
There were two candidates: Windows-1252 and Macintosh. Flip a coin.
It was Windows-1252.
Windows-1252 differs from Latin-1 (and thus Unicode) precisely in the positions
whose byte starts with 8 or 9 in hexadecimal
(27 assigned characters + 5 unused positions):
that’s 27 characters that I had to map manually to the range 0-255
according to the Windows-1252 encoding,
and the remaining characters would be mapped for free by Unicode.
This data entry task was autocompleted halfway through by Copilot,
because of course GPT-* knows Windows-1252 by heart.
let windows1252_hack (c : Uchar.t) : int =
  let c = Uchar.to_int c in
  if c = 0x20AC then 0x80
  else if c = 0x201A then 0x82
  else if c = 0x0192 then 0x83
  else if c = 0x201E then 0x84
  else if c = 0x2026 then 0x85
  else if c = 0x2020 then 0x86
  else if c = 0x2021 then 0x87
  else if c = 0x02C6 then 0x88
  else if c = 0x2030 then 0x89
  else if c = 0x0160 then 0x8A
  else if c = 0x2039 then 0x8B
  else if c = 0x0152 then 0x8C
  else if c = 0x017D then 0x8E
  else if c = 0x2018 then 0x91
  else if c = 0x2019 then 0x92
  else if c = 0x201C then 0x93
  else if c = 0x201D then 0x94
  else if c = 0x2022 then 0x95
  else if c = 0x2013 then 0x96
  else if c = 0x2014 then 0x97
  else if c = 0x02DC then 0x98
  else if c = 0x2122 then 0x99
  else if c = 0x0161 then 0x9A
  else if c = 0x203A then 0x9B
  else if c = 0x0153 then 0x9C
  else if c = 0x017E then 0x9E
  else if c = 0x0178 then 0x9F
  else c
And that’s how I restored the quotes, apostrophes,
guillemets, accents, et autres in my feed.
I have 4 children aged 4, 3, almost 2, and 19 weeks. Parents are increasingly isolated from each other socially so it's harder to compare tactics and strategies for caregiving. I want to share a run-down of how my wife and I care for our children and what has seemed to work and what has not.
In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures.
The problem it was aimed at solving was turning the amortized time complexity bounds of
batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of
updates to a data structure which may cause the performance of operations to degrade, but
occasionally we expensively rebuild the data structure back into an optimal arrangement.
If the updates don’t degrade performance too much before we rebuild, then we can achieve
our target time complexity bounds in an amortized sense. An update that doesn’t degrade
performance too much is called a weak update.
Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a
binary search tree where deletions occur by simply marking the deleted nodes as deleted.
Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced
binary search tree and clean out the nodes marked as deleted at that time. In this case,
the deletions count as weak updates because leaving the deleted nodes in the tree even
when it corresponds to up to half the tree can only mildly impact the time complexity of
other operations. Specifically, assuming the tree was balanced at the start, then deleting
half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts
are not weak updates as they can quickly increase the tree’s depth.
The idea of global rebuilding is relatively straightforward, though how you would actually
realize it in any particular example is not. The overall idea is simply that instead of
waiting until the last moment and then rebuilding the data structure all at once, we’ll start
the rebuild sooner and work at it incrementally as we perform other operations. If we
update the new version faster than we update the original version, we’ll finish it by the
time we would have wanted to perform a batched rebuild, and we can just switch to this new version.
More concretely, though still quite vaguely, global rebuilding involves, when a
threshold is reached, rebuilding by creating a new “empty” version of the data
structure called the shadow copy. The original version is the working copy. Work on
rebuilding happens incrementally as operations are performed on the data structure. During
this period, we service queries from the working copy and continue to update it as usual.
Each update needs to make more progress on building the shadow copy than it worsens the
working copy. For example, an insert should insert more nodes into the shadow copy than
the working copy. Once the shadow copy is built, we may still have more work to do to
incorporate changes that occurred after we started the rebuild. To this end, we can
maintain a queue of update operations performed on the working copy since the start of
a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again,
we need to apply the updates from the queue at a fast enough rate so that we will
eventually catch up. Of course, all of this needs to happen fast enough so that 1)
the working copy doesn’t get too degraded before the shadow copy is ready, and 2)
we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.
Coroutines
Okasaki mentions in passing that global rebuilding “can be usefully viewed as running
the rebuilding transformation as a coroutine”. Also, the situation described above is
quite reminiscent of garbage collection. There the classic half-space stop-the-world
copying collector is naturally the batched rebuilding version. More incremental versions
often have read or write barriers and break the garbage collection into incremental
steps. Garbage collection is also often viewed as two processes coroutining.
The goal of this article is to derive global rebuilding-based data structures from
an expression of them as two coroutining processes. Ideally, we should be able to
take a data structure implemented via batched rebuilding and simply run the batch
rebuilding step as a coroutine. Modifying the data structure’s operations and the
rebuilding step should, in theory, just be a matter of inserting appropriate yield
statements. Of course, it won’t be that easy since the batched version of rebuilding
doesn’t need to worry about concurrent updates to the original data structure.
In theory, such a representation would be a perfectly effective way of articulating
the global rebuilding version of the data structure. That said, I will be using the
standard power move of CPS transforming and defunctionalizing to get a more data
structure-like result.
I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with
continuations. In that context, a “process” written in continuation-passing style
“yields” to the scheduler by passing its continuation to a scheduling function.
Normally, the scheduler would place that continuation at the end of a work queue
and then pick up a continuation from the front of the work queue and invoke it
resuming the previously suspended “process”. In our case, we only have two
“processes”, so our “work queue” can just be a single mutable cell. When one
“process” yields, it swaps its own continuation into the cell, takes the other
“process’s” continuation out, and invokes it.
Since the rebuilding process is always driven by the main process, the pattern
is a bit more like generators. This has the benefit that only the rebuilding
process needs to be written in continuation-passing style. The following is
a very quick and dirty set of functions for this.
import Control.Monad ( forM_ )
import Coroutine ( YieldFn, spawn )

process :: YieldFn -> Int -> IO () -> IO ()
process _     0 k = k
process yield i k = do
  putStrLn $ "Subprocess: " ++ show i
  yield $ process yield (i-1) k

example :: IO ()
example = do
  resume <- spawn $ \yield -> process yield 10
  forM_ [(1 :: Int) .. 10] $ \i -> do
    putStrLn $ "Main process: " ++ show i
    resume
  putStrLn "Main process done"
with output:
Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done
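The Coroutine module providing YieldFn and spawn is not reproduced here. Below is a minimal sketch consistent with how they are used in the example above and in the queue code later; the implementation details (in particular how a finished “process” is handled) are my assumptions rather than necessarily the original code.

module Coroutine ( YieldFn, spawn ) where

import Data.IORef ( newIORef, readIORef, writeIORef )

-- A yield function takes the rest of the suspended "process" (its continuation)
-- and stashes it so that control falls back to whoever called resume.
type YieldFn = IO () -> IO ()

-- spawn body returns a resume action. The body receives a yield function and a
-- final action to run when it finishes. The "process" starts suspended; each call
-- to resume runs it up to its next yield (or to completion).
spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn body = do
  cell <- newIORef (return ())
  let yield k  = writeIORef cell k            -- store the continuation in the single-cell "work queue"
      finished = writeIORef cell (return ())  -- resuming a finished "process" is a no-op
  writeIORef cell (body yield finished)
  return (readIORef cell >>= id)              -- resume: run whatever continuation is stored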
Queues
I’ll use queues since they are very simple and Purely Functional Data Structures
describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global
rebuilding. We’ll end up with something quite similar which could be made more similar
by changing the rebuilding code. Indeed, the differences are just an artifact of
specific, easily changed details of the rebuilding coroutine, as we’ll see.
The examples I’ll present are mostly imperative, not purely functional. There
are two reasons for this. First, I’m not focused on purely functional data structures
and the technique works fine for imperative data structures. Second, it is arguably
more natural to talk about coroutines in an imperative context. In this case,
it’s easy to adapt the code to a purely functional version since it’s not much
more than a purely functional data structure stuck in an IORef.
For a more imperative structure with mutable linked structure and/or in-place
array updates, it would be more challenging to produce a purely functional
version. The techniques here could still be used, though there are more
“concurrency” concerns. While I don’t include the code here, I did a similar
exercise for a random-access stack (a fancy way of saying a growable array).
There the “concurrency” concern is that the elements you are copying to the
new array may be popped and potentially overwritten before you switch to the
new array. In this case, it’s easy to solve, since if the head pointer of
the live version reaches the source offset for copy, you can just switch to
the new array immediately.
Nevertheless, I can easily imagine scenarios where it may be beneficial, if
not necessary, for the coroutines to communicate more and/or for there to be
multiple “rebuild” processes. The approach used here could be easily adapted
to that. It’s also worth mentioning that even in simpler cases, non-constant-time
operations will either need to invoke resume multiple times or need more
coordination with the “rebuild” process to know when it can do more than a
constant amount of work. This could be accomplished by the “rebuild” process
simply recognizing this from the data structure state, or some state could
be explicitly set to indicate this, or the techniques described earlier
could be used, e.g. a different process for non-constant-time operations.
The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.
Batched Rebuilding Implementation
We start with the straightforward, amortized constant-time queues where
we push to a stack representing the back of the queue and pop from a stack
representing the front. When the front stack is empty, we need to expensively
reverse the back stack to make a new front stack.
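The batched code itself is not reproduced here, but a minimal sketch of what it might look like, written in the same IORef style as the later versions and with the reverse separated into an explicit rebuild function, is the following (names and details are my own):

module BatchedQueue ( Queue, new, dequeue, enqueue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )

data Queue a = Queue { frontRef :: IORef [a], backRef :: IORef [a] }

new :: IO (Queue a)
new = Queue <$> newIORef [] <*> newIORef []

-- The batched rebuild: when the front stack is empty, reverse the entire
-- back stack into the front stack in one O(n) step.
rebuild :: Queue a -> IO ()
rebuild q = do
  front <- readIORef (frontRef q)
  case front of
    [] -> do
      back <- readIORef (backRef q)
      writeIORef (frontRef q) (reverse back)
      writeIORef (backRef q) []
    _ -> return ()

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
  rebuild q
  front <- readIORef (frontRef q)
  case front of
    [] -> return Nothing
    (x:front') -> do
      writeIORef (frontRef q) front'
      return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = modifyIORef (backRef q) (x:)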
I intentionally separate out the reverse step as an explicit rebuild function.
This step is where a modicum of thought is needed. We need to make the
rebuild step from the batched version incremental. This is straightforward,
if tedious, given the coroutine infrastructure. In this case, we incrementalize
the reverse by reimplementing reverse in CPS with some yield calls
inserted. Then we need to incrementalize append. Since we’re not waiting
until front is empty, we’re actually computing front ++ reverse back.
Incrementalizing append is hard, so we actually reverse front and then
use an incremental reverseAppend (which is basically what the incremental
reverse does anyway¹).
One of the first things to note about this code is that the actual operations are
largely unchanged other than inserting calls to resume. In fact, dequeue
is even simpler than in the batched version as we can just assume that front
is always populated when the queue is not empty. dequeue is freed from the
responsibility of deciding when to trigger a rebuild. Most of the bulk of
this code is from reimplementing a reverseAppend function (twice).
The parts of this code that require some deeper thought are 1) knowing when
a rebuild should begin, 2) knowing how “fast” the incremental operations
should go²
(e.g. incrementalReverse does two steps at a time and the
Hood-Melville implementation has an explicit exec2 that does two steps
at a time), and 3) dealing with “concurrent” changes.
For the last, Overmars describes a queue of deferred operations to perform
on the shadow copy once it finishes rebuilding. This kind of suggests a
situation where the “rebuild” process can reference some “snapshot” of
the data structure. In our case, that is the situation we’re in, since
our data structures are essentially immutable data structures in an IORef.
However, it can easily not be the case, e.g. the random-access stack.
Also, this operation queue approach can easily be inefficient and inelegant.
None of the implementations below will have this queue of deferred operations.
It is easier, more efficient, and more elegant to just not copy over parts of
the queue that have been dequeued, rather than have an extra phase of the
rebuilding that just pops off the elements of the front stack that we just
pushed. A similar situation happens for the random-access stack.
The use of drop could probably be easily eliminated. (I’m not even sure it’s
still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.
module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )

data Queue a = Queue {
    resume        :: IO (),
    frontRef      :: IORef [a],
    backRef       :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef  :: IORef Int
  }

new :: IO (Queue a)
new = do
  frontRef <- newIORef []
  backRef <- newIORef []
  frontCountRef <- newIORef 0
  backCountRef <- newIORef 0
  resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
  return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
  resume q
  front <- readIORef (frontRef q)
  case front of
    [] -> return Nothing
    (x:front') -> do
      modifyIORef' (frontCountRef q) pred
      writeIORef (frontRef q) front'
      return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
  modifyIORef (backRef q) (x:)
  modifyIORef' (backCountRef q) succ
  resume q

rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k
  where
    go k = do
      frontCount <- readIORef frontCountRef
      backCount <- readIORef backCountRef
      if backCount > frontCount
        then do
          back <- readIORef backRef
          front <- readIORef frontRef
          writeIORef backRef []
          writeIORef backCountRef 0
          incrementalReverse back [] $ \rback ->
            incrementalReverse front [] $ \rfront ->
              incrementalRevAppend rfront rback 0 backCount k
        else do
          yield k

    incrementalReverse []       acc k = k acc
    incrementalReverse [x]      acc k = k (x:acc)
    incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k

    incrementalRevAppend [] front !movedCount backCount' k = do
      writeIORef frontRef front
      writeIORef frontCountRef $! movedCount + backCount'
      yield k
    incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
      currentFrontCount <- readIORef frontCountRef
      if currentFrontCount <= movedCount
        then do
          -- This drop count should be bounded by a constant.
          writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
          writeIORef frontCountRef $! currentFrontCount + backCount'
          yield k
        else if null rfront
          then incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
          else yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k
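As a quick sanity check of the interface (my own example, not from the original article), enqueuing then dequeuing should come out in FIFO order:

import Control.Monad ( replicateM )
import qualified GlobalRebuildingQueue as Q

demo :: IO ()
demo = do
  q <- Q.new
  mapM_ (`Q.enqueue` q) [1 .. 10 :: Int]
  xs <- replicateM 12 (Q.dequeue q)
  print xs  -- expect Just 1 .. Just 10, then Nothing, Nothing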
Defunctionalized Global Rebuilding Implementation
This step is completely mechanical.
There’s arguably no reason to defunctionalize. It produces a result that
is more data-structure-like, but, unless you need the code to work in a
first-order language, there’s nothing really gained by doing this. It does
lead to a result that is more directly comparable to other implementations.
For some data structures, having the continuation be analyzable would
provide a simple means for the coroutines to communicate. The main process
could directly look at the continuation to determine its state, e.g. if
a rebuild is in-progress at all. The main process could also directly
manipulate the stored continuation to change the “rebuild” process’s
behavior. That said, doing this would mean that we’re not deriving
the implementation. Still, the opportunity for additional optimizations
and simplifications is nice.
As a minor aside, while it is, of course, obvious from looking at the
previous version of the code, it’s neat how the Kont data type
implies that the call stack is bounded and that most calls are tail calls.
REVERSE_STEP is the only constructor that contains a Kont argument,
but its type means that that argument can’t itself be a REVERSE_STEP.
Again, I just find it neat how defunctionalization makes this concrete
and explicit.
module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )

data Kont a r where
  IDLE             :: Kont a ()
  REVERSE_STEP     :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT    :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP  :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
  incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
  incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
  incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
  frontCount <- readIORef frontCountRef
  backCount <- readIORef backCountRef
  if backCount > frontCount
    then do
      back <- readIORef backRef
      front <- readIORef frontRef
      writeIORef backRef []
      writeIORef backCountRef 0
      incrementalReverse q back [] $ REVERSE_FRONT front backCount
    else do
      writeIORef resumeRef IDLE

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q []       acc k = applyKont q k acc
incrementalReverse q [x]      acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
  writeIORef frontRef front
  writeIORef frontCountRef $! movedCount + backCount'
  writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
  currentFrontCount <- readIORef frontCountRef
  if currentFrontCount <= movedCount
    then do
      -- This drop count should be bounded by a constant.
      writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
      writeIORef frontCountRef $! currentFrontCount + backCount'
      writeIORef resumeRef IDLE
    else if null rfront
      then incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'

resume :: Queue a -> IO ()
resume q = do
  kont <- readIORef (resumeRef q)
  applyKont q kont ()

data Queue a = Queue {
    resumeRef     :: IORef (Kont a ()),
    frontRef      :: IORef [a],
    backRef       :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef  :: IORef Int
  }

new :: IO (Queue a)
new = do
  frontRef <- newIORef []
  backRef <- newIORef []
  frontCountRef <- newIORef 0
  backCountRef <- newIORef 0
  resumeRef <- newIORef IDLE
  return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
  resume q
  front <- readIORef (frontRef q)
  case front of
    [] -> return Nothing
    (x:front') -> do
      modifyIORef' (frontCountRef q) pred
      writeIORef (frontRef q) front'
      return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
  modifyIORef (backRef q) (x:)
  modifyIORef' (backCountRef q) succ
  resume q
Functional Defunctionalized Global Rebuilding Implementation
This is just a straightforward reorganization of the previous code into purely
functional code. This produces a persistent queue with worst-case constant
time operations.
It is, of course, far uglier and more ad-hoc than Okasaki’s
extremely elegant real-time queues, but the methodology to derive it was
simple-minded. The result is also quite similar to the Hood-Melville Queues
even though I did not set out to achieve that. That said, I’m pretty
confident you could derive pretty much exactly the Hood-Melville queues
with just minor modifications to Global Rebuilding Implementation.
module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where

data Kont a r where
  IDLE             :: Kont a ()
  REVERSE_STEP     :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT    :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP  :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
  incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
  incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
  incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
  if backCount > frontCount
    then let q' = q { back = [], backCount = 0 }
         in incrementalReverse q' back [] $ REVERSE_FRONT front backCount
    else q { resumeKont = IDLE }

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q []       acc k = applyKont q k acc
incrementalReverse q [x]      acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
  q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
  if frontCount q <= movedCount
    then -- This drop count should be bounded by a constant.
         let !front = drop (movedCount - frontCount q) acc
         in q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
    else if null rfront
      then incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }

resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()

data Queue a = Queue {
    resumeKont :: !(Kont a ()),
    front      :: [a],
    back       :: [a],
    frontCount :: !Int,
    backCount  :: !Int
  }

empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q = case front of
    []         -> (Nothing, q)
    (x:front') -> (Just x, q' { front = front', frontCount = frontCount - 1 })
  where q'@(Queue { .. }) = resume q

enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })
Hood-Melville Implementation
This is just the Haskell code from Purely Functional Data Structures adapted
to the interface of the other examples.
This code is mostly to compare. The biggest difference, other than some code
structuring differences, is the front and back lists are reversed in parallel
while my code does them sequentially. As mentioned before, to get a structure
like that would simply be a matter of defining a parallel incremental reverse
back in the Global Rebuilding Implementation.
Again, Okasaki’s real-time queue, which can be seen as an application of the
lazy rebuilding and scheduling techniques described in his thesis and book,
is a better implementation than this in pretty much every way.
module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
  case exec (exec state) of
    Done newf -> Queue lenf newf Idle lenr r
    newstate  -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
  if lenr <= lenf then exec2 lenf f state lenr r
    else let newstate = Reversing 0 f [] r []
         in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
  let !q' = check (lenf-1) f' (invalidate state) lenr r in (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)
Okasaki’s Real-Time Queues
Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of
the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is
in the first clause of exec, and there we discard its elements. In other words, we only care about
the length of s. s gets “decremented” each time we enqueue until it’s empty at which point we
rotate r to f in the second clause of exec. The key thing is that f and s are initialized
to the same value in that clause. That means each time we “decrement” s we are also forcing a bit
of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an
incremental version of xs ++ reverse ys ++ a (where we use the invariant
length ys = 1 + length xs for the base case).
Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use
lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we
interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of
f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is
amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically
force the thunks incrementally rather than all at once. Here s is a schedule telling us when to
force parts of f. (As mentioned, s also serves as a counter telling us when to perform a
rebuild.)
module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r []    = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s
It’s instructive to compare the above to the following implementation which doesn’t use a schedule.
This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy
rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple
dequeues via rotate. The following implementation performs extremely well in my benchmark, but
the operations are subtly not constant-time. Specifically, after a long series of enqueues, a
dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f
will be a nested series of rotate calls, one for every doubling of the length of the queue. Even
if we change let f' to let !f', that will only make the first dequeue cheap. The second will
still be expensive.
module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1)
Empirical Evaluation
I won’t reproduce the evaluation code as it’s not very sophisticated or interesting.
It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce
an enqueue over a dequeue so that the queues would grow. It measured the average
time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.
The main thing I wanted to see was relatively stable average enqueue and dequeue
times with only the batched implementation having a growing maximum dequeue time.
This is indeed what I saw, though it took about 1,000,000 operations (or really
a queue of a couple hundred thousand elements) for the numbers to stabilize.
The results were mostly unsurprising. In overall time, the batched
implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s
a good chance my measurement of its average enqueue time was largely a measurement
of the timer’s resolution.) The operations’ average times were stable, illustrating their
constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue
time versus the average stabilized around 7000 to 1, except, of course, for the
batched version which grew linearly to millions to 1 ratios at queue sizes of tens
of millions of elements. This illustrates the worst-case time complexity of all the
other implementations, and the merely amortized time complexity of the batched one.
While the batched version was best in overall time, the difference wasn’t that great.
The worst implementations were still less than 1.4x slower. All the worst-case optimal
implementations performed roughly the same, but there were still some clear winners
and losers. Okasaki’s real-time queue is almost on-par with the batched
implementation in overall time and handily beats the other implementations in average
enqueue and dequeue times. The main surprise for me was that the loser was the
Hood-Melville queue. My guess is that this is due to invalidate, which seems like it
would do more work and produce more garbage than the approach taken in my functional
version.
Conclusion
The point of this article was to illustrate the process of deriving a deamortized
data structure from an amortized one utilizing batched rebuilding by explicitly
modeling global rebuilding as a coroutine.
The point wasn’t to produce the fastest queue implementation, though I am pretty
happy with the results. While this is an extremely simple example, it was still
nice that each step was very easy and natural. It’s especially nice that this
derivation approach produced a better result than the Hood-Melville queue.
Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue
with worst-case constant-time operations.
1. This code could definitely be refactored to leverage
this similarity to reduce code. Alternatively, one could refunctionalize
the Hood-Melville implementation at the end.
2. Going “too fast”, so long as it’s still a constant amount of
work for each step, isn’t really an issue asymptotically, so you can just
crank the knobs if you don’t want to think too hard about it. That said,
going faster than you need to will likely give you worse worst-case
constant factors. In some cases, going faster than necessary could reduce
constant factors, e.g. by better utilizing caches and disk I/O buffers.
One of the things that I learned in grad school is that even if you've picked an important and unsolved problem, you need some reason to believe it is solvable--especially if people have tried to solve it before! In other words, "What's different this time?" This is perhaps a dreary way of shooting down otherwise promising research directions, but you can flip it around: when the world changes, you can ask, "What can I do now that I couldn't do before?"
This post is a list of problems in areas that I care about (half of this is PL flavor, since that's what I did my PhD in), where I suspect something has changed with the advent of LLMs. It's not a list of recipes; there is still hard work to figure out how exactly an LLM can be useful (for most of these, just feeding the entire problem into ChatGPT usually doesn't work). But I often talk to people who want to get started on something, anything, but have no idea where to start. Try here!
Static analysis. The chasm between academic static analysis work and real world practice is the scaling problems that come with trying to apply the technique to a full size codebase. Asymptotics strike as LOC goes up, language focused techniques flounder in polyglot codebases, and "Does anyone know how to write cmake?" But this is predicated on the idea that static analysis has to operate on a whole program. It doesn't; humans can do perfectly good static analysis on fragments of code without having to hold the entire codebase in their head, without needing access to a build system. They make assumptions about APIs and can do local reasoning. LLMs can play a key role in drafting these assumptions so that local reasoning can occur. What if the LLM gets it wrong? Well, if an LLM could get it wrong, an inattentive junior developer might get it wrong too--maybe there is a problem in the API design. LLMs already do surprisingly well if you one-shot prompt them to find bugs in code; with more traditional static analysis support, maybe they can do even better.
DSL purgatory. Consider a problem that can be solved with code in a procedural way, but only by writing lots of tedious, error prone boilerplate (some examples: drawing diagrams, writing GUIs, SQL queries, building visualizations, scripting website/mobile app interactions, end to end testing). The PL dream is to design a sweet compositional DSL that raises the level of abstraction so that you can render a Hilbert curve in seven lines of code. But history also abounds with cases where the DSL did not solve the problem, or maybe it did solve the problem but only after years of grueling work, and so there are still many problems that feel like there ought to be a DSL that should solve them but there isn't. The promise of LLMs is that they are extremely good at regurgitating low level procedural actions that could conceivably be put together in a DSL. A lot of the best successes of LLMs today involve putting coding powers in the hands of domain experts who otherwise do not know how to code; could it also help in putting domain expertise in the hands of people who can code?
I am especially interested in these domains:
SQL - Its strange syntax purportedly makes it easier for non-software engineers to understand, whereas many (myself included) would often prefer a more functional syntax ala LINQ/list comprehensions. It's pretty hard to make an alternate SQL syntax take off though, because SQL is not one language, but many many dialects everywhere with no obvious leverage point. That sounds like an LLM opportunity. Or heck, just give me one of those AI editor environments but specifically fine tuned for SQL/data visualization, don't even bother with general coding.
End to end testing - This is https://momentic.ai/ but personally I'm not going to rely on a proprietary product for testing in my OSS projects. There's definitely an OSS opportunity here.
Scripting website/mobile app interactions - The website scraping version of this is https://reworkd.ai/ but I am also pretty interested in this from the browser extension angle: to some extent I can take back control of my frontend experience with browser extensions; can I go further with LLMs? And we typically don't imagine that I can do the same with a mobile app... but maybe I can??
OSS bread and butter. Why is Tesseract still the number one OSS library for OCR? Why is smooth and beautiful text to voice not ubiquitous? Why is the voice control on my Tesla so bad? Why is the wake word on my Android device so unreliable? Why isn't the screenshot parser on a fansite for my favorite mobage able to parse out icons? The future has arrived, but it is not uniformly distributed.
Improving the pipeline from ephemeral to durable stores of knowledge. Many important sources of knowledge are trapped in "ephemeral" stores, like Discord servers, private chat conversations, Reddit posts, Twitter threads, blog posts, etc. In an ideal world, there would be a pipeline of this knowledge into more durable, indexable forms for the benefit of all, but actually doing this is time consuming. Can LLMs help? Note that the dream of LLMs is you can just feed all of this data into the model and just ask questions to it. I'm OK with something a little bit more manual, we don't have to solve RAG first.
proposal for a Haskell language extension: when importing a function from another module, one may optionally also specify a type signature for the imported function. this would be helpful for code understanding. the reader would have immediately available the type of the imported symbol, not having to go track down the type in the source module (which may be many steps away when modules re-export symbols, and the source module might not even have a type annotation), nor use a tool such as ghci to query it. (maybe the code currently fails to compile for other reasons, so ghci is not available.)
if a function with the specified type signature is not exported by an imported module, the compiler can offer suggestions of other functions exported by the module which do have, or unify with, the imported type signature. maybe the function got renamed in a new version of the module.
or, the compiler can do what Hoogle does and search among all modules in its search path for functions with the given signature. maybe the function got moved to a different module.
the specified type signature may be narrower than how the function was originally defined. this can limit some of the insanity caused by the Foldable Traversable Proposal (FTP):
import Prelude(length :: [a] -> Int) -- prevent length from being called on tuples and Maybe
various potentially tricky issues:
a situation similar to the diamond problem (multiple inheritance) in object-oriented programming: module A defines a polymorphic function f, imported then re-exported by modules B and C. module D imports both B and C, unqualified. B imports and re-exports f from A with a type signature more narrow than originally defined in A. C does not change the type signature. what is the type of f as seen by D? which version of f, which path through B or C, does D see? the solution might be simple: if the function seen through different paths is not identical, then the user has to qualify. (a sketch of this situation appears just after this list.)
the following tries to make List.length available only for lists, and Foldable.length available for anything else. is this asking for trouble?
import Prelude hiding(length); import qualified Prelude(length :: [a] -> Int) as List; import qualified Prelude(length) as Foldable;
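to make the first issue above (the diamond) concrete, here is a hypothetical sketch: the module names are invented, and it uses the proposed narrowed-import syntax, so it does not compile today.

module A where
f :: Foldable t => t a -> Int
f = length

module B (f) where
import A (f :: [a] -> Int)   -- narrows f on re-export (proposed syntax)

module C (f) where
import A (f)                 -- re-exports f at its original type

module D where
import B                     -- through B, f :: [a] -> Int
import C                     -- through C, f :: Foldable t => t a -> Int
-- proposed resolution: the two paths disagree, so D must qualify its uses of f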
Today on the Haskell Interlude, Matti and Sam are joined by Satnam Singh. Satnam has been a lecturer at Glasgow, and Software Engineer at Google, Meta, and now Groq. He talks about convincing people to use Haskell, laying out circuits and why community matters.
PS: After the recording, it was important to Satnam to clarify that his advice to “not be afraid to lose your job” was specifically meant to encourage quitting jobs that are not good for you, if possible, but he acknowledges that unfortunately not everybody can afford that risk.
Way back in 2012 I took over maintainership of the BlogLiterately
tool from Robert
Greayer, its initial author. I used it for many years to post to my
Wordpress blog, added a
bunch of features, solved some fun bugs,
and created the accompanying BlogLiterately-diagrams
plugin
for embedding diagrams code in blog
posts. However, now that I have fled Wordpress and rebuilt my blog
with hakyll, I don’t use
BlogLiterately any more (there is even a diagrams-pandoc package
which does the same thing BlogLiterately-diagrams used to do). So,
as of today I am officially declaring BlogLiterately unsupported.
The fact is, I haven’t actually updated BlogLiterately since March
of last year. It currently only builds on GHC 9.4 or older, and no one
has complained, which I take as strong evidence that no one else is
using it either! However, if anyone out there is actually using it,
and would like to take over as maintainer, I would be very happy to
pass it along to you.
I do plan to continue maintaining
HaXml and
haxr, at least for now;
unlike BlogLiterately, I know they are still in use, especially
HaXml. However, BlogLiterately was really the only reason I cared
about these packages personally, so I would be happy to pass them
along as well; please get in touch if you would be willing to take
over maintaining one or both packages.
PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to represent the tilings).
This document summarises the design and use of the PenroseKiteDart package.
PenroseKiteDart package is now available on Hackage.
In figure 1 we show a dart and a kite. All angles are multiples of 36° (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length φ, where φ = (1+√5)/2 is the golden ratio.
Aperiodic Infinite Tilings
What is interesting about these tiles is:
It is possible to tile the entire plane with kites and darts in an aperiodic way.
Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.
The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)
This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.
Legal Tilings
In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.
If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.
Correct Tilings
Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.
2. Using the PenroseKiteDart Package
You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.
Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).
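For instance, a minimal Main.hs might look like the sketch below. The module name PKD for the package’s API and the use of the fool example (defined later in this document) are my assumptions here; defaultMain comes from the diagrams SVG backend’s command-line wrapper.

module Main where

import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine  -- binds B to SVG
import PKD                           -- assumed name of PenroseKiteDart's main API module

main :: IO ()
main = defaultMain (draw fool :: Diagram B)
-- e.g. compile and then run with:  ./Main -w 400 -o fool.svg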
3. Overview of Types and Operations
Half-Tiles
In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).
The labels are actually data constructors of the type HalfTile, which has an argument type (rep) to allow for more than one representation of the half-tiles.
data HalfTile rep = LD rep -- Left Dart
                  | RD rep -- Right Dart
                  | LK rep -- Left Kite
                  | RK rep -- Right Kite
                  deriving (Show, Eq)
Tgraphs
We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.
type TileFace = HalfTile (Vertex, Vertex, Vertex)
type Vertex   = Int  -- must be positive
The function
makeTgraph :: [TileFace] -> Tgraph
then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have
faces :: Tgraph -> [TileFace]
to retrieve the TileFace list from a Tgraph.
As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):
fool :: Tgraph
fool = makeTgraph [ RD (1,2,3), LD (1,3,4)   -- right and left dart
                  , LK (5,3,2), RK (5,2,7)   -- left and right kite
                  , RK (5,4,3), LK (5,6,4)   -- right and left kite
                  ]
To produce a diagram, we simply draw the Tgraph
foolFigure :: Diagram B
foolFigure = draw fool
which will produce the diagram on the left in figure 4.
Alternatively,
foolFigure :: Diagram B
foolFigure = labelled drawj fool
will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).
When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length φ (for kites).
Tgraph Properties
Tgraphs are actually implemented as
newtype Tgraph = Tgraph [TileFace] deriving (Show)
but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.
Note: If the TileFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.
For convenience, in addition to makeTgraph, we also have
The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.
Main Tiling Operations
There are three key operations on finite tilings, namely decompose, force, and compose.
Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor of 1/φ times the scale of the former, to reflect the change in scale.
We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph
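The package provides decompositions for this (it is used later on); given decompose :: Tgraph -> Tgraph, a plausible definition is simply the following sketch:

decompositions :: Tgraph -> [Tgraph]
decompositions = iterate decompose   -- the Tgraph itself, then successive decompositions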
Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]
An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.
Compose
Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs, Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.
Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that
If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.
Try Results
It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.
We define Try as a synonym for Either String (which is a monad) in module Tgraph.Try.
type Try a = Either String a
Successful results have the form Right r (for some correct result r) and failure results have the form Left s (where s is a String describing the problem as a failure report).
The function
runTry :: Try a -> a
runTry = either error id
will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.
force   = runTry . tryForce
compose = runTry . tryCompose
Elementary Tgraph and TileFace Operations
The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.
When we need to refer to particular vertices of a TileFace we use
originV :: TileFace -> Vertex  -- the first vertex - red dot in figure 2
oppV    :: TileFace -> Vertex  -- the vertex at the opposite end of the join edge from the origin
wingV   :: TileFace -> Vertex  -- the vertex not on the join edge
A directed edge is represented as a pair of vertices.
type Dedge = (Vertex, Vertex)
So (a,b) is regarded as a directed edge from a to b. In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.
For example,
internalEdges :: Tgraph -> [Dedge]
produces an edge list, whereas
graphBoundary :: Tgraph -> [Dedge]
produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function
graphDedges :: Tgraph -> [Dedge]
produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.
The above three functions are defined using
faceDedges :: TileFace -> [Dedge]
which produces a list of the three directed edges going clockwise round a TileFace starting at the origin vertex.
When we need to refer to particular edges of a TileFace we use
joinE  :: TileFace -> Dedge  -- shown dotted in figure 2
shortE :: TileFace -> Dedge  -- the non-join short edge
longE  :: TileFace -> Dedge  -- the non-join long edge
which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:
Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.
where the first draws the non-join edges of a Piece, the second does the same but adds a dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.
Patch is an instance of class Transformable, so a Patch can be scaled, rotated, and translated.
Vertex Patches
It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).
The function makeVP :: Tgraph -> VPatch calculates vertex locations using a default orientation and scale.
VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.
One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.
labelled draw (rotate someAngle (makeVP g))
We can also align a VPatch using vertex labels.
alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch
So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.
labelled draw (alignXaxis (a,b) (makeVP g))
Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).
4. Drawing in More Detail
Class Drawable
There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume
draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B  -- fills with colours
where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.
These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.
(Update Sept 2024) As of version 1.1 of PenroseKiteDart, these will be
Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).
So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define
and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelColourSize blue normal drawj.
Further drawing functions
There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph
smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B
So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g
Here, restrictSmart g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.
Overlaid examples (location map sharing)
The function
drawForce :: Tgraph -> Diagram B
will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly
drawPCompose :: Tgraph -> Diagram B
applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.
Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the VPatch for g instead of starting afresh.
The location map sharing is done with
subVP :: VPatch -> [TileFace] -> VPatch
so that subVP vp fcs is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]
For cases where labels are only going to be drawn for certain faces, we need a version of subVP which also gets rid of vertex locations that are not relevant to the faces. For this situation we have
restrictVP :: VPatch -> [TileFace] -> VPatch
which filters out un-needed vertex locations from the vertex location map. Unlike subVP, restrictVP checks for missing vertex locations, so restrictVP vp fcs raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.
5. Forcing in More Detail
The force rules
The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.
Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.
Main Forcing Operations
To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.
The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.
There are several other operations related to forcing including
The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.
Update Generators
An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example
where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.
In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.
In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).
Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.
Forcible Class Operations
The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.
For example, given an update generator ugen and any f :: ForceState -> Try ForceState, then f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.
We also specialize tryFSOpWith to use the default update generator
Similarly, given an update generator ugen and any f :: BoundaryState -> Try BoundaryChange, then f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.
We also specialize tryChangeBoundaryWith to use the default update generator
Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.
The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case
Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.
For example, consider a combination combo which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d), and then forces again.
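The definition of combo is not shown here, but from that description it is presumably something like this (a sketch only, assuming boundary edges are represented by the Dedge type used later in this post):

combo :: Dedge -> Tgraph -> Tgraph
combo d = force . addHalfDart d . force . decompose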
Since decompose :: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph, so each of these operations will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions by working only with ForceStates, keeping the necessary conversions to the beginning and end of the whole sequence.
This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so
force . addHalfDart d . force
becomes
tryForce <=< tryAddHalfDart d <=< tryForce
Note that (<=<) is the Kleisli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right-to-left sequence with a left-to-right version tryForce >=> tryAddHalfDart d >=> tryForce.) The definition of combo is then built around this sequence.
The sequence actually has type Forcible a => a -> Try a, but when passed to tryFSOp it specialises to type ForceState -> Try ForceState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.
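As a sketch (again, not the text's actual definition), a Try-returning version of combo could then be written as:

tryCombo :: Dedge -> Tgraph -> Try Tgraph
tryCombo d = tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

Here decompose still works directly on the Tgraph, while the whole forcing sequence is carried out at the ForceState level inside tryFSOp, with only one conversion in and one out.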
A limitation of forcing
To avoid creating touching vertices (or crossing boundaries), a BoundaryState keeps track of the locations of boundary vertices. At around 35,000 face additions in a single force operation, the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use recalibratingForce, which recalculates all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, 6 decompositions of the kingGraph have 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either
recalibratingForce (decompositions kingGraph !! 6)
or using an extra force before the decompositions
force (decompositions (force kingGraph) !! 6)
In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !! 6.
6. Advanced Operations
Guided comparison of Tgraphs
Asking whether two Tgraphs are equivalent (the same apart from the choice of vertex numbers) is, in general, a hard problem (essentially graph isomorphism). However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Relabelling we have
sameGraph :: (Tgraph, Dedge) -> (Tgraph, Dedge) -> Bool
The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.
The implementation uses tryRelabelToMatch, where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1, or will succeed with Right g3, where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time, starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)
There is also an operation which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.
A related operation will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list, as they may not be a connected collection of faces and therefore not necessarily a Tgraph.
Empires and SuperForce
In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation
superForce :: Forcible a => a -> a
and operations to calculate empires.
We will not repeat the descriptions here other than to note that
forcedBoundaryECovering :: Tgraph -> [Tgraph]
finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if that succeeds) find a collection of (forced) extensions of force g such that
each extension has the whole boundary of force g as internal edges.
each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.
(Possible here means not leading to a stuck Tgraph when forced.) There is also
forcedBoundaryVCovering :: Tgraph -> [Tgraph]
which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.
Combinations
Combinations such as
compForce :: Tgraph -> Tgraph      -- compose after forcing
allCompForce :: Tgraph -> [Tgraph] -- iterated (compose after force) while not emptyTgraph
maxCompForce :: Tgraph -> Tgraph   -- last item in allCompForce (or emptyTgraph)
These rely on the fact that the composition of a forced Tgraph does not need to be checked for connectedness and no crossing boundaries. Similarly, only the initial force is necessary in allCompForce, with subsequent iteration of uncheckedCompose, because the composition of a forced Tgraph is necessarily a forced Tgraph.
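As a sketch of how these might fit together (nullGraph here is a hypothetical emptiness test, not a name from the text, and the library's actual definitions may differ):

compForce :: Tgraph -> Tgraph
compForce = uncheckedCompose . force

allCompForce :: Tgraph -> [Tgraph]
allCompForce = takeWhile (not . nullGraph) . iterate uncheckedCompose . compForce

maxCompForce :: Tgraph -> Tgraph
maxCompForce g = last (emptyTgraph : allCompForce g)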
The type TrackedTgraph (a Tgraph paired with a tracked list of sub-collections of its faces) has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of
decomposeTracked :: TrackedTgraph -> TrackedTgraph
Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.
The function drawTrackedTgraph is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so they will be consistent in vertex locations. When labels are used, there are also drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.
Note that the result of calculating empires (see Empires and SuperForce) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph, with the common faces as a tracked sub-collection of faces.
Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently.)
Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
In a recent comment (that I sadly cannot find any longer) in https://www.reddit.com/r/math/, someone mentioned the following game. There are n players, and they each independently choose a natural number. The player with the lowest unique number wins the game. So if two people choose 1, a third chooses 2, and a fourth chooses 5, then the third player wins: the 1s were not unique, so 2 was the least among the unique numbers chosen. (Presumably, though this wasn’t specified in the comment, if there is no unique number among all players, then no one wins).
I got nerd-sniped, so I’ll share my investigation.
For me, since the solution to the general problem wasn’t obvious, it made sense to specialize. Let’s say there are n players, and just to make the game finite, let’s say that instead of choosing any natural number, you choose a number from 1 to m. Choosing very large numbers is surely a bad strategy anyway, so intuitively I expect any reasonably large choice of m to give very similar results.
n = 2
Let’s start with the case where n = 2. This one turns out to be easy: you should always pick 1, daring your opponent to pick 1, as well. We can induct on m to prove this. If m = 1, then you are required to pick 1 by the rules. But if m > 1, suppose you pick m. Either your opponent also picks m and you both lose, or your opponent picks a number smaller than m and you still lose. Clearly, this is a bad strategy, and you always do at least as well choosing one of the first m - 1 options instead. This reduces the game to one where we already know the best strategy is to pick 1.
That wasn’t very interesting, so let’s try more players.
n = 3, m = 2
Suppose there are three players, each choosing either 1 or 2. It’s impossible for all three players to choose a different number! If you do manage to pick a unique number, then, you will be the only player to do so, so it will always be the least unique number simply because it’s the only one!
If you don’t think your opponents will have figured this out, you might be tempted to pick 2, in hopes that your opponents go for 1 to try to get the least number, and you’ll be the only one choosing 2. But this makes you predictable, so the other players can try to take advantage. But if one of the other players reasons the same way, you both are guaranteed to lose! What we want here is a Nash equilibrium: a strategy for all players such that no single player can do better by deviating from that strategy.
It’s not hard to see that all players should flip a coin, choosing either 1 or 2 with equal probability. There’s a 25% chance each that a player picks the unique number and wins, and there’s a 25% chance that they all choose the same number and all lose. Regrettable, but anything you do to try to avoid that outcome just makes your play more predictable so that the other players could exploit that.
It’s interesting to look at the actual computation. When computing a Nash equilibrium, we generally rely on the indifference principle: a player should always be indifferent between any choice that they make at random, since otherwise, they would take the one with the better outcome and always play that instead.
This is a bit counter-intuitive! Naively, you might think that the optimal strategy is the one that gives the best expected result, but when a Nash equilibrium involves a random choice— known as a mixed strategy — then any single player actually does equally well against other optimal players no matter which mix of those random choices they make! In this game, though, predictability is a weakness. Just as a poker player tries to avoid ‘tells’ that give away the strength of their hand, players in this number-choosing game need to be unpredictable. The reason for playing the Nash equilibrium isn’t that it gives the best expected result against optimal opponents, but rather that it can’t be exploited by an opponent.
Let’s apply this indifference principle. This game is completely symmetric — there’s no order of turns, and all players have the same choices and payoffs available — so an optimal strategy ought to be the same for any player. Let’s say p is the probability that any single player will choose 1. Then if you choose 1, you will win with probability (1 - p)², while if you choose 2, you’ll win with probability p². If you set these equal to each other as per the indifference principle and solve the equation, you get p = 0.5, as we reasoned above.
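This isn't in the original post, but as a quick sanity check we can reuse the leastUnique helper and the probability library that appear later in the post (only the categorical, probability, and monadic interfaces used there) to confirm these numbers:

coinFlip :: Distribution Double Int
coinFlip = categorical [(0.5, 1), (0.5, 2)]

checkCoinFlip :: (Double, Double, Double)
checkCoinFlip =
  ( probability (== Just 1) (leastUnique . (1 :) <$> replicateM 2 coinFlip) -- your win chance picking 1: 0.25
  , probability (== Just 2) (leastUnique . (2 :) <$> replicateM 2 coinFlip) -- your win chance picking 2: 0.25
  , probability (== Nothing) (leastUnique <$> replicateM 3 coinFlip)        -- chance that nobody wins: 0.25
  )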
n = 3, m = 3
Things get more interesting if each player can choose 1, 2, or 3. Now it’s possible for each player to choose uniquely, so it starts to matter which unique number you pick. Let’s say each player chooses 1, 2, and 3 with the probabilities p, q, and r respectively. We can analyze the probability of winning with each choice.
If you pick 1, then you always win unless someone else also picks a 1. Your chance of winning, then, is (q + r)².
If you pick 2, then for you to win, either both other players need to pick 1 (eliminating each other because of uniqueness and leaving you to win by default), or both other players need to pick 3, so that you’ve picked the least number. Your chance of winning is p² + r².
If you pick 3, then you need both of your opponents to pick the same number, either 1 or 2. Your chance of winning is p² + q².
Setting these equal to each other immediately shows us that since p² + q² = p² + r², we must conclude that q = r. Then p² + q² = (q + r)² = 4q², so p² = 3q² = 3r². Together with p + q + r = 1, we can conclude that p = 2√3 - 3 ≈ 0.464, while q = r = 2 - √3 ≈ 0.268.
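If you want to double-check that algebra numerically, a throwaway snippet (mine, not part of the original analysis) confirms that all three choices give the same win probability, about 0.287:

checkM3 :: (Double, Double, Double)
checkM3 = ((q + r) ^ 2, p ^ 2 + r ^ 2, p ^ 2 + q ^ 2) -- all three components are ~0.287
  where
    p = 2 * sqrt 3 - 3
    q = 2 - sqrt 3
    r = 2 - sqrt 3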
This is our first really interesting result. Can we generalize?
n = 3, in general
The reasoning above generalizes well. If there are three players, and you pick a number k, you are betting that either the other two players will pick the same number less than k, or they will each pick numbers greater than k (regardless of whether they are the same one).
I’ll switch notation here for convenience. Let X be a random variable representing a choice by a player from the Nash equilibrium strategy. Then if you choose k, your probability of winning is P(X=1)² + … + P(X=k-1)² + P(X>k)². The indifference principle tells us that this should be equal for any choice of k. Equivalently, for any k from 1 to m - 1, the probability of winning when choosing k is the same as the probability when choosing k + 1. So:
P(X=1)² + … + P(X=k-1)² + P(X>k)² = P(X=1)² + … + P(X=k)² + P(X>k+1)²
Cancelling the common terms: P(X>k)² = P(X=k)² + P(X>k+1)²
Rearranging, and noting that P(X>k) = P(X≥k+1): P(X=k) = √(P(X≥k+1)² - P(X>k+1)²)
This gives us a recursive formula that we can use (in reverse) to compute P(X=k), if only we knew P(X=m) to get started. If we just pick something arbitrary, though, it turns out that all the results are just multiples of that choice. We can then divide by the sum of them all to normalize the probabilities to sum to 1.
nashEquilibriumTo :: Integer -> Distribution Double Integer
nashEquilibriumTo m = categorical (zip allPs [1 ..])
  where
    allPs = go m 1 0 []
    go 1 pEqual pGreater ps = (/ (pEqual + pGreater)) <$> (pEqual : ps)
    go k pEqual pGreater ps =
      let pGreaterEqual = pEqual + pGreater
       in go (k - 1)
             (sqrt (pGreaterEqual * pGreaterEqual - pGreater * pGreater))
             pGreaterEqual
             (pEqual : ps)
main :: IO ()
main = print (probabilities (nashEquilibriumTo 100))
I’ve used a probability library from https://github.com/cdsmith/prob that I wrote with Shae Erisson during a fun hacking session a few years ago. It doesn’t help yet, but we’ll play around with some of its further features below.
Trying a few large values for m confirms my suspicion that any reasonably large choice of m gives effectively the same result.
By inspection, this appears to be a geometric distribution, parameterized by the probability 0.4563109873079237. We can check that the distribution is geometric, which just means that for all k < m - 1, the ratio P(X > k) / P(X ≥ k) is the same as P(X > k + 1) / P(X ≥ k + 1). This is the defining property of a geometric distribution, and some simple algebra confirms that it holds in this case.
But what is this bizarre number? A few Google queries get us to an answer of sorts. A 2002 Ph.D. dissertation by Joseph Myers seems to arrive at the same number in the solution to a question about graph theory, where it’s identified as the real root of the polynomial x³ - 4x² + 6x - 2. We can check that this is right for a geometric distribution. Starting with P(X=k) = √(P(X≥k+1)² - P(X>k+1)²) where k = 1, we get P(X=1) = √(P(X≥2)² - P(X>2)²). If P(X=1) = p, then P(X≥2) = 1 - p, and P(X>2) = (1 - p)², so we have p = √((1-p)² - ((1-p)²)²), which indeed expands to p⁴ - 4p³ + 6p² - 2p = 0, so either p = 0 (which is impossible for a geometric distribution), or p³ - 4p² + 6p - 2 = 0, giving the probability seen above. (How and if this is connected to the graph theory question investigated in that dissertation, though, is certainly beyond my comprehension.)
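A one-liner (mine, not the post's) to check that claim numerically:

checkRoot :: Double
checkRoot = p ^ 3 - 4 * p ^ 2 + 6 * p - 2 -- essentially zero, up to floating-point error
  where
    p = 0.4563109873079237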
You may wonder, in these large limiting cases, how often it turns out that no one wins, or that we see wins with each number. Answering questions like this is why I chose to use my probability library. We can first define a function to implement the game’s basic rule:
leastUnique :: (Ord a) => [a] -> Maybe a
leastUnique xs = listToMaybe [x | [x] <- group (sort xs)]
And then we can define the whole game using the strategy above for each player:
gameTo :: Integer -> Distribution Double (Maybe Integer)
gameTo m = do
  ns <- replicateM 3 (nashEquilibriumTo m)
  return (leastUnique ns)
Then we can update main to tell us the distribution of game outcomes, rather than plays:
main :: IO ()
main = print (probabilities (gameTo 100))
And get these probabilities:
Nothing -> 0.11320677243374572
Just 1  -> 0.40465349320873445
Just 2  -> 0.22000565820506113
Just 3  -> 0.11961465909617276
Just 4  -> 6.503317590749513e-2
Just 5  -> 3.535782320137907e-2
Just 6  -> 1.9223659987298684e-2
Just 7  -> 1.0451692718822408e-2
An 11% probability of no winner for large m is an improvement over the 25% we computed for m = 2. Once again, a least unique number greater than 7 has less than 1% probability, and the probabilities drop even more rapidly from there.
More than three players?
With an arbitrary number of players, the expressions for the probability of winning grow rather more involved, since you must consider the possibility that some other players have chosen numbers greater than yours, while others have chosen smaller numbers that are duplicated, possibly in twos or in threes.
For the four-player case, this isn’t too bad. The three winning possibilities are (see the sketch after this list):
All three other players choose the same smaller number. This has probability P(X=1)³ + … + P(X=k-1)³
All three other players choose larger numbers, though not necessarily the same one. This has probability P(X > k)³
Two of the three other players choose the same smaller number, and the third chooses a larger number. This has probability 3 P(X > k) (P(X=1)² + … + P(X=k-1)²)
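Purely as an illustration (this code is not from the post), the three cases above translate directly into an expression for the probability of winning by choosing k with four players, where pEq j stands for P(X = j) and pGt j for P(X > j) under the shared mixed strategy:

winProb4 :: (Int -> Double) -> (Int -> Double) -> Int -> Double
winProb4 pEq pGt k =
    sum [pEq j ^ 3 | j <- [1 .. k - 1]]              -- all three others pick the same smaller number
  + pGt k ^ 3                                        -- all three others pick larger numbers
  + 3 * pGt k * sum [pEq j ^ 2 | j <- [1 .. k - 1]]  -- two collide on a smaller number, one goes larger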
You could possibly work out how to compute this one without too much difficulty. The algebra gets harder, though, and I dug deep enough to determine that the Nash equilibrium is no longer a geometric distribution. If you assume the Nash equilibrium is geometric, then numerically, the probability of choosing 1 that gives 1 and 2 equal rewards would need to be about 0.350788, but this choice gives too small a reward for choosing 3 or more, implying they ought to be chosen less often.
For larger n, even stating the equations turns into a nontrivial problem of accurately counting the possible ways to win. I’d certainly be interested if there’s a nice-looking result here, but I do not yet know what it is.
Numerical solutions
We can solve this numerically, though. Using the probability library mentioned above, one can easily compute, for any finite game and any strategy (as a probability distribution of moves) the expected benefit for each choice.
expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist =
  [ probability (== Just i) $ leastUnique . (i :) <$> replicateM (n - 1) dist
  | i <- [1 .. m]
  ]
We can then iteratively adjust the probability of each choice slightly, based on how its expected outcome compares to other expected outcomes in the distribution. It turns out to be good enough to compare with an immediate neighbor. Just so that all of our distributions remain valid, instead of working with the global probabilities P(X=k), we’ll do the computation with conditional probabilities P(X = k | X ≥ k), so that any sequence of probabilities is valid, without worrying about whether they sum to 1. Given this list of conditional probabilities, we can produce a probability distribution like this.
distFromConditionalStrategy :: [Double] -> Distribution Double Int
distFromConditionalStrategy = go 1
  where
    go i [] = pure i
    go i (q : qs) = do
      choice <- bernoulli q
      if choice then pure i else go (i + 1) qs
Then we can optimize numerically, using the difference of each choice’s win probability from its neighbor as a diff to add to the conditional probability of that choice.
refine :: Int -> Int -> [Double] -> Distribution Double Int
refine n iters strategy
  | iters == 0 = equilibrium
  | otherwise =
      let ps = expectedOutcomesTo n m equilibrium
          delta = zipWith subtract (drop 1 ps) ps
          adjs = zipWith (+) strategy delta
       in refine n (iters - 1) adjs
  where
    m = length strategy + 1
    equilibrium = distFromConditionalStrategy strategy
It works well enough to run this for 10,000 iterations at n = 4, m = 10.
main :: IO ()
main = do
  let n = 4
      m = 10
      d = refine n 10000 (replicate (m - 1) 0.3)
  print $ probabilities d
  print $ expectedOutcomesTo n m d
The resulting probability distribution is, to me, at least, quite surprising! I would have expected that more players would incentivize you to choose a higher number, since the additional players make collisions on low numbers more likely. But it seems the opposite is true. While three players at least occasionally (with 1% or more probability) should choose numbers up to 7, four players should apparently stop at 3.
Huh. I’m not sure why this is true, but I’ve checked the computation in a few ways, and it seems to be a real phenomenon. Please leave a comment if you have a better intuition for why it ought to be so!
With five players, at least, we see some larger numbers again in the Nash equilibrium, lending support to the idea that there was something unusual going on with the four player case. Here’s the strategy for five players:
The six player variant retracts the distribution a little, reducing the probabilities of choosing 5 or 6, but then 7 players expands the choices a bit, and it’s starting to become a pattern that even numbers of players lend themselves to a tighter style of play, while odd numbers open up the strategy.
In general, it looks like this is converging to something. The computations are also getting progressively slower, so let’s stop there.
Game variants
There is plenty of room for variation in the game, which would change the analysis. If you’re looking for a variant to explore on your own, in addition to expanding the game to more players, you might try these:
What if a tie awards each player an equal fraction of the reward for a full win, instead of nothing at all? (This actually simplifies the analysis a bit!)
What if, instead of all wins being equal, we found the least unique number, and paid that player an amount equal to the number itself? Now there’s somewhat less of an incentive for players to choose small numbers, since a larger number gives a large payoff! This gives the problem something like a prisoner’s dilemma flavor, where players could coordinate to make more money, but leave themselves open to being undercut by someone willing to make a small profit by betraying the coordinated strategy.
What other variants might be interesting?
Addendum (Sep 26): Making it faster
As is often the case, the naive code I originally wrote can be significantly improved. In this case, the code was evaluating probabilities by enumerating all the ways players might choose numbers, and then computing the winner for each one. For large values of m and n this is a lot, and it grows exponentially.
There’s a better way. We don’t need to remember each individual choice to determine the outcome of the game in the presence of further choices. Instead, we need only determine which numbers have been chosen once, and which have been chosen more than once.
data GameState = GameState
  { dups :: Set Int
  , uniqs :: Set Int
  } deriving (Eq, Ord)
To add a new choice to a GameState requires checking whether it’s one of the existing unique or duplicate choices:
addToState :: Int -> GameState -> GameState
addToState n gs@(GameState dups uniqs)
  | Set.member n dups = gs
  | Set.member n uniqs = GameState (Set.insert n dups) (Set.delete n uniqs)
  | otherwise = GameState dups (Set.insert n uniqs)
We can now directly compute the distribution of GameState corresponding to a set of n players playing moves with a given distribution. The use of simplify from the probability library here is crucial: it combines all the different paths that lead to the same outcome into a single case, avoiding the exponential explosion.
stateDist :: Int -> Distribution Double Int -> Distribution Double GameState
stateDist n moves = go n (pure (GameState mempty mempty))
  where
    go 0 states = states
    go i states = go (i - 1) (simplify $ addToState <$> moves <*> states)
Now it remains to determine whether a certain move can win, given the game state resulting from the remaining moves.
win :: Int -> GameState -> Bool
win n (GameState dups uniqs) =
  not (Set.member n dups) && maybe True (> n) (Set.lookupMin uniqs)
Finally, we update the function that computes win probabilities to use this new code.
expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist = [probability (win i) states | i <- [1 .. m]]
  where
    states = stateDist (n - 1) dist
The result is that while I previously had to leave the code running overnight to compute the n = 8 case, I can now easily compute cases up to 15 players with enough patience. This would involve computing the winner for about a quadrillion games in the naive code, making it hopeless, but the simplification reduces that to something feasible.
It seems that once you leave behind small numbers of players where odd combinatorial things happen, the equilibrium eventually follows a smooth pattern. I suppose with enough players, the probability for every number would peak and then decline, just as we see for 4 and 5 here, as it becomes worthwhile to spread your choices even further to avoid duplicates. That’s a nice confirmation of my intuition.
Recently, I published The Monospace Web, a minimalist design
exploration. It all started with this innocent post, yearning for a
simpler web. Perhaps too typewriter-nostalgic, but it was an interesting
starting point. After some hacking and sharing early screenshots,
@noteed asked for grid alignment, and down the rabbit hole I went.
The Python programming language, and its huge ecosystem (there are
more than 500,000 projects hosted on the main Python repository,
PyPI), is used both for software engineering and
scientific research. Both have similar requirements for
reproducibility. But, as we will see, the practices are quite
different.
In fact, the Python ecosystem and community is notorious for the countless ways it uses to declare dependencies.
As we were developing FawltyDeps,
a tool to ensure that declared dependencies match the actual imports
in the code, we had to accommodate many of these ways.
This got us thinking:
Could FawltyDeps be used
to gain insights into how packaging is done across Python ecosystems?
In this blog post, we look at project structures and dependency declarations across Python projects,
both from biomedical scientific papers (as an example of scientific usage of Python) as well as from more general and widely used Python packages.
We’ll try to answer the following questions:
What practices does the community actually follow? And how do they
differ between software engineering and scientific research?
Could such differences be related to why it’s often hard to reproduce results from scientific notebooks published in the data science community?
Experiment setup
In the following, we discuss the experimental setup — how we decided which data to use, where to get this data from, and what tools we use to analyze it, before we discuss our results in depth.
Data
First, we need to collect the names and source code locations of projects that we want to include in the analysis. Now, where did we find these projects?
We selected projects for analysis based on two key areas: impactful real-world
applications and broad community adoption.
Biomedical data analysis repositories:
biomedical data plays a vital role in healthcare and research.
To capture its significance, we focused on packages directly linked to
biomedical data, sourced from repositories supported or referenced by
scientific biomedical articles. This criterion anchored our experiment in
real-world scientific applications.
To analyze software engineering practices, we’ve chosen to use
the most popular PyPI packages: acknowledging the importance of widely
adopted packages, we included a scan of the most downloaded and frequently
used PyPI packages.
Biomedical data
We leverage a recent study by Samuel, S., & Mietchen, D. (2024):
Computational reproducibility of Jupyter notebooks from biomedical
publications. This study analyzed
2,177 GitHub repositories associated with publications indexed in
PubMed Central to assess computational reproducibility. Specifically,
we reused the dataset they generated (found
here) for our own analyses.
PyPI data
In order to start analyzing actual projects published to PyPI, we still needed
to access some basic metadata about these projects: the project’s name, source URL,
and any extra metadata which could be useful for further analysis such as project tags.
While this information is available via the PyPI REST API, this API is subject
to rate limiting and is not really designed for bulk analyses such as ours.
Conveniently, Google maintains a public BigQuery dataset of PyPI
download statistics and project metadata which we leveraged instead. As a
starting point for our analysis, we produced a CSV with relevant metadata for
top packages downloaded in 2023 using a simple SQL query.
Since the above-mentioned biomedical database contains 2,177 projects, we conducted a scan of the
first 2,000 PyPI packages to create a dataset of comparable size.
Using FawltyDeps to analyze the source code data
Now that we have the source URLs of our projects of interest, we downloaded all sources and ran an analysis script that wraps around FawltyDeps on the packages. For safety, all of this happened in a virtual machine.
Post-processing and filtering of FawltyDeps analysis results
While the data we collected from PyPI was quite clean (modulo broken or inaccessible
project URLs), the biomedical dataset contained some projects written in R and some
projects written in Python 2.X, which are outside of our scope.
To further filter for relevant projects that are written in Python 3.X, we applied the following rules:
there should be .py or .ipynb files in the source code directory of the data.
If there are only .ipynb files and no imports, then it is most likely an R project and not taken into account.
we are also only interested in Python projects that have 3rd-party imports,
as these are the projects we would expect to declare their dependencies.
After these filtering steps, we have 1,260 biomedical projects and 1,118 PyPI packages
to be analyzed.
Results
Now that we had crunched thousands of Python packages, we were curious to see what secrets the data produced by FawltyDeps would reveal!
Dependency declaration patterns
First, we investigated which dependency declaration file choices were made in both samples.
The following pie charts show the proportion of projects with and without dependency
declaration files, and whether these files actually contain dependency declarations.
We find that about 60% of biomedical projects have dependency declaration files, while for PyPI packages, that number is almost 100%.
That is expected, as the top PyPI projects are written to be reproducible: they are downloaded by
a large group of people, and if they did not work due to a lack of dependency declarations, users would notice immediately.
Interestingly, we found that some biomedical projects (6.8%) and PyPI packages
(16.0%) have dependency declaration files with no dependencies listed inside
them. This might be because they genuinely have no third-party dependencies,
but more commonly it is a symptom of either:
setup.py files with complex dependency calculations: although FawltyDeps supports
parsing simple setup.py files with a single setup() call and no computation
involved for setting the install_requires and extras_require arguments,
it is currently not able to analyze more complex scenarios.
pyproject.toml might be used to configure tools with sections like
[tool.black] or [tool.isort], and declaring dependencies (and other project metadata)
in the same file is not strictly required.
For the remainder of the analysis, we do not take these cases into account.
We then examined how different package types utilize various dependency declaration
methods. The following chart shows the distribution of requirements.txt,
pyproject.toml, and setup files across biomedical projects and PyPI
packages (note that these three categories are not exclusive):
For biomedical projects, requirements.txt and setup.py/setup.cfg files are a majority of declaration files. In contrast, PyPI projects show a higher occurrence of pyproject.toml compared to biomedical projects.
pyproject.toml is the suggested modern way of declaring dependencies, so this result should not come as a surprise: top PyPI projects are actively maintained
and are more likely to follow best practices. A requirements.txt file, on the other hand, is easier to add,
and if you do not need to package your project, it is a simpler option.
Now let’s have a more detailed view in which categories are exclusive:
For biomedical data there are a lot of projects that have either requirements.txt or setup.py/setup.cfg
files (or a combination of both) present. The traditional method of using setup files utilizing
setuptools to create Python packages has been around for a while and is still heavily relied
upon in the scientific community.
On the PyPI side, no single method for declaring dependencies stood out, as
different approaches were used with similar frequency across all projects.
However, when it comes to using pyproject.toml,
PyPI packages were about five times more likely to adopt this method compared to biomedical projects, suggesting that PyPI package authors tend to favor pyproject.toml significantly more often for dependency management.
Also, almost no top biomedical projects (only 2 out of 1,260) and very few PyPI packages (only 25 out of 1,118) used
pyproject.toml and setup files together: it seems that projects don’t often mix the older method - setup files - with the more modern one - pyproject.toml - at the same time.
A different method of visualizing the subset of results pertaining to requirements.txt, pyproject.toml and setup.py/setup.cfg files are Venn diagrams:
While these diagrams don’t contain new insights, they show clearly how much more common pyproject.toml usage is for PyPI packages.
Source code directories
We next examined where projects store their source code, which we refer to as the
“source code directory”. In the following analysis, we defined this directory as the one that contains the highest number of Python code files and does not have a name like “test”, “example”, “sample”, “doc”, or “tutorial”.
We can make some interesting observations: Over
half (53%) of biomedical projects store their main source code in a directory with a name different
than the project itself, and source code is not commonly stored in directories named src or src-python (7%).
For PyPI projects, the numbers are lower, with 37% storing their main code in a directory that matches
the project name. However, naming the source code directory differently from the package name is still fairly common for PyPI projects,
appearing in 36% of cases. A somewhat surprising finding: the src layout, recommended by Python packaging user guide, appears in only 14% of cases.
Another noteworthy observation is that 23% of biomedical projects store all their source code in the
root directory of the project. In contrast, only 12% of PyPI projects follow this pattern. This
difference makes sense, as scientists working on biomedical projects might be less concerned about
maintaining a strict code structure compared to developers on PyPI. Additionally, a lot of biomedical projects might be a loose collection of notebooks/scripts not intended to be packaged/importable, and thus will typically not need to add any subdirectories at all.
On the other hand, everything from the PyPI data set is an importable package. Even in the “flat” layout (according to discussion), related modules are collected in a subdirectory named after the package.
The top PyPI projects that keep their code in the root directory are often small Python modules or plugins, like “python-json-patch”, “appdirs”, and
“python-json-pointer”. These projects usually have all their source code in a single file, so
storing it in the root directory makes sense.
Key results
Many people have preconceptions about how a Python project should look, but the
reality can be quite different.
Our analysis reveals distinct differences between top PyPI projects and biomedical
projects:
PyPI projects tend to use modern tools like pyproject.toml more
frequently, reflecting better overall project structure and dependency management
practices.
In contrast, biomedical projects display a wide variety of practices;
some store code in the root directory and fail to declare dependencies altogether.
This discrepancy is partially explained by the selection criteria: popular PyPI
packages, by necessity, must be usable and thus correctly declare their
dependencies, while biomedical projects accompanying scientific papers do not face
such stringent requirements.
Conclusion
We found that biomedical projects are written with less attention to coding best practices, which compromises
their reproducibility. Many projects do not declare their dependencies at all. The use of
pyproject.toml, which is the current state-of-the-art way to declare dependencies, is also less frequent in biomedical
packages.
In our opinion, though, it’s essential for any package to adhere to the same high standards of reproducibility as top PyPI packages.
This includes implementing robust dependency management practices and embracing modern packaging standards.
Enhancing these practices will not only improve reproducibility but also foster greater trust and adoption within the scientific community.
While our initial analysis revealed some interesting insights, we feel that
there might be more interesting treasures to be found within this dataset - you can check for yourself in
our FawltyDeps-analysis repository! We invite you
to join the discussion on FawltyDeps and reproducibility in package management on our
Discord channel.
Finally, this experiment also served as a real-world stress test for FawltyDeps itself and identified several edge cases we had not yet accounted for, suggesting avenues of further development for FawltyDeps:
One of the main challenges was to parse unconventional install_requires and extras_require sections in
setup.py files.
This issue has been addressed by the FawltyDeps project, specifically through the improvements made in FawltyDeps PR #440.
Furthermore, it was also not trivial to handle projects that declare multiple packages in a single repository.
Addressing these issues will be a focus as we continue to refine and improve FawltyDeps.
Stay tuned as we will drill deeper into the data we’ve collected. So far, we’ve
reused part of FawltyDeps‘ code for our analysis, but the next step will be to run
the full FawltyDeps tool on a large number of packages. Join us as we examine how
FawltyDeps performs under rigorous testing and what improvements can be made to
enhance its capabilities!
In the classic Star Trek episode Errand of Mercy, Spock computes the chance of success:
CAPTAIN JAMES T. KIRK : What would you say the odds are on our getting out of here?
MR. SPOCK : Difficult to be precise, Captain. I should say, approximately 7,824.7 to 1.
And yet they get out of there. Are Spock’s probability computations
unreliable? Think of it another way. The Galaxy is a large place. There
must be tens of thousands of Spocks, and Grocks, and Plocks out there
on various missions. But we won’t hear (or don’t want to hear) about
the failures. So they may all be perfectly good at probability theory, but
we’re only hearing about the lucky ones. This is an example of survivor
bias.
2 Simulation
We can model this. I’ve written a small battle simulator for a super-simple
made up role-playing game...
And the rest of this article can be found at github
(Be sure to download the actual PDF if you want to be able to follow links.)
I got the following question on my post on how I handle secrets in my work notes:
Sounds like a nice approach for other secrets but how about :dbconnection for
Orgmode and sql-connection-alist?
I have to admit I'd never come across the variable sql-connection-alist
before. I've never really used sql-mode for more than editing SQL queries and
setting up code blocks for running them was one of the first things I used
yasnippet for.
I did a little reading and unfortunately it looks like sql-connection-alist
can only handle string values. However, there is a variable
sql-password-search-wallet-function, with the default value of
sql-auth-source-search-wallet, so using auth-source is already supported for
the password itself.
There seems to be a lack of good tutorials for setting up sql-mode in a secure
way – all articles I found place the password in clear-text in the config –
filling that gap would be a nice way to contribute to the Emacs community. I'm
sure it'd prompt me to re-evaluate incorporating sql-mode in my workflow.
This is just a “personal life update” kind of post, but I recently found out
a couple of cool things about my academic history that I thought were neat
enough to write down so that I don’t forget them.
Oppenheimer
When the Christopher Nolan
Biopic about the life of J. Robert
Oppenheimer was about to come out, it was billed as an “Avengers of
Physics”, where every major physicist working in the US in the early and middle 20th
century would be featured. I had a thought tracing my “academic family tree” to
see if my PhD advisor’s advisor’s advisor’s advisor was involved in any of the
major physics projects depicted in the movie, to see if I could spot them
portrayed in the movie as a nice personal connection.
If you’re not familiar with the concept, the relationship between a PhD
candidate and their doctoral advisor is a very personal and individual one: they
personally direct and guide the candidate’s research and thesis. To an extent,
they are like an academic parent.
I was able to find my academic
family tree and, to my surprise, my academic lineage actually traces
directly back to a key figure in the movie!
Dr. Kafatos received his PhD under the advisory of Philip Morrison at the
Massachusetts Institute of Technology.
Dr. Morrison received his PhD in 1940 at University of California, Berkeley
under the advisory of none other than J. Robert
Oppenheimer himself!
So, I started this out on a quest to figure out if I was “academically
descended” from anyone in the movie, and I ended up finding out I was
Oppenheimer’s advisee’s advisee’s advisee’s advisee! I ended up being able to
watch the movie and identify my great-great-grand advisor no problem, and I
think even my great-grand advisor. A fun little unexpected surprise and a cool
personal connection to a movie that I enjoyed a lot.
Erdos
As an employee at Google, you can customize your directory page with
“badges”, which are little personalized accomplishments or achievements, usually
unrelated to any actual work you do. I noticed that some people had an “Erdos
Number N” badge (1, 2, 3, etc.). I had never given any thought into my own
personal Erdos number (it was probably really high, in my mind) but I thought
maybe I could look into it in order to get a shiny worthless badge.
In academia, Paul
Erdos is someone who wrote so many papers and
collaborated with so many people that it became a joking
“non-accomplishment” to say that you wrote a paper with him. Then after a while
it became a joking non-accomplishment to say that you wrote a paper with
someone who wrote a paper with him (because, who hasn’t?). And then it became an
even more joking non-accomplishment to say you had an Erdos Number of 3
(you wrote a paper with someone who wrote a paper with someone who wrote a paper
with Dr. Erdos).
Anyway, I just wanted to get that badge, so I tried to figure it out. It turns out
my most direct trace goes through:
Dr. Straus collaborated with many people, including Einstein, Graham,
Goldberg, and 20 papers with Erdos.
So I guess my Erdos number is 4? The median number for mathematicians today
seems to be 5, so mine is just one step better than that. Not really a noteworthy
accomplishment, but still neat enough that I want a place to put the work
tracking this down the next time I am curious again.
Anyways, I submitted the information above and they gave me that sweet Erdos 4
badge! It was nice to have for about a month before quitting the company.
That’s It
Thanks for reading and I hope you have a nice rest of your day!