Evan Martin

I gave Google Chrome five years, from before release to 2012; I touched many pieces but I'm most responsible for the Linux port.

Understanding Jujutsu bookmarks

2025-08-21 08:00:00

Jujutsu ("jj") sits atop a Git repository and its commands mostly mirror into Git operations; for example, a jj commit is a Git commit.

When collaborating with others in Git, you push and pull branches. Meanwhile, jj has a feature called "bookmarks", which is its mechanism for working with Git branches but which behaves fairly differently from Git branches.

This post goes into the why and how to use bookmarks for Git collaboration.

Why doesn't jj just use branches like Git does?

Part of jj's whole deal is that it collapses many Git concepts (stashes, staging, fixups, in-progress rebases, conflicts) into a single unified model of working with history, which then lets you use the same tools to do all of those things. For example, to fix up an old commit you jump to it, edit it, and jump back to where you were; to fix a rebase conflict you jump to the conflicting commit, edit it, and jump back to where you were, using the same commands.

All this jumping around means that the Git idea of being "on" a particular branch does not make sense in jj. When working on a change I might stop part way through doing one thing, start a different thing based on a commit a few steps back, possibly reshuffle commits around, and have a few extra commits on the side with experiments lingering around as well. Based on my former Git expertise I might have done this kind of thing by making a bunch of Git stashes and branches.

Instead, in jj when you work you are "on" a commit, and when you switch you switch between commits, not branches. After a year of using jj I can assure you that not having branch names for these has worked out just fine.

Bookmarks

Like a Git branch, a jj bookmark is a name that points to a commit, and there are the commands you'd expect to create/delete/rename and move bookmarks around. Unlike Git branches, bookmarks are fixed to a commit unless you manually move them; when you create new commits jj does not automatically move bookmarks around.

In my experience with jj, I have had no use for bookmarks other than for interacting with Git. In principle you could use them to make note of important commits, which I suppose is where the name comes from. Maybe other people have different workflows.

In a colocated jj/Git repository (which is the normal way to use jj), bookmarks are 1:1 with Git branches: creations/modifications/etc via either system are reflected in the other.

Remote bookmarks and tracking

After cloning a Git repository, jj creates "remote" bookmarks with names like main@origin. These are immutable and represent the state of the remote repository.

You could also make a local bookmark named main that is wholly independent. But on a fresh clone, the local bookmark main is marked as tracking main@origin. Conceptually this is similar to Git's notion of an "upstream" branch, but with different behavior.

Suppose main is a tracking bookmark. jj attempts to keep it in sync with main@origin:

  • when you jj git push, if main is ahead of main@origin, jj pushes the changes. (When you're in a state where a push would make a change, jj status shows the bookmark name as main*.)
  • when you jj git fetch, jj updates main@origin as well as updates your local main if it's behind.

Conflicts

If after a fetch the two sides diverge (both contain commits), then the local main will be marked as conflicting and point to both commits. This displays in status as main??. You will need to manually choose where it points with jj bookmark set main -r ... to fix it before using it again.

At least for me this was super weird at first, but now makes so much sense that I cannot remember why I was confused. I think the right way to think about it is that a tracking bookmark is modeling "what I intend this bookmark to be, both locally and remotely" and the jj push/fetch commands keep that in sync.

Note that, unlike Git, jj has no separate "fetch" and "pull" commands. (A historical note: apparently both "fetch" and "pull" existed as commands in Git and Mercurial. The two systems agreed that one meant "download the changes" and the other meant "do that and also merge them", but they flipped which was which!)

Workflow: working alone

If you are just making changes locally and just want to push your changes to main, you must update the bookmark before pushing with a command like jj bookmark set main -r @. This is currently the clunkiest part of jj. There have been conversations in the project about how to improve it.

If you search for jj tug online you will see a common alias people set up to automate this.
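For reference, the "tug" alias moves the nearest bookmark up to your working-copy commit so you can push immediately afterward. A sketch of one common variant in jj's config file (the exact revset syntax varies across versions and people's setups; treat this as an illustration, not copy-paste config):

```toml
[revset-aliases]
# "The closest ancestor of `to` that has a bookmark on it."
'closest_bookmark(to)' = 'heads(::to & bookmarks())'

[aliases]
# Move that bookmark up to the working copy (@).
tug = ["bookmark", "move", "--from", "closest_bookmark(@)", "--to", "@"]
```

With this in place, `jj tug` followed by `jj git push` replaces the manual `jj bookmark set main -r @` step.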

Workflow: just put my code in

If you are comfortable with Git push syntax, an alternative I use when I just want to push my code is to tell Git exactly what I want to push and where to put it:

$ git push origin SOMEHASH:main

Note this is plain git push, no jj or any bookmarks involved.

Workflow: pushing branches

If you want to push a bookmark/branch for someone else to review or pull, the commands are:

$ jj bookmark create some-name
$ jj git push

(The second command will complain that some-name does not exist remotely, and then tell you how to fix it. There are flags for specifying which remote to push to etc.)

Workflow: anonymous branches

Typically in jj you won't have bookmark names ready when you're sending off code reviews. To simplify things jj can generate a bookmark name for you as it pushes.

$ jj git push -c @
Creating bookmark push-sytrsqlnznzr for revision sytrsqlnznzr
Changes to push to origin:
  Add bookmark push-sytrsqlnznzr to 5865f9673d0f

This is my primary workflow when working on GitHub, even solo. Pushing changes in a pull request lets the CI run over it.

Branch safety

jj treats history as mutable, making it natural to edit and reorder commits as you work. When collaborating with others, modifying history can be confusing or dangerous.

jj has a notion of "immutable" commits: the part of history that should not be modified. In the default configuration this effectively means code that has been pushed to Git cannot be modified, with the exception of code in tracked bookmarks. This means you can continue to modify a branch after pushing it, for example in response to code reviews. The next push will update it.

There are further safety checks around things like not letting you move a branch backwards (because that would trim off the later commits). In practice I don't understand all the rules, and sometimes it will prompt me to pass a flag to say "I really do mean to do this". It has been fine so far.

Bonus cool thing: plural revsets

(This final section is trivia and only interesting because I came to understand it when writing this post.)

jj commands that accept commits take a "revset" argument, a little language for specifying commits. For example you can say jj diff -r @- to see the diff of the previous commit; the @- expression means "parent of the current commit". As the name suggests, revsets can refer to sets of commits. (jj really ought to pick either "commit" or "revision" for talking about these things; it's confusing to have them as synonyms!)

When a bookmark is conflicted, the revset it names refers to multiple commits: the commit you had locally and the commit seen remotely. Meanwhile, note that the way to create a merge in jj is to create a commit with multiple parents: jj new parent1 parent2 ....

Putting these together, with a conflicting main?? bookmark, you can do:

  • jj diff -r main to show a diff of what the merge of the two commits would look like
  • jj new all:main to create a merge commit of the two commits (where the "all:" prefix means something like "I really do mean for this to refer to multiple commits"; looks like they're still figuring out how this should work)

I have never had a reason to need this trivia but it is kind of neat to see how these pieces fit together.

diff --stat for binary files

2025-08-09 08:00:00

I contributed a minor feature to the Jujutsu version control system, which I wrote about previously.

When you run diff --stat in Git, it shows you a summary of your change as a list of modified files and counts of added and removed lines for each modified file. For binary files, Git displays the difference in byte size. Here's an example commit where I grew a .dll file:

commit 9649ab9bf70c92a1ebe2ac39b4d2ef86b1de37b9
Author: Evan Martin <[email protected]>
Date:   Thu Oct 17 11:56:28 2024 -0700

    dinput: more stubs

 win32/dll/dinput.dll               | Bin 2560 -> 3584 bytes
 win32/src/winapi/dinput/builtin.rs |  48 ++++++++++++++++++++++++++++++++++++++++++++----
 win32/src/winapi/dinput/dinput.rs  |  53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 95 insertions(+), 6 deletions(-)

Jujutsu has the same feature except it didn't handle binary files: it would just count the number of 0x0a bytes in the file, which is not very useful. So I fixed that.

This is a very minor feature but it turned out to be more subtle than I expected, for one main reason: the above output is sized to make each line fit the terminal width, which means it truncates file names that are too long and also scales the graph on the right to fit. You end up needing to carefully measure all the relevant text, and to be careful with rounding as well as with underflowing zero (e.g. if the terminal is too narrow to fit the filename at all).

Here are some minor notes.

Slightly different output: Git shows 2560 -> 3584 bytes, but after discussion in the PR about whether to show plain byte counts or pretty-print the numbers, I convinced myself that the other lines in diff --stat output are only showing the magnitude of the change and not the before/after. So my output looks like (binary) +1024 bytes. This means that you can't tell a grown file from a fully added or removed file, but that was already true for text files, and that's never bothered me in my years of using Git.
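The magnitude-only formatting described above can be sketched as a small function. This is an illustration, not jj's actual code; the function name is made up. The "(binary)" case with no delta matches the same-size-modification line in the test snapshot further below.

```rust
// Show only the magnitude of a binary file's size change, like the
// rest of `diff --stat`, rather than before/after byte counts.
fn binary_stat(old_size: u64, new_size: u64) -> String {
    if new_size > old_size {
        format!("(binary) +{} bytes", new_size - old_size)
    } else if new_size < old_size {
        format!("(binary) -{} bytes", old_size - new_size)
    } else {
        // Modified but same size: no byte delta to report.
        "(binary)".to_string()
    }
}

fn main() {
    // The dinput.dll change from the commit above: 2560 -> 3584 bytes.
    println!("{}", binary_stat(2560, 3584)); // (binary) +1024 bytes
}
```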

expect tests: I had learned about "expect tests" from this Jane Street blog post. From the post it sounded like a great feature but it was for OCaml only, so I never tried it. I was delighted to discover that Jujutsu uses them via Insta, a Rust library that provides a similar thing.

In the test for my change it runs jj diff --stat and asserts what the output looks like, as follows. The cool thing about Insta is that I didn't need to hand-update this text; instead it can run the test and interactively step through which outputs differ, and for the changes I accept it automatically inserts them back into the code.

let output = work_dir.run_jj(["diff", "--stat"]);
// Rightmost display column          ->|
insta::assert_snapshot!(output, @r"
binary_added.png    | (binary) +12 bytes
binary_modified.png | (binary)
...fied_to_text.png | (binary) -8 bytes
binary_removed.png  | (binary) -16 bytes
...y_valid_utf8.png | (binary) +3 bytes
5 files changed, 0 insertions(+), 0 deletions(-)
[EOF]
");

(The idea of expect tests is deeper than just textual command output! Read the original blog post for more.)

Colored output: When generating textual output, Jujutsu tags substrings with keywords like added or binary which then feeds into an outer system that assigns colors to these semantic categories. This is a neat mechanism to keep colors consistent across different commands while allowing for customization. In particular if you customize the output of other commands like log, you'll interact with these.

Rust build output is massive: This is my first tinkering with Jujutsu, but over the ~two months that I worked on this, my target/ dir (containing Rust build output) grew to over 25 GB. Jeepers. I think it was maybe intermediate outputs of various libraries whose versions themselves varied over that time period?

PS: I didn't actually work on it for two months! I worked on it for a couple of hours, forgot about it, picked it up again some weeks later, and then repeated that a few times.

Double width characters: File names can be Unicode, and even in a terminal some Unicode characters (particularly Chinese) are supposed to occupy two columns. This means that to measure a filename's width and properly elide it with ... you need not only Unicode character handling, but also data tables recording which codepoints are double-width.

This code was already all implemented and I did not touch it, but I mostly note that even a pretty basic thing like "shorten a filename to make the text align on the terminal" quickly becomes a whole project if you try to do it thoroughly.
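To give a flavor of what that existing code has to do, here is a simplified sketch of width-aware, elide-from-the-left filename shortening. The real implementation uses full Unicode width tables; this sketch special-cases only a few common double-width ranges, ignores zero-width characters, and uses made-up function names.

```rust
// Crude stand-in for a real East Asian Width table: a handful of
// common double-width ranges; everything else counts as one column.
fn char_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F            // Hangul Jamo
        | 0x3000..=0x303E          // CJK symbols and punctuation
        | 0x3041..=0x33FF          // kana and related
        | 0x4E00..=0x9FFF          // CJK Unified Ideographs
        | 0xF900..=0xFAFF          // CJK compatibility ideographs
        | 0xFF00..=0xFF60 => 2,    // fullwidth forms
        _ => 1,
    }
}

fn display_width(s: &str) -> usize {
    s.chars().map(char_width).sum()
}

/// Elide `name` from the left with "..." so it fits in `max` columns,
/// keeping the end of the filename (the more distinctive part).
fn elide(name: &str, max: usize) -> String {
    if display_width(name) <= max {
        return name.to_string();
    }
    let budget = max.saturating_sub(3); // leave room for the "..."
    let mut width = 0;
    let mut tail = String::new();
    for c in name.chars().rev() {
        let w = char_width(c);
        if width + w > budget {
            break;
        }
        width += w;
        tail.insert(0, c);
    }
    format!("...{}", tail)
}

fn main() {
    println!("{}", elide("binary_modified_to_text.png", 19));
}
```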

Commit access: After I had written a few PRs, the maintainers granted me access to merge my own changes. Pretty cool thing to do for a first-time contributor! I expect the repo is set up to refuse force-pushes, so I suppose if I mess things up they can always fix it.

Future work: When writing this blog post I looked at the output a bit more carefully and noticed there is yet more aligning to be done.

retrowin32, split into pieces

2025-05-25 08:00:00

This post is part of a series on retrowin32.

The Rust compiler compiles code in parallel. But the unit of caching is the crate — a concept larger than a module, corresponding roughly to a library in the C world or a package in the JS world. A typical program is a single crate, which means every time you run the compiler, it compiles all the code from scratch. To improve build performance, you can split a program into multiple crates, in the hope that on each compile you can reuse the crates you didn't modify.

retrowin32 was already arranged as a few crates along some obvious boundaries. The x86 emulator, the win32 implementation, the native and web targets were each separate. But the win32 implementation and the underlying system were necessarily pretty tangled, because (among other things) x86 code calls win32 functions which might need to call back into x86 code.

This meant any change to the win32 implementation recompiled a significant quantity of code. This post is about how I managed to split things up further, with one crate per Windows library. retrowin32 now has crates like builtin-gdi32 and builtin-ddraw that implement those pieces of Windows, and they can now compile and cache in parallel (mostly).
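The resulting layout is roughly a Cargo workspace with one crate per Windows library. A sketch of what the manifest might look like (member names other than builtin-gdi32 and builtin-ddraw are invented for illustration; retrowin32's actual workspace differs):

```toml
[workspace]
members = [
    "x86",            # the CPU emulator
    "win32",          # ties everything together
    "builtin-gdi32",  # drawing API
    "builtin-ddraw",  # DirectDraw
    # ...one crate per Windows library
]
```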

The big cycle

Going in, there was a god object Machine that held both the CPU emulator (e.g. the state of the registers) as well as the rest of the system (e.g. memory and kernel state). When the Machine emulated its way to a win32 function call (as described in the syscalls post), it passed itself to the target, which would allow it to poke at system state and potentially call back into further emulation.

For example, the Windows CreateWindow API creates a window and as part of that process it synchronously "sends" the WM_CREATE message, which concretely means within CreateWindow we invoke the window procedure and hand control back to the emulated code.

You cannot have cycles between crates, so this cycle meant that Machine and all the win32 implementation had to live in one single crate. The fix, as with most computer science problems, is adding a layer of abstraction.

A new shared crate defines a System trait, which is the interface expressing "things from the underlying system that a win32 function implementation might need to call". This is then passed to win32 APIs and implemented by Machine, allowing us to compile win32 functions as separate crates, each depending only on the definition of System.

One interesting consequence of this layout is that the win32 implementation no longer directly depends on any emulator at all, as long as the System interface exposes some way to invoke user code. You could hypothetically imagine a retrowin32 that runs on native 32-bit x86, or alternatively one that lets you port a Windows program that you have source for to a non-x86 platform like winelib.
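The cycle-breaking can be sketched in a few lines. The names and signatures here are illustrative, not retrowin32's actual definitions; the point is only the dependency direction.

```rust
// Defined in a small shared crate; per-library crates depend only on this.
trait System {
    /// Let a win32 implementation call back into emulated user code.
    fn call_x86(&mut self, addr: u32) -> u32;
}

// Lives in a per-library crate (e.g. builtin-user32): it needs the
// trait, not Machine, so it compiles without the top-level crate.
fn create_window(sys: &mut dyn System, wndproc: u32) -> u32 {
    // Synchronously "send" WM_CREATE by invoking the window procedure.
    sys.call_x86(wndproc)
}

// The top-level crate depends on all the libraries and implements System.
struct Machine {
    calls: Vec<u32>, // stand-in for real emulator state
}

impl System for Machine {
    fn call_x86(&mut self, addr: u32) -> u32 {
        self.calls.push(addr);
        0 // pretend the emulated code returned 0
    }
}

fn main() {
    let mut machine = Machine { calls: Vec::new() };
    create_window(&mut machine, 0x401000);
    assert_eq!(machine.calls, vec![0x401000]);
}
```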

System state

I mentioned above that Machine also holds system state. For example, gdi32 implements the drawing API, which provides functions that vend handles to device contexts. The new gdi32 library enabled by the System interface can declare what state it needs, but we must store that state somewhere.

Further, there are interdependencies between these various Windows libraries. user32, which handles windows and messaging, needs to use code from gdi32 to implement drawing upon windows. But the winmm crate, which implements audio, is independent from those.

One obvious way — the way I imagine it might work in real Windows — is for this state to be held in per-library static globals. I came up with a different solution that is a little strange so I thought I would write it down and see if any reader has a name for it or a better way.

To restate the problem, there's a core Machine type that depends on all the libraries and which holds all the program state. But we want to be able to build each library independently, possibly with interdependencies between them, without them holding a dependency on Machine itself.

The answer is for the ouroboros-breaking System trait to expose a dynamically-typed "get my state by its type" function:

fn state(&self, id: &std::any::TypeId) -> &dyn std::any::Any;

Each library, e.g. gdi32, can register its state (a gdi32::State, perhaps) and fetch it when needed from the system. This way a library like user32 can call gdi32 and both of them can access their own internal state off of the shared state object.

It's maybe just a static with extra steps. I'm not sure yet if I like it.
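The mechanism can be sketched as a standalone registry built on std::any. This is a simplification: in retrowin32 the lookup hangs off the System trait rather than a separate struct, and all the names below are made up.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Holds each library's state, keyed by the state's type.
struct StateRegistry {
    states: HashMap<TypeId, Box<dyn Any>>,
}

impl StateRegistry {
    fn new() -> Self {
        StateRegistry { states: HashMap::new() }
    }

    /// Each library registers its state object once...
    fn register<T: Any>(&mut self, state: T) {
        self.states.insert(TypeId::of::<T>(), Box::new(state));
    }

    /// ...and fetches it back by type whenever a call needs it.
    fn get<T: Any>(&self) -> &T {
        self.states[&TypeId::of::<T>()]
            .downcast_ref::<T>()
            .expect("state registered under its own TypeId")
    }
}

// A hypothetical gdi32-style state, private to its own crate.
struct GdiState {
    next_handle: u32,
}

fn main() {
    let mut registry = StateRegistry::new();
    registry.register(GdiState { next_handle: 1 });
    // user32 code can reach gdi32's state through the shared registry.
    assert_eq!(registry.get::<GdiState>().next_handle, 1);
}
```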

Result

Most of the win32 API is now in separate crates. (The remaining piece is kernel32, which is the lowest-level piece and will need some more work to pull apart.)

Here's a waterfall of the part of the build that involves these separate crates:

Per xkcd this probably won't save me time overall, but at least I don't have to wait as long when I'm repeatedly cycling.

At the bottom you see the final win32 crate that ties everything together. This one is still too slow (possibly due to kernel32), but it's better than before!