2024-12-03 04:00:00
The murderer I emailed with is still in prison. And the software that got him pissed off at me still runs, so I ran it. Now here I am to pass on the history and then go all geeky. Here’s the tell: If you don’t know what a “filesystem” is (that’s perfectly OK, few reasonable adults need to) you might want to stay for the murderer story then step off the train.
Filesystems are one of the pieces of software that computers need to run, where “computers” includes your phone and laptop and each of the millions of servers that drive the Internet and populate the cloud. There are many flavors of filesystem and people who care about them care a lot.
One of the differences between filesystems is how fast they are. This matters because how fast the apps you use run depends (partly) on how fast the underlying filesystems are.
Writing filesystem software is very, very difficult and people who have done this earn immense respect from their peers. So, a lot of people try. One of the people who succeeded was named Hans Reiser and for a while his “ReiserFS” filesystem was heavily used on many of those “Linux” servers out there on the Internet that do things for you.
Reiser at one point worked in Russia and used a “mail-order bride” operation to look for a spouse. He ended up marrying Nina Sharanova, one of the bride-brokerage translators, and bringing her back to the US with him. They had two kids, got divorced, and then, on September 3, 2006, he strangled her and buried her in a hidden location.
To make a long story short, he eventually pleaded guilty to a reduced charge in exchange for revealing the grave location, and remains in prison. I haven’t provided any links because it’s a sad, tawdry story, but if you want to know the details the Internet has them.
I had interacted with Reiser a few times as a consequence of having written a piece of filesystem-related software called “Bonnie” (more on Bonnie below). I can’t say he was obviously murderous but I found him unpleasant to deal with.
As you might imagine, people generally did not want to keep using the murderer’s filesystem software, but it takes a long time to make this kind of infrastructure change and just last month, ReiserFS was removed as a Linux option. Which led to this Mastodon exchange:
(People who don’t care about filesystems can stop reading now.)
After that conversation, on a whim I tracked down the Bonnie source and ran it on my current laptop, a 2023 M2 MacBook Pro with 32G of RAM and 3T of disk. I think the numbers are interesting in and of themselves even before I start discoursing about benchmarking and filesystems and disks and so on.
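(If you want to try this at home, the invocation is simple. From my possibly-rusty memory of Bonnie’s flags, the numbers below came from something like

    Bonnie -d /scratch -s 64000 -m MBP-M2-32G

where -d names a directory on the disk under test, -s gives the test-file size in megabytes, and -m supplies the machine label that shows up in the output.)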
              -------Sequential Output--------- ---Sequential Input--- --Random--
              -Per Char- --Block--- -Rewrite--  -Per Char- --Block---  --Seeks---
Machine    GB M/sec %CPU M/sec %CPU M/sec %CPU  M/sec %CPU M/sec %CPU   /sec %CPU
MBP-M2-32G 64  56.9 99.3  3719 89.0  2772 83.4   59.7 99.7  6132 88.0  33613 33.6
Bonnie says:
This puppy can write 3.7 GB/second to a file, and read it back at 6.1GB/sec.
It can update a file in place at 2.8 GB/sec.
It can seek around randomly in a 64GB file at 33K seeks/second.
Single-threaded sequential file I/O is almost but not quite CPU-limited.
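To make the “per char” versus “block” distinction concrete, here’s a minimal C sketch of the two flavors of write test. This is the shape of the thing, not Bonnie’s actual code:

    /* Per-character I/O goes through stdio one byte at a time, so the cost
       is dominated by library-call overhead; block I/O hands the kernel a
       big buffer per system call, so the cost is dominated by the disk. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK 16384

    void write_per_char(FILE *f, long bytes)
    {
        for (long i = 0; i < bytes; i++)
            putc((int)(i & 0x7f), f);   /* one library call per byte: CPU-bound */
    }

    void write_per_block(int fd, long bytes)
    {
        static char buf[CHUNK];
        memset(buf, 'b', sizeof buf);
        for (long i = 0; i < bytes; i += CHUNK)
            write(fd, buf, CHUNK);      /* one syscall per 16K: disk-bound */
    }

Which is why, in the output above, the per-char numbers are pinned at 99% CPU while the block numbers aren’t.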
I wonder: Are those good numbers for a personal computer in 2024? I genuinely have no idea.
I will shorten the story, because it’s long. In 1988 I was an employee of the University of Waterloo, working on the New Oxford English Dictionary Project. The computers we were using typically had 16MB or so of memory (so the computer I’m typing this on has two thousand times as much) and the full text of the OED occupied 572MB. Thus, we cared really a lot about I/O performance. Since the project was shopping for disks and computers I bashed out Bonnie in a couple of afternoons.
I revised it lots over the years, and Russell Coker made an excellent fork called Bonnie++ that (for a while at least) was more popular than Bonnie. Then at some point I made my own major revision, called Bonnie-64.
In 1996, Linus Torvalds recommended Bonnie, calling it a “reasonable disk performance benchmark”.
That’s all I’m going to say here. If for some weird reason you want to know more, Bonnie’s quaint Nineties-flavor home and description pages are still there, plus this blog has documented Bonnie’s twisty history quite thoroughly. And explored, I claim, filesystem-performance issues in a useful way.
I will address a couple of questions here, though.
Many performance-sensitive applications go to a lot of trouble to avoid reading and/or writing filesystem data on their critical path. There are lots of ways to accomplish this, the most common being to stuff everything into memory using Redis or Memcached (those two dominate the market, near as I can tell). Another approach is to have the data in a file but access it with mmap rather than filesystem read/write logic. Finally, since real disk hardware reads and writes data in fixed-size blocks, you could arrange for your code to talk straight to the disk, entirely bypassing filesystems. I’ve never seen this done myself, but have heard tales of major commercial databases doing so.
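For the mmap flavor, here’s a minimal sketch; illustrative only, not lifted from any particular database, and with error handling abbreviated:

    /* Map a file into memory and read it as ordinary bytes; the kernel
       pages data in on demand, with no read() calls on the hot path. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat sb;
        if (fstat(fd, &sb) < 0) { perror("fstat"); return 1; }

        char *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touch every byte; each page fault is the kernel doing the I/O. */
        unsigned long sum = 0;
        for (off_t i = 0; i < sb.st_size; i++)
            sum += (unsigned char)data[i];
        printf("checksum %lu\n", sum);

        munmap(data, sb.st_size);
        close(fd);
        return 0;
    }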
I wonder if anyone has ever done a serious survey study of how the most popular high-performance data repositories, including Relational, NoSQL, object stores, and messaging systems, actually persist the bytes on disk when they have to?
I have an opinion, based on intuition and on having seen the non-public insides of several huge high-performance systems at previous employers: Yes, filesystem performance still matters. I’ve no way to prove or even publicly support that intuition. But my bet is that benchmarks like Bonnie are still relevant.
I bet a few of the kind of people who read this blog have intuitions of their own which, however, might be entirely different from mine. I’d like to hear them.
There is a wide range of hardware and software constructs which are accessed through filesystem semantics, and they have wildly different performance envelopes. If I didn’t have so many other hobbies and projects, it’d be fun to run Bonnie on a sample of EC2 instance types, with files on various EBS and EFS (and so on) configurations.
For the vast majority of CPU/storage operations in the cloud, there’s at least one network hop involved. Out there in the real world, there is still really a lot of NFS in production. None of these things are much like that little SSD slab in my laptop. Hmmm.
I researched whether some great-great-grandchild of Bonnie was the new hotness in filesystem benchmarking, adopting the methodology of typing “filesystem benchmark” into Web search. The results were disappointing; it doesn’t seem like this is a thing people do a lot. Which would suggest that people don’t care about filesystem performance that much? Which I don’t believe. Puzzling.
Whenever there was a list of benchmarks you might look at, Bonnie and Bonnie++ were on that list. Looks to me like IOZone gets the most ink and is thus probably the “industry-leading” benchmark. But I didn’t really turn up any examples of quality research comparing benchmarks in terms of how useful the results are.
The biggest problem in benchmarking filesystem I/O is that Linux tries really hard to avoid doing it, aggressively using any spare memory as a filesystem cache. This is why serving static Web traffic out of the filesystem often remains a good idea in 2024; your server will take care of caching the most heavily fetched data in RAM without you having to do cache management, which everyone knows is hard.
I have read of various cache-busting strategies and have never really been convinced that they’ll outsmart this aspect of Linux, which was written by people who are way smarter and know way more than I do. So Bonnie has always used a brute-force approach: Work on a test file which is much bigger than main memory, so Linux has to do at least some real I/O. Ideally you’d like it to be several times the memory size.
But this has a nasty downside. The computer I’m typing on has 32GB of memory, so I ran Bonnie with a 64G filesize (128G would have been better) and it took 35 minutes to finish. I really don’t see any way around this annoyance but I guess it’s not a fatal problem.
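A bit of arithmetic shows where the time goes: at the per-character rates in the table above, roughly 57 to 60 MB/sec, a single pass over a 64GB file takes around eighteen or nineteen minutes, so the two per-char phases pretty much are the 35 minutes; the multi-GB/sec block phases barely register.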
Oh, and those numbers: Some of them look remarkably big to me. But I’m an old guy with memories of how we had to move the bits back and forth individually back in the day, with electrically-grounded tweezers.
I can’t remember when this was, but some important organization was doing an evaluation of filesystems for inclusion in a big contract or standard or something, and so they benchmarked a bunch, including ReiserFS. Bonnie was one of the benchmarks.
Bonnie investigates the rate at which programs can seek around in a file by forking off three child processes that do a bunch of random seeks, read blocks, and occasionally dirty them and write them back. You can see how this could be stressful for filesystem code, and indeed, it occasionally made ReiserFS misbehave, which was noted by the organization doing the benchmarking.
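In outline, with details like the block size and the per-child seek count assumed rather than checked against the real source, the seek test looks something like this:

    /* Fork three children that hammer the same test file with random
       seeks; each child reads the block it lands on and occasionally
       dirties it and writes it back in place. */
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define BLOCK 8192

    static void seeker(int fd, long blocks, int seeks)
    {
        char buf[BLOCK];
        for (int i = 0; i < seeks; i++) {
            off_t where = (off_t)(random() % blocks) * BLOCK;
            pread(fd, buf, BLOCK, where);
            if (i % 10 == 0) {              /* occasionally rewrite the block */
                buf[0]++;                   /* dirty it */
                pwrite(fd, buf, BLOCK, where);
            }
        }
    }

    void run_seek_test(int fd, long blocks)
    {
        for (int k = 0; k < 3; k++)
            if (fork() == 0) { seeker(fd, blocks, 4000); _exit(0); }
        for (int k = 0; k < 3; k++)
            wait(NULL);             /* then divide total seeks by elapsed time */
    }

Three processes rewriting blocks of one open file concurrently; you can see where an argument about concurrent-write semantics might come from.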
Pretty soon I had email from Reiser claiming that what Bonnie was doing violated the filesystem API’s contract for concurrent write access. Maybe he was right? I can’t remember how the conversation went, but he annoyed me and in the end I don’t think I changed any code.
At one time Bonnie was on SourceForge, then Google Code, but I decided that if I were going to invest effort in writing this blog, it should be on GitHub too, so here it is. I even filed a couple of bugs against it.
I make no apologies for the rustic style of the code; it was another millennium and I was just a kid.
I cheerfully admit that I felt a warm glow checking in code originally authored 36 years ago.
2024-11-16 04:00:00
As a dangerous and evil man drives people away from Xitter, many stories are talking up Bluesky as the destination for the diaspora. This piece explains why I kind of like Bluesky but, for the moment, have no intention of moving my online social life away from the Fediverse.
(By “Fediverse” I mean the social network built around the ActivityPub protocol, which for most people means Mastodon.)
If we’re gonna judge social-network alternatives, here are three criteria that, for me, really matter: Technology, culture, and money.
I don’t think that’s controversial. But this is: Those are in increasing order of importance. At this point in time, I don’t think the technology matters at all, and money matters more than all the others put together. Here’s why.
Mastodon and the rest of the fediverse rely on ActivityPub implementations. Bluesky relies on the AT Protocol, of which so far there’s only one serious implementation.
Both of these protocols are good enough. We know this is true because both are actually working at scale, providing good and reliable experiences to large numbers of people. It’s reasonable to worry what happens when you get to billions of users and also about which is more expensive to operate. But speaking as someone who spent decades in software and saw it from the inside at Google and AWS, I say: meh. My profession knows how to make this shit work and work at scale. Neither alternative is going to fail, or to trounce its competition, because of technology.
I could write many paragraphs about the nice features and the problems of the competing platforms, and many people have. But it doesn’t matter that much because they’re both OK.
At the moment, Bluesky seems, generally speaking, to be more fun. The Fediverse is kind of lefty and geeky and queer. The unfortunate Mastodon culture of two years ago (“Ewww, you want us to have better tools and be more popular? Go away!”) seems to have mostly faded out. But the Fediverse doesn’t have much in the way of celebrities shitposting about the meme-du-jour. In fact it’s definitely celebrity-lite.
I enjoy both cultural flavors, but find Fedi quite a lot more conversational. There are others who find the opposite.
More important, I don’t think either culture is set in stone, or has lost the potential to grow in multiple new, interesting directions.
Here’s the thing. Whatever you think of capitalism, the evidence is overwhelming: Social networks with a single proprietor have trouble with long-term survival, and those that do survive have trouble with user-experience quality: see Enshittification.
The evidence is also perfectly clear that it doesn’t have to be this way. The original social network, email, is now into its sixth decade of vigorous life. It ain’t perfect but it is essential, and not in any serious danger.
The single crucial difference between email and all those other networks — maybe the only significant difference — is that nobody owns or controls it. If you have a deployment that can speak the languages of IMAP and SMTP and the many anti-spam tools, you are de facto part of the global email social network.
The definitive essay on this question is Mike Masnick’s Protocols, Not Platforms: A Technological Approach to Free Speech. (Mike is now on Bluesky’s Board of Directors.)
My bet for the future (and I think it’s the only one with a chance) is a global protocol-based conversation with many thousands of individual service providers, many of which aren’t profit-oriented businesses. One of them could be your local Buddhist temple, and another could be Facebook. The possibilities are endless: Universities, government departments, political parties, advocacy organizations, sports teams, and, yes, tech companies.
It’s obvious to me that the Fediverse has the potential to become just this. Because it’s most of the way there already.
Could Bluesky? Well, maybe. As far as I can tell, the underlying AT Protocol is non-proprietary and free for anyone to build on. Which means that it’s not impossible. But at the moment, the service and the app are developed and operated by “Bluesky Social, PBC”. In practice, if that company fails, the app and the network go away. Here’s a bit of Bluesky dialogue:
In practice, “Bsky corp” is not in immediate danger of hard times. Their team is much larger than Mastodon’s and on October 24th they announced they’d received $15M in funding, which should buy them at least a year.
But that isn’t entirely good news. The firm that led the investment is seriously sketchy, with strong MAGA and cryptocurrency connections.
The real problem, in my mind, isn’t in the nature of this particular Venture-Capital operation. Because the whole raison d’être of Venture Capital is to make money for the “Limited Partners” who provide the capital. Since VC investments are high-risk, most are expected to fail, and the ones that succeed have to exhibit exceptional revenue growth and profitability. Which is a direct path to the problems of survival and product quality that I mentioned above.
Having said that, the investment announcement is full of soothing words about focus on serving the user and denials that they’ll go down the corrupt and broken crypto road. I would like to believe that, but it’s really difficult.
To be clear, I’m a fan of the Bluesky leadership and engineering team. With the VC money as fuel, I expect their next 12 months or so to be golden, with lots of groovy features and mind-blowing growth. But that’s not what I’ll be watching.
I’ll be looking for ecosystem growth in directions that enable survival independent of the company. In the way that email is independent of any technology provider or network operator.
Just like Mastodon and the Fediverse already are.
Yes, in comparison to Bluesky, Mastodon has a smaller development team and slower growth and fewer celebrities and less buzz. It’s supported by Patreon donations and volunteer labor. And in the case of my own registered co-operative instance CoSocial.ca, membership dues of $50/year.
Think of the Fediverse not as just one organism, but a population of mammals, scurrying around the ankles of the bigger and richer alternatives. And when those alternatives enshittify or fall to earth, the Fediversians will still be there. That’s why it’s where my social-media energy is still going.
On the Fediverse you can follow a hashtag and I’m subscribed to #Bluesky, which means a whole lot of smart, passionate writing on the subject has been coming across my radar. If you’re interested enough to have read to the bottom of this piece, I bet one or more of these will reward an investment of your time:
Maybe Bluesky has “won”, by Gavin Anderegg, goes deep on the trade-offs around Bluesky’s AT Protocol and shares my concern about money.
Blue Sky Mine, by Rob Horning, ignores technology and wonders about the future of text-centric social media and is optimistic about Bluesky.
Does Bluesky have the juice?, by Max Read, is kind of cynical but says smart things about the wave of people currently landing on Bluesky.
The Great Migration to Bluesky Gives Me Hope for the Future of the Internet, by Jason Koebler over at 404 Media, is super-optimistic: “Bluesky feels more vibrant and more filled with real humans than any other social media network on the internet has felt in a very long time.” He also wonders out loud if Threads’ flirtation with Mastodon has been damaging. Hmm.
And finally there’s Cory Doctorow, probably the leading thinker about the existential conflict between capitalism and life online, with Bluesky and enshittification. This is the one to read if you’re thinking that I’m overthinking and over-worrying about a product that is actually pretty nice and currently doing pretty well. If you don’t know what a “Ulysses Pact” is, you should read up and learn about it. Strong stuff.
2024-11-15 04:00:00
They’re listening to us too much, and watching too. We’re not happy about it. The feeling is appropriate but we’ve been unclear about why we feel it.
[Note: This is adapted from a piece called Privacy Primer that I published on Medium in 2013. I did this mostly because Medium was new and shiny then and I wanted to try it out. But I’ve repeatedly wanted to refer to it and then when I looked, wanted to fix it up a little, so I’ve migrated it back to its natural home on the blog.]
This causes two problems: First, people worry that they’re being unreasonable or paranoid or something (they’re not). Second, we lack the right rhetoric (in the formal sense; language aimed at convincing others) for the occasions when we find ourselves talking to the unworried, or to law-enforcement officials, or to the public servants minding the legal framework that empowers the watchers.
The reason I’m writing this is to shoot holes in the “If you haven’t done anything wrong, don’t worry” story. Because it’s deeply broken and we need to refute it efficiently if we’re going to make any progress.
Living in a civilized country means you don’t have to poop in a ditch, you don’t have to fetch water from the well or firewood from the forest, and you don’t have to share details of your personal life. It is a huge gift of civilization that behind your front door you need not care what people think about how you dress, how you sleep, or how you cook. And that when communicating with friends and colleagues and loved ones, you need not care what anyone thinks unless you’ve invited them to the conversation.
Privacy doesn’t need any more justification. It’s a quality-of-life thing and needs no further defense. We and generations of ancestors have worked hard to build a civilized society and one of the rewards is that often, we can relax and just be our private selves. So we should resist anyone who wants to take that away.
The public servants and private surveillance-capitalists who are doing the watching are, at the end of the day, people. Mostly honorable and honest; but some proportion will always be crooked or insane or just bad people; no higher than in the general population, but never zero. I don’t think Canada, where I live, is worse than anywhere else, but we see a pretty steady flow of police brutality and corruption stories. And advertising is not a profession built around integrity. These are facts of life.
Given this, it’s unreasonable to give people the ability to spy on us without factoring in checks and balances to keep the rogues among them from wreaking havoc.
You might think that your communications are definitely not suspicious or sketchy, and in fact boring, and so why should you want privacy or take any effort to have it?
Because you’re forgetting about the people who do need privacy. If only the “suspicious” stuff is made private, then our adversaries will assume that anything that’s private must be suspicious. That endangers our basic civilizational privacy privilege and isn’t a place we want to be.
First, it’s OK to say “I don’t want to be watched”; no justification is necessary. Second, as a matter of civic hygiene, we need to be regulating our watchers, watching out for individual rogues and corrupt cultures.
So it’s OK to demand privacy by default; to fight back against those who would commandeer the Internet; and (especially) to use politics to empower the watchers’ watchers; make their political regulators at least as frightened of the voters as of the enemy.
That’s the reasonable point of view. It’s the surveillance-culture people who want to abridge your privacy who are being unreasonable.
2024-11-12 04:00:00
It’s probably part of your life too. What happened was, we moved to a new place and it had a room set up for a huge TV, so I was left with no choice but to get one. Which got me thinking about TV in general and naturally it spilled over here into the blog. There is good and bad news.
It’s hard. You visit Wirecutter and Consumer Reports and the model numbers they recommend often don’t quite match the listings at the bigbox Web site. Plus too many choices. Plus it’s deceiving because all the name-brand TVs these days have fabulous pictures.
Having bought a TV doesn’t make me an expert, but for what it’s worth we got a 77" Samsung S90C, which is Samsung’s second-best offering from 2023. Both WC and CR liked it last year and specifically called out that it works well in a bright room; ours is south-facing. And hey, it has quantum dots, so it must be good.
Actually I do have advice. There seems to be a pattern where last year’s TV is often a good buy, if you can find one. And you know where you can often find last year’s good product at a good price? Costco, that’s where, and that’s where we went. Glad we did, because when after a week’s use it frapped out, Costco answered the phone pretty quick and sent over a replacement.
But anyhow, the upside is that you’ll probably like whatever you get. TVs are just really good these days.
We were moving the gear around and I snapped a picture of all the video and audio pieces stacked up together. From behind.
Speaking as a guy who’s done standards: This picture is evidence of excellence. All those connections, and the signals they exchange, are 100% interoperable. All the signals are “line-level” (RCA or XLR wires and connectors), or video streams (HDMI), or to speakers (you should care about impedance and capacitance, but 12ga copper is good enough).
Put another way: No two of those boxes come from the same vendor, but when I wired them all up it Just Worked. First time, no compatibility issues. The software profession can only dream of this level of excellence.
Because of this, you can buy all the necessary connectors and cabling super-cheap from your favorite online vendor, but if you’re in Vancouver, go to Lee’s Electronics, where they have everything and intelligent humans will help you find it.
Out of the box the default settings yield eye-stabbing brilliance and contrast, entirely unrealistic, suitable (I guess?) for the bigbox shelves.
“So, adjust the picture,” you say. Cue bitter laughter. There are lots of dials to twist; too many really. And how do you know when you’ve got it right? Of course there are YouTubers with advice, but they don’t agree with each other and are short on quantitative data or color science, it’s mostly “This is how I do it and it looks great so you should too.”
What I want is the equivalent of the Datacolor “Spyder” color calibrators. And I wonder why such a thing couldn’t be a mobile app — phonecams are very high-quality these days and have nice low-level APIs. You’d plug your phone into the screen with a USB-C-to-HDMI adapter, it’d put patterns on the screen, and you’d point your phone at them, and it’d tell you how close you are to neutral.
It turns out there are objective standards and methods for measuring color performance; for example, see “Delta-E” in the Tom’s Hardware S90C review. But they don’t help consumers, even reasonably technical ones like me, fine-tune their own sets.
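For the record, the math isn’t the hard part; Delta-E in its simplest (CIE76) flavor is just Euclidean distance in the CIELAB color space:

    /* CIE76 Delta-E: straight-line distance between two colors in CIELAB.
       Commonly-quoted rule of thumb: below about 1.0 the difference is
       invisible, and calibrated displays aim for an average under 3. */
    #include <math.h>

    double delta_e_76(double L1, double a1, double b1,
                      double L2, double a2, double b2)
    {
        double dL = L1 - L2, da = a1 - a2, db = b1 - b2;
        return sqrt(dL * dL + da * da + db * db);
    }

The hard part for my imagined phone-based calibrator would be getting trustworthy CIELAB numbers out of the phonecam in the first place.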
Anyhow, most modern TVs have a “Filmmaker” or “Cinema” setting which is said to be the truest-to-life. So I pick that and then fine-tune it, subjectively. Measurements, who needs ’em?
Our TVs spy on us. I have repeatedly read that hardware prices are low because the profit is in mining and selling your watching habits. I’ve not read anything that has actual hard facts about who’s buying and how much they’re paying, but it feels so obvious that it’d be stupid not to believe it.
It’s hopeless to try and keep it from happening. If you’re watching a show on Netflix or a ballgame on MLB.tv, or anything on anything really, they’re gonna sell that fact, they’re up-front about it.
What really frosts my socks, though, is ACR, Automatic Content Recognition, where the TV sends hashed screenshots to home base so it (along with Netflix and MLB and so on) can sell your consumption habits to whoever.
Anyhow, here’s what we do. First, prevent the TV from connecting to the Internet, then play all the streaming services through a little Roku box. (With the exception of one sports streamer that only does Chromecast.) Roku lets you turn off ACR, and Chromecast promises not to. Imperfect but better than nothing.
That’s the problem, of course. It seems likely we’re in the declining tail-end of the Golden Age of TV. The streamers, having ripped the guts out of the cable-TV industry, are turning into Cable, the Next Generation. The price increases are relentless. I haven’t so far seen a general quality decline but I’ve read stories about cost-cutting all over the industry. Even, for example, at Apple, which is currently a quality offering.
And, of course, subscription fatigue. There are lots of shows that everyone agrees are excellent that we’ll never see because I just absolutely will not open my wallet on a monthly basis to yet another outgoing funnel. I keep thinking I should be able to pay to watch individual shows that I want to watch, no matter who’s streaming them. Seems that’s crazy talk.
We only watch episodic TV one evening or so a week, and only a couple episodes at a time, so we’re in no danger of running out of input. I imagine being (unlike us) a real video connoisseur must be an (expensive) pain in the ass these days.
But you already knew all that.
Well, yeah. A big honkin’ modern TV being driven by a quality 4K signal can be pretty great. We’re currently watching 3 Body Problem, which has occasional fabulous visuals and also good sound design. I’m pretty sure the data show that 4K, by any reasonable metric, offers enough resolution and color-space coverage for any screen that can fit in a reasonable home. (Sidebar: Why 8K failed.)
The best picture these days is the big-money streamer shows. But not only. On many evenings, I watch YouTube concert videos before I go to bed. The supply is effectively infinite. Some of them are shakycam productions filmed from row 54 (to be fair, some of those capture remarkably good sound). But others are quality 4K productions and I have to say that can be a pretty dazzling sensory experience.
Here are a couple of captures from a well-shot show on PJ Harvey’s current tour, which by the way is musically fabulous. No, I didn’t get it off the TV, I got it from a 4K monitor on my Mac, but I think it gives the feel.
In our previous place we had a big living room with the deranged-audiophile stereo in it, and the TV was in a little side-room we called the Video Cave. The new place has a media room with the big TV wall, so I integrated the systems and now the sound accompanying the picture goes through high-end amplification and (for the two front channels) speakers.
It makes more difference than I would have thought. If you want to improve your home-theatre experience, given that TV performance is plateauing, better speakers might be a good option.
I like live-sports TV. I acknowledge many readers will find this distasteful, for reasons I can’t really disagree with; not least is maybe encouraging brain-damaging behavior in young men. I can’t help it; decades ago I was a pretty good basketball player at university and a few of those games remain among my most intense memories.
I mean, I like drama. In particular I like unscripted drama, where neither you nor your TV hosts know how the show’s going to end. Which is to say, live sports.
I’ve griped about this before, but once again: The state of the sports-broadcasting art is shamefully behind what the hardware can do.
The quality is all over the map, but football, both fútbol and gridiron, is generally awful. I’ve read that the problem is the expense of the in-stadium broadcast infrastructure, routing all the fat 4K streams to the TV truck and turning them into a coherent broadcast.
In practice, what we’re getting is not even as good as a quality 1080p signal. It’s worth noting that Apple TV’s MLS and MLB broadcasts are noticeably better (the sound is a lot better).
It can only improve, right?
When I sit on the comfy video-facing furniture, I need to control Samsung, Parasound, Marantz, Roku, and Chromecast devices. We use a Logitech Harmony; I have another in reserve, both bought off eBay. Logitech has dropped the product but someone is still updating the database; or at least was through 2023, because it knows how to talk to that Samsung TV.
They work well enough that I don’t have to be there for other family members to watch a show. Once they wear out, I have no freaking idea what Plan B is. That’s OK, maybe I’ll die first. And because (as noted above) the audio side has superb interoperability, I can count on upgrading speakers and amplifiers and so on for as long as I last.
Yes, I guess, for TV hardware. As for the shows, who knows? Not my problem; I’m old enough and watch little enough that there’s plenty out there to fill the remainder of my life.
2024-10-30 03:00:00
I took a picture looking down a lane at sunset and liked the way it came out, so I prettied it up a bit in Lightroom to post on Mastodon. When I exported the JPG, I was suddenly in the world of C2PA, so here’s a report on progress and problems. This article is a bit on the geeky side but I think the most interesting bits concern policy issues. So if you’re interested in online truth and disinformation you might want to read on.
If you don’t know what “C2PA” is, I immodestly think my introduction is a decent place to start. Tl;dr: Verifiable provenance for online media files. If for some reason you think “That can’t possibly work”, please go read my intro.
Here’s the Lightroom photo-export dialog that got my attention:
There’s interesting stuff in that dialog. First, it’s “Early Access”, and I hope that means not fixed in stone, because there are issues (not just the obvious typo); I’ll get to them.
There’s a choice of where to put the C2PA data (if you want any): Right there in the image, in “Content Credentials Cloud” (let’s say CCC), or both. That CCC stuff is (weakly) explained here; scroll down to “How are Content Credentials stored and recovered?” I think storing the C2PA data in an online service rather than in the photo is an OK idea; it doesn’t weaken the verifiability story, although as a blogger I might be happier if it were stored here on the blog. This whole area is work in progress.
What surprised me on that Adobe CCC page was the suggestion that you might be able to recover the C2PA data about a picture from which it had been stripped. Obviously this could be a very bad thing if you’d stripped that data for a good reason.
I’m wondering what other fields you could search on in CCC… could you find pictures if you knew what camera they were shot with, on some particular date? Lots of complicated policy issues here.
Also there’s the matter of size: The raw JPG of the picture is 346K, which balloons to 582K with the C2PA. Which doesn’t bother me in the slightest, but if I were serving millions of pictures per day it would.
I maintain that the single most important thing about C2PA isn’t recording what camera or software was used, it’s identifying who the source of the picture is. Because, living online, your decisions on what to believe are going to rely heavily on who to believe. So what does Lightroom’s C2PA feature offer?
First, it asserts that the picture is by “Timothy Bray”; notice that that value is hardwired and I can’t change it. Second, that there’s a connected account at Instagram. In the C2PA, these assertions are signed with an Adobe-issued certificate, which is to say Adobe thinks you should believe them.
Let’s look at both. Adobe is willing to sign off on the author being “Timothy Bray”, but they know a lot more about me; my email, and that I’ve been a paying customer for years. Acknowledging my name is nice but it’d be really unsurprising if they have another Tim Bray among their millions of customers. And suppose my name was Jane Smith or some such.
It’d be well within Adobe’s powers to become an identity provider and give me a permanent ID like “https://id.adobe.com/timbray0351”, and include that in the C2PA. Which would be way more useful to establish provenance, but then Adobe Legal would want to take a very close look at what they’d be getting themselves into.
But maybe that’s OK, because it’s offering to include my “Connected” Instagram account, https://www.instagram.com/twbray. By “connected” they mean that Lightroom went through an OAuth dance with Meta and I had to authorize either giving Insta access to Adobe or Adobe to Insta, I forget which. Anyhow, that OAuth stuff works. Adobe really truly knows that I control that Insta ID and they can cheerfully sign off on that fact.
They also offered me the choice of Behance, Xitter, and LinkedIn.
I’ll be honest: This excites me. If I really want to establish confidence that this picture is from me, I can’t think of a better way than a verifiable link to a bunch of my online presences, saying “this is from that guy you also know as…” Obviously, I want them to add my blog and Mastodon and Bluesky and Google and Apple and my employer and my alma mater and my bank, and then let me choose, per picture, which (if any) of those I want to include in the C2PA. This is very powerful stuff on the provenance front.
Note that the C2PA doesn’t include anything about what kind of device I took the picture on (a Pixel), nor when I took it, but that’d be reasonably straightforward for Google’s camera app to include. I don’t think that information is as important as provenance but I can imagine applications where it’d be interesting.
The final choice in that export dialog is whether I want to disclose what I did in Lightroom: “Edits and Activity”. Once again, that’s not as interesting as the provenance, but it might be if we wanted to flag AI intervention. And there are already problems in how that data is used; more below.
Anyhow, here’s the picture; I don’t know if it pleases your eye but it does mine.
Now, that image just above has been through the ongoing publishing system, which doesn’t know about C2PA, but if you click and enlarge it, the version you get is straight outta Lightroom and retains the C2PA data.
If you want to be sure, install c2patool and apply it to lane.jpg. Too lazy? No problem, because here’s the JSON output (with the --detailed option).
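The invocation, assuming c2patool’s current command-line shape, is just:

    c2patool lane.jpg --detailed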
If you’re geeky at all and care about this stuff, you might want to poke around in there.
Another thing you might want to do is download lane.jpg and feed it to the Adobe Content Authenticity Inspect page. Here’s what you get:
This is obviously a service that’s early in its life and undoubtedly will get more polish. But still, interesting and useful.
In case it’s not obvious, I’m pretty bullish on C2PA and think it provides us useful weapons against online disinformation and to support trust frameworks. So, yay Adobe, congrats on an excellent start! But, things bother me:
[Update: There used to be a complaint about c2patool here, but its author got in touch with me and pointed out that when you run it and it doesn’t complain about validation problems, that means there weren’t any. Very UNIX. Oops.]
Adobe’s Inspector is also available as a Chrome extension. I’m assuming they’ll support more browsers going forward. Assuming a browser extension is actually useful, which isn’t obvious.
The Inspector’s description of what I did in Lightroom doesn’t correspond very well to what the C2PA data says. What I actually did, per the C2PA, was (look for “actions” in the JSON):
Opened an existing file named “PXL_20241013_020608588.jpg”.
Reduced the exposure (the slider value was -15).
Generated a (non-AI) mask, a linear gradient from the top of the picture down.
In the mask, moved the “Shadows” slider to -16.
Cropped and straightened the picture (the C2PA doesn’t say how much).
Changed the masking again; not sure why this is here because I didn’t do any more editing.
The Inspector output tries to express all this in vague nontechnical English, which loses a lot of information and in one case is just wrong: “Drawing edits: Used tools like pencils, brushes, erasers, or shape, path, or pen tools”. I think that in 2024, anyone who cares enough to look at this stuff knows about cropping and exposure adjustments and so on, they’re ubiquitous everywhere photos are shared.
If I generate C2PA data in an Adobe product, and if I’ve used any of their AI-based tools that either create or remove content, that absolutely should be recorded in the C2PA. Not as an optional extra.
I really, really want Adobe to build a flexible identity framework so you can link to identities via DNS records or .well-known files or OpenID Connect flows, so that I get to pick which identities are included with the C2PA. This, I think, would be huge.
This is not an Adobe problem, but it bothers me that I can’t upload this photo to any of my social-media accounts without losing the C2PA data. It would be a massive win if all the social-media platforms, when you uploaded a photo with C2PA data, preserved it and added more, saying who initially uploaded it. If you know anyone who writes social-media software, please tell them.
Once again, this is progress! Life online with media provenance will be better than the before times.
2024-10-29 03:00:00
The ads are everywhere; on bus shelters and in big-money live-sportscasts and Web interstitials. They say Apple’s products are great because Apple Intelligence and Google’s too because Google Gemini. I think what’s going on here is pretty obvious and a little sad. AI and GG are LLMM: Large Language Mobile Marketing!
It looks like this:
Here are nice factual Wikipedia rundowns on Apple Intelligence and Google Gemini.
The object of the game is to sell devices, and the premise seems to be that people will want to buy them because they’re excited about what AI and GG will do for them. When they arrive, that is, which I guess they’re just now starting to. I guess I’m a little more LLM-skeptical than your average geek, but I read the list of features and thought: Would this sort of thing shorten my mobile-device-upgrade cycle, which at the moment runs around three years? Um, no. Anyone’s? Still dubious.
Quite possibly I’m wrong. Maybe there’ll be a wave of influencers raving about how AI/GG improved their sex lives, income, and Buddha-nature, the masses will say “gotta get me some of that” and quarterly sales will soar past everyone’s stretch goals.
I think the LLMania among the investor/executive class led to a situation where massive engineering muscle was thrown at anything with genAI in its pitch and then, when it came time to ship, that genAI had to be the white-hot center of the launch marketing.
Because just at the moment, a whole lot of nontechnical people with decision-making power have decided that it’s lethally risky not to bet the farm on a technology they don’t understand. It’s not like it’s the first time it’s happened.
First, because the time has long gone when a new mobile-device feature changed everyone’s life. Everything about them is incrementally better every year. When yours wears out, there’ll be a bit of new-shiny feel about onboarding to your new one. But seriously, what proportion of people buy a new phone for any reason other than “the old one wore out”?
This is sad personally for me because I was privileged to be there, an infinitesimally small contributor during the first years of the mobile wave, when many new features felt miraculous. It was a fine time but it’s gone.
The other reason it’s sad is the remorseless logic of financialized capitalism; the revenue number must go up even when the audience isn’t, and major low-hanging unmet needs are increasingly hard to find.
So, the machine creates a new unmet need (for AI/GG) and plasters it on bus shelters and my TV screen. I wish they wouldn’t.