
Clawdbot Showed Me What the Future of Personal AI Assistants Looks Like

2026-01-21 00:45:51

Using Clawdbot via Telegram.

For the past week or so, I’ve been working with a digital assistant that knows my name, my morning routine preferences, and how I like to use Notion and Todoist, but that can also control Spotify, my Sonos speaker, my Philips Hue lights, and my Gmail. It runs on Anthropic’s Claude Opus 4.5 model, but I chat with it using Telegram. I called the assistant Navi (inspired by the fairy companion of Ocarina of Time, not the besieged alien race in James Cameron’s sci-fi film saga), and Navi can even receive audio messages from me and respond with audio messages of its own, generated with the latest ElevenLabs text-to-speech model. Oh, and did I mention that Navi can improve itself with new features and that it’s running on my own M4 Mac mini server?

If this intro just gave you whiplash, imagine my reaction when I first started playing around with Clawdbot, the incredible open-source project by Peter Steinberger (a name that should be familiar to longtime MacStories readers) that’s become very popular in certain AI communities over the past few weeks. I kept seeing Clawdbot being mentioned by people I follow; eventually, I gave in to peer pressure, followed the instructions provided by the funny crustacean mascot on the app’s website, installed Clawdbot on my new M4 Mac mini (which is not my main production machine), and connected it to Telegram.

To say that Clawdbot has fundamentally altered my perspective on what it means to have an intelligent, personal AI assistant in 2026 would be an understatement. I’ve been playing around with Clawdbot so much that I’ve burned through 180 million tokens on the Anthropic API (yikes), and I’ve had fewer and fewer conversations with the “regular” Claude and ChatGPT apps in the process. Don’t get me wrong: Clawdbot is a nerdy project, a tinkerer’s laboratory that is not poised to overtake the popularity of consumer LLMs any time soon. Still, Clawdbot points at a fascinating future for digital assistants, and it’s exactly the kind of bleeding-edge project that MacStories readers will appreciate.

Clawdbot can be overwhelming at first, so I’ll try my best to explain what it is and why it’s so exciting and fun to play around with. Clawdbot is, at a high level, two things:

  • An LLM-powered agent that runs on your computer and can use many of the popular models such as Claude, Gemini, etc.
  • A “gateway” that lets you talk to the agent using the messaging app of your choice, including iMessage, Telegram, WhatsApp, and others.
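To make the “gateway” idea concrete, here’s a minimal sketch of how a messaging bridge like this can work using Telegram’s Bot API long polling. This is purely illustrative – the environment variable, the `ask_llm` stub, and the overall shape are my assumptions, not Clawdbot’s actual code:

```python
import json
import os
import urllib.parse
import urllib.request

# Hypothetical setup; Clawdbot's real gateway is far more capable.
API = f"https://api.telegram.org/bot{os.environ.get('TELEGRAM_BOT_TOKEN', '')}"

def extract_text(update):
    """Pull (chat_id, text) out of a Telegram update, or None if absent."""
    msg = update.get("message", {})
    chat_id = msg.get("chat", {}).get("id")
    text = msg.get("text")
    return (chat_id, text) if chat_id and text else None

def ask_llm(prompt):
    # Stand-in for the actual model call (Claude, Gemini, etc.).
    return f"(agent reply to: {prompt})"

def poll_once(offset):
    """One long-poll cycle: fetch new updates and answer each text message."""
    with urllib.request.urlopen(f"{API}/getUpdates?offset={offset}&timeout=30") as r:
        updates = json.load(r)
    for update in updates.get("result", []):
        offset = update["update_id"] + 1
        parsed = extract_text(update)
        if parsed:
            chat_id, text = parsed
            body = urllib.parse.urlencode(
                {"chat_id": chat_id, "text": ask_llm(text)}).encode()
            urllib.request.urlopen(f"{API}/sendMessage", data=body)
    return offset
```

The appeal of this design is that the “client app” is one you already have installed; all the intelligence lives on the machine running the loop.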

The second aspect was immediately fascinating to me: instead of having to install yet another app, Clawdbot’s integration with multiple messaging services meant I could use it in an app I was already familiar with. Plus, having an assistant live in Messages or Telegram further contributes to the feeling that you’re messaging an actual assistant.

The “agent” part of Clawdbot is key, however. Clawdbot runs entirely on your computer, locally, and keeps its settings, preferences, user memories, and other instructions as literal folders and Markdown documents on your machine. Think of it as the equivalent of Obsidian: while there is a cloud service behind it (for Obsidian, it’s Sync; for Clawdbot, it’s the LLM provider you choose), everything else runs locally, on-device, and can be directly controlled and infinitely tweaked by you, either manually, or by asking Clawdbot to change a specific aspect of itself to suit your needs.

My local Clawdbot setup. It’s just folders and some Markdown files.

Which brings me to the most important – and powerful – trait of Clawdbot: because the agent is running on your computer, it has access to a shell and your filesystem. Given the right permissions, Clawdbot can execute Terminal commands, write scripts on the fly and execute them, install skills to gain new capabilities, and set up MCP servers to give itself new external integrations. Combine all this with a vibrant community that is contributing skills and plugins for Clawdbot, plus Steinberger’s own collection of command-line utilities, and you have yourself a recipe for a self-improving, steerable, and open personal agent that knows you, can access the web, runs on your local machine, and can do just about anything you can think of. All of this while communicating with it using text messages. It’s an AI nerd’s dream come true, and it’s a lot to wrap your head around at first.

To give you a sense of what’s possible: I asked Clawdbot to give itself support for generating images with Google’s Nano Banana Pro model. After it did that (Clawdbot even told me how to securely store my Gemini credentials in the native macOS Keychain), I asked Navi to give itself a profile picture that combined its original crustacean character with Navi from The Legend of Zelda. The result was a fairy crab featuring the popular “Hey, Listen!” phrase from the videogame, which it preemptively found on the web via a Google search:

Meet Navi, my favorite fairy crustacean.

I then went a step beyond: I asked Navi to assess the state of its features and use Nano Banana to create an infographic that described its structure. Since Clawdbot is running on my computer and its features are contained in folders, Clawdbot scanned its own /clawd directory in Finder, went to Nano Banana, and produced the following image:

It’s pure AI slop, but it also shows how Clawdbot is aware of its configuration and how it is structured in Finder.

As you can tell from the image, I’m barely scratching the surface of Clawdbot’s abilities here. “Memory files” are, effectively, daily notes formatted in Markdown that Clawdbot auto-generates each day to keep a plain text log of our interactions. Because this memory system is just Markdown, I could, if I wanted, plug it into Obsidian, search it with Raycast, or automate it with Hazel.

A daily memory note automatically created by Clawdbot.
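The mechanics of a plain-text memory system like this are simple enough to sketch in a few lines. The file layout, heading format, and function name below are my own assumptions for illustration, not Clawdbot’s actual implementation:

```python
import datetime
import pathlib

def append_memory(entry, memory_dir, when=None):
    """Append a timestamped bullet to today's Markdown memory note,
    creating the dated file (with a heading) on first write."""
    when = when or datetime.datetime.now()
    memory_dir = pathlib.Path(memory_dir)
    memory_dir.mkdir(parents=True, exist_ok=True)
    note = memory_dir / f"{when:%Y-%m-%d}.md"
    if not note.exists():
        note.write_text(f"# Memory - {when:%Y-%m-%d}\n\n")
    with note.open("a") as f:
        f.write(f"- {when:%H:%M} {entry}\n")
    return note
```

Because each day is an ordinary dated `.md` file, any tool that watches a folder of Markdown – Obsidian, Raycast, Hazel – can consume it with no special integration.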

The integrations are, by far, the most fun I’ve had with an LLM in recent years. In keeping with the “you can just do things” philosophy we’ve discussed on AppStories lately, if you want Clawdbot to give itself a piece of functionality that it doesn’t have by default, you can just ask it to do so, and it’ll do it for you. Case in point: a while back I shared a shortcut for Club MacStories members that quickly transcribes audio messages using the Whisper model hosted on Groq. I grabbed a link to the article, gave it to Clawd, and told it that I wanted it to give itself support for transcribing Telegram’s audio messages with that system. Two minutes later, it created a skill that adapted my shortcut for Clawd running on my Mac mini.

A skill created by Clawd.
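For reference, the underlying call a transcription skill like this makes can be sketched as follows. Groq exposes an OpenAI-compatible `audio/transcriptions` endpoint for hosted Whisper; the multipart helper, function names, and exact request shape here are my assumptions, written with the standard library only, and not the skill Clawd actually generated:

```python
import io
import json
import os
import urllib.request
import uuid

GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

def build_multipart(fields, file_field, filename, data):
    """Encode a multipart/form-data body with the standard library only."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f'--{boundary}\r\nContent-Disposition: form-data; '
                  f'name="{name}"\r\n\r\n{value}\r\n'.encode())
    buf.write(f'--{boundary}\r\nContent-Disposition: form-data; '
              f'name="{file_field}"; filename="{filename}"\r\n'
              f'Content-Type: application/octet-stream\r\n\r\n'.encode())
    buf.write(data)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return buf.getvalue(), boundary

def transcribe(path):
    """Send an audio file to Groq's hosted Whisper and return the text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, boundary = build_multipart(
        {"model": "whisper-large-v3"}, "file", os.path.basename(path), audio)
    req = urllib.request.Request(
        GROQ_URL, data=body,
        headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)["text"]
```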

Then, I went a step beyond: like any good assistant, I wanted to make sure that if I issued a request with voice, Navi would also respond with voice, and if I sent a written request, Navi would reply with text. At that point, Clawdbot went off to do some research, found the ElevenLabs documentation for their new TTS model, asked me for ElevenLabs credentials, and created three test voices with different personalities for me to choose from. I chose one, fine-tuned it a little bit, and a few minutes later, Navi had a “voice” to use for future audio replies. Now, when I want to ask my assistant something but I’m busy doing something else and can’t type, I just send it a brain dump as an audio message on Telegram and, a few seconds later, I have a reply ready for me to listen to.

Every morning, Clawdbot sends me a daily report based on my calendar, Notion, Todoist, and more. It also sends me an audio version of the report that features silly artwork created by Nano Banana each day.

Being able to dictate messages in either Italian or English – or a mix of both! – for my assistant running in Telegram has been amazing – especially when you consider how the iPhone’s own Siri is still not multilingual today, let alone capable of understanding user context or performing long-running tasks in the background.

Still not impressed? How about this:

Last night, I wondered if I could replace some automations I had configured years ago on Zapier with equivalent actions running on my Mac mini via Clawd, to save some extra money each month. One of them, for instance, was a “zap” that created a project for the next issue of MacStories Weekly in my Todoist soon after we send the newsletter each Friday. It does so by checking an RSS feed, adding 1 to the issue number, and creating a new project via the Todoist API. I asked Clawd if it was possible to replicate it and, sure enough, it outlined a plan: we could set up a cron job on the Mac mini, check the RSS feed every few hours, and create a new project whenever a new issue appears in the feed. Five minutes of back and forth later, Clawd created everything on my Mac, with no cloud dependency, no subscription required – just the task I asked for, pieced together by an LLM with existing shell tools and Internet access. It makes me wonder how many automation layers and services I could replace by giving Clawd some prompts and shell access.

Another skill created by Clawd, which is triggered by an automation.
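The feed-to-Todoist automation described above can be sketched roughly like this. The feed URL and state file are placeholders, and the helper names are mine; the Todoist REST v2 `projects` endpoint is real to the best of my knowledge, but treat the whole thing as a sketch of the approach, not the script Clawd wrote:

```python
import json
import os
import re
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.rss"           # placeholder feed URL
STATE_FILE = os.path.expanduser("~/.weekly-issue")  # hypothetical state file

def latest_issue(rss_xml):
    """Return the highest issue number found in the feed's <item> titles."""
    root = ET.fromstring(rss_xml)
    numbers = []
    for item in root.iter("item"):
        match = re.search(r"(\d+)", item.findtext("title") or "")
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else None

def create_project(name):
    """Create a Todoist project via the REST API."""
    req = urllib.request.Request(
        "https://api.todoist.com/rest/v2/projects",
        data=json.dumps({"name": name}).encode(),
        headers={"Authorization": f"Bearer {os.environ['TODOIST_API_TOKEN']}",
                 "Content-Type": "application/json"})
    urllib.request.urlopen(req)

def run():
    """Check the feed; when a new issue appears, create next week's project."""
    with urllib.request.urlopen(FEED_URL) as r:
        issue = latest_issue(r.read().decode())
    seen = int(open(STATE_FILE).read()) if os.path.exists(STATE_FILE) else 0
    if issue and issue > seen:
        create_project(f"MacStories Weekly {issue + 1}")
        with open(STATE_FILE, "w") as f:
            f.write(str(issue))
```

A crontab entry along the lines of `0 */4 * * * /usr/bin/python3 ~/scripts/weekly.py` (path hypothetical) would run the check every four hours, which matches the “every few hours” cadence of the plan.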

All of this is exhilarating and scary at the same time. More so than using the latest flavors of Claude or ChatGPT, using Clawdbot and the process of continuously shaping it for my needs and preferences has been the closest I’ve felt to a higher degree of digital intelligence in a while. I understand now why Fidji Simo, CEO of Applications at OpenAI, wrote that AI labs should do much more to leverage the capabilities of models (address the “capability overhang”) to build personal super-assistants. When I’m using ChatGPT or Claude, the models are constrained by the features that their developers give them and we, the users, can’t do much to tweak the experience. Conversely, Clawdbot is the ultimate expression of a new generation of malleable software that is personalized and adaptive: I get to choose what Clawdbot should be capable of, and I can always inspect what’s going on behind the scenes and, if I don’t like it, ask for changes. Being able to make my computer do things – anything – by just talking to an agent running inside it is incredibly fun, addictive, and educational: I’ve learned more about SSH, cron, web APIs, and Tailscale in the past week than I ever did in almost two decades of tinkering with computers.

Working with Clawdbot on my MacBook Pro. The assistant has access to my Notion and can even control Spotify playback via a Spotify integration.

Clawdbot also serves as a shining example of what happens when you give modern agents (with the right harness) access to a computer: they can just build things and become smarter for specific users (but not more intelligent in general) via quasi-recursive improvement. It’s no wonder that all AI companies have noticed, and every major feature launch these days is about a virtual filesystem sandbox or CLI access.

As I argued on AppStories, I believe that the repercussions of all this will soon ripple through the various app stores, and we’ll need to have serious conversations about the role of app developers going forward. Clawdbot is a boutique, nerdy project right now, but consider it a sign of an underlying trend: when the major consumer LLMs become smart and intuitive enough to adapt to you on-demand for any given functionality – when you can eventually ask Claude or ChatGPT to do or create anything on your computer with no Terminal UI – what will become of “apps” created by professional developers? I especially worry about standalone utility apps: if Clawdbot can create a virtual remote for my LG television (something I did) or give me a personalized morning report with voice (another cron job I set up), each working exactly the way I want, why should I even bother going to the App Store to look for pre-built solutions made by someone else? What happens to Shortcuts when any “automation” I may want to carefully create is actually just a text message to a digital assistant away?

I don’t know the answers to these questions right now, but we’re going to try and unpack all of them on AppStories and MacStories this year.

For now, I’ll stop here: Clawdbot is an outstanding project that I can’t recommend tinkering with enough if you find the idea even remotely interesting. Clawdbot has shown me that we’ve barely begun to tap into the potential of LLMs as personal assistants. There’s no going back after wielding this kind of superpower.

How to Enable Smoother 120Hz Scrolling in Safari

2026-01-21 00:15:47

I came across this incredible tip by Matt Birchler a few weeks ago and forgot to link it on MacStories:

Today I learned something amazing: Safari can support higher than 60Hz refresh rates, but it doesn’t by default. It’s the only mainstream web browser that doesn’t, and I have never understood why, but apparently as of the end of 2025 in Safari version 26.3 (and maybe earlier) you can enable it. Here’s how to do it.

I won’t paste the steps here, so you’ll have to click through and visit Matt’s website (I keep recommending his work, and he’s been doing some really interesting things with “micro apps” lately). I can’t believe this feature is disabled by default on iOS and iPadOS; I turned it on several days ago, and it has made browsing with Safari significantly nicer.

Also new to me: I discovered this outstandingly weird website that lets you test your browser’s refresh and frame rates. Just trust me and click through that as well – what a great way to show people who “don’t see” refresh rates what they actually feel like in practice.

Podcast Rewind: Holiday Break Projects, CES Gaming Announcements, MagSafe Accessories, and Clawdbot

2026-01-17 05:37:40

Enjoy the latest episodes from MacStories’ family of podcasts:

AppStories

This week, Federico and John are back from their holiday break, which included so many hardware and automation projects that this is part one of a two-part episode, covering Federico’s networked music automation setup and John’s new research tool.

On AppStories+, Federico shares his foldable phone experiments.

NPC: Next Portable Console

This week, a whirlwind tour of the handheld news from CES 2026, Switch 2 grips, AR glasses, new chips from AMD, the OneXSugar Wallet, and more.

On NPC XL, John got his MCON controller, Federico’s still waiting for his, and Brendon checks in after more than a month with his.

Comfort Zone

Niléane battles the modern era of smart mice, Chris has a new way of keeping up with everything everywhere all at once, and the whole gang battles it out with the best dang MagSafe accessories you’ve ever seen.

On this week’s Cozy Zone, we roast listeners’ home screens again! This one has, bar none, the weirdest home screen we’ve ever seen, and we might be cursed now.

MacStories Unwind

This week, Federico unpacks Clawdbot, a Claude-based personal assistant, and recommends an Apple TV movie, while John revisits an old favorite on the Nintendo Switch.


AppStories, Episode 467, ‘A Very Nerdy Holiday Break’ Show Notes

This episode is sponsored by:

Presenting Our Nerdy Holiday Projects

AppStories+ Post-Show

Subscribe to AppStories+

Visit AppStories.net to learn more about the extended, high bitrate audio version of AppStories that is delivered early each week and subscribe.


NPC, Episode 63, ‘CES 2026: Foldables, Rollables, and Robot Punches’ Show Notes

The Portable Gaming News from CES

NPC XL

Subscribe to NPC XL

NPC XL is a weekly members-only version of NPC with extra content, available exclusively through our new Patreon for $5/month. Each week on NPC XL, Federico, Brendon, and John record a special segment or deep dive about a particular topic that is released alongside the “regular” NPC episodes. You can subscribe here.


Comfort Zone, Episode 83, ‘Are You Predicting Doom?’ Show Notes

Main Topics

Other Things We Talked About

Cozy Zone

For even more from the Comfort Zone crew, you can subscribe to Cozy Zone. Cozy Zone is a weekly bonus episode of Comfort Zone where Matt, Niléane, and Chris invite listeners to join them in the Cozy Zone where they’ll cover extra topics, invent wilder challenges and games, and share all their great (and not so great) takes on tech. You can subscribe to Cozy Zone for $5 per month here or $50 per year here.


MacStories Unwind, ‘Clawdbot, Animal Crossing, and Mark Wahlberg’ Show Notes

Clawdbot

Picks

Unwind Deal

  • Licorice Pizza: Paul Thomas Anderson’s critically acclaimed coming-of-age tale follows Alana Kane and Gary Valentine as they navigate love and life in 1973 San Fernando Valley. This Academy Award-nominated film features breakout performances and Anderson’s signature nostalgic storytelling style.

MacStories Unwind+

We deliver MacStories Unwind+ to Club MacStories subscribers ad-free with high bitrate audio every week. To learn more about the benefits of a Club MacStories subscription, visit our Plans page.


MacStories launched its first podcast in 2017 with AppStories. Since then, the lineup has expanded into a family of weekly shows that also includes MacStories Unwind, Magic Rays of Light, Comfort Zone, NPC: Next Portable Console, and First, Last, Everything, which collectively cover a broad range of the modern media world – from Apple’s streaming service and videogame hardware to apps – for a growing audience that appreciates our thoughtful, in-depth approach to media.

If you’re interested in advertising on our shows, you can learn more here or by contacting our Managing Editor, John Voorhees.

LLMs Have Made Simple Software Trivial

2026-01-14 01:11:25

I enjoyed this thought-provoking piece by (award-winning developer) Matt Birchler, writing for Birchtree on how he’s been making so-called “micro apps” with AI coding agents:

I was out for a run today and I had an idea for an app. I busted out my own app, Quick Notes, and dictated what I wanted this app to do in detail. When I got home, I created a new project in Xcode, I committed it to GitHub, and then I gave Claude Code on the web those dictated notes and asked it to build that app.

About two minutes later, it was done…and it had a build error.

And:

As a simple example, it’s possible the app that I thought of could already be achieved in some piece of software someone’s released on the App Store. Truth be told, I didn’t even look, I just knew exactly what I wanted, and I made it happen. This is a quite niche thing to do in 2026, but what if Apple builds something that replicates this workflow and ships it on the iPhone in a couple of years? What if instead of going to the App Store, they tell you to just ask Siri to make you the app that you need?

John and I are going to discuss this on the next episode of AppStories about the second part of the experiments we did over our holiday break. As I’ll mention in the episode, I ended up building 12 web apps for things I have to do every day, such as appending text to Notion just how I like it or controlling my TV and Hue sync box. I didn’t even think to search the App Store to see if new utilities existed: I “built” (or, rather, steered the building of) my own progressive web apps, and I’m using them every day. As Matt argues, this is a very niche thing to do right now, which requires a terminal, lots of scaffolding around each project, and deeper technical knowledge than the average person who would just prompt “make me a beautiful todo app.” But the direction seems clear, and the timeline is accelerating.

I also can’t help but remember this old rumor from 2023 about Apple exploring the idea of letting users rely on Siri to create apps on the fly for the then-unreleased Vision Pro. If only the guy in charge of the Vision Pro went anywhere and Apple got their hands on a pretty good model for vibe-coding, right?

Apple Unveils Apple Creator Studio App Suite

2026-01-14 00:33:14

Source: Apple.

Today, Apple announced Apple Creator Studio, a suite of creativity apps for the Mac and iPad combined with premium content and features for productivity apps across the company’s platforms. This collection of apps, which includes the debut of Pixelmator Pro for iPad, offers tools for creative professionals, aspiring artists, students, and others working across a wide variety of fields, including music, video, and graphic design.

The bundle includes a number of apps:

  • Final Cut Pro for Mac and iPad (video editing)
  • Logic Pro for Mac and iPad (music creation)
  • Pixelmator Pro for Mac and iPad (photo editing and graphic design)
  • Motion for Mac (video effects)
  • Compressor for Mac (video encoding)
  • MainStage for Mac (music performance)

It also features a new Content Hub with premium graphics and photos for Apple’s iWork suite – Pages for word processing, Keynote for presentations, and Numbers for spreadsheets – as well as exclusive templates, themes, and AI features. The company says these features will also come to its Freeform canvas app soon.

Apple Creator Studio will be available on Wednesday, January 28, for $12.99/month or $129/year with a one-month free trial. Students and teachers can subscribe at a discounted rate of $2.99/month or $29.99/year, and three months of Apple Creator Studio will come free with the purchase of a new Mac or iPad. The subscription also includes Family Sharing, allowing users to share the apps and features with up to five family members.

With Apple Creator Studio, the company is combining several disparate offerings for creatives into a single package that looks quite compelling. Because many of these apps are also available individually – some of them for free – there are a lot of details to get into regarding what’s new, what’s included, and what’s available elsewhere. Let’s get into it.

Beat Detection in Final Cut Pro. Source: Apple.

Final Cut Pro will continue to be available as a one-time $299.99 purchase on the Mac. Whether you purchase it that way or subscribe to access the app on both Mac and iPad, you’ll get a variety of new features, including Transcript Search, which finds moments in footage based on dialogue, and Visual Search. Beat Detection will help video editors make cuts that match the rhythm of music playing under a video. And Montage Maker on the iPad is a new AI feature that automatically pulls together the best moments of a user’s footage into a montage, with options to adjust the pacing, edit the video to match a music track, and reframe from a horizontal aspect ratio to vertical for sharing on social media.

Mac-exclusive video tools Motion and Compressor are included in the bundle but also remain available to purchase separately for $49.99 each.

The Sound Library in Logic Pro. Source: Apple.

Logic Pro will similarly be available to purchase on the Mac for $199.99, but both the Mac and iPad versions will be included in Creator Studio. New features include the addition of a Synth Player to the app’s collection of AI Session Players; the player can be directed based on the sound the user is going for, and it can even access third-party Audio Units and play hardware synthesizers. Chord ID turns recorded music into readable chord progressions. The Mac version of Logic Pro gets a new Sound Library, while the iPad version gains natural language search for the Sound Browser via Music Understanding as well as the Quick Swipe Comping feature previously exclusive to the Mac.

MainStage for the Mac is available as part of Creator Studio or as a separate one-time $29.99 purchase.

Pixelmator Pro for iPad. Source: Apple.

Pixelmator Pro comes to the iPad for the first time with a UI optimized for touch and the Apple Pencil as well as file compatibility with its Mac counterpart. Familiar image editing tools like smart selection, Super Resolution, Deband, and Auto Crop are available in the iPad version, and the Apple Pencil integration is optimized for painting digital art. A new Warp tool is available as a Creator Studio exclusive in both the Mac and iPad versions of Pixelmator Pro.

The new Content Hub. Source: Apple.

Pages, Keynote, and Numbers will remain free and receive a Liquid Glass update, but Creator Studio subscribers gain access to new tools within Apple’s productivity apps. The Content Hub is a collection of images, graphics, and illustrations available for subscribers to include in their documents and presentations. Subscribers also have access to exclusive themes and templates, as well as experimental features. In Keynote, subscribers can generate presentations from an outline, create speaker notes based on slide content, and automatically clean up the layout of their slides. And Numbers includes a new Magic Fill feature to generate formulas and fill in tables automatically.


Apple has long offered powerful apps for creatives, but they’ve never been put together in a single package in this way before. For those who rely on these tools for their everyday work, it’s an exciting proposition to get access to everything, including exclusive and experimental features, for a single price. At the same time, it will be interesting to see how these changes are communicated in practice to those who aren’t subscribed. For example, how prominent will the Content Hub be for the many, many free users of the iWork apps?

We’ll find out exactly how it all works on January 28. In the meantime, I’m encouraged to see all the progress being made on Apple’s creative tools, especially on the iPad, and I look forward to giving Apple Creator Studio a try.

How I Used Claude to Build a Transcription Bot that Learns From Its Mistakes

2026-01-13 04:26:59

Step 1: Transcribe with parakeet-mlx.

[Update: Due to the way parakeet-mlx handles transcript timeline synchronization, which can result in caption timing issues, this workflow has been reverted to use the Apple Speech framework. Otherwise, the workflow remains the same as described below.]

When I started transcribing AppStories and MacStories Unwind three years ago, I had wanted to do so for years, but the tools at the time were either too inaccurate or too expensive. Things turned a corner with OpenAI’s Whisper, an open-source speech-to-text model that blew away other readily available options.

Still, the results weren’t good enough to publish those transcripts anywhere. Instead, I kept them as text-searchable archives to make it easier to find and link to old episodes.

Since then, a cottage industry of apps has arisen around Whisper transcription. Some of those tools do a very good job with what is now an aging model, but I have never been satisfied with their accuracy or speed. However, when we began publishing our podcasts as videos, I knew it was finally time to start generating transcripts because as inaccurate as Whisper is, YouTube’s automatically generated transcripts are far worse.

VidCap in action.

My first stab at video transcription was to use apps like VidCap and MacWhisper. After a transcript was generated, I’d run it through MassReplaceIt, a Mac app that lets you create and apply a huge dictionary of spelling corrections using a bulk find-and-replace operation. As I found errors in AI transcriptions by manually skimming them, I’d add those corrections to my dictionary. As a result, the transcriptions improved over time, but it was a cumbersome process that relied on me spotting errors, and I didn’t have time to do more than scan through each transcript quickly.

That’s why I was so enthusiastic about the speech APIs that Apple introduced last year at WWDC. The accuracy wasn’t any better than Whisper, and in some circumstances it was worse, but it was fast, which I appreciate given the many steps needed to get a YouTube video published.

The process was sped up considerably when Claude Skills were released. A skill can combine a script with instructions to create a hybrid automation with both the deterministic outcome of scripting and the fuzzy analysis of LLMs.

Transcribing with yap.

I’d run yap, a command line tool that transcribes videos with Apple’s speech-to-text framework. Next, I’d open the Claude app, attach the resulting transcript, and run a skill that executed the script, replacing known spelling errors. Then, Claude would analyze the text against its knowledge base, looking for other likely misspellings. When it found one, Claude would reply with some textual context, asking if the proposed change should be made. After I responded, Claude would further improve my transcript, and I’d tell Claude which of its suggestions to add to the script’s dictionary, helping improve the results a little each time I used the skill.

Over the holidays, I refined my skill further and moved it from the Claude app to the Terminal. The first change was to move to parakeet-mlx, an Apple silicon-optimized version of NVIDIA’s Parakeet model that was released last summer. Parakeet isn’t as fast as Apple’s speech APIs, but it’s more accurate, and crucially, its mistakes are closer to the right answers phonetically than the ones made by Apple’s tools. Consequently, Claude is more likely to find mistakes that aren’t in my dictionary of misspellings in its final review.

Managing the built-in corrections dictionary.
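The deterministic half of this workflow – applying a dictionary of known corrections before the LLM’s fuzzy pass – can be sketched in a few lines. The JSON storage format, the “CLEANED TRANSCRIPT” naming, and the function names are my assumptions for illustration, not the author’s actual script:

```python
import json
import pathlib
import re

def load_corrections(path):
    """Load a corrections dictionary stored as JSON: {"wrong": "right", ...}."""
    return json.loads(pathlib.Path(path).read_text())

def clean_transcript(text, corrections):
    """Apply whole-word, case-insensitive replacements from the dictionary."""
    for wrong, right in corrections.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text,
                      flags=re.IGNORECASE)
    return text

def write_cleaned(src, corrections):
    """Write the cleaned transcript next to the original, with a name prefix."""
    src = pathlib.Path(src)
    out = src.with_name(f"CLEANED TRANSCRIPT {src.name}")
    out.write_text(clean_transcript(src.read_text(), corrections))
    return out
```

The point of keeping this step deterministic is that every correction confirmed once is applied forever; the LLM only has to catch what the dictionary doesn’t yet know.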

With Claude Opus 4.5’s assistance, I rebuilt the Python script at the heart of my Claude skill to run videos through parakeet-mlx, saving the results as either a .srt or .txt file (or both) in the same location as the original file but prepended with “CLEANED TRANSCRIPT.” Because Claude Code can run scripts and access local files from Terminal, the transition to the final fuzzy pass for errors is seamless. Claude asks permission to access the cleaned transcript file that the script creates and then generates a report with suggested changes.

A list of obscure words Claude suggested changing. Every one was correct.

The last step is for me to confirm which suggested changes should be made and which should be added to the dictionary of corrections. The whole process takes just a couple of minutes, and it’s worth the effort. For the last episode of AppStories, the script found and corrected 27 errors, many of which were misspellings of our names, our podcasts, and MacStories. The final pass by Claude managed to catch seven more issues, including everything from a misspelling of the band name Deftones to Susvara, a model of headphones, and Bazzite, an open-source SteamOS project. Those are far from everyday words, but now, their misspellings are not only fixed in the latest episode of AppStories, they’re in the dictionary where those words will always be corrected whether Claude’s analysis catches them or not.

Claude even figured out “goti” was a reference to GOTY (Game of the Year).

I’ve used this same pattern over and over again. I have Claude build me a reliable, deterministic script that helps me work more efficiently; then, I layer in a bit of generative analysis to improve the script in ways that would be impossible or incredibly complex to code deterministically. Here, that generative “extra” looks for spelling errors. Elsewhere, I use it to do things like rank items in a database based on a natural language prompt. It’s an additional pass that elevates the performance of the workflow beyond what was possible when I was using a find-and-replace app and later a simple dictionary check that I manually added items to. The idea behind my transcription cleanup workflow has been the same since the beginning, but boy, have the tools improved the results since I first used Whisper three years ago.