
MEXC Records $175M Net Inflows in February, Ranking 4th Among Global CEXs

2026-03-19 17:30:05

Victoria, Seychelles, March 10, 2026

According to data from DeFiLlama, global crypto trading platform MEXC recorded $175 million in net capital inflows in February 2026, ranking fourth among major centralized exchanges worldwide. The platform trailed only Binance ($1.92 billion), Deribit ($306 million), and Bitget ($206 million) during the same period.

Throughout February, Bitcoin price momentum remained relatively weak as macroeconomic uncertainty continued to weigh on overall risk appetite. Market sentiment turned increasingly cautious, and several exchanges recorded net outflows during the same period, reflecting a broader trend of capital divergence and portfolio reallocation amid choppy market conditions. Against this backdrop, MEXC maintained positive net inflows, highlighting its sustained ability to attract capital in a complex market environment.

MEXC's capital resilience is closely linked to its product strategy tailored to the current market cycle. The platform has structured its ecosystem around both defensive hedging tools and yield-generating asset management solutions, helping users significantly improve capital efficiency.

As safe-haven demand has strengthened in recent weeks, MEXC has continued to expand its precious metals derivatives offering, including GOLD(XAUT)USDT and SILVER(XAG)USDT perpetual futures. The platform has further increased leverage limits, deepened liquidity, and maintained its zero-fee trading policy, providing traders with more flexible risk-management tools. Combined with MEXC's 24/7 trading environment, these features allow investors to respond quickly to global developments and capture market opportunities as they emerge.

At the same time, when market direction remains unclear, capital tends to become increasingly sensitive to yield opportunities. MEXC Earn has been designed around flexible capital allocation and high-APR offerings, delivering multi-tier yield solutions across different asset categories and enabling investors to keep funds productive while awaiting clearer market signals.

  • Flexible Stablecoin Savings (USDT): up to 15% APR, with new users eligible for promotional rates of up to 600% APR during the first two days;
  • Precious Metals Fixed Savings (XAUT / SLVON): new users can receive up to 400% APR during the first three days, while all users can enjoy up to 10% APR during the first seven days;
  • Futures Earn: margin assets are integrated into a yield-generation mechanism with tiered APR subsidies based on position value, allowing eligible USDT and USDC balances to earn up to 20% APR while positions remain open.

In an environment of rising market volatility, liquidity depth, robust risk management, and cross-asset yield solutions are becoming increasingly important factors for users when choosing long-term trading platforms. MEXC will continue to enhance trading infrastructure and expand its product ecosystem to help global users navigate market cycles, manage risk more effectively, and capture emerging opportunities.

About MEXC

Founded in 2018, MEXC is committed to being "Your Easiest Way to Crypto." Serving over 40 million users across 170+ countries, MEXC is known for its broad selection of trending tokens, everyday airdrop opportunities, and low trading fees. Our user-friendly platform is designed to support both new traders and experienced investors, offering secure and efficient access to digital assets. MEXC prioritizes simplicity and innovation, making crypto trading more accessible and rewarding.

MEXC Official Website | X | Telegram | How to Sign Up on MEXC

For media inquiries, please contact MEXC PR team: [email protected]

Source

Ordinary Mornings Turn Deadly

2026-03-19 17:00:08

:::info Astounding Stories of Super-Science October 2022, by Astounding Stories is part of HackerNoon’s Book Blog Post series. You can jump to any chapter in this book here. THE MURDER OF ROGER ACKROYD - DR. SHEPPARD AT THE BREAKFAST TABLE

Astounding Stories of Super-Science October 2022: THE MURDER OF ROGER ACKROYD - DR. SHEPPARD AT THE BREAKFAST TABLE

By Agatha Christie

:::

Mrs. Ferrars died on the night of the 16th–17th September—a Thursday. I was sent for at eight o’clock on the morning of Friday the 17th. There was nothing to be done. She had been dead some hours.

It was just a few minutes after nine when I reached home once more. I opened the front door with my latch-key, and purposely delayed a few moments in the hall, hanging up my hat and the light overcoat that I had deemed a wise precaution against the chill of an early autumn morning. To tell the truth, I was considerably upset and worried. I am not going to pretend that at that moment I foresaw the events of the next few weeks. I emphatically did not do so. But my instinct told me that there were stirring times ahead.

From the dining-room on my left there came the rattle of tea-cups and the short, dry cough of my sister Caroline.

“Is that you, James?” she called.

An unnecessary question, since who else could it be? To tell the truth, it was precisely my sister Caroline who was the cause of my few minutes’ delay. The motto of the mongoose family, so Mr. Kipling tells us, is: “Go and find out.” If Caroline ever adopts a crest, I should certainly suggest a mongoose rampant. One might omit the first part of the motto. Caroline can do any amount of finding out by sitting placidly at home. I don’t know how she manages it, but there it is. I suspect that the servants and the tradesmen constitute her Intelligence Corps. When she goes out, it is not to gather in information, but to spread it. At that, too, she is amazingly expert.

It was really this last named trait of hers which was causing me these pangs of indecision. Whatever I told Caroline now concerning the demise of Mrs. Ferrars would be common knowledge all over the village within the space of an hour and a half. As a professional man, I naturally aim at discretion. Therefore I have got into the habit of continually withholding all information possible from my sister. She usually finds out just the same, but I have the moral satisfaction of knowing that I am in no way to blame.

Mrs. Ferrars’ husband died just over a year ago, and Caroline has constantly asserted, without the least foundation for the assertion, that his wife poisoned him.

She scorns my invariable rejoinder that Mr. Ferrars died of acute gastritis, helped on by habitual over-indulgence in alcoholic beverages. The symptoms of gastritis and arsenical poisoning are not, I agree, unlike, but Caroline bases her accusation on quite different lines.

“You’ve only got to look at her,” I have heard her say.

Mrs. Ferrars, though not in her first youth, was a very attractive woman, and her clothes, though simple, always seemed to fit her very well, but all the same, lots of women buy their clothes in Paris and have not, on that account, necessarily poisoned their husbands.

As I stood hesitating in the hall, with all this passing through my mind, Caroline’s voice came again, with a sharper note in it.

“What on earth are you doing out there, James? Why don’t you come and get your breakfast?”

“Just coming, my dear,” I said hastily. “I’ve been hanging up my overcoat.”

“You could have hung up half a dozen overcoats in this time.”

She was quite right. I could have.

I walked into the dining-room, gave Caroline the accustomed peck on the cheek, and sat down to eggs and bacon. The bacon was rather cold.

“You’ve had an early call,” remarked Caroline.

“Yes,” I said. “King’s Paddock. Mrs. Ferrars.”

“I know,” said my sister.

“How did you know?”

“Annie told me.”

Annie is the house parlormaid. A nice girl, but an inveterate talker.

There was a pause. I continued to eat eggs and bacon. My sister’s nose, which is long and thin, quivered a little at the tip, as it always does when she is interested or excited over anything.

“Well?” she demanded.

“A bad business. Nothing to be done. Must have died in her sleep.”

“I know,” said my sister again.

This time I was annoyed.

“You can’t know,” I snapped. “I didn’t know myself until I got there, and I haven’t mentioned it to a soul yet. If that girl Annie knows, she must be a clairvoyant.”

“It wasn’t Annie who told me. It was the milkman. He had it from the Ferrars’ cook.”

As I say, there is no need for Caroline to go out to get information. She sits at home, and it comes to her.

My sister continued:

“What did she die of? Heart failure?”

“Didn’t the milkman tell you that?” I inquired sarcastically.

Sarcasm is wasted on Caroline. She takes it seriously and answers accordingly.

“He didn’t know,” she explained.

After all, Caroline was bound to hear sooner or later. She might as well hear from me.

“She died of an overdose of veronal. She’s been taking it lately for sleeplessness. Must have taken too much.”

“Nonsense,” said Caroline immediately. “She took it on purpose. Don’t tell me!”

It is odd how, when you have a secret belief of your own which you do not wish to acknowledge, the voicing of it by some one else will rouse you to a fury of denial. I burst immediately into indignant speech.

“There you go again,” I said. “Rushing along without rhyme or reason. Why on earth should Mrs. Ferrars wish to commit suicide? A widow, fairly young still, very well off, good health, and nothing to do but enjoy life. It’s absurd.”

“Not at all. Even you must have noticed how different she has been looking lately. It’s been coming on for the last six months. She’s looked positively hag-ridden. And you have just admitted that she hasn’t been able to sleep.”

“What is your diagnosis?” I demanded coldly. “An unfortunate love affair, I suppose?”

My sister shook her head.

“Remorse,” she said, with great gusto.

“Remorse?”

“Yes. You never would believe me when I told you she poisoned her husband. I’m more than ever convinced of it now.”

“I don’t think you’re very logical,” I objected. “Surely if a woman committed a crime like murder, she’d be sufficiently cold-blooded to enjoy the fruits of it without any weak-minded sentimentality such as repentance.”

Caroline shook her head.

“There probably are women like that—but Mrs. Ferrars wasn’t one of them. She was a mass of nerves. An overmastering impulse drove her on to get rid of her husband because she was the sort of person who simply can’t endure suffering of any kind, and there’s no doubt that the wife of a man like Ashley Ferrars must have had to suffer a good deal——”

I nodded.

“And ever since she’s been haunted by what she did. I can’t help feeling sorry for her.”

I don’t think Caroline ever felt sorry for Mrs. Ferrars whilst she was alive. Now that she has gone where (presumably) Paris frocks can no longer be worn, Caroline is prepared to indulge in the softer emotions of pity and comprehension.

I told her firmly that her whole idea was nonsense. I was all the more firm because I secretly agreed with some part, at least, of what she had said. But it is all wrong that Caroline should arrive at the truth simply by a kind of inspired guesswork. I wasn’t going to encourage that sort of thing. She will go round the village airing her views, and every one will think that she is doing so on medical data supplied by me. Life is very trying.

“Nonsense,” said Caroline, in reply to my strictures. “You’ll see. Ten to one she’s left a letter confessing everything.”

“She didn’t leave a letter of any kind,” I said sharply, and not seeing where the admission was going to land me.

“Oh!” said Caroline. “So you did inquire about that, did you? I believe, James, that in your heart of hearts, you think very much as I do. You’re a precious old humbug.”

“One always has to take the possibility of suicide into consideration,” I said repressively.

“Will there be an inquest?”

“There may be. It all depends. If I am able to declare myself absolutely satisfied that the overdose was taken accidentally, an inquest might be dispensed with.”

“And are you absolutely satisfied?” asked my sister shrewdly.

I did not answer, but got up from table.


:::info About HackerNoon Book Series: We bring you the most important technical, scientific, and insightful public domain books.

This book is part of the public domain. Astounding Stories. (2008). ASTOUNDING STORIES OF SUPER-SCIENCE, JULY 2008. USA. Project Gutenberg. Release date: OCTOBER 2, 2008, from https://www.gutenberg.org/cache/epub/69087/pg69087-images.html

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever.  You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org, located at https://www.gutenberg.org/policy/license.html.

:::


Inside Discord’s Architecture at Scale

2026-03-19 15:51:54

Discord is a permanent, invite-only space where people can hop between voice, video, and text. On the surface, it seems like “just another chat app.” Take a closer look, and you’ll see that it’s really a finely-tuned system that delivers speed, scale, and reliability — the consumer app hat-trick.

Every time you send a message, join a voice channel, or watch a stream, Discord has to route the event to the right place, notify tons of clients, and do so fast enough that it feels instantaneous. That’s easy when your server has 50 people. It’s insane when it has 19 million.

This is the story of the creative optimizations that keep Discord snappy at scale.


Part I: The Actor Model


Before digging into Discord’s implementation details, we need to understand the architectural pattern upon which it’s built: The Actor Model.

Carl Hewitt introduced this idea in his 1973 paper (pdf). Alan Kay embraced it for passing messages in the 80s. Gul Agha formalized its relevance for distributed systems in 1985. Since then, many teams and tools have adopted the model. If you’ve ever read an elaborate sequence diagram or worked in an “event-driven architecture,” you can thank the Actor Model.

What problem does the model address?

In shared-memory concurrency, multiple threads mutate the same state, which quickly results in race conditions. You can prevent this by adding a data-access constraint: locks. But locks come with their own bugs, like when two threads each wait for the other to release a lock, resulting in a permanent freeze (deadlock). As systems grow, these problems become bottlenecks.

Adapted from Gul Agha, Actors: A Model of Concurrent Computation in Distributed Systems (93).

The Actor Model allows data to be updated more easily in distributed systems.

An actor is an agent with a mail address and a behavior. Actors communicate through messages and carry out their actions concurrently. Instead of locks, the Actor Model ensures safe concurrency with communication constraints. The Actor Model can be summarized in four rules:

1. Each actor owns its state (no one else can directly mutate it)
2. Actors only communicate through messages
3. Actors process messages one at a time (no race conditions)
4. In response to a message, an actor can:
   - Change its state
   - Send messages
   - Create child actors

Anatomy of an actor. Adapted from biohaviour

Follow the rules and avoid race conditions

Here’s an actor who follows the rules.

actor.start();                 // spin the actor up
actor.send(message);           // send a message to another actor
actor.subscribe(s => { ... }); // listen for another actor's updates
actor.getSnapshot();           // read the actor's latest state

As you can see, actors alone are actually pretty simple, almost pure.
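
Here’s roughly how the same rules look in Elixir, the language Discord bets on later — a minimal GenServer-based counter actor (a sketch; the module and message names are mine, not Discord’s):

defmodule Counter do
  use GenServer

  # Each actor owns its state: only this process can touch `count`.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(count), do: {:ok, count}

  # Messages are processed one at a time, so no race conditions.
  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:get, _from, count), do: {:reply, count, count}
end

{:ok, pid} = Counter.start_link(0)
GenServer.cast(pid, :increment) # fire-and-forget message
GenServer.call(pid, :get)       # => 1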

As long as you play by the rules, you won’t have race conditions and lock spaghetti. You also get a few other benefits:

  • Location independence. The interface means that you can be confident each actor will behave consistently regardless of its location. Doesn’t matter if one actor is on localhost and another is remote. Two actors can even share the same thread (no thread management!).
  • Fault tolerance. If one actor fails, its manager can revive it or pass its message to an available actor.
  • Scalability. Actors are easy to instantiate, making them compatible with microservices and horizontal scaling.
  • Composability. They encourage atomic over monolithic architecture.

How actors can chat asynchronously. Adapted from theserverside.com

Actors in the wild

There was a quiet period after Hewitt dropped his paper, but adoption has recently taken off in response to our growing data footprint.

Here are a few modern examples of the actor model:

Video editing software sends each setting change as a message so that it is reflected in the draft immediately.

Trading platforms (Robinhood) treat each account as an isolated actor, with a function to update its balance. Say you have $10 in your account. If you and your wife both try to buy $10 of $GME simultaneously, your app will process the request that arrives first. When the second gets out of the queue, it’ll run the checkBalance statement, see that the balance is now $0, and deny it.

AI agents. An agent is an actor. It passes messages (prompts), has internal state (context), and spawns other agents (actors). AI Agents are the perfect use case for the pattern, as Hewitt anticipated. His original paper was called “A Universal Modular ACTOR Formalism for Artificial Intelligence,” after all.

If you squint past the trendy design and jargon on Akka’s site, it looks like the same pitch Hewitt made five decades ago.

  • Event-driven fabric
  • Multi-step agents
  • Persisted state through snapshots and change events
  • Isolated execution units that maintain their own state
  • Parallel processing without shared bottlenecks
  • Unbounded data flows

The next time you see these keywords on a landing page, remember to get out your Actor Pattern Buzzword card and yell “Bingo!”

Adapted from akka.io

This concept is relevant on the frontier as well as in the verticals. Cursor’s self-driving codebase experiment took a first-principles journey from unconstrained sharing to a well-defined communication flow between planner and workers. Sounds very similar to actors and managers, doesn’t it?

Is it silly that we’re pretending to be pioneers by rejecting the idea of a global state? Yes, but at least we’re not the only ones who need to rediscover old problems: the agents in Cursor’s experiment also tried and failed to make locks work.


Agents held locks for too long, forgot to release them, tried to lock or unlock when it was illegal to, and in general didn’t understand the significance of holding a lock on the coordination file. — Wilson Lin @ Cursor

Thankfully, the Actor Pattern still works regardless of our willingness to recognize it.

What’s the catch?

It’s really cool that the Actor Pattern has gained relevance over the decades. But the model isn’t without trade-offs. Things become more complicated when you add production necessities, such as a manager, callbacks, promises, initial state, and recovery. The ease of composition will inevitably lead some teams into microservice hell, where they’ll get lost in boilerplate and hairball graphs.

Oops, you got excited and now have 4,000 microservices.

Debugging dataflow can be harder, too. Each actor has well-isolated logging, but tracing a single bug across multiple services takes more work.

Price. Agents that create more agents are a dream for the engineering team. If that process isn’t constrained, however, it’ll turn into a nightmare for the finance team.

Finally, implementing the actor model in a big org requires education. Pure functions, state machines, event-driven architecture — these are unfamiliar concepts to many. It took me days of research before I “got it.” Many orgs won’t want to dedicate the time to get everyone thinking in a new paradigm, so they’ll fall back to their monolithic habits.

Thankfully, the industry has started bridging the gap between the Actor Model’s usefulness and its adoption complexity by creating languages and frameworks. These preserve the actors’ tenets while making them easier to implement.

They’re all actors


Summary

The actor model makes it easier to avoid locks and race conditions in distributed systems. It does this by standardizing communication and data access.

Letting too many things share the same data and chat freely leads to chaos. Think: startup engineer who has to handle info from Slack, hallway requests from the PM, email, and standup. That guy is going to overcommit (deadlock), forget (data loss), and struggle to organize (recovery).

Using an actor model is like requiring everyone to communicate over email. Everyone follows the rules of SMTP (recipient, subject, body) and can only respond to one email at a time (concurrency). In this system, the communication constraints minimize mistakes and conflicts.

All this adds up to a faster, more reliable system at scale. Everyone knows how to talk to each other. They know how to ask for and deliver things. This helps them work autonomously without blocking others.

Having an efficient pattern becomes more important the more distributed a system becomes. As Gul predicted in 1985, more time is spent on “communication lags than on primitive transformations of data.” A team that knows that all too well is Discord, which has successfully implemented the Actor Model to process trillions of messages without data loss or noticeable latency.

Let’s see how.


Part II: How Discord Processes Trillions of Messages


Everything is an “actor.” Every single Discord server, WebSocket connection, voice call, screenshare, etc… distributed using a consistent hash ring.

It’s an incredibly great model for these things. We’ve been able to scale this system from hundreds to hundreds of millions of users with very little changes to the underlying architecture.

— Jake, Principal Engineer @ Discord

Thanks to Discord’s initial target user, gamers, speed has always been an unquestioned requirement. When a message is sent, others need to see it immediately. When someone joins a voice channel, they should be able to start yapping right away. A delayed message or laggy chat can ruin a match.

Discord needed a smooth way to turn plain text/voice data into internal messages and then route them to the correct guild (AKA: Discord server) in real-time.
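
The “consistent hash ring” Jake mentions above is what decides which machine hosts a given guild process. Here’s a toy lookup — not Discord’s implementation, just the core idea, with made-up node names:

defmodule Ring do
  # Hash each node onto a ring and keep the points sorted.
  def new(nodes) do
    nodes
    |> Enum.map(fn n -> {:erlang.phash2(n), n} end)
    |> Enum.sort()
  end

  # A guild lives on the first node clockwise of its own hash.
  def node_for(ring, guild_id) do
    hash = :erlang.phash2(guild_id)

    case Enum.find(ring, fn {point, _node} -> point >= hash end) do
      {_point, node} -> node
      nil -> ring |> List.first() |> elem(1) # wrap around the ring
    end
  end
end

ring = Ring.new([:node_a, :node_b, :node_c])
Ring.node_for(ring, 1_072_891_234_567_890_123) # => this guild’s home node

Real rings add virtual nodes and replication, but the payoff is the same: adding or removing a node only remaps the guilds adjacent to it, not the whole cluster.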

How data flows

Guilds and users talk over Elixir and WebSockets.

  1. Users connect to a WebSocket and spin up an Elixir session process, which then connects to the guild.
  2. Each guild has a single Elixir process, which acts as a router for all guild activity.
  3. A Rust data service deduplicates API queries before sending them to ScyllaDB.
  4. Background communication happens over PubSub. For example, the Elasticsearch worker consumes events, batches with others, and starts indexing.

How the fan-out works

“Fan-out” refers to the act of sending a message to multiple destinations in parallel without requiring a response. This is exactly what Discord needed to implement to make their chat feel real-time. When a user comes online, they connect to a guild, and the guild publishes presence to all other connected sessions. Each guild and connected user has one long-lived Erlang process. The guild’s process keeps track of client sessions and sends updates to them. When a guild receives a message, it fans it out to all client sessions. Finally, each session process pushes the update over WebSocket to the client.

In other words:

User ↔ WebSocket ↔ Session ↔ Guild

Improving the fan-out

Elixir’s functional implementation of the Actor Pattern allowed Discord to handle a huge number of processes with ease compared to other languages.

# Fanning a message out to connected sessions in 4 lines of Elixir. Pretty neat.
def handle_call({:publish, message}, _from, %{sessions: sessions}=state) do
  Enum.each(sessions, &send(&1.pid, message))
  {:reply, :ok, state}
end

The language’s affordances helped Discord get started easily. But as usage grew, it became clear that they’d need to do more to stay responsive. The fan-out is quadratic: if 1,000 online members each said, “hello, world” once, each of those 1,000 messages would fan out to all 1,000 sessions — 1 million notifications.

10,000 messages → 100 million notifications.

100,000 → 10 billion.

Given that each guild is one process, they needed to max out the throughput of each. Here’s how they did that:

  1. Splitting the work across multiple threads using a relay
  2. Tuning the Elixir in-memory database
  3. Using worker processes to offload operations (maintenance, deploys)
  4. Delegating the fan-out to a separate “sender,” offloading the main process (sketched below)
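
A rough sketch of idea 4 — hypothetical code, not Discord’s: the guild keeps its mailbox free by handing the O(n) send loop to a dedicated sender actor.

defmodule Guild.Sender do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

  # The guild calls this instead of looping over sessions itself.
  def fan_out(sender, session_pids, message),
    do: GenServer.cast(sender, {:fan_out, session_pids, message})

  @impl true
  def init(:ok), do: {:ok, :ok}

  @impl true
  def handle_cast({:fan_out, session_pids, message}, state) do
    Enum.each(session_pids, &send(&1, message)) # the O(n) loop lives here now
    {:noreply, state}
  end
end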

Thanks to these efforts, routing the notifications to users in a massive guild never became the crippling bottleneck it easily could’ve been.

But when one bottleneck is removed, another takes its place. In Discord’s case, their next biggest problem shifted from the messaging layer down to the database.

How the relay saves resources

Part III: Discord’s Hot Partitions


Problem: The Cassandra database was causing slowness

With notifications being routed to the correct guild, and Discord’s API converting those payloads into queries, the last step was to get the data back from their Apache Cassandra cluster. As a NoSQL database optimized for horizontal scaling, Cassandra should’ve scaled linearly without degrading performance. Yet, after adding 177 nodes and trillions of messages, reads were slowing in popular servers.

Why??

In a word: partitions.

Discord partitioned their messages data across Cassandra using channel ids and a 10-day window, and partitions were replicated across three Cassandra nodes.

CREATE TABLE messages (
    channel_id bigint,
    bucket int,        // static 10-day time window
    message_id bigint,
    author_id bigint,
    content text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
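
To make the bucket concrete: a message’s snowflake id encodes milliseconds since Discord’s 2015 epoch in its top bits (more on snowflakes in the Bonus section), so the 10-day bucket falls out of the id alone. A small sketch — the module name is mine:

defmodule MessageBucket do
  import Bitwise

  @ten_days_ms 10 * 24 * 60 * 60 * 1000

  # Shift off the 22 worker/sequence bits, then count 10-day windows.
  def bucket(message_id), do: div(bsr(message_id, 22), @ten_days_ms)
end

Every message posted in a channel within the same 10-day window therefore lands in the same (channel_id, bucket) partition.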

That’s fine, but two other factors were at play:

  1. Popular guilds get orders of magnitude more messages than smaller ones. Midjourney has 19 million members, of whom over 2 million could be online at any time.
  2. Reads (search, fetching chat history) are more expensive than writes in Cassandra, due to its need to query its memtable and SSTables.

Lots of reads led to hot partitions, which slowed the messages reads, which slowed the guild. What’s worse, the (valid) use of replication and quorums made other parts of Discord slow, like the upstream nodes that queried the hot partitions. Running maintenance on a backed-up cluster also became tricky due to Cassandra’s heavy JVM garbage collection, which resulted in many “stop the world” events. Cassandra’s endless config options — flags, GC algorithms, heap sizes — only made it harder to mitigate these issues.

  // messages in a channel
  [{
    "channel_id": 1072891234567890123, // #announcements channel
    "bucket": 19783, // 10-day time window
    "message_id": 1072934567890123456,
    "author_id": 856781234567890123, // Midjourney dev
    "content": "Hey @everyone, v2 is live. Try it out!"
  },
  {
    "channel_id": 1072891234567890123, // snowflake id
    "bucket": 19783,
    "message_id": 1072934567890123455,
    "author_id": 923456789012345678, // user
    "content": "woah, so cool"
  }]

Why the partition scheme wasn’t the problem

I instinctively pointed my finger at that 10-day static bucket window and thought, “That’s silly. There’s gotta be a better way to organize the data.”

But is there?

  • If they dropped the bucket altogether, the hot partitions would get even hotter.
  • If they added a random shard to the partition key, they’d trade hot partition problems for fan-out problems (extra queries, more latency).
  • If they made the buckets smaller (1-day or 1-hour), they’d have to coordinate more round trips.
  • What if they kept Cassandra for what it’s good at (writes) and moved reads to another solution (cache + index + log)? Sounds complicated.
  • Partitioning by message_id would eliminate hot partitions altogether. But Discord’s primary use case for reads is to get the most recent messages in a channel. Fetching “the last 50 messages in #general” would become a huge hunt across every node in the cluster.

Every option either makes it harder to target the data or trades hot partition problems for other problems. So, searching for a more capable database actually seems like a sound first step. If they found one that could cool down the partitions enough to avoid a bottleneck, they could avoid adding more business logic to the data layer.

Thankfully, they found what they were looking for.

Solution 1: Switch to ScyllaDB

ScyllaDB is a hard-to-pronounce C++ NoSQL data store that advertises itself as “The Cassandra you really wanted.”

Zing.

On paper, it seems to back that claim up. It’s faster. It’s cheaper. It’s simpler.

How? Here is the gist:

  • Per-core sharding → more efficient CPU
  • Per-query cache bypassing → faster queries
  • Per-row repair (vs partition) → faster maintenance
  • Better algorithms for scheduling, compiling, drivers, and transactions
  • No garbage collection

After reading the full Cassandra vs ScyllaDB comparison, I actually felt bad for Cassandra. Like, this was supposed to be a friendly scrimmage — take it easy, guys.

Similar to their early investment in the then-unproven Elixir language, their early bet on Scylla still demanded some work. They had to work with the Scylla team to improve its reverse queries, for example, which were initially insufficient for scanning messages in ascending order. Eventually, Scylla got good enough to support Discord’s workflows, and Discord got good enough at Scylla to casually write the 392-page book on it.

# Optimization scorecard

Better fan-out (service layer): ✔️
Better DB (data layer): ✔️

What else did Discord do to maintain speed at scale?

Turns out, everything.

Let’s look at two more in detail.

Solution 2: Request Coalescing

Regardless of how compatible your DB is with your app, overloading it with requests will cause latency. Discord encountered this problem with its Python API, which other services could call directly to fetch data from the database. Unfortunately, the message fan-out mechanism resulted in lots of duplicate requests in popular guilds.

This duplicate request problem isn’t unique to Discord, Python, or NoSQL. It’s been around ever since lots of people wanted to access the same information. If you are serving thousands of hits per second, the queue of waiting requests can get huge.

This introduces two problems:

  1. Thousands of waiting threads wake up at once, sending CPU load sky-high (AKA: the “thundering herd”).
  2. Users don’t like waiting.

To protect itself from a herd of redundant message queries after someone spams @everyone in #announcements, Discord introduced a new Rust service, their Data Service Library. When a request comes into the Data Service Library, it subscribes to a worker thread. This worker makes the query and then sends the result over gRPC to all the subscribers. If 100 users request the same message simultaneously, there will only be one DB request.

Before: Request -> Python API -> DB
After:  Request -> Python API -> Data Service Library -> DB
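
Here’s the coalescing idea in miniature — an Elixir sketch rather than Discord’s Rust, with hypothetical names and a stubbed fetch_from_db/1:

defmodule Coalescer do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # Callers block here until the single in-flight query resolves.
  def fetch(key), do: GenServer.call(__MODULE__, {:fetch, key}, 15_000)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:fetch, key}, from, pending) do
    case pending do
      # Query already in flight for this key: just enqueue the caller.
      %{^key => waiters} ->
        {:noreply, Map.put(pending, key, [from | waiters])}

      # First request for this key: issue exactly one DB query.
      _ ->
        me = self()
        Task.start(fn -> send(me, {:result, key, fetch_from_db(key)}) end)
        {:noreply, Map.put(pending, key, [from])}
    end
  end

  @impl true
  def handle_info({:result, key, result}, pending) do
    # One result fans back out to every caller that piled up behind it.
    {waiters, rest} = Map.pop(pending, key, [])
    Enum.each(waiters, &GenServer.reply(&1, result))
    {:noreply, rest}
  end

  defp fetch_from_db(key), do: {:ok, key} # stand-in for the real query
end

If 100 callers ask for the same key simultaneously, 99 of them become entries in the waiters list and exactly one query reaches the database.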

Adapted from medium.com/@mr.sourav.raj

This technique successfully reduced inbound query rate by 10-50x for popular channels and scaled DB connections linearly (one per worker, not per concurrent request). In addition to speeding up UX, it also increased Discord’s confidence in shipping concurrent code, which undoubtedly improved product velocity and reduced bugs.

Most importantly, however, it let them say they rewrote it in Rust.


Meme cred is very important.

— Bo Ingram, Senior Software Engineer @ Discord

Not a team to rest on their meme laurels, they further optimized request handling by routing each message according to its channel_id. For example, all requests for an #announcements channel go to the same message service. This sounds like it’d be a bad thing, but apparently Rust’s magic lets it handle this case without breaking a sweat. This also makes the request routing from the above point much simpler, as all queries for channel_id=123 hit the same instance’s in-memory cache.

Matching the channel to the service

Was the Rust coalescing service overkill? Should they have just used their Redis cache and moved on?

I don’t think so. Caching is about serving slightly stale content. Instead of reducing the total number of requests in the herd, it would’ve simply made each request faster. It also would’ve come with more code to manage.

Caching as a standalone solution to the thundering herd works for apps that don’t need to feel “real-time.” For example, a news site might serve a slightly outdated version of its main page immediately to prevent visitors from having to wait to read something. This buys them time to fetch and render the latest article. The result is faster perceived load times. If they didn’t do this, a thundering herd problem would emerge if 10,000 visitors loaded a trending article at the same time.

Here’s how one HTTP proxy, Varnish, implemented this caching strategy using a simple grace value:

// Varnish Cache (VCL grace mode)
// <https://varnish-cache.org/docs/6.1/users-guide/vcl-grace.html#grace-mode>
sub vcl_hit {
     if (obj.ttl >= 0s) {
          // A pure unadulterated hit, deliver it
          return (deliver);
     }
     if (obj.ttl + obj.grace > 0s) {
          // Object is in grace, deliver it
          // Automatically triggers a background fetch
          return (deliver);
     }
     // fetch & deliver once we get the result
     return (miss);
}

Key takeaway: When queries are expensive (DB round-trip, API call), in-flight deduplication is cheaper than a distributed cache.

For the record, Discord utilizes this type of caching with their CDN and Redis, but not in the Rust Data Service. (If you’re still hungry for more request optimization nuances, read up on Request Hedging and comment whether you think it would’ve helped Discord’s thundering herd problem.)

Clearly, the request layer was prime territory for optimization, which makes sense for an app that sends many small payloads (a few hundred bytes each). What might be more surprising is where Discord unlocked its next big performance win.

# Optimization scorecard
Better fan-out (service layer): ✔️
Better DB (data layer): ✔️
Better queries (API layer): ✔️
Memes (vibes layer): ✔️
Better disks (hardware!): ??

Solution 3: Super-Disk

Eventually, the biggest drag on Discord’s database performance became disk-operation latency. This latency resulted in an ever-growing queue of disk reads, eventually causing queries to time out.

Discord runs on GCP, which offers SSDs that operate with microsecond latency. How is this even an issue?

Remember that Discord needs speed and reliability. The local SSDs are fast, but they failed Discord’s second requirement during internal testing: if the disk fails, the data on it is unrecoverable.

GCP + SSD = 🙅‍♂️ (too unreliable)

What about GCP’s Persistent Disk option? These are great for duplicating and recovering data. But they’re slow: expect a couple of milliseconds of latency, compared to half a millisecond for SSDs.

GCP + Persistent Disk = 🙅‍♂️ (too slow)

No worries, surely GCP offers a disk abstraction that delivers the best of both SSD and Persistent Disk.

Nope.

Not only does GCP make you pick between fast and reliable, but their SSDs max out at 375 GB — a non-starter for the Discord DB instances that require 1 TB.

Once again, the Discord engineers were on their own.

They had to find a way to

  1. Stay within GCP
  2. Continue snapshotting for backups
  3. Get low-latency disk reads
  4. Maintain uptime guarantees

So what creative solution did they cook up this time? The “Super-Disk”: A disk abstraction that involves Linux, a write-through cache, and RAID arrays. They wrote a breakdown with the details, so here’s the summary: GCP + SSD + Persistent Disk = 👍 (fast, reliable).

Much better

Bonus: 5 More Solutions

We walked through the juicy optimization solutions to the hot partition problem. But Discord didn’t view performance as an isolated priority. Instead, they took every opportunity they found to make things fast. Here is a final roundup of other interesting techniques they used.

Request routing by PIDs (messaging). They optimized how messages are passed between nodes by grouping PIDs by their remote node and then hashing them according to the number of cores. Then they called it Manifold and open-sourced it.
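
The node-grouping half of that idea fits in a few lines — a toy sketch, not the real Manifold (which also partitions work across per-core senders), and the registered MiniManifold.Relay name is assumed:

defmodule MiniManifold do
  # Instead of N cross-node sends, send one batch per remote node and
  # let a registered relay process on that node fan out locally.
  def send_all(pids, message) do
    pids
    |> Enum.group_by(&node/1)
    |> Enum.each(fn {remote_node, node_pids} ->
      send({MiniManifold.Relay, remote_node}, {:fan_out, node_pids, message})
    end)
  end
end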

Sortable Ids (data). They used Twitter’s “Snowflake” id format, which is based on timestamps. This let them generate ids without a DB, sort by creation time, and infer when a message was sent by its id alone. The Snowflake project is zipped up in a public archive, but I dug up the good part for you. Notice how the timestamp is used.

  protected[snowflake] def nextId(): Long = synchronized {
    var timestamp = timeGen()

    if (timestamp < lastTimestamp) {
      exceptionCounter.incr(1)
      log.error("clock is moving backwards.  Rejecting requests until %d.", lastTimestamp);
      throw new InvalidSystemClock("Clock moved backwards.  Refusing to generate id for %d milliseconds".format(
        lastTimestamp - timestamp))
    }

    if (lastTimestamp == timestamp) {
      sequence = (sequence + 1) & sequenceMask
      if (sequence == 0) {
        timestamp = tilNextMillis(lastTimestamp)
      }
    } else {
      sequence = 0
    }

    lastTimestamp = timestamp
    ((timestamp - twepoch) << timestampLeftShift) |
      (datacenterId << datacenterIdShift) |
      (workerId << workerIdShift) | 
      sequence
  }

Of course, Discord had to add its own touch to the ID. Instead of using the normal epoch (1970), they used the company’s epoch: the first second of 2015. I’d normally roll my eyes at this kind of unnecessary complexity. But that’s pretty cool, so instead I grant +5 meme points.
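
Concretely, a snowflake gives you the creation time for free — a small sketch using the 2015 epoch from Discord’s public API docs (the module name is mine):

defmodule SnowflakeTime do
  import Bitwise

  @discord_epoch 1_420_070_400_000 # 2015-01-01T00:00:00Z in Unix ms

  # Shift off the 22 worker/sequence bits, add the custom epoch back.
  def created_at(id) do
    DateTime.from_unix!(bsr(id, 22) + @discord_epoch, :millisecond)
  end
end

SnowflakeTime.created_at(1072934567890123456) # => the message’s timestamp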

Elasticsearch abstraction (search). Although they don’t store messages in Elasticsearch, they store message metadata and user DMs for indexing and fast retrieval. To support the use case of searching through all of a user’s DMs, they grouped their Elasticsearch clusters in “cells.” This helped them avoid some bottlenecks during indexing. It also helped them index a message by user_id (instead of guild_id), which enabled cross-DM search.

WebRTC tuning (entry). I stayed away from WebRTC here because I covered the basics in the Google Meet breakdown on our YouTube channel. However, Discord does do some interesting stuff at this layer: using the Opus codec, circumventing the auto-ducking on Windows in favor of their own volume control, voice detection, and reducing bandwidth during silence.

Passive sessions (entry): Turns out that 90% of large server sessions are passive; the users aren’t actively reading or writing in them. The team realized they could improve latency by simply sending the data to the right users at the right time. If a user doesn’t have a guild tab open, they won’t receive all of the new messages. But once they tab into the guild, their passive session will be “upgraded” into a normal session, and they’ll receive the full firehose of messages in real-time. This led to a 20% reduction in bandwidth. Amazing example of a big perf win that isn’t technically complicated.
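
A sketch of the upgrade, boiled down to the mode switch — hypothetical code, not Discord’s: the session holds a passive/active flag and only pushes per-message updates when active.

defmodule Session do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

  # The client calls this when the user tabs into the guild.
  def upgrade(session), do: GenServer.cast(session, :upgrade)

  @impl true
  def init(:ok), do: {:ok, :passive}

  @impl true
  def handle_cast(:upgrade, _mode), do: {:noreply, :active}

  @impl true
  def handle_info({:new_message, msg}, :active = mode) do
    push_to_websocket(msg) # active sessions get the full firehose
    {:noreply, mode}
  end

  def handle_info({:new_message, _msg}, :passive = mode) do
    {:noreply, mode}       # passive sessions skip per-message pushes
  end

  defp push_to_websocket(_msg), do: :ok # stand-in for the real transport
end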

Emojis and lists on Android (client): They were committed to React Native. Then they saw that the custom emojis weren’t rendering well on low-grade Androids. So, they abandoned their original plan and wrote the emoji feature in Kotlin while also maintaining React Native. The worst of both worlds: their architecture is now split between iOS (React Native) and Android (native).

Left is laggy. From “Supercharging Discord Mobile” (discord.com)

Similar story for rendering lists. They needed to create their own library. Then they adopted Shopify’s. When that stopped working, they created another library that used the native RecyclerView. All for rendering lists quickly on Android.

# Optimization scorecard
Fan-out (service layer): ✔️
DB (data layer): ✔️
Queries (API layer): ✔️
Disk (hardware): ✔️
Routing: ✔️
Ids: ✔️
WebRTC: ✔️
Sessions: ✔️
Emojis: ✔️

This team will tackle any performance problem in the stack with a calm recklessness that I can’t help but admire.

Part IV: Lessons for Engineers


While most of us won’t have to handle trillions of messages, there are some elegant engineering principles we can learn from Discord’s success over the last decade.

Build for simplicity in v1, but design for change.

We examined the migration from Cassandra → ScyllaDB. But they actually started with a DB that was even more unfit for their eventual scale: Mongo. They picked it during the early days because it bought them the time to experiment and learn what they needed. As they suspected, Mongo eventually gave them enough problems (like incorrectly labeling states) to warrant moving to Cassandra. When their Cassandra problems got serious enough, they moved to Scylla.

This is how a stack should evolve.

They could never have anticipated the bottlenecks at the outset, let alone the right solutions to them (ScyllaDB wasn’t even a viable alternative at the time). Instead of wasting time over-projecting future problems, they picked the tool that helped them serve users and focused on their current bottleneck. They also didn’t overfit their systems to their current DB, which would’ve slowed down the inevitable migration.

Anticipate the future headaches, but don’t worry about preventing them in v1. Create the skeleton of your solution. Fill in the details with a hacky v0. When the pain happens, find your next solution.

Learn the fundamentals.

Language fundamentals.

Deeply understanding a language helps you pick the right one. When Discord started evaluating the functional programming language Elixir, it was only three years old. Elixir’s guarantees seemed too good to be true. As they learned more about it, however, they found it easier to trust. As they scaled, so did their need to master Elixir’s abstractions for concurrency, distribution, and fault-tolerance. Nowadays, frameworks handle these concerns, but the language provided the building blocks for capable teams like Discord to assemble their own solutions. After over a decade, it seems like they’re happy they took a bet on an up-and-coming language:

What we do in Discord would not be possible without Elixir. It wouldn’t be possible in Node or Python. We would not be able to build this with five engineers if it was a C++ codebase. Learning Elixir fundamentally changed the way I think and reason about software. It gave me new insights and new ways of tackling problems.

— Jake Heinz, Lead Software Engineer @ Discord

There is a world where the 2016 Discord engineers were unwilling to experiment with an unproven language, instead opting for the status quo stack. No one would’ve pushed back at the time. Perhaps that’s also why no one can match their performance in 2026.

Garbage collection fundamentals.

GC’s unpredictability hampered Discord’s performance on multiple dimensions.

We really don’t like garbage collection

— Bo Ingram, Senior Software Engineer @ Discord

On the DB layer, Cassandra’s GC caused those “stop-the-world” slowdowns. On the process layer, BEAM’s default config value for its virtual binary heap was at odds with Discord’s use case. Although it was a simple config fix, a lot of debugging and head-scratching were required (discord).

Unfortunately, our usage pattern meant that we would continually trigger garbage collection to reclaim a couple hundred kilobytes of memory, at the cost of copying a heap which was gigabytes in size, a trade-off which was clearly not worth it.

— Yuliy Pisetsky, Staff Software Engineer @ Discord

At the microservice layer, they had a ‘Read States’ service written in Go, whose sole purpose was to keep track of which channels and messages users had read. It was used every time you connected to Discord, every time a message was sent, and every time a message was read. Surprise, surprise — Go’s delayed garbage collection led to CPU spikes every two minutes. They had to dig into Go’s source code to understand why this was happening, and then had to roll out a duct-tape config solution that worked OK enough…until Rust, devoid of a garbage collector, arrived to save the day.

Even with just basic optimization, Rust was able to outperform the hyper hand-tuned Go version.

— Jesse Howarth, Software Engineer @ Discord (medium)

The team had to go on so many wild garbage-collecting goose chases that I’m sure they’re now experts in the topic. While learning more about GC wouldn’t have prevented the problems from happening, I suspect it would’ve made the root cause analysis smoother.

As MJ said, “Get the fundamentals down and the level of everything you do will rise.” If Michael Jordan and The Michael Jordan of Chat Apps can appreciate the basics, then so can I.

Define your requirements early.

The high expectations of their early gamer users solidified the need to prioritize speed and reliability from day one. This gave the young team clarity: they knew they’d need to scale horizontally and obsess over performance.

From the very start, we made very conscious engineering and product decisions to keep Discord well suited for voice chat while playing your favorite game with your friends. These decisions enabled us to massively scale our operation with a small team and limited resources.

— Jozsef Vass, Staff Software Engineer, Discord

At scale, even a “small” feature can turn into a massive undertaking. Calling @here or sending ... typing presence in a 1M user Midjourney guild isn’t trivial. Discord’s need to deliver first-class speed forced it to keep the UX refined and focus on the endless behind-the-scenes witchcraft we discussed here. Had they compromised their performance principle by chasing trends or rushing into other verticals, they would’ve eroded their speed advantage.

This is why, eleven years since the Discord epoch, the app still doesn’t feel bloated or slow.

Although every feature feels like it’s just a prompt away from prod, clarifying your value prop is still crucial. Discord’s was voice chat for gamers.

What is yours?

If you don’t have a good answer, you’ll either ship so slowly that you’ll get outcompeted or go so fast that your app becomes slop.

Build a good engineering culture.

Step 1: Have good engineers.

I haven’t seen anyone question this team’s competence. But here’s an example for any lurking doubters.

In 2020, Discord’s messaging system had over 20 services and was deployed to a cluster of 500 Elixir machines capable of handling millions of concurrent users and pushing dozens of millions of messages per second. Yet the infra team responsible only had five engineers. Oh, and none of them had experience with Elixir before joining the company.

Clearly, there were no weak links on this team.

These people are rarely incubated by altruistic companies. Instead, their obsession with juicy technical problems drives their mastery. Then they find the companies that have challenging technical problems and the right incentives. Clearly, Discord has both.

If you don’t have traction, frontier problems, or cash, you’ll have to compete for talent on different vectors. The engineers in your funnel probably won’t be as good as Discord’s, but the takeaway is the same: hire the best you can get.

Step 2: Let them be creative.

Good engineers come up with creative solutions to hard problems.

We saw this in the multi-pronged attack on their hot partition problem. When faced with database issues, lesser engineers would’ve blamed everything on Cassandra and the old engineers who chose it. Then they’d have given hollow promises about how all their problems would fix themselves once they switched to a new DB. Discord instead got curious about how they could make things easier for whatever DB they used, which led them to their Data Service Library for request coalescing.

Same story with the Super-Disk. Investigating the internal disk strategy of their GCP instances isn’t a project that a VP would assign a team in a top-down fashion. Instead, management let the engineers explore the problem space until a unique solution emerged.

Having a culture where engineers can figure stuff out is the corollary benefit to getting clear on your core value. Ideas come from the bottom up, people get excited about the work, and unique solutions emerge. I commend both the Discord engineers for embracing the challenges we discussed and the managers who let them cook.

How creative is your team in the face of tough problems? How can you give them the clarity and space they need to venture beyond the trivial solutions?

Don’t seek complexity, but accept it when it finds you.

Complexity is not a virtue. Every engineer learns this the hard way after their lil’ refactor causes downtime and gets them called into the principal’s manager’s office. The true motivation behind a lot of these changes is, “I’m bored, and this new thing seems fun.” (Looking at you, devs who claim users need an agentic workflow to fill out a form). Complex is challenging. I like challenges. Therefore, let’s do it.

No.

But complexity also isn’t the enemy. For Discord, the enemy is latency. When they run out of simple ways to make their app fast, they come up with creative solutions that, yes, are often complex. Their tolerance for the right complexity is what makes them great. They do the complex stuff not for themselves, but for their users.

It’s OK to complicate things…as long as it helps the users.

As we’ve seen, Discord’s need for speed led it across every layer of the stack, from cheap Androids to expensive data centers. This article documents plenty of well-executed perf techniques along the way — some textbook, some creative. What it really highlights, though, is the power available to teams who commit to helping their users at scale.


Recommendations


My newsletter for more deep-dives (substack)

The Actor Model

Carl Hewitt (wikipedia)

The Actor Model (wikipedia)

A Feature Model of Programming Languages (sciencedirect)

The Actor Model Whiteboard Session with Carl Hewitt (youtube)

The Actor Model, Behind the Scenes with XState (youtube)

Why Did the Actor Model Not Take Off? (reddit)

Performance Optimization

Fan-Out Definition and Examples (dagster)

How Discord Indexes Trillions of Messages (discord • hackernews • youtube)

How Discord Supercharges Network Disks for Extreme Low Latency (discord)

Pushing Discord’s Limits with a Million+ Online Users in a Single Server (discord)

How Discord Handles Two and Half Million Concurrent Voice Users using WebRTC (discord)


Inside the Week That Shook AI

2026-03-19 15:50:09

Last Tuesday, Anthropic's CEO told the Department of Defense that Claude would never be used for autonomous weapons or mass surveillance. By Friday, the Pentagon designated Anthropic a "supply chain risk." By Sunday, Anthropic was suing the Trump administration.

Meanwhile, Meta quietly delayed its next AI model because — and I'm not making this up — it couldn't beat Google's Gemini 3.0. The company that committed $135 billion to AI this year is now considering licensing Gemini from its biggest rival.

And in four days, Jensen Huang takes the stage at GTC 2026 to unveil chips that make everything else look like a calculator.

This isn't your standard AI news roundup. This is the week AI stopped being a tech story and became a political, economic, and existential one.


1. Anthropic vs. the Pentagon: The AI Company That Said No

Anthropic vs Pentagon

Here's the timeline. Memorize it, because it matters.

March 3: Dario Amodei, Anthropic's CEO, publicly announces that Claude will not be used for autonomous weapons systems or mass domestic surveillance. He frames this as a safety commitment, not a political statement.

March 5: The Department of Defense designates Anthropic as a "supply chain risk." This isn't a slap on the wrist. It means any company with a Pentagon contract could face penalties for using Claude. We're talking hundreds of millions in potential revenue vaporized.

March 9: Anthropic sues. The complaint asks the court to vacate the designation entirely. Amodei clarifies that the restriction only applies to Claude's use as part of direct Pentagon contracts — the vast majority of Anthropic's customers are unaffected.

This is unprecedented. An AI company is being punished by the U.S. government not for doing something wrong, but for refusing to do something the government wanted.

Why this matters more than you think: Every AI company now faces a question they've been dodging — what happens when your biggest potential customer wants your technology for things your safety policy explicitly forbids?

OpenAI quietly removed its military use prohibition last year. Google's Project Maven controversy was back in 2018. Anthropic just drew the line in 2026 and got blacklisted for it.

The takeaway: AI safety isn't theoretical anymore. It has a price tag, and Anthropic just found out how much it costs.


2. Meta's "Avocado" Disaster: $135 Billion and Still Behind

Meta Avocado

Let's talk about Meta's week, which was — charitably — a catastrophe.

The New York Times reported that Meta has delayed the release of its next AI model, code-named "Avocado," from March to at least May. The reason? It can't match Google's Gemini 3.0, which launched in November. Four months ago.

Think about that. Meta committed $115-135 billion in capital expenditure for 2026 — an 88% increase over last year. They're building data centers. They're buying every GPU NVIDIA will sell them. They're designing custom chips. And their model still can't keep up with Google's.

But the real jaw-dropper is this: according to the NYT, Meta's AI leadership has discussed temporarily licensing Gemini to power Meta's consumer AI products while they fix Avocado.

Mark Zuckerberg, the man who bet the company's entire future on AI, might be running his AI on Google's technology.

The numbers don't lie:

  • Meta AI capex 2026: $115–135 billion
  • Google's AI revenue advantage: Search, Cloud, Android ecosystem
  • Meta's AI revenue: Still mostly… better ad targeting?
  • Model performance: Behind Gemini 3.0 (November 2025 release)

The takeaway: Money doesn't buy you the best AI. Google proved that a 4-month-old model can embarrass a $135 billion spending spree.


3. GTC 2026: Jensen's About to Change the Game (Again)

NVIDIA GTC

Starting Monday, NVIDIA's GPU Technology Conference runs March 16–19 in San Jose. Jensen Huang's keynote is free to stream. You should watch it. Here's why.

This year's GTC isn't just another product launch. NVIDIA is unveiling two major architectures simultaneously:

Vera Rubin — The next-generation GPU platform featuring HBM4 memory. This is the Blackwell successor, and early specs suggest 1.5 PB/s interconnect bandwidth. For context, that's roughly 3x what current H100 clusters deliver.

Feynman — The next-next-generation architecture designed specifically for agentic AI workloads. Not training. Not inference. Agent infrastructure. NVIDIA is building silicon for a use case that barely existed two years ago.

NemoClaw — NVIDIA's open-source enterprise AI agent platform, inspired by OpenClaw (297K GitHub stars). It's positioned as the enterprise version of what hobbyists are already running on their laptops.

The "Super Bowl of AI" nickname isn't hype this year. With Anthropic in a legal battle, Meta stumbling, and OpenAI retiring GPT-5.1 for 5.3/5.4, NVIDIA is the only company in the AI ecosystem having a good week.

The takeaway: While AI companies fight the government and each other, NVIDIA sells the shovels. And the shovels just got a lot more powerful.


4. The 12-Model Avalanche That Nobody Noticed

Model Avalanche

Between March 1–8, at least twelve major AI models and tools dropped from OpenAI, Alibaba, Lightricks, Tencent, Meta, ByteDance, and several universities. In one week.

We're so desensitized to model releases that a dozen dropped and the news cycle barely flinched. Two years ago, a single GPT release would dominate headlines for weeks.

Some highlights you might have missed:

  • OpenAI retired GPT-5.1 entirely (as of March 11), migrating everyone to GPT-5.3 or 5.4
  • Alibaba's Qwen continues its open-source blitz — now competitive with models 3x its size
  • ByteDance shipped video generation tools that make last year's Sora look primitive
  • Lightricks released production-ready image editing models that run on mobile

The pace is unsustainable. Nobody — not researchers, not developers, not users — can evaluate these models as fast as they ship. We're in a "publish or perish" arms race where getting the model out the door matters more than whether anyone needs it.

The takeaway: When 12 AI models drop in a week and nobody blinks, we've either reached the future or we've stopped paying attention. Probably both.


5. The Washington Problem: AI Bills Are Everywhere


While everyone was watching the Anthropic lawsuit, state legislatures were busy.

Washington state just passed two significant AI bills before the legislature's March 12 adjournment: HB 1170 (AI disclosure requirements) and HB 2225 (chatbot safety for kids, including self-harm protocols). These aren't "we'll think about it" proposals. They're law.

This follows a national pattern. More than 30 states introduced AI-related legislation in Q1 2026. The EU AI Act is in full enforcement. And the Trump administration is simultaneously trying to deregulate AI development while threatening companies that won't play ball with defense contracts.

The result is a regulatory landscape that makes no sense. You can build any AI you want (federal deregulation), but if you don't let the Pentagon use it, you're a supply chain risk. States want transparency and safety guardrails. The feds want capabilities with no restrictions.

AI companies are now operating in a regulatory contradiction. And nobody's figured out how to resolve it.

The takeaway: The U.S. doesn't have an AI policy. It has fifty state AI policies and a federal government that punishes companies for having safety standards.


6. What This Week Really Means


Step back from the individual stories and a pattern emerges.

The AI industry just split into three camps:

  1. The Compliant — Companies willing to do whatever governments and militaries ask. OpenAI removed its military use ban. Others will follow. The money is too good.
  2. The Principled — Anthropic drew a line and got punished. Their stock of goodwill with safety researchers just skyrocketed. Their government revenue might never recover.
  3. The Infrastructure — NVIDIA doesn't care who wins the ethics debate. They sell chips to everyone. Jensen Huang sleeps well regardless of who builds what with his GPUs.

Meta falls into a fourth, sadder category: the ones who spent $135 billion and still can't keep up.

This week wasn't about benchmarks or model releases. It was about power. Who has it, who wants it, and what happens when an AI company tells the most powerful military on Earth "no."

The takeaway: AI stopped being about technology this week. It's about politics, money, and the uncomfortable question of what we're actually building all this for.


Quick Hits

  • Health AI agents launched at HIMSS — Amazon, Google, and Microsoft all announced AI doctor assistants. 88% of doctors are worried about skill loss. They should be.
  • Britain's AI investment program is mostly "imported chips in borrowed buildings," per The Guardian. Ouch.
  • Zalando forecasts a 2026 profit jump driven by AI. The "AI boosts earnings" era is reaching retail.

FAQ

Is Anthropic actually banned from government work?

No. The "supply chain risk" designation means Pentagon contractors face restrictions using Claude specifically for Pentagon contract work. Anthropic's commercial customers are unaffected. But the chilling effect on government-adjacent deals is real.

Why is Meta's Avocado model behind Gemini?

Details are scarce. The NYT reports it "has not performed as strongly as Gemini 3.0," which launched in November 2025. Meta's AI team improved on its previous models but couldn't match Google's quality, which benefits from a deeper bench of AI research talent and more diverse training data from Search.

When is NVIDIA's GTC keynote?

March 17, 2026 (Tuesday). Free to stream at nvidia.com, no registration required. Expect Vera Rubin GPU details, a Feynman architecture preview, and the NemoClaw enterprise agent platform.

Should I watch GTC?

If you work in AI — yes, absolutely. Jensen's keynotes routinely move the entire industry. Last year's Blackwell reveal changed inference economics overnight.

Independent Podcasters, Oscar Hopefuls, and the iHeartPodcast Awards: Your Complete Guide to SXSW

2026-03-19 15:48:34

South by Southwest 2026 hosts a remarkable concentration of awards ceremonies across audio, film, television, and digital culture. This guide covers three major award events in detail: the 2026 iHeartPodcast Awards, the inaugural Independent Podcast and Creator Awards (Indie PaC Awards), and the SXSW Film & TV Awards, each distinct in format, eligibility, and what it recognizes.

The SXSW Awards Schedule

Fri–Sat, March 13–14: SXSW Pitch: Startup showcase across nine categories, pitched before a live audience and a panel of investor judges

Fri–Sun, March 13–15: Podcast Movement Evolutions at SXSW: Multi-day conference at Skybox on 6th

Sun, March 15 (4:30–7:00 PM): Indie PaC Awards Ceremony: Closes out Podcast Movement Evolutions

Mon, March 16 (7:00 PM): 2026 iHeartPodcast Awards Ceremony: ACL Live at the Moody Theater

Wed, March 18 (7:30 PM): SXSW Film & TV Awards: Paramount Theatre; an official qualifying festival for the Academy Awards Short Film competition

SXSW Hall of Fame & SXSW Community Service Awards: Both occur during SXSW week; timing unconfirmed at time of publication



The 2026 iHeartPodcast Awards

Event Details

Date: Monday, March 16, 2026

Time: 7:00 PM CDT

Venue: ACL Live at the Moody Theater, Austin, Texas

Broadcast: Live on select iHeartMedia Radio Stations and the iHeartRadio app

Attendance: Open to select SXSW badge holders

Organizer: iHeartMedia in partnership with SXSW

Sponsor: Audible (Amazon)

This is the second consecutive year the iHeartPodcast Awards have been held at ACL Live at the Moody Theater during SXSW, cementing its position as a defining event for the audio industry. The ceremony spans all major podcast genres and honors the most influential work produced throughout 2025.

Watch the 2025 ceremony livestream for a taste of what to expect, or read how to tune in for live radio and streaming options on the night.

How Winners Are Determined

Fan-Voted (Podcast of the Year): The night's top honor is the only category decided entirely by the public. Fan voting ran from January 14 through February 22, 2026, at iHeartPodcastAwards.com, with ten nominated shows competing.

Jury-Judged Category Awards: All other categories are determined by a panel of industry leaders, creatives, and visionaries assembled by iHeartMedia, evaluating shows across the full range of genres and formats.

Icon Awards: Three special recognition awards honor individuals for foundational, career-spanning contributions to podcasting. These are not subject to a public vote.

The Icon Awards

Presented separately from the genre categories, the three Icon Awards recognize people, not shows, who have shaped the podcasting medium itself:

Social Impact Award: Given to a creator who has used their platform to drive meaningful real-world change, particularly around mental health, wellness, or community.

Audible Audio Pioneer Award: Sponsored by Audible (Amazon). Honors a career-spanning contribution to audio journalism, interview craft, or narrative audio, recognizing someone who helped define what the medium can be.

Innovator Award: Recognizes a creator who introduced a genuinely new format, approach, or model to podcasting, someone who changed the way audiences and creators think about what podcasts can do.

Podcast of the Year: The Fan Vote

The top honor of the night is listener-determined. Ten nominated shows compete across a wide cross-section of genres, formats, and audience sizes. The winner is announced live at the ceremony on March 16.

The Categories

The 2026 iHeartPodcast Awards span 28 categories in total: 24 genre categories; three performance and format categories recognizing the craft of hosting, ensemble chemistry, and advertising integration; and Podcast of the Year, the night's top honor, decided entirely by public vote.

Best Business & Finance: entrepreneurship, investing, economics, and professional development

Best Comedy: originality, consistency, and comedic craft

Best Crime: true crime and criminal justice storytelling

Best Pop Culture: entertainment, celebrity, and cultural moments

Best Food: culinary storytelling and food journalism

Best Wellness & Fitness: health, fitness, mental wellness, and personal development

Best History: historical storytelling and archival journalism

Best Kids & Family: shows designed for children and family listening

Best Music: music journalism, criticism, and music-focused storytelling

Best News: current events and journalism-focused shows

Best Fiction: scripted fiction, audio drama, and narrative storytelling

Best Sports: sports commentary, analysis, and fan-facing coverage

Best Science: science communication and discovery storytelling

Best Technology: tech industry coverage and digital culture commentary

Best Political: political commentary, analysis, and civic affairs discussion

Best TV & Film: entertainment coverage focused on television and cinema

Best Spanish Language: Spanish-language podcasts across all genres

Best Advice/Inspirational: self-help, life coaching, and personal growth shows

Best Beauty & Fashion: style, beauty industry, and fashion culture coverage

Best Travel: travel storytelling and destination content

Best Religious & Spirituality: faith, spiritual practice, and religious community programming

Best Branded Podcast: original podcasts created by or for a brand as editorial content

Best International Podcast: English-language shows produced outside the United States

Best Emerging Podcast: new shows with standout debuts in 2025

Performance and Format Categories

Best Host: outstanding individual hosting craft including voice, preparation, and audience connection

Best Ensemble Cast: multi-host shows with exceptional chemistry and group dynamics

Best Ad Read: quality and effectiveness of in-show advertising as a creative skill; unique among podcast awards globally in recognizing ad reads as a form of performance in their own right



The Independent Podcast & Creator Awards (Indie PaC Awards)

What the Indie PaC Awards Are

The Indie PaC Awards are a brand-new ceremony making their inaugural appearance at SXSW 2026, created and organized by Oxford Road, one of the largest podcast advertising agencies. These awards exist for a specific and deliberate purpose: to recognize podcasters and creators who built their audiences entirely outside major network control, without platform exclusives, upfront guarantees, or corporate editorial oversight.

The awards are grounded in a clear premise: independent creators did not just survive without network backing, they thrived, and yet they have never had an awards show built specifically for them. The Indie PaC Awards are designed to change that.

Event Details

Date: Sunday, March 15, 2026

Time: 4:30 PM – 7:00 PM CDT

Venue: Skybox on 6th, Austin, Texas (Podcast Movement Evolutions at SXSW)

Organizer: Oxford Road (one of the largest podcast advertising agencies)

Founding Sponsor: Libsyn

Website: indiepac.com

Request Invitation: https://oxfordroad.regfox.com/indiepac

Eligibility: What Makes a Show “Independent”

Eligibility has precise structural criteria. A show must meet all of the following to qualify:

  • IP must be owned by the creator(s) or an independently held company, not a major media corporation.
  • The show must not be majority-owned by, or subject to editorial oversight from, Spotify, SiriusXM, iHeartMedia, Amazon, a major broadcast network, or any platform with an exclusive distribution deal.
  • The show must not receive a minimum guarantee or upfront payment from a major audio platform.
  • Projects that primarily exist as promotional vehicles for top-tier celebrities are excluded.

Of course, criteria on paper only go so far; a follow-up article will get into how these lines are drawn in practice.


The Four Award Types

The Jury Awards: Ten categories judged by an independent panel drawn from podcasting, journalism, and brand industries. Jurors evaluate nominees on craft, audience engagement, and overall quality.

• Best Independent Interview Show

• Best Independent Narrative/Storytelling

• Best Independent Comedy

• Best Independent Specialty Show

• Best Independent Business & Entrepreneurship

• Best Independent Health & Wellness

• Best Independent Sports & Recreation

• Best Independent News & Politics

• Breakthrough Independent

• Independent Creator of the Year

The ORBIT Influence Awards

ORBIT Influence Awards: Five categories determined entirely by Oxford Road's proprietary ORBIT benchmarking system, which measures real advertiser performance data, grounding results in measurable outcomes rather than taste or popularity.

Highest Impact

Highest Volume

Breakout Performer

The Perfect Score

Advertiser’s Choice

The Patron Awards

Patron Awards: Recognize brands and advertisers that have most significantly invested in independent creators, formally acknowledging the buy-side's role in sustaining independent podcasting.

Indie PaC Patron: Scale

Indie PaC Patron: Commitment

The Oxford Prize in Podcasting

A fourth award type honors a single creator who embodies the full potential of the independent podcasting medium. Mentions of it surface only in Oxford Road's official launch press release and the industry outlets that covered it; as of March 12, 2026, the award is entirely absent from the Indie PaC Awards website (indiepac.com).


SXSW Film & TV Awards

The SXSW Film & TV Awards offer one of the more direct paths to Oscar eligibility for short films among major festivals. Winning in the Best Animated Short, Best Narrative Short, or Best Documentary Short categories makes the film immediately eligible for Academy Award consideration. In addition, any qualifying British short film or animation that screens at SXSW (even without winning) becomes eligible for BAFTA nomination in the British Short Film category. This level of official industry recognition is uncommon for a festival that blends tech, culture, and film discovery.

The SXSW Film & TV Awards conclude the Film & TV Festival, running March 12 through 18, 2026, with the ceremony on closing night at the historic Paramount Theatre in Austin. A jury of critics, filmmakers, and industry professionals selects winners across every competitive section, focusing on fresh, often world-premiere work. Unlike the Oscars or Emmys, which honor widely released projects, SXSW celebrates films and shows at their debut moment.

Event Details

Date: Wednesday, March 18, 2026 (closing night of SXSW)

Time: 7:30 PM CT

Venue: The Paramount Theatre, Austin, Texas

Live Updates: Winners announced live on SXSW’s official Instagram, then posted to sxsw.com

Certification: All Audience Awards certified by Maxwell Locke & Ritter

Oscar Eligibility: Short film winners (Animated, Narrative, Documentary) become immediately eligible for Academy Award® consideration

BAFTA Eligibility: Any British short film or animation that screens (not just wins) becomes eligible for BAFTA nomination

Spirit Awards: All SXSW feature films are eligible for the Film Independent Spirit Award

The SXSW Film & TV Awards span a wide range of competitive sections, reflecting the festival's focus on fresh, innovative storytelling across formats. Here are just a few highlights from the 2026 lineup:

Feature Film Competition Categories

• Narrative Feature Competition: open to world premiere fiction feature films; recognizes emerging voices in narrative storytelling through a Grand Jury Prize and Special Jury Recognition

• Documentary Feature Competition: open to world premiere documentary feature films; carries its own separate Grand Jury Prize and Audience Award for real-world storytelling by emerging voices

Short Film Program: Jury Award Categories

• Narrative Short Competition: fiction short films showcasing exceptional storytelling; winner becomes eligible for Academy Award nomination

• Documentary Short Competition: nonfiction short films built on authentic real-world storytelling; winner becomes eligible for Academy Award nomination

• Animated Short Competition: hand-drawn, stop-motion, and digital animation; winner becomes eligible for Academy Award nomination; any British animated short that screens at SXSW (not just wins) becomes eligible for BAFTA nomination

• Midnight Short Competition: genre shorts in horror, gore, and dark comedy; one of SXSW's most distinctive competitive sections

• Texas Short Competition: short films with specific ties to Texas by filmmaker origin, subject matter, or production location

• Music Video Competition: 20 music videos in competition in 2026; one of the largest music video competitive programs at any film festival worldwide

• Independent TV Pilot Competition: open to world or North American premiere independent TV pilots

• XR Experience Competition: extended reality narrative experiences including VR, AR, and mixed reality

Film Design Awards

Very little information is currently available about the Film Design Awards for SXSW 2026, and the program appears to have been scaled back in recent years. The full historical format, which once included two separate jury-judged competitions (Poster Design and Title Design) with Adobe sponsorship, dedicated juries, and Audience Award tracks, is no longer fully detailed or branded as such on the official site.

Poster Design Competition

The Poster Design Competition remains active. It is open to poster art for any film screening at SXSW and is judged on craft, typography, imagery, and the ability to convey narrative.

Audience Awards

Every competitive section (except Special Events) is also eligible for a section-specific Audience Award, voted on by festivalgoers and certified by the independent accounting firm Maxwell Locke & Ritter.

This dual-track system of jury prizes and audience prizes means the same film can receive recognition from both industry professionals and general audiences, often with strikingly different outcomes.


SXSW Hall of Fame

The SXSW Hall of Fame induction is also happening during SXSW 2026 (March 12–18, Austin). This unique annual honor recognizes one trailblazer whose work has profoundly shaped the connected digital world, spanning tech, journalism, activism, immersive media, entrepreneurship, and more, with no industry boundaries.


Due to the volume of events, panels, and partnering organizations across SXSW, exact details on award categories, ticketing structures, and affiliated programming can be difficult to pin down. We reached out to SXSW to ask whether tickets could be purchased for award ceremonies individually, but did not receive a response at time of publication. For the most accurate and up-to-date information, we strongly recommend checking the official SXSW schedule and the individual event pages linked throughout this article before attending.

Details throughout this article are subject to change. All information is accurate as of March 12, 2026.


Best Speech to Text APIs to Build an AI Notetaker in 2026

2026-03-19 15:45:56

This comprehensive guide evaluates the top 8 speech-to-text APIs in 2026, comparing accuracy, pricing, and features to help developers choose the right Voice AI solution for their applications. We'll cover everything from real-time streaming capabilities to multilingual support, with detailed analysis of each provider's strengths for specific use cases like voice agents, meeting transcription, and contact center analytics.

Best speech to text API comparison table

The best speech-to-text APIs convert spoken audio into accurate written text through advanced AI models. These APIs handle everything from voice agents requiring instant responses to batch processing of hours-long recordings.

| API Provider | Accuracy (WER) | Real-time Streaming | Languages | Key Features | Starting Price | Best For |
|----|----|----|----|----|----|----|
| AssemblyAI | ~5.6% | ✓ WebSocket | Up to 99 (Universal-2) | Universal models, speaker diarization, sentiment analysis | $0.15/hour | AI notetakers, voice agents |
| Deepgram | 5-7% | ✓ WebSocket | 40+ | Nova-2 model, low latency | $0.0125/min | Real-time applications |
| OpenAI Whisper | 4-8% | ✗ | 99 | Whisper Large-v3, open source | $0.006/min | Batch transcription |
| Google Cloud | 6-10% | ✓ gRPC | 125+ | Chirp model, GCP integration | $0.016/min | Enterprise deployments |
| Microsoft Azure | 7-11% | ✓ WebSocket | 100+ | Custom models, Azure ecosystem | $0.015/min | Microsoft stack users |
| AWS Transcribe | 8-12% | ✓ WebSocket | 100+ | Medical models, AWS integration | $0.024/min | AWS-native applications |
| Gladia | 8-10% | ✓ WebSocket | 99 | Audio intelligence, translation | $0.61/hour | Multilingual content |
| Rev AI | 5-9% | ✓ WebSocket | 36 | Human-in-the-loop option | $0.02/min | English-focused apps |

Top 8 best speech to text APIs in 2026

1. AssemblyAI

AssemblyAI's Voice AI infrastructure platform delivers industry-leading accuracy through its Universal models. The platform combines breakthrough accuracy with developer-friendly implementation, making it the go-to choice for startups building AI notetakers and enterprises deploying voice agents at scale.

Customers consistently report their users immediately notice the quality difference when switching to AssemblyAI. This leads to higher satisfaction scores and fewer support tickets.

The Universal-3 Pro Streaming model handles everything from noisy phone calls to multi-speaker meetings with remarkable consistency. It processes audio in real-time while maintaining accuracy across diverse conditions.

Main features:

  • Universal-3 Pro model: Industry-leading accuracy across audio conditions
  • Real-time streaming: WebSocket transcription with sub-300ms latency
  • Advanced speech understanding: Sentiment analysis, entity detection, and summarization via the LLM Gateway
  • Speaker diarization: Supports up to 10 speakers by default, expandable to more with configuration
  • Reliability: 99.99% uptime SLA with unlimited concurrency

Ideal for:

  • Developers building AI notetakers and meeting assistants
  • Voice agents requiring real-time transcription
  • Contact center analytics and quality monitoring
  • Startups scaling from prototype to millions of hours

Pricing:

  • Pay-as-you-go starting at $0.15 per hour
  • No upfront commitments or contracts required
  • Volume discounts automatically applied
  • Free tier with $50 credit to start

2. Deepgram

Deepgram's Nova-2 model processes audio with minimal latency through an end-to-end deep learning architecture. The platform excels in real-time transcription scenarios where every millisecond counts.

Their streaming API maintains consistent performance even under heavy load. Accuracy can vary more than AssemblyAI's across different audio types, but speed remains their strongest advantage.

Main features:

  • Nova-2 model: Optimized for speed and efficiency
  • WebSocket streaming: Low latency real-time processing
  • Batch processing: Handles pre-recorded audio files
  • Custom model training: Available for specialized use cases
  • On-premise deployment: Options for data-sensitive environments

Ideal for:

  • Live captioning and broadcasting applications
  • Voice user interfaces requiring instant responses
  • Real-time translation services
  • High-volume batch processing workflows

Pricing:

  • Starting at $0.0125 per minute
  • Pay-as-you-go and growth plans available
  • Enterprise contracts with custom pricing

3. OpenAI Whisper

OpenAI's Whisper represents a breakthrough in open-source speech recognition, with the Large-v3 model supporting 99 languages through transformer architecture. While it doesn't offer real-time streaming, Whisper excels at batch transcription with impressive multilingual accuracy.

The API version through OpenAI provides convenient cloud processing without managing infrastructure. Many developers also self-host Whisper for complete control and cost optimization at scale.
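For illustration, here's a minimal self-hosting sketch using the open-source `openai-whisper` Python package; `large-v3` is a real Whisper checkpoint, while the file path is a placeholder:

```python
# Minimal self-hosted transcription with the open-source openai-whisper
# package (pip install openai-whisper; ffmpeg must be on the PATH).
import whisper

# "large-v3" maximizes accuracy; "base" or "small" run faster on CPU.
model = whisper.load_model("large-v3")

# transcribe() handles decoding, chunking, and language detection.
result = model.transcribe("meeting_recording.mp3")  # placeholder path

print(result["language"])  # auto-detected language code, e.g. "en"
print(result["text"])      # full transcript

# Segment-level timestamps support subtitle and chaptering workflows.
for seg in result["segments"]:
    print(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text']}")
```

Self-hosting trades the API's convenience for GPU provisioning and maintenance, which usually only pays off at sustained high volume.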

Main features:

  • Whisper Large-v3: Supports 99 languages with high accuracy
  • Automatic language detection: Identifies spoken language automatically
  • Translation capability: Converts speech to English text
  • Timestamp generation: Provides word-level timing information
  • Open-source availability: Free model for self-hosting

Ideal for:

  • Multilingual content transcription projects
  • Podcast and video subtitling workflows
  • Academic research requiring language diversity
  • Cost-sensitive batch processing applications

Pricing:

  • $0.006 per minute via OpenAI API
  • Free when self-hosted on your infrastructure

4. Google Cloud Speech-to-Text

Google Cloud Speech-to-Text with the Chirp model brings the company's vast AI research to developers through comprehensive Google Cloud Platform integration. The service handles 125+ languages and benefits from continuous improvements driven by Google's massive data resources.

Performance remains solid across use cases, though the complexity of GCP can overwhelm smaller teams. The platform shines when you're already invested in the Google Cloud ecosystem.

Main features:

  • Chirp universal speech model: Leverages Google's latest research
  • Extensive language support: 125+ languages and dialects
  • Real-time streaming: gRPC-based streaming transcription
  • Speaker diarization: Identifies up to 8 speakers
  • Automatic formatting: Punctuation and capitalization included

Ideal for:

  • GCP-native applications and workflows
  • Global enterprise deployments
  • Multi-language customer service centers
  • Video content analysis and indexing

Pricing:

  • $0.016 per minute for standard model
  • $0.024 per minute for enhanced features
  • Volume discounts available for large usage

5. Microsoft Azure Speech Services

Azure Speech Services integrates deeply with Microsoft's ecosystem, offering custom model training and comprehensive language coverage. The platform particularly excels for organizations already using Microsoft 365 or Azure services.

Custom speech models let you fine-tune recognition for industry-specific terminology. Real-time transcription works well, though latency typically runs higher than specialized providers.

Main features:

  • Custom speech models: Train models for specific vocabulary
  • Broad language support: 100+ languages and variants
  • Dual processing modes: Real-time and batch transcription
  • Teams integration: Built-in meeting transcription
  • Neural voice synthesis: Text-to-speech capabilities included

Ideal for:

  • Microsoft-centric organizations and workflows
  • Applications requiring custom vocabulary
  • Teams meeting transcription and analysis
  • Azure-native application development

Pricing:

  • $0.015 per minute for standard transcription
  • $0.024 per minute for custom models
  • Free tier includes 5 hours monthly

6. AWS Transcribe

AWS Transcribe provides reliable speech-to-text within Amazon's cloud infrastructure, with specialized models for medical and call center use cases. The service integrates seamlessly with other AWS services like S3 and Lambda.

While accuracy lags slightly behind leaders, AWS Transcribe offers solid performance for AWS-native applications. The medical transcription model understands clinical terminology particularly well.

Main features:

  • Specialized models: Medical and call center optimized
  • Custom vocabulary: Support for domain-specific terms
  • Real-time streaming: WebSocket-based live transcription
  • Content redaction: Automatic removal of sensitive information
  • Channel identification: Separates speakers in phone calls

Ideal for:

  • AWS-native architectures and deployments
  • Healthcare applications requiring medical accuracy
  • Call center analytics and monitoring
  • Compliance-focused enterprise deployments

Pricing:

  • $0.024 per minute for standard transcription
  • $0.039 per minute for medical model
  • Volume pricing tiers available

7. Gladia

Gladia focuses on audio intelligence beyond basic transcription, offering built-in translation and content analysis features. The platform processes 99 languages with emphasis on European language accuracy.

Their API combines multiple audio processing capabilities in one call. This makes Gladia efficient for applications needing transcription plus translation or sentiment analysis.

Main features:

  • Multilingual processing: 99 languages supported
  • Real-time translation: Convert speech across languages
  • Audio summarization: Generate content summaries
  • Emotion detection: Identify speaker sentiment and emotions
  • Topic classification: Categorize content automatically

Ideal for:

  • Multilingual content platforms and services
  • International meeting transcription
  • Content moderation systems
  • Cross-language communication tools

Pricing:

  • $0.61 per hour of audio processed
  • Pay-as-you-go pricing model
  • Enterprise plans with custom features

8. Rev AI

Rev AI combines automated speech recognition with optional human review, delivering high accuracy for English content. The platform started with human transcription services before adding AI capabilities.

Their English models perform exceptionally well on clear audio. The human-in-the-loop option provides near-perfect accuracy when needed, though at higher cost and longer turnaround.

Main features:

  • English optimization: Models tuned specifically for English
  • Human review option: Professional editors for perfect accuracy
  • Dual API modes: Async and streaming transcription
  • Custom vocabulary: Support for specialized terminology
  • Transcript formatting: Verbatim and clean output modes

Ideal for:

  • English-only applications and content
  • Legal and compliance documentation
  • Media production workflows
  • Applications requiring highest accuracy

Pricing:

  • $0.02 per minute for AI-only transcription
  • $1.50 per minute with human review
  • Volume discounts for large customers

What is a speech to text API?

A speech-to-text API is a cloud-based service that converts spoken audio into written text using AI models trained on millions of hours of speech data. These APIs process audio files or streams through acoustic models that recognize sound patterns and language models that predict likely word sequences.

The result comes back as structured JSON data with the transcript, timestamps, and confidence scores for each word. Modern speech-to-text APIs use transformer architectures and neural networks to achieve human-level accuracy.

Core components work together:

  • Acoustic model: Identifies phonemes and sound patterns in audio
  • Language model: Predicts word sequences based on context
  • Decoder: Combines both models to generate final transcript

They handle various audio formats and sample rates. You can process either pre-recorded files through REST APIs or live audio through WebSocket connections.
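As a concrete sketch of that flow, the snippet below submits an audio file by URL and polls for the finished transcript. The base URL, header, and JSON field names are hypothetical placeholders, not any particular provider's API; real providers document their own equivalents:

```python
# Typical batch transcription flow over REST (endpoint and fields are
# hypothetical placeholders; each provider defines its own schema).
import time
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.example-stt.com/v1"  # placeholder base URL

# 1. Submit the audio and create a transcription job.
job = requests.post(
    f"{BASE}/transcripts",
    headers={"Authorization": API_KEY},
    json={"audio_url": "https://example.com/call.wav"},  # placeholder
).json()

# 2. Poll until the job finishes (webhooks avoid polling in production).
while True:
    result = requests.get(
        f"{BASE}/transcripts/{job['id']}",
        headers={"Authorization": API_KEY},
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

# 3. Structured output: full text plus per-word timing and confidence.
print(result["text"])
for word in result.get("words", []):
    print(word["text"], word["start"], word["end"], word["confidence"])
```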

How to choose the best speech to text API

Selecting the right speech-to-text API depends on your specific technical requirements, accuracy needs, and budget constraints. Different use cases demand different strengths—a voice agent needs ultra-low latency while podcast transcription prioritizes accuracy over speed.

Accuracy and performance

Word error rate (WER) measures transcription accuracy by calculating the percentage of words transcribed incorrectly. Top APIs achieve under 10% WER on clear audio, but real-world performance depends heavily on audio quality, speaker accents, background noise, and domain-specific vocabulary.

Testing with your actual audio data reveals true accuracy better than published benchmarks. What works for one type of content might fail completely on another.

Key metrics to evaluate:

  • Word Error Rate (WER): Industry standard accuracy measurement (lower is better; a minimal implementation appears after this list)
  • Latency: Time from audio input to text output (critical for real-time use)
  • Real-time factor (RTF): Processing speed relative to audio length
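Because WER is just a word-level edit distance divided by the reference length, you can compute it yourself when benchmarking providers on your own audio. A minimal pure-Python sketch:

```python
# WER = (substitutions + deletions + insertions) / reference word count,
# computed with a standard Levenshtein alignment over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against six reference words -> WER of about 0.167.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```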

Language support and coverage

Global applications require APIs supporting multiple languages with consistent quality across each one. While some providers claim 100+ languages, actual performance varies significantly—many only deliver production-ready accuracy for major languages.

Consider whether you need just transcription or also features like punctuation, capitalization, and speaker diarization in each language. Some APIs excel at English but struggle with accented speech or less common languages.

Real-time vs batch processing

Real-time streaming transcription powers voice agents and live captioning by processing audio chunks as they arrive through WebSocket connections. Results typically arrive within 200-500ms, enabling immediate responses.

Batch processing handles pre-recorded files asynchronously, optimizing for accuracy over speed with support for larger files and longer processing windows. Choose streaming when users expect immediate responses, batch processing for podcasts or meeting recordings.
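The streaming pattern looks roughly like the sketch below: open a WebSocket, send small audio chunks as they arrive, and read partial and final transcripts off the same connection. The URL, query parameters, and message schema are hypothetical placeholders:

```python
# Real-time streaming sketch (URL and message schema are hypothetical;
# the send-chunks / receive-partials loop is the common pattern).
import asyncio
import json
import websockets  # pip install websockets

async def stream(pcm_chunks):
    # Placeholder endpoint; auth via query token keeps the sketch simple.
    url = "wss://api.example-stt.com/v1/stream?sample_rate=16000&token=KEY"
    async with websockets.connect(url) as ws:

        async def sender():
            for chunk in pcm_chunks:  # raw 16 kHz mono PCM frames
                await ws.send(chunk)
            await ws.send(json.dumps({"type": "end_of_stream"}))

        async def receiver():
            async for message in ws:
                msg = json.loads(message)
                # Partials refine as audio arrives; finals are stable.
                tag = "FINAL  " if msg.get("is_final") else "partial"
                print(tag, msg.get("text", ""))

        await asyncio.gather(sender(), receiver())

# asyncio.run(stream(microphone_chunks))  # hypothetical chunk source
```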

Pricing and total cost

Speech-to-text pricing typically follows per-minute or per-hour models, ranging from $0.006 to $0.024 per minute for standard transcription. Watch for hidden costs like minimum monthly commitments, overage charges, or separate fees for features like diarization.

Some providers charge extra for streaming, higher sample rates, or additional languages. Others include these features in their base pricing.

Cost optimization strategies (a quick comparison sketch follows this list):

  • Start with pay-as-you-go to understand usage patterns
  • Negotiate volume discounts once you exceed regular usage
  • Consider self-hosting open-source models at very high volumes
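Here's a back-of-envelope comparison at the list prices from the table at the top of this article; treat the numbers as illustrative, since real costs depend on volume discounts and feature surcharges:

```python
# Monthly cost at published list prices (illustrative, not quotes).
PER_MINUTE = {
    "OpenAI Whisper API": 0.006,
    "Deepgram":           0.0125,
    "Microsoft Azure":    0.015,
    "Google Cloud":       0.016,
    "Rev AI":             0.02,
    "AWS Transcribe":     0.024,
    "AssemblyAI":         0.15 / 60,  # quoted per hour
    "Gladia":             0.61 / 60,  # quoted per hour
}

hours_per_month = 500  # e.g. a mid-sized AI notetaker's audio volume
for provider, price in sorted(PER_MINUTE.items(), key=lambda kv: kv[1]):
    print(f"{provider:20s} ${price * 60 * hours_per_month:>9,.2f}/month")
```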

Developer experience and documentation

Comprehensive documentation with code examples in multiple languages dramatically reduces integration time. Look for providers offering SDKs in your programming language, clear error messages, and responsive support.

The best APIs include interactive playgrounds for testing and detailed guides for common use cases. Poor documentation can turn a technically superior API into a development nightmare.

Best speech to text APIs by use case

Different applications require different strengths from speech-to-text APIs. What works for batch transcription might fail completely for real-time voice agents.

Real-time transcription and voice agents

Voice agents demand sub-second latency with streaming transcription that processes audio chunks as users speak. AssemblyAI's Universal-3 Pro Streaming model and Deepgram's Nova-2 excel here, delivering partial transcripts with sub-300ms latency that let voice agents respond naturally.

These APIs handle interruptions, background noise, and varied speaking styles while maintaining conversation flow. Integration with LLMs requires careful orchestration—the speech-to-text API must quickly deliver accurate transcripts that the LLM processes before text-to-speech creates the response.

Every millisecond counts when building conversational AI that feels natural to users.

Meeting notes and AI notetakers

AI notetakers require accurate speaker diarization to identify who said what, plus strong performance on long-form content with multiple speakers talking over each other. AssemblyAI handles 16+ speakers while maintaining transcript quality, and supports generating meeting summaries and chapter-style outputs via the LLM Gateway.

These capabilities transform raw meeting audio into structured, actionable notes. The best meeting transcription APIs also offer summarization and action item extraction, providing immediate value beyond basic transcription.

Call centers and customer support

Contact centers need PII redaction to protect sensitive customer data, sentiment analysis to gauge satisfaction, and real-time agent assist capabilities. AssemblyAI automatically detects and redacts credit card numbers, social security numbers, and other sensitive information while maintaining transcript readability.

Sentiment analysis runs alongside transcription to flag frustrated customers for immediate attention. This helps supervisors intervene before situations escalate.

Essential compliance features:

  • PII redaction: Automatic removal of sensitive data
  • Data residency: Processing in specific geographic regions
  • Audit logs: Complete tracking of data access and processing

Multilingual applications

Global applications require consistent accuracy across languages, with some providers like Gladia and OpenAI Whisper supporting 99+ languages. Consider whether you need language detection, code-switching support for multilingual speakers, and translation capabilities.

Performance often varies dramatically between languages—test thoroughly with your target languages before committing. English typically receives the most optimization, while less common languages may have significantly higher error rates.

Getting started with speech to text APIs

Integration typically starts with signing up for an API key, which authenticates your requests to the service. Most providers offer free tiers or credits to test their APIs before committing to paid plans.

Your first API call usually involves sending a simple audio file and receiving back the transcript in JSON format. The response includes the text, word-level timestamps, and confidence scores for each recognized word.

Audio preparation best practices (a conversion sketch follows this list):

  • Sample rate: Use 16kHz or higher for optimal accuracy
  • Format: PCM WAV or FLAC preserves quality better than MP3
  • Channels: Mono audio often performs better than stereo
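A small conversion sketch using `pydub`, which shells out to ffmpeg for compressed formats; file names are placeholders:

```python
# Normalize any input to 16 kHz mono WAV before upload, using pydub
# (pip install pydub; ffmpeg is required for non-WAV inputs).
from pydub import AudioSegment

audio = AudioSegment.from_file("raw_meeting.m4a")    # placeholder path
audio = audio.set_frame_rate(16000).set_channels(1)  # 16 kHz, mono
audio.export("prepared_meeting.wav", format="wav")
```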

For production deployments, implement proper error handling with exponential backoff for rate limits and network issues. Monitor your usage through provider dashboards to track costs and identify optimization opportunities.
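One minimal way to implement that retry policy, assuming a provider that signals rate limits with HTTP 429:

```python
# Retry wrapper with exponential backoff and jitter for rate limits
# (HTTP 429) and transient 5xx/network errors; tune limits to taste.
import random
import time
import requests

def request_with_backoff(method, url, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            resp = requests.request(method, url, timeout=30, **kwargs)
            if resp.status_code != 429 and resp.status_code < 500:
                return resp  # success or a non-retryable client error
        except requests.RequestException:
            pass  # network hiccup; fall through to the backoff sleep
        # 1s, 2s, 4s, 8s ... plus jitter so retries don't synchronize
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"{method} {url} failed after {max_retries} retries")
```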

Set up webhooks for async processing to avoid polling for results. This reduces server load and provides faster notifications when transcription completes.
