RSS preview of Blog of HackerNoon

Rust 1.76.0: What Changes Did It Bring In?

2026-04-05 03:00:39

The Rust team is happy to announce a new version of Rust, 1.76.0. Rust is a programming language empowering everyone to build reliable and efficient software.

If you have a previous version of Rust installed via rustup, you can get 1.76.0 with:

$ rustup update stable

If you don't have it already, you can get rustup from the appropriate page on our website, and check out the detailed release notes for 1.76.0.

If you'd like to help us out by testing future releases, you might consider updating locally to use the beta channel (rustup default beta) or the nightly channel (rustup default nightly). Please report any bugs you might come across!

What's in 1.76.0 stable

This release is relatively minor, but as always, even incremental improvements lead to a greater whole. A few of those changes are highlighted in this post, and others may yet fill more niche needs.

ABI compatibility updates

A new ABI Compatibility section in the function pointer documentation describes what it means for function signatures to be ABI-compatible. A large part of that is the compatibility of argument types and return types, with a list of those that are currently considered compatible in Rust. For the most part, this documentation is not adding any new guarantees, only describing the existing state of compatibility.

The one new addition is that it is now guaranteed that char and u32 are ABI compatible. They have always had the same size and alignment, but now they are considered equivalent even in function call ABI, consistent with the documentation above.

Type names from references

For debugging purposes, any::type_name::<T>() has been available since Rust 1.38 to return a string description of the type T, but that requires an explicit type parameter. It is not always easy to specify that type, especially for unnameable types like closures or for opaque return types. The new any::type_name_of_val(&T) offers a way to get a descriptive name from any reference to a type.


fn get_iter() -> impl Iterator<Item = i32> {
    [1, 2, 3].into_iter()
}

fn main() {
    let iter = get_iter();
    let iter_name = std::any::type_name_of_val(&iter);
    let sum: i32 = iter.sum();
    println!("The sum of the `{iter_name}` is {sum}.");
}

This currently prints:

The sum of the `core::array::iter::IntoIter<i32, 3>` is 6.

Stabilized APIs

  • Arc::unwrap_or_clone
  • Rc::unwrap_or_clone
  • Result::inspect
  • Result::inspect_err
  • Option::inspect
  • type_name_of_val
  • std::hash::{DefaultHasher, RandomState} (previously available only through std::collections::hash_map)
  • ptr::{from_ref, from_mut}
  • ptr::addr_eq

Other changes

Check out everything that changed in Rust, Cargo, and Clippy.

Contributors to 1.76.0

Many people came together to create Rust 1.76.0. We couldn't have done it without all of you. Thanks!


The Rust Release Team

Also published here

Photo by Jeremy Bishop on Unsplash

Your PyTorch Model Is Slower Than You Think: This Is the Reason Why

2026-04-05 01:00:36

Tested on: RTX 5060 · PyTorch 2.7 · CUDA 13.1 · Windows 11


You moved your model to the GPU. You watched nvidia-smi climb toward 100%. You assumed you were done.

You probably aren’t.

GPU utilization is a coarse, 100ms-sampled metric. A GPU can report 80% utilization while spending most of that time idle between kernels, starved by a DataLoader that can’t keep up, or stalled waiting for your Python code to read a loss value.

We’ll cover three categories of hidden bottlenecks I measured on a real RTX 5060 training loop. None of them is in your model architecture. All of them are fixable in minutes. And the numbers will probably surprise you, both in where the speedup is large and where it isn’t.


The Mental Model You Need First

Before the benchmarks, one concept: the CPU and GPU are two separate workers running in parallel.

When you call loss.backward(), PyTorch doesn’t wait for the GPU to finish. It queues work onto the CUDA stream and returns immediately. The CPU races ahead to the next line of Python while the GPU drains its work queue independently.

CPU:  [queue forward] [queue backward] [queue optimizer] [queue forward] ...
GPU:                  [  forward  ][   backward   ][ optimizer ][  forward  ] ...

This asynchrony is why GPUs are fast. The CPU is always preparing the next batch of work while the GPU executes the current one.

A synchronization point is anything that breaks this pipeline, forcing the CPU to stop and wait until the GPU finishes all pending work. The GPU goes idle. The CPU goes idle. Then they both start again from scratch.

This is the bubble. It’s invisible unless you’re looking for it.
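You can observe the queueing behavior directly with a quick experiment of my own devising (it needs a CUDA device to show a gap; on CPU the same ops run synchronously, so the two timings converge):

```python
import time
import torch

def submit_and_drain(device: str, n: int = 10) -> tuple[float, float]:
    """Time how long the CPU takes to *submit* n matmuls vs. how long
    it takes for the work to actually *finish* on the device."""
    x = torch.randn(512, 512, device=device)
    t0 = time.perf_counter()
    for _ in range(n):
        x = x @ x                    # on CUDA: queued, returns immediately
    submitted = time.perf_counter() - t0
    if device == "cuda":
        torch.cuda.synchronize()     # wait for the stream to drain
    finished = time.perf_counter() - t0
    return submitted, finished

dev = "cuda" if torch.cuda.is_available() else "cpu"
submitted, finished = submit_and_drain(dev)
assert submitted <= finished  # on CUDA, submission is far cheaper than execution
```

On a GPU, `submitted` is typically a tiny fraction of `finished`; that gap is the work queue in action.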


Bottleneck 1: CPU → GPU Sync Points

The .item() Tax: Less Than You’d Expect

The most commonly cited sync point is .item(), which pulls a scalar value from the GPU to Python. Every tutorial warns about it. Most of the warnings are overstated.

Here’s what it actually costs on a compute-heavy model:

# Version A: .item() every step
running_loss += loss.item()   # sync on every iteration

# Version B: accumulate on GPU, read once
running_loss += loss.detach() # stays on GPU
total = running_loss.item()   # one sync at the end

Results (RTX 5060, 1024→2048→10 MLP, batch 256):

| | ms/step |
|----|----|
| .item() every step | 2.33ms |
| deferred .item() | 2.26ms |
| Speedup | 1.03x |

3% faster. On this model, not worth losing sleep over.

Why? The GPU is doing ~2ms of real computation per step. The sync overhead (~0.1ms) is small relative to that. By the time Python calls .item(), the GPU has often already finished. There’s nothing to wait for.

The honest answer: a single .item() per step barely matters on modern hardware when your GPU kernels take several milliseconds.

The Logging Anti-Pattern and Where It Actually Hurts

Now here’s the version that actually bites people. A typical training loop with naive logging:

# What "just add some logging" looks like in practice
for step, (x, y) in enumerate(loader):
    optimizer.zero_grad()
    logits = model(x)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()

    # Each of these is a separate sync point:
    log("loss", loss.item())                                    # sync 1
    log("accuracy", (logits.argmax(1) == y).float().mean().item())  # sync 2
    log("confidence", logits.max(dim=1).values.mean().item())   # sync 3
    log("logit_var", logits.var().item())                       # sync 4
    for p in model.parameters():
        log("grad_norm", p.grad.norm().item())                  # sync 5..N

Every .item() call is a full GPU stall. Six metrics logged naively means six sync points per step. Here’s what that looks like in the profiler:

One complete train_sync_heavy step (~2.7ms) on the CPU training thread. The brown aten::item bars and the wide magenta aten::local_scalar_dense block (spanning roughly 60% of the step) are CPU stalls; every call forces the CPU to halt until the GPU drains its queue. There are 13 aten::item events per step, arriving in ~6 distinct synchronization clusters. The dominant stall at the right edge of the step is a single ~1.6ms block where the CPU is doing nothing but waiting.

The fix: keep everything on GPU until you’re done with the step, then move it all to CPU in a single operation.

# Compute all metrics as GPU tensors — no syncs yet
loss_t  = loss.detach()
acc_t   = (logits.detach().argmax(1) == y).float().mean()
conf_t  = logits.detach().max(dim=1).values.mean()
var_t   = logits.detach().var()
gnorm_t = torch.stack([
    p.grad.norm()
    for p in model.parameters()
    if p.grad is not None
]).mean()

# Single sync: ship all scalars to CPU at once
loss_v, acc_v, conf_v, var_v, gnorm_v = (
    torch.stack([loss_t, acc_t, conf_t, var_t, gnorm_t]).tolist()
)

Here’s the same step after the fix, at the same zoom level:

One complete train_sync_clean step at an identical zoom. The 12 aten::item calls that were stalling the CPU now complete in 1–3 µs each; the GPU had already finished those ops asynchronously, so there was nothing to wait for. The single remaining aten::local_scalar_dense block at the far right is the one intentional sync: the final .item() call that moves the accumulated loss to Python. The step is the same duration, but the GPU was busy the whole time instead of repeatedly going idle.

Results (same model, same hardware):

| | ms/step |
|----|----|
| Naive logging (N syncs/step) | 3.06ms |
| Batched logging (1 sync/step) | 2.40ms |
| Speedup | 1.28x |

The naive version is 27% slower, just from how you read your metrics.

Compounded over a long training run, that’s the difference between a 2.5-hour job and a 3.2-hour job, for code that produces identical results.

The Two Culprits You Won’t See in Your Own Code

W&B and TensorBoard. Both call .item() internally when you pass a tensor to their logging APIs. If you’re calling wandb.log({"loss": loss}) inside your training loop, you have a sync point on every step. Pass a Python float instead: wandb.log({"loss": loss.item()}). The sync still happens, but now it’s your explicit choice, and you can batch it.

Conditional branches on tensor values. This one is subtle:

if loss > threshold:          # forces .item() implicitly — Python must
    trigger_early_stop()      # know the value to evaluate the condition

Use torch.where or move the threshold logic to a scheduled check every N steps instead.
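The scheduled-check variant can look like the sketch below (my own illustration; CHECK_EVERY and threshold are hypothetical values, and `loss` stands in for your GPU loss tensor):

```python
import torch

CHECK_EVERY = 100   # hypothetical: how often we tolerate one explicit sync
threshold = 1.0     # hypothetical early-stop threshold

def should_stop(loss: torch.Tensor, step: int) -> bool:
    """Sync at most once every CHECK_EVERY steps instead of every step."""
    if step % CHECK_EVERY != 0:
        return False                     # no .item(), no sync
    return loss.item() > threshold       # one explicit, intentional sync

# Simulate 200 steps with a loss that is above the threshold
stops = [should_stop(torch.tensor(1.5), s) for s in range(1, 201)]
assert sum(stops) == 2  # the condition was only evaluated at steps 100 and 200
```

The same loop with a bare `if loss > threshold:` would have synced 200 times.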

How to Find Sync Points in Your Own Code

Run your training loop under torch.profiler with with_stack=True:

from torch.profiler import profile, ProfilerActivity, schedule, tensorboard_trace_handler

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=2, active=10),
    on_trace_ready=tensorboard_trace_handler("./traces"),
    with_stack=True,
) as prof:
    for step in range(13):
        train_step()
        prof.step()

Open the trace in Perfetto UI. Look for cudaStreamSynchronize events on the CPU thread. Each one is a sync point. The with_stack=True flag tells you exactly which Python line triggered it.


Bottleneck 2: DataLoader Stalls

This is the one most likely to be destroying your throughput right now.

What Starvation Looks Like

The DataLoader and the GPU training loop are a producer-consumer pipeline. The DataLoader produces batches; the GPU consumes them. When the producer is slower than the consumer, the GPU sits idle at the start of every step, waiting for data.

Open any profiler trace on a starved DataLoader, and you’ll see it immediately: a long gap at the beginning of each training step, before a single GPU kernel has fired. The CPU is in DataLoader.__next__, doing PIL decodes and transforms in the main process, while the GPU is doing nothing.

The fix requires exactly two DataLoader arguments.

The num_workers Sweep

DataLoader(dataset, batch_size=128, num_workers=N, pin_memory=True)

I measured throughput across 5 configs on a dataset with heavy image transforms (random crop, color jitter, normalize) at 224×224:

| num_workers | pin_memory | samples/sec | speedup |
|----|----|----|----|
| 0 | False | 505 | 1.0x |
| 2 | False | 886 | 1.75x |
| 2 | True | 957 | 1.9x |
| 4 | True | 1,619 | 3.2x |
| 8 | True | 2,281 | 4.52x |

4.52x throughput improvement. Two arguments. The model, optimizer, and loss function are identical. The only change is how data gets to the GPU.

What These Arguments Actually Do

num_workers=N spawns N worker processes that prefetch and transform batches in parallel. While the GPU is training on batch K, workers are already preparing batches K+1, K+2, … K+N. The GPU never waits.

num_workers=0 means the main process does all of this serially: fetch, transform, train, fetch, transform, train. The GPU is idle during every fetch+transform phase.

A reasonable starting value is num_workers = min(os.cpu_count(), 8). The throughput curve flattens or dips past a certain point (usually when worker processes start competing for memory bandwidth), so sweep a few values and pick the knee.

pin_memory=True allocates host tensors in page-locked memory. This lets the CUDA DMA engine transfer data to the GPU without CPU involvement, and, critically, allows that transfer to overlap with GPU compute on the previous batch. Without pinned memory, host→device transfers block on pageable memory and can’t be pipelined.

pin_memory=True matters most when num_workers > 0: with a serial loader there is no prefetch pipeline for the pinned transfer to overlap with, so on its own the flag buys you little.

Windows-Specific Gotcha

On Windows, DataLoader workers use the spawn start method (not fork like Linux/macOS). This means:

  1. Always wrap your training code in if __name__ == "__main__":. Without it, worker processes re-import your script, hit the training code again, try to spawn more workers, and crash or silently fall back to num_workers=0.
  2. Worker startup overhead is higher on Windows than on Linux. If you’re running short experiments (a few batches per epoch), use persistent_workers=True to keep workers alive between epochs rather than paying the spawn cost every epoch.

One More Option: persistent_workers=True

For workflows with many short epochs (hyperparameter sweeps, few-shot learning, anything where epochs are brief), DataLoader workers are created and destroyed every epoch by default. On Windows with spawn, this has non-trivial overhead.

DataLoader(dataset, num_workers=4, pin_memory=True, persistent_workers=True)

Workers stay alive between epochs. The prefetch queue stays warm. The first batch of each epoch arrives immediately instead of waiting for worker initialization.


Bottleneck 3: Kernel Launch Overhead

What “Small Kernels” Means

Every CUDA operation (a matrix multiply, an elementwise add, a layer norm) is a kernel: a program that runs on the GPU. Launching a kernel has a fixed CPU-side cost of roughly 5–20 microseconds, regardless of how much work the kernel does.

For a large matrix multiply that takes 5ms to execute, 20μs of launch overhead is noise. For an x = x + shift on a small tensor that takes 50μs to execute, 20μs of launch overhead is 40% of the kernel’s own execution time.
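The arithmetic, spelled out as a tiny helper (pure Python, no CUDA required; the 20 µs figure is the upper end of the range quoted above):

```python
LAUNCH_US = 20.0  # assumed fixed CPU-side launch cost, upper end of 5-20 us

def overhead_ratio(kernel_us: float) -> float:
    """Launch overhead relative to the kernel's own execution time."""
    return LAUNCH_US / kernel_us

assert overhead_ratio(5_000) == 0.004  # 5 ms matmul: 0.4%, pure noise
assert overhead_ratio(50) == 0.4       # 50 us elementwise op: 40%
```

The ratio, not the absolute cost, is what decides whether launch overhead is worth chasing.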

A custom activation function written as sequential PyTorch ops, each line a separate kernel, stacks this overhead for every op, every layer, every step:

def forward(self, x):
    x = x * self.scale        # kernel 1
    x = x + self.shift        # kernel 2
    x = x - x.mean(...)       # kernels 3-4
    std = x.var(...).sqrt()   # kernels 5-7
    x = x / std               # kernel 8
    x = x * 0.5 * (1.0 + torch.tanh(...))  # kernels 9-15
    x = x.clamp(-10.0, 10.0)  # kernel 16
    return x

That’s 16 kernel launches per block, per layer, per step.

How much does it actually cost?

Here, I’ll be honest with you: on a training-scale workload, probably not that much.

I benchmarked the above fragmented model (8 layers, batch 128, sequence length 64, dim 256) against torch.compile with the cudagraphs backend, which captures the entire kernel sequence and replays it as a single cudaGraphLaunch:

| | ms/step |
|----|----|
| Eager (N kernel launches/step) | 14.63ms |
| cudagraphs (1 launch/step) | 13.83ms |
| Overhead | ~5% |

5%. On this model, GPU arithmetic dominates. The ~0.8ms of launch overhead is real but not catastrophic.

The picture changes significantly in two scenarios:

1. Inference with small batches. At batch size 1 for real-time inference, GPU kernels may complete in tens of microseconds. Launch overhead becomes a large fraction of total latency. This is where torch.compile routinely shows 2–4x speedups in the PyTorch benchmarks.

2. Many custom elementwise ops on small tensors. If you’ve written a custom loss function, regularizer, or activation with many sequential ops on small feature maps, the launch overhead compounds. The fix isn’t necessarily torch.compile: first check whether a fused implementation already exists in the ecosystem (Flash Attention, torch.nn.functional.scaled_dot_product_attention).

torch.compile on Windows

The default torch.compile backend (inductor) requires Triton, which has no official Windows support as of PyTorch 2.7. Use the cudagraphs backend instead:

model = torch.compile(model, backend="cudagraphs")

cudagraphs requires static input shapes: your batch size and sequence length must be fixed across steps. If you have variable-length sequences, pad to a fixed length or use torch.compile(model, dynamic=True) with the inductor backend on Linux.
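A hedged sketch of the padding approach (MAX_LEN and the helper name are my own, model-specific assumptions):

```python
import torch
import torch.nn.functional as F

MAX_LEN = 64  # assumed fixed capture length for the cudagraphs backend

def pad_to_fixed(batch: torch.Tensor) -> torch.Tensor:
    """Right-pad a (batch, seq) tensor with zeros to MAX_LEN columns,
    truncating anything longer, so input shapes never change."""
    pad = MAX_LEN - batch.shape[1]
    return F.pad(batch, (0, pad)) if pad > 0 else batch[:, :MAX_LEN]

assert pad_to_fixed(torch.ones(4, 50, dtype=torch.long)).shape == (4, 64)
assert pad_to_fixed(torch.ones(4, 80, dtype=torch.long)).shape == (4, 64)
```

With every batch arriving as (batch, 64), the captured graph can be replayed unchanged.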

One critical benchmarking note: the first several iterations of a compiled model are graph capture, not inference. They will be 10–100x slower than steady state. Always warm up for at least 10–15 steps before measuring, and never include iteration 1 in your numbers.

import time
import torch

# `run_step` is your own compiled training/inference step (placeholder)

# Wrong: first iter is graph capture, not representative
t0 = time.perf_counter()
for i in range(100):
    run_step()
print((time.perf_counter() - t0) / 100)

# Right: warm up first
for _ in range(15):
    run_step()               # graph capture happens here
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    run_step()               # now measuring steady-state
torch.cuda.synchronize()
print((time.perf_counter() - t0) / 100)

Putting it Together: How to Actually Profile Your Own Code

The benchmark scripts for everything in this article are in the companion repo. But your model isn’t the same as mine. Here’s how to find your bottleneck.

Step 1: Check GPU Utilization

nvidia-smi dmon -s u -d 1

If utilization is consistently above 85%, your GPU is not the bottleneck. Go look at your CPU code. If it’s low, continue.

Step 2: Profile One Training Step

from torch.profiler import profile, ProfilerActivity, schedule, tensorboard_trace_handler

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(skip_first=5, wait=1, warmup=2, active=5),
    on_trace_ready=tensorboard_trace_handler("./my_trace"),
    record_shapes=True,
    with_stack=True,
) as prof:
    for step in range(13):
        train_step(batch)
        prof.step()

The skip_first=5 skips early iterations where JIT compilation and DataLoader warmup pollute the trace. Always skip.

Step 3: Read the Trace

Open ./my_trace in Perfetto UI.

Look for these three patterns in order:

Gap at the start of each step, before any GPU kernel fires? → DataLoader starvation. Increase num_workers, add pin_memory=True.

cudaStreamSynchronize events mid-step on the CPU thread? → Sync points. Find the Python call (visible with with_stack=True) and defer it.

GPU busy, but many thin kernel slivers with gaps between them? → Kernel launch overhead. Try torch.compile. Check if a fused op exists for your bottleneck operation.

Fix them in that order. DataLoader starvation is almost always the biggest win and takes 30 seconds to fix. Sync points are next. Kernel launch overhead is usually last and often small.

The one benchmarking rule you must follow

Always call torch.cuda.synchronize() before stopping your timer. Without it, you’re measuring how fast the CPU submitted work, not how fast the GPU executed it. The CPU is fast. The GPU timer is what you actually care about.

import time
import torch

# Wrong: measures CPU submission time
t0 = time.perf_counter()
run_step()
print(time.perf_counter() - t0)   # suspiciously fast

# Right: waits for GPU to finish
torch.cuda.synchronize()
t0 = time.perf_counter()
run_step()
torch.cuda.synchronize()          # ensures GPU is done before stopping timer
print(time.perf_counter() - t0)

Summary

| Bottleneck | How to detect | Realistic speedup | Fix |
|----|----|----|----|
| DataLoader starvation | Long gap at the start of each step in the profiler | 4.5x on image workloads | num_workers=N, pin_memory=True |
| Logging syncs | N × cudaStreamSynchronize per step | 1.3x (27% savings) | Batch .item() calls; one sync per step |
| Single .item() per step | 1 × cudaStreamSynchronize per step | ~1.03x (marginal) | Defer to the end of the epoch if loss tracking allows |
| Kernel launch overhead (training) | Dense thin kernels in the GPU timeline | ~1.06x (~5%) | torch.compile(backend="cudagraphs") |
| Kernel launch overhead (inference) | High launch/execute ratio | 2–4x possible | torch.compile, fused ops |

The most important takeaway isn’t the numbers; it’s the methodology. GPU utilization is not a profiler. The profiler is a profiler. Run it, look at the gaps, fix the biggest one. Then repeat.

The second most important takeaway: measure what you think you’re measuring. The CPU is asynchronous. Your timer is almost certainly lying to you unless you’re calling torch.cuda.synchronize().


All numbers from a single RTX 5060 on Windows 11, PyTorch 2.7, CUDA 13.1. Your results will differ by GPU, workload, and system, which is exactly why you should run the profiler yourself rather than trusting anyone else’s benchmarks.


Librarians vs "Data Cartels": What's Going On?

2026-04-05 01:00:26

Hi all, Tara here, The Markup’s education reporter. You may have read my story about how college students are tracked by one system or another near constantly as they study and move around campus. I focused on a community college in Southern California and a student there who is particularly troubled by the digital privacy sacrifices required to pursue a degree. One part of campus I barely touched on was the library, and that’s why I’m back in your inboxes today. 

Libraries have long been bastions of privacy. In fact, the American Library Association first put a right to privacy into its Bill of Rights in 1939. Librarians have since stood up time and again, refusing to keep records of what people borrow and thwarting occasional government interest in obtaining that sort of information.

Yet while privacy is at the heart of many librarians’ work, it is becoming increasingly difficult to guarantee.

At a university today, someone doing research through a library is just as likely to access materials in digital as in physical form. And, Markup readers, you know that means new avenues for tracking.

Eliza Bettinger, a librarian at Cornell University, is among those with concerns.

“For intellectual freedom, you have to have some degree of privacy to explore ideas,” Bettinger told me. She is a member of the Library Freedom Project, a coalition of librarians focused on patron privacy in an era of digital surveillance.

At Cornell, international students have asked Bettinger how they can keep their home governments from finding out what they’re reading on campus. Lucky for them, and any other students worried that their interests could be exposed, circulation records aren’t saved or shared at Cornell. This means students can check out physical books and keep their browsing habits private. They can also browse the web and many library databases from library computers without needing to sign in. The university has even taken steps to preserve their privacy when they log in remotely to access digital library resources like academic journals and databases, a harder task on the technical side.

Bettinger and two co-authors described why and how to follow Cornell’s lead in a recent paper for the Licensing Privacy Project, which advocates for securing library patrons’ digital privacy through contract negotiations with vendors. Most universities don’t. They share personally identifiable data like usernames and email addresses with companies that sell databases of academic journals and other library resources, one of the practices Bettinger and her co-authors say cedes ground to “corporate data cartels whose interest is less to support rich discovery experiences than to enclose the research process within siloed and surveilled profit-making systems.”

Sounds extreme, right? Well, companies that got their start as academic publishers have spent the last several decades transforming themselves into hugely profitable data analytics firms. Elsevier, for example, owns ScienceDirect, the premier database of peer-reviewed journals. It is a subsidiary of RELX, a global data broker that also owns LexisNexis and collects massive databases of personal information it then sells to government agencies and businesses to use in algorithms for things like policing, insurance, and banking.

There are concerns the company could add library data directly to that mix. (If you want to go further down this rabbit hole, check out Sarah Lamdan’s book, Data Cartels, which delves deeply into the web of surveillance and data analytics at the heart of both RELX’s and Thomson Reuters’ business models.)

Like the Licensing Privacy Project, the open-access advocacy group SPARC has also focused on contracts as a source of leverage. A paper SPARC commissioned analyzing contracts with Elsevier noted that “user tracking that would be unthinkable in a physical library setting now happens routinely through publisher platforms.”

Nick Shockey, SPARC’s director of programs and engagement, told me this sort of tracking has been allowed to spread because higher ed administrators haven’t shifted to see publishing companies like Elsevier for what they are now: tech companies.

“With the largest tech companies, there’s an increased level of awareness of privacy risks,” Shockey said. But academic publishers don’t yet get the skepticism they deserve. SPARC is among those trying to set the record straight.

Stay tuned for more from me on privacy and surveillance in the education space. And reach out if you have any tips!



Also published here

Photo by Iñaki del Olmo on Unsplash

The HackerNoon Newsletter: Want to Have Successful OpenTelemetry Projects? Implement This One Tip (4/4/2026)

2026-04-05 00:03:09

How are you, hacker?


🪐 What’s happening in tech today, April 4, 2026?


The HackerNoon Newsletter brings the HackerNoon homepage straight to your inbox. On this day: Martin Luther King Jr. was assassinated in 1968, Microsoft was founded in 1975, and Apollo 6 launched in 1968. And we present you with these top quality stories.

Want to Have Successful OpenTelemetry Projects? Implement This One Tip


By @nfrankel [ 4 Min read ] In this post, I want to tackle a real-world use case and describe which tools you can leverage to reduce the necessary changes. Read More.


🧑‍💻 What happened in your world this week?

It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We've got you covered ⬇️⬇️⬇️


ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME


We hope you enjoy this wealth of free reading material. Feel free to forward this email to a nerdy friend who'll love you for it. See you on Planet Internet! With love, The HackerNoon Team ✌️


Players Are Not Quitting Games - They’re Quitting Bad Experiences

2026-04-05 00:00:38

The gaming industry has a favorite excuse.

When players leave, studios blame short attention spans. They blame crowded markets. They blame modern habits, social media, burnout, or the idea that players simply do not stay loyal anymore.

That explanation sounds clever, but it protects the industry from asking a harder question.

What if players are not quitting because they care less?

What if they are quitting because too many games have become frustrating, bloated, manipulative, and exhausting to play?

That is the part gaming still struggles to admit.

Because it is easier to blame the audience than to admit that the experience got worse.

The truth is, people still care deeply about games. They still wait for launches, follow updates, watch creators, and spend hundreds of hours on titles that feel rewarding. They still form communities, build routines, and attach real emotion to games that respect them.

Passion is not gone.

Tolerance is.

Players are not quitting games because gaming matters less. They are quitting because too many games now demand more patience than they deserve.

Friction Is Not the Same as Depth

Somewhere along the way, many studios started confusing inconvenience with good design.

A cluttered progression system became “engagement.”
A heavy grind became “retention.”
Too many currencies became “economic depth.”
Constant pressure became “live-service energy.”

Then the industry acted surprised when players left.

A lot of modern games do not fail because they lack content. They fail because they put too much friction between the player and the fun.

Open the game.
Install the update.
Close the pop-up.
Check the event tab.
Claim the reward.
See the bundle offer.
Read the battle pass screen.
Sort through daily tasks.

Only then do you actually get to play.

That is not exciting. It is exhausting.

Many games now feel less like entertainment and more like systems that constantly ask for attention. They do not pull players in with joy. They hold them in place with pressure.

Players feel that immediately.

Bigger Games Are Not Automatically Better

The industry has spent years chasing scale.

Bigger worlds. Bigger maps. Bigger content plans. Bigger seasonal roadmaps. Bigger promises.

But bigger is not always better. Sometimes bigger just means heavier.

A lot of games today are not fun in proportion to their size. They are tiring in proportion to their size. They overwhelm players with menus, systems, currencies, events, and progression loops, then wonder why excitement fades so quickly.

This is one of the biggest problems in gaming right now: too many games are designed to look massive before they are designed to feel good.

The trailer looks polished.
The feature list looks impressive.
The roadmap looks ambitious.

But the experience feels crowded, unstable, and strangely joyless.

Players notice that fast. They do not need months to decide whether a game respects their time. They can feel it in a single weekend, sometimes in a single session.

And when they feel it, they leave.

Not because they are weak. Not because they are impatient. Not because they got distracted.

They leave because the product is not delivering enough fun for the amount of friction it demands.

Live Service Turned Play Into Obligation

Live-service design changed gaming in powerful ways. It made games more dynamic, more social, and more profitable. It also pushed many games into dangerous territory, where they stopped feeling like entertainment and started feeling like maintenance.

That shift matters.

A game used to ask, “Do you want to spend time here?”

Now, many games ask, “Can you keep up?”

Log in today or miss the reward.
Finish the pass before the season ends.
Play the event before it rotates out.
Buy now before the bundle disappears.
Stay active or fall behind.

This pressure is not always obvious, but players still feel it. It changes the emotional texture of the experience. Even when the gameplay is solid, the surrounding structure can make the game feel heavy.

That is why so many players do not rage-quit anymore. They fatigue-quit.

They do not uninstall in anger. They just stop opening the game. They tell themselves they will return later. They never do.

That quiet exit is more damaging than outrage. Anger means people still care. Silence usually means the game has become a chore.

And people already have enough chores.

Grinding Is Fine. Meaningless Grinding Is Not

Players have never had a problem with effort.

\ They will grind for gear, practice mechanics, study systems, replay matches, and spend hours improving if the effort feels meaningful. Difficulty is not the issue. Time investment is not the issue.

\ The problem is emptiness.

\ If grinding leads to mastery, progress, status, or identity, players embrace it. If it feels like an artificial delay designed to stretch playtime, the illusion breaks quickly.

\ That is where many games fail. They treat player time like something to extract rather than something to respect. They inflate progress instead of deepening it. They add chores instead of meaningful loops.

\ Players can tell the difference.

They know when a game is challenging them.
They know when it is stalling them.
They know when a reward feels earned.
They know when it feels manipulative.

\ Once that trust breaks, retention gets much harder.

Bad Experience Is Never Just One Problem

Most players do not leave because of one dramatic failure. They leave because of accumulation.

One buggy patch.
One unfair balance change.
One broken matchmaking streak.
One weak update.
One overpriced store push.
One too many pop-ups.
One more reminder that monetization seems more polished than the game itself.

\ Any single issue might be survivable.

\ But when these problems pile up, the experience changes. The game begins to feel heavier every time the player returns. What used to feel exciting starts feeling mentally expensive.

\ That is when the connection breaks.

\ Studios often focus on major moments like launch, reviews, and patch notes. But retention is often shaped by smaller repeated frustrations. Not the dramatic failure, but the constant irritation.

\ Players remember that feeling.

\ A game can look beautiful and still feel draining.

Toxicity, Bugs, and Bad Systems Are the Product

Gaming companies sometimes talk about community issues, UI issues, server issues, and progression issues as if they are side problems.

\ They are not.

Bad matchmaking is part of the experience.
Cheating is part of the experience.
Toxic voice chat is part of the experience.
Lag is part of the experience.
A terrible UI is part of the experience.

\ Players do not separate these problems from the game itself. To them, the total experience is the product.

\ So when a game feels unstable, hostile, or unfair, it does not matter how strong the art direction is or how expensive the cinematic trailer was. The player remembers the frustration.

\ That is why the “players have no patience” excuse feels so weak. Why should players keep defending games that launch unfinished, communicate poorly, and then ask for more money on top of that?

\ At some point, leaving is not impatience.

\ It is common sense.

Players Are Harder to Fool

This is the real shift in gaming.

\ Players still want great worlds, strong stories, rewarding competition, and long-term communities. None of that changed.

\ What changed is their willingness to tolerate bad design wrapped in hype.

A famous IP is not enough.
A giant map is not enough.
A polished trailer is not enough.
A roadmap is not enough.

\ If the experience feels bloated, repetitive, manipulative, or emotionally draining, players will leave.

\ And when they leave, it is not always because another game stole them. Sometimes the game destroys its own retention by making itself feel heavier every month.

Better Experience Is the Real Retention Strategy

The smartest studios will stop asking how to trap players longer and start asking a better question:

What makes coming back feel natural?

Not addictive.
Not mandatory.
Natural.

\ The answer is not more systems. It is a better experience.

Cleaner design.
Fairer progression.
Less wasted motion.
Better stability.
More trust.
More fun.

\ Players are not quitting games.

\ They are quitting games that feel like chores, stores, casinos, or broken workplaces pretending to be entertainment.

\ And the studios that understand that first will not need to beg for loyalty.

\ They will build games that players actually want to return to.

\

Want to Have Successful OpenTelemetry Projects? Implement This One Tip

2026-04-04 22:00:24

Leading your organization to use OpenTelemetry is a challenge. In addition to all the usual project hurdles, you'll face one of these two situations: convince your teams to use OpenTelemetry, or convince them to move from the telemetry tool they are already using to OpenTelemetry. Most people don't want to change.

\ You'll need lots of effort and baby steps. My tip is the following: the fewer the changes, the higher your chances of success. In this post, I want to tackle a real-world use case and describe which tools you can leverage to reduce the necessary changes.

The Context

Imagine a JVM application that already offers metrics via JMX. To be more precise, let's take the example of a Spring Boot application running Tomcat embedded. Developers were supportive of the Ops team or were tasked to do Ops themselves.

\ They added the Spring Boot Actuator, as well as Micrometer JMX. A scheduled pipeline gets the metrics from JMX to a backend that ingests them.
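To make the starting point concrete, here is a minimal sketch, using only the JDK's `javax.management` API, of how an application-level value surfaces as a JMX MBean that jconsole (or any JMX client) can read. The `ExecutorGauge` class and the `metrics:name=...` ObjectName are hypothetical; in the scenario above, Micrometer's JMX registry performs the equivalent registration for you.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxGaugeDemo {

    // Standard MBean convention: the management interface must be named
    // <ImplementationClass> + "MBean" and live alongside the implementation.
    public interface ExecutorGaugeMBean {
        int getValue();
    }

    public static class ExecutorGauge implements ExecutorGaugeMBean {
        @Override
        public int getValue() {
            return 7; // in a real app, this would read e.g. a thread pool size
        }
    }

    public static void main(String[] args) throws Exception {
        // The platform MBean server is what jconsole and JMX scrapers query
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("metrics:name=executor.pool.size,type=gauges");
        server.registerMBean(new ExecutorGauge(), name);

        // Any JMX client can now read the "Value" attribute of this bean
        Object value = server.getAttribute(name, "Value");
        System.out.println("Value = " + value);
    }
}
```

Running it prints the attribute exactly as a remote JMX client would see it.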

MBeans shown in jconsole

During my previous talks on OpenTelemetry, I advised the audience to aim for the low-hanging fruit first. That means starting with zero-code instrumentation. On the JVM, it translates to setting the OpenTelemetry Java Agent when launching the JVM.

java -javaagent:opentelemetry-javaagent.jar -jar app.jar

\ It looks like an innocent change on an individual level. Yet, it may be the responsibility of another team, or even a team outside your direct control. Slight changes pile upon slight changes, and change management overhead compounds with each one. Hence, the fewer changes, the less time you have to spend on change management, and the higher your chances of success.

\ Why not leverage the existing JMX beans?

Integrating JMX in OpenTelemetry

A quick search of JMX and OpenTelemetry yields the JMX Receiver, which is deprecated. It points to the JMX Metric Gatherer, which is also deprecated. The state of the art, at the time of this writing, is the JMX Metric Scraper.

This utility provides a way to query JMX metrics and export them to an OTLP endpoint. The JMX MBeans and their metric mappings are defined in YAML and reuse implementation from jmx-metrics instrumentation.

— JMX Metric Scraper

\ Note that the scraper is only available as a JAR. It is, however, trivial to create a Docker image out of it.

FROM eclipse-temurin:21-jre

ADD https://github.com/open-telemetry/opentelemetry-java-contrib/releases/latest/download/opentelemetry-jmx-scraper.jar \
    /opt/opentelemetry-jmx-scraper.jar

ENTRYPOINT ["java", "-jar", "/opt/opentelemetry-jmx-scraper.jar"]

\ While you can configure individual JMX bean values to scrape, the scraper provides config sets for a couple of common software systems that run on the JVM, e.g., Tomcat and Kafka. You can also provide your own config file for specific MXBeans. Here's a sample custom config file:

rules:
  - bean: "metrics:name=executor*,type=gauges"     # 1
    mapping:
      Value:
        metric: executor.gauge                     # 2
        type: gauge                                # 3
        unit: "{threads}"
        desc: Spring executor thread pool gauge
  1. JMX bean to map
  2. OpenTelemetry metric key
  3. Metric kind. See the scraper documentation for the possible options.

\ You can use it with the OTEL_JMX_CONFIG environment variable:

services:
  jmx-gatherer:
    build: ./jmx-gatherer
    environment:
      OTEL_SERVICE_NAME: jmx-otel-showcase
      OTEL_JMX_SERVICE_URL: service:jmx:rmi:///jndi/rmi://app:9010/jmxrmi
      OTEL_JMX_TARGET_SYSTEM: jvm,tomcat                                       # 1
      OTEL_JMX_CONFIG: /etc/jmx/springboot.yaml                                # 2
  1. JMX presets
  2. Reference the custom config file
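For the `OTEL_JMX_SERVICE_URL` above to resolve, the target application must expose JMX over RMI. A minimal, demo-only way to do that is via the standard `com.sun.management` flags; port 9010 matches the service URL above, while the rest of this snippet is an assumption for a local setup and disables authentication and TLS, so do not use it as-is in production:

```yaml
services:
  app:
    build: ./app
    environment:
      # Demo-only: no authentication, no TLS
      JAVA_TOOL_OPTIONS: >-
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=9010
        -Dcom.sun.management.jmxremote.rmi.port=9010
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
        -Djava.rmi.server.hostname=app
```

Pinning the RMI port to the same value as the JMX port keeps the firewall/Compose networking story simple.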

Putting it all Together

Here are the components of a starting architecture to display the application's metrics:

  • JMX Metric Scraper
  • OpenTelemetry Collector
  • Prometheus
  • Grafana

Starting OpenTelemetry metrics architecture

The JMX Metric Scraper polls metrics from a JVM, using the JMX interface for this. It then pushes them to an OpenTelemetry Collector. For the demo, I chose a simple flow. The Collector exposes metrics in Prometheus format on an HTTP endpoint. A Prometheus instance polls them and exposes them for Grafana.
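As a sketch of this simple flow, the Collector configuration can be as small as an OTLP receiver wired to a Prometheus exporter. The port numbers are the conventional defaults, but treat them as assumptions for the demo:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # the JMX Metric Scraper pushes here

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # Prometheus scrapes this endpoint

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Prometheus then only needs a scrape job pointing at the Collector's port 8889.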

Grafana Dashboard

Conclusion

In this post, I kept a JMX-enabled application as is and used the JMX Metric Scraper to send MBean metrics to the OpenTelemetry Collector.

\ Introducing OpenTelemetry into your information system doesn't need to involve more changes than necessary. You can, and probably should, keep as much of your existing assets as possible and focus your efforts on the required parts. It's always possible to change them later, when things are more stable. With a successful migration behind you, it might then be time to nudge teams toward the regular Java agent.

\ The complete source code for this post can be found on Codeberg.

To go further:


Originally published at A Java Geek on March 29th, 2026.