By Nathan Yau. A combination of highlighting others’ work and visualization guides.
Rss preview of Blog of FlowingData

Making 10M government PDF documents searchable

2025-11-26 16:51:50

Government organizations love to distribute documents as PDF files. They are easy to forward and to print. The problem is when you want to find and access them later among millions of other files. GovScape, a research project between the University of Washington and Boston University, provides a search interface through the End of Term Web Archive’s 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.

Timeline for news coverage of trans communities

2025-11-25 16:23:08

The Trans News Initiative is a collaborative effort to track news coverage of trans communities over time. A streamgraph shows article counts by topic between 2020 and the present, and clicking through shows a set of packed circles and tables that link to each article.

On the classification of articles:

Wire stories published by multiple outlets were treated as individual articles instead of collapsed, prioritizing news dissemination and reach over unique reporting. Generic news round-ups and recaps (e.g., “Weekend Report”, “Top Stories”, “News Roundup”) were filtered from the event data. We then used the RoBERTa-base model to assign embeddings to each article headline, and employed these embeddings to cluster the output using HDBSCAN. The clusters were labelled using an LLM aimed at creating an umbrella cluster phrase from the individual article headlines in the same cluster.

This system was used to identify themes, which, again, you can see over time.
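To make the first part of that description concrete, here is a minimal sketch of the filtering step only: drop generic round-ups, but keep the same wire story from different outlets as separate articles. This is not the project's code. The regular expressions are my own guess based on the three example titles in the quote, and the outlets and headlines are made up. The embedding, clustering, and labeling steps are left out because they depend on model libraries.

```python
import re

# Assumed patterns, based only on the examples quoted above
# ("Weekend Report", "Top Stories", "News Roundup"). The project's
# actual filter rules are not described in the source.
ROUNDUP_PATTERNS = [
    r"\bweekend report\b",
    r"\btop stories\b",
    r"\bnews round-?up\b",
]
ROUNDUP_RE = re.compile("|".join(ROUNDUP_PATTERNS), re.IGNORECASE)


def is_roundup(headline: str) -> bool:
    """Return True if the headline looks like a generic round-up or recap."""
    return ROUNDUP_RE.search(headline) is not None


def filter_articles(articles):
    """Drop round-ups. Identical headlines from different outlets are kept
    as separate rows, mirroring the choice to count wire stories per outlet."""
    return [a for a in articles if not is_roundup(a["headline"])]


# Hypothetical data for illustration only.
articles = [
    {"outlet": "Outlet A", "headline": "State passes new ID law"},
    {"outlet": "Outlet B", "headline": "State passes new ID law"},
    {"outlet": "Outlet C", "headline": "Weekend Report: what you missed"},
    {"outlet": "Outlet A", "headline": "Top Stories for Monday"},
]

kept = filter_articles(articles)
for a in kept:
    print(a["outlet"], "|", a["headline"])
```

The wire story survives twice, once per outlet, which is the point of counting reach instead of unique reporting.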

Epstein emails presented as a Gmail inbox

2025-11-25 01:07:11

Congress released a collection of emails from Jeffrey Epstein’s inbox. However, as one might expect, it was not in the most usable format. Jmail, made by Luke Igel and Riley Walz, puts the emails in a more familiar Gmail view. Now you can pretend you’re logged into Epstein’s account and search and browse the threads.

Visual reconstruction of flooding at Camp Mystic

2025-11-21 20:05:33

The New York Times used a mix of media and data sources to reconstruct the flooding at Camp Mystic.

What follows is the most detailed description to date of the events that took the lives of more than two dozen campers and counselors, and the elder Mr. Eastland, at the 99-year-old summer retreat.

The descriptions and rendering of those events were taken from the first interviews that Camp Mystic’s owners have granted, along with never-before-seen videos and photos taken during flooding at the camp, data from devices such as Apple watches, cell phones and vehicle crash data, and court documents from a lawsuit filed by some of the parents of children who died.

The animated water flow and photos help you understand the scale and speed of the flooding, in relation to the 28 lives lost. Tragic from every angle.

Scale of one trillion dollars

2025-11-20 22:39:08

If Elon Musk achieves certain benchmarks for Tesla over the next decade, he gets a $1 trillion bonus. While it's unlikely Tesla gets there, a trillion is kind of a lot, especially for one person. But our human brains aren’t great at imagining numbers at that scale. So, for the Washington Post, Alyssa Fowers and Leslie Shapiro scaled a trillion by total U.S. workers in a given job.

I like to think in units of number of Jack in the Box tacos I can buy, but I guess that’s more useful for smaller values. Although less so recently. Thanks, inflation.

It’s crazy that just a few years ago we were looking at how comical Jeff Bezos’ net worth of $172 billion was at the time. Pocket change now.

✚ Claude, a year later

2025-11-20 20:06:03

Hi everyone. This is the Process, the newsletter for FlowingData members on data and visualization beyond defaults. Last year, I documented my experience with Claude, the AI chatbot, for working with and visualizing data. It seemed like a good time to revisit.
