Herman Martinus

Creator of the no-nonsense blogging platform, Bear, and a few other things.

Rss preview of Blog of Herman Martinus

The Great Scrape

2025-03-26 17:02:00

LLMs feed on data. Vast quantities of text are needed to train these models, which are in turn receiving valuations in the billions. This data is scraped from the broader internet, from blogs, websites, and forums, without the authors' permission, with all content treated as opted-in by default.

Needless to say, this is unethical. But as Meta has proven, it's much easier to ask for forgiveness than permission. It is unlikely they will be ordered to "un-train" their next generation models due to some copyright complaints.

I wish the problem ended with the violation of consent for how our writing is used. But there's another, more immediate problem: The actual scraping.

These companies are racing to create the next big LLM, and in order to do that they need more and more novel data with which to train these models. This incentivises these companies to ruthlessly scrape every corner of the internet for any bit of new data to feed the machine. Unfortunately these scrapers are terrible netizens and have been taking down site after site in an unintentional, widespread DDoS attack.

Over the past 6 months Bear, and every other content host on the internet, has been affected. Both Sourcehut and LWN have written about their difficulties in holding back the scourge of AI scrapers. This seems to be happening to big and small players alike. Self-hosted bloggers have had to figure out rate-limiting and CDNs too, which is pretty unfair for someone who just wants to write on the internet.

Bear is hit daily by bot networks requesting tens of thousands of pages in short time periods, and while I now have systems in place to prevent it actually taking down the server, when it started happening a few months ago it certainly had an impact on performance.

This is a difficult problem to solve, due to the way that these scrapers are designed. Only a small portion of these scrapers identify themselves as such. These are all blocked at the WAF (Web Application Firewall) level and never reach any Bear blogs (about 500,000 requests have been blocked in the last 24 hours). However, the vast majority of scrapers identify themselves as regular web browsers and use multiple servers and IP addresses, which makes the usual tools, like rate-limiting and user-agent parsing, ineffective. Not to mention that they all completely ignore robots.txt and other self-regulation rules.
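To make the rate-limiting point concrete, here is a minimal sketch of a per-IP sliding-window limiter. It is an illustration only, not how Bear's WAF works: the class name, the limit of 20 requests per 60 seconds, and the IP addresses are all made up for the example. It shows that a single noisy client gets throttled, while the same volume of traffic spread across many addresses passes untouched.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def count_allowed(ips, limiter):
    """Send one request per entry in `ips`, 10 ms apart, and count how many pass."""
    return sum(limiter.allow(ip, now=i * 0.01) for i, ip in enumerate(ips))


# One noisy client: 1,000 requests from a single (hypothetical) address.
single = count_allowed(["203.0.113.7"] * 1000, SlidingWindowLimiter(20, 60))

# A distributed scraper: the same 1,000 requests spread over 500 addresses,
# so each address only ever makes 2 requests.
botnet_ips = [f"10.0.{n // 250}.{n % 250}" for n in range(500)] * 2
distributed = count_allowed(botnet_ips, SlidingWindowLimiter(20, 60))

print(single, distributed)
```

Keying the limiter on anything the scraper controls (IP address, user-agent) has the same weakness, which is why the blocking has to look for patterns across requests instead.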

One of the mitigation options is to add a challenge to every single page (like Cloudflare's managed challenge), but this is an unpleasant user-experience and blocks bots that are actually welcome, such as search engine crawlers. So while it is possible to mitigate all bot traffic, that would effectively make all blogs non-searchable on all the major search engines. Some of the LLM scrapers cheekily identify themselves as Googlebot or Yandexbot as well. This option would also affect anyone who runs scripts on their own site for backups or custom automations. Not ideal.

I've had to remove RSS subscriber analytics since I can't mitigate bots very well on RSS feeds which are explicitly designed for bots. This influx has caused the RSS analytics to be completely wrong, and it felt better to remove it than to display incorrect information.

As of right now I have several strategies in place to combat this deluge that are working well. If you're a service provider or sysadmin being negatively impacted by these scrapers, send me an email and I'd be happy to show you what's worked for me.

Right now everything is under control on Bear. Over the past month bots have only managed to impact performance on Bear once, and that endpoint has since been protected. I've added significantly more active monitoring, and any time I see a spike of requests I find a common pattern, block it, and monitor whether it has affected any real users.
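As a rough illustration of "find a common pattern", the sketch below takes the requests from a spike and reports any feature that most of them share. Everything here is an assumption made for the example: the log field names (`ip`, `user_agent`, `path`), the three features checked, and the 50% threshold. It is not a description of Bear's actual monitoring.

```python
from collections import Counter


def subnet24(ip):
    """Collapse an IPv4 address to its /24 prefix, e.g. 192.0.2.9 -> 192.0.2.0/24."""
    return ".".join(ip.split(".")[:3]) + ".0/24"


def dominant_patterns(requests, threshold=0.5):
    """Return {feature: value} for values shared by more than `threshold` of requests."""
    if not requests:
        return {}
    features = {
        "user_agent": lambda r: r["user_agent"],
        "subnet": lambda r: subnet24(r["ip"]),
        "path_prefix": lambda r: "/" + r["path"].strip("/").split("/")[0],
    }
    found = {}
    for name, extract in features.items():
        counts = Counter(extract(r) for r in requests)
        value, hits = counts.most_common(1)[0]
        if hits / len(requests) > threshold:
            found[name] = value
    return found


# Hypothetical spike: 90 requests from one subnet with one user-agent,
# mixed in with 10 ordinary feed requests.
bot_ua = "Mozilla/5.0 (X11; Linux x86_64) Chrome/91.0"
spike = [
    {"ip": f"192.0.2.{i}", "user_agent": bot_ua, "path": f"/blog/post-{i}/"}
    for i in range(90)
] + [
    {"ip": "198.51.100.4", "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1", "path": "/feed/"}
] * 10

patterns = dominant_patterns(spike)
print(patterns)
```

A shared subnet or an outdated browser string that shows up this way is a candidate for a blocking rule, which then needs to be checked against real visitors before it stays in place.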

Thankfully, none of these scrapers render CSS, and therefore don't get logged as visitors on Bear's analytics.

The best case scenario is that the AI companies find another way to train their models without ruthlessly slashing and burning the internet. However, I doubt this will happen. Instead I see it getting worse before it gets better. More tools are being released to combat this. One interesting tool from Cloudflare is AI Labyrinth, which traps AI scrapers that ignore robots.txt in a never-ending maze of no-follow links. This is how the arms race begins.

And I'm ready for it. Let's fight this exploitation of the commons.

The sound of silence

2025-02-11 15:52:00

During the first week or two of January the Bear servers were experiencing some instability. The resource usage would go through the roof, causing a short period of cascading timeouts and leaving me scrambling and stressed. This was a fairly complex problem to solve[1], and one that I racked my brain over for a good few days. However, the root of the problems, and the solutions to them, didn't come during the time spent sitting in front of my computer, but instead during a 3-hour drive down the East Coast. No music, no podcast. Just me with my thoughts.

I've come to appreciate time spent with nothing but my thoughts. It's something I'd escaped for years. In the modern age it's so easy to always have some kind of entertainment streaming. I'd drive with a podcast playing, or do house chores with a YouTube video in the background. But something is lost in doing this. There's a reason that "the best thoughts come to you in the shower". And that's because we haven't figured out how to inject consumption into our showers gracefully (yet).

It turns out that complex problems, self-actualisation, and meaningful thoughts require time spent with oneself. This could be through journaling, meditation, or prayer. But I've found that so many other activities offer space for this. For over a year now I've exercised without headphones on. I've found that not only am I more focussed on my workouts (the actual movements, the mind-muscle connection, and pushing closer to my limits), but between sets my mind gets to wander. It allows me to plan my day, to consider problems I'm working on, and generally leads to a calmer, more collected me.

And so, one of my intentions for the year is to spend more quiet time with myself. This means not always listening to a podcast while driving. Or doing laundry and chores without any entertainment in the background. It means leaving my phone at home when I go to the beach or take a walk. And when I'm working on my crafts, not having anything playing that draws me out of the experience.

I like being in my head. And I feel the more time I spend with myself, the more I love me.

  1. It turns out the issue was a combination of a massive bot network disrespecting robots.txt and trying to scrape every Bear blog, slow horizontal scaling (30 seconds), and some inefficient endpoints. This has been solved through more granular WAF rules, rate-limiting, better horizontal scaling tools, more granular logging, and efficiency cleanups on some endpoints.

The Bear Manifesto

2025-01-27 15:06:00

Given recent events in the blogging space (hello to all Cohost and Wordpress refugees), I wanted to take a moment to share my vision and commitments for Bear.

First things first: Bear isn't going anywhere. No sudden shutdowns, no surprise acquisitions, no pivot to becoming an AI-powered metaverse blockchain solution. Just simple, clean blogging—now and in the future.

The promises

  1. Bear won't shut down. Period. I've seen too many great platforms disappear overnight, leaving their communities scrambling. This is made worse when the platform is your personal garden and online neighbourhood. That won't happen here. Bear is built to last.

  2. Bear won't sell. I'm not building this to flip it to the highest bidder. No VC funding, no external pressures, no "exit strategy." Bear is independent and will stay that way.

  3. Bear won't show ads. Your blog is your space. No flashy banners will suddenly appear one day, and no sponsored content. Just your words, your way.

Built to last

Bear isn't just a weekend project—it's built with longevity in mind. The codebase is intentionally simple and maintainable. The infrastructure is robust and redundant. Everything is backed up religiously (and then backed up again, just to be sure).

I'm not just thinking about next week or next month. I'm thinking about Bear being around in 10, 20, 50 years. That means making smart technical choices now that won't paint Bear into a corner later.

In keeping with this, Bear doesn't have explicit integrations with other tools and infrastructure. It is self-reliant, with the ability for users to customise their blogs and build out those integrations as they see fit. So while it is possible to integrate blogs with newsletter tools, Mastodon, Bluesky, and the Fediverse at large, it's not the default.

Planning for the future

This is a morbid topic for me to write about: what happens to Bear if something happens to me? I've got that covered too. There's a detailed succession plan in place, including:

  • Full documentation of all systems and processes
  • Multiple trusted developers with access to the codebase
  • Clear instructions for maintaining the platform

So if I were to be incapacitated in any way, the platform would live on.

Legal structure

I've recently chatted to a few bloggers and legal professionals on what a good structure looks like for a project like this. And the common theme was that the legal structure didn't matter nearly as much as the intentions of the people running things. We've seen our fair share of open-source projects become sour (see the recent Wordpress drama) or abandoned entirely. We've seen OpenAI become ClosedAI. There's a common thread here. Trust isn't just a legal structure, but a social contract.

With this in mind, Bear will continue to run as a PTY LTD where the company exchanges some extra add-ons for money, and uses that money to maintain and improve the infrastructure (and allows me to focus on Bear full time). Perhaps a more fitting legal structure will present itself in the future. If you have any ideas please pop me an email.

I've also been thinking about what kinds of organisations last the longest, and there are a few that come to mind:

  • Small (usually family-owned) businesses
  • Monasteries and convents
  • Educational institutions

I'm not sure how this information is best applied, but it's telling that 'growth at all costs' inevitably leads to the dissolution of the company.

The bottom line

Bear is doing great. Not in terms of market share or valuations, but in staying true to what matters: giving people a reliable, simple, and independent place to share their thoughts online.

If you're tired of platforms that treat you like a product, welcome home. Bear is here to stay.

Keep blogging,

Herman Martinus
Creator of Bear

Active rest

2025-01-20 18:09:00

Due to the nature of my work I spend a lot of time in front of screens. Most knowledge work is done behind some screen or another, and while it can be rewarding (especially since I enjoy what I do), all that time spent sitting and thinking, and thinking and sitting does get to me. I feel tight between my shoulders, and afterwards my head is muzzy and I'm low on energy.

I don't see this as an explicit problem. Knowledge work has its perks, and isn't as unhealthy as the news, podcasts, and YouTube videos would have us believe. The problem—I've found—is the screens after screens. Finishing up a long day of work only to plop down on the couch and turn on a slightly larger screen for entertainment is the real adversary. My brain needs rest, but not this kind.

One solution is hobbies and crafts. I've always loved working with my hands. When I was younger it was building fortresses in the mountains with my brother, or putting together small models with superglue (inevitably sticking my fingers to the table in the process), or drawing and inventing new Pokémon. These hobbies have morphed significantly as I've gotten older, but the premise stays the same: the best form of relaxation isn't consumption, it's creation.

On writing this post I realise that I've collected a lot of hobbies over the years: going to the gym, rock-climbing, riding motorcycles, cooking, writing, and hiking, to name a few. But more recently I've gotten into crafts such as leather-work and mechanics. These activities allow my brain some rest and my body to take over.

This has grown so important in my life that in April last year Emma and I rented out a small workshop in an industrial building in Salt River, where we make all kinds of cool stuff. We invite our friends over for workshop evenings where we order food and work on interesting projects.

[Image: workshop]

From left to right we have Riko sewing up a storm (yes, he made that shirt himself); Andrew fixing his coffee grinder; Emma putting together some lego (although more recently she's been sewing and painting with watercolours); and Simon drawing. We've had friends bring in their speakers and mix decks and make music, not to mention some workshop jams.

I've predominately been doing leather work (which is ironic for someone who doesn't eat meat). Over the November and December period I designed and created all kinds of bags, wallets, and belts. Needless to say I didn't have to do any Christmas shopping this year. People just got fancy bags, and wallets, and belts.

[Images: bag, bag2, sunglasses]

I did a 5-day intensive woodworking course this past year as well. It was taught by a delightful old man with over 60 years of woodworking experience (and all of his fingers). However, I found that woodworking wasn't to my liking. Wood chips want to get into your eyes, the wood dust into your lungs. The power tools will give you tinnitus if you give them half a chance, and take a finger or two in the process. There's also the cleanup.

I'm sure woodworking is rewarding for some, but everyone needs to find their own yum.

In order to learn mechanics I undertook stripping a worn out 125cc scooter down to the frame, refurbishing every component, and rebuilding it into a sporty little scoot which was gifted to a friend.

[Images: scoot-0, scoot]

More recently I bought a non-running 1964 Honda Super Cub which I managed to get moving (although only temporarily, this is still a work in progress). I may be a bit too big for it, but it's so cute!

My brother is currently rebuilding an old American Muscle car (a 1969 Chevrolet Chevelle). I visited him in November to help with the installation of the engine, which required some heavy lifting. He has a whole blog on the restoration if you're interested.

[Image: engine]

All of these activities have been wholesome and satisfying. Certainly more so than binge-watching all of Squid Game. After a few hours at the workshop I'm physically exhausted, in the best of ways. I've done something rewarding, spent time with the people I love, and (hopefully) have something cool to show for it.

That isn't to say I don't spend some time consuming as a form of relaxation. I'm currently halfway through the final Stormlight Archive book (Wind and Truth). But I've never once regretted going to the beach or on a mountain walk after work. Never once spent time on a hobby and deemed that Netflix would have been better.

Admittedly, I'm very lucky to have a workshop (it only costs me $150 per month in South Africa and is worth every cent). I'm also lucky to live in a city that has both mountains to climb and oceans to swim in. But I'm sure no matter where I am, I'll still find something to keep myself busy.

What will you do in 2025?

Bear Blog question challenge

2025-01-13 15:01:00

I'm a bit late to the question challenge posed by Ava. I just arrived back from a 7 day hike up the Wild Coast of South Africa, which was beautiful, relaxing, and a great way to usher in the new year.

[Images: hike2, hike3, hike4, hike5]

On arriving back I noticed a sharp increase in signups on Bear (new year; new blog!) and had to do some infrastructural work and house-keeping. I apologise to everyone who had timeout issues over the first week of January. This has been resolved and I have auto-scalers in place to prevent timeouts from happening in the future.

Before I get into the questions, I'd like to give a shout-out to both Kev Quirk and Manu. I follow both of their RSS feeds and was pleasantly surprised to see the Bear Blog question challenge on their blogs, despite being off-platform. It's been so interesting and lovely reading about everyone's blogging journeys.

Let's do this!

1. Why did you make a blog in the first place?

I've been blogging in some form or another since about 2014 (10 years!). Most of my early posts were your standard coming-of-age pieces. As most young people do, I felt like I was seeing reality and my place in the universe for the first time and needed an outlet for these deep thoughts. Those posts are lost to time, which I'm both glad—but also a little sad—about.

Later blogging became a way to share my experiences building delightful tools for the internet, as well as random tangents about air-powered vehicles, traffic circles, and other oddities. I found that by exploring a topic in-depth and then writing about it, it solidified my understanding of the topic. It allowed me to critique my own opinions and reason about the subject more holistically. In many ways, blogging helped me learn about the world around me, and retain that information (hopefully long-term).

2. Why did you choose Bear Blog?

This is a funny question for me, since I'm the creator of the platform. But in a way it's a good question because I chose Bear Blog the most. I'd tried all of the other platforms, from self-hosting pure HTML, to Jekyll, to Hugo. I'd signed up for Ghost and Wordpress, and many more that each had something good, but were never quite right. Bear is my attempt to take all of the great platform choices, distill them down, and discard the unnecessary. Design by subtraction, if you will.

I was just very lucky that so many people resonated with my opinions. Bear wouldn't be what it is without you all.

3. Have you blogged on other platforms before?

I guess I answered this in the previous question. But yes. I've been around the block.

4. Do you write your posts directly in the editor or in another software?

I generally write my first draft in Apple Notes then transcribe it to the Bear Blog editor when it's in roughly the correct shape. I'm generally not a fan of proprietary editors that try to do too much, and Apple Notes works as expected, is free (assuming you've bought into the ecosystem), and syncs between my devices.

I used to use iA Writer, but just naturally found myself using Apple Notes, since it's also where I take all kinds of other notes and keep my work log.

5. When do you feel most inspired to write?

I rarely feel very inspired to write. Instead I feel inspired by ideas and thoughts. Most of the time writing is difficult and requires some inertia to get going, especially if I've lost writing momentum.

If I'm to be incredibly reductive, there are two kinds of writers: One kind can't help but write. Words flow from them into the world. The other kind pull words into the world, kicking and screaming. Like getting out of bed too early after a late night.

I'm sometimes one, and sometimes the other. Sometimes it's easy, and sometimes it's hard. What I aspire to, though, is to become the kind of writer who shows up regardless. The kind of writer that doesn't rely on the fleetingness of motivation. I find taking a long walk helps.

6. Do you publish immediately after writing or do you let it simmer a bit as a draft?

I generally outline a post in my head or in my notes and let the idea gestate. While I'm driving or taking a walk, or in the shower, the post kinda fleshes itself out, and I keep adding to the notes. Then (when inspiration strikes, but see question 5) I sit down and write a rough first draft. I don't edit or even look at the first draft again until the next day, where I re-read and edit it with fresh eyes before publishing.

In this way my writing process is broken down into 3 phases, usually on 3 different days:

  • Ideation
  • Writing
  • Editing

This isn't prescriptive, by any means. It's just what works for me.

7. Your favourite post on your blog?

There are some posts I'm certainly more proud of than others. The one that I feel has captured me very well is My product is my garden. However, if I can break with the rules and add another post that is more of a short story, it would be The Gods of Toil.

I find that there's generally a negative correlation between how proud I am of a post and the number of reads and upvotes it gets. My most recent post was a rant about me having to upgrade my iPhone. It has become one of my most-read posts, and I understand that it resonated with a lot of people. But I'm significantly more proud of my post on building a better ranking algorithm for Bear. I guess there's a larger subset of people who are unhappy with the state of smartphones than people who are interested in ranking algorithms. So be it.

8. Any future plans for your blog?

The style and structure of my blog will likely remain as it is for the foreseeable future. I love the simplicity and legibility of it, but it also acts as the "Example blog" of Bear, so I feel it needs to show the underlying structure of the platform without any embellishments. I have, however, enjoyed seeing the creativity on so many people's blogs and have felt the pull to do similarly. Bear has explicitly been built so that it's simple to use, but also has incredible depth if you don't mind getting your hands dirty.

That being said, the content on my blog is likely to shift this year. Since the platform itself is stable and mostly feature-complete, I'm going to be spending a lot more time doing interesting things and writing about them. I have a backlog of posts I've been meaning to get to, and there's no time like the present.

To end off

I don't think I've been as excited about a year as I am for 2025. I feel a crackling of magic in the air.

To all of you, happy new year! May it be filled with interesting work, adventures, good food, and good people.

Forced to upgrade

2024-11-27 14:15:00

I've been happily using an iPhone 8 for the last 7 years. It has been, and still is, a perfect smartphone. I've never bumped up against the limitations of the hardware (gaming on a phone is a bit of a mystery to me), and all of my photos and videos are synced to the cloud, so I've never even run out of the 64GB of storage space.

About a year ago the battery wasn't making it through a full day. So I replaced the battery and breathed new life into my phone. It felt brand new.

But then this year, it started happening. Certain apps became unavailable, requiring iOS 17 or later. The iPhone 8 will never support iOS 17 since Apple only provides software updates for 6 years after the device's release date[1]. So I'm stuck at 16.7.10 in perpetuity.

And now I feel quite bitter. I have a perfectly functional iPhone that is becoming obsolete, not through any fault of its own or depreciation of the hardware, but due to dwindling support. And the worst part is, I get it! Very few people keep a phone for 7 years. Apple seems to be the best at supporting older devices, and yet I don't actually expect them to support hardware for longer than that.

I just feel like I'm being punished for taking good care of my stuff.

I ended up buying the iPhone 16 in the hopes that it'll last the longest (6 years of software updates starting this year, let's go!). And on setting it up I realise that it does exactly the same stuff that my iPhone 8 does. But the number is doubled from 8 to 16...so why doesn't it feel like it's twice as good?

Now it doesn't fit in my hand properly; all new iPhones are veritable tablets. I also miss TouchID. The new camera is fine.

I'm going nowhere with this post. I'm trying to voice my frustration with something that is entirely reasonable.

I just can't shake this feeling that I've been forced to upgrade to something worse.

  1. As far as I can tell Apple has the longest support lifetime of 6 years, followed closely by Samsung with 4 years of major Android OS upgrades and 5 years of security updates. Edit: It seems that Google's new flagship phones now come with 7 years of updates.