Swedish open source developer and curl maintainer.

Blog of Daniel Stenberg

My cookie spec problem

2025-03-02 01:00:44

Before RFC 6265 was published in 2011, the world of cookies was a combination of anarchy and guesswork, because the only “spec” that existed was not actually a spec but just a brief text lacking a lot of details.

RFC 6265 brought order to a world of chaos. It was good. It made things a lot better. With this spec, it was suddenly much easier to write compliant and interoperable cookie implementations.

I think these are plain facts. I have written cookie code since 1998 and I thus know this from my own personal experience. Since I believe in open protocols and doing things right, I participated in the making of that first cookie spec. As a non-browser implementer, I think I bring a slightly different perspective and angle than many of the other people involved.

Consensus

The cookie spec was published by the IETF and it was created and written in a typical IETF process. Lots of the statements and paragraphs were discussed and even debated. Since there were many people and many opinions involved, of course not everything I think should have been included in the spec ended up the way I wanted it to, but rather in the way the consensus seemed to favor. That is just natural and the way of the process. Everyone involved accepts this.

I have admitted defeat, twice

The primary detail in the spec, or should I say one of the important style decisions, is one that I disagree with. A detail that I have tried to bring up again when the cookie spec was up for a revision and a new draft was made (still known as 6265bis since it has not become an official RFC yet). A detail that I have failed to get others to agree with me about to a sufficient degree to have it altered. I have failed twice. The update will ship with this feature as well.

Cookie basics

Cookies are part of (all versions of) HTTP but are documented and specified in a separate spec. It is a classic client-server setup where Set-Cookie: response headers are sent from a server to a client, the client stores the received cookies and sends them back to servers, according to a set of rules and matching criteria, using the Cookie: request header.
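
To make the roles concrete, here is a minimal, hypothetical example of the exchange. The cookie name, value and site are made up for illustration; Path, Secure and HttpOnly are standard Set-Cookie attributes. The server response sets the cookie:

    HTTP/1.1 200 OK
    Set-Cookie: session=abc123; Path=/; Secure; HttpOnly

A later request from the client to the same site then sends it back:

    GET /account HTTP/1.1
    Host: example.com
    Cookie: session=abc123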

Set-Cookie

This is the key header involved here. So how does it work? What is the syntax for this header that we all need to learn, so that we can figure out how servers and clients should be implemented to do cookies interoperably?

As with most client-server protocols, one side generates this header, the other side consumes it. They need to agree on how this header works.

My problem is two

The problem is that this header’s syntax is defined twice in the spec. Differently.

Section 4.1 describes the header from a server perspective while section 5.2 does it from a client perspective.

If you, like me, have implemented HTTP for almost thirty years, you are used to reading protocol specifications and in particular HTTP related specifications. HTTP has numerous headers described and documented. No other HTTP document describes the syntax for a header field differently in separate places. Why would they? They are just single headers.

This double-syntax approach comes as a surprise to many readers, and I have many times discussed cookie syntax with people who have read the 6265 document but only stumbled over and read one of the places and then walked away with only a partial understanding of the syntax. I don’t blame them.

The spec insists that servers should send a rather conservative Set-Cookie header, but knowing what the world looks like, it simultaneously recommends that clients be much more liberal when parsing the same header, because servers might not be as conservative as the spec tells them to be. Two different syntaxes.

The spec tries to be prescriptive for servers: thou shalt do it like this. But we all know that cookies were wilder than this at the time 6265 was published, and because we know servers won’t stick to these “new” rules, a client can’t trust that servers are that nice but instead needs to accept a broader set of data. So clients are told to accept much more. A different syntax.
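
A rough, hypothetical example of what this means in practice. A header like this violates the section 4.1 server grammar, which (simplified) does not allow unquoted spaces or commas in a cookie value, yet the section 5.2 client parsing algorithm simply takes everything up to the first semicolon as the value and stores the cookie anyway:

    Set-Cookie: prefs=a b,c

A client following section 5.2 ends up with a cookie named prefs whose value is “a b,c”, even though a server following section 4.1 should never have sent it.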

Servers do what works

As the spec tells clients to accept a certain syntax, and widely deployed clients and cookie engines gladly accept this syntax, there is no incentive or motive for servers to change. The “do this if you are a good server” instruction serves as an ideal, but there is no particularly good way to push anyone in that direction, because it works perfectly well to use the extended syntax that the spec says the clients need to accept.

A spec explaining what is used

What do I want? I want the Set-Cookie header to be described in a single place in the spec with a single unified syntax. The syntax that is used and that is interoperable on the web today.

It would probably even make the spec shorter, it would remove confusion and it would certainly remove the risk that people think just one of the places is the canonical syntax.

Will I bring this up again when the cookie spec is due for refresh again soon? Yes I will. Because I think it would make it a much better spec.

Do I accept defeat and accept that I am on the losing side in an argument when nobody else seems to agree with me? Yes to that as well. Just because I think like this in no way means that it is objectively right or that this is the way to go in a future cookie spec.

Adding curl release candidates

2025-02-28 18:22:07

Heading towards curl release number 266, we have decided to spice up our release cycle with release candidates, in an attempt to help us catch regressions earlier and better.

It has become painfully obvious to us in the curl development team that over the past few years we have done several dot-zero releases in which we shipped quite terrible regressions. Several times those regressions have been so bad or annoying that we felt obligated to do quick follow-up releases a week later to reduce friction and pain among users.

Every such patch release has caused pain in our souls and has served as proof that we to some degree failed in our mission.

We have thousands of tests. We run several hundred CI jobs for every change that verify them. We simply have too many knobs, features, build configs, users and combinations of them all to be able to catch all possible mistakes ourselves.

Release candidates

Decades ago we sometimes did release candidates, but we stopped. We have instead shipped daily snapshots, which is basically what a release would look like, packaged every day and made available. In theory this should remove the need and use of release candidates as people can always just get the latest snapshots, try those out and report problems back to us.

We are also acutely aware of the fact that only releases get tested properly.

Are release candidates really going to make a difference? I don’t know. I figure it is worth a shot. Maybe it is a matter of messaging: by gathering the troops around these specific snapshots and calling out the need for testing to get done, maybe it will happen at least to some extent?

Let’s attempt this for a while and then come back in a few years to evaluate whether it seems to have helped improve the regression rate or not.

Release cycle

We have a standard release cycle in the curl project that is exactly eight weeks. When things run smoothly, we ship a new release on a Wednesday every 56 days.

The release cycle is divided into three periods, or phases, that control what kind of commits we maintainers are permitted to merge. Rules to help us ship solid software.

Immediately after a release, we have a ten day cool down period during which we absorb reactions and reports from the release. We only merge bugfixes and we are prepared to do a patch release if we need to.

Ten days after the release, we open the feature window in which we allow new features and changes to the project. The larger things. Innovations, features etc. Typically these are the most risky things that may cause regressions. This is a three-week period and those changes that do not get merged within this window get another chance again next cycle.

The longest phase is the feature freeze that kicks in twenty-five days before the pending release. During this period we only merge bugfixes; it is intended to calm things down again and smooth out all the friction and rough corners we can find to make the pending release as good as possible.

Adding three release candidates

The first release candidate (rc1) is planned to ship on the same day we enter feature freeze. From that day on, there will be no more new features before the release so all the new stuff can be checked out and tested. It does not really make any sense to do a release candidate before that date.

We will highlight this release candidate and ask that everyone who can (and wants to) test this one out and report every possible issue they find with it. This should be the first good opportunity to catch any possible regressions caused by the new features.

Nine days later we ship rc2. This will be done no matter what bug reports we got on rc1 or what possible bugs are still pending. This candidate will have additional bugfixes merged.

The final and third release candidate (rc3) is then released exactly one week before the pending release. A final chance to find nits and perfect the pending release.
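
Put on a single timeline, using the numbers above and counting from a release day (day 0) in the 56-day cycle, the phases and release candidates land roughly like this:

    day  0      release N ships
    day  0-10   cool down: bugfixes only
    day 10-31   feature window: new features and changes
    day 31      feature freeze begins, rc1 ships
    day 40      rc2 ships
    day 49      rc3 ships (one week before the release)
    day 56      release N+1 ships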

I hope I don’t have to say this, but you should not use the release candidates in production, and they may contain more bugs than a regular curl release normally does.

Technically

The release candidates will be created exactly like a real release, except that there will not be any tags set in the git repository and they will not be archived. The release candidates are automatically removed after a few weeks.

They will be named curl-X.Y.Z-rcN, where X.Y.Z is the version of the pending release and N is the release candidate number. Running “curl -V” on such a build will show “X.Y.Z-rcN” as the version. The libcurl includes will say it is version X.Y.Z, so that applications can test out preprocessor conditionals etc exactly as they will work in the real X.Y.Z release.
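
As a minimal sketch of what that means for applications: a compile-time check against the release candidate headers behaves exactly as it will against the final release, since the headers already claim the pending version. The 8.13.0 number below is just a made-up example, not a statement about what release 266 will be numbered.

    #include <stdio.h>
    #include <curl/curl.h>   /* curlver.h provides LIBCURL_VERSION_NUM and LIBCURL_VERSION */

    int main(void)
    {
      /* LIBCURL_VERSION_NUM is 0xXXYYZZ, so 8.13.0 is 0x080d00.
         An X.Y.Z-rcN candidate reports the pending X.Y.Z here. */
    #if LIBCURL_VERSION_NUM >= 0x080d00
      puts("built against libcurl 8.13.0 or later");
    #else
      puts("built against an older libcurl");
    #endif
      printf("headers say: %s\n", LIBCURL_VERSION);
      return 0;
    }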

You can help!

You can most certainly help us here by getting one of the release candidates when they ship and trying it out in your use cases, your applications, your pipelines or whatever. And let us know how it runs.

I will do something on the website to help highlight the release candidates once there is one to show, to help willing contributors find them.

The curl roadmap webinar 2025

2025-02-25 16:05:56

On March 6 2025 at 18:00 UTC, I am doing a curl roadmap webinar, talking through a set of features and things I want to see happen in curl during 2025.

Figure out my local time.

This is an opportunity for you both to learn about the plans and to provide feedback on said ideas. The roadmap is not carved in stone, nor is it a promise that these things truly will happen. Things and plans might change through the year. We might change our minds. The world might change.

The event will be done in parallel over Twitch and Zoom. To join the Zoom version you need to sign up for it in advance. To join the live-streamed Twitch version, you just need to show up.

Sign up here for Zoom

You will be able to ask questions and provide comments using either channel.


A second curl distro meeting 2025

2025-02-24 15:32:16

We are doing a rerun of last year’s successful curl + distro online meeting. A two-hour discussion, meeting, workshop for curl developers and curl distro maintainers. Maybe this is the start of a new tradition?

2025 event details

Last year I think we had a very productive meeting that led to several good outcomes. In fact, just seeing some faces and hearing voices from involved people is good and helps to enhance communication and cooperation going forward.

The objective for the meeting is to make curl better in distros. To make distros do curl better. To improve curl in each and every way we think we can, together.

Everyone who feels this is a subject they care about is welcome to join. The meeting is planned to take place in the early evening European time, early morning US west coast time, in the hope that it works for a large enough share of curl-interested people.

The plan is to do this on April 10, and all the details, planning and discussion items are kept on the dedicated wiki page for the event.

Feel free to add your own proposed discussion items, and if you feel inclined, add yourself as an intended participant. Feel free to help make this invite reach the proper people.

See you in April.

curl website traffic Feb 2025

2025-02-23 06:36:51

Data without logs sure leaves us open to speculation.

I took a quick look at what the curl.se website traffic situation looks like right now. Just as a curiosity.

Disclaimer: we don’t log website visitors at all, and we don’t run any web analytics on the site, so we basically don’t know a lot about who does what on the site. This is done both for privacy reasons and for practical reasons. Managing logs for this setup is work I would rather avoid doing and paying for.

What we do have is a website that is hosted (fronted) by Fastly on their CDN network, and as part of that setup we have an admin interface that offers accumulated traffic data. We get some numbers, but without details and specifics.

Bandwidth

Over the last month, the site served 62.95 TB. That makes it average over 2 TB/day. On the most active day in the period it sent away 3.41 TB.

Requests

At 12.43 billion requests, that makes the average transfer size per request 5568 bytes.

Downloads

Since we don’t have logs, we can’t count curl downloads perfectly. But we do have stats for request frequency for objects of different sizes from the site, and in the 1MB-10MB category we basically only have curl tarballs.

1.12 million such objects were downloaded over the last month. 37,000 downloads per day, or about one curl tarball downloaded every other second around the clock.

Of course most curl users never download it from curl.se. The source archives are also offered from github.com and users typically download curl from their distro or get it installed with their operating system etc.

But…?

The average curl tarball size from the last 22 releases is 4,182,317 bytes. 3.99 MB.

1.12 million x 3.99 MB is only 4,362 gigabytes. Not even ten percent of the total traffic.

Even if we count the average size of only the zip archives from recent releases, 6,603,978 bytes, it only makes 6,888 gigabytes in total. Far away from the almost 63 terabytes total amount.

This, combined with low average transfer size per request, seems to indicate that other things are transferred off the site at fairly extreme volumes.

Origin offload

99.77% of all requests were served by the CDN without reaching the origin site. I suppose this is one of the benefits of us having mostly a static site without cookies and dynamic choices. It allows us to get a really high degree of cache hits and content served directly from the CDN servers, leaving our origin server only a light load.

Regions

Fastly is a CDN with access points distributed over the globe, and the curl website is anycasted, so the theory is that users access servers near them. In the same region. If we assume this works, we can see from where most traffic to the curl website comes from.

The top-3:

  1. North America – 48% of the bandwidth
  2. Europe – 24%
  3. Asia – 22%

TLS

Now I’m not the expert on how exactly the TLS protocol negotiation works with Fastly, so I’m guessing a little here.

It is striking that 99% of the traffic uses TLS 1.2. It seems to imply that a vast amount of it is not browser-based, as all popular browsers these days mostly negotiate TLS 1.3.

HTTP

Seemingly agreeing with my TLS analysis, the HTTP version distribution also seems to point to a vast amount of traffic not being humans in front of browsers. Browsers prefer HTTP/3 these days, and if that causes problems they use HTTP/2.

98.8% of the curl.se traffic uses HTTP/1, 1.1% use HTTP/2 and only the remaining tiny fraction of less than 0.1% uses HTTP/3.

Downloads by curl?

I have no idea how large a share of the downloads is actually done using curl. A fair share is my guess. The TLS + HTTP data imply a huge amount of bot traffic, but modern curl versions would at least select HTTP/2 unless the users guiding them specifically opted not to.

What is all the traffic then?

In the past, we have seen rather extreme traffic volumes from China downloading the CA cert store we provide, but these days the traffic load seems to be fairly evenly distributed over the world. And over time.

According to the stats, objects in the 100KB-1MB range were downloaded 207.31 million times. That is bigger than our images on the site and smaller than the curl downloads. Exactly the range for the CA cert PEM. The most recent one is at 247KB. Fits the reasoning.

A 247 KB file downloaded 207 million times equals about 46 TB. Maybe that’s the explanation?

Sponsored

The curl website hosting is graciously sponsored by Fastly.

Changing every line three times

2025-02-18 22:07:43

Is there some magic making three times, or even pi, the number of times you need to write code for it to be good?

So what am I talking about? Let’s rewind and start with talking about writing code.

Let’s say you start out by writing a program that is exactly one hundred lines long, and you release your creation to the world. Every line in this program was written just once.

Then someone reports a bug so you change some source code lines. Say you change ten lines, which is the equivalent of adding ten lines and removing ten lines. The total number of lines remains 100, but you have written 110. The average line has then been written 1.1 times.

Over time, you come to change more lines and if the project survives you probably add new code too. A living software project that is maintained is bound to have had many more lines added than what is currently present in the current working source code branch.

Exactly how many more lines were added than what is currently present?

That is the question that I asked myself regarding curl development. If we play with the thought that curl is a decently mature project as it has been developed for over twenty-five years maybe the number of times every line has been changed would tell us something?

By counting the number of added lines and comparing how many lines of code are still present, we know how often lines are changed – on average. Sure, some lines in the file headers and similar are probably rarely changed and some others are changed all the time, but let’s smear out the data and just count average.

curl is also an interesting test subject here because it has grown significantly over time. It started out as 180 lines of code in 1996 (then called httpget) and is almost 180,000 lines of code today in early 2025. An almost linear growth in number of lines of code over time, while at the same time having a fierce rate of bugfixes and corrections done.

I narrowed this research to the product code only, so it does not include test cases, documentation, examples etc. I figured that would be the most interesting bits.
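
For the curious, here is a rough way to reproduce this kind of number yourself from a git checkout, assuming the product code lives in the lib and src directories. This is a sketch, not the exact script used for the graphs below:

    # total lines ever added to the product code, summed over all commits
    git log --numstat --format= -- lib src | awk '{added += $1} END {print added}'

    # lines of product code currently present
    git ls-files lib src | xargs wc -l | tail -1

    # change rate = lines ever added / lines currently present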

Number of lines of code

First a look at the raw numbers of how many lines of product code are present at different moments in time during the project’s history.

Added LOC per LOC still present

Then, counting the number of added lines of code (LOC) and comparing with how many lines of code are still present. As you can see here, the change rate is around three for a surprisingly long time.

Already by 2004 we had modified every line on average three times. The rate of change then goes up and down a little but remains roughly three for years, until sometime in 2015 when the change rate starts to gradually increase, reaching 3.5 in early 2025 – while at the same time the number of lines of code in the project kept growing.

Today, February 18 2025, actually marks the day when the number was calculated to above 3.5 for the first time ever.

What does it mean?

It means that every line in the product source code tree has by now been edited on average 3.5 times. It might be that we have written bad code and need to fix many bugs, or that we go back to refactor and improve existing lines frequently. Probably both.

Of course, some lines are edited and changed far more often than others; the 3.5 is just an average. We have some source lines left in the code that were brought in before the year 2000 and have not been changed since.