
Decomplexification

2025-05-30 01:57:06

(Clearly a much better word than simplification.)

I believe we generally accept the truth that we should write simple and easy-to-read code in order to make it harder to create bugs and cause security problems. The more complicated code we write, the easier it gets to slip up, misunderstand or forget something along the way.

And yet, over time, functions tend to grow and become more and more complicated as we address edge cases and add new funky features we did not anticipate when we first created the code, sometimes decades ago.

Complexity

Cyclomatic complexity is a metric used to indicate the complexity of a program. You can click the link there and read all the fine details, but it boils down to this: a higher number means a more complex function, one with many statements and code paths. For example, a function that is a single straight-line sequence of statements scores 1, and every decision point such as an if, a loop or a switch case bumps the count.

There is this fine old command line tool called pmccabe that is able to scan C code and output a summary of all functions and their corresponding complexity scores.

Invoking this tool on your own C code is a perfect way to get a toplist of functions possibly in need of refactoring. Of course, what counts as a complex function and exactly how to score it is not entirely objective, but I believe this method works well enough.
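
For instance, something along these lines produces a quick toplist. This is a sketch, assuming pmccabe is installed and that we sort on its first output column, the modified McCabe score:

$ pmccabe lib/*.c src/*.c | sort -rn | head -10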

curl

Last year I added a graph to the curl dashboard that shows the complexity score of the worst function in curl as well as the 99th percentile. Later, I also added a plot for the 90th percentile.

This graph shows how the worst complexity in curl has shifted over time – and as always happens when you measure something and graph it, we suddenly got the urge to do something about it. Because it looked bad.

The worst

I grabbed my scalpel and refactored a few of the most complex functions we had, and I basically halved the complexity score of the worst-in-curl functions. The steep drop at the right side of the graph felt nice.

I left it there for a while, quite pleased with having at least improved the state of things.

A few months later I returned to the topic. I figured we could do more, as the worst functions were still quite bad. We should set a goal: extinguish (well, improve) all functions in the curl code with a score higher than N.

A goal

In my mail to the team I proposed 100 as the acceptable complexity limit, which is not super aggressive. When I sent the email, there were seven functions ranked over 100 and the worst offender scored 196. Only a couple of months earlier, the worst function had scored over 350.

Maybe we could start with 100 as a max and lower it going forward if that works?

To get additional visualization of the curl code complexity (and ideally how we improve the situation) I also created two more graphs for the dashboard.

Graph complexity distribution

The first one takes each function’s complexity score, applies it to every line of source code in that function, and then shows what percentage of the source code has which complexity score. The ideal of course being that almost the entire thing should have low scores.

This graph shows 100% of the source code at any given time, independent of its size, because I think that is what is relevant: the complexity distribution at each particular point in time. The size of the code has grown almost linearly through the whole period this graph shows, so of course 50% of the code in 2010 was much less code than what 50% is today.

This graph shows that we have had periods with quite a lot of code of complexity over 200, and that today we have finally erased all complexity above 100. It is a little hard to see in the graph, but the yellow field goes all the way up as of May 28, 2025.
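
Computing such a per-line distribution from pmccabe output is a small exercise in itself. A sketch, assuming pmccabe’s default columns where the first is the score and the fifth is the function’s number of lines:

$ pmccabe lib/*.c src/*.c | awk '{ for (i = 0; i < $5; i++) print $1 }' | sort -n

Every output line then represents one line of code tagged with its function’s score, and percentiles can be read off by position.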

Graph average complexity

The second graph uses the same per-line complexity scores, and calculates the average score across all lines of code at each point in time. Ideally, that line should shrink over time.

It now shows a rather steep drop in mid 2025 after our latest efforts. The average complexity has more than halved since 2022.
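
The same pmccabe data gives this average directly. A sketch, with the same column assumptions as before (first column the score, fifth the line count):

$ pmccabe lib/*.c src/*.c | awk '{ sum += $1 * $5; lines += $5 } END { print sum / lines }'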

Analyzers like it too

Static code analyzers also produce better results and fewer false positives when they get to work with smaller and simpler functions. This, too, helps produce better code.

Refactors could shake things up

Of course, refactoring a complex function into several smaller and simpler functions can be anywhere from straightforward to quite complicated. A refactor in the name of simplification might itself be hard: something of an oxymoron, and one that shakes things up and could add bugs rather than fix them.

This of course needs to be done with care, and there needs to be a solid test suite around the functions to validate that the functionality is still there and behaves the same as before the refactor.

Function length

The most complex functions also tend to be the longest; the correlation is strong. For that reason, I also produce a graph of the worst and the 99th percentile function lengths in the curl source code.

(Something is wrong in this graph: the P99 cannot be higher than the worst, yet the plot seems to indicate it was in late 2024.)
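
The lengths come out of the same tool: pmccabe’s fifth column in its default output is the function’s number of lines, so sorting on it lists the longest functions. A sketch:

$ pmccabe lib/*.c src/*.c | sort -rn -k5,5 | head -10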

A CI job to keep us honest

To make absolutely sure that not a single function accidentally grows its complexity above the permitted level in a pull request, we created a script that makes a CI job turn red if any function goes over 100 in the complexity check. It is now in place. Maybe we can lower the limit going forward?
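
The real script lives in the curl repository, but the core of such a check can be sketched as a single pipeline, assuming pmccabe is available in the CI image: print every offender and exit non-zero if there was any, which turns the job red:

$ pmccabe lib/*.c src/*.c | awk '$1 > 100 { print; bad = 1 } END { exit bad }'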

Towards the goal

The goal is not so much a goal as a process. An attempt to make us write simpler code, which in turn should help us write better and more secure code. Let’s see where we are in ten years!

As of this writing, here is the toplist of the most complex functions in curl right now. The ones with scores over 70:

100  lib/vssh/libssh.c:myssh_statemach_act
 99  lib/setopt.c:setopt_long
 92  lib/urlapi.c:curl_url_get
 91  lib/ftplistparser.c:parse_unix
 88  lib/http.c:http_header
 83  src/tool_operate.c:single_transfer
 80  src/config2setopts.c:config2setopts
 79  lib/setopt.c:setopt_cptr
 79  lib/vtls/openssl.c:cert_stuff
 75  src/tool_cb_wrt.c:tool_write_cb
 73  lib/socks.c:do_SOCKS5
 72  lib/vssh/wolfssh.c:wssh_statemach_act
 71  lib/vtls/wolfssl.c:Curl_wssl_ctx_init
 71  lib/rtsp.c:rtsp_do
 71  lib/socks_sspi.c:Curl_SOCKS5_gssapi_negotiate

This is just a snapshot of the moment. I hope things will continue to improve going forward, even if perhaps a little more slowly now that we have fixed all the most terrible cases.

Everything is public

All the scripts for this, the graphs shown, and the data behind them are of course publicly available.

curl 8.14.0

2025-05-28 13:48:12

Welcome to another curl release.

Release presentation

Numbers

the 267th release
6 changes
56 days (total: 9,931)
229 bugfixes (total: 12,015)
406 commits (total: 35,190)
0 new public libcurl function (total: 96)
1 new curl_easy_setopt() option (total: 308)
1 new curl command line option (total: 269)
91 contributors, 47 new (total: 3,426)
36 authors, 17 new (total: 1,375)
2 security fixes (total: 166)

Security

Changes

  • When doing MQTT, curl now sends pings
  • The Schannel backend now supports pkcs12 client certificates containing CA certificates
  • Added CURLOPT_SSL_SIGNATURE_ALGORITHMS and --sigalgs for the OpenSSL backend (see the example after this list)
  • ngtcp2 + OpenSSL’s new QUIC API is now supported. Requires OpenSSL 3.5 or later.
  • wcurl comes bundled in the curl tarball
  • websocket can now disable auto-pong
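
As an example of the new signature algorithms option: the command line flag takes a list of signature algorithms to offer during the TLS handshake. A sketch, assuming the OpenSSL-style algorithm list syntax and a placeholder URL:

$ curl --sigalgs "ECDSA+SHA256:RSA+SHA256" https://example.com/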

Bugfixes

See the changelog on the curl site for the full set, or watch the release presentation for a “best of” collection.

The curl user survey 2025 is up

2025-05-19 06:25:23

Yes!

curl user survey 2025

The time has come for you to once again do your curl community duty. Run over and fill in the curl user survey and tell us how you use curl. This is the only proper way we get user feedback on a wide scale, so please use this opportunity to tell us what you really think.

This is the 12th time the survey runs. It is generally similar to last year’s but with some details updated and refreshed.

The survey stays up for fourteen days. Tell your friends.

curl user survey 2025

Leeks and leaks

2025-05-16 17:05:50

On the completely impossible situation of blocking the Tor .onion TLD to avoid leaks, while at the same time not blocking it, so that users can do what they want.

dot-onion leaks

The .onion TLD is a Tor-specific domain that does not mean much to the world outside of Tor. If you try to resolve one of the domains in there with a normal resolver, you get told that the domain does not exist.

If you ask your ISP’s resolver for such a hostname, you also advertise your intention to speak with that domain to a sizeable portion of the Internet. DNS is inherently insecure and queries have a habit of getting almost broadcast to lots of different actors: a user on this IP address intends to interact with this .onion site.

It is thus of utter importance for Tor users to never use the normal DNS system for resolving .onion names.

Remedy: refuse them

To help prevent DNS leaks like this, Tor people engaged with the IETF, and the cooperation ultimately resulted in RFC 7686: The “.onion” Special-Use Domain Name, published in 2015.

This document details how libraries and software should approach handling of this special domain. It basically says that software should, to a large degree, refuse to resolve such hostnames.

curl

In November 2015 we were made aware of the onion RFC and how it says we should filter these domains. At the time nobody seemed keen to work on this, and the problem was added to the known bugs document.

Eight years later the issue was still lingering in that document and curl had still not done anything about it, when Matt Jolly emerged from the shadows with a PR that finally implemented the filtering the RFC says we should do. It was merged into curl on March 30, 2023 and shipped in the curl 8.1.0 release on May 17, 2023. Two years ago.

Since that release, curl users would not accidentally leak their .onion use.
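
Since then, a .onion hostname fails early, before any resolve is even attempted. Something along these lines, where the hostname is a placeholder and the exact error text is quoted from memory:

$ curl http://example.onion/
curl: (6) Not resolving .onion address (RFC 7686)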

A curl user who uses Tor would obviously use a SOCKS proxy to access the Tor network, like they always have, and let the Tor server do the name resolving, as that is the entity that knows about Tor and the .onion domain.

That is the thing with using a proxy like this: a network client like curl can hand the full host name to the proxy server and let it do the resolving magic instead of doing it in the client. It avoids leaking names to local name resolvers.
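
In curl command line terms that means something like this, assuming a local Tor daemon listening on its default SOCKS port 9050 and with a placeholder hostname; --socks5-hostname tells curl to hand the name to the proxy instead of resolving it locally:

$ curl --socks5-hostname localhost:9050 http://example.onion/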

Controversy

It did not take long after curl started supporting the RFC that Tor themselves had pushed for before Tor users with creative network setups noticed and had opinions.

A proposal appeared to provide an override of the filter via an environment variable, for users whose setups already protect the normal name resolver somehow and are known to be okay. Environment variables make for horrible APIs, and the discussion did not reach consensus on any other solution, so this suggestion never made it into code.

This issue has subsequently popped up a few more times when users have run into problems, but no fix or solution has been merged. curl keeps blocking this domain. Usually people realize that when using the SOCKS proxy correctly, the domain name works as expected, and that has been the end of the discussions.

oniux

This week the Tor project announced a new product of theirs: oniux, a command-line utility providing Tor network isolation for third-party applications using Linux namespaces.

On their introductory web page explaining this new tool they even show it off using a curl command line:

$ oniux curl http://2gzyxa5ihm7wid.onion/index.html

(I decided to shorten the hostname here for illustration purposes.)

Wait a second, isn’t that a .onion hostname in that URL? Now what does curl do with .onion host names since that release two years ago?

Correct: that illustrated command line only works with old curl versions from before we implemented support for RFC 7686. From before we tried to do in curl what Tor indirectly suggested we should do. So now, when we try to do the right thing, curl does not work with this new tool Tor themselves launched!

At least we can’t say we live a dull life.

Hey this doesn’t work!

No really? Tell us all about it. Of course there was immediately an issue submitted against curl for this, quite rightfully. That tool was quite clearly made for a use case like this.

So how do we fix this?

Detecting malicious Unicode

2025-05-16 15:09:15

In a recent educational trick, curl contributor James Fuller submitted a pull-request to the project in which he suggested a larger cleanup of a set of scripts.

In a later presentation, he showed us how not a single human reviewer in the team, nor any CI job, had spotted or remarked on one of the changes he had included: he had replaced an ASCII letter with a Unicode alternative in a URL.

This was an eye-opener to several of us and we decided we needed to up our game. We are the curl project. We can do better.

GitHub

The replacement symbol looked identical to the ASCII version so it was not possible to visually spot this, but the diff viewer knows there is a difference.

In this GitHub website screenshot below I reproduced a similar case. The right-side version has the Latin letter ‘g’ replaced with the Armenian letter co. They appear to be the same.

The diff viewer says there is a difference, but as a human it is not possible to detect what it is. Is it a flaw? Does it matter? If done “correctly”, it would be done together with a real and expected fix.
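
A reviewer who suspects foul play can always drop down to the byte level. For example, assuming GNU grep with PCRE support, this flags every line in a diff that contains a byte outside the ASCII range:

$ git diff | grep -nP '[^\x00-\x7F]'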

The impact of changing one or more letters in a URL can of course be devastating depending on conditions.

When I flagged this rather big omission to GitHub people, I got barely any response at all, and I get the feeling the impact of this flaw is not understood and acknowledged. Or perhaps they are all just too busy implementing the next AI feature we don’t want.

Warnings

When we discussed this problem on Mastodon earlier this week, Viktor Szakats provided me with an example screenshot of doing a similar stunt with Gitea which quite helpfully highlights that there is something special about the replacement:

I have been told that some of the other source code hosting services also show similar warnings.

As a user, I would actually like to know even more than this, but at least this warns about the proposed change clearly enough that, if this happened, I would fetch the code manually and investigate before accepting such a change.

Detect

While we wait for GitHub to wake up and react (which I have no expectation will actually happen anytime soon), we have implemented checks to help us poor humans spot things like this. To detect malicious Unicode.

We have added a CI job that scans all files and validates every UTF-8 sequence in the git repository.

In the curl git repository most files and most content are plain old ASCII, so we can “easily” whitelist a small set of UTF-8 sequences and some specific files. The rest of the files are simply not allowed to use UTF-8 at all; if they do, they fail the CI job and turn up red.
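
The real script is more elaborate than this, with its allowlists of sequences and files, but the core idea can be sketched as a repository-wide scan for non-ASCII bytes (again assuming GNU grep with PCRE support):

$ git ls-files | xargs grep -lP '[^\x00-\x7F]'

Every file this lists has to either be on the allowlist or get fixed.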

In order to drive this change home, we went through all the test files in the curl repository and made sure that all the UTF-8 occurrences were replaced by other kinds of escape sequences and similar. Some of them had been used more or less by mistake and could easily be replaced by their ASCII counterparts.

The next time someone tries this stunt on us it could be someone with less good intentions, but now ideally our CI will tell us.

Confusables

There are plenty of tools to find similar-looking characters in different Unicode sets. One of them is provided by the Unicode consortium themselves:

https://util.unicode.org/UnicodeJsps/confusables.jsp

Reactive

This was yet another security-related fix reacting to a demonstrated problem. I am sure there are plenty more problems that we have not yet thought about, nor been shown, and for which we therefore lack adequate means to detect and act on automatically.

We want and strive to be proactive and tighten everything before malicious people exploit some weakness somewhere, but security remains a never-ending race. We can only do the best we can, while the other side works in silence and might at some future point attack us in new, creative ways we had not anticipated.

That future unknown attack is a tricky thing.

Update

(At 17:30 on the same day of the original post) GitHub has told me they have raised this as a security issue internally and they are working on a fix.

Supported curl versions and end of life

2025-05-14 14:56:45

The other week we shipped the 266th curl release. This counter is perhaps a little inflated since it also includes the versions we did before we renamed the project to curl, but still, there are hundreds of them. We keep cranking them out at least once every eight weeks; more often than that when we need to do patch releases. There is no planned end or expected change to this system for the foreseeable future. We can assume around ten new curl releases per year for a long time to come.

Release versions

We have the simplest possible release branching model: there is only one long term development branch. We create releases from master when the time comes. This means that we only have one version that is the latest and that we never fix or patch old releases. No other long-living release branches.

You can still find older curl versions in the wild getting patched, but those are not done by the curl project. Just about every Linux distribution, for example, maintains several old curl versions to which they back-port security fixes etc.

As we work crazy hard at not breaking users and at maintaining behaviors, users should always be able to upgrade to the latest version without risking breaking their use cases. Even when that upgrade jump is enormous. (We offer commercial alternatives for those who want even stronger guarantees, but they are provided slightly separately from the Open Source project.)

Supported versions

The curl support we provide for no money in the Open Source curl project is of course always a best-effort with no promises. We offer paid support for those that need promises, guaranteed response times or just want more and dedicated attention.

We support users with their curl issues, independent of their version – if we can. It is however likely that we ask reporters using old versions to first try their cases with a modern curl version to see if the problem has already been fixed, so that we do not waste time researching something that might not need any work.

If the user’s reported problem cannot be reproduced with the latest curl version, then we are done. Otherwise, again, the paid support option exists.

So, while this is not quite a supported versions concept, we focus our free-support efforts on recent releases – bugs that are reported on old versions that cannot be reproduced with a modern version are considered outdated.

Not really End of life

Because of this concept, we don’t really have end of life dates for our products. They are all just in varying degrees of aging. We still happily answer questions about versions shipped twenty years ago if we can, but we do not particularly care about bugs in them if they don’t seem to exist anymore.

We urge and push users into using the most recent curl versions at any time so that they get the best features, the most solid functionality and the least amount of security problems.

Or that they pay for support to go beyond this.

In reality, of course, users are regularly stuck with old curl versions. Often because they use an (outdated) Linux distribution which does not upgrade its curl package.

They all “work”

We regularly have users ask questions about curl versions we shipped ten, twelve or even fifteen years ago so we know old releases are used widely. All those old versions still mostly work and as long as they do what the users want and ask curl to do, then things are fine. At least if they use versions from distributions that back-port security fixes.

In reality of course, the users who still use the most ancient curl versions also tend to do so on abandoned or end-of-life distros, which means that they run insecure versions of curl – and probably basically every other tool and library they use is also insecure by then. In the best of worlds the users have perfect control and awareness of those details.

Feature window

Since we do all releases from the single master branch, we have the feature window/freeze concept. We only allow merging features and changes during a few weeks of the release cycle; the rest of the time we only merge bugfixes and non-feature changes. This is to make sure that the branch is as stable as possible by the time the release is to be shipped.