2025-10-16 14:02:12
A flow chart describing some of the steps and decisions curl makes when it is given an HTTP URL, covering hostnames, protocols and port numbers.
This flow chart ignores proxies, authentication considerations and the use of Unix domain sockets to keep things simpler.
An initial step is of course to extract the hostname part from the URL. The hostname in a URL can be provided as a plain IP address or as a name. If the URL does not contain a numerical IPv4 or IPv6 address, curl checks whether the hostname is an IDN (International Domain Name) and, if so, converts it into punycode before continuing.
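A minimal sketch of that first decision, assuming the host part has already been extracted from the URL with any IPv6 brackets removed. The `classify_host()` function and `enum host_kind` are hypothetical names for illustration; curl's own URL parser is more involved (it accepts and normalizes more IPv4 forms than `inet_pton()` does), and the actual punycode conversion is left out.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical categories for the host part of a URL */
enum host_kind { HOST_IPV4, HOST_IPV6, HOST_ASCII_NAME, HOST_IDN_NAME };

/* Classify 'host' (IPv6 brackets already stripped) */
static enum host_kind classify_host(const char *host)
{
  unsigned char buf[sizeof(struct in6_addr)];
  const unsigned char *p;

  if(inet_pton(AF_INET, host, buf) == 1)
    return HOST_IPV4;
  if(inet_pton(AF_INET6, host, buf) == 1)
    return HOST_IPV6;
  for(p = (const unsigned char *)host; *p; p++) {
    if(*p >= 0x80)   /* non-ASCII byte: an IDN name that needs punycode */
      return HOST_IDN_NAME;
  }
  return HOST_ASCII_NAME;  /* plain name, used as-is */
}
```

Only the `HOST_IDN_NAME` case would go on to an IDN library for conversion; numerical addresses skip name handling entirely.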
Given the protocol, hostname and port number, curl checks whether it already has a live connection suitable for reuse. Reusing an existing connection is preferred as it is the fastest way to start the new transfer. Connection reuse is based on the provided name and not the IP address, so that curl can skip resolving the name if there already is a connection available.
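A simplified sketch of the reuse check, assuming a plain array as the connection pool. The `struct conn` fields and `find_reusable()` are hypothetical; curl's real connection matching considers many more properties. The point shown is that the lookup key is scheme, host *name* and port, so no DNS lookup is needed to find a match.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified connection record */
struct conn {
  const char *scheme;  /* "http" or "https" */
  const char *host;    /* name as given in the URL, not the resolved IP */
  int port;
  bool alive;
  bool in_use;
};

/* ASCII case-insensitive equality; hostnames ignore case */
static bool host_eq(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return false;
    a++;
    b++;
  }
  return *a == *b;
}

/* Return an idle, live connection for scheme+host+port, or NULL */
static struct conn *find_reusable(struct conn *pool, size_t n,
                                  const char *scheme, const char *host,
                                  int port)
{
  size_t i;
  for(i = 0; i < n; i++) {
    struct conn *c = &pool[i];
    if(c->alive && !c->in_use && c->port == port &&
       !strcmp(c->scheme, scheme) && host_eq(c->host, host))
      return c;
  }
  return NULL;  /* no match: continue with the steps below */
}
```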
When trying to connect to a host, curl first checks if there are any tricks selected, like the --connect-to option, which makes curl actually resolve hostname B even when asked to connect to host A.
curl might have a populated alt-svc cache from previous transfers. It is basically a mapping from specific HTTP versions and hostnames to another HTTP version and hostname, valid for a certain amount of time. This can change hostname A into hostname B.
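A sketch of an alt-svc cache lookup with expiry, under simplifying assumptions: the entry layout and `altsvc_lookup()` are hypothetical, entries are matched on ALPN id and hostname only (a real alt-svc origin also includes the port), and time is passed in explicitly so the expiry rule is easy to see.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical entry: (src ALPN, src host) -> (dst ALPN, dst host, port) */
struct altsvc {
  const char *src_alpn;  /* e.g. "h2" */
  const char *src_host;
  const char *dst_alpn;  /* e.g. "h3" */
  const char *dst_host;
  int dst_port;
  time_t expires;        /* entry is valid while now < expires */
};

/* Find a still-valid alternative for this HTTP version and host */
static const struct altsvc *altsvc_lookup(const struct altsvc *cache,
                                          size_t n, const char *alpn,
                                          const char *host, time_t now)
{
  size_t i;
  for(i = 0; i < n; i++) {
    if(now < cache[i].expires &&
       !strcmp(cache[i].src_alpn, alpn) &&
       !strcmp(cache[i].src_host, host))
      return &cache[i];
  }
  return NULL;  /* no alternative: keep the original version and host */
}
```

A hit hands back a different HTTP version and hostname, which is how hostname A can turn into hostname B before any resolving happens.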
The --resolve option populates the DNS cache with one or more user-provided IP addresses for a given hostname.
Before curl resolves a hostname into a set of IP addresses, it checks if it already has the information in its DNS cache, as that is usually much faster than having to ask for that data again. Entries are typically kept in this cache for only a minute before they are evicted.
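A minimal DNS cache sketch keyed on host and port, with the one-minute lifetime from the text. All names (`dns_add()`, `dns_lookup()`, the fixed slot count) are hypothetical. User-provided entries, as added by the option described above (`CURLOPT_RESOLVE` in libcurl), are simply marked permanent here; check the libcurl documentation for curl's exact expiry rules.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define DNS_CACHE_TIMEOUT 60  /* seconds, the "a minute" from the text */
#define DNS_CACHE_SLOTS 8

struct dns_entry {
  char host[256];
  int port;
  char addr[46];     /* textual IPv4 or IPv6 address */
  time_t stamp;      /* when the entry was added */
  bool permanent;    /* user-provided entry: never expires in this sketch */
  bool used;
};

static struct dns_entry dns_cache[DNS_CACHE_SLOTS];

/* Store an entry in a free slot; returns false when the cache is full */
static bool dns_add(const char *host, int port, const char *addr,
                    time_t now, bool permanent)
{
  size_t i;
  for(i = 0; i < DNS_CACHE_SLOTS; i++) {
    struct dns_entry *e = &dns_cache[i];
    if(!e->used) {
      snprintf(e->host, sizeof(e->host), "%s", host);
      snprintf(e->addr, sizeof(e->addr), "%s", addr);
      e->port = port;
      e->stamp = now;
      e->permanent = permanent;
      e->used = true;
      return true;
    }
  }
  return false;
}

/* Return the cached address or NULL; stale entries are evicted here,
   so NULL means "go ask DNS" */
static const char *dns_lookup(const char *host, int port, time_t now)
{
  size_t i;
  for(i = 0; i < DNS_CACHE_SLOTS; i++) {
    struct dns_entry *e = &dns_cache[i];
    if(!e->used || e->port != port || strcmp(e->host, host))
      continue;
    if(!e->permanent && difftime(now, e->stamp) >= DNS_CACHE_TIMEOUT) {
      memset(e, 0, sizeof(*e));  /* evict */
      return NULL;
    }
    return e->addr;
  }
  return NULL;
}
```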
When curl resolves a hostname, it asks for the A, AAAA and HTTPS DNS records. A and AAAA provide a list of IP addresses to try to connect to, and the HTTPS record provides HTTP version information, port number, ECH config and possibly more.
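A sketch of the A/AAAA part using the standard `getaddrinfo()` with `AF_UNSPEC`, which asks for both IPv4 and IPv6 results. The helper `count_addresses()` is hypothetical. `getaddrinfo()` cannot return HTTPS records; those need a DNS library or DoH, which this sketch does not cover.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Count IPv4 (A) and IPv6 (AAAA) results for 'host'.
   Returns 0 on success, otherwise the getaddrinfo() error code. */
static int count_addresses(const char *host, int *v4, int *v6)
{
  struct addrinfo hints, *res, *ai;
  int rc;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      /* both A and AAAA */
  hints.ai_socktype = SOCK_STREAM;  /* one result per address (TCP) */

  rc = getaddrinfo(host, NULL, &hints, &res);
  if(rc)
    return rc;
  *v4 = *v6 = 0;
  for(ai = res; ai; ai = ai->ai_next) {
    if(ai->ai_family == AF_INET)
      (*v4)++;
    else if(ai->ai_family == AF_INET6)
      (*v6)++;
  }
  freeaddrinfo(res);
  return 0;
}
```

The two lists are what feeds the connection race further down: IPv6 and IPv4 candidates to try.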
curl might also have an HSTS cache, which is another map, telling when plain HTTP accesses should instead be internally upgraded to HTTPS. This changes the protocol to use and the default port number.
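A sketch of the HSTS upgrade step, assuming an in-memory list of entries with an expiry time and an includeSubDomains flag. `hsts_upgrade()` and `host_matches()` are hypothetical names, and hostnames are compared case-sensitively for brevity. The port rule follows RFC 6797: an explicit port 80 becomes 443, any other explicit port is kept.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

struct hsts_entry {
  const char *host;
  bool include_subdomains;
  time_t expires;
};

/* Is 'host' equal to 'domain', or a subdomain of it when allowed? */
static bool host_matches(const char *host, const char *domain, bool subs)
{
  size_t hl = strlen(host), dl = strlen(domain);
  if(!strcmp(host, domain))
    return true;
  return subs && hl > dl && host[hl - dl - 1] == '.' &&
         !strcmp(host + hl - dl, domain);
}

/* Upgrade http to https (and port 80 to 443) if a valid entry covers
   'host'. Returns true if the URL was upgraded. */
static bool hsts_upgrade(const struct hsts_entry *list, size_t n,
                         const char *host, time_t now,
                         const char **scheme, int *port)
{
  size_t i;
  if(strcmp(*scheme, "http"))
    return false;  /* only plain HTTP gets upgraded */
  for(i = 0; i < n; i++) {
    if(now < list[i].expires &&
       host_matches(host, list[i].host, list[i].include_subdomains)) {
      *scheme = "https";
      if(*port == 80)
        *port = 443;
      return true;
    }
  }
  return false;
}
```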
Depending on what IP versions and HTTP versions the above steps have determined curl should try to use, curl starts a connection race with potentially quite a few parallel connection attempts, each started slightly after the previous one.
Of course, if any of them can’t be done or fails, it is immediately skipped and the next one in line starts. Each attempt also starts the next one if it has not connected within a certain time.
The first contender to successfully connect to the host wins and the other attempts are quickly discarded.
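A simulation of the staggered race described above, under simplifying assumptions: each attempt is modeled only by how long it takes to connect or fail, attempts run in a fixed order, and the stagger delay is a parameter (the value used in the example is an assumption). `race()` and `struct attempt` are hypothetical; curl's real race also mixes IP families and HTTP versions, which this does not model.

```c
#include <stddef.h>

struct attempt {
  long duration;  /* ms from its start until it connects or fails */
  int succeeds;   /* 1 = connects, 0 = fails */
};

/* Return the index of the winning attempt (-1 if all fail) and store
   its connect time in *win_ms. The next attempt starts 'delay' ms after
   the previous one, or immediately when the previous one fails sooner. */
static int race(const struct attempt *a, size_t n, long delay, long *win_ms)
{
  long start = 0;  /* start time of attempt i */
  long best = 0;   /* connect time of the current winner */
  int winner = -1;
  size_t i;

  for(i = 0; i < n; i++) {
    long done;
    if(winner >= 0 && start >= best)
      break;       /* someone already won; later attempts never start */
    done = start + a[i].duration;
    if(a[i].succeeds && (winner < 0 || done < best)) {
      winner = (int)i;
      best = done;
    }
    if(!a[i].succeeds && a[i].duration < delay)
      start = done;   /* failed early: next one starts right away */
    else
      start += delay; /* still pending: next one starts after the delay */
  }
  if(winner >= 0 && win_ms)
    *win_ms = best;
  return winner;
}
```

With attempts of {500 ms, ok}, {100 ms, fails}, {50 ms, ok} and a 200 ms delay, the third attempt starts at 300 ms and wins at 350 ms, beating the slow first one.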
If the protocol is HTTPS (which it always is if HTTP/3 is selected), the TLS handshake is performed after the TCP connection is established. For HTTP/3, the TLS handshake is integrated into the QUIC connection setup.
The TLS handshake can make curl reuse an existing session, decide ALPN, use ECH and send early data.
The session ID/ticket handling also relies on a cache curl holds, which allows faster reconnects to hosts it has connected to before.
Once curl has an established connection to use, it starts by sending off the HTTP request, which begins the transfer.
2025-10-10 19:58:37
(See how I cleverly did not mention AI in the title!)
You know we have seen more than our fair share of slop reports sent to the curl project, so it seems only fair that I also write something about the state of AI now that we get to enjoy some positive aspects of this technology.
Let’s try doing this in a chronological order.
curl is almost 180,000 lines of C89 code, excluding blank lines. About 637,000 words in C and H files.
To compare, the original novel War and Peace (a thick book) consisted of 587,000 words.
The first ideas and traces for curl originated in the httpget project, started in late 1996. Meaning that there is a lot of history and legacy here.
curl does network transfers for 28 URL schemes, it has run on over 100 operating systems and on almost 30 CPU architectures. It builds with a wide selection of optional third party libraries.
We have shipped over 270 curl releases for which we have documented a total of over 12,500 bugfixes. More than 1,400 humans have contributed with commits merged into the repository, over 3,500 humans are thanked for having helped out.
It is a very actively developed project.
On August 11, 2025, a vulnerability was reported against curl that would turn out to be legitimate and would later be published as CVE-2025-9086. The reporter was the Google Big Sleep team, a team that claims they use “an AI agent developed by Google DeepMind and Google Project Zero, that actively searches and finds unknown security vulnerabilities in software”.
This was the first report we have ever received that seems to have used AI to accurately spot and report a security problem in curl. Of course, we don’t know how much AI and how much human effort was involved in the research and the report. The entire reporting process felt very human.
In mid-September 2025 we got a new security vulnerability report against curl from a security researcher we had not been in contact with before.
The report, which accurately identified a problem, was not turned into a CVE only because of sheer luck: the code didn’t work for other reasons, so the vulnerability couldn’t actually be reached. As a direct result of this lesson, we ripped out support for krb5-ftp.
The reporter of the krb5-ftp problem is Joshua Rogers. He contacted us and graciously forwarded a huge list of more potential issues that he had extracted, as I understand it mostly with the help of ZeroPath, a code analyzer with AI powers.
In the curl project we continuously run compilers with maximum pickiness enabled, we throw scan-build, clang-tidy, CodeSonar, Coverity, CodeQL and OSS-Fuzz at it, and we always address and fix every warning and complaint they report, so it was a little surprising that this tool could suddenly produce over two hundred new potential problems. But it sure did. And it was only the beginning.
As we started to plow through the huge list of issues from Joshua, we received yet another security report against curl, this time by Stanislav Fort from Aisle (using their own AI-powered tooling and pipeline for code analysis). Getting security reports is not uncommon for us; we tend to get 2-3 every week, but on September 23 we got another one we could confirm was a real vulnerability. Again, an AI-powered analysis tool had been used. (At the time I write this blog entry, this particular issue has not been disclosed yet so I can’t link it.)
As I was amazed by the quality and insights in some of the issues in Joshua’s initial list, I tooted about it on Mastodon, which was later picked up by Hacker News, The Register, Elektroniktidningen and more.
These newly reported issues feel quite similar in nature to the defects code analyzers typically report: small mistakes, omissions, flaws, bugs. Most of them are just plain variable mixups, return code confusions, small memory leaks in weird situations, state transition mistakes and variable type conversions possibly leading to problems etc. Remarkably few of them are complete false positives.
The quality of the reports makes it feel like a new generation of issue identification, like in this ladder of tool evolution from the old days. Each new step has taken things up a notch.
Out of that initial list, we merged about 50 separately identifiable bugfixes. The rest were some false positives, but also lots of minor issues that we just didn’t think were worth poking at or that we didn’t quite agree with.
We (primarily Stefan Eissing and myself) worked hard to get through that initial list from Joshua within only a couple of days. A list we mistakenly thought was “it”.
Joshua then spiced things up for us by immediately delivering a second list with 47 additional issues, followed by a third list with yet another 158 additional potential problems. At the same time Stanislav did a similar thing and delivered two lists with a total of around twenty possible issues.
Don’t get me wrong. This is good. The issues are of high quality, even the ones we dismiss often have some insights, and the rate of obvious false positives has remained low and quite manageable. Every bug we find and fix makes curl better. Every fix improves a software that impacts and empowers a huge portion of the world.
The total number of suspected issues submitted by these two gentlemen is now over four hundred. A fair pile of work for us curl maintainers!
Because these reported issues might include security-sensitive problems, we have decided not to publish them but to limit access to the reporters and the curl security team.
As I write this, we are still working our way through these reports but it feels reasonable to assume that we will get even more soon…
An obvious and powerful benefit this tool seems to have compared to others is that it scans all source code without needing a build. That means it can detect problems in all backends used in all build combinations. Old-style code analyzers require a proper build to analyze, and since you can build curl in countless combinations with a myriad of backend setups (several of which are architecture or OS specific), it is literally impossible to have all code analyzed with such tools.
Also, these tools can inject (parts of) third party libraries as well and find issues in the borderland between curl and its dependencies.
I think this is one primary reason it found so many issues: it checked lots of code barely any other analyzers have investigated.
To illustrate the level of “smartness” in this tool, allow me to show a few examples that I think show it off. These are issues reported against curl in the last few weeks and they have all been fixed. Beware that you might have to understand a thing or two about what curl does to properly follow here.
It correctly spotted that the documentation in the function header incorrectly said an argument is optional when in reality it isn’t. The fix was to correct the comment.
# `Curl_resolv`: NULL out-parameter dereference of `*entry`
* **Evidence:** `lib/hostip.c`. API promise: "returns a pointer to the entry in the `entry` argument (**if one is provided**)." However, code contains unconditional writes: `*entry = dns;` or `*entry = NULL;`.
* **Rationale:** The API allows `entry == NULL`, but the implementation dereferences it on every exit path, causing an immediate crash if a caller passes `NULL`.
I could add that the fact that it takes comments so seriously can also trick it into reporting wrong things when the comments are outdated and state bad “facts”. Which of course shouldn’t happen, because comments should not lie!
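For illustration, here is the bug class in miniature. curl resolved this report by correcting the comment (the argument is not optional); the sketch below instead shows the other way to make code and comment agree, guarding every write through the out-parameter. `resolve_sketch()` and its `struct dns_entry` are hypothetical stand-ins, not curl's `Curl_resolv()`.

```c
#include <stddef.h>

struct dns_entry {
  const char *addr;  /* stand-in for curl's real cache entry type */
};

/* Hypothetical resolver honoring "(if one is provided)": every write
   through 'entry' is guarded by a NULL check */
static int resolve_sketch(const char *host, struct dns_entry **entry)
{
  static struct dns_entry found = { "192.0.2.1" };
  struct dns_entry *dns = NULL;
  int rc = 1;             /* 1 = failure, 0 = success in this sketch */

  if(host && host[0]) {   /* pretend every non-empty name resolves */
    dns = &found;
    rc = 0;
  }
  if(entry)               /* the guard the report found missing */
    *entry = dns;
  return rc;
}
```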
It figured out that a piece of telnet code actually wouldn’t comply with the telnet protocol and pointed it out. Quite impressively I might add.
Telnet subnegotiation writes unescaped user-controlled values (`tn->subopt_ttype`, `tn->subopt_xdisploc`, `tn->telnet_vars`) into `temp` (lines 948–989) without escaping IAC (0xFF)
In `lib/telnet.c` (lines 948–989) the code formats Telnet subnegotiation payloads into `temp` using `msnprintf` and inserts the user-controllable values `tn->subopt_ttype` (lines 948–951), `tn->subopt_xdisploc` (lines 960–963), and `v->data` from `tn->telnet_vars` (lines 976–989) directly into the suboption data. The buffer `temp` is then written to the socket with `swrite` (lines 951, 963, 995) without duplicating `CURL_IAC` (0xFF) bytes. Telnet requires any IAC byte inside subnegotiation data to be escaped by doubling; because these values are not escaped, an 0xFF byte in any of them will be interpreted as an IAC command and can break the subnegotiation stream and cause protocol errors or malfunction.
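The escaping rule the report refers to is small enough to sketch: when copying suboption data, every 0xFF (IAC) byte is sent twice. `telnet_escape_iac()` is a hypothetical helper for illustration, not the fix that went into curl.

```c
#include <stddef.h>

#define IAC 0xFF  /* Telnet "Interpret As Command" byte (RFC 854) */

/* Copy 'len' bytes of suboption data into 'out', doubling each IAC.
   'out' must hold at least 2 * len bytes. Returns bytes written. */
static size_t telnet_escape_iac(unsigned char *out, const unsigned char *in,
                                size_t len)
{
  size_t i, o = 0;
  for(i = 0; i < len; i++) {
    out[o++] = in[i];
    if(in[i] == IAC)
      out[o++] = IAC;  /* data byte 0xFF goes out as 0xFF 0xFF */
  }
  return o;
}
```

Sizing `out` for the worst case (every byte is 0xFF) keeps the copy from overflowing.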
In another case it seems to know the best practice for a TFTP implementation (pinning the used IP address for the duration of the transfer), and it detected that curl didn’t apply this best practice in code, so it correctly complained:
No TFTP peer/TID validation
The TFTP receive handler updates state->remote_addr from recvfrom() on every datagram and does not validate that incoming packets come from the previously established server address/port (transfer ID). As a result, any host able to send UDP packets to the client (e.g., on-path attacker or local network adversary) can inject a DATA/OACK/ERROR packet with the expected next block number. The client will accept the payload (Curl_client_write), ACK it, and switch subsequent communication to the attacker’s address, allowing content injection or session hijack. Correct TFTP behavior is to bind to the first server TID and ignore, or error out on, packets from other TIDs.
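A sketch of the peer pinning the report asks for, assuming IPv4 only: the first reply pins the server's address and port (its TID), and later datagrams from anywhere else are rejected. `struct tftp_peer` and `tftp_accept_from()` are hypothetical. RFC 1350 additionally says such stray packets should get an error packet back while the transfer continues undisturbed; that part is not shown.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical TFTP peer state */
struct tftp_peer {
  bool pinned;
  struct sockaddr_in addr;  /* server address + port (its TID) */
};

/* Accept a datagram only from the pinned peer; the first one pins it */
static bool tftp_accept_from(struct tftp_peer *p,
                             const struct sockaddr_in *from)
{
  if(!p->pinned) {
    p->addr = *from;
    p->pinned = true;
    return true;
  }
  return p->addr.sin_addr.s_addr == from->sin_addr.s_addr &&
         p->addr.sin_port == from->sin_port;
}
```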
Most memory leaks are reported when someone runs code and notices that not everything is freed in some specific circumstance. We of course test for leaks all the time in tests, but in order to see them in a test we need to run that exact case and there are many code paths that are hard to travel in tests.
Apart from doing tests you can of course find leaks by manually reviewing code, but history and experience tell us that is an error-prone method.
# GSSAPI security message: leaked `output_token` on invalid token length
* **Evidence:** `lib/vauth/krb5_gssapi.c:205--207`. Short quote:
```c
if(output_token.length != 4) { ... return CURLE_BAD_CONTENT_ENCODING; }
```
The `gss_release_buffer(&unused_status, &output_token);` call occurs later at line 215, so this early return leaks the buffer from `gss_unwrap`.
* **Rationale:** Reachable with a malicious peer sending a not-4-byte security message; repeated handshakes can cause unbounded heap growth (DoS).
This particular bug looks straightforward and in hindsight easy enough to spot, but it has existed like this in plain sight in code for over a decade.
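The shape of the fix is to release the buffer on the early error path as well. The sketch below uses hypothetical stand-ins (`struct buffer`, `release_buffer()`) rather than the real GSS-API types, so it runs without a Kerberos library. The 4-byte message is treated as in RFC 4752, where the first octet is the security layer bitmask.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the GSS-API buffer type and release call */
struct buffer {
  size_t length;
  void *value;
};

static void release_buffer(struct buffer *b)
{
  free(b->value);
  b->value = NULL;
  b->length = 0;
}

/* Check the unwrapped security message. Returns the security layer
   bitmask (first octet), or -1 on a bad length. */
static int check_security_message(struct buffer *output_token)
{
  unsigned char msg[4];
  if(output_token->length != 4) {
    release_buffer(output_token);  /* the release the report found missing */
    return -1;
  }
  memcpy(msg, output_token->value, 4);
  release_buffer(output_token);
  return msg[0];
}
```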
I think I may have shocked some people when I stated that the AI tooling helped us find 22, 70 and then 100 bugs etc. I suspect people in general are not aware of, and do not think about, the bugfix frequency we work at in this project. Fixing several hundred bugs per release is a normal rate for us. Sure, this cycle we will probably reach a new record, but I still don’t gasp for breath because of this.
I don’t consider this new tooling a revolution. It does not massively or drastically change code or how we approach development. It is however an excellent new project assistant. A powerful tool that highlights code areas that need more attention. A much appreciated evolutionary step.
I might of course be speaking too early. Perhaps it will develop a lot more and it can then turn into a revolution.
The AI engines burn the forests and they are built by ingesting other people’s code and work. Is it morally and ethically right to use AI for improving Open Source in this way? It is a question to wrestle with and I’m sure the discussion will go on. At least this use of AI does not generate duplicates of someone else’s code for us to use, but it certainly takes lessons from and finds patterns based on others’ code. But so do we all, I hope.
I can imagine that curl is a pretty good codebase to use a tool of this caliber on, as curl is old and mature and all the minor nits and defects have been polished away. It is a project where we have a high bar and we want to raise it even higher. We love the opportunity to get additional help and figure out where we might have slipped. Then fix those and try again. Over and over until the end of time.
At the DEF CON 33 conference which took place in August 2025, DARPA ran a competition called the AI Cyber Challenge or AIxCC for short. In this contest, the competing teams used AI tools to find artificially injected vulnerabilities in projects – with zero human intervention. One of the projects used in the finals that the teams looked for problems in, was… curl!
I have been promised a report or a list of findings from that exercise, as presumably the teams found something more than just the fake inserted problems. I will report back when that happens.
We do not yet have any AI powered code analyzer in our CI setup, but I am looking forward to adding such. Maybe several.
We can ask GitHub Copilot for pull-request reviews, but from the little I’ve tried Copilot for reviews, it is far from comparable to the reports I have received from Joshua and Stanislav, and quite frankly it has been mostly underwhelming. We do not use it. Of course, that can change and it might turn into a powerful tool one day.
We now have an established, constructive communication setup with both these reporters, which should provide a solid foundation for us to improve curl even more going forward.
I personally still do not use any AI at all during development (apart from occasional small experiments). Partly because they all seem to force me into using VS Code and I totally lose all my productivity with that. Partly because I’ve not found it very productive in my experiments.
Interestingly, this productive AI development happens pretty much concurrently with the AI slop avalanche we also see, proving that one AI is not necessarily like the other AI.
2025-10-01 17:03:23
I believe a good product needs clear and thorough documentation. I think shipping a quality product requires you to provide detailed and informative release notes. I try to live up to this in the curl project, and this is how we do it.
Some of the scripts I use to maintain the RELEASE NOTES and the associated documentation.
A foundational script to make things work smoothly is the single invoke script that puts a release tarball together from what is currently in the file system. We can run this in a cronjob and easily generate daily snapshots that look exactly like a release would if we had done one at that point. Our script for this purpose is called maketgz. We have a containerized version of it, called dmaketgz, which runs in a specific Docker setup. This version of the script builds a fully reproducible release tarball.
If you want to verify that all the contents of a release tarball only originate from the git repository and the associated release tools, we provide a script for that purpose: verify-release.
An important piece of documentation for each release is of course the RELEASE-NOTES file, which details exactly what changes and fixes have been done since the previous release. It also gives proper credit to all the people who were involved and helped make the release what it is.
We use a quite simple git commit message standard for curl. It details how the first line should be constructed and how to specify meta-data in the message. Sticking to this message format allows us to write scripts and do automation around the git history.
When I invoke the release-notes.pl script, it performs a git log command that lists all changes done in the repository since the previous commit of the RELEASE-NOTES file with the commit message “synced”. Those changes are then parsed: the first line is used as a release notes entry, and issue tracker references within the message are used for linking the changes to allow users to track their origins.
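The parsing step can be sketched in C, assuming the issue references appear as `Fixes #N` or `Closes #N` lines in the commit message. Those two keywords are an assumption for illustration, and `first_line()` and `issue_refs()` are hypothetical helpers, not part of release-notes.pl (which is a Perl script).

```c
#include <stdlib.h>
#include <string.h>

/* Copy the first line of a commit message (the release notes entry) */
static void first_line(const char *msg, char *out, size_t outlen)
{
  size_t n = strcspn(msg, "\n");
  if(n >= outlen)
    n = outlen - 1;
  memcpy(out, msg, n);
  out[n] = '\0';
}

/* Collect issue numbers from "Fixes #N" / "Closes #N" (assumed keywords).
   Returns how many were stored in 'refs'. */
static size_t issue_refs(const char *msg, long *refs, size_t max)
{
  static const char *const keys[] = { "Fixes #", "Closes #" };
  size_t count = 0, k;
  for(k = 0; k < sizeof(keys) / sizeof(keys[0]); k++) {
    const char *p = msg;
    while(count < max && (p = strstr(p, keys[k])) != NULL) {
      p += strlen(keys[k]);
      refs[count++] = strtol(p, NULL, 10);
    }
  }
  return count;
}
```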
The script cannot itself know whether a commit is a change, a bugfix or something else, so after it has been invoked I have to go over the updated release notes file manually. I check the newly added entries, remove the ones that are irrelevant and move the lines referring to changes to the changes list.
I then run release-notes.pl cleanup, which cleans up the release notes file: it sorts the bugfixes list alphabetically and removes pending orphaned references that are no longer used (for previously listed entries that were deleted in the process mentioned above).
When invoked, this script extracts all contributors to the project since the most recent release (tag): commit authors, committers and everyone given credit in all commit messages since then, plus all committers and authors in the web repository over the same period. It also includes the names already mentioned in the RELEASE-NOTES file.
It cleans up the names, runs them through the THANKS-filter and then outputs each unique name in a convenient way and format suitable for easy copy and paste into RELEASE-NOTES.
The delta script outputs data and counters about the current state of the repository compared to the most recent release.
Invoking the script in a terminal shows something like this:
= Since curl-8_12_1 Feb 13 08:18:33 2025 +0100 =
Elapsed time: 10.4 days (total 9837 / 10331)
Commits: 122 (total 34405)
Commit authors: 14, 1 new (total 1343)
Contributors: 19, 8 new (total 3351)
New public functions: 0 (total 96)
New curl_easy_setopt() options: 0 (total 306)
New command line options: 0 (total 267)
Changes logged: 0
Bugfixes logged: 67 (6.44 per day)
Added files: 10 (total 4058)
Deleted files: 2 (delta: 8)
Files changed: 328 (8.08%)
Lines inserted: 7798
Lines deleted: 6318 (delta: 1480)
With this output, I can update the counters at the top of the RELEASE-NOTES file.
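The derived numbers in the delta output above can be recomputed from its raw counters, which is a handy sanity check when copying them over. This is a sketch with hypothetical names; reading "Files changed" as a percentage of the total file count is our assumption, though it matches the printed 8.08%.

```c
/* Raw counters from the delta output */
struct delta {
  double days;        /* "Elapsed time" */
  int bugfixes;       /* "Bugfixes logged" */
  int files_changed;  /* "Files changed" */
  int total_files;    /* total from "Added files" */
  int lines_inserted;
  int lines_deleted;
};

static double bugfixes_per_day(const struct delta *d)
{
  return d->bugfixes / d->days;
}

/* Assumption: percentage relative to the total number of files */
static double percent_files_changed(const struct delta *d)
{
  return 100.0 * d->files_changed / d->total_files;
}

static int line_delta(const struct delta *d)
{
  return d->lines_inserted - d->lines_deleted;
}
```

With the values {10.4, 67, 328, 4058, 7798, 6318}, printing the first two with `%.2f` gives 6.44 and 8.08, and `line_delta()` returns 1480, matching the output.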
I then commit the RELEASE-NOTES file with the commit message “RELEASE-NOTES: synced” so that the automation knows exactly when it was last updated.
As a courtesy to curious users and developers, we always keep an updated version of the current in progress release notes document on the curl website: https://curl.se/dev/release-notes.html.
In my ~/.gitconfig file I have a useful alias that helps me:
[alias]
latest = log @^{/RELEASE-NOTES:.synced}..
This lets me easily list all changes done in the repository since I last updated the release notes file. I often list them like this:
git latest --oneline
This lists all the commits, one line per commit. If the list is long enough, maybe 20-30 lines or so, and at least a few days have passed since the previous update, I might update the release notes.
Whenever there is a curl release, I also make sure the release notes document is fully updated and properly synced for that.
2025-09-22 15:57:31
As the Cyber Resilience Act (CRA) is getting closer, and companies wanting to sell digital services and goods within the EU need to step up, tighten their procedures, improve their documentation and get control over their dependencies, I feel it could be timely to remind everyone:
We of course offer full support and fully CRA compliant curl versions to support customers.
curl is not a manufacturer in the legislation’s terminology, so we as a project don’t have those requirements, but we always have our ducks in order and we will gladly assist and help manufacturers to comply.
We have done internet transfers for the world for decades. Fast, securely, standards compliant, feature packed and rock solid. We make curl to empower the world’s digital infrastructure.
You can rely on us.
2025-09-19 14:23:25
We are dropping support for this feature in curl 8.17.0. Kerberos5 FTP to be exact. The last Kerberos support we had for FTP.
On September 16, 2025 we received a security report that accurately identified a possible stack-based buffer overflow in the Kerberos FTP code that could allow a malicious FTP server to cause havoc in curl.
Yikes. That is bad.
But wait, it also identified a second problem. In the exact same commit that introduced the potential security vulnerability (by me, no less) I also injected a second bug!
This second bug effectively and completely broke the function and prevented Kerberos FTP from working. So no user would actually be vulnerable to the first problem, because the feature simply did not work anymore and no user would then use it against a malicious server!
At the time when I merged the commit this second bug was not detected because we obviously do not have tests and CI that test this piece of the code. It pains me to admit this, but we do have a few areas left in curl that aren’t covered by tests or enough tests.
I merged this bad code back in May 2024 and we have done over a year’s worth of releases since then, and since not a single person has reported this breakage we can use this as a decent canary in the coal mine and safely conclude that not a single soul has used this feature in this time (with a recent curl install). If they did, they didn’t tell us about it, and I don’t count that.
With this accidental/clever user check, we have decided to rip the entire thing out instead of fixing the code. Clearly we should not support this code since A) it isn’t used and B) it isn’t tested in the test suite. Perhaps also C) it is weird code.
Bye bye Kerberos5 FTP support. We introduced it back in July 2007.
We had Kerberos4 support for FTP between September 2000 and August 2013.
As a follow-on effect, we also get rid of the last piece of code in the repository that was copyrighted “Kungliga Tekniska Högskolan” under a BSD-3 license. The only piece that was BSD-3 licensed. One less license to care about!
The top image is a cropped version of Cerberus and Heracles. An etching by Antonio Tempesta (Florence, Italy, 1555–1630).
2025-09-18 15:32:47
Every curl security report starts out with someone submitting an issue to us on https://hackerone.com/curl. The reporter tells us what they suspect and what they think the problem is. This report is kept private, visible only to the curl security team and the reporter while we work on it.
In recent months we have gotten 3-4 security reports per week. The program has run for over six years now, with almost 600 reports accumulated.
On average, someone in the team makes a first response to that report already within the first hour.
The curl security team right now consists of seven long time and experienced curl maintainers. We immediately start to analyze and assess the received issue and its claims. Most reports are not identifying actual security problems and are instead quickly dismissed and closed. Some of them identify plain bugs that are not security issues and then we move the discussion over to the public bug tracker instead.
This part can take anything from hours up to multiple days and usually involves several curl security team members.
If we think the issue might have merit, we ask follow-up questions, test reproducible code and discuss with the reporter.
A small fraction of the incoming reports is actually considered valid security vulnerabilities. We work together with the reporter to reach a good understanding of what exactly is required for the bug to trigger and what the flaw can lead to. Together we set a severity for the problem (low, medium, high, critical) and we work out a first patch – which also helps to make sure we understand the issue. Unless the problem is deemed serious we tend to sync the publication of the new vulnerability with the pending next release. Our normal release cycle is eight weeks so we are never farther than 56 days away from the next release.
For security issues we deem to be severity low or medium we create a pull request for the problem in the public repository – but we don’t mention the security angle of the problem in the public communication of it. This way, we also make sure that the fix gets added test exposure and time to get polished before the pending next release. Over the last five or so years, only two in about eighty confirmed security vulnerabilities have been rated a higher severity than medium. Fixes for vulnerabilities we consider to be severity high or critical are instead merged into the git repository when there is approximately 48 hours left to the pending release – to limit the exposure time before it is announced properly. We need to merge it into the public before the release because our entire test infrastructure and verification system is based on public source code.
Next, we write up a detailed security advisory that explains the problem and exactly what the mistake is and how it can lead to something bad, including all the relevant details we can think of. This includes version ranges for affected curl versions and the exact git commits that introduced the problem, as well as the commit that fixed the issue, plus credits to the reporter and to the patch author etc. We have the ambition to provide the best security advisories you can find in the industry. (We also provide them in JSON format etc on the site for the rare few users who care about that.) We of course want the original reporter involved as well so that we make sure that we get all the angles of the problem covered accurately.
As we are a CNA (CVE Numbering Authority), we reserve and manage CVE IDs for our own issues ourselves.
About a week before the pending release when we also will publish the CVE, we inform the distros@openwall mailing list about the issue, including the fix, and when it is going to be released. It gives Open Source operating systems a little time to prepare their releases and adjust for the CVE we will publish.
On the release day we publish the CVE details and we ship the release. We then also close the HackerOne report and disclose it to the world. We disclose all HackerOne reports once closed for maximum transparency and openness. We also inform all the curl mailing lists and the oss-security mailing list about the new CVE. Sometimes we of course publish more than one CVE for the same release.
Once the HackerOne report is closed and disclosed to the world, the vulnerability reporter can claim a bug bounty from the Internet Bug Bounty which pays the researcher a certain amount of money based on the severity level of the curl vulnerability.
(The original text I used for this blog post was previously provided to the interview I made for Help Net Security. Tweaked and slightly extended here.)
The heroes in the curl security team who usually work on all this in silence and without much ado, are currently (in no particular order):