2025-12-10 23:08:20

Say you have a common 6-sided die and need to roll it until the sum of your rolls is at least 6. How many times would you need to roll?
If you had a 20-sided die and you need to roll for a sum of at least 20, would that take more rolls or fewer rolls on average?
According to [1], the expected number of rolls of an n-sided die for the sum of the rolls to be n or more equals (1 + 1/n)^(n−1).
So for a 6-sided die, the expected number of rolls is (7/6)^5 ≈ 2.1614.
For a 20-sided die, the expected number of rolls is (21/20)^19 ≈ 2.5270.
The expected number of rolls is an increasing function of n, and it converges to e.
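Here is a quick look at the closed form for a few values of n (a little script of my own, separate from the simulation below), showing the slow climb toward e.

from math import e

for n in [6, 20, 100, 1000, 10**6]:
    print(n, (1 + 1/n)**(n - 1))
print(e)   # 2.718281828459045, the limiting value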

Here’s a little simulation script for the result above.
from numpy.random import randint

def game(n):
    s, i = 0, 0
    while s < n:
        s += randint(1, n + 1)   # roll the die; randint's upper bound is exclusive
        i += 1
    return i

print(sum(game(20) for _ in range(1_000_000)) / 1_000_000)   # average over many games with n = 20
This produced 2.5273.
[1] Enrique Treviño. Expected Number of Dice Rolls for the Sum to Reach n. American Mathematical Monthly, Vol 127, No. 3 (March 2020), p. 257.
The post Rolling n-sided dice to get at least n first appeared on John D. Cook.
2025-12-10 22:33:53
There are numerous memes floating around with the words “Being weak is nothing to be ashamed of; staying weak is.” Or some variation. I thought about this meme in the context of weak derivatives.

The last couple posts have talked about distributions, also called generalized functions. The delta function, for example, is not actually a function but a generalized function, a linear functional on a space of test functions.
Distribution theory lets you take derivatives of functions that don’t have a derivative in the classical sense. View the function as a regular distribution, take its derivative as a distribution, and if that derivative is itself a regular distribution, the function it corresponds to is called a weak derivative of the original function.
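For a concrete example, take f(x) = |x|, which has no classical derivative at 0. The derivative of a distribution acts on a test function φ by φ ↦ −∫ f(x) φ′(x) dx, which comes from integration by parts with no boundary terms. The little script below checks numerically that this action matches integrating sign(x) against φ, so sign(x) is a weak derivative of |x|. (This is a sketch of my own, using a rapidly decaying test function rather than one of compact support.)

import numpy as np
from scipy.integrate import quad

phi  = lambda x: x * np.exp(-x**2)              # a rapidly decaying test function
dphi = lambda x: (1 - 2*x**2) * np.exp(-x**2)   # its classical derivative

lhs, _ = quad(lambda x: -np.abs(x) * dphi(x), -np.inf, np.inf)   # action of the distributional derivative of |x|
rhs, _ = quad(lambda x: np.sign(x) * phi(x), -np.inf, np.inf)    # action of the regular distribution sign(x)
print(lhs, rhs)   # both approximately 1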
You can use distribution theory to complete a space of functions analogous to how the real numbers complete the rational numbers.
To show that an equation has a rational solution, you might first show that it has a real solution, then show that the real solution is in fact a rational. To state the strategy more abstractly, to find a solution in a small space, you first look for solutions in a larger space where solutions are easier to find. Then you see whether the solution you found lies in the smaller space.
This is the modern strategy for studying differential equations. You first show that a differential equation has a solution in a weak sense, then if possible prove a regularity result that shows the solution is a classical solution. There’s no shame in finding a weak solution. But from a classical perspective, there’s shame in stopping there.
2025-12-08 23:00:37
The previous post showed how we can take the Fourier transform of functions that don’t have a Fourier transform in the classical sense.
The classical definition of the Fourier transform of a function f requires the integral of |f| over the real line to be finite. Roughly speaking, this means f(x) must die off as x goes to ∞ and −∞. A constant function won’t do, and yet we got around that in the previous post. Distribution theory even lets you take the Fourier transform of functions that grow as their arguments go off to infinity, as long as they don’t grow too fast, i.e. like a polynomial but not like an exponential.
In this post we want to take the Fourier transform of functions like sine and cosine. If you read that sentence as saying Fourier series, you have the right instinct for classical analysis: you take the Fourier series of periodic functions, not the Fourier transform. But with distribution theory you can take the Fourier transform, unifying Fourier series and Fourier transforms.
For this post I’ll be defining the classical Fourier transform using the convention

f̂(ω) = (1/2π) ∫ f(x) exp(−iωx) dx
and generalizing this definition to distributions as in the previous post.
With this convention, the Fourier transform of 1 is δ, and the Fourier transform of δ is 1/2π.
One can show that the Fourier transform of a cosine is a sum of delta functions, and the Fourier transform of a sine is a difference of delta functions.
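To make the constants explicit under the convention above (a quick sketch of my own): writing cos(ax) = (exp(iax) + exp(−iax))/2, the transform of exp(iax) is δ(ω − a), so the Fourier transform of cos(ax) is (δ(ω − a) + δ(ω + a))/2, and similarly the transform of sin(ax) is (δ(ω − a) − δ(ω + a))/(2i). Here is a numerical check of the cosine case: acting on a test function φ, the transform of cos(ax) should return (φ(a) + φ(−a))/2.

import numpy as np
from scipy.integrate import quad

a = 1.5
phi     = lambda t: np.exp(-t**2)                            # a Gaussian test function
phi_hat = lambda x: np.exp(-x**2 / 4) / (2*np.sqrt(np.pi))   # its transform under the 1/(2π) convention

lhs, _ = quad(lambda x: np.cos(a*x) * phi_hat(x), -np.inf, np.inf)   # the distributional definition
rhs = (phi(a) + phi(-a)) / 2                                         # action of (δ(ω−a) + δ(ω+a))/2
print(lhs, rhs)   # the two numbers agree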
It follows that the Fourier transform of a Fourier series is a sum of delta functions shifted by integers. In fact, if you convert the Fourier series to complex form, the coefficients of the deltas are exactly the Fourier series coefficients.
2025-12-08 20:30:58
Suppose you have a constant function f(x) = c. What is the Fourier transform of f?
We will show why the direct approach doesn’t work, give two hand-wavy approaches, and a rigorous definition.
Unfortunately there are multiple conventions for defining the Fourier transform.
For this post, we will define the Fourier transform of a function f to be

f̂(ω) = (1/√2π) ∫ f(x) exp(−iωx) dx

with the integral taken over the whole real line.
If f(x) = c then the integral diverges unless c = 0.
The more concentrated a function is in the time domain, the more it spreads out in the frequency domain. And the more spread out a function is in the time domain, the more concentrated it is in the frequency domain. If you think this sounds like the Heisenberg uncertainty principle, you’re right: there is a connection.
A constant function is as spread out as possible, so it seems that its Fourier transform should be as concentrated as possible, i.e. a delta function. The delta function isn’t literally a function, but it can be made rigorous. More on that below.
The Fourier transform of the Gaussian function exp(−x²/2) is the same function, i.e. the Gaussian function is a fixed point of the Fourier transform. More generally, the Fourier transform of the density function for a normal random variable with standard deviation σ is the density function for a normal random variable with standard deviation 1/σ.
As σ gets larger, the density becomes flatter. So we could think of our function f(x) = c as some multiple of a Gaussian density in the limit as σ goes to infinity. The Fourier transform is then some multiple of a Gaussian density with σ = 0, i.e. a point mass or delta function.
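Here is a quick numerical check of the fixed point claim (a sketch of my own, using the 1/√2π convention defined above): the transform of exp(−x²/2), evaluated at a few frequencies, reproduces exp(−ω²/2).

import numpy as np
from scipy.integrate import quad

def ft_gauss(w):
    # the sine part integrates to zero by symmetry, so only the cosine part is needed
    val, _ = quad(lambda x: np.exp(-x**2 / 2) * np.cos(w * x), -np.inf, np.inf)
    return val / np.sqrt(2 * np.pi)

for w in [0.0, 0.5, 1.0, 2.0]:
    print(ft_gauss(w), np.exp(-w**2 / 2))   # the two columns agree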
If f and φ are two well-behaved functions then

∫ f̂(x) φ(x) dx = ∫ f(x) φ̂(x) dx.
In other words, we can move the “hat” representing the Fourier transform from one function to the other. The equation above is a theorem when f and φ are nice functions. We can use it to motivate a definition when the function f is not so nice but the function φ is very nice. Specifically, we will assume φ is an infinitely differentiable function that goes to zero at infinity faster than any polynomial.
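Here is a small numerical check of that identity (a sketch of my own, using the convention above) with two Gaussians whose transforms are known in closed form.

import numpy as np
from scipy.integrate import quad

f       = lambda x: np.exp(-x**2 / 2)                 # exp(−x²/2) is its own transform, as noted above
f_hat   = lambda w: np.exp(-w**2 / 2)
phi     = lambda x: np.exp(-x**2)                     # a rapidly decaying test function
phi_hat = lambda w: np.exp(-w**2 / 4) / np.sqrt(2)    # its transform under the 1/√2π convention

lhs, _ = quad(lambda x: f_hat(x) * phi(x), -np.inf, np.inf)
rhs, _ = quad(lambda x: f(x) * phi_hat(x), -np.inf, np.inf)
print(lhs, rhs)   # both equal sqrt(2π/3)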
Given a Lebesgue integrable function f, we can think of f as a linear operator via the map

φ ↦ ∫ f(x) φ(x) dx.
More generally, we can define a distribution to be any continuous [1] linear operator from the space of test functions to the complex numbers. A distribution that can be defined by integral as above is called a regular distribution. When we say we’re taking the Fourier transform of the constant function f(x) = c, we’re actually taking the Fourier transform of the regular distribution associated with f. [2]
Not all distributions are regular. The delta “function” δ(x) is a distribution that acts on test functions by evaluating them at 0.
We define the Fourier transform of (the regular distribution associated with) a function f to be the distribution whose action on a test function φ equals the integral of the product of f and the Fourier transform of φ. When a function is Lebesgue integrable, this definition matches the classical definition.
With this definition, we can calculate that the Fourier transform of a constant function c equals √2π c δ.
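Here is a quick way to see where the √2π comes from. By definition, the transform of c acts on a test function φ by integrating c against φ̂, and

c ∫ φ̂(ω) dω = √2π c φ(0)

because the inversion formula gives φ(0) = (1/√2π) ∫ φ̂(ω) dω. Returning √2π c φ(0) is exactly what the distribution √2π c δ does to φ.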
Note that with a different convention for defining the Fourier transform, you might get 2π c δ or just c δ.
An advantage of the convention that we’re using is that the Fourier transform of the Fourier transform of f(x) is f(−x) and not some multiple of f(−x). This implies that the Fourier transform of √2π δ is 1 and so the Fourier transform of δ is 1/√2π.
[1] To define continuity we need to put a topology on the space of test functions. That’s too much for this post.
[2] The constant function doesn’t have a finite integral, but its product with a test function does because test functions decay rapidly. In fact, even the product of a polynomial with a test function is integrable.
The post Fourier transform of a flat line first appeared on John D. Cook.
2025-12-08 20:11:44
The weakest link in the privacy of cryptocurrency transactions is often outside the blockchain. There are technologies such as stealth addresses and subaddresses to try to thwart attempts to link transactions to individuals. They do a good job of anonymizing transaction data, but the weak link may be metadata, as is often the case.
Cryptocurrency nodes circulate transaction data using a peer-to-peer network. An entity running multiple nodes can compare when data arrived at each of its nodes and triangulate to infer which node first sent a set of transactions. The Dandelion protocol, and its refinement Dandelion++, aims to mitigate this risk. Dandelion++ is currently used in Monero and a few other coins; other cryptocurrencies have considered or are considering using it.

The idea behind the Dandelion protocol is to have a “stalk” period and a “diffusion” period. Imagine data working up the stalk of a dandelion plant before diffusing like seeds in the wind. The usual P2P process is analogous to simply blowing on the seed head [1].
During the stalk period, information travels from one node to one node. Then after some number of hops, the diffusion process begins; the final node in the stalk period diffuses the information to all its peers. An observer with substantial but not complete visibility of the network may be able to determine which node initiated the diffusion, but maybe not the node at the other end of the stem.
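Here is a toy sketch of the two phases (an illustration, not the actual Dandelion++ implementation; among other things, the real protocol randomizes when the switch from the stalk phase to diffusion happens). A transaction first travels along a stalk of single hops, then the last node broadcasts it to all of its peers.

import random

def propagate(peers, origin, stalk_hops=3):
    """peers maps each node to the list of nodes it gossips with."""
    node = origin
    stalk = [node]
    for _ in range(stalk_hops):            # stalk phase: pass to one randomly chosen peer
        node = random.choice(peers[node])
        stalk.append(node)
    fluff = list(peers[node])              # diffusion phase: the last node tells all its peers
    return stalk, fluff

# a small fixed peer-to-peer graph on ten nodes
peers = {n: [(n - 1) % 10, (n + 1) % 10, (n + 3) % 10] for n in range(10)}
print(propagate(peers, origin=0))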
A natural question is how this differs from something like Tor. In a nutshell, Tor offers identity protection before you enter a P2P network, and Dandelion offers identity protection inside the P2P network.
For more details, see the original paper on Dandelion [2].
[1] The original paper on Dandelion uses a dandelion seed as the metaphor for the protocol. “The name ‘dandelion spreading’ reflects the spreading pattern’s resemblance to a dandelion seed head and refers to the diagram below.” However, other sources refer to the stalk and head of the dandelion plant, not just a single seed. Both mental images work since the plant has a slightly fractal structure with a single seed looking something like the plant.

[2] Shaileshh Bojja Venkatakrishnan, Giulia Fanti, Pramod Viswanath. Dandelion: Redesigning the Bitcoin Network for Anonymity. Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 1 Article No.: 22, Pages 1–34. Available here: https://doi.org/10.1145/3084459.
The post Obscuring P2P nodes with Dandelion first appeared on John D. Cook.
2025-12-07 00:36:37
I’m taking a break from my series on celestial navigation. The previous posts give the basics, but I haven’t thought of a way to go further that I’m satisfied with. So now for something completely different: Pedersen commitments.
Pedersen commitments are a building block of zero knowledge proofs (ZKP), and they give an opportunity to look at a couple of other interesting topics: nothing-up-my-sleeve constructions and homomorphic encryption.
A Pedersen commitment to a value v takes a random number x and two generators of an elliptic curve, G and H, and returns
C = vG + xH.
The significance of C is that it appears to be a random number to the recipient, but the sender who calculated it can later show that it was computed from v and x. C is called a commitment to the value v because the sender cannot later say that C was computed from a different v and a different x.
The addition in
C = vG + xH
is carried out on an elliptic curve, such as Ed25519 in the case of Monero. Multiplication is defined by repeated addition, though it’s not computed that way [1]. G and H are not just points on the elliptic curve but points in a large, prime-order subgroup of the elliptic curve.
Because the value x is random, the possible values of C are uniformly distributed on the curve, and so someone observing C learns nothing about v. For that reason x is called a blinding factor.
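To make this concrete, here is a toy sketch, not Monero’s implementation: instead of an elliptic curve it uses the subgroup of squares modulo a small prime, written multiplicatively, so the commitment C = vG + xH becomes C = g^v h^x mod p. The parameters below are deliberately tiny and offer no security; a real system uses a large prime-order subgroup of a curve such as Ed25519.

p, q = 23, 11        # toy parameters: p = 2q + 1, so the squares mod p form a subgroup of prime order q
g, h = 4, 9          # two generators of that subgroup; how h is chosen matters, as discussed below

def commit(v, x):
    # multiplicative analog of C = vG + xH
    return pow(g, v, p) * pow(h, x, p) % p

v = 7                 # the value being committed to
x = 3                 # the random blinding factor
print(commit(v, x))   # 13, which by itself reveals nothing about v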
The difficulty of the discrete logarithm problem ensures that it is impractical to come up with different values v′ and x′ such that
vG + xH = v′G + x′H.
This depends on two assumptions.
The first assumption is that the discrete logarithm problem is hard to solve given current algorithms and hardware. The prevailing opinion is that it is unlikely anyone will come up with an efficient algorithm for solving the discrete logarithm problem on current hardware. However, Shor’s algorithm could solve the discrete logarithm problem efficiently if and when a practical, large-scale quantum computer is created.
The second assumption is that the generator H was chosen at random and not calculated to be a backdoor.
Because G and H are members of the same prime-order (i.e. cyclic) group, there exists some integer h such that
H = hG
If the generator H was randomly selected, nobody knows h and nobody can calculate it. But if H was calculated by first selecting h and then computing hG, there is a backdoor.
Now
C = vG + xH = vG + x(hG) = (v + xh)G.
If you know h, you can pick a new v′ and solve for x′ such that
v + xh = v′ + x′h.
That would mean that in the context of a cryptocurrency that uses Pedersen commitments, such as Monero or the Liquid Network on top of Bitcoin, you could initially commit to spending v and later claim that you committed to spending v′.
Note that solving for x′ requires modular arithmetic, not solving the discrete logarithm problem.
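Here is the backdoor played out in the toy mod-p group from above (again, only an illustration; in this tiny group the discrete log of H base G is trivial to find, which is exactly why real parameters are enormous).

p, q = 23, 11
g, h, h_exp = 4, 9, 8        # h == pow(g, h_exp, p); an honest setup would not know h_exp

def commit(v, x):
    return pow(g, v, p) * pow(h, x, p) % p

v, x = 7, 3                  # the original commitment
v_new = 2                    # the value you later claim you committed to
x_new = (x + (v - v_new) * pow(h_exp, -1, q)) % q    # solve v + x*h_exp = v_new + x_new*h_exp mod q
print(commit(v, x) == commit(v_new, x_new))          # True: the commitment is no longer binding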
The way to prove that the generator H was chosen in good faith is to be transparent about how it was created. In practice this means using some sort of cryptographic hash function. For example, Bulletproofs hashed “bulletproof_g” and “bulletproof_h” to create its values of G and H. Bulletproofs require multiple values of G and H and so consecutive integers were concatenated to the strings before hashing.
Reversing a cryptographic hash like SHA256 is impractical, even assuming you have a quantum computer, and so it is extremely unlikely that there is a backdoor when the generators were created by hashing a natural string.
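Here is a sketch of the idea in the toy mod-p group (my own illustration, not the actual Bulletproofs construction, which hashes to a point on an elliptic curve): hash a public string, reduce mod p, and square the result so it lands in the prime-order subgroup, retrying with a counter in the rare degenerate cases. In a real group, nobody would know the discrete log of the resulting generator.

import hashlib

def generator_from_string(s, p=23):
    counter = 0
    while True:
        data = f"{s}:{counter}".encode()
        n = int.from_bytes(hashlib.sha256(data).digest(), "big") % p
        g = pow(n, 2, p)          # squaring forces the result into the subgroup of squares
        if g not in (0, 1):       # retry if the hash lands on a degenerate element
            return g
        counter += 1

print(generator_from_string("my_transparent_seed"))   # a hypothetical seed string, for illustration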
It’s said that Pedersen commitments do not require a trusted setup. That’s true in spirit, but more precisely they require a transparent setup that is easy to trust.
The function
C: (v, x) ↦ vG + xH
is a group homomorphism from pairs of integers to the subgroup generated by G and H. This means that
C(v, x) + C(v′, x′) = C(v + v′, x + x′)
or in other words, you can combine multiple commitments into a single commitment. The sum of a commitment to (v, x) and a commitment to (v′, x′) is a commitment to (v + v′, x + x′).
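A quick numerical check of this property in the toy mod-p group used earlier, where multiplying commitments plays the role of adding points on the curve:

p, q = 23, 11
g, h = 4, 9

def commit(v, x):
    return pow(g, v, p) * pow(h, x, p) % p

v1, x1, v2, x2 = 7, 3, 5, 8
print(commit(v1, x1) * commit(v2, x2) % p == commit((v1 + v2) % q, (x1 + x2) % q))   # True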
[1] In practice the number x is enormous, say on the order of the number of points on the elliptic curve, and so software does not add H to itself x times. Instead it uses a process analogous to fast exponentiation. In fact, if you write the group operation multiplicatively rather than additively, it is exactly fast exponentiation.
The post What is a Pedersen commitment? first appeared on John D. Cook.