John D. Cook

I have decades of consulting experience helping companies solve complex problems involving applied math, statistics, and data privacy.

Rss preview of Blog of John D. Cook

Does additional data always reduce posterior variance?

2026-07-04 10:50:36

A discussion over lunch today brought up the fact that additional data does not always decrease the size of a confidence interval. This post will look at this from a Bayesian perspective.

In general, new information reduces your uncertainty regarding whatever you’re estimating. The posterior distribution becomes more concentrated as more data are collected.

That’s what happens “in general” but does it necessarily happen every time you get new data? Conceivably if you get surprising data, data that is very unlikely given your current prior, posterior uncertainty might increase.

Binomial-beta model

To show that this is the case, suppose the probability of success in some binary trial is θ and that θ has a beta prior. You could imagine this prior to be the posterior after having made some number of previous observations. Can a new observation increase the posterior variance in θ? If so, under what conditions?

The variance of a beta(a, b) random variable is

ab / ((a + b)²(a + b + 1)).

After observing a successful trial, the posterior distribution on θ is beta(a + 1, b). We can calculate the ratio of the posterior variance to the prior variance and ask under what circumstances, if any, the ratio is greater than 1.

If 2a ≥ b the posterior variance will be strictly less than the prior variance. This says if the prior mean odds against a success are no more than 2 : 1, observing a success will reduce the variance. (By symmetry, observing a failure will reduce the variance if 2b ≥ a.) But for any value of b, you can find a small enough value of a that observing a success will increase the variance.
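The claim is easy to spot-check numerically. Here is a minimal sketch; the helper names beta_var and ratio_after_success are hypothetical, chosen for illustration only.

```python
def beta_var(a, b):
    """Variance of a beta(a, b) random variable."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def ratio_after_success(a, b):
    """Posterior variance divided by prior variance after one success."""
    return beta_var(a + 1, b) / beta_var(a, b)

# Odds against success are exactly 2 : 1, so 2a >= b holds: variance shrinks.
print(ratio_after_success(1, 2))    # less than 1

# a is small relative to b: a success is surprising and the variance grows.
print(ratio_after_success(0.1, 1))  # greater than 1
```

The second call illustrates the last sentence above: for fixed b, a small enough a makes a success increase the variance.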

Normal-normal model

Whether an observation can increase the posterior variance depends on the data model. If your data have a normal likelihood function with known variance and a normal prior on the mean θ, the posterior variance is always less than the prior variance, and it reduces by the same amount, independent of the observation x. If x is very unlikely a priori then it will pull the posterior mean toward itself more than an observation that is more concordant with the prior would have, but the change in the posterior variance is the same.
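The standard conjugate update makes this concrete. The sketch below assumes a single observation x with known variance; the function name normal_posterior and its argument names are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, x, data_var):
    """Posterior mean and variance for a normal prior on the mean
    and one normal observation x with known variance data_var."""
    precision = 1 / prior_var + 1 / data_var  # precisions add
    post_var = 1 / precision
    post_mean = post_var * (prior_mean / prior_var + x / data_var)
    return post_mean, post_var

# Same prior, one unremarkable observation and one very surprising one.
print(normal_posterior(0.0, 1.0, 0.5, 1.0))
print(normal_posterior(0.0, 1.0, 50.0, 1.0))
```

The posterior means differ greatly, but the posterior variance is 0.5 in both cases because x never enters the variance calculation.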

Proof of beta theorem

Here is a proof in Lean 4 of the statement above that if 2a ≥ b the posterior variance will be strictly less than the prior variance.

import Mathlib

set_option linter.style.header false

noncomputable def f (a b : ℝ) : ℝ := a * b / ((a + b) ^ 2 * (a + b + 1))

theorem f_ratio_lt_one' (a b : ℝ) (ha : 0 …

The post Does additional data always reduce posterior variance? first appeared on John D. Cook.

DNA Sequence Alignment and Kings

2026-07-01 08:21:21

This morning I wrote a post that included the central Delannoy numbers. The nth central Delannoy number D_n counts the number of ways a king can move from one corner of a chessboard to the diagonally opposite corner without backtracking.

The more general Delannoy numbers D_{m,n} are the analog for an m × n rectangular board, not necessarily square.

D_{m,n} is also the number of possible sequence alignments for a strand of DNA with m base pairs and a strand with n base pairs [1]. At each step in the alignment process, you can introduce a gap in the first strand, the second strand or neither, which is analogous to the king who can move N, E, or NE at each step.

The Delannoy numbers can be computed recursively:

def D(m, n):
    if m == 0 or n == 0:
        return 1
    return D(m - 1, n) + D(m, n - 1) + D(m - 1, n - 1)

The code above can be sped up tremendously by importing lru_cache from functools and adding the decorator

@lru_cache(maxsize=None)

above the function definition to turn on memoization. I did an experiment computing D_{12,15} with and without memoization and the times were 77.1805 seconds and 0.000062 seconds respectively, i.e. memoization made the code over a million times faster.

Incidentally, D_{12,15} = 2653649025 and so there are a lot of ways to align even short sequences unless you place some restriction on the permissible alignments.
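For reference, here is a sketch of the memoized version with the import it needs. It is the same recursion as above; only the decorator and the comments are new.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def D(m, n):
    # Along an edge of the board there is only one path.
    if m == 0 or n == 0:
        return 1
    # The last step was E, N, or NE.
    return D(m - 1, n) + D(m, n - 1) + D(m - 1, n - 1)

print(D(12, 15))  # 2653649025
```

With the cache in place, each pair (m, n) is computed once, so the work grows like the product mn rather than exponentially.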

Update: Here’s a heatmap plotting log₁₀(D_{m,n}). Obviously the function increases with m and n: bigger chessboards have more possible paths. Moreover, it’s larger along the diagonal (i.e. the central Delannoy numbers). If you look along northeast to southwest diagonals, the function is largest in the middle where m = n.

[1] Torres, A., Cabada, A., & Nieto, J. J. (2003). An exact formula for the number of alignments between two DNA sequences. DNA Sequence, 14(6), 427–430. https://doi.org/10.1080/10425170310001617894

Distinguishing variables from parameters

2026-07-01 02:51:36

Imagine the following dialog.

Professor: f is a function of a real variable x that takes a real parameter k.

Student: What’s a parameter?

Professor: It’s a constant that can vary.

Student: Then if it can vary, isn’t it a variable?

Professor: Sorta, but no, not really.

This conversation plays out over and over, and unfortunately it often ends as it does above, with the student confused. Here’s how I believe the conversation should continue.

Professor: You’re absolutely right that f is a function of two variables, x and k. But usually k is fixed in the context of a specific application and x is not. A different application might have a different, but also fixed, value of k. So it is helpful to think of f(x; k), a function of x with a parameter k, rather than f(x, k), a function of two variables. The former carries more information, giving a hint as to how the numbers are used.

Is there really a difference between a parameter and a variable? In a reductionistic sense, no. But in a practical sense, yes, absolutely.

It might sound pedantic to distinguish a variable from a parameter, and it is, in the best sense of the word. Pedant literally means teacher. Usually pedantic carries a negative connotation, such as making a distinction without a difference. But here the pedant would be making a helpful distinction.

For example, we might write a probability density function as f(x; μ, σ). The function gives the probability density at a point x. The density depends on parameters μ and σ, and these parameters change between applications, but for a given application they have fixed values.

You find the probability of a random variable taking on values in an interval [a, b] by integrating f over that interval. When I say that, you know that I mean you’d integrate with respect to x, because f is a function of x. It is also, in an abstract sense, a function of μ and σ, but it’s typically not useful to think of it that way.
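The same distinction shows up in code. In the sketch below, which assumes a normal density for f, the parameters are fixed once with functools.partial and what remains is a function of x alone. The names normal_pdf and integrate are hypothetical helpers, and the midpoint rule stands in for whatever quadrature you prefer.

```python
import math
from functools import partial

def normal_pdf(x, mu, sigma):
    """Density f(x; mu, sigma): x is the variable, mu and sigma are parameters."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Midpoint rule: integrates f with respect to its single argument."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Fix the parameters for this application; what is left is a function of x.
f = partial(normal_pdf, mu=0.0, sigma=1.0)
prob = integrate(f, -1.0, 1.0)
print(prob)  # roughly 0.6827
```

The integrate function never sees μ or σ, which is exactly the information the semicolon conveys.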

Hypergeometric functions have two sets of parameters, and so you may see two semicolons, such as f(x; a, b; c). This denotes a function of the variable x, with upper parameters a and b, and a lower parameter c. In some abstract sense this is a function of four variables, but it acts very differently with respect to x than with respect to a, b, and c. There’s also a difference between a and b on the one hand and c on the other, one worth paying attention to, though it is less of a difference than between x and the parameters collectively.

Sometimes you’ll see a vertical bar rather than a semicolon to separate variables from parameters. This works out even better for probability densities because then f(x | μ, σ) suggests the probability density of x given μ and σ since the vertical bar is also used for conditional probability. You might also see f(x | a, b; c) for hypergeometric functions, with the vertical bar separating variables from parameters and the semicolon separating two kinds of parameters.

When I first saw a semicolon separating variables from parameters, no explanation was given, and I figured I could mentally replace the semicolon with a comma. Then later I realized that the semicolon was an act of kindness by the author giving the reader additional information.

Silver Rectangles and the Ways of Kings

2026-06-30 22:36:14

Golden rectangles

The defining property of a golden rectangle is that if you stick a square on its longer side, you get another golden rectangle.

The smaller vertical rectangle is similar to the larger horizontal rectangle. This means

φ / 1 = (1 + φ) / φ

which tells us φ² = 1 + φ and so the golden ratio φ equals (1 + √5)/2.

Silver rectangles

A silver rectangle is one that if you stick two squares on its longer side you get another rectangle with the same aspect ratio.

This tells us

σ / 1 = (1 + 2σ) / σ

and so σ² = 1 + 2σ and the silver ratio is σ = 1 + √2.

Just as you can define a golden ratio and a silver ratio, there’s an analogous way to define a sequence of metallic ratios.
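As a sketch of that sequence, assume the nth metallic ratio is the aspect ratio preserved when n squares are attached to the longer side, i.e. the positive root of x² = 1 + nx. The function name metallic_ratio is hypothetical.

```python
import math

def metallic_ratio(n):
    """Positive root of x**2 = 1 + n*x (assumed definition of the
    nth metallic ratio, following the golden and silver cases)."""
    return (n + math.sqrt(n * n + 4)) / 2

golden = metallic_ratio(1)  # (1 + sqrt(5)) / 2
silver = metallic_ratio(2)  # 1 + sqrt(2)
print(golden, silver, metallic_ratio(3))
```

Setting n = 1 and n = 2 recovers the two equations above.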

Kings and Delannoy numbers

The silver ratio has several connections to the ways of kings. By that I mean the number of ways a king can go from one corner of a chessboard to the diagonally opposite corner without backtracking.

A king can move one space in any direction. If we start with a king in the bottom left corner of the board, the no-backtracking requirement means the king can move up, right, or up and right.

The number of paths a king can take from one corner to the opposite corner of an n × n chessboard is the nth central Delannoy number D_n. More generally, Delannoy numbers are defined for an m × n chessboard, but I’ll stick to the case m = n, the central Delannoy numbers, or just Delannoy numbers for short.

The first Delannoy number is 1 because there’s only one way for a king to get from one corner to the other: do nothing, because the opposite corner is the same corner. The second Delannoy number is 3 because the king can move up then right, or right then up, or move diagonally up and right.

For a 3 × 3 grid things are significantly more complicated, and D_3 = 13. For an 8 × 8 grid the number of paths is 48,639.

Generating function

How would you estimate the number of paths on an n × n board for large values of n without calculating it exactly? You might start by finding a generating function for the Delannoy numbers, which works out to be

(x² − 6x + 1)^(−1/2)

The radius of convergence r for the generating function series is the distance from 0 to the closest singularity of the generating function, which is the smaller root of

x² − 6x + 1

which is

3 − √8 = (3 + √8)^(−1) = (1 + √2)^(−2) = 1/σ²

i.e. the radius of convergence is the reciprocal of the silver ratio squared.

Asymptotic estimate

The radius of convergence gives us a first approximation to the asymptotic size of the series coefficients. Since we’re working with the generating function of the Delannoy numbers, these coefficients are the Delannoy numbers. That is,

D_n ~ r^(−n) = (σ²)^n = σ^(2n).

That’s as good as you can do just knowing the radius of convergence. A more careful analysis would refine this estimate by dividing by a factor proportional to √n.
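Here is a quick numerical look at the growth rate. This is a sketch that assumes the sum formula D(n) = Σ C(n, k)² 2^k for the central Delannoy numbers, indexed so that D(0) = 1, D(1) = 3, D(2) = 13. That index is one less than the board size used above, which shifts the counts but not the growth rate. The name central_delannoy is hypothetical.

```python
import math

def central_delannoy(n):
    """Central Delannoy number via the sum of C(n, k)**2 * 2**k."""
    return sum(math.comb(n, k) ** 2 * 2 ** k for k in range(n + 1))

sigma = 1 + math.sqrt(2)   # silver ratio
r = 3 - math.sqrt(8)       # smaller root of x**2 - 6x + 1
print(r, 1 / sigma ** 2)   # the two agree up to rounding

# Successive ratios creep up toward sigma**2; the sqrt(n) factor slows them.
for n in (10, 100, 300):
    print(n, central_delannoy(n + 1) / central_delannoy(n), sigma ** 2)
```

The ratios stay slightly below σ², consistent with the estimate needing to be divided by a factor proportional to √n.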

Derivative equals inverse

2026-06-30 09:06:12

Here’s kind of a strange problem with an interesting solution: find a function f such that the derivative of f equals the inverse of f for all positive x.

f′(x) = f⁻¹(x)

This is a differential equation, but a very unusual one, one that cannot be solved using any of the techniques taught in a class on differential equations.

The unique solution is

f(x) = φ (x / φ)^φ

where φ is the golden ratio. What an unexpected appearance of the golden ratio!
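The solution is easy to check numerically. In the sketch below, f_inverse comes from solving y = φ (x/φ)^φ for x, and the derivative is approximated by a central difference. The helper names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def f(x):
    return PHI * (x / PHI) ** PHI

def f_inverse(x):
    # Solving y = PHI * (x / PHI)**PHI for x gives this closed form.
    return PHI * (x / PHI) ** (1 / PHI)

def derivative(g, x, h=1e-6):
    """Central difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0, 10.0):
    print(x, derivative(f, x), f_inverse(x))
```

The last two columns agree to within the accuracy of the finite difference. The identity behind this is 1/φ = φ − 1, so the exponent of the derivative matches the exponent of the inverse.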

The problem was proposed by H. L. Nelson and solved by A. C. Hindmarsh. See The American Mathematical Monthly, Vol. 76, No. 6 p. 696.

Who you gonna believe: Grok or the docs?

2026-06-29 20:12:05

The calculator utility bc has a minimal math library. For example, there’s no tangent function because you’re expected to take the ratio of sine and cosine. (The Gnu version of bc does have a function for tangent, but the POSIX version does not.) And yet bc includes support for Bessel functions J_n(x).

The bc function j takes two arguments. Is the first argument n or x? Grok said the function arguments are j(n,x). I thought I should run man bc just to make sure, and it said

j(x, n) Returns the bessel integer order n (truncated) of x.

So Grok says j(n,x) and the documentation that ships with the software says j(x,n). Which one should you believe? Neither! You should run a little test.

~$ bc -l
>>> j(1, 0)
0
>>> j(0, 1)
.76519768655796655144

Now J_1(0) = 0, so apparently the first argument is the order n. Grok was right and the man page was wrong.

As further confirmation, let’s see which argument is truncated.

>>> j(1.2, 3.4)
.17922585168150711099
>>> j(1, 3.4)
.17922585168150711099
>>> j(1.2, 3)
.33905895852593645892

The first argument is truncated to an integer value, so that’s the order n.
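For an independent cross-check that doesn’t rely on bc at all, here is a sketch that sums the standard power series for J_n(x) in Python. The function name bessel_j and the fixed number of terms are arbitrary choices for illustration.

```python
import math

def bessel_j(n, x, terms=40):
    """J_n(x) for integer n >= 0 via the power series
    sum over k of (-1)**k / (k! (n+k)!) * (x/2)**(2k+n)."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(n + k))
        * (x / 2) ** (2 * k + n)
        for k in range(terms)
    )

print(bessel_j(0, 1))    # compare with bc's j(0, 1)
print(bessel_j(1, 3.4))  # compare with bc's j(1.2, 3.4) and j(1, 3.4)
print(bessel_j(1, 3))    # compare with bc's j(1.2, 3)
```

The values match the bc output above when the first argument is read as the order, and they do not match under the other reading.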

Turns out there’s a bug in the man page. The man page text above comes from running man bc on my Macbook. On my Linux box, the documentation is correct. It says

j(n,x) The Bessel function of integer order n of x.

The software produces the same results on both computers. It’s just a documentation bug.

The version running on my Macbook is the version that ships with the OS. It’s not the Gnu version, though the documentation says “This bc is compatible with both the GNU bc and the POSIX bc spec.” It has a function t for tangent, for example, which a POSIX version does not. But if you run bc --standard -l, attempting to call t produces an error.
