
RSS preview of the blog of John D. Cook

Most popular posts of 2024

2024-12-24 22:16:02

I looked at Hacker News to see which posts on this site were most popular. I didn’t look at my server logs, but generally the posts that get the most traffic are posts that someone submits to Hacker News.

Older posts popular this year

Two posts written earlier got a lot of traffic this year, namely

Writes large correct programs

from 2008 and

Where has all the productivity gone?

from 2021.

Posts written this year

The most popular post this year, at least on Hacker News, was

Why does FM sound better than AM?

The runner up was

Evaluating a class of infinite sums in closed form

The following post looks at a way for a satellite to move from one orbit to another that under some circumstances is more efficient (in terms of fuel, not in terms of time) than the more common Hohmann transfer maneuver.

Efficiently transferring to a much higher orbit
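For a rough feel for the comparison, here is a sketch (my own, not code from the post) that totals the burns for each maneuver using the vis-viva equation v = √(μ(2/r − 1/a)). The radii are arbitrary illustrative values in units where μ = 1.

    import math

    def vis_viva(mu, r, a):
        # speed at radius r on an orbit with semi-major axis a
        return math.sqrt(mu * (2/r - 1/a))

    def hohmann_dv(mu, r1, r2):
        a_t = (r1 + r2) / 2                                  # transfer ellipse
        dv1 = vis_viva(mu, r1, a_t) - math.sqrt(mu/r1)       # burn to leave circular orbit at r1
        dv2 = math.sqrt(mu/r2) - vis_viva(mu, r2, a_t)       # circularize at r2
        return abs(dv1) + abs(dv2)

    def bielliptic_dv(mu, r1, r2, rb):
        a1 = (r1 + rb) / 2                                   # first ellipse, out to apogee rb
        a2 = (r2 + rb) / 2                                   # second ellipse, back down to r2
        dv1 = vis_viva(mu, r1, a1) - math.sqrt(mu/r1)        # burn at r1
        dv2 = vis_viva(mu, rb, a2) - vis_viva(mu, rb, a1)    # burn at apogee rb
        dv3 = vis_viva(mu, r2, a2) - math.sqrt(mu/r2)        # retro burn to circularize at r2
        return abs(dv1) + abs(dv2) + abs(dv3)

    mu, r1, r2, rb = 1.0, 1.0, 20.0, 100.0   # illustrative values only
    print(hohmann_dv(mu, r1, r2))            # ~0.535
    print(bielliptic_dv(mu, r1, r2, rb))     # ~0.516, less total delta-v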

This post considers interpolation as a form of compression. Instead of saving a table of function values at fine-grained intervals, you could store values at points further apart and store interpolation formulas for recovering the lost precision.

Compression and interpolation
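As a tiny illustration of the idea (a sketch of mine, not code from the post): store the sine function at 20 points and recover values in between with a cubic spline from SciPy.

    import numpy as np
    from scipy.interpolate import CubicSpline

    coarse = np.linspace(0, np.pi, 20)            # the stored table
    spline = CubicSpline(coarse, np.sin(coarse))  # the interpolation formula

    fine = np.linspace(0, np.pi, 1000)
    print(np.max(np.abs(spline(fine) - np.sin(fine))))  # max error on the order of 1e-6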

One of the arguments between Frequentist and Bayesian statisticians is whether you should be allowed to look at data as it accrues during an experiment, such as in A/B testing. If you do look at the interim data, how should you analyze it and how should you interpret the results?

Can you look at experimental results along the way or not?

Finally, I wrote a post about solving a problem I ran into with the command line utility find. As is often the case, I got a lot of useful feedback.

Resolving a mysterious problem with find

 


Series for the reciprocal of the gamma function

2024-12-24 21:41:22

Stirling’s asymptotic series for the gamma function is

\Gamma(z) \sim (2\pi)^{1/2} z^{z - 1/2} e^{-z} \sum_{n=0}^\infty (-1)^n \frac{\gamma_n}{z^n}

Now suppose you’d like to find an asymptotic series for the function 1/Γ(z).

Since the series for Γ has the form f(z) times an infinite sum, it would make sense to look for a series for 1/Γ of the form 1/f(z) times an infinite sum. The hard part would be finding the new infinite sum. In general the series for a function and the series for its reciprocal look nothing alike.

Here’s where we have a pleasant surprise: the coefficients in the series for 1/Γ are exactly the same as the coefficients in the series for Γ, except the signs don’t alternate.

\frac{1}{\Gamma(z)} \sim (2\pi)^{-1/2} z^{-z +1/2} e^{z} \sum_{n=0}^\infty \frac{\gamma_n}{z^n}

Illustration

The following is not a proof, but it shows that the result is at least plausible.

Define Γ* to be Γ divided by the term in front of the infinite series:

\Gamma^*(z) = (2\pi)^{-1/2} z^{-z +1/2} e^{z} \Gamma(z)

Then the discussion above claims that Γ* and 1/Γ* have the same asymptotic series, except with alternating signs on the coefficients. So if we multiply the first few terms of the series for Γ* and 1/Γ* we expect to get something approximately equal to 1.

Now

\Gamma^*(z) = 1 + \frac{1}{12z} + \frac{1}{288z^2} - \frac{139}{51840z^3} - \cdots

and we claim

\frac{1}{\Gamma^*(z)} = 1 - \frac{1}{12z} + \frac{1}{288z^2} + \frac{139}{51840z^3} - \cdots

So if we multiply the terms up to third order we expect to get 1 and some terms involving powers of z in the denominator with exponent greater than 3. In fact the product equals

1 + \frac{571}{1244160 z^4} -\frac{19321}{2687385600 z^6}

which aligns with our expectations.
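Here is a quick check of that product with SymPy (a sketch added for illustration, not part of the original post): multiply the two truncated series and confirm that everything through 1/z³ cancels.

    from sympy import symbols, Rational, expand

    z = symbols('z')

    # truncated series for Gamma*(z) and its claimed reciprocal
    g    = 1 + Rational(1,12)/z + Rational(1,288)/z**2 - Rational(139,51840)/z**3
    ginv = 1 - Rational(1,12)/z + Rational(1,288)/z**2 + Rational(139,51840)/z**3

    print(expand(g * ginv))
    # 1 + 571/(1244160*z**4) - 19321/(2687385600*z**6)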


Starlink configurations

2024-12-23 22:05:16

My nephew recently told me about being on a camping trip and seeing a long line of lights in the sky. The lights turned out to be Starlink satellites. It’s fairly common for people to report seeing lines of these satellites.

Four lights in the sky in a line

Why would the satellites be in a line? Wouldn’t it be much more efficient to spread them out? They do spread out, but they’re launched in groups. Satellites released into orbit at the same time initially orbit in a line close together.

It would seem the optimal strategy would be to spread communication satellites out evenly in a sphere. There are several reasons why that is neither desirable nor possible. It is not desirable because human population is far from evenly distributed. It’s very nice to have some coverage over the least-populated places on earth, such as Antarctica, but there is far more demand for service over the middle latitudes.

It is not possible to distribute more than 20 points perfectly evenly on a sphere (the 20 vertices of a dodecahedron are the largest perfectly symmetric configuration), and so it would not be possible to spread out thousands of satellites perfectly evenly. However, there are ways to distribute arbitrarily many points somewhat evenly, such as in a Fibonacci lattice, as sketched below.
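Here is a minimal sketch (mine, not from the post; the function name fibonacci_lattice is my own) of a Fibonacci lattice: n points placed roughly evenly on the unit sphere by stepping the longitude by the golden angle.

    import numpy as np

    def fibonacci_lattice(n):
        golden_angle = np.pi * (3 - np.sqrt(5))  # about 137.5 degrees
        i = np.arange(n)
        z = 1 - (2*i + 1)/n                      # heights evenly spaced in [-1, 1]
        theta = golden_angle * i                 # longitude spirals around the sphere
        r = np.sqrt(1 - z**2)
        return np.column_stack((r*np.cos(theta), r*np.sin(theta), z))

    points = fibonacci_lattice(1000)   # 1000 points, roughly evenly spread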

It’s also not possible to distribute satellites in a static configuration. Unless a satellite is in geostationary orbit, it will constantly move relative to the earth. One problem with geostationary orbit is that it is at an altitude of about 36,000 km. Starlink satellites are in low earth orbit (LEO), between 300 km and 600 km in altitude. It is less expensive to put satellites into LEO, and there is less latency bouncing signals off satellites closer to the ground.

Satellites orbit at different altitudes, and altitude and velocity are tightly linked. You want satellites orbiting at different altitudes to avoid collisions, and so they’re orbiting at different velocities. Even if you wanted all satellites to orbit at the same altitude, this would require constant maintenance due to various real-world departures from ideal Keplerian conditions. Satellites are going to move around relative to each other whether you want them to or not.
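A rough illustration of both points (my own back-of-the-envelope sketch, not from the post): circular orbital velocity is v = √(μ/r), and the one-way light travel time for a satellite directly overhead grows with altitude.

    import math

    MU_EARTH = 398_600.4418   # km^3/s^2, Earth's gravitational parameter
    R_EARTH  = 6_378.0        # km, equatorial radius
    C        = 299_792.458    # km/s, speed of light

    for altitude in [300, 600, 35_786]:          # Starlink's LEO band, then GEO
        r = R_EARTH + altitude
        v = math.sqrt(MU_EARTH / r)              # circular orbital velocity
        delay_ms = 1000 * altitude / C           # one-way, straight up
        print(f"{altitude:>6} km: v = {v:.2f} km/s, one-way delay = {delay_ms:.1f} ms")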



Putting a face on a faceless account

2024-12-21 04:20:43

I’ve been playing around with Grok today, logging into some of my X accounts and trying out the prompt “Draw an image of me based on my posts.” [1] In most cases Grok returned a graphic, but sometimes it would respond with a text description. In the latter case asking for a photorealistic image made it produce a graphic.

Here’s what I get for @AlgebraFact:

The icons for all my accounts are cerulean blue dots with a symbol in the middle. Usually Grok picks up on the color, as above. With @AnalysisFact, it dropped a big blue piece of a circle on the image.

For @UnixToolTip it kept the & from the &> in the icon. Generative AI typically does weird things with text in images, but it picked up “awk” correctly.

Here’s @ProbFact. Grok seems to think it’s a baseball statistics account.

Last but not least, here’s @DataSciFact.

I wrote a popular post about how to put Santa hats on top of symbols in LaTeX, and that post must have had an outsized influence on the image Grok created.

[1] Apparently if you’re logged into account A and ask it to draw B, the image will be heavily influenced by A’s posts, not B’s. You have to log into B and ask in the first person.


Can AI models reason: Just a stochastic parrot?

2024-12-20 02:14:58

OpenAI has just released its full o1 model—a new kind of model that is more capable of multi-step reasoning than previous models. Anthropic, Google and others are no doubt working on similar products. At the same time, it’s hotly debated in many quarters whether AI models actually “reason” in a way similar to humans.

Emily Bender and her colleagues famously described large language models as nothing more than “stochastic parrots”: systems that simply repeat their training data blindly, based on a statistical model, with no real understanding (reminiscent of the Chinese Room thought experiment). Others have made similar comments, describing LLMs as “n-gram models on steroids” or a “fancy extrapolation algorithm.”

There is of course some truth to this. AI models sometimes generate remarkable results and yet lack certain basic aspects of understanding that would otherwise keep them from occasionally producing nonsensical results. More to the point of “parroting” the training data, recent work from Yejin Choi’s group has shown how LLMs at times will cut and paste snippets from their training documents, almost verbatim, to formulate their outputs.

Are LLMs (just) glorified information retrieval tools?

The implication of these concerns is that an LLM can “only” repeat back what it was taught (albeit with errors). However, this view does not align with the evidence. LLM training is a compression process in which new connections between pieces of information are formed that were not present in the original data. This is evidenced both mathematically and anecdotally. In my own experience, I’ve gotten valid answers to technical questions so obscure and detailed that it is hard for me to believe they would exist in any training data in exactly that form. Whether you would call this “reasoning” or not might be open to debate, but regardless of what you call it, it is something more than unadorned information retrieval by a “stochastic parrot.”

What is your experience? Let us know in the comments.


Interval arithmetic and fixed points

2024-12-19 22:40:22

A couple days ago I analyzed the observation that repeatedly pressing the cosine key on a calculator leads to a fixed point. After about 90 iterations the number no longer changes. This post will analyze the same phenomenon a different way.

Interval arithmetic

Interval arithmetic is a way to get exact results of a sort from floating point arithmetic.

Suppose you start with a number x that cannot be represented exactly as a floating point number, and you want to compute f(x) for some function f. You can’t represent x exactly, but unless x is too large you can represent a pair of numbers a and b such that x is certainly in the interval [a, b]. Then f(x) is in the set f( [a, b] ).

Maybe you can represent f( [a, b] ) exactly. If not, you can enlarge the interval a bit to exactly represent an interval that contains f(x). After applying several calculations, you have an interval, hopefully one that’s not too big, containing the exact result.

(I said above that interval arithmetic gives you exact results of a sort because even though you don’t generally get an exact number at the end, you do get an exact interval containing the result.)
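As a toy example (my own, not from the post), suppose f is the exponential function and x = 1/3, which is not exactly representable in binary floating point. Since exp is increasing, the image of an interval is easy to compute, and widening each endpoint by one ulp with math.nextafter accounts for rounding in the function evaluation.

    import math

    x_lo = 1/3                               # nearest double, slightly below 1/3
    x_hi = math.nextafter(x_lo, math.inf)    # next double up, so 1/3 is in [x_lo, x_hi]

    # exp is increasing, so exp([x_lo, x_hi]) = [exp(x_lo), exp(x_hi)];
    # widen by one ulp on each side to absorb rounding in exp itself
    # (enough in practice when the library exp is accurate to under 1 ulp)
    lo = math.nextafter(math.exp(x_lo), -math.inf)
    hi = math.nextafter(math.exp(x_hi),  math.inf)
    print(lo, hi)    # exp(1/3) is certainly in [lo, hi]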

Cosine iteration

In this post we will use interval arithmetic, not to compensate for the limitations of computer arithmetic, but to illustrate the convergence of iterated cosines.

The cosine of any real number lies in the interval [−1, 1]. To put it another way,

cos( [−∞, ∞] ) = [−1, 1].

Because cosine is an even function,

cos( [−1, 1] ) = cos( [0, 1] )

and so we can limit our attention to the interval [0, 1].

Now the cosine is a monotone decreasing function from 0 to π, and so it’s monotone on [0, 1]. For any two points a and b with 0 ≤ a ≤ b ≤ π we have

cos( [a, b] ) = [cos(b), cos(a)].

Note that the order of a and b reverses on the right hand side of the equation because cosine is decreasing. When we apply cosine again we get back the original order.

cos(cos( [a, b] )) = [cos(cos(a)), cos(cos(b))].

Incidentally, this flip-flop explains why the cobweb plot from the previous post looks like a spiral rather than a staircase.

Now define a0 = 0, b0 = 1, and

[an+1, bn+1] = cos( [an, bn] ) = [cos(bn), cos(an)].

We could implement this in Python with a pair of mutually recursive functions.

    from math import cos

    # a(n) and b(n) are the endpoints of the interval after n iterations
    a = lambda n: 0 if n == 0 else cos(b(n-1))
    b = lambda n: 1 if n == 0 else cos(a(n-1))

Here’s a plot of the image of [0, 1] after n iterations.

Note that odd iterations increase the lower bound and even iterations decrease the upper bound.
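Here is a short script (my own sketch, not the code that produced the original plot) that computes the endpoints iteratively and plots them with matplotlib.

    from math import cos
    import matplotlib.pyplot as plt

    N = 30
    a, b = 0.0, 1.0
    lower, upper = [a], [b]
    for _ in range(N):
        a, b = cos(b), cos(a)          # [a, b] -> [cos(b), cos(a)]
        lower.append(a)
        upper.append(b)

    plt.plot(range(N + 1), lower, label="lower bound $a_n$")
    plt.plot(range(N + 1), upper, label="upper bound $b_n$")
    plt.xlabel("$n$")
    plt.legend()
    plt.show()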

Numerical interval arithmetic

This post introduced interval arithmetic as a numerical technique, then proceeded to do pure math. Now let’s think about computing again.

The image of [0, 1] under cosine is [cos(1), cos(0)] = [cos(1), 1]. A computer can represent 1 exactly but not cos(1). Suppose we compute

cos(1) = 0.5403023058681398

and assume each digit in the result is correct. Maybe the exact value of cos(1) was slightly smaller and was rounded to this value, but we know for sure that

cos( [0, 1] ) ⊂ [0.5403023058681397, 1]

So in this case we don’t know the image of [0, 1], but we know an interval that contains the image, hence the subset symbol.

We could iterate this process, next computing an interval that contains

cos( [0.5403023058681397, 1] )

and so forth. At each step we would round the left endpoint down to the nearest representable lower bound and round the right endpoint up to the nearest representable upper bound. In practice we’d be concerned with machine representable numbers rather than decimal representable numbers, but the principle is the same.

The potential pitfall of interval arithmetic in practice is that intervals may grow so large that the final result is not useful. But that’s not the case here. The rounding error at each step is tiny, and contraction maps reduce errors at each step rather than magnifying them. In a more complicated calculation, we might have to resort to loose estimates and not have such tight intervals at each step.
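Here is a sketch (mine, not from the post) of the outward-rounded iteration described above, using math.nextafter, which Python has had since 3.9. Note how tight the final interval is.

    import math

    def cos_interval(a, b):
        # for 0 <= a <= b <= pi, cosine is decreasing, so the image is
        # [cos(b), cos(a)]; widen each endpoint by one ulp to absorb the
        # rounding in cos (assuming the library cosine is good to ~1 ulp)
        lo = math.nextafter(math.cos(b), -math.inf)
        hi = math.nextafter(math.cos(a),  math.inf)
        return lo, hi

    a, b = 0.0, 1.0
    for _ in range(90):
        a, b = cos_interval(a, b)
    print(a, b)   # a tight interval around the fixed point 0.7390851332...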

