MoreRSS

site iconJohn D. CookModify

I have decades of consulting experience helping companies solve complex problems involving applied math, statistics, and data privacy.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of John D. Cook

4-bit floating point FP4

2026-04-18 09:08:54

In ancient times, floating point numbers were stored in 32 bits. Then somewhere along the way 64 bits became standard. The C programming language retains the ancient lore, using float to refer to a 32-bit floating point number and double to refer to a floating point number with double the number of bits. Python simply uses float to refer to the most common floating point format, which C calls double.

Programmers were grateful for the move from 32-bit floats to 64-bit floats. It doesn’t hurt to have more precision, and some numerical problems go away when you go from 32-bits to 64-bits. (Though not all. Something I’ve written about numerous times.)

Neural networks brought about something extraordinary: demand for floating point numbers with less precision. These networks have an enormous number of parameters, and its more important to fit more parameters into memory than it is to have higher precision parameters. Instead of double precision (64 bits) developers wanted half precision (16 bits), or even less, such as FP8 (8 bits) or FP4 (4 bits). This post will look at 4-bit floating point numbers.

Why even bother with floating point numbers when you don’t need much precision? Why not use integers? For example, with four bits you could represent the integers 0, 1, 2, … 15. You could introduce a bias, subtracting say 7 from each value, so your four bits represent −7 through 8. Turns out it’s useful to have a more dynamic range.

FP4

Signed 4-bit floating point numbers in FP4 format use the first bit to represent the sign. The question is what to do with the remaining three bits. The notation ExMm denotes a format with x exponent bits and m mantissa bits. In the context of signed 4-bit numbers,

xy = 3

but in other contexts the sum could be larger. For example, for an 8-bit signed float, xy = 7.

For 4-bit signed floats we have four possibilities: E3M0, E2M1, E1M2, and E0M3. All are used somewhere, but E2M1 is the most common and is supported in Nvidia hardware.

A number with sign bit s, exponent e, and mantissa m has the value

(−1)s 2eb (1 + m/2)

where b is the bias. The purpose of the bias is to allow positive and negative exponents without using signed numbers for e. So, for example, if b = 1 and e = 1, 2, or 3 then the exponent part 2eb can represent 1, 2, or 4.

The bias impacts the range of possible numbers but not their relative spacing. For any value of bias b, the E3M0 format is all exponent, no mantissa, and so its possible values are uniformly distributed on a log scale. The E0M3 format is all mantissa, so its values are uniformly distributed on a linear scale. The E1M2 and E2M1 formats are unevenly spaced on both log and linear scales.

There is an exception to the expression above for converting (sem) into a real number when e = 0. In that case, m = 0 represents 0 and m = 1 represents ½.

Table of values

Since there are only 16 possible FP4 numbers, it’s possible to list them all. Here is a table for the E2M1 format.

Bits s exp m  Value
-------------------
0000 0  00 0     +0
0001 0  00 1   +0.5
0010 0  01 0     +1
0011 0  01 1   +1.5
0100 0  10 0     +2
0101 0  10 1     +3
0110 0  11 0     +4
0111 0  11 1     +6
1000 1  00 0     -0
1001 1  00 1   -0.5
1010 1  01 0     -1
1011 1  01 1   -1.5
1100 1  10 0     -2
1101 1  10 1     -3
1110 1  11 0     -4
1111 1  11 1     -6

Note that even in this tiny floating point format, there are two zeros, +0 and −0, just like full precision floats. More on that here.

Pychop library

The Python library Pychop emulates a wide variety of reduced-precision floating point formats. Here is the code that used Pychop to create the table above.

import pychop

# Pull the format metadata from Pychop.
spec = pychop.MX_FORMATS["mxfp4_e2m1"]
assert (spec.exp_bits, spec.sig_bits) == (2, 1)

def e2m1_value(s: int, e: int, m: int) -> float:
    sign = -1.0 if s else 1.0

    # Subnormal / zero
    if e == 0:
        return sign * (m / 2.0)

    # Normal
    return sign * (2.0 ** (e - 1)) * (1.0 + m / 2.0)

def display_value(bits: int, x: float) -> str:
    if bits == 0b0000:
        return "+0"
    if bits == 0b1000:
        return "-0"
    return f"{x:+g}"

rows = []
for bits in range(16):
    s = (bits >> 3) & 0b1
    e = (bits >> 1) & 0b11
    m = bits & 0b1
    x = e2m1_value(s, e, m)

    rows.append(
        {
            "Bits": f"{bits:04b}",
            "s": s,
            "exp_bits": f"{e:02b}",
            "m": m,
            "Value": display_value(bits, x),
        }
    )

# Pretty-print the table.
header = f"{'Bits':<4} {'s':>1} {'exp':>3} {'m':>1} {'Value':>6}"
print(header)
print("-" * len(header))
for row in rows:
    print(
        f"{row['Bits']:<4} " f"{row['s']:>1} "
        f"{row['exp_bits']:>3} "
        f"{row['m']:>1} "
        f"{row['Value']:>6}"
    )

Other formats

FP4 isn’t the only 4-bit floating point format. There’s a surprisingly large number of formats in use. I intend to address another format my next post.

The post 4-bit floating point FP4 first appeared on John D. Cook.

Newton diameters

2026-04-16 20:02:13

Let f(xy) be an nth degree polynomial in x and y. In general, a straight line will cross the zero set of f in n locations [1].

Newton defined a diameter to be any line that crosses the zero set of f exactly n times. If

f(xy) = x² + y² − 1

then the zero set of f is a circle and diameters of the circle in the usual sense are diameters in Newton’s sense. But Newton’s notion of diameter is more general, including lines the cross the circle without going through the center.

Newton’s theorem of diameters says that if you take several parallel diameters (in his sense of the word), the centroids of the intersections of each diameter with the curve f(xy) = 0 all line on a line.

To illustrate this theorem, let’s look at the elliptic curve

y² = x³ − 2x + 1,

i.e. the zeros of f(xy) = y² − (x³ − 2x + 1). This is a third degree curve, and so in general a straight line will cross the curve three times [2].

The orange, green, and red lines are parallel, each intersecting the blue elliptic curve three times. The dot on each line is the centroid of the intersection points, the center of mass if you imagine each intersection to be a unit point mass. The centroids all lie on a line, a vertical line in this example though in general the line could have any slope.

I hadn’t seen this theorem until I ran across it recently when skimming [3]. Search results suggest the theorem isn’t widely known, which is surprising for a result that goes back to Newton.

Related posts

[1] Bézout’s theorem says a curve of degree m and a curve of degee n will always intersect in mn points. But that includes complex roots, adds a line at infinity, and counts intersections with multiplicity. So a line, a curve of degree 1, will intersect a curve of degree n at n points in this extended sense.

[2] See the description of Bézout’s theorem in the previous footnote. In the elliptic curve example, the parallel lines meet at a point at infinity. A line that misses the closed component of the elliptic curve and only passes through the second component has 1 real point of intersection but there would be 2 more if we were working in ℂ² rather than ℝ².

In algebraic terms, the system of equations

y² = x³ − 2x + 1
3y = 2x + k

has three real solutions for small values of k, but for sufficiently large values of |k| two of the solutions will be complex.

[3] Mathematics: Its Content, Methods, and Meaning. Edited by A. D. Aleksandrov, A. N. Kolmogorov, and M. A. Lavrent’ev. Volume 1.

The post Newton diameters first appeared on John D. Cook.

Intersecting spheres and GPS

2026-04-14 22:08:36

If you know the distance d to a satellite, you can compute a circle of points that passes through your location. That’s because you’re at the intersection of two spheres—the earth’s surface and a sphere of radius d centered on the satellite—and the intersection of two spheres is a circle. Said another way, one observation of a satellite determines a circle of possible locations.

If you know the distance to a second satellite as well, then you can find two circles that contain your location. The two circles intersect at two points, and you know that you’re at one of two possible positions. If you know your approximate position, you may be able to rule out one of the intersection points.

If you know the distance to three different satellites, now you know three circles that you’re standing on, and the third circle will only pass through one of the two points determined by the first two satellites. Now you know exactly where you are.

Knowing the distance to more satellites is even better. In theory additional observations are redundant but harmless. In practice, they let you partially cancel out inevitable measurement errors.

If you’re not on the earth’s surface, you’re still at the intersection of n spheres if you know the distance to n satellites. If you’re in an airplane, or on route to the moon, the same principles apply.

Errors and corrections

How do you know the distance to a satellite? The satellite can announce what time it is by its clock, then when you receive the announcement you compare it to the time by your clock. The difference between the two times tells you how long the radio signal traveled. Multiply by the speed of light and you have the distance.

However, your clock will probably not be exactly synchronized with the satellite clock. Observing a fourth satellite can fix the problem of your clock not being synchronized with the satellite clocks. But it doesn’t fix the more subtle problems of special relativity and general relativity. See this post by Shri Khalpada for an accessible discussion of the physics.

Numerical computation

Each distance measurement gives you an equation:

|| xsi || = di

where si is the location of the ith satellite and di is your distance to that satellite. If you square both sides of the equation, you have a quadratic equation. You have to solve a system of nonlinear equations, and yet there is a way to transform the problem into solving linear equations, i.e. using linear algebra. See this article for details.

Related posts

The post Intersecting spheres and GPS first appeared on John D. Cook.

Finding a parabola through two points with given slopes

2026-04-14 20:19:42

The Wikipedia article on modern triangle geometry has an image labeled “Artzt parabolas” with no explanation.

A quick search didn’t turn up anything about Artzt parabolas [1], but apparently the parabolas go through pairs of vertices with tangents parallel to the sides.

The general form of a conic section is

ax² + bxy + cy² + dx + ey + f = 0

and the constraint b² = 4ac means the conic will be a parabola.

We have 6 parameters, each determined only up to a scaling factor; you can multiply both sides by any non-zero constant and still have the same conic. So a general conic has 5 degrees of freedom, and the parabola condition b² = 4ac takes us down to 4. Specifying two points that the parabola passes through takes up 2 more degrees of freedom, and specifying the slopes takes up the last two. So it’s plausible that there is a unique solution to the problem.

There is indeed a solution, unique up to scaling the parameters. The following code finds parameters of a parabola that passes through (xi, yi) with slope mi for i = 1, 2.

def solve(x1, y1, m1, x2, y2, m2):
    
    Δx = x2 - x1
    Δy = y2 - y1
    λ = 4*(Δx*m1 - Δy)*(Δx*m2 - Δy)/(m1 - m2)**2
    k = x2*y1 - x1*y2

    a = Δy**2 + λ*m1*m2
    b = -2*Δx*Δy - λ*(m1 + m2)
    c = Δx**2 + λ
    d =  2*k*Δy + λ*(m1*y2 + m2*y1 - m1*m2*(x1 + x2))
    e = -2*k*Δx + λ*(m1*x1 + m2*x2 - y1 - y2)
    f = k**2 + λ*(m1*x1 - y1)*(m2*x2 - y2)

    return (a, b, c, d, e, f)

[1] The page said “Artz” when I first looked at it, but it has since been corrected to “Artzt”. Maybe I didn’t find anything because I was looking for the wrong spelling.

The post Finding a parabola through two points with given slopes first appeared on John D. Cook.

Mathematical minimalism

2026-04-13 22:33:16

Andrzej Odrzywolek recently posted an article on arXiv showing that you can obtain all the elementary functions from just the function

\operatorname{eml}(x,y) = \exp(x) - \log(y)

and the constant 1. The following equations, taken from the paper’s supplement, show how to bootstrap addition, subtraction, multiplication, and division from the eml function.

\begin{align*} \exp(z) &\mapsto \operatorname{eml}(z,1) \\ \log(z) &\mapsto \operatorname{eml}(1,\exp(\operatorname{eml}(1,z))) \\ x - y &\mapsto \operatorname{eml}(\log(x),\exp(y)) \\ -z &\mapsto (\log 1) - z \\ x + y &\mapsto x - (-y) \\ 1/z &\mapsto \exp(-\log z) \\ x \cdot y &\mapsto \exp(\log x + \log y) \end{align*}

See the paper and supplement for how to obtain constants like π and functions like square and square root, as well as the standard circular and hyperbolic functions.

Related posts

The post Mathematical minimalism first appeared on John D. Cook.

Lunar period approximations

2026-04-13 07:42:01

The date of Easter

The church fixed Easter to be the first Sunday after the first full moon after the Spring equinox. They were choosing a date in the Roman (Julian) calendar to commemorate an event whose date was known according to the Jewish lunisolar calendar, hence the reference to equinoxes and full moons.

The previous post explained why the Eastern and Western dates of Easter differ. The primary reason is that both churches use March 21 as the first day of Spring, but the Eastern church uses March 21 on the Julian calendar and the Western church uses March 21 on the Gregorian calendar.

But that’s not the only difference. The churches chose different algorithms for calculating when the first full moon would be. The date of Easter doesn’t depend on the date of the full moon per se, but the methods used to predict full moons.

This post will show why determining the date of the full moon is messy.

Lunation length

The moon takes between 29 and 30 days between full moons (or between new moons, which are easier to objectively measure). This period is called a lunation. The average length of a lunation is L = 29.530588853 days. This is not a convenient number to work with, and so there’s no simple way of reconciling the orbital period of the moon with the rotation period of the earth [1]. Lunar calendars alternate months with 29 and 30 days, but they can’t be very accurate, so they have to have some fudge factor analogous to leap years.

The value of L was known from ancient times. Meton of Athens calculated in 432 BC that 235 lunar cycles equaled 19 tropical years or 6940 days. This corresponds to L ≈ 29.5319. Around a century later the Greek scholar Callippus refined this to 940 cycles in 76 years or 27,759 days. This corresponds to L ≈ 29.53085.

The problem wasn’t knowing L but devising a convenient way of working with L. There is no way to work with lunations that is as easy as the way the Julian (or even the more complicated Gregorian) calendar reconciles days with years.

Approximations

Let’s look at the accuracy of several approximations for L. We’d like an approximation that is not only accurate in an absolute sense, but also accurate relative to its complexity. The complexity of a fraction is measured by a height function. We’ll use what’s called the “classic” height function: log( max(n, d) ) where n and d are the numerator and denominator of a fraction. Since we’re approximating a number bigger than 1, this will be simply log(n).

We will compare the first five convergents, approximations that come from the continued fraction form of L, and the approximations of Meton and Callippus. Here’s a plot.

And here’s the code that produced the plot, showing the fractions used.

from numpy import log
import matplotlib.pyplot as plt

fracs = [
    (30, 1), 
    (59, 2),
    (443, 15),
    (502, 17),
    (1447, 49),
    (6940, 235),
    (27759, 940)
]

def error(n, d):
    L = 29.530588853    
    return abs(n/d - L)

for f in fracs:
    plt.plot(log(f[0]), log(error(*f)), 'o')
plt.xlabel("log numerator")
plt.ylabel("log error")
plt.show()

The approximation 1447/49 is the best by far, both in absolute terms and relative to the size of the numerator. But it’s not very useful for calendar design because 1447 is not nicely related to the number of days in a year.

 

[1] The time between full moons is a synodic month, the time it takes for the moon to return to the same position relative to the sun. This is longer than a sidereal month, the time it takes the moon to complete one orbit relative to the fixed stars.

The post Lunar period approximations first appeared on John D. Cook.