We humans have perhaps 100 billion neurons in our brains. But what if we had many more? Or what if the AIs we built effectively had many more? What kinds of things might then become possible? At 100 billion neurons, we know, for example, that compositional language of the kind we humans use is possible. At the 100 million or so neurons of a cat, it doesn’t seem to be. But what would become possible with 100 trillion neurons? And is it even something we could imagine understanding?
My purpose here is to start exploring such questions, informed by what we’ve seen in recent years in neural nets and LLMs, as well as by what we now know about the fundamental nature of computation, and about neuroscience and the operation of actual brains (like the one that’s writing this, imaged here):
One suggestive point is that as artificial neural nets have gotten bigger, they seem to have successively passed a sequence of thresholds in capability:
So what’s next? No doubt there’ll be things like humanoid robotic control that have close analogs in what we humans already do. But what if we go far beyond the ~10^14 connections that our human brains have? What qualitatively new kinds of capabilities might there then be?
If this was about “computation in general” then there wouldn’t really be much to talk about. The Principle of Computational Equivalence implies that beyond some low threshold computational systems can generically produce behavior that corresponds to computation that’s as sophisticated as it can ever be. And indeed that’s the kind of thing we see both in lots of abstract settings, and in the natural world.
But the point here is that we’re not dealing with “computation in general”. We’re dealing with the kinds of computations that brains fundamentally do. And the essence of these seems to have to do with taking in large amounts of sensory data and then coming up with what amount to decisions about what to do next.
It’s not obvious that there’d be any reasonable way to do this. The world at large is full of computational irreducibility—where the only general way to work out what will happen in a system is just to run the underlying rules for that system step by step and see what comes out:
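As a minimal illustration in Wolfram Language (a sketch, using the rule 30 cellular automaton as a standard example of this kind of behavior):

  (* evolve rule 30 from a single black cell for 200 steps and display it;
     there is no known general shortcut for predicting, say, its center column
     without actually running the evolution *)
  ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]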
And, yes, there are plenty of questions and issues for which there’s essentially no choice but to do this irreducible computation—just as there are plenty of cases where LLMs need to call on our Wolfram Language computation system to get computations done. But brains, for the things most important to them, somehow seem to routinely manage to “jump ahead” without in effect simulating every detail. And what makes this possible is the fundamental fact that within any system that shows overall computational irreducibility there must inevitably be an infinite number of “pockets of computational reducibility”, in effect associated with “simplifying features” of the behavior of the system.
It’s these “pockets of reducibility” that brains exploit to be able to successfully “navigate” the world for their purposes in spite of its “background” of computational irreducibility. And in these terms things like the progress of science (and technology) can basically be thought of as the identification of progressively more pockets of computational reducibility. And we can then imagine that the capabilities of bigger brains could revolve around being able to “hold in mind” more of these pockets of computational reducibility.
We can think of brains as fundamentally serving to “compress” the complexity of the world, and extract from it just certain features—associated with pockets of reducibility—that we care about. And for us a key manifestation of this is the idea of concepts, and of language that uses them. At the level of raw sensory input we might see many detailed images of some category of thing—but language lets us describe them all just in terms of one particular symbolic concept (say “rock”).
In a rough first approximation, we can imagine that there’s a direct correspondence between concepts and words in our language. And it’s then notable that human languages all tend to have perhaps 30,000 common words (or word-like constructs). So is that scale the result of the size of our brains? And could bigger brains perhaps deal with many more words, say millions or more?
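As a rough sanity check on that scale, one can count the entries in a standard common-word list (a sketch; the exact count depends on which list one uses):

  (* the built-in list of common English words has a few tens of thousands of entries *)
  Length[WordList[]]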
“What could all those words be about?” we might ask. After all, our everyday experience makes it seem like our current 30,000 words are quite sufficient to describe the world as it is. But in some sense this is circular: we’ve invented the words we have because they’re what we need to describe the aspects of the world we care about, and want to talk about. There will always be more features of, say, the natural world that we could talk about. It’s just that we haven’t chosen to engage with them. (For example, we could perfectly well invent words for all the detailed patterns of clouds in the sky, but those patterns are not something we currently feel the need to talk in detail about.)
But given our current set of words or concepts, is there “closure” to it? Can we successfully operate in a “self-consistent slice of concept space” or will we always find ourselves needing new concepts? We might think of new concepts as being associated with intellectual progress that we choose to pursue or not. But insofar as the “operation of the world” is computationally irreducible it’s basically inevitable that we’ll eventually be confronted with things that cannot be described by our current concepts.
So why is it that the number of concepts (or words) isn’t just always increasing? A fundamental reason is abstraction. Abstraction takes collections of potentially large numbers of specific things (“tiger”, “lion”, …) and allows them to be described “abstractly” in terms of a more general thing (say, “big cats”). And abstraction is useful if it’s possible to make collective statements about those general things (“all big cats have…”), in effect providing a consistent “higher-level” way of thinking about things.
If we imagine concepts as being associated with particular pockets of reducibility, the phenomenon of abstraction is then a reflection of the existence of networks of these pockets. And, yes, such networks can themselves show computational irreducibility, which can then have its own pockets of reducibility, etc.
So what about (artificial) neural nets? It’s routine to “look inside” these, and for example see the possible patterns of activation at a given layer based on a range of possible (“real-world”) inputs. We can then think of these patterns of activation as forming points in a “feature space”. And typically we’ll be able to see clusters of these points, which we can potentially identify as “emergent concepts” that we can view as having been “discovered” by the neural net (or rather, its training). Normally there won’t be existing words in human languages that correspond to most of these concepts. They represent pockets of reducibility, but not ones that we’ve identified, and that are captured by our typical 30,000 or so words. And, yes, even in today’s neural nets, there can easily be millions of “emergent concepts”.
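One can sketch this kind of experiment in Wolfram Language (a rough sketch that just uses the built-in example images and the default image feature extractor, rather than any particular trained net):

  (* lay out a collection of images in a learned feature space; nearby points form
     clusters that one might read as "emergent concepts" *)
  imgs = ExampleData /@ ExampleData["TestImage"];
  FeatureSpacePlot[imgs]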
But will these be useful abstractions or concepts, or merely “incidental examples of compression” not connected to anything else? The construction of neural nets implies that a pattern of “emergent concepts” at one layer will necessarily feed into the next layer. But the question is really whether the concept can somehow be useful “independently”—not just at this particular place in the neural net.
And indeed the most obvious everyday use for words and concepts—and language in general—is for communication: for “transferring thoughts” from one mind to another. Within a brain (or a neural net) there are all kinds of complicated patterns of activity, different in each brain (or each neural net). But a fundamental role that concepts, words and language play is to define a way to “package up” certain features of that activity in a form that can be robustly transported between minds, somehow inducing “comparable thoughts” in all of them.
The transfer from one mind to another can never be precise: in going from the pattern of activity in one brain (or neural net) to the pattern of activity in another, there’ll always be translation involved. But—at least up to a point—one can expect that the “more that’s said” the more faithful a translation can be.
But what if there’s a bigger brain, with more “emergent concepts” inside? Then to communicate about them at a certain level of precision we might need to use more words—if not a fundamentally richer form of language. And, yes, while dogs seem to understand isolated words (“sit”, “fetch”, …), we, with our larger brains, can deal with compositional language in which we can in effect construct an infinite range of meanings by combining words into phrases, sentences, etc.
At least as we currently imagine it, language defines a certain model of the world, based on some finite collection of primitives (words, concepts, etc.). The existence of computational irreducibility tells us that such a model can never be complete. Instead, the model has to “approximate things” based on the “network of pockets of reducibility” that the primitives in the language effectively define. And insofar as a bigger brain might in essence be able to make use of a larger network of pockets of reducibility, it can then potentially support a more precise model of the world.
And it could then be that if we look at such a brain and what it does, it will inevitably seem closer to the kind of “incomprehensible and irreducible computation” that’s characteristic of so many abstract systems, and systems in nature. But it could also be that in being a “brain-like construct” it’d necessarily tap into computational reducibility in such a way that—with the formalism and abstraction we’ve built—we’d still meaningfully be able to talk about what it can do.
At the outset we might have thought any attempt for us to “understand minds beyond ours” would be like asking a cat to understand algebra. But somehow the universality of the concepts of computation that we now know—with their ability to address the deepest foundations of physics and other fields—makes it seem more plausible we might now be in a position to meaningfully discuss minds beyond ours. Or at least to discuss the rather more concrete question of what brains like ours, but bigger than ours, might be able to do.
As we’ve mentioned, at least in a rough approximation, the role of brains is to turn large amounts of sensory input into small numbers of decisions about what to do. But how does this happen?
Human brains continually receive input from a few million “sensors”, mostly associated with photoreceptors in our eyes and touch receptors in our skin. This input is processed by a total of about 100 billion neurons, each responding in a few milliseconds, and mostly organized into a handful of layers. There are altogether perhaps 100 trillion connections between neurons, many quite long range. At any given moment, a few percent of neurons (i.e. perhaps a few billion) are firing. But in the end, all that activity seems to feed into particular structures in the lower part of the brain that in effect “take a majority vote” a few times a second to determine what to do next—in particular with the few hundred “actuators” our bodies have.
This basic picture seems to be more or less the same in all higher animals. The total number of neurons scales roughly with the number of “input sensors” (or, in a first approximation, the surface area of the animal—i.e. volume^(2/3)—which determines the number of touch sensors). The fraction of brain volume that consists of connections (“white matter”) as opposed to main parts of neurons (“gray matter”) increases as a power of the number of neurons. The largest brains—like ours—have a roughly nested pattern of folds that presumably reduce average connection lengths. Different parts of our brains have characteristic functions (e.g. motor control, handling input from our eyes, generation of language, etc.), although there seems to be enough universality that other parts can usually learn to take over if necessary. And in terms of overall performance, animals with smaller brains generally seem to react more quickly to stimuli.
So what was it that made brains originally arise in biological evolution? Perhaps it had to do with giving animals a way to decide where to go next as they moved around. (Plants, which don’t move around, don’t have brains.) And perhaps it’s because animals can’t “go in more than one direction at once” that brains seem to have the fundamental feature of generating a single stream of decisions. And, yes, this is probably why we have a single thread of “conscious experience”, rather than a whole collection of experiences associated with the activities of all our neurons. And no doubt it’s also what we leverage in the construction of language—and in communicating through a one-dimensional sequence of tokens.
It’s notable how similar our description of brains is to the basic operation of large language models: an LLM processes input from its “context window” by feeding it through large numbers of artificial neurons organized in layers—ultimately taking something like a majority vote to decide what token to generate next. There are differences, however, most notably that whereas brains routinely intersperse learning and thinking, current LLMs separate training from operation, in effect “learning first” and “thinking later”.
But almost certainly the core capabilities of both brains and neural nets don’t depend much on the details of their biological or architectural structure. It matters that there are many inputs and few outputs. It matters that there’s irreducible computation inside. It matters that the systems are trained on the world as it is. And, finally, it matters how “big” they are, in effect relative to the “number of relevant features of the world”.
In artificial neural nets, and presumably also in brains, memory is encoded in the strengths (or “weights”) of connections between neurons. And at least in neural nets it seems that the number of tokens (of textual data) that can reasonably be “remembered” is a few times the number of weights. (With current methods, the number of computational operations of training needed to achieve this is roughly the product of the total number of weights and the total number of tokens.) If there are too few weights, what happens is that the “memory” gets fuzzy, with details of the fuzziness reflecting details of the structure of the network.
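As a rough numerical sketch of that scaling (illustrative numbers only, not measurements of any particular system):

  weights = 10^11;               (* a hypothetical net with 10^11 weights *)
  tokens = 3*10^11;              (* "a few times" as many training tokens *)
  trainingOps = weights*tokens   (* ~3*10^22 operations, up to constant factors *)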
But what’s crucial—for both neural nets and brains—is not so much to remember specifics of training data, but rather to just “do something reasonable” for a wide range of inputs, regardless of whether they’re in the training data. Or, in other words, to generalize appropriately from training data.
But what is “appropriate generalization”? As a practical matter, it tends to be “generalization that aligns with what we humans would do”. And it’s then a remarkable fact that artificial neural nets with fairly simple architectures can successfully do generalizations in a way that’s roughly aligned with human brains. So why does this work? Presumably it’s because there are universal features of “brain-like systems” that are close enough between human brains and neural nets. And once again it’s important to emphasize that what’s happening in both cases seems distinctly weaker than “general computation”.
A feature of “general computation” is that it can potentially involve unbounded amounts of time and storage space. But both brains and typical neural nets have just a fixed number of neurons. And although both brains and LLMs in effect have an “outer loop” that can “recycle” output to input, it’s limited.
And at least when it comes to brains, a key feature associated with this is the limit on “working memory”, i.e. memory that can readily be both read and written “in the course of a computation”. Bigger and more developed brains typically seem to support larger amounts of working memory. Adult humans can remember perhaps 5 or 7 “chunks” of data in working memory; for young children, and other animals, it’s less. Size of working memory (as we’ll discuss later) seems to be important in things like language capabilities. And the fact that it’s limited is no doubt one reason we can’t generally “run code in our brains”.
As we try to reflect on what our brains do, we’re most aware of our stream of conscious thought. But that represents just a tiny fraction of all our neural activity. Most of the activity is much less like “thought” and much more like typical processes in nature, with lots of elements seemingly “doing their own thing”. We might think of this as an “ocean of unconscious neural activity”, from which a “thread of consensus thought” is derived. Usually—much like in an artificial neural net—it’s difficult to find much regularity in that “unconscious activity”. Though when one trains oneself enough to get to the point of being able to “do something without thinking about it”, that presumably happens by organizing some part of that activity.
There’s always a question of what kinds of things we can learn. We can’t overcome computational irreducibility. But how broadly can we handle what’s computationally reducible? Artificial neural nets show a certain genericity in their operation: although some specific architectures are more efficient than others, it doesn’t seem to matter much whether the input they’re fed is images or text or numbers, or whatever. And for our brains it’s probably the same—though what we’ve normally experienced, and learned from, are the specific kinds of input that come from our eyes, ears, etc. And from these, we’ve ended up recognizing certain types of regularities—that we’ve then used to guide our actions, set up our environment, etc.
And, yes, this plugs into certain pockets of computational reducibility in the world. But there’s always further one could go. And how that might work with brains bigger than ours is at the core of what we’re trying to discuss here.
At some level we can view our brains as serving to take the complexity of the world and extract from it a compressed representation that our finite minds can handle. But what is the structure of that representation? A central aspect of it is that it ignores many details of the original input (like particular configurations of pixels). Or, in other words, it effectively equivalences many different inputs together.
But how then do we describe that equivalence class? Implementationally, say in a neural net, the equivalence class might correspond to an attractor to which many different initial conditions all evolve. In terms of the detailed pattern of activity in the neural net the attractor will typically be very hard to describe. But on a larger scale we can potentially just think of it as some kind of robust construct that represents a class of things—or what in terms of our process of thought we might describe as a “concept”.
At the lowest level there’s all sorts of complicated neural activity in our brains—most of it mired in computational irreducibility. But the “thin thread of conscious experience” that we extract from this we can for many purposes treat as being made up of higher-level “units of thought”, or essentially “discrete concepts”.
And, yes, it’s certainly our typical human experience that robust constructs—and particularly ones from which other constructs can be built—will be discrete. In principle one can imagine that there could be things like “robust continuous spaces of concepts” (“cat and dog and everything in between”). But we don’t have anything like the computational paradigm that shows us a consistent universal way that such things could fit together (there’s no robust analog of computation theory for real numbers, for example). And somehow the success of the computational paradigm—potentially all the way down to the foundations of the physical universe—doesn’t seem to leave much room for anything else.
So, OK, let’s imagine that we can represent our thread of conscious experience in terms of concepts. Well, that’s close to saying that we’re using language. We’re “packaging up” the details of our neural activity into “robust elements” which we can think of as concepts—and which are represented in language essentially by words. And not only does this “packaging” into language give a robust way for different brains to communicate; it also gives a single brain a robust way to “remember” and “redeploy” thoughts.
Within one brain one could imagine that one might be able to remember and “think” directly in terms of detailed low-level neural patterns. But no doubt the “neural environment” inside a brain is continually changing (not least because of its stream of sensory input). And so the only way to successfully “preserve a thought” across time is presumably to “package it up” in terms of robust elements, or essentially in terms of language. In other words, if we’re going to be able to consistently “think a particular thought” we probably have to formulate it in terms of something robust—like concepts.
But, OK, individual concepts are one thing; language—or at least human language—is based on putting together concepts in structured ways. One might take a noun (“cat”) and qualify it with an adjective (“black”) to form a phrase that’s in effect a finer-grained version of the concept represented by the noun. And in a rough approximation one can think of language as formed from trees of nested phrases like this. And insofar as the phrases are independent in their structure (i.e. “context free”), we can parse such language by recursively understanding each phrase in turn—with the constraint that we can’t do it if the nesting goes too deep for us to hold the necessary stack of intermediate steps in our working memory.
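A toy way to make this concrete (a sketch in which NP, Adj and Noun are just made-up symbolic heads, with expression depth standing in for the stack a parser has to hold):

  (* a nested "phrase" as a symbolic expression; deeper nesting means more
     intermediate steps to keep track of while parsing *)
  phrase = NP[Adj["black"], NP[Adj["old"], NP[Noun["cat"]]]];
  Depth[phrase]   (* -> 5 *)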
An important feature of ordinary human language is that it’s ultimately presented in a sequential way. Even though it may consist of a nested tree of phrases, the words that are the leaves of that tree are spoken or written in a one-dimensional sequence. And, yes, the fact that this is how it works is surely closely connected to the fact that our brains construct a single thread of conscious experience.
In the actuality of the few thousand human languages currently in use, there is considerable superficial diversity, but also considerable fundamental commonality. For example, the same parts of speech (noun, verb, etc.) typically show up, as do concepts like “subject” and “object”. But the details of how words are put together, and how things are indicated, can be fairly different. Sometimes nouns have case endings; sometimes there are separate prepositions. Sometimes verb tenses are indicated by annotating the verb; sometimes with extra words. And sometimes, for example, what would usually be whole phrases can be smooshed together into single words.
It’s not clear to what extent commonalities between languages are the result of shared history, and to what extent they’re consequences either of the particulars of our human sensory experience of the world, or the particular construction of our brains. It’s not too hard to get something like concepts to emerge in experiments on training neural nets to pass data through a “bottleneck” that simulates a “mind-to-mind communication channel”. But how compositionality or grammatical structure might emerge is not clear.
OK, but so what might change if we had bigger brains? If neural nets are a guide, one obvious thing is that we should be able to deal directly with a larger number of “distinct concepts”, or words. So what consequences would this have? Presumably one’s language would get “grammatically shallower”, in the sense that what would otherwise have had to be said with nested phrases could now be said with individual words. And presumably this would tend to lead to “faster communication”, requiring fewer words. But it would likely also lead to more rigid communication, with less ability to tweak shades of meaning, say by changing just a few words in a phrase. (And it would presumably also require longer training, to learn what all the words mean.)
In a sense we have a preview of what it’s like to have more words whenever we deal with specialized versions of existing language, aimed say at particular technical fields. There are additional words of “jargon” available, that make certain things “faster to say” (but require longer to learn). And with that jargon comes a certain rigidity: it’s easy to say exactly what the jargon says, but harder to say something slightly different.
So how else could language be different with a bigger brain? With larger working memory, one could presumably have more deeply nested phrases. But what about more sophisticated grammatical structures, say ones that aren’t “context free”, in the sense that different nested phrases can’t be parsed separately? My guess is that this quickly devolves into requiring arbitrary computation—and runs into computational irreducibility. In principle it’s perfectly possible to have any program as the “message” one communicates. But if one has to run the program to “determine its meaning”, that’s in general going to involve computational irreducibility.
And the point is that with our assumptions about what “brain-like systems” do, that’s something that’s out of scope. Yes, one can construct a system (even with neurons) that can do it. But not with the “single thread of decisions from sensory input” workflow that seems characteristic of brains. (There are finer gradations one could consider—like languages that are context sensitive but don’t require general computation. But the Principle of Computational Equivalence strongly suggests that the separation between nested context-free systems and ones associated with arbitrary computation is very thin, and there doesn’t seem to be any particular reason to expect that the capabilities of a bigger brain would land right there.)
Said another way: the Principle of Computational Equivalence says it’s easy to have a system that can deal with arbitrary computation. It’s just that such a system is not “brain like” in its behavior; it’s more like a typical system we see in nature.
OK, but what other “additional features” can one imagine, for even roughly “brain-like” systems? One possibility is to go beyond the idea of a single thread of experience, and to consider a multiway system in which threads of experience can branch and merge. And, yes, this is what we imagine happens at a low level in the physical universe, particularly in connection with quantum mechanics. And indeed it’s perfectly possible to imagine, for example, a “quantum-like” LLM system in which one generates a graph of different textual sequences. But just “scaling up the number of neurons” in a brain, without changing the overall architecture, won’t get to this. We would need a different, multiway architecture: one with a “graph of consciousness” rather than a “stream of consciousness”, in which we’re in effect “thinking a graph of thoughts”, with thoughts themselves able to branch and merge.
In our practical use of language, it’s most often communicated in spoken or written form—effectively as a one-dimensional sequence of tokens. But in math, for example, it’s common to have a certain amount of 2D structure, and in general there are also all sorts of specialized (usually technical) diagrammatic representations in use, often based on using graphs and networks—as we’ll discuss in more detail below.
But what about general pictures? Normally it’s difficult for us to produce these. But in generative AI systems it’s basically easy. So could we then imagine directly “communicating mental images” from one mind to another? Maybe as a practical matter some neural implant in our brain could aggregate neural signals from which a displayed image could be generated. But is there in fact something coherent that could be extracted from our brains in this way? Perhaps that can only happen after “consensus is formed”, and we’ve reduced things to a much thinner “thread of experience”. Or, in other words, perhaps the only robust way for us to “think about images” is in effect to reduce them to discrete concepts and language-like representations.
But perhaps if we “had the hardware” to display images directly from our minds it’d be a different story. And it’s sobering to imagine that perhaps the reason cats and dogs don’t appear to have compositional language is just that they don’t “have the hardware” to talk like we do (and it’s too laborious for them to “type with their paws”, etc.). And, by analogy, that if we “had the hardware” for displaying images, we’d discover we could also “think very differently”.
Of course, in some small ways we do have the ability to “directly communicate with images”, for example in our use of gestures and body language. Right now, these seem like largely ancillary forms of communication. But, yes, it’s conceivable that with bigger brains, they could be more.
And when it comes to other animals the story can be different. Cuttlefish are notable for dynamically producing elaborate patterns on their skin—giving them in a sense the hardware to “communicate in pictures”. But so far as one can tell, they produce just a small number of distinct patterns—and certainly nothing like a “pictorial generalization of compositional language”. (In principle one could imagine that “generalized cuttlefish” could do things like “dynamically run cellular automata on their skin”, just like all sorts of animals “statically” do in the process of growth or development. But to decode such patterns—and thereby in a sense enable “communicating in programs”—would typically require irreducible amounts of computation that are beyond the capabilities of any standard brain-like system.)
We humans have raw inputs coming into our brains from a few million sensors distributed across our usual senses of touch, sight, hearing, taste and smell (together with balance, temperature, hunger, etc.). In most cases the detailed sensor inputs are not independent; in a typical visual scene, for example, neighboring pixels are highly correlated. And it doesn’t seem to take many layers of neurons in our brains to distill our typical sensory experience from pure pieces of “raw data” to what we might view as “more independent features”.
Of course there’ll usually be much more in the raw data than just those features. But the “features” typically correspond to aspects of the data that we’ve “learned are useful to us”—normally connected to pockets of computational reducibility that exist in the environment in which we operate. Are the features we pick out all we’ll ever need? In the end, we typically want to derive a small stream of decisions or actions from all the data that comes in. But how many “intermediate features” do we need to get “good” decisions or actions?
That really depends on two things. First, what our decisions and actions are like. And second, what our raw data is like. Early in the history of our species, everything was just about “indigenous human experience”: what the natural world is like, and what we can do with our bodies. But as soon as we were dealing with technology, that changed. And in today’s world we’re constantly exposed, for example, to visual input that comes not from the natural world, but, say, from digital displays.
And, yes, we often try to arrange our “user experience” to align with what’s familiar from the natural world (say by having objects that stay unchanged when they’re moved across the screen). But it doesn’t have to be that way. And indeed it’s easy—even with simple programs—to generate for example visual images very different from what we’re used to. And in many such cases, it’s very hard for us to “tell what’s going on” in the image. Sometimes it’ll just “look too complicated”. Sometimes it’ll seem like it has pieces we should recognize, but we don’t:
When it’s “just too complicated”, that’s often a reflection of computational irreducibility. But when there are pieces we might “think we should recognize”, that can be a reflection of pockets of reducibility we’re just not familiar with. If we imagine a space of possible images—as we can readily produce with generative AI—there will be some that correspond to concepts (and words) we’re familiar with. But the vast majority will effectively lie in “interconcept space”: places where we could have concepts, but don’t, at least yet:
So what could bigger brains do with all this? Potentially they could handle more features, and more concepts. Full computational irreducibility will always in effect ultimately overpower them. But when it comes to handling pockets of reducibility, they’ll presumably be able to deal with more of them. So in the end, it’s very much as one might expect: a bigger brain should be able to track more things going on, “see more details”, etc.
Brains of our size seem like they are in effect sufficient for “indigenous human experience”. But with technology in the picture, it’s perfectly possible to “overload” them. (Needless to say, technology—in the form of filtering, data analysis, etc.—can also reduce that overload, in effect taking raw input and bringing our actual experience of it closer to something “indigenous”.)
It’s worth pointing out that while two brains of a given size might be able to “deal with the same number of features or concepts”, those features or concepts might be different. One brain might have learned to talk about the world in terms of one set of primitives (such as certain basic colors); another in terms of a different set of primitives. But if both brains are sampling “indigenous human experience” in similar environments one can expect that it should be possible to translate between these descriptions—just as it is generally possible to translate between things said in different human languages.
But what if the brains are effectively sampling “different slices of reality”? What if one’s using technology to convert different physical phenomena to forms (like images) that we can “indigenously” handle? Perhaps we’re sensing different electromagnetic frequencies; perhaps we’re sensing molecular or chemical properties; perhaps we’re sensing something like fluid motion. The kinds of features that will be “useful” may be quite different in these different modalities. Indeed, even something as seemingly basic as the notion of an “object” may not be so relevant if our sensory experience is effectively of continuous fluid motion.
But in the end, what’s “useful” will depend on what we can do. And once again, it depends on whether we’re dealing with “pure humans” (who can’t, for example, move like octopuses) or with humans “augmented by technology”. And here we start to see an issue that relates to the basic capabilities of our brains.
As “pure humans”, we have certain “actuators” (basically in the form of muscles) that we can “indigenously” operate. But with technology it’s perfectly possible for us to use quite different actuators in quite different configurations. And as a practical matter, with brains like ours, we may not be able to make them work.
For example, while humans can control helicopters, they never managed to control quadcopters—at least not until digital flight controllers could do most of the work. In a sense there were just too many degrees of freedom for brains like ours to deal with. Should bigger brains be able to do more? One would think so. And indeed one could imagine testing this with artificial neural nets. Millipede brains, for example, seem to support only a couple of patterns of motion of their legs (roughly, same phase vs. opposite phase). But one could imagine that with a bigger brain, all sorts of other patterns would become possible.
Ultimately, there are two issues at stake here. The first is having a brain be able to “independently address” enough actuators, or in effect enough degrees of freedom. The second is having a brain be able to control those degrees of freedom. And for example with mechanical degrees of freedom there are again essentially issues of computational irreducibility. Looking at the space of possible configurations—say of millipede legs—does one effectively just have to trace the path to find out if, and how, one can get from one configuration to another? Or are there instead pockets of reducibility, associated with regularities in the space of configurations, that let one “jump ahead” and figure this out without tracing all the steps? It’s those pockets of reducibility that brains can potentially make use of.
When it comes to our everyday “indigenous” experience of the world, we are used to certain kinds of computational reducibility, associated for example with familiar natural laws, say about motion of objects. But what if we were dealing with different experiences, associated with different senses?
For example, imagine (as with dogs) that our sense of smell was better developed than our sense of sight—as reflected by more nerves coming into our brains from our noses than our eyes. Our description of the world would then be quite different, based for example not on geometry revealed by the line-of-sight arrival of light, but instead on the delivery of odors through fluid motion and diffusion—not to mention the probably-several-hundred-dimensional space of odors, compared to the three-dimensional red-green-blue space of colors. Once again there would be features that could be identified, and “concepts” that could be defined. But those might only be useful in an environment “built for smell” rather than one “built for sight”.
And in the end, how many concepts would be useful? I don’t think we have any way to know. But it certainly seems as if one can be a successful “smell-based animal” with a smaller brain (presumably supporting fewer concepts) than one needs as a successful “sight-based animal”.
One feature of “natural senses” is that they tend to be spatially localized: an animal basically senses things only where it is. (We’ll discuss the case of social organisms later.) But what if we had access to a distributed array of sensors—say associated with IoT devices? The “effective laws of nature” that one could perceive would then be different. Maybe there would be regularities that could be captured by a small number of concepts, but it seems more likely that the story would be more complicated, and that in effect one would “need a bigger brain” to be able to keep track of what’s going on, and make use of whatever pockets of reducibility might exist.
There are somewhat similar issues if one imagines changing the timescales for sensory input. Our perception of space, for example, depends on the fact that light travels fast enough that in the milliseconds it takes our brain to register the input, we’ve already received light from everything that’s around us. But if our brains operated a million times faster (as digital electronics does) we’d instead be registering individual photons. And while our brains might aggregate these to something like what we ordinarily perceive, there may be all sorts of other (e.g. quantum optics) effects that would be more obvious.
The more abstractly we try to think, the harder it seems to get. But would it get easier if we had bigger brains? And might there perhaps be fundamentally higher levels of abstraction that we could reach—but only if we had bigger brains?
As a way to approach such questions, let’s begin by talking a bit about the history of the phenomenon of abstraction. We might already say that basic perception involves some abstraction, capturing as it does a filtered version of the world as it actually is. But perhaps we reach a different level when we start to ask “what if?” questions, and to imagine how things in the world could be different than they are.
But somehow when it comes to us humans, it seems as if the greatest early leap in abstraction was the invention of language, and the explicit delineation of concepts that could be quite far from our direct experience. The earliest written records tend to be rather matter of fact, mostly recording as they do events and transactions. But already there are plenty of signs of abstraction. Numbers independent of what they count. Things that should happen in the future. The concept of money.
There seems to be a certain pattern to the development of abstraction. One notices that some category of things one sees many times can be considered similar; then one “packages these up” into a concept, often described by a word. And in many cases, there’s a certain kind of self-amplification: once one has a word for something (as a modern example, say “blog”), it becomes easier for us to think about the thing, and we tend to see it or make it more often in the world around us. But what really makes abstraction take off is when we start building a whole tower of it, with one abstract concept recursively being based on others.
Historically this began quite slowly. And perhaps it was seen first in theology. There were glimmerings of it in things like early (syllogistic) logic, in which one started to be able to talk about the form of arguments, independent of their particulars. And then there was mathematics, where computations could be done just in terms of numbers, independent of where those numbers came from. And, yes, while there were tables of “raw computational results”, numbers were usually discussed in terms of what they were numbers of. And indeed when it came to things like measures of weight, it took until surprisingly modern times for there to be an absolute, abstract notion of weight, independent of whether it was a weight of figs or of wool.
The development of algebra in the early modern period can be considered an important step forward in abstraction. Now there were formulas that could be manipulated abstractly, without even knowing what particular numbers x stood for. But it would probably be fair to say that there was a major acceleration in abstraction in the 19th century—with the development of formal systems that could be discussed in “purely symbolic form” independent of what they might (or might not) “actually represent”.
And it was from this tradition that modern notions of computation emerged (and indeed particularly ones associated with symbolic computation that I personally have extensively used). But the most obvious area in which towers of abstraction have been built is mathematics. One might start with numbers (that could count things). But soon one’s on to variables, functions, spaces of functions, category theory—and a zillion other constructs that abstractly build on each other.
The great value of abstraction is that it allows one to think about large classes of things all at once, instead of each separately. But how do those abstract concepts fit together? The issue is that often it’s in a way that’s very remote from anything about which we have direct experience from our raw perception of the world. Yes, we can define concepts about transfinite numbers or higher categories. But they don’t immediately relate to anything we’re familiar with from our everyday experience.
As a practical matter one can often get a sense of how high something is on the tower of abstraction by seeing how much one has to explain to build up to it from “raw experiential concepts”. Just sometimes it turns out that, once one hears about a certain seemingly “highly abstract” concept, one can actually explain it surprisingly simply, without going through the whole historical chain that led to it. (A notable example of this is the concept of universal computation—which arose remarkably late in human intellectual history, but is now quite easy to explain, particularly given its widespread embodiment in technology.) But the more common case is that there’s no choice but to explain a whole tower of concepts.
At least in my experience, however, when one actually thinks about “highly abstract” things, one does it by making analogies to more familiar, more concrete things. The analogies may not be perfect, but they provide scaffolding which allows our brains to take what would otherwise be quite inaccessible steps.
At some level any abstraction is a reflection of a pocket of computational reducibility. Because if a useful abstraction can be defined, what it means is that it’s possible to say something in a “summarized” or reduced way, in effect “jumping ahead”, without going through all the computational steps or engaging with all the details. And one can then think of towers of abstraction as being like networks of pockets of computational reducibility. But, yes, it can be hard to navigate these.
Underneath, there’s lots of computational irreducibility. And if one is prepared to “go through all the steps” one can often “get to an answer” without all the “conceptual difficulty” of complex abstractions. But while computers can often readily “go through all the steps”, brains can’t. And that’s in a sense why we have to use abstraction. But inevitably, even if we’re using abstraction, and the pockets of computational reducibility associated with it, there’ll be shadows of the computational irreducibility underneath. And in particular, if we try to “explore everything”, our network of pockets of reducibility will inevitably “get complicated”, and ultimately also be mired in computational irreducibility, albeit with “higher-level” constructs than in the computational irreducibility underneath.
No finite brain will ever be able to “go all the way”, but it starts to seem likely that a bigger brain will be able to “reach further” in the network of abstraction. But what will it find there? How does the character of abstraction change when we take it further? We’ll be able to discuss this a bit more concretely when we talk about computational language below. But perhaps the main thing to say now is that—at least in my experience—most higher abstractions don’t feel as if they’re “structurally different” once one understands them. In other words, most of the time, it seems as if the same patterns of thought and reasoning that one’s applied in many other places can be applied there too, just to different kinds of constructs.
Sometimes, though, there seem to be exceptions. Shocks to intuition that seem to separate what one’s now thinking about from anything one’s thought before. And, for example, for me this happened when I started looking broadly at the computational universe. I had always assumed that simple rules would lead to simple behavior. But many years ago I discovered that in the computational universe this isn’t true (hence computational irreducibility). And this led to a whole different paradigm for thinking about things.
It feels a bit like in metamathematics. Where one can imagine one type of abstraction associated with different constructs out of which to form theorems. But where somehow there’s another level associated with different ways to build new theorems, or indeed whole spaces of theorems. Or to build proofs from proofs, or proofs from proofs of proofs, etc. But the remarkable thing is that there seems to be an ultimate construct that encompasses it all: the ruliad.
We can describe the ruliad as the entangled limit of all possible computations. But we can also describe it as the limit of all possible abstractions. And it seems to lie underneath all physical reality, as well as all possible mathematics, etc. But, we might ask, how do brains relate to it?
Inevitably, it’s full of computational irreducibility. And looked at as a whole, brains can’t get far with it. But the key idea is to think about how brains as they are—with all their various features and limitations—will “parse” it. And what I’ve argued is that what “brains as they are” will perceive about the ruliad are the core laws of physics (and mathematics) as we know them. In other words, it’s because brains are the way they are that we perceive the laws of physics that we perceive.
Would it be different for bigger brains? Not if they’re the “same kind of brains”. Because what seems to matter for the core laws of physics are really just two properties of observers. First, that they’re computationally bounded. And second, that they believe they are persistent in time, and have a single thread of experience through time. And both of these seem to be core features of what makes brains “brain-like”, rather than just arbitrary computational systems.
It’s a remarkable thing that just these features are sufficient to make core laws of physics inevitable. But if we want to understand more about the physics we’ve constructed—and the laws we’ve deduced—we probably have to understand more about what we’re like as observers. And indeed, as I’ve argued elsewhere, even our physical scale (much bigger than molecules, much smaller than the whole universe) is for example important in giving us the particular experience (and laws) of physics that we have.
Would this be different with bigger brains? Perhaps a little. But anything that something brain-like can do pales in comparison to the computational irreducibility that exists in the ruliad and in the natural world. Nevertheless, with every new pocket of computational reducibility that’s reached we get some new abstraction about the world, or in effect, some new law about how the world works.
And as a practical matter, each such abstraction can allow us to build a whole collection of new ways of thinking about the world, and making things in the world. It’s challenging to trace this arc. Because in a sense it’ll all be about “things we never thought to think about before”. Goals we might define for ourselves that are built on a tower of abstraction, far away from what we might think of as “indigenous human goals”.
It’s important to realize that there won’t just be one tower of abstraction that can be built. There’ll inevitably be an infinite network of pockets of computational reducibility, with each path leading to a different specific tower of abstraction. And indeed the abstractions we have pursued reflect the particular arc of human intellectual history. Bigger brains—or AIs—have many possible directions they can go, each one defining a different path of history.
One question to ask is to what extent reaching higher levels of abstraction is a matter of education, and to what extent it requires additional intrinsic capabilities of a brain. It is, I suspect, a mixture. Sometimes it’s really just a question of knowing “where that pocket of reducibility is”, which is something we can learn from education. But sometimes it’s a question of navigating a network of pockets, which may only be possible when brains reach a certain level of “computational ability”.
There’s another thing to discuss, related to education. And that’s the fact that over time, more and more “distinct pieces of knowledge” get built up in our civilization. There was perhaps a time in history when a brain of our size could realistically commit to memory at least the basics of much of that knowledge. But today that time has long passed. Yes, abstraction in effect compresses what one needs to know. But the continual addition of new and seemingly important knowledge, across countless specialties, makes it impossible for brains of our size to keep up.
Plenty of that knowledge is, though, quite siloed in different areas. But sometimes there are “grand analogies” to make—say pulling an idea from relativity theory and applying it to biological evolution. In a sense such analogies reveal new abstractions—but to make them requires knowledge that spans many different areas. And that’s a place where bigger brains—or AIs—can potentially do something that’s in a fundamental way “beyond us”.
Will there always be such “grand analogies” to make? The general growth of knowledge is inevitably a computationally irreducible process. And within it there will inevitably be pockets of reducibility. But how often in practice will one actually encounter “long-range connections” across “knowledge space”? As a specific example one can look at metamathematics, where such connections are manifest in theorems that link seemingly different areas of mathematics. And this example leads one to realize that at some deep level grand analogies are in a sense inevitable. In the context of the ruliad, one can think of different domains of knowledge as corresponding to different parts. But the nature of the ruliad—encompassing as it does everything that is computationally possible—inevitably imbues it with a certain homogeneity, which implies that (as the Principle of Computational Equivalence might suggest) there must ultimately be a correspondence between different areas. In practice, though, this correspondence may be at a very “atomic” (or “formal”) level, far below the kinds of descriptions (based on pockets of reducibility) that we imagine brains normally use.
But, OK, will it always take an “expanding brain” to keep up with the “expanding knowledge” we have? Computational irreducibility guarantees that there’ll always in principle be “new knowledge” to be had—separated from what’s come before by irreducible amounts of computation. But then there’s the question of whether in the end we’ll care about it. After all, it could be that the knowledge we can add is so abstruse that it will never affect any practical decisions we have to make. And, yes, to some extent that’s true (which is why only some tiny fraction of the Earth’s population will care about what I’m writing here). But another consequence of computational irreducibility is that there will always be “surprises”—and those can eventually “push into focus” even what at first seems like arbitrarily obscure knowledge.
Language in general—and compositional language in particular—is arguably the greatest invention of our species. But is it somehow “the top”—the highest possible representation of things? Or if, for example, we had bigger brains, is there something beyond it that we could reach?
Well, in some very formal sense, yes, compositional language (at least in idealized form) is “the top”. Because—at least if it’s allowed to include utterances of any length—it can in some sense in principle encode arbitrary, universal computations. But this really isn’t true in any useful sense—and indeed to apply ordinary compositional language in this way would require doing computationally irreducible computations.
So we return to the question of what might in practice lie beyond ordinary human language. I wondered about this for a long time. But in the end I realized that the most important clue is in a sense right in front of me: the concept of computational language, that I’ve spent much of my life exploring.
It’s worth saying at the outset that the way computational language plays out for computers and for brains is somewhat different, and in some respects complementary. In computers you might specify something as a Wolfram Language symbolic expression, and then the “main action” is to evaluate this expression, potentially running a long computation to find out what the expression evaluates to.
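For example (a deliberately trivial case of the “evaluate an expression” mode):

  Total[Range[10^6]]   (* -> 500000500000, obtained by actually doing the computation *)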
Brains aren’t set up to do long computations like this. For them a Wolfram Language expression is something to use in effect as a “representation of a thought”. (And, yes, that’s an important distinction between the computational language concept of Wolfram Language, and standard “programming languages”, which are intended purely as a way to tell a computer what to do, not a way to represent thoughts.)
So what kinds of thoughts can we readily represent in our computational language? There are ones involving explicit numbers, or mathematical expressions. There are ones involving cities and chemicals, and other real-world entities. But then there are higher-level ones, that in effect describe more abstract structures.
For example, there’s NestList, which gives the result of nesting any operation, here named f:
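For instance, nesting four times (this is just the standard documented behavior):

  NestList[f, x, 4]
  (* -> {x, f[x], f[f[x]], f[f[f[x]]], f[f[f[f[x]]]]} *)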
At the outset, it’s not obvious that this would be a useful thing to do. But in fact it’s a very successful abstraction: there are lots of functions f for which one wants to do this.
In the development of ordinary human language, words tend to get introduced when they’re useful, or, in other words, when they express things one often wants to express. But somehow in human language the words one gets tend to be more concrete. Maybe they describe something that directly happens to objects in the world. Maybe they describe our impression of a human mental state. Yes, one can make rather vague statements like “I’m going to do something to someone”. But human language doesn’t normally “go meta”, doing things like NestList where one’s saying that one wants to take some “direct statement” and in effect “work with the statement”. In some sense, human language tends to “work with data”, applying a simple analog of code to it. Our computational language can “work with code” as “raw material”.
One can think about this as a “higher-order function”: a function that operates not on data, but on functions. And one can keep going, dealing with functions that operate on functions that operate on functions, and so on. And at every level one is increasing the generality—and abstraction—at which one is working. There may be many specific functions (a bit analogous to verbs) that operate on data (a bit analogous to nouns). But when we talk about operating on functions themselves we can potentially have just a single function (like NestList) that operates, quite generally, on many functions. In ordinary language, we might call such things “metaverbs”—but nothing like them commonly occurs.
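As a small illustrative sketch (the name twice here is just made up for the example), one can define such a higher-order function in Wolfram Language, and even apply it to itself:

twice[f_] := f @* f         (* takes a function f and returns the function that applies f twice *)
twice[h][x]                 (* h[h[x]] *)
twice[twice][h][x]          (* h[h[h[h[x]]]]: a higher-order function applied to itself *)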
But what makes them possible in computational language? Well, it’s taking the computational paradigm seriously, and representing everything in computational terms: objects, actions, etc. In Wolfram Language, it’s that we can represent everything as a symbolic expression. Arrays of numbers (or countries, or whatever) are symbolic expressions. Graphics are symbolic expressions. Programs are symbolic expressions. And so on.
And given this uniformity of representation it becomes feasible—and natural—to do higher-order operations, that in effect manipulate symbolic structure without being concerned about what the structure might represent. At some level we can view this as leading to the ultimate abstraction embodied in the ruliad, where in a sense “everything is pure structure”. But in practice in Wolfram Language we try to “anchor” what we’re doing to known concepts from ordinary human language—so that we use names for things (like NestList) that are derived from common English words.
In some formal sense this isn’t necessary. Everything can be “purely structural”, as it is not only in the ruliad but also in constructs like combinators, where, say, the operation of addition can be represented by:
Combinators have been around for more than a century. But they are almost impenetrably difficult for most humans to understand. Somehow they involve too much “pure abstraction”, not anchored to concepts we “have a sense of” in our brains.
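To give a concrete sense of what such “pure structure” is like, here is a minimal sketch of the standard S, K combinator rules set up as symbolic rewriting in Wolfram Language (this is just the general reduction machinery, not the specific representation of addition):

skRules = {s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x};   (* the standard S and K reduction rules *)
s[k][k][a] //. skRules                                     (* a: the combinator s[k][k] acts as the identity *)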
It’s been interesting for me to observe over the years what it’s taken for people (including myself) to come to terms with the kind of higher-order constructs that exist in the Wolfram Language. The typical pattern is that over the course of months or years one gets used to lots of specific cases. And only after that is one able—often in the end rather quickly—to “get to the next level” and start to use some generalized, higher-order construct. But normally one can in effect only “go one level at a time”. After one groks one level of abstraction, that seems to have to “settle” for a while before one can go on to the next one.
Somehow it seems as if one is gradually “feeling out” a certain amount of computational irreducibility, to learn about a new pocket of reducibility, that one can eventually use to “think in terms of”.
Could “having a bigger brain” speed this up? Maybe it’d be useful to be able to remember more cases, and perhaps get more into “working memory”. But I rather suspect that combinators, for example, are in some sense fundamentally beyond all brain-like systems. It’s much as the Principle of Computational Equivalence suggests: one quickly “ascends” to things that are as computationally sophisticated as anything—and therefore inevitably involve computational irreducibility. There are only certain specific setups that remain within the computationally bounded domain that brain-like systems can deal with.
Of course, even though they can’t directly “run code in their brains”, humans—and LLMs—can perfectly well use Wolfram Language as a tool, getting it to actually run computations. And this means they can readily “observe phenomena” that are computationally irreducible. And indeed in the end it’s very much the same kind of thing to observe such phenomena in the abstract computational universe as in the “real” physical universe. And the point is that in both cases, brain-like systems will pull out only certain features, essentially corresponding to pockets of computational reducibility.
How do things like higher-order functions relate to this? At this point it’s not completely clear. Presumably in at least some sense there are hierarchies of higher-order functions that capture certain kinds of regularities that can be thought of as associated with networks of computational reducibility. And it’s conceivable that category theory and its higher-order generalizations are relevant here. In category theory one imagines applying sequences of functions (“morphisms”) and it’s a foundational assumption that the effect of any sequence of functions can also be represented by just a single function—which seems tantamount to saying that one can always “jump ahead”, or in other words, that everything one’s dealing with is computationally reducible. Higher-order category theory then effectively extends this to higher-order functions, but always with what seem like assumptions of computational reducibility.
And, yes, this all seems highly abstract, and difficult to understand. But does it really need to be, or is there some way to “bring it down” to a level that’s close to everyday human thinking? It’s not clear. But in a sense the core art of computational language design (that I’ve practiced so assiduously for nearly half a century) is precisely to take things that at first might seem abstruse, and somehow cast them into an accessible form. And, yes, this is something that’s about as intellectually challenging as anything—because in a sense it involves continually trying to “figure out what’s really going on”, and in effect “drilling down” to get to the foundations of everything.
But, OK, when one gets there, how simple will things be? Part of that depends on how much computational irreducibility is left when one reaches what one considers to be “the foundations”. And part in a sense depends on the extent to which one can “find a bridge” between the foundations and something that’s familiar. Of course, what’s “familiar” can change. And indeed over the four decades that I’ve been developing the Wolfram Language quite a few things (particularly in areas like functional programming) that at first seemed abstruse and unfamiliar have begun to seem more familiar. And, yes, it’s taken the collective development and dissemination of the relevant ideas to achieve that. But now it “just takes education”; it doesn’t “take a bigger brain” to deal with these things.
One of the core features of the Wolfram Language is that it represents everything as a symbolic expression. And, yes, symbolic expressions are formally able to represent any kind of computational structure. But beyond that, the important point is that they’re somehow set up to be a match for how brains work.
And in particular, symbolic expressions can be thought of “grammatically” as consisting of nested functions that form a tree-like structure; effectively a more precise version of the typical kind of grammar that we find in human language. And, yes, just as we manage to understand and generate human language with a limited working memory, so (at least at the grammatical level) we can do the same thing with computational language. In other words, in dealing with Wolfram Language we’re leveraging our faculties with human language. And that’s why Wolfram Language can serve as such an effective bridge between the way we think about things, and what’s computationally possible.
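A small example of that tree structure, for a purely symbolic expression:

expr = f[g[x, y], h[z]];
TreeForm[expr]    (* displays the nested function structure of the expression as a tree *)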
But symbolic expressions represented as trees aren’t the only conceivable structures. It’s also possible to have symbolic expressions where the elements are nodes on a graph, and the graph can even have loops in it. Or one can go further, and start talking, for example, about the hypergraphs that appear in our Physics Project. But the point is that brain-like systems have a hard time processing such structures. Because to keep track of what’s going on they in a sense have to keep track of multiple “threads of thought”. And that’s not something individual brain-like systems as we currently envision them can do.
As we’ve discussed several times here, it seems to be a key feature of brains that they create a single “thread of experience”. But what would it be like to have multiple threads? Well, we actually have a very familiar example of that: what happens when we have a whole collection of people (or other animals).
One could imagine that biological evolution might have produced animals whose brains maintain multiple simultaneous threads of experience. But somehow it has ended up instead restricting each animal to just one thread of experience—and getting multiple threads by having multiple animals. (Conceivably creatures like octopuses may actually in some sense support multiple threads within one organism.)
Within a single brain it seems important to always “come to a single, definite conclusion”—say to determine where an animal will “move next”. But what about in a collection of organisms? Well, there’s still some kind of coordination that will be important to the fitness of the whole population—perhaps even something as direct as moving together as a herd or flock. And in a sense, just as all those different neuron firings in one brain get collected to determine a “final conclusion for what to do”, so similarly the conclusions of many different brains have to be collected to determine a coordinated outcome.
But how can a coordinated outcome arise? Well, there has to be communication of some sort between organisms. Sometimes it’s rather passive (just watch what your neighbor in a herd or flock does). Sometimes it’s something more elaborate and active—like language. But is that the best one can do? One might imagine that there could be some kind of “telepathic coordination”, in which the raw pattern of neuron firings is communicated from one brain to another. But as we’ve argued, such communication cannot be expected to be robust. To achieve robustness, one must “package up” all the internal details into some standardized form of communication (words, roars, calls, etc.) that one can expect can be “faithfully unpacked” and in effect “understood” by other, suitably similar brains.
But it’s important to realize that the very possibility of such standardized communication in effect requires coordination. Because somehow what goes on in one brain has to be aligned with what goes on in another. And indeed the way that’s maintained is precisely through continual communication.
So, OK, how might bigger brains affect this? One possibility is that they might enable more complex social structures. There are plenty of animals with fairly small brains that successfully form “all do the same thing” flocks, herds and the like. But the larger brains of primates seem to allow more complex “tribal” structures. Could having a bigger brain let one successfully maintain a larger social structure, in effect remembering and handling larger numbers of social connections? Or could the actual forms of these connections be more complex? While human social connections seem to be at least roughly captured by social networks represented as ordinary graphs, maybe bigger brains would for example routinely require hypergraphs.
But in general we can say that language—or standardized communication of some form—is deeply connected to the existence of a “coherent society”. For without being able to exchange something like language there’s no way to align the members of a potential society. And without coherence between members something like language won’t be useful.
As in so many other situations, one can expect that the detailed interactions between members of a society will show all sorts of computational irreducibility. And insofar as one can identify “the will of society” (or, for that matter, the “tide of history”), it represents a pocket of computational reducibility in the system.
In human society there is a considerable tendency (though it’s often not successful) to try to maintain a single “thread of society”, in which, at some level, everyone is supposed to act more or less the same. And certainly that’s an important simplifying feature in allowing brains like ours to “navigate the social world”. Could bigger brains do something more sophisticated? As in other areas, one can imagine a whole network of regularities (or pockets of reducibility) in the structure of society, perhaps connected to a whole tower of “higher-order social abstractions”, that only brains bigger than ours can comfortably deal with. (“Just being friends” might be a story for the “small brained”. With bigger brains one might instead have patterns of dependence and connectivity that can only be represented in complicated graph theoretic ways.)
We humans have a tremendous tendency to think—or at least hope—that our minds are somehow “at the top” of what’s possible. But with what we know now about computation and how it operates in the natural world it’s pretty clear this isn’t true. And indeed it seems as if it’s precisely a limitation in the “computational architecture” of our minds—and brains—that leads to that most cherished feature of our existence that we characterize as “conscious experience”.
In the natural world at large, computation is in some sense happening quite uniformly, everywhere. But our brains seem to be set up to do computation in a more directed and more limited way—taking in large amounts of sensory data, but then filtering it down to a small stream of actions to take. And, yes, one can remove this “limitation”. And while the result may lead to more computation getting done, it doesn’t lead to something that’s “a mind like ours”.
And indeed in what we’ve done here, we’ve tended to be very conservative in how we imagine “extending our minds”. We’ve mostly just considered what might happen if our brains were scaled up to have more neurons, while basically maintaining the same structure. (And, yes, animals physically bigger than us already have larger brains—as did Neanderthals—but what we really need to look at is size of brain relative to size of the animal, or, in effect “amount of brain for a given amount of sensory input”.)
A certain amount about what happens with different scales of brains is already fairly clear from looking at different kinds of animals, and at things like their apparent lack of human-like language. But now that we have artificial neural nets that do remarkably human-like things we’re in a position to get a more systematic sense of what different scales of “brains” can do. And indeed we’ve seen a sequence of “capability thresholds” passed as neural nets get larger.
So what will bigger brains be able to do? What’s fairly straightforward is that they’ll presumably be able to take larger amounts of sensory input, and generate larger amounts of output. (And, yes, the sensory input could come from existing modalities, or new ones, and the outputs could go to existing “actuators”, or new ones.) As a practical matter, the more “data” that has to be processed for a brain to “come to a decision” and generate an output, the slower it’ll probably be. But as brains get bigger, so presumably will the size of their working memory—as well as the number of distinct “concepts” they can “distinguish” and “remember”.
If the same overall architecture is maintained, there’ll still be just a single “thread of experience”, associated with a single “thread of communication”, or a single “stream of tokens”. At the size of brains we have, we can deal with compositional language in which “concepts” (represented, basically, as words) can have at least a certain depth of qualifiers (corresponding, say, to adjectival phrases). As brain size increases, we can expect both more “raw concepts”—so that fewer qualifiers are needed—and more working memory to deal with more deeply nested qualifiers.
But is there something qualitatively different that can happen with bigger brains? Computational language (and particularly my experience with the Wolfram Language) gives some indications, the most notable of which is the idea of “going meta” and using “higher-order constructs”. Instead of, say, operating directly on “raw concepts” with (say, “verb-like”) “functions”, we can imagine higher-order functions that operate on functions themselves. And, yes, this is something of which we see powerful examples in the Wolfram Language. But it feels as if we could somehow go further—and make this more routine—if our brains in a sense had “more capacity”.
To “go meta” and “use higher-order constructs” is in effect a story of abstraction—and of taking many disparate things and abstracting to the point where one can “talk about them all together”. The world at large is full of complexity—and computational irreducibility. But in essence what makes “minds like ours” possible is that there are pockets of computational reducibility to be found. And those pockets of reducibility are closely related to being able to successfully do abstraction. And as we build up towers of abstraction we are in effect navigating through networks of pockets of computational reducibility.
The progress of knowledge—and the fact that we’re educated about it—lets us get to a certain level of abstraction. And, one suspects, the more capacity there is in a brain, the further it will be able to go.
But where will it “want to go”? The world at large—full as it is with computational irreducibility, along with infinite numbers of pockets of reducibility—leaves infinite possibilities. And it is largely the coincidence of our particular history that defines the path we have taken.
We often identify our “sense of purpose” with the path we will take. And perhaps the definiteness of our belief in purpose is related to the particular feature of brains that leads us to concentrate “everything we’re thinking” down into just a single stream of decisions and action.
And, yes, as we’ve discussed, one could in principle imagine “multiway minds” with multiple “threads of consciousness” operating at once. But we humans (and individual animals in general) don’t seem to have those. Of course, in collections of humans (or other animals) there are still inevitably multiple “threads of consciousness” —and it’s things like language that “knit together” those threads to, for example, make a coherent society.
Quite what that “knitting” looks like might change as we scale up the size of brains. And so, for example, with bigger brains we might be able to deal with “higher-order social structures” that would seem alien and incomprehensible to us today.
So what would it be like to interact with a “bigger brain”? Inside, that brain might effectively use many more words and concepts than we know. But presumably it could generate at least a rough (“explain-like-I’m-5”) approximation that we’d be able to understand. There might well be all sorts of abstractions and “higher-order constructs” that we are basically blind to. And, yes, one is reminded of something like a dog listening to a human conversation about philosophy—and catching only the occasional “sit” or “fetch” word.
As we’ve discussed several times here, if we remove our restriction to “brain-like” operation (and in particular to deriving a small stream of decisions from large amounts of sensory input) we’re thrown into the domain of general computation, where computational irreducibility is rampant, and we can’t in general expect to say much about what’s going on. But if we maintain “brain-like operation”, we’re instead in effect navigating through “networks of computational reducibility”, and we can expect to talk about things like concepts, language and towers of abstraction.
From a foundational point of view, we can imagine any mind as in effect being at a particular place in the ruliad. When minds communicate, they are effectively exchanging the rulial analog of particles—robust concepts that are somehow unchanged as they propagate within the ruliad. So what would happen if we had bigger brains? In a sense it’s a surprisingly “mechanical” story: a bigger brain—encompassing more concepts, etc.—in effect just occupies a larger region of rulial space. And the presence of abstraction—perhaps learned from a whole arc of intellectual history—can lead to more expansion in rulial space.
And in the end it seems that “minds beyond ours” can be characterized by how large the regions of the ruliad they occupy are. (Such minds are, in some very literal rulial sense, more “broad minded”.) So what is the limit of all this? Ultimately, it’s a “mind” that spans the whole ruliad, and in effect incorporates all possible computations. But in some fundamental sense this is not a mind like ours, not least because by “being everything” it “becomes nothing”—and one can no longer identify it as having a coherent “thread of individual existence”.
And, yes, the overall thrust of what we’ve been saying applies just as well to “AI minds” as to biological ones. If we remove restrictions like being set up to generate the next token, we’ll be left with a neural net that’s just “doing computation”, with no obvious “mind-like purpose” in sight. But if we make neural nets do typical “brain-like” tasks, then we can expect that they too will find and navigate pockets of reducibility. We may well not recognize what they’re doing. But insofar as we can, then inevitably we’ll mostly be sampling the parts of “minds beyond ours” that are aligned with “minds like ours”. And it’ll take progress in our whole human intellectual edifice to be able to fully appreciate what it is that minds beyond ours can do.
Thanks for recent discussions about topics covered here in particular to Richard Assar, Joscha Bach, Kovas Boguta, Thomas Dullien, Dugan Hammock, Christopher Lord, Fred Meinberg, Nora Popescu, Philip Rosedale, Terry Sejnowski, Hikari Sorensen, and James Wiles.
2025-03-19 02:25:33
Things are invented. Things are discovered. And somehow there’s an arc of progress that’s formed. But are there what amount to “laws of innovation” that govern that arc of progress?
There are some exponential and other laws that purport to at least measure overall quantitative aspects of progress (number of transistors on a chip; number of papers published in a year; etc.). But what about all the disparate innovations that make up the arc of progress? Do we have a systematic way to study those?
We can look at the plans for different kinds of bicycles or rockets or microprocessors. And over the course of years we’ll see the results of successive innovations. But most of the time those innovations won’t stay within one particular domain—say shapes of bicycle frames. Rather they’ll keep on pulling in innovations from other domains—say, new materials or new manufacturing techniques. But if we want to get closer to the study of the pure phenomenon of innovation we need a case where—preferably over a long period of time—everything that happens can be described in a uniform way within a single narrowly defined framework.
Well, some time ago I realized that, actually, yes, there is such a case—and I’ve even personally been following it for about half a century. It’s the effort to build “engineering” structures within the Game of Life cellular automaton. They might serve as clocks, wires, logic gates, or things that generate digits of π. But the point is that they’re all just patterns of bits. So when we talk about innovation in this case, we’re talking about the rather pure question of how patterns of bits get invented, or discovered.
As a long-time serious researcher of the science of cellular automata (and of what they generically do), I must say I’ve long been frustrated by how specific, whimsical and “non-scientific” the things people do with the Game of Life have often seemed to me to be. But what I now realize is that all that detail and all that hard work have now created what amounts to a unique dataset of engineering innovation. And my goal here is to do what one can call “metaengineering”—and to study in effect what happened in that process of engineering over the nearly six decades since the Game of Life was invented.
We’ll see in rather pure form many phenomena that are at least anecdotally familiar from our overall experience of progress and innovation. Most of the time, the first step is to identify an objective: some purpose one can describe and wants to achieve. (Much more rarely, one instead observes something that happens, then realizes there’s a way one can meaningfully make use of it.) But starting from an objective, one either takes components one has, and puts human effort into arranging them to “invent” something that will achieve the objective—or in effect (usually at least somewhat systematically, and automatically) one searches to try to “discover” new ways to achieve the objective.
As we explore what’s been done with the Game of Life we’ll see occasional sudden advances—together with much larger amounts of incremental progress. We’ll see towers of technology being built, and we’ll see old, rather simple technology being used to achieve new objectives. But most of all, we’ll see an interplay between what gets discovered by searching possibilities—and what gets invented by explicit human effort.
The Principle of Computational Equivalence implies that there is, in a sense, infinite richness to what a computational system like the Game of Life can ultimately do—and it’s the role of science to explore this richness in all its breadth. But when it comes to engineering and technology the crucial question is what we choose to make the system do—and what paths we follow to get there. Inevitably, some of this is determined by the underlying computational structure of the system. But much of it is a reflection of how we, as humans, do things, and the patterns of choices we make. And that’s what we’ll be able to study—at quite large scale—by looking at the nearly six decades of work on the Game of Life.
How similar are the results of such “purposeful engineering” to the results of “blind” adaptive evolution of the kind that occurs in biology? I recently explored adaptive evolution (as it happens, using cellular automata as a model) and saw that it can routinely deliver what seem like “sequences of new ideas”. But now in the example of the Game of Life we have what we can explicitly identify as “sequences of new ideas”. And so we’re in a position to compare the results of human effort (aided, in many cases, by systematic search) with what we can “automatically” do by the algorithmic process of adaptive evolution.
In the end, we can think of the set of things that we can in principle engineer as being laid out in a kind of “metaengineering space”, much as we can think of mathematical theorems we can prove as being laid out in metamathematical space. In the mathematical case (notwithstanding some of my own work) the vast majority of theorems have historically been found purely by human effort. But, as we’ll see below, in Game-of-Life engineering it’s been a mixture of human effort and fairly automated exploration of metaengineering space. Though—much like in traditional mathematics—we’re still in a sense always only pursuing objectives we’ve already conceptualized. And in this way what we’re doing is very different from what I’ve done for so long in studying the science (or, as I would now say, the ruliology) of what computational systems like cellular automata (of which the Game of Life is an example) do “in the wild”, when they’re unconstrained by objectives we’re trying to achieve with them.
Here’s a typical example of what it looks like to run the Game of Life:
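For reference, here is a minimal way to set up this kind of run in Wolfram Language (the 100×100 random starting region is just an illustrative choice, not the particular initial condition shown):

gameOfLife = {224, {2, {{2, 2, 2}, {2, 1, 2}, {2, 2, 2}}}, {1, 1}};  (* 2-color, 9-neighbor, outer totalistic code 224 *)
init = RandomInteger[1, {100, 100}];                                 (* a random block of cells *)
ArrayPlot /@ CellularAutomaton[gameOfLife, init, 100]                (* pictures of the first 100 steps *)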
There’s a lot of complicated—and hard to understand—stuff going on here. But there are still some recognizable structures—like the “blinkers” that alternate on successive steps
and the “gliders” that steadily move across the screen:
Seeing these structures might make one think that one should be able to “do engineering” in the Game of Life, setting up patterns that can ultimately do all sorts of things. And indeed our main subject here is the actual development of such engineering over the past nearly six decades since the introduction of the Game of Life.
What we’ll be concentrating on is essentially the “technology” of the Game of Life: how we take the “raw material” that the Game of Life provides, and make from it “meaningful engineering structures”.
But what about the science of the Game of Life? What can we say about what the Game of Life “naturally does”, independent of “useful” structures we create in it? The vast majority of the effort that’s been put into the Game of Life over the past half century hasn’t been about this. But this type of fundamental question is central to what one asks in what I now call ruliology—a kind of science that I’ve been energetically pursuing since the early 1980s.
Ruliology looks in general at classes of systems, rather than at the kind of specifics that have typically been explored in the Game of Life. And within ruliology, the Game of Life is in a sense nothing special; it’s just one of many “class 4” 2D cellular automata (in my numbering scheme, it’s the 2-color 9-neighbor cellular automaton with outer totalistic code 224).
My own investigations of cellular automata have particularly focused on 1D rather than 2D examples. And I think that’s been crucial to many of the scientific discoveries I’ve made. Because somehow one learns so much more by being able to see at a glance the history of a system, rather than just seeing frames in a video go by. With a class 4 2D rule like the Game of Life, one can begin to approach this by including “trails” of what’s previously happened, and we’ll often use this kind of visualization in what follows:
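One simple way to make this kind of picture (an assumed setup, not necessarily the exact rendering used here) is to darken cells that are currently alive, and lightly shade any cell that was alive at some earlier step:

history = CellularAutomaton[gameOfLife, init, 100];   (* reusing gameOfLife and init from the sketch above *)
trail = Sign[Total[Most[history]]];                   (* 1 wherever a cell was alive at any earlier step *)
ArrayPlot[Last[history] + 0.3 trail]                  (* current cells darkest, their trails in light gray *)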
We can get a more complete view of history by looking at the whole (2+1)-dimensional “spacetime history”—though then we’re confronted with 3D forms that are often somewhat difficult for our human visual system to parse:
But taking a slice through this 3D form we get “silhouette” pictures that turn out to look remarkably similar to what I generated in large quantities starting in the early 1980s across many 1D cellular automata:
Such pictures—with their complex forms—highlight the computational irreducibility that’s close at hand even in the Game of Life. And indeed it’s the presence of such computational irreducibility that ultimately makes possible the richness of engineering that can be done in the Game of Life. But in actually doing that engineering—and in setting up structures and processes that behave in understandable and “technologically useful” ways—we need to keep the computational irreducibility “bottled up”. And in the end, we can think of the path of engineering innovation in the Game of Life as like an effort to navigate through an ocean of computational irreducibility, finding “islands of reducibility” that achieve the purposes we want.
Most of the structures of “engineering interest” in the Game of Life are somehow persistent. The simplest are structures that just remain constant, some small examples being:
And, yes, structures in the Game of Life have been given all sorts of (usually whimsical) names, which I’ll use here. (And, in that vein, structures in the Game of Life that remain constant are normally called “still lifes”.)
Beyond structures that just remain constant, there are “oscillators” that produce periodic patterns:
We’ll be discussing oscillators at much greater length below, but here are a few examples (where now we’re including a visualization that shows “trails”):
Next in our inventory of classes of structures come “gliders” (or in general “spaceships”): structures that repeat periodically but move when they do so. A classic example is the basic glider, which takes on the same form every 4 steps—after moving 1 cell horizontally and 1 cell vertically:
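As a quick check of this (a minimal sketch, with the glider placed in an otherwise empty region):

glider = {{0, 1, 0}, {0, 0, 1}, {1, 1, 1}};
steps = CellularAutomaton[gameOfLife, ArrayPad[glider, 8], 4];   (* gameOfLife as defined above *)
Last[steps] === RotateRight[First[steps], {1, 1}]                (* True: after 4 steps the pattern has shifted one cell diagonally *)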
Here are a few small examples of such “spaceship”-style structures:
Still lifes, oscillators and spaceships are most of what one sees in the “ash” that survives from typical random initial conditions. And for example the end result (after 1103 steps) from the evolution we saw in the previous section consists of:
The structures we’ve seen so far were all found not long after the Game of Life was invented; indeed, pretty much as soon as it was simulated on a computer. But one feature that they all share is that they don’t systematically grow; they always return to the same number of black cells. And so one of the early surprises (in 1970) was the discovery of a “glider gun” that shoots out a glider every 30 steps forever:
Something that gives a sense of progress that’s been made in Game-of-Life “technology” is that a “more efficient” glider gun—with period 15—was discovered, but only in 2024, 54 years after the previous one:
Another kind of structure that was quickly discovered in the early history of the Game of Life is a “puffer”—a “spaceship” that “leaves debris behind” (in this case every 128 steps):
But given these kinds of “components”, what can one build? Something constructed very early was the “breeder”, which uses streams of gliders to create glider guns, which themselves then generate streams of gliders:
The original pattern covers about a quarter million cells (with 4060 being black). Running it for 1000 steps we see it builds up a triangle containing a quadratically increasing number of gliders:
OK, but knowing that it’s in principle possible to “fill a growing region of space”, is there a more efficient way to do it? The surprisingly simple answer, as discovered in 1993, is yes:
So what other kinds of things can be built in the Game of Life? Lots—even from the simple structures we’ve seen so far. For example, here’s a pattern that was constructed to compute the primes
emitting a “lightweight spaceship” at step 100 + 120n only if n is prime. It’s a little more obvious how this works when it’s viewed “in spacetime”; in effect it’s running a sieve in which all multiples of all numbers are instantiated as streams of gliders, which knock out spaceships generated at non-prime positions:
If we look at the original pattern here, it’s just made up of a collection of rather simple structures:
And indeed structures like these have been used to build all sorts of things, including for example Turing machine emulators—and also an emulator for the Game of Life itself, with this 499×499 pattern corresponding to a single emulated Life cell:
Both these last two patterns were constructed in the 1990s—from components that had been known since the early 1970s. And—as we can see—they’re large (and complicated). But do they need to be so large? One of the lessons of the Principle of Computational Equivalence is that in the computational universe there’s almost always a way to “do just as much, but with much less”. And indeed in the Game of Life many, many discoveries along these lines have been made in the past few decades.
As we’ll see, often (but not always) these discoveries built on “new devices” and “new mechanisms” that were identified in the intervening years. A long series of such “devices” and “mechanisms” involved handling “signals” associated with streams of gliders. For example, the “glider pusher” (from 1993) has the somewhat subtle (but useful) effect of “pushing” a glider by one cell when it goes past:
Another example (actually already known in 1971, and based on the period-15 “pentadecathlon” oscillator) is a glider reflector:
But a feature of this glider pusher and glider reflector is that they work only when both the glider and the stationary object are in a particular phase with respect to their periods. And this makes it very tricky to build larger structures out of these that operate correctly (and in many cases it wouldn’t be possible but for the commensurability of the period 30 of the original glider gun, and the period 15 of the glider reflector).
Could glider pushing and glider reflection be done more robustly? The answer turns out to be yes. Though it wasn’t until 2020 that the “bandersnatch” was created—a completely static structure that “pushes” gliders independent of their phase:
Meanwhile, in 2013 the “snark” had been created—which served as a phase-independent glider reflector:
One theme—to which we’ll return later—is that after certain functionality was first built in the Game of Life, there followed many “optimizations”, achieving that functionality more robustly, with smaller patterns, etc. An important methodology has revolved around so-called “hasslers”, which in effect allow one to “mine” small pieces of computational irreducibility, by providing “harnesses” that “rein in” behavior, typically returning patterns to their original states after they’ve done what one wants them to do.
So, for example, here’s a hassler (found, as it happens, just on February 8, 2025!) that “harnesses” the first pattern we looked at above (that didn’t stabilize for 1103 steps) into an oscillator with period 80:
And based on this (indeed, later that same day) the most-compact-ever “spaceship gun” was constructed from this:
We’ve talked about some of what it’s been possible to build in the Game of Life over the years. Now I want to talk about how that happened, or, in other words, the “arc of progress” in the Game of Life. And as a first indication of this, we can plot the number of new Life structures that have been identified each year (or, more specifically, the number of structures deemed significant enough to name, and to record in the LifeWiki database or its predecessors):
There’s an immediate impression of several waves of activity. And we can break this down into activity around various common categories of structures:
For oscillators we see fairly continuous activity for five decades, but with rapid acceleration recently. For “spaceships” and “guns” we see a long dry spell from the early 1970s to the 1990s, followed by fairly consistent activity since. And for conduits and reflectors we see almost nothing until sudden peaks of activity, in the mid-1990s and mid-2010s respectively.
But what was actually done to find all these structures? There have basically been two methods: construction and search. Construction is a story of “explicit engineering”—and of using human thought to build up what one wants. Search, on the other hand, is a story of automation—and of taking algorithmically generated (usually large) collections of possible patterns, and testing them to find ones that do what one wants. Particularly in more recent times it’s also become common to interleave these methods, for example using construction to build a framework, and then using search to find specific patterns that implement some feature of that framework.
When one uses construction, it’s like “inventing” a structure, and when one uses search, it’s like “discovering” it. So how much of each is being done in practice? Text mining the descriptions of recently recorded structures gives the following result—suggesting that, at least in recent times, search (i.e. “discovery”) has become the dominant methodology for finding new structures:
When the Game of Life was being invented, it wasn’t long before it was being run on computers—and people were trying to classify the things it could do. Still lifes and simple oscillators showed up immediately. And then—evolving from the “R pentomino” initial condition that we used at the beginning here—after 69 steps something unexpected showed up. In among complicated behavior that was hard to describe there was a simple free-standing structure that just systematically moved—a “glider”:
Some other moving structures (dubbed “spaceships”) were also observed. But the question arose: could there be a structure that would somehow systematically grow forever? Finding one involved a mixture of “discovery” and “invention”. In running from the “R pentomino” initial condition, lots of things happen. But at step 785 it was noticed that there appeared the following structure:
For a while this structure (dubbed the “queen bee”) behaves in a fairly orderly way—producing two stable “beehive” structures (visible here as vertical columns). But then it “decays” into more complicated behavior:
But could this “discovered” behavior be “stabilized”? The answer was that, yes, if a “queen bee” was combined with two “blocks” it would just repeatedly “shuttle” back and forth:
What about two “queen bees”? Now whenever these collided there was a side effect: a glider was generated—with the result that the whole structure became a glider gun repeatedly producing gliders forever:
The glider gun was the first major example of a structure in the Game of Life that was found—at least in part—by construction. And within a year of it being found in November 1970, two more guns—with very similar methods of operation—had been found:
But then the well ran dry—and no further gun was found until 1990. Pretty much the same thing happened with spaceships: four were found in 1970, but no more were found until 1989. As we’ll discuss later, it was in a sense a quintessential story of computational irreducibility: there was no way to predict (or “construct”) what spaceships would exist; one just had to do the computation (i.e. search) to find out.
It was, however, easier to have incremental success with oscillators—and (as we’ll see) pretty much every year an oscillator with some new period was found, essentially always by search. Some periods were “long holdouts” (for example the first period-19 oscillator was found only in 2023), once again reflecting the effects of computational irreducibility.
Glider guns provided a source of “signals” for Life engineering. But what could one do with these signals? An important idea—that first showed up in the “breeder” in 1971—was “glider synthesis”: the concept that combinations of gliders could produce other structures. So, for example, it was found that three carefully-arranged gliders could generate a period-15 (“pentadecathlon”) oscillator:
It was also soon found that 8 gliders could make the original glider gun (the breeder made glider guns by a slightly more ornate method). And eventually there developed the conjecture that any structure that could be synthesized from gliders would need at most 15 gliders, carefully arranged at positions whose values effectively encoded the object to be constructed.
By the end of the 1970s a group of committed Life enthusiasts remained, but there was something of a feeling that “the low-hanging fruit had been picked”, and it wasn’t clear where to go next. But after a somewhat slow decade, work on the Game of Life picked up substantially towards the end of the 1980s. Perhaps my own work on cellular automata (and particularly the identification of class 4 cellular automata, of which the Game of Life is a 2D example) had something to do with it. And no doubt it also helped that the fairly widespread availability of faster (“workstation class”) computers now made it possible for more people to do large-scale systematic searches. In addition, when the web arrived in the early 1990s it let people much more readily share results—and had the effect of greatly expanding and organizing the community of Life enthusiasts.
In the 1990s—along with more powerful searches that found new spaceships and guns—there was a burst of activity in constructing elaborate “machines” out of existing known structures. The idea was to start from a known type of “machine” (say a Turing machine), then to construct a Life implementation of it. The constructions were made particularly ornate by the need to make the phases of gliders, guns, etc. appropriately correspond. Needless to say, any Life configuration can be thought of as doing some computation. But the “machines” that were constructed were ones whose “purpose” and “functionality” was already well established in general computation, independent of the Game of Life.
If the 1990s saw a push towards “construction” in the Game of Life, the first decade of the 2000s saw a great expansion of search. Increasingly powerful cloud and distributed computing allowed “censuses” to be created of structures emerging from billions, then trillions of initial conditions. Mostly what was emphasized was finding new instances of existing categories of objects, like oscillators and spaceships. There were particular challenges, like (as we’ll discuss below) finding oscillators of any period (finally completely solved in 2023), or finding spaceships with different patterns of motion. Searches did yield what in censuses were usually called “objects with unusual growth”, but mostly these were not viewed as being of “engineering utility”, and so were not extensively studied (even though from the point of view of the “science of the Game of Life” they are, for example, perhaps the most revealing examples of computational irreducibility).
As had happened throughout the history of the Game of Life, some of the most notable new structures were created (sometimes over a long period of time) by a mixture of construction and search. For example, the “stably-reflect-gliders-without-regard-to-phase” snark—finally obtained in 2013—was the result of using parts of the (ultimately unstable) “simple-structures” construction from around 1998
and combining them with a hard-to-explain-why-it-works “still life” found by search:
Another example was the “Sir Robin knightship”—a spaceship that moves like a chess knight 2 cells down and 1 across. In 2017 a spaceship search found a structure that in 6 steps has many of the elements needed to make a knight move—but then subsequently “falls apart”:
But the next year a carefully orchestrated search was able to “find a tail” that “adds a fix” to this—and successfully produces a final “perfect knightship”:
By the way, the idea that one can take something that “almost works” and find a way to “fix it” is one that’s appeared repeatedly in the engineering history of the Game of Life. At the outset, it’s far from obvious that such a strategy would be viable. But the fact that it is seems to be similar to the story of why both biological evolution and machine learning are viable—which, as I’ve recently discussed, can be viewed as yet another consequence of the phenomenon of computational irreducibility.
One thing that’s happened many times in the history of the Game of Life is that at some point some category of structure—like a conduit—is identified, and named. But then it’s realized that something found much earlier could actually be seen as an instance of the same category, though, without the clarity of the later instance, its significance wasn’t recognized. For example, in 1995 the “Herschel conduit” that moves a Herschel from one position to another (here in 64 steps) was discovered (by a search):
But then it was realized that—if looked at correctly—a similar phenomenon had actually already been seen in 1972, in the form of a structure that in effect takes a similar small pattern if it is present, and “moves it” (in 28 steps) to a different position (albeit with a certain amount of “containable” other activity):
Looking at the plots above of the number of new structures found per year we see the largest peak after 2020. And, yes, it seems that during the pandemic people spent more time on the Game of Life—in particular trying to fill in tables of structures of particular types, for example, with each possible period.
But what about the human side of engineering in the Game of Life? The activity brought in people from many different backgrounds. And particularly in earlier years, they often operated quite independently, and with very different methods (some not even using a computer). But if we look at all “recorded structures” we can look at how many structures in total different people contributed, and when they made these contributions:
Needless to say—given that we’re dealing with an almost-60-year span—different people tend to show up as active in different periods. Looking at everyone, there’s a roughly exponential distribution to the number of (named) structures they’ve contributed. (Though note that several of the top contributors shown here found parametrized collections of structures and then recorded many instances.)
As a first example of systematic “innovation history” in the Game of Life let’s talk about oscillators. Here are the periods of oscillators that were found up to 1980:
As of 1980, many periods were missing. But in fact all periods are possible—though it wasn’t until 2023 that they were all filled in:
And if we plot the number of distinct periods (say below 60) found by a given year, we can get a first sense of the “arc of progress” in “oscillator technology” in the Game of Life:
Finding an oscillator of a given period is one thing. But how about the smallest oscillator of that period? We can be fairly certain that not all of these are known, even for periods below 30. But here’s a plot that shows when the progressive “smallest so far” oscillators were found for a given period (red indicates the first instance of a given period; blue the best result to date):
And here’s the corresponding plot for all periods up to 100:
But what about the actual reduction in size that’s achieved? Here’s a plot for each oscillator period showing the sequence of sizes found—in effect the “arc of engineering optimization” that’s achieved for that period:
So what are the actual patterns associated with these various oscillators? Here are some results (including timelines of when the patterns were found):
But how were these all found? The period-2 “blinker” was very obvious—showing up in evolution from almost any random initial condition. Some other oscillators were also easily found by looking at the evolution of particular, simple initial conditions. For example, a line of 10 black cells after 3 steps gives the period-15 “pentadecathlon”. Similarly, the period-3 “pulsar” emerges from a pair of length-5 blocks after 22 steps:
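A small sketch of how one can check such periods programmatically (the helper name eventualPeriod, the padding and the step count are all just illustrative choices, assumed large enough for the transient to settle):

gameOfLife = {224, {2, {{2, 2, 2}, {2, 1, 2}, {2, 2, 2}}}, {1, 1}};
eventualPeriod[init_, tmax_] := Module[
  {evo = CellularAutomaton[gameOfLife, ArrayPad[init, 15], tmax]},
  Length[evo] - Last[Flatten[Position[Most[evo], Last[evo], {1}]]]]  (* gap back to the most recent earlier repeat of the final state *)
eventualPeriod[{ConstantArray[1, 10]}, 64]                           (* 15: a row of 10 cells settles into the pentadecathlon *)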
Many early oscillators were found by iterative experimentation, often starting with stable “still life” configurations, then perturbing them slightly, as in this period-4 case:
Another common strategy for finding oscillators (that we’ll discuss more below) was to take an “unstable” configuration, then to “stabilize” it by putting “robust” still lifes such as the “block” or the “eater”
around it—yielding results like:
For periods that can be formed as LCMs of smaller periods one “construction-oriented” strategy has been to take oscillators with appropriate smaller periods, and combine them, as in:
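As simple arithmetic, a pattern built from a part with period 4 and a part with period 15 returns to its starting configuration only after LCM(4, 15) = 60 steps, so a period-60 oscillator can be assembled from smaller-period pieces.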
In general, many different strategies have been used, as indicated for example by the sequence of period-3 oscillators that have been recorded over the years (where “smallest-so-far” cases are highlighted):
By the mid-1990s oscillators of many periods had been found. But there were still holdouts, like period 19 and for example pretty much all periods between 61 and 70 (except, as it happens, 66). At the time, though, all sorts of complicated constructions—say of prime generators—were nevertheless being done. And in 1996 it was figured out that one could in effect always “build a machine” (using only structures that had already been found two decades earlier) that would serve as an oscillator of any (sufficiently large) period (here 67)—effectively by “sending a signal around a loop of appropriate size”:
But by the 2010s, with large numbers of fast computers becoming available, there was again an emphasis on pure random search. A handful of highly efficient programs were developed, that could be run on anyone’s machine. In a typical case, a search might consist of starting, say, from a trillion randomly chosen initial conditions (or “soups”), identifying new structures that emerge, then seeing whether these act, for example, as oscillators. Typically any new discovery was immediately reported in online forums—leading to variations of it being tried, and new follow-on results often being reported within hours or days.
Many of the random searches started just from 16×16 regions of randomly chosen cells (or larger regions with symmetries imposed). And in a typical manifestation of computational irreducibility, many surprisingly small and “random-looking” (at least up to symmetries) results were found. So, for example, here’s the sequence of recorded period-16 oscillators with smaller-than-before cases highlighted:
Up through the 1990s results were typically found by a mixture of construction and small-scale search. But in 2016, results from large-scale random searches (sometimes symmetrical, sometimes not) started to appear.
The contrast between construction and search could be dramatic, like here for period 57:
One might wonder whether there could actually be a systematic, purely algorithmic way to find, say, possible oscillators of a given period. And indeed for one-dimensional cellular automata (as I noted in 1984), it turns out that there is. Say one considers blocks of cells of width w. Which block can follow which other is determined by a de Bruijn graph, or equivalently, a finite state machine. If one is going to have a pattern with period p, all blocks that appear in it must also be periodic with period p. But such blocks just form a subgraph of the overall de Bruijn graph, or equivalently, form another, smaller, finite state machine. And then all patterns with period p must correspond to paths through this subgraph. But how long are the blocks one has to consider?
In 1D cellular automata, it turns out that there’s an upper bound of 2^(2p). But for 2D cellular automata—like the Game of Life—there is in general no such upper bound, a fact related to the undecidability of the 2D tiling problem. And the result is that there’s no complete, systematic algorithm to find oscillators in a general 2D cellular automaton, or presumably in the Game of Life.
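As a small illustration of the underlying question in the 1D case, here is a brute-force sketch of finding spatially periodic configurations with a given temporal period (this is just direct enumeration, not the de Bruijn-graph construction itself; the function name periodicConfigs is illustrative):

periodicConfigs[rule_, w_, p_] :=
  Select[Tuples[{0, 1}, w], Last[CellularAutomaton[rule, #, p]] === # &]   (* width-w cyclic configurations that recur after p steps *)
periodicConfigs[90, 6, 2]                                                  (* configurations of rule 90 whose period divides 2 *)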
But—as was actually already realized in the mid-1990s—it’s still possible to use algorithmic methods to “fill in” pieces of patterns. The idea is to define part of a pattern of a given period, then use this as a constraint on filling in the rest of it, finding “solutions” that satisfy the constraint using SAT-solving techniques. In practice, this approach has more often been used for spaceships than for oscillators (not least because it’s only practical for small periods). But one feature of it is that it can generate fairly large patterns with a given period.
Yet another method that’s been tried has been to generate oscillators by colliding gliders in many possible ways. But while this is definitely useful if one’s interested in what can be made using gliders, it doesn’t seem to have, for example, allowed people to find much in the way of interesting new oscillators.
In traditional engineering a key strategy is modularity. Rather than trying to build something “all in one go”, the idea is to build a collection of independent subsystems, from which the whole system can then be assembled. But how does this work in the Game of Life? We might imagine that to identify the modular parts of a system, we’d have to know the “process” by which the system was put together, and the “intent” involved. But because in the Game of Life we’re ultimately just dealing with pure patterns of bits we can in effect just as well “come in at the end” and algorithmically figure out what pieces are operating as separate, modular parts.
So how can we do this? Basically what we want to find out is which parts of a pattern “operate independently” at a given step, in the sense that these parts don’t have any overlap in the cells they affect. Given that in the rules for the Game of Life a particular cell can affect any of the 9 cells in its neighborhood, we can say that black cells can only have “overlapping effects” if they are at most 2√2 cell units apart. So then we can draw a “nearest neighbor graph” that shows which cells are connected in this sense:
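A minimal sketch of how such a graph and its parts can be computed (the name modularParts is just illustrative): link any two black cells whose neighborhoods can overlap, and then take connected components:

modularParts[cells_List] := ConnectedComponents[Graph[cells,
   UndirectedEdge @@@ Select[Subsets[cells, {2}], ChessboardDistance @@ # <= 2 &]]]
blinkerPair = {{1, 1, 1, 0, 0, 0, 0, 1, 1, 1}};      (* two well-separated blinkers in a single row *)
modularParts[Position[blinkerPair, 1]]               (* two components: the two blinkers operate independently *)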
But what about the whole evolution? We can draw what amounts to a causal graph that shows the causal connections between the “independent modular parts” that exist at each step:
And given this, we can summarize the “modular structure” of this particular oscillator by the causal graph:
Ultimately all that matters in the “overall operation” of the oscillator is the partial ordering defined by this graph. Parts that appear “horizontally separated” (or, more precisely, in antichains, or in physics terminology, spacelike separated) can be generated independently and in parallel. But parts that follow each other in the partial order need to be generated in that order (i.e. in physics terms, they are timelike separated).
As another example, let’s look at graphs for the various oscillators of period 16 that we showed above:
What we see is that the early period-16 oscillators were quite modular, and had many parts that in effect operated independently. But the later, smaller ones were not so modular. And indeed the last one shown here had no parts that could operate independently; the whole pattern had to be taken together at each step.
And indeed, what we’ll often see is that the more optimized a structure is, the less modular it tends to be. If we’re going to construct something “by hand” we usually need to assemble it in parts, because that’s what allows us to “understand what we’re doing”. But if, for example, we just find a structure in a search, there’s no reason for it to be “understandable”, and there’s no reason for it to be particularly modular.
Different steps in a given oscillator can involve different numbers of modular parts. But as a simple way to assess the “modularity” of an oscillator, we can just ask for the average number of parts over the course of one period. So as an example, here are the results for period-30 oscillators:
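Here’s a sketch of how such a modularity index can be computed: evolve the oscillator through one full period, count the independent modular parts at each step, and average. (The Life step and part counting below are minimal stand-in implementations; the oscillator cells and its period have to be supplied.)

```python
from collections import deque

def life_step(live):
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    # a cell is alive next step if it has 3 live neighbors, or 2 and is already alive
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

def count_parts(live):
    # number of groups of live cells linked when within 2 cells of each other in each direction
    remaining, parts = set(live), 0
    while remaining:
        parts += 1
        queue = deque([remaining.pop()])
        while queue:
            x, y = queue.popleft()
            for dx in range(-2, 3):
                for dy in range(-2, 3):
                    if (x + dx, y + dy) in remaining:
                        remaining.discard((x + dx, y + dy))
                        queue.append((x + dx, y + dy))
    return parts

def modularity_index(live, period):
    state, total = set(live), 0
    for _ in range(period):
        total += count_parts(state)
        state = life_step(state)
    return total / period

# e.g. a lone blinker has modularity index 1.0: modularity_index({(0,0),(1,0),(2,0)}, 2)
```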
Later, we’ll discuss how we can use the level of modularity to assess whether a pattern is likely to have been found by a search or by construction. But for now, this shows how the modularity index has varied over the years for the best known progressively smaller oscillators of a given period—with the main conclusion being that as the oscillators get optimized for size, so also their modularity index tends to decrease:
Oscillators are structures that cycle but do not move. “Gliders” and, more generally, “spaceships” are structures that move every time they cycle. When the Game of Life was first introduced, four examples of these (all of period 4) were found almost immediately (the last one being the result of trying to extend the one before it):
Within a couple of years, experimentation had revealed two variants, with periods 12 and 20 respectively, involving additional structures:
But after that, for nearly two decades, no more spaceships were found. In 1989, however, a systematic method for searching was invented, and in the years since, a steady stream of new spaceships has been found. A variety of different periods have been seen
as well as a variety of speeds (and three different angles):
The forms of these spaceships are quite diverse:
Some are “tightly integrated”, while some have many “modular pieces”, as revealed by their causal graphs:
Period-96 spaceships provide an interesting example of the “arc of progress” in the Game of Life. Back in 1971, a systematic enumeration of small polyominoes was done, looking for one that could “reproduce itself”. While no polyomino on its own seemed to do this, a case was found where part of the pattern produced after 48 steps seemed to reappear repeatedly every 48 steps thereafter:
One might expect this repeated behavior to continue forever. But in a typical manifestation of computational irreducibility, it doesn’t, instead stopping its “regeneration” after 24 cycles, and then reaching a steady state (apart from “radiated” gliders) after 3911 steps:
But from an engineering point of view this kind of complexity was just viewed as a nuisance, and efforts were made to “tame” and avoid it.
Adding just one still-life block to the so-called “switch engine”
produces a structure that keeps generating a “periodic wake” forever:
But can this somehow be “refactored” as a “pure spaceship” that doesn’t “leave anything behind”? In 1991 it was discovered that, yes, there was an arrangement of 13 switch engines that could successfully “clean up behind themselves”, to produce a structure that would act as a spaceship with period 96:
But could this be made simpler? It took many years—and tests of many different configurations—but in the end it was found that just 2 switch engines were sufficient:
Looking at the final pattern in spacetime gives a definite impression of “narrowly contained complexity”:
What about the causal graphs? Basically these just decrease in “width” (i.e. number of independent modular parts) as the number of engines decreases:
Like many other things in Game-of-Life engineering, both search and construction have been used to find spaceships. As an extreme example of construction let’s talk about the case of spaceships with speed 31/240. In 2013, an analog of the switch engine above was found—which “eats” blocks 31 cells apart every 240 steps:
But could this be turned into a “self-sufficient” spaceship? A year later an almost absurdly large (934852×290482) pattern was constructed that did this—by using streams of gliders and spaceships (together with dynamically assembled glider guns) to create appropriate blocks in front, and remove them behind (along with all the “construction equipment” that was used):
By 2016, a pattern with about 700× less area had been constructed. And now, just a few weeks ago, a pattern with 1300× less area (11974×45755) was constructed:
And while this is still huge, it’s still made of modular pieces that operate in an “understandable” way. No doubt there’s a much smaller pattern that operates as a spaceship of the same speed, but—computational irreducibility being what it is—we have no idea how large the pattern might be, or how we might efficiently search for it.
What can one engineer in the Game of Life? A crucial moment in the development of Game-of-Life engineering was the discovery of the original glider gun in 1970. And what was particularly important about the glider gun is that it was a first example of something that could be thought of as a “signal generator”—that one could imagine would allow one to implement electrical-engineering-style “devices” in the Game of Life.
The original glider gun produces gliders every 30 steps, in a sense defining a “clock speed” of 1/30 for any “circuit” driven by it. Within a year after the original glider gun, two other “slower” glider guns had also been discovered
both working on similar principles, as suggested by their causal graphs:
It wasn’t until 1990 that any additional “guns” were found. And in the years since, a sequence of guns has been found, with a rather wide range of distinct periods:
Some of the guns found have very long periods:
But as part of the effort to do constructions in the 1990s a gun was constructed that had overall period 210, but which interwove multiple glider streams (15 of them, since 210/14 = 15) to ultimately produce gliders every 14 steps (which is the maximum rate possible, while avoiding interference of successive gliders):
Over the years, a whole variety of different glider guns have been found. Some are in effect “thoroughly controlled” constructions. Others are more based on some complex process that is reined in to the point where it just produces a stream of gliders and nothing more:
An example of a somewhat surprising glider gun—with the shortest “true period” known—was found in 2024:
The causal graph for this glider gun shows a mixture of irreducible “search-found” parts, together with a collection of “well-known” small modular parts:
By the way, in 2013 it was actually found to be possible to extend the construction for oscillators of any period to a construction for guns of any period (or at least any period above 78):
In addition to having streams of gliders, it’s also sometimes been found useful to have streams of other “spaceships”. Very early on, it was already known that one could create small spaceships by colliding gliders:
But by the mid-1990s it had been found that direct “spaceship guns” could also be made—and over the years smaller and smaller “optimized” versions have been found:
The last of these—from just last month—has a surprisingly simple structure, being built from components that were already known 30 years ago, and having a causal graph that shows very modular construction:
We’ve talked about some of the history of how specific patterns in the Game of Life were found. But what about the overall “flow of engineering progress”? And, in particular, when something new is found, how much does it build on what has been found before? In real-world engineering, things like patent citations potentially give one an indication of this. But in the Game of Life one can approach the question much more systematically and directly, just asking what configurations of bits from older patterns are used in newer ones.
As we discussed above, given a pattern such as
we can pick out its “modular parts”, here rotated to canonical orientations:
Then we can see if these parts correspond to (any phase of) previously known patterns, which in this case they all do:
So now for all structures in the database we can ask what parts they involve. Here’s a plot of the overall frequencies of these parts:
It’s notable that the highest-ranked part is a so-called “eater” that’s often used in constructions, but occurs only quite infrequently in evolution from random initial conditions. It’s also notable that (for no particularly obvious reason) the frequency of the nth most common structure is roughly 1/n.
So when were the various structures that appear here first found? As this picture shows, most—but not all—were found very early in the history of the Game of Life:
In other words, most of the parts used in structures from any time in the history of the Game of Life come from very early in its history. Or, in effect, structures typically go “back to basics” in the parts they use.
Here’s a more detailed picture, showing the relative amount of use of each part in structures from each year:
There are definite “fashions” to be seen here, with some structures “coming into fashion” for a while (sometimes, but not always, right after they were first found), and then dropping out.
One might perhaps imagine that smaller parts (i.e. ones with smaller areas) would be more popular than larger ones. But plotting areas of parts against their rank, we see that there are some large parts that are quite common, and some small ones that are rare:
We’ve seen that many of the most popular parts overall are ones that were found early in the history of the Game of Life. But plenty of distinct modular parts were also found much later. This shows the number of distinct new modular parts found across all patterns in successive years:
Normalizing by the number of new patterns found each year, we see a general gradual increase in the relative number of new modular parts, presumably reflecting the greater use of search in finding patterns, or components of patterns:
But how important have these later-found modular parts been? This shows the total rate at which modular parts found in a given year were subsequently used—and what we see, once again, is that parts found early are overwhelmingly the ones that are subsequently used:
A somewhat complementary way to look at this is to ask of all patterns found in a given year, how many are “purely de novo”, in the sense that they use no previously found modular parts (as indicated in red), and how many use previously found parts:
A cumulative version of this makes it clear that in early years most patterns are purely de novo, but later on, there’s an increasing amount of “reuse” of previously found parts—or, in other words, in later years the “engineering history” is increasingly important:
It should be said, however, that if one wants the full story of “what’s being used” it’s a bit more nuanced. Because here we’re always treating each modular part of each pattern as a separate entity, so that we consider any given pattern to “depend” only on base modular parts. But “really” it could depend on another whole structure, itself built of many modular parts. And in what we’re doing here, we’re not tracking that hierarchy of dependencies. Were we to do so, we would likely be able to see more complex “technology stacks” in the Game of Life. But instead we’re always “going down to the primitives”. (If we were dealing with electronics it’d be like asking “What are the transistors and capacitors that are being used?”, rather than “What is the caching architecture, or how is the floating point unit set up?”)
OK, but in terms of “base modular parts” a simple question to ask is how many get used in each pattern. This shows the number of (base) modular parts in patterns found in each year:
There are always a certain number of patterns that just consist of a single modular part—and, as we saw above, that was more common earlier in the history of the Game of Life. But now we also see that there have been an increasing number of patterns that use many modular parts—typically reflecting a higher degree of “construction” (rather than search) going on.
By the way, for comparison, these plots show the total areas and the numbers of (black) cells in patterns found in each year; both show increases early on, but more or less level off by the 1990s:
But, OK, if we look across all patterns in the database, how many parts do they end up using? Here’s the overall distribution:
At least for a certain range of numbers of parts, this falls roughly exponentially, reflecting the idea that it’s been exponentially less likely for people to come up with (or find) patterns that have progressively larger numbers of distinct modular parts.
How has this changed over time? This shows a cumulative plot of the relative frequencies with which different numbers of modular parts appear in patterns up to a given year
indicating that over time the distribution of the number of modular parts has gotten progressively broader—or, in other words, as we’ve seen in other ways above, more patterns make use of larger numbers of modular parts.
We’ve been looking at all the patterns that have been found. But we can also ask, say, just about oscillators. And then we can ask, for example, which oscillators (with which periods) contain which others, as in:
And looking at all known oscillators we can see how common different “oscillator primitives” are in building up other oscillators:
We can also ask in which year “oscillator primitives” at different ranks were found. Unlike in the case of all structures above, we now see that some oscillator primitives that were found only quite recently appear at fairly high ranks—reflecting the fact that in this case, once a primitive has been found, it’s often immediately useful in making oscillators that have multiples of its period:
We can think of almost everything we’ve talked about so far as being aimed at creating structures (like “clocks” and “wires”) that are recognizably useful for building traditional “machine-like” engineering systems. But a different possible objective is to find patterns that have some feature we can recognize, whether with obvious immediate “utility” or not. And as one example of this we can think about finding so-called “die hard” patterns that live as long as possible before dying out.
The phenomenon of computational irreducibility tells us that even given a particular pattern we can’t in general “know in advance” how long it’s going to take to die out (or if it ultimately dies out at all). So it’s inevitable that the problem of finding ultimate die-hard patterns can be unboundedly difficult, just like analogous problems for other computational systems (such as finding so-called “busy beavers” in Turing machines).
But in practice one can use both search and construction techniques to find patterns that at least live a long time (even if not the very longest possible time). And as an example, here’s a very simple pattern (found by search) that lives for 132 steps before dying out (the “puff” at the end on the left is a reflection of how we’re showing “trails”; all the actual cells are zero at that point):
Searching nearly 10^16 randomly chosen 16×16 patterns (out of a total of ≈ 10^77 possible such patterns), the longest lifetime found is 1413 steps—achieved with a rather random-looking initial pattern:
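Here’s a sketch of this kind of random “die hard” search, with deliberately tiny sample counts and step caps standing in for the vastly larger ones used in the searches described here:

```python
import random

def life_step(live):
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

def lifetime(live, cap=5000):
    state, t = set(live), 0
    while state and t < cap:
        state, t = life_step(state), t + 1
    return t if not state else None              # None: didn't die out within the cap

best, best_seed = -1, None
for _ in range(1000):                            # the searches in the text used ~10^16 seeds
    seed = {(x, y) for x in range(16) for y in range(16) if random.random() < 0.5}
    t = lifetime(seed)
    if t is not None and t > best:
        best, best_seed = t, seed
print(best)
```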
But is this the best one can do? Well, no. Just consider a block and a spaceship n cells apart. It’ll take 2n steps for them to collide, and if the phases are right, annihilate each other:
So by picking the separation n to be large enough, we can make this configuration “live as long as we want”. But what if we limit the size of the initial pattern, say to 32×32? In 2022 the following pattern was constructed:
And this pattern is carefully set up so that after 30,274 steps, everything lines up and it dies out, as we can see in the (vertically foreshortened) spacetime diagram on the left:
And, yes, the construction here clearly goes much further than search was able to reach. But can we go yet further? In 2023 a 116×86 pattern was constructed
that was proved to eventually die out, but only after the absurdly large number of 17↑↑↑3 steps (probably even much larger than the number of emes in the ruliad), as given by:
or
There are some definite rough ways in which technology development parallels biological evolution. Both involve the concept of trying out possibilities and building on ones that work. But technology development has always ultimately been driven by human effort, whereas biological evolution is, in effect, a “blind” process, based on the natural selection of random mutations. So what happens if we try to apply something like biological evolution to the Game of Life? As an example, let’s look at adaptive evolution that’s trying to maximize finite lifetime based on making a sequence of random point mutations within an initially random 16×16 pattern. Most of those mutations don’t give patterns with larger (finite) lifetimes, but occasionally there’s a “breakthrough” and the lifetime achieved so far jumps up:
The actual behaviors corresponding to the breakthroughs in this case are:
And here are some other outcomes from adaptive evolution:
In almost all cases, a limited number of steps of adaptive evolution do succeed in generating patterns with fairly long finite lifetimes. But the behavior we see typically shows no “readily understandable mechanisms”—and no obviously separable modular parts. And instead—just like in my recent studies of both biological evolution and machine learning—what we get are basically “lumps of irreducible computation” that “just happen” to show what we’re looking for (here, long lifetime).
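Here’s a minimal sketch of this kind of adaptive evolution: repeatedly flip a single randomly chosen cell of a 16×16 pattern, and keep the mutation whenever the finite lifetime doesn’t decrease. The acceptance rule, the step cap and the number of mutation attempts are illustrative assumptions, not the exact setup used for the pictures above.

```python
import random

def life_step(live):
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

def finite_lifetime(live, cap=5000):
    state, t = set(live), 0
    while state and t < cap:
        state, t = life_step(state), t + 1
    return t if not state else -1                # -1: didn't die out within the cap

current = {(x, y) for x in range(16) for y in range(16) if random.random() < 0.5}
best = finite_lifetime(current)
for _ in range(2000):
    x, y = random.randrange(16), random.randrange(16)
    mutant = current ^ {(x, y)}                  # flip a single randomly chosen cell
    t = finite_lifetime(mutant)
    if t >= best:                                # keep neutral or improving finite lifetimes
        current, best = mutant, t
```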
Let’s say we’re presented with an array of cells that’s an initial condition for the Game of Life. Can we tell “where it came from”? Is it “just arbitrary” (or “random”)? Or was it “set up for a purpose”? And if it was “set up for a purpose”, was it “invented” (and “constructed”) for that purpose, or was it just “discovered” (say by a search) to fulfill that purpose?
Whether one’s dealing with archaeology, evolutionary biology, forensic science, the identification of alien intelligence or, for that matter, theology, the question of whether something “was set up for a purpose” is a philosophically fraught one. Any behavior one sees one can potentially explain either in terms of the mechanism that produces it, or in terms of what it “achieves”. Things get a little clearer if we have a particular language for describing both mechanisms and purposes. Then we can ask questions like: “Is the behavior we care about more succinctly described in terms of its mechanism or its purpose?” So, for example, “It behaves as a period-15 glider gun” might be an adequate purpose-oriented description, that’s much shorter than a mechanism-oriented description in terms of arrangements of cells.
But what is the appropriate “lexicon of purposes” for the Game of Life? In effect, that’s a core question for Game-of-Life engineering. Because what engineering—and technology in general—is ultimately about is taking whatever raw material is available (whether from the physical world, or from the Game of Life) and somehow fashioning it into something that aligns with human purposes. But then we’re back to what counts as a valid human purpose. How deeply does the purpose have to connect in to everything we do? Is it, for example, enough for something to “look nice”, or is that not “utilitarian enough”? There aren’t absolute answers to these questions. And indeed the answers can change over time, as new uses for things are discovered (or invented).
But for the Game of Life we can start with some of the “purposes” we’ve discussed here—like “be an oscillator of a certain period”, “reflect gliders”, “generate the primes” or even just “die after as long as possible”. Let’s say we just start enumerating possible initial patterns, either randomly, or exhaustively. How often will we come across patterns that “achieve one of these purposes”? And will it “only achieve that purpose” or will it also “do extra stuff” that “seems irrelevant”?
As an example, consider enumerating all possible 3×3 patterns of cells. There are altogether 2^9 = 512 of these, and some of them turn directly into period-2 oscillators:
Other patterns can take a while to “become period 2”, but then at least give “pure period-2 objects”. And for example this one can be interpreted as being the smallest precursor, and taking the least time, to reach the period-2 object it produces:
There are other cases that “get to the same place” but seem to “wander around” doing so, and therefore don’t seem as convincing as having been “created for the purpose of making a period-2 oscillator”:
Then there are much more egregious cases. Like
which after 173 steps gives
but only after going through all sorts of complicated intermediate behavior
that definitely doesn’t make it look like it’s going “straight to its purpose” (unless perhaps its purpose is to produce that final pattern from the smallest initial precursor, etc.).
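For what it’s worth, here’s a sketch of this kind of enumeration over all 2^9 = 512 possible 3×3 seeds: run each one (with a step cap) and record whether it dies out, freezes into a still life, settles into an oscillator of some period, or is still changing when the cap is reached (as it will be if, for example, it keeps growing or emits gliders).

```python
from itertools import product

def life_step(live):
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

def classify(seed, cap=200):
    state, seen = frozenset(seed), {}
    for t in range(cap):
        if not state:
            return "dies out"
        if state in seen:                        # exact repeat: the pattern has become periodic
            period = t - seen[state]
            return "still life" if period == 1 else f"period {period}"
        seen[state] = t
        state = frozenset(life_step(state))
    return "still going at the cap"              # e.g. growing, or emitting gliders

tally = {}
for bits in product((0, 1), repeat=9):
    outcome = classify({(i % 3, i // 3) for i, b in enumerate(bits) if b})
    tally[outcome] = tally.get(outcome, 0) + 1
print(tally)
```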
But, OK. Let’s imagine we have a pattern that “goes straight to” some “recognizable purpose” (like being an oscillator of a certain period). The next question is: was that pattern explicitly constructed with an understanding of how it would achieve its purpose, or was it instead “blindly found” by some kind of search?
As an example, let’s look at some period-9 oscillators:
One like
seems like it must have been constructed out of “existing parts”, while one like
seems like it could only plausibly have been found by a search.
Spacetime views don’t tell us much in these particular cases:
But causal graphs are much more revealing:
They show that in the first case there are lots of “factored modular parts”, while in the second case there’s basically just one “irreducible blob” with no obvious separable parts. And we can view this as an immediate signal for “how human” each pattern is. In a sense it’s a reflection of the computational boundedness of our minds. When there are factored modular parts that interact fairly rarely and each behave in a fairly simple way, it’s realistic for us to “get our minds around” what’s going on. But when there’s just an “irreducible blob of activity” we’d have to compute too much and keep too much in mind at once for us to be able to really “understand what’s going on” and for example produce a human-level narrative explanation of it.
If we find a pattern by search, however, we don’t really have to “understand it”; it’s just something we computationally “discover out there in the computational universe” that “happens” to do what we want. And, indeed, as in the example here, it often does what it does in a quite minimal (if incomprehensible) way. Something that’s found by human effort is much less likely to be minimal; in effect it’s at least somewhat “optimized for comprehensibility” rather than for minimality or ease of being found by search. And indeed it will often be far too big (e.g. in terms of number of cells) for any pure exhaustive or random search to plausibly find it—even though the “human-level narrative” for it might be quite short.
Here are the causal graphs for all the period-9 oscillators from above:
Some we can see can readily be broken down into multiple rarely interacting distinct components; others can’t be decomposed in this kind of way. And in a first approximation, the “decomposable” ones seem to be precisely those that were somehow “constructed by human effort”, while the non-decomposable ones seem to be those that were “discovered by searches”.
Typically, the way the “constructions” are done is to start with some collection of known parts, then, by trial and error (sometimes computer assisted) see how these can be fit together to get something that does what one wants. Searches, on the other hand, typically operate on “raw” configurations of cells, blindly going through a large number of possible configurations, at every stage automatically testing whether one’s got something that does what one wants.
And in the end these different strategies reveal themselves in the character of the final patterns they produce, and in the causal graphs that represent these patterns and their behavior.
In engineering as it’s traditionally been practiced, the main emphasis tends to be on figuring out plans, and then constructing things based on those plans. Typically one starts from components one has, then tries to figure out how to combine them to incrementally build up what one wants.
And, as we’ve discussed, this is also a way of developing technology in the Game of Life. But as we’ve discussed at length, it’s not the only way. Another way is just to search for whole pieces of technology one wants.
Traditional intuition might make one assume this would be hopeless. But the repeated lesson of my discoveries about simple programs—as well as what’s been done with the Game of Life—is that actually it’s often not hopeless at all, and instead it’s very powerful.
Yes, what you get is not likely to be readily “understandable”. But it is likely to be minimal and potentially quite optimal for whatever it is that it does. I’ve often talked of this approach as “mining from the computational universe”. And over the course of many years I’ve had success with it in all sorts of disparate areas. And now, here, we’ve seen in the Game of Life a particularly clean example where search is used alongside construction in developing technology.
It’s a feature of things produced by construction that they are “born understandable”. In effect, they are computationally reducible enough that we can “fit them in our finite minds” and “understand them”. But things found by search don’t have this feature. And most of the time the behavior they’ll show will be full of computational irreducibility.
In both biological evolution and machine learning my recent investigations suggest that most of what we’re seeing are “lumps of irreducible computation” found at random that just “happen to achieve the necessary objectives”. This hasn’t been something familiar in traditional engineering, but it’s something tremendously powerful. And from the examples we’ve seen here in the Game of Life it’s clear that it can often achieve things that seem completely inaccessible by traditional methods based on explicit construction.
At first we might assume that irreducible computation is too unruly and unpredictable to be useful in achieving “understandable objectives”. But if we find just the right piece of irreducible computation then it’ll achieve the objective we want, often in a very minimal way. And the point is that the computational universe is in a sense big enough that we’ll usually be able to find that “right piece of irreducible computation”.
One thing we see in Game-of-Life engineering is something that’s in a sense a compromise between irreducible computation and predictable construction. The basic idea is to take something that’s computationally irreducible, and to “put it in a cage” that constrains it to do what one wants. The computational irreducibility is in a sense the “spark” in the system; the cage provides the control we need to harness that spark in a way that meets our objectives.
Let’s look at some examples. As our “spark” we’ll use the R pentomino that we discussed at the very beginning. On its own, this generates all sorts of complex behavior—that for the most part doesn’t align with typical objectives we might define (though as a “side show” it does happen to generate gliders). But the idea is to put constraints on the R pentomino to make it “useful”.
Here’s a case where we’ve tried to “build a road” for the R pentomino to go down:
And looking at this every 18 steps we see that, at least for a while, the R pentomino has indeed moved down the road. But it’s also generated something of an “explosion”, and eventually this explosion catches up, and the R pentomino is destroyed.
So can we maintain enough control to let the R pentomino survive? The answer is yes. And here, for example, is a period-12 oscillator, “powered” by an R pentomino at its center:
Without the R pentomino, the structure we’ve set up cycles with period 6:
And when we insert the R pentomino this structure “keeps it under control”—so that the only effect it ultimately has is to double the period, to 12.
Here’s a more dramatic example. Start with a static configuration of four so-called “eaters”:
Now insert two R pentominoes. They’ll start doing their thing, generating what seems like quite random behavior. But the “cage” defined by the “eaters” limits what can happen, and in the end what emerges is an oscillator—that has period 129:
What else can one “make R pentominoes do”? Well, with appropriate harnesses, they can for example be used to “power” oscillators with many different periods:
“Be an oscillator of a certain period” is in a sense a simple objective. But what about more complex objectives? Of course, any pattern of cells in the Game of Life will do something. But the question is whether that something aligns with technological objectives we have.
Generically, things in the Game of Life will behave in computationally irreducible ways. And it’s this very fact that gives such richness to what can be done with the Game of Life. But can the computational irreducibility be controlled—and harnessed for technological purposes? In a sense that is the core challenge of engineering in both the Game of Life, and in the real world. (It’s also rather directly the challenge we face in making use of the computational power of AI, but still adequately aligning it with human objectives.)
As we look at the arc of technological development in the Game of Life we see over the course of half a century all sorts of different advances being made. But will there be an end to this? Will we eventually run out of inventions and discoveries? The underlying presence of computational irreducibility makes it clear that we will not. The only thing that might end is the set of objectives we’re trying to meet. We now know how to make oscillators of any period. And unless we insist on for example finding the smallest oscillator of a given period, we can consider the problem of finding oscillators solved, with nothing more to discover.
In the real world nature and the evolution of the universe inevitably confront us with new issues, which lead to new objectives. In the Game of Life—as in any other abstract area, like mathematics—the issue of defining new objectives is up to us. Computational irreducibility leads to infinite diversity and richness of what’s possible. The issue for us is to figure out what direction we want to go. And the story of engineering and technology in the Game of Life gives us, in effect, a simple model for the issues we confront in other areas of technology, like AI.
I’m not sure if I made the right decision back in 1981. I had come up with a very simple class of systems and was doing computer experiments on them, and was starting to get some interesting results. And when I mentioned what I was doing to a group of (then young) computer scientists they said “Oh, those things you’re studying are called cellular automata”. Well, actually, the cellular automata they were talking about were 2D systems while mine were 1D. And though that might seem like a technical difference, it has a big effect on one’s impression of what’s going on—because in 1D one can readily see “spacetime histories” that gave an immediate sense of the “whole behavior of the system”, while in 2D one basically can’t.
I wondered what to call my models. I toyed with the term “polymones”—as a modernized nod to Leibniz’s monads. But in the end I decided that I should stick with a simpler connection to history, and just call my models, like their 2D analogs, “cellular automata”. In many ways I’m happy with that decision. Though one of its downsides has been a certain amount of conceptual confusion—more than anything centered around the Game of Life.
People often know that the Game of Life is an example of a cellular automaton. And they also know that within the Game of Life lots of structures (like gliders and glider guns) can be set up to do particular things. Meanwhile, they hear about my discoveries about the generation of complexity in cellular automata (like rule 30). And somehow they conflate these things—leading to all too many books etc. that show pictures of simple gliders in the Game of Life and say “Look at all this complexity!”
At some level it’s a confusion between science and engineering. My efforts around cellular automata have centered on empirical science questions like “What does this cellular automaton do if you run it?” But—as I’ve discussed at length above—most of what’s been done with the Game of Life has centered instead on questions of engineering, like “What recognizable (or useful) structures can you build in the system?” It’s a different objective, with different results. And, in particular, by asking to “engineer understandable technology” one’s specifically eschewing the phenomenon of computational irreducibility—and the whole story of the emergence of complexity that’s been so central to my own scientific work on cellular automata and so much else.
Many times over the years, people would show me things they’d been able to build in the Game of Life—and I really wouldn’t know what to make of them. Yes, they seemed like impressive hacks. But what was the big picture? Was this just fun, or was there some broader intellectual point? Well, finally, not long ago I realized: this is not a story of science, it’s a story about the arc of engineering, or what one can call “metaengineering”.
And back in 2018, in connection with the upcoming 50th anniversary of the Game of Life, I decided to see what I could figure out about this. But I wasn’t satisfied with how far I got, and other priorities intervened. So—beyond one small comment that ended up in a 2020 New York Times article—I didn’t write anything about what I’d done. And the project languished. Until now. When somehow my long-time interest in “alien engineering”, combined with my recent results about biological evolution, coalesced into a feeling that it was time to finally figure out what we could learn from all that effort that’s been put into the Game of Life.
In a sense this brings closure to a very long-running story for me. The first time I heard about the Game of Life was in 1973. I was an early teenager then, and I’d just gotten access to a computer. By today’s standards the computer (an Elliott 903C) was a primitive one: the size of a desk, programmed with paper tape, with only 24 kilobytes of memory. I was interested in using it for things like writing a simulator for the physics of idealized gas molecules. But other kids who had access to the computer were instead more interested (much as many kids might be today) in writing games. Someone wrote a “Hunt the Wumpus” game. And someone else wrote a program for the “Game of Life”. The configurations of cells at each generation were printed out on a teleprinter. And for some reason people were particularly taken with the “Cheshire cat” configuration, in which all that was left at the end (as in Alice in Wonderland) was a “smile”. At the time, I absolutely didn’t see the point of any of this. I was interested in science, not games, and the Game of Life pretty much lost me at “Game”.
For a number of years I didn’t have any further contact with the Game of Life. But then I met Bill Gosper, who I later learned had in 1970 discovered the glider gun in the Game of Life. I met Gosper first “online” (yes, even in 1978 that was a thing, at least if you used the MIT-MC computer through the ARPANET)—then in person in 1979. And in 1980 I visited him at Xerox PARC, where he described himself as part of the “entertainment division” and gave me strange math formulas printed on a not-yet-out-of-the-lab color laser printer
and also showed me a bitmapped display (complete with GUI) with lots of pixels dancing around that he enthusiastically explained were showing the Game of Life. Knowing what I know now, I would have been excited by what I saw. But at the time, it didn’t really register.
Still, in 1981, having started my big investigation of 1D cellular automata, and having made the connection to the 2D case of the Game of Life, I started wondering whether there was something “scientifically useful” that I could glean from all the effort I knew (particularly from Gosper) had been put into Life. It didn’t help that almost none of the output of that effort had been published. And in those days before the web, personal contact was pretty much the only way to get unpublished material. One of my larger “finds” was from a friend of mine from Oxford who passed on “lab notebook pages” he’d got from someone who was enumerating outcomes from different Game-of-Life initial configurations:
And from material like this, as well as my own simulations, I came up with some tentative “scientific conclusions”, which I summarized in 1982 in a paragraph in my first big paper about cellular automata:
But then, at the beginning of 1983, as part of my continuing effort to do science on cellular automata, I made a discovery. Among all cellular automata there seemed to be four basic classes of behavior, with class 4 being characterized by the presence of localized structures, sometimes just periodic, and sometimes moving:
I immediately recognized the analogy to the Game of Life, and to oscillators and gliders there. And indeed this analogy was part of what “tipped me off” to thinking about the ubiquitous computational capabilities of cellular automata, and to the phenomenon of computational irreducibility.
Meanwhile, in March 1983, I co-organized what was effectively the first-ever conference on cellular automata (held at Los Alamos)—and one of the people I invited was Gosper. He announced his Hashlife algorithm (which was crucial to future Life research) there, and came bearing gifts: printouts for me of Life, that I annotated, and still have in my archives:
I asked Gosper to do some “more scientific” experiments for me—for example starting from a region of randomness, then seeing what happened:
But Gosper really wasn’t interested in what I saw as being science; he wanted to do engineering, and make constructions—like this one he gave me, showing two glider guns exchanging streams of gliders (why would one care, I wondered):
I’d mostly studied 1D cellular automata—where I’d discovered a lot by systematically looking at their behavior “laid out in spacetime”. But in early 1984 I resolved to also systematically check out 2D cellular automata. And mostly the resounding conclusion was that their basic behavior was very similar to 1D. Out of all the rules we studied, the Game of Life didn’t particularly stand out. But—mostly to provide a familiar comparison point—I included pictures of it in the paper we wrote:
And we also went to the trouble of making a 3D “spacetime” picture of the Game of Life on a Cray supercomputer—though it was too small to show anything terribly interesting:
It had been a column in Scientific American in 1970 that had first propelled the Game of Life to public prominence—and that had also launched the first great Life engineering challenge of finding a glider gun. And in both 1984 and 1985 a successor to that very same column ran stories about my 1D cellular automata. And in 1985, in collaboration with Scientific American, I thought it would be fun and interesting to reprise the 1970 glider gun challenge, but now for 1D class 4 cellular automata:
Many people participated. And my main conclusion was: yes, it seemed like one could do the same kinds of engineering in typical 1D class 4 cellular automata as one could in the Game of Life. But this was all several years before the web, and the kind of online community that has driven so much Game of Life engineering in modern times wasn’t yet able to form.
Meanwhile, by the next year, I was starting the development of Mathematica and what’s now the Wolfram Language, and for a few years didn’t have much time to think about cellular automata. But in 1987 when Gosper got involved in making pre-release demos of Mathematica he once again excitedly told me about his discoveries in the Game of Life, and gave me pictures like:
It was in 1992 that the Game of Life once again appeared in my life. I had recently embarked on what would become the 10-year project of writing my book A New Kind of Science. I was working on one of the rather few “I already have this figured out” sections in the book—and I wanted to compare class 4 behavior in 1D and 2D. How was I to display the Game of Life, especially in a static book? Equipped with what’s now the Wolfram Language it was easy to come up with visualizations—looking “out” into a spacetime slice with more distant cells “in a fog”, as well as “down” into a fog of successive states:
And, yes, it was immediately striking how similar the spacetime slice looked to my pictures of 1D class 4 cellular automata. And when I wrote a note for the end of the book about Life, the correspondence became even more obvious. I’d always seen the glider gun as a movie. But in a spacetime slice it “made much more sense”, and looked incredibly similar to analogous structures in 1D class 4 cellular automata:
In A New Kind of Science I put a lot of effort into historical notes. And as a part of such a note on “History of cellular automata” I had a paragraph about the Game of Life:
I first met John Conway in September 1983 (at a conference in the south of France). As I would tell his biographer many years later, my relationship with Conway was complicated from the start. We were both drawn to systems defined by very simple rules, but what we found interesting about them was very different. I wanted to understand the big picture and to explore science-oriented questions (and what I would now call ruliology). Conway, on the other hand, was interested in specific, often whimsically presented results—and in questions that could be couched as mathematical theorems.
In my conversations with Conway, the Game of Life would sometimes come up, but Conway never seemed too interested in talking about it. In 2001, though, when I was writing my note about the history of 2D cellular automata, I spent several hours specifically asking Conway about the Game of Life and its history. At first Conway told me the standard origin story that Life had arisen as a kind of game. A bit later he said he’d at the time just been hired as a logic professor, and had wanted to use Life as a simple way to enumerate the recursive functions. In the end, it was hard to disentangle true recollections from false (or “elaborated”) ones. And, notably, when asked directly about the origin of the specific rules of Life, he was evasive. Of course, none of that should detract from Conway’s achievement in the concept of the Game of Life, and in the definition of the hacker-like culture around it—the fruits of which have now allowed me to do what I’ve done here.
For many years after the publication of A New Kind of Science in 2002, I didn’t actively engage with the Game of Life—though I would hear from Life enthusiasts with some frequency, but none as much as Gosper, from whom I received hundreds of messages about Life, a typical example from 2017 concerning
and saying:
Novelty is mediated by the sporadic glider gas (which forms very sparse
beams), sporadic debris (forming sparse lines), and is hidden in sporadic
defects in the denser beams and lines. At this scale, each screen pixel
represents 262144 x 262144 Life cells. Thus very sparse lines, e.g. density
10^-5, appear solid, while being very nearly transparent to gliders.
After 3.4G, (sparse) new glider beams are still fading up. The beams
repeatedly strafe the x and y axis stalagmites.
I suspect this will (very) eventually lead to a positive density of
switch-engines, and thus quadratic population growth.
⋮
Finally, around 4.2G, an eater1 (fish hook):
Depending on background novelty radiation, there ought to be one of
these every few billion, all lying on a line through the origin.
⋮
With much help from Tom R, I slogged to 18G, with *zero* new nonmovers
in the 4th quadrant, causing me to propose a mechanism that precluded
future new ones. But then Andrew Trevorrow fired up his Big Mac (TM),
ran 60G, and found three new nonmovers! They are, respectively, a mirror
image(!) of the 1st eater, and two blinkers, in phase, but not aligned with
the origin. I.e., all four are "oners'", or at least will lie on different
trash trails.
I’m still waiting for one of these to sprout switch-engines and begin quadratic
growth. But here’s a puzzle: Doesn’t the gas of sparse gliders (actually glider
packets) in the diagonal strips athwart the 1st quadrant already reveal (small
coefficient) quadratic growth? Which will *eventually* dominate? The area of the
strips is increasing quadratically. Their density *appears* to be at least holding,
but possibly along only one axis. I don’t see where quadratically many gliders could
arise. They’re being manufactured at a (roughly) fixed rate. Imagine the above
picture in the distant future. Where is the amplification that will keep those
strips full? ‐‐Bill
Does it just happen to come out that way, or was it somehow made to be that way? It was a big shock to my intuition at the beginning of the 1980s when I began to see all the richness that even very simple programs can produce. And it made me start to wonder about all our technological and other achievements. With our goals and intentions, were we producing things that were somehow different from what even simple programs “could do anyway”? How would we be able to tell whether that interstellar radio signal was the product of some sophisticated civilization, or just something that “happened naturally”? My Principle of Computational Equivalence implied that at an ultimate level there wouldn’t really be a way to tell. But I kept on wondering whether there might at least be some signature of “purposes like ours” that we could detect.
At first it was extraterrestrial intelligence and animal intelligence, later also artificial intelligence. But the question kept on coming back: what distinguishes what’s engineered from what just “scientifically happens”? (And, yes, there was a theological question in there too.) I had wondered for a while about using the Game of Life as a testing ground for this, and as the 50th anniversary of the Game of Life approached in 2018, I took this as a cue to explore it.
Over the years I had accumulated a paper file perhaps 6 inches thick about the Game of Life (a few samples from which I’ve shown above). But looking around the web I was impressed at how much well-organized material there now was out there about the Game of Life. I started to try to analyze it, imagining that I might see something like an analog of Moore’s law. Meanwhile, over the preceding decade I had written a lot about the history of science, and I thought that as part of my contribution to the 50th anniversary of the Game of Life I should try to write about its history. What were the stories of all those people whose names were attached to discoveries in Life? A research assistant of mine began to track them down, and interview them. It turned out to be a very disparate group, many of whom knew little about each other. (Though they often, but not always, had in common graduate-level education in math.) And in any case it became clear that writing a coherent history was going to be a huge undertaking. In addition, the first few ways I tried to discern trends in data about the Game of Life didn’t yield much. And soon the 50th anniversary had passed—and I got busy with other things.
But the project of studying the “metaengineering” of the Game of Life stayed on my “to do” list (and a couple of students at our Wolfram Summer School worked on it). Then in 2022 a nice book on the Game of Life came out (by Nathaniel Johnston and Dave Greene, the latter of whom had actually been at our Summer School back in 2011). Had my project been reduced to just reading this book, I wondered. I soon realized that it hadn’t. And there were now all kinds of questions on which I imagined a study of the Game of Life could shed light. Not only questions about the “signature of purpose”. But also questions about novelty and creativity. And about the arc and rhythm of innovation.
Then in 2024 came the surprises of my work on biological evolution, and on machine learning. And I found myself again wondering about how things work when there’s “intentional engineering”. And so I finally decided to do my long-planned study of the Game of Life. There’s much, much more that can be done. But I think what I’ve done here provides an indication of some of the directions one can go, and some of what there is to discover in what is effectively the new field of “computational metaengineering”.
Thanks to Willem Nielsen of the Wolfram Institute for extensive help, as well as to Ed Pegg of Wolfram Research. (Thanks also to Brad Klee for earlier work.) Over the years, I’ve interacted with many people about the Game of Life. In rough order of my first (“Life”) interactions with them, these include: Jeremy Barford (1973), Philip Gladstone (1973), Nicholas Goulder (1973), Norman Routledge (1973), Bill Gosper (1979), Tim Robinson (1981), Paul Leyland (1981), Norman Margolus (1982), John Conway (1983), Brian Silverman (1985), Eric Weisstein (1999), Ed Pegg (2000), Jon C. R. Bennett (2006), Robert Wainwright (2010), Dave Greene (2011), Steve Bourne (2018), Tanha Kate (2018), Simon Norton (2018), Adam Goucher (2019), Keith Patarroyo (2021), Steph Macurdy (2021), Mark McAndrew (2022), Richard Assar (2024) and Nigel Martin (2025). And, of course, thanks to the many people who’ve contributed over the past half century to the historical progression of Life engineering that I’ve been analyzing here.
Note added April 24, 2025: Thanks to Dave Greene who pointed out an incorrect historical inference, which has now been updated.
2025-02-04 07:27:46
As it’s practiced today, medicine is almost always about particulars: “this has gone wrong; this is how to fix it”. But might it also be possible to talk about medicine in a more general, more abstract way—and perhaps to create a framework in which one can study its essential features without engaging with all of its details?
My goal here is to take the first steps towards such a framework. And in a sense my central result is that there are many broad phenomena in medicine that seem at their core to be fundamentally computational—and to be captured by remarkably simple computational models that are readily amenable to study by computer experiment.
I should make it clear at the outset that I’m not trying to set up a specific model for any particular aspect or component of biological systems. Rather, my goal is to “zoom out” and create what one can think of as a “metamodel” for studying and formalizing the abstract foundations of medicine.
What I’ll be doing builds on my recent work on using the computational paradigm to study the foundations of biological evolution. And indeed in constructing idealized organisms we’ll be using the very same class of basic computational models. But now, instead of considering idealized genetic mutations and asking what types of idealized organisms they produce, we’re going to be looking at specific evolved idealized organisms, and seeing what effect perturbations have on them. Roughly, the idea is that an idealized organism operates in its normal “healthy” way if there are no perturbations—but perturbations can “derail” its operation and introduce what we can think of as “disease”. And with this setup we can then think of the “fundamental problem of medicine” as being the identification of additional perturbations that can “treat the disease” and put the organism at least approximately back on its normal “healthy” track.
As we’ll see, most perturbations lead to lots of detailed changes in our idealized organism, much as perturbations in biological organisms normally lead to vast numbers of effects, say at a molecular level. But as in medicine, we can imagine that all we can observe (and perhaps all we care about) are certain coarse-grained features or “symptoms”. And the fundamental problem of medicine is then to work out from these symptoms what “treatment” (if any) will end up being useful. (By the way, when I say “symptoms” I mean the whole cluster of signs, symptoms, tests, etc. that one might in practice use, say for diagnosis.)
It’s worth emphasizing again that I’m not trying here to derive specific, actionable, medical conclusions. Rather, my goal is to build a conceptual framework in which, for example, it becomes conceivable for general phenomena in medicine that in the past have seemed at best vague and anecdotal to begin to be formalized and studied in a systematic way. At some level, what I’m trying to do is a bit like what Darwinism did for biological evolution. But in modern times there’s a critical new element: the computational paradigm, which not only introduces all sorts of new, powerful theoretical concepts, but also leads us to the practical methodology of computer experimentation. And indeed much of what follows is based on the (often surprising) results of computer experiments I’ve recently done that give us raw material to build our intuition—and structure our thinking—about fundamental phenomena in medicine.
How can we make a metamodel of medicine? We need an idealization of biological organisms and their behavior and development. We need an idealization of the concept of disease for such organisms. And we need an idealization of the concept of treatment.
For our idealization of biological organisms we’ll use a class of simple computational systems called cellular automata (that I happen to have studied since the early 1980s). Here’s a specific example:
What’s going on here is that we’re progressively constructing the pattern on the left (representing the development and behavior of our organism) by repeatedly applying cases of the rules on the right (representing the idealized genome—and other biochemical, etc. rules—of our organism). Roughly we can think of the pattern on the left as corresponding to the “life history” of our organism—growing, developing and eventually dying as it goes down the page. And even though there’s a rather organic look to the pattern, remember that the system we’ve set up isn’t intended to provide a model for any particular real-world biological system. Rather, the goal is just for it to capture enough of the foundations of biology that it can serve as a successful metamodel to let us explore our questions about the foundations of medicine.
Looking at our model in more detail, we see that it involves a grid of squares—or “cells” (computational, not biological)—each having one of 4 possible colors (white and three others). We start from a single red “seed” cell on the top row of the grid, then compute the colors of cells on subsequent steps (i.e. on subsequent rows down the page) by successively applying the rules on the right. The rules here are basically very simple. But we can see that when we run them they lead to a fairly complicated pattern—which in this case happens to “die out” (i.e. all cells become white) after exactly 101 steps.
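Here’s a minimal sketch of this general class of model, though emphatically not of the particular rule shown above: a 1D, 4-color, nearest-neighbor cellular automaton (with cyclic boundary conditions assumed) grown from a single seed cell, with “lifetime” defined as the first step at which every cell is white. The rule table below is just a randomly chosen stand-in; the rules used in the text were specifically found, by adaptive evolution, to give long but finite lifetimes.

```python
from itertools import product
import random

K, WIDTH, CAP = 4, 81, 300                       # colors, lattice width, step cap (all assumptions)
rng = random.Random(0)
rule = {triple: rng.randrange(K) for triple in product(range(K), repeat=3)}
rule[(0, 0, 0)] = 0                              # the white background stays white

def run(rule, width=WIDTH, cap=CAP):
    state = [0] * width
    state[width // 2] = 1                        # the single nonwhite "seed" cell
    history = [state[:]]
    for _ in range(cap):
        if not any(state):                       # all white: the "organism" has died
            break
        state = [rule[(state[i - 1], state[i], state[(i + 1) % width])]   # cyclic boundary
                 for i in range(width)]
        history.append(state[:])
    return history

def lifetime(history):
    # first step at which every cell is white (None if that never happens within the cap)
    return next((t for t, row in enumerate(history) if not any(row)), None)

print(lifetime(run(rule)))
```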
So what happens if we perturb this system? On the left here we’re showing the system as above, without perturbation. But on the right we’re introducing a perturbation by changing the color of a particular cell (on step 16)—leading to a rather different (if qualitatively similar) pattern:
Here are the results of some other perturbations to our system:
Some perturbations (like the one in the second panel here) quickly disappear; in essence the system quickly “heals itself”. But in most cases even single-cell perturbations like the ones here have a long-term effect. Sometimes they can “increase the lifetime” of the organism; often they will decrease it. And sometimes—like in the last case shown here—they will lead to essentially unbounded “tumor-like” growth.
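Here’s how perturbations of this kind can be represented in the sketch model above (repeated here so that it runs on its own): at one chosen step we force one chosen cell to a chosen color, and then compare the perturbed lifetime with the unperturbed one. The particular rule, step, position and color are all illustrative assumptions.

```python
from itertools import product
import random

K, WIDTH, CAP = 4, 81, 300
rng = random.Random(0)
rule = {triple: rng.randrange(K) for triple in product(range(K), repeat=3)}
rule[(0, 0, 0)] = 0

def run(rule, perturbations=(), width=WIDTH, cap=CAP):
    state = [0] * width
    state[width // 2] = 1
    history = [state[:]]
    for t in range(cap):
        for (pstep, ppos, pcolor) in perturbations:   # (step, position, new color)
            if pstep == t:
                state[ppos] = pcolor
        history[-1] = state[:]                        # record the (possibly perturbed) state
        if not any(state):
            break
        state = [rule[(state[i - 1], state[i], state[(i + 1) % width])]
                 for i in range(width)]
        history.append(state[:])
    return history

def lifetime(history):
    return next((t for t, row in enumerate(history) if not any(row)), None)

healthy = lifetime(run(rule))
perturbed = lifetime(run(rule, perturbations=[(16, WIDTH // 2 + 3, 2)]))   # change one cell at step 16
print(healthy, perturbed)
```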
In biological or medical terms, the perturbations we’re introducing are minimal idealizations of “things that can happen to an organism” in the course of its life. Sometimes the perturbations will have little or no effect on the organism. Or at least they won’t “really hurt it”—and the organism will “live out its natural life” (or even extend it a bit). But in other cases, a perturbation can somehow “destabilize” the organism, in effect “making it develop a disease”, and often making it “die before its time”.
But now we can formulate what we can think of as the “fundamental problem of medicine”: given that perturbations have had a deleterious effect on an organism, can we find subsequent perturbations to apply that will serve as a “treatment” to overcome the deleterious effect?
The first panel here shows a particular perturbation that makes our idealized organism die after 47 steps. The subsequent panels then show various “treatments” (i.e. additional perturbations) that serve at least to “keep the organism alive”:
In the later panels here the “life history” of the organism gets closer to the “healthy” unperturbed form shown in the final panel. And if our criterion is restoring overall lifetime, we can reasonably say that the “treatment has been successful”. But it’s notable that the detailed “life history” (and perhaps “quality of life”) of the organism will essentially never be the same as before: as we’ll see in more detail later, it’s almost inevitably the case that there’ll be at least some (and often many) long-term effects of the perturbation+treatment even if they’re not considered deleterious.
So now that we’ve got an idealized model of the “problem of medicine”, what can we say about solving it? Well, the main thing is that we can get a sense of why it’s fundamentally hard. And beyond anything else, the central issue is a fundamentally computational one: the phenomenon of computational irreducibility.
Given any particular cellular automaton rule, with any particular initial condition, one can always explicitly run the rule, step by step, from that initial condition, to see what will happen. But can one do better? Experience with mathematical science might make one imagine that as soon as one knows the underlying rule for a system, one should in principle immediately be able to “solve the equations” and jump ahead to work out everything about what the system does, without explicitly tracing through all the steps. But one of the central things I discovered in studying simple programs back in the early 1980s is that it’s common for such systems to show what I called computational irreducibility, which means that the only way to work out their detailed behavior is essentially just to run their rules step by step and see what happens.
So what about biology? One might imagine that with its incremental optimization, biological evolution would produce systems that somehow avoid computational irreducibility, and (like simple machinery) have obvious easy-to-understand mechanisms by which they operate. But in fact that’s not what biological evolution typically seems to produce. And instead—as I’ve recently argued—what it seems to do is basically just to put together randomly found “lumps of irreducible computation” that happen to satisfy its fitness criterion. And the result is that biological systems are full of computational irreducibility, and mostly aren’t straightforwardly “mechanically explainable”. (The presence of computational irreducibility is presumably also why theoretical biology based on mathematical models has always been so challenging.)
But, OK, given all this computational irreducibility, how is it that medicine is even possible? How is it that we can know enough about what a biological system will do to be able to determine what treatment to use on it? Well, computational irreducibility makes it hard. But it’s a fundamental feature of computational irreducibility that within any computationally irreducible process there must always be pockets of computational reducibility. And if we’re trying to achieve only some fairly coarse objective (like maximizing overall lifetime) it’s potentially possible to leverage some pocket of computational reducibility to do this.
(And indeed pockets of computational reducibility within computational irreducibility are what make many things possible—including having understandable laws of physics, doing higher mathematics, etc.)
With our simple idealization of disease as the effect of perturbations on the life history of our idealized organism, we can start asking questions like “What is the distribution of all possible diseases?”
And to begin exploring this, here are the patterns generated with a random sample of the 4383 possible single-point perturbations to the idealized organism we’ve discussed above:
Clearly there’s a lot of variation in these life histories—in effect a lot of different symptomologies. If we average them all together we lose the detail and we just get something close to the original:
But if we look at the distribution of lifetimes, we see that while it’s peaked at the original value, it nevertheless extends to both shorter and longer values:
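Here’s a sketch of how such a lifetime distribution can be estimated—by random sampling rather than by exhaustively enumerating the 4383 single-point perturbations, and again with the placeholder rule rather than the specific model organism:

(* Estimate the lifetime distribution under random single-cell perturbations;
   the step, position and color ranges are arbitrary illustrative choices *)
SeedRandom[1];
sampleLifetimes = Table[
   lifetime[perturb[ruleSpec, width, 300,
     RandomInteger[{1, 80}],        (* step at which to perturb *)
     RandomInteger[{1, width}],     (* cell position *)
     RandomInteger[{0, 3}]]],       (* new color *)
   {1000}];
Histogram[sampleLifetimes, Automatic, "Probability"]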
In medicine (or at least Western medicine) it’s been traditional to classify “things that can go wrong” in terms of discrete diseases. And we can imagine also doing this in our simple model. But it’s already clear from the array of pictures above that this is not going to be a straightforward task. We’ve got a different detailed pattern for every different perturbation. So how should we group them together?
Well—much as in medicine—it depends on what we care about. In medicine we might talk about signs and symptoms, which in our idealized model we can basically identify with features of patterns. And as an example, we might decide that the only features that matter are ones associated with the boundary shape of our pattern:
So what happens to these boundary shapes with different perturbations? Here are the most frequent shapes found (together with their probabilities):
We might think of these as representing “common diseases” of our idealized organism. But what if we look at all possible “diseases”—at least all the ones produced by single-cell perturbations? Using boundary shape as our way to distinguish “diseases” we find that if we plot the frequency of diseases against their rank we get roughly a power law distribution (and, yes, it’s not clear why it’s a power law):
What are the “rare diseases” (i.e. ones with low frequency) like? Their boundary shapes can be quite diverse:
But, OK, can we somehow quantify all these “diseases”? For example, as a kind of “imitation medical test” we might look at how far to the left the boundary of each pattern goes. With single-point perturbations, 84% of the time it’s the same as in the unperturbed case—but there’s a distribution of other, “less healthy” results (here plotted on a log scale)
with extreme examples being:
And, yes, we could diagnose any pattern that goes further to the left than the unperturbed one as a case of, say, “leftiness syndrome”. And we might imagine that if we set up enough tests, we could begin to discriminate between many discrete “diseases”. But somehow this seems quite ad hoc.
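As a sketch, this kind of “imitation medical test” is just a one-liner over a life history (a matrix of cell values, with 0 for white):

(* Leftmost column ever occupied by a nonwhite cell in a life history *)
leftExtent[history_] := Min[Position[history, _?Positive, {2}][[All, 2]]]

(* "Leftiness syndrome": the pattern goes further left than the unperturbed one ever does *)
leftinessQ[history_, unperturbed_] := leftExtent[history] < leftExtent[unperturbed]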
So can we perhaps be more systematic by using machine learning? Let’s say we just look at each whole pattern, then try to place it in an image feature space, say a 2D one. Here’s an example of what we get:
The details of this depend on the particulars of the machine learning method we’ve used (here the default FeatureSpacePlot method in Wolfram Language). But it’s a fairly robust result that “visually different” patterns end up separated—so that in effect the machine learning is successfully automating some kind of “visual diagnosis”. And there’s at least a little evidence that the machine learning will identify separated clusters of patterns that we can reasonably identify as “truly distinct diseases”—even as the more common situation is that between any two patterns, there are intermediate ones that aren’t neatly classified as one disease or the other.
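Here’s a sketch of this kind of feature-space placement, assuming patterns is a list of perturbed life histories (matrices) like the ones above:

(* Render each life history as an image, then place the images in a 2D feature space
   using the default FeatureSpacePlot method *)
images = ArrayPlot[#, Frame -> False, ImageSize -> 50] & /@ patterns;
FeatureSpacePlot[images]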
Somewhat in the style of the human “International Classification of Diseases” (ICD), we can try arranging all our patterns in a hierarchy—though it’s basically inevitable that we’ll always be able to subdivide further, and there’ll never be a clear point at which we can say “we’ve classified all the diseases”:
By the way, in addition to talking about possible diseases, we also need to discuss what counts as “healthy”. We could say that our organism is only “healthy” if its pattern is exactly what it would be without any perturbation (“the natural state”). But what probably better captures everyday medical thinking is to say that our organism should be considered “healthy” if it doesn’t have symptoms (or features) that we consider bad. And in particular, at least “after the fact” we might be able to say that it must have been healthy if its lifetime turned out to be long.
It’s worth noting that even in our simple model, while there are many perturbations that reduce lifetime, there are also perturbations that increase lifetime. In the course of biological evolution, genetic mutations of the overall underlying rules for our idealized organism might have managed to achieve a certain longevity. But the point is that nothing says “longevity perturbations” applied “during the life of the organism” can’t get further—and indeed here are some examples where they do:
And, actually, in a feature that’s not (at least yet) reflected in human medicine, there are perturbations that can make the lifetime very significantly longer. And for the particular idealized organism we’re studying here, the most extreme examples obtained with single-point perturbations are:
OK, but what happens if we consider perturbations at multiple points? There are immediately vastly more possibilities. Here are some examples of the 10 million or so possible configurations of two perturbations:
And here are examples with three perturbations:
Here are examples if we try to apply five perturbations (though sometimes the organism is “already dead” before we can apply later perturbations):
What happens to the overall distribution of lifetimes in these cases? Already with two perturbations, the distribution gets much broader, and with three or more, the peak at the original lifetime has all but disappeared, with a new peak appearing for organisms that in effect die almost immediately:
In other words, the particular idealized organism that we’re studying is fairly robust against one perturbation, and perhaps even two, but with more perturbations it’s increasingly likely to succumb to “infant mortality”. (And, yes, if one increases the number of perturbations the “life expectancy” progressively decreases.)
But what about the other way around? With multiple perturbations, can the organism in effect “live forever”? Here are some examples where it’s still “going strong” after 300 steps:
But after 500 steps most of these have died out:
As is typical in the computational universe (perhaps like in medicine) there are always surprises, courtesy of computational irreducibility. Like the sudden appearance of the obviously periodic case (with period 25):
As well as the much more complicated cases (where in the final pictures the pattern has been “rectified”):
So, yes, in these cases the organism does in effect “live forever”—though not in an “interesting” way. And indeed such cases might remind us of tumor-like behavior in biological organisms. But what about a case that not only lives forever, but also grows forever? Well, needless to say, lurking out in the computational universe, one can find an example:
The “incidence” of this behavior is about one in a million for 2 perturbations (or, more precisely, 7 out of 9.6 million possibilities), and one in 300,000 for 3 perturbations. And although there presumably are even more complicated behaviors out there to find, they don’t show up with 2 perturbations, and their incidence with 3 perturbations is below about one in 100 million.
A fundamental objective in medicine is to predict from tests we do or symptoms and signs we observe what will happen. And, yes, we now know that computational irreducibility inevitably makes this in general hard. But we also know from experience that a certain amount of prediction is possible—which we can now interpret as successfully managing to tap into pockets of computational reducibility.
So as an example, let’s ask what the prognosis is for our idealized organism based on the width of its pattern we measure at a certain step. So here, for example, is what happens to the original lifetime distribution (in green) if we consider only cases where the width of the measured pattern after 25 steps is less than its unperturbed (“healthy”) value (and where we’re dropping the 1% of cases when the organism was “already dead” before 25 steps):
Our “narrow” cases represent about 5% of the total. Their median lifetime is 57, as compared with the overall median of 106. But clearly the median alone does not tell the whole story. And nor do the two survival curves:
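As a sketch, the two survival curves can be computed from the lists of lifetimes—here assuming allLifetimes covers all perturbed cases and narrowLifetimes just the “narrow at step 25” subgroup:

(* Empirical survival curves: fraction of organisms still alive after t steps *)
dAll = EmpiricalDistribution[allLifetimes];
dNarrow = EmpiricalDistribution[narrowLifetimes];
Plot[{SurvivalFunction[dAll, t], SurvivalFunction[dNarrow, t]}, {t, 0, 200},
  PlotLegends -> {"all perturbed cases", "narrow at step 25"},
  AxesLabel -> {"step", "fraction surviving"}]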
And, for example, here are the actual widths as a function of time for all the narrow cases, compared to the sequence of widths for the unperturbed case:
These pictures don’t make it look promising that one could predict lifetime from the single test of whether the pattern was narrow at step 25. Like in analogous medical situations, one needs more data. One approach in our case is to look at actual “narrow” patterns (up to step 25)—here sorted by ultimate lifetime—and then to try to identify useful predictive features (though, for example, to attempt any serious machine learning training would require a lot more examples):
But perhaps a simpler approach is not just to do a discrete “narrow or not” test, but rather to look at the actual width at step 25. So here are the lifetimes as a function of width at step 25
and here’s the distribution of outcomes, together with the median in each case:
The predictive power of our width measurement is obviously quite weak (though there’s doubtless a way to “hack p values” to get at least something out). And, unsurprisingly, machine learning doesn’t help. Here, for example, is a machine learning prediction (based on decision tree methods) for lifetime as a function of width (which, yes, is very close to just being the median):
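A sketch of this kind of prediction, assuming widths25 and lifetimes are aligned lists giving the width at step 25 and the eventual lifetime for each perturbed case:

(* Decision-tree-based regression of lifetime on the width at step 25 *)
predictor = Predict[Thread[widths25 -> lifetimes], Method -> "DecisionTree"];
ListLinePlot[Table[{w, predictor[w]}, {w, Min[widths25], Max[widths25]}],
  AxesLabel -> {"width at step 25", "predicted lifetime"}]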
Does it help if we use more history? In other words, what happens if we make our prediction not just from the width at a particular step, but from the history of all widths up to that point? As one approach, we can make a collection of “training examples” of what lifetimes particular “width histories” (say up to step 25) lead to:
There’s already something of an issue here, because a given width history—which is, in a sense, a “coarse graining” of the detailed “microscopic” history—can lead to multiple different final lifetimes:
But we can still go ahead and try to use machine learning to predict lifetimes from width histories based on training on (say, half) of our training data—yielding less than impressive results (with the vertical line being associated with multiple lifetimes from a single width history in the training data):
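Here’s a sketch of that train-and-test setup, assuming widthHistories is a list of width sequences up to step 25 and lifetimes the corresponding final lifetimes:

(* Train on half of the (width history -> lifetime) examples; test on the other half *)
examples = Thread[widthHistories -> lifetimes];
{train, test} = TakeDrop[RandomSample[examples], Floor[Length[examples]/2]];
p = Predict[train];
ListPlot[{Last[#], p[First[#]]} & /@ test,
  AxesLabel -> {"actual lifetime", "predicted lifetime"}]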
So how can we do better? Well, given the underlying setup for our system, if we could determine not just the width but the whole precise sequence of values for all cells, even just at step 25, then in principle we could use this as an “initial condition” and run the system forward to see what it does. But regardless of it being “medically implausible” to do this, it isn’t much of a prediction anyway; it’s more just “watch and see what happens”. And the point is that insofar as there’s computational irreducibility, one can’t expect—at least in full generality—to do much better. (And, as we’ll argue later, there’s no reason to think that organisms produced by biological evolution will avoid computational irreducibility at this level.)
But still, within any computationally irreducible system, there are always pockets of computational reducibility. So we can expect that there will be some predictions that can be made. But the question is whether those predictions will be about things we care about (like lifetime) or even about things we can measure. Or, in other words, will they be predictions that speak to things like symptoms?
Our Physics Project, for example, involves all sorts of underlying processes that are computationally irreducible. But the key point there is that what physical observers like us perceive are aggregate constructs (like overall features of space) that show significant computational reducibility. And in a sense there’s an analogous issue here: there’s computational irreducibility underneath, but what do “medical observers” actually perceive, and are there computationally reducible features related to that? If we could find such things, then in a sense we’d have identified “general laws of medicine” much like we now have “general laws of physics”.
We’ve talked a bit about giving a prognosis for what will happen to an idealized organism that’s suffered a perturbation. But what about trying to fix it? What about trying to intervene with another “treatment perturbation” that can “heal” the system, and give it a life history that’s at least close to what it would have had without the original perturbation?
Here’s our original idealized organism, together with how it behaves when it “suffers” a particular perturbation that significantly reduces its lifetime:
But what happens if we now try applying a second perturbation? Here are a few random examples:
None of these examples convincingly “heal” the system. But let’s (as we can in our idealized model) just enumerate all possible second perturbations (here 1554 of them). Then it turns out that a few of these do in fact successfully give us patterns that at least exactly reproduce the original lifetime:
Do these represent true examples of “healing”? Well, it depends on what we mean. Yes, they’ve managed to make the lifetime exactly what it would have been without the original “disease-inducing” perturbation. But in essentially all cases we see here that there are various “long-term side effects”—in the sense that the detailed patterns generated end up having obvious differences from the original unperturbed “healthy” form.
The one exception here is the very first case, in which the “disease was caught early enough” that the “treatment perturbation” manages to completely heal the effects of the “disease perturbation”:
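As a sketch of the kind of exhaustive search used here to find “treatment perturbations” (reusing the placeholder rule and helpers from the earlier sketches, and assuming diseased is the life history after the disease perturbation, diseaseStep the step at which it occurred, and targetLifetime the unperturbed lifetime):

(* Continue a given life history after flipping the cell at position x on step s *)
continueFrom[ruleSpec_, history_, s_, x_, c_, tTotal_] := Module[{flipped},
  flipped = ReplacePart[history[[s + 1]], x -> c];
  Join[Take[history, s], {flipped}, Rest[CellularAutomaton[ruleSpec, flipped, tTotal - s]]]]

(* Enumerate candidate treatment perturbations applied while the diseased organism is still
   alive, keeping those that restore the original lifetime *)
treatments = Select[
   Tuples[{Range[diseaseStep + 1, lifetime[diseased] - 1], Range[width], Range[0, 3]}],
   lifetime[continueFrom[ruleSpec, diseased, Sequence @@ #, 300]] == targetLifetime &];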
We’ve been talking here about intervening with “treatment perturbations” to “heal” our “disease perturbation”. But actually it turns out that there are plenty of “disease perturbations” that automatically “heal themselves”, without any “treatment” intervention. In fact, of all 4383 possible single perturbations, 380 essentially heal themselves.
In many cases, the “healing” happens very locally, after one or two steps:
But there are also more complicated cases, where perturbations produce fairly large-scale changes in the pattern—that nevertheless “spontaneously heal themselves”:
(Needless to say, in cases where a perturbation “spontaneously heals itself”, adding a “treatment perturbation” will almost always lead to a worse outcome.)
So how should we think about perturbations that spontaneously heal themselves? They’re like seeds for diseases that never take hold, or like diseases that quickly burn themselves out. But from a theoretical point of view we can think of them as cases where the unperturbed life history of our idealized organism is acting as an attractor, to which certain perturbed states inexorably converge—a bit like how friction can dissipate perturbations to patterns of motion in a mechanical system.
But let’s say we have a perturbation that doesn’t “spontaneously heal itself”. Then to remediate it we have to “do the medical thing” and, in our idealized model, try to find a “treatment perturbation”. So how might we systematically set about doing that? Well, in general, computational irreducibility makes it difficult. And as one indication of this, here’s what lifetime is achieved by “treatment perturbations” made at each possible point in the pattern (after the initial perturbation):
We can think of this as providing a map of what the effects of different treatment perturbations will be. Here are some other examples, for different initial perturbations (or, in effect, different “diseases”):
There’s some regularity here. But the main observation is that different detailed choices of treatment perturbations will often have very different effects. In other words, even “nearby treatments” will often lead to very different outcomes. Given computational irreducibility, this isn’t surprising. But in a sense it underscores the difficulty of finding and applying “treatments”. By the way, cells indicated in dark red above are ones where treatment leads to a pattern that lives “excessively long”—or in effect shows tumor-like characteristics. And the fact that these are scattered so seemingly randomly reflects the difficulty of predicting whether such effects will occur as a result of treatment.
In what we’ve done so far here, our “treatment” has always consisted of just a single additional perturbation. But what about applying more perturbations? For example, let’s say we do a series of experiments where after our first “treatment perturbation” we progressively try other treatment perturbations. If a given additional perturbation doesn’t get further from the desired lifetime, we keep it. Otherwise we reject it, and try another perturbation. Here’s an example of what happens if we do this:
The highlighted panels represent perturbations we kept. And here’s how the overall lifetime “converges” over successive iterations in our experiment:
In what we just did, we allowed additional treatment perturbations to be added at any subsequent step. But what if we require treatment perturbations to always be added on successive steps—starting right after the “disease perturbation” occurred? Here’s an example of what happens in this case:
And here’s what we see zooming in at the beginning:
In a sense this corresponds to “doing aggressive treatment” as soon as the initial “disease perturbation” has occurred. And a notable feature of the particular example here is that once our succession of treatment perturbations has succeeded in “restoring the lifetime” (which happens fairly quickly), the life history produced is similar (though not identical) to the original unperturbed case.
That definitely doesn’t always happen, as this example illustrates—but it’s fairly common:
It’s worth pointing out that if we allowed ourselves to do many single perturbations at the same time (i.e. on the same row of the pattern) we could effectively just “define new initial conditions” for the pattern, and, for example, perfectly “regenerate” the original unperturbed pattern after this “reset”. And in general we can imagine in effect “hot-wiring” the organism by applying large numbers of treatment perturbations that just repeatedly direct it back to its unperturbed form.
But such extensive and detailed “intervention”—that in effect replaces the whole state of the organism—seems far from what might be practical in typical (current) medicine (except perhaps in some kind of “regenerative treatment”). And indeed in actual (current) medicine one is normally operating in a situation where one does not have anything close to perfect “cell-by-cell” information on the state of an organism—and instead one has to figure out things like what treatment to give based on much coarser “symptom-level” information. (In some ways, though, the immune system does something closer to cell-by-cell “treatment”.)
So what can one do given coarse-grained information? As one example, let’s consider trying to predict what treatment perturbation will be best using the kind of pattern-width information we discussed above. Specifically, let’s say that we have the history of the overall width of a pattern up to a particular point, then from this we want to predict what treatment perturbation will lead to the best lifetime outcome for the system. There are a variety of ways we could approach this, but one is to make predictions of where to apply a treatment perturbation using machine learning trained on examples of optimal such perturbations.
This is analogous to what we did in the previous section in applying machine learning to predict lifetime from width history. But now we want to predict from width history what treatment perturbation to apply. To generate our training data we can search for treatment perturbations that lead to the unperturbed lifetime when starting from life histories with a given width history. Now we can use a simple neural net to create a predictor that tries to tell us from a width history what “treatment to give”. And here are comparisons between our earlier search results based on looking at complete life histories—and (shown with red arrows) the machine learning predictions based purely on width history before the original disease perturbation:
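As a minimal stand-in for this kind of predictor (simplifying to predicting only the position at which to apply the treatment perturbation, with the treatment step taken as fixed, and assuming trainingPairs is a list of widthHistory -> treatmentPosition examples found by the kind of search above):

(* Learn a map from width history to a suggested treatment position;
   newWidthHistory is an assumed width history for a new case *)
treatmentPredictor = Predict[trainingPairs, Method -> "NeuralNetwork"];
suggestedPosition = Round[treatmentPredictor[newWidthHistory]]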
It’s clear that the machine learning is doing something—though it’s not as impressive as perhaps it looks, because a wide range of perturbations all in fact give rather similar life histories. So as a slightly more quantitative indication of what’s going on, here’s the distribution of lifetimes achieved by our machine-learning-based therapy:
Our “best treatment” was able to give lifetime 101 in all these cases. And while the distribution we’ve now achieved looks peaked around the unperturbed value, dividing this distribution by what we’d get without any treatment at all makes it clear that not so much was achieved by the machine learning we were able to do:
And in a sense this isn’t surprising; our machine learning—based, as it is, on coarse-grained features—is quite weak compared to the computational irreducibility of the underlying processes at work.
In what we’ve done so far, we’ve studied just a single idealized organism—with a single set of underlying “genetic rules”. But in analogy to the situation with humans, we can imagine a whole population of genetically slightly different idealized organisms, with different responses to perturbations, etc.
Many changes to the underlying rules for our idealized organism will lead to unrecognizably different patterns that don’t, for example, have the kind of finite-but-long lifetimes we’ve been interested in. But it turns out that in the rules for our particular idealized organism there are some specific changes that actually don’t have any effect at all—at least on the unperturbed pattern of behavior. And the reason for this is that in generating the unperturbed pattern these particular cases in the rule happen never to be used:
And the result is that any one of the 4³ = 64 possible choices of outcomes for those cases in the rule will still yield the same unperturbed pattern. If there’s a perturbation, however, different cases in the rule can be sampled—including these ones. It’s as if cases in the rule that are initially “non-coding” end up being “coding” when the path of behavior is changed by a perturbation. (Or, said differently, it’s like different genes being activated when conditions are different.)
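As a sketch of how such variants can be enumerated: the rule number of a k = 4, r = 1 cellular automaton encodes one base-4 digit for each of its 4³ = 64 neighborhood cases, so filling the three never-used cases with all possible outcomes amounts to rewriting three digits. (Here baseRuleNumber and unusedSlots—the digit positions of the unused cases—are placeholders.)

(* All 64 "genetic variants" obtained by rewriting the three unused rule cases *)
digits = IntegerDigits[baseRuleNumber, 4, 64];
variants = Table[
   FromDigits[ReplacePart[digits, Thread[unusedSlots -> outcome]], 4],
   {outcome, Tuples[Range[0, 3], 3]}];
Length[variants]   (* 64 *)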
So to make an idealized model of something like a population with genetic diversity, we can look at what happens with different choices of our (initially) “non-coding” rule outcomes:
Before the perturbation, all these inevitably show the same behavior, because they’re never sampling “non-coding” rule cases. But as soon as there’s a perturbation, the pattern is changed, and after varying numbers of steps, previously “non-coding” rule cases do get sampled—and can affect the outcome.
Here are the distinct cases of what happens in all 64 “genetic variants”—with the red arrow in each case indicating where the pattern first differs from what it is with our original idealized organism:
And here is then the distribution of lifetimes achieved—in effect showing the differing consequences of this particular “disease perturbation” on all our genetic variants:
What happens with other “disease perturbations”? Here’s a sample of distributions of lifetimes achieved (where “__” corresponds to cases where all 64 genetic variants yield the same lifetime):
OK, so what about the overall lifetime distribution across all (single) perturbations for each of the genetic variants? The detailed distribution we get is different for each variant. But their general shape is always remarkably similar
though taking differences from the case of our original idealized organism reveals some structure:
As another indication of the effect of genetic diversity, we can plot the survival curve averaged over all perturbations, and compare the case for our original idealized organism with what happens if we average equally over all 64 genetic variants. The difference is small, but there is a longer tail for the average of the genetic variants than for our specific original idealized organism:
We’ve seen how our idealized genetic variation affects “disease”. But how does it affect “treatment”? For the “disease” above, we already saw that there’s a particular “treatment perturbation” that successfully returns our original idealized organism to its “natural lifespan”. So what happens if we apply this same treatment across all the genetic variants? In effect this is like doing a very idealized “clinical trial” of our potential treatment. And what we see is that the results are quite diverse—and indeed more diverse than from the disease on its own:
In essence what we’re seeing is that, yes, there are some genetic variants for which the treatment still works. But there are many for which there are (often fairly dramatic) side effects.
So where did the particular rule for the “model organism” we’ve been studying come from? Well, we evolved it—using a slight generalization of the idealized model for biological evolution that I recently introduced. The goal of our evolutionary process was to find a rule that generates a pattern that lives as long as possible, but not infinitely long—and that does so robustly even in the presence of perturbations. In essence we used lifetime (or, more accurately, “lifetime under perturbation”) as our “fitness function”, then progressively evolved our rule (or “genome”) by random mutations to try to maximize this fitness function.
In more detail, we started from the null (“everything turns white”) rule, then successively made random changes to single cases in the rule (“point mutations”)—keeping the resulting rule whenever the pattern it generated had a lifetime (under perturbation) that wasn’t smaller (or infinite). And with this setup, here’s the particular (random) sequence of rules we got (showing for each rule the outcome for each of its 64 cases):
Many of these rules don’t “make progress” in the sense that they increase the lifetime under perturbation. But every so often there’s a “breakthrough”, and a rule with a longer lifetime under perturbation is reached:
And, as we see, the rule for the particular model organism we’ve been using is what’s reached at the end.
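Here’s a sketch of this kind of adaptive loop (with the rule represented by its base-4 rule number as above; fitness is assumed to be a single-argument wrapper around the lifetime-under-perturbation measure sketched below, and finiteLifetimeQ an assumed check that the unperturbed lifetime is finite within some step cap):

(* Point-mutate one base-4 digit of the rule number; keep the mutation if the
   lifetime-under-perturbation fitness does not decrease and stays finite *)
mutate[ruleNumber_] := FromDigits[
   ReplacePart[IntegerDigits[ruleNumber, 4, 64],
     RandomInteger[{1, 64}] -> RandomInteger[{0, 3}]], 4]

evolve[steps_] := NestList[
   With[{candidate = mutate[#]},
     If[finiteLifetimeQ[candidate] && fitness[candidate] >= fitness[#], candidate, #]] &,
   0,    (* start from the null "everything turns white" rule *)
   steps]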
In studying my recent idealized model for biological evolution, I considered fitness functions like lifetime that can directly be computed just by running the underlying rule from a certain initial condition. But here I’m generalizing that a bit, and considering as a fitness function not just lifetime, but “lifetime under perturbation”, computed by taking a particular rule, and finding the minimum lifetime of all patterns produced by it with certain random perturbations applied.
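Here’s a sketch of that generalized fitness measure, reusing the perturb and lifetime helpers from the earlier sketches (the particular list of random perturbations—each a {step, position, color} triple—is left to the caller):

(* "Lifetime under perturbation": the minimum lifetime over the unperturbed pattern and a
   given set of random single-cell perturbations *)
lifetimeUnderPerturbation[ruleSpec_, width_, tMax_, perturbations_] := Module[{init},
  init = ReplacePart[ConstantArray[0, width], Ceiling[width/2] -> 1];
  Min[lifetime[CellularAutomaton[ruleSpec, init, tMax]],
    Min[lifetime[perturb[ruleSpec, width, tMax, Sequence @@ #]] & /@ perturbations]]]

(* The single-argument wrapper used in the evolution sketch above
   (width, tMax and perturbations taken as fixed globals) *)
fitness[n_] := lifetimeUnderPerturbation[{n, 4, 1}, width, tMax, perturbations]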
So, for example, here the “lifetime under perturbation” would be considered to be the minimum of the lifetimes generated with no perturbation and with certain random perturbations—which in this case is 60:
This plot then illustrates how the (lifetime-under-perturbation) fitness (indicated by the blue line) behaves in the course of our adaptive evolution process, right around where the fitness-60 “breakthrough” above occurs:
What’s happening in this plot? At each adaptive step, we’re considering a new rule, obtained by a point mutation from the previous one. Running this rule we get a certain lifetime. If this lifetime is finite, we indicate it by a green dot. Then we apply a certain set of random perturbations—indicating the lifetimes we get by gray dots. (We could imagine using all sorts of schemes for picking the random perturbations; here what we’re doing is to perturb random points on about a tenth of the rows in the unperturbed pattern.)
Then the minimum lifetime for any given rule we indicate by a red dot—and this is the fitness we assign to that rule. So now we can see the whole progression of our adaptive evolution process:
One thing that’s notable is that the unperturbed lifetimes (green dots) are considerably larger than the final minimum lifetimes (red dots). And what this means is that our requirement of “robustness”, implemented by looking at lifetime under perturbation rather than just unperturbed lifetime, considerably reduces the lifetimes we can reach. In other words, if our idealized organism is going to be robust, it won’t tend to be able to have as long a lifetime as it could if it didn’t have to “worry about” random perturbations.
And to illustrate this, here’s a typical example of a much longer lifetime obtained by adaptive evolution with the same kind of rule we’ve been using (k = 4, r = 1 cellular automaton), but now with no perturbations and with fitness being given purely by the unperturbed lifetime (exactly as in my recent work on biological evolution):
OK, so given that we’re evolving with a lifetime-under-perturbation fitness function, what are some alternatives to our particular model organism? Here are a few examples:
At an overall level, these seem to react to perturbations much like our original model organism:
One notable feature here, though, is that there seems to be a tendency for simpler overall behavior to be less disrupted by perturbations. In other words, our idealized “diseases” seem to have less dramatic effects on “simpler” idealized organisms. And we can see a reflection of this phenomenon if we plot the overall (single-perturbation) lifetime distributions for the four rules above:
But despite detailed differences, the main conclusion seems to be that there’s nothing special about the particular model organism we’ve used—and that if we repeated our whole analysis for different model organisms (i.e. “different idealized species”) the results we’d get would be very much the same.
So what does all this mean? At the outset, it wasn’t clear there’d be a way to usefully capture anything about the foundations of medicine in a formalized theoretical way. But in fact what we’ve found is that even the very simple computational model we’ve studied seems to successfully reflect all sorts of features of what we see in medicine. Many of the fundamental effects and phenomena are, it seems, not the result of details of biomedicine, but instead are at their core purely abstract and computational—and therefore accessible to formalized theory and metamodeling. This kind of methodology is very different from what’s been traditional in medicine—and isn’t likely to lead directly to specific practical medicine. But what it can do is to help us develop powerful new general intuition and ways of reasoning—and ultimately an understanding of the conceptual foundations of what’s going on.
At the heart of much of what we’ve seen is the very fundamental—and ubiquitous—phenomenon of computational irreducibility. I’ve argued recently that computational irreducibility is central to what makes biological evolution work—and that it’s inevitably imprinted on the core “computational architecture” of biological organisms. And it’s this computational irreducibility that inexorably leads to much of the complexity we see so ubiquitously in medicine. Can we expect to find a simple narrative explanation for the consequences of some perturbation to an organism? In general, no—because of computational irreducibility. There are always pockets of computational reducibility, but in general we can have no expectation that, for example, we’ll be able to describe the effects of different perturbations by neatly classifying them into a certain set of distinct “diseases”.
To a large extent the core mission of medicine is about “treating diseases”, or in our terms, about remediating or reversing the effects of perturbations. And once again, computational irreducibility implies there’s inevitably a certain fundamental difficulty in doing this. It’s a bit like with the Second Law of thermodynamics, where there’s enough computational irreducibility in microscopic molecular dynamics that to seriously reverse—or outpredict—this dynamics is something that’s at least far out of range for computationally bounded observers like us. And in our medical setting the analog of that is that “computationally bounded interventions” can only systematically lead to medical successes insofar as they tap into pockets of computational reducibility. And insofar as they are exposed to overall computational irreducibility they will inevitably seem to show a certain amount of apparent randomness in their outcomes.
In traditional approaches to medicine one ultimately tends to “give in to the randomness” and go no further than to assign probabilities to things. But an important feature of what we’ve done here is that in our idealized computational models we can always explicitly see what’s happening inside. Often—largely as a consequence of computational irreducibility—it’s complicated. But the fact that we can see it gives us the opportunity to get much more clarity about the fundamental mechanisms involved. And if we end up summarizing what happens by giving probabilities and doing statistics it’s because this is something we’re choosing to do, not something we’re forced to do because of our lack of knowledge of the systems we’re studying.
There’s much to do in our effort to explore the computational foundations of medicine. But already there are some implications that are beginning to emerge. Much of the workflow of medicine today is based on classifying things that can go wrong into discrete diseases. But what we’ve seen here (which is hardly surprising given practical experience with medicine) is that when one looks at the details, a huge diversity of things can happen—whose characteristics and outcomes can’t really be binned neatly into discrete “diseases”.
And indeed when we try to figure out “treatments” the details matter. As a first approximation, we might base our treatments on coarse graining into discrete diseases. But—as the approach I’ve outlined here can potentially help us analyze—the more we can directly go from detailed measurements to detailed treatments (through computation, machine learning, etc.), the more promising it’s likely to be. Not that it’s easy. Because in a sense we’re trying to beat computational irreducibility—with computationally bounded measurements and interventions.
In principle one can imagine a future in which our efforts at treatment have much more computational sophistication (and indeed the immune system presumably already provides an example in nature). We can imagine things like algorithmic drugs and artificial cells that are capable of amounts of computation that are a closer match for the irreducible computation of an organism. And indeed the kind of formalized theory that I’ve outlined here is likely what one needs to begin to get an idea of how such an approach might work. (In the thermodynamic analogy, what we need to do is a bit like reversing entropy increase by sending in large numbers of “smart molecules”.)
(By the way, seeing how difficult it potentially is to reverse the effects of a perturbation provides all the more impetus to consider “starting from scratch”—as nature does in successive generations of organisms—and simply wholesale regenerating elements of organisms, rather than trying to “fix what’s there”. And, yes, in our models this is for example like starting to grow again from a new seed, and letting the resulting pattern knit itself into the existing one.)
One of the important features of operating at the level of computational foundations is that we can expect conclusions we draw to be very general. And we might wonder whether perhaps the framework we’ve described here could be applied outside of medicine. And to some extent I suspect it can—potentially to areas like robustness of large-scale technological and social systems and specifically things like computer security and computer system failures. (And, yes, much as in medicine one can imagine for example “classifying diseases” for computer systems.) But things likely won’t be quite the same in cases like these—because the underlying systems have much more human-determined mechanisms, and less “blind” adaptive evolution.
But when it comes to medicine, the very presence of computational irreducibility introduced by biological evolution is what potentially allows one to develop a robust framework in which one can draw conclusions purely on the basis of abstract computational phenomena. Here I’ve just begun to scratch the surface of what’s possible. But I think we’ve already seen enough that we can be confident that medicine is yet another field whose foundations can be seen as fundamentally rooted in the computational paradigm.
Thanks to Wolfram Institute researcher Willem Nielsen for extensive help.
I’ve never written anything substantial about medicine before, though I’ve had many interactions with the medical research and biomedical communities over the years—that have gradually extended my knowledge and intuition about medicine. (Thanks particularly to Beatrice Golomb, who over the course of more than forty years has helped me understand more about medical reasoning, often emphasizing “Beatrice’s Law” that “Everything in medicine is more complicated than you can possibly imagine, even taking account of Beatrice’s Law”…)
2025-01-24 03:00:09
Just under six months ago (176 days ago, to be precise) we released Version 14.1. Today I’m pleased to announce that we’re releasing Version 14.2, delivering the latest from our R&D pipeline.
This is an exciting time for our technology, both in terms of what we’re now able to implement, and in terms of how our technology is now being used in the world at large. A notable feature of these times is the increasing use of Wolfram Language not only by humans, but also by AIs. And it’s very nice to see that all the effort we’ve put into consistent language design, implementation and documentation over the years is now paying dividends in making Wolfram Language uniquely valuable as a tool for AIs—complementing their own intrinsic capabilities.
But there’s another angle to AI as well. With our Wolfram Notebook Assistant launched last month we’re using AI technology (plus a lot more) to provide what amounts to a conversational interface to Wolfram Language. As I described when we released Wolfram Notebook Assistant, it’s something extremely useful for experts and beginners alike, but ultimately I think its most important consequence will be to accelerate the ability to go from any field X to “computational X”—making use of the whole tower of technology we’ve built around Wolfram Language.
So, what’s new in 14.2? Under the hood there are changes to make Wolfram Notebook Assistant more efficient and more streamlined. But there are also lots of visible extensions and enhancements to the user-visible parts of the Wolfram Language. In total there are 80 completely new functions—along with 177 functions that have been substantially updated.
There are continuations of long-running R&D stories, like additional functionality for video, and additional capabilities around symbolic arrays. Then there are completely new areas of built-in functionality, like game theory. But the largest new development in Version 14.2 is around handling tabular data, and particularly, big tabular data. It’s a whole new subsystem for Wolfram Language, with powerful consequences throughout the system. We’ve been working on it for quite a few years, and we’re excited to be able to release it for the first time in Version 14.2.
Talking of working on new functionality: starting more than seven years ago we pioneered the concept of open software design, livestreaming our software design meetings. And, for example, since the release of Version 14.1, we’ve done 43 software design livestreams, for a total of 46 hours (I’ve also done 73 hours of other livestreams in that time). Some of the functionality that’s now in Version 14.2 we started work on quite a few years ago. But we’ve been livestreaming long enough that pretty much anything that’s now in Version 14.2 we designed live and in public on a livestream at some time or another. It’s hard work doing software design (as you can tell if you watch the livestreams). But it’s always exciting to see those efforts come to fruition in the system we’ve been progressively building for so long. And so, today, it’s a pleasure to be able to release Version 14.2 and to let everyone use the things we’ve been working so hard to build.
Last month we released the Wolfram Notebook Assistant to “turn words into computation”—and help experts and novices alike make broader and deeper use of Wolfram Language technology. In Version 14.1 the primary way to use Notebook Assistant is through the separate “side chat” Notebook Assistant window. But in Version 14.2 “chat cells” have become a standard feature of any notebook available to anyone with a Notebook Assistant subscription.
Just type ‘ as the first character of any cell, and it’ll become a chat cell:
Now you can start chatting with the Notebook Assistant:
With the side chat you have a “separate channel” for communicating with the Notebook Assistant—that won’t, for example, be saved with your notebook. With chat cells, your chat becomes an integral part of the notebook.
We actually first introduced Chat Notebooks in the middle of 2023—just a few months after the arrival of ChatGPT. Chat Notebooks defined the interface, but at the time, the actual content of chat cells was purely from external LLMs. Now in Version 14.2, chat cells are not limited to separate Chat Notebooks, but are available in any notebook. And by default they make use of the full Notebook Assistant technology stack, which goes far beyond a raw LLM. In addition, once you have a Notebook Assistant + LLM Kit subscription, you can seamlessly use chat cells; no account with external LLM providers is needed.
The chat cell functionality in Version 14.2 inherits all the features of Chat Notebooks. For example, typing ~ in a new cell creates a chat break, that lets you start a “new conversation”. And when you use a chat cell, it’s able to see anything in your notebook up to the most recent chat break. (By the way, when you use Notebook Assistant through side chat it can also see what selection you’ve made in your “focus” notebook.)
By default, chat cells are “talking” to the Notebook Assistant. But if you want, you can also use them to talk to external LLMs, just like in our original Chat Notebook—and there’s a convenient menu to set that up. Of course, if you’re using an external LLM, you don’t have all the technology that’s now in the Notebook Assistant, and unless you’re doing LLM research, you’ll typically find it much more useful and valuable to use chat cells in their default configuration—talking to the Notebook Assistant.
Lists, associations, datasets. These are very flexible ways to represent structured collections of data in the Wolfram Language. But now in Version 14.2 there’s another: Tabular. Tabular provides a very streamlined and efficient way to handle tables of data laid out in rows and columns. And when we say “efficient” we mean that it can routinely juggle gigabytes of data or more, both in core and out of core.
Let’s do an example. Let’s start off by importing some tabular data:
This is data on trees in New York City, 683,788 of them, each with 45 properties (sometimes missing). Tabular introduces a variety of new ideas. One of them is treating tabular columns much like variables. Here we’re using this to make a histogram of the values of the "tree_dbh" column in this Tabular:
You can think of a Tabular as being like an optimized form of a list of associations, where each row consists of an association whose keys are column names. Functions like Select then just work on Tabular:
Length gives the number of rows:
CountsBy treats the Tabular as a list of associations, extracting the value associated with the key "spc_latin" (“Latin species”) in each association, and counting how many times that value occurs ("spc_latin" here is short for #"spc_latin"&):
To get the names of the columns we can use the new function ColumnKeys:
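Putting these operations together as a sketch (with trees being the Tabular of New York City tree data imported above, and the calls following the descriptions just given):

(* Basic operations that "just work" on a Tabular *)
Length[trees]                                           (* number of rows *)
ColumnKeys[trees]                                       (* names of the 45 columns *)
CountsBy[trees, "spc_latin"]                            (* row counts by Latin species name *)
Select[trees, #"spc_latin" === "Pinus virginiana" &]    (* rows for one species *)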
Viewing Tabular as being like a list of associations we can extract parts—giving first a specification of rows, and then a specification of columns:
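For example, as a sketch following that description (rows first, then columns):

(* The first 5 rows of two named columns *)
trees[[1 ;; 5, {"spc_latin", "tree_dbh"}]]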
There are lots of new operations that we’ve been able to introduce now that we have Tabular. An example is AggregateRows, which constructs a new Tabular from a given Tabular by aggregating groups of rows, in this case ones with the same value of "spc_latin", and then applying a function to those rows, in this case finding the mean value of "tree_dbh":
An operation like ReverseSortBy then “just works” on this table, here reverse sorting by the value of "meandbh":
Here we’re making an ordinary matrix out of a small slice of data from our Tabular:
And now we can plot the result, giving the positions of Virginia pine trees in New York City:
When should you use a Tabular, rather than, say, a Dataset? Tabular is specifically set up for data that is arranged in rows and columns—and it supports many powerful operations that make sense for data in this “rectangular” form. Dataset is more general; it can have an arbitrary hierarchy of data dimensions, and so can’t in general support all the “rectangular” data operations of Tabular. In addition, by being specialized for “rectangular” data, Tabular can also be much more efficient, and indeed we’re making use of the latest type-specific methods for large-scale data handling.
If you use TabularStructure you can see some of what lets Tabular be so efficient. Every column is treated as data of a specific type (and, yes, the types are consistent with the ones in the Wolfram Language compiler). And there’s streamlined treatment of missing data (with several new functions added specifically to handle this):
What we’ve seen so far is Tabular operating with “in-core” data. But you can quite transparently also use Tabular on out-of-core data, for example data stored in a relational database.
Here’s an example of what this looks like:
It’s a Tabular that points to a table in a relational database. It doesn’t by default explicitly display the data in the Tabular (and in fact it doesn’t even get it into memory—because it might be huge and might be changing quickly as well). But you can still specify operations just like on any other Tabular. This finds out what columns are there:
And this specifies an operation, giving the result as a symbolic out-of-core Tabular object:
You can “resolve” this, and get an explicit in-memory Tabular using ToMemory:
Let’s say you’ve got a Tabular—like this one based on penguins:
There are lots of operations you can do that manipulate the data in this Tabular in a structured way—giving you back another Tabular. For example, you could just take the last 2 rows of the Tabular:
Or you could sample 3 random rows:
Other operations depend on the actual content of the Tabular. And because you can treat each row like an association, you can set up functions that effectively refer to elements by their column names:
Note that we can always use #[name] to refer to elements in a column. If name is an alphanumeric string then we can also use the shorthand #name. And for other strings, we can use #"name". Some functions let you just use "name" to indicate the function #["name"]:
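So, for example, these are three equivalent ways to select the Gentoo penguins (calling the penguin Tabular above penguins—an assumed name—and using the "species" column that appears in the examples below):

Select[penguins, #["species"] === "Gentoo" &]
Select[penguins, #species === "Gentoo" &]
Select[penguins, #"species" === "Gentoo" &]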
So far we’ve talked only about arranging or selecting rows in a Tabular. What about columns? Here’s how we can construct a tabular that has just two of the columns from our original Tabular:
What if we don’t just want existing columns, but instead want new columns that are functions of these? ConstructColumns lets us define new columns, giving their names and the functions to be used to compute values in them:
(Note the trick of writing out Function to avoid having to put parentheses, as in (StringTake[#species,1]&).)
ConstructColumns lets you take an existing Tabular and construct a new one. TransformColumns lets you transform columns in an existing Tabular, here replacing species names by their first letters:
TransformColumns also lets you add new columns, specifying the content of the columns just like in ConstructColumns. But where does TransformColumns put your new columns? By default, they go at the end, after all existing columns. But if you specifically list an existing column, that’ll be used as a marker to determine where to put the new column ("name" → Nothing removes a column):
Everything we’ve seen so far operates separately on each row of a Tabular. But what if we want to “gulp in” a whole column to use in our computation—say, for example, computing the mean of a whole column, then subtracting it from each value. ColumnwiseValue lets you do this, by supplying to the function (here Mean) a list of all the values in whatever column or columns you specify:
ColumnwiseValue effectively lets you compute a scalar value by applying a function to a whole column. There’s also ColumnwiseThread, which lets you compute a list of values, that will in effect be “threaded” into a column. Here we’re creating a column from a list of accumulated values:
By the way, as we’ll discuss below, if you’ve externally generated a list of values (of the right length) that you want to use as a column, you can do that directly by using InsertColumns.
There’s another concept that’s very useful in practice in working with tabular data, and that’s grouping. In our penguin data, we’ve got an individual row for each penguin of each species. But what if we want instead to aggregate all the penguins of a given species, for example computing their average body mass? Well, we can do this with AggregateRows. AggregateRows works like ConstructColumns in the sense that you specify columns and their contents. But unlike ConstructColumns it creates new “aggregated” rows:
What is that first column here? The gray background of its entries indicates that it’s what we call a “key column”: a column whose entries (perhaps together with other key columns) can be used to reference rows. And later, we’ll see how you can use RowKey to indicate a row by giving a value from a key column:
But let’s go on with our aggregation efforts. Let’s say that we want to group not just by species, but also by island. Here’s how we can do that with AggregateRows:
In a sense what we have here is a table whose rows are specified by pairs of values (here “species” and “island”). But it’s often convenient to “pivot” things so that these values are used respectively for rows and for columns. And you can do that with PivotTable:
Note the —’s, which indicate missing values; apparently there are no Gentoo penguins on Dream island, etc.
PivotTable normally gives exactly the same data as AggregateRows, but in a rearranged form. One additional feature of PivotTable is the option IncludeGroupAggregates which includes All entries that aggregate across each type of group:
If you have multiple functions that you’re computing, AggregateRows will just give them as separate columns:
PivotTable can also deal with multiple functions—by creating columns with “extended keys”:
And now you can use RowKey and ExtendedKey to refer to elements of the resulting Tabular:
We’ve seen some of the things you can do when you have data as a Tabular. But how does one get data into a Tabular? There are several ways. The first is just to convert from structures like lists and associations. The second is to import from a file, say a CSV or XLSX (or, for larger amounts of data, Parquet)—or from an external data store (S3, Dropbox, etc.). And the third is to connect to a database. You can also get data for Tabular directly from the Wolfram Knowledgebase or from the Wolfram Data Repository.
Here’s how you can convert a list of lists into a Tabular:
And here’s how you can convert back:
It works with sparse arrays too, here instantly creating a million-row Tabular
that takes 80 MB to store:
Here’s what happens with a list of associations:
You can get the same Tabular by entering its data and its column names separately:
By the way, you can convert a Tabular to a Dataset
and in this simple case you can convert it back to a Tabular too:
In general, though, there are all sorts of options for how to convert lists, datasets, etc. to Tabular objects—and ToTabular is set up to let you control these. For example, you can use ToTabular to create a Tabular from columns rather than rows:
How about external data? In Version 14.2 Import now supports a "Tabular" element for tabular data formats. So, for example, given a CSV file
Import can immediately import it as a Tabular:
This works very efficiently even for huge CSV files with millions of entries. It also does well at automatically identifying column names and headers. The same kind of thing works with more structured files, like ones from spreadsheets and statistical data formats. And it also works with modern columnar storage formats like Parquet, ORC and Arrow.
Import transparently handles both ordinary files, and URLs (and URIs), requesting authentication if needed. In Version 14.2 we’re adding the new concept of DataConnectionObject, which provides a symbolic representation of remote data, essentially encapsulating all the details of how to get the data. So, for example, here’s a DataConnectionObject for an S3 bucket, whose contents we can immediately import:
(In Version 14.2 we’re supporting Amazon S3, Azure Blob Storage, Dropbox, IPFS—with many more to come. And we’re also planning support for data warehouse connections, APIs, etc.)
But what about data that’s too big—or too fast-changing—to make sense to explicitly import? An important feature of Tabular (mentioned above) is that it can transparently handle external data, for example in relational databases.
Here’s a reference to a large external database:
This defines a Tabular that points to a table in the external database:
We can ask for the dimensions of the Tabular—and we see that it has 158 million rows:
The table we’re looking at happens to be all the line-oriented data in OpenStreetMap. Here are the first 3 rows and 10 columns:
Most operations on the Tabular will now actually get done in the external database. Here we’re asking to select rows whose “name” field contains "Wolfram":
The actual computation is only done when we use ToMemory, and in this case (because there’s a lot of data in the database) it takes a little while. But soon we get the result, as a Tabular:
And we learn that there are 58 Wolfram-named items in the database:
Another source of data for Tabular is the built-in Wolfram Knowledgebase. In Version 14.2 EntityValue supports direct output in Tabular form:
The Wolfram Knowledgebase provides lots of good examples of data for Tabular. And the same is true of the Wolfram Data Repository—where you can typically just apply Tabular to get data in Tabular form:
Messy data is in many ways the bane of data science. Yes, data is in digital form. But it’s not clean; it’s not computable. The Wolfram Language has long been a uniquely powerful tool for flexibly cleaning data (and, for example, for advancing through the ten levels of making data computable that I defined some years ago).
But now, in Version 14.2, with Tabular, we have a whole new collection of streamlined capabilities for cleaning data. Let’s start by importing some data “from the wild” (and, actually, this example is cleaner than many):
(By the way, if there was really crazy stuff in the file, we might have wanted to use the option MissingValuePattern to specify a pattern that would just immediately replace the crazy stuff with Missing[…].)
OK, but let’s start by surveying what came in here from our file, using TabularStructure:
We see that Import successfully managed to identify the basic type of data in most of the columns—though for example it can’t tell if numbers are just numbers or are representing quantities with units, etc. And it also identifies that some number of entries in some columns are “missing”.
As a first step in data cleaning, let’s get rid of what seems like an irrelevant "id" column:
Next, we see that the elements in the first column are being identified as strings—but they’re really dates, and they should be combined with the times in the second column. We can do this with TransformColumns, removing what’s now an “extra column” by replacing it with Nothing:
Looking at the various numerical columns, we see that they’re really quantities that should have units. But first, for convenience, let’s rename the last two columns:
Now let’s turn the numerical columns into columns of quantities with units, and, while we’re at it, also convert from °C to °F:
Here’s how we can now plot the temperature as a function of time:
There’s a lot of wiggling there. And looking at the data we see that we’re getting temperature values from several different weather stations. This selects data from a single station:
What’s the break in the curve? If we just scroll to that part of the tabular we’ll see that it’s because of missing data:
So what can we do about this? Well, there’s a powerful function TransformMissing that provides many options. Here we’re asking it to interpolate to fill in missing temperature values:
And now there are no gaps, but, slightly mysteriously, the whole plot extends further:
The reason is that it’s interpolating even in cases where basically nothing was measured. We can remove those rows using Discard:
And now we won’t have that “overhang” at the end:
Sometimes there’ll explicitly be data that’s missing; sometimes (more insidiously) the data will just be wrong. Let’s look at the histogram of pressure values for our data:
Oops. What are those small values? Presumably they’re wrong. (Perhaps they were transcription errors?) We can remove such “anomalous” values by using TransformAnomalies. Here we’re telling it to just completely trim out any row where the pressure was “anomalous”:
We can also get TransformAnomalies to try to “fix” the data. Here we’re just replacing any anomalous pressure by the previous pressure listed in the tabular:
You can also tell TransformAnomalies to “flag” any anomalous value and make it “missing”. But, if we’ve got missing values what then happens if we try to do computations on them? That’s where MissingFallback comes in. It’s fundamentally a very simple function—that just returns its first non-missing argument:
But even though it’s simple, it’s important in making it easy to handle missing values. So, for example, this computes a “northspeed”, falling back to 0 if data needed for the computation is missing:
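A minimal sketch of that behavior (the “northspeed” setup itself isn’t reproduced here):

    MissingFallback[Missing["NotAvailable"], 0]   (* first argument is missing, so this gives 0 *)
    MissingFallback[3.2, 0]                       (* first argument isn’t missing, so this gives 3.2 *)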
We’ve said that a Tabular is “like” a list of associations. And, indeed, if you apply Normal to it, that’s what you’ll get:
But internally Tabular is stored in a much more compact and efficient way. And it’s useful to know something about this, so you can manipulate Tabular objects without having to “take them apart” into things like lists and associations. Here’s our basic sample Tabular:
What happens if we extract a row? Well, we get a TabularRow object:
If we apply Normal, we get an association:
Here’s what happens if we instead extract a column:
Now Normal gives a list:
We can create a TabularColumn from a list:
Now we can use InsertColumns to insert a symbolic column like this into an existing Tabular (the "b" in the arguments tells InsertColumns to insert the new column after the “b” column):
But what actually is a Tabular inside? Let’s look at the example:
TabularStructure gives us a summary of the internal structure here:
The first thing to notice is that everything is stated in terms of columns, reflecting the fact that Tabular is a fundamentally column-oriented construct. And part of what makes Tabular so efficient is then that within a column everything is uniform, in the sense that all the values are the same type of data. In addition, for things like quantities and dates, we factor the data so that what’s actually stored internally in the column is just a list of numbers, with a single copy of “metadata information” on how to interpret them.
And, yes, all this has a big effect. Like here’s the size in bytes of our New York trees Tabular from above:
But if we turn it into a list of associations using Normal, the result is about 14x larger:
OK, but what are those “column types” in the tabular structure? ColumnTypes gives a list of them:
These are low-level types of the kind used in the Wolfram Language compiler. And part of the value of knowing them is that they immediately tell us what operations we can do on a particular column. That’s useful both in low-level processing, and in things like knowing what kind of visualization might be possible.
When Import imports data from something like a CSV file, it tries to infer what type each column is. But sometimes (as we mentioned above) you’ll want to “cast” a column to a different type, specifying the “destination type” using Wolfram Language type description. So, for example, this casts column “b” to a 32-bit real number, and column “c” to units of meters:
By the way, when a Tabular is displayed in a notebook, the column headers indicate the types of data in the corresponding columns. So in this case, there’s a little icon in the first column header to indicate that it contains strings. Numbers and dates basically just “show what they are”. Quantities have their units indicated. And general symbolic expressions (like column “f” here) are indicated with their own icon. (If you hover over a column header, it gives you more detail about the types.)
The next thing to discuss is missing data. Tabular always treats columns as being of a uniform type, but keeps an overall map of where values are missing. If you extract the column you’ll see a symbolic Missing:
But if you operate on the tabular column directly it’ll just behave as if the missing data is, well, missing:
By the way, if you’re bringing in data “from the wild”, Import will attempt to automatically infer the right type for each column. It knows how to deal with common anomalies in the input data, like NaN or null in a column of numbers. But if there are other weird things—like, say, notfound in the middle of a column of numbers—you can tell Import to turn such things into ordinary missing data by giving them as settings for the option MissingValuePattern.
There are a couple more subtleties to discuss in connection with the structure of Tabular objects. The first is the notion of extended keys. Let’s say we have the following Tabular:
We can “pivot this to columns” so that the values x and y become column headers, but “under” the overall column header “value”:
But what is the structure of this Tabular? We can use ColumnKeys to find out:
You can now use these extended keys as indices for the Tabular:
In this particular case, because the “subkeys” "x" and "y" are unique, we can just use those, without including the other part of the extended key:
Our final subtlety (for now) is somewhat related. It concerns key columns. Normally the way we specify a row in a Tabular object is just by giving its position. But if the values of a particular column happen to be unique, then we can use those instead to specify a row. Consider this Tabular:
The fruit column has the feature that each entry appears only once—so we can create a Tabular that uses this column as a key column:
Notice that the numbers for rows have now disappeared, and the key column is indicated with a gray background. In this Tabular, you can then reference a particular row using for example RowKey:
Equivalently, you can also use an association with the column name:
What if the values in a single column are not sufficient to uniquely specify a row, but several columns together are? (In a real-world example, say one column has first names, and another has last names, and another has dates of birth.) Well, then you can designate all those columns as key columns:
And once you’ve done that, you can reference a row by giving the values in all the key columns:
Tabular provides an important new way to represent structured data in the Wolfram Language. It’s powerful in its own right, but what makes it even more powerful is how it integrates with all the other capabilities in the Wolfram Language. Many functions just immediately work with Tabular. But in Version 14.2 hundreds have been enhanced to make use of the special features of Tabular.
Most often, it’s to be able to operate directly on columns in a Tabular. So, for example, given the Tabular
we can immediately make a visualization based on two of the columns:
If one of the columns has categorical data, we’ll recognize that, and plot it accordingly:
Another area where Tabular can immediately be used is machine learning. So, for example, this creates a classifier function that will attempt to determine the species of a penguin from other data about it:
Now we can use this classifier function to predict species from other data about a penguin:
We can also take the whole Tabular and make a feature space plot, labeling with species:
Or we could “learn the distribution of possible penguins”
and randomly generate 3 “fictitious penguins” from this distribution:
One of the major innovations of Version 14.1 was the introduction of symbolic arrays—and the ability to create expressions involving vector, matrix and array variables, and to take derivatives of them. In Version 14.2 we’re taking the idea of computing with symbolic arrays a step further—for the first time systematically automating what has in the past been the manual process of doing algebra with symbolic arrays, and simplifying expressions involving symbolic arrays.
Let’s start by talking about ArrayExpand. Our longstanding function Expand just deals with expanding ordinary multiplication, effectively of scalars—so in this case it does nothing:
But in Version 14.2 we also have ArrayExpand which will do the expansion:
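As a hedged sketch of the kind of thing involved (assuming ArrayExpand can be applied directly to an expression like this):

    Expand[(a + b) . (a + b)]        (* Expand treats the Dot product as opaque and leaves it alone *)
    ArrayExpand[(a + b) . (a + b)]   (* expands the non-commutative product, keeping a . b and b . a distinct *)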
ArrayExpand deals with many generalizations of multiplication that aren’t commutative:
In an example like this, we really don’t need to know anything about a and b. But sometimes we can’t do the expansion without, for example, knowing their dimensions. One way to specify those dimensions is as a condition in ArrayExpand:
An alternative is to use an explicit symbolic array variable:
In addition to expanding generalized products using ArrayExpand, Version 14.2 also supports general simplification of symbolic array expressions:
The function ArraySimplify will specifically do simplification on symbolic arrays, while leaving other parts of expressions unchanged. Version 14.2 supports many kinds of array simplifications:
We could do these simplifications without knowing anything about the dimensions of a and b. But sometimes we can’t go as far without knowing these. For example, if we don’t know the dimensions we get:
But with the dimensions we can explicitly simplify this to an n×n identity matrix:
ArraySimplify can also take account of the symmetries of arrays. For example, let’s set up a symbolic symmetric matrix:
And now ArraySimplify can immediately resolve this:
The ability to do algebraic operations on complete arrays in symbolic form is very powerful. But sometimes it’s also important to look at individual components of arrays. And in Version 14.2 we’ve added ComponentExpand to let you get components of arrays in symbolic form.
So, for example, this takes a 2-component vector and writes it out as an explicit list with two symbolic components:
Underneath, those components are represented using Indexed:
Here’s the determinant of a 3×3 matrix, written out in terms of symbolic components:
And here’s a matrix power:
Given two symbolic 3D vectors
we can also, for example, form their cross product
and we can then go ahead and dot it into an inverse matrix:
As a daily user of the Wolfram Language I’m very pleased with how smoothly I find I can translate computational ideas into code. But the more we’ve made it easy to do, the more we can see new places where we can polish the language further. And in Version 14.2—like every version before it—we’ve added a number of “language tune-ups”.
A simple one—whose utility becomes particularly clear with Tabular—is Discard. You can think of it as a complement to Select: it discards elements according to the criterion you specify:
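For example (a small sketch, assuming Discard takes the same arguments as Select):

    Select[{1, 2.5, "x", 3, "y"}, NumberQ]    (* keeps the numbers: {1, 2.5, 3} *)
    Discard[{1, 2.5, "x", 3, "y"}, NumberQ]   (* discards them instead: {"x", "y"} *)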
And along with adding Discard, we’ve also enhanced Select. Normally, Select just gives a list of the elements it selects. But in Version 14.2 you can specify other results. Here we’re asking for the “index” (i.e. position) of the elements that NumberQ is selecting:
Something that can be helpful in dealing with very large amounts of data is getting a bit vector data structure from Select (and Discard) that provides a bit mask of which elements are selected or not:
By the way, here’s how you can ask for multiple results from Select and Discard:
In talking about Tabular we already mentioned MissingFallback. Another function related to code robustification and error handling is the new function Failsafe. Let’s say you’ve got a list which contains some “failed” elements. If you map a function f over that list, it’ll apply itself to the failure elements just as to everything else:
But quite possibly f wasn’t set up to deal with these kinds of failure inputs. And that’s where Failsafe comes in. Because Failsafe[f][x] is defined to give f[x] if x is not a failure, and to just return the failure if it is. So now we can map f across our list with impunity, knowing it’ll never be fed failure input:
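A minimal sketch of that behavior, with a hypothetical (undefined) function f:

    lst = {1, Failure["bad", <||>], 3};
    Map[f, lst]             (* f gets applied even to the Failure object *)
    Map[Failsafe[f], lst]   (* the Failure object is passed through untouched *)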
Talking of tricky error cases, another new function in Version 14.2 is HoldCompleteForm. HoldForm lets you display an expression without doing ordinary evaluation of the expression. But—like Hold—it still allows certain transformations to get made. HoldCompleteForm—like HoldComplete—prevents all these transformations. So while HoldForm gets a bit confused here when the sequence “resolves”
HoldCompleteForm just completely holds and displays the sequence:
Another piece of polish added in Version 14.2 concerns Counts. I often find myself wanting to count elements in a list, including getting 0 when a certain element is missing. By default, Counts just counts elements that are present:
But in Version 14.2 we’ve added a second argument that lets you give a complete list of all the elements you want to count—even if they happen to be absent from the list:
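Something like this (a sketch of the behavior described here):

    Counts[{"a", "b", "a"}]                        (* <|"a" -> 2, "b" -> 1|> *)
    Counts[{"a", "b", "a"}, {"a", "b", "c", "d"}]  (* also reports "c" -> 0 and "d" -> 0 *)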
As a final example of language tune-up in Version 14.2 I’ll mention AssociationComap. In Version 14.0 we introduced Comap as a “co-” (as in “co-functor”, etc.) analog of Map:
In Version 14.2 we’re introducing AssociationComap—the “co-” version of AssociationMap:
Think of it as a nice way to make labeled tables of things, as in:
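For instance, a sketch (the AssociationComap form here is an assumption, based on its analogy to AssociationMap):

    Comap[{Mean, Max, Total}, {1, 2, 3, 10}]             (* {4, 10, 16} *)
    AssociationComap[{Mean, Max, Total}, {1, 2, 3, 10}]  (* assumed result: <|Mean -> 4, Max -> 10, Total -> 16|> *)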
In 2014—for Version 10.0—we did a major overhaul of the default colors for all our graphics and visualization functions, coming up with what we felt was a good solution. (And as we’ve just noticed, somewhat bizarrely, it turned out that in the years that followed, many of the graphics and visualization libraries out there seemed to copy what we did!) Well, a decade has now passed, visual expectations (and display technologies) have changed, and we decided it was time to spiff up our colors for 2025.
Here’s what a typical plot looked like in Versions 10.0 through 14.1:
And here’s the same plot in Version 14.2:
By design, it’s still completely recognizable, but it’s got a little extra zing to it.
With more curves, there are more colors. Here’s the old version:
And here’s the new version:
Histograms are brighter too. The old:
And the new:
Here’s the comparison between old (“2014”) and new (“2025”) colors:
It’s subtle, but it makes a difference. I have to say that increasingly over the past few years, I’ve felt I had to tweak the colors in almost every Wolfram Language image I’ve published. But I’m excited to say that with the new colors that urge has gone away—and I can just use our default colors again!
We first introduced programmatic access to LLMs in Wolfram Language in the middle of 2023, with functions like LLMFunction and LLMSynthesize. At that time, these functions needed access to external LLM services. But with the release last month of LLM Kit (along with Wolfram Notebook Assistant) we’ve made these functions seamlessly available for everyone with a Notebook Assistant + LLM Kit subscription. Once you have your subscription, you can use programmatic LLM functions anywhere and everywhere in Version 14.2 without any further setup.
There are also two new functions: LLMSynthesizeSubmit and ChatSubmit. Both are concerned with letting you get incremental results from LLMs (and, yes, that’s important, at least for now, because LLMs can be quite slow). Like CloudSubmit and URLSubmit, LLMSynthesizeSubmit and ChatSubmit are asynchronous functions: you call them to start something that will call an appropriate handler function whenever a certain specified event occurs.
Both LLMSynthesizeSubmit and ChatSubmit support a whole variety of events. An example is "ContentChunkReceived": an event that occurs when there’s a chunk of content received from the LLM.
Here’s how one can use that:
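A hedged sketch of the pattern (this assumes LLMSynthesizeSubmit uses the same HandlerFunctions mechanism as URLSubmit, and that the handler is simply passed data describing each chunk; the exact option and keys are assumptions, not the actual code):

    c = {};
    task = LLMSynthesizeSubmit["Write two sentences about very large brains",
       HandlerFunctions -> <|"ContentChunkReceived" -> (AppendTo[c, #] &)|>];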
LLMSynthesizeSubmit returns a TaskObject, but then starts to synthesize text in response to the prompt you’ve given, calling the handler function you specified every time a chunk of text comes in. After a few moments, the LLM will have finished its process of synthesizing text, and if you ask for the value of c you’ll see each of the chunks it produced:
Let’s try this again, but now setting up a dynamic display for a string s and then running LLMSynthesizeSubmit to accumulate the synthesized text into this string:
ChatSubmit is the analog of ChatEvaluate, but asynchronous—and you can use it to create a full chat experience, in which content is streaming into your notebook as soon as the LLM (or tools called by the LLM) generate it.
For nearly 20 years we’ve had a streamlined capability to do parallel computation in Wolfram Language, using functions like ParallelMap, ParallelTable and Parallelize. The parallel computation can happen on multiple cores on a single machine, or across many machines on a network. (And, for example, in my own current setup I have 7 machines right now with a total of 204 cores.)
In the past few years, partly responding to the increasing number of cores typically available on individual machines, we’ve been progressively streamlining the way that parallel computation is provisioned. And in Version 14.2 we’ve, yes, parallelized the provisioning of parallel computation. Which means, for example, that my 7 machines all start their parallel kernels in parallel—so that the whole process is now finished in a matter of seconds, rather than potentially taking minutes, as it did before:
Another new feature for parallel computation in Version 14.2 is the ability to automatically parallelize across multiple variables in ParallelTable. ParallelTable has always had a variety of algorithms for optimizing the way it splits up computations for different kernels. Now that’s been extended so that it can deal with multiple variables:
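The syntax is just the ordinary multi-variable Table syntax; what is new is how the work gets split across kernels:

    ParallelTable[PrimePi[10^6 i + j], {i, 1, 8}, {j, 1, 8}]   (* an 8×8 table, computed in parallel *)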
As someone who very regularly does large-scale computations with the Wolfram Language it’s hard to overstate how important its seamless parallel computation capabilities have been to me. Usually I’ll first figure out a computation with Map, Table, etc. Then when I’m ready to do the full version I’ll swap in ParallelMap, ParallelTable, etc. And it’s remarkable how much difference a 200x increase in speed makes (assuming my computation doesn’t have too much communication overhead).
(By the way, talking of communication overhead, two new functions in Version 14.2 are ParallelSelect and ParallelCases, which allow you to select and find cases in lists in parallel, saving communication overhead by sending only final results back to the master kernel. This functionality has actually been available for a while through Parallelize[ … Select[ … ] … ] etc., but it’s streamlined in Version 14.2.)
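A small sketch (the Parallelize form is the longstanding one mentioned above; the ParallelSelect arguments are assumed to mirror Select):

    Parallelize[Select[Range[10^6], PrimeQ]]   (* the longstanding way *)
    ParallelSelect[Range[10^6], PrimeQ]        (* the streamlined Version 14.2 form (assumed signature) *)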
Let’s say we’ve got a video, for example of people walking through a train station. We’ve had the capability for some time to take a single frame of such a video, and find the people in it. But in Version 14.2 we’ve got something new: the capability to track objects that move around between frames of the video.
Let’s start with a video:
We could take an individual frame, and find image bounding boxes. But as of Version 14.2 we can just apply ImageBoundingBoxes to the whole video at once:
Then we can apply the data on bounding boxes to highlight people in the video—using the new HighlightVideo function:
But this just separately indicates where people are in each frame; it doesn’t connect them from one frame to another. In Version 14.2 we’ve added VideoObjectTracking to follow objects between frames:
Now if we use HighlightVideo, different objects will be annotated with different colors:
This picks out all the unique objects identified in the course of the video, and counts them:
“Where’s the dog?”, you might ask. It’s certainly not there for long:
And if we find the first frame where it is supposed to appear, it does seem as if what’s presumably a person on the lower right has been mistaken for a dog:
And, yup, that’s what it thought was a dog:
“What about game theory?”, people have long asked. And, yes, there has been lots of game theory done with the Wolfram Language, and lots of packages written for particular aspects of it. But in Version 14.2 we’re finally introducing built-in system functions for doing game theory (both matrix games and tree games).
Here’s how we specify a (zero-sum) 2-player matrix game:
This defines payoffs when each player takes each action. We can represent this by a dataset:
An alternative is to “plot the game” using MatrixGamePlot:
OK, so how can we “solve” this game? In other words, what action should each player take, with what probability, to maximize their average payoff over many instances of the game? (It’s assumed that in each instance the players simultaneously and independently choose their actions.) A “solution” that maximizes expected payoffs for all players is called a Nash equilibrium. (As a small footnote to history, John Nash was a long-time user of Mathematica and what’s now the Wolfram Language—though many years after he came up with the concept of Nash equilibrium.) Well, now in Version 14.2, FindMatrixGameStrategies computes optimal strategies (AKA Nash equilibria) for matrix games:
This result gives the probabilities with which player 1 should play actions 1 and 2, and the corresponding probabilities for player 2. But what are their expected payoffs? MatrixGamePayoff computes that:
It can get pretty hard to keep track of the different cases in a game, so MatrixGame lets you give whatever labels you want for players and actions:
These labels are then used in visualizations:
What we just showed is actually a standard example game—the “prisoner’s dilemma”. In the Wolfram Language we now have GameTheoryData as a repository of about 50 standard games. Here’s one, specified to have 4 players:
And it’s less trivial to solve this game, but here’s the result—with 27 distinct solutions:
And, yes, the visualizations keep on working, even when there are more players (here we’re showing the 5-player case, indicating the 50th game solution):
It might be worth mentioning that the way we’re solving these kinds of games is by using our latest polynomial equation solving capabilities—and not only are we able to routinely find all possible Nash equilibria (not just a single fixed point), but we’re also able to get exact results:
In addition to matrix games, which model games in which players simultaneously pick their actions just once, we’re also supporting tree games, in which players take turns, producing a tree of possible outcomes, ending with a specified payoff for each of the players. Here’s an example of a very simple tree game:
We can get at least one solution to this game—described by a nested structure that gives the optimal probabilities for each action of each player at each turn:
Things with tree games can get more elaborate. Here’s an example—in which other players sometimes don’t know which branches were taken (as indicated by states joined by dashed lines):
What we’ve got in Version 14.2 represents rather complete coverage of the basic concepts in a typical introductory game theory course. But now, in typical Wolfram Language fashion, it’s all computable and extensible—so you can study more realistic games, and quickly do lots of examples to build intuition.
We’ve so far concentrated on “classic game theory”, notably with the feature (relevant to many current applications) that all action nodes are the result of a different sequence of actions. However, games like tic-tac-toe (that I happened to recently study using multiway graphs) can be simplified by merging equivalent action nodes. Multiple sequences of actions may lead to the same game of tic-tac-toe, as is often the case for iterated games. These graph structures don’t fit into the kind of classic game theory trees we’ve introduced in Version 14.2—though (as my own efforts I think demonstrate) they’re uniquely amenable to analysis with the Wolfram Language.
There are lots of “coincidences” in astronomy—situations where things line up in a particular way. Eclipses are one example. But there are many more. And in Version 14.2 there’s now a general function FindAstroEvent for finding these “coincidences”, technically called syzygies (“sizz-ee-gees”), as well as other “special configurations” of astronomical objects.
A simple example is the September (autumnal) equinox:
Roughly this is when day and night are of equal length. More precisely, it’s when the sun is at one of the two positions in the sky where the plane of the ecliptic (i.e. the orbital plane of the earth around the sun) crosses the celestial equator (i.e. the projection of the earth’s equator)—as we can see here (the ecliptic is the yellow line; the celestial equator the blue one):
As another example, let’s find the time over the next century when Jupiter and Saturn will be closest in the sky:
They’ll get close enough to see their moons together:
There are an incredible number of astronomical configurations that have historically been given special names. There are equinoxes, solstices, equiluxes, culminations, conjunctions, oppositions, quadratures—as well as periapses and apoapses (specialized to perigee, perihelion, periareion, perijove, perikrone, periuranion, periposeideum, etc.). In Version 14.2 we support all these.
So, for example, this gives the next time Triton will be closest to Neptune:
A famous example has to do with the perihelion (closest approach to the Sun) of Mercury. Let’s compute the position of Mercury (as seen from the Sun) at all its perihelia in the first couple of decades of the nineteenth century:
We see that there’s a systematic “advance” (along with some wiggling):
So now let’s quantitatively compute this advance. We start by finding the times for the first perihelia in 1800 and 1900:
Now we compute the angular separation between the positions of Mercury at these times:
Then divide this by the time difference
and convert units:
Famously, 43 arcseconds per century of this is the result of deviations from the inverse square law of gravity introduced by general relativity—and, of course, accounted for by our astronomical computation system. (The rest of the advance is the result of traditional gravitational effects from Venus, Jupiter, Earth, etc.)
More than a decade and a half ago we made the commitment to make the Wolfram Language a full-strength PDE modeling environment. Of course it helped that we could rely on all the other capabilities of the Wolfram Language—and what we’ve been able to produce is immeasurably more valuable because of its synergy with the rest of the system. But over the years, with great effort, we’ve been steadily building up symbolic PDE modeling capabilities across all the standard domains. And at this point I think it’s fair to say that we can handle—at an industrial scale—a large part of the PDE modeling that arises in real-world situations.
But there are always more cases for which we can build in capabilities, and in Version 14.2 we’re adding built-in modeling primitives for static and quasistatic magnetic fields. So, for example, here’s how we can now model an hourglass-shaped magnet. This defines boundary conditions—then solves the equations for the magnetic scalar potential:
We can then take that result, and, for example, immediately plot the magnetic field lines it implies:
Version 14.2 also adds the primitives to deal with slowly varying electric currents, and the magnetic fields they generate. All of this immediately integrates with our other modeling domains like heat transfer, fluid dynamics, acoustics, etc.
There’s much to say about PDE modeling and its applications, and in Version 14.2 we’ve added more than 200 pages of additional textbook-style documentation about PDE modeling, including some research-level examples.
Graphics has always been a strong area for the Wolfram Language, and over the past decade we’ve also built up very strong computational geometry capabilities. Version 14.2 adds some more “icing on the cake”, particularly in connecting graphics to geometry, and connecting geometry to other parts of the system.
As an example, Version 14.2 adds geometry capabilities for more of what were previously just graphics primitives. For example, this is a geometric region formed by filling a Bezier curve:
And we can now do all our usual computational geometry operations on it:
Something like this now works too:
Something else new in Version 14.2 is MoleculeMesh, which lets you build computable geometry from molecular structures. Here’s a graphical rendering of a molecule:
And here now is a geometric mesh corresponding to the molecule:
We can then do computational geometry on this mesh:
Another new feature in Version 14.2 is an additional method for graph drawing that can make use of symmetries. If you make a layered graph from a symmetrical grid, it won’t immediately render in a symmetrical way:
But with the new "SymmetricLayeredEmbedding" graph layout, it will:
Making a great user interface is always a story of continued polishing, and we’ve now been doing that for the notebook interface for nearly four decades. In Version 14.2 there are several notable pieces of polish that have been added. One concerns autocompletion for option values.
We’ve long shown completions for options that have a discrete collection of definite common settings (such as All, Automatic, etc.). In Version 14.2 we’re adding “template completions” that give the structure of settings, and then let you tab through to fill in particular values. In all these years, one of the places I pretty much always find myself going to in the documentation is the settings for FrameLabel. But now autocompletion immediately shows me the structure of these settings:
Also in autocompletion, we’ve added the capability to autocomplete context names, context aliases, and symbols that include contexts. And in all cases, the autocompletion is “fuzzy” in the sense that it’ll trigger not only on characters at the beginning of a name but on ones anywhere in the name—which means that you can just type characters in the name of a symbol, and relevant contexts will appear as autocompletions.
Another small convenience added in Version 14.2 is the ability to drag images from one notebook to any other notebook, or, for that matter, to any other application that can accept dragged images. It’s been possible to drag images from other applications into notebooks, but now you can do it the other way too.
Something else that’s for now specific to macOS is enhanced support for icon preview (as well as Quick Look). So now if you have a folder full of notebooks and you select Icon view, you’ll see a little representation of each notebook as an icon of its content:
Under the hood in Version 14.2 there are also some infrastructural developments that will enable significant new features in subsequent versions. Some of these involve generalized support for dark mode. (Yes, one might initially imagine that dark mode would somehow be trivial, but when you start thinking about all the graphics and interface elements that involve colors, it’s clear it’s not. Though, for example, after significant effort we did recently release dark mode for Wolfram|Alpha.)
So, for example, in Version 14.2 you’ll find the new symbol LightDarkSwitched, which is part of the mechanism for specifying styles that will automatically switch for light and dark modes. And, yes, there is a style option LightDark that will switch modes for notebooks—and which is at least experimentally supported.
Related to light/dark mode is also the notion of theme colors: colors that are defined symbolically and can be switched together. And, yes, there’s an experimental symbol ThemeColor related to these. But the full deployment of this whole mechanism won’t be there until the next version.
Many important pieces of functionality inside the Wolfram Language automatically make use of GPUs when they are available. And already 15 years ago we introduced primitives for low-level GPU programming. But in Version 14.2 we’re beginning the process of making GPU capabilities more readily available as a way to optimize general Wolfram Language usage. The key new construct is GPUArray, which represents an array of data that will (if possible) be stored so as to be immediately and directly accessible to your GPU. (On some systems, it will be stored in separate “GPU memory”; on others, such as modern Macs, it will be stored in shared memory in such a way as to be directly accessible by the GPU.)
In Version 14.2 we’re supporting an initial set of operations that can be performed directly on GPU arrays. The operations available vary slightly from one type of GPU to another. Over time, we expect to use or create many additional GPU libraries that will extend the set of operations that can be performed on GPU arrays.
Here is a random ten-million-element vector stored as a GPU array:
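As a hedged sketch (assuming GPUArray simply wraps an ordinary numeric array, and relying on the automatic CPU fallback described below for any unsupported operation):

    v = RandomReal[1, 10^7];
    gv = GPUArray[v];
    AbsoluteTiming[Total[Sqrt[gv]]]   (* runs on the GPU where supported *)
    AbsoluteTiming[Total[Sqrt[v]]]    (* the ordinary CPU computation, for comparison *)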
The GPU on the Mac on which I am writing this supports the necessary operations to do this purely in its GPU, giving back a GPUArray result:
Here’s the timing:
And here’s the corresponding ordinary (CPU) result:
In this case, the GPUArray result is about a factor of 2 faster. What factor you get will vary with the operations you’re doing, and the particular hardware you’re using. So far, the largest factors I’ve seen are around 10x. But as we build more GPU libraries, I expect this to increase—particularly when what you’re doing involves a lot of compute “inside the GPU”, and not too much memory access.
By the way, if you sprinkle GPUArray in your code it’ll normally never affect the results you get—because operations always default to running on your CPU if they’re not supported on your GPU. (Usually GPUArray will make things faster, but if there are too many “GPU misses” then all the “attempts to move data” may actually slow things down.) It’s worth realizing, though, that GPU computation is still not at all well standardized or uniform. Sometimes there may only be support for vectors, sometimes also matrices—and there may be different data types with different numerical precision supported in different cases.
In addition to all the things we’ve discussed here so far, there are also a host of other “little” new features in Version 14.2. But even though they may be “little” compared to other things we’ve discussed, they’ll be big if you happen to need just that functionality.
For example, there’s MidDate—that computes the midpoint of dates:
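For example (a sketch; the two-date form is an assumption):

    MidDate[DateObject[{2025, 1, 1}], DateObject[{2025, 12, 31}]]   (* the midpoint, around the start of July 2025 *)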
And like almost everything involving dates, MidDate is full of subtleties. Here it’s computing the week 2/3 of the way through this year:
In math, functions like DSolve and SurfaceIntegrate can now deal with symbolic array variables:
SumConvergence now lets one specify the range of summation, and can give conditions that depend on it:
A little convenience that, yes, I asked for, is that DigitCount now lets you specify how many digits altogether you want to assume your number has, so that it appropriately counts leading 0s:
Talking of conveniences, for functions like MaximalBy and TakeLargest we added a new argument that says how to sort elements to determine “the largest”. Here’s the default numerical order
and here’s what happens if we use “symbolic order” instead:
There are always so many details to polish. Like in Version 14.2 there’s an update to MoonPhase and related functions, both new things to ask about, and new methods to compute them:
In another area, in addition to major new import/export formats (particularly to support Tabular) there’s an update to "Markdown" import that gives results in plaintext, and there’s an update to "PDF" import that gives a mixed list of text and images.
And there are lots of other things too, as you can find in the “Summary of New and Improved Features in 14.2”. By the way, it’s worth mentioning that if you’re looking at a particular documentation page for a function, you can always find out what’s new in this version just by pressing show changes:
2025-01-10 06:42:31
Theorem (Wolfram with Mathematica, 2000):
The single axiom ((a•b)•c)•(a•((a•c)•a)) == c is a complete axiom system for Boolean algebra (and is the simplest possible)
For more than a century people had wondered how simple the axioms of logic (Boolean algebra) could be. On January 29, 2000, I found the answer—and made the surprising discovery that they could be about twice as simple as anyone knew. (I also showed that what I found was the simplest possible.)
It was an interesting result—that gave new intuition about just how simple the foundations of things can be, and for example helped inspire my efforts to find a simple underlying theory of physics.
But how did I get the result? Well, I used automated theorem proving (specifically, what’s now FindEquationalProof in Wolfram Language). Automated theorem proving is something that’s been around since at least the 1950s, and its core methods haven’t changed in a long time. But in the rare cases it’s been used in mathematics it’s typically been to confirm things that were already believed to be true. And in fact, to my knowledge, my Boolean algebra axiom is actually the only truly unexpected result that’s ever been found for the first time using automated theorem proving.
But, OK, so we know it’s true. And that’s interesting. But what about the proof? Does the proof, for example, show us why the result is true? Well, actually, in a quarter of a century, nobody (including me) has ever made much headway at all in understanding the proof (which, at least in the form we currently know it, is long and complicated). So is that basically inevitable—say as a consequence of computational irreducibility? Or is there some way—perhaps using modern AI—to “humanize” the proof to a point where one can understand it?
It is, I think, an interesting challenge—that gets at the heart of what one can (and can’t) expect to achieve with formalized mathematics. In what follows, I’ll discuss what I’ve been able to figure out—and how it relates to foundational questions about what mathematics is and how it can be done. And while I think I’ve been able to clarify some of the issues, the core problem is still out there—and I’d like to issue it here as a challenge:
Challenge: Understand the proof of the Theorem
What do I mean by “understand”? Inevitably, “understand” has to be defined in human terms. Something like “so a human can follow and reproduce it”—and, with luck, feel like saying “aha!” at some point, the kind of way they might on hearing a proof of the Pythagorean theorem (or, in logic, something like de Morgan’s law Not[And[p, q]] == Or[Not[p], Not[q]]).
It should be said that it’s certainly not clear that such an understanding would ever be possible. After all, as we’ll discuss, it’s a basic metamathematical fact that out of all possible theorems almost none have short proofs, at least in terms of any particular way of stating the proofs. But what about an “interesting theorem” like the one we’re considering here? Maybe that’s different. Or maybe, at least, there’s some way of building out a “higher-level mathematical narrative” for a theorem like this that will take one through the proof in human-accessible steps.
In principle one could always imagine a somewhat bizarre scenario in which people would just rote learn chunks of the proof, perhaps giving each chunk some name (a bit like how people learned bArbArA and cElArEnt syllogisms in the Middle Ages). And in terms of these chunks there’d presumably then be a “human way” to talk about the proof. But learning the chunks—other than as some kind of recreational or devotional activity—doesn’t seem to make much sense unless there’s metamathematical structure that somehow connects the chunks to “general concepts” that are widely useful elsewhere.
But of course it’s still conceivable that there might be a “big theory” that would lead us to the theorem in an “understandable way”. And that could be a traditional mathematical theory, built up with precise, if potentially very abstract, constructs. But what about something more like a theory in natural science? In which we might treat our automatically generated proof as an object for empirical study—exploring its characteristics, trying to get intuition about it, and ultimately trying to deduce the analog of “natural laws” that give us a “human-level” way of understanding it.
Of course, for many purposes it doesn’t really matter why the theorem is true. All that matters is that it is true, and that one can deduce things on the basis of it. But as one thinks about the future of mathematics, and the future of doing mathematics, it’s interesting to explore to what extent it might or might not ultimately be possible to understand in a human-accessible way the kind of seemingly alien result that the theorem represents.
I first presented a version of the proof on two pages of my 2002 book A New Kind of Science, printing it in 4-point type to make it fit:
Today, generating a very similar proof is a one-liner in Wolfram Language (as we’ll discuss below, the · dot here can be thought of as representing the Nand operation):
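For the commutativity lemma that most of the discussion below focuses on, the kind of one-liner involved might look like this (a sketch, assuming the axiom is available as AxiomaticTheory["WolframAxioms"], with the dot represented by CenterDot):

    proof = FindEquationalProof[CenterDot[a, b] == CenterDot[b, a], AxiomaticTheory["WolframAxioms"]]
    proof["ProofDataset"]   (* the proof, presented as a computable dataset *)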
The proof involves 307 (mostly rather elaborate) steps. And here’s one page of it (out of about 30)—presented in the form of a computable Wolfram Language dataset:
What’s the basic idea of this proof? Essentially it’s to perform a sequence of purely structural symbolic operations that go from our axiom to known axioms of Boolean algebra. And the proof does this by proving a series of lemmas which can be combined to eventually give what we want:
The highlighted “targets” here are the standard Sheffer axioms for Boolean algebra from 1913:
And, yes, even though these are quite short, the intermediate lemmas involved in the proof get quite long—the longest involving 60 symbols (i.e. having LeafCount 60):
It’s as if to get to where it’s going, the proof ends up having to go through the wilds of metamathematical space. And indeed one gets a sense of this if one plots the sizes (i.e. LeafCount) of successive lemmas:
Here’s the distribution of these sizes, showing that while they’re often small, there’s a long tail (note, by the way, that if dot · appears n times in a lemma, the LeafCount will be 2n + 3):
So how are these lemmas related? Here’s a graph of their interdependence (with the size of each dot being proportional to the size of the lemma it represents):
Zooming in on the top we see more detail:
We start from our axiom, then derive a whole sequence of lemmas—as we’ll see later, always combining two lemmas to create a new one. (And, yes, we could equally well call these things theorems—but we generate so many of them it seems more natural to call them “lemmas”.)
So, OK, we’ve got a complicated proof. But how can we check that it’s correct? Well, from the symbolic representation of the proof in the Wolfram Language we can immediately generate a “proof function” that in effect contains executable versions of all the lemmas—implemented using simple structural operations:
And when you run this function, it applies all these lemmas and checks that the result comes out right:
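In terms of a proof object like the one sketched above, the check might look like this (assuming the standard ProofObject properties):

    pf = proof["ProofFunction"];
    pf[]   (* applies all the lemma-level replacements and returns True if everything checks out *)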
And, yes, this is basically what one would do in a proof assistant system (like Lean or Metamath)—except that here the steps in the proof were generated purely automatically, without any human guidance (or effort). And, by the way, the fact that we can readily translate our symbolic proof representation into a function that we can run provides an operational manifestation of the equivalence between proofs and programs.
But let’s look back at our lemma-interdependence “proof graph”. One notable feature is that we see several nodes with high out-degree—corresponding to what we can think of as “pivotal lemmas” from which many other lemmas end up directly being proved. So here’s a list of the “most pivotal” lemmas in our proof:
Or, more graphically, here are the results for all lemmas that occur:
So what are the “pivotal lemmas”? a · b = b · a we readily recognize as commutativity. But the others—despite their comparative simplicity—don’t seem to correspond to things that have specifically shown up before in the mathematical literature (or, as we’ll discuss later, that’s at least what the current generation of LLMs tell us).
But looking at our proof graph something we can conclude is that a large fraction of the “heavy lifting” needed for the whole proof has already happened by the time we can prove a · b = b · a. So, for the sake of avoiding at least some of the hairy detail in the full proof, in most of what follows we’ll concentrate on the proof of a · b = b · a—which FindEquationalProof tells us we can accomplish in 104 steps, with a proof graph of the form
with the sizes of successive lemmas (in what is basically a breadth-first traversal of the proof graph) being:
It’s already obvious from the previous section that the proof as we currently know it is long, complicated, and fiddly—and in many ways reminiscent of something at a “machine-code” level. But to get a grounded sense of what’s going on in the proof, it’s useful to dive into the details—even if, yes, they can be seriously hard to wrap one’s head around.
At a fundamental level, the way the proof—say of a · b = b · a—works is by starting from our axiom, and then progressively deducing new lemmas from pairs of existing lemmas. In the simplest case, that deduction works by straightforward symbolic substitution. So, for example, let’s say we have the lemmas
and
Then it turns out that from these lemmas we can deduce:
Or, in other words, knowing that the first two lemmas hold for any a gives us enough information about · that the third lemma must inevitably also hold. So how do we derive this?
Our lemmas in effect define two-way equivalences: their left-hand sides are defined as equal to their right-hand sides, which means that if we see an expression that (structurally) matches one side of a lemma, we can always replace it by the other side of the lemma. And to implement this, we can write our second lemma explicitly as a rule—where to avoid confusion we’re using x rather than a:
But if we now look at our first lemma, we see that there’s part of it (indicated with a frame) that matches the left-hand side of our rule:
If we replace this part (which is at position {2,2}) using our rule we then get
which is precisely the lemma we wanted to deduce.
We can summarize what happened here as a fragment of our proof graph—in which a “substitution event” node takes our first two lemmas as input, and “outputs” our final lemma:
As always, the symbolic expressions we’re working with here can be represented as trees:
The substitution event then corresponds to a tree rewriting:
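Just to illustrate the mechanics with purely hypothetical lemmas (not the ones in the actual proof), here is how a lemma, written as a rule, gets applied at a particular position of another lemma:

    rule = CenterDot[x_, x_] :> x;                             (* a hypothetical lemma a·a = a, written as a rule *)
    lemma = CenterDot[CenterDot[a, a], b] == CenterDot[b, a];  (* another hypothetical lemma *)
    ReplaceAt[lemma, rule, {1, 1}]                             (* gives a·b == b·a *)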
The essence of automated theorem proving is to find a particular sequence of substitutions etc. that get us from whatever axioms or lemmas we’re starting with, to whatever lemmas or theorems we want to reach. Or in effect to find a suitable “path” through the multiway graph of all possible substitutions etc. that can be made.
So, for example, in the particular case we’re considering here, this is the graph that represents all possible transformations that can occur through a single substitution event:
The particular transformation (or “path”) we’ve used to prove a · a = a · ((a · a) · a) is highlighted. But as we can see, there are many other possible lemmas that can be generated, or in other words that can be proved from the two lemmas we’ve given as input. Put another way, we can think of our input lemmas as implying or entailing all the other lemmas shown here. And, by analogy to the concept of a light cone in physics, we can view the collection of everything entailed by given lemmas or given events as the (future) “entailment cone” of those lemmas or events. A proof that reaches a particular lemma is then effectively a path in this entailment cone—analogous in physics to a world line that reaches a particular spacetime point.
If we continue building out the entailment cone from our original lemmas, then after two (substitution) events we get:
There are 21 lemmas generated here. But it turns out that beyond the lemma we already discussed there are only three (highlighted) that appear in the proof we are studying:
And indeed the main algorithmic challenge of theorem proving is to figure out which lemmas to generate in order to get a path to the theorem one’s trying to prove. And, yes, as we’ll discuss later, there are typically many paths that will work, and different algorithms will yield different paths and therefore different proofs.
But, OK, seeing how new lemmas can be derived from old by substitution is already quite complicated. But actually there’s something even more complicated we need to discuss: deriving lemmas not only by substitution but also by what we’ve called bisubstitution.
We can think of both substitution and bisubstitution as turning one lemma X == Y into a transformation rule (either X → Y or Y → X), and then applying this rule to another lemma, to derive a new lemma. In ordinary substitution, the left-hand side of the rule directly matches (in a Wolfram Language pattern-matching sense) a subexpression in the lemma we’re transforming. But the key point is that all the variables that appear in both our lemmas are really “pattern variables” (x_ etc. in Wolfram Language). So that means there’s another way that one lemma can transform another, in which in effect replacements are made not only in the lemma being transformed, but also in the lemma that’s doing the transforming.
The net effect, though, is still to take two lemmas and derive another, as in:
But in tracing through the details of our proof, we need to distinguish “substitution events” (shown yellowish) from “bisubstitution” ones (shown reddish). (In FindEquationalProof in Wolfram Language, lemmas produced by ordinary substitution are called “substitution lemmas”, while lemmas produced by bisubstitution are called “critical pair lemmas”.)
OK, so how does bisubstitution work? Let’s look at an example. We’re going to be transforming the lemma
using the lemma (which in this case happens to be our original axiom)
to derive the new lemma:
We start by creating a rule from the second lemma. In this case, the rule we need happens to be reversed relative to the way we wrote the lemma, and this means that (in the canonical form we’re using) it’s convenient to rename the variables that appear:
To do our bisubstitution we’re going to apply this rule to a subterm of our first lemma. We can write that first lemma with explicit pattern variables:
As always, the particular names of those variables don’t matter. And to avoid confusion, we’re going to rename them:
Now look at this subterm of this lemma (which is part {2,1,1,2} of the expression):
It turns out that with appropriate bindings for pattern variables this can be matched (or “unified”) with the left-hand side of our rule. This provides a way to find such bindings:
(Note that in these bindings things like c_ stand for the literal expression c_, not for whatever the ordinary Wolfram Language pattern c_ would match.)
Now if we apply the bindings we’ve found to the left-hand side of our rule
and to the subterm we picked out from our lemma
we see that we get the same expression. Which means that with these bindings the subterm matches the left-hand side of our rule, and we can therefore replace this subterm with the right-hand side of the rule. To see all this in operation, we first apply the bindings we’ve found to the lemma we’re going to transform (and, as it happens, the binding for y_ is the only one that matters here):
Now we take this form and apply the rule at the position of the subterm we identified:
Renaming variables
we now finally get exactly the lemma that we were trying to derive:
And, yes, getting here was a pretty complicated process. But with the symbolic character of our lemmas, it’s one that is inevitably possible, and so can be used in our proof. And in the end, out of the 101 lemmas used in the proof, 47 were derived by ordinary substitution, while 54 were derived by bisubstitution.
And indeed the first few steps of the proof turn out to use only bisubstitution. An example is the first step—which effectively applies the original axiom to itself using bisubstitution:
And, yes, even this very first step is pretty difficult to follow.
If we start from the original axiom, there are 16 lemmas that can be derived purely by a single ordinary substitution (effectively of the axiom into itself)—resulting in the following entailment cone:
As it happens, though, none of the 16 new lemmas here actually get used in our proof. On the other hand, in the bisubstitution entailment cone
there are 24 new lemmas, and 4 of them get used in the proof—as we can see from the first level of the proof graph (here rotated for easier rendering):
At the next level of the entailment cone from ordinary substitution, there are 5062 new lemmas—none of which get used in the proof. But of the 31431 new lemmas in the (pure) bisubstitution entailment cone, 13 do get used:
At the next level, lemmas generated by ordinary substitution also start to get used:
Here’s another rendering of these first few levels of the proof graph:
Going to another couple of levels we’re starting to see quite a few independent chains of lemmas developing
which eventually join up when we assemble the whole proof graph:
A notable feature of this proof graph is that it has more bisubstitution events at the top, and more ordinary substitution events at the bottom. So why is that? Essentially it seems to be because bisubstitution events tend to produce larger lemmas, and ordinary substitution events tend to produce smaller ones—as we can see if we plot input and output lemma sizes for all events in the proof:
So in effect what seems to be happening is that the proof first has to “spread out in metamathematical space”, using bisubstitution to generate large lemmas “far out in metamathematical space”. Then later the proof has to “corral things back in”, using ordinary substitution to generate smaller lemmas. And for example, at the very end, it’s a substitution event that yields the final theorem we’re trying to prove:
And earlier in the graph, there’s a similar “collapse” to a small (and rather pivotal) lemma:
As the plot above indicates, ordinary substitution can lead to large lemmas, and indeed bisubstitution can also lead to smaller ones, as in
or slightly more dramatically:
But, OK, so this is some of what’s going on at a “machine-code” level inside the proof we have. Of course, given our axiom and the operations of substitution and bisubstitution there are inevitably a huge number of different possible proofs that could be given. The particular proof we’re considering is what the Wolfram Language FindEquationalProof gives. (In the Appendix, we’ll also look at results from some other automated theorem proving systems. The results will be very comparable, if usually a little lengthier.)
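For orientation, the kind of call involved looks roughly like this (a sketch rather than the exact invocation used to generate the results here, with the axiom written out using CenterDot for ·):

(* the single axiom ((a·b)·c)·(a·((a·c)·a)) == c *)
axiom = ForAll[{a, b, c},
   CenterDot[CenterDot[CenterDot[a, b], c],
     CenterDot[a, CenterDot[CenterDot[a, c], a]]] == c];

(* ask for a proof that the axiom implies commutativity of · *)
proof = FindEquationalProof[CenterDot[p, q] == CenterDot[q, p], {axiom}];

(* the resulting ProofObject can then be queried, e.g. for its graph of lemmas *)
proof["ProofGraph"]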
We won’t discuss the detailed (and rather elaborate) algorithms inside FindEquationalProof. But fundamentally what they’re doing is to try constructing certain lemmas, then to find sequences of lemmas that eventually form a “path” to what we’re trying to prove. And as some indication of what’s involved in this, here’s a plot of the number of “candidate lemmas” that are being maintained as possible when different lemmas in the proof are generated:
And, yes, for a while there’s roughly exponential growth, leveling off at just over a million when we get to the “pulling everything together” stage of the proof.
In what we’ve done so far, we’ve viewed our proof as working by starting from an axiom, then progressively building up lemmas, until eventually we get to the theorem we want. But there’s an alternative view that’s in some ways useful in getting a more direct, “mechanical” intuition about what’s going on in the proof.
Let’s say we’re trying to prove that our axiom implies that p · q = q · p. Well, then there must be some way to start from the expression p · q and just keep on judiciously applying the axiom until eventually we get to the expression q · p. And, yes, the number of axiom application steps required might be very large. But ultimately, if it’s true that the axiom implies p · q = q · p there must be a path that gets from p · q to q · p.
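To make “applying the axiom” feel a little more mechanical, here’s a minimal sketch of applying a one-directional form of the axiom at every possible position in an expression. (This covers only ordinary, forward substitution; an actual path search would also need the reversed rule and bisubstitution—and it isn’t how FindEquationalProof works internally.)

(* the axiom, used left-to-right as a rewrite rule *)
axiomRule = CenterDot[CenterDot[CenterDot[a_, b_], c_],
    CenterDot[a_, CenterDot[CenterDot[a_, c_], a_]]] -> c;

(* apply a rule at the top level, and at every matching subexpression position *)
applyEverywhere[expr_, rule_Rule] := DeleteDuplicates@Join[
   ReplaceList[expr, rule],
   ReplacePart[expr, # -> Replace[Extract[expr, #], rule]] & /@
    Position[expr, First[rule], {1, Infinity}, Heads -> False]];

(* a small test expression that contains one instance of the axiom's left-hand side *)
test = CenterDot[CenterDot[CenterDot[CenterDot[p, q], r],
     CenterDot[p, CenterDot[CenterDot[p, r], p]]], s];

applyEverywhere[test, axiomRule]
(* --> {CenterDot[r, s]} *)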
But before considering the case of our full proof, let’s start with something simpler. Let’s assume that we’ve already established the lemmas:
Then we can treat them as axioms, and ask a question like whether they imply the lemma
or, in our current approach, whether they can be used to form a path from a · a to a · (a · (a · a)).
Well, it’s not too hard to see that in fact there is such a path. Apply our second lemma to a · a to get:
But now this subterm
matches the left-hand of the first lemma, so that it can be replaced by the right-hand side of that lemma (i.e. by a · (a · a)), giving in the end the desired a · (a · (a · a)).
So now we can summarize this process as:
In what follows, it’ll be convenient to label lemmas. We’ll call our original axiom A1, we’ll call our successive lemmas generated by ordinary substitution Sn and the ones generated by bisubstitution Bn:
In our proof we’ll also use → and ← to indicate whether we’re going to use the lemma (say X == Y) in the direction X → Y or the “reverse direction” X ← Y. And with this labeling, the proof we just gave (which is for the lemma S23) becomes:
Each step here is a pure substitution, and requires no replacement in the rule (i.e. “axiom”) being used. But proofs like this can also be done with bisubstitution, where replacements are applied to the rule to get it in a form where it can directly be applied to transform an expression:
OK, so how about the first lemma in our full proof? Here’s a proof that its left-hand side can be transformed to its right-hand side just by judiciously applying the original axiom:
Here’s a corresponding proof for the second lemma:
Both these involve bisubstitution. Here’s a proof of the first lemma derived purely by ordinary substitution:
This proof is using not only the original axiom but also the lemma B5. Meanwhile, B5 can be proved using the original axiom together with B2:
But now, inserting the proof we just gave above for B2, we can give a proof of B5 just in terms of the original axiom:
And recursively continuing this unrolling process, we can then prove S1 purely using the original axiom:
What about the whole proof? Well, at the very end we have:
If we “unroll” one step we have
and after 2 steps:
In principle we could go on with this unrolling, in effect recursively replacing each rule by the sequence of transformations that represents its proof. Typically this process will, however, generate exponentially longer proof sequences. But say for lemma S5
the result is still very easily manageable:
We can summarize this result by in effect plotting the sizes of the intermediate expressions involved—and indicating what part of each expression is replaced at each step (with →, as above, indicating “forward” use of the axiom A1, and ← indicating “backward” use of A1):
For lemma B33
the unrolled proof is now 30 steps long
while for lemma S11
the unrolled proof is 88 steps long:
But here there is a new subtlety. Doing a direct substitution of the “proof paths” for the lemmas used to prove S11 in our original proof gives a proof of length 104:
But this proof turns out to be repetitive, with the whole gray section going from one copy to another of:
As an example of a larger proof, we can consider lemma B47:
And despite the simplicity of this lemma, our proof for it is 1008 steps long:
If we don’t remove repetitive sections, it’s 6805 steps:
Can we unroll the whole proof of a · b = b · a? We can get closer by considering lemma S36:
Its proof is 27105 steps long:
The distribution of expression sizes follows a roughly exponential distribution, with a maximum of 20107:
Plotting the expression sizes on a log scale one gets:
And what stands out most here is a kind of recursive structure—which is the result of long sequences that basically represent the analog of “subroutine calls” that go back and repeatedly prove lemmas that are needed.
OK, so what about the whole proof of a · b = b · a? Yes, it can be unrolled—in terms of 83,314 applications of the original axiom. The sequence of expression sizes is:
Or on a log scale:
The distribution of expression sizes now shows clear deviation from being exponential:
The maximum is 63245, which occurs just 81 steps after the exact midpoint of the proof. In other words, in the middle, the proof has wandered incredibly far out in metamathematical space (there are altogether CatalanNumber[63245] ≈ 10^38070 possible expressions of the size it reaches).
The proof returns to small expressions just a few times; here are all the cases in which the size is below 10:
So, yes, it is possible to completely unroll the proof into a sequence of applications of the original axiom. But if one does this, it inevitably involves repeating lots of work. Being able to use intermediate lemmas in effect lets one “share common subparts” in the proof. So that one ends up with just 104 “rule applications”, rather than 83314. Not that it’s easy to understand those 104 steps…
Looking at our proof—either in its original “lemma” form, or in its “unrolled” form—the most striking aspect of it is how complicated (and incomprehensible) it seems to be. But one might wonder whether much of that complexity is just the result of not “using the right notation”. In the end, we’ve got a huge number of expressions written in terms of · operations that we can interpret as Nand (or Nor). And maybe it’s a little like seeing the operation of a microprocessor down at the level of individual gates implementing Nands or Nors. And might there perhaps be an analog of a higher-level representation—with higher-level operations (even like arithmetic) that are more accessible to us humans?
It perhaps doesn’t help that Nand itself is a rather non-human construct. For example, not a single natural human language seems to have a word for Nand. But there are combinations of Nands that have more familiar interpretations:
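Here, for instance, is one way to check a few such combinations directly in the Wolfram Language (my own illustration): Not, And and Or each written purely in terms of Nand:

TautologyQ[Equivalent[Nand[p, p], Not[p]], {p}]
TautologyQ[Equivalent[Nand[Nand[p, q], Nand[p, q]], And[p, q]], {p, q}]
TautologyQ[Equivalent[Nand[Nand[p, p], Nand[q, q]], Or[p, q]], {p, q}]
(* --> True in each case *)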
But what combinations actually occur in our proof? Here are the most common subexpressions that appear in lemmas in the proof:
And, yes, we could give the most common of these special names. But it wouldn’t really help in “compressing” the proof—or making it easier to understand.
What about “upgrading” our “laws of inference”, i.e. the way that we can derive new lemmas from old? Perhaps instead of substitution and bisubstitution, which both take two lemmas and produce one more, we could set up more elaborate “tactics” that for example take in more input lemmas. We’ve seen that if we completely unroll the proof, it gets much longer. So perhaps there is a “higher-order” setup that for example dramatically shortens the proof.
One way one might identify this is by seeing commonly repeating structures in the subgraphs that lead to lemmas. But in fact these subgraphs are quite diverse:
A typical feature of human-written mathematical proofs is that they’re “anchored” by famous theorems or lemmas. They may have fiddly technical pieces. But usually there’s a backbone of “theorems people know”.
We have the impression that the proof we’re discussing here “spends most of its time wandering around the wilds of metamathematical space”. But perhaps it visits waypoints that are somehow recognizable, or at least should be. Or in other words, perhaps out in the metamathematical space of lemmas there are ones that are somehow sufficiently popular that they’re worth giving names to, and learning—and can then be used as “reference points” in terms of which our proof becomes simpler and more human accessible.
It’s a story very much like what happens with human language. There are things out there in the world, but when there’s a certain category of them that are somehow common or important enough, we make a word for them in our language, which we can then use to “compactly” refer to them. (It’s again the same story when it comes to computational language, and in particular the Wolfram Language, except that in that case it’s been my personal responsibility to come up with the appropriate definitions and names for functions to represent “common lumps of computation”.)
But, OK, so what are the “popular lemmas” of Nand proofs? One way to explore this is to enumerate statements that are “true about Nand”—then to look at proofs of these statements (say found with FindEquationalProof from our axiom) and see what lemmas show up frequently in them.
Enumerating statements “true about Nand”, starting from the smallest, we get
where we have highlighted statements from this list that appear as lemmas in our proof.
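As a rough sketch of what such an enumeration involves (simplified relative to what’s actually done here—two variables only, very small expressions, and logical equivalence under the Nand interpretation as the test of “truth”):

(* all ·-expressions in p and q with up to 3 leaves, interpreting · as Nand *)
exprs[n_] := Flatten[Groupings[#, Nand -> 2] & /@ Tuples[{p, q}, n]];
candidates = DeleteDuplicates[Join[{p, q}, exprs[2], exprs[3]]];

(* keep the equalities between them that hold for every truth assignment *)
nandTruths = Select[Subsets[candidates, {2}],
   TautologyQ[Equivalent @@ #, {p, q}] &];
Length[nandTruths]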
Proving each of these statements from our original axiom, here are the lengths of proofs we find (for all 1341 distinct theorems with up to LeafCount 4 on each side):
A histogram shows that it’s basically a bimodal distribution
with the smallest “long-proof” theorem being:
In aggregate, all these proofs use about 200,000 lemmas. But only about 1200 of these are distinct. And we can plot which lemmas are used in which proofs—and we see that there are indeed many lemmas that are used across wide ranges of proofs, while there are a few others that are “special” to each proof (the diagonal stripe is associated with lemmas close to the statement being proved):
If we rank all distinct lemmas from most frequently to least frequently used, we get the following distribution of lemma usage frequencies across all our proofs:
It turns out that there is a “common core” of 49 lemmas that are used in every single one of the proofs. So what are these lemmas? Here’s a plot of the usage frequency of lemmas against their size—with the “common ones” populating the top line:
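Operationally, tallies like these are easy to set up (a sketch; proofLemmas here is a hypothetical name for the list of lemma lists, one per proof):

(* proofLemmas: for each proved theorem, the list of lemmas its proof uses (hypothetical data) *)
usageCounts = ReverseSort[Counts[Join @@ proofLemmas]];

(* lemmas that appear in every single proof -- the "common core" *)
commonCore = Keys@Select[
    Counts[Join @@ (DeleteDuplicates /@ proofLemmas)],
    # == Length[proofLemmas] &];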
And at first this might seem surprising. We might have expected that short lemmas would be the most frequent, but instead we’re seeing long lemmas that always appear, the very longest being:
So why is this? Basically it’s that these long lemmas are being used at the beginning of every proof. They’re the result of applying bisubstitution to the original axiom, and in some sense they seem to be laying down a kind of net in metamathematical space that then allows more diverse—and smaller—lemmas to be derived.
But how are these “common core” popular lemmas distributed within proofs? Here are a few examples:
And what we see is that while, yes, the common core lemmas are always at the beginning, they don’t seem to have a uniform way of “plugging into” the rest of the proof. And it doesn’t, for example, seem as if there’s just some small set of (perhaps simple) “waypoint” lemmas that one can introduce that will typically shorten these proofs.
If one effectively allows all the common core lemmas to be used as axioms, then inevitably proofs will be shortened; for example, the proof of a · b = b · a—which only ends up using 5 of the common core lemmas—is now shortened to 51 lemmas:
It doesn’t seem to become easier to understand, though. And if it’s unrolled, it’s still 5013 steps.
Still, one can ask what happens if one just introduces particular “recognizable” lemmas as additional axioms. For example, if we include “commutativity” a · b = b · a then we find that, yes, we do manage to reduce the lengths of some proofs, but certainly not all:
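The setup for this kind of experiment is straightforward (a sketch, with axiom as in the earlier FindEquationalProof sketch, thm standing for whichever theorem is being tested, and the length of the "ProofDataset" property used as a rough measure of proof size):

(* prove thm from the original axiom alone, and then with commutativity added as an extra axiom *)
withoutComm = FindEquationalProof[thm, {axiom}];
withComm = FindEquationalProof[thm,
   {axiom, ForAll[{a, b}, CenterDot[a, b] == CenterDot[b, a]]}];

(* compare the number of proof steps *)
Length /@ {withoutComm["ProofDataset"], withComm["ProofDataset"]}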
Are there any other “pivotal” lemmas we could add? In particular, what about lemmas that can help with the length-200 or more proofs? It turns out that all of these proofs involve the lemma:
So what happens if we add this? Well, it definitely reduces proof lengths:
And sometimes it even seems like it brings proofs into “human range”. For example, a proof of
from our original axiom has length 56. Adding in commutativity reduces it to length 18. And adding our third lemma reduces it to just length 9—and makes it not even depend directly on the original axiom:
But despite the apparent simplicity here, the steps involved—particularly when bisubstitution is used—are remarkably hard to follow. (Note the use of a = a as a kind of “implicit axiom”—something that has actually also appeared, without comment, in many of our other proofs.)
The proof that we’ve been studying can be seen in some ways as a rather arbitrary artifact. It’s the output of FindEquationalProof, with all its specific detailed internal algorithms and choices. In the Appendix, we’ll see that other automated theorem proving systems give very similar results. But we still might wonder whether actually the complexity of the proof as we’ve been studying it is just a consequence of the details of our automated theorem proving—and that in fact there’s a much shorter (and perhaps easier to understand) proof that exists.
One approach we could take—reminiscent of higher category theory—is to think about just simplifying the proof we have, effectively using proof-to-proof transformations. And, yes, this is technically difficult, though it doesn’t seem impossible. But what if there are “holes” in proof space? Then a “continuous deformation” of one proof into another will get stuck, and even if there is a much shorter proof, we’re liable to get “topologically stuck” before we find it.
One way to be sure we’re getting the shortest proof of a particular lemma is to explicitly find the first place that lemma appears in the (future) entailment cone of our original axiom. For example, as we saw above, a single substitution event leads to the entailment cone:
Every lemma produced here is, by construction, in principle derivable by a proof involving a single substitution event. But if we actually use FindEquationalProof to prove these lemmas, the proofs we get mostly involve 2 events (and in one case 4):
If we take another step in the entailment cone, we get a total of 5062 lemmas. From the way we generated them, we know that all these lemmas can in principle be reached by proofs of length 2. But if we run FindEquationalProof on them, we find a distribution of proof lengths:
And, yes, there is one lemma (with LeafCount 183) that is found only by a proof of length 15. But most often the proof length is 4—or about double what it could be.
If we generate the entailment cone for lemmas using bisubstitution rather than just ordinary substitution, there are slightly more cases where FindEquationalProof does worse at getting minimal proofs.
For example, the lemma
and 3 others can be generated by a single bisubstitution from the original axiom, but FindEquationalProof gives only proofs of length 4 for all of these.
What about unrolled proofs, in which one can generate an entailment cone by starting from a particular expression, and then applying the original axiom in all possible ways? For example, let’s say we start with:
Then applying bisubstitution with the original axiom once in all possible ways gives:
Applying bisubstitution a second time gives a larger entailment cone:
But now it turns out that—as indicated—one of the expressions in this cone is:
So this shows that the lemma
can in principle be reached with just two steps of “unrolled” proof:
And in this particular case, if we use FindEquationalProof and then unroll the resulting proof we also get a proof of length 3—but it goes through a different intermediate expression:
As it happens, this intermediate expression is also reached in the entailment cone that we get by starting from our “output” expression and then applying two bisubstitutions:
We can think of logic (or Boolean algebra) as being associated with a certain collection of theorems. And what our axiom does is to provide something from which all theorems of logic (and nothing but theorems of logic) can be derived. At some level, we can think of it as just being about symbolic expressions. But in our effort to understand what’s going on—say with our proof—it’s sometimes useful to ask how we can “concretely” interpret these expressions.
For example, we might ask what the · operator actually is. And what kinds of things can our symbolic variables be? In effect we’re asking for what in model theory are called “models” of our axiom system. And in aligning with logic the most obvious model to discuss is one in which variables can be True or False, and the · represents either the logical operator Nand or the logical operator Nor.
The truth table, say for Nand, is:
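In Wolfram Language terms, one simple way to display it is:

BooleanTable[Nand[p, q], {p}, {q}] // Grid
(* rows correspond to p = True, False; columns to q = True, False *)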
And as expected, with this model for ·, we can confirm that our original axiom holds:
In general, though, our original axiom allows two size-2 models (that we can interpret as Nand and Nor):
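One can find such models by brute force; here’s a minimal sketch, with the elements labeled 1 and 2 and the axiom written out explicitly:

(* all 16 possible "multiplication tables" for · on a 2-element set *)
tables = Tuples[{1, 2}, {2, 2}];

(* does a given table satisfy ((a·b)·c)·(a·((a·c)·a)) == c for all a, b, c? *)
cd[m_][x_, y_] := m[[x, y]];
satisfiesAxiom[m_] := And @@ Flatten@Table[
    cd[m][cd[m][cd[m][a, b], c], cd[m][a, cd[m][cd[m][a, c], a]]] == c,
    {a, 2}, {b, 2}, {c, 2}];

Select[tables, satisfiesAxiom]
(* --> the two tables that correspond to Nand and Nor *)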
It allows no size-3 models, and in fact in general allows only models of size 2^n; for example, for size 4 its models are:
So what about a · b = b · a? What models does it allow? For size 2, it’s all 8 possible models with symmetric “multiplication tables”:
But the crucial point is that the 2 models for our original axiom system are part of these. In other words, at least for size-2 models, satisfying the original axiom system implies satisfying a · b = b · a.
And indeed any lemma derived from our axiom system must allow the models associated with our original axiom system. But it may also allow more—and sometimes many more. So here’s a map of our proof, showing how many models (out of 16 possible) each lemma allows:
Here are the results for size-3 models:
And, once again, these look complicated. We can think of models as defining—in some sense—what lemmas are “about”. So, for example, our original axiom is “about” Nand and Nor. The lemma a · b = b · a is “about” symmetric functions. And so on. And we might have hoped that we could gain some understanding of our proof by looking at how different lemmas that occur in it “sculpt” what is being talked about. But in fact we just seem to end up with complicated descriptions of sets that don’t seem to have any obvious relationship with each other.
If there’s one thing that stands out about our proof—and the analysis we’ve given of it here—it’s how fiddly and “in the weeds” it seems to be. But is that because we’re missing some big picture? Is there actually a more abstract way of discussing things, that gets to our result without having to go through all the details?
In the history of mathematics many of the most important themes have been precisely about finding such higher-level abstractions. We could start from the explicit symbolic axioms
or even
and start building up theorems much as we’ve done here. Or we could recognize that these are axioms for group theory, and then start using the abstract ideas of group theory to derive our theorems.
So is there some higher-level version of what we’re discussing here? Remember that the issue is not about the overall structure of Boolean algebra; rather it’s about the more metamathematical question of how one can prove that all of Boolean algebra can be generated from the axiom:
In the last few sections we’ve tried a few semi-empirical approaches to finding higher-level representations. But they haven’t gotten very far. And to get further we’re probably going to need a serious new idea.
And, if history is a guide, we’re going to need to come up with an abstraction that somehow “goes outside of the system” before “coming back”. It’s like trying to figure out the real roots of a cubic equation, and realizing that the best way to do this is to introduce complex numbers, even though the imaginary parts will cancel at the end.
In the direct exploration of our proof, it feels as if the intermediate lemmas we generate “wander off into the wilds of metamathematical space” before coming back to establish our final result. And if we were using a higher-level abstraction, we’d instead be “wandering off” into the space of that abstraction. But what we might hope is that—at least with the concepts we would use in discussing that abstraction—the path that would be involved would be “short enough to be accessible to human understanding”.
Will we be able to find such an abstraction? It’s a subtle question. Because in effect it asks whether we can reduce the computational effort needed for the proof—or, in other words, whether we can find a pocket of computational reducibility in what in general will be a computationally irreducible process. But it’s not a question that can really be answered just for our specific proof on its own. After all, our “abstraction” could in principle just involve introducing a primitive that represents our whole proof or a large part of it. But to make it what we can think of as a real abstraction we need something that spans many different specific examples—and, in our case, likely many axiomatic systems or symbolic proofs.
So is such an abstraction possible? In the history of mathematics the experience has been that after enough time (often measured in centuries) has passed, abstractions tend to be found. But at some level this has been self-fulfilling. Because the areas that are considered to have remained “interesting for mathematics” tend to be just those where general abstractions have in fact been found.
In ruliology, though, the typical experience has been different. Because there it’s been routine to sample the computational universe of possible simple programs and encounter computational irreducibility. In the end it’s still inevitable that among the computational irreducibility there must be pockets of computational reducibility. But the issue is that these pockets of computational reducibility may not involve features of our system that we care about.
So is a proof of the kind we’re discussing here more like ruliology, or more like “typical mathematics”? Insofar as it’s a mathematical-style proof of a mathematical statement it feels more like typical mathematics. But insofar as it’s something found by the computational process of automated theorem proving it perhaps seems more like ruliology.
But what might a higher-level abstraction for it look like? Figuring that out is probably tantamount to finding the abstraction. But perhaps one can at least expect that in some ways it will be metamathematical, and more about the structure and character of proofs than about their content. Perhaps it will be something related to the framework of higher category theory, or some form of meta-algebra. But as of now, we really don’t know—and we can’t even say that such an abstraction with any degree of generality is possible.
The unexpected success of LLMs in language generation and related tasks has led to the idea that perhaps eventually systems like LLMs will be able to “do everything”—including for example math. We already know—not least thanks to Wolfram Language—that lots of math can be done computationally. But often the computations are hard—and, as in the example of the proof we’re discussing here, incomprehensible to humans. So the question really is: can LLMs “humanize” what has to be done in math, turning everything into a human-accessible narrative? And here our proof seems like an excellent—if challenging—test case.
But what happens if we just ask a current LLM to generate the proof from scratch? It’s not a good picture. Very often the LLM will eagerly generate a proof, but it’ll be completely wrong, often with the same kind of mistakes that a student somewhat out of their depth might make. Here’s a typical response, where an LLM simply assumes that the · operator is associative (which it isn’t in Boolean algebra), then produces a proof that at first blush looks at least vaguely plausible, but is in fact completely wrong:
Coming up with an explanation for what went wrong is basically an exercise in “LLM psychology”. But in a first approximation one might say the following. LLMs are trained to “fill in what’s typical”, where “typical” is defined by what appears in the training set. But (absent some recent Wolfram Language and Wolfram|Alpha based technology of ours) what’s been available as a training set has been human-generated mathematical texts, where, yes, operators are often associative, and typical proofs are fairly short. And in the “psychology of LLMs” an LLM is much more likely to “do what’s typical” than to “rigorously follow the rules”.
If you press the LLM harder, then it might just “abdicate”, and suggest using the Wolfram Language as a tool to generate the proof. So what happens if we do that, then feed the finished proof to the LLM and ask it to explain? Well, typically it just does what LLMs do so well, and writes an essay:
So, yes, it does fine in “generally framing the problem”. But not on the details. And if you press it for details, it’ll typically eventually just start parroting what it was given as input.
How else might we try to get the LLM to help? One thing I’ve certainly wondered is how the lemmas in the proof relate to known theorems—perhaps in quite different areas of mathematics. It’s something one might imagine one would be able to answer by searching the literature of mathematics. But, for example, textual search won’t be sufficient: it has to be some form of semantic search based on the meaning or symbolic structure of lemmas, not their (fairly arbitrary) textual presentation. A vector database might be all one needs, but one can certainly ask an LLM too:
It’s not extremely helpful, though, charmingly, it does correctly identify the source of our original axiom. I’ve tried similar queries for our whole set of lemmas across a variety of LLMs, with a variety of RAG systems. Often the LLM will talk about an interpretation for some lemma—but the lemma isn’t actually present in our proof. But occasionally the LLM will mention possible connections (“band theory”; “left self-distributive operations in quandles”; “Moufang loops”)—though so far none have seemed to quite hit the mark.
And perhaps this failure is itself actually a result—telling us that the lemmas that show up in our proof really are, in effect, out in the wilds of metamathematical space, probing places that haven’t ever been seriously visited before by human mathematics.
But beyond LLMs, what about more general machine learning and neural net approaches? Could we imagine using a neural net as a probe to find “exploitable regularities” in our proof? It’s certainly possible, but I suspect that the systematic algorithmic methods we’ve already discussed for finding optimal notations, popular lemmas, etc. will tend to do better. I suppose it would be one thing if our systematic methods had failed to even find a proof. Then we might have wanted something like neural nets to try to guess the right paths to follow, etc. But as it is, our systematic methods rather efficiently do manage to successfully find a proof.
Of course, there’s still the issue that we’re discussing here that the proof is very “non-human”. And perhaps we could imagine that neural nets, etc.—especially when trained on existing human knowledge—could be used to “form concepts” that would help us humans to understand the proof.
We can get at least a rough analogy for how this might work by looking at visual images produced by a generative AI system trained from billions of human-selected images. There’s a concept (like “a cube”) that exists somewhere in the feature space of possible images. But “around” that concept are other things—“out in interconcept space”—that we don’t (at least yet) explicitly have words for:
And it’ll presumably be similar for math, though harder to represent in something like a visual way. There’ll be existing math concepts. But these will be embedded in a vast domain of “mathematical interconcept space” that we humans haven’t yet “colonized”. And what we can imagine is that—perhaps with the help of neural nets, etc.—we can identify a limited number of “points in interconcept space” that we can introduce as new concepts that will, for example, provide useful “waypoints” in understanding our proof.
It’s a common human urge to think that anything that’s true must be true for a reason. But what about our theorem? Why is it true? Well, we’ve seen a proof. But somehow that doesn’t seem satisfactory. We want “an explanation we can understand”. But we know that in general we can’t always expect to get one.
It’s a fundamental implication of computational irreducibility that things can happen where the only way to “see how they happen” is just to “watch them happen”; there’s no way to “compress the explanation”.
Consider the following patterns. They’re all generated by cellular automata. And all live exactly 100 steps before dying out. But why?
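(For what it’s worth, finding candidates like these is itself just a simple exhaustive search; here’s a generic sketch, with an arbitrarily chosen elementary rule plugged in—for most rules and initial conditions there will of course be no hits:)

(* the step at which a finite pattern first dies out completely (or tmax + 1 if it never does) *)
lifetime[rule_, init_, tmax_] :=
  LengthWhile[CellularAutomaton[rule, {init, 0}, tmax], ! FreeQ[#, 1] &];

(* search random width-20 initial conditions for ones that live exactly 100 steps *)
Select[RandomInteger[1, {1000, 20}], lifetime[110, #, 150] == 100 &]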
In a few cases it seems like we can perhaps at least begin to imagine “narratively describing” a mechanism. But most of the time all we can say is basically that they “live 100 steps because they do”.
It’s a quintessential consequence of computational irreducibility. It might not be what we’d expect, or hope for. But it’s reality in the computational universe. And it seems very likely that our theorem—and its proof—is like this too. The theorem in effect “just happens to be true”—and if you run the steps in the proof (or find the appropriate path in the entailment cone) you’ll find that it is. But there’s no “narrative explanation”. No “understanding of why it’s true”.
We’ve been talking a lot about the proof of our theorem. But where did the theorem to prove come from in the first place? Its immediate origin was an exhaustive search I did of simple axiom systems, filtering for ones that could conceivably generate Boolean algebra, followed by testing each of the candidates using automated theorem proving.
But how did I even get the idea of searching for a simple axiom system for Boolean algebra? Based on the axiom systems for Boolean algebra known before—and the historical difficulty of finding them—one might have concluded that it was quite hopeless to find an axiom system for Boolean algebra by exhaustive search. But by 2000 I had nearly two decades of experience in exploring the computational universe—and I was well used to the remarkable phenomenon that even very simple computational rules can lead to behavior of great complexity. So the result was that when I came to think about axiom systems and the foundations of mathematics my intuition led me to imagine that perhaps the simplest axiom system for something like Boolean algebra might be simple enough to exhaustively search for.
And indeed discovering the axiom system we’ve discussed here helped further expand and deepen my intuition about the consequences of simple rules. But what about the proof? What intuition might one get from the proof as we now know it, and as we’ve discussed here?
There’s much intuition to be got from observing the world as it is. But for nearly half a century I’ve had another crucial source of intuition: observing the computational universe—and doing computational experiments. I was recently reflecting on how I came to start developing intuition in this way. And what it might mean for intuition I could now develop from things like automated theorem proving and AI.
Back in the mid-1970s my efforts in particle physics led me to start using computers to do not just numerical, but also algebraic computations. In numerical computations it was usual to just get a few numbers out, that perhaps one could plot to make a curve. But in algebraic computations one instead got out formulas—and often very ornate ones full of structure and detail. And for me it was routine to get not just one formula, but many. And looking at these formulas I started to develop intuition about them. What functions would they involve? What algebraic form would they take? What kind of numbers would they involve?
I don’t think I ever consciously realized that I was developing a new kind of computationally based intuition. But I soon began to take it for granted. And when—at the beginning of the 1980s—I started to explore the consequences of simple abstract systems like cellular automata it was natural to expect that I would get intuition from just “seeing” how they behaved. And here there was also another important element. Because part of the reason I concentrated on cellular automata was precisely because one could readily visualize their behavior on a computer.
I don’t think I would have learned much if I’d just been printing out “numerical summaries” of what cellular automata do. But as it was, I was seeing their behavior in full detail. And—surprising though what I saw was—I was soon able to start getting an intuition for what could happen. It wasn’t a matter of knowing what the value of every cell would be. But I started doing things like identifying four general classes of cellular automata, and then recognizing the phenomenon of computational irreducibility.
By the 1990s I was much more broadly exploring the computational universe—always trying to see what could happen there. And in almost all cases it was a story of defining simple rules, then running them, and making an explicit step-by-step visualization of what they do—and thereby in effect “seeing computation in action”.
In recent years—spurred by our Physics Project—I’ve increasingly explored not just computational processes, but also multicomputational ones. And although it’s more difficult I’ve made every effort to visualize the behavior of multiway systems—and to get intuition about what they do.
But what about automated theorem proving? In effect, automated theorem proving is about finding a particular path in a multiway system that leads to a theorem we want. We’re not getting to see “complete behavior”; we’re in effect just seeing one particular “solution” for how to prove a theorem.
And after one’s seen many examples, the challenge once again is to develop intuition. And that’s a large part of what I’ve been trying to do here. It’s crucial, I think, to have some way to visualize what’s happening—in effect because visual input is the most efficient way to get information into our brains. And while the visualizations we’ve developed here aren’t as direct and complete as, say, for cellular automaton evolution, I think they begin to give some overall sense of our proof—and other proofs like it.
In studying simple programs like cellular automata, the intuition I developed led me to things like my classification of cellular automaton behavior, as well as to bigger ideas like the Principle of Computational Equivalence and computational irreducibility. So having now exposed myself to automated theorem proving as I exposed myself to algebraic computation and the running of simple rules in the past, what general principles might I begin to see? And might they, for example, somehow make the fact that our proof works ultimately seem “obvious”?
In some ways yes, but in other ways no. Much as with simple programs, there are axiom systems so simple that, for example, the multiway systems they generate are highly regular. But beyond a low threshold, it’s common to get very complicated—and in many ways seemingly random—multiway system structures. Typically an infinite number of lemmas are generated, with little or no obvious regularity in their forms.
And one can expect that—following the ideas of universal computation—it’ll typically be possible to encode in any one such multiway system the behavior of any other multiway system. In terms of axioms what one’s saying is that if one sets up the right translation between theorems, one will be able to use any one such axiom system to generate the theorems of any other. But the issue is that the translation will often make major changes to the structure of the theorems, and in effect define not just a “mathematical translation” (like between geometry and algebra) but a metamathematical one (as one would need to get from Peano arithmetic to set theory).
And what this means is that it isn’t surprising that even a very simple axiom system can generate a complicated set of possible lemmas. But knowing this doesn’t immediately tell one whether those lemmas will align with some particular existing theory—like Boolean algebra. And in a sense that’s a much more detailed question.
At some metamathematical level it might not be a natural question. But at a “mathematical level” it is. And it’s what we have to address in connection with the theorem—and proof—we’re discussing here. Many aspects of the overall form and properties of the proof will be quite generic, and won’t depend on the particulars of the axiom system we’re using. But some will. And quite what intuition we may be able to get about these isn’t clear. And perhaps it’ll necessarily be fragmented and specific—in effect responding to the presence of computational irreducibility.
It’s perhaps worth commenting that LLMs—and machine learning in general—represent another potential source of intuition. That intuition may well be more about the general features of us as observers and thinkers. But such intuition is potentially critical in framing just what we can experience, not only in the natural world, but also in the mathematical and metamathematical worlds. And perhaps the apparent impotence of LLMs when faced with the proof we’ve been discussing already tells us something significant about the nature of “mathematical observers” like us.
Let’s say we never manage to “humanize” the proof we’ve been discussing here. Then in effect we’ll end up with a “black-box theorem”—that we can be sure is true—but we’ll never know quite how or why. So what would that mean for mathematics?
Traditionally, mathematics has tended to operate in a “white box” kind of way, trying to build narrative and understanding along with “facts”. And in this respect it’s very different from natural science. Because in natural science much of our knowledge has traditionally been empirical—derived from observing the world or experimenting on it—and without any certainty that we can “understand its origins”.
Automated theorem proving of the kind we’re discussing here—or, for that matter, pretty much any exploratory computational experimentation—aligns mathematics much more with natural science, deriving what’s true without an expectation of having a narrative explanation of why.
Could one imagine practicing mathematics that way? One’s already to some extent following such a path as soon as one introduces axiom systems to base one’s mathematics on. Where do the axiom systems come from? In the time of Euclid perhaps they were thought of as an idealization of nature. But in more modern times they are realistically much more the result of human choice and human aesthetics.
So let’s say we determine (given a particular axiom system) that some black-box theorem is true. Well, then we can just add it, just as we could another axiom. Maybe one day it’ll be possible to prove P≠NP or the Riemann Hypothesis from existing axioms of mathematics (if they don’t in fact turn out to be independent). And—black box or not—we can expect to add them to what we assume in subsequent mathematics we do, much as they’re routinely added right now, even though their status isn’t yet known.
But it’s one thing to add one or two “black-box theorems”. But what happens when black-box theorems—that we can think of as “experimentally determined”—start to dominate the landscape of mathematics?
Well, then mathematics will take on much more of the character of ruliology—or of an experimental science. When it comes to the applications of mathematics, this probably won’t make much difference, except that in effect mathematics will be able to become much more powerful. But the “inner experience” of mathematics will be quite different—and much less “human”.
If one indeed starts from axioms, it’s not at the outset obvious why everything in mathematics should not be mired in the kind of alien-seeming metamathematical complexity that we’ve encountered in the discussion of our proof here. But what I’ve argued elsewhere is that the fact that in our experience of doing mathematics it’s not is a reflection of how “mathematical observers like us” sample the raw metamathematical structure generated by axioms (or ultimately by the subaxiomatic structure of the ruliad).
The physics analogy I’ve used is that we succeed in doing mathematics at a “fluid dynamics level”, far above the detailed “molecular dynamics level” of things like the proof we’ve discussed here. Yes, we can ask questions—like ones about the structure of our proof—that probe the axiomatic “molecular dynamics level”. But it’s an important fact that in doing what we normally think of as mathematics we almost never have to; there’s a coherent way to operate purely at the “fluid dynamics level”.
Is it useful to “dip down” to the molecular dynamics? Definitely yes, because that’s where we can readily do computations—like those in our proof, or in general those going on in the internals of the Wolfram Language. But a key idea in the design of the Wolfram Language is to provide a computational language that can express concepts at a humanized “fluid dynamics” level—in effect bridging between the way humans can think and understand things, and the way raw computation can be done with them.
And it’s notable that while we’ve had great success over the years in defining “human-accessible” high-level representations for what amount to the “inputs” and “outputs” of computations, that’s been much less true of the “ongoing processes” of computation—or, for example, of the innards of proofs.
Is there a good “human-level” way to represent proofs? If the proofs are short, it’s not too difficult (and the step-by-step solutions technology of Wolfram|Alpha provides a good large-scale example of what can be done). But—as we’ve discussed—computational irreducibility implies that some proofs will inevitably be long.
If they’re not too long, then at least some parts of them might be constructed by human effort, say in a system like a proof assistant. But as soon as there’s much automation (whether with automated theorem proving or with LLMs) it’s basically inevitable that one will end up with things that at least approach what we’ve seen with the proof we’re discussing here.
What can then be done? Well, that’s the challenge. Maybe there is some way to simplify, abstract or otherwise “humanize” the proof we’ve been discussing. But I rather doubt it. I think this is likely one of those cases where we inevitably find ourselves face to face with computational irreducibility.
And, yes, there’s important science (particularly ruliology) to do on the structures we see. But it’s not mathematics as it’s traditionally been practiced. But that’s not to say that the results that come out of things like our proof won’t be useful for mathematics. They will be. But they make mathematics more like an experimental science—where what matters most is in effect the input and output rather than a “publishable” or human-readable derivation in between. And where the key issue in making progress is less in the innards of derivations than in defining clear computational ways to express input and output. Or, in effect, in capturing “human-level mathematics” in the primitives and structure of computational language.
The proof we’ve been discussing here was created using FindEquationalProof in the Wolfram Language. But what if we were to use a different automated theorem proving system? How different would the results be? In the spectrum of things that automated theorem proving systems do, our proof here is on the difficult end. And many existing automated theorem proving systems don’t manage to do it at all. But some of the stronger ones do. And in the end—despite their different internal algorithms and heuristics—it’s remarkable how similar the results they give are to those from the Wolfram Language FindEquationalProof (differences in the way lemmas vs. inference steps, etc. are identified make detailed quantitative comparisons difficult):
Thanks to Nik Murzin of the Wolfram Institute for his extensive help as part of the Wolfram Institute Empirical Metamathematics Project. Also Roger Germundsson, Sergio Sandoval, Adam Strzebonski, Michael Trott, Liubov Tupikina, James Wiles and Carlos Zapata for input. Thanks to Arnim Buch and Thomas Hillenbrand for their work in the 1990s on Waldmeister which is now part of FindEquationalProof (also to Jonathan Gorard for his 2017 work on the interface for FindEquationalProof). I was first seriously introduced to automated theorem proving in the late 1980s by Dana Scott, and have interacted with many people about it over the years, including Richard Assar, Bruno Buchberger, David Hillman, Norm Megill, Todd Rowland and Matthew Szudzik. (I’ve also interacted with many people about proof assistant, proof presentation and proof verification systems, both recently and in the past.)
2024-12-10 02:38:15
Note: As of today, copies of Wolfram Version 14.1 are being auto-updated to allow subscription access to the capabilities described here. [For additional installation information see here.]
Nearly a year and a half ago—just a few months after ChatGPT burst on the scene—we introduced the first version of our Chat Notebook technology to integrate LLM-based chat into Wolfram Notebooks. For the past year and a half we’ve been building on those foundations. And today I’m excited to be able to announce that we’re releasing the fruits of those efforts: the first version of our Wolfram Notebook Assistant.
There are all sorts of gimmicky AI assistants out there. But Notebook Assistant isn’t one of them. It’s a serious, deep piece of new technology, and what’s more important, it’s really, really useful! In fact, I think it’s so useful as to be revolutionary. Personally, I thought I was a pretty efficient user of Wolfram Language—but Notebook Assistant has immediately made me not only significantly more efficient, but also more ambitious in what I try to do. I hadn’t imagined just how useful Notebook Assistant was going to be. But seeing it now I can say for sure that it’s going to raise the bar for what everyone can do. And perhaps most important of all, it’s going to open up computational language and computational thinking to a vast range of new people, who in the past assumed that those things just weren’t accessible to them.
Leveraging the decades of work we’ve done on the design and implementation of the Wolfram Language (and Wolfram|Alpha), Notebook Assistant lets people just say in their own words what they want to do; then it does its best to crispen it up and give a computational implementation. Sometimes it goes all the way and just delivers the answer. But even when there’s no immediate “answer” it does remarkably well at building up structures where things can be represented computationally and tackled concretely. People really don’t need to know anything about computational language—or computational thinking—to get started; Notebook Assistant will take their ideas, rough as they may be, and frame them in computational language terms.
I’ve long seen Wolfram Language as uniquely providing the infrastructure and “notation” to enable “computational X” for all fields X. I’m excited to say that I think Notebook Assistant now bridges “the last mile” to let anyone—at almost any level—access the power of computational language, and “do computational X”. In its original conception, Wolfram Notebook Assistant was just intended to be “useful”. But it’s emerging as something much more than that; something positively revolutionary.
“I can’t believe it’ll do anything useful with that”, I’ll think. But then I’ll try it. And, very often, something amazing will happen. Something that gets me past some sticking point or over some confusion. Something that gives me an unexpected new building block—or new idea—for what I’m trying to do. And that uses the medium of our computational language to take me beyond where I would ever have reached before.
So how does one use Notebook Assistant? Once you’ve signed up you can just go to the toolbar of any notebook, and open a Notebook Assistant chat window:
Now tell Notebook Assistant what you want to do. The more precise and explicit you are, the better. But you don’t have to have thought things through. Just type what comes into your mind. Imagine you’ve been working in a notebook, and (somehow) you’ve got a picture of some cats. You wonder “How can I find the cats in this picture?” Well, just ask Notebook Assistant!
Notebook Assistant gives some narrative text, and then a piece of Wolfram Language code—which you can just run in your notebook (by pressing Shift+Enter):
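The exact code it writes varies from run to run, but it’s typically along these lines (this particular snippet is my own illustration of the kind of thing it produces, assuming img is the picture of cats already in your notebook):

(* extract the cats it finds as separate images *)
cats = ImageCases[img, "cat"]

(* or highlight where they are in the original picture *)
HighlightImage[img, ImageBoundingBoxes[img, "cat"]]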
It seems a bit like magic. You say something vague, and Notebook Assistant turns it into something precise and computational—which you can then run. It’s not always as straightforward as in this example. But the important thing is that in practice (at least in my rather demanding experience) Notebook Assistant essentially always does spectacularly well at being useful—and at telling me things that move forward what I’m trying to do.
Imagine that sitting next to you, you had someone very knowledgeable about Wolfram Language and about computational thinking in general. Think what you might ask them. That’s what you should ask Notebook Assistant. And if there’s one thing to communicate here, it’s “Just try it!” You might think what you’re thinking about is too vague, or too specific, or too technical. But just try asking Notebook Assistant. In my experience, you’ll be amazed at what it’s able to do, and how helpful it’s able to be.
Maybe you’re an experienced Wolfram Language user who “knows there must be a way to do something”, but can’t quite remember how. Just ask Notebook Assistant. And not only will it typically be able to find the function (or whatever) you need; it’ll also usually be able to create a code fragment that does the very specific thing you asked about. And, by the way, it’ll save you lots of typing (and debugging) by filling in those fiddly options and so on just how you need them. And even if it doesn’t quite nail it, it’ll have given a skeleton of what you need, that you can then readily edit. (And, yes, the fact that it’s realistic to edit it relies on the fact that Wolfram Language represents it in a way that humans can readily read as well as write.)
What if you’re a novice, who’s never used Wolfram Language before, and never really been exposed to computational thinking, or for that matter, “techie stuff” at all? Well, the remarkable thing is that Notebook Assistant will still be able to help you—a lot. You can ask it something very vague, that doesn’t even seem particularly computational. It does remarkably well at “computationalizing things”. Taking what you’ve said, and finding a way to address it computationally—and to lead you into the kind of computational thinking that’ll be needed for the particular thing you’re trying to do.
In what follows, we’ll see a whole range of different ways to use Notebook Assistant. In fact, even as I’ve been writing this, I’ve discovered quite a few new ways to use it that I’d never thought of before.
There are some general themes, though. The most important is the way Notebook Assistant pivotally relies on the Wolfram Language. In a sense, the main mission of Notebook Assistant is to make things computational. And the whole reason it can so successfully do that is that it has the Wolfram Language as its target. It’s leveraging the unique nature of the Wolfram Language as a full-scale computational language, able to coherently represent abstract and real-world things in a computational way.
One might think that the Wolfram Language would in the end be mainly an “implementation layer”—serving to make what Notebook Assistant produces runnable. But in reality it’s very, very much more than that. In particular, it’s basically the medium—the language—in which computational ideas are communicated. When Notebook Assistant generates Wolfram Language, it’s not just something for the computer to run; it’s also something for humans to read. Yes, Notebook Assistant can produce text, and that’s useful, especially for contextualizing things. But the most concentrated and poignant communication comes in the Wolfram Language it produces. Want the TL;DR? Just look at the Wolfram Language code!
Part of how Wolfram Language code manages to communicate so much so efficiently is that it’s precise. You can just mention the name of a function, and you know precisely what it does. You don’t have to “scaffold” it with text to make its meaning clear.
But there’s something else as well. With its symbolic character—and with all the coverage and consistency that we’ve spent so much effort on over the decades—the Wolfram Language is uniquely able to “communicate in fragments”. Any fragment of Wolfram Language code can be run, and more important, it can smoothly fit into a larger structure. And that means that even small fragments of code that Notebook Assistant generates can be used as building blocks.
It produces Wolfram Language code. You read the code (and it’s critical that it’s set up to be read). You figure out if it’s what you want. (And if it’s not, you edit it, or ask Notebook Assistant to do that.) Then you can use that code as a robust building block in whatever structure—large or small—that you might be building.
In practice, a critical feature is that you don’t have to foresee how Notebook Assistant is going to respond to what you asked. It might nail the whole thing. Or it might just take steps in the right direction. But then you just look at what it produced, and decide what to do next. Maybe in the end you’ll have to “break the problem down” to get Notebook Assistant to deal with it. But there’s no need to do that in advance—and Notebook Assistant will often surprise you by how far it’s able to get on its own.
You might imagine that Notebook Assistant would usually need you to break down what you’re asking into “pure computational questions”. But in effect it has good enough “general knowledge” that it doesn’t. And in fact it will usually do better the more context you give it about why you’re asking it to do something. (Is it for chemical engineering, or for sports analytics, or what?)
But how ambitious can what you ask Notebook Assistant be? What if you ask it something “too big”? Yes, it won’t be able to solve that 100-year-old problem or build a giant software system in its immediate output. But it does remarkably well at identifying pieces that it can say something about, and that can help you understand how to get started. So, as with many things about Notebook Assistant, you shouldn’t assume that it won’t be helpful; just try it and see what happens! And, yes, the more you use Notebook Assistant, the more you’ll learn just what kind of thing it does best, and how to get the most out of it.
So how should you ultimately think about Notebook Assistant? Mainly you should think of it as a very knowledgeable and hardworking expert. But at a more mundane level it can serve as a super-enhanced documentation lookup system or code completion system. It can also take something vague you might ask it, and somehow undauntedly find the “closest formalizable construct”—that it can then compute with.
An important feature is that it is—in human terms—almost infinitely patient and hardworking. Where a human might think: “it’s too much trouble to write out all those details”, Notebook Assistant just goes ahead and does it. And, yes, it saves you huge amounts of typing. But, more important, it makes it “cheap” to do things more perfectly and more completely. So that means you actually end up labeling those plot axes, or adding a comment to your code, or coming up with meaningful names for your variables.
One of the overarching points about Notebook Assistant is that it lowers the barrier to getting help. You don’t have to think carefully about formulating your question. You don’t have to go clicking through lots of links. And you don’t have to worry that it’s too trivial to waste a coworker’s time on the question. You can just ask Notebook Assistant. Oh, and it’ll give you a response immediately. (And you can go back and forth with it, and ask it to clarify and refine things.)
At least for me it’s very common: you have something in your mind that you want to do, but you don’t quite know how to achieve it in the Wolfram Language. Well, now you can just ask Notebook Assistant!
I’ll show various examples here. It’s worth emphasizing that these examples typically won’t look exactly the same if you run them again. Notebook Assistant has a certain amount of “AI-style random creativity”—and it also routinely makes use of what you’ve done earlier in a session, etc. It also has to be said that Notebook Assistant will sometimes make mistakes—or will misunderstand what you’re asking it. But if you don’t like what it did, you can always press the button to generate a new response.
Let’s start off with a basic computational operation:
For an experienced user of Wolfram Language, a simple “do it with FoldList” would already have been enough. But Notebook Assistant goes all the way—generating specific code for exactly what I asked. Courtesy of Wolfram Language, the code is very short and easy to read. But Notebook Assistant does something else as well: it produces an example of the code in action—which lets one check that it really does what one wanted. Oh, and then it goes even further, and tells me about a function in the Wolfram Function Repository (that I, for one, had never heard of; wait, did I write it?) that directly does the operation I want.
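For concreteness, here’s a small illustration of the kind of thing involved (my own example; the specifics of the actual request will differ): computing cumulative totals of a list with FoldList, and then with the dedicated function Accumulate:

FoldList[Plus, {2, 3, 5, 7, 11}]
(* {2, 5, 10, 17, 28} *)

Accumulate[{2, 3, 5, 7, 11}]
(* {2, 5, 10, 17, 28} *)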
OK, so that was a basic computational operation. Now let’s try something a little more elaborate:
This involves several steps, but Notebook Assistant nails it, giving a nice example. (And, yes, it’s reading the Wolfram Language documentation, so often its examples are based on that.)
But even after giving an A+ result right at the top, Notebook Assistant goes on, talking about various options and extensions. And despite being (I think) quite an expert on what the Wolfram Language can do, I was frankly surprised by what it came up with; I didn’t know about these capabilities!
There’s an incredible amount of functionality built into the Wolfram Language (yes, four decades worth of it). And quite often things you want to do can be done with just a single Wolfram Language function. But which one? One of the great things about Notebook Assistant is that it’s very good at taking “raw thoughts”, sloppily worded, and figuring out what function you need. Like here, bam, “use LineGraph!”
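In case you haven’t run into it, here’s LineGraph doing its thing on a small example (it constructs the graph whose vertices are the edges of the original graph):

g = PetersenGraph[];   (* a small example graph *)
LineGraph[g]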
You can ask Notebook Assistant “fairly basic” questions, and it’ll respond with nice, synthesized-on-the-spot “custom documentation”:
You can also ask it about obscure and technical things; it knows about every Wolfram Language function, with all its details and options:
Notebook Assistant is surprisingly good at writing quite minimal code that does sophisticated things:
If you ask it open-ended questions, it’ll often answer with what amount to custom-synthesized computational essays:
Notebook Assistant is pretty good at “pedagogically explaining what you can do”:
In everything we’ve seen so far, the workflow is that you ask Notebook Assistant something, then it generates a result, and then you use it. But everything can be much more interactive, and you can go back and forth with Notebook Assistant—say refining what you want it to do.
Here I had something in mind, but I was quite sloppy in describing it. And although Notebook Assistant came up with a reasonable interpretation of what I asked, it wasn’t really what I had in mind:
So I went back and edited what I asked (right there in the Notebook Assistant window), and tried again:
The result was better, but still not right. But all I had to do was to tell it to make a change, and lo and behold, I got what I was thinking of:
By the way, you can also perfectly well ask about deployment to the web:
And while I might have some minor quibbles (why use a string for the molecule name rather than a "Chemical" interpreter; why not use CloudPublish; etc.) what Notebook Assistant produces works, and provides an excellent scaffold for further development. And, as it often does, Notebook Assistant adds a kind of “by the way, did you know?” at the end, showing how one could use ARPublish to produce output for augmented reality.
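As a rough sketch of that kind of deployment (my own variant, written the way I’d quibble it into shape, not necessarily what the Assistant produced):

CloudDeploy[
 FormFunction[{"molecule" -> "Chemical"},            (* interpret the form input as a chemical *)
  MoleculePlot3D[Molecule[#molecule]] &, "PNG"],      (* render a 3D molecule plot *)
 Permissions -> "Public"]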
Here’s one last example: creating a user interface element. I want to make a slider-like control that goes around (like an analog clock):
Well, actually, I had in mind something more minimal:
Impressive. Even if maybe it got that from some documentation or other example. But what if I wanted to tweak it? Well, actually, Notebook Assistant does seem to understand what it has:
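For illustration, here’s one minimal way such a circular control can be put together with LocatorPane (a sketch of my own; the Assistant’s version may well differ):

DynamicModule[{theta = Pi/2},
 LocatorPane[
  (* the locator is always displayed on the circle; dragging it sets the angle *)
  Dynamic[{Cos[theta], Sin[theta]}, (theta = ArcTan @@ #) &],
  Graphics[{Circle[],
    Dynamic[Line[{{0, 0}, {Cos[theta], Sin[theta]}}]],
    Dynamic[Text[NumberForm[N[Mod[theta, 2 Pi]], {3, 2}], {0, -1.15}]]},
   PlotRange -> 1.3, ImageSize -> Small]]]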
What we’ve seen so far are a few examples of asking Notebook Assistant to tell us how to do things. But you can also just ask Notebook Assistant to do things for you, in effect producing “finished goods”:
Pretty impressive! And it even just went ahead and made the picture. By the way, if I want the code packaged up into a single line, I can just ask for that:
Notebook Assistant can generate interactive content too. And—very usefully—you don’t have to give precise specifications up front: Notebook Assistant will automatically pick “sensible defaults” (that, yes, you can trivially edit later, or just tell Notebook Assistant to change for you):
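Here’s the flavor of what that looks like, in a simple case I made up myself:

Manipulate[
 Plot[Sin[freq x + phase], {x, 0, 2 Pi}, PlotRange -> {-1.2, 1.2}],
 {{freq, 1, "frequency"}, 0.5, 5},     (* sensible default: frequency 1 *)
 {{phase, 0, "phase"}, 0, 2 Pi}]       (* sensible default: phase 0 *)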
Here’s an example that requires putting together several different ideas and functions. But Notebook Assistant manages it just fine—and in fact the code it produces is interesting and clarifying to read:
Notebook Assistant knows about every area of Wolfram Language functionality—here synthetic geometry:
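For instance (an example of my own, not necessarily the one shown), one can set up a scene symbolically and ask for a random instance that satisfies it:

RandomInstance[
 GeometricScene[{a, b, c},
  {Triangle[{a, b, c}], PlanarAngle[{a, b, c}] == 90 Degree}]]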
And here chemistry:
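A tiny taste of what’s there (again, just an illustration of mine):

caffeine = Molecule["caffeine"];   (* build a molecule from its name *)
MoleculePlot3D[caffeine]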
It also knows about things like the Wolfram Function Repository, here running a function from there that generates a video:
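The general mechanism is ResourceFunction, which pulls a function from the repository by name and runs it. As a simple (non-video) illustration, assuming the "RandomMondrian" function is available in the repository:

ResourceFunction["RandomMondrian"][]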
Here’s something that again leverages Notebook Assistant’s encyclopedic knowledge of Wolfram Language capabilities, now pulling in real-time data:
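The kind of thing involved (a minimal version of my own):

AirTemperatureData[Here]                                        (* current temperature at your location *)
DateListPlot[AirTemperatureData[Here, {Now - Quantity[3, "Days"], Now}]]   (* the last few days of it *)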
I can’t resist trying a few more examples:
Let’s try something involving more sophisticated math:
(I would have used RegularPolygon[5], and I don’t think DiscretizeRegion is necessary … but what Notebook Assistant did is still very impressive.)
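To illustrate the point (with a made-up task, since the actual one may have differed): RegularPolygon[5] can be used directly as a region, with no discretization needed:

reg = RegularPolygon[5];
{Area[reg], RegionCentroid[reg], NIntegrate[x^2 + y^2, {x, y} \[Element] reg]}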
Or here’s some more abstract math:
OK, so Notebook Assistant provides a very powerful way to go from words to computational results. So what then is the role of computational language and of “raw Wolfram Language”? First of all, it’s the Wolfram Language that makes everything we’ve seen here work; it’s what the words are being turned into so that they can be computed from. But there’s something much more than that. The Wolfram Language isn’t just for computers to compute with. It’s also for humans to think with. And it’s an incredibly powerful medium for that thinking. Like a great generalization of mathematical notation from the distant past, it provides a streamlined way to broadly formalize things in computational terms—and to systematically build things up.
Notebook Assistant is great for getting started with things, and for producing a first level of results. But words aren’t ultimately an efficient way to say how to build up from there. You need the crisp, formal structure of computational language, in which even the tiny amounts of code you write can be incredibly powerful.
Now that I’ve been using Notebook Assistant for a while I think I can say that on quite a few occasions it’s helped me launch things, it’s helped me figure out details, and it’s helped me debug things that have gone wrong. But the backbone of my computational progress has been me writing Wolfram Language myself (though quite often starting from something Notebook Assistant wrote). Notebook Assistant is an important new part of the “on ramp” to Wolfram Language; but it’s raw Wolfram Language that lets one really zoom forward to build new structures and achieve what’s computationally possible.
Computational thinking is an incredibly powerful approach. But sometimes it’s hard to get started with, particularly if you’re not used to it. And although one might not imagine it, Notebook Assistant can be very useful here, essentially helping one brainstorm about what direction to take.
I was explaining this to our head of Sales, and tried:
I really didn’t expect this to do anything terribly useful … and I was frankly amazed at what happened. Pushing my luck I tried:
Obviously this isn’t the end of the story, but it’s a remarkably good beginning—going from a vague request to something that’s set up to be thought about computationally.
Here’s another example. I’m trying to invent a good system for finding books in my library. I just took a picture of a shelf of books behind my desk:
Once again, a very impressive result. Not the final answer, but a surprisingly good start. That points me in the direction of image processing and segmentation. At first, it’s running too slowly, so it downsamples the image. Then it tells me I might need to tweak the parameters. So I just ask it to create a tool to do that:
And then:
It’s very impressive how much Notebook Assistant can help one go “from zero to computation”. And when one gets used to using it, it starts to be quite natural to just try it on all sorts of things one’s thinking about. But if it’s just “quick, tell me something to compute”, it’s usually harder to come up with anything.
And that reminds me of the very first time I ever saw a computer in real life. It was 1969 and I was 9 years old (and the computer was an IBM mainframe). The person who was showing me the computer asked me: “So what do you want to compute?” I really had no idea at that time “what one might compute”. Rather lamely I said “the weight of a dinosaur”. So, 55 years later, let’s try that again:
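A minimal version of that kind of query today, going through Wolfram|Alpha’s curated data (the species is my choice):

WolframAlpha["typical weight of a Tyrannosaurus rex", "Result"]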
And let’s try going further:
Something I find very useful with Notebook Assistant is having it “tweak the details” of something I’ve already generated. For example, let’s say I have a basic plot of a sine curve in a notebook:
Assuming I have that notebook in focus, Notebook Assistant will “see” what’s there. So then I can tell it to modify my sine curve—and what it will do is produce new code with extra details added:
That’s a good result. But as a Wolfram Language aficionado I notice that the code is a bit more complicated than it needs to be. So what can I do about it? Well, I can just ask Notebook Assistant to simplify it:
I can keep going, asking it to further “embellish” the plot:
Let’s push our luck and try going even further:
Oops. Something went wrong. No callouts, and a pink “error” box. I tried regenerating a few times. Often that helps. But this time it didn’t seem to. So I decided to give Notebook Assistant a suggestion:
And now it basically got it. And with a little more back and forth I can expect to get exactly what I want.
In the Wolfram Language, functions (like Plot) are set up to have good automatic defaults. But when you want, for example, to achieve some particular, detailed look, you often have to end up specifying all sorts of additional settings. And Notebook Assistant is very good at doing this, and in effect, patiently typing out all those option settings, etc.
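Here’s the flavor of the option-laden code one ends up with (an illustrative example of my own, not the Assistant’s actual output):

Plot[{Sin[x], Cos[x]}, {x, 0, 2 Pi},
 PlotStyle -> {Directive[Thick, Blue], Directive[Thick, Dashed, Red]},
 PlotLegends -> Placed[{"sine", "cosine"}, Below],
 Frame -> True, FrameLabel -> {"x", "value"},
 GridLines -> Automatic,
 PlotLabel -> "Sine and cosine",
 ImageSize -> Medium]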
Let’s say you wrote some Wolfram Language (or perhaps Notebook Assistant did it for you). And let’s say it doesn’t work. Maybe it just produces the wrong output. Or maybe it generates all sorts of messages when it runs. Either way, you can just ask the Assistant “What went wrong?”
Here the Assistant rather patiently and clearly explained the message that was generated, then suggested “correct code”:
The Assistant tends to be remarkably helpful in situations like this—even for an experienced Wolfram Language user like me. In a sense, though, it has an “unfair advantage”. Not only has it learned “what’s reasonable” from seeing large amounts of Wolfram Language code; it also has access to “internal information”—like a stream of telemetry about messages that were generated (as well as stack traces, etc.).
In general, Notebook Assistant is rather impressive at “spotting errors” even in long and sophisticated pieces of Wolfram Language code—and in suggesting possible fixes. And I can say that this is a way in which using Notebook Assistant has immediately saved me significant time in doing things with Wolfram Language.
Notebook Assistant doesn’t just know how to write Wolfram Language code; it knows how to write good Wolfram Language code. And in fact if you give it even a sloppy “outline” of Wolfram Language code, the Assistant is usually quite good at making it clean and complete. And that’s important not only in being able to produce code that will run correctly; it’s also important in making code that’s clear enough that you can understand it (courtesy of the readability of good Wolfram Language code).
Here’s an example starting with a rather horrible piece of Wolfram Language code on the right:
The code on the right is quite buggy (it doesn’t initialize list, for example). But Notebook Assistant guesses what it’s supposed to do, and then makes nice “Wolfram style” versions, explaining what it’s doing.
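To give a sense of the kind of transformation involved (with a made-up example of my own): instead of AppendTo inside a loop, which silently fails if list was never initialized, the “Wolfram style” version just builds the result directly:

(* builds {1, 4, 9, ..., n^2} directly, with no mutable state to initialize *)
squaresUpTo[n_Integer?Positive] := Table[i^2, {i, n}]
squaresUpTo[10]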
If the code you’re dealing with is long and complicated, Notebook Assistant may (like a person) get confused. But you can always select a particular part, then ask Notebook Assistant specifically about that. And the symbolic nature—and coherence—of the Wolfram Language will typically mean that Notebook Assistant will be able to act “modularly” on the piece that you’ve selected.
Something I’ve found rather useful is to have Notebook Assistant refactor code for me. Here I’m starting from a sequence of separate inputs (yes, itself generated by Notebook Assistant) and I’m turning it into a single function:
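The pattern is something like this (a hypothetical example of mine): a handful of separate steps rolled up into one self-contained function:

wordCloudFor[text_String] := Module[{words},
  words = DeleteStopwords[TextWords[ToLowerCase[text]]];   (* split into words, drop stopwords *)
  WordCloud[words]]                                         (* and visualize *)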
Now we can use the function however we want:
Going the other way is useful too. And Notebook Assistant is surprisingly good at grokking what a piece of code is “about”, and coming up with reasonable names for variables, functions, etc.:
Yet another thing Notebook Assistant is good at is knowing all sorts of tricks to make code run faster:
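A typical example of the kind of trick (mine, for illustration): swapping an element-by-element loop for a single vectorized computation:

slowTotal[n_Integer] := Module[{s = 0.}, Do[s += Sin[N[i]/n], {i, n}]; s]   (* element by element *)
fastTotal[n_Integer] := Total[Sin[N[Range[n]]/n]]                            (* one vectorized step *)

{AbsoluteTiming[slowTotal[10^6]], AbsoluteTiming[fastTotal[10^6]]}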
“What does that piece of code actually do?” Good Wolfram Language code—like good prose or good mathematical formalism—can succinctly communicate ideas, in its case in computational terms, precisely grounded in the definition of the language. But (as with prose and math) you sometimes need a more detailed exploration. And providing narrative explanations of code is something else that Notebook Assistant is good at. Here it’s taking a single line of (rather elegant) Wolfram Language code and writing a whole essay about what the code is doing:
What if you have a long piece of code, and you just want to explain some small part of it? Well, since Notebook Assistant sees selections you make, you can just select one part of your code, and Notebook Assistant will know that’s what you want to explain.
The Wolfram Language is carefully designed to have built-in functions that just “do what you need”, without having to use idioms or set up repeated boilerplate. But there are situations where there’s inevitably a certain amount of “bureaucracy” to do. For example, let’s say you’re writing a function to deploy to the Function Repository. You enter the definition for the function into a Function Resource Definition Notebook. But now you have to fill in documentation, examples, etc. And in fact that’s often the part that takes the longest. But now you can ask Notebook Assistant to do it for you. Here I put the cursor in the Examples section:
It’s always a good idea to set up tests for functions you define. And this is another thing Notebook Assistant can help with:
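In Wolfram Language terms, that means things like VerificationTest, wrapped up in a TestReport; here’s a minimal made-up example:

digitSum[n_Integer] := Total[IntegerDigits[n]]

TestReport[{
  VerificationTest[digitSum[1234], 10],
  VerificationTest[digitSum[0], 0],
  VerificationTest[digitSum[999], 27]}]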
All the examples of interacting with Notebook Assistant that we’ve seen so far involve using the Notebook Assistant window, that you can open with the button on the notebook toolbar. But another method involves using the button in the toolbar, which we’ve been calling the “inspiration button”.
When you use the Notebook Assistant window, the Assistant will always try to figure out what you’re talking about. For example, if you say “Plot that” it’ll use what it knows about what notebook you’re using, and where you are in it, to try to work out what you mean by “that”. But when you use the button it’ll specifically try to “provide inspiration at your current selection”.
Let’s say you’ve typed Plot[Sin[x]. Press the inspiration button and it’ll suggest a possible completion:
After using that suggestion, you can keep going:
You can think of the button as providing a sophisticated meaning-aware autocomplete.
It also lets you do things like code simplification. Imagine you’ve written the (rather grotesque):
If you want to get rid of the For loops, just select them and press the button to get a much simpler version:
Want to go even further? Select that result and Notebook Assistant manages to get to a one-liner:
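To give the flavor with a made-up case of my own:

(* a "grotesque" version with explicit loops and AppendTo *)
multiplicationTable[n_] := Module[{tab = {}, row, i, j},
  For[i = 1, i <= n, i++,
   row = {};
   For[j = 1, j <= n, j++, AppendTo[row, i*j]];
   AppendTo[tab, row]];
  tab]

(* the one-liner it simplifies to *)
multiplicationTableSimple[n_] := Table[i*j, {i, n}, {j, n}]

multiplicationTable[4] === multiplicationTableSimple[4]
(* True *)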
At some level it seems bizarre. Write a text cell describing the code that’s to follow. Then start an Input cell, press the inspiration button, and Notebook Assistant will try to magically write the code!
You can go the other way as well. Start with the code, then start a CodeText cell above it, and it’ll “magically” write a caption:
If you start a heading cell, it’ll try to make up a heading:
Start a Text cell, and it’ll try to “magically” write relevant textual content:
You can go even further: just put the cursor underneath the existing content, and press the inspiration button—and Notebook Assistant will start suggesting how you can go on:
As I write this, of course I had to try it: what does Notebook Assistant think I should write next? Here’s what it suggests (and, yes, in this case, those aren’t such bad ideas):
One of the objectives for Notebook Assistant is to have it provide “hassle-free” access to AI and LLM technology integrated into the Wolfram System. And indeed, once you’ve set up your subscription (within your Wolfram Account), everything “just works”. Under the hood, there’s all sorts of technology, servers, etc. But you don’t have to worry about any of that; you can just use Notebook Assistant as a built-in part of the Wolfram Notebook experience.
As you work with Notebook Assistant, you’ll get progressively better intuition about where it can best help you. (And, yes, we’ll be continually updating Notebook Assistant, so it’ll often be worth trying things again if a bit of time has passed.) Notebook Assistant—like any AI-based system—has definite human-like characteristics, including sometimes making mistakes. Often those mistakes will be obvious (e.g. code with incorrect syntax colored red); sometimes they may be more difficult to spot. But the great thing about Notebook Assistant is that it’s firmly anchored to the “solid ground” of Wolfram Language. And any time it writes Wolfram Language code that you can see does what you want, you can always confidently use it.
There are some things that will help Notebook Assistant do its best for you. Particularly important is giving it the best view of the “context” for what you ask it. Notebook Assistant will generally look at whatever has already been said in a particular chat. So if you’re going to change the subject, it’s best to use the button to start a new chat, so Notebook Assistant will focus on the new subject, and not get confused by what you (or it) said before.
When you open the Notebook Assistant chat window you’ll often want to talk about—or refer to—material in some other notebook. Generally Notebook Assistant will assume that the notebook you last used is the one that’s relevant—and that any selection you have in that notebook is the thing to concentrate on the most. If you want Notebook Assistant to focus exclusively on what you’re saying in the chat window, one way to achieve that is to start a blank notebook. Another approach is to use the menu, which provides more detailed control over what material Notebook Assistant will consider. (For now, it just deals with notebooks you have open—but external files, URLs, etc. are coming soon.)
Notebook Assistant will by default store all your chat sessions. You can see your chat history (with chats automatically assigned names by the Assistant) by pressing the History button. You can delete chats from your history here. You can also “pop out” chats, creating standalone notebooks that you can save, send to other people, etc.
So what’s inside Notebook Assistant? It’s quite a tower of technology. The core of its “linguistic interface” is an LLM (actually, several different LLMs)—trained on extensive Wolfram Language material, and with access to a variety of tools, especially Wolfram Language evaluators. Also critical to Notebook Assistant is its access to a variety of RAGs based on vector databases, that it uses for immediate semantic search of material such as Wolfram Language documentation. Oh, and then there’s a lot of technology to connect Notebook Assistant to the symbolic internal structure of notebooks, etc.
So when you use Notebook Assistant, where is it actually running? Its larger LLM tasks are currently running on cloud servers. But a substantial part of its functionality is running right on your computer—using Wolfram Language (notably the Wolfram machine learning framework, vector database system, etc.). And because these things are running locally, the Assistant can request access to local information on your computer—as well as avoiding the latency of accessing cloud-based systems.
Much of the time, you want your interactions with Notebook Assistant to be somehow “off on the side”—say in the Notebook Assistant window, or in the inspiration button menu. But sometimes you want your interactions to be right in your main notebook.
And for this you’ll soon (in Version 14.2) be able to use an enhanced version of the Chat Notebook technology that we developed last year, not just in a separate “Chat Notebook”, but fully integrated into any notebook.
At the beginning of a cell in any notebook, just press ‘. You get a chat cell that communicates with Notebook Assistant:
And now the output from that chat cell is placed directly below in the notebook—so you can create a notebook that mixes standard notebook content with chat content.
It all works basically just like a fully integrated version of our Chat Notebook technology. (And this functionality is already available in Version 14.1 if you explicitly create a chat notebook with File > New > Chat Notebook.) As in Chat Notebooks, you use a chat break (with ~) to start a new chat within the same notebook. (In general, when you use a chat cell in an ordinary notebook to access Notebook Assistant, the assistant will see only material that occurs before the chat, and within the same chat block.)
In mid-2023 we introduced LLMFunction, LLMSynthesize and related functions (as well as ChatEvaluate, ImageSynthesize, etc.) to let you access LLM functionality directly within the Wolfram Language. Until now these functions required connection to an external LLM provider. But along with Notebook Assistant we’re introducing today LLM Kit—which allows you to access all LLM functionality in the Wolfram Language directly through a subscription within your Wolfram Account.
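For example (needing only that the LLM service, or an external provider, is set up):

describe = LLMFunction["Describe what the Wolfram Language function `` does, in one sentence."];
describe["FoldList"]

LLMSynthesize["Suggest three whimsical names for a pet axolotl."]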
It’s all very easy: as soon as you enable your subscription, not only Notebook Assistant but also all LLM functionality will just work, going through our LLM service. (And, yes, Notebook Assistant is basically built on top of LLM Kit and the LLM service access it defines.)
When you’ve enabled your Notebook Assistant + LLM Kit subscription, this is what you’ll see in the Preferences panel:
Our LLM service is primarily aimed at “human speed” LLM usage, in other words, things like responding to what you ask the Notebook Assistant. But the service also seamlessly supports programmatic things like LLMFunction. And for anything beyond small-scale uses of LLMFunction, etc. you’ll probably want to upgrade from the basic “Essentials” subscription level to the “Pro” level. And if you want to go “industrial scale” in your LLM usage, you can do that by explicitly purchasing Wolfram Service Credits.
Everything is set up to be easy if you use our Wolfram LLM service—and that’s what Notebook Assistant is based on. But for Chat Notebooks and programmatic LLM functionality, our Wolfram Language framework also supports connection to a wide range of external LLM service providers. You have to have your own external subscription to whatever external service you want to use. But once you have the appropriate access key you’ll be able to set things up so that you can pick that LLM provider interactively in Chat Notebooks, programmatically through LLMConfiguration, or in the Preferences panel.
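Schematically, that looks something like this (the service and model names here are just placeholders for whatever you have access to):

conf = LLMConfiguration[<|"Model" -> {"OpenAI", "gpt-4o"}|>];   (* example external provider/model *)
LLMSynthesize["Summarize what FoldList does, in one sentence.", LLMEvaluator -> conf]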
(By the way, we’re continually monitoring the performance of different LLMs on Wolfram Language generation; you can see weekly benchmark results at the Wolfram LLM Benchmark Project website—or get the data behind that from the Wolfram Data Repository.)
There’s really never been anything quite like it before: a way of automatically taking what can be quite vague human thoughts and ideas, and making them crisp and structured—by expressing them computationally. And, yes, this is made possible now by the unexpectedly effective linguistic interface that LLMs give us. But ultimately what makes it possible is that the LLMs have a target: the Wolfram Language in all its breadth and depth.
For me it’s an exciting moment. Because it’s a moment where everything we’ve been building these past four decades is suddenly much more broadly accessible. Expert users of Wolfram Language will be able to make use of all sorts of amazing nooks of functionality they never knew about. And people who’ve never used Wolfram Language before—or never even formulated anything computationally—will suddenly be able to do so.
And it’s remarkable what kinds of things one can “make computational”. Let’s say you ask Wolfram Notebook Assistant to make up a story. Like pretty much anything today with LLMs inside, it’ll dutifully do that:
But how can one make something like this computational? Well, just ask Notebook Assistant:
And what it does is rather remarkable: it uses Wolfram Language to create an interactive agent-based computational game version of the story!
Computation is the great paradigm of our times. And the development of “computational X” for all X seems destined to be the future of pretty much every field. The whole tower of ideas and technology that is the modern Wolfram Language was built precisely to provide the computational language that is needed. But now Notebook Assistant is dramatically broadening access to that—making it possible to get “computational language superpowers” using just ordinary (and perhaps even vague) natural language.
And even though I’ve now been living the computational language story for more than four decades, Notebook Assistant keeps on surprising me with what it manages to make computational. It’s incredibly powerful to be able to “go computational”. And even if you can’t imagine how it could work in what you’re doing, you should still just try it! Notebook Assistant may well surprise you—and in that moment show you a path to leverage the great power of the computational paradigm in ways that you’ve never imagined.