Prolegomenon to a General Biology
February 21, 2001 by Stuart Kauffman
Before artificially intelligent systems can emerge and become self-aware, there is the question: Whence life in the first place?
Originally published October 2000 in the book Investigations. Excerpt published on KurzweilAI.net February 22, 2001.
Lecturing in Dublin, one of the twentieth century’s most famous physicists set the stage of contemporary biology during the war-heavy year of 1944. Given Erwin Schroedinger’s towering reputation as the discoverer of the Schroedinger equation, the fundamental formulation of quantum mechanics, his public lectures and subsequent book were bound to draw high attention. But no one, not even Schroedinger himself, was likely to have foreseen the consequences. Schroedinger’s What Is Life? is credited with inspiring a generation of physicists and biologists to seek the fundamental character of living systems. Schroedinger brought quantum mechanics, chemistry, and the still poorly formulated concept of “information” into biology. He is the progenitor of our understanding of DNA and the genetic code. Yet as brilliant as was Schroedinger’s insight, I believe he missed the center. Investigations seeks that center and finds, in fact, a mystery.
In my previous two books, I laid out some of the growing reasons to think that evolution was even richer than Darwin supposed. Modern evolutionary theory, based on Darwin’s concept of descent with heritable variations that are sifted by natural selection to retain the adaptive changes, has come to view selection as the sole source of order in biological organisms. But the snowflake’s delicate sixfold symmetry tells us that order can arise without the benefit of natural selection. Origins of Order and At Home in the Universe give good grounds to think that much of the order in organisms, from the origin of life itself to the stunning order in development, does not derive from natural selection alone. Instead, much of the order in organisms, I believe, is self-organized and spontaneous. Self-organization mingles with natural selection in barely understood ways to yield the magnificence of our teeming biosphere. We must, therefore, expand evolutionary theory.
Yet we need something far more important than a broadened evolutionary theory. Despite any valid insights in my own two books, and despite the fine work of many others, including the brilliance manifest in the past three decades of molecular biology, the core of life itself remains shrouded from view. We know chunks of molecular machinery; metabolic pathways, means of membrane biosynthesis-we know many of the parts and many of the processes. But what makes a cell alive is still not clear to us. The center is still mysterious.
And so I began my notebook “Investigations” in December of 1994, a full half century after Schroedinger’s What Is Life?, as an intellectual enterprise unlike any I had undertaken before. Rather bravely and thinking with some presumptuousness of Wittgenstein’s famous Philosophical Investigations, which had shattered the philosophical tradition of logical atomism in which he had richly participated, I betook myself to my office at home in Santa Fe and grandly intoned through my fingers onto the computer’s disc, “Investigations,” on December 4, 1994. I sensed my long search would uncover issues that were then only dimly visible to me. I hoped the unfolding, ongoing notebook would allow me to find the themes and link them into something that was vast and new but at the time inarticulate.
Two years later, in September of 1996, I published a modestly well-organized version of Investigations as a Santa Fe Institute preprint, launched it onto the web, and put it aside for the time being. I found I had indeed been led into arenas that I had in no way expected, led by a swirl of ever new questions. I put the notebooks aside, but a year later I returned to the swirl, taking up again a struggle to see something that, I think, is right in front of us-always the hardest thing to see. This book is the fruit of these efforts. And this first chapter is but an introduction, in brief, to the themes that will be explained more fully in the following chapters. I would ask the reader to be patient with unfamiliar terms and concepts.
My first efforts had begun with twin questions. First, in addition to the known laws of thermodynamics, could there possibly be a fourth law of thermodynamics for open thermodynamic systems, some law that governs biospheres anywhere in the cosmos or the cosmos itself? Second, living entities-bacteria, plants, and animals-manipulate the world on their own behalf: the bacterium swimming upstream in a glucose gradient that is easily said to be going to get “dinner”; the paramecium, cilia beating like a Roman warship’s oars, hot after the bacterium; we humans earning our livings. Call the bacterium, paramecium, and us humans “autonomous agents,” able to act on our own behalf in an environment.
My second and core question became, What must a physical system be to be an autonomous agent? Make no mistake, we autonomous agents construct our biosphere, even as we coevolve in it. Why and how this is so is a central subject of all that follows. From the outset, there were, and remain, reasons for deep skepticism about the enterprise of Investigations. First, there are very strong arguments to say that there can be no general law for open thermodynamic systems. The core argument is simple to state. Any computer program is an algorithm that, given data, produces some sequence of output, finite or infinite. Computer programs can always be written in the form of a binary symbol string of 1 and 0 symbols. All possible binary symbol strings are possible computer programs. Hence, there is a countable, or denumerable, infinity of computer programs. A theorem states that for most computer programs, there is no compact description of the printout of the program. Rather, we must just unleash the program and watch it print what it prints. In short, there is no shorter description of the output of the program than that which can be obtained by running the program itself. If by the concept of a “law” we mean a compact description, ahead of time, of what the computer program will print, then for any such program, there can be no law that allows us to predict what the program will actually do ahead of the actual running of the program.
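The countability claim in the argument above can be made concrete with a few lines of Python (an illustrative sketch of my own, not part of the original argument): enumerating binary strings in order of length places every possible string, and hence every candidate program, at some finite position in a single list.

```python
from itertools import count, product

def all_binary_strings():
    """Yield every finite binary string: length 0, then length 1, and so on.

    Any given string (i.e., any candidate program text) appears at some
    finite position in this single enumeration, so the set of all possible
    programs is countable (denumerable).
    """
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

gen = all_binary_strings()
print([next(gen) for _ in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
```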
The next step is simple. Any such program can be realized on a universal Turing machine such as the familiar computer. But that computer is an open nonequilibrium thermodynamic system, its openness visibly realized by the plug and power line that connects the computer to the electric power grid. Therefore, and I think this conclusion is cogent, there can be no general law for all possible nonequilibrium thermodynamic systems.
So why was I conjuring the possibility of a general law for open thermodynamic systems? Clearly, no such general law can hold for all open thermodynamic systems. But hold a moment. It is we humans who conceived and built the intricate assembly of chips and logic gates that constitute a computer, typically we humans who program it, and we humans who contrived the entire power grid that supplies the electric power to run the computer itself. This assemblage of late-twentieth-century technology did not assemble itself. We built it.
On the other hand, no one designed and built the biosphere. The biosphere got itself constructed by the emergence and persistent coevolution of autonomous agents. If there cannot be general laws for all open thermodynamic systems, might there be general laws for thermodynamically open but self-constructing systems such as biospheres? I believe that the answer is yes. Indeed, among those candidate laws to be discussed in this book is a candidate fourth law of thermodynamics for such self-constructing systems.
To roughly state the candidate law, I suspect that biospheres maximize the average secular construction of the diversity of autonomous agents and the ways those agents can make a living to propagate further. In other words, on average, biospheres persistently increase the diversity of what can happen next. In effect, as we shall see later, biospheres may maximize the average sustained growth of their own “dimensionality.”
Thus, the enterprise of Investigations soon began to center on the character of the autonomous agents whose coevolution constructs a biosphere. I was gradually led to a labyrinth of issues concerning the core features of autonomous agents able to manipulate the world on their own behalf. It may be that those core features capture a proper definition of life and that definition differs from the one Schroedinger found.
To state my hypothesis abruptly and without preamble, I think an autonomous agent is a self-reproducing system able to perform at least one thermodynamic work cycle. It will require most of this book to unfold the implications of this tentative definition.
Following an effort to understand what an autonomous agent might be–which, as just noted, involves the concept of work cycles–I was led to the concepts of work itself, constraints, and work as the constrained release of energy. In turn, this led to the fact that work itself is often used to construct constraints on the release of energy that then constitutes further work. So we confront a virtuous cycle:
Work constructs constraints, yet constraints on the release of energy are required for work to be done. Here is the heart of a new concept of “organization” that is not covered by our concepts of matter alone, energy alone, entropy alone, or information alone. In turn, this led me to wonder about the relation between the emergence of constraints in the universe and in a biosphere, and the diversification of patterns of the constrained release of energy that alone constitute work and the use of that work to build still further constraints on the release of energy. How do biospheres construct themselves or how does the universe construct itself?
The considerations above led to the role of Maxwell’s demon, one of the major places in physics where matter, energy, work, and information come together. The central point of the demon is that by making measurements on a system, the information gained can be used to extract work. I made a new distinction between measurements the demon might make that reveal features of nonequilibrium systems that can be used to extract work, and measurements he might make of the nonequilibrium system that cannot be used to extract work. How does the demon know what features to measure? And, in turn, how does work actually come to be extracted by devices that measure and detect displacements from equilibrium from which work can, in principle, be obtained? An example of such a device is a windmill pivoting to face the wind, then extracting work by the wind turning its vanes. Other examples are the rhodopsin molecule of a bacterium responding to a photon of light or a chloroplast using the constrained release of the energy of light to construct high-energy sugar molecules. How do such devices come into existence in the unfolding universe and in our biosphere? How does the vast web of constraint construction and constrained energy release used to construct yet more constraints happen into existence in the biosphere? In the universe itself? The answers appear not to be present in contemporary physics, chemistry, or biology. But a coevolving biosphere accomplishes just this coconstruction of propagating organization.
Thus, in due course, I struggled with the concept of organization itself, concluding that our concepts of entropy and its negative, Shannon’s information theory (which was developed initially to quantify telephonic traffic and had been greatly extended since then) entirely miss the central issues. What is happening in a biosphere is that autonomous agents are coconstructing and propagating organizations of work, of constraint construction, and of task completion that continue to propagate and proliferate diversifying organization.
This statement is just plain true. Look out your window, burrow down a foot or so, and try to establish what all the microscopic life is busy doing and building and has done for billions of years, let alone the macroscopic ecosystem of plants, herbivores, and carnivores that is slipping, sliding, hiding, hunting, bursting with flowers and leaves outside your window. So, I think, we lack a concept of propagating organization.
Then too there is the mystery of the emergence of novel functionalities in evolution where none existed before: hearing, sight, flight, language. Whence this novelty? I was led to doubt that we could prestate the novelty. I came to doubt that we could finitely prestate all possible adaptations that might arise in a biosphere. In turn, I was led to doubt that we can prestate the “configuration space” of a biosphere.
But how strange a conclusion. In statistical mechanics, with its famous liter box of gas as an isolated thermodynamic system, we can prestate the configuration space of all possible positions and momenta of the gas particles in the box. Then Ludwig Boltzmann and Willard Gibbs taught us how to calculate macroscopic properties such as pressure and temperature as equilibrium averages over the configuration space. State the laws and the initial and boundary conditions, then calculate; Newton taught us how to do science this way. What if we cannot prestate the configuration space of a biosphere and calculate with Newton’s “method of fluxions,” the calculus, from initial and boundary conditions and laws? Whether we can calculate or not does not slow down the persistent evolution of novelty in the biosphere. But a biosphere is just another physical system. So what in the world is going on? Literally, what in the world is going on?
We have much to investigate. At the end, I think we will know more than at the outset. But Investigations is at best a mere beginning.
It is well to return to Schroedinger’s brilliant insights and his attempt at a central definition of life as a well-grounded starting place. Schroedinger ‘s What Is Life? provided a surprising answer to his enquiry about the central character of life by posing a core question: What is the source of the astonishing order in organisms?
The standard–and Schroedinger argued, incorrect–answer lay in statistical physics. If an ink drop is placed in still water in a petri dish, it will diffuse to a uniform equilibrium distribution. That uniform distribution is an average over an enormous number of atoms or molecules and is not due to the behavior of individual molecules. Any local fluctuations in ink concentration soon dissipate back to equilibrium.
Could statistical averaging be the source of order in organisms? Schroedinger based his argument on the emerging field of experimental genetics and the recent data on X-ray induction of heritable genetic mutations. Calculating the “target size” of such mutations, Schroedinger realized that a gene could comprise at most a few hundred or thousand atoms.
The sizes of statistical fluctuations familiar from statistical physics scale as the square root of the number of particles, N. Consider tossing a fair coin 10,000 times. The result will be about 50 percent heads, 50 percent tails, with a fluctuation of about 100, which is the square root of 10,000. Thus, a typical fluctuation from 50:50 heads and tails is 100/10,000 or 1 percent. Let the number of coin flips be 100 million, then the fluctuations are its square root, or 10,000. Dividing, 10,000/100,000,000 yields a typical deviation of .01 percent from 50:50.
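The square-root scaling is easy to check numerically. Here is a minimal Python sketch of the two cases worked out above (the function name is my own, not from the text):

```python
import math

def coin_flip_fluctuation(n_flips):
    """Typical fluctuation around a 50:50 split scales as sqrt(N)."""
    fluctuation = math.sqrt(n_flips)   # deviation on the scale of sqrt(N) flips
    relative = fluctuation / n_flips   # deviation as a fraction of all flips
    return fluctuation, relative

# 10,000 flips: fluctuation ~100, i.e. 1 percent of the total
print(coin_flip_fluctuation(10_000))       # (100.0, 0.01)

# 100 million flips: fluctuation ~10,000, i.e. 0.01 percent of the total
print(coin_flip_fluctuation(100_000_000))  # (10000.0, 0.0001)
```

The relative fluctuation shrinks as 1/sqrt(N), which is why averaging over huge numbers of particles yields stable macroscopic order, and why a gene of only a few hundred atoms cannot rely on it.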
Schroedinger reached the correct conclusion: If genes are constituted by as few as several hundred atoms, the familiar statistical fluctuations predicted by statistical mechanics would be so large that heritability would be essentially impossible. Spontaneous mutations would happen at a frequency vastly larger than observed. The source of order must lie elsewhere.
Quantum mechanics, argued Schroedinger, comes to the rescue of life. Quantum mechanics ensures that solids have rigidly ordered molecular structures. A crystal is the simplest case. But crystals are structurally dull. The atoms are arranged in a regular lattice in three dimensions. If you know the positions of all the atoms in a minimal-unit crystal, you know where all the other atoms are in the entire crystal. This overstates the case, for there can be complex defects, but the point is clear. Crystals have very regular structures, so the different parts of the crystal, in some sense, all “say” the same thing. As shown below, Schroedinger translated the idea of “saying” into the idea of “encoding.” With that leap, a regular crystal cannot encode much “information.” All the information is contained in the unit cell.
If solids have the order required but periodic solids such as crystals are too regular, then Schroedinger puts his bet on aperiodic solids. The stuff of the gene, he bets, is some form of aperiodic crystal. The form of the aperiodicity will contain some kind of microscopic code that somehow controls the development of the organism. The quantum character of the aperiodic solid will mean that small discrete changes, or mutations, will occur. Natural selection, operating on these small discrete changes, will select out favorable mutations, as Darwin hoped.
Fifty years later, I find Schroedinger’s argument fascinating and brilliant. At once he envisioned what became, by 1953, the elucidation of the structure of DNA’s aperiodic double helix by James Watson and Francis Crick, with the famously understated comment in their original paper that its structure suggests its mode of replication and its mode of encoding genetic information.
Fifty years later we know very much more. We know the human genome harbors some 80,000 to 100,000 “structural genes,” each encoding the RNA that, after being transcribed from the DNA, is translated according to the genetic code to a linear sequence of amino acids, thereby constituting a protein. From Schroedinger to the establishment of the code required only about twenty years.
Beyond the brilliance of the core of molecular genetics, we understand much concerning developmental biology. Humans have about 260 different cell types: liver, nerve, muscle. Each is a different pattern of expression of the 80,000 or 100,000 genes. Since the work of Francois Jacob and Jacques Monod thirty-five years ago, biologists have understood that the protein transcribed from one gene might turn other genes on or off. Some vast network of regulatory interactions among genes and their products provides the mechanism that marshals the genome into the dance of development.
We have come close to Schroedinger’s dream. But have we come close to answering his question, What is life? The answer almost surely is no. I am unable to say, all at once, why I believe this, but I can begin to hint at an explanation. Investigations is a search for an answer. I am not entirely convinced of what lies within this book; the material is too new and far too surprising to warrant conviction. Yet the pathways I have stumbled along, glimpsing what may be a terra nova, do seem to me to be worth serious presentation and serious consideration.
Quite to my astonishment, the story that will unfold here suggests a novel answer to the question, What is life? I had not expected even the outlines of an answer, and I am astonished because I have been led in such unexpected directions. One direction suggests that an answer to this question may demand a fundamental alteration in how we have done science since Newton. Life is doing something far richer than we may have dreamed, literally something incalculable. What is the place of law if, as hinted above, the variables and configuration space cannot be prespecified for a biosphere, or perhaps a universe? Yet, I think, there are laws. And if these musings be true, we must rethink science itself.
Perhaps I can point again at the outset to the central question of an autonomous agent. Consider a bacterium swimming upstream in a glucose gradient, its flagellar motor rotating. If we naively ask, “What is it doing?” we unhesitatingly answer something like, “It’s going to get dinner.” That is, without attributing consciousness or conscious purpose, we view the bacterium as acting on its own behalf in an environment. The bacterium is swimming upstream in order to obtain the glucose it needs. Presumably we have in mind something like the Darwinian criteria to unpack the phrase, “on its own behalf.” Bacteria that do obtain glucose or its equivalent may survive with higher probability than those incapable of the flagellar motor trick, hence, be selected by natural selection.
An autonomous agent is a physical system, such as a bacterium, that can act on its own behalf in an environment. All free-living cells and organisms are clearly autonomous agents. The quite familiar, utterly astonishing feature of autonomous agents–E. coli, paramecia, yeast cells, algae, sponges, flat worms, annelids, all of us–is that we do, everyday, manipulate the universe around us. We swim, scramble, twist, build, hide, snuffle, pounce.
Yet the bacterium, the yeast cell, and we all are just physical systems. Physicists, biologists, and philosophers no longer look for a mysterious élan vital, some ethereal vital force that animates matter. Which leads immediately to the central, and confusing, question: What must a physical system be such that it can act on its own behalf in an environment? What must a physical system be such that it constitutes an autonomous agent? I will leap ahead to state now my tentative answer: A molecular autonomous agent is a self-reproducing molecular system able to carry out one or more thermodynamic work cycles.
All free-living cells are, by this definition, autonomous agents. To take a simple example, our bacterium with its flagellar motor rotating and swimming upstream for dinner is, in point of plain fact, a self-reproducing molecular system that is carrying out one or more thermodynamic work cycles. So is the paramecium chasing the bacterium, hoping for its own dinner. So is the dinoflagellate hunting the paramecium sneaking up on the bacterium. So are the flower and flatworm. So are you and I.
It will take a while to fully explore this definition. Unpacking its implications reveals much that I did not remotely anticipate. An early insight is that an autonomous agent must be displaced from thermodynamic equilibrium. Work cycles cannot occur at equilibrium. Thus, the concept of an agent is, inherently, a non-equilibrium concept. So too at the outset it is clear that this new concept of an autonomous agent is not contained in Schroedinger’s answer. Schroedinger’s brilliant leap to aperiodic solids encoding the organism that unleashed mid-twentieth-century biology appears to be but a glimmer of a far larger story.
Footprints of Destiny: The Birth of Astrobiology
The telltale beginnings of that larger story are beginning to be formulated. The U.S. National Aeronautics and Space Administration has had a long program in “exobiology,” the search for life elsewhere in the universe. Among its well-known interests are SETI, the search for extraterrestrial intelligence, and the Mars probes. Over the past three decades, a sustained effort has included a wealth of experiments aimed at discovering the abiotic origins of the organic molecules that are the building blocks of known living systems.
In the summer of 1997, NASA was busy attempting to formulate what it came to call “astrobiology,” an attempt to understand the origin, evolution, and characteristics of life anywhere in the universe. Astrobiology does not yet exist–it is a field in the birthing process. Whatever the area comes to be called as it matures, it seems likely to be a field of spectacular success and deep importance in the coming century. A hint of the potential impact of astrobiology came in August 1997 with the tentative but excited reports of a Martian meteorite found in Antarctica that, NASA scientists announced, might have evidence of early Martian microbial life. The White House organized the single-day “Space Conference,” to which I was pleased to be invited. Perhaps thirty-five scientists and scholars gathered in the Old Executive Office Building for a meeting led by Vice President Gore. The vice president began the meeting with a rather unexpected question to the group: If it should prove true that the Martian rock actually harbored fossilized microbial life, what would be the least interesting result?
The room was silent, for a moment. Then Stephen Jay Gould gave the answer many of us must have been considering: “Martian life turns out to be essentially identical to Earth life, same DNA, RNA, proteins, code.” Were it so, then we would all envision life flitting from planet to planet in our solar system. It turns out that a minimum transit time for a fleck of Martian soil kicked into space to make it to Earth is about fifteen thousand years. Spores can survive that long under desiccating conditions.
“And what,” continued the vice president, “would be the most interesting result?” Ah, said many of us, in different voices around the room: Martian life is radically different from Earth life.
If radically different, then….
If radically different, then life must not be improbable.
If radically different, then life may be abundant among the myriad stars and solar systems, on far planets hinted at by our current astronomy.
If radically different and abundant, then we are not alone.
If radically different and abundant, then we inhabit a universe rife with the creativity to create life.
If radically different, then-thought I of my just published second book-we are at home in the universe.
If radically different, then we are on the threshold of a new biology; a “general biology” freed from the confines of our known example of Earth life.
If radically different, then a new science seeking the origins, evolution, characteristics, and laws that may govern biospheres anywhere.
A general biology awaits us. Call it astrobiology if you wish. We confront the vast new task of understanding what properties and laws, if any, may characterize biospheres anywhere in the universe. I find the prospect stunning. I will argue that the concept of an autonomous agent will be central to the enterprise of a general biology.
A personally delightful moment arose during that meeting. The vice president, it appeared, had read At Home in the Universe, or parts of it. In At Home, and also in this book, I explore a theory I believe has deep merit, one that asserts that, in complex chemical reaction systems, self-reproducing molecular systems form with high probability.
The vice president looked across the table at me and asked, “Dr. Kauffman, don’t you have a theory that in complex chemical reaction systems life arises more or less spontaneously?”
“Well, isn’t that just sensible?”
I was, of course, rather thrilled, but somewhat embarrassed. “The theory has been tested computationally, but there are no molecular experiments to support it,” I answered.
“But isn’t it just sensible?” the vice president persisted.
I couldn’t help my response, “Mr. Vice President, I have waited a long time for such confirmation. With your permission, sir, I will use it to bludgeon my enemies.”
I’m glad to say there was warm laughter around the table. Would that scientific proof were so easily obtained. Much remains to be done to test my theory.
Many of us, including Mr. Gore, while maintaining skepticism about the Mars rock itself, spoke at that meeting about the spiritual impact of the discovery of life elsewhere in the universe. The general consensus was that such a discovery, linked to the sense of membership in a creative universe, would alter how we see ourselves and our place under all, all the suns. I find it a gentle, thrilling, quiet, and transforming vision.
We are surprisingly well poised to begin an investigation of a general biology, for such a study will surely involve the understanding of the collective behaviors of very complex chemical reaction networks. After all, all known life on earth is based on the complex webs of chemical reactions-DNA, RNA, proteins, metabolism, linked cycles of construction and destruction-that form the life cycles of cells. In the past decade we have crossed a threshold that will rival the computer revolution. We have learned to construct enormously diverse “libraries” of different DNA, RNA, proteins, and other organic molecules. Armed with such high-diversity libraries, we are in a position to begin to study the properties of complex chemical reaction networks.
To begin to understand the molecular diversity revolution, consider a crude estimate of the total organic molecular diversity of the biosphere. There are perhaps a hundred million species. Humans have about a hundred thousand structural genes, encoding that many different proteins. If all the genes within a species were identical, and all the genes in different species were at least slightly different, the biosphere would harbor about ten trillion different proteins. Within a few orders of magnitude, ten trillion will serve as an estimate of the organic molecular diversity of the natural biosphere. But the current technology of molecular diversity that generates libraries of more or less random DNA, RNA, or proteins now routinely produces a diversity of a hundred trillion molecular species in a single test tube.
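The back-of-the-envelope estimate above can be written out explicitly; this short Python sketch simply restates the arithmetic under the text’s own assumptions:

```python
# Crude estimate of the biosphere's protein diversity, following the
# assumptions in the text: ~1e8 species, ~1e5 structural genes per species,
# genes identical within a species and distinct between species.
species = 100_000_000
genes_per_species = 100_000
biosphere_proteins = species * genes_per_species
print(f"{biosphere_proteins:.0e}")  # 1e+13, i.e. ten trillion

# A single random-library test tube, per the text: a hundred trillion species.
library_diversity = 100_000_000_000_000
print(library_diversity / biosphere_proteins)  # 10.0 -- the tube exceeds it tenfold
```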
In our hubris, we rival the biosphere.
The field of molecular diversity was born to help solve the problem of drug discovery. The core concept is simple. Consider a human hormone such as estrogen. Estrogen acts by binding to a specific receptor protein; think of the estrogen as a “key” and the receptor as a “lock.” Now generate sixty-four million different small proteins, called peptides, say, six amino acids in length. (Since there are twenty types of amino acids, the number of possible hexamers is sixty-four million.) The sixty-four million hexamer peptides are candidate second keys, any one of which might be able to fit into the same estrogen receptor lock into which estrogen fits. If so, any such second key may be similar to the first key, estrogen, and hence is a candidate drug to mimic or modulate estrogen.
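The sixty-four million figure is just the count of sequences of six positions drawn from twenty amino acid types; a one-line Python check makes the combinatorics explicit:

```python
# Number of distinct hexamer peptides: 20 amino acid types at each of
# 6 positions gives 20**6 possible sequences.
amino_acid_types = 20
peptide_length = 6
n_hexamers = amino_acid_types ** peptide_length
print(n_hexamers)  # 64000000 -- the "sixty-four million" of the text
```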
To find such an estrogen mimic, take many identical copies of the estrogen receptor, affix them to the bottom of a petri plate, and expose them simultaneously to all sixty-four million hexamers. Wash off all the peptides that do not stick to the estrogen receptor, then recover those hexamers that do stick to the estrogen receptor. Any such peptide is a second key that binds the estrogen receptor locks and, hence, is a candidate estrogen mimic.
The procedure works, and works brilliantly. By 1990, George Smith at the University of Missouri used a specific kind of virus, a filamentous phage that infects bacteria. The phage is a strand of DNA that encodes proteins. Among these proteins is the coat protein that packages the head of the phage as part of an infective phage particle. George cloned random DNA sequences encoding random hexamer peptides into one end of the phage coat protein gene. Each phage then carried a different, random DNA sequence in its coat protein gene, hence made a coat protein with a random six amino acid sequence at one end. The initial resulting “phage display” libraries had about twenty million of the sixty-four million different possible hexamer peptides.
Rather than using the estrogen receptor and seeking a peptide estrogen mimic that binds the estrogen receptor, George Smith used a monoclonal antibody molecule as the analog of the receptor and sought a hexamer peptide that could bind the monoclonal antibody. Monoclonal antibody technology allows the generation of a large number of identical antibody molecules, hence George could use these as identical mock receptors. George found that, among the twenty million different phage, about one in a million would stick to his specific monoclonal antibody molecules. In fact, George found nineteen different hexamers binding to his monoclonal antibody. Moreover, the nineteen different hexamers differed from one another, on average, in three of the six amino acid positions. All had high affinity for his monoclonal antibody target.
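The quoted figures are mutually consistent, as a quick Python check shows (this is my own arithmetic on the numbers given in the text, not an analysis from the original):

```python
# A library of ~20 million distinct phage, with roughly one hexamer in a
# million binding the monoclonal antibody, predicts about 20 binders.
library_size = 20_000_000
binding_rate = 1 / 1_000_000
expected_binders = library_size * binding_rate
print(expected_binders)  # 20.0 -- consistent with the 19 distinct hexamers found
```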
These results have been of very deep importance. Phage display is now a central part of drug discovery in many pharmaceutical and biotechnology companies. The discovery of “drug leads” is being transformed from a difficult task to a routine one. Work is being pursued using not only peptides but also RNA and DNA sequences. Molecular diversity has now spread to the generation of high-diversity libraries of small organic molecules, an approach called “combinatorial chemistry.” The promise is of high medical importance. As we understand better the genetic diversity of the human population, we can hope to create well-crafted molecules with increased efficacy as drugs, vaccines, enzymes, and novel molecular structures. When the capacity to craft such molecules is married, as it will be in the coming decades, to increased understanding of the genetic and cellular signaling pathways by which ontogeny is controlled, we will enter an era of “postgenomic” medicine. By learning to control gene regulation and cell signaling, we will begin to control cell proliferation, cell differentiation, and tissue regeneration to treat pathologies such as cancer, autoimmune diseases, and degenerative diseases.
But George Smith’s experiments are also of immediate interest, and in surprising ways that will bear on our later discussion of autonomous agents. George’s experiments have begun to verify the concept of a “shape space” put forth by George Oster and Alan Perelson of the University of California, Berkeley, and Los Alamos National Laboratory more than a decade earlier. In turn, shape space suggests “catalytic task space.” We will need both to understand autonomous agents.
Oster and Perelson had been concerned about accounting for the fact that humans can make about a hundred million different antibody molecules. Why, they wondered. They conceived of an abstract shape space with perhaps seven or eight “dimensions.” Three of these dimensions would correspond to the three spatial dimensions, length, height, and width of a molecular binding site. Other dimensions might correspond to physical properties of the binding sites of molecules, such as charge, dipole moment, and hydrophobicity.
A point in shape space would represent a molecular shape. An antibody binds its shape complement, key and lock. But the precision with which an antibody can recognize its shape complement is finite. Some jiggle room is allowed. So an antibody molecule “covers” a kind of “ball” of complementary shapes in shape space. And then comes the sweet argument. If an antibody covers a ball, an actual volume, in shape space, then a finite number of balls will suffice to cover all of shape space. A reasonable analogy is that a finite number of Ping-Pong balls will fill up a bedroom.
But how big of a Ping-Pong ball in shape space is covered by one antibody? Oster and Perelson reasoned that in order for an immune system to protect an organism against disease, its antibody repertoire should cover a reasonable fraction of shape space. Newts, with about ten thousand different antibody molecules, have the minimal known antibody diversity. Perelson and Oster guessed that the newt repertoire must cover a substantial fraction, say about 1/e–where e is the natural base for logarithms–or 37 percent of shape space. Dividing 37 percent by 10,000 gives the fractional volume of shape space covered by one antibody molecule. It follows that 100,000,000 such balls, thrown at random into shape space and allowed to overlap one another, will saturate shape space. So, 100 million antibody molecules is all we need to recognize virtually any shape of the size scale of molecular binding sites.
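The arithmetic of this covering argument is easy to check. The sketch below reproduces the Oster-Perelson estimate with the round numbers quoted above; the one-in-e coverage figure and the independence of the randomly thrown balls are assumptions of the toy calculation, not measured facts.

```python
import math

# The Oster-Perelson covering estimate, with the round numbers quoted above.
NEWT_REPERTOIRE = 10_000      # minimal known antibody diversity
NEWT_COVERAGE = 1 / math.e    # assumed fraction of shape space covered, ~37%

# Fractional volume of shape space covered by one antibody "ball".
p = NEWT_COVERAGE / NEWT_REPERTOIRE

def coverage(n_antibodies: int, p: float) -> float:
    """Expected fraction of shape space covered by n randomly thrown,
    possibly overlapping balls, each of fractional volume p."""
    return 1.0 - math.exp(n_antibodies * math.log1p(-p))

print(f"per-antibody coverage p ~ {p:.2e}")
print(f"newt  (10^4 antibodies): {coverage(NEWT_REPERTOIRE, p):.3f}")
print(f"human (10^8 antibodies): {coverage(100_000_000, p):.6f}")
```

Because the balls overlap, ten thousand of them actually cover about 31 percent rather than exactly 37; the simple division in the text is the no-overlap approximation. Either way, a hundred million balls leave an uncovered fraction of roughly e^-3700, which is zero for any practical purpose.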
And therefore the concept of shape space carries surprising implications. Not surprisingly, similar molecules can have similar shapes. More surprisingly, very different molecules can have the same shape. Examples include endorphin and morphine. Endorphin is a peptide hormone. When endorphin binds the endorphin brain receptor, a euphoric state is induced. Morphine, a completely different kind of organic molecule, binds the endorphin receptor as well, with well-known consequences. Still more surprising, a finite number of different molecules, about a hundred million, can constitute a universal shape library. Thus, while there are vastly many different proteins, the number of effectively different shapes may only be on the order of a hundred million.
If one molecule binding to a second molecule can be thought of as carrying out a “binding task,” then about a hundred million different molecules may constitute a universal toolbox for all molecular binding tasks. So if we can now create libraries with 100 trillion different proteins, a millionfold in excess of the universal library, we are in a position to begin to study molecular binding in earnest.
But there may also be a universal enzymatic toolbox. Enzymes catalyze, or speed up, chemical reactions. Consider a substrate molecule undergoing a reaction to a product molecule. Physical chemists think of the substrate and product molecules as lying in two potential “energy wells,” like a ball at the bottom of one of two adjacent bowls. A chemical reaction requires “lifting” the substrate energetically to the top of the barrier between the bowls. Physically, the substrate’s bonds are maximally strained and deformed at the top of this potential barrier. The deformed molecule is called the “transition state.” According to transition state theory, an enzyme works by binding to and stabilizing the transition state molecule, thereby lowering the potential barrier of the reaction. Since the probability that a molecule acquires enough energy to hop to the top of the potential barrier is exponentially less as the barrier height increases, the stabilization of the transition state by the enzyme can speed up the reaction by many orders of magnitude.
Think of a catalytic task space, in which a point represents a catalytic task, where a catalytic task is the binding of a transition state of a reaction. Just as similar molecules can have similar shapes, so too can similar reactions have similar transition states, hence, such reactions constitute similar catalytic tasks. Just as different molecules can have the same shape, so too can different reactions have similar transition states, hence constitute the “same” catalytic task. Just as an antibody can bind to and cover a ball of similar shapes, an enzyme can bind to and cover a ball of similar catalytic tasks. Just as a finite number of balls can cover shape space, a finite number of balls can cover catalytic task space.
In short, a universal enzymatic toolbox is possible. Clues that such a toolbox is experimentally feasible come from many recent developments, including the discovery that antibody molecules, evolved to bind molecular features called epitopes, can actually act as catalysts.
Catalytic antibodies are obtained exactly as one might expect, given the concept of a catalytic task space. One would like an antibody molecule that binds the transition state of a reaction. But transition states are ephemeral. Since they last only fractions of a second, one cannot immunize with a transition state itself. Instead, one immunizes with a stable analog of the transition shape; that is, one immunizes with a second molecule that represents the “same” catalytic task as does the transition state itself. Antibody molecules binding to this transition state analog are tested. Typically, about one in ten antibody molecules can function as at least a weak catalyst for the corresponding reaction.
These results even allow a crude estimate of the probability that a randomly chosen antibody molecule will catalyze a randomly chosen reaction. About one antibody in a hundred thousand can bind a randomly chosen epitope. About one in ten antibodies that bind the transition state analog act as catalysts. By this crude calculation, about one in a million antibody molecules can catalyze a given reaction.
This rough calculation is probably too high by several orders of magnitude, even for antibody molecules. Recent experiments begin to address the probability that a randomly chosen peptide or DNA or RNA sequence will catalyze a randomly chosen reaction. The answer for DNA or RNA appears to be about one in a billion to one in a trillion. If we now make libraries of a hundred trillion random DNA, RNA, and protein molecules, we may already have in hand universal enzymatic toolboxes. Virtually any reaction, on the proper molecular scale of reasonable substrates and products, probably has one or more catalysts in such a universal toolbox.
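Those round numbers can be turned into a back-of-envelope estimate of how well stocked such a toolbox would be. The snippet below is only arithmetic on the figures quoted in the text; the Poisson-style independence assumption is mine.

```python
import math

# Back-of-envelope check of the "universal enzymatic toolbox" claim, using
# only the round numbers quoted in the text (independence is my assumption).
LIBRARY_SIZE = 10 ** 14  # random DNA, RNA, or protein sequences in the library

for p_catalysis in (1e-9, 1e-12):  # per-molecule odds of catalyzing a given reaction
    expected_hits = LIBRARY_SIZE * p_catalysis
    # Probability that at least one library member catalyzes the reaction.
    p_at_least_one = 1.0 - math.exp(-expected_hits)
    print(f"p = {p_catalysis:.0e}: ~{expected_hits:,.0f} expected catalysts, "
          f"P(at least one) = {p_at_least_one:.6f}")
```

Even at the pessimistic one-in-a-trillion figure, a library of a hundred trillion sequences is expected to hold about a hundred catalysts for any given reaction.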
In short, among the radical implications of molecular diversity is that we already possess hundreds of millions of different molecular functions: binding, catalytic, structural, and otherwise.
In our hubris, we rival the biosphere.
In our humility, we can begin to formulate a general biology and begin to investigate the collective behaviors of hugely diverse molecular libraries. Among these collective behaviors must be life itself.
Life as an Emergent Collective Behavior of Complex Chemical Networks
In the summer of 1996, Philip Anderson, a Nobel laureate in physics, and I accompanied Dr. Henry MacDonald, incoming director of NASA Ames, to NASA headquarters. Our purpose was to discuss a new linked experimental and theoretical approach to the origin-of-life problem with NASA Administrator Dan Goldin and his colleague, Dr. Wesley Huntress. I was excited and delighted.
As long ago as 1971, I had published my own first foray into the origin-of-life problem as a young assistant professor in the Department of Theoretical Biology at the University of Chicago. I had wondered if life must be based on template replicating nucleic acids such as DNA or RNA double helices and found myself doubting that standard assumption. Life, at its core, depends upon autocatalysis, that is, reproduction. Most catalysis in cells is carried out by protein enzymes. Might there be general laws supporting the possibility that systems of catalytic polymers such as proteins might be self-reproducing? Proteins are, as noted, linear sequences of twenty kinds of standard amino acids. Consider, then, a first copy of a protein that has the capacity to catalyze a reaction by which two fragments of a potential second copy of that same protein might be ligated to make the second copy of the whole protein. Such a protein, A, say, thirty-two amino acids long, might act on two fragments, say, fifteen amino acids and seventeen amino acids in length, and ligate the two to make a second copy of the thirty-two amino acid sequence.
But if one could imagine a molecule, A, catalyzing its own formation from its own fragments, could one not imagine two proteins, A and B, having the property that A catalyzes the formation of B by ligating B’s fragments into a second copy of B, while B catalyzes the formation of A by catalyzing the ligation of A’s fragments into a second copy of A? Such a little reaction system would be collectively autocatalytic. Neither A alone, nor B alone, would catalyze its own formation. Rather the AB system would jointly catalyze its reproduction from A and B fragments. But if A and B might achieve collective autocatalysis, might one envision a system with tens or hundreds of proteins, or peptides, that were collectively autocatalytic?
Might collective autocatalysis of proteins or similar polymers be the basic source of self-reproduction in molecular systems? Or must life be based on template replication, as envisioned by Watson and Crick, or as envisioned even earlier by Schroedinger in his aperiodic solid with its microcode? In view of the potential for a general biology, what, in fact, are the alternative bases for self-reproducing molecular systems here and anywhere in the cosmos? Which of these alternatives is more probable, here and anywhere?
By 1971 I had asked and found a preliminary answer to the following question:
In a complex mixture of different proteins, where the proteins might be able to serve as candidates to ligate one another into still larger amino acid sequences, what are the chances that such a system will contain one or more collectively autocatalytic sets of molecules? The best current guess is that, as the molecular diversity of a reaction system increases, a critical threshold is reached at which collectively autocatalytic, self-reproducing chemical reaction networks emerge spontaneously.
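The intuition behind that threshold is a counting argument: as maximum polymer length grows, the number of ligation reactions outruns the number of polymers, so at any fixed, even tiny, probability of catalysis each polymer eventually catalyzes some reaction, and a connected, collectively autocatalytic web becomes hard to avoid. A minimal sketch, assuming a binary alphabet and a made-up catalysis probability:

```python
# Counting argument behind spontaneous autocatalytic sets. The binary
# "amino acid" alphabet and the flat catalysis probability are illustrative
# assumptions, not the real chemistry.

def n_polymers(max_len: int, alphabet: int = 2) -> int:
    """Distinct polymers up to length max_len."""
    return sum(alphabet ** L for L in range(1, max_len + 1))

def n_ligations(max_len: int, alphabet: int = 2) -> int:
    """Ligation reactions a + b -> c among those polymers: each polymer
    of length >= 2 can be formed by joining at any internal bond."""
    return sum((L - 1) * alphabet ** L for L in range(2, max_len + 1))

P_CATALYSIS = 1e-6  # assumed chance a random polymer catalyzes a random reaction

for max_len in (10, 15, 20):
    polymers, reactions = n_polymers(max_len), n_ligations(max_len)
    per_polymer = P_CATALYSIS * reactions  # expected catalyzed reactions per polymer
    print(f"length <= {max_len}: {polymers:,} polymers, {reactions:,} reactions, "
          f"~{per_polymer:.3f} catalyzed reactions per polymer")
```

The ratio of reactions to polymers grows roughly linearly with polymer length, which is why sheer molecular diversity eventually tips the mixture over the threshold.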
If this view is correct, and the kinetic conditions for rapid reactions can be sustained, perhaps by enclosure of such a reproducing system in a bounding membrane vesicle, also synthesized by the system, the emergence of self-reproducing molecular systems may be highly probable. No small conclusion this: Life abundant, emergent, expected. Life spattered across megaparsecs, galaxies, galactic clusters. We as members of a creative, mysteriously unfolding universe. Moreover, the hypothesis is richly testable and, as described in the next chapter, is now under the early stages of testing.
One way or another, we will discover a second life: crouched under a Mars rock, frozen in time; limpid in some pool on Titan; or in some test tube in Nebraska in the next few decades. We will discover a second life, one way or another. What monumental transformations await us, proudly postmodern, mingled with peoples on this very globe still wedded to archetypes thousands of years old.
The Strange Thing About the Theory of Evolution
We do not understand evolution. We live it with moss, fruit, fin, and quill fellows. We see it since Darwin. We have insights of forms and their formation, won from efforts since Aristotle codified the embryological investigations that over twenty-five centuries ago began with the study of deformed fetuses in sacrificial animals. But we do not understand evolution.
“The strange thing about the theory of evolution,” said one of the Huxleys (although I cannot find which one), “is that everyone thinks he understands it.” How very well stated in that British fashion Americans can admire but not emulate. (“Two peoples separated by a common language,” as Churchill dryly put it.)
The strange thing about the theory of evolution is that everyone thinks he understands it. How very true. It seems, of course, so simple. Finches hop around the Galapagos, occasionally migrating from island to island. Small and large beaks serve for different seeds. Beaks fitting seeds feed the young. Well-wrought beaks are selected. Mutations are the feedstock of heritable variation in a population. Populations evolve by mutation, mating, recombination, and selection to give the well-marked varieties that are, for Darwin, new species. Phylogenies bushy in the biosphere. “We’re here, we’re here,” cry all for their typical four-million-year stay along the four-billion-year pageant.
How, in many senses. First, Darwin’s theory of evolution is a theory of descent with modification. It does not yet explain the genesis of forms, but the trimmings of the forms, once they are generated. “Rather like achieving an apple tree by trimming off all the branches,” said a late-nineteenth-century skeptic.
How, in the most fundamental sense: Whence life in the first place? Darwin starts with life already here. Whence life is the stuff of all later questions about whence the forms to sift.
How, in still a different sense. Darwin assumed gradualism. Most variation would be minor. Selection would sift these insensible alterations, a bit more lift, a little less drag, until the wing flew faultless in the high-hoped sky, a falcon’s knot-winged, claw-latching dive to dine.
But whence the gradualism itself? It is not God given, but true, that organisms are hardly affected by most mutations. Most mutations do have little effect; some have major effects. In Drosophila, many mutants make small modifications in bristle number, color, shape. A few change wings to legs, eyes to antennae, heads to genitalia. Suppose that all mutations were of dramatic effect. Suppose, to take the limiting philosophical case, that all mutations were what geneticists call “lethals.” Since, indeed, some mutations are lethals, one can, a priori, imagine creatures in which all mutations were lethal prior to having offspring. Might be fine creatures, too, in the absence of any mutations, these evolutionary descendants of, well, of what? And progenitors of whom? No pathway to or from these luckless ones.
Thus, evolution must somehow be crafting the very capacity of creatures to evolve. Evolution nurtures herself! But not yet in Darwin’s theory, nor yet in ours.
Take another case–sex. Yes, it captures our attention, and the attention of most members of most species. Most species are sexual. But why bother? Asexuals, budding quietly wherever they bud, require only a single parent. We plumaged ones require two, a twofold loss in fitness.
Why sex? The typical answer, to which I adhere, is that sexual mating gives the opportunity for genetic recombination. In genetic recombination, the double chromosome complement sets, maternal and paternal homologues, pair up, break and recombine to yield offspring chromosomes the left half of which derives from one parental chromosome, the right half of which derives from the other parental chromosome.
Recombination is said to be a useful “search procedure” in an evolving population. Consider, a geneticist would say, two genes, each with two versions, or alleles:
A and a for the first gene, B and b for the second gene. Suppose A confers a selective advantage compared to a, and B confers an advantage with respect to b. In the absence of sex, mating, and recombination, a rabbit with A and b would have to wait for a mutation to convert b to B. That might take a long time. But, with mating and recombination, a rabbit carrying A and b on one chromosome can mate with a rabbit carrying a and B on the homologous chromosome. A crossover between the two genes can place A and B on a single chromosome, hence they can be passed on together to the offspring. Recombination, therefore, can be a lot faster than waiting for mutation to assemble the good, AB chromosome.
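The two-locus argument can be made concrete in a few lines. In this toy, a chromosome is just a two-character string; a single crossover between the loci assembles the favored AB chromosome in one generation, whereas mutation alone waits on average 1/u generations for b to become B (the mutation rate u here is an arbitrary illustrative value).

```python
# Toy of the two-locus recombination argument: parent chromosomes "Ab" and "aB".
def recombine(chrom1: str, chrom2: str) -> tuple[str, str]:
    """A single crossover between the two loci swaps the right-hand genes."""
    return chrom1[0] + chrom2[1], chrom2[0] + chrom1[1]

print(recombine("Ab", "aB"))  # ('AB', 'ab') -- the good chromosome in one step

u = 1e-6  # an arbitrary per-generation mutation rate, for comparison
print(f"waiting for a b -> B mutation instead: ~{1 / u:,.0f} generations on average")
```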
But it is not so obvious that recombination is a good idea after all. At the molecular level, the recombination procedure is rather like taking an airplane and a motorcycle, breaking both in half, and using spare bolts to attach the back half of the airplane to the front half of the motorcycle. The resulting contraption seems useless for any purpose.
In short, the very usefulness of recombination depends upon the gradualness that Darwin assumed. In later chapters I will discuss the concept of a “fitness landscape.” The basic idea is simple. Consider a set of all possible frogs, each with a different genotype. Locate each frog in a high-dimensional “genotype space,” each next to all genotypes that differ from it by a single mutation. Imagine that you can measure the fitness of each frog. Graph the fitness of each frog as a height above that position in genotype space. The resulting heights form a fitness landscape over the genotype space, much as the Alps form a mountainous landscape over part of Europe.
In the fitness-landscape image, mutation, recombination, and selection can conspire to pull evolving populations upward toward the peaks of high fitness. But not always. It is relatively easy to show that recombination is only a useful search procedure on smooth fitness landscapes. The smoothness of a fitness landscape can be defined mathematically by a correlation function giving the similarity of fitnesses, or heights, at two points on the landscape separated by a mutational distance. In the Alps, most nearby points are of similar heights, except for cliffs, but points fifty kilometers apart can be of very different heights. Fifty kilometers is beyond the correlation length of the Alps.
There is good evidence that recombination is only a useful search strategy on smooth, highly correlated landscapes, where the high peaks all cluster near one another. Recombination, half airplane-half motorcycle, is a means to look “between” two positions in a high-dimensional space. Then if both points are in the region of high peaks, looking between those two points is likely to uncover further new points of high fitness, or points on the slopes of even higher peaks. Thereafter, further mutation, recombination, and selection can bring the adapting population to successively higher peaks in the high-peaked region of the genotype space. If landscapes are very rugged and the high peaks do not cluster into smallish regions, recombination turns out to be a useless search strategy.
But most organisms are sexual. If organisms are sexual because recombination is a good search strategy, but recombination is only useful as a search strategy on certain classes of fitness landscapes, where did those fitness landscapes come from?
The strange thing about evolution is that everyone thinks he understands it. Somehow, evolution has brought forth the kind of smooth landscapes upon which recombination itself is a successful search strategy.
More generally, two young scientists, then at the Santa Fe Institute, proved a rather unsettling theorem. Bill Macready and David Wolpert called it the “no-free-lunch theorem.” They asked an innocent question. Are there some search procedures that are “good” search procedures, no matter what the problem is? To formalize this, Bill and David considered a mathematical convenience–a set of all possible fitness landscapes. To be simple and concrete, consider a large three-dimensional room. Divide the room into very small cubic volumes, perhaps a millimeter on a side. Let the number of these small volumes in the room be large, say a trillion. Now consider all possible ways of assigning integers between one and a trillion to these small volumes. Any such assignment can be thought of as a fitness landscape, with the integer representing the fitness of that position in the room.
Next, formalize a search procedure as a process that somehow samples M distinct volumes among the trillion in the room. A search procedure specifies how to take the M samples. An example is a random search, choosing the M boxes at random. A second procedure starts at a box and samples its neighbors, climbing uphill via neighboring boxes toward higher integers. Still another procedure picks a box, samples neighbors and picks those with lower integers, then continues.
The no-free-lunch theorem says that, averaged over all possible fitness landscapes, no search procedure outperforms any other search procedure. What? Averaged over all possible fitness landscapes, you would do as well trying to find a large integer by searching randomly from an initial box for your M samples as you would climbing sensibly uphill from your initial box.
The theorem is correct. In the absence of any knowledge, or constraint, on the fitness landscape, on average, any search procedure is as good as any other.
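A small simulation makes the theorem less mysterious. Average over many freshly generated random landscapes, and a blind random sample of M boxes finds just as large a value, on average, as a scan of M neighboring boxes. Here a contiguous scan stands in for neighborhood search; on landscapes whose heights are independent and random, any procedure that samples M distinct boxes has the same expected best.

```python
import random, statistics

random.seed(2)
N_CELLS, M, TRIALS = 1000, 20, 3000

def random_search(values):
    """Sample M distinct boxes at random; report the best value found."""
    return max(random.sample(values, M))

def local_scan(values):
    """Examine M contiguous boxes starting from a random position."""
    start = random.randrange(len(values))
    return max(values[(start + i) % len(values)] for i in range(M))

rand_best, local_best = [], []
for _ in range(TRIALS):
    values = [random.random() for _ in range(N_CELLS)]  # a fresh random landscape
    rand_best.append(random_search(values))
    local_best.append(local_scan(values))

print(f"random search: mean best = {statistics.mean(rand_best):.4f}")
print(f"local scan:    mean best = {statistics.mean(local_best):.4f}")
```

Both means converge on the expected maximum of twenty uniform draws, 20/21, or about 0.952: averaged over landscapes, neither procedure beats the other.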
But life uses mutation, recombination, and selection. These search procedures seem to be working quite well. Your typical bat or butterfly has managed to get itself evolved and seems a rather impressive entity. The no-free-lunch theorem brings into high relief the puzzle. If mutation, recombination, and selection only work well on certain kinds of fitness landscapes, yet most organisms are sexual, and hence use recombination, and all organisms use mutation as a search mechanism, where did these well-wrought fitness landscapes come from, such that evolution manages to produce the fancy stuff around us?
Here, I think, is how. Think of an organism’s niche as a way of making a living. Call a way of making a living a “natural game.” Then, of course, natural games evolve with the organisms making those livings during the past four billion years. What, then, are the “winning games”? Naturally, the winning games are the games the winning organisms play. One can almost see Darwin nod. But what games are those? What games are the games the winners play?
Ways of making a living, natural games, that are well searched out and well mastered by the evolutionary search strategies of organisms, namely, mutation and recombination, will be precisely the niches, or ways of making a living, that a diversifying and speciating population of organisms will manage to master. The ways of making a living presenting fitness landscapes that can be well searched by the procedures that organisms have in hand will be the very ways of making a living that readily come into existence. If there were a way of making a living that could not be well explored and exploited by organisms as they speciate, that way of making a living would not become populated. Good jobs, like successful jobholders, prosper.
So organisms, niches, and search procedures jointly and self-consistently coconstruct one another! We make the world in which we make a living such that we can, and have, more or less mastered that evolving world as we make it. The same is true, I will argue, for an econosphere. A web of economic activities, firms, tasks, jobs, workers, skills, and learning, self-consistently came into existence in the last forty thousand years of human evolution.
The strange thing about the theory of evolution is that everyone thinks he understands it. But we do not. A biosphere, or an econosphere, self-consistently coconstructs itself according to principles we do not yet fathom.
Laws for a Biosphere
But there must be principles. Think of the Magna Carta, that cultural enterprise founded on a green meadow in England when John I was confronted by his nobles. British common law has evolved by precedent and determinations to a tangled web of more-or-less wisdom. When a judge makes a new determination, sets a new precedent, ripples of new interpretation pass to near and occasionally far reaches of the law. Were it the case that every new precedent altered the interpretation of all old judgments, the common law could not have coevolved into its rich tapestry. Conversely, if new precedents never sent out ripples, the common law could hardly evolve at all.
There must be principles of coevolutionary assembly for biospheres, economic systems, legal systems. Coevolutionary assembly must involve coevolving organizations flexible enough to change but firm enough to resist change. Edmund Burke was basically right. Might there be something deep here? Some hint of a law of coevolutionary assembly?
Perhaps. I begin with the simple example offered by Per Bak and his colleagues some years ago: Bak’s “sand pile” and “self-organized criticality.” The experiment requires a table and some sand. Drop the sand slowly on the table. The sand gradually piles up, fills the tabletop, piles to the rest angle of sand, then sand avalanches begin to fall to the floor.
Keep adding sand slowly to the sand pile and plot the size distribution of sand avalanches. You will obtain many small avalanches and progressively fewer large avalanches. In fact, you will achieve a characteristic size distribution called a “power law.” Power law distributions are easily seen if one plots the logarithm of the number of avalanches at a given size on the y-axis, and the logarithm of the size of the avalanche on the x-axis. In the sand pile case, a straight line sloping downward to the right is obtained. The slope is the power law relation between the size and number of avalanches.
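The experiment is simple enough to run in software. Below is a minimal Bak-Tang-Wiesenfeld-style sandpile on a small grid; the grid size, random drop positions, and toppling threshold of four follow the standard textbook model rather than anything specific in the text. Counting avalanches in doubling-size bins reveals the roughly straight descending line on log-log axes described above.

```python
import random
from collections import Counter

random.seed(3)
SIZE, THRESHOLD = 16, 4

def drop_grain(grid):
    """Drop one grain at a random site, topple until stable, and
    return the avalanche size (total number of topplings)."""
    r, c = random.randrange(SIZE), random.randrange(SIZE)
    grid[r][c] += 1
    topplings = 0
    unstable = [(r, c)] if grid[r][c] >= THRESHOLD else []
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue
        grid[r][c] -= THRESHOLD   # four grains go to the neighbors
        topplings += 1
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < SIZE and 0 <= nc < SIZE:  # edge grains fall off the table
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return topplings

grid = [[0] * SIZE for _ in range(SIZE)]
sizes = [drop_grain(grid) for _ in range(30_000)]
avalanches = [s for s in sizes if s > 0]

# Doubling-size bins: 1, 2-3, 4-7, 8-15, ... A power law shows up as a
# roughly straight descending line when both axes are logarithmic.
counts = Counter(min(a.bit_length(), 12) for a in avalanches)
for b in sorted(counts):
    print(f"size ~2^{b - 1}: {counts[b]:6d} avalanches")
```

Small avalanches vastly outnumber large ones, exactly the signature Bak's group observed.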
Bak and his friends called their sand pile “self-organized critical.” Here, “critical” means that avalanches occur on all length scales, “self-organized” means that the system tunes itself to this critical state.
Many of us have now explored the application of Bak’s ideas in models of coevolution that I will discuss shortly. With the caveat that other explanations may account for the data, the general result is a candidate theory of coevolutionary assembly that yields a self-organized critical biosphere with a power law distribution of small and large avalanches of extinction and speciation events. As we shall see, the best data now suggest that precisely such a power law distribution of extinction and speciation events has occurred over the past 650 million years of the Phanerozoic. In addition, the same body of theory predicts that most species go extinct soon after their formation, while some live a long time. The predicted species lifetime distribution is a power law. So too are the data.
Similar phenomena may occur in an econosphere. Small and large avalanches of extinction and speciation events occur in our technologies. A colleague, Brian Arthur, is fond of pointing out that when the car came in, the horse, buggy, buggy whip, saddlery, smithy, and Pony Express went out of business. The car paved the way for an oil and gas industry, paved roads, motels, fast-food restaurants, and suburbia. The Austrian economist Joseph Schumpeter wrote about this kind of turbulence in capitalist economies. These Schumpeterian gales of creative destruction appear to occur in small and large avalanches. Perhaps the avalanches arise in power laws. And, like species, most firms die young; some make it to old age–Stora, in Sweden, is over nine hundred years old. The distribution of firm lifetimes is again a power law.
Here are hints–common law, ecosystems, economic systems–that general principles govern the coevolutionary coconstruction of lives and livings, organisms and natural games, firms and economic opportunities. Perhaps such a law governs any biosphere anywhere in the cosmos.
I shall suggest other candidate laws for any biosphere in the course of Investigations. As autonomous agents coconstruct a biosphere, each must manage to categorize and act upon its world in its own behalf. What principles might govern that categorization and action, one might begin to wonder. I suspect that autonomous agents coevolve such that each makes the maximum diversity of reliable discriminations upon which it can act reliably as it swims, scrambles, pokes, twists, and pounces. This simple view leads to a working hypothesis: Communities of agents will coevolve to an “edge of chaos” between overrigid and overfluid behavior. The working hypothesis is richly testable today using, for example, microbial communities.
Moreover, autonomous agents forever push their way into novelty–molecular, morphological, behavioral, organizational. I will formalize this push into novelty as the mathematical concept of an “adjacent possible,” persistently explored in a universe that can never, in the vastly many lifetimes of the universe, have made all possible protein sequences even once, bacterial species even once, or legal systems even once. Our universe is vastly nonrepeating; or, as the physicists say, the universe is vastly nonergodic. Perhaps there are laws that govern this nonergodic flow. I will suggest that a biosphere gates its way into the adjacent possible at just that rate at which its inhabitants can just manage to make a living, just poised so that selection sifts out useless variations slightly faster than those variations arise. We ourselves, in our biosphere, econosphere, and technosphere, gate our rate of discovery. There may be hints here too of a general law for any biosphere, a hoped-for new law for self-constructing systems of autonomous agents. Biospheres, on average, may enter their adjacent possible as rapidly as they can sustain; so too may econospheres. Then the hoped-for fourth law of thermodynamics for such self-constructing systems will be that they tend to maximize their dimensionality, the number of types of events that can happen next.
And astonishingly, we need stories. If, as I will suggest, we cannot prestate the configuration space, variables, laws, initial and boundary conditions of a biosphere, if we cannot foretell a biosphere, we can, nevertheless, tell the stories as it unfolds. Biospheres demand their Shakespeares as well as their Newtons. We will have to rethink what science is itself. And C. P. Snow’s “two cultures,” the humanities and science, may find an unexpected, inevitable union.
Investigations leads us to new views of the biosphere as a coconstructing system. In the final chapter, I step beyond the central concern with autonomous agents to consider the universe itself. Again, there are hints of coconstruction: of the laws themselves, of the complexity of the universe, of geometry itself. The epilogue concludes with limits to reductionism in its strong form and an invocation to a constructivist science.
No one knows.
From INVESTIGATIONS by Stuart Kauffman, copyright © 2000 by Oxford University Press, Inc. Used by permission of Oxford University Press, Inc.