A New Kind of Science: Analysis
June 24, 2002 by Scott Aaronson
This review of Stephen Wolfram’s new book addresses weaknesses in Wolfram’s notions of computational complexity, general relativity, quantum mechanics, and the Bell inequality violation.
“Somebody says, ‘You know, you people always say that space is continuous. How do you know when you get to a small enough dimension that there really are enough points in between, that it isn’t just a lot of dots separated by little distances?’ Or they say, ‘You know those quantum mechanical amplitudes you told me about, they’re so complicated and absurd, what makes you think those are right? Maybe they aren’t right.’ Such remarks are obvious and are perfectly clear to anybody who is working on this problem. It does not do any good to point this out.” – Richard Feynman [7]
1. Introduction
A New Kind of Science [18], the 1280-page treatise by Mathematica creator Stephen Wolfram, has only a few things to say about quantum computing. Yet the book’s goal – to understand nature in computational terms – is one widely shared by the quantum computing community. Thus, many in the field will likely be curious: is this 2.5-kilogram tome worth reading? Notwithstanding newspaper comparisons [6] to Darwin’s Origin of Species, what is the book’s actual content? This review will not attempt a chapter-by-chapter evaluation, but will focus on two specific areas: computational complexity (Section 2), especially the claimed relevance of Wolfram’s Principle of Computational Equivalence to NP-completeness (Section 2.1); and fundamental physics (Section 3), especially quantum mechanics (Section 3.1) and Wolfram’s attempt to account for Bell inequality violation in a classical model (Section 3.2).
As a popularization, A New Kind of Science is an impressive accomplishment. The book’s main theme is that simple programs can exhibit complex behavior. For example, let p_{i,j}=1 if cell (i,j) is colored black, and p_{i,j}=0 if white. Then the ‘Rule 110′ cellular automaton is defined by the recurrence
p_{i+1,j} = p_{i,j} + p_{i,j+1} – (1+p_{i,j-1})p_{i,j}p_{i,j+1}
for i>0, given some initial condition at i=0. Wolfram emphasizes that such an automaton, even when run with a simple initial condition such as a single black cell, can generate a complicated-looking image with no apparent repetitive or nested structure. This phenomenon, although well known to programming enthusiasts as well as professionals, will no doubt surprise many general readers.
Using cellular automata as a framework, Wolfram moves on to discuss a range of topics – including the second law of thermodynamics, natural selection, plant and animal morphology, artificial intelligence, fluid dynamics, special and general relativity, quantum mechanics, efficient algorithms and NP-completeness, heuristic search methods, cryptography and pseudorandomness, data compression, statistical hypothesis testing, Gödel’s Theorem, axiomatic set theory, and the Church-Turing thesis. What is noteworthy is that he explains all of these without using formal notation. To do so, he relies on about 1000 high-resolution graphics, which often (though not always) convey the ideas with as much precision as a formula would. With suitable disclaimers, A New Kind of Science could form an excellent basis for an undergraduate general-science course.
The trouble is that, as the title implies, Wolfram emphatically does not believe that he is using cellular automata to popularize known ideas. In the introduction, he describes his finding that one-dimensional cellular automata can produce complex behavior as one of the “more important single discoveries in the whole history of theoretical science” (p. 2). He refers in the preface to “a crack in the very foundations of existing science,” “new ideas and new methods that ultimately depend very little on what has gone before,” and “a vast array of applications – both conceptual and practical – that can now be developed.” Comments of this character pervade the book.
Significantly, there is no bibliography. Instead there are 349 pages of endnotes, which summarize the history, from antiquity to the present, of each subject that Wolfram addresses. The notes are fascinating; in many respects they constitute a better book than the main text. However, in both the main text and in the notes, Wolfram generally brings up prior work only to dismiss it as misguided, or at best as irrelevant to his concerns. For example, after relating his ‘discovery’ that there is apparent complexity in the distribution of primes, Wolfram acknowledges that “the first few hundred primes were no doubt known even in antiquity, and it must have been evident that there was at least some complexity in their distribution” (p. 134).
However [he continues], without the whole intellectual structure that I have developed in this book, the implications of this observation – and its potential connection, for example, to phenomena in nature – were not recognized. And even though there has been a vast amount of mathematical work done on the sequence of primes over the course of many centuries, almost without exception it has been concerned not with basic issues of complexity but instead with trying to find specific kinds of regularities (p. 134).
2. Computational Complexity
In the opening chapter we are promised that Wolfram’s new kind of science “begins to shed new light on various longstanding questions in computational complexity theory” (p. 14). On pages 758-764 we finally learn what this means. By enumerating simple Turing machines – say, all those with 4 states and 2 symbols – one can lower-bound the number of steps needed by any such machine to perform a particular task. To illustrate that simple machines can already display nontrivial behavior, Wolfram exhibits a machine (call it M) that decides a language L_{M} in time exponential in the length of its input. No machine with at most 4 states can decide L_{M} more efficiently, and Wolfram conjectures that M is ‘irreducible,’ meaning that no machine with any number of states can decide L_{M} substantially more efficiently.
It is unlikely that this enumeration approach scales even to 5-state, 2-symbol machines. For let S(n), the n^{th} ‘Busy Beaver shift’ number, be the maximum number of steps that an n-state, 2-symbol machine can make before halting, when started on a blank tape. Then as noted by Wolfram on p. 889, determining S(5) is still open; the best known lower bound, due to Marxen and Buntrock [12], is S(5)>=47,176,870.The situation is even worse if the machine can be started on a non-blank tape.
Wolfram would reply that extremely short programs are already of interest, since they can exhibit ‘irreducible complexity.’ And indeed, in Chapter 11 he gives a remarkable construction, due to his employee Matthew Cook, showing that even the Rule 110 cellular automaton is a universal computer. A corollary is that there exists a 2-state, 5-symbol universal Turing machine. However, such constructions inevitably rely on a complicated input encoding. And given a problem of interest, such as matrix multiplication, there seems to be nothing to suggest that an extremely short program exists to solve it using a standard encoding (note 1). So by restricting to extremely short programs, in effect we trade an infeasible search among programs for an infeasible search among input encoding schemes.
Might it be worthwhile, though, to examine tradeoffs between running time and program size? If we do this, then again we must not allow preprocessing of the input, since otherwise a constant-size program can always achieve within a polynomial factor of the optimal running time. We therefore obtain complexity measures that are not ‘robust’ (in the traditional sense) under changes to the input representation. Such measures might nevertheless have some interest for theoretical computer science.
Elsewhere in the book, there are a few errors and oversights regarding complexity. Wolfram says that minimizing a DNF expression (p. 1096) and computing a permanent (p. 1146) are NP-complete; they are respectively Σ_{2}P-complete (as shown by Umans [16]) and #P-complete (as shown by Valiant [17]). Also, in Chapter 10, pseudorandom number generators based on cellular automata are proposed. Wolfram suggests that, since certain questions involving cellular automata are NP-complete, these generators might be a good basis for cryptography (note 2):
To date [no] system has been devised whose cryptanalysis is known to be NP-complete. Indeed, essentially the only problem on which cryptography systems have so far successfully been based is factoring of integers. [And] while this problem has resisted a fair number of attempts at solution, it is not known to be NP-complete (and indeed its ability to be solved in polynomial time on a formal quantum computer may suggest that it is not) (p. 1089-1090).
There are two problems here. First, all cryptanalysis problems are in NP intersect coNP, and thus cannot be NP-complete unless NP=coNP. Second, there is no shortage of candidate pseudorandom generators (or equivalently, candidate one-way functions; Håstad et al. [9] showed that either can be obtained from the other). Attention has focused on factoring – and on other ‘structured’ problems, involving elliptic curves, error-correcting codes, lattices, and so on – because of the need for trapdoor one-way functions in public-key cryptography.
2.1. The Principle of ComputationalEquivalence
The final chapter proposes a ‘Principle of Computational Equivalence’: that almost all systems that are not ‘obviously simple’ are in some sense equivalent to a universal Turing machine. Wolfram [19] emphasizes that this principle goes beyond the Church-Turing thesis in two ways: it asserts, first, that universality is pervasive in nature; and second, that universality arises in any sufficiently complex system, without needing to be ‘engineered in.’ We confess that the principle still seems to us an expression of the conventional wisdom in theoretical computer science. However, we will not debate this question in general terms. Instead we will consider a specific implication that Wolfram offers for computational complexity:
In studying the phenomenon of NP completeness what has mostly been done in the past is to try to construct particular instances of rather general problems that exhibit equivalence to other problems. But almost always what is actually constructed is quite complicated – and certainly not something one would expect to occur at all often. Yet on the basis of intuition from the Principle of Computational Equivalence I strongly suspect that in most cases there are already quite simple instances of general NP-complete problems that are just as difficult as any NP-complete problem. And so, for example, I suspect that it does not take a cellular automaton nearly as complicated as [one with 19 colors given previously] for it to be an NP-complete problem to determine whether initial conditions exist that lead to particular behavior. (p. 769)
In computer science, the complexity of ‘typical’ instances of NP-complete problems has been investigated for decades. Highlights include Levin’s theory of average-case completeness [10] and studies of phase transitions in randomly generated combinatorial problems [5]. It remains open to show that some NP-complete problem is hard on average, under a simple distribution, so long as P!=NP. However, ‘worst-case/average-case equivalence’ has been shown for several cryptographic problems, including one studied by Ajtai and Dwork [1].
As for the cellular automaton conjecture, its validity depends on how it is formulated. Suppose we are given a one-dimensional, two-color cellular automaton on a lattice of bounded size n, and an ending condition E in {0,1}^{n}. Then we can decide in polynomial time whether there exists an initial condition that evolves to E in one or more steps, by using dynamic programming. Extending this technique, we can decide whether there exists an initial condition that evolves to E in exactly t steps, where t=O(log n), by computing a list of all possible initial configurations for each contiguous block of t cells.
Indeed, for any fixed polynomial-time predicate Φ, let Init_{110}(Φ) be the following problem. We are given an ending condition E with n cells, and asked to decide whether there exists an initial condition I such that (i) Φ(I) holds, and (ii) the Rule 110 cellular automaton evolves I to E in exactly t steps. Here t is a fixed polynomial in n. Then:
Proposition 1. For all Φ and polynomials p, there is a polynomial-time algorithm that solves a 1-1/p(n) fraction of Init_{110}(Φ) instances.
Proof. Consider a directed graph with 2^{n} vertices, one for each configuration, and edges corresponding to the action of Rule 110. Each vertex has outdegree 1, so the number of paths of length t is 2^{n}. Thus, if E is chosen uniformly at random, then the expected number of length-t paths ending at E is 1, so this number is at most p(n) with probability at least 1-1/p(n). In this case we can trace t steps backward from E in time O(ntp(n)), maintaining a list of all possible predecessors, and then evaluate Φ on each.
Nevertheless, since Rule 110 is universal, the following intuition suggests that Init_{110}(Φ) should be NP-complete for some Φ. Given a Boolean formula Ψ and proposed solution X, we could create an initial condition I(Ψ,X) corresponding to a Turing machine that checks whether X satisfies Ψ; and if it does, erases X, preserves Ψ, and goes into an ‘accept’ state A(Ψ). Then by having Φ verify that the initial condition is of legal form, we could reduce the problem of whether Ψ is satisfiable to that of whether there exists a legal initial condition that evolves to E=A(Ψ).
This intuition fails for an interesting reason. Cook’s proof that Rule 110 is universal relies on simulating ‘cyclic tag systems,’ a variant of the tag systems studied in the 1960′s by Cocke and Minsky [4] among others (see also p. 670 of Wolfram). However, though Wolfram does not discuss this explicitly in the book, the known simulations of Turing machines by tag systems require exponential slowdown. To prove thatInit_{110}(Φ) is NP-complete, what is needed is to show that Rule 110 allows efficient simulation of Turing machines.
In summary, there are many fascinating complexity questions about one-dimensional cellular automata, as well as about typical instances of NP-complete problems. But it is unclear why the Principle of Computational Equivalence should yield more insight into these questions than the standard techniques of computational complexity.
3. Fundamental Physics
The most interesting chapter of A New Kind of Science is the ninth, on ‘Fundamental Physics.’ Here Wolfram confronts general relativity and quantum mechanics, arguably the two most serious challenges to a view of nature based on deterministic cellular automata. He conjectures that spacetime is discrete at the Planck scale, of about 10^{-33} centimeters or 10^{-43} seconds. This conjecture is not new, and has received considerable attention recently in connection with the holographic principle [3] from black hole thermodynamics, which Wolfram does not discuss (note 3). But are new ideas offered to substantiate the conjecture?
For Wolfram, spacetime is a causal network, in which events are vertices and edges specify the dependence relations between events. Pages 486-496 and 508-515 discuss in detail how to generate such a network from a simple set of rules. In particular, we could start with a finite undirected ‘space graph’ G, assumed to be regular with degree 3 (since higher-degree vertices can be replaced by cycles of degree-3 vertices). We then posit a set of update rules, each of which replaces a subgraph by another subgraph with the same number of outgoing edges. The new subgraph must preserve any symmetries of the old one. Then each event in the causal network corresponds to an application of an update rule. If updating event B becomes possible as a result of event A, then we draw an edge from A to B. As pointed out to us by Rowland [13], it is important to keep in mind the distinction between the causal network and the space graph G.
Properties of space are defined in terms of G. For example, if the number of vertices in G at distance at most n from any given vertex grows as n^{D}, then space can be said to have dimension D. (As for formalizing this definition, Wolfram says only that there are “some subtleties. For example, to find a definite volume growth rate one does still need to take some kind of limit – and one needs to avoid sampling too many or too few” vertices (p. 1030).) Similarly, Wolfram argues that the curvature information needed for general relativity, in particular the Ricci tensor (note 4), can be read from the connectivity pattern of G. Interestingly, to make the model as simple as possible, Wolfram does not associate a bit to each vertex of G, representing (say) the presence or absence of a particle. Instead particles are localized structures, or ‘tangles,’ in G.
An immediate problem is that one might obtain many nonequivalent causal networks, depending on the order in which update rules are applied to G. Wolfram calls a set of rules that allows such nondeterministic evolution a ‘multiway system.’ He recognizes, but rejects, a possible connection to quantum mechanics:
The notion of ‘many-figured time’ has been discussed since the 1950s in the context of the many-worlds interpretation of quantum mechanics. There are some similarities to the multiway systems that I consider here. But an important difference is that while in the many-worlds approach, branchings are associated with possible observation or measurement events, what I suggest here is that they could be an intrinsic feature of even the very lowest-level rules for the universe (p. 1035-6).
It is unclear exactly what distinction is being drawn: is there any physical event that is not associated with a possible observation or measurement? In any case, Wolfram opts instead for rule sets that are ‘causal invariant’: that is, that yield the same causal network regardless of the order in which rules are applied. As noted by Wolfram, a sufficient (though not necessary) condition for causal invariance is that no ‘replaceable’ subgraph overlaps itself or any other replaceable subgraph.
Wolfram points out an immediate analogy to special relativity, wherein observers do not in general agree on the order in which spacelike separated events occur, yet agree on any final outcome of the events. He is vague, though, about how (say) the Lorentz transformations might be derived in a causal network model:
There are many subtleties here, and indeed to explain the details of what is going on will no doubt require quite a few new and rather abstract concepts. But the general picture that I believe will emerge is that when particles move faster they will appear to have more nodes associated with them (p. 529).
Wolfram is “certainly aware that many physicists will want to know more details,” he says in the endnotes, about how a discrete model of the sort he proposes can reproduce known features of physics. But, although he chose to omit technical formalism from the presentation, “[g]iven my own personal background in theoretical physics it will come as no surprise that I have often used such formalism in the process of working out what I describe in these sections” (p. 1043).
3.1. Quantum Mechanics
Physicists’ hunger for details will likely grow further when they read the section on ‘Quantum Phenomena’ (p. 537-545). Here Wolfram’s proposals are even more unorthodox than he seems to acknowledge. He believes that quantum mechanics is only an approximation to an underlying classical (and most likely deterministic) theory. He is not advocating a hidden-variable approach such as Bohmian mechanics, in which the state vector is supplemented by an ‘actual’ eigenstate of a particular observable. Instead he thinks that, at the lowest level, the state vector is not needed at all; it is merely a useful construct for describing some (though presumably not all) higher-level phenomena. Indeterminacy arises because of one’s inability to know the exact state of a system:
[I]f one knew all of the underlying details of the network that makes up our universe, it should always be possible to work out the result of any measurement. I strongly believe that the initial conditions for the universe were quite simple. But like many of the processes we have seen in this book, the evolution of the universe no doubt intrinsically generates apparent randomness. And the result is that most aspects of the network that represents the current state of our universe will seem essentially random (p. 543).
Similarly, Wolfram explains as follows why an electron has wave properties: “…a network which represents our whole universe must also include us as observers. And this means that there is no way that we can look at the network from the outside and see the electron as a definite object” (p. 538). An obvious question then is how Wolfram accounts for the possibility of quantum computing, assuming BPP!=BQP. He gives an answer in the final chapter:
Indeed within the usual formalism [of quantum mechanics] one can construct quantum computers that may be able to solve at least a few specific problems exponentially faster than ordinary Turing machines. But particularly after my discoveries in Chapter 9 ['Fundamental Physics'], I strongly suspect that even if this is formally the case, it will still not turn out to be a true representation of ultimate physical reality, but will instead just be found to reflect various idealizations made in the models used so far (p. 771).
In the endnotes, though, where he explains quantum computing in more detail, Wolfram seems to hedge about which idealizations he has in mind:
It does appear that only modest precision is needed for the initial amplitudes. And it seems that perturbations from the environment can be overcome using versions of error-correcting codes. But it remains unclear just what might be needed actually to perform for example the final measurements required (p. 1148).
One might respond that, with or without quantum computing, Wolfram’s proposals can be ruled out on the simpler ground that they disallow Bell inequality violations. Wolfram, however, puts forward an imaginative hypothesis to account for bipartite entanglement. When two particles (or ‘tangles’ in the graph G) collide, long-range ‘threads’ may form between them, which remain in place even if the particles are later separated:
The picture that emerges is then of a background containing a very large number of connections that maintain an approximation to three-dimensional space, together with a few threads that in effect go outside of that space to make direct connections between particles (p. 544).
The threads can produce Bell correlations, but are somehow too small (i.e. contain too few edges) to transmit information in a way that violates causality. This is reminiscent of, for example, the ‘multisimultaneity’ model studied experimentally by Stefanov et al. [15].
There are several objections one could raise against this thread hypothesis. What we will show in Section 3.2 is that, if one accepts two of Wolfram’s own desiderata – determinism and causal invariance – then the hypothesis fails. For now, though, we remark that Wolfram says little about what, to us, is a more natural possibility than the thread hypothesis. This is an explicitly quantum cellular automaton or causal network, with a unitary transition rule. The reason seems to be that he does not want continuity anywhere in a model, not even in probabilities or amplitudes. In the notes, he describes an experiment with a quantum cellular automaton as follows:
One might hope to be able to get an ordinary cellular automaton with a limited set of possible values by choosing a suitable [phase rotation] θ [θ=π/4 and θ=π/3 are given as examples in an illustration]. But in fact in non-trivial cases most of the cells generated at each step end up having distinct values (p. 1060).
This observation is unsurprising, given results in quantum computing to the effect that almost any nontrivial gate set is universal (that is, can approximate any unitary matrix to any desired precision, or any orthogonal matrix in case one is limited to reals). Indeed, Shi [14] has shown that a Toffoli gate (note 5) plus any gate that does not preserve the computational basis, or a controlled-NOT gate plus any gate whose square does not preserve the computational basis, are both universal gate sets. In any case, Wolfram does not address the fact that continuity in amplitudes seems more ‘benign’ than continuity in measurable quantities: the former, unlike the latter, does not enable an infinite amount of computation to be performed in a finite time. Also, as observed by Bernstein and Vazirani [2], the linearity of quantum mechanics implies that small errors in amplitudes cannot be magnified during a quantum computation.
3.2. Bell’s Theorem and Causal Invariance
Let R be a set of graph updating rules, which might be probabilistic. Then consider the following four assertions (which, though not mathematically precise, will be clarified by subsequent discussion).
- R satisfies causal invariance. That is, given any initial graph (and choice of randomness if R is probabilistic), R yields a unique causal network.
- R satisfies the relativity postulate. That is, assuming the causal network approximates a flat Minkowski spacetime at a large enough scale, there are no preferred inertial frames.
- R permits Bell inequality violation.
- Any updating rule in R is always considered to act on a fixed graph, not on a distribution or superposition over graphs. This is true even if parts of the initial graph are chosen at random, and even if R is probabilistic.
Our goal is to show that, for any R, at least one of these assertions is false. Current physical theory would suggest that (1)-(3) are true and that (4) is false. Wolfram, if we understand him correctly, starts with (4) as a premise, and then introduces causal invariance to satisfy (1) and (2), and long-range threads to satisfy (3). Of course, even to state the two-party Bell inequalities requires some notion of randomness. And on pages 299-326, Wolfram discusses three mechanisms for introducing randomness into a system: randomness in initial conditions, randomness from the environment (i.e. probabilistic updating rules), and intrinsic randomness (i.e. deterministic rules that produce pseudorandom output). However, all of these mechanisms are compatible with (4), and so our argument will show that they are inadequate assuming (1)-(3). Our conclusion is that, in a model of the sort Wolfram considers, randomness must play a more fundamental role than he allows.
In a standard Bell experiment, Alice and Bob are given input bits x_{A} and x_{B} respectively, chosen uniformly and independently at random. Their goal is, without communicating, to output bits y_{A} and y_{B} respectively such that
y_{A} XOR y_{B} = x_{A} AND x_{B}.
Under any ‘local hidden variable’ theory, Alice and Bob can succeed with probability at most 3/4; the optimal strategy is for them to ignore their inputs and output (say) y_{A}=0 and y_{B}=0. However, suppose Alice has a qubit ρ_{A} and Bob a ρ_{B}, that are jointly in the Bell state (|00>+|11>)/sqrt(2). Then there is a protocol (note 6) by which they can succeed with probability (5+sqrt(2))/8, or about 0.802.
We model this situation by letting A and B, corresponding to Alice and Bob, be disjoint subgraphs of a graph G. We suppose that, at a large scale, G approximates a Euclidean space of some dimension; and that any causal network obtained by applying updates to Gapproximates a Minkowski spacetime. We can think of G as containing long-range threads from A to B, though the nature of the threads will not affect our conclusions. We encode Alice’s input x_{A} by (say) placing an edge between two specific vertices in A if and only if x_{A}=1. We encode x_{B} similarly, and also supply Alice and Bob with arbitrarily many correlated random bits. Finally we stipulate that, at the end of the protocol, there is an edge between two specific vertices in A if and only if y_{A}=1, and similarly for y_{B}. A technicality is that we need to be able to identify which vertices correspond to x_{A}, y_{A}, and so on, even as G evolves over time. We could do this by stipulating that (say) “the x_{A} vertices are the ones that are roots of complete binary trees of depth 3,” and then choosing the rule set to guarantee that, throughout the protocol, exactly two vertices have this property.
Call a variable ‘touched’ after an update has been applied to a subgraph containing any of the variable’s vertices. Also, let Z be an assignment to all random variables: that is, x_{A}, x_{B}, the correlated random bits, and the choice of randomness if R is probabilistic. Then for all Z we require the following, based on what observers in different inertial frames could perceive:
- There exists a sequence of updates under which y_{A} is output before any of Bob’s variables are touched.
- There exists another sequence under which y_{B} is output before any of Alice’s variables are touched.
Then it is easy to see that, if Bell inequality violation occurs, then causal invariance must be violated. Given Z, let y_{A}^{(1)}(Z),y_{B}^{(1)}(Z) be the values of y_{A},y_{B} that are output under rule sequence (i), and let y_{A}^{(2)}(Z),y_{B}^{(2)}(Z) be the values output under sequence (ii). Then there must exist some Z for which eithery_{A}^{(1)}(Z) != y_{A}^{(2)}(Z) or y_{B}^{(1)}(Z) != y_{B}^{(2)}(Z) – for if not, then the entire protocol could be simulated under a local hidden variable model. It follows that the outcome of the protocol can depend on the order in which updates are applied.
To obtain Bell inequality violation, something like the following seems to be needed. We can encode ‘hidden variables’ into G, representing the outcomes of the possible measurements Bob could make on ρ_{B}. (We can imagine, if we like, that the update rules are such that observing any one of these variables destroys all the others. Also, we make no assumption of contextuality.) Then, after Alice measures ρ_{A}, using the long-range threads she updates Bob’s hidden variables conditioned on her measurement outcome. Similarly, Bob updates Alice’s hidden variables conditioned on his outcome. Since at least one party must access its hidden variables for there to be Bell inequality violation, causal invariance is still violated. But a sort of probabilistic causal invariance holds, in the sense that if we marginalize out A (the ‘Alice’ part of G), then the distribution of values for each of Bob’s hidden variables is the same before and after Alice’s update. The lesson is that, if we want both causal invariance and Bell inequality violation, then we need to introduce probabilities at a fundamental level – not merely to represent Alice and Bob’s subjective uncertainty about the state of G, but even to define whether a set of rules is or is not causal invariant.
Note that we made no assumption about how the random bits were generated – i.e. whether they were ‘truly random’ or were the pseudorandom output of some updating rule. Our conclusion is also unaffected if we consider a ‘deterministic’ variant of Bell’s theorem due to Greenberger, Horne, and Zeilinger [8], discussed by Wolfram on p. 1065. There three parties, Alice, Bob, and Charlie, are given input bits x_{A}, x_{B}, and x_{C} respectively, satisfying the promise that
x_{A} + x_{B} + x_{C} = 0 (mod 2).
The goal is to output bits y_{A}, y_{B}, and y_{C} such that
y_{A} + y_{B} + y_{C} = 1 – (1-x_{A})(1-x_{B})(1-x_{C}) (mod 2).
Under a local hidden variable model, there is no protocol that succeeds on all four possible inputs; but if the parties share the GHZ state (|011>+|101>+|110>-|000>)/2, then such a protocol exists. However, although the output is correct with certainty, assuming causal invariance one cannot implement the protocol without introducing randomness into the underlying rules, precisely as for the two-party Bell inequality.
After a version of the above argument was circulated, Rowland [13] claimed that the argument fails for the following reason. We assumed that there exist two sequences of updating events, one in which Alice’s measurement precedes Bob’s and one in which Bob’s precedes Alice’s. But we neglected the possibility that a single update, call it E, is applied to a subgraph that straddles the long-range threads. The event E would encompass both Alice and Bob’s measurements, so that neither would precede the other in any sequence of updates. We could thereby obtain a rule set R satisfying assertions (1), (3), and (4).
We argue that such an R would nevertheless fail to satisfy (2). For in effect we start with a flat Minkowski spacetime, and then take two distinct events that are simultaneous in a particular inertial frame, and identify them as being the same event E. This can be visualized as ‘pinching together’ two horizontally separated points on a spacetime diagram. (Actually a whole ‘V’ of points must be pinched together, since otherwise entanglement could not have been created.) However, what happens in a different inertial frame? It would seem that E, a single event, is perceived to occur at two separate times. That by itself might be thought acceptable, but it implies that there exists a class of preferred inertial frames: those in which E is perceived to occur only once. Of course, even in a flat spacetime, one could designate as ‘preferred’ those frames in which Alice and Bob’s measurements are perceived to be simultaneous. A crucial distinction, though, is that there one only obtains a class of preferred frames after deciding which event at Alice’s location, and which at Bob’s location, should count as the ‘measurement.’ Under Rowland’s hypothesis, by contrast, once one decides what counts as the measurement at Alice’s location, the decision at Bob’s location is made automatically, because of the identification of events that would otherwise be far apart.
4. Conclusion
Steven Levy [11] opines in Wired that “probably the toughest criticism [of A New Kind of Science] will come from those who reject Wolfram’s ideas because the evidence for his contentions is based on observing systems contained inside computers.” In our opinion, however, it is preferable to judge the book on its own terms – to grant, that is, that many complex systems might indeed be fruitfully understood in terms of simple computations. The question is, what does the book tell us about such systems, beyond what was known from ‘traditional science’?
This review focused on two fields, computational complexity and fundamental physics, which Wolfram claims are transformed by the discoveries in his book. We made no attempt to assess the book’s relevance to other fields such as evolutionary biology and statistical physics.
In computational complexity, we argued that Wolfram often recapitulates existing ideas (such as pseudorandomness and the intractability of simple instances of NP-complete problems), albeit without precise definitions or proofs, and with greater claims of significance. On the other hand, some of the book’s contents – such as the explicit constructions of Turing machines – may be of interest to theoretical computer scientists.
In physics, the book proposes that spacetime be viewed in terms of causal networks arising from graph rewriting systems. The causal network models are intriguing, and in our opinion merit further mathematical study. However, their relevance to physics is difficult to evaluate without the details that Wolfram declines to supply. As for the proposal that a deterministic, relativistically invariant, causal invariant model underlies quantum mechanics, we argued that it fails – even if quantum mechanics breaks down for more than two particles, and even if, as Wolfram suggests, one allows long-range threads to connect entangled particles. Exactly what kinds of classical models could underlie quantum mechanics is a question of great importance, but Wolfram makes no serious effort to address the question.
A New Kind of Science was published by Wolfram’s own company, and was not subject to outside editing or peer review. If it were (say) a creationist tract, then this unusual route to publication would be of little consequence: for then no amount of editing would have improved it, and few scientifically literate readers would be misled by it. What is unfortunate in this case is that outside editing would probably have made a substantial difference. In an endnote, Wolfram explains that “[p]erhaps I might avoid some criticism by a greater display of modesty [in the text], but the cost would be a drastic reduction in clarity” (p. 849). However, were the book more cautious in its claims and more willing to acknowledge previous work, it would likely be easier for readers to assess what it does offer: a cellular-automaton-based perspective on existing ideas in science. Thus, we believe the book would be not only less susceptible to criticism, but also clearer.
Acknowledgments
I thank Stephen Wolfram for an enjoyable conversation about this review, David Reiss for arranging the conversation, Todd Rowland for correspondence about Section 3.2, and Dave Bacon for helpful comments. Supported by an NSF Graduate Fellowship and by DARPA grant F30602-01-2-0524.
Notes
References
[1] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence, Electronic Colloquium on Computational Complexity (ECCC) 3(65), 1996.
[2] E. Bernstein and U. Vazirani. Quantum complexity theory, SIAM Journal on Computing, 26(5):1411-1473, 1997.
[3] R. Bousso. The holographic principle, to appear in Reviews of Modern Physics, 74(3), 2002.
[4] J. Cocke and M. Minsky. Universality of tag systems with P=2, Journal of the ACM, 11(1):15-20, 1964.
[5] S. A. Cook and D. Mitchell. Finding hard instances of the satisfiability problem: a survey, DIMACS Series in Discrete Math and Theoretical Computer Science 35, pp. 1-17, 1997.
[6] G. Farmela. The universe in black and white, The Daily Telegraph, 13 January 1999.
[7] R. P. Feynman. The Character of Physical Law, MIT Press, 1998 (originally published 1965).
[8] D. M. Greenberger, M. A. Horne, and A. Zeilinger. Bell’s theorem without inequalities, in Sixty-Two Years of Uncertainty: Historical, Philosophical, and Physical Inquiries into the Foundations of Quantum Mechanics, A. Miller (ed.), Plenum, 1990.
[9] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby. A pseudorandom generator from any one-way function, SIAM Journal on Computing, 28(4):1364-1396, 1999.
[10] L. A. Levin. Average case complete problems, SIAM Journal on Computing, 15(1):285-286, 1986.
[11] S. Levy. The man who cracked the code to everything, Wired, June 2002.
[12] H. Marxen and J. Buntrock. Attacking the Busy Beaver 5, Bulletin of the EATCS 40, pp. 247-251, 1990. Also http://www.drb.insel.de/~heiner/BB.
[13] T. Rowland. Email correspondence, June 12, 2002.
[14] Y. Shi. Both Toffoli and controlled-NOT need little help to do universal quantum computation, submitted, 2002. quant-ph/0205115.
[15] A. Stefanov, H. Zbinden, N. Gisin, and A. Suarez. Quantum correlations with spacelike separated beam splitters in motion: experimental test of multisimultaneity, Physical Review Letters 88(12), 2002.
[16] C. Umans. The minimum equivalent DNF problem and shortest implicants, Proceedings of 39th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 556-563, 1998.
[17] L. G. Valiant. The complexity of computing the permanent, Theoretical Computer Science, 8(2):189-201, 1979.
[18] S. Wolfram. A New Kind of Science, Wolfram Media, 2002.
[19] S. Wolfram. Telephone conversation, June 10, 2002.