Singularity Math Trialogue

March 28, 2001 by Ray Kurzweil, Vernor Vinge, Hans Moravec

Hans Moravec, Vernor Vinge, and Ray Kurzweil discuss the mathematics of the Singularity, making various assumptions about growth of knowledge vs. computational power.

From: Hans Moravec

Date: February 15, 1999

To: Ray Kurzweil

Subject: Foldover in the accelerating exponential

Hi Ray,

Following up a thread touched on in your visit: Vernor Vinge noted (and wrote SF stories) in the 1960s that AI, by automating the process of innovation, would increasingly accelerate its own development, shortening the time constant of its own exponential.

If the doubling rate is proportional to the magnitude of the exponential, then the curve undergoes equal multiplicative steps in exponentially decreasing time intervals, and reaches infinite slope in a finite time: if each doubling takes half as long as the one before, infinitely many doublings fit inside a convergent geometric sum of time.

Vinge calls this the “Technological Singularity,” and notes that it blocks prediction, just as a mathematical singularity prevents extrapolation. There is probably a curve beyond the singularity, but it has a different character.

Here is a recent pointer:

http://www-personal.engin.umich.edu/~jxm/singlar.html

I looked at the prescription for the accelerating exponential, and noted the following:

From: Hans Moravec

Date: September 30, 1997

To: Vernor Vinge

Subject: A kind of Singularity

Hi Vernor,

A tidbit you’ve probably noticed before:

V = exp(V*t) has a singularity (at t = 1/e, V = e) in dV(t)/dt but not in V(t) itself. Instead, V folds back, making a double-valued function that kind of suggests time travel!

You can see the curve most easily by parametrically plotting V against t = log(V)/V with the parameter V going from about 1 (or less) to 5.

Or take V only up to e to be neat.
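In Maple, for instance, the fold shows up directly:

plot([log(V)/V, V, V = 0.5 .. 5]); # parametric plot of t = log(V)/V against V

solve(diff(log(V)/V, V) = 0, V); # returns exp(1): the fold is at V = e, where t = 1/e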

Best–Hans

To: Hans Moravec

Subject: Re: A kind of Singularity

Date: October 1, 1997

From: Vernor Vinge

Hi Hans —

I hadn’t noticed that. Neat and weird: “After a while the world disappeared, but only because it had already become very much nicer”!

Hope things are all going well. (I’m back to teaching after two years at writing. It is very interesting, and as hectic as I remember!)

Regards,

–Vernor

From: Hans Moravec

Date: Monday, February 15, 1999

To: Vernor Vinge, Ray Kurzweil

Subject: Singularity equation correction

So, I reconsidered the V = exp(V*t) formula that suggested a foldback in the progress curve, and decided it didn’t make a lot of sense, in the way it jumbled time t and the rate V of computing.

More careful formulation makes for a less surprising conclusion.

Making maximal simplifying assumptions, and shifting and scaling all quantities to avoid constants:

Let W be “world knowledge”, and assume that each additive increment in W results in a multiplicative improvement in miniaturization, and thus in computer memory and speed V. So:

V = exp(W)

In the old days, assume an essentially constant number of humans worked unassisted at a steady pace to increase W at a steady rate:

dW/dt = 1

So W = t and V = exp(t)

which is a regular Moore’s law.

Now, suppose instead W is created solely by computers, and increases at a rate proportional to computer speed. Then:

dW/dt = V giving dW/exp(W) = dt

This solves to W = log(-1/t) and V = -1/t

W and V rise very slowly when t<<0, might be mistaken for exponential around t = -1, and have a glorious singularity at t = 0.
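A quick Maple check, with the initial condition W(-1) = 0 chosen so the constant vanishes:

dsolve({ diff(W(t),t) = exp(W(t)), W(-1)=0 }, W(t)); # returns W(t) = -log(-t), i.e. log(-1/t)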

From: Hans Moravec

Sent: March 10, 1999

To: Ray Kurzweil

Cc: Vernor Vinge

Subject: Response to Singularity Equations

Goody! This led to some new insights.

Ray said:

One of your assumptions is:

V = exp(W) (i.e., that computer power grows exponentially with world knowledge).

I don’t think this is a well-founded assumption. Why would additive increments in W result in a multiplicative improvement in V? I agree that V grows with W, but it makes more sense to say that: if we double world knowledge, we double our understanding of how to build a computer, and, therefore, double the power of computation per unit cost.

The assumption that V = exp(W) is surely too optimistic. I was thinking in terms of independent innovations. For instance, one might be an algorithmic discovery (like N log N sorting) that lets you get the same result with half the computation. Another might be a computer organization (like RISC) that lets you get twice the computation with the same number of gates. Another might be a circuit advance (like CMOS) that lets you get twice the gates in a given space. Others might be independent speed-increasing advances, like size-reducing copper interconnects and capacitance-reducing silicon-on-insulator channels. Each of those increments of knowledge more or less multiplies the effect of all of the others, and computation would grow exponentially in their number.

But, of course, a lot of new knowledge steps on the toes of other knowledge, by making it obsolete or diluting its effect, so the simple independent model doesn’t work in general. Also, simply searching through an increasing amount of knowledge may take increasing amounts of computation. I played with the V = exp(W) assumption to weaken it, and observed that the singularity remains if you assume processing increases more slowly, for instance V = exp(sqrt(W)) or exp(W^(1/4)). Only when V = exp(log(W)) (i.e., = W) does the progress curve subside to an exponential.
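The finite blow-up time is easy to exhibit for these weakened assumptions. Taking V = exp(sqrt(W)), for instance, the time to go from W = 1 to infinite W is the integral of dW/V, which Maple gives in closed form:

int(exp(-sqrt(W)), W = 1 .. infinity); # returns 4*exp(-1): infinite W is reached in finite time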

Actually, the singularity appears somewhere in the I-would-have-expected tame region between V = W and V = W^2 (!)

Unfortunately the transitional territory between the merely exponential V=W and the singularity-causing V=W^2 is analytically hard to deal with. I assume just before a singularity appears, you get non-computably rapid growth!

Your assumption:

N = C4^(C5*t) (the number of computing devices is growing at its own exponential rate) is pretty arbitrary. Wouldn’t it make more sense to have the number increase as some function of W? In the latter case, the number of computers could simply be factored into the V=f(W) equation (where my V means the total amount of computation in the world, equivalent to your N*V).

Suppose computing power per computer simply grows linearly with total world knowledge, but that the number of computers also grows the same way, so that the total amount of computational power in the world grows as the square of knowledge:

V = W*W

and dW/dt = V + 1 (the computers, plus the steady human contribution from before)

This solves to W = tan(t) and V = tan(t)^2, which has lots of singularities (I like the one at t = pi/2).
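Maple confirms it, with W(0) = 0 to kill the constant:

dsolve({ diff(W(t),t) = W(t)^2 + 1, W(0)=0 }, W(t)); # returns W(t) = tan(t)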

I also question the application of negative time, and in particular zero time (which provides for the singularity).

Oh, that’s just a matter of labeling the axes, with the origin chosen in that particular place to avoid unsightly constants. You could shift it anywhere, if you replace the t in the formulas by (t-t0). But I like it where it is.

If there is a singularity, it’s kind of natural to divide time into BS (the negative times before the singularity) and AS (the strange times afterwards). (I do worry a little that in some of my constant-free formulas, it is easy to find regions where W<0, though they can mostly be shifted away.)

–Hans

 


From: Hans Moravec

Date: October 18, 2003

To: Ray Kurzweil

Subject: Singularity equation, very simple

Well, that was vastly easier than expected. I was holding the wrong end of the stick last time.

Express an exponent a bit above 1 as (n+1)/n.

The equation

dx/dt = x^((n+1)/n) with initial condition x(0) = 1

can be solved in Maple

dsolve({ diff(x(t),t)=x(t)^((n+1)/n), x(0)=1 } ,x(t));

to produce the solution

x(t) = (n/(n-t))^n

This has a singularity at t=n.

We discussed exponents in the range 1 to 2. An exponent of 1 resulted in an exponential; I was surprised that merely setting it to 2 produced a singularity.

In the solution above, exponent 2 happens when n = 1, and gives a singularity at t=1.
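That case is simple enough to check directly:

dsolve({ diff(x(t),t) = x(t)^2, x(0)=1 }, x(t)); # returns x(t) = -1/(t-1), the n = 1 instance of (n/(n-t))^n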

The exponent approaches 1 as n is made large. The singularity recedes farther and farther into the future, but never actually goes away.

We reach an exponent of 1 (i.e. dx/dt = x) as n approaches infinity. In that limit (n/(n-t))^n does indeed become e^t (the limit is the classic definition of e^t), and the singularity recedes infinitely far into the future.
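Maple agrees:

limit((n/(n-t))^n, n = infinity); # returns exp(t)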

But for any exponent even slightly greater than 1 there is a singularity, just far away.

This supersedes the confusing previous discussion.

So I tried a function sub-polynomially (i.e. a smidge) faster than linear:

dx/dt = x*(1+log(x)) -> x(t) = exp(exp(t)-1)

a vastly fast double exponential, but without a singularity.

But merely square the log, or the whole log factor,

dx/dt = x*(1+log(x)^2) -> x(t) = exp(tan(t))

dx/dt = x*(1+log(x(t)))^2 -> x(t) = exp(t/(1-t))

and the singularity is back.
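Both solutions check out, for example:

dsolve({ diff(x(t),t) = x(t)*(1+log(x(t))^2), x(0)=1 }, x(t)); # returns x(t) = exp(tan(t))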

Generalizing:

dx/dt = x*(1+log(x(t)))^((n+1)/n) gives

x(t) = exp((n/(n-t))^n-1)

and again we have a singularity that recedes.
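Spot-checking the n = 2 case:

dsolve({ diff(x(t),t) = x(t)*(1+log(x(t)))^(3/2), x(0)=1 }, x(t)); # should return x(t) = exp(4/(2-t)^2 - 1), i.e. exp((2/(2-t))^2 - 1)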

10/19/03, Ray Kurzweil replied:

Thanks, Hans.

I appended your follow-up message so that the thread will be complete.

This is what I suspected: that exponents over 1 would produce a singularity.

Now the question is what IS the exponent. We can approach this either on theoretical grounds or from the data, and hopefully the two approaches will corroborate one another.

Certain linear increments to k (knowledge) cause multiplicative improvements in the performance of a technology. For example, an increment to knowledge that allows us to reduce semiconductor feature widths by a fraction, or a pipelined microprocessor architecture that allows a certain fraction of instructions to be executed in parallel, results in improving performance by a multiple.

HOWEVER, technology performance is not the same as knowledge. The next question is how an improvement in the performance of computation (and other information processing technologies such as communication bandwidths) affects knowledge. There are many instances in which it requires exponential gains in information processing to achieve linear gains in knowledge. For example, it requires exponential gains in computation to see linearly further ahead in chess moves. Indeed, with exponential gains in computation, we’ve seen linear gains in machine chess ratings. We’ve also seen linear gains in pattern recognition performance (e.g., speech recognition accuracy) from exponential gains in processor speed, memory capacity, and the size of the research databases (of speech samples to train on). There are many examples of inherently exponential problems for which exponential gains in the power of the information processing technology produce linear gains in performance and/or knowledge.

So in these cases, linear increments to k produce exponential gains in information processing, which in turn produce linear gains to k.

My sense is that it is difficult to justify on theoretical grounds having the rate of increase of knowledge be equal to the "size" of knowledge raised to a power greater than 1.

However, I do think it is feasible to justify separate components of growth, some (or one) of which have an exponent of one, and some (or one) of which have a log exponent. The analysis above points to a log exponent. I believe we can justify that the "value" of a typical network or of a database does not expand proportionally to its size, but to the log of its size. For example, the value of the Internet to me does not double if the number of users doubles. Rather, its value to me increases in a logarithmic fashion. The exponent of 1 comes from the number of users themselves. Having twice as many users means it is serving twice as many people. So the overall value of the network = n * log n (n people served times the value of the network to each person = log n). This differs from Metcalfe’s formula that the value of a network = n ^ 2.
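For example, with hypothetical round numbers, doubling a million-user network roughly doubles its n * log n value, where Metcalfe’s n ^ 2 value would quadruple:

evalf( 2e6*log(2e6) / (1e6*log(1e6)) ); # about 2.10 under n*log(n)

evalf( (2e6)^2 / (1e6)^2 ); # exactly 4 under n^2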

Having the exponent be 1 + log(x) (or rather 1 + log(k), since we are talking about "knowledge") results in double exponential growth. You described this as "vastly fast," but the speed would depend entirely on the constants, which we’ve been leaving out here for simplicity’s sake.

What does the data show? Both your graph and mine show double exponential growth. On the other hand, we don’t have enough data points to rule out the possibility of an exponent greater than 1, so the data does not definitively settle the issue.

I’m going to work on this some more, including trying to determine values for the unstated constants, fitting the available data to these formulas. If the "greater than exponential" performance of the data results from an exponent greater than 1, what is the exponent and when does it blow up to a singularity?

My belief is that we can justify a double exponential. The data shows this, and I believe that a careful consideration of the theoretical basis for the formulas will justify this as well. That, of course, is profound enough. Double exponential growth still results in the conclusion that our technology will be many orders of magnitude more powerful than our biological intelligence within a small number of decades.

It is hard to explain how we could get infinite knowledge, or infinite information processing, from a finite world. Then again, maybe the world isn’t finite. If every particle in our universe were another universe, and if every particle in those tiny universes were universes, etc., then infinite knowledge and computational capacity could be feasible.

Ray

10/19/03 Hans Moravec replied:

Both the polynomial and log polynomial curves with large n are essentially indistinguishable from the singularity-free limiting cases as long as t<<n.

Only as t nears n do they begin to dramatically differ.

If the universe is really finite in resolution and extent, then it has only a finite number of distinct states, so can never host a truly infinite process (and must eventually return endlessly to the same states).

On the other hand, if the storage space can be extended indefinitely, as with a Turing machine tape, a computation could make deductive discoveries endlessly.

But I think Vernor doesn’t suggest an actual infinity, just that when AI takes over in its own further advancement, each doubling in the rate at which AIs think will result in a doubling in the rate of the advancement (rather than today, where the constant rate of human thought almost fixes the rate of advancement). Ungoverned AI progress will resemble a singularity as far as human scope and time scale are concerned, and result in a new regime with properties that are not simply an extrapolation of what happened before the "singularity".

I think trying to time the appearance of runaway human-free (or human time-scale-free) AI may be an interesting exercise.

All the existing human-paced curves will surely become obsolete when that happens, even if there is some acceleration today from computer tools. It still takes hours, days and years for new ideas to penetrate human skulls; that changes when accelerating AI runs the show.

10/19/03, Ray Kurzweil replied:

Hans,

You write:

"Both the polynomial and log polynomial curves with large n are essentially indistinguishable from the singularity-free limiting cases as long as t<<n. Only as t nears n do they begin to dramatically differ."

RAY: Yes, so we cannot distinguish between these from the historical data. Either case is consistent with the data we can observe to date. I believe that a theoretical derivation of a formula that makes sense would support the rate of change of k = k ^ 1 + log (k).

You write:

"If the universe is really finite in resolution and extent, then it has only a finite number of distinct states, so can never host a truly infinite process (and must eventually return endlessly to the same states).

On the other hand, if the storage space can be extended indefinitely, as with a Turing machine tape, a computation could make deductive discoveries endlessly."

RAY: Agreed.

You write:

"But I think Vernor doesn’t suggest an actual infinity, just that when AI takes over in its own further advancement, each doubling in the rate at which AIs think will result in a doubling in the rate of the advancement (rather than today, where the constant rate of human thought almost fixes the rate of advancement). Ungoverned AI progress will resemble a singularity as far as human scope and time scale are concerned, and result in a new regime with properties that are not simply an extrapolation of what happened before the "singularity"."

RAY: Yes, I understand that, and I use the term "singularity" in this sense, that is, a "rupture" in the fabric of human history caused by the cycle of strong AI (nonbiological intelligence) having access to its own source code and improving it.

Our dialogue on the possibility of a true mathematical singularity is somewhat of a side issue, although an interesting one. However, I am not defining the technological "singularity" as a mathematical singularity, but (like Vinge) using the word as a metaphor for an event with an event horizon beyond which it is difficult to penetrate (analogous to a black hole).

I see this eventuality as more of a continuity with biological intelligence. In other words, I see a gradual shift from the dominance of biological intelligence to nonbiological. Our biological intelligence is already greatly enhanced by our tools. We are reverse engineering the human brain, and will thereby gain access to our own source code. These insights, combined with our traditional mode of AI experimentation that is not dependent on emulating the principles of operation of the human brain, will enable us to create strong AI as well as improve our own cognition (using noninvasive, nanobot-based, widely distributed neural implants).

You point out that any improvement of our own cognition is a very temporary situation since the nonbiological portion of our intelligence will within a brief period of time greatly outstrip the biological portion.

I agree with this, but point out further that the nonbiological intelligence we are creating is an expression of our human civilization. I don’t consider "human" to imply "biological." Our species is specifically that species that seeks to improve its own condition and to extend beyond its limitations. Note that the subtitle to my upcoming book "The Singularity is Near" is "When Humans Transcend Biology."

You wrote:

"I think trying to time the appearance of runaway human-free (or human time-scale-free) AI may be an interesting exercise."

RAY: Yes, that is a major focus of this upcoming book. I believe we will demonstrate strong AI (passing the Turing test) by 2029, but that does not immediately mean hyper-accelerated development. Passing the Turing test simply means equaling human intelligence. Human intelligence, after all, is not necessarily all that impressive. If you gathered 100 random people from your local shopping mall, you would have 100 entities that can presumably pass a Turing test, but they would not be very effective at improving human intelligence even if they did have access to their own source code.

I see these trends consolidating in the 2030s, and by the 2040s we will see changes that cannot be followed by unenhanced human intelligence. That is what I regard as the Singularity, and I place it in that decade.

You write:

"All the existing human-paced curves will surely become obsolete when that happens, even if there is some acceleration today from computer tools. It still takes hours, days and years for new ideas to penetrate human skulls&#8212that changes when accelerating AI runs the show."

RAY: No, I disagree. If you look at my curve (which corresponds closely to what you have derived) for the 2040s and beyond, you see rates of change that are so fast that they could not possibly be understood, let alone governed, by unenhanced human biological intelligence. Indeed, this is a point (mistakenly) made by some critics: that these curves cannot survive to the 2030s and 2040s because it would be impossible for humans to make changes that fast.

So the takeover of continued accelerating technological progress and exponential growth of knowledge by nonbiological intelligence is necessary for this to happen. Conversely, these curves are feasible because, and only because, of the impending dominance by nonbiological intelligence (strong AI). So it is not the case that our current curves become obsolete. Rather, the "singularity" is what enables these curves to continue.

Ray

10/19/03 Hans replied:

> a formula that makes sense would support

> the rate of change of k = k ^ 1 + log (k).

You mean dk/dt = k ^ (1 + log (k))

dsolve({ diff(x(t),t) = x(t) ^ (1+log(x(t))), x(0)=1 } ,x(t));

x(t) = exp(RootOf(erf(_Z)*Pi-2*sqrt(Pi)*t))

(RootOf means the solution for _Z of erf(_Z)*Pi-2*sqrt(Pi)*t = 0, or erf(_Z) = 2*t/sqrt(Pi).

erf has a range between, and asymptotic to, -1 and 1. Thus the solution has a singularity when 2*t/sqrt(Pi) = 1, i.e. at t = sqrt(Pi)/2.)
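Numerically:

evalf(sqrt(Pi)/2); # 0.8862269255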

Thus this formula does have a nearby singularity. Not surprising, since log(k) > 0 for k > 1, so the original formula has an exponent greater than 1.

If you really meant dk/dt = k + log(k) then, of course, the formula is essentially linear (< 2k) and the solution lies between k = exp(t) and k = exp(2*t) (the actual solution is a complicated RootOf).

10/19/03 Ray Replied:

I don’t mean dk/dt = k ^ (1 + log (k)), since (as you note) that has an exponent greater than 1 and will blow up.

What I meant was: dk/dt = k*(1+log(k)) = k + k*log(k). This has the linear portion plus a linear portion accelerated by the log of knowledge. As I noted, k * log(k) is a reasonable value function for a network of k nodes. Sorry, I meant to write * instead of ^ and had implied the parentheses.
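In Maple this reproduces the double exponential from before:

dsolve({ diff(k(t),t) = k(t)*(1+log(k(t))), k(0)=1 }, k(t)); # returns k(t) = exp(exp(t)-1)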

10/19/03 Hans replied:

Tickling the tail of the singularity, I see

dsolve({diff(x(t),t)=x(t)*(1+log(x(t)))*(1+log(1+log(x(t)))),x(0)=1},x(t));

gets

x(t) = exp(exp(exp(t)-1)-1)

raising the log log term to the (n+1)/n power brings in a singularity at t = n as before. So instead layer on another log in the next term:

Let lp(x) = 1+log(x)

dsolve({diff(x(t),t)=x(t)*lp(x(t))*lp(lp(x(t)))*lp(lp(lp(x(t)))),x(0)=1},x(t));

-> x(t) = exp(exp(exp(exp(t)-1)-1)-1)

Seems to be a pattern here: each substitution u = 1 + log(x) turns the equation into the same one with one fewer lp factor, while x = exp(u - 1) wraps the solution in one more exponential.

Makes me wonder what the limit of

x*lp(x)*lp^2(x)*lp^3(x)*lp^4(x)… (writing lp^2(x) for lp(lp(x)), and so on) might be:

a function still barely faster than linear, yet only infinitesimally away from triggering a singularity.