Let us consider an AI programmed to find new mathematical truths. Its utility function — call it T — says, of an arbitrary mathematical statement, whether or not it is a new mathematical truth. This involves two steps:

1. Saying whether or not the statement is true.

2. Saying whether or not the statement is new.

(2) is straightforward to implement. We simply need to maintain a database of known mathematical theorems, and check statements against this database. What about (1)?

The criterion for mathematical truth is the existence of a logically valid proof proceeding from axioms to the statement in question. So T will be based on this criterion.

But we can’t write T. This is true because of Godel’s incompleteness theorem. Godel’s incompleteness theorem implies that there is no consistent set of axioms sufficient to imply all mathematical truths. For every consistent set of axioms, there are statements that are true and not provable by those axioms.

In order to write T, we would need to specify the set of axioms that it is allowed to use. In other words, we would need to specify the set of true axioms — the set of axioms which implies all true theorems. But there is no such set.

We would not necessarily need to write out all of the axioms one by one, and give them as constants in the computer program. At a minimum, we would need to be able to write a function which could say, of an arbitrary mathematical statement, whether or not it was one of the true axioms. But we cannot do this, because there is no such thing as the set of true axioms.

We can write a utility function optimizing for “new mathematical theorems provable under axiom set Y.” For instance, perhaps Y = ZFC. But we cannot write a utility function optimizing for “new mathematical truths.” In other words, we cannot write T.

This has implications not only for mathematical truth in AI, but for truth in AI. Mathematical truth is part of truth in general. So we cannot write computer code that means “truth” without at some point writing computer code that means “mathematical truth.” And we just learned that that code is unwriteable.

This means that we cannot write an AI which includes “truth” in its goal system. It is hard to imagine writing a Friendly AI which doesn’t have truth in its goal system. For instance, how do you write “make yourself more intelligent” without ever referring to truth? Can you define “intelligence” without any reference to truth?

Suppose that we bite the bullet, and build an AI which doesn’t have truth anywhere in its goal system. Instead of “mathematical truth,” its goal system contains “provability under ZFC.” This AI would not know any mathematical facts not provable under ZFC. Could it still go superintelligent? And a related question: would it ever be able to realize that it needed to know about something called “truth?”