# Archive for category AI

### Running Mean Estimates

Here is some statistical theory. I thought it up because it is applicable to an aspect of my AI project. I don’t know much statistics; it seems likely that I’m groping towards something that already exists. But here it is.

Suppose that we have a black box which outputs a series of numbers between zero and one. We don’t know how it works, and we want to determine the mean value of its output.

One way we could do this would be to record a series of outputs, and take their mean. This works if the black box’s behavior is regular; that is, the mean will be the same in the distant future as it is right now. But what if the trends of the black box’s output gradually change over time, so that the mean will be different in the future?

An analogous case would be the mean income of a U.S. citizen. This is a mean which changes over time. How do we deal with this sort of case?

One solution would be to re-calculate the mean periodically, taking a new sample every so often and calculating its mean. This is how we produce statistics about mean income.

But we could also compute the mean dynamically. Returning to our black box, we could come up with a scheme by which every time the black box outputs a number, it adjusts our estimate of the mean slightly. This solution is more elegant in the case of an AI which wants to dynamically monitor the trends in its own internal data structures.

In the case of the black box, how do we compute a running estimate of the mean?

Suppose we imitate the conventional way of estimating the mean. We can do this by keeping track of the number of numbers we have observed (the sample size), and a running sum of all of the numbers we have observed. Every time a new number comes down the pipe, we add one to the sample size, add the number to the running sum, and re-compute the mean as the running sum over the sample size.

The problem with this method is that, as time goes by, each number will have a progressively smaller impact on the estimate of the mean. It doesn’t capture the idea that the mean is changing over time, which demands that we put more weight on recent samples than on samples in the distant past.

Let us consider another method. Suppose we keep a running mean (m). Let p = 0.5 be its initial estimate, before we have observed any numbers. Furthermore, we have a number (w, for “weight”) which says how much impact every sample has on the running mean. w is analogous to the sample size. For instance, w = .05 is analogous to a sample size of 20 (1/.05 = 20).

Every time we observe a new number (n), we say m’ = (1 – w) * m + w * n. m’ is the new mean, and m is the old mean.

Let t be the number of numbers we have observed since we started our running estimate, and n1, n2, …, nt be those numbers. Then we can say that, at any time, m = p * (1 – w)^t + n1 * w(1 – w)^(t-1) + n2 * w(1 – w)^(t-2) + … + nt * w(1 – w)^(t-t).

Suppose that w is set to a value analogous to a statistically significant sample size. Then, if the real mean doesn’t change over time, this method of estimating the mean will always keep the estimate near the real mean; m will never change significantly.

But this method will capture fluctuations in the real mean, keeping the estimate near the real mean as the real mean changes. It will do so more in a more fine-grained fashion with larger values of w, or a more coarse-grained fashion with smaller values of w.

The biggest questions for this method are: how do we choose a value for p (the initial estimate of the mean), and how do we choose a value for w (the amount of impact each new number has on our mean estimate, which determines the coarseness with which we follow fluctuations in the mean)?

These are two very difficult questions. The first question is a basic problem of Bayesian statistics: how do we choose our prior probabilities? The second question I haven’t seen before, but it also seems fairly enigmatic.

### Further Work on Music AI

I do not intend to make any more blog posts on music AI. Further work will occur in the design document, of which the current draft can be found here:

https://antitheology.files.wordpress.com/2012/01/mary.pdf

### The Possibility, Feasibility, and Utility of Music AI

Two challenges made me abandon music AI for the past few weeks.

The first challenge was that, depending on the metaphysical nature of things, music AI might not be possible (specifically if reductionism is false). And, depending on how complicated a program needs to be to make good music, it might be possible but not feasible.

The second challenge was that music AI might be possible and feasible, but not worth the opportunity costs. Specifically, I think that what I should do is seek enlightenment, and writing music AI is not seeking enlightenment.

As far as the first challenge goes, I’ve come to the conclusion that continuing to doubt my philosophical views is only hurting me, and not helping anybody. So for now, I believe that reductionism and materialism are false.

Here is not the place to state what I believe instead; and I can’t do so adequately, because I haven’t finished clarifying my thoughts on the topic. But I do need to say what is true about the possibility of music AI under my metaphysics.

Under my metaphysics, “beauty” is not reducible to any finite mathematical formula. So writing a generalized beauty-creating computer progam is not possible.

But it is clear and obvious that some aspects of beauty can be mathematically quantified. The Baroque and Classical era defined elaborate rules which music should follow; and these rules partially select a set of beautiful pieces of music from the set of possible pieces of music.

What about the qualitative aspect of beauty? What about that feeling of beauty which feels purely non-logical?

That feeling is an illusory perspective on an objective reality. Perhaps, at least in many cases, there is an illusory logical perspective on the same objective reality. This would be the same position as reductionism, except that it does not privilege the logical perspective as being somehow more real than the qualitative perspective.

My own hypothesis is that we can define music-making rules which pick out particular subsets of the space of beautiful music. We can define rules which produce some beautiful music, but not all beautiful music. And we can continue making better and better rule sets infinitely, without ever arriving at a final rule set.

This is similar to what is true about math under Godel’s incompleteness theorem. Under Godel’s incompleteness theorem, every axiom set yields some truths, but not all truths. And we can continue constructing better and better axiom sets infinitely, without ever arriving at a final axiom set.

So if we are writing a music AI, the task we are considering is the task of defining and implementing a rule set which picks out some nice subset of the space of beautiful music.

I feel pretty confident that this is possible and feasible. The strongest evidence that it is possible and feasible is the existence of EMI. My project differs from EMI mainly in the details of the composition algorithm, and in the fact that EMI works from an existing corpus written by humans, whereas I intend to write an AI that does completely “de novo” composition.

So that leaves us with the second problem: should I do this? I think that what I should do is seek enlightenment, and if this project doesn’t contribute to that, then I shouldn’t do this project.

What I’m realizing, however, is that seeking enlightenment needs to involve having experiences in the everyday world and distilling new insights from those experiences. The everyday world is this huge, apparently chaotic flux of experience which helps us to discover things we didn’t know about. Assembling a huge quantity of those things eventually amounts to new fundamental understandings about the nature of enlightenment and the most efficient ways of seeking it.

So I’ve come to conclude that I should engage in and care about things that, in a superficial sense, have nothing to do with enlightenment. This means that working on music AI is not incompatible with the idea that what I should do is seek enlightenment.

So it seems that music AI is possible, feasible, and something I potentially should do. There are no fundamental philosophical obstacles to my working on it.

### Transformations

One of the functions the music AI needs is the ability to compose new phrases. How would it go about this?

One strategy would be completely “de novo” composition, just randomly generating a sequence of notes. However, this is very unlikely to yield beautiful music. So we will eschew this strategy.

Another strategy would be “de novo” synthesis from concepts. We take an existing phrase concept, and randomly synthesize a phrase which satisfies that concept. This strategy is likely to do reasonably well.

A third strategy would be mutation of existing phrases. We take a good phrase, and modify it slightly. There are two obvious ways we could do phrase mutation: preserving concepts, and mutating concepts. In the first case, we mutate the phrase while keeping its concepts the same. In the second case, we mutate the concepts along with the phrase. (First we mutate the concepts, then we mutate the phrase.)

Are these all of the possible mutation strategies? No. In AI, a good rule is: never say, “the only options.” There is always another way to do something. Only the closest other way may be a large inferential distance away from the ideas we already have. In that case we can say, “the only obvious options.” And I think that these phrase mutation strategies are the only obvious options.

Besides phrase mutations, we also need concept mutations — because concepts are supposed to evolve. So we can ask the general question, “how do we mutate these data structures?”

An open-ended strategy would be as follows. For a data structure we wish to mutate, we define a set of transformations: i.e., functions over the data structure. Then to mutate a data structure, we simply apply a series of transformations. When a transformation takes more than one data structure as an argument, we can call it a “combination.” An example of a phrase combination would be, “sequentially concatenate these phrases.”

We have the option to evolve the set of transformations. This process would use the same fitness metrics as concept evolution: simplicity, usefulness, distinctiveness. (“Usefulness” is how likely the transformation is to yield a good data structure, when it is applied.)

Of course, to do that, we need to define a set of transformations over transformations. So at some point, the obvious option is to have a fixed set of transformations for the more “meta” data structures.

We need transformations over all of the things that we want to evolve. The things that we want to evolve include:

* Phrases.
* Phrase concepts.
* Phrase transformations.

So we have an evolved set of phrase transformations, and fixed sets of transformations for concepts and phrase transformations.

A fixed set of transformations seems potentially limiting. But I think the only thing it affects is how quickly the AI comes up with good new ideas; it couldn’t absolutely prevent the AI from coming up with any given idea.

At least, that is true if the set of transformations is “complete.” A set of transformations over a data structure A is “complete” if, for every a and b in A, there is some sequence of transformations which turns a into b. When we have fixed sets of transformations over data structures, we will want those transformation sets to be provably complete.

### Concept Language Redux

Now it is time to take another go at the concept language.

What is a concept? A concept represents a subset of the space of possible phrases. It describes this subset in the abstract, using mathematical rules, rather than by enumerating all of its members. Concepts represent compositionally useful patterns that appear in music. They let the AI think abstractly about music.

There are at least two things you can do with a concept. You can use it to “analyze” a phrase: to say, of an existing phrase, whether or not the phrase is an example of the concept. You can use it “synthesize” a phrase: to create a new phrase which is an example of the concept.

These operations will be more complex than that in practice, because we’ll likely need to do things like take an existing phrase and modify it to satisfy an additional concept, and somehow merge the set of concepts into an aggregate structure to check a phrase against many concepts simultaneously.

Also, we will have to evolve concepts, varying them semi-randomly and measuring them against a fitness function.

Concepts are defined using a “concept language.” This language is based on the analytic/synthetic concept language which I developed previously for natural numbers. We are simply extending that language from concepts of natural numbers to concepts of phrases.

We shall reiterate the definition of a phrase. A phrase is a set of notes, which are grouped in a topological fashion into sub-phrases. The phrase as a whole, and each sub-phrase, can satisfy various concepts. A phrase has the following components:

1. N: a set of notes, each of which has pitch, velocity, position, and duration.
2. T: a topology over N.
3. For every set t in T, a set C(t) of concepts which t satisfies. (Prune members of t which satisfy no concepts.)
4. A record of N’s minimum position; its maximum position; its duration; its minimum pitch; its maximum pitch; and its pitch range.

And we shall reiterate the definition of our concept language over natural numbers. A number concept is:

* Y or N, one of the primitive concepts.
* The sequence operation, [C1, C2, …].
* The repetition operation, (C * n).

The repetition operation has a numerical argument (n) which is defined using a restricted function language, with the following syntax:

* Numerical constants: 0, 1, 2, …
* Variables: a, b, c, …
* Operators: +, -, *, /, %.
* Conditionals: if (P1) { E1 } else if (P2) { E2 } … else { Em }.

The Ei in the conditionals are written in the function language. The Pj are written in the condition language, which consists of comparisons (=, <, <=) joined by boolean operators (and, or, xor, not). The arguments of the comparisons are written in the function language.

The variables which are available depend on the surrounding context. We shall define what variables are available in number concepts when they are part of phrase concepts.

A phrase concept consists of a score concept, a topology concept, and a set of free variable definitions. The score concept says what notes the phrase contains. The topology concept says how these are grouped into sub-phrases.

The free variables are numbers whose values are defined by number concepts. These variables can be referenced in the functions of repetition operators within the score concept and the topology concept. The free variables are named f1, f2, ….

The score concept language has the following elements:

* Note concepts: note(P, V, T, D)
* Sequencing: [C1, C2, …]
* Repetition: (C * n)

A note concept specifies a note, in the abstract. P, V, T, and D are number concepts constraining the pitch, velocity, position, and duration of the note, respectively.

The sequencing and repetition operations have slightly different meanings than they do in the number concepts. The sequencing operation means, “the notes yielded by C1, C2, … are all in the score.” The repetition operation means, “the score contains the notes yielded by repeating C n times.”

In the score, each note has a number, such that they are in a sequence counting up from zero. The notes are computed in order, from first to last. The functions in a given note concept have access to the following variables:

* f1, f2, …: The free variables.
* r1, r2, …: The index numbers in each of the repetitions that this function is inside. r1 is the innermost repetition, r2 is the one outside that, etc.^
* i: The index number of the note that this function is part of.
* p1, p2, …, p{i-1}: The pitches of all preceding notes.
* v1, v2, …, v{i-1}: The velocities of all preceding notes.
* t1, t2, …, t{i-1}: The positions of all preceding notes.
* d1, d2, …, d{i-1}: The durations of all preceding notes.

^ So for instance, the expression ([N, (Y * r1)] * 4) yields NNYNYYNYYY.

The functions in the score concept, but not in a note concept, have access to r1, r2, …, and f1, f2, ….

The topology concept language has the following elements:

* Sub-phrase concepts: sub(D, C1, C2, …)
* Sequencing: [C1, C2, …]
* Repetition: (C * n)

A sub-phrase concept specifies a subset of the notes (D), and a set of concepts (C1, C2, …) which that subset satisfies. D is a number concept which is supposed to range over [0, n); it is the set of numbers which satisfy that concept. C1, C2, … are phrase concepts.

Sequencing and repetition have the same meaning that they have in score concepts.

The variables to which functions in the topology concept have access are r1, r2, …, f1, f2, …, and n, the total number of notes in the score.

So there it is. It’s rather complex, but I think it’s complex in the right way, and that’s what counts.

### Concept Language: Multiple Variables

Phrases have many variables, and the values of these variables need to be able to have mutually interacting constraints. As a stepping stone towards this, let us define concepts over not only single natural numbers, but pairs, triples, etc., of natural numbers. These concepts will define sub-sets of N, N x N, N x N x N, etc.

An appropriate visualization is an n-dimensional figure demarcated into blocks, with some of these blocks being white (satisfies the concept) and others being black (does not satisfy the concept).

So let us consider a concept over variables x, y, z, …

As before, the two primitive concepts are Y and N.

Then for each of the variables we have a sequencing operator: x[C1, C2, …], y[C1, C2, …], z[C1, C2, …], … To make more sense of this, let us consider a two-dimensional concept with variables x and y. Then x[C1, C2, …] means “horizontal sequence,” and y[C1, C2, …] means “vertical sequence.” By combining horizontal and vertical sequences, we can make arbitrary two-dimensional patterns.

Then for each of the variables we have a repetition operator: x(C * n), y(C * n), z(C * n), … Only now n does not have to be a constant; it can be a complex expression referring to any variables other than the variable which the repetition operator is over. We will say that this complex expression is a function which can use the following syntax:

* Constants and variables.
* Operators: +, -, *, /, %.
* Conditionals: if (C1) { E1 } else if (C2) { E2 } … else { En }

The conditionals are composed of comparisons (=, <, <=) joined by boolean operators (and, or, xor, not).

If it’s less confusing, we can also factor out the determinants of the three variables. Then the concept has separate expressions for each variable. Those expressions consist of Y/N, sequences, and repetitions whose coefficients are functions over the variables.

Performing analysis on these concepts is a straightforward extension of the method given for one-dimensional concepts.

Performing synthesis to determine a value for one variable is straightforward if we already have fixed values for all of the other variables. If we don’t have such fixed values, then things are trickier.

One solution is to restrict the variables so that they are computed in a specific order. So if we have variables a, b, c, d, in computing a we can refer to b, c, d, in computing b to c, d, in computing c d to d, in computing d to nothing. Then performing synthesis is straightforward. This restriction is probably acceptable.

### Concept Language: Analysis and Synthesis

In pondering the concept language, I realized that I need to be able to do two things with concepts:

1. I need to be able to say, of a phrase, whether or not it matches the concept. (Call this “analysis.”)
2. I need to be able, given a concept, to generate a random phrase which matches the concept. (Call this “synthesis.”)

Analysis is easy to do if concepts are written as logical predicates. On the other hand, then synthesis becomes very difficult to do; in general, generating an example of something matching a logical predicate is computationally intractable.

Synthesis is easy to do if concepts are written as functions (where the range is the set of phrases matching the concept, and each phrase is given an index). On the other hand, then analysis becomes very difficult to do; in general, saying whether or not something is part of the range of a function is computationally intractable.

So what I require is a different concept language, which is neither like predicates nor like functions, in which both analysis and synthesis are computationally practical. In order to penetrate this problem, I’ve decided to begin by focusing on concepts over natural numbers, before generalizing the task to concepts over phrases.

Let us clarify the task. Both functions and predicates denote sets. The set, for a predicate, is the set of things which satisfy the predicate. The set, for a function, is the set of things which the function can yield. We need another way of denoting sets, for which analysis and synthesis are computationally practical. In this case, the sets are subsets of the natural numbers.

The language doesn’t have to be totally general; in fact, this would be a disadvantage. A totally general language would have wacky computational properties (non-halting, etc.), and would probably make various things more difficult. So we need to have the language at the right level of power — neither too powerful nor not powerful enough.

Here is a first attempt at a language with the properties we need. I’m sure that it’s not too powerful; I’m not sure that it’s powerful enough.

The primitive concepts are “Y” and “N.” They mean “this number is in the set,” and “this number is not in the set,” respectively.

There are two operations for constructing new concepts out of existing concepts. The first is “sequence,” which takes a series of concepts and strings them together in sequence on the number line. We will write it [a, b, c, …].

So for instance, [Y, N, Y, Y] means, “0 is in the set; 1 is not in the set; 2 is in the set; 3 is in the set.” The concept is undefined for numbers higher than 3. (Each concept is over a limited range of numbers.)

The second operation is “repetition,” which takes a concept and a numerical constant. It takes a concept and repeats it on the number line. We will write it as (a * n).

So for instance, ([N, N, Y, Y] * 3) looks like this:

0-N 1-N 2-Y 3-Y 4-N 5-N 6-Y 7-Y 9-N 10-N 11-Y 12-Y

In summary, the concept language is:

* Y and N, the primitive concepts.
* The sequence operation, [a, b, c, …].
* The repetition operation, (a * n).

An “empty” concept is N, a sequence whose members are all empty, a repetition whose operand is empty, or a repetition where the number of repetitions is zero. Empty concepts can be used in building concepts, but an empty concept is not itself a concept. We cannot perform analysis and synthesis on it. (In principle we could perform analysis, but say that nothing satisfies the concept. But it doesn’t even make sense to ask to perform synthesis on it.)

With this language it is computationally easy both to determine membership (analysis) and to randomly generate an example (synthesis). Let us consider how to do both of these.

First we need to be able to determine the “length” of a concept: that is, the amount of number line that it covers. For Y and N, this is 1. For sequence, this is the sum of the lengths of the members. For repetition, this is the length of the operand times the number of repetitions.

Let us consider how to generate an example (synthesis). If the concept is a repetition, generate an example of the operand and add it to the length of the operand multiplied by a random number in [0, n). If the concept is a sequence, randomly choose one of the non-empty members, generate an example of it, and add to it the lengths of all of the members preceding this member. If the concept is Y, the example is 0. (We will never be asked to generate an example of N, and this is fortunate because there is no example of N.)

Now let us consider how to determine membership of a number n (analysis).

If the concept is a sequence, subtract the lengths of the members from n until the length of a member is greater than n. n is “inside” that member. Determine membership in that member. If we run out of members, then n is out of range.

If the concept is a repetition, assert that n integer-divided by the length of the operand is less than the number of repetitions; otherwise n is out of range. If n is in range, determine membership in the operand of n modulo the operand length.

If the concept is Y, then n is a member. If the concept is N, then n is not a member.

So this is an example of the sort of thing that we need. A great many subsets of the natural numbers can be expressed in this language; but things like the set of prime numbers, or the powers of two, can’t be expressed. Is this in the right neighborhood of power level? Hard for me to say; but I lean towards yes. The next task is to invent a language like this for phrases.

### Melodic Dissonance

Now I will take a shot at defining melodic dissonance.

The general idea is that we want notes to be in coherent lines. Big jumps are dissonant; small jumps (e.g. whole tones and semitones) are consonant. So dissonance increases with interval; but after a while, the notes start sounding like they are part of separate lines, and the dissonance starts decreasing with interval.

Melodic dissonance is measured between each non-simultaneous note pair of the phrase; so we measure it separately for each combination of two non-simultaneous notes in the phrase, and then take the sum of that. For efficiency we may only measure notes that are sufficiently close to each other. The metric should be defined such that melodic dissonance decreases with the distance between note beginnings. For now we give the general definition which measures the melodic dissonance between two arbitrary notes.

First let us ignore the distance, and pay attention only to the interval. The dissonance is lowest with notes that are very close (a semitone or two). Then it increases as the notes increase in distance, then it decreases as the distance increases further (because the notes are perceived as being part of separate melodic lines). We will say that the melodic dissonance is greatest at a distance of half an octave (a diminished fifth, or six semitones).

Let i be the interval between the two notes. Let p = 6 be the point of peak dissonance. Then the melodic dissonance is:

D1 = k1^i if i

p

k1 is a constant calibrating the speed of dissonance increase before the peak. k2 is a constant calibrating the speed of dissonance decrease after the peak. Reasonable values would be:

k1 = 1.5
k2 = 1.2

This would yield the following dissonance values:

i D1
0 1
1 1.5
2 2.3
3 3.4
4 5
5 7.6
6 11.4
7 7.9
8 6.6
9 6.6
10 5.5
11 4.6
12 3.8

So that is two notes, without taking temporal distance into account. Now let us add temporal distance. Dissonance decreases with distance. So if the temporal distance (in ticks) between the beginnings of two notes is d, then:

D2 = D1 / (k3^d).

k3 is a constant calibrating the speed of dissonance decrease with temporal distance. Reasonable values would be slightly larger than 1.

To calculate the melodic dissonance of the whole phrase, we calculate D2 for every combination of non-simultaneous notes, take the average, and multiply by the length of the phrase. This gives us a result similar to the result we got with harmonic dissonance, where adding complexity to the phrase does not necessarily make it more dissonant, but the dissonance is still proportional to the phrase length.

Alternatively, maybe longer phrases should not be more dissonant. Then the melodic dissonance is the average of all of the D2s, and the harmonic dissonance is not the integral of the instantaneous harmonic dissonances, but the integral over the phrase length.

### Phrases

Previously I said that a phrase is a note or a set of sub-phrases. This gives phrases a hierarchical tree structure. A sub-phrase is a unit which can match a concept. The only problem with this is that it seems restrictive for the AI to be only able to think in trees. We might want to be able to divide the phrase in less regular ways. So the phrase shouldn’t be a tree; it should be a topology.

Let us then take another shot at phrases. A phrase has the following components:

1. N: a set of notes, each of which has pitch, velocity, position, and duration.
2. T: a topology over N.
3. For every set t in T, a set C(t) of concepts which t satisfies. (Prune members of t which satisfy no concepts.)
4. A record of N’s minimum position; its maximum position; its duration; its minimum pitch; its maximum pitch; and its pitch range.

The primitive operations which can be performed on this phrase structure are rather complex. This will probably add expressiveness to the system.

### Beauty and AI

I wrote my previous two posts in part to prepare for a discussion of the utility function of a music AI.

A music AI needs to have a utility function which optimizes for beautiful music. So we need a computable function which says what music is beautiful.

The first problem with this is that beauty feels inherently mysterious. When I introspect on “beauty,” it feels like this magical, irreducible thing which couldn’t possibly be explained in logic. This is related to the problem of qualia; since beauty is, after all, a quale.

If materialism is true, then that intuition is wrong, and beauty is not inherently mysterious. If materialism is false, then that intuition may as well be correct. Let us for the moment assume materialism, and see where it gets us.

There is some circuitry in the brain which corresponds to our perception of beauty. There is some analogy to the beauty function in the human brain.

But it is a truism of truisms that “beauty is in the eye of the beholder.” There is no fact of the matter about how beautiful something is. So there cannot be a single correct beauty function. This makes sense. Natural selection created the beauty function, so it is probably slightly different in everybody.

Also, it is probably affected by culture as well as genes. There could be an underlying circuitry for beauty perception, relatively constant between people, which needs to be “fed” by cultural data to develop fully. To make an analogy, there is relatively constant circuitry which says what is grammatical language, but what the circuitry labels as grammatical depends on what languages you know.

In any case, we are interested in writing a function which does something like the brain’s beauty circuitry. There cannot be a single correct beauty function, because the beauty circuitry varies between people. Furthermore, we probably cannot untangle all of the nuances of the brain’s beauty circuitry; so we can only write an approximation which captures some of the most important and salient features.

Now suppose that materialism is false, and beauty is inherently mysterious and irreducible. We can still approximate musical beauty in a function, because beautiful music follows patterns, and those patterns are correlated with how beautiful the music is. But the limitations preventing us from writing a perfect beauty function are more fundamental. We could speculate that in this case there are an infinite series of progressively better beauty functions, which approach beauty without ever reaching it.