I am thinking of writing an artificial intelligence to compose music. Here are some design ideas.
The idea is that he starts with nothing but an algorithm, generates music, and then learns from his own experiences to become better at generating music. He follows the same process that the history of music followed: semi-random experimentation guided by experience and evolving over time.
For simplicity, he will write piano pieces. There needs to be a representation for arbitrarily sized phrases of music, along the lines of MIDI.
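A minimal sketch of such a representation, assuming a MIDI-like note model. The particular fields (pitch as a MIDI note number, onset and duration in beats, velocity 0–127) are my illustrative choices, not a settled design:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Note:
    """One note event, MIDI-style: pitch is a MIDI number (60 = C4),
    onset and duration are in beats, velocity is 0-127."""
    pitch: int
    onset: float
    duration: float
    velocity: int

@dataclass
class Phrase:
    """An arbitrarily sized phrase: an ordered list of note events."""
    notes: List[Note]

    def length(self) -> float:
        # Total span in beats: the end of the last-sounding note.
        return max((n.onset + n.duration for n in self.notes), default=0.0)
```

A phrase of any length is then just `Phrase([Note(60, 0.0, 1.0, 64), ...])`, and sub-phrases can be taken as slices of the note list.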
There will be a “corpus” of already generated phrases which were judged to be good. The corpus is referenced during the process of composition, and changed by the process of composition.
There will be “concepts,” which describe properties of phrases. For instance, “major third” could be a concept; as could “accelerando.”
Concepts will be predicates in first-order fuzzy logic, which a given phrase can satisfy or not satisfy. There will be built-in primitive concepts (e.g., “C5,” “velocity 50”), and it will be possible for the AI to think of arbitrary concepts using a logical language.
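To make the fuzzy-predicate idea concrete, here is a sketch where a concept is a function from a phrase to a truth degree in [0, 1], with Zadeh's min/max/complement connectives as the logical language. For brevity I treat a phrase here as just a list of MIDI pitch numbers; `pitch_class` is an illustrative primitive, not a committed design:

```python
from typing import Callable, List

# For illustration, a phrase is a list of MIDI pitch numbers; a concept
# maps a phrase to a fuzzy truth degree in [0, 1].
Concept = Callable[[List[int]], float]

def pitch_class(pc: int) -> Concept:
    """Primitive concept: the fraction of notes in pitch class `pc`
    (0 = C, 4 = E, ...), read as a fuzzy truth degree."""
    def degree(pitches: List[int]) -> float:
        if not pitches:
            return 0.0
        return sum(1 for p in pitches if p % 12 == pc) / len(pitches)
    return degree

# Zadeh connectives (min, max, complement) let arbitrary new concepts
# be built out of existing ones in the logical language.
def fuzzy_and(a: Concept, b: Concept) -> Concept:
    return lambda ph: min(a(ph), b(ph))

def fuzzy_or(a: Concept, b: Concept) -> Concept:
    return lambda ph: max(a(ph), b(ph))

def fuzzy_not(a: Concept) -> Concept:
    return lambda ph: 1.0 - a(ph)
```

A compound concept like "contains C and E" is then `fuzzy_and(pitch_class(0), pitch_class(4))`, and the AI's concept generator only needs to produce expressions in this little language.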
There will be a “language,” which is the set of concepts that are applied in composition. The language evolves over time.
Phrases will be judged as good or bad according to a judgment metric. The metric which I am currently thinking of has four components: consonance, novelty, richness, and unity.
Consonance is a measure of the ground-level properties of music which make it pleasant or jarring as we perceive it. The most obvious component of this is harmonic (y-axis, i.e., simultaneous pitch) consonance. I also want to define something like consonance for the rhythmic (x-axis, i.e., time) and melodic (x+y-axis, i.e., pitch over time) properties of music.
Novelty is a measure of how dissimilar the phrase is from other phrases (of similar length) which already exist in the corpus. Similarity is measured by conceptual closeness, and less similarity is better. The novelty of the sub-phrases is part of this measure.
Richness is a measure of how many concepts the phrase satisfies. More is better. The richness of the sub-phrases is part of this measure.
Unity is some measure of how well the sub-phrases fit together. I don’t know what that will look like.
So there is one fixed component (consonance), and three components (novelty, richness, and unity) which depend upon the corpus and the language, and so evolve over time.
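A sketch of how the pieces might combine, under several assumptions of mine: richness is the fraction of language concepts a phrase satisfies (ignoring sub-phrases for brevity), similarity is agreement of concept degrees, the overall score is a weighted sum, and consonance and unity are passed in as functions since their definitions are still open:

```python
def richness(phrase, language, threshold=0.5):
    """Fraction of concepts in the language the phrase satisfies.
    Each concept maps a phrase to a fuzzy degree in [0, 1]."""
    if not language:
        return 0.0
    return sum(1 for c in language if c(phrase) >= threshold) / len(language)

def novelty(phrase, corpus, language):
    """1 minus similarity to the nearest corpus phrase, where similarity
    is conceptual closeness: mean agreement of concept degrees."""
    if not corpus:
        return 1.0
    def similarity(a, b):
        if not language:
            return 0.0
        return 1.0 - sum(abs(c(a) - c(b)) for c in language) / len(language)
    return 1.0 - max(similarity(phrase, other) for other in corpus)

def judge(phrase, corpus, language, consonance, unity,
          weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of the four components. The weights are arbitrary
    placeholders; consonance and unity are supplied by the caller."""
    w_c, w_n, w_r, w_u = weights
    return (w_c * consonance(phrase) +
            w_n * novelty(phrase, corpus, language) +
            w_r * richness(phrase, language) +
            w_u * unity(phrase))
```

Note that only the weighting of consonance is fixed here; the other three terms automatically shift as the corpus and language change, which matches the point above.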
He generates phrases by some as yet unknown means, drawing on the language and the corpus, and incorporating randomness. He judges the generated phrases, and puts the good ones into the corpus.
The corpus shall have a limited size, and periodically the worst phrases will be culled from it.
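One simple way to implement a bounded, self-culling corpus is a min-heap keyed on judgment score, so the worst phrase is always cheapest to remove. This is a sketch of that bookkeeping only; the class name and interface are my own:

```python
import heapq

class Corpus:
    """Fixed-capacity store of scored phrases; culling drops the
    lowest-scoring ones. A min-heap keeps the worst phrase on top."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []       # entries: (score, counter, phrase)
        self._counter = 0     # tie-breaker so phrases are never compared

    def add(self, score, phrase):
        heapq.heappush(self._heap, (score, self._counter, phrase))
        self._counter += 1
        self.cull()

    def cull(self):
        # Pop the worst phrases until we are back within capacity.
        while len(self._heap) > self.capacity:
            heapq.heappop(self._heap)

    def phrases(self):
        return [p for _, _, p in self._heap]
```

A periodic cull under a changing metric would instead re-score everything and rebuild the heap, but the capacity logic is the same.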
New concepts will also periodically be generated, by some as yet unknown means. The simplest thing would be to make random variations on the concepts in the language and judge them — i.e., a genetic algorithm.
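The "random variations" step might look like the following, treating concepts as little expression trees over a toy set of primitives (the primitive names, depth limit, and mutation rate are all illustrative):

```python
import random

# Toy concept language: a concept is either a primitive name, or a tree
# ("and"|"or", left, right) or ("not", sub). Mutation replaces a random
# subtree with a freshly generated one -- the core move of a genetic
# algorithm over expression trees.

PRIMITIVES = ["major_third", "accelerando", "C5", "velocity_50"]

def random_concept(depth=2):
    if depth == 0 or random.random() < 0.4:
        return random.choice(PRIMITIVES)
    op = random.choice(["and", "or", "not"])
    if op == "not":
        return ("not", random_concept(depth - 1))
    return (op, random_concept(depth - 1), random_concept(depth - 1))

def mutate(concept, rate=0.3):
    """Return a variant: with probability `rate`, replace this node with
    a fresh random subtree; otherwise recurse into the children."""
    if random.random() < rate or isinstance(concept, str):
        return random_concept()
    return (concept[0],) + tuple(mutate(c, rate) for c in concept[1:])
```

Crossover (swapping subtrees between two good concepts) would be the natural companion operator, and the concept judgment metric below would play the role of the fitness function.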
Concepts also have a judgment metric. The language, like the corpus, has a limited size, and periodically has the worst concepts culled from it. The judgment metric I am thinking of right now has three components: simplicity, usefulness, and distinctness.
Simplicity is a measure of how many moving parts the concept has; fewer is better. If a concept incorporates other concepts into its definition, the complexity of those concepts does not count toward its own.
Usefulness is a measure of how many phrases in the corpus satisfy the concept, and how many other concepts use it in their definitions. More is better.
Distinctness is a measure of how different the concept is from every other concept in the language. It is measured by statistical anti-correlation with the other concepts, evaluated over the corpus.
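Here is one possible reading of that, sketched in Python: evaluate each concept's degrees over the corpus and score distinctness as one minus the largest absolute Pearson correlation with any other concept. Treating a perfectly anti-correlated concept (a mere negation of another) as non-distinct is my assumption, not something settled above:

```python
from statistics import mean, pstdev

def distinctness(concept, language, corpus):
    """1 minus the maximum |correlation| between this concept's degrees
    over the corpus and those of every other concept in the language.
    Concepts are functions from a phrase to a degree in [0, 1]."""
    xs = [concept(p) for p in corpus]
    worst = 0.0
    for other in language:
        if other is concept:
            continue
        ys = [other(p) for p in corpus]
        sx, sy = pstdev(xs), pstdev(ys)
        if sx == 0 or sy == 0:
            continue  # constant degrees: correlation undefined, skip
        mx, my = mean(xs), mean(ys)
        cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
        worst = max(worst, abs(cov / (sx * sy)))
    return 1.0 - worst
```

Under this reading, a concept that tracks another one (or its negation) everywhere in the corpus scores 0, and a concept whose degrees are statistically independent of every other concept's scores 1.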
When he runs, the process will look like this. Generate phrases; update the corpus. Generate concepts; update the language. Repeat forever.
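That outer loop can be sketched as follows, with every still-open design question (phrase generation, judgment, culling, concept generation) passed in as a function. All names, batch sizes, and the acceptance threshold are placeholders of mine:

```python
def run(generate_phrase, judge_phrase, corpus, cull_corpus,
        generate_concept, judge_concept, language, cull_language,
        iterations=1000, batch=10, accept=0.5):
    """The outer loop: generate phrases, keep the good ones, then evolve
    the language; repeat (here for a fixed number of iterations rather
    than forever). Every component is supplied by the caller."""
    for _ in range(iterations):
        # Phase 1: composition -- generate phrases, update the corpus.
        for _ in range(batch):
            phrase = generate_phrase(corpus, language)
            if judge_phrase(phrase, corpus, language) >= accept:
                corpus.append(phrase)
        cull_corpus(corpus)
        # Phase 2: concept evolution -- generate concepts, update the language.
        for _ in range(batch):
            concept = generate_concept(language)
            if judge_concept(concept, language, corpus) >= accept:
                language.append(concept)
        cull_language(language)
```

One consequence worth noticing in this shape: each phase's judgments depend on the other phase's latest state, which is exactly the co-evolution of corpus and language described above.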