In this post I describe a metric for measuring harmonic consonance for the music AI. (Strictly speaking, the metric measures dissonance; we treat maximum consonance as zero dissonance.)

The assumption is that what we perceive as consonance and dissonance has simple mathematical roots. This is easy to believe. The waveforms of consonant and dissonant sounds differ in a consistent way, and we know that dissonant intervals have more complex frequency ratios than consonant ones. All of this suggests the possibility of mathematically quantifying consonance and dissonance. I shall give an equation which I think does this.

That said, I do not think that this measure perfectly matches what humans perceive as consonance/dissonance. I think it lines up quite well; but I imagine that there are little quirks of our psychology which cause our perceptions to be at variance with the mathematical idealization of consonance/dissonance. In particular, I think that our social conventions surrounding music affect our perceptions of consonance/dissonance.

For instance, the model I give makes the diminished fifth significantly less dissonant than I would have expected it to be relative to the other intervals. I think that this is because Western harmony is constructed such that it almost never uses the diminished fifth, and so our social conventions depart from the mathematical idealization in this case.

Dissonance always involves “beating”: the interaction of frequencies which results in an irregular pattern of vibration. We can quantify the amount of beating by measuring how long it takes the combined pattern of vibration to repeat itself, which is the least common multiple of the wavelengths of the notes involved.

So suppose that there are n notes playing at a given moment, which have wavelengths w1, w2, …, wn. Then the dissonance D is, as a first approximation:

D = LCM(w1, w2, …, wn).

In order for this formula to work, the wavelengths must be rational numbers, so we use just intonation. (The LCM of fractions in lowest terms is the LCM of the numerators divided by the GCD of the denominators.) The wavelengths across the piano keyboard are:

C0 = 1

Db0 = 15/16

D0 = 8/9

Eb0 = 5/6

E0 = 4/5

F0 = 3/4

Gb0 = 5/7

G0 = 2/3

Ab0 = 5/8

A0 = 3/5

Bb0 = 4/7

B0 = 8/15

C1 = 1/2

Db1 = 15/32

D1 = 4/9

Eb1 = 5/12

E1 = 2/5

F1 = 3/8

Gb1 = 5/14

G1 = 1/3

Ab1 = 5/16

A1 = 3/10

Bb1 = 2/7

B1 = 4/15

C2 = 1/4

etc.
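As a concrete sketch (my own illustration, not part of the original model), the first-approximation formula can be computed with Python's fractions module, using the rule above that the LCM of rationals is the LCM of the numerators over the GCD of the denominators:

```python
from fractions import Fraction
from math import gcd, lcm

def rational_lcm(*fracs):
    """LCM of positive rationals: lcm of numerators over gcd of denominators."""
    num = lcm(*(f.numerator for f in fracs))
    den = gcd(*(f.denominator for f in fracs))
    return Fraction(num, den)

# Wavelengths from the table above.
C0, Db0, Gb0, G0 = Fraction(1), Fraction(15, 16), Fraction(5, 7), Fraction(2, 3)

print(rational_lcm(C0, G0))   # perfect fifth C0-G0:  2
print(rational_lcm(C0, Gb0))  # tritone C0-Gb0:       5
print(rational_lcm(C0, Db0))  # minor second C0-Db0: 15
```

Note that the tritone scores lower (5) than the minor second (15), which is exactly the diminished-fifth quirk mentioned earlier.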

This formula has the advantage that it results in lower notes contributing more dissonance than higher notes. This aligns with practical experience; a major third between C0 and E0 is much more dissonant than a major third between C5 and E5.
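A quick numeric check of this register effect (again my own illustration):

```python
from fractions import Fraction
from math import gcd, lcm

def rational_lcm(a, b):
    # LCM of two rationals: lcm of numerators over gcd of denominators
    return Fraction(lcm(a.numerator, b.numerator), gcd(a.denominator, b.denominator))

C0, E0 = Fraction(1), Fraction(4, 5)
C5, E5 = C0 / 32, E0 / 32  # each octave halves the wavelength

print(rational_lcm(C0, E0))  # low major third:  4
print(rational_lcm(C5, E5))  # high major third: 1/8
```

The same interval contributes 32 times less dissonance five octaves up.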

The first problem with this formula is that it does not account for the fact that different notes may have different volumes. This is simple enough to fix. Let v1, v2, …, vn be the volumes of the notes. Then:

D = root(v1 * v2 * … * vn, n) * LCM(w1, w2, …, wn)

(To be clear, root(a, b) is the bth root of a, so the prefactor is the geometric mean of the volumes.)
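The volume-weighted version can be sketched like so (my own illustration; the note representation as (wavelength, volume) pairs is an assumption for the example):

```python
from fractions import Fraction
from math import gcd, lcm, prod

def rational_lcm(fracs):
    return Fraction(lcm(*(f.numerator for f in fracs)),
                    gcd(*(f.denominator for f in fracs)))

def dissonance(notes):
    """notes: list of (wavelength, volume) pairs."""
    wavelengths, volumes = zip(*notes)
    geo_mean = prod(volumes) ** (1 / len(volumes))  # root(v1 * ... * vn, n)
    return geo_mean * float(rational_lcm(wavelengths))

fifth = [(Fraction(1), 1.0), (Fraction(2, 3), 1.0)]
quiet_fifth = [(Fraction(1), 0.5), (Fraction(2, 3), 0.5)]

print(dissonance(fifth))        # 2.0
print(dissonance(quiet_fifth))  # 1.0 -- halving every volume halves D
```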

There is still a problem with this formula, which is that it assumes that the instrument is tuned in just intonation. In fact it is tuned in 12-TET. This means that intervals such as C5-G5 are more dissonant than this formula predicts, and intervals such as C#5-F#5 are less dissonant. The latter type of problem is more serious, since with this formula, the AI would think that it could not safely use the interval C#5-F#5, whereas in fact it can.
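To see the size of the mismatch (my own back-of-the-envelope check): the table implies frequency ratios of 3/2 for C-G and (7/5)/(16/15) = 21/16 for Db-Gb, while 12-TET plays seven and five equal semitones respectively:

```python
# Frequency ratios implied by the wavelength table vs. what 12-TET plays.
just_fifth = 3 / 2                 # C-G from the table (wavelength 2/3)
tet_fifth = 2 ** (7 / 12)          # seven equal semitones

table_cs_fs = (7 / 5) / (16 / 15)  # Db-Gb implied by the table: 21/16 = 1.3125
tet_cs_fs = 2 ** (5 / 12)          # five equal semitones
pure_fourth = 4 / 3

print(f"fifth: 12-TET is {100 * (tet_fifth / just_fifth - 1):+.2f}% off pure")
print(f"C#-F#: 12-TET is {100 * (tet_cs_fs / pure_fourth - 1):+.2f}% off a pure fourth")
print(f"C#-F#: 12-TET is {100 * (tet_cs_fs / table_cs_fs - 1):+.2f}% off the table's 21/16")
```

The tempered fifth is about 0.1% flat of pure, while the tempered C#-F# lands within about 0.1% of a pure 4/3 fourth but roughly 1.7% away from the 21/16 the table assigns it, which is why the formula so badly overstates its dissonance.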

And that is about where I am at. I don’t yet know how to translate this formula to work with 12-TET. It can’t be the same formula, because the least common multiple operation only works on rational numbers, and the wavelengths in 12-TET are irrational numbers. So my next task is to adapt this formula to 12-TET.