Frequency modulation (FM) is a synthesis technique based on the simple idea of periodic modulation of the signal frequency. That is, the frequency of a carrier sinusoid is modulated by a modulator sinusoid. The peak frequency deviation, also known as the depth of modulation, expresses the strength of the modulator’s effect on the carrier oscillator’s frequency. FM synthesis was invented by John Chowning (1973), and became very popular due to its ease of implementation and low computational cost, as well as its (somewhat surprisingly) powerful ability to create realistic and interesting sounds.
To begin with, let’s look at the equation for a simple frequency-controlled sine oscillator. Often, this is written
\[y(t) = A \sin(2 \pi f t)\]
where \(f\) is the frequency in Hz. However, this only works for fixed frequency and amplitude. To deal with a time-varying frequency, we must integrate the frequency function \(f\) to determine the accumulated phase at time \(t\):
\[y(t) = A(t) \sin \left( \int_{0}^{t} 2 \pi f(x) \, {\rm d}x \right)\]
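In discrete time, the integral becomes a running sum of phase increments. The following is a minimal sketch of that idea, not taken from any particular synthesis library; the function name `oscillator`, the default sample rate, and the use of Python callables for \(f\) and \(A\) are illustrative assumptions.

```python
import math

def oscillator(freq, amp, duration, sample_rate=44100):
    """Sine oscillator with time-varying frequency and amplitude.

    freq and amp are functions of time in seconds (hypothetical interface).
    The phase is a running sum of 2*pi*f(t)/sample_rate, which is a
    rectangle-rule approximation of the integral of 2*pi*f.
    """
    samples = []
    phase = 0.0
    for n in range(int(round(duration * sample_rate))):
        t = n / sample_rate
        samples.append(amp(t) * math.sin(phase))
        # Advance the accumulated phase by one sample's worth of frequency.
        phase += 2 * math.pi * freq(t) / sample_rate
    return samples

# With a constant frequency, this reduces to A * sin(2*pi*f*t).
fixed = oscillator(lambda t: 440.0, lambda t: 1.0, 0.01)
```

Passing a function such as `lambda t: 440.0 + 50.0 * math.sin(2 * math.pi * 5.0 * t)` for `freq` would produce a slow vibrato around 440 Hz.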
Frequency modulation uses a rapidly changing function
\[f(t) = C + D \sin(2 \pi M t)\]
where \(C\) is the carrier frequency, an offset that in many cases is the fundamental or “pitch”, \(D\) is the depth of modulation, which controls the amount of frequency deviation (called modulation), and \(M\) is the frequency of modulation in Hz. Plugging this into the integral form of \(y(t)\) above and simplifying gives the equation for FM:
\[y(t) = A \sin \left( 2 \pi C t + \frac{D}{M} \sin (2 \pi M t) \right)\]
Note that this equation is not exactly right. We assume that the phase of the modulation does not matter. Thus, while the integral of the \(\sin\) function is \(\cos\), we keep the \(\sin\) term because \(\sin\) is the same as \(\cos\) with a phase shift. In practice, the phase can make a subtle difference, but nearly everyone ignores it.
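For reference, the integral can be carried out exactly using only the definitions above:

\[\int_{0}^{t} 2 \pi \left( C + D \sin(2 \pi M x) \right) {\rm d}x = 2 \pi C t - \frac{D}{M} \cos(2 \pi M t) + \frac{D}{M}\]

The constant \(\frac{D}{M}\) is a fixed phase offset, and \(-\cos\) differs from \(\sin\) only by a phase shift of the modulator, so the sinusoidal term in the phase has amplitude \(\frac{D}{M}\).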
\(I = \frac{D}{M}\) is known as the index of modulation. When \(D \ne 0\), sidebands appear in the spectrum of the signal above and below the carrier frequency \(C\), at multiples of \(\pm M\). In other words, we can write the set of frequency components as \(C \pm k M\), where \(k = 0, 1, 2, \ldots\). The number of significant components increases with \(I\), the index of modulation.
According to these formulas, some frequencies will be negative. This can be interpreted as merely a phase change, \(\sin(-x) = - \sin(x)\), or perhaps not even a phase change, \(\cos(-x) = \cos(x)\). Since we tend to ignore phase, we can just ignore the sign of the frequency and consider negative frequencies to be positive. We sometimes say the negative frequencies “wrap around” (zero) to become positive. The main caveat here is that when frequencies wrap around and add to positive frequencies of the same magnitude, the components may not add in phase. Together, these effects give FM signals increasingly complex behavior as the index of modulation increases, adding more and more components, both positive and negative.
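The wrap-around of negative frequencies can be sketched in a few lines of Python. The function name `fm_components` and the example values are illustrative assumptions; the sketch only lists frequencies and says nothing about amplitudes or phases.

```python
def fm_components(carrier, modulator, max_k):
    """Return the sorted component frequencies |C + k*M| for k = -max_k..max_k.

    Taking the absolute value models negative frequencies wrapping
    around zero to become positive.
    """
    freqs = set()
    for k in range(-max_k, max_k + 1):
        freqs.add(abs(carrier + k * modulator))
    return sorted(freqs)

# C = M: wrapped components land exactly on existing positive ones.
print(fm_components(100, 100, 3))   # [0, 100, 200, 300, 400]

# C = 200, M = 150: wrapped components (100 and 250) fall between the others.
print(fm_components(200, 150, 3))   # [50, 100, 200, 250, 350, 500, 650]
```

In the first case the component at \(-100\) Hz folds onto the carrier at 100 Hz, which is the situation where the two may not add in phase.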
The human ear is very sensitive to harmonic vs. inharmonic spectra. Perceptually, harmonic spectra are very distinctive because they give a strong sense of pitch. The harmonic ratio [Truax 1977] is the ratio of the modulating frequency to the carrier frequency, such that \(H=\frac{M}{C}\). If \(H\) is a rational number, the spectrum is harmonic; if it is irrational, the spectrum is inharmonic.
If \(H=1\), the spectrum is harmonic and the carrier frequency is also the fundamental, i.e. \(F_0 = C\). To show this, remember that the frequencies will be \(C \pm k M\), where \(k = 0, 1, 2, \ldots\), but if \(H=1\), then \(M=C\), so the frequencies are \(C \pm k C\), or simply \(k C\). This is the definition of a harmonic series: multiples of some fundamental frequency \(C\).
When \(H = \frac{1}{m}\), where \(m\) is a positive integer, \(C\) instead becomes the \(m\)th component (harmonic), because the spacing between harmonics is \(M = C/m\), which is also the fundamental: \(F_0 = M = C/m\).
With \(H=2\), we will get sidebands at \(C \pm 2 k C\) (where \(k = 0, 1, 2, \ldots\)), that is, only odd multiples of \(C\). All even harmonics are omitted, which is ideal for modeling a clarinet.
If \(H\) is irrational, the negative frequencies that wrap around at 0 Hz tend to land between the positive frequency components, thus making the spectrum denser. With \(H=\frac{1}{m}\), where \(m\) is a positive irrational number, the components will cluster more and more around \(C\) as \(m\) increases (because \(M\) will decrease and so will the spacing between components), yielding sounds that have no distinct pitch and that can mimic drums and gongs.
The sidebands introduced by FM are governed by Bessel functions of the first kind and \(n\)th order, denoted \(J_n(I)\), where \(I\) is the index of modulation. The Bessel functions determine the magnitudes and signs of the frequency components in the FM spectrum. These functions look a lot like damped sine waves, as can be seen in Figure 1.
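The three rational cases above can be checked numerically. This is a minimal sketch that assumes \(C\) and \(M\) are given as whole numbers of Hz, in which case the largest frequency dividing every component \(C \pm kM\) is their greatest common divisor; the function names are hypothetical.

```python
from math import gcd

def fm_fundamental(carrier, modulator):
    """Return (F0, carrier harmonic number) for integer C and M in Hz.

    Every component C + k*M is a multiple of gcd(C, M), so that value
    is taken as the fundamental of the harmonic spectrum.
    """
    f0 = gcd(carrier, modulator)
    return f0, carrier // f0

def harmonic_numbers(carrier, modulator, max_k):
    """Harmonic numbers present for k = -max_k..max_k, after wrap-around."""
    f0, _ = fm_fundamental(carrier, modulator)
    return sorted({abs(carrier + k * modulator) // f0
                   for k in range(-max_k, max_k + 1)})

print(fm_fundamental(100, 100))        # H = 1:   (100, 1), carrier is the fundamental
print(fm_fundamental(300, 100))        # H = 1/3: (100, 3), carrier is the 3rd harmonic
print(harmonic_numbers(100, 200, 3))   # H = 2:   [1, 3, 5, 7], odd harmonics only
```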
 
A few insights into how Bessel functions help explain why FM synthesis sounds the way it does:
\(J_0(I)\) decides the amplitude of the carrier.
\(J_1(I)\) controls the first upper and lower sidebands.
Generally, \(J_n(I)\) governs the amplitudes of the \(n\)th upper and lower sidebands.
Higher-order Bessel functions start from zero more and more gradually, so higher-order sidebands only have significant energy when \(I\) is large.
The spectral bandwidth increases with \(I\); the upper and lower sidebands grow toward higher and lower frequencies, respectively.
As \(I\) increases, the energy of each sideband varies much like a damped sinusoid.
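These points can be explored numerically. The sketch below evaluates \(J_n(I)\) from its standard power series using only the Python standard library; the function name `bessel_j` and the fixed number of series terms are assumptions made for illustration, and the series is only suitable for moderate index values (roughly up to 10).

```python
import math

def bessel_j(n, x, terms=40):
    """Bessel function of the first kind, integer order n >= 0, by power series.

    J_n(x) = sum over m of (-1)^m / (m! (m+n)!) * (x/2)^(2m+n)
    """
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + n))
        total += coeff * (x / 2) ** (2 * m + n)
    return total

# Carrier and first three sideband amplitudes for a small index, I = 1:
for n in range(4):
    print(n, round(bessel_j(n, 1.0), 4))
# 0 0.7652
# 1 0.4401
# 2 0.1149
# 3 0.0196
```

At \(I = 1\) the carrier dominates and the third sideband is already weak. Evaluating `bessel_j(0, 2.4048)` gives a value very close to zero, which shows that the carrier itself can nearly vanish at certain index values.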
The index of modulation, \(I=\frac{D}{M}\), allows us to relate the depth of modulation, \(D\), the modulation frequency, \(M\), and the index of the Bessel functions. In practice, this means that if we want a spectrum that has the energy of the Bessel functions at some index \(I\), with frequency components separated by \(M\), then we must choose the depth of modulation according to the relation \(I=\frac{D}{M}\) [F. R. Moore 1990]. As a rule of thumb, the number of significant sidebands on each side of the carrier is roughly \(I + 1\). That is, if \(I = 10\) we get \(10 + 1 = 11\) sidebands above and 11 sidebands below the carrier frequency. In theory, there are infinitely many sidebands at \(C \pm k M\), where \(k = 0, 1, 2, \ldots\), whenever the modulation is non-zero, but the intensity of the sidebands falls rapidly toward zero as \(k\) increases, so this rule of thumb counts only the significant sidebands.
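The rule of thumb can be checked empirically by synthesizing one second of FM and measuring the amplitude at each sideband frequency. This is a rough sketch under stated assumptions: the sample rate, the carrier of 2000 Hz (chosen so that no significant sideband wraps around zero or exceeds half the sample rate), and the cutoff of 0.1 for "significant" are all illustrative choices, not values from the text.

```python
import math

SAMPLE_RATE = 8000           # Hz, assumed for this experiment
C, M, I = 2000, 100, 10      # carrier (Hz), modulator (Hz), index I = D / M

# One second of signal, so each C + k*M falls on an exact analysis frequency.
signal = [math.sin(2 * math.pi * C * n / SAMPLE_RATE
                   + I * math.sin(2 * math.pi * M * n / SAMPLE_RATE))
          for n in range(SAMPLE_RATE)]

def amplitude_at(freq):
    """Amplitude of the component of `signal` at freq Hz (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

THRESHOLD = 0.1              # assumed cutoff, relative to a peak amplitude of 1
upper = [amplitude_at(C + k * M) for k in range(1, 16)]
highest = max(k for k, a in enumerate(upper, start=1) if a > THRESHOLD)
print(highest)               # 11, matching I + 1 for I = 10
```

With this cutoff the highest significant sideband is the 11th, in line with \(I + 1\). Not every lower-order sideband is strong, however: some, such as the 6th at this index, are close to zero, as the oscillating shape of the Bessel functions suggests.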
In Nyquist we can use the built-in function
 fmosc(pitch, modulation, table, phase) 
for FM synthesis. The manual says of it: “Returns a sound which is table oscillated at pitch plus modulation for the duration of the sound modulation.” The table and phase parameters are optional and often omitted: the default table is a sinusoid, and the initial phase generally does not change the resulting sound.
When you create an FM instrument, keep in mind exactly how the modulation parameter given to fmosc() relates to the FM equations. The oscillator runs at pitch plus modulation, so modulation is the frequency deviation in Hz, that is, the time-varying term of \(f(t) = C + D \sin(2 \pi M t)\). Namely, \(modulation = D \sin(2 \pi M t)\).
As an example, suppose we want to produce a harmonic sound with about 10 harmonics and a fundamental of 100 Hz. We can choose \(C = M = 100\). The carrier supplies the first harmonic, so for 10 harmonics we need 9 sidebands above it, and so \(I + 1 = 9\) or \(I = 8\). \(I=\frac{D}{M}\) or \(D=I M\), so \(D = 8 \times 100 = 800\). Finally, we can write fmosc(hz-to-step(100), 800 * hzosc(100)).
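The arithmetic in this example generalizes to other fundamentals and harmonic counts. The helper below is a small sketch in Python rather than Nyquist; the function name `fm_parameters` is hypothetical, and it assumes \(C = M\) and the \(I + 1\) rule of thumb exactly as used above.

```python
def fm_parameters(fundamental, harmonics):
    """FM settings for a harmonic tone with C = M (so H = 1).

    Returns (carrier, modulator, index, depth), with frequencies and depth in Hz.
    """
    carrier = modulator = fundamental
    sidebands = harmonics - 1      # the carrier itself is the first harmonic
    index = sidebands - 1          # rule of thumb: sidebands is about I + 1
    depth = index * modulator      # D = I * M
    return carrier, modulator, index, depth

print(fm_parameters(100, 10))      # (100, 100, 8, 800)
```

The returned depth is the amplitude of the modulation signal in Hz, so 800 corresponds to the factor in 800 * hzosc(100).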
Figures 1 & 2 show examples of FM signals. The X-axes on the plots represent time, here denoted in multiples of \(\pi\).