(published as: Roger B. Dannenberg, “Danger in
Floating-Point-to-Integer Conversion,” (letter to editor), *Computer
Music Journal*, vol. 26, no. 2, Summer 2002, p. 4.)

Most audio programs convert floating point samples to integer samples and write them to a file or an audio output device. I will assume that “correct” behavior is to round to the nearest integer value. Scale factors and overflow handling are also important issues, but there is no standard, and the best approach may depend on the application. I will limit my discussion to rounding, which is where this bug occurs.

The natural way to implement the conversion is to scale each floating point sample to some appropriate range (-32767 to 32767) and assign it to a signed 16-bit integer as follows:

    float f;      /* assume -32768 <= f <= 32767 */
    int i;        /* could also be "short int" */
    i = (int) f;  /* "(int) f" means "convert f to an int" */

The default float-to-integer conversion in C does not round to the nearest integer, but instead truncates toward zero. That means that signal values in the open interval (-1.0, 1.0) are all converted to zero (0). This interval is twice as large as the interval mapping to any other integer, and this introduces a nonlinear distortion into the signal. This is not just an issue of truncation versus rounding. It is well known that rounding to the nearest integer can be achieved by adding 0.5 and rounding down, but the following C assignment is incorrect:

    i = (int)(f + 0.5);

C does not round negative numbers down, so values in the interval (-1.5, 0.5) are converted to zero. In contrast, a correct conversion should map only the interval (-0.5, 0.5) to zero. There are several ways to perform rounding for audio, and, surprisingly, proper rounding can be faster than the default conversion in C. The direct implementation is to treat positive and negative numbers as different cases:

    float f;  /* assume -32768 <= f <= 32767 */
    int i;    /* could also be "short int" */
    if (f > 0) {
        i = (int)(f + 0.5);
    } else {
        i = (int)(f - 0.5);
    }

This code has the problem of taking a branch, which is very slow relative to arithmetic on modern processors. However, this is a good approach if you can combine the rounding with testing for peak values and clipping out-of-range values, which also treat positive and negative samples separately. An elegant approach, suggested by Phil Burk, the developer of JSyn and co-developer of PortAudio, is to offset the sample values to make them all positive, perform rounding, and then shift back. Note that I add an extra 0.5 before truncating to simulate rounding behavior:

    i = (((int) (f + 32768.5)) - 32768);

This also produces correct results. These last two algorithms essentially work around the default C conversion semantics, but unfortunately, the conversion itself is slow in most C implementations. Erik de Castro Lopo describes this in detail and offers solutions (see meganerd.com/FPcast/) that avoid using the default conversion altogether, thereby achieving substantially better performance. Intel offers an optimized signal processing library (developer.intel.com/software/products/perflib/spl/index.htm) that includes fast rounding and conversion functions. Finally, there is an interesting conversion method described on page 91 of Dannenberg and Thompson, “Real-Time Software Synthesis on Superscalar Architectures” (
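
The four conversions discussed in this letter can be compared side by side. The following is a minimal sketch, not a definitive implementation: the function names (`convert_truncate`, `convert_add_half`, `convert_branch`, `convert_offset`, `convert_branch_clipped`, `zero_bin_count`) are hypothetical names chosen for illustration, and the clipping limits assume a signed 16-bit output range. The helper `zero_bin_count` sweeps a grid of inputs from -2.0 to 2.0 in steps of 1/8 and counts how many of them convert to zero, which gives a rough measure of how wide the zero interval is for each method.

```c
/* Default C conversion: truncates toward zero, so (-1.0, 1.0) maps to 0. */
int convert_truncate(float f)
{
    return (int) f;
}

/* Add 0.5 and truncate: rounds correctly only for non-negative inputs;
   the interval (-1.5, 0.5) maps to 0. */
int convert_add_half(float f)
{
    return (int) (f + 0.5);
}

/* Treat positive and negative samples separately (ties round away
   from zero). */
int convert_branch(float f)
{
    if (f > 0) {
        return (int) (f + 0.5);
    }
    return (int) (f - 0.5);
}

/* Offset method: shift all samples positive, truncate, shift back.
   Assumes -32768 <= f <= 32767. Ties round upward, so -0.5 maps to 0. */
int convert_offset(float f)
{
    return ((int) (f + 32768.5)) - 32768;
}

/* Hypothetical variant that folds clipping into the sign test.
   The 16-bit limits are an assumption made for this sketch. */
int convert_branch_clipped(float f)
{
    if (f > 0) {
        if (f >= 32767.0f) {
            return 32767;
        }
        return (int) (f + 0.5);
    }
    if (f <= -32768.0f) {
        return -32768;
    }
    return (int) (f - 0.5);
}

/* Count the grid points k/8, for k = -16..16, that convert to zero.
   Every k/8 is exactly representable, so the sweep has no rounding noise. */
int zero_bin_count(int (*convert)(float))
{
    int count = 0;
    int k;
    for (k = -16; k <= 16; k++) {
        if (convert(k / 8.0f) == 0) {
            count++;
        }
    }
    return count;
}
```

On this grid, `zero_bin_count(convert_truncate)` and `zero_bin_count(convert_add_half)` both count 15 points, while `convert_branch` counts 7 and `convert_offset` counts 8. The two correct methods differ only at the tie -0.5, which the branch version sends to -1 and the offset version sends to 0.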

Roger B. Dannenberg

Carnegie Mellon University

rbd@cs.cmu.edu