Here's a test of some new synchronization code in Audacity for MIDI playback.
All tests are on a Dell Ultrabook with an Intel Core i7 2.6GHz processor and Ubuntu 14.04 LTS Linux.
Audio is taken directly from the laptop speakers, and recordings are made directly with the laptop microphone.
MIDI is played through a Yamaha USB-to-MIDI interface, and then to an (old) Roland U-220 MIDI synthesizer. The reason for the MIDI hardware is to eliminate the higher jitter and latency that might be expected from a software synthesizer. (For example, TiMidity has obvious jitter when driven by MIDI, at least with the settings I tried.) Final audio output is through the headphone jack to a pair of Shure over-the-ear headphones, which in this case are "over the laptop" to get a good signal into the microphone and to eliminate any significant speed-of-sound delay.
The Audacity MIDI Latency preference was set to 10ms. I do not know the true latency or the onset time of the particular sample used. 10ms seems a bit high (I would have guessed 5ms) but it is certainly in the right ballpark. This was adjusted by ear and seems better than 8ms or 12ms, and much better than 0ms, so I think this is correct within a few ms.
Here are AudioIO.cpp and AudioIO.h.
This test just plays 10 piano-like notes/sec with audio and MIDI. The
input was generated by Nyquist. The code is:
;; lots.sal - make a midi file and audio file with lots of notes
;;
function lots(rate, dur)
  begin
    exec score-gen(save: quote(dense),
                   score-len: round(rate * dur),
                   dur: 3.0 / rate,
                   ioi: 1.0 / rate,
                   pitch: 60 + (sg:count % 12))
    exec score-write-smf(dense, "lots.mid")
    exec s-save(dense, ny:all, "./lots.wav")
  end
exec lots(10, 10)
Here is the MIDI file.
Here is the Audio file.
These were loaded into Audacity and played. During playback, both tracks play together, then the audio track is soloed, then the MIDI track is soloed, then the audio track is soloed, and then both play together. The soloing is just so you can get a sense of which instrument is MIDI and which is audio. Here's the recorded result. This sounds quite good. I don't hear any obvious jitter in the MIDI notes (jitter is zero in the audio because it is all sample-accurate and there is no randomness in the synthesized tones), but when I listen to MIDI alone, I suspect there might be some small delays occasionally. I think the JND for jitter for this content would be a bit below 10ms. I suppose I could record just MIDI, put it through some onset detection, and measure the actual jitter, but I'm printing a huge amount of debugging information even from the audio callback, so I'm pleased that it sounds anywhere near as good as it does.
Also note there is a 186ms delay before the first sound. This is the actual recorded track. When I recorded, Audacity automatically shifted this track to the left (starting before zero), and did a pretty good job of compensating for the startup latency, but even with compensation, the recorded track was about 40 or 50ms late.
In spite of some fairly wild timing behavior of callbacks at the beginning of audio playback, notice that the first notes sound perfectly synchronized. These are at time=0 on the audio and MIDI tracks, so there is no pre-roll or "warm up" time for Audacity to get in sync.
This might be a fluke in that time=0 could be a special case, and by the time we get to the next notes at time=250ms, we're well into the audio and MIDI streams. Putting notes at 50ms might be even more likely to turn up problems, right? But no, I tried that too and it sounds just as good when I shift all the notes later by 30ms or 50ms. I really expected a problem would show up here, so maybe the fact that audio buffers are getting pre-loaded with zeros in this implementation helps.
What's happening with underflow/overflow on stream startup? Using the code from Test 1 (with minor changes: underflow/overflow messages are easier to find, and the callback detects a null inputBuffer or outputBuffer), start playback and look for underflow/overflow.
Do the same with simultaneous recording and playback.
Here is the slightly modified AudioIO.cpp and the unmodified AudioIO.h.
No overflow/underflow occurred on playback-only. Using this version of AudioIO.cpp, the message

$$$$$$$$$$$$$$$$ CALLBACK STATUS: NULL inputBuffer

is printed on every callback.
When recording, here's a redacted output:
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.07977 MidiT -998.015 SysMinAudio 1000 PauseT 0 track 0
*** tracktime 0 anow 0 EXPECTED ABOUT 1.17267
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 2048
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.08024 MidiT 1 SysMinAudio 1.07976 PauseT 0.0464399 track 0
*** tracktime 0 anow 0.0464399 EXPECTED ABOUT 1.08024
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 4096
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.08032 MidiT 1.001 SysMinAudio 1.0338 PauseT 0.0928798 track 0
*** tracktime 0 anow 0.0928798 EXPECTED ABOUT 1.12628
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 6144

This is happening as the output buffer is being filled with zeros, before playback has actually started (in this version, callbacks happen before returning from AudioIO::StartStream and before mStreamToken = (++mNextStreamToken);).
Underflow happens on the first callback at t=1.07977. If we call that CallbackTime 0, then the other underflows happen at CallbackTime 0.47ms and CallbackTime 0.55ms. During this period 93ms of zero samples are delivered to the output.
This behavior is repeatable, although in this particular run there was a noticeable delay between clicking "start" and the first callback, and this is confirmed by "Callback: t=1.07977" (these callback times are from a system-based clock that is set to zero near the beginning of AudioIO::StartStream). Usually, the first callback is around t=0.06.
Here are the modified AudioIO.cpp and AudioIO.h. There are some additional changes there that are described further below.
No overflow/underflow occurred on playback-only. On record, we observed the same initial underruns as before, but no underruns once we started receiving and recording samples, so apparently no input samples are ever dropped.
The startup latency from clicking "start" to the first output sample is reduced, and this showed up in the form of better synchronization of the recorded track. After the automatic shifting by the estimated audio latency, the synchronization error is consistently 15ms (audio at track time 0 shows up in the recorded track at 15ms). I'm not sure where the error comes from, but at least this is better than the 40ms or so observed before.
"Preloading" zeros into the output buffer has been discussed as a way to deal with startup timing issues. It is disturbing to see an initial flood of callbacks loading up the output buffer instead of (more) periodic callbacks in the steady state. Since our estimate of the buffer size is based on these initial callbacks, I think that to ensure the buffer is really preloaded, we would have to write zeros until there was a noticeable delay between callbacks, so we'd end up writing an extra buffer of zeros, and the code would still be somewhat dependent on timing and callback framecounts.
My opinion is we're better off not preloading buffers. At least there's no indeterminacy on when audio samples are written (it happens in the first callback).
Here is AudioIO.{cpp,h}.
There seemed to be two problems. First, when you close an ALSA stream, scheduled messages can be dropped. Since note-off or all-off messages could be scheduled in the future, they might never make it to the output.
Second, it seems that ALSA does not respect the order of messages. If you schedule something for 10ms in the future, then 5ms later, you schedule something for 5ms in the future (presumably resulting in the same timestamp), the second message may be sent before the first. (This seems to be a bug in ALSA.)
My solution, implemented here, is to keep track of the latest timestamp at which MIDI is scheduled (mMaxMidiTimestamp). To implement AllNotesOff(), increment mMaxMidiTimestamp and use it to send the first note-off message. After each message, increment mMaxMidiTimestamp until all note-off and all-off messages are sent. Thus, when we are done, mMaxMidiTimestamp is a reasonable estimate of when the last note-off/all-off message will be sent. (Messages go much faster over software or USB-MIDI, so this is a worst-case estimate.) After waiting past the final value of mMaxMidiTimestamp, close the MIDI stream.
When I stop just as a note is being played, the note is of course scheduled well into the future, so the user is aware that the note-on occurs after clicking "stop". I think this is exaggerated by MIDI because while audio has a hard stop, a MIDI note might have a "tail" or decay that allows it to sound well after the note-off message. I think there's no good way around this, and it's not really a problem.
This is not a problem with notes at times 0, 0.25, 0.5, etc., maybe because all the action is after time 0 and before time 0.25, so what about in between?
To answer this, I'll shift the MIDI track forward by 1, 2, and 10ms and listen to and record the resulting timing.
With a 2ms shift, there appears to be a difference visually, but I'm not convinced I'm hearing any differences between the first and second notes.
With a 10ms shift, I don't hear or see any difference due to synchronization errors.
The option exists to build in a pre-roll by inserting 100ms or more of silence every time you click "play" or "record", but in my opinion, it is not necessary.