Here's a test of some new synchronization code in Audacity for MIDI playback.
All tests are on a Dell Ultrabook with an Intel Core i7 2.6GHz processor and Ubuntu 14.04 LTS Linux.
Audio is taken directly from the laptop speakers, and recordings are made directly with the laptop microphone.
MIDI is played through a Yamaha USB-to-MIDI interface, and then to an (old) Roland U-220 MIDI synthesizer. The reason for the MIDI hardware is to eliminate the higher jitter and latency that might be expected from a software synthesizer. (For example, TiMidity has obvious jitter when driven by MIDI, at least with the settings I tried.) Final audio output is through the headphone jack to a pair of Shure over-the-ear headphones, which in this case are "over the laptop" to get a good signal into the microphone and to eliminate any significant speed-of-sound delay.
The Audacity MIDI Latency preference was set to 10ms. I do not know the true latency or the onset time of the particular sample used. 10ms seems a bit high (I would have guessed 5ms) but it is certainly in the right ballpark. This was adjusted by ear and seems better than 8ms or 12ms, and much better than 0ms, so I think this is correct within a few ms.
Here are AudioIO.cpp and AudioIO.h.
This test just plays 10 piano-like notes/sec with audio and MIDI. The
input was generated by Nyquist. The code is:
;; lots.sal - make a midi file and audio file with lots of notes
;;
function lots(rate, dur)
  begin
    exec score-gen(save: quote(dense),
                   score-len: round(rate * dur),
                   dur: 3.0 / rate,
                   ioi: 1.0 / rate,
                   pitch: 60 + (sg:count % 12))
    exec score-write-smf(dense, "lots.mid")
    exec s-save(dense, ny:all, "./lots.wav")
  end
exec lots(10, 10)
Here is the MIDI file.
Here is the Audio file.
These were loaded into Audacity and played. During playback, both tracks play together, then the audio track is soloed, then the MIDI track is soloed, then the audio track is soloed, and then both play together. The soloing is just so you can get a sense of which instrument is MIDI and which is audio. Here's the recorded result. This sounds quite good. I don't hear any obvious jitter in the MIDI notes (jitter is zero in the audio because it is all sample-accurate and there is no randomness in the synthesized tones), but when I listen to MIDI alone, I suspect there might be some small delays occasionally. I think the JND for jitter for this content would be a bit below 10ms. I suppose I could record just MIDI, put it through some onset detection, and measure the actual jitter, but I'm printing a huge amount of debugging information even from the audio callback, so I'm pleased that it sounds anywhere near as good as it does.
Also note there is a 186ms delay before the first sound. This is the actual recorded track. When I recorded, Audacity automatically shifted this track to the left (starting before zero), and did a pretty good job of compensating for the startup latency, but even with compensation, the recorded track was about 40 or 50ms late.
In spite of some fairly wild timing behavior of callbacks at the beginning of audio playback, notice that the first notes sound perfectly synchronized. These are at time=0 on the audio and MIDI tracks, so there is no pre-roll or "warm up" time for Audacity to get in sync.
This might be a fluke in that time=0 could be a special case, and by the time we get to the next notes at time=250ms, we're well into the audio and MIDI streams. Putting notes at 50ms might be even more likely to turn up problems, right? But no, I tried that too and it sounds just as good when I shift all the notes later by 30ms or 50ms. I really expected a problem would show up here, so maybe the fact that audio buffers are getting pre-loaded with zeros in this implementation helps.
What's happening with underflow/overflow on stream startup? Using the code from Test 1 (with minor changes: underflow/overflow messages are easier to find, and the callback detects a null inputBuffer or outputBuffer), start playback and look for underflow/overflow.
Do the same with simultaneous recording and playback.
Here is the slightly modified AudioIO.cpp and the unmodified AudioIO.h.
No overflow/underflow occurred on playback-only. Using this version of AudioIO.cpp, the message

$$$$$$$$$$$$$$$$ CALLBACK STATUS: NULL inputBuffer

is printed on every callback.
When recording, here's a redacted output:
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.07977 MidiT -998.015 SysMinAudio 1000 PauseT 0 track 0
*** tracktime 0 anow 0 EXPECTED ABOUT 1.17267
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 2048
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.08024 MidiT 1 SysMinAudio 1.07976 PauseT 0.0464399 track 0
*** tracktime 0 anow 0.0464399 EXPECTED ABOUT 1.08024
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 4096
$$$$$$$$$$$$$$$$ CALLBACK STATUS: InputUnderflow
*** Callback: t=1.08032 MidiT 1.001 SysMinAudio 1.0338 PauseT 0.0928798 track 0
*** tracktime 0 anow 0.0928798 EXPECTED ABOUT 1.12628
** gAudioIO->mStreamToken -1, gAudioIO->mSeek 0 frames 2048 totalframes 6144

This is happening as the output buffer is being filled with zeros, before playback has actually started (in this version, callbacks happen before returning from AudioIO::StartStream and before mStreamToken = (++mNextStreamToken);).
Underflow happens on the first callback at t=1.07977. If we call that CallbackTime 0, then the other underflows happen at CallbackTime 0.47ms and CallbackTime 0.55ms. During this period 93ms of zero samples are delivered to the output.
This behavior is repeatable, although in this particular run there was a noticeable delay between clicking "start" and the first callback, and this is confirmed by "Callback: t=1.07977" (these callback times are from a system-based clock that is set to zero near the beginning of AudioIO::StartStream). Usually, the first callback is around t=0.06.
Here are the modified AudioIO.cpp and AudioIO.h. There are some additional changes there that are described further below.
No overflow/underflow occurred on playback-only. On record, we observed the same initial underruns as before, but no underruns once we started receiving and recording samples, so apparently no input samples are ever dropped.
The startup latency from clicking "start" to the first output sample is reduced, and this showed up in the form of better synchronization of the recorded track. After the automatic shifting by the estimated audio latency, the synchronization error is consistently 15ms (audio at track time 0 shows up in the recorded track at 15ms). I'm not sure where the error comes from, but at least this is better than the 40ms or so observed before.
"Preloading" zeros into the output buffer has been discussed as a way to deal with startup timing issues. It is disturbing to see an initial flood of callbacks loading up the output buffer instead of (more) periodic callbacks in the steady state. Since our estimate of the buffer size is based on these initial callbacks, I think that to ensure the buffer is really preloaded, we would have to write zeros until there was a noticeable delay between callbacks, so we'd end up writing an extra buffer of zeros, and the code would still be somewhat dependent on timing and callback framecounts.
My opinion is we're better off not preloading buffers. At least there's no indeterminacy on when audio samples are written (it happens in the first callback).
Here is AudioIO.{cpp,h}.
There seemed to be two problems. First, when you close an ALSA stream, scheduled messages can be dropped. Since note-off or all-off messages could be scheduled in the future, they might never make it to the output.
Second, it seems that ALSA does not respect the order of messages. If you schedule something for 10ms in the future, then 5ms later, you schedule something for 5ms in the future (presumably resulting in the same timestamp), the second message may be sent before the first. (This seems to be a bug in ALSA.)
My solution, implemented here, is to keep track of the latest timestamp at which MIDI is scheduled (mMaxMidiTimestamp). To implement AllNotesOff(), increment mMaxMidiTimestamp and use it to send the first note-off message. After each message, increment mMaxMidiTimestamp until all note-off and all-off messages are sent. Thus, when we are done, mMaxMidiTimestamp is a reasonable estimate of when the last note-off/all-off message will be sent. (Messages go much faster over software or USB-MIDI, so this is a worst-case estimate.) After waiting past the final value of mMaxMidiTimestamp, close the MIDI stream.
When I stop just as a note is being played, the note is of course scheduled well into the future, so the user is aware that the note-on occurs after clicking "stop". I think this is exaggerated by MIDI because while audio has a hard stop, a MIDI note might have a "tail" or decay that allows it to sound well after the note-off message. I think there's no good way around this, and it's not really a problem.
This is not a problem with notes at times 0, 0.25, 0.5, etc., maybe because all the action is after time 0 and before time 0.25, so what about in between?
To answer this, I'll shift the MIDI track forward by 1, 2, and 10ms and listen to and record the resulting timing.
With a 2ms shift, there appears to be a difference visually, but I'm not convinced I'm hearing any differences between the first and second notes.
With a 10ms shift, I don't hear or see any difference due to synchronization errors.
The option exists to build in a pre-roll by inserting 100ms or more of silence every time you click "play" or "record", but in my opinion, it is not necessary.