"Generic" Music Representation for Aura

Roger B. Dannenberg
Carnegie Mellon University

 

Introduction

Aura is a real-time object system with a focus on interactive music. It evolved from earlier systems, including the CMU Midi Toolkit, which included software for Midi processing. Aura also supports Midi data, but a goal of Aura was to go beyond Midi, enabling richer and more flexible control. To that end, Aura introduced the idea of general messages that contain attribute/value pairs. With Aura, you can dynamically create an instrument, a note, an object that modifies a note (for example, a vibrato generator or an envelope), and you can send any of these a stream of timestamped, attribute/value pairs containing high precision control information.

It was thought that this would be an improvement over Midi: more flexibility, none of the limitations. However, experience has shown that unlimited flexibility does not make something better in all ways. One of the advantages of Midi is that you can use standard tools to capture, store, and play Midi data. The constrained representation and the conventions make it easy to work with. For example, if you are not sure a keyboard is working, you can plug it into almost any synthesizer for a simple test.

With Aura, I found myself inventing semantics and a protocol every time I designed a new instrument. The flexibility was there, and I was able to do things that would be difficult with Midi, but the overhead of designing interfaces and remembering them so I could use them was too much. This document describes a fairly generic representation to be used in the context of Aura to represent notes, sounds, and control information. Using the conventions described here, it should be possible to make some general tools to assist with music processing in Aura.

Multiple Representations

I want to support three representations:

Resources and Instances

One of the most critical aspects of a representation is to decide what exactly is being represented. I want to be able to represent sounds of various kinds and to be able to update the sounds over time. The limitations of Midi might help make this clear. In Midi, control changes apply to channels, so there are really only 16 objects or resources (channels in this case) to which updates can be directed. There are a few updates that apply to particular keys or note numbers, but even here, you are limited to 128 note numbers. I want to be able to associate each sound with its own identifier so that the sound can receive individual updates.

Is a sound structured? A sound can have many parameters. Notes usually have pitch and loudness, but there are many other possibilities. When sounds get complex, there is a tendency to describe them hierarchically, e.g. a note can have a pitch vibrato and an amplitude vibrato. Each vibrato can have a phase, frequency, and amplitude. This approach can lead to a hierarchical naming scheme as in Open Sound Control, such as "note/ampvib/freq" or "note/freqvib/freq". In Aura, vibrato objects can be considered as separate entities and named directly. In fact, a collection of notes can share a vibrato object. The variations are endless.

Alternatively, sounds can be "closed" abstractions. All updates are sent to the sound, and it is the sound's responsibility to forward the updates as it sees fit. Continuing with the example, you might set the "ampvibfreq" attribute and the sound would in turn set the "frequency" attribute of its amplitude vibrato object. This object might be an internal object managed by the sound or a shared object calculating vibrato for many sounds.

My leaning right now is toward the closed abstraction approach. This eliminates the complexities of a hierarchical name space and the danger of exposing the internals of sounds to updates.
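
To make the closed-abstraction idea concrete, here is a minimal sketch (hypothetical C++; the class and method names are illustrative, not the actual Aura interfaces) of a sound that accepts an "ampvibfreq" update and forwards it to an internal vibrato object:

#include <string>

// Hypothetical vibrato component; in Aura this could be an internal or
// shared object.
class Vibrato {
  public:
    void set_freq(double hz) { freq = hz; }
  private:
    double freq = 5.0;
};

// A "closed" sound: clients set attributes on the sound only; the sound
// decides how to route them to its internal components.
class Sound {
  public:
    void set_attribute(const std::string &attr, double value) {
        if (attr == "ampvibfreq") {
            amp_vibrato.set_freq(value);   // forward to the internal object
        } else if (attr == "pitch") {
            pitch = value;
        } // ... other attributes as needed
    }
  private:
    Vibrato amp_vibrato;   // internals are never exposed to clients
    double pitch = 60.0;
};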

Multiple Parameters

Another issue is the problem of multiple parameters for sounds, given that Aura messages typically convey one attribute/value pair. Open Sound Control sends packets of atomic updates, and Aura had this feature in a previous version, but it turned out to be very difficult for clients to construct packets through any kind of simple interface, and packets make filters and mappers more complex.

The alternative is to simply send sequences of updates in the form of attribute/value pairs. It helps to have some sort of delimiters, particularly because we typically want updates to apply to a particular sound, yet attribute/value pairs do not contain a "tag" or target field that would say which sound is to be updated. The way in which a sequence of updates is bracketed by other messages is an important convention in the representation.
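
One way to picture this bracketing (a hypothetical sketch, not Aura's dispatch code) is a receiver that remembers the most recent selector message, here a "chan" update as introduced later in this document, and applies every other attribute/value pair to the currently selected target:

#include <map>
#include <string>

// Hypothetical receiver: a designated attribute ("chan" here) selects the
// current target; every other attribute/value pair applies to whatever
// target was most recently selected. The selector messages are the
// "brackets" around a run of updates.
class Receiver {
  public:
    void handle(const std::string &attr, double value) {
        if (attr == "chan") {
            current_chan = static_cast<int>(value);  // start of a new bracket
        } else {
            attributes[current_chan][attr] = value;  // applies to current target
        }
    }
  private:
    int current_chan = 0;
    std::map<int, std::map<std::string, double>> attributes;
};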

Synchronization and Atomicity

Since Aura messages set a single attribute to a simple value (typically a float, integer, string, or Aura object reference), an important question is how to make sure that groups of attributes are updated simultaneously. The classic version of this problem is to ensure that filter coefficients are updated simultaneously to avoid unstable regions of the parameter space. There are at least three ways to handle this problem:

The Aura Message Representation

Channels

Music information can exist in many parallel streams representing Midi channels, instruments, voices, sections, etc. We could simply direct each stream to a different object, but ultimately we want to be able to store streams in a single file or direct them to a single object, so we need a representation for multiple streams. The "chan" attribute serves to name a stream. The value is an integer (32 bits), allowing a large number of channels.

Whenever the channel attribute is set (i.e. a "set 'chan' to value" message is sent), the following attribute/value pairs apply to the channel or to a specific sound associated with the channel. Channels can have attributes. By convention, setting an attribute for a channel sets that attribute for all sounds currently active on the channel. The attribute may or may not apply to future sounds created on that channel. (It is also up to the channel whether to do anything with the attribute/value pair, and it is up to the sounds to decide whether to do anything if they receive the pair, so it does not seem wise to try to control the semantics of attribute/value updates too rigidly.)
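
A minimal sketch of this broadcasting convention (hypothetical, not the actual Aura channel implementation) might look like the following:

#include <map>
#include <string>
#include <vector>

class Sound {
  public:
    void set_attribute(const std::string &attr, double value) {
        attributes[attr] = value;   // a real sound would act on the update
    }
  private:
    std::map<std::string, double> attributes;
};

// A channel, as described above: setting an attribute on the channel
// forwards it to every sound currently active on that channel. Whether it
// also applies to future sounds is left to the channel implementation.
class Channel {
  public:
    void set_attribute(const std::string &attr, double value) {
        for (Sound *s : active_sounds) {
            s->set_attribute(attr, value);   // broadcast to active sounds
        }
    }
  private:
    std::vector<Sound *> active_sounds;
};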

Keys

Within a channel, sounds are allocated and named by setting the "key" attribute. The name comes from the notion of keyboards, but there is not necessarily a one-to-one mapping from key number to pitch. Instead, the key numbers 0 through 127 act as Midi keys which imply pitch, but key numbers above 127 are simply tags used to identify sounds. In this way, we can have 32 bits to name sounds within a channel. This is enough to allocate a separate name for each sound or note on the channel in all but the most extreme cases.

By convention, setting the "key" attribute allocates a sound on the current channel. Successive attribute/value pairs apply to the newly allocated sound or note.
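
The following sketch (again hypothetical) shows how a channel might interpret a "key" message: a new sound is allocated and becomes the target of subsequent attribute/value pairs, and keys 0 through 127 also supply a default pitch, while larger key numbers serve only as tags:

#include <cstdint>
#include <map>
#include <memory>

class Sound {
  public:
    double pitch = 60.0;
};

class Channel {
  public:
    // Setting "key" allocates a sound. Keys 0-127 act as Midi keys and
    // imply a pitch; larger keys are tags only. (In this sketch, reusing a
    // key simply replaces the previous sound under that key.)
    Sound *set_key(int32_t key) {
        auto sound = std::make_unique<Sound>();
        if (key >= 0 && key <= 127) {
            sound->pitch = key;          // Midi-like key implies pitch
        }
        current = sound.get();           // subsequent pairs go to this sound
        sounds[key] = std::move(sound);
        return current;
    }
  private:
    std::map<int32_t, std::unique_ptr<Sound>> sounds;
    Sound *current = nullptr;
};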

Gates

In Midi, the keydown message that allocates a note also starts it playing. In Aura, setting the "key" attribute only allocates a sound or note. To make it play, you set the "gate" attribute, which normally is a floating point number in [0...127], representing a Midi-like velocity or amplitude. If the gate value is less than or equal to zero, the message is roughly equivalent to a Midi noteoff message. In other words, the note or sound begins to decay and eventually stops sounding. The gate may be changed to any non-zero value to accomplish volume control changes, but sounds may choose to ignore these changes. (Otherwise, every sound would have to include some additional logic to detect changes, route them to an envelope and use the envelope to control gain in some fashion.)

Note that the "gate" value is neither in dB nor linear. Is this a bad idea? I'm not sure. Instruments should call a conversion function to make it easy to change the interpretation of the gate value.
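
Since the interpretation is deliberately centralized in a conversion function, here is one purely illustrative choice (an assumption, not a convention of this document): treat the gate like a Midi velocity and square the normalized value, which is neither a dB scale nor linear in the gate:

// Purely illustrative: amplitude proportional to the square of the
// normalized gate value. Gate values of zero or less mean the sound is
// (or is becoming) silent. Centralizing this in one function makes it
// easy to change the interpretation later.
double gate_to_amplitude(double gate) {
    if (gate <= 0.0) return 0.0;        // note off
    double normalized = gate / 127.0;
    return normalized * normalized;     // neither dB nor linear in gate
}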

Duration

To accommodate the notelist style of score specification where notes are given a duration attribute at the beginning rather than a noteoff update, you can set the "dur" attribute to a floating point value representing seconds of duration. It is up to the note whether the duration is interpreted as the point at which decay begins or the point at which the note becomes silent, but the convention will be that of Music N, that is, when the note becomes silent. If the "dur" attribute is set, there is no need to eventually set the gate to zero. Notes and sounds that do not otherwise know what to do with duration can simply schedule a "set 'gate' to 0 after duration" message to accomplish the effect.
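
A sketch of that fallback, using a minimal stand-in scheduler since the actual Aura scheduling interface is not described here:

#include <functional>
#include <map>

// Stand-in for a scheduler: actions are keyed by time (in seconds) and run
// in order by run_until(). Aura's real scheduling mechanism is different;
// this only illustrates the "set gate to 0 after dur" fallback.
class Scheduler {
  public:
    void schedule(double time, std::function<void()> action) {
        pending.emplace(time, std::move(action));
    }
    void run_until(double now) {
        auto end = pending.upper_bound(now);
        for (auto it = pending.begin(); it != end; ++it) it->second();
        pending.erase(pending.begin(), end);
    }
  private:
    std::multimap<double, std::function<void()>> pending;
};

// A sound with no special use for "dur" simply schedules a gate of zero.
class Sound {
  public:
    explicit Sound(Scheduler &s) : sched(s) {}
    void set_gate(double g) { gate = g; /* begin decay when g <= 0 */ }
    void set_dur(double dur_secs, double now) {
        sched.schedule(now + dur_secs, [this]() { set_gate(0.0); });
    }
  private:
    Scheduler &sched;
    double gate = 0.0;
};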

Pitch

Pitch is specified using a floating point number of half steps corresponding to Midi integer key numbers. In other words, 60 is middle C, and 60.5 is a quarter tone above middle C.
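
This is the usual twelve-tone equal-tempered reading of Midi key numbers (A4 = key 69 = 440 Hz), extended to fractional values; a conversion function might look like this:

#include <cmath>

// Convert a (possibly fractional) key number to frequency in Hz using
// twelve-tone equal temperament with A4 = key 69 = 440 Hz, so 60 is
// middle C and 60.5 is a quarter tone above it.
double pitch_to_hz(double pitch) {
    return 440.0 * std::pow(2.0, (pitch - 69.0) / 12.0);
}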

Other Attributes

Any number of other attributes can be implemented. For example, "bend" might be an additive offset to "pitch" to facilitate pitch bend specification.

Examples

A typical sequence of messages to turn on a note in a Midi-like fashion is the following:

set 'chan' to 1
set 'key' to 60
set 'gate' to 100

To play this same note for a known duration, use the following:

set 'chan' to 1
set 'key' to 60
set 'dur' to 0.85
set 'gate' to 100

A more advanced sequence where the "key" attribute serves as a tag is the following:

set 'chan' to 10
set 'key' to 1205
set 'pitch' to 60.1 --10 cents sharp
set 'pan' to 0.5 --pan to the middle
set 'brightness' to 0.3 --set any number of additional parameters
set 'gate' to 95 --and finally turn on the note

To modify the note, you might send additional updates, for example:

set 'chan' to 10 --only necessary if 'chan' was set to another value
set 'key' to 1205 --only necessary if 'key' was set to another value
set 'pan' to 0.6 --now change as many attributes as you want

To end the note, set the "gate" attribute to zero:

set 'chan' to 1
set 'key' to 60
set 'gate' to 0

The Text Representation

The text representation is based on "Adagio," the text-based score language in the CMU Midi Toolkit. The basic idea is that each sound event or note is represented by a line of text (some extensions allow multiple events per line). There are some abbreviations for common attributes. Some examples follow, corresponding to the message-based examples above. Note that in the text form, notes are always specified using durations to avoid having to match up note beginnings with note endings. In Adagio, various letters are used to indicate pitch ("A" through "G") and duration ("S", "I", "Q", "H", "W"), leaving the rest of the alphabet to indicate attributes, which include the following:

V - Voice (the channel parameter)
K - Key
L - Loudness (the gate parameter)
P - Pitch
U - dUration
T - Time (the starting time of the note)

The following is a note description:

V1 K60 U0.85 L100

Translated, this means: "Using channel 1, and key 60, and a duration of 0.85 seconds, set the gate to 100."

Adagio was mostly limited to Midi, so there was no need for an extended set of attributes. For Aura, the syntax is extended to allow attributes as follows:

V1 C4 Q Lmf -pan:0.5 -brightness:0.3

In this example, the "pan" attribute is set to 0.5. The syntax is simple: a leading dash ("-") and trailing colon (":") serve to delimit the attribute name, and the value follows. This example uses standard Adagio syntax for pitch (C4), duration (Q), and loudness (Lmf).
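
A sketch of parsing one such extended token into an attribute/value pair (an illustration of the syntax only, not the actual Adagio parser):

#include <optional>
#include <string>
#include <utility>

// Parse one extended Adagio token of the form "-name:value" into an
// attribute/value pair; return nothing if the token is not in that form.
// Values are assumed to be numeric, as in the examples above.
std::optional<std::pair<std::string, double>>
parse_attribute(const std::string &token) {
    if (token.empty() || token[0] != '-') return std::nullopt;
    auto colon = token.find(':');
    if (colon == std::string::npos) return std::nullopt;
    std::string name = token.substr(1, colon - 1);
    double value = std::stod(token.substr(colon + 1));
    return std::make_pair(name, value);
}

// Example: parse_attribute("-pan:0.5") yields the pair ("pan", 0.5).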

If the "key" attribute is not specified, the line is a channel update message, e.g.

V3 -bend:0.21 T21.1

sends a "bend" attribute to channel 3 at time 21.1.

Tempo and Beat Representation

(This section is not yet complete)

Tempo and beats are important aspects of music representation. My earlier work more-or-less ignored this problem, but this makes interfacing to notation programs and sequencers difficult. It should be possible to:

The attribute "beat" will encode beat position as a floating point value, allowing arbitrary subdivisions of the beat as opposed to the fixed divisions of the Midi Clock message. The attribute "tempo" will specify a new tempo. Aura messages carry timestamps. Together with "beat" and "tempo" attributes, timestamps can be used to recover a tempo map from a sequence of messages.
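
For example, two timestamped "beat" messages determine a tempo directly:

// The tempo implied by two timestamped "beat" messages, in beats per
// minute: beat difference over time difference (times in seconds).
double tempo_from_beats(double beat1, double time1,
                        double beat2, double time2) {
    return 60.0 * (beat2 - beat1) / (time2 - time1);
}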

Text-based scores will ordinarily treat time as beats, which means that timestamps of messages must be mapped into beat values for the scores, and a separate section of the score file will allow the specification of a tempo map. It is expected that the internal representation of a score will use beats for times and durations, and these will be translated on-the-fly into Aura timestamps using the Aura scheduling mechanisms.
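
One plausible form for such a translation (a sketch under assumed conventions, not a committed design) is a piecewise-constant tempo map, with beat positions converted to seconds by summing the time spanned by each constant-tempo segment:

#include <cstddef>
#include <vector>

// A tempo map as a list of (beat, tempo-in-bpm) changes, sorted by beat
// and starting at beat 0. Tempo is held constant between changes.
struct TempoChange { double beat; double bpm; };

// Translate a beat position into seconds by accumulating the time spanned
// by each constant-tempo segment before it.
double beat_to_time(const std::vector<TempoChange> &map, double beat) {
    double time = 0.0;
    for (std::size_t i = 0; i < map.size(); i++) {
        double seg_start = map[i].beat;
        double seg_end = (i + 1 < map.size()) ? map[i + 1].beat : beat;
        if (seg_end > beat) seg_end = beat;
        if (seg_end > seg_start) {
            time += (seg_end - seg_start) * 60.0 / map[i].bpm;
        }
        if (seg_end >= beat) break;
    }
    return time;
}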

Different channels might use different tempo maps, but I am not sure how to indicate this. We do not want to replicate a tempo map for each channel.