Generic Music Representation for Aura

Roger B. Dannenberg
Carnegie Mellon University

November 2000


Introduction

Aura is a real-time object system with a focus on interactive music. It evolved from earlier systems, including the CMU Midi Toolkit, which provided software for Midi processing. Aura also supports Midi data, but a goal of Aura was to go beyond Midi, enabling richer and more flexible control. To that end, Aura introduced the idea of general messages that contain attribute/value pairs. With Aura, you can dynamically create an instrument, a note, or an object that modifies a note (for example, a vibrato generator or an envelope), and you can send any of these a stream of timestamped attribute/value pairs containing high-precision control information.

It was thought that this would be an improvement over Midi: more flexibility, none of the limitations. However, experience has shown that unlimited flexibility does not make something better in all ways. One of the advantages of Midi is that you can use standard tools to capture, store, and play Midi data. The constrained representation and the shared conventions make Midi data easy to work with. For example, if you are not sure a keyboard is working, you can plug it into almost any synthesizer for a simple test.

With Aura, I found myself inventing semantics and a protocol every time I designed a new instrument. The flexibility was there, and I was able to do things that would be difficult with Midi, but the overhead of designing interfaces and remembering them well enough to use them was too much. This document describes a fairly generic representation to be used in the context of Aura to represent notes, sounds, and control information. Using the conventions described here, it should be possible to make some general tools to assist with music processing in Aura.

Multiple Representations

I want to support three representations:

Resources and Instances

One of the most critical aspects of a representation is to decide what exactly is being represented. I want to be able to represent sounds of various kinds and to be able to update the sounds over time. The limitations of Midi might help make this clear. In Midi, control changes apply to channels, so there are really only 16 objects or resources (channels in this case) to which updates can be directed. There are a few updates that apply to particular keys or note numbers, but even here, you are limited to 128 note numbers. I want to be able to associate each sound with its own identifier so that the sound can receive individual updates.

Is a sound structured? A sound can have many parameters. Notes usually have pitch and loudness, but there are many other possibilities. When sounds get complex, there is a tendency to describe them hierarchically, e.g., a note can have a pitch vibrato and an amplitude vibrato. Each vibrato can have a phase, frequency, and amplitude. This approach can lead to a hierarchical naming scheme as in Open Sound Control, such as "note/ampvib/freq" or "note/freqvib/freq". In Aura, vibrato objects can be considered as separate entities and named directly. In fact, a collection of notes can share a vibrato object. The variations are endless.

Alternatively, sounds can be "closed" abstractions. All updates are sent to the sound, and it is the sound's responsibility to forward the updates as it sees fit. Continuing with the example, you might set the "ampvibfreq" attribute and the sound would in turn set the "frequency" attribute of its amplitude vibrato object. This object might be an internal object managed by the sound or a shared object calculating vibrato for many sounds.

My leaning right now is toward the closed abstraction approach. This eliminates the complexities of a hierarchical name space and the danger of exposing the internals of sounds to updates.
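
To make the closed style concrete, here is a minimal C++ sketch. The class names and the "ampvibfreqr" attribute (with the type suffix described later) are my inventions for illustration, not part of Aura's actual API:

#include <cstring>

// Sketch only: a vibrato generator with a settable rate.
class Vibrato {
public:
    void set_freq(double hz) { freq = hz; }
private:
    double freq = 6.0;
};

// A "closed" sound: clients set attributes on the sound alone, and the
// sound forwards them to internal (or shared) objects as it sees fit.
class ClosedSound {
public:
    void set_attr(const char *attr, double value) {
        if (std::strcmp(attr, "ampvibfreqr") == 0) {
            amp_vib.set_freq(value);   // forward to the amplitude vibrato
        } else if (std::strcmp(attr, "pitchr") == 0) {
            pitch = value;
        }
        // unrecognized attributes are simply ignored
    }
private:
    Vibrato amp_vib;  // could instead reference a vibrato shared by many sounds
    double pitch = 60.0;
};

Nothing outside the sound ever names the vibrato, so the internal structure can change without breaking clients.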

Multiple Parameters

Another issue is the problem of multiple parameters for sounds, given that Aura messages typically convey one attribute/value pair. Open Sound Control sends packets of atomic updates, and a previous version of Aura had this feature as well, but it turned out to be very difficult for clients to construct packets through any kind of simple interface, and packets make filters and mappers more complex.

The alternative is to simply send sequences of updates in the form of attribute/value pairs. It helps to have some sort of delimiter, particularly because we typically want updates to apply to a particular sound, yet attribute/value pairs do not contain a "tag" or target field saying which sound is to be updated. The way in which a sequence of updates is bracketed by other messages is thus an important convention in the representation; the Examples section below illustrates it.

Synchronization and Atomicity

Since Aura messages set a single attribute to a simple value (typically a float, integer, string, or Aura object reference), an important question is how to make sure that groups of attributes are updated simultaneously. The classic version of this problem is to ensure that filter coefficients are updated simultaneously to avoid unstable regions of the parameter space. There are at least three ways to handle this problem.
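
As one illustration (a sketch with invented class and attribute names, and not necessarily one of the approaches intended here), values can be staged as they arrive and applied together when a designated attribute acts as a trigger:

#include <cstring>

// Sketch: stage parameter values as they arrive; recompute coefficients
// only when the "applyl" attribute arrives, so the filter never computes
// from an inconsistent (frequency, Q) pair.
class StagedFilter {
public:
    void set_attr(const char *attr, double value) {
        if (std::strcmp(attr, "freqr") == 0) {
            staged_freq = value;
        } else if (std::strcmp(attr, "qr") == 0) {
            staged_q = value;
        } else if (std::strcmp(attr, "applyl") == 0 && value != 0.0) {
            recompute(staged_freq, staged_q);  // atomic from the DSP's view
        }
    }
private:
    void recompute(double freq, double q) { /* derive new coefficients */ }
    double staged_freq = 1000.0, staged_q = 0.7;
};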

The Aura Message Representation

Aura messages consist of attribute/value pairs. Attributes are typed, and by convention, the last letter of the attribute name indicates the type: 'r' for real (double), 'i' for (long) integer, 'l' for logical (bool), 's' for string (char *), and 'a' for an Aura object identifier.
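
This means a receiver can recover the declared type of a value from the attribute name alone. A small sketch (the enum and function are mine, not part of Aura):

#include <cstring>

enum AttrType { ATTR_REAL, ATTR_INT, ATTR_LOGICAL, ATTR_STRING,
                ATTR_AURA, ATTR_UNKNOWN };

// Infer an attribute's type from the last letter of its name, following
// the convention above.
AttrType attr_type(const char *name) {
    std::size_t len = std::strlen(name);
    if (len == 0) return ATTR_UNKNOWN;
    switch (name[len - 1]) {
        case 'r': return ATTR_REAL;     // real (double)
        case 'i': return ATTR_INT;      // (long) integer
        case 'l': return ATTR_LOGICAL;  // logical (bool)
        case 's': return ATTR_STRING;   // string (char *)
        case 'a': return ATTR_AURA;     // Aura object identifier
        default:  return ATTR_UNKNOWN;
    }
}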

Channels

Music information can exist in many parallel streams representing Midi channels, instruments, voices, sections, etc. We could simply direct each stream to a different object, but ultimately we want to be able to store streams in a single file or direct them to a single object, so we need a representation for multiple streams. The "chani" attribute serves to name a stream. The value is an integer (32 bits), allowing a large number of channels.

Whenever the channel attribute is set (i.e., a "set 'chani' to value" message is sent), the following attribute/value pairs apply to the channel or to a specific sound associated with the channel. Channels can have attributes. By convention, setting an attribute for a channel sets that attribute for all sounds currently active on the channel. The attribute may or may not apply to future sounds created on that channel. (It is also up to the channel whether to do anything with the attribute/value pair, and it is up to the sounds to decide whether to do anything if they receive the pair, so it does not seem wise to try to control the semantics of attribute/value updates too rigidly.)
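
A sketch of the broadcast convention (the class names are invented; a real Aura channel would be an Aura object receiving these messages):

#include <vector>

struct Sound {
    void set_attr(const char *attr, double value) { /* update the sound */ }
};

// Sketch: a channel forwards an attribute/value pair to every sound
// currently active on it. Whether future sounds inherit the value is
// left to the channel implementation, as described above.
class Channel {
public:
    void set_attr(const char *attr, double value) {
        for (Sound *s : active_sounds) {
            s->set_attr(attr, value);
        }
    }
private:
    std::vector<Sound *> active_sounds;
};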

Keys

Within a channel, sounds are allocated and named by setting the "keyi" attribute. The name comes from the notion of keyboards, but there is not necessarily a one-to-one mapping from key number to pitch. Instead, the key numbers 0 through 127 act as Midi keys which imply pitch, but key numbers above 127 are simply tags used to identify sounds. In this way, we can have 32 bits to name sounds within a channel. This is enough to allocate a separate name for each sound or note on the channel in all but the most extreme cases.

By convention, setting the "keyi" attribute allocates a sound on the current channel. Successive attribute/value pairs apply to the newly allocated sound or note.

When a "chani" message is sent, succeeding messages apply to the channel until a "keyi" message arrives. After the "keyi" message, messages apply to a sound within that channel corresponding to the key number. To direct messages back to the channel as a whole, send another "chani" message. Alternatively, the key number -1 is reserved to mean that messages should be directed to the channel.

Gates

In Midi, the keydown message that allocates a note also starts it playing. In Aura, setting the "keyi" attribute only allocates a sound or note. To make it play, you set the "gater" attribute, which normally is a floating point number in [0...127], representing a Midi-like velocity or amplitude. If the gate value is less than or equal to zero, the message is roughly equivalent to a Midi noteoff message. In other words, the note or sound begins to decay and eventually stops sounding. The gate may be changed to any positive value to accomplish volume control changes, but sounds may choose to ignore these changes. (Otherwise, every sound would have to include some additional logic to detect changes, route them to an envelope, and use the envelope to control gain in some fashion.)

Note that "gater" is neither in dB nor linear. Is this a bad idea? I'm not sure. Instruments should call a conversion function to make it easy to change the interpretation of the gater value.
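
A plausible conversion function, assuming a square-law compromise between a linear and a dB mapping (my choice for illustration, not a fixed Aura definition):

// Sketch: convert a "gater" value in [0, 127] to a linear gain.
// Centralizing the mapping in one function makes it easy to change the
// interpretation later without touching every instrument.
double gate_to_gain(double gate) {
    if (gate <= 0.0) return 0.0;   // gate <= 0 means the sound should stop
    double x = gate / 127.0;
    return x * x;                  // square law; replace with another curve as desired
}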

Duration

To accommodate the notelist style of score specification, where notes are given a duration attribute at the beginning rather than a noteoff update, you can set the "durr" attribute to a floating point value representing seconds of duration. It is up to the note whether the duration is interpreted as the point at which decay begins or the point at which the note becomes silent, but the convention will be that of Music N, that is, the point at which the note becomes silent. If the "durr" attribute is set, there is no need to eventually set the gate to zero. Notes and sounds that do not otherwise know what to do with duration can simply schedule a "set 'gater' to 0 after duration" message to accomplish the effect.
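
A sketch of that fallback; schedule_set_attr is a purely hypothetical stand-in for whatever deferred-message facility the host system provides:

class Sound;

// Hypothetical scheduling facility: deliver "set attr to value" to the
// target sound after delay_secs seconds.
void schedule_set_attr(double delay_secs, Sound *target,
                       const char *attr, double value);

class Sound {
public:
    // A sound with no native notion of duration can emulate the Music N
    // convention by scheduling its own gate-off message.
    void set_durr(double dur_secs) {
        schedule_set_attr(dur_secs, this, "gater", 0.0);
    }
};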

Pitch

Pitch is specified using a floating point number of half steps corresponding to Midi integer key numbers. In other words, 60 is middle C, and 60.5 is a quarter tone above middle C. The attribute name is "pitchr".

Other Attributes

Any number of other attributes can be implemented. For example, "bendr" is an additive offset to "pitchr" to facilitate pitch bend specification.
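
Since "pitchr" follows Midi key numbering and "bendr" is additive, conversion to frequency is the standard equal-tempered formula. A small sketch:

#include <cmath>

// Convert a fractional Midi-style key number plus bend offset to Hz:
// key 69 is A440, and each unit is one semitone.
double pitch_to_hz(double pitchr, double bendr) {
    return 440.0 * std::pow(2.0, (pitchr + bendr - 69.0) / 12.0);
}

// pitch_to_hz(60.0, 0.0) is about 261.63 Hz (middle C);
// pitch_to_hz(60.5, 0.0) is a quarter tone higher.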

Examples

A typical sequence of messages to turn on a note in a Midi-like fashion is the following:

set 'chani' to 1
set 'keyi' to 60
set 'gater' to 100

To play this same note for a known duration, use the following:

set 'chani' to 1
set 'keyi' to 60
set 'durr' to 0.85
set 'gater' to 100

A more advanced sequence, where the "keyi" attribute serves as a tag, is the following:

set 'chani' to 10
set 'keyi' to 1205
set 'pitchr' to 60.1 --10 cents sharp
set 'panr' to 0.5 --pan to the middle
set 'brightnessr' to 0.3 --set any number of additional parameters
set 'gater' to 95 --and finally turn on the note

To modify the note, you might send additional updates, for example:

set 'chani' to 10 --only necessary if 'chani' was set to another value
set 'keyi' to 1205 --only necessary if 'keyi' was set to another value
set 'panr' to 0.6 --now change as many attributes as you want

To end the note, set the "gater" attribute to zero:

set 'chani' to 10
set 'keyi' to 1205
set 'gater' to 0

Tempo and Beat Representation

Tempo and beats are important aspects of music representation. My earlier work more-or-less ignored this problem, compiling everything down to absolute timestamps, but this makes interfacing to notation programs and sequencers difficult. It should be possible to work in terms of beats and tempo and to recover that timing information from a stream of messages.

The attribute "beatr" will encode beat position as a floating point value, allowing arbitrary subdivisions of the beat as opposed to the fixed divisions of the Midi Clock message. The attribute "tempor" will specify a new tempo. Aura messages carry timestamps. Together with "beatr" and "tempor" attributes, timestamps can be used to recover a tempo map from a sequence of messages.
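
For example, two successive timestamped "beatr" messages determine the mean tempo over the span between them (the message layout below is my invention for illustration):

// Sketch: estimate tempo from a pair of timestamped "beatr" messages.
struct BeatMsg {
    double time;   // message timestamp, in seconds
    double beat;   // "beatr" value, in beats
};

// Mean tempo, in beats per minute, between two beat messages.
double tempo_bpm(const BeatMsg &a, const BeatMsg &b) {
    return (b.beat - a.beat) / (b.time - a.time) * 60.0;
}

// A full tempo map interleaves such estimates with explicit "tempor"
// messages, which state the tempo directly from their timestamps onward.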

These messages do not apply to particular channels or keys. We do not want to replicate a tempo map for each channel. On the other hand, different channels might want to use different tempo maps. This should be handled by using a separate message stream for each tempo map.