Spring 2023 Projects | Roger B. Dannenberg

Summary

I wrote this to share with students and prospective students what I’m working on, and interested in working on, during the next year. This is an update and revision of Projects - Fall 2021.

About the Author

Roger B. Dannenberg is Emeritus Professor of Computer Science at Carnegie Mellon University. He is known for a broad range of research in Computer Music, including the creation of interactive computer accompaniment systems, languages for computer music, music understanding systems, and music composing software. He is a co-creator of Audacity, perhaps the most widely used music editing software.

Internships

If you are looking for an internship, I cannot offer a salary or cover travel expenses, but I have some funds for minor research expenses. Since Covid, I have been working remotely from home.

Ph.D.s

If you are looking for a Ph.D., CMU is a great place, but I'm no longer taking new students. Chris Donahue is joining our faculty in fall 2023, and I hope to participate in research, possibly as co-advisor.

I receive a lot of requests for internships and supervision. Prospective interns and Ph.D.s should read the sidebar at left. Here’s what I’m doing and thinking about these days.

O2

O2 is a network protocol designed especially for music control. It is intended to be an OSC (Open Sound Control) “do over,” given that even tiny low-cost controllers can now communicate using IP, and also given what we’ve learned from experience with OSC.

Mainly, O2 introduces discovery so users do not have to type in IP addresses and port numbers. O2 also supports clock synchronization, timed message delivery, and some publish-subscribe capabilities. O2 is also global in the sense that discovery is not limited to the local area network. O2 works over WebSockets to connect to in-browser applications, and through shared memory for low-latency uses such as interactive audio applications.
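To make the idea of timed message delivery concrete, here is a minimal Python sketch. It is not O2's API: the class `TimedQueue`, its methods, and the addresses such as `/synth/note` are all hypothetical names chosen for illustration. The sketch assumes sender and receiver already share a synchronized clock, so a message stamped with a time can simply be held until the clock reaches that time.

```python
import heapq
import itertools


class TimedQueue:
    """Hypothetical holder for timestamped messages (not part of O2)."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal timestamps keep insertion order.
        self._counter = itertools.count()

    def schedule(self, timestamp, address, *args):
        """Queue a message for delivery at `timestamp` (shared-clock seconds)."""
        heapq.heappush(self._heap, (timestamp, next(self._counter), address, args))

    def dispatch_due(self, now):
        """Remove and return every message with timestamp <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            timestamp, _, address, args = heapq.heappop(self._heap)
            due.append((timestamp, address, args))
        return due


queue = TimedQueue()
queue.schedule(2.0, "/synth/note", 60, 100)
queue.schedule(1.0, "/synth/note", 64, 90)
queue.schedule(3.5, "/synth/stop")

# At clock time 2.0, the two note messages are due; the stop message waits.
early = queue.dispatch_due(now=2.0)
```

Messages may be scheduled in any order; the heap releases them in timestamp order, which is what lets a sender transmit ahead of time and still get precise timing at the receiver.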

What comes next are some extensions and further work to make O2 even more complete and interesting:

O2 is written in C++. Ports to Python or Java would be difficult, but we should at least have bindings for other languages, along with example code, to make O2 easy to use.
An alternative to native O2 implementations in Python, Java, Rust, Go, etc., is to write native implementations of O2lite, a minimal protocol that allows applications to connect to an O2 host that can relay messages to an entire O2 ensemble. O2lite is small enough that it could be reimplemented in Python and Java (we already have Javascript, C++, and ESP32 implementations). These "native" implementations would be particularly easy for Python and Java applications.
O2lite in Javascript could be used to create some inspection and debugging tools.
O2 needs example applications and more testing.
O2 should be supported in MaxMSP and Pd as externals. A Pd implementation exists but needs more testing and examples.

Music Composition

I’m working with Shuqi Dai on automatic composition of popular songs, particularly songs that are similar to existing “seed” songs. One thing that might help is a lot of data. It’s fairly easy to get data from MIDI files, and there are at least 100,000 easy-to-get music files out there, but most of these require analysis to identify chords, melody, bass lines, and structure that could be valuable for learning and further analysis.

Previously, I worked with Zheng Jiang, who created a system to automatically analyze and label MIDI files. It’s a great start, but I think it is not robust enough to turn loose on 100K files and expect satisfactory results. For one thing, not all MIDI is a popular song or even useful as an example.

Therefore, a project waiting to get done is to push this existing work forward and try to obtain tens of thousands of songs with labeled melody, chords, bass, structure, bar-lines, etc.

Web Audio Soundcool

Soundcool is a software-based modular synthesis system that is very easy to use. I’m working with an international team to port Soundcool to Web Audio, Javascript, and React so that users can play with Soundcool just by visiting a free website. Things are moving along, but we could use more help.

Computer Music Archeology

There are some early works in music composition that, rather than relying on sophisticated machine learning, simply implemented very insightful rules or algorithms of music theory and music composition. I've tried to understand what’s going on in these programs, because many of them outperform the so-called “state-of-the-art” methods that have become popular recently. Some efforts to recreate some of this early work could be very interesting and allow better understanding as well as additional experimentation. I am particularly interested in understanding the scope of output (does the software always write a variation of essentially the same song, or does the output have a range of ideas and forms?) as well as what is musically important (what’s more important: pitch, rhythm or form? And when is careful selection better than random choice?). This could lead to advances by revealing forgotten secrets of music.

Arco

I’ve written a number of libraries or frameworks for building interactive real-time music systems including Nyquist and Aura. I’ve learned a lot from these, and I’ve started yet another system that's simpler in some ways and more powerful in others. The new system, Arco, uses O2 for communication between threads.

One of the goals of Arco is to be lightweight, modular and flexible. It can serve as an “embedded” system providing synthesis for another application or running on a small single-board computer such as a Raspberry Pi. Modularity is achieved by using O2 for interprocess communication and FAUST for most DSP. The plan is also to have some kind of configuration facility to specify what synthesis functions are linked into an Arco executable, so that Arco does not expand into a huge monolithic binary with hundreds of built-in DSP routines.

A new feature of Arco is a C++ abstract superclass for DSP objects that allows for runtime patching to interconnect objects, where connections can be constant (scalar) values, control rate, audio rate, and either single or multi-channel. This should facilitate rapid prototyping and creative programming.
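As a loose analogy only, the following Python sketch shows what runtime patching of DSP objects through a common abstract base class can look like. It is not Arco's C++ design: every name here (`Ugen`, `Const`, `Ramp`, `Mult`, `patch`, `render`, `BLOCK`) is hypothetical, and the sketch covers only constant and audio-rate inputs, leaving out control rate and multi-channel connections.

```python
from abc import ABC, abstractmethod

BLOCK = 4  # samples per block; tiny on purpose, for illustration only


class Ugen(ABC):
    """Hypothetical base class: each unit generator renders one block per call."""

    def __init__(self, **inputs):
        self.inputs = {}
        for name, source in inputs.items():
            self.patch(name, source)

    def patch(self, name, source):
        """Connect an input at runtime; plain numbers are wrapped as constants."""
        self.inputs[name] = source if isinstance(source, Ugen) else Const(source)

    def pull(self, name):
        """Render one block from the object connected to input `name`."""
        return self.inputs[name].render()

    @abstractmethod
    def render(self):
        """Return a list of BLOCK samples."""


class Const(Ugen):
    """A scalar input, expanded to a full block when rendered."""

    def __init__(self, value):
        self.inputs = {}
        self.value = float(value)

    def render(self):
        return [self.value] * BLOCK


class Ramp(Ugen):
    """Audio-rate test signal that rises by `step` on every sample."""

    def __init__(self, step=1.0):
        super().__init__()
        self.step, self.phase = step, 0.0

    def render(self):
        out = [self.phase + i * self.step for i in range(BLOCK)]
        self.phase += BLOCK * self.step
        return out


class Mult(Ugen):
    """Multiply inputs `a` and `b` sample by sample."""

    def render(self):
        return [x * y for x, y in zip(self.pull("a"), self.pull("b"))]


gain = Mult(a=Ramp(), b=0.5)      # b starts as a constant
first = gain.render()             # [0.0, 0.5, 1.0, 1.5]
gain.patch("b", Ramp(step=2.0))   # repatch b to an audio-rate signal
second = gain.render()            # [0.0, 10.0, 24.0, 42.0]
```

The point of the sketch is that `Mult` never needs to know whether an input is a constant or a signal; the base class hides that difference, so connections can be changed while the graph is running.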

All of this is working now, but barely, so there is a lot more work to do. My next step is to try to build some applications, using that to guide the evolution and development of the Arco framework.

Human Computer Music Performance

I have been thinking a lot about how computers and humans can perform together. One possibility is the creation of “artificial musicians” that can join humans in performances, particularly of “steady-tempo” music with conventional forms, rhythms, and fairly fixed structures. As a jazz musician, I play this kind of music all the time, and other examples include choir music, rock, folk, church music and pop (of all kinds). It is beyond the state-of-the-art for machines to follow beats and recognize song position, at least at an acceptable performance level, but these requirements can be met through user interfaces, e.g., a foot pedal for tapping the beat and touch displays for giving cues. I created a lot of research systems with students, but going further requires robust road-worthy software systems that are quick to set up and allow music content to be entered quickly. I’ve been stuck at this stage for a while now, but currently I'm hardening my HCMP software and adding computer accompaniment capabilities where the system can follow and accompany live MIDI piano performances.

There is room here both in software development and in writing music and performing with this HCMP system to learn what works, how it can be used, and what new features are needed. I hope to become the pioneer of a new genre of music or at least overcome some interesting HCI and music performance challenges.

Global Drum Circle

I did some preliminary work on group drumming online. Latency is a big issue, and my approach is to organize drumming into cycles of 4 or more measures. Locally, you hear your own drums immediately, but you hear everyone else's drums with a 1-cycle delay, e.g. 4 measures later than they were played. Similarly, everyone else hears your drums 4 measures later than when you actually played them. Experience has shown that there has to be a reference such as bass drum hits in order to allow the tempo to “lock in.”
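The one-cycle delay comes down to simple arithmetic, sketched below in Python. The function names, the 4/4 default meter, and the idea that all participants read hit times from one shared clock are assumptions made for this illustration, not details of the actual prototype.

```python
def cycle_seconds(tempo_bpm, beats_per_measure=4, measures_per_cycle=4):
    """Duration of one drumming cycle in seconds."""
    return measures_per_cycle * beats_per_measure * 60.0 / tempo_bpm


def remote_playback_time(hit_time, tempo_bpm,
                         beats_per_measure=4, measures_per_cycle=4):
    """Shared-clock time at which other participants hear a hit.

    The player hears the hit at `hit_time`; everyone else hears it
    exactly one cycle later, at the same position within the cycle.
    """
    return hit_time + cycle_seconds(tempo_bpm, beats_per_measure,
                                    measures_per_cycle)


# At 120 BPM in 4/4 with 4-measure cycles, a cycle is 16 beats of 0.5 s each.
print(cycle_seconds(120))               # 8.0
print(remote_playback_time(1.25, 120))  # 9.25
```

Because every hit is delayed by a full cycle, the network has up to one cycle (8 seconds in this example) to deliver each hit, which is far more than typical internet latency.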

I've mapped out a number of interactive scenarios such as call-and-response, follow-the-leader, alternating group and solo play, etc., and we've done some prototypes and testing, so I believe it is possible to make drumming online an enjoyable experience.

I’m working with Ari Liloia on this. We envision a 3-step implementation: First, create an interactive system with a human expert who calls the shots and leads the drum circle. Second, learn from this to create an automated AI drum circle leader. Third, scale up to multiple drum circles around the world that run 24/7 and where people can join and leave, and depending on how many participants are online, drum circles can split and merge.

Music Patterns and Music Models

Music structure is critical, but not well understood. I'm working with students to implement music prediction models. Our claim is that while there are general tendencies in music (e.g., small pitch intervals are more common than large intervals in melody), there are also important local tendencies. For example, the first few bars of Beethoven's Fifth Symphony tell you much more about the next few bars than general knowledge of all classical music will tell you. This seems obvious, but we see many music-composition-using-deep-learning projects attempting to write music based on learning to predict music in general. Do these systems ever learn how to use repetition and imitation within a single composition? To some extent, yes, but so far, learning systems seem unable to form many abstractions about music because they are not good at memory or learning on-the-fly, whereas I believe that identifying repeated and transformed patterns within a composition is a critical part of the music composition and listening experience.

Our hypothesis is that by identifying patterns and repetition in music we can create better models for music generation and listening. Our approach is based on prediction: We rate models on their ability to predict the next element in a sequence (of pitches, durations, intervals, or whatever), and we measure this quantitatively in terms of entropy.
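One way to picture this kind of evaluation is the Python sketch below. It is an illustrative stand-in, not one of the models from this project: it uses a simple n-gram predictor with add-one smoothing that starts empty and learns on the fly within a single sequence, and it reports the average number of bits needed to encode each next element. The function name, the choice of smoothing, and the toy pitch sequences are all assumptions.

```python
import math
from collections import defaultdict


def average_entropy_bits(sequence, alphabet, order=1):
    """Average of -log2 p(next | context) over the sequence, in bits.

    The model is adaptive: it predicts each element from counts gathered
    earlier in the same sequence, then updates its counts. Add-one
    smoothing keeps unseen elements from getting zero probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        context = tuple(sequence[max(0, i - order):i])
        seen = counts[context]
        p = (seen[symbol] + 1) / (sum(seen.values()) + len(alphabet))
        total_bits += -math.log2(p)
        seen[symbol] += 1  # learn from this element before moving on
    return total_bits / len(sequence)


alphabet = list(range(60, 72))    # MIDI pitches C4..B4
repetitive = [60, 62, 64, 62] * 8 # a short pattern repeated 8 times
scale = list(range(60, 72))       # every pitch once, nothing repeats

bits_repetitive = average_entropy_bits(repetitive, alphabet)
bits_scale = average_entropy_bits(scale, alphabet)
```

The repetitive sequence scores fewer bits per element than the non-repeating one, because the model picks up the local pattern as it goes. In the non-repeating case every context is new, so the model does no better than a uniform guess over the 12 pitches, log2(12) bits per element. Lower entropy means better prediction, which is the sense in which models are rated.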

The actual work here consists of gathering and pre-processing music in machine-readable form to build datasets, writing and debugging models, and running experiments to evaluate different models and parameters on different datasets. I think the next step will be coding and evaluating ad-hoc models that are based on common music ideas and structures.

Coda: Machine Learning and Music Generation

Many students write to say they know all about machine learning and would love to come to be interns. I can understand the excitement and enthusiasm. Unfortunately, my experience is that by the time students “tool up” and get enough experience to tackle some real problems, most of a summer or semester (or 2) has gone by, and there’s no time to make any advances. I would not say this is a bad area for research, but it seems that most of the obvious things are already done (and a great deal more). When the low-hanging fruit is gone, you really need a ladder or some secret advantage, whether it is a supercomputer, experience and insight, or just a good novel idea. I do not feel I can offer that now to undergrads in search of a quick but rewarding research experience. Many of the other topics listed above have some potential for completing something interesting and even publishable in a couple of months, but if you are only excited by machine learning applications, you should follow your heart and passion. That is where you will find the greatest happiness and accomplishments.