Training Systems for Augmenting Improvisers

CS 134 - Spring 2010
Roth Michaels, Digital Musics '11
roth@rothmichaels.us

Proposal

Background

Early music for instruments and live electronics (i.e. pieces for instruments and tape) allowed very limited flexibility in musical form because performers were often forced to stay synchronized with a single prerecorded tape that ran throughout the piece. With the advent of synthesizers and samplers that could be used in performance, greater flexibility became possible because human performers could control the electronic parts. As an improviser, I want to maintain flexible control over the parameters of the live electronics (both pure synthesis and processing of acoustic instruments) within the acoustic ensemble, and over where parameter changes fall in relation to musical form.

Currently in performances I use a myriad of hardware controllers to expressively control the behavior of live electronics (whether I am playing an acoustic instrument as well or a purely digital part in the ensemble). The large number of parameters needed to control the wide variety of synthetic timbres I employ often requires so much manual attention that it can distract from a performer's ability to simultaneously play an acoustic instrument. To manage all of these synthesis parameters in real time while allowing for flexible musical forms, I (and many other practitioners of electroacoustic music) either delegate control of the electronics to a single performer at a computer or establish a list of parameter presets that one of the acoustic musicians can trigger with a foot pedal. My overall research goals at Dartmouth are focused on creating systems in which an ensemble of acoustic performers is augmented by live electronics, with the electronics controlled by the acoustic instruments in the ensemble rather than by presets or a dedicated electronics performer.

Project Concept

I am proposing a machine learning system in which musical gestures played on acoustic instruments signal specific sets of synthesis parameters for the live electronics. During training, a given parameter set would be assigned to a group of musical gestures or phrases. Merely by changing playing style, the performer would be able to change the electronics without relying on a predefined set of presets, creating more flexibility in the form of the music.

The basic concept I am researching in this project is a classification problem in which learned gestures are assigned a set of labels representing the on/off states of a group of synthesis modules. Each on-state is a point in a multidimensional synthesis parameter space for that module. During performance, gestures heard by the system will be placed in the multidimensional gesture space and matched to the nearest learned gesture. The performed gesture is then mapped into the parameter space of each module assigned an on-state for the matched learned gesture, and linear interpolation is used to determine the appropriate output parameters from the learned points in that parameter space.
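
As a concrete illustration of the matching and interpolation steps, the sketch below assumes gestures have already been reduced to fixed-length feature vectors; the feature dimensions, module parameters, and inverse-distance weighting are placeholder choices, not the final design.

    import numpy as np

    # Sketch: nearest-neighbor gesture matching followed by a simple
    # interpolation over learned parameter points for one synthesis module.
    # Feature extraction and dimensionality reduction are assumed to have
    # already produced fixed-length gesture vectors.

    def match_gesture(performed, learned_gestures):
        """Index of the learned gesture at minimum Euclidean distance."""
        dists = np.linalg.norm(learned_gestures - performed, axis=1)
        return int(np.argmin(dists))

    def interpolate_parameters(performed, learned_gestures, learned_params, k=2):
        """Blend the parameter points of the k nearest learned gestures,
        weighting each by inverse distance (one simple linear scheme)."""
        dists = np.linalg.norm(learned_gestures - performed, axis=1)
        nearest = np.argsort(dists)[:k]
        weights = 1.0 / (dists[nearest] + 1e-9)
        weights /= weights.sum()
        return weights @ learned_params[nearest]

    # Toy data: three learned gestures (4-D features), each tied to a 2-D
    # parameter point (e.g. frequency, modulation depth) for one module.
    gestures = np.array([[0.1, 0.2, 0.0, 0.9],
                         [0.8, 0.1, 0.4, 0.2],
                         [0.3, 0.7, 0.5, 0.1]])
    params = np.array([[440.0, 0.2],
                       [220.0, 0.8],
                       [660.0, 0.5]])

    heard = np.array([0.25, 0.65, 0.45, 0.15])
    print(match_gesture(heard, gestures))               # index of the matched gesture
    print(interpolate_parameters(heard, gestures, params))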

This design only addresses processing of a single gesture. I will also investigate possible solutions for parsing gestures in the context of a larger composition/improvisation.

Real-time application of this system is an important long-term goal. All methods explored will be evaluated for their functional success and for how realistically they could be implemented in a real-time system.

Methods

My proposed design will combine supervised learning and reinforcement learning. An initial supervised learning stage will be used to assign parameter-state labels, with multiple variations on the gestures used for each label. A reinforcement stage will further train the system by assigning rewards after a rehearsal session with a performer. The reinforcement stage could serve two functions.

Frequency-spectrum analysis techniques have been useful in speech recognition, audio classification, and soundspotting (a technique of Michael Casey's that has similarities to my proposed gesture matching technique). Currently I am considering performing the spectral analysis with either a short-time Fourier transform (STFT), a constant-Q transform, or Mel-frequency cepstral coefficients (MFCCs) computed using a discrete cosine transform.
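
The sketch below shows the simplest of these candidates, a short-time Fourier transform computed with NumPy; the frame size and hop are placeholder values, and a constant-Q transform or MFCCs would replace the linear-frequency bins with log-spaced or mel-warped ones.

    import numpy as np

    def stft_magnitudes(signal, frame_size=1024, hop=256):
        """Short-time Fourier transform magnitudes: slice the signal into
        overlapping Hann-windowed frames and take the FFT of each frame."""
        window = np.hanning(frame_size)
        frames = []
        for start in range(0, len(signal) - frame_size + 1, hop):
            frame = signal[start:start + frame_size] * window
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames)   # shape: (num_frames, frame_size // 2 + 1)

    # Toy input: one second of a 440 Hz sine at 44.1 kHz.
    sr = 44100
    t = np.arange(sr) / sr
    spectrum_frames = stft_magnitudes(np.sin(2 * np.pi * 440 * t))
    print(spectrum_frames.shape)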

If STFT or constant-Q techniques are used instead of MFCCs, the dimensionality of the frequency-domain representation of a gesture will be much higher than what is needed to perform minimum-distance matching efficiently. Gestures analyzed this way will need to be mapped into a lower-dimensional space during both training and application of the system. Currently I am planning to use principal component analysis (PCA) to reduce the dimensionality of the gesture's spectrum.
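
A minimal PCA sketch using NumPy's SVD; it assumes each row of X summarizes one gesture's spectrum (for example, an average over STFT frames), and the number of retained components is a placeholder.

    import numpy as np

    def pca_reduce(X, n_components=8):
        """Project the rows of X onto their first n_components principal axes."""
        mean = X.mean(axis=0)
        centered = X - mean
        # SVD of the centered data; rows of Vt are the principal directions.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        components = Vt[:n_components]
        return centered @ components.T, components, mean

    # Example: 50 training gestures, each summarized by a 513-bin average spectrum.
    X = np.random.rand(50, 513)
    reduced, components, mean = pca_reduce(X, n_components=8)
    print(reduced.shape)   # (50, 8): the low-dimensional gesture space

    # A performed gesture is mapped with the same projection:
    new_gesture = np.random.rand(513)
    print((new_gesture - mean) @ components.T)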

I am considering the possibility that spectral techniques will not perform well for defining the gesture space used for minimum-distance matching of performed gestures. If spectral analysis does not prove fruitful, I will investigate using perceptible musical features (e.g. pitch, dynamics, tempo, harmony) to create the gesture space.
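
As a sketch of what such features might look like per analysis frame, the code below computes an RMS level (a proxy for dynamics) and a crude autocorrelation pitch estimate with NumPy; tempo and harmony would require analysis across many frames and are omitted here.

    import numpy as np

    def frame_features(frame, sr=44100):
        """Two simple perceptual features for one audio frame: RMS level
        (a proxy for dynamics) and a crude autocorrelation pitch estimate."""
        rms = np.sqrt(np.mean(frame ** 2))
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lag_min, lag_max = sr // 1000, sr // 50        # search roughly 50 Hz - 1 kHz
        lag = lag_min + np.argmax(ac[lag_min:lag_max])
        return rms, sr / lag                           # (level, estimated pitch in Hz)

    sr = 44100
    t = np.arange(2048) / sr
    rms, pitch = frame_features(0.5 * np.sin(2 * np.pi * 220 * t), sr)
    print(rms, pitch)   # pitch should come out close to 220 Hz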

While the main focus of this project is learning how to respond to individual gestures, that alone would not provide a complete system usable in performance. Unless the performer used some sort of trigger to turn on gesture listening, the system would need to identify and respond to learned gestures within the context of a larger composition or improvisation. I will look to previous research on artificial improvisers that respond to a live performer for techniques to parse phrases and gestures out of a larger musical context. Currently I am considering sequential techniques such as hidden Markov models, as well as neural networks.
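
As one rough illustration of the hidden Markov model idea (not the proposed design), the sketch below uses the hmmlearn library to fit a two-state Gaussian HMM over per-frame features and reads the decoded state sequence as a gesture-versus-background segmentation; the synthetic features, state count, and choice of library are all assumptions made for the example.

    import numpy as np
    from hmmlearn import hmm  # third-party library used only for this illustration

    # Synthetic per-frame features standing in for, e.g., PCA-reduced spectra:
    # background material surrounds a 60-frame "gesture" with a shifted mean.
    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 1.0, size=(200, 8))
    gesture = rng.normal(3.0, 1.0, size=(60, 8))
    frames = np.vstack([background[:100], gesture, background[100:]])

    # Fit a two-state Gaussian HMM and decode one hidden state per frame;
    # state changes give a rough segmentation of where the gesture starts and ends.
    model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
    model.fit(frames)
    states = model.predict(frames)
    print(np.flatnonzero(np.diff(states)))  # frame indices where the state switches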

I am currently doing other research into using genetic algorithms to generate electronic sound material to augment an acoustic performer. I am considering applying these techniques to choose parameters for the on-state modules of the matched gesture instead of relying solely on linear interpolation; fitness would be assigned through rehearsal.
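
A toy sketch of how a genetic algorithm might evolve one module's parameter vector; in the proposed system the fitness would come from the performer's judgments during rehearsal, so the numeric stand-in fitness function and all constants here are placeholders.

    import numpy as np

    # Toy genetic algorithm for evolving a 4-D parameter vector for one module.
    rng = np.random.default_rng(1)
    POP, DIM, GENERATIONS = 16, 4, 30
    target = np.array([0.2, 0.8, 0.5, 0.1])          # placeholder "preferred" parameters

    def fitness(individual):
        # Stand-in for a rehearsal rating: closer to the target scores higher.
        return -np.linalg.norm(individual - target)

    population = rng.random((POP, DIM))
    for _ in range(GENERATIONS):
        scores = np.array([fitness(ind) for ind in population])
        parents = population[np.argsort(scores)[-POP // 2:]]        # keep the fitter half
        # Crossover by averaging random parent pairs, then mutate with small noise.
        crossover = (parents[rng.integers(len(parents), size=POP)] +
                     parents[rng.integers(len(parents), size=POP)]) / 2.0
        population = np.clip(crossover + rng.normal(0, 0.05, (POP, DIM)), 0, 1)

    print(population[np.argmax([fitness(ind) for ind in population])])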

Training Data

I will use a MIDI controller and software synthesizer to record a series of gesture variations to be used in a short improvisation. Synthesizing training and test data this way, rather than recording an acoustic instrument with a microphone, reduces outside variables while the system is being designed. After this research is complete, I plan to try the system with a microphone and an acoustic instrument to see how well it responds in real-world conditions. Both audio and MIDI recordings will be saved so that the MIDI data can be used to define a gesture space if spectral techniques do not prove useful.

Milestone Goals

  1. Create all training examples and some initial test/reinforcement examples
  2. Determine the best feature analysis method, of those previously discussed, for this application
  3. Determine the feasibility of implementing techniques for listening for gestures within a larger context (e.g. neural networks, hidden Markov models)
  4. Decide if genetic algorithm ideas I am investigating should be applied to the parameter value selection stage
  5. If (3) and (4) are not deemed to be within the scope of this project, decide upon another way to apply a reinforcement stage to this system to use while rehearsing

References

  1. Assayag, G. & Dubnov, S. (2004). Using Factor Oracles for Machine Improvisation. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 8(9), 604-610.
  2. Burton, A. R. & Vladimirova, T. (1999). Generation of Musical Sequences with Genetic Techniques. Computer Music Journal, 23(4), 59-73.
  3. Casey, M. (2009). Soundspotting: A New Kind of Process? In R. Dean (Ed.), The Oxford Handbook of Computer Music. Oxford: Oxford University Press.
  4. Casey, M., Rhodes, C., & Slaney, M. (2008). Analysis of Minimum Distances in High Dimensional Musical Spaces. IEEE Transactions on Audio, Speech and Language Processing.
  5. Franklin, J. A. (March 1-3, 2001). Multi-Phase Learning for Jazz Improvisation and Interaction. Eighth Biennial Symposium on Art and Technology.
  6. Pachet, F. (2003). The Continuator: Musical Interaction With Style. Journal of New Music Research, 32(3), 333-341.
  7. Stamatatos, E. & Widmer, G. (2005). Automatic identification of music performers with learning ensembles. Artificial Intelligence, 165, 37-56.
  8. Thom, B. (2001). BoB: An Interactive Improvisational Music Companion. Proceedings of the Fourth International Conference on Autonomous Agents, 309-316.