CS 134 - Spring 2010
Roth Michaels, Digital Musics '11
roth@rothmichaels.us
Proposal | Milestone | Final Report
Early music for instruments and live electronics (i.e., pieces for instruments and tape) allowed very limited flexibility in musical form because performers were often forced to maintain synchronization with a single prerecorded tape that ran throughout the piece. With the advent of synthesizers and samplers that could be used in performance, greater flexibility became possible because human performers could control the electronics parts. As an improviser, I want to maintain flexible control over the parameters of the live electronics (both pure synthesis and processing of acoustic instruments) within the acoustic ensemble, and over where parameter changes fall in relation to musical form.
Currently in performances I use a myriad of hardware controllers to expressively control the behavior of live electronics (whether I am playing an acoustic instrument as well or a purely digital part in the ensemble). The large number of parameters needed to control the wide variety of synthetic timbres I employ often requires substantial manual attention, which can distract from a performer's ability to simultaneously play an acoustic instrument. To manage all of these synthesis parameters in real time while allowing for flexible musical forms, I (and many other practitioners of electroacoustic music) either delegate control of the electronics to a single performer using a computer or establish a list of parameter presets that can be triggered by one of the acoustic musicians via a foot pedal. My overall research goals at Dartmouth are focused on creating systems in which an ensemble of acoustic performers is augmented by live electronics, and the electronics are controlled by the acoustic instruments in the ensemble without the need for presets or a dedicated performer to control them.
I am proposing a machine learning system in which musical gestures played on acoustic instruments signal specific sets of synthesis parameters for the live electronics. During training, a given parameter set would be assigned to a group of musical gestures or phrases. Merely by changing playing style, the performer would be able to make changes in the electronics without relying on a predefined set of presets, creating more flexibility in the form of the music.
The basic concept I am researching in this project is a classification problem where learned gestures are assigned a set of labels representing the on/off state for a group of synthesis modules. Each on-state is a point in a multidimensional synthesis parameter space for that module. During performance, gestures heard by the system will be placed in the multidimensional gesture space and matched to the learned gesture of least distance. The performed gesture is then mapped into the parameter space for each module that is assigned an on-state for the matched learned gesture. Linear interpolation is then used to determine the appropriate output parameters based on the learned points in the given parameter space.
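To make the matching and mapping step concrete, here is a minimal sketch in Python (using NumPy) of one way the minimum-distance matching and linear interpolation could work. The gesture vectors, parameter points, and the inverse-distance weighting scheme are illustrative assumptions, not a finished design.

```python
import numpy as np

def match_gesture(performed, learned_gestures):
    """Return the index of the learned gesture closest (Euclidean) to the performed one."""
    dists = np.linalg.norm(learned_gestures - performed, axis=1)
    return int(np.argmin(dists))

def interpolate_parameters(performed, learned_gestures, learned_params, k=2):
    """Blend the parameter points of the k nearest learned gestures,
    weighted inversely by distance (one possible linear-interpolation scheme)."""
    dists = np.linalg.norm(learned_gestures - performed, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)
    weights /= weights.sum()
    return weights @ learned_params[nearest]

# Toy example: three learned gestures in a 4-D gesture space,
# each mapped to a 2-D parameter point for one synthesis module.
learned_gestures = np.array([[0.0, 0.1, 0.2, 0.3],
                             [1.0, 0.9, 0.8, 0.7],
                             [0.5, 0.5, 0.5, 0.5]])
learned_params = np.array([[0.2, 0.8],
                           [0.9, 0.1],
                           [0.5, 0.5]])
performed = np.array([0.45, 0.5, 0.55, 0.5])

idx = match_gesture(performed, learned_gestures)
params = interpolate_parameters(performed, learned_gestures, learned_params)
print("matched gesture:", idx, "interpolated parameters:", params)
```

In the real system the matched gesture would first select which modules are in an on-state, and the interpolation would be run separately in each of those modules' parameter spaces.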
This design only addresses processing of a single gesture. I will also investigate possible solutions for parsing gestures in the context of a larger composition/improvisation.
Real-time application of this system is an important long-term goal. All methods explored will be evaluated for their functional success and their realistic potential to be implemented in a real-time system.
My proposed design will be a combination of supervised learning and reinforcement learning. An initial supervised learning stage will be used to assign parameter-state labels. Multiple variations on the gestures will be used for each label. A reinforcement stage will further train the system by assigning rewards to the system after a rehearsal session with a performer. The reinforcement stage could serve two functions:
Frequency spectrum analysis techniques have been useful in speech recognition, audio classification, and soundspotting (a technique of Michael Casey's that has similarities to my proposed gesture-matching technique). Currently I am considering using either a short-time Fourier transform (STFT), a constant-Q transform, or Mel-frequency cepstral coefficients (MFCCs) computed with a discrete cosine transform to perform the spectral analysis.
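As an illustration of the kind of spectral features under consideration, the following sketch uses the librosa library (my choice for the example, not something prescribed by this proposal) to compute STFT magnitudes and MFCCs for a hypothetical gesture recording; the filename and the time-averaging step are placeholder assumptions.

```python
import numpy as np
import librosa  # one possible analysis library; not specified in the proposal

# "gesture.wav" is a hypothetical recording of a single musical gesture.
y, sr = librosa.load("gesture.wav", sr=None)

# Short-time Fourier transform magnitude frames.
stft_mag = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Mel-frequency cepstral coefficients (librosa applies the DCT internally).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Summarize the gesture as a single fixed-length vector by averaging over time,
# one simple way to obtain a point in the gesture space.
gesture_vector = mfcc.mean(axis=1)
print(stft_mag.shape, mfcc.shape, gesture_vector.shape)
```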
If STFT or constant-Q techniques are used instead of MFCCs, the dimensionality of the frequency-domain representation of a gesture will be much higher than what is needed to perform minimum-distance matching efficiently. Gestures analyzed this way will need to be mapped into a lower-dimensional space during both training and application of the system. Currently I am planning to use principal component analysis (PCA) to reduce the dimensionality of the gesture's spectrum.
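A rough sketch of the planned PCA step, using scikit-learn and randomly generated stand-in spectra (the array shapes and component count are placeholders, not design decisions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical training set: one flattened STFT magnitude vector per gesture.
rng = np.random.default_rng(0)
training_spectra = rng.random((40, 1025 * 20))  # 40 gestures, 1025 bins x 20 frames

# Fit PCA during training; keep enough components to explain most of the variance.
pca = PCA(n_components=10)
gesture_space = pca.fit_transform(training_spectra)

# During performance, project a new gesture's spectrum into the same space
# before running minimum-distance matching.
new_spectrum = rng.random((1, 1025 * 20))
new_point = pca.transform(new_spectrum)
print(gesture_space.shape, new_point.shape)
```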
I am considering the possibility that spectral techniques will not perform well for defining the gesture space used for minimum-distance matching of performed gestures. If they do not prove fruitful, I will investigate using perceptible musical features (e.g., pitch, dynamics, tempo, harmony) to create the gesture space.
While the main focus of this project is learning how to respond to individual gestures, this alone would not provide a complete system that could be used in performance. Unless the performer was using some sort of trigger to turn on gesture listening, the system would need to know how to identify and respond to learned gestures within the context of a larger composition/improvisation. I will investigate previous research into creating artificial improvisers that respond to a live performer for techniques used to parse phrases/gestures for analysis out of a larger musical context. Currently, I am thinking of investigating sequential techniques such as hidden Markov models as well as neural networks.
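As a first experiment along these lines, a two-state hidden Markov model could be used to segment a longer stream of feature frames into candidate gestures and background material. The sketch below uses the hmmlearn library and synthetic frames purely for illustration; the actual feature set and number of states are open questions.

```python
import numpy as np
from hmmlearn import hmm  # hmmlearn is just one option for experimenting with HMMs

# Hypothetical feature frames (e.g., MFCCs) from a longer improvisation.
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0.0, 1.0, (200, 13)),   # background playing
                    rng.normal(3.0, 1.0, (50, 13)),    # a learned-gesture-like passage
                    rng.normal(0.0, 1.0, (200, 13))])

# Two hidden states as a first approximation: "gesture" vs. "everything else".
model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(frames)

# The decoded state sequence suggests where candidate gestures start and end;
# which state corresponds to "gesture" must be determined after training.
states = model.predict(frames)
print(states[:10], states[200:210])
```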
I am currently doing some other research into using genetic algorithms to generate electronic sound material to augment an acoustic performer. I am considering attempting to apply these techniques to choose parameters for the on-state modules for the matched gesture instead of merely using linear interpolation. Fitness would be assigned through rehearsal.
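For reference, a bare-bones genetic algorithm over one module's parameter vector might look like the following sketch. The parameter count, population size, and especially the fitness function (which in practice would come from a performer's ratings during rehearsal) are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N_PARAMS = 6        # parameters of one on-state synthesis module (assumed)
POP_SIZE = 16

def rehearsal_fitness(individual):
    """Placeholder: in practice this rating would come from a performer during
    rehearsal; here it simply rewards parameters near 0.5."""
    return -np.sum((individual - 0.5) ** 2)

population = rng.random((POP_SIZE, N_PARAMS))

for generation in range(20):
    scores = np.array([rehearsal_fitness(ind) for ind in population])
    # Keep the better half of the population as parents.
    parents = population[np.argsort(scores)[::-1][:POP_SIZE // 2]]
    children = []
    for _ in range(POP_SIZE - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        crossover = rng.random(N_PARAMS) < 0.5          # uniform crossover
        child = np.where(crossover, a, b)
        child = child + rng.normal(0, 0.05, N_PARAMS)   # small mutation
        children.append(np.clip(child, 0.0, 1.0))
    population = np.vstack([parents] + children)

best = population[np.argmax([rehearsal_fitness(ind) for ind in population])]
print("best parameter set:", best)
```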
I will use a MIDI controller and software synthesizer to record a series of gesture variations to be used in a short improvisation. Synthesizing training and test data this way, rather than recording an acoustic instrument with a microphone, reduces outside variables while designing the system. After this research is complete, I plan to try the system with a microphone and an acoustic instrument to see how well it responds in real-world conditions. Both audio recordings and MIDI recordings will be saved so that the MIDI data can be used to define a gesture space if spectral techniques do not prove useful.