Decomposing Duets: Harmonic Structure-Based Instrument Identification and Separation

Kyle Konrad

Project Proposal

Introduction

Humans listening to multiple musical instruments playing together can effortlessly pick out the part of a single instrument, and can often identify which instruments they are hearing from past experience. The distinctive sound of a given instrument is determined by its harmonic structure: the relative amplitudes of the harmonics that accompany whatever fundamental frequency is being played.

The goal of this project is to endow a machine with the same ability through analysis of harmonic structures extracted from audio recordings. The final product will maintain a library of known instruments and will match heard instruments against this library. Furthermore, the system will be able to extract individual instrument lines in simple settings such as duets.

Methods

Creating the library of known instruments will be accomplished through supervised training: the computer will be presented with labelled solo excerpts of various instruments and will compute and store an average harmonic structure for each instrument.
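The library-building step above can be sketched as follows. This is a minimal illustration, not the proposal's implementation: the function name `harmonic_structure`, the band-search tolerance, and the synthetic training signal are all assumptions made for the example, which assumes a mono excerpt with a known fundamental.

```python
import numpy as np

def harmonic_structure(signal, fs, f0, n_harmonics=8):
    """Estimate amplitudes of harmonics 1..n_harmonics, normalized to harmonic 1."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    amps = []
    for k in range(1, n_harmonics + 1):
        # Take the peak magnitude in a narrow band around the k-th harmonic.
        band = (freqs > k * f0 * 0.95) & (freqs < k * f0 * 1.05)
        amps.append(spectrum[band].max() if band.any() else 0.0)
    amps = np.array(amps)
    return amps / amps[0]

# Demo on a synthetic "instrument": a 220 Hz tone whose k-th harmonic
# has amplitude 0.5**k, so successive ratios should come out near 0.5.
fs, f0 = 44100, 220.0
t = np.arange(fs) / fs
sig = sum((0.5 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))
hs = harmonic_structure(sig, fs, f0)
```

Averaging such vectors over many labelled excerpts of one instrument would yield the stored library entry for that instrument.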

To analyze a musical excerpt, the computer will compute a frequency spectrum for each time step and identify spectral peaks by measuring deviations from a Gaussian envelope of amplitude over frequency. The number of instruments will be estimated using the Bayesian Information Criterion, and fundamental frequencies will be found by maximum likelihood estimation. Harmonics will then be extracted from the previously identified spectral peaks and matched to their corresponding fundamentals.
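A simplified version of the per-frame peak detection might look like the following. Note that this sketch substitutes a local-maximum test against a median noise floor for the proposal's Gaussian-envelope criterion; the function name and the threshold factor are illustrative assumptions.

```python
import numpy as np

def spectral_peaks(frame, fs, threshold=5.0):
    """Return (frequency, magnitude) pairs for prominent peaks in one frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    floor = np.median(mag) + 1e-12  # crude estimate of the spectral floor
    peaks = []
    for i in range(1, len(mag) - 1):
        # A bin is a peak if it beats both neighbours and stands well
        # above the floor.
        if mag[i] > mag[i - 1] and mag[i] > mag[i + 1] and mag[i] > threshold * floor:
            peaks.append((freqs[i], mag[i]))
    return peaks

# One frame containing two sinusoids at 440 Hz and 880 Hz.
fs = 8000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
pk = spectral_peaks(frame, fs)
```

In the full system, each detected peak would then be tested for membership in a harmonic series of one of the estimated fundamentals.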

The harmonic structures collected over all time steps will then be clustered using the nearest-k algorithm, and the resulting average harmonic structures will be matched to known instruments. The harmonic structures of known instruments may also be used for error correction in frames where overlapping harmonics cannot be separated.
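The matching step can be sketched as a nearest-neighbour lookup against the library. The library entries below are invented for illustration only (they are not measured instrument data), and Euclidean distance is one reasonable choice of similarity measure, not necessarily the one the final system will use.

```python
import numpy as np

# Hypothetical library of average harmonic structures (harmonics 1-4,
# normalized to the fundamental). Values are made up for the example.
LIBRARY = {
    "flute":  np.array([1.00, 0.30, 0.10, 0.05]),   # energy mostly in the fundamental
    "guitar": np.array([1.00, 0.80, 0.55, 0.35]),   # richer upper harmonics
}

def match_instrument(structure, library=LIBRARY):
    """Return the library instrument whose structure is closest in Euclidean distance."""
    return min(library, key=lambda name: np.linalg.norm(library[name] - structure))

# A cluster-average structure observed in the recording.
observed = np.array([0.98, 0.75, 0.50, 0.40])
best = match_instrument(observed)  # → "guitar"
```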

Once the instruments have been identified, fundamentals will be re-estimated and harmonics re-extracted by maximum likelihood estimation using the newly computed average harmonic structures. The separated harmonics will then be reconstructed into audio using an inverse Fourier transform.
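The reconstruction step can be illustrated on a single frame: keep only the FFT bins near one instrument's harmonics, zero the rest, and invert. The bin-masking window width is an assumption made for this sketch; a real system would handle overlapping frames and overlapping harmonics more carefully.

```python
import numpy as np

def isolate_harmonics(frame, fs, f0, n_harmonics=6, width=3):
    """Keep FFT bins within `width` bins of each harmonic of f0; zero the rest."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    mask = np.zeros(len(spec), dtype=bool)
    for k in range(1, n_harmonics + 1):
        centre = int(np.argmin(np.abs(freqs - k * f0)))
        mask[max(0, centre - width):centre + width + 1] = True
    spec[~mask] = 0.0
    return np.fft.irfft(spec, n=len(frame))

# A mixture of a 250 Hz "instrument" and an unrelated 707 Hz tone;
# isolating the harmonics of 250 Hz should recover the first component.
fs = 8000
t = np.arange(2048) / fs
mix = np.sin(2 * np.pi * 250 * t) + np.sin(2 * np.pi * 707 * t)
voice = isolate_harmonics(mix, fs, 250.0)
```

Doing this frame by frame and overlap-adding the results would yield the separated instrument line as audio.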

Data

Data for supervised training will be labelled solo excerpts of various harmonic instruments such as clarinet, flute, oboe, guitar, and violin. The audio samples to be analyzed will be classical duets featuring flute and guitar.

Timeline

After the data sets have been assembled, the known instrument libraries will be created within two weeks. By the milestone submission I hope to have instrument identification completed and to have begun work on part extraction.
