Detecting an Individual Speaker in an Environment with Multiple Sound Sources Using a Single Microphone System

Sean Curran

Objective

The principal goal of my project is to determine whether a particular speaker is taking part in a conversation, given that distinct features of the target speaker's voice have been learned beforehand. This task may involve learning on multiple levels: the features that identify the speaker must be learned, and then the algorithm must learn to detect these features amid other sounds and voices.

Methods

This problem is related to speaker recognition, a problem for which low-error solutions exist through the use of such methods as Support Vector Machines. However, traditional speaker recognition systems lose their effectiveness in noisy, multi-speaker environments.

I will initially work with Support Vector Machines because the problem still entails binary classification and SVMs have been successfully used in the one-source case. My focus will be on adjusting past methods to be more robust in the presence of multiple sound sources. One way in which I will attempt to improve performance is by changing the features that are used to distinguish different speakers. I will explore the use of common features such as mel-frequency cepstral coefficients (MFCCs) and also investigate the use of more unusual attributes. Besides SVMs, I intend to look at Artificial Neural Networks and Hidden Markov Models, two other methods that have been successfully applied to the speaker recognition problem.
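As a rough illustration of the binary classification step, the sketch below trains a linear SVM with a Pegasos-style stochastic subgradient method and then decides whether the target speaker is present by averaging per-frame scores. Everything in it is an assumption made for illustration: the function names are hypothetical, the 4-dimensional "frames" are synthetic stand-ins for real features such as MFCCs, the bias is handled by appending a constant 1 to each feature vector, and the averaging rule is just one simple way to turn frame scores into a single decision. It is not the method the project will necessarily end up using.

```python
import random

def train_linear_svm(frames, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    frames: list of feature vectors (one per audio frame)
    labels: +1 for the target speaker, -1 for anything else
    """
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a (regularized) bias term.
    X = [list(x) + [1.0] for x in frames]
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, taken only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, X[i])]
    return w

def frame_score(w, frame):
    """Signed score of a single frame; positive suggests the target speaker."""
    return sum(wj * xj for wj, xj in zip(w, list(frame) + [1.0]))

def target_present(w, frames):
    """Hypothetical decision rule: average the frame scores and threshold at zero."""
    mean_score = sum(frame_score(w, f) for f in frames) / len(frames)
    return mean_score > 0.0

def synthetic_frames(center, n, rng):
    """Synthetic stand-in for feature extraction: points scattered around a center."""
    return [[c + rng.uniform(-1.0, 1.0) for c in center] for _ in range(n)]

rng = random.Random(1)
target_center = [3.0, 3.0, 3.0, 3.0]     # assumed cluster for the target speaker
other_center = [-3.0, -3.0, -3.0, -3.0]  # assumed cluster for everything else
train_x = synthetic_frames(target_center, 100, rng) + synthetic_frames(other_center, 100, rng)
train_y = [1] * 100 + [-1] * 100
w = train_linear_svm(train_x, train_y)

print(target_present(w, synthetic_frames(target_center, 30, rng)))
print(target_present(w, synthetic_frames(other_center, 30, rng)))
```

The synthetic clusters are deliberately well separated, so this only shows the mechanics. With real recordings the frames of a mixture contain overlapping speakers, which is exactly the case where the choice of features and the frame-to-utterance decision rule would need to be revisited.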

Without the one-microphone limitation, one could separate the various sources and then run a traditional speaker recognition system on each source to determine whether the target speaker is present. However, this approach could require many microphones, depending on the number of sound sources.

Data

There are numerous corpora available online containing conversational speech, such as the COSINE Corpus at the University of Washington. The COSINE database is promising because I believe it contains both conversational speech and speech spoken directly into a microphone by the same individuals involved in the conversation.

Timeline

By the milestone, I expect to have a Support Vector Machine-based method for solving my selected problem, and I will then begin exploring other potential methods or optimizations to the SVM method.

References

Speaker Recognition in a Multi-Speaker Environment, Alvin F. Martin, Mark A. Przybocki

Combining Classifier Decisions for Robust Speaker Identification, Daniel J. Mashao, Marshalleno Skosan, 2006

A Comparison of Several Approaches to the Feature Extractor Design for ASR Tasks in Telephone Environment, Antolin et al. 2003