Change in Objective
Because detecting a speaker in a noisy environment with multiple speakers proved difficult, the primary objective of the project is now speaker verification alone. A single-microphone source separation algorithm may be implemented later to address the original objective.
Methods
So far I have addressed the problem with support vector machines (SVMs). In Matlab, I have implemented an SVM trained with the Sequential Minimal Optimization (SMO) algorithm introduced by John Platt in 1998. As of the milestone presentation, my SMO implementation does not work as intended: the Lagrange multipliers it returns are all zero. In the meantime, I gathered preliminary results by training the SVM with the Matlab quadratic programming function quadprog.
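Because the SMO implementation currently returns all-zero multipliers, a reference to compare against may help. The sketch below is the simplified SMO variant often used for teaching, in which the second multiplier is picked at random instead of with Platt's selection heuristics. It is written in Python rather than Matlab and assumes labels +1 (target) and -1 (imposter). The function names and the parameters `C`, `tol` and `max_passes` are illustrative choices, not part of the original implementation. One check it makes easy: with every multiplier at zero, each training point violates the KKT conditions, so the first sweep should change some multipliers; if every pair is skipped (for example because `L == H` or `eta >= 0`), the multipliers stay at zero.

```python
import random


def linear_kernel(x, z):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, kernel=linear_kernel, seed=0):
    """Train a binary SVM with simplified SMO. Labels in y must be +1 or -1.

    Returns the Lagrange multipliers alpha and the bias b.
    """
    rng = random.Random(seed)
    m = len(X)
    K = [[kernel(X[i], X[j]) for j in range(m)] for i in range(m)]  # Gram matrix
    alpha = [0.0] * m
    b = 0.0

    def f(i):
        # Decision value f(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b
        return sum(alpha[j] * y[j] * K[j][i] for j in range(m)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = f(i) - y[i]
            # Optimize alpha_i only if it violates the KKT conditions by more than tol
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.randrange(m - 1)
                if j >= i:
                    j += 1  # second index chosen uniformly among j != i
                E_j = f(j) - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds on alpha_j that keep both multipliers in [0, C]
                # while preserving sum_k alpha_k y_k = 0
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                if L == H:
                    continue
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if eta >= 0:
                    continue
                # Optimum along the constraint line, clipped to [L, H]
                alpha[j] = min(H, max(L, aj_old - y[j] * (E_i - E_j) / eta))
                if abs(alpha[j] - aj_old) < 1e-5:
                    alpha[j] = aj_old  # negligible step: undo it
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Bias chosen so the KKT conditions hold for i or j
                di, dj = alpha[i] - ai_old, alpha[j] - aj_old
                b1 = b - E_i - y[i] * di * K[i][i] - y[j] * dj * K[i][j]
                b2 = b - E_j - y[i] * di * K[i][j] - y[j] * dj * K[j][j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision_value(X, y, alpha, b, x, kernel=linear_kernel):
    """SVM output for a new observation x; positive means 'target'."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
```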
For the feature vector I used 25 mel-frequency cepstral coefficients (MFCCs) and their 25 deltas, giving 50 features per observation. These were the most commonly used features in the papers I read.
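The report does not state which delta definition was used. A common choice is the regression formula d_t = sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1}^{N} n^2). The sketch below assumes that formula with N = 2, repeats the first and last frames at the edges, and stacks the 25 MFCCs with their 25 deltas into 50-dimensional vectors. The MFCC frames themselves are taken as given; `deltas` and `feature_vectors` are hypothetical helper names.

```python
def deltas(frames, N=2):
    """Delta coefficients for a list of equal-length MFCC frames.

    Uses d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2) for n = 1..N,
    repeating the first and last frame when t+n or t-n falls outside the list.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            ahead = frames[min(t + n, T - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(len(d)):
                d[k] += n * (ahead[k] - behind[k])
        out.append([v / denom for v in d])
    return out


def feature_vectors(mfcc_frames, use_deltas=True):
    """Append deltas to each frame: 25 MFCCs -> 50-dimensional vectors."""
    if not use_deltas:
        return [list(f) for f in mfcc_frames]
    return [list(c) + d for c, d in zip(mfcc_frames, deltas(mfcc_frames))]
```

As a sanity check, coefficients that increase by exactly 1 per frame give interior deltas of exactly 1.0, and 0.5 at the two edge frames.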
The data I have used for training and testing so far come from the CHAINS corpus of University College Dublin. I used speech from 33 speakers for the imposter dataset and from 3 target speakers. The imposter training set consisted of 400 randomly selected observations; the size of the target training set varied by speaker.
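The construction of a training set for one target can be sketched as follows. The sketch assumes that the imposter observations from all 33 speakers are pooled into one list and sampled uniformly without replacement, and that targets are labeled +1 and imposters -1; the report does not specify these details, and `build_training_set` is a hypothetical name.

```python
import random


def build_training_set(target_obs, imposter_obs, n_imposter=400, seed=0):
    """Label all target observations +1 and a random sample of imposter observations -1."""
    rng = random.Random(seed)
    # Sample without replacement; never ask for more observations than exist
    imposters = rng.sample(imposter_obs, min(n_imposter, len(imposter_obs)))
    X = [list(x) for x in target_obs] + [list(x) for x in imposters]
    y = [1] * len(target_obs) + [-1] * len(imposters)
    return X, y
```

Since the number of target observations varies by speaker while the imposter count is fixed at 400, the class balance of the training set also differs from one target to the next.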
Results
The SVM was trained twice for each target, once with deltas and once without, using only the linear kernel. The trained classifiers were then used to classify three test sets. On average, 22% of the imposter observations were falsely accepted as the target, and 27% of the target observations were falsely rejected as imposters. Because the results with and without deltas were almost identical, I will investigate why deltas are often included in other researchers' feature vectors and consider dropping them.
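In verification terms, the 22% figure is a false acceptance rate (FAR) and the 27% figure a false rejection rate (FRR). The sketch below computes both from SVM decision values, assuming that an observation is accepted as the target when its score is at least a threshold (a threshold of 0 corresponds to taking the sign of the linear SVM output). `error_rates` is a hypothetical helper, and the example scores in the check are made up for illustration.

```python
def error_rates(scores, labels, threshold=0.0):
    """False acceptance and false rejection rates for SVM decision values.

    labels: +1 for target observations, -1 for imposter observations
    (both classes must be present). An observation is accepted as the
    target when its score is >= threshold.
    """
    imp = [s for s, l in zip(scores, labels) if l == -1]
    tgt = [s for s, l in zip(scores, labels) if l == 1]
    far = sum(s >= threshold for s in imp) / len(imp)  # imposters accepted
    frr = sum(s < threshold for s in tgt) / len(tgt)   # targets rejected
    return far, frr
```

Raising the threshold lowers the FAR at the cost of a higher FRR, and vice versa, so a pair of rates measured at a single threshold describes only one operating point of the classifier.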
Future Goals
In the very short term, I would like to fix the SMO implementation. Cross-validation and parameter tuning are difficult with quadprog because training with it takes so long.
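The cost of tuning grows multiplicatively: k-fold cross-validation over a grid of C values trains k models per grid value, so 5 folds and 5 values of C already mean 25 training runs. A minimal sketch of that loop, with the trainer and error measure passed in as callables (hypothetical `train(X, y, C)` and `error(model, X, y)`, which could wrap an SMO trainer):

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, C_values, train, error, k=5):
    """Return the C with the lowest mean validation error, plus all mean errors.

    train(X, y, C) -> model and error(model, X, y) -> float are supplied
    by the caller.
    """
    folds = k_fold_indices(len(X), k)
    results = {}
    for C in C_values:
        errs = []
        for f in range(k):
            held_out = set(folds[f])
            tr = [i for i in range(len(X)) if i not in held_out]
            model = train([X[i] for i in tr], [y[i] for i in tr], C)
            errs.append(error(model, [X[i] for i in folds[f]], [y[i] for i in folds[f]]))
        results[C] = sum(errs) / k  # mean validation error for this C
    best = min(results, key=results.get)
    return best, results
```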
I would also like to test the algorithm with other kernels, such as the Gaussian kernel and the more specialized kernels described in the papers I have read.
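For reference, the Gaussian (RBF) kernel is k(x, z) = exp(-||x - z||^2 / (2 sigma^2)). Below is a minimal sketch of it next to the linear kernel, plus a Gram-matrix helper; the width parameter name `sigma` is an assumption, and the code is illustrative rather than tied to any particular SVM library.

```python
import math
from functools import partial


def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))


def gram_matrix(X, kernel):
    """Matrix of kernel values between every pair of observations in X."""
    return [[kernel(a, b) for b in X] for a in X]


# Fixing sigma with partial gives a two-argument kernel a trainer can accept
rbf_wide = partial(gaussian_kernel, sigma=2.0)
```

With the Gaussian kernel, sigma becomes a second hyperparameter to tune alongside C, which adds to the cross-validation cost.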
Finally, I would like to explore hidden Markov models (HMMs), either as an alternative to SVMs for speaker verification or as a way to augment my SVM.
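One common way HMMs are used for verification is to score an utterance by the log-likelihood ratio between a model of the claimed speaker and a background model. The sketch below computes each likelihood with the forward algorithm in the log domain. It assumes discrete observation symbols for brevity; a system working directly on MFCC vectors would instead use continuous emission densities such as Gaussian mixtures. All names are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, pi, A, B):
    """Forward algorithm: log P(obs | HMM) for a discrete-emission HMM.

    pi[i]: initial probability of state i; A[i][j]: transition i -> j;
    B[i][o]: probability of emitting symbol o in state i.
    """
    n = len(pi)
    alpha = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [log_sum_exp([alpha[i] + _log(A[i][j]) for i in range(n)]) + _log(B[j][o])
                 for j in range(n)]
    return log_sum_exp(alpha)


def verification_score(obs, target_hmm, background_hmm):
    """Log-likelihood ratio; the claim is accepted if it exceeds a threshold.

    Each model is a tuple (pi, A, B).
    """
    return log_likelihood(obs, *target_hmm) - log_likelihood(obs, *background_hmm)
```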
References
J. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," 1998.
X. Huang et al., "Spoken Language Processing: A Guide to Theory, Algorithm and System Development," 2001.
W.M. Campbell, "A SVM/HMM system for speaker recognition," Acoustics, Speech, and Signal Processing, 2003.
W.M. Campbell, J.P. Campbell, D.A. Reynolds, E. Singer, P.A. Torres-Carrasquillo, "Support vector machines for speaker and language recognition," Computer Speech & Language, 2004.