Speaker Verification Final Report

Sean Curran

Introduction

My project addressed the problem of speaker verification. The goal of speaker verification is to learn the distinguishing features of a particular individual's voice and to use them to decide whether a given speech sample was produced by that individual.

Implementation

As described in class, the Sequential Minimal Optimization (SMO) algorithm makes progress by optimizing the objective function with respect to two of the Lagrange multipliers (alphas) at a time. My implementation did not use the heuristics recommended by my references for choosing which alphas to optimize at each step. Initially, the algorithm chose both alphas randomly, but this led to poor performance and slow convergence on larger data sets. The updated version ensures that each alpha is examined at least once during every iteration by simply looping through the entire set; the second alpha is still chosen randomly.
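The sweep-and-random pairing described above can be sketched as follows. All names here are illustrative rather than taken from my code, and take_step is a hypothetical placeholder for the actual two-alpha optimization:

```c
#include <stdlib.h>

/* Pick a random partner index for the multiplier at i1, ensuring the
 * pair is distinct. (Illustrative sketch; not the actual code.) */
int pick_second_alpha(int i1, int n)
{
    int i2 = rand() % n;          /* random partner index */
    if (i2 == i1)
        i2 = (i2 + 1) % n;        /* avoid pairing an alpha with itself */
    return i2;
}

/* One full sweep: every alpha is examined at least once, with a random
 * second alpha each time. take_step is a placeholder for the two-alpha
 * optimization; it returns 1 if the pair was actually updated.
 * Returns the number of updated pairs. */
int sweep_all_alphas(int n, int (*take_step)(int, int))
{
    int updated = 0;
    for (int i1 = 0; i1 < n; ++i1)
        updated += take_step(i1, pick_second_alpha(i1, n));
    return updated;
}
```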

The algorithm was implemented in both MATLAB and C. The code is essentially the same in both languages and generates the same results on small data sets. However, on the larger data sets the MATLAB code runs much more slowly than the C code: on an 800-observation training set, the C version of SMO with the Gaussian kernel often converges in under a minute, while the MATLAB version takes several minutes to complete a single loop through all the alphas.

The algorithm terminates when fewer than 1% of the alphas change for ten consecutive iterations. This simple stopping criterion was chosen mainly for ease of implementation. It also avoids the expensive computations and floating-point comparisons that would be required to evaluate the objective function or to determine which Lagrange multipliers violate the KKT conditions.
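A minimal sketch of this stopping rule, with illustrative names (not the actual variables in my code):

```c
/* State for the stopping criterion: count how many consecutive sweeps
 * changed fewer than 1% of the alphas. */
typedef struct {
    int quiet_sweeps;   /* consecutive sweeps with < 1% of alphas changed */
} stop_state;

/* Call once per sweep with the number of alphas that changed.
 * Returns 1 when ten consecutive quiet sweeps have been seen. */
int should_stop(stop_state *s, int n_changed, int n_alphas)
{
    if (n_changed * 100 < n_alphas)   /* changed fraction is below 1% */
        s->quiet_sweeps++;
    else
        s->quiet_sweeps = 0;          /* any busy sweep resets the count */
    return s->quiet_sweeps >= 10;
}
```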

Whenever two alphas are updated, the bias parameter is updated as well, following exactly the procedure suggested by Platt.
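Platt's update computes two candidate thresholds from the two updated multipliers and chooses between them according to which multiplier is unbound. A sketch with illustrative names (E1 and E2 are the pre-update prediction errors, k11/k12/k22 the kernel evaluations, C the box constraint); the sign conventions follow Platt's formulation but should be checked against any particular implementation:

```c
/* Bias update after a successful two-alpha step, following Platt's
 * procedure. Returns the new threshold. (Illustrative sketch.) */
double platt_bias_update(double b, double C,
                         double y1, double a1_old, double a1_new,
                         double y2, double a2_old, double a2_new,
                         double E1, double E2,
                         double k11, double k12, double k22)
{
    double d1 = y1 * (a1_new - a1_old);
    double d2 = y2 * (a2_new - a2_old);
    double b1 = E1 + d1 * k11 + d2 * k12 + b;   /* valid when alpha1 is unbound */
    double b2 = E2 + d1 * k12 + d2 * k22 + b;   /* valid when alpha2 is unbound */

    if (a1_new > 0 && a1_new < C)
        return b1;
    if (a2_new > 0 && a2_new < C)
        return b2;
    return (b1 + b2) / 2.0;   /* both at bounds: any value between works */
}
```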

I did not use much outside software in my implementation. One of the advantages of SMO is that most of its computations are relatively simple; the most significant linear algebra operations it requires are dot products. The dot products in my algorithm are computed with a call to the SDOT function from the Basic Linear Algebra Subprograms (BLAS) library. It would be easy to implement a dot-product function, but my naïve implementation would probably be much slower than the BLAS version, which has been optimized and tuned specifically for my computer's architecture and came with the operating system (Mac OS X).
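For reference, the naïve dot product looks like the following; the comment shows the corresponding call through the standard C BLAS interface (cblas_sdot, assuming contiguous single-precision vectors), which on Mac OS X is provided by the Accelerate framework:

```c
/* Naïve single-precision dot product, for comparison with BLAS's SDOT.
 * The equivalent BLAS call is
 *     float r = cblas_sdot(n, x, 1, y, 1);
 * where the two increments of 1 indicate contiguous storage. */
float naive_sdot(int n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```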

Features and Data

For most of my training and testing I used 25 Mel-Frequency Cepstral Coefficients (MFCCs) as my feature vector. I also experimented with a 50-dimensional vector consisting of the 25 MFCCs and 25 delta coefficients computed from them. The deltas measure the rate of change of the coefficients over time. Adding the deltas did not improve my results and increased training time, so I dropped them from further testing.
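Deltas can be computed in several ways; the sketch below uses a simple symmetric difference over adjacent frames, which may differ from the exact regression-based formulation used in practice:

```c
/* Compute delta coefficients for one MFCC dimension tracked across n
 * frames, using the symmetric difference
 *     delta[t] = (c[t+1] - c[t-1]) / 2
 * with the endpoints clamped. (Illustrative sketch; real front ends
 * often use a regression over a wider window.) */
void compute_deltas(const double *c, double *delta, int n)
{
    for (int t = 0; t < n; ++t) {
        int prev = (t > 0) ? t - 1 : 0;         /* clamp at frame 0 */
        int next = (t < n - 1) ? t + 1 : n - 1; /* clamp at frame n-1 */
        delta[t] = (c[next] - c[prev]) / 2.0;
    }
}
```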

The data came from the CHAINS corpus at University College Dublin.

Training, Testing, and Results

The subset of the data I used for most of my training had 800 examples in total: 400 target-speaker samples and 400 imposter samples. Initially, eight-fold cross-validation was used to choose the parameters for the various kernels. For testing, the algorithm was trained on all 800 training examples and evaluated on a separate set containing 590 target examples and 10,000 imposter examples.

I ran eight-fold cross-validation with both linear and quadratic kernels to set the parameter C. The linear SVM performed almost identically to MATLAB's quadprog, achieving a slightly better error rate. However, both of these kernels caused the algorithm to converge very slowly, and the Gaussian kernel clearly outperformed them in classification accuracy, so I did not spend much time tuning them. Most of my cross-validation time was spent on the Gaussian kernel, where I experimented with several hundred combinations of C and tau. (Note that in my code, sigma refers to tau; I avoided sigma in the write-up and the poster because my use of it is inconsistent with its conventional meaning for the Gaussian kernel.)

Unfortunately, the parameters suggested by cross-validation often led to very different results on the test set. I attempted to account for this in several ways. First, I increased the training set size to 900, still evenly split between positive and negative examples, and did nine-fold cross-validation so that the cross-validation training set size matched the size of the training set used during testing. This did not remove the inconsistencies between cross-validation and testing. I also ran my test program on the cross-validation partitions to make sure the two programs were classifying observations in the same way; this appears to be the case.

The best results during testing were 7% training error and 18% test error (the average of the false-positive and false-negative rates). During cross-validation, average test error rates were as low as 10%.
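The averaged error rate reported above can be computed as follows; the function and parameter names are illustrative:

```c
/* Balanced error rate: the average of the false-positive rate
 * (fp out of n_neg imposter trials) and the false-negative rate
 * (fn out of n_pos target trials). */
double balanced_error(int fp, int n_neg, int fn, int n_pos)
{
    return 0.5 * ((double)fp / n_neg + (double)fn / n_pos);
}
```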

Possible Explanations for Inconsistencies and Poor Performance:

I am suspicious of my implementation's determination of the bias and believe that a suboptimal bias may explain the algorithm's inconsistent performance. My chosen method for updating the bias may be poor: I update it precisely as originally suggested by Platt. Keerthi et al. have proposed a more sophisticated procedure for choosing the bias, which they argue is more efficient computationally and produces a result closer to the optimum. This method is also recommended in Scholkopf and Smola as an improvement over the one my implementation uses. The inconsistent performance may also be caused by a bug in my code that I have yet to discover. With more time, I would perform more extensive unit testing, especially on any functions or sections of code that affect or are affected by the bias, to make sure there are no undiscovered bugs. I would also implement the improved bias update procedure recommended in the literature. Once I was certain that my algorithm worked as intended, I would try it on other datasets, such as those used by NIST, if I could get access to them.

References

Platt, J., "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," 1998.

Scholkopf, B. and Smola, A., "Learning with Kernels," 2002.

Keerthi, S. S., et al., "Improvements to Platt's SMO Algorithm for SVM Classifier Design," 1999.

Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E., and Torres-Carrasquillo, P. A., "Support Vector Machines for Speaker and Language Recognition," Computer Speech & Language, 2004.