Disease Prediction: Machine Learning Project Milestone

Yilong Zhao

May 08, 2012

Progress Summary

I have implemented most of the algorithms needed for this project, including Naive Bayes, Decision Tree, SVM (Support Vector Machine), and classifier combination methods such as Majority Vote and Borda Count. I have also obtained some preliminary results.

Algorithm Details and Results

Classifier Details

I have implemented the following three classifiers for this project:
  • Naive Bayes
  • Decision Tree
  • SVM (Support Vector Machine)
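The report includes no code, so the following is only a minimal Python sketch of one of the three classifiers, Naive Bayes, under assumptions the report does not state: every feature is categorical (for example, a symptom recorded as "yes"/"no"), and add-alpha (Laplace) smoothing handles feature values not seen for a class. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies.

    Assumes every feature is categorical (e.g. a symptom recorded as "yes"/"no").
    """
    class_counts = Counter(labels)
    # value_counts[c][j][v]: number of training rows of class c whose feature j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    seen_values = [set() for _ in rows[0]]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            seen_values[j].add(v)
    return class_counts, value_counts, seen_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class c maximising log P(c) + sum_j log P(x_j | c).

    Summing one log term per feature encodes the conditional independence
    assumption; add-alpha smoothing keeps an unseen value from zeroing a class.
    """
    class_counts, value_counts, seen_values = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)
        for j, v in enumerate(row):
            k_j = len(seen_values[j])  # distinct values of feature j seen in training
            score += math.log((value_counts[c][j][v] + alpha) / (n_c + alpha * k_j))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (hypothetical): (fever, cough) -> diagnosis
rows = [("yes", "yes"), ("yes", "yes"), ("no", "no"), ("no", "no"), ("yes", "no")]
labels = ["flu", "flu", "healthy", "healthy", "flu"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("no", "no")))  # prints healthy
```

Working in log space avoids numerical underflow when there are many features.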

Implemented Classifier Combination Methods

I have finished the two simplest classifier combination methods:
  • Majority Vote
  • Borda Count
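Both combiners use only the outputs of the base classifiers. Here is a minimal Python sketch. It assumes each classifier returns a single label for Majority Vote and a full ranking of the classes (most likely first) for Borda Count. The function names, the tie-breaking rules, and the class names in the usage lines are illustrative assumptions, not taken from the report.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most classifiers.

    Ties go to the label that appears first in `labels` (an assumed rule).
    """
    counts = Counter(labels)
    return max(counts, key=lambda c: (counts[c], -labels.index(c)))


def borda_count(rankings):
    """Combine rankings; each ranking lists every class from most to least likely.

    A class in position r (0-based) of a ranking over K classes earns K - 1 - r
    points; the class with the most points wins. Ties go to the class ranked
    higher by the first classifier (an assumed rule).
    """
    scores = Counter()
    for ranking in rankings:
        k = len(ranking)
        for r, c in enumerate(ranking):
            scores[c] += k - 1 - r
    return max(scores, key=lambda c: (scores[c], -rankings[0].index(c)))


print(majority_vote(["flu", "cold", "flu"]))  # prints flu
print(borda_count([["flu", "cold", "allergy"],
                   ["cold", "flu", "allergy"],
                   ["cold", "allergy", "flu"]]))  # prints cold
```

In the Borda example, "cold" earns 1 + 2 + 2 = 5 points against 3 for "flu" and 1 for "allergy". Neither method has parameters to fit, so neither needs training.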

Preliminary Results

I split the original data into a training set and a test set. Figure 1 shows the error rate of each method on the test set.



Figure 1: Test error rate of each method on the test data
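The evaluation procedure can be sketched in Python as follows: split the data, train on one part, and count mistakes on the other. The 30% test fraction, the fixed random seed, and the helper names are assumptions for illustration. The report does not state how the split was made.

```python
import random


def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle example indices and hold out a fraction of them as the test set.

    test_fraction and seed are illustrative defaults, not values from the report.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_rows = [rows[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    test_rows = [rows[i] for i in test_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_rows, train_labels, test_rows, test_labels


def error_rate(predicted, actual):
    """Fraction of test examples whose predicted label differs from the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty label sequences of equal length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)
```

With fewer than 400 training entries, a single split gives a noisy estimate. Averaging the error over several seeds, or using cross-validation, would make the comparison between methods more reliable.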

Result Analysis

The error rates are rather high. At least two factors may account for this:
  • The data set is small: the training data contains no more than 400 entries.
  • Some of the methods I implemented rely on unrealistic assumptions (for example, Naive Bayes assumes that the features are conditionally independent given the class).

Future Work

  • Enlarge and refine the data set.
  • The combiners I have used so far require no training. I plan to implement two more combiners that do require training: a Bayesian combiner and a combiner based on class-conditional density estimation. These may give better results.
  • Reanalyze the results and finish the final write-up. If time permits, I may also refactor the current code.
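The report does not say which Bayesian combiner is planned. One common trainable form, sometimes called the naive Bayes combiner, uses held-out data to estimate how often each base classifier outputs label s when the true class is k. It then picks the class k that maximises P(k) ∏_i P(s_i | k). Here is a minimal Python sketch under that assumption, with add-alpha smoothing. The function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_nb_combiner(base_predictions, labels):
    """Estimate class counts and one confusion table per base classifier.

    base_predictions[i][n] is the label classifier i predicted for held-out
    example n, and labels[n] is that example's true label.
    """
    class_counts = Counter(labels)
    # confusion[i][true][pred]: how often classifier i said `pred` when the truth was `true`
    confusion = [defaultdict(Counter) for _ in base_predictions]
    for i, preds in enumerate(base_predictions):
        for pred, true in zip(preds, labels):
            confusion[i][true][pred] += 1
    return class_counts, confusion


def combine_nb(model, votes, alpha=1.0):
    """Return the class k maximising P(k) * prod_i P(s_i | k) for the votes s_i.

    Probabilities come from the confusion tables with add-alpha smoothing;
    logs are used to avoid underflow.
    """
    class_counts, confusion = model
    n_classes = len(class_counts)
    best_class, best_score = None, -math.inf
    for k, n_k in class_counts.items():
        score = math.log(n_k)  # log P(k) up to a constant shared by all classes
        for i, s in enumerate(votes):
            score += math.log((confusion[i][k][s] + alpha) / (n_k + alpha * n_classes))
        if score > best_score:
            best_class, best_score = k, score
    return best_class


# Toy held-out data (hypothetical): classifier 0 is always right, classifier 1 always says "A".
truth = ["A", "A", "A", "B", "B", "B"]
preds = [["A", "A", "A", "B", "B", "B"], ["A"] * 6]
model = fit_nb_combiner(preds, truth)
print(combine_nb(model, ("B", "A")))  # prints B
```

In this toy case, the trained combiner learns that classifier 1 carries no information and follows classifier 0. An untrained majority vote between the two would be tied. The confusion tables should come from data not used to train the base classifiers.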