Disease Prediction ----- Machine Learning Project Milestone
Yilong Zhao
May 08, 2012
Progress Summary
I have implemented most of the algorithms needed for this project, including Naive Bayes, Decision Tree, SVM (Support Vector Machine), and combination methods such as Majority Vote and Borda Count. I have also obtained some preliminary results for the project.
Algorithm Details and Results
Classifier Details
I have implemented the following three algorithms for classification in this project:
- Naive Bayes Classifier: Based on Bayes' rule, I compute the posterior probability of each class under the assumption that the features are conditionally independent given the class. For continuous features, I simply assume that the data follow a Gaussian distribution.
- Decision Tree: I used the C4.5 algorithm to generate the decision tree.
- SVM: I implemented multi-class SVM using the one-versus-one strategy: a binary classifier is trained for every pair of classes, and the prediction is the class that receives the most votes. I used Matlab's built-in functions for this method, and I am working on implementing my own SVM classifier using the one-versus-the-rest method.
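The Naive Bayes description above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the project's actual code; the function names (`fit_gaussian_nb`, `log_posterior`, `predict`) and the small variance floor are assumptions introduced here. It treats every feature as continuous and Gaussian, and works in log space to avoid numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Variance floor of 1e-9 is an assumption; it avoids division by zero
        # when a feature is constant within a class.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / n_total, means, variances)
    return model


def log_posterior(model, x):
    """Unnormalized log posterior: log prior plus summed Gaussian log-likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Return the class with the largest posterior score."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)
```

Summing log-likelihoods across features is exactly where the independence assumption enters: the joint likelihood is taken to be the product of the per-feature likelihoods.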
Implemented Classifier Combination Methods
I have finished the two simplest classifier combination methods:
- Majority Vote: After training all the classifiers on the training data, I assign each test sample the class label chosen by the majority of the classifiers.
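- Borda Count Method: The Borda count is a quantity defined on the ranked outputs of each classifier. For a given class, it is the sum, over all classifiers, of the number of classes ranked below that class. The pattern is assigned to the class with the highest Borda count.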
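The two combination rules above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation; the function names and the tie-breaking rules are assumptions. Majority vote takes one predicted label per classifier, while Borda count takes one full ranking of the classes per classifier, ordered from most to least preferred.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the report does not specify a tie rule).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def borda_count(rankings):
    """Return the label with the highest Borda count.

    Each ranking lists every class label from most to least preferred.
    With n classes, the label at 0-based position i earns n - 1 - i points,
    which equals the number of classes ranked below it.
    """
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    best = max(scores.values())
    # Ties go to the tied label ranked highest by the first classifier
    # (an assumption).
    for label in rankings[0]:
        if scores[label] == best:
            return label
```

The two rules can disagree: a class that is the first choice of a plurality of classifiers but ranked last by the rest can lose under Borda count to a class that every classifier ranks reasonably high.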
Preliminary Results
I split the original data into training and test sets. The results below are the error rates of the different machine learning methods.