BIB-VERSION:: CS-TR-v2.0 ID:: ncstrl.dartmouthcs//TR2002-428 ENTRY:: June 12, 2002 ORGANIZATION:: Dartmouth College, Computer Science TITLE:: Information-theoretic Bounds on the Training and Testing Error of Boosting TYPE:: Technical Report (paper) REVISION:: 1 AUTHOR:: Lahaie, Sebastien M. DATE:: May 2002 RETRIEVAL:: For a paper copy, email RETRIEVAL:: For a paper copy, write to Technical Report Librarian Department of Computer Science Dartmouth College 6211 Sudikoff Laboratory Hanover, NH 03755-3510 USA RETRIEVAL:: Compressed Postscript at http://www.cs.dartmouth.edu/reports/TR2002-428.ps.Z RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR2002-428.pdf ABSTRACT:: Boosting is a means of using weak learners as subroutines to produce a strong learner with markedly better accuracy. Recent results showing the connection between logistic regression and boosting provide the foundation for an information-theoretic analysis of boosting. We describe the analogy between boosting and gambling, which allows us to derive a new upper bound on training error. This upper bound explicitly describes the effect of noisy data on training error. We also use information-theoretic techniques to derive an alternative upper-bound on testing error which is independent of the size of the weak-learner space. END:: ncstrl.dartmouthcs//TR2002-428