BIB-VERSION:: CS-TR-v2.0
ID:: ncstrl.dartmouthcs//TR2006-584
ENTRY:: January 22, 2008
ORGANIZATION:: Dartmouth College, Computer Science
TITLE:: Tools and algorithms to advance interactive intrusion analysis via Machine Learning and Information Retrieval
TYPE:: Technical Report (paper)
REVISION:: 1
AUTHOR:: Aslam, Javed
AUTHOR:: Bratus, Sergey
AUTHOR:: Pavlu, Virgil
DATE:: September 2006
RETRIEVAL:: For a paper copy, email
RETRIEVAL:: For a paper copy, write to
  Technical Report Librarian
  Department of Computer Science
  Dartmouth College
  6211 Sudikoff Laboratory
  Hanover, NH 03755-3510 USA
RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR2006-584.pdf
ABSTRACT:: We consider typical tasks that arise in the intrusion analysis of log data from the perspectives of Machine Learning and Information Retrieval, and we study a number of data organization and interactive learning techniques to improve the analyst's efficiency. In doing so, we attempt to translate intrusion analysis problems into the language of the abovementioned disciplines and to offer metrics to evaluate the effect of proposed techniques. The Kerf toolkit contains prototype implementations of these techniques, as well as data transformation tools that help bridge the gap between the real world log data formats and the ML and IR data models. We also describe the log representation approach that Kerf prototype tools are based on. In particular, we describe the connection between decision trees, automatic classification algorithms and log analysis techniques implemented in Kerf.
END:: ncstrl.dartmouthcs//TR2006-584