BIB-VERSION:: CS-TR-v2.0
ID:: ncstrl.dartmouthcs//TR2003-456
ENTRY:: May 31, 2003
ORGANIZATION:: Dartmouth College, Computer Science
TITLE:: Discovery, Visualization and Analysis of Gene Regulatory Sequence Elements in Genomes
TYPE:: Technical Report (paper)
REVISION:: 1
AUTHOR:: Simola, Daniel F.
DATE:: May 2003
RETRIEVAL:: For a paper copy, email
RETRIEVAL:: For a paper copy, write to
  Technical Report Librarian
  Department of Computer Science
  Dartmouth College
  6211 Sudikoff Laboratory
  Hanover, NH 03755-3510 USA
RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR2003-456.pdf
ABSTRACT:: The advent of rapid DNA sequencing has produced an explosion in the amount of available sequence information, permitting us to ask many new questions about DNA. There is a pressing need to design algorithms that can provide answers to questions related to the control of gene expression, and thus to the structure, function, and behavior of organisms. Such algorithms must filter through massive amounts of informational noise to identify meaningful conserved regulatory DNA sequence elements. We are approaching these questions with the notion that visualization is a key to exploring data relationships. Understanding the exact nature of these relationships can be very difficult by simply interpreting raw data. The ability to look at data in a graphical form allows us to apply our innate capacity to think visually to discern the subtle relationships that might not be recognizable otherwise. This thesis provides computational tools to visually identify and analyze candidate motifs in the DNA of a species. This includes a parsing utility to store genomic data and an application to search for and visually identify motifs. Using these tools, novel and previously compiled gene sets were identified using the genome of the plant species Arabidopsis thaliana.
NOTE:: Senior Honors Thesis. Advisor: Jay Aslam.
END:: ncstrl.dartmouthcs//TR2003-456