Dartmouth College Computer Science Technical Report series
Abstract:
Domain knowledge expressed in structured citation formats
can be exploited in data mining. We propose four structural properties
of canonically cited texts, then look at two classic problems in
the study of the scholia, or ancient scholarly commentary, found in the
manuscripts of the Iliad. We cluster citations of scholia to analyze their
distribution in different manuscripts; this leads to a revised view of how
the manuscripts' scribes drew on their source material. Correlated
frequencies of named entities suggest that one group of manuscripts
had access to material more closely based on the work of the greatest
Hellenistic editor of Homer, Aristarchus of Samothrace.
Note:
In proceedings for Text Mining Services, pages 129-139, 2009.
Bibliographic citation for this report:
D. Neel Smith and
Gabriel A. Weaver,
"Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture."
Dartmouth Computer Science Technical Report TR2009-649,
June 2009.

To receive a paper copy of a report by mail, send your address and the TR number to reports AT cs.dartmouth.edu
Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Technical reports collection maintained by David Kotz.