Dartmouth College Computer Science Technical Report series 
Abstract:
Given a collection of strings S={s_1,...,s_n} over an alphabet Sigma, a superstring alpha of S is a string containing each s_i as a substring; that is, for each i, 1<=i<=n, alpha contains a block of |s_i| consecutive characters that match s_i exactly. The shortest superstring problem is the problem of finding a superstring alpha of minimum length.
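To make the definition concrete, here is a minimal Python sketch. The function names (is_superstring, merge, shortest_superstring_bruteforce) are illustrative assumptions and do not come from the report. The exhaustive search tries every ordering of the strings, so it is only practical for a handful of short inputs; it illustrates the problem statement and is not the approximation algorithm the report describes.

```python
from itertools import permutations


def is_superstring(alpha, strings):
    """Return True if every string in `strings` occurs as a substring of `alpha`."""
    return all(s in alpha for s in strings)


def merge(a, b):
    """Append b to a, reusing the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b


def shortest_superstring_bruteforce(strings):
    """Exhaustive search over all orderings (exponential time; tiny inputs only)."""
    # Drop duplicates and any string already contained in another string,
    # since those never affect the length of the answer.
    unique = list(dict.fromkeys(strings))
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    best = None
    for order in permutations(core):
        candidate = ""
        for s in order:
            candidate = merge(candidate, s)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


if __name__ == "__main__":
    S = ["abc", "bcd", "cde"]
    alpha = shortest_superstring_bruteforce(S)
    print(alpha, is_superstring(alpha, S))
```

Because the search examines n! orderings, it is useful only as a reference for checking other methods on small instances.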
The shortest superstring problem has applications in both computational biology and data compression. The problem is NP-hard [GallantMS80]; in fact, it was recently shown to be MAX SNP-hard [BlumJLTY91]. Given the importance of the applications, several heuristics and approximation algorithms have been proposed. Constant factor approximation algorithms have been given in [BlumJLTY91] (factor of 3), [TengY93] (factor of 2 8/9), [CzumajGPR94] (factor of 2 5/6) and [KosarajuPS94] (factor of 2 50/63).
Informally, the key to any algorithm for the shortest superstring problem is to identify sets of strings with large amounts of similarity, or overlap. While the previous algorithms and their analyses have grown increasingly sophisticated, they reveal remarkably little about the structure of strings with large amounts of overlap. In this sense, they are solving a more general problem than the one at hand.
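The notion of overlap can be sketched in a few lines of Python. The helper names below (overlap, greedy_superstring) are assumptions made for illustration. The merging loop is the well-known greedy heuristic, which repeatedly joins the pair of strings with the largest overlap; it is shown only to illustrate how overlap drives superstring construction and is not the algorithm of this report.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge the pair with the largest overlap."""
    # Remove duplicates and strings contained in other strings first.
    unique = list(dict.fromkeys(strings))
    pool = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the chosen pair, writing the shared characters only once.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""


if __name__ == "__main__":
    print(overlap("abcd", "cdef"))
    print(greedy_superstring(["abc", "bcd", "cde"]))
```

The greedy result is always a superstring of the input, but it is not guaranteed to be a shortest one.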
In this paper, we study the structure of strings with large amounts of overlap and use our understanding to give an algorithm that finds a superstring whose length is no more than 2 3/4 times that of the optimal superstring. We prove several interesting properties about short periodic strings, allowing us to answer questions of the following form: given a string with some periodic structure, characterize all the possible periodic strings that can have a large amount of overlap with the first string.
Bibliographic citation for this report:
Chris Armen and Clifford Stein, "A 2 3/4-Approximation Algorithm for the Shortest Superstring Problem." Dartmouth Computer Science Technical Report PCS-TR94-214, 1994.