Dartmouth logo Dartmouth College Computer Science
Technical Report series
CS home
TR home
TR listserv
By author: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
By number: 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986

Automatic Generation of Input Grammars Using Symbolic Execution
Linda Xiao
Dartmouth TR2020-898


Invalid input often leads to unexpected behavior in a program and is behind a plethora of known and unknown vulnerabilities. To prevent improper input from being processed, the input needs to be validated before the rest of the program executes. Formal language theory facilitates the definition and recognition of proper inputs.

We focus on the problem of defining valid input after the program has already been written. We construct a parser that infers the structure of inputs which avoid vulnerabilities while existing work focuses on inferring the structure of input the program anticipates. We present a tool that constructs an input language, given the program as input, using symbolic execution on symbolic arguments. This differs from existing work which tracks the execution of concrete inputs to infer a grammar. We test our tool on programs with known vulnerabilities, including programs in the GNU Coreutils library, and we demonstrate how the parser catches known invalid inputs.

We conclude that the synthesis of the complete parser cannot be entirely automated due to limitations of symbolic execution tools and issues of computability. A more comprehensive parser must additionally be informed by examples and counterexamples of the input language.


Senior Honors Thesis. Advisor: Sean W. Smith.

PDF PDF (1835KB)

Bibliographic citation for this report: [plain text] [BIB] [BibTeX] [Refer]

Or copy and paste:
   Linda Xiao, "Automatic Generation of Input Grammars Using Symbolic Execution ." Dartmouth Computer Science Technical Report TR2020-898, June 2020.

Notify me about new tech reports.

Search the technical reports.

To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu

Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Technical reports collection maintained by David Kotz.