In brief:   BibTeX · List (pdf) · Citations (Scholar)

  Multicore Homology. with Ryan Lewis.
Full Manuscript, 2012.

We design and implement a framework for parallel computation of homology of cellular spaces over field coefficients. We begin by cutting a space into local pieces. We then compute the homology of each piece in parallel using the persistence algorithm. Finally we glue the pieces together by constructing the Mayer-Vietoris blowup complex and computing its homology. Theoretically, we show that the first step, optimal decomposition into local pieces is NP-HARD. In practice we use a decomposition based on graph partitions, which produces decompositions with a simple structure and bounded overlap. We implement our algorithms for multicore computers, and demonstrate their efficacy with a suite of experiments. For example, we achieve roughly an 8x speedup of the homology computations on a 3-dimensional complex with about 10 million simplices using 11 cores.

  Constructing Simplicial Complexes over Topological Spaces. with Milka Doktorova.
Full Manuscript, 2012.

Topological data analysis usually begins by constructing a combinatorial structure, such as a simplicial complex, to approximate the lost topology of a sampled point set. In this paper, we present an oracle-based framework and algorithms that construct high-dimensional simplicial complexes over arbitrary topological spaces. Using the minimum-sized representation for the simplicial complexes, we design a novel top-down algorithm and analyze it both theoretically and experimentally. We compare its performance to other algorithmic approaches, building up to 27-dimensional complexes on a standard desktop machine. Finally, we apply our framework to problems from three domains: Google search, word composition, and protein structure.

  Advances in Applied and Computational Topology (Editor)
American Mathematical Society, 2012.
Barnes and Noble

What is the shape of data? How do we describe flows? Can we count by integrating? How do we plan with uncertainty? What is the most compact representation? These questions, while unrelated, become similar when recast into a computational setting. Our input is a set of finite, discrete, noisy samples that describes an abstract space. Our goal is to compute qualitative features of the unknown space. It turns out that topology is sufficiently tolerant to provide us with robust tools.

This is the proceedings of the Short Course on Computational Topology at the 2011 Joint Mathematics Meeting, organized by myself.

  Digitizing 18th-Century French Literature: Comparing transcription methods for a critical edition text. with Ann Irvine and Laure Marcellesi.
Workshop on Computational Linguistics for Literature, 2012.
Joséphine de Monbart: Lettres taïtiennes, Laure Marcellesi (Editor), 2012.

We compare four methods for transcribing early printed texts. Our comparison is through a case-study of digitizing an eighteenth-century French novel for a new critical edition: the 1784 Lettres taïtiennes by Joséphine de Monbart. We provide a detailed error analysis of transcription by optical character recognition (OCR), non-expert humans, and expert humans and weigh each technique based on accuracy, speed, cost and the need for scholarly overhead.

  Topological Data Analysis
Advances in Applied and Computational Topology, 2012.

Scientific data is often in the form of a finite set of noisy points, sampled from an unknown space, and embedded in a high-dimensional space. Topological data analysis focuses on recovering the topology of the sampled space. In this chapter, we look at methods for constructing combinatorial representations of point sets, as well as theories and algorithms for effective computation of robust topological invariants. Throughout, we maintain a computational view by applying our techniques to a dataset representing the conformation space of a small molecule.

  The Tidy Set: A Minimal Simplicial Set for Computing Homology of Clique Complexes
Full Manuscript, 2010. (Invited, then rejected from CGTA)
26th ACM Symposium on Computational Geometry, Snowbird, UT, 2010.

We introduce the tidy set, a minimal simplicial set that captures the topology of a simplicial complex. The tidy set is particularly effective for computing the homology of clique complexes. This family of complexes include the Vietoris-Rips complex and the weak witness complex, methods that are popular in topological data analysis. The key feature of our approach is that it skips constructing the clique complex. We give algorithms for constructing tidy sets, implement them, and present experiments. Our preliminary results show that tidy sets are orders of magnitude smaller than clique complexes, giving us a homology engine with small memory requirements.

  Fast Construction of the Vietoris-Rips Complex
Computers & Graphics, 2010. (Invited) – doi:10.1016/j.cag.2010.03.007
Shape Modeling International, Aix-en-Provence, France, 2010.

The Vietoris-Rips complex characterizes the topology of a point set. This complex is popular in topological data analysis as its construction extends easily to higher dimensions. We formulate a two-phase approach for its construction that separates geometry from topology. We survey methods for the first phase, give three algorithms for the second phase, implement all algorithms, and present experimental results. Our software can also be used for constructing any clique complex, such as the weak witness complex.

  Computing Multidimensional Persistence. with Gunnar Carlsson and Gurjeet Singh.
Journal of Computational Geometry, 2010.
20th International Symposium on Algorithms and Computation, Waikiki, HI, 2009.

The theory of multidimensional persistence captures the topology of a multifiltration – a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational commutative algebra and utilize algorithms from this area to solve it. While the resulting problem is EXPSPACE-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent withing multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.

  Computational Topology
Algorithms and Theory of Computation Handbook, 2010.

According to the Oxford English Dictionary, the word topology is derived of topos (τοπος) meaning place, and -logy (λογια), a variant of the verb λεγειν, meaning to speak. As such, topology speaks about places: how local neighborhoods connect to each other to form a space. Computational topology, in turn, undertakes the challenge of studying topology using a computer.

This is Chapter 3 of Volume 2 of the second edition of the Algorithms and Theory of Computation Handbook, edited by Mikhail J. Atallah and Marina Blanton.

  Colon Polyp Detection using Smoothed Shape Operators: Preliminary Results. with Padma Sundaram, Christopher Beaulieu, and Sandy Napel.
Medical Image Analysis, 2008. – doi:10.1016/
United States Patent 8,055,047, November 8, 2011.

Computer-aided detection (CAD) algorithms identify locations in Computed Tomographic (CT) images of the colon that are most likely to contain polyps. In this paper, we present the Smoothed Shape Operators method (SSO), which uses a geometry processing approach. We extract a triangle mesh representation of the colon surface, and estimate curvature on this surface using the shape operator. We then smooth the shape operators on the surface iteratively. We evaluate our algorithm on patient data and provide free-response receiver operating characteristic performance analysis over all size ranges of polyps. We also provide confidence intervals for our performance estimates. We compare our performance with the Surface Normal Overlap (SNO) method for the same data.

  On the Local Behavior of Spaces of Natural Images. with Gunnar Carlsson, Tigran Ishkhanov, and Vin de Silva.
International Journal of Computer Vision, 2008. (ps.gz) – doi:10.1007/s11263-007-0056-x

In this study, we concentrate on qualitative topological analysis of the local behavior of the space of natural images. To this end, we use a space of 3 by 3 high-contrast patches M studied by Mumford et al. We develop a theoretical model for the high-density 2-dimensional submanifold of M showing that it has the topology of the Klein bottle. Using our topological software package PLEX we experimentally verify our theoretical conclusions. We use polynomial representation to give coordinatization to various subspaces of M. We find the best-fitting embedding of the Klein bottle into the ambient space of M. Our results are currently being used in developing a compression algorithm based on a Klein bottle dictionary.

  Persistent voids: a new structural metric for membrane fusion. with Peter M. Kasson, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, and Vijay S. Pande.
Bioinformatics, 2007. (ps.gz) – doi:10.1093/bioinformatics/btm250

Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. We introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. Our findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a full-expanded one.

  Localized Homology. with Gunnar Carlsson.
Computational Geometry: Theory & Applications, 2008. (ps.gz) – doi:10.1016/j.comgeo.2008.02.003
Shape Modeling International, Lyon, France, 2007. (ps.gz)
Supplemental package: Localizing within a 3D Solid

In this paper, we provide a complete theoretical foundation and an effective algorithm for localizing topological attributes. Unlike previous work that focused on 2-manifolds with restricted geometry, our theory is general and localizes arbitrary-dimensional attributes in arbitrary spaces. We implement our algorithm to validate our approach in practice.

  The Theory of Multidimensional Persistence. with Gunnar Carlsson.
Discrete and Computational Geometry, 2009 (Invited). (ps.gz) – doi:10.1007/s00454-009-9176-0
23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, 2007. (ps.gz)

Persistent homology captures the topology of a filtration – a one-parameter family of increasing spaces – in terms of a complete discrete invariant. This invariant is a multiset of intervals that denote the lifetimes of the topological entities within the filtration. In many applications of topology, we need to study a multifiltration: a family of spaces parameterized along multiple geometric dimensions. In this paper, we show that no complete discrete invariant exists for multidimensional persistence. We also propose a discrete invariant, the rank invariant, for the robust estimation of Betti numbers in a multifiltration, and prove its completeness in one dimension.

  Geometric Filtering of Pairwise Atomic Interactions Applied to the Design of Efficient Statistical Potentials. with Leonidas J. Guibas and Patrice Koehl.
Computer Aided Geometric Design, 2006. (ps.gz) – doi:10.1016/j.cagd.2006.03.002

Distance-dependent, pairwise, statistical potentials are based on the concept that the packing observed in known protein structures can be used as a reference for comparing different 3D models for a protein. We show that we can filter the list of all interactions in a protein to generate a much smaller subset of pairs that retains most of the structural information contained in proteins.

  The Conformal Alpha Shape Filtration. with Frederic Cazals, Joachim Giesen, and Mark Pauly.
The Visual Computer, 2006 (Invited). (ps.gz) – doi:10.1007/s00371-006-0027-1
Symposium on Point-Based Graphics, Stony Brook, NY, 2005. (ps.gz)

We define a new filtration of the Delaunay triangulation of a finite set of points in Rd, similar to the alpha shape filtration. The new filtration is parameterized by a local scale parameter instead of the global scale parameter in alpha shapes. Since our approach shares many properties with the alpha shape filtration and the local scale parameter conforms to the local geometry we call it conformal alpha shape filtration. The local scale parameter is motivated from applications and previous algorithms in surface reconstruction. We show how conformal alpha shapes can be used for surface reconstruction of non-uniformly sampled surfaces, which is not possible with alpha shapes.

Note: Conference version has the title "Conformal Alpha Shapes".

  Persistence Barcodes for Shapes. with Gunnar Carlsson, Anne Collins, and Leonidas J. Guibas.
International Journal of Shape Modeling, 2005. (ps.gz) – doi:10.1142/S0218654305000761
Symposium on Geometry Processing, Nice, France, 2004. (ps.gz)

In this paper, we initiate a study of shape description and classification via the application of persistent homology to two tangential constructions on geometric objects. The homology of our first construction, the tangent complex, can distinguish between topologically identical shapes with different "sharp" features, such as corners. To capture "soft" curvature-dependent features, we define a second complex, the filtered tangent complex, obtained by parametrizing a family of increasing subcomplexes of the tangent complex. Applying persistent homology, we obtain a shape descriptor, called a barcode, that is a finite union of intervals. We define a metric over the space of such intervals, arriving at a continuous invariant that reflects the geometric properties of shapes.

  A Barcode Shape Descriptor for Curve Point Cloud Data. with Anne Collins, Gunnar Carlsson, and Leonidas J. Guibas.
Computers and Graphics, 2004 (Invited). (ps.gz) – doi:10.1016/j.cag.2004.08.015
Symposium on Point-Based Graphics, Zürich, Switzerland, 2004 (ps.gz)
Video (Quicktime)

In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data. Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we have implemented our pipeline and provide experimental evidence in shape classification and parametrization.

  Computing Persistent Homology. with Gunnar Carlsson
Discrete and Computational Geometry, 2005 (ps.gz) (Errata) – doi:10.1007/s00454-004-1146-y
20th ACM Symposium on Computational Geometry, Brooklyn, NY, 2004 (ps.gz)

Application Video: Witness Complexes -- The Mumford Dataset (AVI)

We study the homology of a filtered d-dimensional simplicial complex K as a single algebraic entity and establish a correspondence that provides a simple description over fields. Our analysis enables us to derive a natural algorithm for computing persistent homology over an arbitrary field in any dimension. Our study also implies the lack of a simple classification over non-fields. Instead, we give an algorithm for computing individual persistent homology groups over an arbitrary PIDs in any dimension.

  Computing and Comprehending Topology: Persistence and Hierarchical Morse Complexes. Ph.D. Thesis. Urbana, Illinois, October 2001.
One-Sided (Hyperlinked)
Two-Sided (Hyperlinked)
Technical Report UIUCDCS-R-2001-2240

The fully hyperlinked versions include bookmarks, thumbnails, and backreferences, and are appropriate for online viewing. The two-sided versions have different odd and even margins for duplex printing and binding.

  Computing Linking Numbers of a Filtration. with Herbert Edelsbrunner
Homology, Homotopy, and Applications, 2003 (Invited).
1st Workshop on Algorithms in BioInformatics, Århus, Denmark, 2001

We develop fast algorithms for computing the linking number of a simplicial complex within a filtration. We give experimental results in applying our work toward the detection of non-trivial tangling in biomolecules, modeled as alpha complexes.

  Hierarchical Morse-Smale Complexes for Piecewise Linear 2-Manifolds. with Herbert Edelsbrunner and John Harer
French translation/interpretation by Sara Deriviere, 2006
Discrete and Computational Geometry, 2003 (Invited). – doi:10.1007/s00454-003-2926-5
17th ACM Symposium on Computational Geometry, Medford, MA, 2001

We present algorithms for constructing a hierarchy of increasingly coarse Morse complexes that decompose a piecewise linear 2-manifold. While Morse complexes are defined only in the smooth category, we extend the construction to the piecewise linear category by ensuring structural integrity and simulating differentiability. We then simplify Morse complexes by cancelling pairs of critical points in order of increasing persistence.

Note: Conference version refers to "Morse complexes". We changed the title according to a reviewer's suggestion.

  Topological Persistence and Simplification. with Herbert Edelsbrunner and David Letscher
Discrete and Computational Geometry, 2002 (Invited). – doi:10.1007/s00454-002-2885-2
41st Symposium on Foundations of Computer Science, Redondo Beach, CA, 2000

We formalize a notion of topological simplification within the framework of a filtration, which is the history of a growing complex. We classify a topological change that happens during growth as either a feature or noise depending on its life-time or persistence within the filtration. We give fast algorithms for computing persistence and experimental evidence for their speed and utility.

  Fast Software for Box Intersection. with Herbert Edelsbrunner
International Journal of Computational Geometry and Applications, 2002 (Invited). – doi:10.1142/S0218195902000785
16th ACM Symposium on Computational Geometry, Hong Kong, 2000

We present fast implementations of a hybrid algorithm for reporting box and cube intersections. Our algorithm initially takes a divide-and-conquer approach and switches to simpler algorithms for low numbers of boxes. We use our implementations as engines to solve problems about geometric primitives. We look at two such problems in the category of quality analysis of surface triangulations.

  Context-Free Language Induction by Evolution of Deterministic Push-Down Automata Using Genetic Programming.
AAAI Fall Symposium. Boston, 1995.

The process of learning often consists of Inductive Inference, making generalizations from samples. The problem here is finding generalizations (Grammars) for Formal Languages from finite sets of positive and negative sample sentences. The focus of this paper is on Context-Free Languages (CFL's) as defined by Context-Free Grammars (CFG's), some of which are accepted by Deterministic Push-Down Automata (D-PDA). This paper describes a meta-language for constructing D-PDA's. This language is then combined with Genetic Programming to evolve D-PDA's which accept languages. The technique is illustrated with two favorite CFL's.