# Yu-Han Lyu

## Ph.D. Student Computer Science, Dartmouth College

 Office: Room 207, Sudikoff Lab Address: Dartmouth College 6211 Hinman, Hanover NH, 03755-3510 E-mail: Yu-Han.Lyu@dartmouth.edu

I am a Ph.D. student doing robotics research at Dartmouth College. My advisor is Devin Balkcom. My research interests lie at the intersection of mathematical optimization and foundations of robotics. Recently, I designed algorithms for finding trajectories optimizing different objectives and have published papers in top international conferences/journals, including International Journal of Robotics Research, the top one journal with the highest impact factor in robotics research.
Besides robotics, I also like theoretical computer science and writing programs. I maintained a collection of algorithmic puzzles and code optimization tricks.
I have 15 years experience of C/C++/Java programming. I served as a teaching assistant for 11 courses and won Best TA award in Department of Computer Science at Dartmouth College in 2013.

## Experience

### Optimal trajectories for kinematic planar rigid bodies

We consider a simple system that is a rigid body in the Euclidean plane with no obstacles. The model of robot motion is only kinematic; that is, no dynamics, and no drift. How can we efficiently find a time-optimal trajectory connecting two given configurations?
As it turns out, optimal trajectories may not exist. Suppose you are trying to move a heavy bench from one place to another. The bench is too heavy so that the only thing you can do is to lift one end and rotate around the other end. If the initial position and the goal position are parallel, what is the best way to move the bench?
The best way is to rotate both ends alternatively through infinitessimal angles and has infinite number of control switches, which is impractical. In order to guarantee the existence of an optimal solution with finite number of switches, we introduce a switch cost on each switch of controls. We show that under this model, an optimal solution always exists. We also design an algorithm to find an optimal trajectory.
The source code can be found here.
 Your browser does not support the video tag. Your browser does not support the video tag. Your browser does not support the video tag. Your browser does not support the video tag. Dubins CCC curve Dubins LSR curve Reeds-Shepp C|C|C curve Reeds-Shepp LRSLR curve Your browser does not support the video tag. Your browser does not support the video tag. Your browser does not support the video tag. Your browser does not support the video tag. Reeds-Shepp LRLR curve Differential drive TGT curve Differential drive turn-drive-turn curve Omnidirectional vehicle ROLL curve Your browser does not support the video tag. Omnidirectional vehicle SHUFFLE curve

### Complete search for optimal trajectories

We consider a general motion planning problem: given a system (may be non-holonomic) and an environment with obstacles, find an optimal trajectory connecting two configurations. We show that optimal trajectories with certain clearance can always be approxiamted if an optimal steering is available.
The source code can be found here.

### Diversity and survival of expendable robots

We consider the following problem: a set of robots at one camp must move to another camp through a minefield. Robots have a model of the probabilistic distribution of mine locations. What trajectories should the robots follow, if the robots must decide their paths before moving and cannot reroute during movement?
We propose a new measure, $$k$$-survivability, of quality of $$n$$ trajectories in a stochastic threat environment to be the probability that at least $$k$$ paths successfully reach the goal. By maximizing $$k$$-survivability, we get an interestingly diverse trajectories.
The source code can be found here.

## Software

### Optimal trajectories for differential Drive, Dubins car, and Reeds-Shepp car

I implemented three analytical solvers for differential drive, Reeds-Shepp car, and Dubin's car.
The source code can be found here.

## Publication

### Journal Papers

• Yu-Han Lyu, Yining Chen, Devin Balkcom, k-survivability: Diversity and survival of expendable robots, submitted to IEEE Robotics and Automation Letters (RA-L).
• Yu-Han Lyu, Devin Balkcom, Optimal Trajectories for Kinematic Planar Rigid Bodies with Switching Costs, International Journal of Robotics Research, to appear.
• Shin-Cheng Mu, Yu-Han Lyu, and Akimasa Morihata, Approximate by Thinning: Deriving Fully Polynomial-Time Approximation Schemes, Science of Computer Programming, Volume 98, Part 4, 1 February 2015, Pages 484–515.

### Master's thesis

• Yu-Han Lyu, and Chuan Yi Tang, Random Generation of k-Connected graphs, master thesis.
I like to solve puzzles, especially algorithmic puzzles. A good algorithmic puzzle should have a simple and clear statement so that anyone can understand the problem easily. Moreover, a good algorithmic puzzle should possess multiple solutions: one easy solution that any CS student can construct, and one clever solution that a good programmer can figure out. I personally prefer puzzles with academic values, since theoretically computer scientist may use some heavy-duty techniques that I cannot imagine to solve problems.
I have collected algorithmic puzzles since I was an undergraduate. Three major sources are:
• algorithm textbooks: most problems in textbooks have some academic values. I bought 20+ algorithm textbooks and tried hard to solve all algorithm design problems in the textbooks to find fun problems.
• programing interview questions: a good interview question usually have a simple statement and a tricky solution.
• programming competitions: A good competition problem usually have a clever solution.
Here is a list of good algorithmic puzzles in my collection.
Given a number $$x$$ and an $$n \times m$$ matrix $$A$$ in which each row and column are in non-decreasing order, $$m \leq n$$, design an algorithm to determine whether $$x \in A$$.
For each column $$i$$, let $$y_i$$ be the largest index that $$A_{y_i, i} < x$$. Since each row is sorted, $$y_i$$ is decreasing in $$i$$. First, we linear scan from $$A_{n, 1}$$ to find $$y_1$$. Then, we use linear search in the second column from $$y_1$$ to find $$y_2$$, and so on. The complexity is $$O(n + m)$$. This method is called saddleback search.
A better solution: First, do binary search in the middle column $$A_{*, mid}$$ to find $$y_{mid}$$. Since the submatrix in the top-left of $$A_{x_{mid}, mid}$$ and the submatrix in the bottom-right of $$A_{x_{mid+1}, mid}$$ can not contain $$x$$, we recursively search for $$x$$ in the remaining two submatrices. The complexity is $$O(m \lg (n / m))$$.
Richard S. Bird, “Improving Saddleback Search: A Lesson in Algorithm Design,” Mathematics of Program Construction, MPC, 2006, pp 82-89. [DOI]
Given a number $$x$$ and an $$n \times m$$ matrix $$A$$ in which each row and column are in non-decreasing order, design an algorithm to find the rank of $$x$$ in $$A$$.
For each column $$i$$, let $$y_i$$ be the largest index that $$A_{y_i, i} < x$$. The rank of $$x$$ in $$A$$ equals $$\sum_{i=1}^m y_i$$. Since we can find all indices $$y_i$$ in $$O(n + m)$$ time by using the saddleback search, the rank of $$x$$ can be computed in $$O(n + m)$$ time as well.
A better solution: Use exponential search in the first column to find $$i = y_1$$. Then, apply exponential search on the $$i$$-th row to find the largest index $$j$$ such that $$A_{i, j} < x$$. Since each row and column are sorted, we can ignore the first $$j$$ columns and the last $$n - i$$ rows and solve the remaining problem in the same way. It can be shown that this algorithm takes $$O(\sqrt{k})$$, where $$k$$ is the rank of $$x$$.
Greg N. Frederickson, Donald B. Johnson, "Generalized selection and ranking Sorted matrices," SIAM Journal on Computing, 13(1), 14–30. [DOI]
Given a positive integer $$k$$ and an $$n \times m$$ matrix $$A$$ in which each row and column are in non-decreasing order, design an algorithm to find the $$k$$-th smallest number in $$A$$.
We first determine the minimum value $$min$$ and maximum value $$max$$ among all elements in $$A$$. Since the median must between $$min$$ and $$max$$, we apply binary search method on integers $$[min \dots max]$$ to find the median.
In each iteration of the binary search with search range, $$[l, \dots, h]$$, we find the middle point $$mid = (l + h) / 2$$. If the rank of $$mid$$ is larger than $$k$$, then we reduce the search range to $$[l, \dots, mid - 1]$$. Otherwise, we reduce the search range to $$[mid + 1, \dots, h]$$.
By the previous question, for any integer $$x$$, we can determine the rank of $$x$$ in $$O(n + m)$$ time. Since the maximum number of iterations is $$\lg (max - min)$$, the time complexity is $$O((n + m) \lg (max - min))$$.
This solution has two drawbacks: one minor drawback is that $$\lg (max - min)$$ is in fact linear in the length of the input. Theoretically, this method is not better than applying linear time selection algorithm on $$M$$.
Another drawback is that this method can only be applied on numbers or data like numbers, since this method needs to find the middle of two elements, even if the middle element is not in $$M$$. Hence, this method is not general.
A better solution: The idea is as follows: we iteratively eliminate some elements that cannot be the answer. Repeat this process until only one element remains.
In each iteration, let $$N$$ be the number of current remaining elements, and let $$m_i$$ be the median of the remaining elements in the $$i$$-th column. Let $$m^*$$ be the median of all medians $$m_i$$ of all columns. We compute the rank of $$m^*$$ in all remaining elements. If the rank is smaller than $$k$$, then for each column $$i$$ such that $$m_i < m^*$$, elements smaller than $$m_i$$ cannot be the answer and can be eliminated. The case of rank larger than $$k$$ is symmetric.
In each iteration, determine the median can be done in $$O(n)$$ time and the rank of $$m^*$$ in the remaining elements can be computed in $$O(m + n)$$ time. Since half columns are halved in each iteration, the number of iterations is $$O(\lg n)$$. The complexity is $$O((m+n) \lg n)$$.
Theoretically, this problem can be solved in $$O(m + n)$$ time.
Andranik Mirzaian and Eshrat Arjomandi, "Selection in X + Y and matrices with sorted rows and columns," Information Processing Letters Volume 20, Issue 1, 2 January 1985, Pages 13–17. [DOI]
Let $$c$$ be a fixed constant. Given a positive integer $$k$$ and $$c$$ sorted arrays of the same length $$n$$, design an algorithm to find the $$k$$-th smallest number in these $$c$$ sorted arrays.
We can apply the technique in the previous question and obtain an $$O(n)$$ time algorithm.
A better solution: The idea is as follows: we iteratively eliminate some elements that cannot be the answer. Repeat this process until only one element remains.
In each iteration, let $$N$$ be the number of current remaining elements, and let $$m_i$$ be the median of the remaining elements in the $$i$$-th array. If $$k < N/2$$, then we pick $$i^* = \arg \max_i m_i$$. In the $$i^*$$-th array, the elements that are larger than $$m_{i^*}$$ must be larger than $$N / 2$$ elements in all elements. Hence, we can remove these elements safely. The case of $$k \geq N / 2$$ is symmetric.
Since at least one array halves the size in each iteration, the number of iterations is $$O(c \lg n)$$. By storing all medians in a min-heap, each iteration can be done in $$O(\lg c)$$ time. The complexity is $$O(c \lg n \lg c) = O(\lg n)$$.
Given two sorted arrays $$A$$ and $$B$$ of $$m$$ and $$n$$ integers respectively, $$n > m$$, design an algorithm to find the intersection of these two arrays.
Since arrays are sorted, we can linear scan the arrays to find the intersection in time $$O(m + n)$$. When $$m = o(n / \lg n)$$, we can improve the running time by applying binary search in $$A$$ to find each element in $$B$$.
A better solution: Use the median of $$B$$ to divide $$A$$ into two subarrays and then recursively solve the sub-problems. The time complexity is $$O(m \lg (n/m))$$.
Ricardo Baeza-Yates, Alejandro Salinger, “Fast Intersection Algorithms for Sorted Sequences,” Algorithms and Applications, Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday, 2010, pp 45-61. [DOI]
Given a read-only array $$A$$ of $$n$$ integers, where $$A[i] \in [1 \dots n - 1]$$ for all $$1 \leq i \leq n$$, design an algorithm to find one duplicated element.
If $$A$$ is modifiable, we can sort the array in linear time, since the range of elements in $$A$$ is bounded by $$n$$. After sort $$A$$, one duplicated element can be found easily. How can solve this problem if the array is read-only?
Solution: Since each element is in $$[1, n-1]$$, we conceptually consider each element as a node in a linked list, that is, element $$i$$ points to an element $$A[i]$$. Since no node can point to node $$n$$, we consider node $$n$$ as the head of this linked list. If node $$i$$ is pointed by more than one element, then $$i$$ is a duplicate value in the array. Thus, we can use classical cycle detection algorithm to find the head of the cycle, which is a duplicate element. The algorithn runs in linear time with $$O(1)$$ additional variables.
Given a read-only array $$A$$ of $$n - k$$ distinct integers, where $$A[i] \in [1 \dots n]$$ for all $$1 \leq i \leq n - k$$, design an algorithm to find all integers in $$[1 \dots n] \setminus A$$.
If $$A$$ is modifiable, we can sort the array in linear time, since the range of elements in $$A$$ is bounded by $$n$$. After sort $$A$$, the integers are in $$[1 \dots n] \setminus A$$ can be found easily. How can we solve this problem when the array is read-only?
When $$k$$ is 1, 2, or 3, the missing numbers can be found easily by using xor operations in linear time with $$O(1)$$ additional variables without modifing $$A$$.
General solution: Let $$x_1, \dots, x_k$$ be the missing numbers. We can construct a system of equations $$\sum_{j = 1}^k x_j^i = b_i = \sum_{j=1}^n j^i- \sum_{j=1}^{n-k} A[j]^i$$. Solve this system of polynomial equations and then get $$x_1, \dots, x_k$$. Since solving a system of polynomial equations takes exponential time in the worst case, we need a different approach.
Using Newton's identities, we create a degree $$k$$ polynomial $$P$$, whose roots are exactly $$x_1, \dots, x_k$$, based on $$b_1, \dots, b_k$$. Then, we factor the polynomial and find the roots. Since coefficients of $$P$$ may be large, both calculation and factorization are carried in a finite field. The complexity is polynomial in expectation.
Yaron Minsky, Ari Trachtenberg, Richard Zippel: Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory 49(9): 2213-2218 (2003). [DOI]
Given a $$w$$-bit integer $$x$$, design an algorithm to determine the position of the most significant set bit of $$x$$. That is, find $$\lfloor \lg x \rfloor$$. Assume that bitwise operations (and, or, not, xor) and integer arithmetics (+, -, *, /) on $$w$$-bit integers can be done in $$O(1)$$ time.
By using binary search, $$\lfloor \lg x \rfloor$$ can be computed in $$O(\lg w)$$ time.
A better solution: we can avoid the comparison in the binary search, since comparsion may be expensive for some architectures. First, set all bits that are on the right of the most significant bit to one. Then, add one to get $$2^{\lfloor \lg x \rfloor + 1}$$, which has only one set bit. Since the index of the set bit is $$\lfloor \lg x \rfloor + 1$$, index minus one is the answer.
In order to set all bits that are on the right of the most significant bit to one, we use the bitwise parallel technique. The idea is to double the consecutive ones from the most significant set bit. In the first iteration, we set $$x$$ to be the bitwise or of $$x$$ and $$x \gg 1$$. After the first iteration, $$x$$ must have at least two consecutive ones from the most significant set bit. In the $$i$$-th iteration, we set $$x$$ to be the bitwise or of $$x$$ and $$x \gg 2^i$$. After the $$i$$-th iteration, $$x$$ must have $$2^{i}$$ consecutive ones from the most significant set bit. After $$O(\lg w)$$ iterations, we successfully set all bits that are on the right of the most significant bit to one.
In order to compute the index of an integer that is power of two, since there are only $$w+1$$ different inputs, we can design a perfect hash function that maps each input to an unique integer in $$[0 \dots w]$$ and then build a look-up table with a size $$w+1$$ to store the result. One good candidate hash function is based on the property of de Bruijn sequence, where the detail can be found in the follwing paper:
Charles E. Leiserson, Harald Prokof, and Keith H. Randall, Using de Bruijn Sequences to Index 1 in a Computer Word. [LINK]
Theoretically, finding the index of the most significant set bit can be solved in $$O(1)$$ time assuming integer arithmetics are available.
M. L. Fredman and D. E. Willard, BLASTING through the information theoretic barrier with FUSION TREES. STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing Pages 1-7. [DOI]
Given a positive integer $$k$$ and a binary min-heap $$Q$$ with $$n$$ elements, design an algorithm to find the $$k$$-th smallest element in $$Q$$.
We can find the $$k$$-th smallest element by executing extract-min for $$k$$ times. The complexity is $$O(k \lg n)$$.
A better solution: Create a binary heap $$P$$ with one element, which is the minimum element of $$P$$. In each iteration, delete the minimum $$m$$ of $$P$$. Then, add $$m$$'s children in $$Q$$ to $$P$$. The element deleted in the $$k$$-th iteration is the answer. Since the number of elements in $$P$$ is $$O(k)$$, the complexity is $$O(k \lg k)$$.
Theoretically, this problem can be solved in $$O(k)$$ time.
G.N. Frederickson, “An Optimal Algorithm for Selection in a Min-Heap,” Information and Computation Volume 104, Issue 2, June 1993, Pages 197–214. [DOI]
In an undirected graph $$G = (V, E)$$, four vertices of $$G$$, $$u, v, x$$, and $$y$$ form a 4-cycle, if $$(u, v)$$, $$(v, x)$$, $$(x, y)$$ and $$(y, u)$$ are in $$E$$. Given an undirected graph with $$n$$ vertices, design an algorithm to determine whether $$G$$ contains a 4-cycle.
For each pair of vertices $$(u, x)$$, testing whether there exists two vertices $$v$$ and $$y$$ such that $$u, v, x$$, and $$y$$ form a 4-cycle can be done in linear time. Since there are $$O(n^2)$$ pairs of vertices, determine the existence of a 4-cycle can be done in $$O(n^3)$$.
A better solution: Construct an $$n \times n$$ matrix $$A$$ with all zeros. For each vertex $$u$$, for each pair of $$u$$'s neighbors $$v$$ and $$x$$, we test the value of $$y = A[v][x]$$. If $$y$$ is zero, then we set $$A[v][x] = u$$, which means that $$v$$ and $$x$$ has a common neighbor $$u$$. If $$y$$ is non-zero, then we found a 4-cycle $$u, v, y, x$$. Since each time we either find a 4-cycle or change the value of some entry in $$A$$, the complexity is $$O(n^2)$$.
Theoretically, for any even number $$k > 0$$, finding a $$k$$-cycle can be done in $$O(n^2)$$.

Raphael Yuster and Uri Zwick, “Finding Even Cycles Even Faster,” SIAM Journal on Discrete Mathematics, Volume 10 Issue 2, May 1997, Pages 209 - 222. [DOI]

The key to performance is elegance, not battalions of special cases.
Jon Bentley and Doug McIlroy

### Improve branch prediction

Here is an ordinary version of binary search to find the leftmost insertion point (lower_bound in C++ STL).
long binarysearch(int a[], long n, int x)
{
long low = 0;
for (long high = n - 1; low <= high; ) {
long mid = low + ((high - low) >> 1);
if (a[mid] >= x)
high = mid - 1;
else
low = mid + 1;
}
return low;
}
In each iteration, binary search divides the search range into two equal-size ranges and determine which range contains the target. Theoretically, binary search is optimal. But, what if we divide the range into two ranges with different sizes?
We change the calculation of mid element from
 long mid = low + ((high - low) >> 1);
to
 long mid = low + ((high - low) >> 2);
Is modified binary search faster? We test on random sorted integer array. Modified binary search improves the performance by ~10%. [CODE]
This is due to branch misprediction. If binary search always divides the search range into two equal-size ranges, then the comparison inside the loop has probability 0.5 to be true when the input is uniformly at random. Consequently, branch predictor becomes useless and misprediction penalty increases. The modified binary search divides the search range into two ranges with different size so that branch predictor can perform better. That's why modified binary search method improves the performance.
In practice, since the bottleneck of binary search is usually in the user-defined comparison, modified binary search may not improve the performance.
Gerth Stølting Brodal, Gabriel Moruz, “Skewed Binary Search Trees,” Annual European Symposium on Algorithms, ESA, 2006, pp 708-719.[DOI]

## Improve cache performance

Here is an ordinary implementaion of sieve of Eratosthenes.
void eratosthenes(char prime[], int n)
{
memset(prime, 1, n + 1);
prime[0] = prime[1] = 0;
int bound = round(sqrt(n));
for (int i = 2; i <= bound; ++i) {
if (prime[i]) {
for (int j = i * i; j <= n; j += i)
prime[j] = 0;
}
}
}
What if we process the inner loop in reverse order? We change the inner loop from
for (int j = i * i; j <= n; j += i)
prime[j] = 0;
to
for (int k = n / i, j = i * k; k >= i; --k, j -= i)
if (prime[k])
prime[j] = 0;
Since the number of iterations in the inner loop does not change, this transformation seems useless. However, the modifed version only takes 1/3 time to execute. [CODE]
In the modified version, the inner loop accesses prime[j] only when prime[k] is true, so we avoid unecessary memory writing. Moreover, since the access of prime[k] is sequential, the cache performance is improved.
Rolf H. Möhring and Martin Oellrich, Chapter 13 The Sieve of Eratosthenes – How Fast Can We Compute a Prime Number Table?, Algorithms Unplugged, 2011. [DOI]

## Improve instruction level parallelism

Here is an oridinay way to calculate the sum of an array of doubles.
double sum(double a[], int n)
{
double ans = 0.0;
for (int i = 0; i < n; ++i)
ans += a[i];
return ans;
}
Asymptotically, this is the optimal solution. However, we can unroll the loop to improve the instruction level parallelism.
double sum(double a[], int n)
{
double even = 0.0, odd = 0.0;
for (int i = 0; i + 1 < n; i += 2) {
even += a[i];
odd += a[i + 1];
}
return even + odd + (n & 1 ? a[n - 1] : 0.0);
}
The modified version is ~10% faster than the ordinary method. [CODE]