せーの、

Korosensei can move at Mach 20 and takes advantage of this fact to make frequent trips around the world. Obviously, in the interest of efficiency, he often takes the opportunity to make multiple stops per trip. But what is the most efficient way for him to do so?

We can think about this using graphs. Suppose all the possible destinations are vertices and edges with costs connect the vertices, representing the cost of traveling from some destination $u$ to another destination $v$. At first glance, it might make sense to use physical distance in our case, but that is not necessarily a measure of the “best” way to get there. For instance, taking a bus or train between two cities may get you to the same place, but the actual cost depends on the path that’s taken and you may choose to include other criteria like the cost of tickets and other stuff like that.

So the problem is this: Korosensei has a list of places he wants to hit up during lunch hour. What is the most cost-effective way of getting to each of those places exactly once and get back to school? This is what’s known as the *Traveling Salesman problem* (TSP).

As it turns out, this problem, like many graph theory problems, is NP-complete, meaning it’s quite computationally hard and taxing for computers to solve. Formally, if we’re given a graph $G = (V,E)$ and a cost function $d: V \times V \to \mathbb R$, the problem is to find a cycle that visits every vertex exactly once with smallest possible cost. Usually, if we have $n$ vertices, then $G$ is the complete graph on $n$ vertices, that is, a graph where every vertex has an edge to every other vertex. Otherwise, it may not be possible to find such a cycle (and the problem of finding such a cycle is also an NP-complete problem, *Hamiltonian cycle*).

It’s called the Traveling Salesman problem because the idea is that if you’re a traveling salesman, you’d want to find that least-cost cycle because it’d be the most cost-effective way to hit up all the cities you need to go to in order to sell whatever it is that you’re selling. Of course, the hardness of the problem means that our poor salesman or Korosensei are stuck having to check the cost of every cycle. How many are there? Well, since we’re dealing with a complete graph, this means we can get to any vertex from any vertex so we can take any permutation of our vertices and that’s a cycle, and there are $n!$ permutations. This means that as $n$ gets larger, the difficulty of solving this problem grows exponentially.

One way we can attempt to make NP-complete problems feasible to solve is to loosen our demands a bit. Right now, we want the absolute least-cost cycle, but what if it’s not necessary for us to get the least cost cycle? What if we’re okay with getting something that’s within a factor of 2 or 3 or $f(n)$ for some function $f$ that depends on the size of the input? These are called *approximation algorithms*, in that they approximate the optimal solution by some factor. Unfortunately this doesn’t work for TSP.

Why? Because if I have a feasible algorithm for approximating TSP, then I’ve got an efficient algorithm for solving Hamiltonian cycle, which I mentioned was also NP-complete. Suppose I take any old graph $H$ with $n$ vertices and turn it into a complete graph with all my original edges having cost 1 and any new edges I add having cost $\rho \cdot n + 1$. If I do have some approximation algorithm that gives me a solution within a factor of $\rho$, I’ll run it and get a cycle for Korosensei to use. But, based on the cost of the cycle I get, I can tell if my original graph had a Hamiltonian cycle: if it has cost $\leq n$, then that means it found a Hamiltonian cycle using all of the edges in my original graph. Otherwise, we get a cycle with cost at least

$$(\rho \cdot n + 1) + (n-1) = \rho \cdot n + n > \rho \cdot n$$

meaning it must’ve used one of the new expensive edges I added and so we know the original graph didn’t have a Hamiltonian cycle.

Now this is a bit discouraging, but Korosensei would encourage us to keep on thinking about how to assassinate the hell out of this problem. In the most general case, the cost function $d$ has no constraints and the way that I’ve initially motivated Korosensei’s problem, $d$ can be arbitrary, with costs adjusted to his needs. However, some of the attempts to make TSP more computationally feasible have to do with making some reasonable restrictions on our cost function. This is another fairly standard approach to making computationally hard problems easier to deal with: figure out some cases of the problem that are still useful but might help to simplify or restrict the problem a bit.

For example, in most cases, it’s a reasonable expectation that if you’re going from $u$ to $v$, it’s best to go there directly rather than stopping at $w$ on your way there. This is what we call the *triangle inequality*:

$$d(u,v) \leq d(u,w) + d(w,v).$$

It’s called the triangle inequality because of what I just described: draw a triangle out of $u$, $v$, and $w$ and the distance between any two should be shorter than going through the third point.

I say *should* because for an arbitrary cost function, this isn’t always the case. One example is flight pricing. Sometimes, it’s more expensive to fly from $u$ to $w$ than it is to fly from $u$ to $v$, even if the flight stops over at $w$. Anyhow, Korosensei doesn’t have to deal with silly things like flight pricing and so we impose this reasonable triangle inequality condition on his cost function. Does it help? It turns out it does and we can get an approximation that’ll be at most twice the cost of the optimal solution.

How? Well, first we find a minimum spanning tree. A spanning tree of a graph is a tree (a graph that contains no cycles) that connects all of the vertices of a graph together. The minimum spanning tree is the spanning tree with the least weight out of all the spanning trees. The nice thing about MSTs is that we know how to do this efficiently.

Once we have our MST $T$ of our graph $G$, we double up every edge in $T$ to get a cycle $T’$. This cycle what we want yet, since it travels along all the same edges and goes to each vertex twice. But, we can get the cycle we really want by removing all of the second times we visit a vertex and we can do this because the triangle inequality tells us that by taking a more direct route, the cost of the cycle won’t increase. So now if we have our cycle $C$, then we have

$$cost(C) \leq cost(T’) \leq 2 \cdot cost(T).$$

Now, to compare to the cost of our actual optimal solution $C^*$, we notice that taking out some edge of $C^*$ makes it a spanning tree. It’s not necessarily minimal, so we’ve got

$$cost(T) \leq cost(C^* \setminus \{ e \}) \leq cost(C^*).$$

Putting that all together, we’ve got

$$cost(C) \leq cost(T’) \leq 2 \cdot cost(T) \leq 2 \cdot cost(C^*)$$

and there’s our 2-approximation.

Given how fast Korosensei is, this is probably good enough for him and is a decent tradeoff between the time it takes to solve the problem and his actual travel time.