The Travelling Salesman problem

This is a really classic problem from computer science. Say you have a weighted graph in which the nodes represent cities and the edge weights represent the distances between them. You want to visit every city exactly once and return to your starting point, and you want the total distance travelled to be as short as possible.

Often we assume that the nodes (cities) lie in a plane and the weight of each edge is just the (Euclidean) distance between them. We use the complete graph, meaning there's an edge between each pair of nodes. In other words, you may travel directly between any two cities.

The two graph-theory algorithms you've seen so far (Kruskal's algorithm for minimum spanning tree and Dijkstra's algorithm for shortest path) are both greedy algorithms. The crucial step in each algorithm is to choose the next-best option available (this is why they're called "greedy"). In the case of Kruskal's algorithm, we choose the next-lightest edge that doesn't create a cycle. In Dijkstra's algorithm, we choose the next-shortest path that uses only vertices we're done with. Despite their simplistic approach, these algorithms succeed in finding the best solution. For the travelling salesman problem, you might also try a greedy approach - for example, you might follow a path that always goes to the closest unused node. However, this greedy approach does not work in general - it is not guaranteed to find the best path. Here's an example that shows the path that a greedy approach would choose, versus the true shortest path:
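The same comparison can be sketched in code. This is only an illustration, not the example from the original figure: the four cities are made up and deliberately placed on a line so the arithmetic is easy to check by hand, and the function names (tour_length, greedy_tour, optimal_tour) are hypothetical helpers. It assumes Python 3.8 or later for math.dist.

```python
import math
from itertools import permutations


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))


def greedy_tour(points, start=0):
    """Nearest-neighbour heuristic: always go to the closest unused city."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nearest = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nearest)
        unvisited.remove(nearest)
    return order


def optimal_tour(points, start=0):
    """Exhaustive search over every ordering of the other cities."""
    others = [i for i in range(len(points)) if i != start]
    candidates = ([start, *perm] for perm in permutations(others))
    return min(candidates, key=lambda order: tour_length(points, order))


# Made-up cities (an assumption for illustration only).
cities = [(0, 0), (1, 0), (-2, 0), (5, 0)]
greedy_length = tour_length(cities, greedy_tour(cities))
optimal_length = tour_length(cities, optimal_tour(cities))
print(greedy_length, optimal_length)  # prints 16.0 14.0
```

Starting from (0, 0), the greedy route hops to (1, 0), doubles back to (-2, 0), and then has to cross the whole line to reach (5, 0), for a total of 16. The exhaustive search finds a route of length 14.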

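This problem of finding the true shortest route turns out to be hard. In fact, it belongs to a class of problems known as NP-complete. (Strictly speaking, it is the yes/no version of the problem, "is there a route shorter than some given length?", that is NP-complete; the version that asks for the shortest route is called NP-hard.) NP-complete problems can be thought of as the hardest problems in NP - no known polynomial-time algorithm solves any of them. However, neither has it been proved that no polynomial-time algorithm exists. Whether or not such an algorithm exists is an important and famous unsolved problem in computer science.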

The most direct exact solution to the travelling salesman problem is the brute-force approach. This means trying every possible route. Let's figure out how many routes need to be tried if there are n cities. Assume that the start city is fixed. There are n-1 choices for the 2nd city, n-2 choices for the 3rd city, etc. Thus, there are (n-1)·(n-2)·...·3·2·1 = (n-1)! possible routes. Assuming that we calculate the distance between each pair of cities in advance, each route requires about n additions. This means we need about n·(n-1)! = n! additions to solve the problem. In other words, this algorithm is O(n!). Assuming that we can do 500 million floating point operations per second, the following table shows how long it will take to solve the travelling salesman problem for various numbers of cities.

 n    # of routes    # of additions    time to compute
 3    2              6                 12 nanoseconds
 4    6              24                48 nanoseconds
 5    24             120               240 nanoseconds
 6    120            720               1.4 microseconds
 7    720            5040              10 microseconds
 8    5040           40320             81 microseconds
 9    40320          362880            726 microseconds
10    362880         3628800           7 milliseconds
11    3.6 million    40 million        80 milliseconds
12    40 million     480 million       0.96 seconds
13    480 million    6.2 billion       12.5 seconds
14    6.2 billion    87 billion        3 minutes
15    87 billion     1.3 trillion      44 minutes
20    1.22 × 10^17   2.4 × 10^18       154 years
25    6.2 × 10^23    1.6 × 10^25       984000 millennia

Here's a web site with a Java applet that does a brute force (exhaustive) search for all the routes in the travelling salesman problem and finds the true optimum route: Travelling Salesman brute force (exhaustive, exact) solution
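The figures in the table can be reproduced with a few lines of code. This is a minimal sketch under the same assumptions as the text (start city fixed, n additions per route, 500 million additions per second); the name brute_force_cost is a hypothetical helper, not part of any library.

```python
import math

ADDITIONS_PER_SECOND = 500_000_000  # assumed machine speed, as in the text


def brute_force_cost(n):
    """Return (routes, additions, seconds) for an exhaustive search on n cities."""
    routes = math.factorial(n - 1)   # start city fixed, so (n-1)! orderings
    additions = n * routes           # about n additions per route, n! in total
    seconds = additions / ADDITIONS_PER_SECOND
    return routes, additions, seconds


for n in (3, 10, 15, 20, 25):
    routes, additions, seconds = brute_force_cost(n)
    print(f"n={n:2d}  routes={routes:.3g}  additions={additions:.3g}  "
          f"time={seconds:.3g} s")
```

The times are printed in seconds; converting to minutes, years, or millennia gives the units shown in the table.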

Actually, there are ways to improve on the discouraging numbers shown above. The exhaustive search can be made more efficient by recognizing poor solutions before checking them in full, thus reducing the amount of computation needed. The best exact solutions so far are of exponential order in the number of nodes. This is a significant improvement over factorial order, and it makes the solution accessible for larger numbers of nodes. However, the problem is still intractable past about 150 nodes. Here is a web site that has an excellent exhaustive search solution to TSP: TSP exact solution - very efficient
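The text does not say which exponential-order method it has in mind. One well-known exact method with exponential rather than factorial cost is the Held-Karp dynamic program, which takes on the order of n^2 · 2^n steps. The sketch below is offered only as an illustration of that technique, not as the method used by the site mentioned above. The function name held_karp and the distance-matrix input format are assumptions.

```python
import math
from itertools import combinations


def held_karp(dist):
    """Length of the shortest closed tour, given a square distance matrix.

    best[(mask, j)] is the length of the shortest path that starts at
    city 0, visits exactly the cities whose bits are set in mask, and
    ends at city j (city 0 itself is never included in mask).
    """
    n = len(dist)
    if n < 2:
        return 0.0
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)  # same subset without the end city j
                best[(mask, j)] = min(best[(prev, k)] + dist[k][j]
                                      for k in subset if k != j)
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Made-up example: the corners of a 3-by-4 rectangle, listed out of order.
corners = [(0, 0), (3, 4), (3, 0), (0, 4)]
matrix = [[math.dist(p, q) for q in corners] for p in corners]
print(held_karp(matrix))  # prints 14.0, the perimeter of the rectangle
```

The table of subproblems grows as 2^n, so memory rather than time tends to be the limit for this approach.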

When you're doing an exhaustive search, you must in principle try every permutation of nodes. However, sometimes after listing just the first few nodes in the path, you can already tell that no route beginning that way can be better than your best so far. One way to do this is to add the lengths of the edges between those first few nodes, then get a lower bound for the path through the remaining nodes. If the sum of the start of the path and the lower bound for the rest of the nodes is at least as large as your best so far, then you can omit testing ALL of the paths that begin with those first few nodes. This is sometimes called branch-and-bound, and it can drastically reduce the number of paths you have to check.

To find a lower bound for a path through the remaining nodes, you can find a minimum spanning tree. Since any path through the remaining nodes connects them all, it is itself a spanning tree, so its length can't be less than that of the minimum spanning tree (taking into account also the two edges that join the unused nodes to the two ends of the path so far). You know how to find the MST already - use Kruskal's algorithm.

This technique can be further improved by finding a really excellent route before you even begin. The length of this first route is the one to beat - the better it is, the more paths you'll avoid checking. Therefore, in order to find the absolute optimum route, it is helpful to start by finding a very, very good one.
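The branch-and-bound idea described above can be sketched as follows. This is one possible arrangement, not a definitive implementation: the names mst_weight, tour_length and branch_and_bound are hypothetical, the input is assumed to be a symmetric distance matrix, and the lower bound is the MST of the unused nodes (via Kruskal's algorithm) plus the cheapest edge from each end of the partial path into the unused nodes.

```python
import math
from itertools import combinations


def mst_weight(nodes, dist):
    """Total weight of a minimum spanning tree on nodes (Kruskal's algorithm)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the chain as we walk up
            v = parent[v]
        return v

    total = 0.0
    for weight, a, b in sorted((dist[a][b], a, b)
                               for a, b in combinations(nodes, 2)):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            total += weight
    return total


def tour_length(dist, route):
    """Length of the closed tour that visits the cities in route order."""
    n = len(route)
    return sum(dist[route[i]][route[(i + 1) % n]] for i in range(n))


def branch_and_bound(dist, initial_route=None):
    """Return (route, length) of a shortest tour starting at city 0."""
    n = len(dist)
    # The route to beat: a caller-supplied good route, else the input order.
    best_route = list(initial_route) if initial_route is not None else list(range(n))
    best_length = tour_length(dist, best_route)

    def extend(path, length, remaining):
        nonlocal best_route, best_length
        last, start = path[-1], path[0]
        if not remaining:
            total = length + dist[last][start]
            if total < best_length:
                best_route, best_length = path[:], total
            return
        # Lower bound on any tour that begins with this partial path.
        bound = (length
                 + min(dist[last][r] for r in remaining)
                 + mst_weight(remaining, dist)
                 + min(dist[r][start] for r in remaining))
        if bound >= best_length:
            return  # prune: nothing beginning this way can beat the best so far
        for r in sorted(remaining, key=lambda r: dist[last][r]):
            path.append(r)
            extend(path, length + dist[last][r], remaining - {r})
            path.pop()

    extend([0], 0.0, frozenset(range(1, n)))
    return best_route, best_length


# Made-up example: the corners of a 3-by-4 rectangle, listed out of order.
corners = [(0, 0), (3, 4), (3, 0), (0, 4)]
matrix = [[math.dist(p, q) for q in corners] for p in corners]
route, length = branch_and_bound(matrix)
print(route, length)
```

Passing a good starting route as initial_route (for example one produced by a heuristic) tightens the pruning test from the very first call, which is the improvement the last paragraph above describes.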

Because the computational complexity of the exhaustive search is exponential, for many real-world applications people are willing to give up finding the very best solution in exchange for a pretty good solution that can be found quickly. Many algorithms have been developed to do this. These are sometimes known as heuristic algorithms. Examples include the techniques of simulated annealing, genetic algorithms, memetic algorithms, neural networks, dynamic programming, clustering techniques, and others. These techniques can also be used in combination, and each has a wide range of applications other than TSP.

Later this quarter, we'll talk about the genetic algorithm solution to the travelling salesman problem.