Often we assume that the nodes (cities) lie in a plane and the weight of each edge is just the (Euclidean) distance between them. We use the complete graph, meaning there's an edge between each pair of nodes. In other words, you may travel directly between any two cities.
The two graph-theory algorithms you've seen so far (Kruskal's algorithm for minimum spanning trees and Dijkstra's algorithm for shortest paths) are both greedy algorithms. The crucial step in each is to choose the next-best option (this is why they're called "greedy"). In Kruskal's algorithm, we choose the next-lightest edge that doesn't create a cycle. In Dijkstra's algorithm, we choose the next-shortest path that uses only vertices we're done with. Despite their simplistic approach, these algorithms succeed in finding the best solution. For the travelling salesman problem, you might try a greedy approach as well: for example, follow a path that always goes to the nearest unused node. However, this greedy approach does not work - it simply does not always find the best path. Here's an example that shows the path that a greedy approach would choose, versus the true shortest path:
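To see the greedy failure concretely, here is a small sketch (not part of the original notes; the coordinates are made up for illustration) that compares the nearest-neighbor greedy tour against the true best tour found by checking every route:

```python
from itertools import permutations
from math import dist

# Made-up example coordinates (all on a line) where greedy is suboptimal.
cities = [(0, 0), (1, 0), (-2, 0), (5, 0)]

def tour_length(order):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

def greedy_tour(start=0):
    """Always travel to the nearest city not yet visited."""
    unused = set(range(len(cities))) - {start}
    tour = [start]
    while unused:
        nxt = min(unused, key=lambda c: dist(cities[tour[-1]], cities[c]))
        tour.append(nxt)
        unused.remove(nxt)
    return tour

# Brute force: try every ordering of the remaining cities.
best = min((list((0,) + p) for p in permutations(range(1, len(cities)))),
           key=tour_length)
print(tour_length(greedy_tour()), tour_length(best))  # prints 16.0 14.0
```

Here greedy goes 0 → 1 → -2 → 5 and back (length 16), while the optimal tour sweeps out to one end before the other (length 14).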
The only known way to guarantee the best route for the travelling salesman problem is the brute-force approach. This means trying every possible route. Let's figure out how many routes need to be tried if there are n cities. Assume that the start city is fixed. There are n-1 choices for the 2nd city, n-2 choices for the 3rd city, and so on. Thus, there are (n-1)·(n-2)·...·3·2·1 = (n-1)! possible routes. Assuming that we calculate the distance between each pair of cities in advance, each route requires n additions, so we need about (n-1)!·n = n! additions to solve the problem. In other words, this algorithm is O(n!). Assuming that we can do 500 million floating-point operations per second, the following table shows how long it will take to solve the travelling salesman problem for various numbers of cities.
n | # of routes | # of additions | time to compute |
---|---|---|---|
3 | 2 | 6 | 12 nanoseconds |
4 | 6 | 24 | 48 nanoseconds |
5 | 24 | 120 | 240 nanoseconds |
6 | 120 | 720 | 1.4 microseconds |
7 | 720 | 5040 | 10 microseconds |
8 | 5040 | 40320 | 81 microseconds |
9 | 40320 | 362880 | 726 microseconds |
10 | 362880 | 3628800 | 7 milliseconds |
11 | 3.6 million | 40 million | 80 milliseconds |
12 | 40 million | 480 million | 0.96 seconds |
13 | 480 million | 6.2 billion | 12.5 seconds |
14 | 6.2 billion | 87 billion | 3 minutes |
15 | 87 billion | 1.3 trillion | 44 minutes |
20 | 1.22 × 10^17 | 2.4 × 10^18 | 154 years |
25 | 6.2 × 10^23 | 1.6 × 10^25 | 984,000 millennia |
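The rows of the table are easy to reproduce. Here is a short sketch (not part of the original notes) that recomputes the route count, the number of additions, and the running time for a few values of n:

```python
from math import factorial

OPS_PER_SEC = 500_000_000  # the 500 million additions per second assumed above

for n in (10, 15, 20, 25):
    routes = factorial(n - 1)   # (n-1)! possible routes
    additions = n * routes      # n additions per route, i.e. n! in total
    seconds = additions / OPS_PER_SEC
    print(f"n={n:2d}  routes={routes:.2e}  additions={additions:.2e}  "
          f"time={seconds:.2e} s")
```

For n = 20 this works out to about 4.9 × 10^9 seconds, which is roughly 154 years, matching the table.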
Actually, there are ways to improve on the discouraging numbers shown above. The brute-force solution can be made more efficient by recognizing poor solutions before checking them, thus reducing the amount of computation needed. The best exhaustive-search solutions so far run in exponential time in the number of nodes. This is a significant improvement over factorial time, which makes the solution accessible for larger numbers of nodes. However, the problem is still intractable past about 150 nodes. Here is a web site that has an excellent exhaustive-search solution to the TSP: TSP exact solution - very efficient
When you're doing an exhaustive search, you must try every permutation of nodes. However, sometimes after listing just the first few nodes in the path, you can already tell that it can't be better than your best so far. One way to do this is to add up the lengths of the edges between those first few nodes, then compute a lower bound for the path through the remaining nodes. If the sum of the start of the path and the lower bound for the rest of the nodes is larger than your best so far, then you can skip testing ALL of the paths that begin with those first few nodes. This is sometimes called branch-and-bound, and it can drastically reduce the number of paths you have to check. To find a lower bound for a path through the remaining nodes, you can find a minimum spanning tree. Since any path through the remaining nodes must connect them, the sum of its edge lengths can't be less than the weight of the minimum spanning tree (taking into account the two edges that connect the unused nodes to the two ends of the partial path). You already know how to find the MST - use Kruskal's algorithm. This technique can be further improved by finding a really excellent route before you even begin. The length of this first route is the one to beat - the better it is, the more paths you'll avoid checking. Therefore, in order to find the absolute optimum route, it is helpful to start by finding a very good one.
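The branch-and-bound idea above can be sketched in code. This is an illustrative implementation (the function names are made up, and it is not tuned for speed): it extends a partial path one city at a time, and prunes whenever the path length plus the MST-based lower bound can't beat the best complete tour found so far.

```python
from itertools import combinations
from math import dist

def mst_weight(points):
    """Kruskal's algorithm: weight of the minimum spanning tree of points."""
    n = len(points)
    if n < 2:
        return 0.0
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    total = 0.0
    # Consider edges from lightest to heaviest, skipping any that form a cycle.
    edges = sorted((dist(points[a], points[b]), a, b)
                   for a, b in combinations(range(n), 2))
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

def tsp_branch_and_bound(cities):
    """Exact TSP by depth-first search, pruning any partial path whose
    length plus an MST lower bound already meets or exceeds the best tour."""
    n = len(cities)
    best = [float('inf')]

    def search(path, length, remaining):
        if not remaining:
            best[0] = min(best[0],
                          length + dist(cities[path[-1]], cities[path[0]]))
            return
        # Lower bound: MST of the unvisited cities, plus the cheapest edges
        # connecting the two open ends of the path into that set.
        rem_pts = [cities[i] for i in remaining]
        bound = (length + mst_weight(rem_pts)
                 + min(dist(cities[path[-1]], p) for p in rem_pts)
                 + min(dist(cities[path[0]], p) for p in rem_pts))
        if bound >= best[0]:
            return  # no completion of this prefix can beat the best so far
        # Trying near cities first tends to find a good tour early.
        for c in sorted(remaining, key=lambda c: dist(cities[path[-1]], cities[c])):
            search(path + [c], length + dist(cities[path[-1]], cities[c]),
                   remaining - {c})

    search([0], 0.0, set(range(1, n)))
    return best[0]
```

Note the ordering trick in the loop: visiting the nearest remaining city first tends to produce a good complete tour early, which in turn makes the bound prune more aggressively - exactly the "start with a very good route" advice above.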
Because the computational complexity of the exhaustive search is exponential, for many real-world applications people are willing to give up finding the very best solution in exchange for a pretty good solution that can be found quickly. Many algorithms have been developed to do this. These are sometimes known as heuristic algorithms. Examples include the techniques of simulated annealing, genetic algorithms, memetic algorithms, neural networks, dynamic programming, clustering techniques, and others. These techniques can also be used in combination, and each has a wide range of applications other than the TSP.
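As a taste of the heuristic approach, here is a minimal simulated-annealing sketch (the function name and parameter values are made up for illustration). It repeatedly reverses a random segment of the tour, always accepting improvements and sometimes accepting worse tours with probability exp(-delta/T), where the "temperature" T slowly cools:

```python
import random
from math import dist, exp

def anneal_tsp(cities, iters=20000, t0=1.0, cooling=0.9995, seed=0):
    """Simulated annealing sketch for TSP: reverse random tour segments,
    accepting worse tours with probability exp(-delta / temperature)."""
    rng = random.Random(seed)
    n = len(cities)
    tour = list(range(n))

    def length(t):
        return sum(dist(cities[a], cities[b]) for a, b in zip(t, t[1:] + t[:1]))

    cur = length(tour)
    best, best_len = tour[:], cur
    temp = t0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt style move
        delta = length(cand) - cur
        # Always accept improvements; accept worse moves with cooling probability.
        if delta < 0 or rng.random() < exp(-delta / temp):
            tour, cur = cand, cur + delta
            if cur < best_len:
                best, best_len = tour[:], cur
        temp *= cooling
    return best, best_len
```

Unlike the branch-and-bound search, this gives no guarantee of optimality - but it runs in time proportional to the number of iterations rather than growing exponentially with the number of cities.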
Later this quarter, we'll talk about the genetic-algorithm solution to the travelling salesman problem.