Implementation Notes

Sketch

I won't describe the implementation in detail here, but you can look in an AI book or at another web page that has a description. See the references section for a list of books. Essentially, there is a set of nodes (map locations and a cost) called OPEN and another set called CLOSED. Each time through the main loop, you pick out the best element from OPEN (where "best" means "the one with the lowest cost"), and you look at its neighbors. You then put any unvisited neighbors into the OPEN set. The cost of a node is the sum of the current cost of walking from the start to that node and the heuristic estimate of the cost from that node to the goal.
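
As a rough sketch (illustrative only, not my actual path.cpp code), the loop above might look like this in C++ on a small hypothetical 5x5 grid with unit step costs and a Manhattan-distance heuristic; to keep it short, it re-inserts improved nodes into OPEN instead of maintaining an explicit CLOSED set:

```cpp
#include <climits>
#include <cstdlib>
#include <queue>
#include <vector>

const int W = 5, H = 5;  // hypothetical small grid, all cells walkable

struct Node { int x, y, f; };
struct Cmp {  // min-heap: lowest f ("best") comes out first
    bool operator()(const Node& a, const Node& b) const { return a.f > b.f; }
};

int heuristic(int x, int y, int gx, int gy) {
    return abs(gx - x) + abs(gy - y);  // Manhattan distance
}

// Returns the cost of the cheapest path from (sx,sy) to (gx,gy), or -1.
int astar(int sx, int sy, int gx, int gy) {
    std::vector<int> g(W * H, INT_MAX);  // best known cost to each cell
    std::priority_queue<Node, std::vector<Node>, Cmp> open;
    g[sy * W + sx] = 0;
    open.push({sx, sy, heuristic(sx, sy, gx, gy)});
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        Node n = open.top(); open.pop();      // best element of OPEN
        if (n.x == gx && n.y == gy) return g[n.y * W + n.x];
        for (int d = 0; d < 4; d++) {         // look at its neighbors
            int nx = n.x + dx[d], ny = n.y + dy[d];
            if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
            int ng = g[n.y * W + n.x] + 1;    // cost so far + one step
            if (ng < g[ny * W + nx]) {        // found a cheaper way there
                g[ny * W + nx] = ng;
                open.push({nx, ny, ng + heuristic(nx, ny, gx, gy)});
            }
        }
    }
    return -1;  // no path
}
```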

Source code

My own C++ A* code is available: path.cpp and path.h. There is also an older (slower, but easier to understand) version of my code, as well as many other implementations of A*, on Steve Woodcock's Games AI Page.

Performance tips

A lot of people don't know why their A* implementation is so slow, and ask on rec.games.programmer or comp.ai.games. The most common reasons are:

Bad heuristic

For example, on a grid map, you should have a heuristic that reflects the actual movement costs along normal directions and diagonals. If your game doesn't allow diagonal movement, you probably want abs(dx)+abs(dy). If it does, you probably want max(abs(dx),abs(dy)).
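
In code, the two grid heuristics look like this (a sketch; dx and dy are the coordinate differences between the node and the goal):

```cpp
#include <algorithm>
#include <cstdlib>

// 4-way movement (no diagonals): Manhattan distance
int heuristic_manhattan(int dx, int dy) {
    return abs(dx) + abs(dy);
}

// 8-way movement, with a diagonal step costing the same as a
// straight step: Chebyshev (diagonal) distance
int heuristic_diagonal(int dx, int dy) {
    return std::max(abs(dx), abs(dy));
}
```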

This may seem counterintuitive -- it seems most natural to use the "true" distance as the heuristic. A*, however, compares g values (actual movement costs) to h values (heuristic estimates). If the two don't match, or if they aren't at the same scale, A* may give bad paths or may search too much.

The problem with this heuristic is that it gives you paths that look bad, even if they aren't really bad! To remedy this, add a small penalty (around 1% of the heuristic) when the node isn't near the straight-line path from the start to the goal. Also see the Heuristics section of this document for more information.

Slow priority queue

What's the first thing you'll think of using for these two sets? If you're like me, you probably thought "array". You may have thought "linked list", too. The problem is that finding the best node out of an array or list is slow. Lists and arrays are very general. Usually, more specific data structures can outperform general data structures. For A*, we do not need the flexibility of a list (fast insertion and deletion anywhere), nor do we need the flexibility of an array (fast access to any element). Instead, we want two operations: insert and delete best. (Actually, we need a third; see the note below.)

Heaps

For these operations, what you really should have is a priority queue. These can be implemented in a variety of ways, including balanced binary trees, skip lists, and splay trees. My favorite is with heaps. Heaps are easy to implement, and they are stored in an array, which is pretty efficient. I use STL's efficient implementation of heaps for my code. I get speed and I don't have to code up and test a heap data structure. An example set of numbers: if you have 1,000 nodes in your OPEN set, then inserting into and removing from a sorted list takes an average of 500 comparisons and 2 moves; for an unsorted list it takes 1000 comparisons and 2 moves; for a sorted array it takes 500 comparisons and 500 moves; for a heap it takes 10 comparisons and 10 moves. That's approximately a factor of 100 faster. However, these are worst case values. More realistically, with 1000 nodes in the set, you should expect heaps to be 25% faster than linked lists, and for 1500 nodes in the set, you should expect heaps to be 50% faster. The more nodes you have, the faster the heap will be relative to an array or list.
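
As a simplified sketch (not my actual code), here is what inserting into and removing the best node from OPEN look like with the STL heap operations:

```cpp
#include <algorithm>
#include <vector>

struct OpenNode { int f; int location; };

// Comparator for a MIN-heap: std::push_heap/std::pop_heap build a
// max-heap, so "greater f" must compare as "lower priority".
bool worse(const OpenNode& a, const OpenNode& b) { return a.f > b.f; }

void insert_node(std::vector<OpenNode>& open, OpenNode n) {
    open.push_back(n);
    std::push_heap(open.begin(), open.end(), worse);  // O(log N)
}

OpenNode remove_best(std::vector<OpenNode>& open) {
    std::pop_heap(open.begin(), open.end(), worse);   // best moves to back
    OpenNode best = open.back();
    open.pop_back();
    return best;
}
```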

It is important to keep in mind that we are not merely looking for asymptotic ("big O") behavior; we also want a low constant factor. To see why, consider an algorithm that is O(log N) and another that is O(N), where N is the number of elements in the heap. It may be that on your machine, an implementation of the first algorithm takes 10,000 * log(N) seconds, while an implementation of the second one takes 2 * N seconds. For N = 256, the first would take 80,000 seconds and the second would take 512 seconds. The "faster" algorithm takes more time in this case, and would only start to be faster when N exceeds roughly 80,000, where the two curves cross.

You cannot merely compare two algorithms; you should also compare the implementations of those algorithms. You also have to know what size your data might be. In the above example, the first implementation is faster for N above roughly 80,000, but if in your game N stays under 30,000, then the second implementation would have been better.

A friend of mine (who does research in data structures for shortest-path algorithms) says that heaps are good unless you have more than 10,000 elements in your heap. Unless your game's map is extremely large, you probably don't need a more complicated data structure (like multi-level buckets). You should probably stay away from Fibonacci heaps, which have good asymptotic performance but have slow implementations unless N is huge.

Important Note for anyone using heaps: Unlike Dijkstra's Algorithm, A* requires that you determine whether the new node you are examining is already in the OPEN set; if it is, whether it is better than the one in the OPEN set; and if it is better, then the one in the OPEN set must be adjusted. The problem is that when using heaps, it is difficult to find a particular node in the heap -- it requires a search through the entire heap, which is slow. To avoid this expensive search, my A* code has a "Marking" array that answers the two questions above cheaply. To answer the first question (whether something is in the OPEN set), the Marking array keeps track of which map locations are in the OPEN and CLOSED sets. To answer the second question (whether the new node is better than the old one), the Marking array keeps track of the g value at each node. Only if both questions are answered "yes" does my code invoke Find (which I call find_in_open). Experiments suggest that for most searches in my game, one of the two questions is answered "no" most of the time, so almost all the calls to Find are eliminated when using the Marking array.
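
A simplified sketch of the Marking array check (the names, the 25-entry array, and the integer location ids are illustrative, not my actual code):

```cpp
#include <climits>
#include <vector>

enum Mark { UNVISITED, IN_OPEN, IN_CLOSED };

struct MarkEntry {
    Mark mark = UNVISITED;
    int g = INT_MAX;  // best g value seen so far for this map location
};

// One entry per map location, indexed by location id.
std::vector<MarkEntry> marking(25);

// Returns true only when the expensive find_in_open search is really
// required: the location is already in OPEN (question 1) AND the new
// g value is better than the recorded one (question 2).
bool must_search_heap(int location, int new_g) {
    const MarkEntry& e = marking[location];
    if (e.mark != IN_OPEN) return false;  // question 1 answered "no"
    if (new_g >= e.g) return false;       // question 2 answered "no"
    return true;                          // both "yes": call find_in_open
}
```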

Splay Trees

Heaps are a tree-based structure with expected O(log N) time operations. However, the problem is that with A*, the common behavior is that a low cost node is removed (causing O(log N) behavior, since values have to move up from the very bottom of the tree) followed by low cost nodes being added (causing O(log N) behavior, since these values are added at the bottom and bubble up to the very top). Here, the expected case behavior of heaps is equivalent to the worst case behavior. We may be able to do better if we find a data structure where the expected case is better, even if the worst case is no better.

Splay trees are a self-adjusting tree structure. Any access to a node in the tree tends to bring that node up to the top. The result is a "caching" effect, where rarely used nodes go to the bottom and don't slow down operations. It doesn't matter how big your splay tree is, because your operations are only as slow as your "cache size". In A*, the low cost nodes are used a lot, and the high cost nodes aren't used for a long time, so those high cost nodes can move to the bottom of the tree.

HOT Queues

There's another good data structure that may be better than heaps. Often, you can restrict the range of values that would be in the priority queue. Given a restricted range, there are often better algorithms. For example, sorting can be done on arbitrary values in O(N log N) time, but when there is a fixed range it can be done in O(N) time.

We can use HOT Queues (Heap On Top queues) to take advantage of a particular property of A* and Dijkstra's algorithms: when we are removing nodes with value f, the nodes we insert have value f + delta, where 0 <= delta <= C. (Note that delta >= 0 holds only if your heuristic is consistent; this technique is not as useful if your heuristic is inconsistent.) The constant C is the maximum change in cost from one point to an adjacent point. We can therefore keep "buckets" that store some subset of the values (just as in the O(N) sorting algorithms), and in the most important bucket we can use a heap. For example, if there are ten buckets, then the heap only contains (on average) one tenth of the values, so it runs faster than using a heap for all of them. In addition, we do not organize the other nine tenths of the elements unless they are actually needed.

In A*, we often put nodes into OPEN that we never actually need. HOT Queues are a big win because the elements that are not needed are inserted in O(1) time. Only elements that are needed get heapified (which is not too expensive). The only operation that is more than O(1) is node deletion from the heap, which is O(log N) but remember that the queue is small, so it's not as bad as you'd expect.

In addition, if C is small, we do not even need a heap for the smallest bucket, so insertion and deletion are both O(1) time! (Compare this to heaps, in which insertion and deletion take O(log N) time.) One person reported that HOT queues are as fast as heaps when there are at most 800 nodes in the OPEN set, and are 20% faster when there are at most 1500 nodes. I would expect that HOT queues get faster relative to heaps as the number of nodes increases.
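
The bucketing idea can be sketched like this (a toy version, not a full HOT queue: the f range is assumed bounded, and for brevity the lowest bucket is scanned with min_element instead of being kept as a heap):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical parameters: f values assumed to lie in [0, WIDTH * NBUCKETS).
const int WIDTH = 10, NBUCKETS = 16;

struct HotQueueSketch {
    std::vector<int> bucket[NBUCKETS];

    void insert(int f) {                // O(1): drop into the right bucket
        bucket[f / WIDTH].push_back(f);
    }

    // Only the lowest non-empty bucket is ever organized; the other
    // buckets stay as unsorted piles until the search reaches them.
    int remove_best() {
        for (int b = 0; b < NBUCKETS; b++) {
            if (!bucket[b].empty()) {
                auto it = std::min_element(bucket[b].begin(), bucket[b].end());
                int best = *it;
                bucket[b].erase(it);
                return best;
            }
        }
        return -1;  // queue is empty
    }
};
```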

Variations of A*

Beam search

In the main A* loop, the OPEN set stores all the nodes that may need to be searched to find a path. The Beam Search is a variation of A* that places a limit on the size of the OPEN set. If the set becomes too large, the node with the worst chances of giving a good path is dropped. One drawback is that you have to keep your set sorted to do this; a priority queue implementation may not work well. (HOT queues would work, though.) On the other hand, sorting it is not much slower than a priority queue, and it may help so much to be able to drop nodes that it pays for the extra overhead of sorting. (Note that you don't do a full sort every time you add a node to the list; you only need to place the new node in the right place, which can be done in logarithmic time. Inserting into a priority queue is also logarithmic. It's just that priority queues can have a lower constant factor associated with the costs.)
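
The bounded, sorted OPEN list a beam search needs can be sketched like this (int-valued f keys for brevity):

```cpp
#include <algorithm>
#include <vector>

// Keep OPEN sorted (best f at the front) and capped at beam_width:
// the binary search is O(log N), but the vector insert is O(N) moves.
void beam_insert(std::vector<int>& open, int f, size_t beam_width) {
    open.insert(std::lower_bound(open.begin(), open.end(), f), f);
    if (open.size() > beam_width)
        open.pop_back();  // drop the node with the worst chances
}
```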

Iterative deepening

Iterative Deepening is used in many AI algorithms to start with an approximate answer, then make it more accurate. The name comes from game tree searches, where you look some number of moves ahead (for example, in Chess). You can try to deepen the tree by looking ahead more moves. Once your answer doesn't change or improve much, you assume that you have a pretty good answer, and it won't improve when you try to make it more accurate again. In ID-A*, the "depth" is a cutoff for f values. When the f value is too large, the node won't even be considered (i.e., it won't be added to the OPEN set). The first time through you process very few nodes. On each subsequent pass, you increase the number of nodes you visit. If you find that the path improves, then you continue to increase the cutoff; otherwise, you can stop. For more details, read these lecture notes on ID-A*.
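
A minimal sketch of the cutoff mechanics, on a hypothetical one-dimensional map where you move +/-1 per step and h is the distance to the goal; the next cutoff rises to the smallest f value that was pruned on the previous pass:

```cpp
#include <algorithm>
#include <climits>
#include <cstdlib>

int h(int pos, int goal) { return abs(pos - goal); }

// Depth-first search bounded by the f cutoff. Returns the path cost when
// the goal is reached, or -1; next_cutoff collects the smallest pruned f.
int bounded_dfs(int pos, int goal, int g, int cutoff, int& next_cutoff) {
    int f = g + h(pos, goal);
    if (f > cutoff) {                    // too expensive: don't consider it
        next_cutoff = std::min(next_cutoff, f);
        return -1;
    }
    if (pos == goal) return g;
    for (int step : {1, -1}) {
        int r = bounded_dfs(pos + step, goal, g + 1, cutoff, next_cutoff);
        if (r >= 0) return r;
    }
    return -1;
}

int ida_star(int start, int goal) {
    int cutoff = h(start, goal);         // first pass: very few nodes
    while (true) {
        int next_cutoff = INT_MAX;
        int r = bounded_dfs(start, goal, 0, cutoff, next_cutoff);
        if (r >= 0) return r;
        if (next_cutoff == INT_MAX) return -1;  // nothing left to deepen
        cutoff = next_cutoff;            // subsequent pass: more nodes
    }
}
```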

I personally don't see much need for ID-A* in games. ID algorithms tend to increase computation time while reducing memory requirements. In games, however, the "nodes" are very small -- they are simply coordinates. I don't see a big win from not storing those nodes. As for processing time, although lists, arrays, and heaps are affected by a large number of nodes added to the OPEN set, HOT queues can be used to avoid using any CPU time for handling nodes with high f values.

Dynamic weighting

With dynamic weighting, you assume that at the beginning of your search, it's more important to get (anywhere) quickly; at the end of the search, it's more important to get to the goal.

f(p) = g(p) + w(p) * h(p)

Here, there is a weight (w >= 1) associated with the heuristic. As you get closer to the goal, you decrease the weight; this decreases the importance of the heuristic, and increases the relative importance of the actual cost of the path.
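
As a sketch, with a linear decay of the weight (an illustrative schedule, not a standard one; h_start is the heuristic value at the start node):

```cpp
// f(p) = g(p) + w(p) * h(p), with w shrinking from w_max toward 1
// as the node gets closer to the goal (i.e., as h shrinks toward 0).
double dynamic_f(double g, double h, double h_start, double w_max) {
    double w = 1.0 + (w_max - 1.0) * (h / h_start);  // w in [1, w_max]
    return g + w * h;
}
```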

Bandwidth search

There are two properties about Bandwidth Search that some people may find useful. This variation assumes that h is an overestimate, but that it doesn't overestimate by more than some number e. If this is the case in your search, then the path you get will have a cost that doesn't exceed the best path's cost by more than e. Once again, the better you make your heuristic, the better your solution will be.

Another property is that you can drop some nodes from the OPEN set. Whenever h+d is greater than the true cost of the path (for some d), you can drop any node that has an f value at least e+d higher than the f value of the best node in OPEN. This is a strange property. You have a "band" of good values for f; everything outside this band can be dropped, because there is a guarantee that it will not be on the best path.

Curiously, you can use different heuristics for the two properties, and things still work out. You can use one heuristic to guarantee that your path isn't too bad, and another one to determine what to drop in the OPEN set.

Bidirectional search

Instead of searching from the start to the finish, you can start two searches in parallel -- one from start to finish, and one from finish to start. When they meet, you should have a good path.

This sounds like a good idea, but I don't think it'll give you very much. The idea behind bidirectional searches is that searching results in a `tree' that fans out over the map. A big tree is much worse than two small trees, so it's better to have two small search trees. With A*, however, my experiments suggest that you don't get a tree. You get a path that has nearby map areas explored, but it doesn't fan out like Dijkstra's algorithm. In fact, that's what makes A* so fast -- no matter how long your path is, it doesn't search like crazy, unless the path really is crazy. It tends to search only small areas of the map. Bidirectional search may be more useful if your map is complex.

The front-to-front variation links the two searches together. Instead of choosing the best forward-search node---g(start,x) + h(x,goal)---or the best backward-search node---g(y,goal) + h(start,y)---this algorithm chooses a pair of nodes with the best g(start,x) + h(x,y) + g(y,goal).

The retargeting approach abandons simultaneous searches in the forward and backward directions. Instead, it performs a forward search for a short time, chooses the best forward candidate, and then performs a backward search---not to the starting point, but to that candidate. After a while, it chooses a best backward candidate and performs a forward search from the best forward candidate to the best backward candidate. This process continues until the two candidates are the same point.

Hierarchical A*

Path-finding algorithms tend to be worse than linear: if you double the distance needed to travel, it takes more than twice as long to find the path. You can think of path-finding as searching some area like a circle --- when the circle's diameter doubles, it has four times the area.

One way to reduce the problem is to have multiple levels of searching. For example, to get from your home to a location in another city, you would find a path from your chair to your car, from the car to the street, from the street to a freeway, from the freeway to the edge of the city, from there to the other city, then to a street, to a parking lot, and finally to the door of the destination building. Each of these steps is a search at a different level.

Dividing the problem into levels allows you to ignore most of your choices. When moving from city to city, it is quite tedious to consider every street in every city along the way. Instead, you ignore them all, and only consider freeways. The problem becomes small and manageable, and solving it becomes fast.

You can use multiple levels with graph-searching algorithms such as A*. One paper on this topic is Hierarchical A*: Searching Abstraction Hierarchies Efficiently. You do not need to use the same algorithm at each level. See the section about splines for an example of using a path-finding algorithm at one level (from tile to tile) and a simple spline drawing algorithm at the lower level (from pixel to pixel).

A similar approach is to use varying resolution. First, plot a path with low resolution. As you get closer to a point, refine the path with a higher resolution. This approach can be used with path splicing to avoid moving obstacles.

Dynamic A*

My impression of Dynamic A* (D*) is that it is good for robots but not necessarily for games. D* is intended for use when you don't have complete information. If you don't have all the information, A* can make mistakes; D*'s contribution is that it can correct those mistakes without taking much time. A* assumes you have all the information at the beginning, and then gives you a path to follow. D* does not require complete information at the beginning -- instead, you give it what you know, and it gives you a path. When you learn more, D* will make improvements to the path.

D* solves a different problem than A*, and it never does worse. If you have all the information at the beginning, they produce the same path. If you do not have all the information, they initially produce the same path, but D* then improves it, so its path is better than the one A* produces.

However, D* requires a lot of space -- essentially you run A* and keep around its internal information (OPEN/CLOSED sets, path tree, g values), and then when the map changes, D* tells you whether you need to adjust your path to take the changes into account. For a game with more than one unit, you usually don't want to keep all that information around, so D* can't really be used; it was designed for robotics, where there is just one robot and no need to reuse the memory for some other robot's path. If your game has only one unit or a small number of units, you may want to investigate D*.


Last modified: Sun Nov 18 11:32:17 2001
From Amit's Game Programming Site
Copyright (C) 2000, Amit J. Patel