Posts Tagged ‘planning’

AI Architectures: A Culinary Guide (GDMag Article)

Thursday, November 1st, 2012

(This is a slightly edited copy of the feature article I wrote for Game Developer Magazine that appeared in their August, 2012 issue.)

A question that seems to come up regularly from students, designers, and even from veteran programmers is “what AI architecture should I use?” It’s a query that is sprinkled across internet message boards, comes up at GDC dinners, and I’m sure is a frequent topic in pre-production meetings at game studios – indie and AAA alike. As a game AI consultant, I certainly field it from my clients on a regular basis. More often than not, the less-than-satisfying answer to this question is a resounding, “It depends.” What follows resembles something more along the lines of an interrogation than an answer.

It puts me in mind of the days long ago when I was a waiter. (I suppose they call them “servers” now.) People would often ask me, “what do you recommend?” or the even vaguer, “what’s good here?” For those that haven’t had the opportunity to work in the restaurant business, you may not realize that this can be a horribly uncomfortable question to be presented with. After all, most people are well aware that tastes differ… so why would a waiter (er, server) be able to ascertain what it is you have a proverbial hankerin’ for? The way out of this situation would be to ask return questions. “Well, how hungry are you? Are you in a hurry? What are you in the mood for? Steak? Chicken? Allergic to peanuts? Oh… you’re vegan? And you need gluten-free, eh? Well that certainly redirects things.” The result of this Q&A is not that I tell them what they want, but rather that I help them discover for themselves what it is they really want.

Much the same falls out of the conversations surrounding AI architectures. After all, there isn’t necessarily one best way of doing things. Often, as I mentioned before, it simply depends. What is it you are trying to do? What are your technical limitations? What is your experience? What is your time frame? How much authorial control do your designers need? Really, it is something that you as the developer need to ascertain for yourself. As a waiter, I can only help you ask the right questions of yourself and point you in the right direction once we have the answers.

Unfortunately, many of the available books, articles, and web sites only tell you how the different architectures work. They often don’t tell you the pros and cons of each. That often leads to misguided enthusiasts proclaiming with exuberant confidence, “I will create my [incredible AI] with a [not even remotely appropriate technology]!!” Don’t get me wrong… those resources are often very good at telling you how to work on a technical level with the different methods. What is missing is why you would do so.

At the 2010 GDC AI Summit, we presented a panel-based session entitled “Deciding on an AI Architecture: Which Tool for the Job?” The premise of the panel pitted four different technologies against each other to see which of them was most appropriate to use in four different game AI situations. Each architecture was represented by a person whose job it was to argue for their own and against the others. (Spoiler alert: the four highly-contrived game situations were specifically chosen so that each architecture was shown to have a strong point.) Even with the pre-written outcomes designed to steer toward a “best” answer, the three “losing” panelists managed to make the point that their method could work but simply was not the best tool for that job.

But that’s part of the problem, isn’t it? Certainly most AI architectures could manage to muddle through most any AI situation. However, that often leads people to a false sense of security. AI is often complicated enough that some amount of hair-pulling is expected. This often obscures the fact that an inappropriate—or, shall we say, “less than optimal”—method is making things more difficult than it should be. All of this brings us back to our initial question—“what AI architecture should I use?” And, once again, I respond, “it depends.”

Same Stuff—Different Shapes

It’s the same stuff… just different shapes.

If I may tap once again into my food metaphor, selecting an AI architecture is often like selecting Mexican food—for most intents and purposes, it’s the same stuff… just different shapes. Allow me to expound. The contents of most Mexican food can be reduced to some combination of a fairly succinct list of possibilities: tomato, cheese, beans, lettuce, onions… you know the drill. Additionally, the non-vegan among us often elect to include some form of meat (real or faux). Think of these ingredients as the behavior content of our AI. This is the stuff that gives all the flavor and “substance” to the dish. But what about the outside? Often we order our Mexican food in terms of its shape or form—a taco… a burrito… etc. Why do we think in terms of the outside form when it is the stuff that is on the inside that is pretty much the point of the order in the first place?

The answer lies in the fact that the outside—that is, the shell or wrapper of some sort—is merely a content delivery mechanism. For the most part, it exists only as a way of keeping those internals together long enough for consumption. In this way, these “delivery mechanisms” compare to AI architectures. They only serve to package and deliver the tasty behavioral content that we want our players to experience. But why are there so many different forms for such a utilitarian function? And which form do I use? Well, it depends.

The same holds true when we talk about our AI systems. We often speak in terms of the mechanism rather than the content. We are writing a finite state machine, a behavior tree, or a planner, just like we are filling a taco, a burrito, or an enchilada. Once we have decided on that packaging, we can put any (or all!) of the tasty stuff into it that we want to. However… there are pros and cons to each.

The Tostada—Just a Pile of Stuff

Starting simple, let’s look at the tostada. It is, quite literally, about as simple a delivery platform as you can use for your Mexican food. Everything just sits on top of it right where you can see it. You can add and remove ingredients with ease. Of course, you are somewhat limited on how much you can put on since eventually it will start to fall off the sides. Although it is a little difficult to grab initially, if you pick it up correctly, everything stays put. However, it’s not a terribly stable platform. If you tip it in the slightest, you run the risk of sending things tumbling. What’s more, as soon as you start biting into it, you run the risk of having the whole thing break in unpredictable ways at which point, your entire pile of content falls apart. For all intents and purposes, the tostada really isn’t a container at all once you start using it—it’s just a hard flat object that you pile stuff on top of and hope it stays put.

You never know when the entire platform is going to simply fall apart.

From an AI standpoint, the equivalent isn’t even really an architecture. This would be similar to simply adding rules here or there around our code that change the direction of things in a fairly haphazard manner. Obviously, the problem with that is, like the tostada, you can only get so much content before things become unstable. It’s a bit unwieldy as well. Most importantly, every time you take a bite of your content, you never know when the entire platform is going to simply fall apart.

Adding a Little Structure—The State Machine Taco

Our tostada suffered from not having enough structure to hold the content stable before it started to fall off—or fall apart entirely. However, by just being a little more organized about how we arrange things, we can make sure our content is a lot more self-contained. In the Mexican food world, we can simply bend our tostada shell into a curve and, behold, it becomes a taco! The result is that we can not only hold a lot more content, but we can also pick it up, manipulate it, and move it around to where we can get some good use out of it.

Adding a little bit of structure to a bunch of otherwise disjointed rules maps over somewhat to the most basic of AI architectures—the finite state machine (FSM). The most basic part of a FSM is a state. That is, an AI agent is doing or being something at a given point in time. It is said to be “in” a state. Theoretically, an agent can only be in one state at a time. (This is only partially correct because more advanced agents can run multiple FSMs in parallel… never mind that for now. Suffice to say, each FSM can only be in one state.)

The reason this organizes the agent behavior better is because everything the agent needs to know about what it is doing is contained in the code for the state that it is in. The animations it needs to play to act out a certain state, for example, are listed in the body of that state. The other part of the state machine is the logic for what to do next. This may involve switching to another state or even simply continuing to stay in the current one.

The transition logic in any given state may be as simple or as complex as needed. For example, it may simply involve a countdown timer that says to switch to a new state after a designated amount of time. It may be a simple random chance that a new state should be entered. For example, State A might say that, every time we check, there is a 10% chance of transitioning to State B. We could even elect to make the new state that we will transition to a result of a random selection as well—say a 1/3 chance of State B and a 2/3 chance of State C.

More commonly, state machines employ elaborate trigger mechanisms that involve the game logic and situation. For instance our “guard” state may have the logic, “if [the player enters the room] and [is holding the Taco of Power] and [I have the Salsa of Smiting], then attack the player” at which point my state changes from “guard” to “attack”. Note the three individual criteria in the statement. We could certainly have a different statement that says, “if [the player enters the room] and [is holding the Taco of Power] and [I DO NOT have the Salsa of Smiting], then flee.” Obviously, the result of this is that I would transition from “guard” to “flee” instead.
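For the code-minded, here is roughly what that kind of in-state transition logic looks like. This is a minimal sketch only; the state names and the agent/world query helpers are invented for illustration.

```python
# Minimal FSM sketch. The agent/world helpers (player_in_room, has_salsa_of_smiting,
# play_animation, etc.) are invented for this example.

class GuardState:
    def update(self, agent, world):
        agent.play_animation("stand_guard")
        # The transition logic lives inside the state itself.
        if world.player_in_room and world.player_has_taco_of_power:
            return AttackState() if agent.has_salsa_of_smiting else FleeState()
        return self  # no transition; stay in this state

class AttackState:
    def update(self, agent, world):
        agent.play_animation("attack_player")
        return self

class FleeState:
    def update(self, agent, world):
        agent.play_animation("run_away")
        return self

# Each tick, the agent asks its current state what comes next:
#   agent.state = agent.state.update(agent, world)
```

Note that every new state you add means going back and touching the update logic of the states that can reach it—which is exactly the scaling problem discussed below.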

So each state has the code for what to do while in that state and, more notably, when, if, and what to do next. While some of the criteria can access some of the same external checks, in the end each state has its own set of transition logic that is used solely for that state. Unfortunately, this comes with some drawbacks.

Figure 1 – As we add states to a finite state machine, the number of transitions increases rapidly.

First, as the number of states increases, the number of potential transitions increases as well—at an alarming rate. If you assume for the moment that any given state could potentially transition to any of the other states, the number of transitions increases fairly quickly. Specifically, the number of transitions would be the [number of states] × ([number of states] – 1). In Figure 1, there are 4 states each of which can transition to 3 others for a total of 12 transitions. If we were to add a 5th state, this would increase to 20 transitions. 6 states would merit 30, etc. When you consider that games could potentially have dozens of states transitioning back and forth, you begin to appreciate the complexity.

What really drives that issue home, however, is the realization of the workload that is involved in adding a new state to the mix. In order to have that state accessible, you have to go and touch every single other state that could potentially transition to it. Looking back at Figure 1, if we were to add a State E, we would have to edit states A-D to add the transitions to E. Editing a state’s logic invokes the same problem. You have to remember what other states may be involved and revisit each one.

And the bigger it gets, the more opportunity for disaster.

Additionally, any logic that would be involved in that transition must also be interworked into the other state-specific logic that may already be there. With the sheer numbers of states in which to put the transition logic and the possible complexity of the integration into each one, we realize that our FSM taco suffers from some of the same fragility of the ad hoc tostada we mentioned earlier. Sure, because of its shape, we can pile a little more on and even handle it a little better. One bite, however, could shatter the shell and drop everything into our lap. And the bigger it gets, the more opportunity for disaster.

A Softer Approach—The Behavior Tree

So the problem with the taco is that it can hold a fair bit of content (more than the tostada anyway), but is a bit brittle. It is not the shape of the taco that is at fault as much as it is a result of using the hard shell. If only we could hold the same content delivered in much the same manner, but have our container be less prone to shattering when we put some pressure on it. The answer, of course, is the soft taco. And the analogue in the AI architecture could very well be the behavior tree.

At this point, it is useful to point out the difference between an action and a decision. In the FSM above, our agents were in one state at a time—that is, they were “doing something” at any given moment (even if that something was “doing nothing”). Inside each state was decision logic that told them if they should change to something else and, in fact, what they should change to. That logic often has very little to do with the state that it is contained in and more to do with what is going on outside the state or even outside the agent itself. For example, if I hear a gunshot, it really doesn’t matter what I’m doing at the time—I’m going to flinch, duck for cover, wet myself, or any number of other appropriate responses. Therefore, why would I need to have the decision logic for “React to Gunshot” in each and every other state I could have been in at the time?

Figure 2 – In a behavior tree, the decision logic is separate from the actual state code.

This is the strength of the behavior tree. It separates the states from the decision logic. Both still exist in the AI code, but they are not arranged so that the decision logic is in the actual state code. Instead, the decision logic is removed to a stand-alone architecture (Figure 2). This allows it to run by itself—either continuously or as needed—where it selects what state the agent should be in. All the state code is responsible for is doing things that are specific to that state such as animating, changing values in the world, etc.

The main advantage to this is that all the decision logic is in a single place. We can make it as complicated as we need to without worrying about how to keep it all synchronized between different states. If we add a new behavior, we add the code to call it in one place rather than having to revisit all of the existing states. If we need to edit the transition logic for a particular behavior, we can edit it in one place rather than many.

Figure 3 — A simple behavior tree. At the moment the agent has decided to do a ranged attack.

Another advantage of behavior trees is that there is a far more formal method of building behaviors. Through a collection of tools, templates, and structures, very expressive behaviors can be written—even sequencing behaviors together that are meant to go together (Figure 3). This is one of the reasons that behavior trees have become one of the more “go-to” AI architectures in games, having been notably used in titles ranging from Halo 2 and 3 to Spore.

A detailed explanation of what makes behavior trees work, how they are organized, and how the code is written is beyond the scope of this article. Suffice to say, however, that they are far less prone to breaking their shell and spilling their contents all over your lap. Because the risk of breaking is far less, and the structure is so much more organized, you can also pack in a lot more behavioral content.
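To give a flavor of that formality, though, here is a bare-bones sketch of the two workhorse node types—the selector and the sequence. (This is a sketch only; real implementations add a “running” status for behaviors that span multiple frames, decorators, and so on, and the condition/action functions named at the bottom are hypothetical.)

```python
# Bare-bones behavior tree sketch (no "running" status, no decorators).
SUCCESS, FAILURE = "success", "failure"

class Sequence:
    """Ticks children in order; fails as soon as any child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, agent):
        for child in self.children:
            if child.tick(agent) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Ticks children in order; succeeds as soon as any child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, agent):
        for child in self.children:
            if child.tick(agent) == SUCCESS:
                return SUCCESS
        return FAILURE

class Leaf:
    """Wraps a condition or action function returning SUCCESS or FAILURE."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, agent):
        return self.fn(agent)

# The decision logic lives in the tree rather than in the states:
# root = Selector(
#     Sequence(Leaf(enemy_in_range), Leaf(do_ranged_attack)),
#     Leaf(do_idle),
# )
# root.tick(agent)
```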

For an excellent primer on behavior trees, see Bjoern Knafla’s Introduction to Behavior Trees on #AltDevBlogADay

A Hybrid Taco—The Hierarchical Finite State Machine

Figure 4 – In a hierarchical finite state machine, some states contain other related states making the organization more manageable.

A brief note before we leave the land of tacos behind. One of the advantages of the behavior tree—namely the tree-like structure—is sometimes applied to the finite state machine. In the hierarchical finite state machine (HFSM), there are multiple levels of states. Higher-level states are only concerned with transitioning to other states on the same level. Lower-level states inside a parent state, on the other hand, can only transition to each other. This tiered separation of responsibility helps to provide a little structural organization to a flat FSM and helps to keep some of the complexity under control.

If we were to place the HFSM into our Mexican metaphor, it would be similar to one of those nifty hard tacos wrapped in a soft taco shell. There’s still only so much you can pile into it before it gets unwieldy, but at least it doesn’t tend to shatter and make as big of a mess.
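In code, the nesting can be as simple as a parent state that runs a little FSM of its own. Here is a sketch in the same spirit as the FSM example above (again, the states and world queries are invented for illustration):

```python
# HFSM sketch: the high-level Combat state owns its own child state machine.

class CombatState:
    def __init__(self):
        self.child = AimState()   # low-level states only transition among themselves

    def update(self, agent, world):
        if not world.enemy_visible:
            return PatrolState()                           # high-level transition
        self.child = self.child.update(agent, world)       # low-level transition
        return self

class AimState:
    def update(self, agent, world):
        agent.play_animation("aim")
        return FireState() if world.enemy_in_range else self

class FireState:
    def update(self, agent, world):
        agent.play_animation("fire")
        return AimState()

class PatrolState:
    def update(self, agent, world):
        agent.play_animation("walk_route")
        return CombatState() if world.enemy_visible else self
```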

Building it from Scratch—Planning your Fajita

If you are anything like my wife, you want to be able to choose exactly what is in your Mexican dish—not just overall, but in each bite. That’s why she orders fajitas. While the fajita looks and acts a lot like a taco (specifically, a soft shell one), the method of construction is a little different. Rather than coming as an already assembled construction, the typical method of serving it is to bring you the tortillas with the content in a couple of separate piles. You then construct your own on the spot like a personal mini buffet. You can choose what you want to put in the first one and then even change it up for the subsequent ones. It all depends on what you deem appropriate for your tastes at that moment.

The AI equivalent of this is the planner. While the end result of a planner is a state (just like the FSM and behavior tree above), how it gets to that state is significantly different.

Like a behavior tree, the reasoning architecture behind a planner is separate from the code that “does stuff”. A planner takes its situation—the state of the world at the moment—and compares it to a collection of individual atomic actions that it could do. It then assembles one or more of these tasks into a sequence (the “plan”) so that its current goal is met.

A planner actually works backwards from its goal.

Unlike other architectures that start at its current state and look forward, a planner actually works backwards from its goal (Figure 5). For example, if the goal is “kill player”, a planner might discover that one method of satisfying that goal is to “shoot player”. Of course, this requires having a gun. If the agent doesn’t have a gun, it would have to pick one up. If one is not nearby, it would have to move to one it knows exists. If it doesn’t know where one is, it may have to search for one. The result of searching backwards is a plan that can be executed forwards. Of course, if another method of satisfying the “kill player” goal is to throw a Taco of Power at it, and the agent already has one in hand, it would likely elect to take the shorter plan and just hurl said taco.

Figure 5 – The planner has found two different methods of achieving “kill player” and selected the shorter one.
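To make that backward search concrete, here is a toy sketch that chains a handful of invented actions backwards from a goal. (Real planners such as GOAP or HTN planners add action costs, proper graph search, and a much richer world-state model; this only shows the idea.)

```python
# Toy backward-chaining planner sketch (actions and world facts are invented).
actions = {
    "shoot player":   {"needs": {"has gun"},        "gives": "player dead"},
    "throw taco":     {"needs": {"has taco"},       "gives": "player dead"},
    "pick up gun":    {"needs": {"at gun"},         "gives": "has gun"},
    "go to gun":      {"needs": {"knows gun spot"}, "gives": "at gun"},
    "search for gun": {"needs": set(),              "gives": "knows gun spot"},
}

def plan(goal, world_state, depth=10):
    """Return a list of actions (executed forwards) that satisfies `goal`, or None."""
    if goal in world_state:
        return []                       # already satisfied, nothing to do
    if depth == 0:
        return None
    best = None
    for name, action in actions.items():
        if action["gives"] != goal:
            continue
        steps = []
        for precondition in action["needs"]:
            sub_plan = plan(precondition, world_state, depth - 1)
            if sub_plan is None:
                steps = None
                break
            steps += sub_plan
        if steps is not None and (best is None or len(steps) + 1 < len(best)):
            best = steps + [name]       # keep the shortest plan found so far
    return best

print(plan("player dead", {"has taco"}))  # ['throw taco'] -- the shorter plan wins
print(plan("player dead", set()))         # ['search for gun', 'go to gun', 'pick up gun', 'shoot player']
```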

The planner diverges from the FSM and BT in that it isn’t specifically hand-authored. Therein lies the difference in planners—they actually solve situations based on what is available to do and how those available actions can be chained together. One of the benefits of this sort of structure is that it can often come up with solutions to novel situations that the designer or programmer didn’t necessarily account for and handle directly in code.

From an implementation standpoint, a major plus of the planner is that a new action can be dropped into the game and the planner architecture will know how to use it. This speeds up development time markedly. All the author says is, “here are the potential things you could do… go forth and do things.”

Of course, a drawback of this is that authorial control is diminished. In a FSM or BT, creative, “outside the box” solutions were the exception from the predictable, hand-authored systems. In a planner, the scripted, predictable moments are the exception; you must specifically override or trick the planning system to say, “no… I really want you to do this exact thing at this moment.”

While planner-based architectures are less common than behavior trees, there are notable titles that used some form of planners. Most famously, Jeff Orkin used them in Monolith’s creepy shooter, F.E.A.R. His variant was referred to as Goal-Oriented Action Planning or GOAP. For more information on GOAP, see Jeff’s page, http://web.media.mit.edu/~jorkin/goap.html

A more recent flavor of planner is the hierarchical task network (or HTN) planner such as was used to great effect in Guerrilla’s Killzone 2. For more information on HTN planning in Killzone 2, visit http://aigamedev.com/open/coverage/htn-planning-discussion/

Putting It All in a Bowl—A Utility-Based Salad

Another architecture that is less structured than the FSM or behavior tree is what has been called in recent years simply the “utility-based” method. Much like the planner, a utility-based system doesn’t have a pre-determined arrangement of what to do when. Instead, potential actions are considered by weighing a variety of factors—what is good and bad about this?—and selecting the most appropriate thing to do. As you can see, this is similar to the planner in that the AI gets to choose what’s best at the time.

The action with the highest score wins.

Instead of assembling a plan like the fajita-like planner, however, the utility-based system simply selects the single next bite. This is why it is more comparable to a taco salad in a huge bowl. All the ingredients are in the mix and available at all times. However, you simply select what it is that you would like to poke at and eat. Do you want that piece of chicken in there? A tomato, perhaps? An olive? A big wad of lettuce? You can select it based on what you have a taste for or what is most accessible at the moment.

Figure 6 – A typical utility-based system rates all the potential actions on a variety of criteria and selects the best.

One of the more apparent examples of a utility-based AI system is in The Sims. In fact, the considerations are largely shown in the interface itself. The progression of AI architectures throughout The Sims franchise is well documented and I recommend reading up on it. The short version is that each potential action in the game is scored based on a combination of an agent’s current needs and the ability of that action or item to satisfy that need. The agent then uses an approach common in utility-based methods and constructs a weighted sum of the considerations to determine which action is “the best” at that moment. The action with the highest score wins (Figure 6).
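Here is a tiny sketch of that sort of weighted-sum scoring. (The needs, actions, and numbers are all invented; real systems typically add response curves, normalization, and many more considerations.)

```python
# Utility sketch: score each action as a weighted sum of how well it satisfies
# the agent's current needs (all values invented).
agent_needs = {"hunger": 0.8, "energy": 0.3, "fun": 0.5}     # 1.0 = desperate

candidate_actions = {
    "eat taco":     {"hunger": 0.9, "energy": 0.1, "fun": 0.2},
    "take nap":     {"hunger": 0.0, "energy": 0.9, "fun": 0.0},
    "play pinball": {"hunger": 0.0, "energy": -0.1, "fun": 0.8},
}

def utility(effects):
    # Weight each consideration by how urgent the corresponding need is.
    return sum(agent_needs[need] * amount for need, amount in effects.items())

best = max(candidate_actions, key=lambda a: utility(candidate_actions[a]))
print(best)   # 'eat taco' with these numbers -- the highest score wins
```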

While utility-based systems can be used in many types of games, they are more appropriate in situations where there are a large number of potentially competing actions the AI can take—often with no obvious “right answer.” In those times, the mathematical approach that utility-based systems employ is necessary to ferret out what the most reasonable action to take is. Aside from The Sims, other common areas where utility-based systems are appropriate are in RPGs, RTS, and simulations.

Like behavior trees and planners, the utility-based AI code is a reasoner. Once an action is decided upon, the agent still must transition to a state. The utility system simply is selecting what state to go to next. In the same way, then, just like those other systems, the reasoning code is all in a single place. This makes building, editing, tuning and tweaking the system much more compartmentalized. Also, like a planner, adding actions to the system is fairly straightforward. By simply adding the action with the appropriate weights, the AI will automatically take it into account and begin using it in relevant situations. This is one of the reasons that games such as The Sims were as expandable as they were—the agents simply included any new object into their decision system without any changes to the underlying code.

The system is providing suggestions as to what might be a good idea.

On the other hand, one drawback of a utility system is that there isn’t always a good way to intuit what will happen in a given situation. In a behavior tree, for example, it is a relatively simple exercise to traverse the tree and find the branches and nodes that would be active in a particular situation. Because a utility system is inherently more fuzzy than binary, determining how the actions stack up is often more opaque. That’s not to say that a utility-based AI is not controllable or configurable. In fact, utility systems offer a deep level of control. The difference is that rather than telling the system exactly what to do in a situation, the system is providing suggestions as to what might be a good idea. In that respect, a utility system shares some of the adaptable aspects of planners—the AI simply looks at its available options and then decides what is most appropriate.

For more reading on utility-based systems, please check out my book, Behavioral Mathematics for Game AI or, if you have GDC Vault access, you can view my AI Summit lectures (with Kevin Dill) from 2010 and 2012 entitled, “Improving AI Decision Modeling through Utility Theory” and “Embracing the Dark Art of Mathematical Modeling” respectively.

Wrap it Up—A Neural Network Burrito

The last entry in my (extremely strained) metaphorical cornucopia is the burrito. In the other examples, all the content that was being delivered was open and available for inspection. In the case of the fajita, you (the AI users) were able to assemble what you wanted in each iteration. In the taco salad, the hard and soft tacos, and even the tostada you were able to, as the cliché says, “season to taste”. Perhaps a little extra cheese over the top? Maybe a few extra tomatoes? Even if you didn’t edit the content of your dish, you were at least able to see what you were getting into before you took a bite. There was no mystery about what made up your dinner; everything was open and available for your inspection.

The burrito is different in this respect. You are often told, in general terms, what the burrito is supposed to be packing but the details are often hidden from view. To paraphrase Winston Churchill, “it is a riddle, wrapped in a mystery, inside a soft flour shell.” While the burrito (and for that matter, the neural network) is extremely flexible, you have absolutely no idea what is inside or what you are going to get in the next bite. Don’t like sour cream? Olives? Tough. If it’s in there, you won’t know until you take that bite. At that point, it is too late. There is no editing without completely unwrapping the package and, for all intents and purposes, starting from scratch.

This is the caveat emptor of the NN-based AI solution. As a type of “learning” AI, neural nets need to be trained with test or live performance data. At some point you have to wrap up the training and say, “This is what I have”. If a designer wanders in, looks over your shoulder and says, “It looks pretty cool, but in [this situation] I would like it to do [this action] a little more often,” there’s really nothing you can do to change it. You’ve already closed your burrito up. About all you can do is try to retrain the NN and hope for the best.
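To make that “closed burrito” point concrete, here is a tiny sketch that trains a single-layer classifier on invented data. Once it is trained, the behavior lives entirely in a few opaque weights—there is no line of code where a designer can dial up one specific response:

```python
import numpy as np

# Tiny single-layer "network" trained on invented data: decide whether to
# attack based on [distance, health, ammo]. After training, the policy is
# just the numbers in w and b -- nothing a designer can tweak by hand.
rng = np.random.default_rng(0)
X = rng.random((200, 3))                  # invented feature vectors
y = (X[:, 0] < 0.5).astype(float)         # "attack" roughly when the target is close

w, b = np.zeros(3), 0.0
for _ in range(500):                      # plain logistic-regression training loop
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * (X.T @ grad) / len(X)
    b -= 0.1 * grad.mean()

print(w, b)   # three weights and a bias: good luck editing "attack a little more often"
```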

So while the NN offers some advantages in being able to pile a lot of things into a huge concoction of possibilities, there are huge disadvantages in a lack of designer control after the fact. Unfortunately, this tends to disqualify NNs and other machine-learning solutions from consideration in the game AI environment where that level of control is not only valuable but often a requirement.

That said, there have been a few successful implementations of NNs in games—for example, Michael Robbins used NNs to improve the tactical AI of Supreme Commander 2 from Gas Powered Games. (For those with GDC Vault access, you can see more about his implementation of this in the AI Summit session, “Off the Beaten Path: Non-Traditional Uses of AI”.)

Browsing the Buffet

So we’ve covered a variety of architectures and, for what it’s worth, equated them with our Mexican delights. Going back to our premise, as far as the content goes, it’s all the same stuff. The difference is simply the delivery mechanism—that is, the shape of the wrapping—and the associated pros and cons of each. This has by no means been an exhaustive treatment of AI architectures. The purpose was simply to expose you to the options that are out there and why you may or may not want to select each for the particular tastes and needs of your project. To sum up, though, let’s go through the dishes once again…

You can certainly just throw the occasional rule into your code that controls behavior—but that’s not really an “architecture”. By organizing your AI into logical chunks, you can create a finite state machine. A FSM is relatively easy to construct and for non-programmers to understand. While this is good for simple agents with a limited number of behaviors, it gets brittle quickly as the number of states increases. By organizing the states into the logical tiers of a hierarchical finite state machine (HFSM), you can mitigate some of this complexity.

By removing the reasoning code from the states themselves, you can gain a lot more flexibility. By organizing behaviors into logically similar branches, you can construct a behavior tree. BTs are also fairly easy for designers and other non-programmers to understand. The main advantages, however, are not only the ease with which they can be constructed but how well they scale without the need for a lot of extra programming. Once the BT structure is in place, new branches and nodes can be added easily. However, despite how robust BT implementations can get, it is still a form of hand-authored scripting—“when X, do Y”.

Figure 7 — The type of architecture you select needs to be based on YOUR needs.

Like the behavior tree, a planner allows for very extensible creation. However, where the BT is more hand-authored by designers, a planner simply “solves” situations using whatever it feels is best for the situation. This can be powerful, but also leads to a very scary lack of control for designers.

Similarly, utility-based systems depart from the specific script approach and allow the characters to decide freely what to do and, as above, might unsettle some designers. They are incredibly expandable to large numbers of complex factors and possible decisions. However, they are slightly more difficult to intuit at times although tools can be easily built that aid that process.

The ultimate hands-off black box is the neural network. Even the programmers don’t know what’s going on inside their little neurons at times. The good news is that they can often be trained to “do what human players would do.” That aspect itself holds a lot of appeal in some genres. They are also a little easier to build since you only have to construct the training mechanism and then… well… train it.

There is no “one size fits all” solution to AI architectures.

The point is, there is no “one size fits all” solution to AI architectures. (You also don’t have to limit yourself to a single architecture in a game.) Depending on your needs, your team, and your prior experience, any of the above may be The Right Way for you to organize your AI. As with any technical decision, the secret is to research what you can, even try a few things out, and decide what you like. If I am your waiter (er, AI consultant), I can help you out… but the decision is ultimately going to be what you have a hankerin’ for.

Now let’s eat!

(The author would like to dedicate this article to all the GDC, Gamasutra, and GDMag staff who are still lamenting the departure of the beloved Mexican restaurant that used to be located in their building. I mourn with you, my friends.)

Sun Tzu as a Design Feature?

Saturday, June 5th, 2010

Total War creator, The Creative Assembly, has announced the development of the latest in the line of acclaimed RTS games, Shogun 2. While the Total War franchise has a 10-year history and is fairly well-known for its AI,  this blurb from their web site has spread through the web like an overturned ink well:

Featuring a brand new AI system inspired by the scriptures that influenced Japanese warfare, the millennia old Chinese “Art of War”, the Creative Assembly brings the wisdom of Master Sun Tsu to Shogun 2: Total War. Analysing this ancient text enabled the Creative Assembly to implement easy to understand yet deep strategical gameplay.

Sun Tzu‘s “The Art of War” has been a staple reference tome since he penned it (or brushed it… or whatever) in the 6th century B.C. It’s hard to find many legends that have made it for over 20 centuries. Its applications have been adapted in various ways to go beyond war to arenas such as business and politics. Suffice to say that “The Art of War” lives on as “things that just make sense”.

The problem I have here is that this seems to be more of a marketing gimmick than anything. After all, most of what Sun Tzu wrote should, in various forms, already be in game AI anyway.  To say Sun Tzu’s ideas are unique to him and would never have been considered without his wisdom is similar to saying that no one thought that killing was a bad idea until Moses wandered down the hill with “Thou Shalt Not Kill” on a big ol’ rock. No one stood around saying, “Gee… ya think?” Likewise, Sun Tzu’s advice about “knowing your enemy” is hardly an earth-shattering revelation.

Certainly, there is plenty of game AI out there that could have benefited from a quick read of a summary of Art of War.

Certainly, there is plenty of game AI out there that could have benefited from a quick read of a summary of Art of War. Things like “staying in cover and waiting for the enemy to attack you” come to mind. Of course, in the game world, we call that “camping” (as an individual) or “turtling” (as a group). I can imagine a spirited argument as to whether a camping/turtling AI is necessarily What Our Players Want™, however. It certainly beats the old “Doom model” of “walk straight towards the enemy”.

And what about the Sun Tzu concept of letting your two enemies beat the snot out of each other before you jump in? (I believe there are translations that yielded “dog shit” rather than “snot” but the meaning is still clear.) If you are in an RTS and one enemy just sits and waits for the other one to whack you around a little bit, it’s going to look broken. On the other hand, I admit to doing that in free-for-all Starcraft matches… because it is a brutal tactic!

The problem I have with their claim is that we already do use many of his concepts in game AI.

The problem I have with their claim, however, is that there are many concepts in the Art of War that we already do use in game AI. By looking at Sun Tzu’s chapter headings (or whatever he called them) we can see some of his general ideas:

For ease of reference, I pillage the following list from Wikipedia:

  1. Laying Plans/The Calculations
  2. Waging War/The Challenge
  3. Attack by Stratagem/The Plan of Attack
  4. Tactical Dispositions/Positioning
  5. Energy/Directing
  6. Weak Points & Strong/Illusion and Reality
  7. Maneuvering/Engaging The Force
  8. Variation in Tactics/The Nine Variations
  9. The Army on the March/Moving The Force
  10. The Attack by Fire/Fiery Attack
  11. The Use of Spies/The Use of Intelligence

Going into more detail on each of them, we can find many analogues to existing AI practices:

Laying Plans/The Calculations explores the five fundamental factors (and seven elements) that define a successful outcome (the Way, seasons, terrain, leadership, and management). By thinking, assessing and comparing these points you can calculate a victory, deviation from them will ensure failure. Remember that war is a very grave matter of state.

It almost seems too easy to cite planning techniques here because “plans” is in the title. I’ll go a step further, then, and point out that by collecting information and assessing the relative merits of each option, you can determine potential outcomes or select correct paths of action. This is a common technique in AI decision-making calculations. Even the lowly min/max procedure is, in essence, simply comparing various potential paths through the state space.

Waging War/The Challenge explains how to understand the economy of war and how success requires making the winning play, which in turn, requires limiting the cost of competition and conflict.

This one speaks even more to the min/max approach. The phrase “limiting the cost of competition and conflict” expresses the inherent economic calculations that min/max is based on. That is, I need to get the most bang for my buck.
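As a toy illustration of that “compare the cost and payoff of each path” idea, here is a bare min/max sketch over a made-up game tree:

```python
# Bare minimax sketch over a hand-rolled game tree (payoff values invented).
# A node is either a numeric payoff (leaf) or a list of child nodes.

def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):
        return node                                   # leaf: value of this outcome
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

game_tree = [
    [3, 12],    # my option A: a perfect opponent holds me to 3
    [8, 4],     # my option B: a perfect opponent holds me to 4
    [2, 14],    # my option C: a perfect opponent holds me to 2
]
print(minimax(game_tree))   # 4 -- option B has the best worst case
```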

Attack by Stratagem/The Plan of Attack defines the source of strength as unity, not size, and the five ingredients that you need to succeed in any war. In order of importance attack: Strategy, Alliances, Army, lastly Cities.

Any coordinating aspects to the AI forces falls under this category. For example, the hierarchical structure of units into squads and ultimately armies is part of that “unity” aspect. Very few RTS games send units into battle as soon as they are created. They also don’t go off and do their own thing. If you have 100 units going to 100 places, you aren’t going to have the strength of 100 units working as a collection.  This has been a staple of RTS games since their inception.

Tactical Dispositions/Positioning explains the importance of defending existing positions until you can advance them and how you must recognize opportunities, not try to create them.

Even simply including cover points in a shooter game can be thought of as “defending existing positions”.

Even simply including cover points in a shooter game can be thought of as “defending existing positions”. More importantly, individual or squad tactics that do leapfrogging, cover-to-cover movement have been addressed in various ways for a number of years. Not only in FPS games do we see this (e.g. F.E.A.R.), but even in some of the work that Chris Jurney did originally in Company of Heroes. Simply telling a squad to advance to a point didn’t mean they would continue on mindless of their peril. Even while not under fire, they would do a general cover-to-cover movement. When engaged in combat, however, there was a very obvious and concerted effort to move up only when the opportunity presented itself.

This point can be worked in reverse as well. The enemies in Halo 3, as explained by Damián Isla in his various lectures on the subject, defend a point until they can no longer reasonably do so and then fall back to the next defensible point. This is a similar concept to the “advance” model above.

Suffice to say, whether it be advancing opportunistically or retreating prudently, this is something that game AI is already doing.

Energy/Directing explains the use of creativity and timing in building your momentum.

This one is a little more vague simply because of the brevity of the summary on Wikipedia. However, we are all well aware of how some games have diverged from the simple and stale “aggro” models that were the norm 10-15 years ago.

Weak Points & Strong/Illusion and Reality explains how your opportunities come from the openings in the environment caused by the relative weakness of your enemy in a given area.

Identifying the disposition of the enemy screams of influence mapping…

Identifying the disposition of the enemy screams of influence mapping—something that we have been using in RTS games for quite some time. Even some FPS and RPG titles have begun using it. Influence maps have been around for a long time and their construction and usage are well documented in books and papers. Not only do they use the disposition of forces as suggested above, but many of them have been constructed to incorporate environmental features as Mr. Tzu (Mr. Sun?) entreats us to do.
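At its simplest, an influence map is little more than the following sketch: stamp each unit’s strength onto a grid with some falloff and then read the result. (Grid size, positions, and strengths here are invented.)

```python
# Toy influence map sketch (grid size, unit positions, and strengths invented).
W, H = 8, 8
influence = [[0.0] * W for _ in range(H)]

enemy_units = [((2, 3), 1.0), ((6, 6), 2.0)]   # (grid position, strength)

for (ux, uy), strength in enemy_units:
    for y in range(H):
        for x in range(W):
            dist = abs(x - ux) + abs(y - uy)           # Manhattan distance
            influence[y][x] += strength / (1 + dist)   # falls off with distance

# The cell with the least enemy influence is a candidate weak point to exploit.
cells = [(x, y) for y in range(H) for x in range(W)]
weakest = min(cells, key=lambda c: influence[c[1]][c[0]])
print(weakest)
```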

Maneuvering/Engaging The Force explains the dangers of direct conflict and how to win those confrontations when they are forced upon you.

Again, this one is a bit vague. Not sure where to go there.

Variation in Tactics/The Nine Variations focuses on the need for flexibility in your responses. It explains how to respond to shifting circumstances successfully.

This is an issue that game AI has not dealt with well in the past. If you managed to disrupt a build order for an RTS opponent, for example, it might get confused. Also AI was not always terribly adaptive to changing circumstances. To put it in simple rock-paper-scissors terms, if you kept playing rock over and over, the AI wouldn’t catch on and play paper exclusively. In fact, it might still occasionally play scissors despite the guaranteed loss to your rock.

Lately, however, game AI has been far more adaptive to situations. The use of planners, behavior trees, and robust rule-based systems, for example, has allowed for far more flexibility than the more brittle FSMs allowed for. It is much harder to paint an AI into a corner from which it doesn’t know how to extricate itself. (Often, with the FSM architecture, the AI wouldn’t even realize it was painted into a corner at all and continue on blissfully unaware.)
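To put the rock-paper-scissors example in code terms, even the crudest adaptation—track what the player keeps throwing and play its counter—fixes the failure described above. A sketch:

```python
# Rock-paper-scissors adaptation sketch: counter the player's most frequent throw.
from collections import Counter

COUNTER_OF = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
history = Counter()

def ai_move():
    if not history:
        return "rock"                                  # no information yet
    most_common, _ = history.most_common(1)[0]
    return COUNTER_OF[most_common]

def observe(player_move):
    history[player_move] += 1

for _ in range(5):                                     # the player spams rock...
    observe("rock")
print(ai_move())                                       # ...so the AI settles on 'paper'
```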

The Army on the March/Moving The Force describes the different situations inf them.

[editorial comment on the above bullet point: WTF?]

I’m not sure to what the above refers, but there has been a long history of movement-based algorithms. Whether it be solo pathfinding, group movement, group formations, or local steering rules, this is an area that is constantly being polished.

The Attack by Fire/Fiery Attack explains the use of weapons generally and the use of the environment as a weapon specifically. It examines the five targets for attack, the five types of environmental attack, and the appropriate responses to such attack.

For all intents and purposes, fire was the only “special attack” that they had in 600 BC. It was their BFG, I suppose. Extrapolated out, this is merely a way of describing when and how to go beyond the typical melee and missile attacks. While not perfect, actions like spell-casting decisions in an RPG are not terribly complicated to make. Also, by tagging environmental objects, we can allow the AI to reason about their uses. One excellent example is how the agents in F.E.A.R. would toss over a couch to create a cover point. That’s using the environment to your advantage through a special (not typical) action.

The Use of Spies/The Use of Intelligence focuses on the importance of developing good information sources, specifically the five types of sources and how to manage them.

The interesting point here is that, given that our AI already has the game world at its e-fingertips, we haven’t had to accurately simulate the gathering of intelligence information. That has changed in recent years as the technology has allowed us to burn more resources on the problem. We now regularly simulate the AI piercing the Fog of War through scouts, etc. It is only a matter of time and tech before we get even more detailed in this area. Additionally, we will soon be able to model the AI’s belief of what we, the player, know of its disposition. This allows for intentional misdirection and subterfuge on the part of the AI. Now that will be fun!

Claiming to use Sun Tzu’s “Art of War” makes for good “back of the box” reading…

Anyway, the point of all of this is that, while claiming to use Sun Tzu’s “Art of War” makes for good “back of the box” reading, much of what he wrote about, we as game AI programmers already do. Is there merit in reading his work to garner a new appreciation of how to think? Sure. Is it the miraculous godsend that it seems to be? Not likely.

In the meantime, marketing fluff aside, I look forward to seeing how it all plays out (so to speak) in the latest Total War installment. (Looks like I might get a peek at E3 next week anyway.)

Fritz Heckel’s Reactive Teaming

Tuesday, May 25th, 2010

Fritz Heckel, a PhD student in the Games + Learning Group at UNC Charlotte, posted a video (below) on the research he has been doing under the supervision of G. Michael Youngblood. He has been working on using subsumption architectures to create coordination among multiple game agents.

When the video first started, I was a bit confused in that he was simply explaining a FSM. However, when the first character shared a state with the second one, I was a little more interested. Still, this isn’t necessarily the highlight of the video. As more characters were added, they split the goal of looking for a single item amongst themselves by dividing up the search space.

This behavior certainly could be used in games… for example, with guards searching for the player. However, this is simply solved using other architectures. Even something as simple as influence mapping could handle this. In fact, Damián Isla’s occupancy maps could be tweaked accordingly to allow for multiple agents in a very life-like way. I don’t know what Fritz is using under the hood, but I have to wonder if it isn’t more complicated.
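As a trivial illustration of how simply a shared search could be handled, here is a sketch that just divides the unsearched rooms among however many agents share the goal (room and agent names invented):

```python
# Shared-search sketch: divvy up the unsearched rooms among the searching agents.
rooms = ["lobby", "kitchen", "armory", "library", "vault", "cellar"]
agents = ["guard_1", "guard_2", "guard_3"]

assignments = {agent: [] for agent in agents}
for i, room in enumerate(rooms):
    assignments[agents[i % len(agents)]].append(room)   # simple round-robin split

print(assignments)
# {'guard_1': ['lobby', 'library'], 'guard_2': ['kitchen', 'vault'], 'guard_3': ['armory', 'cellar']}
```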

Obviously, his searching example was only just a simple one. He wasn’t setting out to design something that allowed people to share a searching goal, per se. He was creating an architecture for cooperation. This, too, has been done in a variety of ways. Notably, Jeff Orkin’s GOAP architecture from F.E.A.R. did a lot of squad coordination that was very robust. Many sports simulations do cooperation — but that tends to be more playbook-driven. Fritz seems to be doing it on the fly without any sort of pre-conceived plan or even pre-known methods by the eventual participants.

From a game standpoint, it seems that this is an unnecessary complication.

In a way, it seems that the goal itself is somewhat viral from one agent to the next. That is, one agent in effect explains what it is that he needs the others to do and then parcels it out accordingly. From a game standpoint, it seems that this is an unnecessary complication. Since most of the game agents would be built on the same codebase, they would already have the knowledge of how to do a task. At this point, it would simply be a matter of having one agent tell the other “I need this done,” so that the appropriate behavior gets switched on. And now we’re back to Orkin’s cooperative GOAP system.

On the whole, a subsumption architecture is an odd choice. Alex Champandard of AIGameDev pointed out via Twitter:

@fwph Who uses subsumption for games these days though? Did anyone use it in the past for that matter?

That’s an interesting point. I have to wonder if, as is the case at times with academic research, it is not a case of picking a tool first and then seeing if you can invent a problem to solve with it. To me, a subsumption architecture seems like it is simply the layered approach of a HFSM married with the modularity of a planner. In fact, there has been a lot of buzz in recent years about hierarchical planning anyway. What are the differences… or the similarities, for that matter?

Regardless, it is an interesting, if short, demo. If this is what he submitted to present at AIIDE this fall, I will be interested in seeing more of it.

Writing AI is Like Parenting

Sunday, April 6th, 2008

Ted Vessenes wrote a nifty little post on his blog where he compared designing and programming AI to being a parent. Here’s the opening paragraph:

“Writing artificial intelligence is a lot like being a parent. It requires an unbelievable amount of work. There are utterly frustrating times where your children (or bots) do completely stupid things and you just can’t figure out what they were thinking. And there are other times they act brilliantly, and all the effort feels satisfying and well spent.”

I have to agree with a lot of the points he makes in his post. I would like to take the analogy one step farther.

I’ve occasionally made the point about both parenting and AI that your job is to not define what your progeny should do but convey an understanding of why. If, as a parent, you tell your child not to run in the street, they will hopefully carry that lesson into the future. However, they may not apply that same edict to driveways, parking lots or any other places where they could get plowed over by a car. This is analogous to the scripted AI methodology. However, if you explain the why of the situation – i.e. “be careful anywhere that cars are moving because the driver may not see you in time to stop and you could get badly hurt” – then the simple rule can be applied to any situation where there are cars (or even car-like objects). This, of course, maps over to rule-based systems or even planning systems.

However, going back to Ted’s point, it is an interesting similarity to put all those rules into place and hope that your little bots realize the appropriate situations in which to use them. I actually wrote about this scary process in my weekly column over at AIGameDev.

Anyway, if you are an AI developer, I hope that you are blessed with many children who all grow up to be accomplished in their chosen lives (or deaths).

F.E.A.R. sequel promises "visual density"

Wednesday, January 30th, 2008

I noticed this GamePro blurb about the upcoming sequel to F.E.A.R. Here’s an excerpt…

“The most obvious difference that will hit the player right away is in the visual density of the world,” said Mulkey. “F.E.A.R. looked really great, but where F.E.A.R. would have a dozen props in a room to convey the space, Project Origin will have five times that much detail.

“Of course, this will only serve to further ratchet up that ‘chaos of combat’ to all new levels with more breakables, more debris, more stuff to fly through the air in destructive slow motion beauty.”

OK… I can dig that. One thing I noticed as I played through F.E.A.R. is that things were kinda sparse. (I really got tired of seeing the same potted cactus, too.)

The part that I am curious about, however, is this:

… Mulkey says improved enemy behavior is at the top of the list.

“We are teaching the enemies more about the environment and new ways to leverage it, adding new enemy types with new combat tactics, ramping up the tactical impact of our weapons, introducing more open environments, and giving the player the ability to create cover in the environment the way the enemies do,” he says.

Now that is the cool part. When the enemies in the original moved the couches, tables, bookshelves, etc. it was cool… but rather infrequent. I was always expecting them to do more with it. If they are both adding objects to the environment and then “teaching” the agents to actually use those objects, we may see a level of environment interactivity that we’ve never experienced before.

The cool thing about their planning AI structure is that there isn’t a completely ridiculous ramp-up in the complexity of the design. All one need do is tag an object to indicate that it can be used in a certain way and it gets included into the mix. On the other hand, having more objects to use and hide behind does increase the potential decision space quite a bit. It’s like how the decision tree in chess is far greater than that of Tic-tac-toe because there are so many more options. The good news is that the emergent behavior level will go through the roof. The bad news is that it will hit your processor pretty hard. Expect the game to be a beast to run on a PC.
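As a sketch of what that tagging might look like (the objects and tag names are invented), the AI reasons over affordance tags rather than over specific props, so new props join the decision space for free:

```python
# Affordance-tagging sketch: level objects carry tags, and actions query the tags.
level_objects = [
    {"name": "couch",         "tags": {"flippable_cover"}},
    {"name": "bookshelf",     "tags": {"climbable", "flippable_cover"}},
    {"name": "potted_cactus", "tags": set()},            # decorative only
]

def objects_usable_as(tag):
    return [obj["name"] for obj in level_objects if tag in obj["tags"]]

# A "create cover" action simply asks for anything tagged flippable_cover;
# adding more tagged props to a level grows the options without new AI code.
print(objects_usable_as("flippable_cover"))   # ['couch', 'bookshelf']
```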

I certainly am looking forward to mucking about with this game!

Level Designers trumping AI Programmers

Sunday, January 6th, 2008

I hate glomming on to a blog chain, but I’m going to link to AIGameDev’s article on an article (which may very well be about an article.) The title is Watching Level Designers Use Scripts to Disable Your Autonomous AI: Priceless – which just about covers it. Alex does a nice job of not just reporting on it, but explaining the mindset and even the things to watch out for.

Regular readers of my other blog, Post-Play’em will know that I talked about the idea of scripts over-riding AI behaviors in Call of Duty 2 in a post entitled Call of Duty 2: Omniscience and Invulnerability. Specifically, this was in reference to one of the behaviors mentioned in the other article where an AI agent takes on a temporary god-like quality of invulnerability until such time as he finishes a scripted event – at which time he is no longer important to the level designer’s wishes and is cast back into the pot of cannon fodder so that I can mow him down properly.

Getting back to the initial topic, my thought is that part of the issue between artists/level designers and programmers may very well be that the level designers don’t have trust in the capabilities of autonomous AI agents… or even an understanding of what could be done with them.

For example, with the use of goal-based agents such as those found in F.E.A.R. (related post), rather than a designer saying “I want the bot to do A then B, then C on his way to doing the final action of D,” he could simply tell the goal-based agent that “D is a damn good goal to accomplish.” If constructed properly, the agent would then realize that a perfectly viable way of accomplishing D would be via A-B-C-D. The difference between these two methods is important. If C is no longer a viable (or intelligent looking) option, then the scripted bot either gets stuck or looks very dumb in still trying to accomplish D through that pre-defined path. The very nature of planning agents, however, would allow the agent to try to find other ways of satisfying D. If one exists, he will find it. If not, perhaps another goal will suffice.

The problem is, while AI programmers understand this concept (especially if you are the one who wrote the planner for that game), level designers and particularly artists, may not have an intuitive grasp on this. They are cut more from the cloth of writers – “and then this happened, and then this, and then it was really cool when I wrote this next thing because I wanted the agent to look smart, and then this…” That is being a writer - and is why many games continue to be largely linear in nature. You are being pulled through an experience on a string of scripted events. (See related post on Doom 3’s scripting vs. AI)

So, can the problem of designers trumping AI programmers be solved? It will always be there to some extent. But education and communication will certainly help the matter.

Behavior Trees

Friday, December 14th, 2007

Time for a taste of the Lyon, France Game Developers Conference!

Alex Champandard at AIGameDev.com posted part 1 of a presentation he gave on the use of behavior trees in game AI.

Seriously good stuff!

(note: there may be a problem viewing the videos with IE – they work fine in Firefox.)

Temporal Coherence and Planning

Tuesday, December 11th, 2007

Alex at AIGameDev has a great essay up entitled “Memento, Temporal Coherence and Debugging Planners”. In it, he talks about how planning algorithms have the problem of having their assumptions about the world fall quickly out of scope as the world changes. One solution is to continually replan from scratch – which can become quite expensive to do for numerous agents.

He offers a couple of solutions – and the comments on the post have turned into a rather interesting discussion on the caveats and possibilities. Check it out!
