Balance Testing

I wrote a post earlier on Faction Design. This new post is similar, but not quite the same. I've also written about Testing your game.

The development of Empire has been such a fruitful experience, both in that I feel I've made my second good game (Farmageddon being the first) and that I've learned an absurd amount, especially about designing more complicated games. I honestly believe I test games well -- I take in feedback, I know what questions to ask, and I know what to look for in a test. But, thanks to Empire, I'm now learning a great deal about testing for balance.

I have some quick tips for things to consider when testing for balance. These tips are especially important when dealing with a game with asymmetrical properties. In my case, that means asymmetrical Army factions, each with unique capabilities.

Take notes throughout, but act upon the final result. The ideal test environment is one in which you, the designer, are merely an observer. This lets you watch human expressions and listen for deeper meaning in the commentary. You should be taking notes the entire time. For the sake of balance testing, you should note people's favorite abilities. You should note the ones that cause the most excitement and "holy crap!" type exclamations.

For Empire, Encirclement (Royal Brigade Offensive Tactic) and Bombardment (Imperial Army Offensive Tactic) always cause the "holy crap!" Form Up (Yorkan Staff Order) and False Orders (Republik Militia Staff Order) cause "oh noes!" from the victims and observers. However, the player executing the tactic is grinning like a jackal.

Over time, however, I notice that some Tactics are used more often than I'd like. Tactics and abilities seem to be overpowered and unfair. People start to whine and complain. My notes start to get more dramatic.

  • Cut this ability in half.
  • Reduce reinforcements by 3
  • Make this only do this one thing instead of 2

But then the game ends and I examine the final score. The player who seemed to have the runaway ability only won by 2 against 2 of the 3 opponents. Heck, the first and last player only had a spread of 6 points. Clearly, my big, dramatic fears were unfounded. Plus, this "runaway faction" was third place in the previous game. My solution? Tweak the requirement for using the Tactic so that it is slightly more difficult to activate.

You need to get the full story and fully examine the facts before you dramatically re-tune something. If you change course mid-game, you will deny yourself some really useful data. Take notes throughout, but don't decide until the end of the test. Measure twice, cut once.
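
The end-of-game check described above can be sketched in a few lines of Python. This is only an illustration: the player names and scores are invented to match the shape of the game described (a 2-point win over two opponents and a 6-point spread from first to last), and `score_summary` is a hypothetical helper, not part of any tool used for Empire.

```python
def score_summary(final_scores):
    """Return the winner, their margin over each opponent, and the first-to-last spread."""
    # Rank players from highest to lowest final score.
    ranked = sorted(final_scores.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_score = ranked[0]
    # How far behind the winner each other player finished.
    margins = {name: top_score - score for name, score in ranked[1:]}
    # Gap between first and last place.
    spread = top_score - ranked[-1][1]
    return winner, margins, spread


# Hypothetical final scores (assumption: invented for illustration only).
scores = {"Player A": 30, "Player B": 28, "Player C": 28, "Player D": 24}
winner, margins, spread = score_summary(scores)
print(winner, margins, spread)
```

Small margins and a narrow spread are a hint that the dramatic mid-game notes overstated the problem.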

If you're curious, here are the two tiny changes I'm making to Empire as a result of this week's tests.

  • Form Up Staff Order now requires a Fog + Cavalry card to activate. Previously, it was Fog + any card.
  • Double Time Staff Order now states: Take two Mobilize Actions. You can Mobilize into a battle territory. (Bold text is the change.)

Good balance isn't just fairness, but an approximately equal set of choices. One thing you need to adjust for is imbalance. If Faction A has a 10 magnitude ability and Faction B has a 6 magnitude ability, you need to bring those into approximate parity with each other. However, once you move past this, you need to make sure that players see value in all of their options. You need to ensure that different options are useful in every game so that dominant strategies or repetitive choices don't foul the experience.

In Empire, every player has 4 unique abilities (Offensive Tactics, Defensive Tactics, Staff Orders) and one Army specific attribute. My intent was that these are not just options and privileges, but dictates for how you should play. For example, the Cave Goblins in Summoner Wars are flimsy but numerous. You should adjust your strategy accordingly. In Empire, if you are turtling with the Yorkans, you're playing incorrectly.

So, the task for me is to make sure that players feel they have good options with which to execute a winning strategy. Furthermore, they need to feel that they have multiple good options in a variety of situations. If you see a player using Ability C over and over again instead of Abilities A, B, and D, you should ask a few questions:

  • Is Ability C too powerful?
  • Are Abilities A, B, and D underpowered?
  • Are the abilities explained (rules text) in a way that makes them appear less valuable?
  • If I were to list a strategy example, would that make it more enticing?
  • Could this be an art thing? I.e. giant laser looks WAY more fun than radar dish?

Most importantly, you should ask the tester! "Hey, why are you only using Ability C?" Perception is everything in a game. Make sure your game is presented such that a player's perception is that he has a toolbox full of awesome choices.
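
If you log each ability use in your test notes, spotting the "Ability C over and over" pattern is easy to automate. The sketch below is a minimal example under stated assumptions: the log is invented, `flag_dominant` is a hypothetical helper, and the 50% threshold is an arbitrary cutoff rather than a recommended value.

```python
from collections import Counter


def flag_dominant(ability_log, all_abilities, threshold=0.5):
    """Return (abilities used more than `threshold` of the time, abilities never used)."""
    counts = Counter(ability_log)
    total = sum(counts.values())
    # Share of all recorded uses that went to each ability.
    shares = {ability: count / total for ability, count in counts.items()}
    dominant = [a for a in all_abilities if shares.get(a, 0.0) > threshold]
    unused = [a for a in all_abilities if a not in shares]
    return dominant, unused


# Hypothetical log of one player's ability uses (assumption: invented data).
log = ["C", "C", "A", "C", "C", "B", "C", "C"]
dominant, unused = flag_dominant(log, ["A", "B", "C", "D"])
print(dominant, unused)
```

A flagged ability is only a prompt to ask the questions above. The count cannot tell you whether the cause is power, rules text, or art.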

Always keep your design goals in mind. This is a fundamental rule that I consider sacred. This is important for design, testing, balance testing, and pitching to publishers. Always keep your goals in mind. You should tune your game so that it is balanced and fair, but ONLY if that's your intent. If you want the game to be subtle, don't throw in the gigantic mega-bomb. If you want lots of combos, don't make your turn structure rigid. When balancing your game, always check your new changes against your philosophical approach to the game.

Test with the same data before making changes. Even if something appears REALLY broken, you need to test the same game with NO changes many times before making changes. This is the scientific method and it's crucial. GenCon was incredibly useful because I tested the same version of Empire 12 times over 4 days. Had I been home I would have been tempted to change it every time. But, being away from a computer and my prototype materials I had to run with it. What did I learn?

Well, I knew precisely what needed to change. I also knew that the game was mostly balanced. I had scores from 12 games with 40 or so players. The evidence was clear.

When you're testing a game for its mechanics, you can change the game fairly frequently. Why? Well, broken is broken. When you're testing balance, you need to factor in things like:

  • Player skill
  • Player familiarity with the game
  • Player familiarity with the faction
  • Player personality (aggressive versus passive versus erratic, etc.)

On Monday, I was worried the Militia was too powerful. On Tuesday, they took last place. On Monday, the player who played the Militia had played them 4 times previously. He knew them like an old friend. On Tuesday, I had 4 entirely new players play every faction. Is the Militia perfectly balanced? I don't know, but I know they aren't wildly imbalanced.

In an ideal world, I would have 4 equally skilled players playing the game 5 times with the same factions. That can't always happen, but I can try to steer my test sessions towards that.
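
One way to avoid reacting to a single session is to record each faction's finishing place in every test of the same version and look at the average. The sketch below assumes that kind of record keeping. The faction names come from Empire, but the placements are invented for illustration and `average_placement` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def average_placement(games):
    """Map each faction to (average finishing place, number of games played)."""
    places = defaultdict(list)
    for result in games:
        for faction, place in result.items():
            places[faction].append(place)
    return {faction: (mean(p), len(p)) for faction, p in places.items()}


# Hypothetical results, 1 = first place (assumption: invented placements).
games = [
    {"Republik Militia": 1, "Yorkan": 2, "Royal Brigade": 3, "Imperial Army": 4},
    {"Republik Militia": 4, "Yorkan": 1, "Royal Brigade": 2, "Imperial Army": 3},
]
summary = average_placement(games)
for faction, (avg, played) in summary.items():
    print(f"{faction}: average place {avg} over {played} games")
```

Two games prove very little, which is the point: the averages only become meaningful after many tests of an unchanged version, and they still do not account for player skill or familiarity.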

Balance testing cannot truly begin until the mechanics are completed. Some may disagree with me here, but this has been my experience with both Farmageddon and Empire. Early Farmageddon had problems with 2 player rules, how many cards people could play, how much Fertilizer to use, etc. But, once those elements were finalized I spent months and months just revising the Action cards.

I began designing my war game in January (it was called General Staff back then). I've been testing the prototype since April. It has taken pretty much all of those 5 months (about 30 tests) to bring the mechanics within 90% of what I think are final. Now, a future publisher (fingers crossed!) may disagree and we'll cross that bridge, but I think the mechanics are mostly finished. Without an incredibly firm foundation, you cannot properly evaluate the balance of your game. It's practically impossible.

Why? Faction balance requires that you evaluate the abilities for all player scaling, different player personalities, different starting positions and spatial relationships, and more. If you're trying to evaluate balance, which is a tiny, subtle thing, you need NOTHING else to be shifting. Otherwise, how can you really know if the ability is imbalanced? Was it the ability itself? Or did it seem imbalanced because the mechanic upon which it was built was poor?

This post went on a bit long for a Friday afternoon. Was this useful? Any interesting tidbits? Any advice of your own to share?

Comments

I had a problem for a long time where players would turtle up, which basically means they are camping or being overly passive/defensive. They would wait until the very end, then jump out with a HUGE army. They'd be bored all game waiting, and other players, who played how I wanted, would be upset that they lost because of the guy who sat and did nothing.

So, I made it so there were more ways to score and that there was a mid-game scoring round. I made it so holding territory (end game) was valuable, but earning battle trophies (winning battles/being aggressive) was also valuable. And really, you need a little combination of both to come out ahead.

The scoring rounds and variety completely and fundamentally shifted the player behavior and it hasn't been a problem since.

I appreciate the distinction you draw between testing mechanics, which if broken merit immediate action, versus testing balance, which may require more data points to be judged broken. I'm still fixing broken mechanics in my current game, so evaluation of balance hasn't come yet. But drawing from your experience, I won't be as hasty to draw conclusions when I get there.

Also, "turtling" was a new term for me. Thanks to you and Google, I've expanded my vocabulary!

I really appreciate your insight into how to make play balance adjustments in an asymmetric game. I have a design that's been on the shelf for a while that covers the Wars of the Successors after the death of Alexander the Great. The asymmetry comes from the starting positions of the different factions (rather than their abilities). Early playtests indicated that centrally located factions were at a disadvantage because they had to contend with fronts against multiple opponents. Right now, the rules have the players bid for position, so that the advantageous positions start with fewer armies. I'm not sure that's the best balance mechanism, however. Based on your observations from testing Empire, I think that when I dust it off again, I will do a lot of playtesting to see whether I need to make some other adjustments so that every faction has some strategic opportunity and none has a clear-cut advantage.

Great insights and reminders!