A Statistical Method of Creating Decklists
Written by neosystems on April 15, 2010
Since reading the following articles (http://manamaze.com/special) and (http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/feature/65), I have been fascinated with different methods of taking a large number of decklists and making a statistically created decklist. Frank Karsten, the author of both of those articles, popularized this method when he piloted his Faeries deck to a Top 8 finish. I believe there is a lot of room to improve on his methods, since simply taking the arithmetic mean creates a lot of bad versions of decks.
The goal in technical terms is to take a large number of decklists (this does work with a small sample size, but works best with a large sample size) in order to analyze card choices and the number of copies of each card played in a deck to create a general “shell”. By shell, I mean that the cards in the deck will include, for the most part, only cards in numbers that are run by a large portion of other decks. In this way, you will not have a finely tuned version of the deck metagamed for an expected field – rather, your deck should give you the best possible chance against an unknown field. More precisely, it will give you the best possible chance against an “average” field where the most popular/best deck would show up in the largest quantity, the second deck in the second largest quantity, and so on. Should you have some knowledge of what you will play against, it is up to you to change card choices appropriately. Here are the statistical methods primarily used for this:
Arithmetic Mean: When someone says “mean” they are usually referring to the Arithmetic Mean, where you take a data set of n points, take the sum of that data set, and divide by n to get a value. I do refer to this as Arithmetic Mean throughout the article, because I will get into the Geometric and Harmonic means later on.
Median: The median is found by taking a data set, and simply finding the middle value. If there is no actual middle, but rather, two data points end up in the middle, simply find the arithmetic mean of the two to get the median.
Mode: For analysis in terms of M:tG decks, the mode is actually more useful than the arithmetic mean. I will get into the pitfalls of using the arithmetic mean alone, but for now, just know that the mode tells you which number in a data set appears most often.
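As a quick illustration of the three measures, here is a minimal sketch using Python's standard statistics module. The copy counts are made up for the example and do not come from any real decklist.

```python
from statistics import mean, median, mode

# Hypothetical copies of one card across seven sampled decklists.
copies = [4, 4, 3, 4, 2, 4, 0]

card_mean = mean(copies)      # sum of the counts divided by the number of decks
card_median = median(copies)  # middle value once the counts are sorted
card_mode = mode(copies)      # the count that shows up most often

print("mean:", card_mean, "median:", card_median, "mode:", card_mode)
```

With these made-up counts the mean lands on 3 while the median and mode both land on 4, which is the kind of disagreement the rest of the article deals with.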
The Folly of Arithmetic Mean:
Many people have used Deckcheck’s “compare” feature to compare a large number of decks and simply used the calculated average to make a deck. This is also the method primarily employed by the articles linked in the introduction. I believe that this is statistically inaccurate for a number of reasons.
Consider this first case: Let us say that one is creating a Zoo deck, and according to the data, 25% of Zoo decks played 0 Kird Ape, but 4 Loam Lion, while 75% of Zoo decks played 4 Kird Ape, but 0 Loam Lion. The method of averages would then say that it is correct to play 3 Kird Ape and 1 Loam Lion even though a quick skim of the lists says that this is clearly incorrect, as one should always either play 4 Kird Ape or 4 Loam Lion, as the data suggests.
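The Zoo case can be reproduced with a handful of numbers. The sketch below assumes four hypothetical lists that match the 75% / 25% split; only the split itself comes from the example above.

```python
from statistics import mean, median, mode

# Four hypothetical Zoo lists: three play 4 Kird Ape, one plays 4 Loam Lion.
kird_ape = [4, 4, 4, 0]
loam_lion = [0, 0, 0, 4]

for name, counts in (("Kird Ape", kird_ape), ("Loam Lion", loam_lion)):
    print(name, "mean:", mean(counts), "median:", median(counts), "mode:", mode(counts))
```

The mean proposes a 3 / 1 split that no sampled list actually runs, while the median and mode both point at 4 Kird Ape and 0 Loam Lion.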
A second reason why arithmetic mean is bad, aside from the fact that it does not take into account “one or the other” situations, is that it is easily skewed by outliers given a small sample size. Consider this second case: You are analyzing six decklists. Five of them play 2 Islands, while the 6th plays 12. The arithmetic mean is 22/6, or about 3.7, which rounds to 4 Islands, even though five of the six lists run only 2.
So, how do we then calculate how many of a card to run without making ourselves backtrack and check? The method I use is to calculate the mean, median, and mode for each card. I then compare the values, and use the card number where at least 2 out of the 3 methods agree. For example, say we are trying to see how many Putrid Leeches to play in our Jund deck. The mean comes out to 3.7, the median 4, but the mode 3. The mean and median agree that 4 (always round to whole numbers of course) Putrid Leeches should be played in the deck, and that’s how many I will play. Each one of these three will fail you at some point. This is why it is very important to combine them and not allow one method to decide your card choices.
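The two-out-of-three rule can be sketched in a few lines of Python. The function names are my own, and rounding halves up is an assumption on my part, chosen because the article treats a mean of exactly 0.5 as one copy. Treat this as one possible reading of the rule rather than a definitive implementation.

```python
import math
from statistics import mean, median, mode

def round_half_up(value):
    """Round to the nearest whole number of copies, with .5 going up (assumed)."""
    return math.floor(value + 0.5)

def consensus(mean_value, median_value, mode_value):
    """Return the copy count at least two measures agree on, or None."""
    votes = [round_half_up(v) for v in (mean_value, median_value, mode_value)]
    for vote in votes:
        if votes.count(vote) >= 2:
            return vote
    return None  # all three disagree

def consensus_from_counts(counts):
    """Apply the rule directly to a list of per-deck copy counts."""
    return consensus(mean(counts), median(counts), mode(counts))

print(consensus(3.7, 4, 3))                         # the Putrid Leech numbers
print(consensus_from_counts([2, 2, 2, 2, 2, 12]))   # six lists, one outlier
```

For the Putrid Leech numbers the mean and median outvote the mode and the result is 4. For the six-list sample with one outlier, the median and mode outvote the skewed mean and the result is 2. A return value of None marks the rare case where all three disagree.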
Putting it into Practice:
I decided to implement this method to create a Vintage deck. After crunching some data of Top 8 lists, I found that the top three best performing Vintage decks in the last 3 months were: Tezzeret’s Vault, Fish, and Iona Oath in that order. Since I was intrigued by the vault deck, and because it was the best deck, I wanted to find a good decklist for it. I have used well performing Tezzeret’s Vault lists from the last three months for the purposes of this article. I suggest you follow along and work with me as I go, if you are interested in this type of thing.
Cards we can immediately throw out:
Cards which cannot garner even a single copy by the method of averages are not cards that will never make it into the deck, but cards that do not make it in on our first pass. In this list, this applies to all cards which have an arithmetic mean of less than 0.5.
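A minimal sketch of that first pass, assuming the per-card means are already in a dictionary. The card names and values below are placeholders, not data from the sampled lists.

```python
# Hypothetical arithmetic means per card; names and values are placeholders.
card_means = {"Card A": 3.6, "Card B": 0.5, "Card C": 0.31, "Card D": 0.08}

CUTOFF = 0.5  # below this, the mean rounds down to zero copies

# Cards that earn at least one copy on the first pass.
first_pass = {card: m for card, m in card_means.items() if m >= CUTOFF}

# Cards set aside for now; they may still come back in a later pass.
set_aside = {card: m for card, m in card_means.items() if m < CUTOFF}

print(sorted(first_pass), sorted(set_aside))
```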
It is worth mentioning the following two cards you may have noticed by looking at the link I provided above:
Hurkyl’s Recall: Take another look at the card. Deckcheck lists both Hurkyl’s Recall and Hurkyl’s recall. At first glance, the lack of capitalization on the second one skews the numbers enough to tell us that we should not be playing Hurkyl’s Recall at all! In fact, when we add the numbers for the two Recalls together, we get exactly 0.5, telling us that we should be playing 1 Hurkyl’s Recall in our deck. It’s important to watch out for things like that; it happens again with Sensei’s Divining Top as well.
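One way to guard against this is to merge entries whose names differ only by capitalization before doing any other math. The helper below is a hypothetical sketch. The article only gives the combined total of 0.5, so the 0.25 / 0.25 split is an assumption for illustration.

```python
from collections import defaultdict

def merge_card_names(means):
    """Sum per-card means whose names differ only by case or stray spaces."""
    totals = defaultdict(float)
    display = {}
    for name, value in means.items():
        key = " ".join(name.split()).casefold()   # case-insensitive lookup key
        display.setdefault(key, name.strip())      # keep the first spelling seen
        totals[key] += value
    return {display[key]: total for key, total in totals.items()}

# Assumed split of the 0.5 total; only the sum is given in the article.
raw = {"Hurkyl's Recall": 0.25, "Hurkyl's recall": 0.25}
merged = merge_card_names(raw)
print(merged)
```

Each entry alone falls under the 0.5 cutoff, but the merged entry reaches it and earns its one copy.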
Cards that are an auto-include:
A good shortcut to know is that whenever a card is played in only one number across all decks, one will always play that number. For example, every single deck plays a Black Lotus in the link above. Without doing any calculation, you know that the mean, median, and mode will always be one, and thus you will always play one Black Lotus in your deck. Cards that are an auto-include are:
The Mean, Median, and Mode Decklists:
For the sake of comparison, I have created three decklists, each one using exclusively Arithmetic Mean, Median, or Mode. We will then compare and include in our version, each card whose number is agreed upon by 2 of the 3 decks. Usually I do this card by card, but I think for the sake of learning, doing it by three decklists will prove to be beneficial.
First the decklist calculated with Arithmetic Mean:
The decklist calculated with Median:
The decklist calculated with Mode:
Now, it is simply a matter of combining the three decklists. Remember, we are looking to find two agreements on number among all three of these decklists. Also, do not worry about variable slots; we will address those in a bit.
Normally, you will either get all three matches, or at the very least, two out of three matches. However, there are times when all three statistical methods will disagree. That is when we may bring in two additional methods: the Geometric Mean and the Harmonic Mean. Since Mean and Median tell us to play two and three Dark Confidants, respectively, and Mode tells us to play none, we can at least say that two out of three agree that we should play some number of Dark Confidants. Thus, we can discount the zeros in our data. This is a prerequisite for finding the Geometric and Harmonic Means, since neither can be calculated from a data set that contains zeros. The easiest thing to do now is to input each data point into its own cell in Microsoft Excel and then use the GEOMEAN and HARMEAN functions to calculate the values. If you are interested in the mathematics, the Geometric Mean multiplies all n data points in the set together and then takes the nth root of the result. The Harmonic Mean takes n and divides it by the sum of 1/x over all n data points x.
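For anyone without Excel, the same two values can be computed with Python's statistics module. This is only a sketch. The copy counts are hypothetical, picked so that the mean rounds to two, the median is three, and the mode is zero, like the Dark Confidant case. They are not the real sample.

```python
import math
from statistics import geometric_mean, harmonic_mean

# Hypothetical copy counts across nine lists, zeros included.
counts = [0, 0, 0, 0, 3, 3, 3, 4, 4]

# Discount the zeros first; neither mean is usable with zeros in the data.
nonzero = [c for c in counts if c > 0]

geo = geometric_mean(nonzero)   # counterpart of Excel's GEOMEAN
har = harmonic_mean(nonzero)    # counterpart of Excel's HARMEAN

# The same values straight from the definitions.
geo_by_hand = math.prod(nonzero) ** (1 / len(nonzero))
har_by_hand = len(nonzero) / sum(1 / c for c in nonzero)

print(round(geo, 2), round(har, 2))
```

With these made-up counts both means come out a little above 3 and round to three copies.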
Regardless, for our purposes, we find that both the Geometric Mean and Harmonic Mean suggest that we play three Dark Confidants, which is the number I have chosen to use. We could further justify this by discounting the mode (since we know we have to play Dark Confidant) and averaging the mean and median, which gives 2.5 and rounds up to 3 as well. This was the only discrepancy I ran into.
The Decklist Thus Far:
Recognizing When Statistics Fails:
Going purely by statistics we have six variable slots. However, in reality, we really do not. This is where statistics fails us, and why simple Arithmetic Mean as a way to build a statistical deck is not a very good idea. When we analyze the lists, we see that there are low numbers for large artifact creatures such as Darksteel Colossus. In fact each deck seems to have one of these types of creatures. Every statistical method we’ve used so far has told us to run zero copies of the following: Darksteel Colossus, Inkwell Leviathan, Sphinx of the Steel Wind, Platinum Angel, and Triskelion. However, upon closer examination, we see that every deck seems to have one copy of one of these creatures. This explains why the number is told to us as zero. If each deck were to play one Darksteel Colossus, we would play one as well, but some decks play Inkwell Leviathan, Platinum Angel etc. Recognizing things like this is a skill that must be developed if you elect to use this method. We must now decide which creature to run. Looking at the numbers for Arithmetic Mean, we find:
Darksteel Colossus: 0.23
Inkwell Leviathan: 0.31
Sphinx of the Steel Wind: 0.38
Platinum Angel: 0.08
Going by these numbers, Sphinx of the Steel Wind is the most popular choice, which is what we will use for our deck. This is what I mean when I say that in reality, we do not actually have six variable slots. We must look for things like this that statistics miss before calculating the actual number of variable slots. In our case, our actual number of variable slots is five. You may stop here and fill them with whatever you’d like; however, I will keep going and share what I decided upon.
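The same check can be written as a short sketch: lump the interchangeable creatures into one slot, add up their means, and take the most played one. The four means are the ones quoted above. Triskelion is left out because no number is given for it, and the variable names are my own.

```python
# Arithmetic means quoted above for the interchangeable large creatures.
big_creature_means = {
    "Darksteel Colossus": 0.23,
    "Inkwell Leviathan": 0.31,
    "Sphinx of the Steel Wind": 0.38,
    "Platinum Angel": 0.08,
}

# Treated as one slot, the group averages about one copy per deck.
slot_total = round(sum(big_creature_means.values()), 2)

# The single most played option fills that slot.
most_played = max(big_creature_means, key=big_creature_means.get)

print(slot_total, most_played)
```

Individually every card rounds to zero, but the group total shows that a full slot is being spent on this role.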
Filling in the Variable Slots:
I noticed that Mystic Remora was always (save for one exception) played as either a four-of or not at all. The Arithmetic Mean told us to put one in our deck, but it ended up not making it in at all because the other two methods did not agree with that assessment. I decided to add 4 Mystic Remora to the deck because of how well it plays in Vintage. With one more free slot left, the best thing to do is to find a card that did not make it in, but was played as a one-of in nearly half the decks we sampled. Being a one-of in at least 50% of sampled decks means that the mean, median, and mode would all have suggested playing one. Being a one-of in just less than 50% means that it barely missed the cut. The best card I could find to fit this criterion was Echoing Truth, which was played as a one-of in twelve of the decks. Being in thirteen decks would have meant playing it in our main deck, so it barely missed out. Adding in four Mystic Remora and an Echoing Truth gives the final decklist of:
Sideboards are messy, to be honest. They vary so wildly that you will never be able to construct a decent 15 card board without a bunch of one-ofs that do you no good. You can find good sideboarding information, but I urge you to combine statistics with maybe one or two well performing decks to figure out a good sideboard. For example if we were to make a sideboard with purely Arithmetic Mean for this deck, we would get:
What actually makes this more difficult is that there are four cards in this board which could be classified as “graveyard hate” so honestly, I would first combine all the “graveyard hate” cards into one lump sum and calculate how much hate people are playing, as it’s unlikely that playing one of each card is a good idea.
For anyone wondering, I more or less compounded those numbers and looked at the sideboards of a few good decks and made the following sideboard.
The Final Deck and Closing Thoughts:
by neosystems on 2010-04-15 01:30 CET
by mchosa on 2010-04-15 03:45 CET
cool story bro
by Drayen on 2010-04-15 07:06 CET
by SteveMan on 2010-04-15 07:09 CET
Math iz hard
by neosystems on 2010-04-15 08:10 CET
Here's the conclusion, because apparently, adding it in makes all the page tables go to hell. The final deck is just the final maindeck listed + the suggested sideboard.
by coboney on 2010-04-15 13:54 CET
Very good article and a good read Neosystems.
by Nickname7 on 2010-04-15 18:36 CET
by GoneBananas on 2010-04-16 18:59 CET
i keep trying to google translate this into english but it won't work!
by warpg8 on 2010-04-19 09:19 CET
are you secretly walter wagner? reference:
by neosystems on 2010-04-19 20:15 CET
1) I've addressed metagaming in the article already. This method is useful for creating a good version of an archetype versus an average field (the average field being the field faced by the decks you chose to sample). If you choose well performing decks indiscriminately as I did, you will truly have a good version for an average field. If you have a good idea of what you will face, as I said, it is up to you to make the changes yourself, or sample decks which did well in the field you expect to face. In theory you should stick with Classical Statistical thought, but you should be Bayesian in practice and know when statistics will fail you.
by warpg8 on 2010-04-19 21:00 CET
Mean, median, and mode are measures of central tendency, so this statement:
by neosystems on 2010-04-19 21:58 CET
"Interesting. YOU predict ~20."
by warpg8 on 2010-04-20 10:40 CET
"Gaussian distribution is quite irrelevant since, with the exception of basic lands, you're (on a card by card basis) worrying about a closed interval of [0, 4]."
by darkwizard42 on 2010-04-20 15:52 CET
by Tcleberg on 2010-04-20 17:38 CET
Good prediction, darkwizard. Hey, Skathe, you make more than 20k, just barely, right?
by warpg8 on 2010-04-20 20:54 CET
@darkwizard: you're right, taking 10 minutes to put together a coherent argument about something i am a) knowledgeable and b) enjoy doing in my free time is obviously directly indicative of my yearly salary. and i've wasted far less time reading and critiquing this article/technique than the author did writing it or actually performing this process.
by initD on 2010-04-22 21:15 CET
"oohh, this is just too goo" - syndrome