Peak Position, Part 1: A (Somewhat) New Way of Rating Games

bias, n.  [bahy-uhs]
1. a personal and sometimes unreasoned judgment
2. Statistics – a systematic, as opposed to a random, distortion of a statistic as a result of sampling procedure

There is a natural human tendency for individuals in a hobby to rate the objects of their affection, whether it be movies, comics, or whatever.  Gamers are by no means immune to this practice.  There just seems to be some primal need to establish what your favorites are and to notify the rest of the world.  And once that information is out there for a group of people, clearly someone is going to consolidate it and come up with a list of the best whatevers of all time.  In the gaming world, the most recognized of these lists is probably the one compiled by BGG, in which games are ranked by their Geek Rating.  Being one of the top 100 games on the Geek is a feather in any game’s cap.

But the thing with lists like these is that there’s always some bias present.  I’m not talking about deliberate bias, as is cited in the first definition of the word I displayed at the beginning of this article.  And I’m certainly not referring to tires!  No, I’m talking about statistical bias, where, due to the makeup of the voting group, the voting procedures, or any of a host of other factors, there is a tendency toward unbalanced results in the lists.

For example, about a dozen years ago, the Geek ratings were biased in favor of what would eventually become known as Eurogames.  Thematic titles tended to get lower ratings than they probably deserved, which led to flame wars, a mass exodus of gamers from the Geek, the launching of the Fortress: Ameritrash website, and all sorts of exciting stuff.  Still, the bias was unfortunate and led to many gamers feeling underrepresented.

Ironically, thematic games are very highly rated these days, so this bias is probably no longer present.  But there remain quirks in the ratings.  The most prevalent, in my humble opinion, is the bias in favor of newer games and against older designs.

Now, I could roll out all kinds of statistical claptrap, filled with Greek symbols and such, to try to prove this point.  But sometimes, a simple glance suffices.  Here are the facts, all based on the Geek ratings of January 1, 2019:

  • 8 of the top 9 rated games on the Geek were published between 2015 and 2017.
  • 14 of the top 20 games came from this period as well.
  • These trends continue, as over half of the top 50 and over 40% of the top 100 games were released between ’15 and ’17.

Now companies have been publishing games for a very long time.  It’s certainly fair to say that most of the titles that came from the 20th century are of little interest to the vast majority of people who rate games on the Geek.  Still, starting in 1990 or so, or 1995 at the latest (the year when Catan was published), we started to see some very interesting and excellent games released.  So let’s say we’re talking about 25 years of notable titles.  If the Geek ratings are to be trusted, 8 of the best 9 games from that 25-year period came out since 2015.  That’s only the last four years (and really, it’s only the last three years, since there hasn’t been enough time for the 2018 games to get enough votes to break into the upper echelons of the ratings).  Three years out of 25, and yet those three years account for almost 90% of the top 9 and 70% of the top 20 games.  That, my friends, is bias.

So is it possible that game design has improved so much in the last few years that these rankings are pretty accurate?  I guess it isn’t inconceivable, but I don’t come close to buying it.  For one thing, we’re talking about a massive representation from a very short period of time, much more than could be explained by advances in game design.  For another, a bunch of great games came out during the late nineties, the aughts, and the first part of the 2010’s, but very few of them are highly ranked (and, in fact, a lot of them are far down in the rating charts; the older the games are, the more pronounced this effect is).  And finally, this tendency to favor recent titles has always been present in the Geek rankings, although probably not as dramatically as we see today.  It’s just a weird consequence of the way that people rate games, a…well, a bias.

Let me be clear:  I’m not saying there’s any sort of conspiracy to rob El Grande of its god-given right to be a top 50 game.  No, I don’t think there’s anything sinister going on here.  There’s actually a bunch of factors that lead to this bias and detailing them all would take more space than I care to devote to the issue.  I just want to point out that older games tend to have depressed ratings on the Geek and, consequently, it’s difficult to use those ratings to judge the quality of these games.

So, assuming that we want our game ratings to actually be somewhat accurate, what can we do about this discrepancy?  A lot of arcane, mathematically-based methods have been proposed in the past.  But once again, I’d like to keep things simple.  What if we look at the peak position of these games?  For example, Puerto Rico was the top-rated game on the Geek once (in fact, it held that position for over 6 consecutive years).  During that period, it was considered, at least by the members of the Geek, to be the best game in the world.  Currently, PR is ranked 17th on the Geek.  Now, the game hasn’t gotten any worse during the past 10 years, so it isn’t unreasonable to say that the fact that it was once a #1 game is more meaningful than where it currently resides in today’s rankings.  That doesn’t make it the best game ever, but there’s reason to say that it’s one of the best.
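To make the idea concrete, here is a minimal sketch of how peak position could be computed.  The snapshot data is entirely hypothetical (BGG doesn’t publish historical rankings, so in practice the snapshots have to be collected by hand, week by week):

```python
# Sketch: deriving peak position from weekly rank snapshots.
# The snapshot data below is invented for illustration only.

def peak_positions(snapshots):
    """Return each game's best (lowest) rank across all snapshots."""
    peaks = {}
    for snapshot in snapshots:
        for game, rank in snapshot.items():
            peaks[game] = min(rank, peaks.get(game, rank))
    return peaks

# Three invented weekly top-of-chart snapshots:
weekly_snapshots = [
    {"Puerto Rico": 1, "Tigris & Euphrates": 2, "Caylus": 3},
    {"Puerto Rico": 1, "Caylus": 2, "Tigris & Euphrates": 3},
    {"Agricola": 1, "Puerto Rico": 2, "Caylus": 4},
]

print(peak_positions(weekly_snapshots))
# Puerto Rico and Agricola both peaked at #1, Caylus at #2.
```

Once a game’s peak is recorded, it can never be eroded by later shifts in taste, which is precisely the property we’re after.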

Peak position has a lot of attractive features.  For one, it obviously eliminates the bias against older games, because games are being judged at the height of their popularity, against their own peers.  Another really good aspect is that it reflects how the games were viewed during their first few years.  There are so many good games that were widely played when they were first released that, for one reason or another, have completely fallen out of favor.  In a few cases, this represents a lack of staying power, but far more often, it stems from a series of capricious events that have nothing to do with the quality of the game.  Maybe another game with similar features came along a year later and overshadowed the earlier title.  Maybe the publisher had financial problems, or the game went OOP and it’s hard to find copies to play.  All sorts of stuff can lead to a game being forgotten before its time; it’s really quite common.  But these are still great games, far better than their ratings would lead you to believe.  And their peak position would reflect this forever, the knowledge that at some time, this game was considered one of the best designs in the world.

The problem is, the Geek doesn’t include historical rankings.  So unless you were part of the hobby and closely following the Geek a dozen years ago, there’s no way for you to find out that Caylus (currently ranked 47th; its days in the top 50 are definitely numbered) was once the second-ranked game on the site.  Well, at least there wasn’t until recently.

JonMichael Rasmus is one of the more prolific statistical contributors to the Geek, a fellow who regularly posts gaming analyses on numerous subjects.  In May of 2017, he posted a Geeklist in which, based on a great deal of independent research, he provides the peak position for every game that has been listed in the Geek 100 since August 14, 2001.  Included in this analysis is data from before May of 2005, which had been missing from earlier lists, since it was hard to find and based on charts with fewer than 100 positions.  Rasmus updated the Geeklist on February 23, 2018, and the referenced list is current up to that date.

This list represents an enormous amount of research and completely delighted me when I saw it, as I had been seeking historical peak position data for some time.  In fact, I was so taken with the list that I decided to keep it current, so I have been updating my own copy of it every week since its original appearance.  With a huge tip of the hat to JonMichael, I’d like to share some details of that list with you.

According to Rasmus’ research, since 2001, 338 games have attained a position in the Geek top 100 for at least one week.  7 games were ranked #1, 62 games got as high as the top 10, 144 made the top 25, and 227 landed in the top 50.  Just as an example of the wide variety of games that reached the top strata at some point in time, here are all the games that made the top 10, listed in order of the number of weeks they stayed at that position (and with the year of publication given in parentheses), as of January 1, 2019:

Puerto Rico (2002)
Twilight Struggle (2005)
Pandemic Legacy: Season 1 (2015)
Agricola (2007)
Gloomhaven (2017)
Paths of Glory (1999)
Tigris & Euphrates (1997)

Through the Ages (2006)
Through the Ages: A New Story (2015)
Terra Mystica (2012)
Power Grid (2004)
Caylus (2005)
Die Macher (1997)

The Settlers of Catan (1995)
The Princes of Florence (2000)
Caverna (2013)
War of the Ring (1st edition) (2004)
Memoir ‘44 (2004)

Terraforming Mars (2016)
El Grande (1995)
Android: Netrunner (2012)
Command & Colors: Ancients (2006)

Star Wars: Rebellion (2016)
Eclipse (2011)
BattleLore (2006)
Up Front (1983)
ZERTZ (1999)
Elfenroads (1992)
Twilight Imperium (3rd edition) (2005)

Le Havre (2008)
Dominion (2008)
Carcassonne (2000)
Bridge (1925)
Scythe (2016)
Wallenstein (1st edition) (2002)
Dominion: Intrigue (2009)
Europe Engulfed (2003)

Ra (1999)
Age of Steam (2002)
7 Wonders: Duel (2015)
Brass: Lancashire (2007)
DVONN (2001)
Hammer of the Scots (2002)
Mage Knight Board Game (2011)
Space Hulk (3rd edition) (2009)

Modern Art (1992)
The Castles of Burgundy (2011)
Gaia Project (2017)
Hannibal: Rome vs. Carthage (1996)
Goa (2004)
Race for the Galaxy (2007)
Acquire (1964)
Shogun (2006)

Great Western Trail (2016)
Go (~2200 B.C.)
Carcassonne: Hunters and Gatherers (2002)

Star Wars: Imperial Assault (2014)
Full Metal Planete (1988)
The 7th Continent (2017)
Puerto Rico: Limited Anniversary Edition (2011)
YINSH (2003)
Ticket to Ride (2004)

That’s quite an interesting list!  If you’re new to the hobby, I’m sure there’s a ton of games there that you’ve never heard of (including a reasonable number of classic wargames, a category of designs that have literally no representation in the current Geek 100).  But if you’ve been gaming for 10 or 20 years, there’s undoubtedly a lot of titles there that will bring back some fond memories, including, I hope, a few that remain personal favorites, but which are rarely mentioned these days.  That, to me, is the real value of JonMichael’s research.

Well, with all this data, the temptation to analyze it is almost irresistible.  And far be it from me to resist temptation!  So in tomorrow’s article, I’ll look at the trends in the list, come up with a (hopefully) more refined way of interpreting the data, and then use it to create a new Top 100 games list.  Until then, let us know what you think of this approach to rating games in the comments section.

This entry was posted in Reviews.

22 Responses to Peak Position, Part 1: A (Somewhat) New Way of Rating Games

  1. Tom says:

    The truth is, I have been trying to figure out rankings that take into account a shift in gaming preference. There are likely no such ratings unless you constantly reevaluate your personal ratings. This is one reason I have really given up on ranking games, other than to say “I adore the game”, “I like the game”, and “I don’t need to play it again”. I wish there were a way to “age” rankings and take into account changing preferences. As a beginner gamer, most of the games I played were just amazing. Over time, as one evolves into a mature gamer, gaming tastes also change. Boy, I wish I could capture this shift in rankings.

  2. Jeff says:

    Interested to see where this series of posts goes!

    The “problem” with the ratings is, I suspect, that the “cult of the new” really means the “cult of new gamers”. Ask a teenager today what the greatest movies of all time are; most have never seen Casablanca, Citizen Kane, or whatever classics you want to come up with. Gaming is the same way; how many of the people giving Gloomhaven a 10 have even played El Grande or Tigris or Puerto Rico? I believe someone did the analysis and it’s something like 1/3 have rated both Gloomhaven and PR. When PR was #1, /everyone/ had played it, Tigris, Die Macher, Settlers, and every other highly regarded game, so the comparison was meaningful. Now, it’s just a matter of which audience is more /populous/.

    I think maybe the ratings don’t mean anything any more. I think maybe it’s better to accept this reality than to try to think of a way to fix them. Not that your proposal isn’t worthy of consideration of course!

  3. Jeroen says:

    So how much do you think games “wear out” with repeated play? Is Puerto Rico (picking an example at random) still a number 1 game after you’ve played it 100 times?

    • huzonfirst says:

      Possibly not, Jeroen. But I’m not sure that’s a reason to rate a game lower. For example, I know of some Race for the Galaxy fans who have *finally* burned out on the game after hundreds of plays. But is that any reason for them to revise their 10 rating? After all, it was good for hundreds of games!

      There’s no right or wrong answer to this, just as different people use different criteria when rating games. But I personally don’t think it’s fair to subject Puerto Rico to a harsher standard than Gloomhaven, just because PR has been around long enough to “wear out” and Gloomhaven hasn’t. If someone asks me if I think they’ll like PR, I’m certainly not going to ding the game because it may not last for 100 sessions!

      • Phil Campeau says:

        BGG ranking is based on desire to play.
        The BGG definition of 10/10 is “love this game. Always want to play it and I don’t think this will ever change”. If you’ve burned out and no longer always want to play it, it’s no longer a 10/10 by definition.

        • riverc0il says:

          This is definitely another factor. But it also assumes that BGG users are utilizing the rating criteria provided by BGG rather than a subjective overall rating (as opposed to desire to play). I suspect there is a combination of both rating systems, with more users utilizing their own rating system rather than “desire to play” as defined by BGG.

          Also, BGG ratings are often not updated by BGG users, so a “hotness” rating of 10 is often never downgraded after the initial rating, another bias in the ratings towards initial impressions rather than longevity.

          I occasionally go back and change ratings, and I find that games more often than not cool off rather than get a ratings increase. But if many, if not most, BGG users do not adjust their ratings, hot games may get an artificial boost.

          It would be interesting to see an alternative system created with perpetual ratings. For example, a system in which raters must affirm or adjust all ratings every year.

  4. Matthew Strickler says:

    Fantastic, interesting read. Thank you!

  5. Nick Bentley says:

    Very cool, though the *much* larger obstacle to utility of ratings is that tastes vary so much. The variance around the mean of a game’s rating is typically much higher than the difference between mean ratings for games you might want to compare, so ratings often have little predictive power. The only way I know to fix this is to identify and aggregate ratings only from people who are like you. It can be done a little with the geekbuddies function on BGG, but it’s a pain.

  6. JonMichael says:

    Always good to see my data used for good instead of ill. Thanks for the shout outs – I can update the numbers if needed though I’m sure your updates are fine.

    The real question I find myself coming to on the Top 100 is that BGG is looking more and more like the Billboard Hot 100 and less and less like the imdb Top 250. There’s nothing wrong with that shift, but it does mean that using ideas like peak position (as pop songs do) is more meaningful than merely today’s chart listing (while the imdb chart is a little too… male?… it does have a lot of classics in the top 100 and a very wide net). Those differences and their implications have not yet been fully absorbed by the community and the capacity for hijinks remains high.

    That said, I’m curious to see where this analysis goes next.

    • Ippokratis says:

      so, what is the algorithm behind IMDB top 250 that allows Citizen Kane to be there? I understand that BGG weighs heavily the amount of votes of a game, a fact that favors new games as the BGG community is now 100 times bigger than it was 15 years ago (I assume).

      • JonMichael says:

        The algorithm is largely the same with a few key differences (1. BGG has a special-sauce shillbuster which removes accounts that look like trolls. 2. BGG uses 5.5 for its Bayesian qualifier instead of the actual average of the games in the database. 3. The number of Bayesian ratings changes with the number of games in the database.)
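        [Editor’s note: a minimal sketch of the Bayesian qualifier described above may help. The 5.5 prior matches JonMichael’s description, but the number of dummy votes BGG adds isn’t public, so `dummy_votes` here is purely an assumption:]

```python
# Minimal Bayesian-average sketch. The 5.5 prior follows the comment
# above; dummy_votes is a placeholder, since BGG's real value isn't public.

def bayesian_average(ratings, prior_mean=5.5, dummy_votes=100):
    """Blend a game's raw mean toward the prior, weighted by vote count."""
    return (sum(ratings) + prior_mean * dummy_votes) / (len(ratings) + dummy_votes)

# A game with ten 9.0 ratings is pulled hard toward 5.5 ...
print(round(bayesian_average([9.0] * 10), 2))      # 5.82
# ... while ten thousand 9.0 ratings keep the score near the raw mean.
print(round(bayesian_average([9.0] * 10_000), 2))  # 8.97
```

        [This is why a hot new game with thousands of enthusiastic early raters can vault past an older title whose vote total has plateaued.]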

        But I think the two key differences are extrinsic: 1. Older titles, especially those deemed classics, remain available. 2. There is a stronger understanding of what the imdb is meant to measure. By this I mean, The Godfather is ranked #2 on the chart… there is no way that The Godfather is the second most likely movie people want to sit down and watch. But it has quality, rewatchability, artistic value, influence, quality storytelling, etc. Board games seem to have a hard time with these factors and how to weigh them… even BGG’s own guidelines try to collapse it all down to excitement to play, which will always favor today’s flavor over important designs of the past.

        And, of course, I think both lists should have a time factor, since Spider-Man: Into the Spider-Verse isn’t the 26th best movie of all time, just like 7th Continent isn’t the 12th best game of all time.

  7. Chris Backe says:

    Great list – and an interesting way of showing popularity.

    Some food for thought relates to BGG’s crowd (and how casual / party games are rarely seen on this list), and how any list would need to include these sorts of well-enjoyed games… I’d point to another source of data if I had one, though!

  8. Chris Brandt says:

    Larry is truly one of the great gaming minds of our generation. Comparing games over time is as difficult as comparing baseball players from one era to another. There are too many biasing factors to make most any system perfect, but we can strive for “the best”. If we could see the ranking history of games, or even download it, we could make our own decisions on which games are truly considered “the best”.

  9. riverc0il says:

    Self selection bias is the biggest factor with BGG ratings, IMO.

    With so many new hot games, players (old and new) cannot play them all, so they focus on games most likely to hit their sweet spot. And why wouldn’t they? Whereas 10-20 years ago, it was much more likely that the biggest hits of the year would eventually be played at least once by the type of gamer that has a BGG account and actively plays and rates a lot of games.

    So if you only play games that you seem predisposed to enjoy and actively avoid games you think are not for you (easier to do now due to a huge amount of online media content), you’ll generally play a higher percentage of games that you’ll rate highly and avoid the games you might rate low. That will push the overall ratings upward.

    For example, I will never play Gloomhaven. I know it is not a game for me. If I didn’t know as much about the game and it was brought to my group, I might give it a try and then give it a low rating (or at least, a rating that would lower the average).

    The other factor is how much more of an issue this is now than it was 20 years ago. There is always going to be self-selection bias, but it is more prevalent now. The biggest titles of today don’t get the same play with a wide spectrum of different types of gamers, as many of us are focusing more narrowly. So it is hard to compare ratings for the best games of years ago with the best of today.

  10. Phil Campeau says:

    “That, my friends, is bias.”

    No. That *could possibly* be bias. It could also be that new games stand on the shoulders of giants. Many of the best new games take great ideas and hone them down into a samurai sword of precision gaming.

    We all agree that games made after the early 80s are by and large superior to their predecessors; isn’t it possible that another quantum shift happened somewhere around 2012?

  11. Eric Brosius says:

    Following up on Chris Brandt’s comment, Larry, your approach is analogous to that used by the Baseball Hall of Fame. Honorees are voted in (usually) by people who are voting soon after the time when the honorees were active. Furthermore, once an honoree is voted in, there is no process to take them out because (to use an example) Ty Cobb wouldn’t have had a career batting average of .368 if he had been forced to face the slider.

    It does leave open the question of comparing one era to another, but I don’t think there’s any approach that does that well.

    • huzonfirst says:

      The analogy between gaming’s statistics and baseball’s is an interesting one, Eric. Just as I’m suggesting for games, I’m a strong proponent of comparing baseball players to their peers. So would Cobb have hit for the same average if he had to consistently face Steve Carlton’s slider? Perhaps not. But he did have to face spitballs, which were legal for most of his career. And the man did win 12 batting titles in a 13 year period, which is unprecedented. With modern training, nutrition, and coaching available to him, who knows what he would have accomplished? All I can do is look how he stacked up against the players he played against and, based on that, it’s obvious that Cobb was one of the greats and certainly worthy of the Hall of Fame.

      Similarly, if a game achieves top 10 status at some point in time, I have no problem with it keeping that distinction forever. Take its competition and other factors into account, but getting up to the top 10 is a rare thing and is worthy of honor. I see no reason to retroactively alter the judgment of the gamers of that time, just because the tastes of today’s gamers may have changed.

  12. Pingback: The Village Square: February 4, 2019

  13. apertotes says:

    I will say it up front: I disagree with the premise. I think that BGG’s current ranking system makes perfect sense, just like IMDB’s. What you perceive as bias, I attribute to the bloom of boardgaming in recent years, both in the quantity of gamers and the quality of games. I do indeed believe that games are improving, especially thematic games.

    No matter how fondly we think of Heroquest or Warhammer Quest, I am convinced that Descent, Imperial Assault and Gloomhaven are way better games.

    Also, the typical gamer has changed. And this can be seen with videogames as well. When I was a teenager, I had time to play a 3-hour game of Risk or a 9-hour game of Civilization. Nowadays, I do not see anybody enduring a 4-hour-long game, especially teenagers.

    Many old games have some perks that worked in their day but are not so attractive now, like duration, low component quality, Spartan artwork, harshness…

    If I may dare to say, I believe that BGG’s ranking system is biased towards old games and helps them maintain their ranking much longer than they would if everybody had to, for example, rate all games again every first of January. Many players that once ranked Carcassonne with an 8, or Puerto Rico with a 9, would rank them much lower if they had to do it today after having played hundreds of new games.

    Meanwhile, I do not think users would rank The Godfather or Pulp Fiction much lower today than when they watched them decades ago. I also think that happens with music. I like Highway Star just as much as when I was young, or even more.

    Either way, I enjoyed the article and the discussion a lot, thanks!
