The Positives and Pitfalls of Review Democratization

by Justin Nation - April 9, 2017, 2:28 pm PDT
Total comments: 16

The proliferation of sources for game reviews over time has brought many benefits to people in search of consensus opinions. Unfortunately, because of bias and other factors, more reviews can also bring more noise, especially for Metacritic.

The release of Legend of Zelda: Breath of the Wild certainly brought out a lot of opinions. Many of them were extremely positive, but there was obviously both a reflexive backlash of “artificial” reviews (created with the express purpose of simply trashing the game) as well as “reviews” whose goal may not have been to objectively review the game but instead to act as an expression of defiance for specific communities. While the outright bogus reviews tended to be in the user community space, disqualifying them from being counted in aggregation scores on Metacritic, what created a little more of a stir was that a small number of less-than-stellar reviews also got counted, bringing down the overall average for the game in the all-time rankings. All of this brings us to the point of this editorial, which is to explore where things were at earlier points in time, where they are now, and then some of the potentially difficult questions that may need to be asked concerning what reviews (both good and bad BTW) merit being “counted” at the end of the day.

Before getting too far into things, let's get one thing out of the way very quickly. All people are entitled to their opinions, no matter how they may be formed or influenced, and they even deserve for those opinions to be shared and read (whether you decide to care may be a different matter). In reviewing games, in particular, what qualifies someone to review a game has obviously reached a pretty low bar. People no longer need a journalistic platform to publish their thoughts on; reviews are solicited almost everywhere and, again, that's fine. The trickier part of things is tied to aggregation and whether you're interested in the raw mathematical average or whether you want it to be set up in a way to at least attempt to be accurate. If you read the content of a review you're often able to quickly discern whether it is the ramblings of either a fanboy or hater and you can pretty well throw those out on both ends. If all you're dealing in is the final numbers, that gets a bit hairier. Keep in mind, as well, that part of the reason how these scores are averaged matters is that people's careers and corporate decisions can literally be informed by them; it has repeatedly been noted that some publishers look at Metacritic scores to “grade” development efforts. So this conversation is about more than just abstract interest in accuracy.

Getting on to where we once were, back in the stone age there were pretty well only mainstream print publications to go by. You could check the scores in the latest EGM, GamePro, Next Generation, and a host of others... and that was roughly it. Sure, at some point word of mouth would get rolling but the overall lack of variety in reviews made it tough, sometimes, to be confident. Worse, these publications would sometimes print reviews without attribution or they would be a product of collective opinion, possibly robbing readers of getting an honest feel for a particular reviewer's style or preferences. Even if you did know who wrote the review since almost all publications would only post one review per game even that connection could end up worthless if someone you didn't know/like/trust was the one who did the review that month. In short, looking back, going back to that era wouldn't be preferred.

Progress into the early internet age and things began to get a little more interesting and served as a sort of preview for where we are now. Independent game networks and fan sites of various kinds began to crop up, some with more polish than others, but the benefit was an increase in volume and diversity of opinion. Especially in the early days most independent sites weren't getting free games, the opinions being offered were from fellow gamers just like anyone else who had spent their money and were going to share their thoughts whether good or bad, in some ways improving their authenticity. Of course sometimes this would come at the cost of consistency or perspective so, in particular, fan sites could skew heavily positive at times. But since most sites would post multiple reviews you could at least have the advantage of diversity even within the staff of a website and you could come to value the opinions of specific reviewers as well.

This cascades into the modern day, and the point where things get both complicated and a bit overwhelming. There's just a load of opinion out there, plain and simple. Some of it is still in the mold of the integrity exhibited in the classic print publication space, whether private or professional, and a ton of it isn't. The great thing is that this climate amounts to there being reviews to fit all tastes and temperaments. If you can't find someone with a review style and track record you generally agree with you probably aren't looking hard enough. With the climate being what it is perhaps if this is the case what you should be doing is writing your own reviews to establish your own following... it really is just about that crazy anymore. But that same diversity and craziness is where the question begins to crop up with which reviews, at the end of the day, deserve to be counted. This is where things can get a bit ugly.

We'll start with a reviewer who shall not be named (if you read my opinion on the review it will be clear why I'm not doing it) and a particular score given for Breath of the Wild that did get counted. Again, I won't question the overall principle that people are entitled to state their opinion, but I also disagree with that specific review score being counted. The complication with this type of reviewer is that rather than being from a school of measured thought, first and foremost they're focused on the persona they project to their fan base. The new-ish creation in this generation is the “personality” reviewer, people with a shtick and a following that is driven more by the adherence to that gimmick than necessarily being accurate. In this specific case I'd argue that it wasn't brave to score the game low; it was actually a self-serving and manufactured score that would appease the rabid community who loves contrarian and anti-establishment grandstanding while not scoring things so low as to lose any hope of legitimacy. Let's face it, it was a very “safe” score to give for all of the hubbub... bravery would have been making it higher or lower.

To be fair, though, if we begin being critical of the lower and possibly troller-ish end of the spectrum we also need to take a hard look at the people who may be skewing things up unnaturally, in the end doing just as much damage (if not more since they're probably more prevalent). Are there really “legitimate” reviews out there for 1-2-Switch that are over 8? Or even as high as 7? Really? This game would be considered worthy of what would be considered a “passing grade” in people's traditional thoughts? I'll probably write something else in the future about how value versus purchase price really needs to factor much further into modern review scoring, but at the point this game gets higher reviews the legitimacy of them should be pretty severely questioned. This all ultimately means you're combating a problem both from the bottom and from the top.

That gets us into the last phase, trying to determine what could and should be done to aid in making the aggregated scores more “accurate”. Even among publications or sites that are considered to be “legitimate” I think a strong case could be made that rather than attempt to determine, as a whole, whether to add/remove a site or reviewer across the board it would be easiest and best to do what many places do when determining averages: Throw out both the top x and bottom x (whether this should be a set number or percentage, and what that number should be are up for debate). It would, in theory, mostly balance itself out if all reviews were 100% legitimate, doing no real harm, but it would likely prevent severe outliers from skewing things up or down. If enough people think a game is great or stinks obviously removing that number wouldn't change a thing, you're only removing individual reviews from the average and if enough people agreed on a specific numeric score the majority of them would still stand. 
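To make the "throw out the top x and bottom x" idea concrete, here is a minimal sketch of a trimmed average. The function name, the choice of trimming one score from each end, and the scores themselves are all assumptions made up for illustration; they are not real review data and not how any aggregator actually works.

```python
def trimmed_mean(scores, trim=1):
    """Average the scores after dropping the `trim` highest and `trim` lowest."""
    if trim < 0:
        raise ValueError("trim must be zero or positive")
    ordered = sorted(scores)
    if len(ordered) <= 2 * trim:
        raise ValueError("not enough scores left after trimming")
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


# Hypothetical scores on a 0-100 scale (invented, not real reviews).
scores = [100, 100, 98, 97, 95, 95, 70]

plain = sum(scores) / len(scores)        # every review counted
trimmed = trimmed_mean(scores, trim=1)   # drops one 100 and the 70

print(f"plain average:   {plain:.1f}")
print(f"trimmed average: {trimmed:.1f}")

# When everyone roughly agrees, trimming changes nothing.
consensus = [90, 90, 90, 90, 90, 90]
print(trimmed_mean(consensus, trim=1) == sum(consensus) / len(consensus))
```

With these made-up numbers the single low score pulls the plain average well below the trimmed one, while the consensus list comes out identical either way, which is the "no real harm" case described above.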

Barring some other type of standard adjustment of this kind, the only option I would see possible is, again, continuously evaluating whether individual reviewers or outlets should be considered legitimate, pretty well an impossible task and one that would generate far more controversy than it is worth (especially given the tendency of “personality” reviewers to be a tad dramatic with a mentality of “forget the games, just focus on me”). Also, though on any given review an outlet or individual reviewer may skew up or down quite a bit (I would have likely been eliminated back in the day because I pretty well detested the original Tomb Raider games, scoring them likely 2 or more points below the average), I would guess most of the time they likely would have scores that are a little closer to the norm. I understand that Metacritic attempts to help with an adjustment based on their “weighted” score of individual outlets, but honestly that method is even more prone to issues if one of their highly-weighted outlets turns in a skewed review somehow, making the problem worse. Besides, don't you think if people found out how those weights were determined people would likely begin to nitpick even that? I know I probably would.
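The concern about weighting can be shown with a small sketch. The weights below are invented purely for illustration and are not Metacritic's actual values; the scores are hypothetical as well.

```python
def weighted_mean(scores, weights):
    """Weighted average: each score counts in proportion to its weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must be the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical scores: four outlets agree, one is far lower.
scores = [95, 95, 95, 95, 70]

# Every outlet counted equally.
equal = weighted_mean(scores, [1, 1, 1, 1, 1])

# Same scores, but the low-scoring outlet is assumed to carry triple weight.
heavy = weighted_mean(scores, [1, 1, 1, 1, 3])

print(f"equal weights:         {equal:.1f}")
print(f"skewed outlet weighted: {heavy:.1f}")
```

In this invented case the equally weighted average is 90, and giving the outlier extra weight drags it down to roughly 84, so weighting amplifies a skewed review rather than dampening it.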

At the end of the day, the fact that all games currently scored by Metacritic have had the same criteria applied to them (sort of) makes their aggregated scores "fair". However, looking over the top 5 ranked games of all time, the fact that the most recent of them was from almost a decade ago likely isn't a coincidence. The fact is that as more reviews are added to the mix, especially considering the diverse voices that can be out there, the more uncertain it is how you'll necessarily break out from the pack. With those added voices and numbers also comes the probability of baggage, both high and low, coming along for the ride, further complicating getting an accurate picture of things other than by hoping that the sheer force of numbers will help average things out. But when you see a spread of 3 to 4 points or more from your highest score to your lowest score, perhaps something is up there that, in the end, may not be worth counting. It may be silly to worry over it, but if the goal of an aggregated score is to be accurate, this sort of adjustment would seem to at least be more honestly set for meeting that goal.

Cross-posted from: MAMEiac Gaming

Talkback

ForgottenPearl - April 09, 2017

I agree that removing the really "out there" reviews would help averaged scores be more accurate.  I'm sure that's an important issue to many people, especially developers, like you mentioned.


Personally, though, I think review scores are WAY too skewed toward the positive, and that's the bigger issue for me.  Rather than have 5/10 be the average since that's in the middle of the scale, I'd say about 90% of games get at least a 6/10.  Heck, Game Informer once said in a certain review, "you’ll enjoy turning this game off more than you enjoy any aspect of its gameplay," but gave the game a 5/10.  Games rarely go that low or lower unless they're totally glitchy messes. Why have a huge chunk of the review scale that you only use when bashing garbage?  So, really, when it comes to great games, we end up bickering over whether a score should be an 8-point-something or a 9-point-something.  The scores are borderline useless.  However, Metacritic keeps calling 70/100 "average," so reviewers are going to maintain the status quo there.


I don't pay much attention to review scores.  I read the content of various reviews (some high, some middle, some low) and make my decision based on what I think of what I've read.  I also use my intuition: if a game's not making me excited, I stay away from it, even if I can't explain why it's not jibing with me.  In the past, I sometimes bought games with super-high review scores even if I didn't truly feel enticed by them, and I usually regretted it.


But anyway, that was an interesting article.  Back in the day, I only had Nintendo Power's reviews (and my own whims) to go off of.

Haha, we're on the same wave length, completely. I've actually got a number of Editorials still planned to come concerning reviewing, how it works, how it doesn't, different approaches and methods people could (but probably won't) use. It's been a fascination of mine since my first pass through this freelance game press thing approaching 2 decades ago and since I've been away things have continued to change, and not entirely for the better. Next up, and it already exists on my blog (just giving a break since I've already posted 3 things today from backlog of mine), is an editorial about how I believe that Value now should take a greater role in determining scoring (looking at you 1-2-Switch and Bomberman in particular!).

MASB - April 09, 2017

Great article, Justin! Review scores should never be used by publishers as a guide to pay/bonuses, etc. But since that brand of stupidity seems to be rampant and an accepted part of the industry, especially by American publishers, I think throwing out some percentage of the high/low scores would be the easiest way to handle the problem that wouldn't just introduce a new set of problems itself.

NintendoDad - April 09, 2017

I've never understood why more review sites don't use the letter grade system instead of the number system. Everyone knows a C is average and a B is above average. None of this 7 is average stuff that doesn't make sense.

And the Youtubers that will say anything to get more views, don't even get me started. What's really scary is all the 10-14 year olds that follow them religiously.

Evan_B - April 09, 2017

This has been a problem for some time, and I remember a similar debacle occurring with The Last of Us, in which people petitioned to have some poor reviews taken down because they were hurting the overall aggregate.

In all honesty, I don't go to Metacritic, nor do I consider a number of legitimate review sites for largely the same reasons- scores are skewed, and I think people value their experience with a game more than the technical and artistic aspects of a game. I have never had a particularly great experience with a Platinum Games title, but I would never discount the amount of technical polish and the quality of aesthetic they possess, and I don't necessarily think I'm qualified to review a Platinum title because I don't like action games. I mostly enjoy reviewing RPGs because that's my personal niche, though I love other types of games. I think the sensationalist and consumerist nature of the video game industry and the media surrounding it holds the medium back from receiving legitimate critique, and even being considered a legitimate form of art, in general.

All great thoughts and feedback, I'm encouraged to see that though the site doesn't seem to be in the general habit of posting Editorials there appears to be a thirst for discussion of topics, or at least nitpicking. :)

Quote from: MASB

Review scores should never be used by publishers as a guide to pay/bonuses, etc. But since that brand of stupidity seems to be rampant and an accepted part of the industry, especially by American publishers, I think throwing out some percentage of the high/low scores would be the easiest way to handle the problem that wouldn't just introduce a new set of problems itself.

Obviously I agree on both points. The challenge, and this is coming from corporate America and having been part of the management hierarchy at times, is that companies need a gauge to judge people and they would consider Metacritic as being fair and unbiased as a source. I could think you suck but perhaps I could have personal feelings mixed into that, 100 reviewers who don't know you averaged say you suck much more effectively and my hands remain clean. So considering that companies won't likely stop using the site as a measure of the work of the team, yeah, I think a revision would help. This keeping in mind that if they implemented such a strategy I was able to show where a score like the one for BotW would (rightly) go up while the score for 1-2-Switch would (rightly) go down. That's what would make this great: it has no guarantee of effect and could be benign if there are no outliers but would properly compensate against anything extreme in either direction.

Quote from: NintendoDad

I've never understood why more review sites don't use the letter grade system instead of the number system. Everyone knows a C is average and a B is above average. None of this 7 is average stuff that doesn't make sense.

And the Youtubers that will say anything to get more views, don't even get me started. What's really scary is all the 10-14 year olds that follow them religiously.

Also agree on both points, though with the first it's going to be complicated no matter what. Expect an Editorial at some point discussing the different ways reviews could be approached in general but keep in mind that it is precisely because of things like Metacritic and the "consensus" way things are scored, as you note, that changing things would be like fighting against the current and if you scored that way inevitably most of your scores would get lower, potentially confusing people or even making you now seem like one of those "outliers" and not worthy of being considered. It's complicated.


As for the minions of "He who shall not be named", yeah, and what's the most funny is I absolutely stand by my contention that he gave the game a "safe" score for his audience. He knows where his revenue stream is coming from and that bunch is full of contrary little assholes (though, not all, no doubt... but seeing them spread out into other sites' comments sections is like freaking locusts at times) who would eat him alive and unsubscribe from his channel the second they believed they got a whiff of him liking a big target like a new Zelda game. So he goes with low enough to throw them their red meat, stays JUST above the line where he'd go to obvious trollery and lose mainstream legitimacy, likely makes up a story about being attacked to be dramatic, and effectively turns the entire thing into a vehicle for promoting himself. You'd think he could run for public office.

Quote from: Evan_B

In all honesty, I don't go to Metacritic, nor do I consider a number of legitimate review sites for largely the same reasons- scores are skewed, and I think people value their experience with a game more than the technical and artistic aspects of a game. I have never had a particularly great experience with a Platinum Games title, but I would never discount the amount of technical polish and the quality of aesthetic they possess, and I don't necessarily think I'm qualified to review a Platinum title because I don't like action games. I mostly enjoy reviewing RPGs because that's my personal niche, though I love other types of games. I think the sensationalist and consumerist nature of the video game industry and the media surrounding it holds the medium back from receiving legitimate critique, and even being considered a legitimate form of art, in general.

Among the things I may do at some point is revisit the concepts of an editorial I wrote for the N64HQ probably 20 years ago now (that sadly I don't think exists anywhere) that started me on my quasi-free-time-unpaid-journalist road. It was all about the challenge of reviewing in the first place. Who is your audience? Genre fans, previous fans of the series, the mainstream? Who are you really scoring for? How do you write a single score that somehow could be relevant to all of the above at the same time when their interests are vastly different? Especially now that you have something like Metacritic aggregating scores and people looking for consistency in review recommendations this is more complicated than ever before. Seems like it makes it worth fleshing out and discussing. :)

ejamer - April 10, 2017

The older I get, the less confident I am that having everyone voice their opinions is a good thing. Sure, you can say whatever you want... but that doesn't mean you should.  On the up-side, it's quicker and easier than ever to know who can safely be ignored.

Steefosaurus - April 10, 2017

Tangentially related question, but do consumers really use metacritic? I feel like that website, along with things like Rotten Tomatoes are more commonly sourced by the industry itself and "reception" sections on Wikipedia articles, than by actual individuals.

However, that's an entirely unfounded hypothesis. Would be curious to hear if anyone on here uses it a lot? I've looked at Metacritic a few times for games, less so for movies, but never found it super useful. The user reviews are often absolute garbage for one, and what I personally dislike is that the Critic scores use so many websites I've never heard of.

Just clicked the first game on the site, Persona 5 right now, and you've got notable sources like the Guardian, Washington Post, EGM, Gamespot etc. alongside things called SixthAxis and Cheat Code Central. Not to disparage those websites, since I don't know them, but that's not very helpful to me. Similar thing with Rotten Tomatoes.

It's probably my fault for defaulting only to places I've heard of and not stepping out that comfort zone, but I'm just trying to illustrate how the score aggregators don't really help out an average joe like me since they're not at all aligned with my personal outlets of choice. Not saying they SHOULD only display my preferred filter bubble, that'd ultimately be even worse probably. But yeah in reality that's what I'll end up doing myself. Just wondering if others have similar experiences with review aggregators?

Just for shiggles I did a Google search for "Persona 5 Reviews" and Metacritic is the first link, along with being in the infobox at the top of the results.

I seem to recall Amazon was including Metacritic in their listings as well?

KeyBilly - April 10, 2017

Thought provoking article.  There are simple statistical methods for outlier removal, which I agree would make sense.
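One example of such a method, sketched under assumptions: the interquartile range (IQR) rule, which drops any score more than 1.5 times the IQR below the first quartile or above the third. The function name, the 1.5 multiplier default, and the scores are illustrative choices, not anything an aggregator is known to use.

```python
import statistics


def drop_outliers_iqr(scores, k=1.5):
    """Keep only scores within k * IQR of the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [s for s in scores if low_fence <= s <= high_fence]


# Hypothetical scores on a 0-100 scale (invented, not real reviews).
scores = [100, 100, 98, 97, 95, 95, 70]

kept = drop_outliers_iqr(scores)
print("kept:", kept)
print(f"average of kept scores: {sum(kept) / len(kept):.1f}")
```

Unlike dropping a fixed number from each end, this only removes a score when it sits far from the rest, so a tightly clustered set of reviews is left untouched.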

In terms of Jim Sterling's review, I found it deplorable before I played the game or actually read it.  In the first hours of playing the game, it seemed like the 10/10 that many gave it.  After playing it longer, the cracks started to show and the excuses I had made for it fell apart.  I have now completed all the shrines and a 7/10 doesn't seem absurd, though I would rate it closer to 8.5.  There are plenty of issues for sequels to address and refine.  I did eventually read the review, after getting some perspective, and found that while the tone was over the top, the critiques were legitimate and really should have borne more weight and time in other reviews, along with other things the reviews were too rushed to mention.  Perhaps a game like this is worth revised reviews after more time.

Even now, I find that most discussions regarding BotW involve people censoring their own critiques.  There is a pattern of people mentioning flaws, then deflating their importance apologetically.

I don't mean to counter your obvious distaste for his motives, because I don't have an opinion on that.

I could easily see people not giving it a 10, his means of justifying his score was mostly picking his narrow set of gripes (most of which are personal taste and ignoring whether they made sense within the game design as a whole) and hitting the "scream about this" button... it's what he does. At the end of the day you could defeat ANY game that was given a 10 with such tactics, it's dropping down to 7 that's a bit preposterous though. At some point what a game does right is generally weighed against what it does wrong, considering BotW more lacking than not is really stretching hard. Even if I went to my ideal review scenario, where you're really rating games against every game you've ever played on a continuum it would be a massive struggle to justify it not being somewhere in the top 10% of games I've played, the top 30% it isn't even close. There's just too much that's through and through average to mediocre out there to contrast with.

Evan_B - April 10, 2017

The real crime here is that KeyBilly thinks BotW has flaws.

ForgottenPearl - April 10, 2017

Quote from: Steefosaurus

Tangentially related question, but do consumers really use metacritic? I feel like that website, along with things like Rotten Tomatoes are more commonly sourced by the industry itself and "reception" sections on Wikipedia articles, than by actual individuals.

If I'm on the fence about a game, I'll read a few reviews conveniently compiled there.  I pick some of the higher scores, some of the middle scores, and some of the low scores to get a wide variety of perspectives.  I then decide whether or not the good outweighs the bad.


But I don't do things based on an aggregate score alone!

Steefosaurus - April 10, 2017

Quote from: ForgottenPearl

If I'm on the fence about a game, I'll read a few reviews conveniently compiled there.  I pick some of the higher scores, some of the middle scores, and some of the low scores to get a wide variety of perspectives.  I then decide whether or not the good outweighs the bad.


But I don't do things based on an aggregate score alone!

Such an informed consumer! :) And hey even if someone went by numbered scores alone, if it works for them I won't knock it. But thanks for weighing in, I guess many people use it like you do.

I think Metacritic is good for gauging the "density" of the opinion your given reviewer of choice may have in the greater scheme of things. If it is roughly the same you have your trusted opinion on top of the mob. If they're higher or lower you can then factor that in as well. The folks you trust would still generally come first but there are still uses to understanding where the herd is gathering.

KeyBilly - April 11, 2017

Quote from: ForgottenPearl

Quote from: Steefosaurus

Tangentially related question, but do consumers really use metacritic? I feel like that website, along with things like Rotten Tomatoes are more commonly sourced by the industry itself and "reception" sections on Wikipedia articles, than by actual individuals.

If I'm on the fence about a game, I'll read a few reviews conveniently compiled there.  I pick some of the higher scores, some of the middle scores, and some of the low scores to get a wide variety of perspectives.  I then decide whether or not the good outweighs the bad.


But I don't do things based on an aggregate score alone!

This is typically how I use review sites, including games, movies, or Amazon.  The outlier (super high or low) reviews are sometimes the most useful, as long as they describe their reasoning.  There are some games with issues that others can look past, but would make me lose interest quickly.  There are also times when most people rate a game very low, but I would rate among my favorites.
