Dirk Knemeyer

Fixing the “five-star” rating system

I’ve long been a fan of the five-star rating system online: users are familiar with it, it is very straightforward, and in theory it provides a nice gradation of results.

In theory. Today I think my affinity for the five-star system has been permanently squelched.

CNN.com ran a story today about the top 50 restaurants in the world, as voted by a bevy of culinary experts. More than a cheeky list, this one had some teeth, and I’m sure the proprietors of the restaurants included on it felt really good to be given the honour. Now, I’m not quite a foodie, but almost, so this sort of thing interests me. I wanted to see which restaurants, and how many, from the U.S. and Germany – the countries I consider home – made the list. Then I started to research the eight from the United States, most of which I had heard of already. I was surprised to realize that, despite being anointed among the 50 best restaurants in the world by the experts, they did not fare so well on Yelp, the most popular and useful Internet site for rating restaurants. Indeed, none of them had a perfect “5 stars”. Instead, the ratings broke down like this:

4 1/2 Stars – 6
4 Stars – 1
3 1/2 Stars – 1

Now, you could argue that 4 1/2 stars in this kind of system is a perfectly fine top rating. I won’t disagree with that. And you could argue that one or more of these restaurants getting 4 stars is reasonable, because people on Yelp rate restaurants for a lot of reasons, and coming in 1/2 star below the “highest” realistic rating is understandable. OK, I can buy that. But 3 1/2 stars? C’mon.

That said, these results are not the problem; they just surfaced the issue for me. In my own (extensive) use of Yelp, I’ve come to realize that while the input engine might be 5 stars, the output engine is muddled. There are no 5-star results; there are no 1-star results. Instead, the results break down to:

4 1/2 Stars – Highest rated; going to be great
4 Stars – Definitely good, buy with confidence
3 1/2 Stars – Could be good or could be bad, buyer beware
3 stars or less – Dreadful, avoid at all costs

Now, it took me a while to figure out that the 5-star system is not 5 stars at all. In fact, it is a range of just 2 stars that covers the entirety of practical results! For a new user, or someone who has not figured this out through a lot of trial-and-error, this is terrible. It obfuscates the very purpose of the rating system: to provide clarity that guides choice.

I think the solution is rather simple. We need to look at what the data is giving us – four distinct and specific choices – and re-fashion the system around that. It could be done one of two ways:

1. Fix the output system only. Convert it into the four things I identified above: Great, Good, OK/Mixed, Bad. Simple, straightforward. So, users pick their 1-5 stars when they rate the restaurant, but that translates into a more plain-language system that provides directive behaviour. The average star rating could still show up in a supplementary capacity to provide transparency into the system.

2. Revamp the entire system around those four things. So, a user inputs their rating as one of Great, Good, OK, Bad and the system outputs the same way. I’m not schooled enough on the micro data patterns behind the roll-up to suggest if this is a better approach or not.
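The first option can be sketched in a few lines. This is a minimal sketch of my reading of the breakdown above, not anything Yelp actually does; the threshold values and the function name are my own assumptions.

```python
# Sketch of option 1: keep the 1-5 star input, but translate the
# displayed half-star average into four plain-language categories.
# The thresholds mirror the breakdown described in this post.

def plain_language_rating(avg_stars: float) -> str:
    """Map a rounded star average to a directive label."""
    if avg_stars >= 4.5:
        return "Great"     # highest realistic rating; going to be great
    if avg_stars >= 4.0:
        return "Good"      # definitely good, buy with confidence
    if avg_stars >= 3.5:
        return "OK/Mixed"  # could be good or bad; buyer beware
    return "Bad"           # 3 stars or less: avoid

# The raw average could still be shown alongside for transparency:
for stars in (4.5, 4.0, 3.5, 3.0):
    print(f"{stars} stars -> {plain_language_rating(stars)}")
```

The point is not the code but the shape of it: the entire practical range collapses into four branches, which is exactly what the star display hides from a new user.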

In either case, the intent is to give the user more useful and directive information. And it is not only a Yelp problem; Amazon faces the same issue. Functionally, the 5-star ratings translate into a much narrower set of directive results. Amazon tries to mitigate this by being very aggressive in displaying the breakdown of how many people picked each of the 5 rating options. But that is just putting duct tape around a leaky sewer main: the real problem is that you are forcing users to figure out via trial-and-error what the ratings really mean, and convert them into their own clumsier understanding. In Amazon’s case the ratings do tend to have a slightly broader curve, but they still break down to the same four categories:

Great (you can’t go wrong here)
Good (most like it, you might too)
OK (be careful, some like it and some don’t)
Bad (don’t bother)

Of course, even more useful to the end user would be ratings that were more nuanced. In the case of restaurants and Yelp, it would be lovely to break the ratings out and have users rate perhaps five categories separately, such as Value for Money, Food Quality, Service Quality, Ambiance and an Overall roll-up. But we’re still in an online moment where the business strategy is the path of least resistance: get as many people to contribute as possible, even if those contributions are shallow and the results more an amusement than of strong practical value. That strategy has worked so far, but you have to wonder at what point that will change and we’ll be saying, to quote an old baseball player, “Nobody goes there anymore because it’s too crowded.”
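The nuanced version above could be sketched just as simply. This is only an illustration under my own assumptions: the category names follow this post, and the equal-weight average for the roll-up is my choice, not something any real site does.

```python
# Sketch of per-category ratings with a simple averaged roll-up.
# Equal weighting of categories is an assumption for illustration.
from statistics import mean

CATEGORIES = ("Value for Money", "Food Quality", "Service Quality", "Ambiance")

def overall_rollup(ratings: dict) -> float:
    """Average the per-category 1-5 ratings into an overall score."""
    return round(mean(ratings[c] for c in CATEGORIES), 1)

review = {"Value for Money": 3, "Food Quality": 5,
          "Service Quality": 4, "Ambiance": 4}
print(overall_rollup(review))  # -> 4.0
```

Even this crude roll-up tells a diner more than a single star average does: the same 4.0 could mean superb food at a painful price, or the reverse.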
