
Glicko-2 in Guildwars 2


Crystal Black.8190


Hello, I would like to talk about the Glicko-2 rating system currently used in GW2: how it works, and where and why I think it could be improved. My main source of information is the document from the glicko.net site.

How does Glicko-2 work?

In the Glicko system every player is assigned a rating, a rating deviation and a volatility. In game only the rating is shown, but I think all three could be shown for further information and comparison.

What do these mean?

The rating describes how well or badly a player performs. The rating deviation describes how certain we can be about that rating. This lets us express a player's rating as an interval rather than a single number. Usually an interval is chosen in which we can be 95% certain that the player's real rating lies; to get it, you add and subtract twice the rating deviation. For a player with a rating of 1500 and a rating deviation of 50 this gives an interval from 1400 to 1600, so we can be fairly sure the actual rating is somewhere between 1400 and 1600. Another player could also have a rating of 1500 but a deviation of 150, so his real rating is somewhere between 1200 and 1800.

The more games a player plays, the more certain we can be about his rating. This means that at the start of a season a player's deviation will most likely be higher than at the end of a season. Finally, the volatility measures whether a player performs at a constant skill level or is very bad in some games and very good in others. A high volatility leads to a higher rating deviation.
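As a quick illustration of the interval arithmetic above (a sketch, not game code):

```python
# Quick illustration of the 95% interval described above: rating +/- 2 * deviation.
def rating_interval(rating: float, deviation: float) -> tuple[float, float]:
    return rating - 2 * deviation, rating + 2 * deviation

print(rating_interval(1500, 50))    # (1400, 1600)
print(rating_interval(1500, 150))   # (1200, 1800)
```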

These formulas give the new rating and rating deviation; µ is the rating and ϕ the deviation. For the exact calculation I recommend looking at the glicko.net document mentioned above.

[Image: Glicko-2 update formulas for the rating µ and the rating deviation ϕ]
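For reference, the update step shown in that image can be transcribed from the glicko.net document roughly as follows (the iteration that yields the new volatility σ' is omitted here):

```latex
g(\phi) = \frac{1}{\sqrt{1 + 3\phi^2/\pi^2}}, \qquad
E(\mu,\mu_j,\phi_j) = \frac{1}{1 + e^{-g(\phi_j)(\mu-\mu_j)}}

v = \left[ \sum_{j=1}^{m} g(\phi_j)^2 \, E(\mu,\mu_j,\phi_j)\,\bigl(1 - E(\mu,\mu_j,\phi_j)\bigr) \right]^{-1}

\phi^\ast = \sqrt{\phi^2 + \sigma'^2}, \qquad
\phi' = \frac{1}{\sqrt{1/\phi^{\ast 2} + 1/v}}, \qquad
\mu' = \mu + \phi'^2 \sum_{j=1}^{m} g(\phi_j)\,\bigl(s_j - E(\mu,\mu_j,\phi_j)\bigr)
```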

Glicko and GW2-Scoring

If we look at the formula, there is a variable s that depends on the match outcome: it is 0 if you lose, 1 if you win and 0.5 for a draw. While this works in a chess environment, we run into a problem in GW2. A draw is very rare in Guild Wars, so effectively only 0 and 1 are available for s. As a result, games that end 15-500, 350-500 and 499-500 are all treated as equal losses, which I think they are not (assuming no other differences except the game outcome). This problem could be reduced by letting s scale with the score. The following picture shows some example graphs that scale s depending on the score difference.

  • Yellow: A simple linear graph. Get more points, get a better rating.

  • Pink: Maybe we want to weight points more when the score difference is small, to encourage fighting harder in close matches. You gain more rating the closer you keep your score to the opposing team's, while matches with huge score differences wouldn't change much. A devastating 15-500 game and a 70-500 game would be treated roughly the same.

  • Blue: The opposite of pink. Points are worth more in the extremes. This could encourage players to keep fighting in games where the outcome is already clear instead of going afk.

  • Green: A combination of pink and blue. Points are worth more in the extremes and in close matches.

  • Red: Still a combination of pink and blue, but instead of weighting points more the more extreme the win is, winning by an even larger margin is worth less; you are still winning by a landslide either way. At the same time it encourages the losing team to keep fighting in a one-sided match, because their points are worth more.

[Image: example curves (yellow, pink, blue, green, red) scaling s with the score difference]

These quick doodles are just a few examples I could come up with. Which one is chosen depends on what you want to achieve, but I think it would be good to implement a system that takes the score into account.
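As a rough sketch of what "taking the score into account" could look like: the snippet below maps a final conquest score to a Glicko outcome value s between 0 and 1, with one linear curve (yellow) and one curve that weights close games more heavily (roughly the pink idea). The 500-point cap and the exact curve shapes are my own assumptions for illustration, not anything ArenaNet uses.

```python
# Illustrative sketch only: the curve shapes and the 500-point score cap are
# assumptions, not ArenaNet's implementation. Instead of a hard win/loss, the
# final conquest score is mapped to a Glicko outcome value s in [0, 1].

def linear_outcome(own_score: int, enemy_score: int, cap: int = 500) -> float:
    """'Yellow' curve: s scales linearly with the score difference."""
    diff = max(-cap, min(cap, own_score - enemy_score))
    return 0.5 + diff / (2 * cap)        # 0 at -cap, 0.5 at a draw, 1 at +cap

def close_game_outcome(own_score: int, enemy_score: int,
                       cap: int = 500, k: float = 3.0) -> float:
    """'Pink'-style curve: close games move s a lot, blowouts barely differ."""
    diff = max(-cap, min(cap, own_score - enemy_score)) / cap      # -1 .. 1
    eased = abs(diff) ** (1.0 / k)                                 # steep near 0, flat near 1
    return 0.5 + (eased if diff >= 0 else -eased) / 2

if __name__ == "__main__":
    for own, enemy in [(15, 500), (350, 500), (499, 500), (500, 350)]:
        print(own, enemy,
              round(linear_outcome(own, enemy), 3),
              round(close_game_outcome(own, enemy), 3))
```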

Glicko's rating-period in GW2

Glicko is not supposed to update the rating after every game; instead it updates after several games within a certain time frame. Glickman suggests at least 10-15 games per rating period for Glicko to work best, and more games per rating period would be better. I remember it being stated somewhere in the old forums that the rating period used in GW2 was 2 days. If we assume 1 hour of playtime per day and 4-5 games per hour, that is near the lower end of the recommended number of games.

With the introduction of seasons this changed, and a player's rating is now updated after every match, which could lead to inflated rating deviation and volatility. Furthermore, we probably still have a separate rating period used to scale the rating deviation when a player does not play; without one, the rating deviation would quickly reach a very high value, diminishing its benefit and accuracy. What could be done is to reintroduce a longer rating period somewhere in the range of 1-3 days and instead show an expected change in rating until the rating period ends.
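As an aside on how the deviation is supposed to grow while a player is inactive: the glicko.net document specifies that a player with no games in a rating period gets φ' = sqrt(φ² + σ²). A minimal sketch, assuming illustrative starting values (RD 50, volatility 0.06) rather than actual GW2 numbers:

```python
import math

# Sketch of how the deviation grows for an inactive player, following the
# glicko.net document: with no games in a rating period, phi' = sqrt(phi^2 + sigma^2).
# The starting values (RD 50, volatility 0.06) are illustrative assumptions.

GLICKO2_SCALE = 173.7178   # conversion between the display scale and the Glicko-2 scale

def inflate_rd(rd: float, volatility: float, idle_periods: int) -> float:
    """Return the rating deviation after sitting out `idle_periods` rating periods."""
    phi = rd / GLICKO2_SCALE                       # convert RD to the internal scale
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE                     # convert back to the display scale

if __name__ == "__main__":
    for periods in (0, 1, 4, 12):
        print(periods, round(inflate_rd(50.0, 0.06, periods), 1))
```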

Glicko in Tournaments

The original Glicko system only works in a 1vs1 situation, or where teams can be handled as one entity, as is the case in tournaments.

Currently the initial match pairings seem to be random and the following ones are determined by the match outcomes. Glicko-2 could be implemented here. Similar to guild teams, there could be teams you can form and register for tournaments. These teams would have a Glicko rating, which could be used in the initial matchmaking to pair similar teams. In the following rounds this could be combined with a Swiss-style tournament, where the pairings are based on Glicko instead of previous wins/losses, so the match-ups could be more balanced. Of course wins and losses would still be factored in when Glicko is adjusted after the tournament, and the duration of the tournament would be one rating period.

Another advantage of Glicko in tournaments would be that teams that never played in the same tournament could still be compared via their Glicko rating, which would allow a team leaderboard.

There may be a problem of good players constantly forming new, unrated teams. The initial team rating would need to be based on the personal MMR of the members. Furthermore, to prevent abuse by forming teams of high-rated and low-rated players, the highest-rated player would have to be weighted more.
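A minimal sketch of seeding a new team's rating from its members while weighting the highest-rated player more, as suggested above. The exponential weighting is purely an assumption for illustration, not a known formula:

```python
# Sketch of seeding a new team's rating from its members, weighting higher-rated
# members more so that a top player cannot hide behind low-rated teammates.
# The exponential weighting scheme is an assumption for illustration only.

def seed_team_rating(member_ratings: list[float], bias: float = 2.0) -> float:
    """Weighted mean where the weight grows with the rank of the member's rating."""
    ordered = sorted(member_ratings)                     # lowest rating first
    weights = [bias ** i for i in range(len(ordered))]   # highest rating gets the largest weight
    total = sum(w * r for w, r in zip(weights, ordered))
    return total / sum(weights)

if __name__ == "__main__":
    print(round(seed_team_rating([1200, 1300, 1450, 1500, 1900]), 1))  # pulled toward 1900
    print(round(seed_team_rating([1500, 1500, 1500, 1500, 1500]), 1))  # stays at 1500
```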

Glicko in Unranked/Ranked

As mentioned above, Glicko cannot handle non-1vs1 games, but that is exactly what we have in unranked and ranked, so Glicko needs to be adjusted for it. A method based on personal performance is in my opinion not feasible; the number of possible variables that could measure it, and how to weight them against each other, would be a nightmare. This document briefly explores some of the possibilities for adjusting Glicko with a reasonable amount of work, and I will try to explain them briefly. I don't know which one ArenaNet chose, or whether they chose any of them, but they can't be using base Glicko either. The following pictures are from the document above.

  • The first option is to handle the game as several 1vs1 matches that occur within the same rating period.

[Image: formula for option 1, from the linked document]

  • The second option is to compare the personal rating to the average enemy team rating.

[Image: formula for option 2, from the linked document]

  • The third option is to compare your own team's average rating to the enemy team's average rating.

[Image: formula for option 3, from the linked document]

The adjustment does not necessarily have to use the average; that is just what the document uses. E.g. in the third option your own rating could be adjusted in different ways depending on your team members' ratings. The average is simply an easy choice.
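A minimal sketch of the third option, assuming the enemy team is collapsed into a single Glicko-2 opponent whose rating and deviation are the team averages; the volatility update is left out for brevity, and none of this is ArenaNet's actual implementation:

```python
import math

# Minimal sketch of the third option above: collapse the enemy team into a single
# Glicko-2 opponent whose rating and deviation are the team averages, then run a
# normal single-opponent update for one player. The volatility update (sigma') is
# skipped for brevity; the default rating 1500 and the scale constant follow the
# glicko.net document, not GW2's actual values.

SCALE = 173.7178   # conversion between the display scale and the internal Glicko-2 scale

def g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected(mu: float, mu_opp: float, phi_opp: float) -> float:
    return 1.0 / (1.0 + math.exp(-g(phi_opp) * (mu - mu_opp)))

def update_vs_team(rating, rd, enemy_ratings, enemy_rds, s):
    """s: 1 for a win, 0 for a loss (or a scaled score as discussed earlier)."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    mu_opp = (sum(enemy_ratings) / len(enemy_ratings) - 1500.0) / SCALE
    phi_opp = (sum(enemy_rds) / len(enemy_rds)) / SCALE
    e = expected(mu, mu_opp, phi_opp)
    v = 1.0 / (g(phi_opp) ** 2 * e * (1.0 - e))
    phi_new = 1.0 / math.sqrt(1.0 / phi ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * g(phi_opp) * (s - e)
    return mu_new * SCALE + 1500.0, phi_new * SCALE

if __name__ == "__main__":
    print(update_vs_team(1500, 120, [1550, 1480, 1600, 1520, 1450], [80] * 5, 1))
```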

So far this would only work if everyone played solo. Because that is not the case, the system needs further adjustments: there has to be a way to factor in teams of various sizes. I will give an example of how this could be done.

Teams cannot simply be handled as the average of their MMR. On the one hand there are teams that play exceptionally well, better than the individuals could alone, thanks to communication and adjusting to each other; on the other hand there are teams that have just formed or only want to have fun together. I suggest handling them similarly to tournament teams (see above): if people queue together, a hidden team with its own rating is created. This rating would be used for matchmaking instead of the players' individual ratings, and it would be adjusted based on match outcomes in addition to the players' ratings. The team's initial rating would be based on the MMR of its members and/or previous teams.
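A small sketch of the hidden-team bookkeeping described above, with made-up names and a plain-average seed standing in for whatever seeding rule would actually be used:

```python
from dataclasses import dataclass

# Sketch of the "hidden team" idea: a party that queues together gets its own
# rating entry, seeded from its members and updated alongside their personal
# ratings. Names and the plain-average seed are placeholders; a weighted seed
# like the tournament sketch above could be used instead.

@dataclass
class HiddenTeam:
    members: tuple[str, ...]
    rating: float
    deviation: float = 350.0      # new teams start as uncertain as new players
    games_played: int = 0

teams: dict[tuple[str, ...], HiddenTeam] = {}

def get_or_create_team(member_ratings: dict[str, float]) -> HiddenTeam:
    key = tuple(sorted(member_ratings))
    if key not in teams:
        seed = sum(member_ratings.values()) / len(member_ratings)
        teams[key] = HiddenTeam(members=key, rating=seed)
    return teams[key]

if __name__ == "__main__":
    party = {"Alice.1234": 1600.0, "Bob.5678": 1350.0}
    print(get_or_create_team(party))
```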


I must confess that I don't understand like 1/5 of this topic, but I'll thank you for addressing it. People will argue about balance, condis, "spamfest", and a lot of other things, but the fact is: the real problem is the matchmaking. If ANet solves this problem we'll have A LOT LESS whining and toxicity spread across the environment.


Good read. I remember discussing the issues with the GW2 system with Evan Lesh (probably around season 3?); most were brushed off until the issues showed up in a big way later in the season...

Few games use Glicko-2 for team-based play (e.g. CS:GO); the difference is that those games have HEAVILY modified versions to handle it and calculate individual skill. GW2 does not. Ideally we would want a system that can do this... but that requires work... and "PvP attention".


Compared to the matchmaker, Glicko-2 works fine. GW2 isn't the only team-based competitive format it is used in.

The problem is how teams are formed. Glicko needs people on the same team to have a similar rating. With the current matchmaker they sacrificed rating and match quality for queue time. Last season, during primetime, I got a duo of chaith and phantaram on my team twice in a row. At the time I was nearly 400 rating below them, and my queue times were less than two minutes both matches. If you look at the pseudocode for the matchmaker on the wiki, it considers rosters with a 1200 rating delta in the same group: Rating start="5m" end="10m" max="1200" min="25"

I disagree about giving points out for close losses. The closer the match is, the more likely it is that each player could have done something to turn the match.


@"Vieux P.1238" said:No thank you. Pvp is cook & done. There's really no coming back. Anything else work at would just push me away from it more & more.

Ok, so if there's no "coming back" and it's "done", why don't you just quit? Look at the amount of info the OP just dropped trying to improve things, and you came with all that negativity just to bash and go. Please, don't be that kind of person.


@Crystal Black.8190 said:
With the introduction of seasons this changed, and a player's rating is now updated after every match, which could lead to inflated rating deviation and volatility. Furthermore, we probably still have a separate rating period used to scale the rating deviation when a player does not play.

Actually the opposite is true. The more matches you play, the more your deviation settles; when I ran the numbers it bottomed out around 60. So the most accurate it can get your rating is +/-120 at 95% confidence. The more your win/loss percentage deviates from 0.500, the higher your deviation becomes.

The first season or two they had a period where your deviation would increase if you went inactive, but it always updated your rating after each match. Since leaderboard position was based on rating only, the smart players just played a match or two, then let their deviation max out and played another match or two. That way they could get maybe 50 points for beating a team several hundred MMR lower than them, while their tryhard peers were getting 4-5 points for a win. The top spots on the leaderboard that season were around 2200 in NA.

After that they removed the ability for your deviation to grow from inactivity. The only way it can go up now is by getting very high win rates, so it is more Elo than Glicko. An option would have been to use the lower bound of the 95% confidence interval.
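For illustration, ranking by that lower bound is a one-liner; the players and numbers below are made up:

```python
# Sketch of the lower-bound idea: rank by rating - 2 * RD so a bloated deviation
# hurts a player's leaderboard position instead of helping it. Example data is made up.

def leaderboard_score(rating: float, deviation: float) -> float:
    return rating - 2.0 * deviation

players = [("tryhard", 2150.0, 60.0), ("decay-abuser", 2200.0, 250.0)]
for name, rating, rd in sorted(players, key=lambda p: leaderboard_score(p[1], p[2]), reverse=True):
    print(name, round(leaderboard_score(rating, rd)))
```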

There isn't really anything wrong with updating ratings every match; it is just computationally harder. Assume you have a tournament with 100 players who each play 10 matches. If you wait until the tourney is over, you only have to do the math 100 times; if you do it after each match, you have to run the calculation 1000 times.


@Faux Play.6104 said:
Actually the opposite is true. The more matches you play, the more your deviation settles; when I ran the numbers it bottomed out around 60. [...]

You are right, the more matches are played, the smaller the deviation probably becomes; it is mentioned somewhere above your quote. Maybe I chose my wording badly. The fewer matches there are within a rating period, the more likely big changes in volatility and deviation become. If there is a low number of matches, lowering τ could be considered; it should help to prevent too much fluctuation.

I guess the problem of abusing deviation increase over time is already addressed, now that there is a minimum number of matches required to be listed on the leaderboard, which increases over the season.

@Faux Play.6104 said:

Compared to the matchmaker, Glicko-2 works fine. [...] The problem is how teams are formed. Glicko needs people on the same team to have a similar rating. [...]

I disagree about giving points out for close losses. The closer the match is, the more likely it is that each player could have done something to turn the match.

I think Glicko is an essential part of the matchmaking. If there were a better method to measure the rating of 2-5 player teams in particular, the matchmaking would also have more accurate data to work with. I agree that an improvement to the matchmaking would still be needed. Now that you don't have to stay in the mists during queue, I personally don't see why we can't increase the queue time in favor of match quality. Some reasonable amount of time would have to be targeted, but I didn't mind my personal queue times of 5-10 minutes in prior versions.

I don't suggest directly giving out points for close losses, but rather treating a close loss more like a draw between the teams.


The first step to any sane matchmaking is putting people only with people of the exact same skill level and against people of the exact same skill level. It has been proven many times that they do not do this after a fairly short wait of a couple of minutes. I don't mind losing matches. I do mind that about 1/3 of my losses are matches where I lose by 400 and can't do anything to stop it.


