
Glicko-2 algorithm put into code (Updated). Conclusion about win-streaks.


Tiah.3091


@Tiah.3091 said:

@"Exedore.6320" said:Going further, you could do additional tweaks:
  • Toss in a small, random fudge factor on top of that probability to account for people DC'ing or manipulating.
  • Simulate a pool of players at once rather than one at a time. Your current method knows your test player's true rating, but it assumes all other teammates and opponents are correctly rated - this is never the case.

Those, on the other hand, are interesting suggestions. I'll give it a try, thanks!

As for the first point - I believe it's reasonable to assume that "different skill tiers" (whatever you call it) already include stuff like disconnections; these things are just more likely to happen to "worse" players. The second one, in my opinion, is what should be done, and what I suggested with a fixed pool of 5,000 players. ^^


Okay, guys. A HUGE UPDATE here (check the OP).

  1. Ratings now resemble the REAL GW2 sPvP leaderboard ratings as closely as possible, following a Gaussian distribution.
  2. Updated the matchmaking algorithm with that good ole' Gauss.
  3. Rating Deviation now converges as the season progresses.
  4. A player's game history interval now grows as they play more games, up to 100 games (it was a constant 10).

As a result, the volatility VASTLY decreased and win streaks/lose streaks are almost gone. They are still here, of course, but not as huge as they were before. I would advise you to re-read the first post from the "MODEL-1" paragraph, and run the code if you want a more thorough look at the thought process. Cheers!


Well, NOW we can ask for a sticky, I guess (drops mic) B)


I like that you put in the effort. But the people who complain about the rating system are still going to ignore any mathematical proof.

A few interesting things to try:

  1. Use a slightly skewed Gaussian distribution for player rating (not centered at the mean rating). This draws from an experiment I saw done in Overwatch. A player would play one account on weekdays and another account on weekends. The weekend account ended up over 500 rating lower (OW goes 0 to 5000) than the weekday account.
  2. Try to better reflect the matchmaker's behavior for fringe and low population. After ~5min, the matchmaker expands the rating margin for matching a player. This is particularly evident outside of prime play time.

On your question about duo queue, it uses the average of the players in a party as the roster's rating.
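On the first suggestion, here's a tiny sketch of what a skewed rating population could look like, using scipy's skew-normal. The skew, location, and scale values are arbitrary placeholders chosen just to show the shape; they are not taken from Overwatch or GW2 data:

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)

# a symmetric population vs. one skewed toward lower ratings
symmetric = rng.normal(loc=1200, scale=200, size=100_000)
skewed = skewnorm.rvs(a=-4, loc=1400, scale=260, size=100_000, random_state=0)

for name, pop in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, round(float(np.mean(pop))), round(float(np.median(pop))))
```

Sampling true skill from the skewed curve instead of a plain Gaussian shifts where the bulk of the population sits relative to the mean, which is the effect the weekday/weekend anecdote points at.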


@"Exedore.6320" said:

  1. Use a slightly skewed Gaussian distribution for player rating (not centered at the mean rating). This draws from an experiment I saw done in Overwatch. A player would play one account on weekdays and another account on weekends. The weekend account ended up over 500 rating lower (OW goes 0 to 5000) than the weekday account.

Could you, please, provide a link to that forum? Because I don't get the idea of this experiment. If it's the same guy playing both accounts, his TRUE SKILL level is still the same. It's not like he's playing worse on one account than on the other, right? I use the normal distribution explicitly for the TRUE SKILL level - the rating which you IDEALLY should have. I don't get why it should be skewed. I mean, most things in nature are distributed according to Gauss with very good precision, starting from something as fundamental as the Cosmic Microwave Background and ending with something as simple as boob size.

To me it seems like he got a 500-point lower rating simply because he didn't play enough games on the second account.

  1. Try to better reflect the matchmaker's behavior for fringe and low population. After ~5min, the matchmaker expands the rating margin for matching a player.

I already did that, and I tried my best to explain how I did it in the updated OP post. Yes, I expand the rating margin, exactly as you suggest.

"On your question about duo queue, it uses the average of the players in a party as the roster's rating."

That wasn't MY question. I'm not interested in duo queue results, at least for now. Probably that's a topic for MODEL-3 ;)


@Tiah.3091 said:

Could you, please, provide a link to that forum? Because I don't get the idea of this experiment. If it's the same guy playing both accounts, his TRUE SKILL level is still the same. (...) To me it seems like he got a 500-point lower rating simply because he didn't play enough games on the second account.

Too lazy to look for the post, but he did play enough to stabilize his rating. The idea is that different groups of people play on weekdays vs. weekends. In particular, weekends may have more casual (slightly less skilled) players, possibly a higher number of middle school and high school kids, etc. Since rating is a representation of an individual's skill against the population, changing the overall skill level of the population will change an individual's rating.


A couple of points.

Every season I've played, I've had games with 10+ wins in a row, and stretches where wins are hard to come by.

The matchmaker doesn't try to match you with teammates that are close to your rating. Instead it matches your team's average rating with the other team's average rating. Teammates can have several hundred points between the best and the worst player.

Glicko will give accurate results for mismatched opponents. Because of this...

You are better off matching players that are of similar skill on the same team. The current method boosts the bad players and punishes the good. This tends to drive people towards the same rating vs. separating them.

When I have run the numbers with the constants ANet uses, the deviation won't go below 60. That means the system is 95% confident your true rating is within 2 deviations of your current rating, +/- 120.


@"Faux Play.6104" said:The match maker doesn't try to match you with teammates that are close to your rating. Instead it matches your team's average rating withe the other team's average rating. Teammates can have several hundred points between the best and the worst player.

Yeah, that's a good point, man.How it affects the actual rating approximation for every player - negatively or otherwise - that's a subject to research. I was thinking about that myself last night before sleep. Because I remembered the post in this thread, where dude cited A-net dev post, where he confirmed, that they use exactly that: they balance team average rating vs other team average rating.

When I have run the numbers with the constants anet uses, the deviation won't go below 60. That means the system is 95% confident you are 2 deviations from your current rating, -/+ 120.

Did you run the code from the original post?Because I just remembered, that I forgot to upload the new version, after the update.Also, the code uses slightly different constants from those, that A-net uses. I'll fix it and upload the new version. (However, that point about team vs team - I'm not yet sure what to do with it)


@Tiah.3091 said:

Did you run the code from the original post? Because I just remembered that I forgot to upload the new version after the update. Also, the code uses slightly different constants from those that A-net uses. I'll fix it and upload the new version. (...)

I made my own based on the Guild Wars wiki and the Glicko paper. Most of the posts I made on the subject were in the old forum. I'll have to dig them up when I get home.

For teams, I'd just do a sum of squares for the deviations and assume the player is at the midpoint.
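A minimal sketch of that suggestion, assuming the roster is summarized by its mean rating and the member deviations are combined as a root sum of squares (my reading of "sum of squares for the deviations"; the helper names are made up, not anyone's actual code):

```python
import math

def team_rating(ratings):
    """Roster rating as the plain average of the member ratings."""
    return sum(ratings) / len(ratings)

def team_deviation(rds):
    """Combine member RDs as a root sum of squares; dividing by the
    roster size gives the deviation of the average rating instead."""
    rss = math.sqrt(sum(rd ** 2 for rd in rds))
    return rss, rss / len(rds)

print(team_rating([1700, 1400, 1400, 1400, 1400]))   # 1460.0
print(team_deviation([60, 60, 60, 60, 60]))          # (~134.2, ~26.8)
```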

I thought the wiki said 30 was the low cap, but I could not reach it when I ran multiple matches. Starting at 0, it would slowly grow until it reached 60. Same if you started at 700: it would shrink to 60. Regardless, I think your deviation numbers are too low.


ArenaNet Staff (Ben Phongluangtham.1065):

@Faux Play.6104 said: The matchmaker doesn't try to match you with teammates that are close to your rating. Instead it matches your team's average rating with the other team's average rating. Teammates can have several hundred points between the best and the worst player.

This is not accurate. When a match is being built around a player, the matchmaker first looks for 9 other people within 25 points of rating. If it doesn't find enough people after 5 minutes (in ranked), it starts expanding the range over time until it finds enough players. Note: this doesn't mean that if you've been in queue for less than 5 minutes, everyone in the match is going to be within 25 points. If you're in a match with people whose rating differs from yours by more than 25 points, it just means whichever player the matchmaker built the match around was probably in queue for 5 minutes or more.

After those 10 people are found, it arranges teams to ensure that each side is close in average skill rating and in standard deviation from that skill rating.

Additional note: we've experimented with making it so that everyone in a match had to have been waiting over 5 minutes before their ranges expanded. It didn't generally result in better matches, and people at the higher end of skill rating sometimes ended up waiting in excess of 40 minutes for matches.
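A rough sketch of that search-then-arrange flow as described above. The expansion rate, queue representation, and balance scoring are placeholders of mine, not ArenaNet's actual values or code:

```python
import itertools
import statistics

def find_candidates(anchor_rating, queue_ratings, minutes_waited,
                    base_range=25, widen_per_minute=50):
    """Look for 9 other players near the anchor player's rating.
    The window starts at +/-25 and only widens once the anchor has
    waited 5 minutes (widen_per_minute is a made-up placeholder)."""
    margin = base_range
    if minutes_waited > 5:
        margin += (minutes_waited - 5) * widen_per_minute
    near = [r for r in queue_ratings if abs(r - anchor_rating) <= margin]
    return near[:9] if len(near) >= 9 else None

def arrange_teams(ratings):
    """Split 10 players so the sides are close in both average rating
    and standard deviation of rating."""
    best, best_score = None, float("inf")
    indices = range(len(ratings))
    for red_idx in itertools.combinations(indices, 5):
        red = [ratings[i] for i in red_idx]
        blue = [ratings[i] for i in indices if i not in red_idx]
        score = (abs(statistics.mean(red) - statistics.mean(blue))
                 + abs(statistics.pstdev(red) - statistics.pstdev(blue)))
        if score < best_score:
            best, best_score = (red, blue), score
    return best

print(arrange_teams([1700, 1500, 1500, 1500, 1400, 1400, 1400, 1400, 1400, 1400]))
```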


@Tiah.3091 said:

Yeah, that's a good point, man. How it affects the actual rating approximation for every player - negatively or otherwise - is a subject for research. (...)

Before you read! Know that this response is, admittedly, mostly conjecture and conspiracy theory.

I actually did this once with my best simulation. Say you take 100 players ranging from 1800 down to 800. I found that, realistically, over the course of a season those rating margins will implode on themselves, not expand. Meaning, the season starts with ratings ranging from 1800 to 800, but by the end of the season, with how rating is affected by wins & losses, it will end up looking more like 1400 to 1200. At least this is what seems to happen on paper when the population is very low and the matchmaker is putting together matches where our teammates and opponents can be several hundred rating higher or lower than each other. It creates a situation where high-rated players are punished more than intended for losses and low-rated players receive too much rating reward for wins where they are being carried. When I saw this result, it made me question why our leaderboards weren't doing this in the higher ranked margins. What was keeping the 1800+ margins of the board expanding rather than imploding? The lower rated margins also seem to be stunted from an inevitable implosion towards a median. Is it win trading creating unrealistically high margins? Is it... something else in the algorithm that isn't mentioned in the notes?
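For what it's worth, here is one way to probe that compression claim in a sandbox. It uses a generic Elo-style expected score against team averages with a made-up K factor and match count, not ANet's Glicko-2 constants, so it only illustrates the mechanism being described, not the real leaderboard:

```python
import random
import statistics

def expected(team_avg, opp_avg):
    """Elo-style expected score of one averaged roster against another."""
    return 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / 400.0))

def season(matches=5000, k=16):
    true_skill = [800 + i * 10 for i in range(101)]   # 101 players, 800..1800
    rating = list(true_skill)                          # season starts "correct"
    for _ in range(matches):
        picks = random.sample(range(len(rating)), 10)  # wide-spread lobbies
        red, blue = picks[:5], picks[5:]
        # outcome driven by TRUE skill, rating changes driven by listed ratings
        p_red = expected(statistics.mean(true_skill[i] for i in red),
                         statistics.mean(true_skill[i] for i in blue))
        red_wins = random.random() < p_red
        e_red = expected(statistics.mean(rating[i] for i in red),
                         statistics.mean(rating[i] for i in blue))
        for i in red:
            rating[i] += k * ((1.0 if red_wins else 0.0) - e_red)
        for i in blue:
            rating[i] += k * ((0.0 if red_wins else 1.0) - (1.0 - e_red))
    return rating

end = season()
print(min(end), max(end))   # compare the final spread with the 800-1800 start
```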

That is when I REALLY sat back and started thinking about those win/lose streaks that everyone seems to talk about, that happen so frequently. Season after season I began paying close attention to many different players' "rising & falling" rhythms in the leaderboards. I began noticing something odd indeed. The same players would always go on win or lose streaks at the same time. What I mean is: Players (A)(B)(C)(D) always seem to hit some win rhythm at the same time, whilst Players (E)(F)(G)(H) are all hitting a lose rhythm at the same time that A B C D are on a win rhythm. And it would usually go on for 2 or 3 days, until the rhythms apparently swap? Then I'll see A B C D all take on a sudden lose rhythm whilst E F G H all go on a win rhythm? I mean, these are consistent patterns I have been watching for many, many seasons now. I began to wonder if the algorithm had a secret function that was enforcing win & lose streak rhythms. From what I had seen, it would seem to be done in some kind of Control Group (A) and Control Group (B) kind of thing. When A is on a good rhythm for matchmaking, B is on a bad rhythm. When B is on a good rhythm, A is on a bad rhythm. It certainly would explain the "according to algorithm notes" highly improbable but somehow super frequent win & lose streaks. It would also explain why the rating margins somehow magically expand high and low instead of imploding - a system that makes us take turns being ping-ponged around the rating margins. Why would they program something like that in? I dunno, to make sure Glicko margins work for a 5v5 game mode that makes you queue with random people, to avoid implosion? Why would they not just tell us about it? Well, it certainly wouldn't be a strong selling point for the game mode when a player read the Glicko notes and realized the algorithm was sniping them during ranked queues.

I mean, does no one else find it odd that rating never settles as if your skill never settles? I have nearly 13,000 matches played and I'm sure I've peaked in my skill at Guild Wars 2 at this point! So why is my rating so ridiculously volatile every season all season? I would expect to bounce around between 1600ish and 1500ish, but to bounce around between 1650 and like 1350 four or five times a season, due to win & lose streaks that come and go like scheduled clockwork? ^^ It makes one wonder, but again, this is all just conjecture and a lot of conspiracy theory.

There is one other thing I wanted to note about running Glicko simulations on paper. I've been pointing this out for years now and I'm going to say it again. The math would all seem to be perfectly accurate on paper, yes. But there are factors going on here that cannot be equated with numbers. Amongst these factors are things like "Are your teammates on their mains or are they alting for PvP wing achievements?", "Are some of your opponents smurfing on low-rated f2p accounts but playing at a plat 2 level?", "Do you land a bad team comp while the enemy has a meta comp?", "Is anyone using 3rd party programs and/or win trading?", "Does someone randomly AFK to answer their door and pay for a pizza?", etc., etc. But by far the BIGGEST factor that stunts the accuracy of simulations is that they in no way consider how Conquest is actually played. This will be easier to explain in a list:

  • A 1700 guy queues and the matchmaker makes him a match. He gets put into a team looking like, RED: 1700 1400 1400 1400 1400
  • He gets put against BLUE: 1500 1500 1500 1400 1400
  • This looks perfectly balanced on paper, "perfect", but is this actually balanced for a Guild Wars 2 Conquest match? In short, no.
  • What usually happens in these situations is that the 1700 will take and defend any node that he is at the entire game, but his 1400 teammates on the other two nodes that he won't be at are going to get crunched the entire game by the 1500s. The BLUE team will likely hold 2 nodes much more often throughout the game and win the match as such.
  • In other words, a team with one high-rated player and several low-rated players is at a disadvantage vs. a team of players who have a tighter average within the Glicko algorithm, despite the math looking like "perfect matchmaking". This is because of how Conquest is actually played. The smaller our population gets, the more frequently this particular problem begins to occur.

But yeah, something to think about.


I've always wondered how well the rank distribution takes into account certain factors that come up in GW2-style matches based on playstyle and class/build choice.

  • Carry factor: e.g. some players will dominate the match if left unchecked. Sometimes they are shut down completely. Other times they run wild and win the game for their team. E.g. a glass-cannon DPS that can single-handedly wipe the other team.
  • Acceptable-teammate factor: other players are decent and will perform similarly in a game at any rating. E.g. a support player that tries to be in the middle of a teamfight healing + buffing, but relies on other players to actually clear the point and get kills.

Would we expect both types of players to have the same win/loss streaks in their results?

I'm completely speculating here, but I would expect Carry to be less streaky as the season goes on. They should settle to right around the point where players can deal with them. If they lose too much, they start to dominate games until they are back at the point where players can counter them again. If they win too much, they get countered every game.

Acceptable-Teammate though.. I think they could rise or fall to potentially any rating, as they are more dependent on the team they get. Obviously, there's a fair amount of shuffling of teams, but I think Acceptable-Teammate may be more prone to streaks of luck with the matchmaker.


@"Trevor Boyer.6524" said:but again, this is all just conjecture and a lot of conspiracy theory.Pretty much what you said, yeah, lol xD(Also, quite a tough hypothesis to falsify, because it's hard to account for such factor.)

  • A 1700 guy queues and the matchmaker makes him a match. He gets put into a team looking like, RED: 1700 1400 1400 1400 1400
  • He gets put against BLUE: 1500 1500 1500 1400 1400
  • This looks perfectly balanced on paper, "perfect", but is this actually balanced for a Guild Wars 2 Conquest match? In short, no.

Yeah, that was my primary concern when thinking about how the matchmaker should behave in such situations. Especially how such a skill distribution between teams would affect the win rate. I.e., will the 1700 dude carry the game, or will those three 1500 guys? I don't know how to approach it yet :/

@"Faux Play.6104" said:I thought the wiki said 30 was the low cap, but I could not reach it when I ran multiple matches. Starting at 0 it would slowly grow until it reached 60. Same if you started at 700. It would shrink to 60. Regardless, I think your deviation numbers are too low.Well, my deviation numbers are low indeed, because I took active steps to reduce it.I introduced the decaying RD for all playerbase, assuming that towards the end of the season people settle to their true rating.Now I updated the code - shifted the mean of gaussian to 1200, reduced standard deviation to 200 (so 3 sigma level is still 1800). And also I removed the decaying RD and introduced hard cap of 30 - all according to the wiki page (I didn't read it before, lol)

The second step that I took - and it by far plays a much more important role: Glicko-2 takes 6 parameters for its calculation of a player's new rating, RD and volatility. Those are: current rating, current RD and volatility - all 3 are scalars - plus 3 arrays, which provide info about the opponents from the N previous games: opponents' ratings, RDs and match results (either 0 or 1). Or, if you want, a 2-dimensional array (N, 3). As we all know, when we are Unranked, the game asks us to play 10 matches for "seeding". Now, I initially took those 10 games as N and kept it constant throughout the whole ordeal. The results were looking something like this:

[chart: simulated season with a fixed 10-game history window]

As you can see: huge volatility, RD never drops below 40, and, obviously, HUUUUUUUUUUUUGE WIN STREAKS AND LOSE STREAKS.

Then I assumed, like, man, the devs can't be that shallow. They definitely have this N increasing as the season progresses. I.e., the "game history array" should be growing with time - it definitely should have more than 10 games recorded. That was my assumption. So, I introduced the "growing array": after every new match that our player played, the algorithm "remembered" all his previous games, up until it reached 100 games. I had to stop at 100, because otherwise my laptop was basically saying "there's no way I'm doing this in the next millennium". After it reached 100 games, the first game (historically) was removed from the array, the 2nd game became the 1st, the 3rd became the 2nd and so on, freeing space for the latest game.
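In code, that grow-then-slide history can just be a bounded deque. Only the window management is shown here; the three arrays it produces are what get handed to whichever Glicko-2 update routine the simulation plugs in:

```python
from collections import deque

HISTORY_CAP = 100   # the cap described above

# each entry: (opponent_rating, opponent_rd, result) with result 0 or 1
history = deque(maxlen=HISTORY_CAP)   # oldest games fall off automatically

def record_match(opp_rating, opp_rd, result):
    history.append((opp_rating, opp_rd, result))

def history_arrays():
    """Split the remembered window into the three parallel arrays
    (opponent ratings, opponent RDs, results) that, together with the
    player's current rating, RD and volatility, feed the Glicko-2 step."""
    opp_ratings = [h[0] for h in history]
    opp_rds = [h[1] for h in history]
    results = [h[2] for h in history]
    return opp_ratings, opp_rds, results
```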

And that's what I got for doing that (the same picture is in the OP post):

[chart: simulated season with the history window growing to 100 games]

Now, if I hadn't introduced the RD cap of 30, it would drop to ~0 values quite soon. Volatility is almost non-existent, and the rating stabilises at ~1900 (which is the TRUE SKILL level of our test dude).

The wiki doesn't have that info, and you can see yourself how significantly it affects the results. Therefore, I'm asking @Ben Phongluangtham.1065: can you tell us what exactly this constant is? Is it 10, or does it gradually increase to a certain level (like 100 in my simulation)? The question is super-important.


@"Tiah.3091" said:(...)So, I introduced the "growing array" - after every new match, that our player played, the algorithm "remembered" all his previous games. (...)

As far as I know, this is how Glicko-2 works in general. The example paper on it shows it like this - an ever-decreasing RD out of all previous games (which is fewer games there, though, since no one plays 100 chess matches per year; maybe not representative. ^^)

Also, I think a limit on the minimum RD is kind of okay - but I think it is too high. I once asked for it to be reduced: less effect from skill changes throughout the season, balance patches and stuff, but way less volatility in late matches, reducing the punishment for playing many games per season and maybe decreasing the toxicity of later games. Maybe we can get a hint here too, and maybe it could be reduced?


@"Tiah.3091" said:As we all know, when we are Unranked, the game asks us to play 10 matches for "seeding".The 10 seeding games are to mask the fact that your rating is potentially shifting by hundreds of points in the first few games. The less players see, the less they freak out about it before thinking.


@"Trevor Boyer.6524" said:I mean, does no one else find it odd that rating never settles as if your skill never settles? I have nearly 13,000 matches played and I'm sure I've peaked in my skill at Guild Wars 2 at this point! So why is my rating so ridiculously volatile every season all season? I would expect to bounce around between 1600ish and 1500ish, but to bounce around between 1650 and like 1350 four or five times a season, due to win & lose streaks that come and go like scheduled clockwork? ^^ It makes one wonder, but again, this is all just conjecture and a lot of conspiracy theory.With the numbers used in GW2, I would expect a fluctuation of 150-200 points is normal (two standard deviations in each direction). Further variation can be explained by changes in the player from day to day. Maybe you're tired, playing a different build, get frustrated with a few losses and let it cloud your judgment, etc.

But by far the BIGGEST factor that stunts the accuracy of simulations is that they in no way consider how Conquest is actually played. (...) What usually happens in these situations is that the 1700 will take and defend any node that he is at the entire game, but his 1400 teammates on the other two nodes that he won't be at are going to get crunched the entire game by the 1500s. (...)

I would actually posit the opposite outcome. The 1700 player, if playing a solo/assassin build (holosmith, mesmer, and thief are all good at this), would single out opponents and easily defeat them. This causes the opposing team to stagger, which makes it all that much easier for him to control a choke point shared between multiple nodes. The remaining 1400 players can zerg around and overwhelm their opponents.

If you're looking for a theory with some weight behind it, try this: HoT and PoF have introduced many builds with a low skill threshold. If you can hit buttons quickly, you can do decently - mechanical skill has dramatically decreased as a discriminator. Further, fight/run decisions and map strategy (rotation) are considerably advanced skills. This causes a large number of players to sit at ratings just below those players who have the fight/run and map-strategy skill set. The rating system tries to smooth them into a normal distribution, but because there are a lot of people with little skill difference, there is significant volatility. If you're above that threshold and have a bad day, you're stuck with that fickle group, and luck in matchmaking can pull you down. This is especially true if you play a role which needs teamplay to succeed.


@Exedore.6320 said:

I would actually posit the opposite outcome. The 1700 player, if playing a solo/assassin build (holosmith, mesmer, and thief are all good at this), would single out opponents and easily defeat them. (...)

And here I was thinking it would be a nice thing to do if I accounted for 5v5 fights instead of 1v1. Do you see that? How the heck am I supposed to evaluate the win probability in chaos like this? :# (And win probability is THE MOST crucial part of the algorithm, because otherwise how would it converge to your "true skill level"?)


@Ben Phongluangtham.1065 said:

When a match is being built around a player, the matchmaker first looks for 9 other people within 25 points of rating. If it doesn't find enough people after 5 minutes (in ranked), it starts expanding the range over time until it finds enough players. (...)

Is there any limit on the expanded search range? Have you experimented with that?

Perhaps lowering the time before the search expands to about 2-3 minutes, BUT having a hard maximum search range of +/- 50 or 100, would help?


@Tiah.3091 said:

(...) The wiki doesn't have that info, and you can see yourself how significantly it affects the results. Therefore, I'm asking @Ben Phongluangtham.1065: can you tell us what exactly this constant is? Is it 10, or does it gradually increase to a certain level (like 100 in my simulation)?

Cooking the books to force it to 0 deviation isn't realistic statistics. You never know something with 100% certainty :-)

Something still looks off. I don't get wild swings of deviation like that, or of rating once it settles. Once you get about 20-30 matches in, the +/- is about 12 points per game. I'm not summing matches and calculating a new rating, as this isn't done like a chess tournament: after every match I calculate a new rating and deviation for the players. It is an assumption, but maintaining a queue of match history for 10s-100s of thousands of players seems like a waste of computing resources. The main reason they did that was so you didn't have to repeat the iterative part of the calculation if you do it by hand. With a computer that is trivial.
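For contrast with the growing-history approach above, here's a sketch of that per-match idea: a single Glicko-2-style step fed with only the latest result, so history lives entirely in the stored rating and RD. The volatility iteration from the paper is skipped and the volatility is held fixed, so this is a simplification for illustration, not the full algorithm or anyone's production code:

```python
import math

SCALE = 173.7178   # Glicko-2 scale factor from Glickman's paper

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))

def update_one_match(rating, rd, opp_rating, opp_rd, score, vol=0.06):
    """One Glicko-2 step with a rating period of exactly one game."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
    e = expected(mu, mu_j, phi_j)
    v = 1.0 / (g(phi_j) ** 2 * e * (1.0 - e))      # estimated variance
    phi_star = math.sqrt(phi ** 2 + vol ** 2)      # RD inflated by volatility
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * g(phi_j) * (score - e)
    return 1500.0 + SCALE * mu_new, SCALE * phi_new

# win against a lower-rated, better-established opponent
print(update_one_match(1500, 200, 1400, 80, 1))
```

The 1500 baseline is the paper's convention; GW2 seeds around 1200, but the arithmetic is the same.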


@Ben Phongluangtham.1065 said:

When a match is being built around a player, the matchmaker first looks for 9 other people within 25 points of rating. If it doesn't find enough people after 5 minutes (in ranked), it starts expanding the range over time until it finds enough players. (...)

Thanks for the info!

Based on that, I took a stab at definitions on the wiki for the following line of the matchmaking config:

<Rating start="5m" end="10m" max="1200" min="25"/>

https://wiki.guildwars2.com/wiki/PvP_Matchmaking_Algorithm

Filter/Rating/@Min - The maximum rating difference between rosters that the filter starts at.
Filter/Rating/@Max - The maximum rating difference between rosters that can exist after padding is applied.
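If that reading is right, the allowed roster rating difference would ramp from 25 up to 1200 between the 5- and 10-minute marks. A linear ramp is my assumption; the wiki snippet doesn't say what shape the expansion actually takes:

```python
def rating_window(minutes_waited, start=5.0, end=10.0, lo=25.0, hi=1200.0):
    """Allowed roster rating difference as a function of queue time,
    assuming a linear ramp from `lo` at `start` minutes to `hi` at `end`."""
    if minutes_waited <= start:
        return lo
    if minutes_waited >= end:
        return hi
    frac = (minutes_waited - start) / (end - start)
    return lo + frac * (hi - lo)

for t in (0, 5, 6, 8, 10, 12):
    print(t, rating_window(t))
```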


@Faux Play.6104 said:

After every match I calculate a new rating and deviation for the players. It is an assumption, but maintaining a queue of match history for 10s-100s of thousands of players seems like a waste of computing resources.

Wait, I didn't get it: in your code you don't feed the history of the player's previous matches to Glicko? But that is just simply wrong! Even in the pdf the author does the example run for 3 matches.

The logic is: the better the algorithm knows the history, the more precise it is.


@"Tiah.3091" said:How the heck am I supposed to evaluate the win probability in a chaos like this? :#(And win probability is THE MOST crucial part of the algorithm, because otherwise how should it converge to your "true skill level"?)

I'm fairly certain that the rating adjustment is player rating vs. averaged team rating. Several seasons ago I did a few ranked games with a friend. I'm in platinum, he's somewhere around silver/gold. My rating adjustments were tiny for a win compared to me playing in platinum; his were huge. For a loss, mine were huge and his were tiny. A player vs. 5 players setup could also produce this, but ANet would have to do something to account for the magnitude of adjustment which I find unlikely.
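A quick Elo-style illustration of why player-vs-averaged-team adjustments come out that lopsided for a duo. The 400-point curve and K factor are generic Elo conventions, not ANet's actual constants:

```python
def win_chance(rating, opp_team_avg):
    """Generic Elo-style expected score against an averaged opposing roster."""
    return 1.0 / (1.0 + 10 ** ((opp_team_avg - rating) / 400.0))

K = 32
opp_avg = 1300   # hypothetical averaged rating of the enemy roster
for label, rating in (("platinum player", 1600), ("silver/gold friend", 1000)):
    e = win_chance(rating, opp_avg)
    print(label, f"+{K * (1 - e):.1f} for a win, -{K * e:.1f} for a loss")
```

The higher-rated player is expected to win, so a win barely moves him and a loss costs a lot; the lower-rated friend gets the mirror image, which matches the anecdote.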


@Tiah.3091 said:

Wait, I didn't get it: in your code you don't feed the history of the player's previous matches to Glicko? But that is just simply wrong! Even in the pdf the author does the example run for 3 matches. (...)

The history is built into the rating and deviation numbers. The history you are referring to is to minimize the number of times you have to do the iterative calculation if you are doing it by hand.

Each match would still be evaluated once.

The term for increasing volatility due to inactivity is disabled, so the period isn't of much use.


@Deimos.4263 said: Do any of these models take into account a player's skill improving over time? Because of course it will. You learn stuff.

Of course. However, only the relative skill improvement. So if everybody else improves at the same rate, you remain at your current rating. :wink:

The lower limit of RD exists to account for skill changes throughout the season as well as balance patch changes and stuff like that. It does not fix your rating too quickly (and therefore too precisely), so things can still change.


@Deimos.4263 said: Do any of these models take into account a player's skill improving over time? Because of course it will. You learn stuff.

No, my code doesn't account for it. But this code and this entire thread are mostly dedicated to one problem: win streaks and lose streaks, which tend to happen over MUCH shorter intervals than it takes a player to learn his stuff. I mean, you have a lose streak of 5-10 games, then a win streak of 5-10 games. I really doubt a player can improve his skills any faster than over 100-200 games. Therefore, the effect is absolutely insignificant.


@"Faux Play.6104" said:The history is built into the rating and deviation numbers. The history you are referring to is to minimize the number of times you have to do the iterative calculation if you are doing it by hand.

Well, this is just plain wrong. Did you really read the paper?

[screenshot of the update formulas from the Glicko-2 paper]

"m opponents with ratings mu1, mu2, ..., mu_m" or "scores against EACH opponent". Can you see the capital Greek Sigma letter? With "j=1" below and "m" above? Do you know what this means?

I'm just asking, though - probably I have misunderstood you. But the RESULTS of the matches vs. the m previous opponents are DEFINITELY taken into account. The results of the matches - that is what I call "match history". Please tell me if I'm still unclear.
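For reference, and as far as I can reconstruct what that screenshot showed, the step being pointed at in Glickman's Glicko-2 paper sums over the m opponents of the rating period:

```latex
g(\phi) = \frac{1}{\sqrt{1 + 3\phi^2/\pi^2}}, \qquad
E(\mu,\mu_j,\phi_j) = \frac{1}{1 + e^{-g(\phi_j)(\mu - \mu_j)}}

v = \left[\sum_{j=1}^{m} g(\phi_j)^2\, E(\mu,\mu_j,\phi_j)\,\bigl(1 - E(\mu,\mu_j,\phi_j)\bigr)\right]^{-1}, \qquad
\Delta = v \sum_{j=1}^{m} g(\phi_j)\,\bigl(s_j - E(\mu,\mu_j,\phi_j)\bigr)
```

Here mu_j and phi_j are the j-th opponent's rating and RD on the Glicko-2 scale and s_j is the result against that opponent, so the per-opponent results do enter the update directly.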


@"Exedore.6320" said:I'm fairly certain that the rating adjustment is player rating vs. averaged team rating.

Oh... of that I'm fairly certain as well. Perhaps with some lowering coefficient, but yeah, I've been in that situation where I lose 15 and a friend loses 8.

What I was talking about is NOT the "win probability" from Glicko, which is required for the rating update:

[screenshot of the Glicko expected-score formula]

No. I meant the REAL win probability. Why is it not the same? Because Glicko takes your (and your opponents') current rating for the calculation, which is likely not exactly your real rating - especially if the season has just begun. I.e. the dude who was 1900 last season plays a game with 9 scrubs who were 800-1300 last season. However, on paper, EVERYONE'S rating might be 1200 (first game of the season for all 10 people). What Glicko will calculate in this situation is obvious - it'll just take all those 1200 ratings, do its magic, and BOOM - everyone's equal, the win rate is 50/50.

But is it true? No. So, what was the ACTUAL win probability for that game?
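To make that gap concrete with a generic Elo-style expected score (just an illustration; Glicko's expected score additionally weights by the opponents' RD, and the 1100 "scrub" number below is made up):

```python
def expected(team_avg, opp_avg):
    """Elo-style expected score between two averaged rosters."""
    return 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / 400.0))

# first game of the season: everyone is listed at 1200
print(expected(1200, 1200))            # 0.5 - what the rating system "sees"

# last season's reality: one 1900 player with four 1100 teammates vs. five 1100s
red_true = (1900 + 4 * 1100) / 5       # 1260
print(expected(red_true, 1100))        # ~0.72 - closer to the actual chance
```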

