
Glicko-2 algorithm put into code (Updated). Conclusion about win-streaks.


Tiah.3091


UPDATED 24 Oct, 11:19 PM (GMT+3)

INTRO

A few weeks ago there was a thread where some dude hypothesised that "the MatchMaking algorithm forces you to lose games in streaks after you had a win streak." Here it is. The thread was met with healthy criticism, and one dude, @"Megametzler.5729", even linked a pdf where the MM was described step by step. I read it thoroughly (at least I think I did), and after that I had a feeling that the situation the OP described is kinda-sorta possible-ish. Except the MM didn't force anyone, ofc. But more on that later.

So, as you can check yourself, the math that describes the algorithm is quite trivial. (Well, the math that describes it is indeed trivial; the math it took the author of the paper to prove it actually works is slightly more complicated. Sadly, that full process is not given in the paper.) Despite that, the formulas look quite clunky and not very fit for visual comprehension. That's why I thought it would be fun to put them into code. So, yesterday I was really bored and gave it a try: link to Python 2.7 Jupyter Notebook (updated). The code itself is a little bit trashy, but it should be easy enough to read.

The main goal of the code is to simulate the game history of some player in a 1v1 scenario (although GW2 sPvP happens in the form of 5v5, in the context of our hypothesis it doesn't really matter). In order to simulate something, you have to provide a model of some level of adequacy. In the case of this code, there are supposed to be 2 models (the 2nd I'll add later).

MODEL-1

Description:

  • There's ONE very high TRUE SKILL level player. Let's say, 1900 level.
  • Although he's initially Unranked and has to play 10 games for seeding against various opponents of some skill level. And, to his misfortune, he does it while being STONED AS FUCK, which in terms of math means his winrate against 800-1200 scrubs is precisely 50% (for those 10 games only).
  • Then he finishes with some result and feels like, "WTF, man, that won't do, I must tryhard." And he starts doing exactly that, playing at his full potential.
  • Important, the MatchMaker algorithm: the matchmaker assumes that all players in the game have their ratings distributed according to a Gauss distribution, with mean 1000 rating and standard deviation 266 rating (see image 1). (1200 was taken from here, as well as the other constants; the standard deviation I took assuming that 1800 is the 3-sigma level, where only 0.3% of the elitist dudes play. 30 ppl above 1800 makes the playerbase something like 10,000, which makes sense, I guess.) It works like this: the matchmaker rolls a normally distributed number (with mu and sigma of 1000 and 266, accordingly). If it does NOT BELONG in a range of +/-10 rating from our dude's current rating, we increase this range by +10 (making it +/-20, then +/-30, and so on) and re-iterate the process. If it DOES, we take this value as our opponent's rating for the current game. The higher our dude's rating, the tougher it is for him to find decent opponents (see image 2, and the code sketch after this list).
  • Then we calculate the winrate against this opponent according to the Glicko manual (see the formula for E in glicko-2.pdf, Step 3). Then we record a win or a loss for this game, in accordance with that winrate.
  • Then we calculate the updated rating, rating deviation, and volatility from the game he just played, and update his game history with those values. Initially the history holds the 10 seeding games, and then it starts growing. Once it reaches 100 games, we remove the first element of the history array, shift the whole array left by 1, and record the latest game as before (i.e., making space for new games and forgetting very old ones).
  • Then we reiterate the process until our guy has played 500 games.
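To make the loop above concrete, here is a minimal Python sketch of a single simulation step under these assumptions. The names (`find_opponent`, `expected_score`) and the fixed opponent RD of 50 are mine for illustration, not taken from the notebook:

```python
import math
import random
from collections import deque

SCALE = 173.7178  # Glicko-2 conversion constant (glicko-2.pdf, Step 2)

def g(phi):
    """Weighting factor g(phi) from glicko-2.pdf, Step 3."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected_score(r, opp_r, opp_rd):
    """E from glicko-2.pdf, Step 3: expected win chance of r against opp_r.
    Only the rating difference matters, so the scale's center cancels out."""
    return 1.0 / (1.0 + math.exp(-g(opp_rd / SCALE) * (r - opp_r) / SCALE))

def find_opponent(current_rating, mean=1000.0, sigma=266.0, step=10.0):
    """Roll Gaussian opponents and widen the accepted window by +/-10 every
    time the roll lands outside it, as described in the list above."""
    window = step
    while True:
        roll = random.gauss(mean, sigma)
        if abs(roll - current_rating) <= window:
            return roll
        window += step

# One game for our TRUE SKILL 1900 hero (opponent RD of 50 is an assumption):
history = deque(maxlen=100)  # 100-game window; the oldest game falls off
opp = find_opponent(current_rating=1900.0)
p_win = expected_score(1900.0, opp, opp_rd=50.0)
history.append(1 if random.random() < p_win else 0)
```

A `deque` with `maxlen=100` reproduces the "shift the array left by 1" behaviour automatically, without manual index juggling.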

UPDATE from 24 Oct:
1) removed RD decay over time, introduced a hard cap for RD=30,
2) updated the mean and standard deviation for the gaussian of skill,
3) took the system constant and other parameters from the wiki page

And, finally, we can see how his rating changes with time up to the end of a season. See image 3.

(Three images, from top to bottom:)

IMAGE-1: Gauss distribution of TRUE SKILL levels of players on the GW2 leaderboard. Approximately, of course: 10,000 players, mean is 1000, sigma is 266. These numbers derive from the assumption that the 3-sigma level is 1800 rating and there are 30 players above 1800.

IMAGE-2: Matchmaking representative samples for players with 1000 TRUE SKILL level (Blue) and 1900 TRUE SKILL (Green). As you can see, the 1000 rating player will almost always be playing with similar level opponents, while the 1900 rating player will not only be playing against a much wider range of opponents, he will also be forced to play against lower skill players most of the time.

IMAGE-3: The game history of our 1900 rated player. Rating is displayed on the left scale (Red) and the rating deviation on the right scale (Blue). Note how quickly it converges to the 1800-2000 range and stays there throughout the whole season. Even though winstreaks occur, they won't bring the rating too high.

MODEL-2

@"Airdive.2613" said: I've come up with an idea of an interesting (at least to me) experiment. ... The data of interest:

  1. I'm curious to see the distribution of the overall number of people depending on their rating (like a histogram of the number of players to divisions), as well as "bad" and "good" ones independently.
  2. Using the same data as in point 1, calculate the sum of all players' ratings. What I mean is, how does the sum of all players' ratings after 100,000 games compare to the initial sum of their ratings? (The initial ratings' sum would be, for example, 1,500x5,000 = 7,500,000.)

This would require a slightly different model, and... It's coming soon ;)


OUTRO

Well, you've got the idea: although graph 3 clearly indicates that winstreaks (and lose streaks) MAY EXIST to a certain level, they shouldn't take you much farther than +/-50 rating below or above your TRUE SKILL LEVEL, especially at the end of the season.

As always, take it with a grain of salt, because the model is STILL quite simplified and there are STILL a lot of uncertainties and unknowns. Constructive criticism is welcome. Please check the code yourself if you're interested.


ALL THE MATERIALS, IN CASE SOMEONE MISSED SOMETHING:
1) Code (Python 2.7 notebook)
2) Glicko-2.pdf
3) previous thread
4) Gauss distribution



Haha, great job man. :smiley: You actually did it! Thanks a lot for this!

Also, absolutely correct sidenote about matchmaking <> Glicko. The matchmaker is developed by ANet and we have rather little information on it. But Glicko(2) is a solid, widely used, and proven system.

Still, I am kind of surprised about the huge deviations. Did you turn some screws, the system constant tau (τ) for example? Could you add two or three graphs for different values, because that is supposed to determine the volatility? But only if you should be bored (again). :lol: I don't know Python, and I am abroad, so I don't have Matlab on my private computer...

Anyway, again, thanks a lot! Really fun to see.

Edit: I see you used 0.8. Could you give it a shot with lower values? :smile:


Wow, the major swings in the rating are pretty insane. I wonder if this is representative of other people's experience. However, the ability to duo queue definitely skews this. I play exclusively solo, and the highest I've gotten this season is 1657. When I hit that, I noticed teams almost always had two duo queuers. Kinda annoyed with that.


@jportell.2197 said: Wow, the major swings in the rating are pretty insane. I wonder if this is representative of other people's experience. However, the ability to duo queue definitely skews this. I play exclusively solo, and the highest I've gotten this season is 1657. When I hit that, I noticed teams almost always had two duo queuers. Kinda annoyed with that.

I seem to get better average results when I solo queue past low Plat than when I duo queue. Talking over the course of about 300 games last season; I'm as yet unable to play competitively this season due to an ongoing hand injury.


@"Megametzler.5729" said: Still, I am kind of surprised about the huge deviations. Did you turn some screws, the system constant tau (τ) for example? Could you add two or three graphs for different values, because that is supposed to determine the volatility? But only if you should be bored (again). :lol: I don't know Python, and I am abroad, so I don't have Matlab on my private computer...

Anyway, again, thanks a lot! Really fun to see.

Edit: I see you used 0.8. Could you give it a shot with lower values? :smile:

Yeah, well, I'd suggest you guys not take THE ACTUAL VALUES of the swings too seriously, because they are most likely just too huge. My matchmaking (actual matchmaking, lol) algorithm doesn't take into account that by the end of the season the majority of players have had their rating deviations gradually decrease. My assumption about rating deviation is that when it drops below a certain threshold for an opponent, it instead rerolls randomly into the 0-50 range. Which, again, is FALSE for the VAST MAJORITY of players.

My point is that those swings shouldn't be THAT HUGE, as you guys correctly noticed. Instead, if the player's TRUE SKILL rating is 1500, the graph should look something like this:

[hand-drawn sketch: rating curve converging to 1500]

(Sorry for the Paint.) In other words, it should converge to 1500. I'm pretty sure one can achieve that by playing with the parameters a little bit. But with that many free parameters (including the most major one, the ACTUAL MM algorithm), it's like finding a needle in a haystack.

What can be learned from it RIGHT NOW, however, is that you are most likely to converge to your TRUE SKILL rating at one point or another, but that might be quite a bumpy ride with big lose streaks and equally big win streaks.

I'll play with it tomorrow, hopefully. I mean, with variable RD for an opponent. And I'll also try one with a bigger TAU (kinda like a "viscosity" parameter, eh?). But you guys are more than welcome to give it a try as well! Cheers!


@"Airdive.2613" said: So, the question. If the matchmaking is not provided by Glicko, what exactly does it do? Is its only output a number indicative of a player's rating? If so, is its only function to determine the amount of points gained/lost?

Well, assuming that there WAS decent matchmaking, Glicko can be described as follows: it takes the player's current rating, rating deviation, and volatility, plus the ratings and RDs of the enemies from N of his previous matches, and returns the updated values of rating, rating deviation, and volatility (for the player). 3 numbers, to be precise. Not 1. But to put it simply: yes, it "doesn't do much". All the matchmaking magic happens thanks to some cryptic ANet algorithm.
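To illustrate the "3 numbers in, 3 numbers out" point, here is a minimal single-game version of the update, following Steps 2-8 of glicko-2.pdf. The Step 5 volatility iteration is omitted for brevity (volatility is returned unchanged); the paper and the notebook do the full iteration:

```python
import math

SCALE = 173.7178  # glicko-2.pdf, Step 2

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def glicko2_one_game(rating, rd, vol, opp_rating, opp_rd, score):
    """Takes the player's (rating, RD, volatility) plus one opponent's
    rating/RD and the result (1.0 = win, 0.0 = loss); returns the updated
    (rating, RD, volatility) triple. Uses the paper's 1500-centered scale."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE              # Step 2
    mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
    E = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))          # Step 3
    v = 1.0 / (g(phi_j) ** 2 * E * (1.0 - E))
    phi_star = math.sqrt(phi ** 2 + vol ** 2)                    # Step 6
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)     # Step 7
    mu_new = mu + phi_new ** 2 * g(phi_j) * (score - E)
    return SCALE * mu_new + 1500.0, SCALE * phi_new, vol         # Step 8

# The paper's example player (1500, RD 200) beating a 1400/RD 30 opponent:
print(glicko2_one_game(1500.0, 200.0, 0.06, 1400.0, 30.0, 1.0))
```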

The matchmaking algorithm I used in my code is dead simple: if the enemy fits into the range [current_player_rating - 50; current_player_rating + 50], then this is our guy. The exact value of the enemy rating is taken randomly, ofc.


@Arlette.9684 said:

@jportell.2197 said: Wow, the major swings in the rating are pretty insane. I wonder if this is representative of other people's experience. However, the ability to duo queue definitely skews this. I play exclusively solo, and the highest I've gotten this season is 1657. When I hit that, I noticed teams almost always had two duo queuers. Kinda annoyed with that.

I seem to get better average results when I solo queue past low Plat than when I duo queue. Talking over the course of about 300 games last season; I'm as yet unable to play competitively this season due to an ongoing hand injury.

Solo queue tends to yield better results UNLESS you have a solid duo queue partner. Duo queue means you have a good chance of ending up with 3 people who don't know wtf they're doing, since duo queueing inflates your combined MMR.


@"mrauls.6519" said:

@jportell.2197 said: Wow, the major swings in the rating are pretty insane. I wonder if this is representative of other people's experience. However, the ability to duo queue definitely skews this. I play exclusively solo, and the highest I've gotten this season is 1657. When I hit that, I noticed teams almost always had two duo queuers. Kinda annoyed with that.

I seem to get better average results when I solo queue past low Plat than when I duo queue. Talking over the course of about 300 games last season; I'm as yet unable to play competitively this season due to an ongoing hand injury.

Solo queue tends to yield better results UNLESS you have a solid duo queue partner. Duo queue means you have a good chance of ending up with 3 people who don't know kitten they're doing, since duo queueing inflates your combined MMR.

Is this factual? If it is, it's rather counterintuitive.


I actually experience elongated win/lose streaks extremely frequently, to the point that most of my seasons are played as win or lose streaks. Why does this happen to me? Good question, but it makes me wonder if:

  • (A) There is something going on that Arenanet isn't aware of
  • or (B) There is something going on that they don't talk to us about

Either way, in 6 years and almost 13,000 matches played, I've come to the conclusion that the algorithm notes and all simulations are incorrect somewhere, and in no way reflect the deep effects of 3rd party programs/smurfing/win trading/whatever the hell else is going on on actual matchmaking.

Don't believe me? I woke up earlier this morning to play some games and went on a 10 or 11 game win streak. And no, there is no win trading here. This is just random, legit ranked solo queue. I mean, does someone have a plausible explanation for this happening so frequently to some players? I'd love to hear it.

[screenshot of the match history showing the win streak]


@"Trevor Boyer.6524" said: I actually experience elongated win/lose streaks extremely frequently, to the point that most of my seasons are played as win or lose streaks. Why does this happen to me? Good question, but it makes me wonder if:

  • (A) There is something going on that Arenanet isn't aware of
  • or (B) There is something going on that they don't talk to us about

Either way, in 6 years and almost 13,000 matches played, I've come to the conclusion that the algorithm notes and all simulations are incorrect somewhere, and in no way reflect the deep effects of 3rd party programs/smurfing/win trading/whatever the hell else is going on on actual matchmaking.

Don't believe me? I woke up earlier this morning to play some games and went on a 10 or 11 game win streak. And no, there is no win trading here. This is just random, legit ranked solo queue. I mean, does someone have a plausible explanation for this happening so frequently to some players? I'd love to hear it.

[screenshot of the match history showing the win streak]

Am I correct in my assumption that all the under-10-minute games were blowouts? I have never really bothered to time PvP games, so I'm curious to get some input on the subject. Also, what was your baseline rating at the beginning of the streak vs. where it ended?


@"Trevor Boyer.6524" said: Don't believe me? I woke up earlier this morning to play some games and went on a 10 or 11 game win streak. And no, there is no win trading here. This is just random, legit ranked solo queue. I mean, does someone have a plausible explanation for this happening so frequently to some players? I'd love to hear it.

I do believe you, because I experience exactly the same: 10-win and 10-loss streaks with maybe 1-2 outliers. Well, the normal "2 wins, 1 loss, 1 win, 2 losses" stuff also happens, ofc. But to my feeling, these win and lose streaks happen WAY too often. MUCH more often than in other games with rating. (See the coin-flip sanity check at the end of this post.)

So, an attempt to find an explanation is precisely the reason for this thread. So far, what I can tell for sure: if you win, say, 7 games in a row, the matchmaker SHOULD put you against a "statistically tougher" opponent, because normally it would put you against an equal opponent. The absolute case of an equal opponent is your "mirror": someone with exactly the same rating and RD as you. BUT! In the case of a winstreak, a precise "mirror match" (when your supposed opponent has the EXACT same rating and RD as you) results in a WIN expectancy >50%.

Though, for ultimate success, I still need to make that graph converge.
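As a sanity check on "WAY too often" (mentioned a few lines up), here is a quick, purely illustrative way to estimate how often a pure coin-flip player would see a 10-game streak in a season; it makes no claims about the real matchmaker:

```python
import random

def longest_streak(n_games, p_win=0.5):
    """Length of the longest run of identical results in n_games coin flips."""
    best = run = 1
    prev = random.random() < p_win
    for _ in range(n_games - 1):
        cur = random.random() < p_win
        run = run + 1 if cur == prev else 1
        best = max(best, run)
        prev = cur
    return best

# Fraction of simulated 200-game "seasons" containing a 10+ game streak:
trials = 10000
share = sum(longest_streak(200) >= 10 for _ in range(trials)) / float(trials)
print(share)  # non-negligible even for a pure 50% player
```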


I've come up with an idea of an interesting (at least to me) experiment. Could you please provide the data on the following?

  • Let's assume there are 5,000 players in the system and this number does not change.
  • Let's assume every player's initial scores are exactly the same (with the rating of, for example, 1,500).
  • Let's assume there are three tiers of players (1,000 "bad", 3,000 "average", and 1,000 "good"; tiers do not change throughout the experiment) and the randomly formed team's chances of winning somewhat depend on each player's tier.
  • Let the matchmaking then randomly create a large number (say, 100,000) of games of 10 players (maybe in the form of "choose 5 players, then choose another 5 players, then do a random roll for victory (normally isn't just 50%), then compare and update each player's scores, then use the updated scores for the rest of the experiment").

The data of interest:

  1. After these 100,000 games our "players" will presumably spread across the ladder. I'm curious to see the distribution of the overall number of people depending on their rating (like a histogram of the number of players to divisions), as well as "bad" and "good" ones independently.
  2. Using the same data as in point 1, calculate the sum of all players' ratings. What I mean is, how does the sum of all players' ratings after 100,000 games compare to the initial sum of their ratings? (The initial ratings' sum would be, for example, 1,500x5,000 = 7,500,000.)
  3. Moving on to another stage, a "soft reset" occurs somehow at the end of the "season". How does it affect the sum of all players' ratings, if does at all (or maybe it is just a volatility reset)?
  4. After running several seasons of the same experiment (and with the soft reset occurring in between), I'd like to once again see the histogram of the number of players to divisions for the whole population, "bad" subgroup, and "good" subgroup, as well as the sum of all players' ratings.

I know it's a lot to ask, but unfortunately I'm not familiar with programming and it seems like too daunting a task to do it using spreadsheets.


@"Airdive.2613" said: I've come up with an idea of an interesting (at least to me) experiment. Could you please provide the data on the following?

That is indeed an interesting experiment, and actually quite a common one when you do statistical analysis. Thank you for pointing it out! ;)

It's actually a much easier task than you might think, when using that code.

1) "Let's assume every player's initial scores are exactly the same (with the rating of, for example, 1,500)."
This assumption is already in the code.

2) "Let's assume there are 5,000 players"
This assumption is done with 1 line, basically. Right now I only provide statistics for 1 player: his entire rating history throughout the season. To make it 5000, you just have to run 1 additional loop from 0 to 4999. And since we are only interested in the FINAL rating value for each player, the answer can be represented as a 1-dimensional array of values, something like [1485, 1257, ... 1763].

3) "Let's assume there are three tiers of players"
This is called a normal distribution. I was planning to introduce it at some point, though in a different place. Right now the MM algorithm simply takes random opponents from an interval of ratings, while a normal distribution would mean a lower probability of taking an opponent with an extremely different rating. E.g., the interval is 1400-1600 (+/-100 around 1500), and the probability of taking an opponent with 1410 rating is 10%, while the probability of taking the 1490 guy is 90% (right now those probabilities are equal).

4) "the randomly formed team's chances of winning somewhat depend on each player's tier"
Right now the TRUE SKILL of our experiment-rabbit player is 1500, and the probability to win is determined linearly: 1500 vs 500 rating is a 100% win, 1500 vs 1500 is a 50% win, 1500 vs 2500 is a 0% win. (I was thinking of replacing this model with something more sophisticated, like the mentioned ELO, which derives win probability from the difference in the players' ratings.) What you suggest is that I take those 5000 players and simply give their TRUE SKILL levels a normal distribution. I.e., we had 1 guy with 1500; now we'll have 500 guys between 1400 and 1500, 10 guys between 0 and 200, and 50 guys between 1900 and 2000. This is easy =)

5) "maybe in the form of "choose 5 players, then choose another 5 players""
Ultimately, the Glicko algorithm only calculates the outcome of 1v1 matches. In GW2, as we know it, sPvP matches are 5v5. Then how does the algorithm transfer that to 5v5, you ask? Well, my suggestion: it simply takes the average rating of the enemy team and treats this team as a singular opponent for each player on the ally team. So, in the end it doesn't really matter; the results won't be much different between 1v1 and 1v5. So the suggestion to make it 5v5 is, I think, unnecessary. (Points 3-5 are sketched in code below.)
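Since points 3-5 are easy to misread in prose, here is a small sketch of the linear win model, the team-averaging trick, and the normally distributed 5,000-player population. The clamping and the sigma of 266 are my assumptions for illustration:

```python
import random

def linear_win_prob(true_skill, opp_rating, spread=1000.0):
    """The linear model from point 4: 100% win chance at +1000 rating
    difference, 50% at equal ratings, 0% at -1000, clamped in between."""
    p = 0.5 + (true_skill - opp_rating) / (2.0 * spread)
    return max(0.0, min(1.0, p))

def team_as_single_opponent(enemy_ratings):
    """Point 5: treat the enemy 5-man team as one opponent whose rating
    is the team average."""
    return sum(enemy_ratings) / float(len(enemy_ratings))

# Point 3: 5,000 players with normally distributed TRUE SKILL around 1500.
population = [random.gauss(1500.0, 266.0) for _ in range(5000)]

# Our 1500 test player against an averaged enemy team:
p = linear_win_prob(1500.0, team_as_single_opponent([1450, 1500, 1520, 1480, 1550]))
print(p)  # 0.5, since the enemy team also averages 1500
```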


But all in all, this is a great suggestion, and I was planning to do it myself at one point or another (well, after I make the player rating converge to a certain value and decrease the volatility). Thanks, man!


Want to add one more 'variable' into your theorycrafting regarding the MM. Just a while ago I got a match with 4 engineers, 2 of them scrappers (both on my team), while the other 2, holos, were enemies. I think most people can understand what happened. I mean, wouldn't it be possible for the engineers to be equally distributed between the teams? Why 2 scrappers, a profession that is weak especially compared to holo, on one team?


@"Dreddo.9865" said:

Want to add one more 'variable' into your theorycrafting regarding the MM. Just a while ago I got a match with 4 engineers, 2 of them scrappers (both on my team), while the other 2, holos, were enemies. I think most people can understand what happened. I mean, wouldn't it be possible for the engineers to be equally distributed between the teams? Why 2 scrappers, a profession that is weak especially compared to holo, on one team?

That's not exactly a "variable". That's not really anything. I understand what you are talking about, but how do you suggest I evaluate the "strength" of a spec? Just take a "wild guess"? Say there are 2 equally skilled players, one playing holo, the other playing scrapper, and the one playing holo has a better chance of winning. Better by how much exactly? 20%? 10%? 1%? That is basically a free parameter: whatever value I chose would affect the model significantly. The model has enough free parameters on its own; I'm not sure it would be a good idea to introduce EVEN MORE of them.


@"Tiah.3091" said:

Ultimately, the Glicko algorithm only calculates the outcome of 1v1 matches. In GW2, as we know it, sPvP matches are 5v5. Then how does the algorithm transfer that to 5v5, you ask? Well, my suggestion: it simply takes the average rating of the enemy team and treats this team as a singular opponent for each player on the ally team. So, in the end it doesn't really matter; the results won't be much different between 1v1 and 1v5. So the suggestion to make it 5v5 is, I think, unnecessary.

Oh, that's cool! To clarify: what I wanted to look at (in my latter points) is whether it might be possible that the total sum of the players' ratings changes with time (if games aren't zero-sum), thus causing the leaderboard to become "biased" after several seasons: upping or lowering the mean/median player rating within the same (0, 2100) borders, or revealing some sort of pattern. I agree the normal distribution is clearly a better choice in terms of modelling the playerbase, but it must get increasingly harder to calculate the probability of winning. I mean, it shows well "how many people" are better than you, but I don't know for sure just "how much better" an 1800 player is than a 1500 one. Some math needs to be done. :0


@"Airdive.2613" said: To clarify: what I wanted to look at (in my latter points) is whether it might be possible that the total sum of the players' ratings changes with time (if games aren't zero-sum), thus causing the leaderboard to become "biased" after several seasons: upping or lowering the mean/median player rating within the same (0, 2100) borders, or revealing some sort of pattern.

Some of your questions I can answer without any modelling.

  • Even in the case of a zero-sum game, the rating sum for a constant number of people doesn't remain constant: it inflates very slightly, because ratings <0 are not allowed. I.e., when you win against a 0 rating player and receive 10 points, he doesn't lose 10; he remains at 0. Which means you got those 10 points from nowhere, hence the inflation. (See the sketch after this list.)
  • However, the leaderboard doesn't "become biased" after several seasons, because every season it resets. So the player rating sums are equal at the beginning of every season.
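A tiny sketch of that first bullet, with an illustrative flat 10-point exchange (real Glicko point swings vary, of course):

```python
def exchange(winner, loser, points=10):
    """Zero-sum transfer with a floor at 0: the loser can't drop below 0,
    so any shortfall is created out of thin air (the inflation above)."""
    return winner + points, max(0, loser - points)

w, l = exchange(1500, 0)  # w == 1510, l == 0
# The total went from 1500 to 1510: 10 points appeared from nowhere.
```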

I know what you mean: a few years ago there were people with 2k ratings (well, I've only been playing for 2 months, so that's just my assumption). Now the top 10 people are all <1900. HOW COME? I'll tell you how: it has nothing to do with "biasing". It's simply that the number of active players has dropped significantly, and therefore the absolute sum of player ratings became lower (simply because it was ~1500*N, and now it's ~1500*0.5*N). It's like the system ran out of fuel: those skilled players simply can't take rating away from others anymore, because there's nothing left to take.

However, an important note: unlike ELO, Glicko is not a zero-sum system (at least I think it's not). How this non-zero sum would affect the player rating sum is a very interesting question. Although I really doubt the effect would be big: even if the sum is not ZERO for a SINGULAR player, on AVERAGE over N players it should be more or less ZERO. Where N is the system constant: in the case of my code (and in the case of GW2 sPvP) it's 10. Remember how they ask you to play 10 games to determine your rating? That's the constant I'm talking about.


So basically this only shows what we've known since the Glicko / Glicko2 algorithms were released:

  • Initially a player's rating has a high volatility which decreases over time. This is exactly how the algorithm is supposed to work. It doesn't know a lot about the player initially, so their rating adjustments are larger.
  • You can never view a player's rating as a single number; it should be viewed as their true rating falling within their Glicko2 rating ± their Glicko2 deviation. Given that GW2 rating shifts by 20-30 points per game once volatility stabilizes, I would expect the deviation (once stable) to be in the 50-100 range. A deviation of 50 means that a 1200 rated player could be rated between 1100 and 1300 with 95% certainty. This deviation is why tier groupings are a good choice for games, rather than raw rating.

The main flaw in the original work is how win/loss is determined. It's definitely not linear. That flaw leads to more volatility in the rating than there should be. I would suggest using the Elo probability-of-winning calculation to determine the outcome, using the player's true skill and the opponent's actual rating (assuming all opponents are rated correctly). Going further, you could do additional tweaks (sketched after this list):

  • Toss in a small, random fudge factor on top of that probability to account for people DC'ing or manipulating.
  • Simulate a pool of players at once rather than one at a time. Your current method knows your test player's true rating, but it assumes all other teammates and opponents are correctly rated - this is never the case.
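For reference, a sketch of those suggestions: the standard Elo expected-score formula, plus a small random fudge on top; the 0.05 magnitude is purely illustrative:

```python
import random

def elo_expected(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def fudged_win_prob(true_skill, opp_rating, fudge=0.05):
    """Elo probability perturbed by random noise to stand in for DCs,
    manipulation, and other messiness, per the tweaks suggested above."""
    p = elo_expected(true_skill, opp_rating) + random.uniform(-fudge, fudge)
    return max(0.0, min(1.0, p))

print(elo_expected(1900.0, 1500.0))  # ~0.91 for a 400-point gap
```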

@"Exedore.6320" said: The main flaw in the original work is how win/loss is determined. It's definitely not linear. That flaw leads to more volatility in the rating than there should be. I would suggest using the Elo probability-of-winning calculation to determine the outcome, using the player's true skill and the opponent's actual rating (assuming all opponents are rated correctly).

Yeah, I was thinking the same:

@Tiah.3091 said: Right now the TRUE SKILL of our experiment-rabbit player is 1500. And the probability to win is determined linearly. I was thinking of replacing this model with something more sophisticated, like the mentioned ELO, which derives win probability from the difference in the players' ratings.

ELO indeed should give a much better approximation than the linear model.

Going further, you could do additional tweaks:

  • Toss in a small, random fudge factor on top of that probability to account for people DC'ing or manipulating.
  • Simulate a pool of players at once rather than one at a time. Your current method knows your test player's true rating, but it assumes all other teammates and opponents are correctly rated - this is never the case.

Those, on the other hand, are interesting suggestions. I'll give them a try, thanks!


@"Tiah.3091" said:(...)

[the hand-drawn convergence sketch from earlier]

(...)

On a side note: the rating deviation (RD) usually has a lower limit, to account for skill changes and, in the GW environment, of course also balance patches, class changes, and stuff. After the first 15-20 games we all seem to hit that limit; that is when the placement-match deviations have become low. Ever wondered why the tenth match gives ±30 rating, while the 20th game at the exact same rating only gives ±15 (and stays like that)? That is the RD, or its lower limit respectively. You can still have statistical streaks, but their impact is lower.

On the topic of the matchmaker: we have rather little information here indeed. https://wiki.guildwars2.com/wiki/PvP_Matchmaking_Algorithm gives some hints that it iterates over candidate rosters to find the best one, but we do not know many details about the criteria, or for example how duos are implemented. Ben once showed an example here on the forums*, where it seemed not to be accounted for at all. That would indeed be a major flaw, but not one connected to win/loss streaks.

So if your rating inflates by duoQing, you might indeed experience a loss streak to get you back to your solo rating. :wink:

Final note: Glicko-2 is solid, though I would still like to see some changes. Matchmaking itself, however, could be an issue. DuoQs are, in my personal opinion, the worst idea. It does not, however, seem to look at your previous match outcomes, unless they keep lying to us. :smile:

*Here: https://en-forum.guildwars2.com/discussion/54656/match-ranking/p2

@Ben Phongluangtham.1065 said:

@Axelteas.7192 said: This season I'm finding pro teams in silver, that's unacceptable... matches losing 15-500

Can you tell me the date/time and exact score of the match where you, as a silver player, faced pro teams? I'd like to look at that match.

Please help me look at a match that ended a minute ago.

Maybe about 10/7/2018 - 12:28AM Server time.

I didn't grab screenshots of it, but this is the only match on my account for the day. Horrible match, a completely one-sided blowout. It was 2x duos vs a full team of solo queuers, and I'm pretty sure half of my team were bots; one guy said he was silver and just in the game for pips.

I think I found the match; at least this one had 2 duo queues on the opposite team. No silver players. The average skill rating difference between the teams is 5 points.

Blue team (Defeated):
Ranger - 1393
Necromancer - 1432
Necromancer - 1440
Thief - 1515
Guardian - 1521
Average Skill Rating - 1460.2
Std. Deviation - 55.72

Red team (Winner):
Guardian - 1359
Thief - 1391
Necromancer - 1475
Necromancer - 1514
Mesmer - 1587
Average Skill Rating - 1465.2
Std. Deviation - 92.33
