
Glicko-2 algorithm put into code (Updated). Conclusion about win-streaks.


Tiah.3091


@Tiah.3091 said:

@Deimos.4263 said: Do any of these models take into account a player's skill improving over time? Because of course it will. You learn stuff.

No, my code doesn't account for it. But this code and this entire thread are mostly dedicated to one problem: win streaks and lose streaks, which tend to happen at MUCH shorter intervals than a player needs to learn his stuff. I mean, you have a lose streak of 5-10 games, then a win streak of 5-10 games. I really doubt a player can noticeably improve his skills in fewer than 100-200 games. Therefore, the effect is absolutely insignificant.

@"Faux Play.6104" said: The history is built into the rating and deviation numbers. The history you are referring to is to minimize the number of times you have to do the iterative calculation if you are doing it by hand.

Well, this is just plain wrong. Did you really read the paper?
QNv3lrA.png

"m opponents with ratings mu_1, mu_2, ..., mu_m" or "scores against EACH opponent". Can you see the capital Greek letter Sigma, with "j=1" below and "m" above? Do you know what this means?

I'm just asking, though. Probably I have misunderstood you. But the RESULTS of the matches vs. m previous opponents are DEFINITELY taken into account. The results of those matches are what I call "match history". Please tell me if I'm still unclear.

@"Exedore.6320" said: I'm fairly certain that the rating adjustment is player rating vs. averaged team rating.

Oh, of that I'm fairly certain as well. Perhaps with some lowering coefficient, but yeah, I've been in that situation, where I lose 15 and a friend loses 8.

What I was talking about IS NOT the "win probability" from Glicko, which is required for the rating update:
LeLajOl.png
No. I meant the REAL win probability. It is not the same, because Glicko takes your (and your opponents') current rating for the calculation, which is likely not exactly your real rating, especially if the season has just begun. E.g. a dude who was 1900 last season plays a game with 9 scrubs who were 800-1300 last season. However, on paper, EVERYONE'S rating might be 1200 (first game of the season for all 10 people). What Glicko will calculate in this situation is obvious: it'll just take all those 1200 ratings, do its magic, and BOOM, everyone's equal and the win chance is 50/50.

But is it true? No. So, what was the ACTUAL win probability for that game?
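To put a number on the gap between the on-paper picture and last season's picture, here is a minimal sketch of the Glicko-2 expected-score formula E from the paper, evaluated for a single 1v1 pairing. The 1900 and 1000 ratings come from the example above; the RD of 350 and the function names are assumptions for illustration. The output is only what the formula predicts under each set of ratings, not a measurement of the real win probability in a 5v5.

```python
import math

SCALE = 173.7178  # converts rating points to the Glicko-2 internal scale


def g(phi):
    """Weight that shrinks the rating gap when the opponent's RD is large."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating, opp_rating, opp_rd):
    """Glicko-2 expected score E for one player against one opponent."""
    mu_diff = (rating - opp_rating) / SCALE
    phi_j = opp_rd / SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_j) * mu_diff))


ASSUMED_RD = 350.0  # assumption: a large start-of-season RD

# On paper everyone is 1200, so the formula sees a coin flip.
on_paper = expected_score(1200, 1200, ASSUMED_RD)

# The same formula fed with last season's ratings (1900 vs. a 1000-rated opponent).
last_season = expected_score(1900, 1000, ASSUMED_RD)

print(f"on paper: {on_paper:.2f}, with last season's ratings: {last_season:.2f}")
```

The two numbers differ only because the inputs differ, which is the point being made: the formula cannot know about skill that is not yet reflected in the ratings.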

There is nothing that says m needs to be greater than 1. That portion of the calculation is inside of an iterative loop where you need answers to converge. If you are calculating the iterative portion by hand it is more convenient to sum up several matches and do the iterative portion once.

It doesn't say to treat the summation portion like a fifo queue where each match is evaluated m times.


@"Faux Play.6104" said:There is nothing that says m needs to be greater than 1. That portion of the calculation is inside of an iterative loop where you need answers to converge. If you are calculating the iterative portion by hand it is more convenient to sum up several matches and do the iterative portion once.

It doesn't say to treat the summation portion like a fifo queue where each match is evaluated m times.

Dude. No. If you're referring to Step 5 in the pdf:
ynHjZQ6.png
which is LITERALLY the only place where you could find the word "iteration": the iteration there is only needed for the NUMERICAL SOLUTION OF THE EQUATION for the new sigma. Here's the wiki page on the matter. Again: NUMERICAL SOLUTION OF THE EQUATION.

It has ABSOLUTELY NOTHING to do with the fact that Glicko itself uses an array of dimensionality (M,3) as ONE of the INPUT VARIABLES. It looks like this:

  1. [1200, 50, 0]
  2. [1350, 30, 1]
  3. [1100, 60, 1]
  4. [1250, 80, 0]
  5. [1400, 40, 1]

Here M = 5 is the number of opponents the player fought in the last 5 matches. The player's rating, RD and volatility are what he had BEFORE he played those 5 matches. Glicko-2 calculates what his rating will become AFTER he played those 5 matches. I.e., Glicko in fact doesn't calculate your rating relative to your PREVIOUS game (like ELO does, for example). It instead calculates it relative to the game which was M matches before the current one.

Accordingly, in 1), 1200 is his opponent's rating, 50 is the opponent's RD, and 0 is the outcome (0 = defeat, 1 = victory).

When the player plays more than M games, this array starts acting exactly like a FIFO queue. I'm not sure how else I should explain it, or why it even needs explanation.
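For reference, the summation over j = 1..m in the paper (Steps 3 and 4) can be sketched as below. It takes the (M,3) array of [opponent rating, opponent RD, score] rows from the post, plus the player's pre-period rating. The player rating of 1200 and the function names are assumptions for illustration. The sketch deliberately stops before the volatility iteration of Step 5 and takes no position on how the array is refilled between updates.

```python
import math

SCALE = 173.7178  # rating points per Glicko-2 unit
BASE = 1500.0     # rating that maps to mu = 0 in the paper


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def variance_and_delta(rating, results):
    """Steps 3-4: return (v, delta) for one rating period.

    results is a list of [opponent_rating, opponent_rd, score] rows,
    with score 1 for a win and 0 for a loss.
    """
    mu = (rating - BASE) / SCALE
    inv_v = 0.0
    weighted_surprise = 0.0
    for opp_rating, opp_rd, score in results:
        mu_j = (opp_rating - BASE) / SCALE
        phi_j = opp_rd / SCALE
        e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        inv_v += g(phi_j) ** 2 * e * (1.0 - e)       # term of the sum in Step 3
        weighted_surprise += g(phi_j) * (score - e)  # term of the sum in Step 4
    v = 1.0 / inv_v
    return v, v * weighted_surprise


period = [[1200, 50, 0], [1350, 30, 1], [1100, 60, 1], [1250, 80, 0], [1400, 40, 1]]
v, delta = variance_and_delta(1200.0, period)  # 1200 is an assumed player rating
print(f"v = {v:.4f}, delta = {delta:.4f}")
```

Both sums run over every row of the array, so all M results of the period contribute to a single update.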

P.S. Thanks to our dialogue I actually found a huge bug in the code, lol =)


@Tiah.3091 said:

@"Faux Play.6104" said:There is nothing that says m needs to be greater than 1. That portion of the calculation is inside of an iterative loop where you need answers to converge. If you are calculating the iterative portion by hand it is more convenient to sum up several matches and do the iterative portion once.

It doesn't say to treat the summation portion like a fifo queue where each match is evaluated m times.

Dude. No. If you're referring to Step 5 in the pdf:
ynHjZQ6.png
which is LITERALLY the only place where you could find the word "iteration": the iteration there is only needed for the NUMERICAL SOLUTION OF THE EQUATION for the new sigma. Here's the wiki page on the matter. Again: NUMERICAL SOLUTION OF THE EQUATION.

It has ABSOLUTELY NOTHING to do with the fact that Glicko itself uses an array of dimensionality (M,3) as ONE of the INPUT VARIABLES. It looks like this:

  1. [1200, 50, 0]
  2. [1350, 30, 1]
  3. [1100, 60, 1]
  4. [1250, 80, 0]
  5. [1400, 40, 1]

Here M = 5 is the number of opponents the player fought in the last 5 matches. The player's rating, RD and volatility are what he had BEFORE he played those 5 matches. Glicko-2 calculates what his rating will become AFTER he played those 5 matches. I.e., Glicko in fact doesn't calculate your rating relative to your PREVIOUS game (like ELO does, for example). It instead calculates it relative to the game which was M matches before the current one.

Accordingly, in 1), 1200 is his opponent's rating, 50 is the opponent's RD, and 0 is the outcome (0 = defeat, 1 = victory).

When the player plays more than M games, this array starts acting exactly like a FIFO queue. I'm not sure how else I should explain it, or why it even needs explanation.

P.S. Thanks to our dialogue I actually found a huge bug in the code, lol =)

The rating period is not supposed to be a FIFO queue. The way you appear to be describing it, if you picked m=5, a match would appear in the calculation for five different rating periods before it is discarded. The way I'm reading it, m is an arbitrary number based on how many matches someone had during a rating period. Once the period is over the results are tossed, because they are captured in the rating and rating deviation terms. The player could have played one match that month or thirty.

The main thing a rating period was for was to increase a player's rating deviation when the period was over if they didn't play. The disadvantage of using m=1 for all matches is that, IF you get several abnormal results, your deviation won't be increased like it would be if you had the stored history. However, since your rating and deviation are calculated more often, they will settle out faster too. If we were trying to incorporate a month's worth of results like they do for chess tournaments, having a larger m term simplifies the calculation because you don't have to repeat the iterative portion of the calculation. You just sum up all the matches and do the iterative portion once. You can't update a rating/deviation for a player without doing that iterative portion of the calculation. However, with computers that is a pretty trivial process now.
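The inactivity rule mentioned above is the last part of Step 6 in the paper: a player who does not compete in a rating period keeps rating and volatility, and only the deviation grows. A minimal sketch, where the starting RD of 50, the volatility of 0.06 and the function name are illustrative assumptions:

```python
import math

SCALE = 173.7178  # rating points per Glicko-2 unit


def rd_after_inactivity(rd, sigma, idle_periods):
    """Grow RD by one volatility step per rating period without matches."""
    phi = rd / SCALE
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + sigma ** 2)
    return phi * SCALE


for periods in (0, 1, 5, 10):
    print(periods, round(rd_after_inactivity(50.0, 0.06, periods), 2))
```

If the system never closes a period for idle players, this growth never happens, which is the part many online implementations are said to drop.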

To have a system where the rating is updated after every match a player plays, instead of after some arbitrary time period, you can't follow Glicko-2 exactly with a time-based rating period. Using m = 1 is common for online games and even chess sites. Eliminating the inactivity portion is also common, as people tend to abuse it in online games. However, if you google it, there are estimation methods for determining what a good period is if you are updating results after every match a player plays.

If you still don't believe me, make two f2p accounts. Win the first 3 matches with one and lose the first 3 with the other. Then finish the initial rating period with the same w/l ratio. The account that won the first 3 matches will have a much higher rating than the one that lost the first three, even though both accounts started the season with the same rating. If it was implemented as a 10-match FIFO queue, the order of the first 10 matches wouldn't matter. However, in the system it looks more like this: https://forum-en.gw2archive.eu/forum/game/pvp/How-placement-matches-work
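Whether match order matters can be checked in a sketch. Below is a compact Glicko-2 update following Steps 2-8 of the paper, run on the same three wins and three losses in two ways: one update per match (m = 1), and one batch update with m = 6. The starting values (1500, RD 350, volatility 0.06), tau = 0.5, and the fixed 1500/RD 50 opponents are assumptions taken from the paper's defaults rather than from Guild Wars 2's settings. The sketch therefore only shows whether order changes the outcome; with fixed opponents, the size and direction of the gap need not match real matchmaking, where the opponents change after every game.

```python
import math

SCALE, BASE, TAU = 173.7178, 1500.0, 0.5  # TAU is an assumed system constant


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def new_volatility(delta, phi, v, sigma, eps=1e-6):
    """Step 5: solve f(x) = 0 numerically (the paper's Illinois iteration)."""
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        top = ex * (delta ** 2 - phi ** 2 - v - ex)
        return top / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / TAU ** 2

    xa = a
    if delta ** 2 > phi ** 2 + v:
        xb = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        xb = a - k * TAU
    fa, fb = f(xa), f(xb)
    while abs(xb - xa) > eps:
        xc = xa + (xa - xb) * fa / (fb - fa)
        fc = f(xc)
        if fc * fb <= 0:
            xa, fa = xb, fb
        else:
            fa /= 2.0
        xb, fb = xc, fc
    return math.exp(xa / 2.0)


def update(rating, rd, sigma, results):
    """One rating period; results rows are (opp_rating, opp_rd, score), non-empty."""
    mu, phi = (rating - BASE) / SCALE, rd / SCALE
    inv_v = total = 0.0
    for opp_rating, opp_rd, score in results:
        gj = g(opp_rd / SCALE)
        e = 1.0 / (1.0 + math.exp(-gj * (mu - (opp_rating - BASE) / SCALE)))
        inv_v += gj ** 2 * e * (1.0 - e)
        total += gj * (score - e)
    v = 1.0 / inv_v
    sigma = new_volatility(v * total, phi, v, sigma)
    phi = 1.0 / math.sqrt(1.0 / (phi ** 2 + sigma ** 2) + 1.0 / v)
    mu += phi ** 2 * total
    return mu * SCALE + BASE, phi * SCALE, sigma


def play(order, batch):
    """Rate one hypothetical player against identical 1500/RD 50 opponents."""
    state = (1500.0, 350.0, 0.06)
    matches = [(1500.0, 50.0, score) for score in order]
    if batch:                      # m = 6: all results summed in one update
        return update(*state, matches)
    for match in matches:          # m = 1: update after every match
        state = update(*state, [match])
    return state


wins_first, losses_first = [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]
print("m=1 wins first:  ", round(play(wins_first, batch=False)[0], 1))
print("m=1 losses first:", round(play(losses_first, batch=False)[0], 1))
print("m=6 wins first:  ", round(play(wins_first, batch=True)[0], 1))
print("m=6 losses first:", round(play(losses_first, batch=True)[0], 1))
```

In the batch case the results enter only through sums, so reordering them cannot change the rating; with per-match updates each result is weighted by the RD and rating the player had at that moment.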


@Faux Play.6104 said: having a larger m term simplifies the calculation because [...bla-bla...] However, with computers that is a pretty trivial process NOW... If you still don't believe me...

Dude, just... I don't know at this point. Are you seriously not simply trolling me and this entire thread?

If not, please answer one simple question: WHEN DO YOU THINK GLICKO-2 WAS INVENTED?

(Or, alternatively, when do you think the computer was invented?)

I don't have to believe you, because there's a paper, which I very much doubt you have read, and which describes very clearly how the algorithm works.

Sorry, I can't provide a more detailed answer because I'm typing from a phone.


@Tiah.3091 said:

@Faux Play.6104 said: having a larger m term simplifies the calculation because [...bla-bla...] However, with computers that is a pretty trivial process NOW... If you still don't believe me...

Dude, just... I don't know at this point. Are you seriously not simply trolling me and this entire thread?

If not, please answer one simple question: WHEN DO YOU THINK GLICKO-2 WAS INVENTED?

(Or, alternatively, when do you think the computer was invented?)

I don't have to believe you, because there's a paper, which I very much doubt you have read, and which describes very clearly how the algorithm works.

Sorry, I can't provide a more detailed answer because I'm typing from a phone.

Sounds like you aren't interested in discussing. I have read the paper, other guidance on the topic, and implemented it. Results I get agree with observed results from playing the game.


@"Faux Play.6104" said: Sounds like you aren't interested in discussing. I have read the paper, other guidance on the topic, and implemented it. Results I get agree with observed results from playing the game.

Oh, I am interested, but I don't really see us having a discussion.

Let me elaborate:

  1. You claim that you implemented the code and that it agrees with what is observed in reality. Yet no one except you has seen either the code or the results.
  2. You claim that my implementation is wrong because I used a "history record of player's games against M opponents", and my M is >1.
  3. And your main argument against it is that M>1 decreases the number of iterative steps in the calculation, which is handy when you calculate the rating manually, with pen and paper. AND (this is the best part) because NOW, WHEN WE HAVE COMPUTERS, we can easily do as many of those iteration steps as we want, we had better use M=1.

-

In my previous posts I asked you 3 questions:

  1. Can you provide your code and results?
  2. While doing the calculation "on a computer" we can have either M>1 or M=1. Yet you choose the latter over the former. Why? Your RD will be huge because of that. Why do you think that having a huge RD is a good thing?
  3. Most important question: why do you think someone would EVER do the "pen and paper" calculation for Glicko-2?! When do you think Glicko was invented?

Three questions, dude. You ignored all of them. When you provide an answer to ALL THREE, then we have a discussion.


@Tiah.3091 said:

@"Faux Play.6104" said:Sounds like you aren't interested in discussing. I have read the paper, other guidance on the topic, and implemented it. Results I get agree with observed results from playing the game.

Oh, I am interested, but I don't really see us having it.

Let me elaborate:

  1. You claim that you implemented the code and that it agrees with what is observed in reality. Yet no one except you has seen either the code or the results.
  2. You claim that my implementation is wrong because I used a "history record of player's games against M opponents", and my M is >1.
  3. And your main argument against it is that M>1 decreases the number of iterative steps in the calculation, which is handy when you calculate the rating manually, with pen and paper. AND (this is the best part) because NOW, WHEN WE HAVE COMPUTERS, we can easily do as many of those iteration steps as we want, we had better use M=1.

-

In my previous posts I asked you 3 questions:

  1. Can you provide your code and results?
  2. While doing the calculation "on a computer" we can have either M>1 or M=1. Yet you choose the latter over the former. Why? Your RD will be huge because of that. Why do you think that having a huge RD is a good thing?
  3. Most important question: why do you think someone would EVER do the "pen and paper" calculation for Glicko-2?! When do you think Glicko was invented?

Three questions, dude. You ignored all of them. When you provide an answer to ALL THREE, then we have a discussion.

I have already answered them all, and I'm trying very hard to be polite.

  1. I have linked results in this thread. If you use the search functions on this forum or the old forum archives I bet you could find more.
  2. m corresponds to the number of matches played in a "rating period". This was originally developed for competitive chess, and the rating period was a month. m would likely be a different number for each player. If players didn't play during the rating period, their rating deviation would be increased. Once the rating update was made for the month, the historic information would be purged and a new period would start. If you want to update the rating after every match rather than after a set period of time, m should be 1. This is common practice on other games and chess sites that use Glicko and update scores after every match played. Guild Wars, like many other sites that use Glicko, doesn't use a "rating period" to increase people's rating deviation, because people abuse it to snipe rating. I have already posted this explanation at least once in the thread.
  3. You are misinterpreting what I'm saying. You can't even do it efficiently in a spreadsheet, because it requires you to iteratively repeat a calculation until it converges to a solution. You have to create a custom function to do it. Dynamic content on websites wasn't that common in the late 90s. You couldn't just load raw data to a website and have it spit out the scores. This is largely irrelevant to the point I was trying to make, but you seem fixated on it for some reason.

@Tiah.3091 said: So, I introduced the "growing array": after every new match that our player played, the algorithm "remembered" all his previous games, up until it reached 100 games. I had to stop at 100, because otherwise my laptop was just basically saying "there's no way I'm doing it in the next millennium". So, after it reached 100 games, the first game (historically) was removed from the array, the 2nd game became 1st, the 3rd became 2nd and so on, freeing the space for the last game.

A couple of comments on this.

  • The paper doesn't say to use match results from past periods. It says to use match results from the current period.
  • Conceptually it doesn't make sense to keep reusing results from previous periods. Their contribution is already accounted for in the rating and rating deviation. If you want to use multiple matches, then you need to wait until the player has played m matches before you calculate a new rating and deviation.
  • Your laptop is struggling to implement this with one player. What happens when you scale it up to tens of thousands or millions of players?
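The difference between the two readings can be made concrete by counting how often a single match result enters a rating update. The sketch below compares a sliding FIFO window of size m, updated after every match, with disjoint rating periods of m matches whose results are discarded after the update. The function names and the small sizes (10 matches, m = 5) are illustrative assumptions; no ratings are computed here.

```python
from collections import deque


def uses_with_fifo_window(n_matches, m):
    """Count how many updates each match feeds when the last m results are reused."""
    uses = [0] * n_matches
    window = deque(maxlen=m)  # the oldest match falls out automatically
    for match_id in range(n_matches):
        window.append(match_id)
        for seen in window:   # one update per match, over the whole window
            uses[seen] += 1
    return uses


def uses_with_rating_periods(n_matches, m):
    """Count uses when results are summed once per period and then discarded."""
    uses = [0] * n_matches
    for start in range(0, n_matches, m):
        for match_id in range(start, min(start + m, n_matches)):
            uses[match_id] += 1
    return uses


print(uses_with_fifo_window(10, 5))
print(uses_with_rating_periods(10, 5))
```

Under the sliding window a result can influence up to m successive updates, while under disjoint periods it is counted exactly once.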

The main point I was trying to make is that your plots don't seem to represent in-game results. Your rating deviation is settling after only 10-15 games. That means you would be getting gains and losses of around 10-13 points after 15 games played. That doesn't happen until 30+ games into the season. I also don't get wild swings in rating deviation once the rating deviation settles. If you look at your results, how many points does a win or a loss get you when two players of like skill level play each other? When I run the numbers it is around 11. Below is your plot with m = 1 vs. ones I have run using the constants from the Guild Wars 2 wiki. I included a couple of my plots, since the results and matches are randomly generated and will look a lot different every time you run it. As you can see, the rating deviation on my plots isn't settling out until 30-50 games into the season.

SIb9bke.png H0O7UZP.png Bo1XZyb.png A1ycLIf.png aceuuLP.png


  • 1 year later...

@Tiah.3091 said:The main goal of the code was to simulate a game history of some player in 1v1 scenario (although, in GW2 spvp happens in form of 5v5, in the context of our hypothesis it doesn't really matter).

I’d hate to revive an old thread, but I feel like this actually needs to be addressed. It’s well known that Glicko (or whatever) works well in, and is intended for, 1v1 situations where the win condition is based on the skill of a single entity, like in chess for example.

But in a mode where the win condition depends not just on you but on 4 other strangers, I think that makes a massive difference in how important your MMR actually is, considering the game only uses this number to generate matches.

Perhaps instead of calculating this using a 1v1 environment you should aim for a 5v5 environment where the win condition decides the rating of a single person with respect to 4 other team members.


@JusticeRetroHunter.7684 said:

@Tiah.3091 said: The main goal of the code was to simulate a game history of some player in 1v1 scenario (although, in GW2 spvp happens in form of 5v5, in the context of our hypothesis it doesn't really matter).

I’d hate to revive an old thread, but I feel like this actually needs to be addressed. It’s well known that Glicko (or whatever) works well in, and is intended for, 1v1 situations where the win condition is based on the skill of a single entity, like in chess for example.

I think this is the perfect time to revive this thread, given the recent complaints, once again, about win and lose streaks.

@JusticeRetroHunter.7684 said: But in a mode where the win condition depends not just on you but on 4 other strangers, I think that makes a massive difference in how important your MMR actually is, considering the game only uses this number to generate matches.

Perhaps instead of calculating this using a 1v1 environment you should aim for a 5v5 environment where the win condition decides the rating of a single person with respect to 4 other team members.

Good point, this is where the different carry potential of builds comes into play (on top of so many other factors). Some classes/builds are overall better at "carrying" a team in lower brackets, while in higher brackets (P2 and above) different builds will be useful in a wider array of situations. That's why cheese builds like core mesmer or one-shot gimmick builds tend to fall off in higher brackets, unless they provide more than their one gimmick move. It's also why builds like Firebrand are currently so overpowered: they are strong in many regards and can adapt to almost any situation at hand.

Let's take Firebrand as an example:

  • in lower brackets a heal Firebrand or Symbol Brand can support the team, hold a point against multiple opponents, and, in the case of Symbol Brand, even 1v1 or 1v2 at a level just below strong brawlers. The only thing the class lacks is a strong chase, which it does not need if close and mid are held. (This gets even worse when you factor in the game "pairing" Firebrands with other guardian builds. If your team has a FB and the other team a DH, you have to play really badly to lose the match-up.)
  • in higher brackets the longevity paired with the utility FB has makes the class very strong in a capable player's hands.

Way too many players (mostly inexperienced ones) value a class/build by its ability to 1v1, since the vast majority of players solo queue in lower brackets. That's also why certain classes can perform very well in ranked but are almost unrepresented in MATs (say Soulbeast/Ranger currently).


@Ben Phongluangtham.1065 said:

..... When a match is being built around a player....

So the matchmaker takes a random person who has pressed the queue button and then looks for 9 other people. OK, so if that person is, let's say, rated 1650 and ends up waiting, let's say, 3 min, the search will expand to ±75 points? And so on after every 2 min? So in order to have a better match, and not get people ranked higher or lower than you, you literally need to cancel the queue every 3 min, wait 3 min so you're not placed in someone else's long-expanded queue, and then requeue?

I just want fair matches with people of my own rating. Not against anyone who's a full 2 brackets below or above me.

Why does the queue have to work around one person and not a team? Oh right, there is no competitive team mode in a 5v5 team-based game mode. I'm dumbfounded by how this happened.


@Eddbopkins.2630 said:

..... When a match is being built around a player....

So the matchmaker takes a random person who has pressed the queue button and then looks for 9 other people. OK, so if that person is, let's say, rated 1650 and ends up waiting, let's say, 3 min, the search will expand to ±75 points? And so on after every 2 min? So in order to have a better match, and not get people ranked higher or lower than you, you literally need to cancel the queue every 3 min, wait 3 min so you're not placed in someone else's long-expanded queue, and then requeue?

No. You can still get an instant queue pop with players who have been in the queue for 20 mins.

I mentioned this in the thread that the OP refers back to: the matchmaker operates on a server-wide ping every 30 seconds. I go into it in a bit more detail in that thread, and I suggest reading it to get a better understanding of the queue system.

In other words, it’s a myth that canceling your queue has any positive effect on the matchmaking.
