Comments
12 hours. It took 12 hours for me to be allowed into a queue again, and the ONLY reason I'm allowed to play currently is because I was able to join a team for the tournament.
PLEASE get this kitten fixed.
The only thing that worked for me was exiting to desktop for a bit then logging back on. I don't know if the length of time logged off makes a difference.
Thank you for the feedback, Anet. Some of us recognize that we shouldn't just assume you are doing nothing.
Hello Again PvP Community!
I wanted to provide you with a status update on the Queue Instability issues.
Whether you've noticed it or not, there has been a significant increase in the reliability of the queue system (about 10x) since we deployed a change last Wednesday, July 10. That being said, we are still seeing two additional types of 'stuck' screens that we are continuing to dive into.
So what fix did we push out last week? Depending on how closely you follow our tech (or the industry's), you may know that our infrastructure is built using micro-services. Each service deals with (ideally) one core task and can talk to other micro-services through messaging. The micro-service that handles arena-based PvP is called PvpSrv. (Go figure...) When creating objects (arenas, matches, rosters, etc.), PvpSrv will "talk" to other services to persist the current data and state of each of these objects.
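To make that a little more concrete, here's a rough sketch (Python, purely illustrative - not our actual code, and names like StateUpdate and persist_roster are invented) of what "persisting an object's state via messaging" looks like in spirit: the owning service serializes the state and hands it to a messaging layer rather than calling another service's functions directly.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical message envelope: the owning service serializes object state
# and hands it to a messaging layer instead of calling other services directly.
@dataclass
class StateUpdate:
    object_type: str   # "roster", "match", "arena", ...
    object_id: int
    payload: dict

def persist_roster(send, roster_id, members):
    """Ask a backing service (via the `send` callable) to persist this roster."""
    msg = StateUpdate("roster", roster_id, {"members": members, "status": "queued"})
    send(json.dumps(asdict(msg)))

# Stand-in for the real messaging layer: just print what would be sent.
persist_roster(print, roster_id=42, members=["PlayerA", "PlayerB"])
```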
For some clusters of micro-services, each service is able to talk to the others directly - no middlemen, gatekeepers, or anything. Some micro-services, however, live in different clusters. For PvpSrv to talk to those services, it must make a connection to a "gateway" micro-service, and that gateway will forward the message to the appropriate micro-service in the other cluster. This all works well when a few micro-services send a few messages, but PvpSrv is not the only service talking cross-cluster. We have... several... gateways that handle the traffic of... several... micro-services.
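In code terms, a gateway is not much more than a forwarder with a routing table. A toy version (again illustrative only - the service names and routing shape are invented) might look like this:

```python
# Toy gateway: it knows which services live behind it and simply forwards
# each message to the right one. "StatsSrv" and the callables are invented.
class Gateway:
    def __init__(self, name, routes):
        self.name = name
        self.routes = routes  # destination service name -> callable

    def forward(self, destination, message):
        handler = self.routes.get(destination)
        if handler is None:
            raise KeyError(f"{self.name}: no route to {destination}")
        return handler(message)

# PvpSrv can't reach the remote service directly, so it goes through the gateway.
stats_srv = lambda msg: f"StatsSrv stored: {msg}"
gw = Gateway("gateway-1", {"StatsSrv": stats_srv})
print(gw.forward("StatsSrv", "roster 42 state update"))
```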
So there's our background - PvpSrv, when setting up a player in a new roster, will send messages to some local micro-services for data and persistence, and will send messages through gateways for additional data and state persistence. How was this causing "stuck" rosters? PvpSrv's config was set up to use 'round-robin' gateway connections; each roster would get its state updates through a different gateway. (e.g., if we had 4 gateways, 25% of all rosters would be on gateway 1, 25% of rosters on gateway 2, etc.) This worked well for distributing the message load, but not so well for restoration and resilience.
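The old assignment behavior was essentially a round-robin over the gateway list, something like this sketch (illustrative, not the real config or code):

```python
import itertools

# Round-robin assignment, roughly: each new roster is pinned to the "next"
# gateway in the cycle, so with 4 gateways each carries ~25% of the rosters.
gateways = ["gw-1", "gw-2", "gw-3", "gw-4"]
next_gateway = itertools.cycle(gateways)

roster_to_gateway = {roster_id: next(next_gateway) for roster_id in range(8)}
print(roster_to_gateway)
# {0: 'gw-1', 1: 'gw-2', 2: 'gw-3', 3: 'gw-4', 4: 'gw-1', 5: 'gw-2', ...}
```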
There are many reasons a connection can break: a service can restart, hardware can die (much less likely), or a network can disconnect (more common than you think). In the case of PvpSrv talking to the gateways, if and when a gateway connection terminated, PvpSrv would have all the rosters re-connect to the new pool of available gateways. The majority of rosters would retain their existing connection. Rosters that had been talking through the terminated gateway, however, would create a new connection to another gateway - but the micro-services they were talking to would not know where to send any in-progress response messages. If a state update was made, the backing service would now be sending a message to a gateway that may or may not be connected to a given roster object. Then, of course, the roster object would miss its state update, and it would, well, stick.
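Boiled down to a sketch (invented names, greatly simplified), the failure looks like a stale reply route on the backing service's side:

```python
# The backing service remembers which gateway a roster's request came through.
# If that gateway dies and the roster silently reconnects elsewhere, the reply
# still goes out via the old route and the roster never sees its state update.
reply_route = {"roster-42": "gw-2"}        # recorded when the request arrived
live_gateways = {"gw-1", "gw-3", "gw-4"}   # gw-2's connection has terminated

def deliver_state_update(roster, update):
    gateway = reply_route[roster]
    if gateway not in live_gateways:
        return f"{roster}: update '{update}' lost (sent via dead {gateway})"
    return f"{roster}: update '{update}' delivered via {gateway}"

# The roster has already moved to gw-3, but the backing service doesn't know,
# so the update never arrives and the client sits on a stuck screen.
print(deliver_state_update("roster-42", "match ready"))
```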
In terms of code changes, the actual change was very simple - instead of round-robin assignment, PvpSrv now connects to one gateway with a single connection. If and when this one connection is severed, PvpSrv will connect to another, single gateway. All rosters are associated with the single gateway, and the backing micro-services have only one location through which to send messages.
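In sketch form (again, illustrative names and a fake health check, not the actual change), the new behavior is a single shared connection with failover:

```python
# The fix, in spirit: one active gateway connection shared by every roster,
# with failover to the next gateway only when that single connection drops.
class SingleGatewayConnection:
    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.healthy = set(gateways)
        self.active = self.gateways[0]

    def send(self, message):
        if self.active not in self.healthy:
            # Connection severed: every roster moves together to the next gateway,
            # so the backing services always have exactly one place to reply to.
            self.active = next(g for g in self.gateways if g in self.healthy)
        return f"{self.active} <- {message}"

conn = SingleGatewayConnection(["gw-1", "gw-2", "gw-3"])
print(conn.send("roster 42: match found"))   # goes out via gw-1
conn.healthy.discard("gw-1")                 # simulate gw-1 going down
print(conn.send("roster 42: map voted"))     # everyone fails over to gw-2
```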
This was a great find, and I am glad to have seen the incident count drop dramatically over the past week. As stated, we still have some work to do, and we are currently eyes-deep in an issue surrounding map voting and progress getting stuck.
We hope this and other upcoming changes positively impact your PvP experiences!
-R
Robert Neckorcuk
Server Programmer
Thank you for the explanation. For those of us into tech, like me, this is a very interesting post. Most devs avoid giving a technical explanation after a fix, so I want you to know that when one comes, it's highly appreciated.
Well done! Thanks for the update.
Does that mean that now, when one gateway fails, all the messaging will overload another gateway, and then another, and another...?
A disconnect? In my network?
It's more likely than you think!
(Seriously though, thanks for all the work to fix the issue as well as the open communication/explanation)
Great explanation. Thanks for taking the time to fill us in on something that many people wouldn't bother to explain.
Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.
Really appreciate these technical talks. Good thing it is now resolved! Thanks.
@Boris Losdindawoods.3098 said:
Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.
The original spelling (and the inflection marks) have been altered/removed, but it's pronounced 'neck-or-chuck'. I usually add emphasis on the 'or', but sometimes my dad or uncle will stress the 'Neck'.
@Neftex.7594 said:
Does that mean that now, when one gateway fails, all the messaging will overload another gateway, and then another, and another...?
This was an item we looked at when testing the fix. At the current traffic levels, we saw no measurable difference. Because the gateways are essentially just routers, if we do start to see a large uptick in messages sent, the impact would be slightly increased latency for object data/state updates. If things become measurably slower, there is always the option of getting beefier hardware and/or a larger software change where rosters and other objects would have knowledge of their specific gateway connection and would update the backing service if and when their gateway connection changes.
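For the curious, that larger software change could be shaped roughly like the sketch below (purely speculative on the details - the class and method names are invented): each roster remembers its gateway and re-registers its reply route with the backing service whenever that connection changes.

```python
# Speculative sketch of the "larger software change": rosters know their own
# gateway and tell the backing service where to send responses when it changes.
class BackingService:
    def __init__(self):
        self.routes = {}  # roster_id -> gateway name

    def register_route(self, roster_id, gateway):
        self.routes[roster_id] = gateway

    def send_state_update(self, roster_id, update):
        return f"{self.routes[roster_id]} <- roster {roster_id}: {update}"

class Roster:
    def __init__(self, roster_id, gateway, backing_service):
        self.roster_id = roster_id
        self.gateway = gateway
        self.backing_service = backing_service
        self.backing_service.register_route(roster_id, gateway)

    def on_gateway_change(self, new_gateway):
        self.gateway = new_gateway
        # Re-register so in-flight and future responses follow the new route.
        self.backing_service.register_route(self.roster_id, new_gateway)

svc = BackingService()
roster = Roster(42, "gw-1", svc)
roster.on_gateway_change("gw-2")                 # gw-1 connection dropped
print(svc.send_state_update(42, "match found"))  # delivered via gw-2, not gw-1
```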
Thanks for all the positive feedback! I'll have to keep digging into interesting bugs and writing them up for you all!
Robert Neckorcuk
Server Programmer
Well, I was doing PvP and now can't get into a game after 3 matches - ranked or unranked, it just does nothing. This new build tonight has done wonders for it.
PvP queue issue is back again~~
Just happened
I've written similar explanations and this is a very good write-up, thank you for sharing.
Is the messaging service SNS, something else, or proprietary?
Is it possible that the UI for the map selection screen can be pushed to the side and shrunk?
This would allow us to continue to play other parts of the game when this bug occurs.
Currently, when encountering this, you are forced to stop playing.
Can't play PvP!! Bugs, bugs, and more bugs!
I'm stuck in queue, help.
I'm stuck in the queue. Please help me get out of it, as the screen blocks most of the content.
This is happening nearly every day now - for me three days ago, yesterday, and, to top it off, today. The map results screen blocks everything on all characters, there's no way to get rid of it, and I can't play anything.
Now the patch notes say "Bug Fix - Fixed a server crash.", but it's almost as if a server crash was implemented.
The same bug in 2020 again.