Recent PvP Queuing Instability


@"Ben Phongluangtham.1065" said: Hey everyone. Just wanted to give an update on this bug. We have a potential "bulwark" to the problem currently in testing. While not a fix, our hope is that it will reduce the time someone experiences the problem from potentially hours to minutes. Once this has gone through enough testing, we hope to be able to deploy it soon.

As far as a real fix goes, it's a very difficult problem for us to track down. It's not something we've been able to reproduce internally, and it only seems to happen in a live environment with normal server load. We're hoping that as we add more logging, we'll find a long-term solution.

Hey Ben, just wanted to add some information. It has hit me for over 4 hours today. I can pick a match from a game browser, but cannot solo queue in ranked or unranked. I can get pulled into a queue while duoing, and I can also queue the duo myself.


  • 2 weeks later...

The only thing that worked for me was exiting to desktop for a bit then logging back on. I don't know if the length of time logged off makes a difference.
Thank you for the feedback Anet. Some of us recognize that we cannot just assume you are doing nothing.


  • ArenaNet Staff

Hello Again PvP Community!

I wanted to provide you with a status update on the Queue Instability issues.

Whether or not you noticed it, the queue system has become about 10x more reliable since we deployed a change last Wednesday, July 10. That being said, we are still seeing two additional types of 'stuck' screens that we are continuing to dive into.

So what fix did we push out last week? Depending on how closely you follow our tech or the industry's, you may know that our infrastructure is built using micro-services. Each service deals with (ideally) one core task, and can talk to other micro-services through messaging. The micro-service that handles arena-based PvP is called PvpSrv. (Go figure...) When creating objects (arenas, matches, rosters, etc.), PvpSrv will "talk" to other services to persist the current data and state of each of these objects.

For some clusters of micro-services, each service is able to talk to others directly, no middle men or gatekeepers or anything. Some micro-services however live in different clusters. For PvpSrv to talk to some of these services, it must make a connection to a "gateway" micro-service, and that gateway will forward the message to the appropriate micro-service in a different cluster. This all works well for the case of a few micro-services sending a few messages, but PvpSrv is not the only service talking cross-cluster. We have... several... gateways that handle the traffic of... several... micro-services.
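To make the two message paths concrete, here is a minimal sketch in Python. It is only an illustration of the idea described above, not the real implementation: `delivery_path`, the cluster names, and the `MatchStore` and `RosterStore` service names are all hypothetical; only PvpSrv and the notion of a gateway come from this post.

```python
def delivery_path(sender, receiver, cluster_of, gateway):
    """Return the hops a message takes from sender to receiver."""
    if cluster_of[sender] == cluster_of[receiver]:
        # Same cluster: services talk to each other directly.
        return [sender, receiver]
    # Different cluster: the message is forwarded through a gateway.
    return [sender, gateway, receiver]


# Hypothetical layout: MatchStore shares a cluster with PvpSrv, RosterStore does not.
cluster_of = {
    "PvpSrv": "cluster-a",
    "MatchStore": "cluster-a",
    "RosterStore": "cluster-b",
}

local_path = delivery_path("PvpSrv", "MatchStore", cluster_of, "gateway-1")
remote_path = delivery_path("PvpSrv", "RosterStore", cluster_of, "gateway-1")
```

Here `local_path` has two hops and `remote_path` has three, with the gateway in the middle.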

So there's our background - PvpSrv, when setting up a player in a new roster, will send messages to some local micro-services for data and persistence, and will send messages through gateways for additional data and state persistence. How was this causing "stuck" rosters? PvpSrv config was set up to use 'round robin' gateway connections; each roster would get its state updates through a different gateway. (e.g. If we had 4 gateways, 25% of all rosters would be on gateway 1, 25% of rosters on gateway 2, etc.) This worked well for distributing the message load, but didn't work so well for restoration and resilience.
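The round-robin split can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the actual PvpSrv config: `assign_round_robin`, the gateway names, and the integer roster IDs are made up for the example.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(roster_ids, gateways):
    """Give each new roster the next gateway in turn."""
    next_gateway = cycle(gateways)
    return {roster_id: next(next_gateway) for roster_id in roster_ids}


# Hypothetical setup: 4 gateways and 100 rosters.
gateways = ["gateway-1", "gateway-2", "gateway-3", "gateway-4"]
assignment = assign_round_robin(range(100), gateways)

# Count how many rosters ended up on each gateway.
load = Counter(assignment.values())
```

With 4 gateways and 100 rosters, `load` holds 25 rosters per gateway, which matches the 25% split in the example above.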

There are many reasons why a service can restart, hardware can die (much less likely), or a network can disconnect (more common than you think). In the case of PvpSrv talking to the gateways, if and when a gateway connection terminated, PvpSrv would have all the rosters re-connect to the new pool of available gateways. For the majority of rosters, they would retain their existing connection. However, for rosters that were talking through the terminated gateway, they would create a new connection to another gateway, but the micro-service they were talking to would not know where to send any in-progress response messages. If a state update was made, the backing service would now be sending a message to a gateway that may or may not be connected to a given roster object. Then of course, the roster object would miss its state update, and it would, well, stick.
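A toy Python simulation of that failure mode follows. It assumes, purely for illustration, that the backing service remembers one reply route per roster and never learns about reconnects; `find_stuck_rosters` and all the names in it are hypothetical.

```python
def find_stuck_rosters(roster_gateway, failed, survivors):
    """Return the rosters whose state updates would be lost after a gateway dies."""
    # The backing service remembers the gateway each roster's request came through.
    reply_route = dict(roster_gateway)

    # Rosters on the failed gateway reconnect to one of the surviving gateways.
    current = dict(roster_gateway)
    displaced = sorted(r for r, g in current.items() if g == failed)
    for i, roster in enumerate(displaced):
        current[roster] = survivors[i % len(survivors)]

    # Replies still go to the remembered route, so any mismatch is a missed update.
    return sorted(r for r in current if reply_route[r] != current[r])


# Hypothetical setup: 8 rosters spread round-robin over 4 gateways.
roster_gateway = {r: f"gateway-{r % 4 + 1}" for r in range(8)}
stuck = find_stuck_rosters(
    roster_gateway,
    failed="gateway-1",
    survivors=["gateway-2", "gateway-3", "gateway-4"],
)
```

In this setup rosters 0 and 4 were on `gateway-1`, so they are the ones that end up in `stuck`; every other roster keeps its connection and is unaffected.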

In terms of code changes, the actual change was very simple - instead of round-robin assignment, PvpSrv now connects to one gateway with a single connection. If and when this one connection is severed, PvpSrv will connect to another, single gateway. All rosters are associated with the single gateway, and the backing micro-services have only one location through which to send messages.
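In sketch form, the single-connection behavior might look like the Python below. Treat it as a guess at the shape of the idea, not the shipped code: the `SingleGatewayConnection` class and its method names are invented for this example.

```python
class SingleGatewayConnection:
    """All rosters share one active gateway; on failure, move to another single gateway."""

    def __init__(self, gateways):
        self.available = list(gateways)
        self.active = self.available[0]

    def route_for(self, roster_id):
        # Every roster uses the same gateway, so replies have only one place to go.
        return self.active

    def on_disconnect(self):
        # Drop the severed gateway and connect to exactly one replacement.
        self.available.remove(self.active)
        if not self.available:
            raise RuntimeError("no gateways left to connect to")
        self.active = self.available[0]
        return self.active


connection = SingleGatewayConnection(["gateway-1", "gateway-2", "gateway-3"])
routes_before = {connection.route_for(roster_id) for roster_id in range(8)}
connection.on_disconnect()
routes_after = {connection.route_for(roster_id) for roster_id in range(8)}
```

Both before and after the disconnect, every roster resolves to the same gateway, so there is never a split between where a roster listens and where the backing service replies.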

This was a great find, and I am glad to have seen the incident count drop dramatically over the past week. As stated, we still have some work to do, and are currently eyes deep in an issue surrounding map voting and sticking progress.

We hope this and other upcoming changes positively impact your PvP experiences!

-R


@"Robert Neckorcuk II.6193" Thank you for the explanation. For those of us who are into tech, like me, this is a very interesting post you've made. Most devs usually avoid giving a technical explanation after a fix, so I want you to know that when one comes, it's highly appreciated.


@"Robert Neckorcuk II.6193" said: ...or a network can disconnect (more common than you think).

A disconnect? In my network?

It's more likely than you think!

(Seriously though, thanks for all the work to fix the issue as well as the open communication/explanation)


@"Robert Neckorcuk II.6193" said:

Great explanation. Thanks for taking the time to fill us in on something that many people wouldn't bother to explain.

Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.


  • ArenaNet Staff

@Boris Losdindawoods.3098 said: Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.

The original spelling (and the inflection marks) have been altered/removed, but it's pronounced 'neck-or-chuck'. I usually add emphasis on the 'or', but sometimes my dad or uncle will stress the 'Neck'.

@Neftex.7594 said: Does that mean that now, when one gateway fails, all the messaging will go and overload another gateway, and then another gateway after that, and another...

This was an item we looked at when testing the fix. At the current traffic levels, we saw no measurable difference. Because the Gateways are essentially just routers, if we do start to see a large uptick in messages sent, the impact would be slightly increased latency for object data/state updates. If things become measurably slower, there is always the option of getting beefier hardware and/or a larger software change where rosters and other objects would have knowledge of their specific gateway connection, and would update the backing service if and when their gateway connection changes.
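That larger change, where each roster knows its own gateway and tells the backing service when it changes, could look roughly like the Python below. It is a speculative sketch only: `BackingService`, `Roster`, `update_route`, and `reconnect` are hypothetical names, and no such change has been described beyond the sentence above.

```python
class BackingService:
    """Keeps one reply route per roster and accepts route updates."""

    def __init__(self):
        self.reply_route = {}

    def update_route(self, roster_id, gateway):
        self.reply_route[roster_id] = gateway


class Roster:
    """A roster that knows which gateway it is currently connected through."""

    def __init__(self, roster_id, gateway, service):
        self.roster_id = roster_id
        self.gateway = gateway
        self.service = service
        service.update_route(roster_id, gateway)

    def reconnect(self, new_gateway):
        # On a gateway change, tell the backing service where replies should go now.
        self.gateway = new_gateway
        self.service.update_route(self.roster_id, new_gateway)


service = BackingService()
roster = Roster(roster_id=7, gateway="gateway-1", service=service)
roster.reconnect("gateway-3")
```

After the reconnect, the route the service holds for roster 7 matches the roster's own gateway, so round-robin load spreading could come back without the rosters sticking.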

Thanks for all the positive feedback! I'll have to keep digging into interesting bugs and writing them up for you all!


  • 2 weeks later...

This is happening nearly every day now. For me it was 3 days ago, yesterday, and, as the icing on the cake, today: the map results screen blocks everything on all characters, there is no way to get rid of it, and I can't play anything.

Now the patch notes say "Bug Fix - Fixed a server crash.", but it's almost as if a server crash was implemented.


  • 5 months later...
  • 11 months later...

Took me a bit to notice the date on the thread. I was about to comment that a friend of mine had this bug. He couldn't queue for a PvP match. It wouldn't even let me do it so long as he was in my party. He was worried for a while that he may have been banned from PvP. Mind you, he doesn't talk in chat so neither of us could think of a reason why that might be the case.

Nonetheless, the bug is very much still present today.
