Comments
12 hours. It took 12 hours for me to be allowed into a queue again, and the ONLY reason I'm allowed to play currently is because I was able to join a team for the tournament.
PLEASE get this kitten fixed.
The only thing that worked for me was exiting to desktop for a bit then logging back on. I don't know if the length of time logged off makes a difference.
Thank you for the feedback, Anet. Some of us recognize that we shouldn't just assume you are doing nothing.
Hello Again PvP Community!
I wanted to provide you with a status update on the Queue Instability issues.
Whether you've noticed it or not, there has been a significant increase in the reliability of the queue system (about 10x) since we deployed a change last Wednesday, July 10. That being said, we are still seeing two additional types of 'stuck' screens that we are continuing to dive into.
So what fix did we push out last week? Depending on how closely you follow our tech (or the industry's), you may know that our infrastructure is built using micro-services. Each service deals with (ideally) one core task and can talk to other micro-services through messaging. The micro-service that handles arena-based PvP is called PvpSrv. (Go figure...) When creating objects (arenas, matches, rosters, etc.), PvpSrv will "talk" to other services to persist the current data and state of each of these objects.
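To make that a little more concrete, here's a rough sketch (Python, purely illustrative - not our actual code, and names like StateUpdate and persist_roster are invented) of what "persisting an object's state via messaging" looks like in spirit: the owning service serializes the state and hands it to a messaging layer rather than calling another service's functions directly.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical message envelope: the owning service serializes object state
# and hands it to a messaging layer instead of calling other services directly.
@dataclass
class StateUpdate:
    object_type: str   # "roster", "match", "arena", ...
    object_id: int
    payload: dict

def persist_roster(send, roster_id, members):
    """Ask a backing service (via the `send` callable) to persist this roster."""
    msg = StateUpdate("roster", roster_id, {"members": members, "status": "queued"})
    send(json.dumps(asdict(msg)))

# Stand-in for the real messaging layer: just print what would be sent.
persist_roster(print, roster_id=42, members=["PlayerA", "PlayerB"])
```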
For some clusters of micro-services, each service is able to talk to the others directly - no middlemen, gatekeepers, or anything. Some micro-services, however, live in different clusters. For PvpSrv to talk to those services, it must make a connection to a "gateway" micro-service, and that gateway will forward the message to the appropriate micro-service in the other cluster. This all works well when a few micro-services send a few messages, but PvpSrv is not the only service talking cross-cluster. We have... several... gateways that handle the traffic of... several... micro-services.
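In code terms, a gateway is not much more than a forwarder with a routing table. A toy version (again illustrative only - the service names and routing shape are invented) might look like this:

```python
# Toy gateway: it knows which services live behind it and simply forwards
# each message to the right one. "StatsSrv" and the callables are invented.
class Gateway:
    def __init__(self, name, routes):
        self.name = name
        self.routes = routes  # destination service name -> callable

    def forward(self, destination, message):
        handler = self.routes.get(destination)
        if handler is None:
            raise KeyError(f"{self.name}: no route to {destination}")
        return handler(message)

# PvpSrv can't reach the remote service directly, so it goes through the gateway.
stats_srv = lambda msg: f"StatsSrv stored: {msg}"
gw = Gateway("gateway-1", {"StatsSrv": stats_srv})
print(gw.forward("StatsSrv", "roster 42 state update"))
```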
So there's our background - PvpSrv, when setting up a player in a new roster, will send messages to some local micro-services for data and persistence, and will send messages through gateways for additional data and state persistence. How was this causing "stuck" rosters? PvpSrv's config was set up to use 'round-robin' gateway connections; each roster would get its state updates through a different gateway. (e.g., if we had 4 gateways, 25% of all rosters would be on gateway 1, 25% of rosters on gateway 2, etc.) This worked well for distributing the message load, but not so well for restoration and resilience.
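The old assignment behavior was essentially a round-robin over the gateway list, something like this sketch (illustrative, not the real config or code):

```python
import itertools

# Round-robin assignment, roughly: each new roster is pinned to the "next"
# gateway in the cycle, so with 4 gateways each carries ~25% of the rosters.
gateways = ["gw-1", "gw-2", "gw-3", "gw-4"]
next_gateway = itertools.cycle(gateways)

roster_to_gateway = {roster_id: next(next_gateway) for roster_id in range(8)}
print(roster_to_gateway)
# {0: 'gw-1', 1: 'gw-2', 2: 'gw-3', 3: 'gw-4', 4: 'gw-1', 5: 'gw-2', ...}
```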
There are many reasons a connection can break: a service can restart, hardware can die (much less likely), or a network can disconnect (more common than you think). In the case of PvpSrv talking to the gateways, if and when a gateway connection terminated, PvpSrv would have all the rosters re-connect to the new pool of available gateways. The majority of rosters would retain their existing connection. Rosters that had been talking through the terminated gateway, however, would create a new connection to another gateway - but the micro-services they were talking to would not know where to send any in-progress response messages. If a state update was made, the backing service would now be sending a message to a gateway that may or may not be connected to a given roster object. Then, of course, the roster object would miss its state update, and it would, well, stick.
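Boiled down to a sketch (invented names, greatly simplified), the failure looks like a stale reply route on the backing service's side:

```python
# The backing service remembers which gateway a roster's request came through.
# If that gateway dies and the roster silently reconnects elsewhere, the reply
# still goes out via the old route and the roster never sees its state update.
reply_route = {"roster-42": "gw-2"}        # recorded when the request arrived
live_gateways = {"gw-1", "gw-3", "gw-4"}   # gw-2's connection has terminated

def deliver_state_update(roster, update):
    gateway = reply_route[roster]
    if gateway not in live_gateways:
        return f"{roster}: update '{update}' lost (sent via dead {gateway})"
    return f"{roster}: update '{update}' delivered via {gateway}"

# The roster has already moved to gw-3, but the backing service doesn't know,
# so the update never arrives and the client sits on a stuck screen.
print(deliver_state_update("roster-42", "match ready"))
```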
In terms of code changes, the actual change was very simple - instead of round-robin assignment, PvpSrv now connects to one gateway with a single connection. If and when this one connection is severed, PvpSrv will connect to another, single gateway. All rosters are associated with the single gateway, and the backing micro-services have only one location through which to send messages.
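In sketch form (again, illustrative names and a fake health check, not the actual change), the new behavior is a single shared connection with failover:

```python
# The fix, in spirit: one active gateway connection shared by every roster,
# with failover to the next gateway only when that single connection drops.
class SingleGatewayConnection:
    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.healthy = set(gateways)
        self.active = self.gateways[0]

    def send(self, message):
        if self.active not in self.healthy:
            # Connection severed: every roster moves together to the next gateway,
            # so the backing services always have exactly one place to reply to.
            self.active = next(g for g in self.gateways if g in self.healthy)
        return f"{self.active} <- {message}"

conn = SingleGatewayConnection(["gw-1", "gw-2", "gw-3"])
print(conn.send("roster 42: match found"))   # goes out via gw-1
conn.healthy.discard("gw-1")                 # simulate gw-1 going down
print(conn.send("roster 42: map voted"))     # everyone fails over to gw-2
```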
This was a great find, and I am glad to have seen the incident count drop dramatically over the past week. As stated, we still have some work to do, and we are currently eyes-deep in an issue surrounding map voting and progress getting stuck.
We hope this and other upcoming changes positively impact your PvP experiences!
-R
Robert Neckorcuk
Server Programmer
Thank you for the explanation. For those of us into tech, like me, this is a very interesting post. Most devs avoid giving a technical explanation after a fix, so I want you to know that when one comes, it's highly appreciated.
Well done! Thanks for the update.
Does that mean that now, when one gateway fails, all the messaging will overload another gateway, and then another, and another...?
A disconnect? In my network?
It's more likely than you think!
(Seriously though, thanks for all the work to fix the issue as well as the open communication/explanation)
Great explanation. Thanks for taking the time to fill us in on something that many people wouldn't bother to explain.
Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.
Really appreciate these technical talks. Good thing it is now resolved! Thanks.
@Boris Losdindawoods.3098 said:
Now, on to the important question... How in the heck is your last name pronounced? I can't decide if the emphasis should be on the 2nd syllable, or it's a double emphasis on the 1st and 3rd syllables, or if it's something else.
The original spelling (and the inflection marks) have been altered/removed, but it's pronounced 'neck-or-chuck'. I usually add emphasis on the 'or', but sometimes my dad or uncle will stress the 'Neck'.
@Neftex.7594 said:
Does that mean that now, when one gateway fails, all the messaging will overload another gateway, and then another, and another...?
This was an item we looked at when testing the fix. At the current traffic levels, we saw no measurable difference. Because the gateways are essentially just routers, if we do start to see a large uptick in messages sent, the impact would be slightly increased latency for object data/state updates. If things become measurably slower, there is always the option of getting beefier hardware and/or a larger software change where rosters and other objects would have knowledge of their specific gateway connection and would update the backing service if and when their gateway connection changes.
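For the curious, that larger software change could be shaped roughly like the sketch below (purely speculative on the details - the class and method names are invented): each roster remembers its gateway and re-registers its reply route with the backing service whenever that connection changes.

```python
# Speculative sketch of the "larger software change": rosters know their own
# gateway and tell the backing service where to send responses when it changes.
class BackingService:
    def __init__(self):
        self.routes = {}  # roster_id -> gateway name

    def register_route(self, roster_id, gateway):
        self.routes[roster_id] = gateway

    def send_state_update(self, roster_id, update):
        return f"{self.routes[roster_id]} <- roster {roster_id}: {update}"

class Roster:
    def __init__(self, roster_id, gateway, backing_service):
        self.roster_id = roster_id
        self.gateway = gateway
        self.backing_service = backing_service
        self.backing_service.register_route(roster_id, gateway)

    def on_gateway_change(self, new_gateway):
        self.gateway = new_gateway
        # Re-register so in-flight and future responses follow the new route.
        self.backing_service.register_route(self.roster_id, new_gateway)

svc = BackingService()
roster = Roster(42, "gw-1", svc)
roster.on_gateway_change("gw-2")                 # gw-1 connection dropped
print(svc.send_state_update(42, "match found"))  # delivered via gw-2, not gw-1
```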
Thanks for all the positive feedback! I'll have to keep digging into interesting bugs and writing them up for you all!
Robert Neckorcuk
Server Programmer
Well, I was doing PvP and now can't get into a game after 3 matches - ranked or unranked, it just does nothing. This new build tonight has done wonders for it.
PvP queue issue is back again~~
Just happened
I've written similar explanations and this is a very good write-up, thank you for sharing.
Is the messaging service SNS, something else, or proprietary?
Is it possible that the UI for the map selection screen can be pushed to the side and shrunk?
This would allow us to continue to play other parts of the game when this bug occurs.
Currently, when encountering this, you are forced to stop playing.
Can't play PvP!! Bugs, bugs, and more bugs!
I'm stuck in queue, help.
I'm stuck in the queue. Please help me get out of it, as the screen blocks most of the content.
This is happening nearly every day now - for me three days ago, yesterday, and, to top it off, today. The map results screen blocks everything on all characters, there's no way to get rid of it, and I can't play anything.
Now the patch notes say "Bug Fix - Fixed a server crash.", but it's almost as if a server crash was implemented.
The same bug in 2020 again.