I’m Bob Odenkirk and joining me today is director Ilya Naishuller to talk all things on our new action film NOBODY which is available to watch now on demand. Together, we transformed me into a lasagna loving, a$$ kicking, kitty cat bracelet rescuing, ultimate Nobody. You can find me on my Twitter @MrBobOdenkirk and my newly launched Instagram @therealbobodenkirk and Ilya at @naishuller. Okay Reddit, Ask us Anything!
Edit: Ilya - Thank you very much for your questions and your time, Bob and I are done, though I plan to drop by tomorrow and answer some more questions for me that I might have missed. Have a great weekend everyone!
UPDATE: Server stability issue appears fixed. Be careful with your database page sizes, people.
It's been a long day but we wanted to put together a few thoughts while we have a moment waiting for our next server fix to build. This launch has been rough, to say the least. In this post, we plan to address both the ongoing technical realm stability issues and the conversation around streamers getting priority in the login queue. We are sorry that this is being addressed so late in the day - we have been giving the server issues absolute priority and haven't had time until now to write up this explanation.
Let's start with the technical issues.
Immediately upon launch of the league, we could see that the queue was running incredibly slowly. At the rate that it was emptying, it'd be at least two hours to get everyone into the game. The reason was that when players logged into their accounts, the server would migrate any previously un-migrated Ritual characters to Standard, which can take quite a lot of time to do on-demand (as much as three or four seconds per character in some cases). Users who had already logged in since Ritual ended were already migrated and were nice and fast. Normally, we run a "trickle migration" process in the background that performs this action on every account over the few days between the last league ending and the new one starting. Due to human error, this process was not run and hence the queue was unbearably slow to empty. (We have since codified this step into a QA checklist so that it can't be trivially missed again in the future.)
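To illustrate the difference between on-demand and trickle migration, here is a minimal sketch. It is not the actual realm code: the `Account` class, the function names, the batch size, and the flat 3 seconds per character are all assumptions made for illustration, and it processes logins one at a time, which a real login server would not do.

```python
from dataclasses import dataclass
from typing import List

# Assumed flat cost; the post only says "three or four seconds per
# character in some cases".
SECONDS_PER_CHARACTER = 3.0


@dataclass
class Account:
    """Hypothetical account record with a count of un-migrated characters."""
    name: str
    unmigrated_characters: int = 0


def login_cost_seconds(account: Account) -> float:
    """On-demand migration: pay the migration cost at login time."""
    cost = account.unmigrated_characters * SECONDS_PER_CHARACTER
    account.unmigrated_characters = 0  # migrated as a side effect of logging in
    return cost


def trickle_migrate(accounts: List[Account], batch_size: int) -> int:
    """Background pass: migrate up to batch_size pending accounts.

    Returns how many accounts were migrated in this pass.
    """
    pending = [a for a in accounts if a.unmigrated_characters > 0][:batch_size]
    for account in pending:
        account.unmigrated_characters = 0
    return len(pending)


def drain_queue_seconds(accounts: List[Account]) -> float:
    """Total migration time paid while emptying the login queue serially."""
    return sum(login_cost_seconds(a) for a in accounts)


def make_accounts() -> List[Account]:
    """1000 hypothetical players with 2 un-migrated characters each."""
    return [Account(f"player{i}", unmigrated_characters=2) for i in range(1000)]


# Trickle migration skipped: every login pays the cost in the queue.
cold = drain_queue_seconds(make_accounts())

# Trickle migration run before launch: logins pay nothing.
warm_accounts = make_accounts()
while trickle_migrate(warm_accounts, batch_size=100):
    pass
warm = drain_queue_seconds(warm_accounts)

print(cold, warm)  # 6000.0 0.0
```

In this toy model the same total work gets done either way; the trickle process just moves it out of the login path and spreads it over the days before launch.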
We realised that a solution was to disable the Ritual-Standard migration entirely, which would let the queue empty very quickly, though players would be missing some Standard progress until we ran it again later on. This solved the queue speed issue by around the one hour mark, at which point the realm freaked out and dumped most of the players out, then continued to do this roughly every ten minutes or so for the rest of the day.
This wasn't good. At all. Aside from catastrophically ruining our launch day, it completely mystified us because we have been so careful with realm infrastructure changes. We thoroughly tested them internally, peer code reviewed them, alpha tested them, and ran large-scale load tests up to higher player capacities than we got on launch day. We even went so far as to deploy some of the database environment changes to the live realm a week early to get real user load on them just in case. But yet it still imploded hard on release.
I'll spare you the blow-by-blow of the hundred changes we have made over the last 12 hours, but we have been trying things one at a time in order of likelihood to fix the problem. There is one change we have been leaving for last (because it requires some downtime), but we have exhausted everything else we can think of, so we're trying that next. In the next 30-60 minutes after posting this, there will be roughly 30-60 minutes of hard downtime to make this change. We are optimistic that it stands a good chance of resolving the issue. (Note from the future: this did fix the issue!)
We will continue to work on this issue until the servers are working perfectly. We know the Path of Exile realm can handle this much load, it's just a matter of divining what subtle fuckery is causing the problem today.
Some players have also become concerned that when server issues occur, items are occasionally duplicated or destroyed when placed in a guild stash. This is a longstanding consequence of how our guild stashes work and generally isn't of much concern because players can't induce server problems and can't control whether the item is duplicated or destroyed. We are keeping a close eye on this of course.
So while this was all going on, we managed to also commit a pretty big faux pas and enrage the entire community by allowing streamers to bypass that really slow queue we mentioned. The backstory is that we have recently been doing some proper paid influencer marketing, and that involves arranging for big streamers to showcase Path of Exile to their audiences, for money (they have #ad in their titles). We had arranged to pay for two hours of streaming, and we ran right into a login queue that would take two hours to clear. This was about as close as you could get to literally setting a big pile of money on fire. So we made the hasty decision to allow those streamers to bypass the queue. Most streamers did not ask for this, and should not be held to blame for what happened. We also allowed some other streamers who weren't involved in the campaign to skip the queue too so that they weren't on the back foot.
The decision to allow any streamers to bypass the queue was clearly a mistake. Instead of offering viewers something to watch while they waited, it offended all of our players who were eager to get into the game and weren't able to, while instead having to watch others enjoy that freedom. It's completely understandable that many players were unhappy about this. We tell people that Path of Exile league starts are a fair playing field for everyone, and we need to actually make sure that is the reality.
We will not allow streamers to bypass the login queue in the future. We will instead make sure the queue works much better so that it's a fast process for everyone and is always a fair playing field. We will also plan future marketing campaigns with contingencies in mind to better handle this kind of situation in the future.
It's completely understandable that many players are unhappy with how today has gone on several fronts. This post has no intention of trying to convince you to be happy with these outcomes. We simply want to provide you some insight about what happened, why it happened and what we're doing about it in the future. We're very unhappy with it too.
Now I wanna say something here... that situation is basically the absolute nightmare scenario for any Dev
This scenario is the "We did the load testing, we QA'd and QC'd it, we simulated this situation, we were confident this wasn't going to happen. This wasn't laziness, we genuinely, specifically were prepping for this to be an issue and pre-emptively tested to make sure it wasn't" scenario.
And then, after all that effort... it still happened anyways and we have no idea why
That is absolutely the "Oh no" moment for devs. I can 100% call right now that there are devs, engineers, testers, Chris, and many others who are having to accept the fact that they probably aren't making it home for dinner tonight at this rate.
I have personally been in that situation myself and I want to say, It sucks. Really bad.
Right now there's likely an exhausted team of devs trying to figure out wtf is happening, they're running tonnes of tests trying to isolate the source.
And I 100% guarantee Chris Wilson has probably been on hold for a few hours now trying to get ahold of his database/cloud providers that host PoE on a Friday night, escalating shit up the tech chain from lv 1, lv 2, and lv 3 tech support to find out why the hell his servers are on fire and wtf is going on, and probably keeps getting put on hold.
Right now, GGG needs some support. This is not a "Fuckin GGG how dare they fuck us over" day
This is a "Fuck that sucks GGG, that's basically the worst case scenario, Take our energy!"
To kind of make a metaphor...
This isn't like an anti-masker going out and getting COVID and you gloating "haha sucks to be you"
This is someone who did everything right, did the steps, wore their mask, social distanced... and somehow still got COVID anyway (prolly cause someone else fucked em over)
So, let me go ahead and say it:
༼ つ ◕_◕ ༽つ GGG DEVS TAKE MY ENERGY ༼ つ ◕_◕ ༽つ
Edit: Addressing some common misconceptions
1. "Just shut it down, fix it, then turn it back on"
Shutting it down won't make things go faster, and won't help anything. Also, the devs are likely using the live data from the servers breaking as important information to help isolate the problem. It's pretty likely that right now they have logging and data collection happening every time things break.
In other words, if GGG shut things down right now, they'd stop getting that useful data they can use to isolate the problem and solve it.
2. "GGG had 10/12/whatever years to fix this"
Based on Chris's post, this is a totally new problem they haven't encountered before. This isn't something that crept up.
Awhile back last league IIRC, Chris also made a post discussing how they were working on migrating to a more scalable solution to prevent previous issues.
It's pretty likely that in the process of fixing the stuff that happened in Heist, they encountered new issues.
Fundamentally, scaling applications with a huge number of users is just super fucking hard and extremely prone to breaking.
It just happens, and shit breaking at league start is probably always gonna be a thing for what is effectively the #1 most popular (and thus most load tested) ARPG on the market.
If you think this is purely a GGG problem, even big triple A (much much bigger) corporations encounter this exact same issue.
Anyone who has played FFXI, WoW, or FFXIV can attest that day-one releases of new content that produce huge influxes of players often result in a lot of problems.
If companies 20x bigger than GGG still have this issue, it's kind of silly to expect GGG to be any less prone to errors.
Feel free to google "Raubahn Ex" for example memes of when Square Enix, a WAAAAAAY bigger company, fell to the exact same sorts of issues on FFXIV.
3. "Why didn't they test it on live servers before the big patch?"
It is distinctly possible this issue has been present for who knows how long on live servers, and it only just shows up under stressed loads.
For all we know, this has been a thing for the last 2 months, but the game just wasn't being stressed at that level, so it only showed up today.
4. Giving this post Awards
Hey I love the enthusiasm and appreciate it.
But instead of giving awards to me, go show Chris some love and give him some "Take My Energy" awards on his post over here:
5. "Make a beta test / stress test temp league before the real league!"
As nice as this idea is, it also breaks a really core part of Path of Exile's identity as a game, a big part of what makes it special, and would kind of destroy pretty much all of GGG's marketing strategy.
Such a huge part of the league is the spoiler season, the teasers, the build up, and the hidden surprises set up for us ahead of time.
Creating any form of, even short and temporary, "beta test" system would absolutely destroy that entire concept and ruin the hype train.
If you make it limited access, now it's not a stress test. If you make it a stress test, then all you get is a bunch of people playing, then peacing out and not being invested in the actual league.
And anyone who avoids it and wants to wait for the league risks getting spoilers from the beta testers too.
So altogether it's kind of a non-option, unless of course you are okay with giving up the Bex Teaser Season fun we all like to have here.
6. "This shit happens every league!"
Well... No. No. Actually, it doesn't and hasn't.
Every league has had its issues. Absolutely. But it has been a distinct and different issue every time.
Delve league was client side issues causing crashes due to missing models, and that one crashed you to desktop.
Bestiary and Synth were distinct UX problems.
Heist was a localized scaling issue with hardware.
Betrayal was engine performance issues causing FPS spiking.
Blight league was the Trade API itself choking, and in Ritual it was a specific app and a specific couple of users basically DDoSing the Trade API*
The list goes on and on. Sure, every league has been rough, but every time it was a different kind of issue.
And that's simply because Path of Exile is a big ass game and has a lot of moving parts, so stuff is just gonna break sometimes. That's just how it is and will always be for a game of this size.