Blog? Entry - Duende - 8/29/15
Posted 29 August 2015 - 11:53 AM
We have the changelog, which is great...ish at what it does, but that doesn't really cover all of the relevant use cases for communicating with the playerbase. Hey, that's you guys! Ordinarily, this sort of thing might go on the website, but the website had some protracted downtime, and we're still in the slow, laborious process of getting it up to speed. Until such time, I/we/the other imms will be tossing out periodic updates via the news. I had considered making a new parallel system specifically for this, BLOG or something, but it wasn't going to be worth the effort to differentiate itself, and I didn't feel like waiting for a reboot. So this is a news. It's not news in the standard sense- it's not announcing anything new, no globals, no new mechanics, so you can absolutely skip this if you want. This will likely be long, but future ones will likely be shorter, per increased frequency (more than one ever).
We're working on stuff. It's large, complicated, time-consuming, ferociously difficult to QA, and utterly gamefucking if done wrong at any level. So, we're currently at ~3ish months without a proper reboot.
I promise the outcome will be worth it. We're frantically trying to pin it down so we can reboot, but we have to do it correctly, and both Iyara, who's our primary QA/point person, and I are in the middle of vacation/conference season, so it's not always convenient for us to touch base, be on at the same time to discuss things, or work on things together. We should be done with our respective times off in the next two weeks or so, and then we'll be back to our normal levels of productivity. Hopefully that means an imminent reboot, but that assertion is so fraught with peril that I don't even like writing it down.
A little background:
The game is written in C; most of our data is stored in flat files, which means it's mostly human-readable, and the code parses those files and dumps them into RAM at reboot. Most of the rest of the data is stored in SQL, which is basically the same procedure, except we do a 'select * from table' instead of 'fopen()'.
The remainder of the data, which is stored in neither flat files nor SQL, is the player data. The player data save file is probably one of the oldest parts of the game. (I don't know for sure: the core game stuff goes back to 1993 or something, our version control history goes back to 2006, and the game transitioned from 3.21 to 4.0 in January 2001.) As such, per early-90s coding standards and practices, the player data file is a 4-gig binary file, which means it's tightly packed and unreadable except within the context of the game. We have 2-ish main game servers, live (where you are) and dev (where we work and QA things, to verify it's stable and functional enough to release to live). Dev has more debug and such running, which introduces overhead, and live has a vastly larger player database. Everything else is identical, and despite dev being on a server with significantly worse specs, and larger overhead, it reboots in 1/4 of the time that live does, purely because of the expense of loading the entire player database into memory before it can start. Unfortunately, due to the way the existing mechanism is structured, there's no way of loading it piecemeal or on demand. It's slow and bulky, and there's no good way of making it not be so.
The killer aspect of the player db, however, is that each player 'file' (really, more an allocation of space within the larger player file) has a set length. The structure of each character is defined and accounted for down to the per-ASCII character level. In its current state, each character occupies exactly 87688 bytes of hard drive space during reboots, and 87688 bytes of memory when the game is running. This is not trivial. The first character that exists in the database (Vassago) occupies the 1st to 87688th bytes in that player database; the second character created occupies bytes 87689 through 175376, and so on. Each character's total existence is contained within the 87688 bytes following the [# of pre-existing players at their creation * 87688]th. Perhaps you've guessed what this means: it is impossible to add any kind of new information to characters without completely rewriting the player database.
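To make the arithmetic concrete, here's a minimal sketch in C of how those byte ranges fall out of the fixed record size. Only the 87688 figure comes from the description above; the function names and the 0-based creation index are made up for illustration and are not the game's actual code.

```c
#include <stdint.h>

/* Fixed per-character record size quoted above. */
#define PLAYER_RECORD_SIZE INT64_C(87688)

/* First byte (1-based, as in the text) of the record belonging to the
 * character with the given 0-based creation index. Hypothetical helper. */
int64_t record_first_byte(int64_t creation_index)
{
    return creation_index * PLAYER_RECORD_SIZE + 1;
}

/* Last byte (1-based, inclusive) of that same record. Hypothetical helper. */
int64_t record_last_byte(int64_t creation_index)
{
    return (creation_index + 1) * PLAYER_RECORD_SIZE;
}

/* How far a record's start would move if every record grew by extra_bytes:
 * each earlier record pushes it along by that amount. Hypothetical helper. */
int64_t record_shift_if_grown(int64_t creation_index, int64_t extra_bytes)
{
    return creation_index * extra_bytes;
}
```

With these, record_first_byte(1) and record_last_byte(1) give 87689 and 175376, matching the second character above. Growing every record by even 4 bytes would move the 1001st character's record (index 1000) by 4000 bytes, which is why a new field means rewriting every existing record.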
This is not a trivial matter. I've been working on this problem since at least 2011. The game's code is littered with comments about (TODO: This works this way for now, but it would be cool to add feature X after we upgrade the player db). This migration requires rewriting all references to all data structures that reference 'players' (which also technically includes NPCs), writing a migration process, QAing the hell out of it, and then actually running it. This process has occurred exactly twice in the 20 years the game has existed, and you can imagine why.
However, I'm currently working on some stuff that will make this more feasible in the future. That's where June, July and the first bit of my August went. There's too much cool stuff I want to add that I can't until this happens- new spells grantable to players, more notify slots, more ignore slots, affects being aware of the skillpower used to cast them and who cast them, and dozens more such things. I'm excited enough about this that I've learned a few new languages and core technologies to facilitate it. Now that you know where my time's gone, you can be too!
The downside is, because I'm massively overhauling how all of the game code is organized, this work can't be done piecemeal- I'm ripping everything out by its roots, replacing it wholesale, and until that multi-month process is done, nothing else can even be accomplished. The first step was testing the baseline on dev- this resulted in dev being down for 5 weeks. The good news is, it's nearly done.
We're now at the point where dev is up and working and everything is back to normal. Things load on reboots, meaning the universe exists, and save between reboots, meaning the universe continues to exist. Never take this for granted. The next step is to create a new clone of live, perform the same process to live that I did to dev, and then make sure live works. For bad reasons, the way that live is organized is completely different from how dev is organized, so live-clone won't work for a while. But it'll get better.
Once we've verified that the dev -> live-clone process works, then we can port that to the real live server and take the server down for a while (hour(s)?), and when we come back, nothing will look like it's changed, despite everything being completely different. Changes that players are explicitly supposed to not notice can be tricky to communicate.
We receive a lot of benefits from these changes. Instead of propagating changes via janky shell scripts between servers, which has occasionally caused significant loss of code progress, everything is now handled via git. This has tremendous positive implications. All of the shell scripts have been rewritten in Python, which means they're both maintainable by people other than me (barely) and are significantly faster. While I was in there, I fixed big chunks of infrastructure code that had been broken for decades, which doesn't have any meaningful implications for you guys, but will save me a lot of time and make rebooting much less anxiety-causing.
When I returned about 15 months ago, post-ownership change -> Hez+Kayde, I found that a bunch of the existing systems we'd been using had been abandoned or destroyed, either because they wanted to do something different, or as part of their destroy-all-game-data-and-the-game plot. This tabula rasa was a great opportunity, and we've run with it. We now use Trello boards for tracking issues and our respective progress on them, which has given us the ability to do proper QA for the first time in the game's history. This is fantastic. We use Slack daily, and voice chat on Mumble a few times a week, while almost entirely abandoning email for our internal communication. Slack is a very cool tool, and has tons of hooks and interplay with other tools. One of the coolest things is bots; it allows for blind CURLs, so any system which is capable of sending web requests can send data to/from Slack. Not only are we informed when something happens in GitHub/Trello, I spent a few hours tossing together an asynchronous multithreaded curl library for the game, so we also are alerted in Slack whenever the game crashes, starts compiling, is ready to accept connections, or whenever a script we're monitoring is interacted with by a player in the game. Every time someone gives the romantic a manual, it shows up in our Slack. This system works pretty well, so I'm considering adding new input channels for pray, novice, etc. All of this is possible due to recent infrastructure upgrades. Ideally, the next would be overhauling the much-hated helpdesk system and fixing all of the donation automation. Each piece builds off of the pieces laid before it, even if they don't necessarily seem related. It's tremendously boosted both our productivity, and honestly, morale- it's great no longer being constrained by ancient, poorly designed systems.
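As a rough illustration of the kind of message such a hook sends: Slack's incoming webhooks accept an HTTP POST whose body is a small JSON object with a "text" field. The sketch below only builds that body in C. The function name, buffer handling and escaping rules are assumptions for illustration, not the game's actual curl library, and the HTTP request itself is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write {"text":"<msg>"} into out, escaping quotes,
 * backslashes and newlines; other control characters become spaces.
 * Returns the payload length, or -1 if out is (conservatively) too small. */
int build_slack_payload(char *out, size_t outsz, const char *msg)
{
    static const char prefix[] = "{\"text\":\"";
    static const char suffix[] = "\"}";
    size_t n;

    /* Need room for prefix + suffix + terminating NUL at minimum. */
    if (outsz < sizeof prefix + sizeof suffix - 1)
        return -1;

    memcpy(out, prefix, sizeof prefix - 1);
    n = sizeof prefix - 1;

    for (; *msg != '\0'; msg++) {
        unsigned char c = (unsigned char)*msg;

        /* Reserve up to 2 bytes for this character, plus suffix and NUL. */
        if (n + 2 + sizeof suffix > outsz)
            return -1;

        if (c == '"' || c == '\\') {
            out[n++] = '\\';
            out[n++] = (char)c;
        } else if (c == '\n') {
            out[n++] = '\\';
            out[n++] = 'n';
        } else if (c < 0x20) {
            out[n++] = ' ';
        } else {
            out[n++] = (char)c;
        }
    }

    memcpy(out + n, suffix, sizeof suffix); /* copies the NUL as well */
    return (int)(n + sizeof suffix - 1);
}
```

Calling build_slack_payload(buf, sizeof buf, "game crashed") fills buf with {"text":"game crashed"}, which is the body a crash alert would POST to a webhook URL.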
There's... a lot of other stuff. While we wait for each other, we work on little side projects which are ideally small enough to not occupy very many QA resources. These are smaller things, like new items or bosses- potentially perilous, but not capable of crashing the game or wiping the player database or anything so terrifying. Keep in mind, this information channel is not the changelog, meaning these items aren't QAed, and therefore not guaranteed to be in for reboot, or necessarily ever:
- I started poking around personal mounts again. Apparently I wrote almost all of the infrastructure to permit them to level, didn't finish it, and then forgot about it. Still working on that.
- All of the mayor stuff is somewhere between 'funky' and 'totally broken', so I came up with a plan to overhaul it, which should make it cooler, more interesting, streamlined, and way less buggy. I spent 2 hours writing new code, and then lost it on the plane because I somehow forgot to save my work. Annoying, but cool enough to suffer repeating the work very soon.
- Overhauled totems. Most sucked, and specifically weren't activating anywhere near frequently enough. Coyotes are cooler now.
- Added a dumb little mini to allow orc pursuer quests to be intentionally failed so another can be obtained. Time to start boning up on your orcish linguistics notes!
- Started working on a new mini in an abandoned clanhall, with some neat mechanics. It's really fun, but it's time consuming to make and will be weird to QA, so it probably won't make it in short-term.
- Decided that low level gameplay could use some sprucing up in general and be made more interesting + appealing to people that haven't spent the time necessary to find all the awesome archon content. This is kind of two-fold in related ways: All of the classes that had non-exclusive class-specific weapons got new class-specific weapons. For example, Rogue, Bard and Psionic all were encouraged to use Dagger, so I removed the (boring) dagger bonuses from Bard and Psi, and now their encouraged weapons are Hook and Whip, respectively. To make this more fun and engaging, I ran the numbers on various weapons and filled in the gaps for all affected weapons, so all weapons that classes are encouraged to learn should have a new, readily available weapon every ~11 levels, with interesting powers and scripts. I'm pretty excited by this. Shamans get mauls! We have 20 special new mauls!
Stuff is coming. You won't notice most of it, and it's not so much 'changes' as it is 'catalysts'. The new changes that they'll enable are going to be fantastic. I've also tossed in some bones, like fixing the third defense bug and researching the quest-assignment oddness, because you guys have been waiting a long time and I appreciate your patience. This is the first of hopefully many, hopefully consistent updates that aren't strictly limited to gameplay changes. Thanks for reading.
Posted 04 September 2015 - 08:37 PM
Thanks for the update. I don't envy the guy who inherits a 25-year-old codebase and has to make improvements. I threw up in my mouth a little hearing about the player database. Good luck getting that fixed without screwing us all.