The new application, the one I've been working on for a year and a half, went live at 8:00 AM on Saturday. At first all seemed well, and the customers were happy; their first pass showed no major problems. By Monday, though, I was certain something had gone terribly wrong.
It started innocently enough. A few users reported issues on specific pages, with specific data. Things like "this page gives me an error when I do X" and "I can't see the notes for item Y". We tracked them down, fixed them, and did more deployments. Most of these were "data that was expected to exist does not actually exist" type problems, which to me are typical first-day-of-prod issues, no big deal.
Then the error reports, bad ones, started rolling in. They caused me, my boss, and our customers a week of debugging hell of a magnitude I had never experienced and hope never to experience again. Twelve-to-fourteen-hour days, for a week, and as I write this there is no end in sight.
I won't bore you with the technical details here. Suffice it to say that through a combination of poor coding, incomplete testing (both manual and automated), and sheer bad luck, the deployment of this app was the most difficult one I've had in 12+ years of coding.
There were a few major problems with this deployment, and with development of this application in general.
Errors Should Actually, You Know, Be Logged
First, the error logging system was not actually logging any errors, thanks to a misunderstanding about how to call an asynchronous method from synchronous code. Which, you know, kind of defeats the purpose.
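To make that kind of failure concrete, here is a minimal sketch in Python. It is not the code from this app, and the names (`log_error_async`, `handle_request_broken`, `handle_request_fixed`) are hypothetical. It shows one common way an async logger can silently do nothing when called from synchronous code: the coroutine is created but never run.

```python
import asyncio

logged = []  # stands in for the real log store (an assumption for this sketch)


async def log_error_async(message: str) -> None:
    """Hypothetical async logger: yield once to simulate I/O, then record."""
    await asyncio.sleep(0)
    logged.append(message)


def handle_request_broken() -> None:
    # Bug: calling a coroutine function only builds a coroutine object.
    # It never runs, so nothing is logged and no exception is raised.
    # Python emits just a "never awaited" RuntimeWarning on stderr.
    log_error_async("broken path")


def handle_request_fixed() -> None:
    # One way to run the coroutine to completion from synchronous code,
    # assuming no event loop is already running in this thread.
    asyncio.run(log_error_async("fixed path"))


handle_request_broken()
handle_request_fixed()
print(logged)  # ['fixed path']
```

The broken call looks perfectly reasonable in a code review, which is part of why this sort of bug can survive all the way to production.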
Unluckily for us, we had our collective heads buried so deep in the app fixing bug reports that we didn't notice the error logger issue until halfway through our hell week, right around the time the users reported to us, in chat, more errors than we could see being logged. Only then did we, did I, notice the problem. That was soon followed by a string of curse words that I was so glad my children didn't hear, because they would've immediately started taking notes.
Test your error logging system, right away, repeatedly, whatever it takes to ensure that it is working how you expect it to.
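One way to do that is a smoke test that raises a deliberately unique error, logs it through the same path the app uses, and then checks that it actually landed in the log. The sketch below assumes a file-based log and uses Python's standard `logging` module; `make_error_logger` and `error_logging_works` are hypothetical helpers, not part of this app.

```python
import logging
import tempfile
import uuid
from pathlib import Path


def make_error_logger(log_path: Path) -> logging.Logger:
    """Build a logger that writes ERROR records to a file (assumed sink)."""
    logger = logging.getLogger(f"app.errors.{uuid.uuid4().hex}")
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the test output out of the root logger
    logger.addHandler(logging.FileHandler(log_path, encoding="utf-8"))
    return logger


def error_logging_works(logger: logging.Logger, log_path: Path) -> bool:
    """Log a uniquely tagged exception, then confirm it reached the file."""
    marker = f"logging-smoke-test-{uuid.uuid4().hex}"
    try:
        raise RuntimeError(marker)
    except RuntimeError:
        logger.exception("smoke test failure")  # logs message plus traceback
    for handler in logger.handlers:
        handler.flush()
    return marker in log_path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "errors.log"
    logger = make_error_logger(path)
    ok = error_logging_works(logger, path)
    for handler in logger.handlers:
        handler.close()  # release the file before the directory is removed

print(ok)
```

The important part is reading the marker back from wherever the errors are supposed to end up. A test that only checks the logging call did not throw would have passed happily during our hell week.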