Mr. Mistake
It has been a tough week. The team has been all hands on deck to reach a project milestone. Most of my time shifted from coordinating work and removing impediments to hands-on development.
My task was to set up data in all environments. With a script tested and ready for promotion, I deployed it to QA. The script inserted hundreds of records into several core data tables. I knew that other jobs running within the system would soon generate many records related to this core data. As I reviewed the data in QA, I noticed an issue I had not seen before. Once I understood the oddity and had a fix in mind, I wanted to back out the change. That would let me do a fresh deployment and get out of the way of the other developers on the team.
From earlier data sanitization work, I knew there was a script I could reuse to clean up QA. I began to put pressure on myself to get out of the team's way as quickly as possible. This was the first of my many mistakes. I copied the cleanup script into the database client application and gave it a quick audit. Then I modified the variables so it would undo the data I had loaded into QA. Everything appeared good enough to run in QA. I scanned the script once more to confirm everything was in order. Then I clicked the button to execute it.
I was moving the cursor away from the button when I saw my grave mistake. As mentioned earlier, my script had inserted hundreds of core data records, and several tables contain system-generated data derived from those core records. So what did I see? Three little TRUNCATE TABLE queries, wiping out all of the system-generated data. By the time the script completed, it was too late for me and for the data that supports 98% of the value proposition.
My first response was to confirm that all the tables had been wiped clean. They had. Next, I felt it was very important to tell the rest of the team about this impediment, because they might be depending on data that no longer existed. I also needed to explain that the QA system would be broken until I was able to restore all the data. It wasn't all frustration. One engineer applauded my mistake with a big grin. Clearly, he knows what it's like to make mistakes.
Because all the deleted data was system-generated, restoring it was not a huge issue. A sequence of ETLs and scheduled jobs repopulates everything. Until the data was restored, however, several team members were unable to test core functionality in the final days of this milestone push. By the next morning, all the data was back in QA, removing the roadblock and clearing the path to production.
You may be asking, "Other than humor, why tell this story?" Because it happens to all of us. Every one of us has faults and oversights that lead to silly mistakes like this one, and those mistakes affect the lives and work of others. Writing this down reminds me that fast is not always better. In this case, moving fast slowed everything down considerably. Another thirty seconds to a minute spent reviewing that script could have saved the team three or more hours of cleanup. When you feel the pressure, remember to stop, slow down, and be thoughtful. Quick decisions can lead to long recovery periods.