Rohit Kumar Ankam

Mail Server Outage

On 12 August 2022, my mail server suffered a service disruption caused by human error. Here is the timeline.

Timeline
09:57 - Started routine maintenance work
10:19 - Took a backup before starting updates
10:27 - Started updates
10:54 - Initial service disruption
11:11 - Rolled back to the previous version
11:29 - Service live again

Total disruption: 35 minutes (10:54 to 11:29), recovered with zero data loss.

I now want to learn about zero-downtime deployments. If you know any good learning resources on the topic, please share them with me on Twitter or by email.

Cause and Fix

After the upgrade finished, I noticed some odd behavior: I was unable to send or receive email. I dug into the logs and found that I had broken the Postfix configuration while upgrading packages on the server. Once I found that, I restored the previous version's configuration, which fixed the problem.
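Restoring the previous configuration is easiest when a copy was taken right before the upgrade. Below is a minimal sketch in Python, assuming the configuration lives in a single directory such as `/etc/postfix` (the usual location on most Linux distributions). The helper names `snapshot_config`, `diff_file`, and `restore_config` are hypothetical and not the tooling used in this incident. The idea is to snapshot before upgrading, diff a file such as `main.cf` against the snapshot to see what the upgrade changed, and copy the snapshot back if needed.

```python
import difflib
import shutil
import time
from pathlib import Path


def snapshot_config(config_dir: Path, backup_root: Path) -> Path:
    """Copy config_dir into a new timestamped folder under backup_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{config_dir.name}-{stamp}"
    # copytree creates backup_root if needed and refuses to overwrite an existing snapshot.
    shutil.copytree(config_dir, dest)
    return dest


def diff_file(old: Path, new: Path) -> str:
    """Return a unified diff of one file: the snapshot copy versus the live copy."""
    old_lines = old.read_text().splitlines(keepends=True)
    new_lines = new.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=str(old), tofile=str(new))
    )


def restore_config(snapshot: Path, config_dir: Path) -> None:
    """Replace the live config directory with the snapshot's contents."""
    # Removing first means files added by the upgrade do not linger after the rollback.
    shutil.rmtree(config_dir)
    shutil.copytree(snapshot, config_dir)
```

A typical order would be `snapshot_config(Path("/etc/postfix"), Path("/var/backups/postfix"))` before the upgrade (the backup path is only an example), `diff_file(...)` on `main.cf` afterwards, and `restore_config(...)` only if the diff shows the breakage. After restoring, running `postfix check` and then `postfix reload` (as root) lets Postfix pick up the restored files.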

What I Learned

This incident showed me how important it is to have a good backup and monitoring plan. Fortunately, I had been following a regular backup and maintenance schedule well before anything went wrong, and that one decision, made at the start of the mail server project, is what saved me this time.
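The monitoring half of that plan can start as a small periodic probe. Below is a minimal sketch in Python, assuming an SMTP listener reachable on port 25. The function name `smtp_is_healthy` is hypothetical and not part of any tool mentioned in this post. It connects, waits for the server's greeting, and sends `NOOP`. A 250 reply counts as healthy; a refused connection, a timeout, or any other reply counts as down.

```python
import smtplib


def smtp_is_healthy(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if the server sends its greeting and answers NOOP with 250."""
    try:
        # local_hostname is only used for HELO/EHLO (never sent here); passing it
        # skips the socket.getfqdn() DNS lookup smtplib would otherwise do.
        with smtplib.SMTP(host, port, local_hostname="monitor", timeout=timeout) as smtp:
            code, _ = smtp.noop()
            return code == 250
    except OSError:
        # Covers refused connections, timeouts, and smtplib.SMTPException
        # (which subclasses OSError), e.g. a bad greeting or a failed QUIT.
        return False
```

Running this from cron or a systemd timer on a separate machine and alerting when it returns `False` would flag an outage without waiting for someone to notice. A `NOOP` probe only proves the SMTP daemon is answering, though. A misconfiguration like the one above could leave it answering while mail still fails to flow, so sending and receiving a real test message is a stronger check.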


* This post is licensed under CC BY-SA 4.0