One of the core responsibilities of Exchange Server administrators is to bring the server and its services back online as quickly as possible after a disaster or any other incident. To do this well, administrators need to prepare in advance by creating and following a recovery plan that minimizes downtime and data loss. The recovery plan also helps the company meet its Recovery Point Objective (RPO) and Recovery Time Objective (RTO). In this article, we will discuss how administrators can create a comprehensive Exchange Server recovery plan to bring the server and services back online after a disaster.

Here are some tips and guidelines that can help create an effective Exchange Server recovery plan.

1. Understanding the Exchange environment

To prepare a recovery plan, organizations first need to understand their Exchange Server environment and configuration. An on-premises Exchange Server can be installed either on a physical server or as a virtual machine. If an organization has a physical server installation, it would need a physical server of the same make, model, and specifications to fully recover from a disaster. Any difference in the hardware or configuration might hinder the restore. In the case of a virtual machine, there may be resilience at the hypervisor level, where the server's data and compute can be transferred to another node. If not, security leaders can move the virtual machine to another hypervisor.

Another option is the hybrid model, where Microsoft 365 and Exchange Server act as one. If something happens to the Exchange Server (physical or virtual), you can change the MX records of the domain to point to Microsoft 365. The mailboxes that were moved to the cloud will continue to work. However, users whose mailboxes were on the local Exchange Server will not be able to view past emails until the server is recovered from a backup or through a recovery operation.

Security leaders can also set up a Database Availability Group (DAG), which is a cluster of Exchange Servers that hold copies of the databases on different servers. This gives high availability to the server and service. However, the DAG members need to be set up in geographically separate locations so that if something happens to one building, the data and services will continue to work. This is a high-maintenance setup that needs to be monitored and checked on a daily basis.

2. Backup strategy

Backups are central to Exchange Server recovery. There are three types of backups:

  • Full Backup: A full backup is the most complete type of backup for the Exchange Server as it takes a full copy of the server and data. However, this means that it takes a considerable amount of time to finish and is heavy on the resources of the destination that stores the backup data. Before choosing this type of backup, security leaders should consider the cost of storage and the frequency of backup.
  • Incremental Backup: An incremental backup scheme takes a full backup on the first day and then backs up only the changes since the previous backup on each of the following days, until the retention period ends. Each backup therefore depends on all the backups before it. This means that if any backup between the full backup and the restore date is damaged or lost, the data cannot be recovered to that date. Although an incremental backup takes less time and less storage, it should be checked and tested on a daily basis.
  • Differential Backup: A differential backup scheme also starts with a full backup, but each differential backup contains all the changes made since that full backup. This means that the differential backups do not depend on each other, but they all depend on the full backup. If something happens to the full backup, no data can be recovered.
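
The dependency rules above can be made concrete with a small sketch. It is only an illustration, not how any particular backup product works: the `Backup` class and the `restore_chain` function are hypothetical names, days are plain integers, and the sketch assumes a schedule uses either incremental or differential backups after a full backup, not a mix of both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup was taken (illustrative)
    kind: str  # "full", "incremental", or "differential"


def restore_chain(backups, target_day):
    """Return the backups needed, in restore order, to recover to target_day.

    Assumes a schedule does not mix incremental and differential backups
    after the same full backup.
    """
    usable = sorted((b for b in backups if b.day <= target_day),
                    key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")

    base = fulls[-1]  # most recent full backup
    later = [b for b in usable if b.day > base.day]
    differentials = [b for b in later if b.kind == "differential"]
    incrementals = [b for b in later if b.kind == "incremental"]

    if differentials:
        # Only the full backup and the latest differential are required.
        return [base, differentials[-1]]
    # Every incremental since the full backup is required, in order.
    return [base] + incrementals


if __name__ == "__main__":
    inc = [Backup(1, "full")] + [Backup(d, "incremental") for d in range(2, 6)]
    dif = [Backup(1, "full")] + [Backup(d, "differential") for d in range(2, 6)]
    print([b.day for b in restore_chain(inc, 4)])
    print([b.day for b in restore_chain(dif, 4)])
```

Every backup in the returned chain must be intact for the restore to succeed, which is why a longer incremental chain carries more risk than a two-element differential chain.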

In addition, security leaders should also consider the backup application. There are many backup applications available in the market. Choosing the wrong backup solution can result in restore failure. So, they should consult with the vendor and verify that the solution is compatible with the operating system and the Exchange Server installed.

Security leaders should test the backups daily, and document both the routine restore tests and an annual full-restore test. This helps ensure that the server can actually be restored in case of a disaster. Such drills are often required for companies to comply with regulations and legal obligations, and should be fully documented along with their results. These tasks should be planned outside office hours and approved by higher management.

3. Disaster recovery documentation

The server installation documentation needs to be a living document that records the actual installation, configuration, and media, along with any subsequent changes. This will help in recovery in case of a disaster.

The document should also list information such as the following:

  • The internal contact list of the IT staff and system admins.
  • External vendors or third-party support contact information.

These contacts need to be reachable at all times so that they can be contacted immediately if something happens.

5. Communication plan

The company should have an incident response plan under which a notification is sent to the stakeholders listed in the contingency plan as soon as an incident occurs. Regular updates on the progress of the recovery should also be provided to the stakeholders and staff.

6. Monitoring and maintenance

The systems, operating system, network, and storage usage should be monitored by a monitoring tool and alerts should be set. Daily health-checks should be included, along with a checklist for the health of the server and the databases.
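
As a rough illustration of what a storage alert can look like, the sketch below classifies disk usage against two thresholds. The function names and the 80% and 90% thresholds are assumptions chosen for the example, not values recommended by Microsoft or any monitoring product; a real deployment would normally rely on a dedicated monitoring tool.

```python
import shutil


def storage_alert(used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Classify storage usage as "ok", "warning", or "critical".

    The default thresholds are illustrative assumptions.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = used_bytes / total_bytes * 100
    if used_pct >= crit_pct:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


def check_volume(path):
    """Check the volume that holds `path`, e.g. a database or log drive."""
    usage = shutil.disk_usage(path)
    return storage_alert(usage.used, usage.total)


if __name__ == "__main__":
    # "." stands in for the database or log volume in this sketch.
    print(check_volume("."))
```

A check like this could be one item on the daily health-check list, alongside the state of the databases and services.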

Patches and cumulative updates should be installed on a monthly basis, during a scheduled maintenance window approved by the business, and only after a full backup has been taken. For health checks and other maintenance work, security leaders should always consult the Microsoft documentation.

7. Tools and resources needed for data recovery

When the server fails but the operating system and Exchange Server are still intact and only the database or transaction logs are damaged, there should be a documented procedure and plan of action. In such a case, security leaders can use the ESEUtil command to recover the database. First, they need to check the state of the database with the /mh option. If the database is in the Dirty Shutdown state, it was not shut down cleanly: either transaction logs have not been fully committed to the database, or the database or logs are corrupted. Organizations can then run ESEUtil to perform a soft recovery (/r), which replays the transaction logs, or a hard recovery (/p), which repairs the database. The hard recovery should be used only as a last resort because it purges anything deemed corrupted, including false positives. Once the database is in the Clean Shutdown state, it can be mounted. If the above fails, they should restore from the last backup.
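
The decision described above can be sketched as a small helper that reads the text printed by ESEUtil with the /mh option and suggests the next step. This is a minimal sketch under stated assumptions: `database_state` and `next_step` are hypothetical names, the sample text is an abbreviated illustration rather than a verbatim capture, and the helper only looks for the `State:` line.

```python
import re


def database_state(mh_output):
    """Extract the value of the 'State:' line from ESEUtil /mh output text."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", mh_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'State:' line found in the output")
    return match.group(1)


def next_step(mh_output):
    """Suggest the next action based on the reported database state."""
    state = database_state(mh_output)
    if state == "Clean Shutdown":
        return "mount the database"
    if state == "Dirty Shutdown":
        # Try log replay before considering a repair.
        return "attempt soft recovery first"
    return "investigate manually"


if __name__ == "__main__":
    # Abbreviated, illustrative sample of header dump text.
    sample = (
        "Initiating FILE DUMP mode...\n"
        "            State: Dirty Shutdown\n"
        "     Log Required: 12-14 (0xc-0xe)\n"
    )
    print(database_state(sample))
    print(next_step(sample))
```

Keeping the check in a script makes it easy to record the state before and after each recovery attempt in the disaster recovery documentation.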

If the server is in an unrecoverable state, it needs to be rebuilt. The Exchange Server configuration, apart from custom send/receive connectors and certificates, is stored in Active Directory. By running Exchange Setup in recovery mode (/m:RecoverServer), administrators can reinstall the same Exchange Server version on a server with the same computer name, IP address, operating system, storage, and drive layout. Then, the data can be recovered from the failed server. If the databases and transaction logs are undamaged, the databases should mount.

Third-party tools, such as Stellar Repair for Exchange, can be used to speed up the recovery process and restore services as early as possible with minimal impact. This tool can open the corrupted database and present its full structure. Users can granularly select mailboxes, archives, deleted/purged items, shared mailboxes, disabled mailboxes, and public folders, and export them directly to a new live database with automatic mailbox matching, priority exports, and parallel exports. Security leaders can also export the EDB data to PST or to a Microsoft 365 tenant.

8. Post-recovery analysis

After the recovery is done and the data has been restored, security leaders should analyze what went wrong and how the process can be improved. Based on this analysis, the disaster recovery document and processes can be updated and then approved by the stakeholders. Apart from updating the documents, they should also provide advanced training to IT staff and Exchange administrators so that they are better prepared to troubleshoot in case of a disaster.

Conclusion

No one knows when a disaster might strike, so it is important to have a recovery plan in place. The procedures and documentation need to be constantly updated and improved. This ensures that when a disaster strikes, the company has the right plans and actions ready to restore services as soon as possible and with minimal impact. Security leaders should also review the documentation and procedures, test the backups, and maintain a healthy system. It is also advisable to keep an Exchange recovery tool on hand that can quickly and easily recover data from corrupted databases.