Virtual machine snapshots in Intelligent Infrastructure (II)

Overview

In Intelligent Infrastructure (II), the snapshots feature lets you capture the state of a virtual machine (VM) at a moment in time and revert the VM to that state if needed. Snapshots can be useful during system changes; however, they are designed to be short-term fallback options, not long-term disaster recovery options, and they can cause stability, reliability, or performance issues if kept for a long period of time.

General use

In II, the self-service portal provides a way to create, revert, and delete snapshots on demand. To create a snapshot, follow the instructions in Interact with a VM in the Intelligent Infrastructure (II) self-service portal, and select the Resource Action Create Snapshot.

Only one snapshot is allowed per VM at a time, based on VMware best practice. If a VM already has a snapshot, you will not see Create Snapshot; instead, you will see Delete Snapshot (commits the changes made to the VM since the snapshot was taken) and Revert to Snapshot (rolls the VM back to its state at the point the snapshot was taken).
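
The one-snapshot rule can be illustrated with a minimal Python sketch. The function name available_snapshot_actions is hypothetical and is not part of the II portal or any VMware API; the sketch only models which Resource Actions this page says are offered for a VM.

```python
def available_snapshot_actions(has_snapshot: bool) -> list[str]:
    """Return the snapshot Resource Actions offered for a VM.

    Hypothetical helper for illustration only; it mirrors the rule that
    a VM may have at most one snapshot at a time.
    """
    if has_snapshot:
        # A snapshot already exists, so creating another is not offered.
        return ["Delete Snapshot", "Revert to Snapshot"]
    # No snapshot exists yet, so the only snapshot action is to create one.
    return ["Create Snapshot"]
```

For example, available_snapshot_actions(False) returns only Create Snapshot, while available_snapshot_actions(True) returns the delete and revert actions.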

The UITS Storage and Virtualization (SAV) team recommends using cold snapshots, where the VM is completely powered off when the snapshot is taken, whenever possible; however, you can also take a hot snapshot while the VM is up and operational. A hot snapshot can include the state of the VM's memory at the time of the snapshot.

Note:
  • When you revert to a hot snapshot, it will appear to the system as though a crash took place (as if the power was suddenly pulled from the system). Upon powering back on after reverting, some operating systems will prompt for an explanation. Some may perform a mandatory file system check, which can potentially add hours to the boot time, depending on the size of the storage allocated to the VM.
  • While a snapshot is in place, hard disks cannot be added or increased in size.
  • When reverting to a snapshot on a VM that is joined to Active Directory (AD), re-establishing trust with AD may be needed.

Standard process

In most cases, snapshots are taken prior to system changes, such as patches, upgrades, and configuration changes. The snapshot can be hot or cold, depending on the needs of the maintenance and the preferences of the admin. Once the snapshot is in place, the maintenance is performed.

If all goes smoothly with the maintenance and services are working as expected, then the VM admin can either immediately delete the snapshot, or keep it for a day to confirm that the system continues to operate as expected.

Note:
For large VMs (multiple TB of storage) and/or VMs with a high rate of data change (high-use file servers or high-write databases), the snapshot may grow too quickly to be kept for very long.

If something goes wrong with the maintenance, revert to the snapshot, power on the system, and confirm that services are functioning as they were before the snapshot was taken. Then either immediately attempt the maintenance again, or delete the snapshot and schedule a new maintenance window to attempt the entire process again.
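
The standard process above can be summarized as a small decision sketch in Python. The function next_steps and its parameters are hypothetical names chosen for illustration; the sketch only lists the steps described on this page and does not call the II portal or vSphere.

```python
def next_steps(maintenance_succeeded: bool, retry_now: bool = False) -> list[str]:
    """List the follow-up steps after maintenance on a VM with a snapshot.

    Hypothetical helper for illustration only.
    """
    if maintenance_succeeded:
        # Services work as expected: the snapshot is no longer needed.
        return [
            "confirm services are working as expected",
            "delete the snapshot (immediately, or after about a day)",
        ]

    # Maintenance failed: fall back to the pre-maintenance state first.
    steps = [
        "revert to the snapshot",
        "power on the VM",
        "confirm services are functioning as before the snapshot",
    ]
    if retry_now:
        steps.append("attempt the maintenance again")
    else:
        steps.append("delete the snapshot")
        steps.append("schedule a new maintenance window")
    return steps
```

Calling next_steps(False) yields the revert path that ends with deleting the snapshot and rescheduling, so the snapshot is not left in place while waiting for the next maintenance window.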

Retention and risks

VMware has a set of best practices for using snapshots in a vSphere environment. One of the first is that snapshots should not be kept for longer than 72 hours. Snapshots grow in size over time as changes are made to the system after the point of the snapshot. This can begin to affect performance and can potentially affect stability.
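
As a rough illustration of the 72-hour guideline, the following Python sketch checks whether a snapshot has passed the recommended age. The names MAX_SNAPSHOT_AGE and is_over_aged are assumptions made for this example; they do not describe how the SAV team's monitoring is actually implemented.

```python
from datetime import datetime, timedelta

# Recommended maximum snapshot age, per the VMware best practice cited above.
MAX_SNAPSHOT_AGE = timedelta(hours=72)


def is_over_aged(created_at: datetime, now: datetime) -> bool:
    """Return True if the snapshot is older than the recommended 72 hours.

    Hypothetical helper for illustration only; both arguments should be
    either naive or timezone-aware so they can be subtracted.
    """
    return now - created_at > MAX_SNAPSHOT_AGE
```

A snapshot taken at 09:00 on a Monday would therefore be considered over-aged any time after 09:00 on the following Thursday.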

When a snapshot passes the 72-hour best practice age or passes a size threshold, a daily email will be sent to the email address associated with the RAC organization that owns the resource pool containing the VM. This email will remind the VM owners about the recommended age of snapshots and encourage them to delete the snapshot.

After that point, if the over-aged snapshot is not deleted, the SAV team may reach out again to ensure that the snapshot is deleted.

Important:
If a situation arises that causes management challenges, the SAV team may delete the over-aged snapshot without warning to avoid potential complications or performance issues, or to protect the stability of the II shared environment.

Automated snapshots

If a VM is configured to use IU's Data Protection Service (DPS) with either the OSDisk or AllDisks protection level, the backup process will create snapshots on the VM automatically. VM admins should not interact with these automated snapshots in any way; if a VM admin deletes an automated snapshot, the backup may be missed. All snapshots created by DPS have __GX_BACKUP__ at the start of the snapshot name.
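
If you script against a list of snapshot names, a prefix check is one way to avoid touching DPS snapshots. This is a minimal sketch; the helper names is_dps_snapshot and snapshots_safe_to_manage are hypothetical, and only the __GX_BACKUP__ prefix comes from this page.

```python
# Prefix that DPS places at the start of every snapshot name it creates.
DPS_SNAPSHOT_PREFIX = "__GX_BACKUP__"


def is_dps_snapshot(name: str) -> bool:
    """Return True if the snapshot name marks it as created by DPS."""
    return name.startswith(DPS_SNAPSHOT_PREFIX)


def snapshots_safe_to_manage(names: list[str]) -> list[str]:
    """Filter out DPS snapshots, which VM admins should leave alone.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if not is_dps_snapshot(name)]
```

Given a mix of names, only those that do not start with the DPS prefix are returned, so any delete or revert step in a script would skip the automated backup snapshots.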

This is document bhno in the Knowledge Base.
Last modified on 2023-05-04 10:55:27.