The other day we experienced an incident on the SAN storage with high latency and
even loss of connection to the SAN. This can generate a lot of really
unpleasant errors on the ESX hosts. Even after the SAN is brought back
to a stable state, we've seen hosts that won't boot, VMs that won't
vMotion, and VMs that won't power on due to file locks.
If you receive a 'locked file' error (like the screendump below) and your VM
won't boot, there are a couple of ways to go about it. This VMware KB article
explains it quite well. You can either cold migrate the VM to the
other hosts in the cluster one at a time (to find the ESX host holding the lock)
and try to boot it from each, or you can try to locate specifically
which host has the lock.
If the vCenter log does not tell you specifically which files are locked, you can find this in vmware.log, which is located in the VM folder. If you have just tried to power on the VM, the relevant info should be at the end of the log file.
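As a rough sketch of that check, assuming you are logged in to a shell on the ESX host: the helper below prints any lock-related lines from the end of a vmware.log. The name show_lock_lines and the 50-line window are arbitrary choices for illustration, not something taken from the KB article.

```shell
# Print lock-related lines from the end of a vmware.log.
# show_lock_lines is a hypothetical helper name; 50 lines is an arbitrary window.
show_lock_lines() {
    log_file="$1"
    # A failed power-on is logged at the end of the file, so only scan the tail.
    tail -n 50 "$log_file" | grep -i 'lock'
}
```

Call it with the path to the vmware.log in the VM folder, for example show_lock_lines ./vmware.log after changing into that folder.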
In the example below, it is the swap file that is still locked.
This can be verified by running the touch command on the locked file.
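A minimal sketch of the touch test, wrapped in a small function so it can be reused on several files. The name check_lock is made up for this example, and a failing touch can also have other causes (permissions, for instance), so treat the result as a hint rather than proof.

```shell
# Report whether touch can update a file's timestamp.
# check_lock is a hypothetical helper name used only for this sketch.
check_lock() {
    target="$1"
    # touch would create a missing file, so refuse anything that does not exist yet.
    if [ ! -e "$target" ]; then
        echo "no such file: $target"
        return 2
    fi
    # On a locked file touch is expected to fail (typically "Device or resource busy").
    if touch "$target" 2>/dev/null; then
        echo "not locked: $target"
    else
        echo "possibly locked: $target"
        return 1
    fi
}
```

Run it from the VM folder against the file named in vmware.log, for example the .vswp file.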
With vmkfstools you can get the MAC address of the host that holds the lock:
# vmkfstools -D /vmfs/volumes///
In the screendump below, the MAC address has been highlighted.
The same info can be found in the /var/log/vmkernel log.
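If you would rather not pick the MAC address out by eye, the two helpers below sketch one way to do it. They assume the lock info line contains an "owner" field and that the last 12 hex digits of that value are the MAC address of the host holding the lock. The exact output format may differ between ESX versions. The names extract_owner and owner_to_mac are hypothetical, and the owner value in the usage note is a made-up sample.

```shell
# Pull the owner value out of a lock info line read from stdin.
# Assumption: the line contains "owner <hex-and-dash string>".
extract_owner() {
    sed -n 's/.*owner \([0-9a-fA-F-]*\).*/\1/p'
}

# Turn an owner value into a colon-separated MAC address.
# Assumption: the last dash-separated segment (12 hex digits) is the MAC.
owner_to_mac() {
    # Drop everything up to the last dash, then put a colon after every
    # pair of characters and strip the trailing colon.
    printf '%s\n' "${1##*-}" | sed 's/../&:/g; s/:$//'
}
```

For example, owner_to_mac 11111111-22222222-3333-001122aabbcc prints 00:11:22:aa:bb:cc. Depending on the ESX version, the line to feed into extract_owner comes either straight from the vmkfstools -D output or from the vmkernel log.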
Once you have the MAC address, you can find a match by, for example, logging
in to vCenter or onto the blade enclosure. When you have a match, cold
migrate the VM to the relevant ESX host and boot it.