ESXi 5.x hosts might show the following symptoms:
- Hosts might disconnect from Virtual Center
- Attempts to reconnect the hosts might fail
- Hosts may or may not respond to the vSphere Client
- Rebooting the hosts might resolve the issue temporarily
Checking the host:
- It does not show any network errors
- It remains pingable
- Other hosts in the cluster might remain unaffected
- Virtual machines are not affected and continue running
Checking /var/log/vmkernel.log might turn up the following errors:
WARNING: VisorFSObj: xxxx: Cannot create file /var/spool/snmp/xxxxxxxx_x_
WARNING: VisorFSObj: xxxx: Cannot create file /var/run/vmware/tickets/vmtck-
So we have a process running that’s causing the ESXi host to run out of inodes. Not really a good thing at all.
The KBs below give more reading material on inodes.
Checked whether we were running into the “MAINSYS” issue, and we were not.
df -h did not show the host running out of space.
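Since df -h reports space rather than inodes, a full inode table can hide behind plenty of free space. Below is a minimal sketch of an inode check. It assumes the `esxcli system visorfs ramdisk list` command is available on the host (worth verifying on your build) and falls back to `df -i` on shells that support it. The function name `show_inode_usage` is made up for illustration.

```shell
# Hypothetical helper: print inode usage for the host's ramdisks.
show_inode_usage() {
    if command -v esxcli >/dev/null 2>&1; then
        # Assumption: this esxcli namespace exists on the host. It lists
        # each ramdisk along with its inode counters.
        esxcli system visorfs ramdisk list
    elif df -i / >/dev/null 2>&1; then
        # Generic fallback for systems whose df supports -i.
        df -i /
    else
        echo "no inode-reporting tool found" >&2
        return 1
    fi
}

# Example: show_inode_usage
```

A ramdisk whose used inodes are close to its maximum would be the one to look at first.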
Checking “/var/spool/snmp/”, the directory mentioned in the logs, turned up some interesting observations:
- The entire “/var/spool/snmp/” directory was filled with hundreds of .trp files
- As the inodes/RAMDISK were full, an “rm -rf *.trp” failed with an “argument” error
- Removing a single file worked
- Using “tar” to archive all the files in the folder failed as well
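To gauge how many .trp files have piled up without handing the whole file list to an external command, the count can be done inside the shell itself. This is a minimal sketch; the helper name `count_trp` is hypothetical, and it only looks at files directly inside the given directory.

```shell
# Hypothetical helper: count the *.trp files directly inside a directory.
count_trp() {
    dir=$1
    n=0
    # The pattern is expanded by the shell's own loop, so no external
    # command ever receives the full file list as arguments.
    for f in "$dir"/*.trp; do
        # When nothing matches, the pattern stays literal and is skipped here.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example: count_trp /var/spool/snmp
```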
So how do we delete hundreds of files from a folder if no batch operations are allowed to run?
Downloaded and installed “WinSCP” on a test box and used it to connect to the host. That allowed “cut”, “paste” and “delete” operations on multiple selected files at once!
Deleted all the .trp files and restarted the management services (services.sh restart).
This time the host did not complain about “inodes” – always a good sign! Tried connecting the host back to the Virtual Center, and this time it connected fine.
On analyzing why the “.trp” files were filling up the filesystem, we observed the following:
- Hosts affected by the issue were Dell R710s
- All hosts were configured to use Dell OpenManage, which looked like the process that had configured SNMP in the first place
- Checked with Dell and were given the following SNMP file changes to perform on the host
<?xml version="1.0" encoding="ISO-8859-1"?> <config><snmpSettings><enable>false</enable>
Changed the above on all hosts, and so far none of them has shown the inode table filling up! Always a good sign!
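When rolling the change out to several hosts, a quick check that the `<enable>false</enable>` setting actually landed can save a return visit. The sketch below takes the config file path as an argument. The helper name `snmp_disabled` is hypothetical, and the path in the example is an assumption, since the post does not name the file.

```shell
# Hypothetical helper: succeed if the given SNMP config file has the agent disabled.
snmp_disabled() {
    grep -q '<enable>false</enable>' "$1"
}

# Example (the path is an assumption, not given in the post):
# snmp_disabled /etc/vmware/snmp.xml && echo "SNMP agent disabled"
```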
Additional information from Kyle (thank you Kyle, and keep the feedback coming!)
I hit this issue today but didn’t use WinSCP to remove them. I just cd’d to /var/spool/snmp and ran this simple ‘for’ loop.
for F in `ls *trp`; do rm "$F"; done
And it cleaned them all right out.
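A variant of Kyle's loop, sketched here as a reusable function, lets the shell expand the pattern itself instead of calling `ls`, and quotes each name so unusual file names are handled. The name `purge_trp` is hypothetical. It removes files one at a time, the same approach that worked when a single file was deleted by hand.

```shell
# Hypothetical helper: delete every *.trp file in a directory, one at a time.
purge_trp() {
    dir=$1
    removed=0
    for f in "$dir"/*.trp; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -f "$f" ]; then
            rm -f "$f" && removed=$((removed + 1))
        fi
    done
    echo "removed $removed file(s) from $dir"
}

# Example: purge_trp /var/spool/snmp
```

As in the original fix, the management services would still need a restart (services.sh restart) afterwards.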