At the place I work, we experienced continuous “The ramdisk ‘root’ is full” issues on our vSAN ESXi nodes.
The first thing we did was raise a support call and have VMware check what was filling up the ramdisk.
Support suggested that we limit the size of vsantraces to 200MB, and pointed us to the KB article below:
https://kb.vmware.com/kb/2150320
This puzzled me, as the vsantraces ramdisk was not full: it sat at 55% while root was at 100%.
    Ramdisk       Size   Used  Available  Use%  Mounted on
    root           32M    32M         0B  100%  --
    etc            28M     5M        22M   18%  --
    opt            32M   368K        31M    1%  --
    var            48M   728K        47M    1%  --
    tmp           256M   492K       255M    0%  --
    iofilters      32M     0B        32M    0%  --
    hostdstats   1553M    17M      1535M    1%  --
    snmptraps       1M     0B         1M    0%  --
    vsantraces    300M   167M       132M   55%  --
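To spot the full ramdisk at a glance, a listing like the one above can be filtered by its Use% column. This is only a minimal sketch: the function name `full_ramdisks` and the default threshold of 90 are my own illustrative choices, and it assumes one ramdisk per line with the percentage in the fifth column.

```shell
# Print the name of every ramdisk whose Use% is at or above a threshold.
# Reads a ramdisk listing on stdin, one row per line:
#   name size used available use% mountpoint
# The function name and the default threshold (90) are illustrative.
full_ramdisks() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        # Only rows whose fifth column is a number followed by %,
        # which also skips the header row ("Use%").
        $5 ~ /^[0-9]+%$/ {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $1
        }
    '
}
```

Assuming the listing comes from `vdf -h`, something like `vdf -h | full_ramdisks 90` would print only `root` for the output shown here.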
I kept digging and found that /scratch on the hosts was not a symlink to /tmp/scratch (scratch -> /tmp/scratch);
it was a plain directory on / instead.
    drwxr-xr-x 1 root root 512 Sep 30 11:00 scratch
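The same check can be scripted rather than read off a directory listing. The sketch below is a hedged example, not something support provided: `scratch_status` is a hypothetical helper name, and it assumes `readlink` is available in the shell you run it from.

```shell
# Report whether a path is a symlink (and where it points),
# a real directory, or missing. Defaults to /scratch.
# scratch_status is a hypothetical helper name.
scratch_status() {
    path="${1:-/scratch}"
    if [ -L "$path" ]; then
        # readlink prints the link target even if the target does not exist.
        echo "symlink -> $(readlink "$path")"
    elif [ -d "$path" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}
```

On an affected host, `scratch_status /scratch` would report `directory`; on a healthy one it would report `symlink -> /tmp/scratch`.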
I changed ScratchConfig.ConfiguredScratchLocation under Advanced Settings (ScratchConfig.CurrentScratchLocation is read-only), but the change did not persist after a reboot.
We raised another call with support and, after escalating to a senior engineer, were pointed to a new KB article.
It seems that after upgrading the hosts using a custom HPE ESXi 6.5U1 image, we ran into the same issue
that the article describes for the Dell EMC custom image.
I checked for the Emulex (elx) drivers and they were there, even though the card is not in use.
    esxcli software vib list | grep elx
    elx-esx-libelxima.so   11.2.1238.0-03                       ELX     VMwareCertified  2017-09-04
    elxiscsi               11.2.1238.0-1OEM.650.0.0.4598673     EMU     VMwareCertified  2017-09-04
    elxnet                 11.2.1149.0-1OEM.650.0.0.4240417     EMU     VMwareCertified  2017-09-04
    emulex-esx-elxnetcli   11.1.28.0-1.26.5969303               VMware  VMwareCertified  2017-09-04
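Since `grep elx` also matches VIBs that are not part of the fix (elxnet, for example), it can help to check for the affected VIBs by exact name. A minimal sketch, assuming the VIB name is the first column of the listing; `installed_vibs` is a hypothetical helper, not an esxcli feature.

```shell
# Print which of the named VIBs appear in a VIB listing read from stdin.
# Matches the first column exactly, so "elxiscsi" does not match "elxnet".
# installed_vibs is a hypothetical helper name.
installed_vibs() {
    awk -v wanted="$*" '
        BEGIN {
            # Turn the space-separated argument list into a lookup table.
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        ($1 in want) { print $1 }
    '
}
```

For example, `esxcli software vib list | installed_vibs elxiscsi elx-esx-libelxima.so` prints one line per matching VIB and nothing if neither is installed.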
The solution to the issue is as follows:
- Stop hostd (disconnects the host from vCenter)

    /etc/init.d/hostd stop
    watchdog-hostd: Terminating watchdog process with PID 70699
    hostd stopped.

- Remove the below drivers

    esxcli software vib remove -n elxiscsi -n elx-esx-libelxima.so
    Removal Result:
       Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
       Reboot Required: true
       VIBs Installed:
       VIBs Removed: ELX_bootbank_elx-esx-libelxima.so_11.2.1238.0-03, EMU_bootbank_elxiscsi_11.2.1238.0-1OEM.650.0.0.4598673
       VIBs Skipped:

- Start hostd (host gets connected back to vCenter)

    /etc/init.d/hostd start
    hostd started.

- Configure ScratchConfig.ConfiguredScratchLocation field to /tmp/scratch in Advanced System Settings
- Reboot the host
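For more than a handful of hosts, the steps above could be collected into one function. Treat this as a sketch under assumptions rather than a tested procedure: `run` and `fix_scratch` are hypothetical names, and the `vim-cmd hostsvc/advopt/update` line is my assumed command-line equivalent of the Advanced System Settings change, which I made in the UI. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Print a command instead of running it, unless DRY_RUN=0 is set.
# Dry run is the default so nothing is stopped or rebooted by accident.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The steps from the list, in order; stops at the first failing step.
fix_scratch() {
    run /etc/init.d/hostd stop &&
    run esxcli software vib remove -n elxiscsi -n elx-esx-libelxima.so &&
    run /etc/init.d/hostd start &&
    # Assumption: CLI equivalent of editing the setting in the UI.
    # hostd may need a moment after starting before vim-cmd can reach it.
    run vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch &&
    run reboot
}
```

Calling `fix_scratch` as is lists the five commands for review; `DRY_RUN=0 fix_scratch` would actually execute them, ending with the reboot.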
After the reboot, the issue is resolved and scratch persistently points to /tmp/scratch:
    lrwxrwxrwx 1 root root 12 Oct 3 15:29 scratch -> /tmp/scratch
Of course, this wouldn’t be an issue if ESXi were installed onto magnetic disks, or if scratch were redirected to a datastore on shared storage,
but these ESXi nodes form a vSAN cluster where vSAN is the only datastore.