While I did not initially set out to benchmark filesystem performance on our Linux-based Splunk Enterprise indexers, we ended up doing so while striving to optimize the indexing tier's I/O performance.
Based on previous Splunk .conf presentations, the idea was to switch from ext4 to XFS to maximise disk performance. However, after changing to XFS the I/O performance decreased rather than increased over time.
The Splunk-based indexer workloads tested included around a million searches per day and ingestion of around 350GB of data per indexer per day. The ext4 filesystem consistently outperformed XFS in terms of the introspection measure “avg_total_ms” on multiple indexer clusters.
What caused a more significant performance impact was maintaining 20% free disk space versus 10% free disk space.
There are multiple ways to measure I/O in Linux; here are a few options I have used.
Refer to Digging Deep into Disk Diagnoses (.conf 2019) for an excellent discussion of iostat usage.
As per the kernel documentation for I/O statistics fields, the /proc/diskstats file is what iostat reads to calculate the change in the I/O counters.
When iostat runs over an interval, it compares the current counter values to the previously seen values; this is why the first iostat report covers the period since system boot unless the -y flag is used.
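For example, the raw counters can be viewed directly and then sampled with iostat (the interval and report count below are purely illustrative):

```
# Raw cumulative counters since boot; this is the file iostat reads
cat /proc/diskstats

# Extended statistics every 10 seconds for 6 reports;
# -y omits the first report (the since-boot summary)
iostat -x -y 10 6
```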
The Nmon utility appears to produce accurate I/O data. However, the measurements are often “different” from iostat. For example, the disk service time is the average service time to complete an I/O; it is similar to await or svctm in iostat, but it is a larger value in Nmon (it does correlate as expected).
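For reference, Nmon can be run interactively or in capture mode (the interval and snapshot count below are only illustrative):

```
# Interactive mode: press 'd' to toggle the disk statistics view
nmon

# Capture mode: one snapshot every 60 seconds, 1440 snapshots (24 hours),
# written to a .nmon file that Metricator or the nmon analyser can consume
nmon -f -s 60 -c 1440
```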
Splunk Enterprise records I/O data in the _introspection index by default, and this data correlated with the Nmon/iostat data as expected. At the time of writing I did not find documentation on the introspection I/O metrics.
In Alerts for Splunk Admins I have created the dashboard splunk_introspection_io_stats to display this data; there are also views in the Splunk Monitoring Console.
While Nmon and _introspection were used, the Splunk Add-on for Unix and Linux provided metrics that did not match the iostat data or the Nmon/_introspection data, so this add-on's results were not used.
Splunk user searches will change I/O performance; in particular, SmartStore downloads or I/O spikes changed the disk service times.
You can use the report “SearchHeadLevel — SmartStore cache misses — combined” in Alerts for Splunk Admins for an example query, or the smartstore stats dashboard.
I/O performance also varied per server irrespective of tuning settings; for an unknown reason some servers just had “slower” NVMe drives than others (with a similar I/O workload).
There are many statistics for disk performance in the _introspection index; among them are data.avg_service_ms (where XFS performed better) and data.avg_total_ms (where ext4 performed better).
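As a minimal sketch of querying this data from the command line (the sourcetype/component names here are my assumptions about the introspection data; the splunk_introspection_io_stats dashboard contains the complete queries):

```
# Sketch of pulling the I/O latency figures from _introspection;
# sourcetype/component names are assumptions, verify against your own data
$SPLUNK_HOME/bin/splunk search \
  'index=_introspection sourcetype=splunk_resource_usage component=IOStats
   | timechart span=5m avg(data.avg_total_ms) AS avg_total_ms avg(data.avg_service_ms) AS avg_service_ms' \
  -earliest_time '-24h'
```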
With the Nmon data, DGREADSERV/DGWRITESERV were lower on ext4 and this correlated with “data.avg_total_ms” from the _introspection index in Splunk. Furthermore, this seemed to correlate with the ‘await’ time reported in iostat.
DGBACKLOG (the backlog time in ms) from Nmon was lower on ext4; however, the disk busy time was higher. ext4 also resulted in more write and disk write merge operations.
The total service time for an I/O operation was consistently lower under ext4 vs XFS, thus the recommendation and choice of ext4 going forward.
The mount options tested were:
ext4 — noatime,nodiratime (also tested with defaults)
XFS — noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,nobarrier (also tested with defaults)
To switch filesystems I re-formatted the partition with the required filesystem (a complete wipe) and let SmartStore downloads re-populate the cache over time.
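A minimal sketch of that switch, assuming the mdraid device described later (/dev/md0) and an illustrative mount point; the exact device names and paths will differ per environment:

```
# WARNING: this wipes the volume; device name and mount point are illustrative only
umount /opt/splunk/var
mkfs.ext4 /dev/md0                                   # or: mkfs.xfs -f /dev/md0
mount -o noatime,nodiratime /dev/md0 /opt/splunk/var

# XFS variant with the non-default options listed above:
# mount -o noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,nobarrier /dev/md0 /opt/splunk/var
```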
Metricator/nmon along with Splunk’s _introspection data was used to compare performance of the filesystems on each server.
Performance improved (initially) after the switch to XFS; however, it was later determined that the performance improvement related to the percentage of the partition/disk that was left free.
There was a noticeable increase in response times after the partition dropped below 20% free space and headed towards the 10% minimum free space set in Splunk's server.conf settings.
Keeping 10% of the disk free is often recommended online for SSD drives; we increased our server.conf setting for minFreeSpace to 20% to maximise performance.
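A sketch of that change, assuming the standard system/local location for server.conf (recent Splunk versions accept a percentage here; check the server.conf spec for your version):

```
# Raise the minimum free disk space threshold, then restart Splunk;
# the path and the use of a percentage value are assumptions to verify for your version
cat >> /opt/splunk/etc/system/local/server.conf <<'EOF'
[diskUsage]
minFreeSpace = 20%
EOF
```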
All servers were located on-premise (bare metal), 68 indexers in total.
4 NVMe drives per server (3.84TB read-intensive disks); Linux software RAID (mdraid) in RAID 0 was used.
The total disk space was 14TB/indexer on a single filesystem for the SmartStore cache and indexes/DMA data.
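For illustration, an array like this could be created as follows (the device names are assumptions, not our exact build commands):

```
# Illustrative 4-disk RAID 0 array; device names are assumptions
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Confirm the array layout
cat /proc/mdstat
mdadm --detail /dev/md0
```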
The graph below depicts, for an equivalent read/write workload, the “average total ms” value, which I've named “average wait time” in the graphs.
I’ve taken the total response time (sum) of the 4 disks on each server across multiple servers. I also tested alternative ways to measure this value, such as perc95 of response times across the 4 disks. ext4 appeared to be faster in all cases.
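A hedged sketch of that aggregation (summing the per-device values on each host), using the same assumed introspection field names as earlier:

```
# Sum data.avg_total_ms across the four devices on each host per time bucket,
# then chart the per-host average; field/component names are assumptions
$SPLUNK_HOME/bin/splunk search \
  'index=_introspection sourcetype=splunk_resource_usage component=IOStats
   | bin _time span=5m
   | stats sum(data.avg_total_ms) AS total_wait_ms by host, _time
   | timechart span=5m avg(total_wait_ms) by host' \
  -earliest_time '-30d@d'
```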
Average wait time for ext4/XFS (30 days):
Read/write KB for ext4/XFS (30 days):
The below graph depicts a similar read/write workload with a 24 hour timespan:
This graph shows reads/writes per second; ext4 does have more writes per second in some instances, but XFS has longer wait times.
Average wait time and IOPS for ext4/XFS (24 hours):
While I did not keep the graphs as evidence, the general trend was that newer kernel versions resulted in lower service times on ext4.
CentOS 7 / kernel 3.10 generally had lower performance than servers running Red Hat 8.5 / kernel 4.18.x, which in turn was slower than servers with Oracle Linux 8 / kernel 5.4.x UEK.
I did not have enough data to draw a firm conclusion, but there was a definite trend of servers with newer kernel versions having lower latency at the disk level.
The ext4 filesystem, for our Splunk indexer workload of over 1 million searches per day and around 350GB of data per indexer per day, was generally faster than XFS in terms of the avg_total_ms measurement.
What made a greater difference in performance was leaving 20% of the disk space on the filesystem free; this applied to both ext4 and XFS.
Finally, newer kernel versions also appear to improve I/O performance with ext4; this comparison was not done with XFS.
If you are running a Splunk indexer cluster I would suggest testing out ext4 if you are currently using XFS. Let me know what you find in the comments.
This article was originally posted on Medium: Splunk Indexers — ext4 vs XFS filesystem performance