While I did not initially set out to benchmark filesystem performance on our Linux-based Splunk Enterprise indexers, we ended up doing so while striving to optimize the indexing tier's I/O performance.
Based on previous Splunk .conf presentations, the idea was to switch from ext4 to XFS to maximise disk performance. However, after changing to XFS the I/O performance decreased rather than increased over time.
The Splunk-based indexer workloads tested included around a million searches per day and ingestion of around 350GB of data per indexer per day. The ext4 filesystem consistently outperformed XFS in terms of the introspection measure “avg_total_ms” on multiple indexer clusters.
What caused a more significant performance impact was maintaining 20% free disk space versus 10% free disk space.
There are multiple ways to measure I/O in Linux; here are a few options I have used.
Refer to Digging Deep into Disk Diagnoses (.conf 2019) for an excellent discussion of iostat usage.
As per the kernel documentation for I/O statistics fields, the /proc/diskstats file is what iostat reads to calculate the change in the I/O counters.
When iostat runs over an interval, it compares the current counter values to the previously seen values; this is why the first iostat report covers the period since system boot unless the -y flag is used.
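For example, the raw counters can be viewed directly and then sampled with iostat (the interval and report count below are purely illustrative):

```
# Raw cumulative counters since boot; this is the file iostat reads
cat /proc/diskstats

# Extended statistics every 10 seconds for 6 reports;
# -y omits the first report (the since-boot summary)
iostat -x -y 10 6
```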
The Nmon utility appears to produce accurate I/O data. However, the measurements are often “different” from iostat. For example, the disk service time is the average service time to complete an I/O; it is similar to await or svctm in iostat, but it is a larger value in Nmon (it does correlate as expected).
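For reference, Nmon can be run interactively or in capture mode (the interval and snapshot count below are only illustrative):

```
# Interactive mode: press 'd' to toggle the disk statistics view
nmon

# Capture mode: one snapshot every 60 seconds, 1440 snapshots (24 hours),
# written to a .nmon file that Metricator or the nmon analyser can consume
nmon -f -s 60 -c 1440
```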
Splunk Enterprise records I/O data in the _introspection index by default, and this data correlated with the Nmon/iostat data as expected. At the time of writing I did not find documentation on the introspection I/O metrics.
In Alerts for Splunk Admins I have created the dashboard splunk_introspection_io_stats to display this data; there are also views in the Splunk Monitoring Console.
While Nmon and _introspection were used, the Splunk Add-on for Unix and Linux provided metrics that did not match the iostat data or the Nmon/_introspection data, so this add-on's results were not used.
Splunk user searches will change I/O performance; in particular, SmartStore downloads or I/O spikes changed the disk service times.
You can use the report “SearchHeadLevel — SmartStore cache misses — combined” in Alerts for Splunk Admins for an example query, or the smartstore stats dashboard.
I/O performance also varied per server irrespective of tuning settings; for an unknown reason some servers just had “slower” NVMe drives than others (with a similar I/O workload).
There are many statistics for disk performance in the _introspection index; among them are data.avg_service_ms (where XFS performed better) and data.avg_total_ms (where ext4 performed better).
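As a minimal sketch of querying this data from the command line (the sourcetype/component names here are my assumptions about the introspection data; the splunk_introspection_io_stats dashboard contains the complete queries):

```
# Sketch of pulling the I/O latency figures from _introspection;
# sourcetype/component names are assumptions, verify against your own data
$SPLUNK_HOME/bin/splunk search \
  'index=_introspection sourcetype=splunk_resource_usage component=IOStats
   | timechart span=5m avg(data.avg_total_ms) AS avg_total_ms avg(data.avg_service_ms) AS avg_service_ms' \
  -earliest_time '-24h'
```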
With the Nmon data, DGREADSERV/DGWRITESERV were lower on ext4 and this correlated with “data.avg_total_ms” from the _introspection index in Splunk. Furthermore, this seemed to correlate with the ‘await’ time reported in iostat.
DGBACKLOG (the backlog time in ms) from Nmon was lower on ext4; however, the disk busy time was higher. ext4 also resulted in more write and disk write merge operations.
The total service time for an I/O operation was consistently lower under ext4 vs XFS, thus the recommendation and choice of ext4 going forward.
The mount options tested were:
ext4 — noatime,nodiratime (also tested with defaults)
XFS — noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,nobarrier (also tested with defaults)
To switch filesystems I re-formatted the partition with the required filesystem (a complete wipe) and let SmartStore downloads re-populate the cache over time.
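A minimal sketch of that switch, assuming the mdraid device described later (/dev/md0) and an illustrative mount point; the exact device names and paths will differ per environment:

```
# WARNING: this wipes the volume; device name and mount point are illustrative only
umount /opt/splunk/var
mkfs.ext4 /dev/md0                                   # or: mkfs.xfs -f /dev/md0
mount -o noatime,nodiratime /dev/md0 /opt/splunk/var

# XFS variant with the non-default options listed above:
# mount -o noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,nobarrier /dev/md0 /opt/splunk/var
```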
Metricator/nmon along with Splunk’s _introspection data was used to compare performance of the filesystems on each server.
Performance improved (initially) after the switch to XFS; however, it was later determined that the performance improvement related to the percentage of the partition/disk that was left free.
There was a noticeable increase in response times after the partition dropped below 20% free space and headed towards the 10% minimum free space set in Splunk's server.conf settings.
Keeping 10% of the disk free is often recommended online for SSD drives; we increased our server.conf setting for minFreeSpace to 20% to maximise performance.
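A sketch of that change, assuming the standard system/local location for server.conf (recent Splunk versions accept a percentage here; check the server.conf spec for your version):

```
# Raise the minimum free disk space threshold, then restart Splunk;
# the path and the use of a percentage value are assumptions to verify for your version
cat >> /opt/splunk/etc/system/local/server.conf <<'EOF'
[diskUsage]
minFreeSpace = 20%
EOF
```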
All servers were located on-premise (bare metal), 68 indexers in total.
4 NVMe drives per server (3.84TB read-intensive disks); Linux software RAID (mdraid) in RAID 0 was used.
The total disk space was 14TB/indexer on a single filesystem for the SmartStore cache and indexes/DMA data.
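For illustration, an array like this could be created as follows (the device names are assumptions, not our exact build commands):

```
# Illustrative 4-disk RAID 0 array; device names are assumptions
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Confirm the array layout
cat /proc/mdstat
mdadm --detail /dev/md0
```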
The graph below depicts, for an equivalent read/write workload, the “average total ms” value, which I've named “average wait time” in the graphs.
I’ve taken the total response time (sum) of the 4 disks on each server across multiple servers. I also tested alternative ways to measure this value, such as perc95 of response times across the 4 disks. ext4 appeared to be faster in all cases.
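A hedged sketch of that aggregation (summing the per-device values on each host), using the same assumed introspection field names as earlier:

```
# Sum data.avg_total_ms across the four devices on each host per time bucket,
# then chart the per-host average; field/component names are assumptions
$SPLUNK_HOME/bin/splunk search \
  'index=_introspection sourcetype=splunk_resource_usage component=IOStats
   | bin _time span=5m
   | stats sum(data.avg_total_ms) AS total_wait_ms by host, _time
   | timechart span=5m avg(total_wait_ms) by host' \
  -earliest_time '-30d@d'
```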
Average wait time for ext4/XFS (30 days):
Read/write KB for ext4/XFS (30 days):
The below graph depicts a similar read/write workload with a 24 hour timespan:
This graph shows reads/writes per second; ext4 does have more writes per second in some instances, but XFS has longer wait times.
Average wait time and IOPS for ext4/XFS (24 hours):
While I did not keep the graphs as evidence, the general trend was that newer kernel versions resulted in lower service times on ext4.
CentOS 7 / kernel 3.10 generally had lower performance than servers running Red Hat 8.5 / kernel 4.18.x, which in turn was slower than servers with Oracle Linux 8 / kernel 5.4.x UEK.
I did not have enough data to draw a firm conclusion, but there was a definite trend of servers with newer kernel versions having lower latency at the disk level.
The ext4 filesystem, for our Splunk indexer workload of over 1 million searches per day and around 350GB of data per indexer per day, was generally faster than XFS in terms of the avg_total_ms measurement.
What made a greater difference in performance was leaving 20% of the disk space on the filesystem free; this applied to both ext4 and XFS.
Finally, newer kernel versions also appear to improve I/O performance with ext4; this comparison was not done with XFS.
If you are running a Splunk indexer cluster I would suggest testing out ext4 if you are currently using XFS. Let me know what you find in the comments.
This article was originally posted on Medium: Splunk Indexers — ext4 vs XFS filesystem performance