
Is Splunk supported for forwarding log data from an IBM GPFS mount point?

sseekamp
Explorer

We are running a small GPFS cluster on AIX. I am seeing high CPU usage running a universal forwarder pointed at log files on the GPFS mount point.


dwaddle
SplunkTrust

Is Splunk itself running on the GPFS, or is it installed on JFS/JFS2 and simply reading files from GPFS? Splunk has specific filesystem-type requirements for its own data (index) storage - but I don't think there is a specific support policy about which filesystems Splunk can monitor.

I can see how Splunk's file-monitoring functionality could have a negative impact on GPFS. It issues a high volume of stat(2) system calls, and (by default) it recurses through the directory structure. Depending on how big your GPFS is, and from what level of the tree you have Splunk configured to monitor, the number of stat(2) calls could be substantial. And stat(2) is a filesystem metadata operation, which on GPFS may require additional processing, such as communicating with the other GPFS servers to get updated metadata.
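To get a feel for how many metadata operations one pass over a monitored tree costs, a rough Python sketch like the one below can be pointed at the directory a monitor:// stanza covers. It is a simplified model under stated assumptions (one listdir() per directory, one lstat() per entry, recursion into every subdirectory), not a reproduction of what the forwarder actually does; the function name walk_and_stat is made up for illustration.

```python
import os
import stat
import time


def walk_and_stat(root: str) -> tuple[int, int, float]:
    """Walk root depth-first, calling lstat() once per directory entry, the way
    a polling file monitor might on a single pass.

    Returns (directories_listed, stat_calls, elapsed_seconds). This is a
    simplified, hypothetical model for measuring metadata load, not Splunk's code.
    """
    dirs_listed = 0
    stat_calls = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            names = os.listdir(path)  # one directory read per directory
        except OSError:
            continue  # unreadable or vanished directory: skip it
        dirs_listed += 1
        for name in names:
            full = os.path.join(path, name)
            stat_calls += 1  # count every attempted metadata lookup
            try:
                st = os.lstat(full)
            except OSError:
                continue  # entry disappeared between listdir() and lstat()
            if stat.S_ISDIR(st.st_mode):
                stack.append(full)  # recurse, as a default recursive monitor would
    return dirs_listed, stat_calls, time.perf_counter() - start
```

For example, calling walk_and_stat("/gpfs/logs") (a hypothetical path) returns the directory count, the number of lstat() calls and the wall-clock time. Running it once on the GPFS mount and once on a comparable local JFS2 tree gives a rough sense of how much more each metadata lookup costs on GPFS.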

This developerWorks document mentions various GPFS tuning options, and at first sight more than one of them could affect how Splunk interacts with GPFS.

My advice would be to confirm just how much of the GPFS you're trying to monitor with Splunk, and to ask IBM GPFS support for tuning advice. Their defaults may not be appropriate for software like Splunk.
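One way to check how much of the GPFS a monitor would cover is to count the files under each top-level subdirectory of the monitored path before pointing Splunk at it. The sketch below is a hypothetical helper (the name inventory and the example path are made up); it uses only os.walk and collections.Counter from the Python standard library.

```python
import os
from collections import Counter


def inventory(root: str) -> Counter:
    """Count files under each immediate child directory of root.

    Files sitting directly in root are counted under ".". The result shows
    which subtrees a recursive monitor rooted here would have to enumerate.
    """
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)  # "." for root itself
        top = rel.split(os.sep, 1)[0]         # first path component below root
        counts[top] += len(filenames)
    return counts
```

Running inventory("/gpfs/logs").most_common(5) (hypothetical path) lists the five subtrees holding the most files; those are the first candidates for narrowing the stanza or monitoring from a lower level of the tree.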

dwaddle
SplunkTrust

And as always, if the answer is useful please upvote/accept - thanks!


halr9000
Motivator

I just manually accepted this old answer for ya @dwaddle

dwaddle
SplunkTrust

Also, make sure you are not recursing deeper than necessary. From what I understand, even if you blacklist a directory, Splunk 4.2 will still recursively readdir() and stat() through it. It excludes the files, but only after enumerating them. Depending on how your monitor:// stanzas are defined, they could be doing much more I/O than you expected.
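The difference between excluding files after enumerating them and never descending into an excluded directory can be sketched in Python. This is a hypothetical model of the behaviour described above, not Splunk code: prune=False walks everything and drops blacklisted paths afterwards, while prune=True removes matching directories before os.walk recurses into them. As a simplification, the blacklist regex is matched against the path relative to the monitored root; the function name scan is made up.

```python
import os
import re


def scan(root: str, blacklist: str, prune: bool) -> tuple[int, list[str]]:
    """Walk root and return (entries_enumerated, kept_files).

    prune=False models filter-after-enumerate: every directory is listed and
    blacklisted paths are dropped only afterwards.
    prune=True removes blacklisted directories from os.walk's dirnames so the
    walk never descends into them.
    """
    pattern = re.compile(blacklist)
    enumerated = 0
    kept: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Entries returned by this directory read, counted before any pruning.
        enumerated += len(dirnames) + len(filenames)
        if prune:
            # Editing dirnames in place stops os.walk (topdown) from recursing.
            dirnames[:] = [
                d for d in dirnames
                if not pattern.search(os.path.relpath(os.path.join(dirpath, d), root))
            ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not pattern.search(os.path.relpath(path, root)):
                kept.append(path)
    return enumerated, kept
```

With a tree holding one app.log plus an archive/ directory of three old logs and the blacklist archive, both modes keep only app.log, but the filter-after mode enumerates 5 entries versus 2 for the pruned walk. On a large archive directory on GPFS, that gap is where the extra readdir() and stat() traffic would come from.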


sseekamp
Explorer

Thanks dwaddle - we are running the forwarder from JFS2 on AIX and just watching GPFS mount points. That's good to know about the stat calls. I will look into tuning that area. Good info!
