Hey everyone. In our company we're HEAVILY discouraged from using NFS because it's significantly less robust than our enterprise SAN (the SAN can lose several chassis with no effect on data access or integrity). Also, NFS has significantly more latency than a direct clustered filesystem. As such, our storage team asked that we implement search head pooling on a clustered filesystem (gfs2). Everything works fine with the clustered filesystem, but Splunk keeps reporting errors about being unable to acquire a lock:
Error in search head pooling validate-quiet: Failed to lock /splunk/etc/users/testpath with return code -1: No such file or directory
There was an error validating your search head pooling configuration. For more information, run 'splunk pooling validate'
Error fixing dangling data: Failed to lock /splunk/etc/apps/sentinel.txt with return code -1: Success
There was an error preparing your conf files for search head pooling. For more information, run 'splunk btool find-dangling'.
When I run splunk pooling validate I get the following:
[root@bcscer-chi-s1 ~]# /opt/splunk/bin/splunk pooling validate
Error in search head pooling validate: Failed to lock /splunk/etc/users/testpath with return code -1: No such file or directory
I opened a support ticket but want to check. Previously I recall seeing search head pooling as being compatible with clustered filesystems but I can't seem to track that down now. I could just be imagining things 🙂
No, only NFS and CIFS (Samba) are currently supported for search head pooling.
I tackled this by creating a replicated glusterfs volume across all my peer nodes and then mounting it locally on each node via NFS. glusterd does its thing in the background and Splunk just deals with the NFS share. With the NFS share being locally mounted, I was able to use aggressive mount options for rsize/wsize. On top of all that, I'm using mode=6 bonding (balance-alb) for the NICs.
Clustered filesystem locking semantics have always been tricky. Although they are not officially supported, you might consider alternative clustered filesystems such as OCFS2, GPFS, or Veritas Cluster Filesystem. Any experience you gain from getting them to work might be useful to Splunk in figuring out which clustered filesystems to test and certify.
You are right about the flock() calls. Splunk currently uses flock() for many purposes, so simply disabling it would cause other issues. To support gfs/gfs2, which does not work with general flock()-style locking because of clustering, the code needs to be changed. I agree with filing an Enhancement Request.
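For anyone who wants to check whether a given mount accepts flock() at all before pointing Splunk at it, here is a rough probe in Python. It is only a sketch: it does not reproduce whatever locking calls Splunk actually makes, the function name probe_flock and the scratch file name .flock_probe are made up for illustration, and a pass on one node says nothing about how locks behave across nodes.

```python
import fcntl
import os


def probe_flock(directory):
    """Try to take and release an exclusive flock() on a scratch file.

    Returns (True, "") on success, or (False, reason) on failure.
    This is a generic diagnostic, not a copy of Splunk's locking logic.
    """
    # Hypothetical scratch file name, chosen only for this probe.
    path = os.path.join(directory, ".flock_probe")
    try:
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    except OSError as exc:
        return False, "open failed: " + str(exc)
    try:
        # Non-blocking, so the probe reports contention instead of hanging.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True, ""
    except OSError as exc:
        return False, "flock failed: " + str(exc)
    finally:
        os.close(fd)
        # Best-effort cleanup of the scratch file.
        try:
            os.unlink(path)
        except OSError:
            pass
```

Calling probe_flock("/splunk/etc/users") on each search head would show whether the directory exists and whether a basic flock() succeeds there. Holding the lock on one node and probing from another is the more telling test on a clustered filesystem.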
Splunk really needs to support some kind of enterprise solution. We can't get the performance we need out of NFS and we were hoping to try Veritas CFS, which is similar in operation to GFS2.
This is a shame - it really should be supported. We're going to open a feature request with our support rep and see if they can at least provide a way to disable their file locking since that seems to be what causes the issue.