I've got an entirely Windows-based Splunk environment, with 2 indexers and 2 search heads at the minute for redundancy.
I want to configure search head pooling to use a NAS box with Samba shares for the storage of the pool. Configuring the pool to use the share \\nas\splunk_pool works, as does creating the users and using apps.
Search performance though is terrible, even if I move the share to the local box and use that, removing any network issue.
Anyone seen this before?
You will get bad performance with a NAS/NFS share for SHP if you don't have enough IOPS (800 dedicated IOPS) and bandwidth.
With SHP, search job results (splunk/var/run/...) are also read from and written to this share, which is why you need the IOPS.
Cheers MarioM for the prompt response.
Unfortunately, the performance issues are also seen when using a local share on the search head, removing the network bottleneck.
What do you mean by local share? An internal drive? Have you tried measuring the IOPS with bonnie++?
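Since bonnie++ is mostly used on Unix-like systems and the environment in question is all Windows, a rough cross-platform check may help. The sketch below is only an approximation, not a replacement for a real benchmark: it counts how many small synced writes per second a directory sustains. The function name `measure_sync_write_ops`, the 4 KB block size and the write count are all assumptions chosen for illustration. Pass the pool path as the first argument.

```python
import os
import sys
import tempfile
import time


def measure_sync_write_ops(directory, block_size=4096, count=200):
    """Return the number of synced small writes per second achieved in `directory`.

    Hypothetical helper for illustration; block_size and count are arbitrary.
    """
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # push each write to storage instead of the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)  # leave no test file behind on the share
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # First argument: directory to test (defaults to the system temp dir).
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"{target}: {measure_sync_write_ops(target):.0f} synced writes/s")
```

Running it once against the internal drive and once against the share gives two numbers to compare. It only exercises sequential synced writes from one client, so treat the result as a lower-bound sanity check rather than a true IOPS figure.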
We did not manage to get search head pooling to perform well at our site (Solaris, NFS), and I know of at least three other customers in the same situation. Splunk should tell its customers that at the moment it is not advisable to set up SHP.
We are currently wrestling with search head pooling issues ourselves. We have 3 search heads and had originally planned to put them in a GFS2 cluster using a dedicated SAN LUN shared between all 3 (>5000 IOPS). After finding out that Splunk doesn't support direct storage (which they really should, and we've asked that it be put on the roadmap), we cut the connection for two of the servers, and now have a single server sharing the mounted LUN via NFS. With 128 NFS instances and client connections maxed out on the other search heads, it still isn't performing well. Support has been helping.
We couldn't make SHP work either.
If you have a large number of apps and users, 800 IOPS (well, NFS OPS) won't be enough. We figure about 2000 NFS OPS sustained, bursting to 4000, would do it for us. Our workload is 40% getattr, so having metadata cached on the NAS head / NFS server is key. If the server ever goes to disk to answer a getattr request (caused by an lstat call on the client), a cascade of performance-fail is unleashed, and things don't recover until users stop trying to search.
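To get a feel for how the share handles that kind of metadata traffic, something like the following sketch could be run from a search head. It walks a directory tree and times an lstat call on each entry. The name `measure_lstat_rate` and the 5000-entry cap are assumptions for illustration, and client-side attribute caching may hide some of the server's latency, so the numbers are indicative only.

```python
import os
import sys
import time


def measure_lstat_rate(root, max_entries=5000):
    """lstat up to `max_entries` entries under `root`.

    Returns (entries_checked, lstat_calls_per_second, worst_latency_ms).
    Hypothetical helper for illustration only.
    """
    count = 0
    total = 0.0
    worst = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if count >= max_entries:
                break
            t0 = time.perf_counter()
            os.lstat(os.path.join(dirpath, name))  # one metadata lookup per entry
            dt = time.perf_counter() - t0
            total += dt
            worst = max(worst, dt)
            count += 1
        if count >= max_entries:
            break
    if count == 0:
        return 0, 0.0, 0.0
    rate = count / total if total > 0 else float("inf")
    return count, rate, worst * 1000.0


if __name__ == "__main__":
    # First argument: root of the tree to scan (defaults to the current dir).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n, rate, worst_ms = measure_lstat_rate(root)
    print(f"{n} entries, {rate:.0f} lstat/s, worst call {worst_ms:.2f} ms")
```

A worst-case latency far above the average would be consistent with the server occasionally going to disk for metadata, though this single-client test cannot reproduce the load from many concurrent searches.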
We were hoping to try again with Veritas CFS instead of NFS, but I saw this today: