I just started with Splunk and I am already having some major performance problems.
My system:
* AIX 5300-08-02-0822
* 4 GB memory (2.5 GB free)
* CPU: can use up to 5 POWER6 CPUs
* Splunk 4.1.4
* 3 indexes
* Total data: 10 GB
* Daily volume: around 1 GB per day
My problem is this: if I stop all the forwarders and there is no search activity, the server has no load (100% idle, no I/O). If I start a search, for example the license calculation for the last 7 days, it takes over 2 minutes to finish. All other queries also take a very long time, and this is with maybe 1% of our expected data size.
If I look at the system resources, I can see only a few I/Os (maybe 45 to 100) and about one POWER6 CPU working (this seems normal, as a single search can't be split across multiple CPUs). If I test the disk, I get an average of 1800 I/Os.
And now the magic question: what am I doing wrong?
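To put those numbers side by side, here is a small arithmetic sketch comparing the I/O observed during a search with what the disk sustained in testing. The post doesn't give a unit, so treating both figures as operations per second is an assumption; the point is only the ratio.

```python
# Observed I/O during a search, as reported above (unit assumed: ops/second).
observed_low, observed_high = 45, 100
# Average rate the disk sustained in the poster's own disk test (same unit assumed).
tested_capacity = 1800


def utilization(observed: float, capacity: float) -> float:
    """Return observed I/O as a percentage of the tested capacity."""
    return 100.0 * observed / capacity


low = utilization(observed_low, tested_capacity)
high = utilization(observed_high, tested_capacity)
print(f"Search uses roughly {low:.1f}% to {high:.1f}% of tested disk I/O")
```

With these figures the search touches only about 2.5% to 5.6% of the measured capacity, which is why the disk itself doesn't look like the bottleneck.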
I'm curious to see what kind of response you get to this question.
I run in an environment similar to yours: the indexer runs on an LPAR with AIX 6.1 (it was on 5.3 until just a few weeks ago), 1.5 capped POWER6 CPUs, and 4 GB of RAM. We haven't experienced the performance problem you're seeing, but we're not indexing as much data as you.
(Calculating the same license usage for us takes only about 12 seconds, not the 2+ minutes you're experiencing.)
How long have you been running Splunk?
I do notice that Splunk becomes a CPU hog when it is first started, but it settles down after a little while. But it sounds like your problem goes far beyond that.
Out of curiosity, what kind of disk are you using?
Hi, we just started with Splunk. The amount of data should not be a problem; there are much larger environments out there. It has never worked well with version 4.x. I will try running Splunk 3.x on the same server, because we have another server where we once installed Splunk 3.x and it seems to run much faster, although that is a different server. Thanks anyway for your support.
This sounds like an LPAR environment. Make sure your VIOS LPARs (if you have them) have enough CPU entitlement; Splunk is very I/O heavy. (It might help to share what type of disk you are using and how it's attached.)
Is your Splunk LPAR capped or uncapped, and what is its CPU entitlement?
One tool we've found useful on our network is LPAR2RRD: it pulls HMC utilization data and loads it into an RRDtool database. http://sourceforge.net/projects/lpar2rrd/
Hi, thanks for your answer. Yes, this is an LPAR, and the disk is a SAN disk attached to the VIOS over FC; the VIOS and the LPAR are connected through vSCSI. From various tests (writing to disk, reading from disk) I'm sure there is enough performance available (as mentioned, an average of 2000 I/Os).
The CPU is uncapped with a 0.5 entitlement and 5 virtual processors, and its weight is the second highest possible; only the VIOS partitions are higher. There are enough CPU resources available. I know this looks like there aren't enough I/Os, but they are there 🙂
The 2.5 GB of free memory is interesting. Normally, AIX tries very hard to keep truly 'free' memory near zero, using it for filesystem cache instead. You can tune minperm, maxperm, and related parameters to influence that. Splunk depends heavily on the OS file cache and does not cache data itself (unlike, say, DB2 or Oracle). What does your "vmstat -v" say about numperm, minperm, and maxperm?
Hi, the values are: 1.0 minperm percentage, 80.0 maxperm percentage, 4.0 numperm percentage. That looks okay to me, as far as I understand the settings. Greetings, Christian
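If you want to pull these values out of `vmstat -v` automatically (for example, to watch them change while tuning), a minimal sketch is below. It assumes the "value name percentage" line layout shown in the values above; the sample text reuses those three values, and the function name `parse_percentages` is hypothetical.

```python
import re

# Sample lines in the "<value> <name> percentage" layout reported above.
# Assumption: real `vmstat -v` output uses this layout for its percentage lines.
SAMPLE = """\
                1.0 minperm percentage
               80.0 maxperm percentage
                4.0 numperm percentage
"""

# Only lines ending in "percentage" match; other vmstat -v lines are skipped.
LINE = re.compile(r"^\s*([\d.]+)\s+(\w+)\s+percentage\s*$")


def parse_percentages(text: str) -> dict[str, float]:
    """Hypothetical helper: map e.g. 'numperm' -> 4.0 for each matching line."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            value, name = match.groups()
            result[name] = float(value)
    return result


stats = parse_percentages(SAMPLE)
print(stats)
```

The same pattern also picks up client-page lines such as `maxclient percentage` if they appear in the full output.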
4% numperm means that only 4% of your 4 GB of memory is being used for filesystem cache. That, coupled with the large amount of 'free' memory, makes me wonder exactly what's going on. Out of curiosity, is the filesystem holding your Splunk data mounted with the "dio" or "cio" options?
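For a sense of scale, here is the arithmetic behind that 4%. It is a simplified sketch that treats the `vmstat -v` percentages as a share of the full 4 GB, which is an approximation rather than exactly how AIX accounts for memory.

```python
# 4 GB LPAR from the original question, expressed in MB.
TOTAL_MEMORY_MB = 4 * 1024


def cache_mb(percent: float, total_mb: float = TOTAL_MEMORY_MB) -> float:
    """Approximate MB represented by a vmstat -v percentage such as numperm."""
    return total_mb * percent / 100.0


print(f"numperm 4%  -> {cache_mb(4.0):.0f} MB")
print(f"maxperm 80% -> {cache_mb(80.0):.0f} MB")
```

That is roughly 164 MB of cache in use against a ceiling of roughly 3277 MB, so the file cache is barely being used.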
Hmm, I just read @Meno's comment below... AIX JFS2 uses 'client' memory, not 'perm' memory, for its filesystem cache (the original JFS used 'perm' memory). Any chance you could update your question above with the full output of vmstat -v?
Hi, you are right. I adjusted the maxclient value and this gives a bit more performance (but it is still far from what I expected). Now the filesystem cache is working and the numperm value is increasing.