Splunk Search

Slow Search Performance on AIX

Christian
Path Finder

Hello everybody,

I just started with Splunk and I am already having some major performance problems.
My system:
* AIX 5300-08-02-0822
* 4 GB memory (2.5 GB free)
* CPU (can use up to 5 POWER6 CPUs)
* Splunk 4.1.4
* 3 indexes
* Total amount of data: 10 GB
* Daily volume: around 1 GB per day

My problem is this: if I stop all the forwarders and there is no search activity, the server has no load (100% idle, no I/O). If I start a search, for example the license calculation for the last 7 days, it takes over 2 minutes until the calculation is finished. All other queries also take very long, and this is with maybe 1% of our expected data size.
If I look at the system resources, I can see a few I/Os (maybe 45 to 100) and about one POWER6 CPU working (this seems to be normal, as one search can't be split across multiple CPUs). If I test the disk, I can get an average of 1800 I/Os.

And now the magic question: what am I doing wrong?

Thx, Christian

2 Solutions

meno
Path Finder

Hi Christian, I am answering here so others could participate.

Another customer with great Splunk performance checked his AIX settings for me and allowed me to publish them (thanks, Aaron 😉).

8 CPUs 
entitled  0.4
minperm   3.0
maxperm  90.0
numperm  35.5

4 CPUs
entitled  0.2
minperm   3.0
maxperm  90.0
numperm  55.5

If I understood correctly, about 86% of their RAM is usually used for file caching. Hope this helps to compare it with your settings. Cheers, Meno

Christian
Path Finder

Hello all,

Thanks for your support, I think we solved the problem. The problem was neither I/O nor CPU; it was just the parameter maxclient%, which controls how much memory can be consumed for filesystem cache. To complete this question, here is the well-formatted output of vmstat -v:

          4194304 memory pages
          4040624 lruable pages
          2224021 free pages
                2 memory pools
           485051 pinned pages
             80.0 maxpin percentage
              1.0 minperm percentage
             80.0 maxperm percentage
             31.6 numperm percentage
          1277976 file pages
              0.0 compressed percentage
                0 compressed pages
             31.6 numclient percentage
             80.0 maxclient percentage
          1277976 client pages
                0 remote pageouts scheduled
              370 pending disk I/Os blocked with no pbuf
                0 paging space I/Os blocked with no psbuf
             2228 filesystem I/Os blocked with no fsbuf
             3261 client filesystem I/Os blocked with no fsbuf
              894 external pager filesystem I/Os blocked with no fsbuf
                0 Virtualized Partition Memory Page Faults
             0.00 Time resolving virtualized partition memory page fault

The problem with the license calculation still exists but seems to be a different problem. Search performance is OK for the moment.

Setting vmo Parameter

    vmo -p -o maxperm%=80
    vmo -p -o maxclient%=80
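
For anyone comparing their own system against the numbers above, here is a minimal Python sketch that parses a few lines of `vmstat -v` output and reports how much memory is sitting in the filesystem cache versus truly free. The sample text is copied from the output in this post. The 4 KB page size and the helper names (`parse_vmstat_v`, `pages_to_gib`) are assumptions made for illustration, not part of any AIX or Splunk tool.

```python
import re

# Excerpt of the `vmstat -v` output posted above.
VMSTAT_V = """\
          4194304 memory pages
          4040624 lruable pages
          2224021 free pages
           485051 pinned pages
              1.0 minperm percentage
             80.0 maxperm percentage
             31.6 numperm percentage
          1277976 file pages
             31.6 numclient percentage
             80.0 maxclient percentage
          1277976 client pages
"""

PAGE_SIZE = 4096  # assumption: vmstat -v counts 4 KB pages


def parse_vmstat_v(text):
    """Map each label (e.g. 'numclient percentage') to its numeric value."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d.]+)\s+(.+?)\s*$", line)
        if match:
            stats[match.group(2)] = float(match.group(1))
    return stats


def pages_to_gib(pages):
    """Convert a page count to GiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / 2**30


stats = parse_vmstat_v(VMSTAT_V)
cache_gib = pages_to_gib(stats["client pages"])
free_gib = pages_to_gib(stats["free pages"])
headroom = stats["maxclient percentage"] - stats["numclient percentage"]

print(f"filesystem cache: {cache_gib:.2f} GiB, free: {free_gib:.2f} GiB")
print(f"numclient is {headroom:.1f} points below maxclient")
```

A numclient value that keeps climbing toward maxclient while free pages shrink would suggest the cache is now being used, which is the behaviour described in this thread after the change.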

mzorzi
Splunk Employee

Hi,

Since you have problems with search performance, I think you should give the LPAR more CPU entitlement. Possibly, as a test, decrease the number of lcpu, obviously with fewer users connected and fewer scheduled saved searches.

Can you try to verify the performance against one of the applications provided, like the *nix app?

Christian
Path Finder

Hi, I tried it with the following setup: entitlement 2.0, uncapped, open to 8.0. As recommended, I also increased the memory to 8 GB (usage, even with filesystem cache, is around 2-3 GB). The result is the same. As a reference for performance I always use the license calculation, so I think it's not our search app that causes the problems, or am I wrong?

Christian
Path Finder

Hi, I have now checked the settings and also increased the value for maxclient percentage to 75%. This gives a bit more performance, and the filesystem cache is now used. But it's still too slow; the block size of Splunk 4 during the search is between 4 KB and 27 KB, which is not very high.
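
To put those block sizes in perspective, here is a rough back-of-the-envelope calculation in Python. It combines the 4 KB to 27 KB block sizes from this post with the 45 to 100 I/Os and the 1800 I/O benchmark from the original question. Treating these figures as per-second rates that describe the same workload is an assumption, and the function name is made up for illustration.

```python
# Figures quoted in this thread; combining them is an assumption.
OBSERVED_IOPS = (45, 100)   # I/Os seen during a search
BLOCK_KB = (4, 27)          # block sizes seen during a search
BENCHMARK_IOPS = 1800       # what the disk delivered in a standalone test


def throughput_kb_per_s(iops, block_kb):
    """Throughput is simply I/O operations per second times block size."""
    return iops * block_kb


low = throughput_kb_per_s(OBSERVED_IOPS[0], BLOCK_KB[0])
high = throughput_kb_per_s(OBSERVED_IOPS[1], BLOCK_KB[1])
used_low = OBSERVED_IOPS[0] / BENCHMARK_IOPS * 100
used_high = OBSERVED_IOPS[1] / BENCHMARK_IOPS * 100

print(f"search throughput: {low} to {high} KB/s")
print(f"share of benchmarked I/O rate in use: {used_low:.1f}% to {used_high:.1f}%")
```

Under these assumptions the search reads at most a few MB per second and uses only a small fraction of what the disk can deliver, which fits the picture of a search that is not limited by raw disk I/O.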

dwaddle
SplunkTrust

This sounds like an LPAR environment. Make sure your VIOS LPARs (if you have them) have enough CPU entitlement - Splunk is very I/O heavy. (It might help to share what type of disk you are using and how it's attached)

Is your Splunk LPAR capped or uncapped, and what is its CPU entitlement?

One tool we've found useful on our network is LPAR2RRD - it pulls HMC utilization data and loads it into an RRDtool database. http://sourceforge.net/projects/lpar2rrd/

Christian
Path Finder

                0 external pager filesystem I/Os blocked with no fsbuf
                0 Virtualized Partition Memory Page Faults
             0.00 Time resolving virtualized partition memory page faults

Christian
Path Finder

Output of vmstat -v
          2097152 memory pages
          1996704 lruable pages
          1451545 free pages
                2 memory pools
           189140 pinned pages
             80.0 maxpin percentage
              1.0 minperm percentage
             80.0 maxperm percentage
             14.2 numperm
           284766 file pages
              0.0 compressed percentage
                0 compressed pages
             14.2 numclient percentage
             75.0 maxclient percentage
           284766 client pages
                0 remote pageouts scheduled
                0 pending disk I/Os blocked with no pbuf
                0 paging space I/Os blocked with no psbuf
             2228 filesystem I/Os blocked with no fsbuf
             3261 client filesystem I/Os blocked with no fsbuf

Christian
Path Finder

Hi, you are right. I adjusted the value for maxclient and this gives a bit more performance (but still way below what is needed). Now the filesystem cache is working and the numperm value is increasing.

dwaddle
SplunkTrust

Hmm, just read @Meno's comment below... AIX JFS2 uses 'client' memory and not 'perm' memory for its filesystem cache. (Original JFS1 used 'perm' memory) Any chance you could just update your question above with full output of vmstat -v ?

dwaddle
SplunkTrust

4% numperm means that only 4% of your 4GB of memory is being used for filesystem cache. That, coupled with the large amount of 'free' memory makes me wonder exactly what's going on. Just as a curiosity, is the filesystem with your Splunk data on it mounted with the "dio" or "cio" options?
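
The arithmetic behind that observation can be sketched in a few lines of Python. The inputs are the figures quoted in this thread (4 GB of memory, 4% numperm, 2.5 GB free). Applying the percentage to total memory follows the wording above and is a simplification, so treat the result as a rough estimate only.

```python
# Figures quoted in this thread.
total_gb = 4.0      # installed memory
numperm_pct = 4.0   # numperm percentage reported by vmstat -v
free_gb = 2.5       # memory reported as free

# Simplification: apply numperm to total memory, as the post above does.
cache_mb = total_gb * numperm_pct / 100 * 1024
free_pct = free_gb / total_gb * 100

print(f"filesystem cache: about {cache_mb:.0f} MB")
print(f"free memory: {free_pct:.1f}% of RAM")
```

With roughly 10 GB of indexed data and only a small fraction of that held in cache, most reads during a search would have to go to disk while more than half of the RAM stays unused.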

Christian
Path Finder

Hi, the values are: 1.0 minperm percentage, 80.0 maxperm percentage, 4.0 numperm percentage. Looks okay to me, as I understand the settings. Greetz, Christian

dwaddle
SplunkTrust

The 2.5GB of free memory is interesting. Normally, AIX's filesystem cache tries very hard to keep truly 'free' memory near zero, using it for filesystem cache instead. You can tune minperm/maxperm and such to influence that. Splunk greatly depends on the OS file cache and does not cache data itself (unlike, say, DB2 or Oracle). What does your "vmstat -v" say about numperm, minperm, and maxperm?

Christian
Path Finder

Hi, thanks for your answer. Yes, this is an LPAR and the disk is a SAN disk connected to the VIO (FC); VIO and LPAR are connected through vscsi. After running various tests (writing to disk, reading from disk), I'm sure there is enough performance available (as said, there is an average of 2000 I/Os).
CPU is uncapped with 0.5 entitlement and 5 virtual processors. The weight is set to the second-highest possible value; only the VIOS are higher. There are enough CPU resources available. I know this looks like there are not enough I/Os, but they are there 🙂

Branden
Builder

I'm curious to see what kind of response you get to this question.

I run in an environment similar to yours- Indexer running on an LPAR running AIX 6.1 (Was on 5.3 just a few weeks ago tho), 1.5 POWER6 CPUs capped, 4 GB of RAM. We haven't experienced the performance problem that you're experiencing, but we're not indexing as much data as you.

(Calculating the same license usage for us takes only about 12 seconds, not the 2+ minutes you're experiencing.)

How long have you been running Splunk?

I do notice that Splunk becomes a CPU hog when it is first started, but it settles down after a little while. But it sounds like your problem goes far beyond that.

Out of curiosity, what kind of disk are you using?

Christian
Path Finder

Hi, we have only just started with Splunk. The amount of data should not be a problem; there are larger environments around. It never worked well with version 4.x. I will try to run Splunk 3.x on the same server, because we have another server where we once installed Splunk 3.x and it seems to run much faster, but that is a different server. Thanks anyway for your support.
