Indexer Capacity Planning - linking indexing and search performance: how does one affect the other?
I'm attempting to plan an upgrade of our Splunk instance from an ancient 6.4.1 to a brand new 7.2 instance and as part of that I'm trying to work out what sort of capacity I need...
This seems like it should be an easy task, as I'm already ingesting data into an existing Splunk setup, so I can get real stats on how many searches are being run and how much data is being ingested.
However, after reading the capacity planning documents, I can't find anything that indicates what impact running searches has on indexing performance. For example, there's the reference host specification, which gives an idea of what indexer performance I can expect if no searches are being run, and there is also a guide on the resources used when searches are run.
But nothing linking the two?
If I'm ingesting 400GB of data per day and averaging about 10 concurrent searches (during office hours), how will that impact the indexing rate of a reference-specification host?
I think it can be judged mainly by the number of CPU cores.
Since 10 simultaneous searches use roughly 10 cores and index processing uses about 1.5 cores, delays should not occur if the host has more than 12 cores.
There should be no problem if your specification and configuration (1 search head, 2 indexers) follow the recommended configurations.
Also, if the amount of data you ingest at once is large, tuning is necessary.
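The rule of thumb above can be sketched as a back-of-envelope calculation. The constants here (one core per concurrent search, ~1.5 cores for the indexing pipeline) are assumptions taken from this thread, not measured figures for any particular deployment:

```python
# Rough core estimate: ~1 core per concurrent search plus
# ~1.5 cores for the indexing pipeline (assumed, per the thread).
CORES_PER_SEARCH = 1.0
INDEXING_CORES = 1.5


def cores_needed(concurrent_searches, pipelines=1):
    """Estimate CPU cores needed on a single indexer."""
    return concurrent_searches * CORES_PER_SEARCH + pipelines * INDEXING_CORES


print(cores_needed(10))  # 10 searches + indexing -> 11.5, so a 12-core host is borderline
```

By this estimate, 10 concurrent searches plus indexing lands right at the edge of a 12-core reference host, which is why the answer suggests exceeding 12 cores.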
Then again, your current stats about the number of concurrent searches may be totally irrelevant if your new deployment has faster CPUs or faster storage, or if performance improvements in Splunk itself cause searches to complete much faster (and as a result your concurrent search count drops considerably).
Have a look at these documentation pages:
Regarding search impact on indexers: http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Accommodatemanysimultaneoussearches
And guidelines regarding the number of instances needed for certain numbers of users vs. data ingestion volumes:
Thanks for the information. I had read those pages previously and still find parts of them a bit vague, for example:
An indexer that meets the reference hardware requirements can ingest up to 300GB/day while supporting a search load.
How much 'search load'? It is documented that the reference hardware can support 1.7TB of ingest if it has no search load. So there must be a sliding scale of search load reducing indexing performance?
The table further down that page also helps, but again it uses the vague figure of 'total users', which I can't find defined anywhere (and a 'user' could be someone running a single query once every 10 minutes, or someone with a big dashboard open and refreshing regularly).
Yeah, there is always a massive "it depends" with all of this. The docs provide some rules of thumb, but it is impossible to give any hard formulas, I guess. As you correctly state, each user is different, but likewise each search is different. Performance also heavily depends on your data distribution, the type of data, the amount of extractions / automated lookups, etc.
In the end it comes down to choosing a sensible starting point based on the high level guidelines and then closely monitoring performance to see if you need to scale out (which fortunately is relatively easy) or tune certain settings.
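As a sketch of such a starting point, here is a minimal sizing calculation, assuming the documented figure of roughly 300GB/day per reference indexer under search load and an arbitrary headroom factor for growth (both values are assumptions to be tuned, not recommendations):

```python
import math

# Assumed capacity of one reference indexer while supporting a
# search load, per the capacity planning docs cited in this thread.
PER_INDEXER_GB_PER_DAY = 300


def indexers_needed(daily_ingest_gb, headroom=1.2):
    """Starting-point indexer count for a given daily ingest,
    with a headroom factor for growth and uneven load."""
    return math.ceil(daily_ingest_gb * headroom / PER_INDEXER_GB_PER_DAY)


print(indexers_needed(400))  # 400GB/day * 1.2 headroom -> 2 indexers
```

For the 400GB/day in the question, this suggests 2 indexers as a baseline, which you would then validate against real search performance and scale out from if monitoring shows contention.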