I'm trying to do some basic performance benchmarking using SplunkIT on a replacement Splunk instance I'm building. Specifically, I'm comparing different Linux file systems to see how they perform relative to each other (I'm considering ext3, xfs, and ext4).
The problem I'm running into is that the indexing performance seems to be strongly impacted by indexing throttling, so the results don't accurately reflect the differences in file system performance; instead, I'm mostly seeing the difference in how long each test was stuck in "throttled" mode.
Here are some example splunkd messages I'm seeing:
09-16-2011 23:04:32.733 WARN databasePartitionPolicy - applying indexing throttle for /opt/splunk/var/lib/splunk/splunkit_idxtest/db because bucket has too many tsidx files, is your splunk-optimize working?
09-16-2011 23:04:33.462 WARN databasePartitionPolicy - released indexing throttle for /opt/splunk/var/lib/splunk/splunkit_idxtest/db
Does anyone have any experience with the impact that index throttling has on the SplunkIT indexing test?
Whoops! I found the real issue. The problem is not the indexing throttle (which only appears to kick in every once in a while, for relatively short periods of time). The real issue is that I was monitoring Splunk performance with a dashboard that was running a real-time search across all of my indexes. These events in splunkd.log show the real culprit:
09-19-2011 11:22:51.382 INFO IndexProcessor - rtsearch connection established, filter = '[ OR index::_internal [ AND index::* [ OR index::history index::main index::os index::sample index::sos index::splunkit_idxtest index::summary ] ] ]', _activeStreams = 1, queue_size = 10000, blocking = FALSE, max_block_secs = 0
09-19-2011 11:24:36.450 INFO IndexProcessor - rtsearch connection terminated, filter = '[ OR index::_internal [ AND index::* [ OR index::history index::main index::os index::sample index::sos index::splunkit_idxtest index::summary ] ] ]', _actionStreams = 0
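(Side note for anyone hunting the same thing: a search roughly like the one below should surface those rtsearch connections in splunkd.log. It's just a sketch built from the message text above, so adjust the terms for your environment.)
index=_internal sourcetype=splunkd IndexProcessor "rtsearch connection" | rex "rtsearch connection (?<action>established|terminated)" | timechart count by action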
As soon as I stopped this search, the indexing performance jumped right back up to where it should be.
Mystery solved.
BTW, it looks like the index throttle was never applied on my system for more than 3 seconds at a time, and normally it was right around 1 second. For anyone interested, here's a search that gives some additional insight into the length of time the indexing throttle was applied:
index=_internal sourcetype=splunkd databasePartitionPolicy indexing throttle | rex "throttle for (?<path>\S+)" | transaction path startswith="applying" endswith="released" | timechart max(duration)
Hi Lowell,
This is a great use case for SplunkIt - if you have results for the different file systems, and assuming you'd like to share them, could you send an email to splunkit @splunk?
Thanks!
Throttling is an inherent part of the indexing process. Incoming data is slowed down if the splunk-optimize process cannot keep up with optimizing the files on disk, so the indexing rate is in fact affected by the performance of the disk and its ability to service the demands of splunk-optimize. It is also possible that splunk-optimize bottlenecks on CPU, or that there aren't enough splunk-optimize threads, and maybe that's what you're seeing. In that case, you would first need to make sure that your machine has a sufficient number of CPUs/threads. If it does, you could try a couple of things. First, try raising the maxConcurrentOptimizes setting to, say, 6 or 10. Next, you might slightly increase maxMemMB (e.g., to 20 or 50). Those should, respectively, increase the number of splunk-optimize processes available for the index and decrease the amount of on-disk optimization required.
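For example, a minimal indexes.conf sketch for the test index above might look like this (assuming the splunkit_idxtest index already exists and you only need to override these two settings; the values are illustrative, not recommendations):
# indexes.conf (for example in $SPLUNK_HOME/etc/system/local/)
[splunkit_idxtest]
# allow more splunk-optimize processes to run against this index at once
maxConcurrentOptimizes = 6
# buffer more index data in memory before it is written out, which should
# reduce the number of small tsidx files that splunk-optimize has to merge
maxMemMB = 20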