Doing a simple search (index=test)
over 10 million events gives me a browsing speed of around 5,000 events per second. The timeline build is extremely slow and CPU load is at 100%. Running the same search in Fast mode gives about 20,000 events per second, which is still too slow in my opinion. This is a modern RAID array with high IOPS and a Xeon CPU — lots of cores and 64 GB of RAM. It's faster on my laptop with a similar dataset.
How to investigate?
If it's a Linux server, did you disable THP (transparent huge pages)? It's one of the main causes of searches slowing down over time, although it shouldn't result in high CPU usage (AFAIK).
Ref: https://answers.splunk.com/answers/188875/how-do-i-disable-transparent-huge-pages-thp-and-co.html
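A quick way to check (and disable) THP at runtime — a sketch; the sysfs path below is the common upstream one, and some RHEL/CentOS kernels expose it under /sys/kernel/mm/redhat_transparent_hugepage instead:

```shell
# Report the current THP setting; the bracketed value is the active one,
# e.g. "always madvise [never]"
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
  thp_status=$(cat "$thp_file")
else
  thp_status="THP interface not present on this kernel"
fi
echo "THP: $thp_status"

# To disable until the next reboot (requires root):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

The linked answer above covers making the change persistent across reboots.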
What type of data is in the index? How large are the events? How saturated is your Splunk installation?
The best place to start is by analyzing the search job inspector. Check that there aren't any lookups or field extractions that are slowing you down. Is this a distributed installation? If so, look at how long it took to stream the data back (dispatch.stream.remote) and identify any slow search peers.
Use the Distributed Management Console to check the health of Splunk.
Also use OS-level tools to troubleshoot system performance — vmstat, iostat, top, lsof — to look for processes hogging CPU or memory, or high iowait times on your disk array.
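If sysstat isn't installed, a dependency-free sketch of the same iowait check reads the cumulative CPU counters from /proc/stat (field order per the Linux proc(5) layout):

```shell
# First line of /proc/stat: cpu user nice system idle iowait irq softirq ...
# Values are cumulative ticks since boot; sample twice and diff for a rate.
read -r cpu user nice system idle iowait rest < /proc/stat
busy=$((user + nice + system))
total=$((busy + idle + iowait))
echo "since boot: busy=${busy} iowait=${iowait} of ${total} ticks"
```

`vmstat 1 5` and `iostat -x 1 5` give the same counters as rates, which is usually what you want while a slow search is actually running.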
Beyond that, searching index=test is a terrible way to test search performance. You have to bring back every event in the index for the given timeframe.
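If the goal is to benchmark raw index scan speed rather than event retrieval, a lighter-weight comparison point (a sketch — tstats reads the index-time tsidx files instead of decompressing raw events) is:

```
| tstats count where index=test by _time span=1m
```

If that runs dramatically faster than the plain index=test search, the time is going into raw-data decompression, field extraction, and timeline rendering rather than the index lookup itself.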
Type of data: mostly Cisco ASA logs, no more than 300 bytes per event.
It's a distributed setup — two indexers, one search head — but all the data is on one indexer, and I am temporarily running searches directly on it to benchmark.
index=test is for testing the worst-case scenario.
I've done all the Linux optimization I'm aware of. The bottleneck is CPU.
I'm surprised that a fast search without extraction (as I understand it) performs so poorly. 200 bytes multiplied by 10k events is only 2 MB per second of raw data read; I'd expect something near my I/O performance of 500 MB/s.
Am I missing something?
Bottleneck is cpu
Where did you get this information from? Did you check the job inspector as previously suggested? If so, what is using the most resources or time in the list?
cheers, MuS
I presume CPU because iostat shows the drives are not saturated, and top shows one process at 100% usage in userspace.
Job inspector output for a short timeframe — circa 200,000 events in circa 20 seconds:
Duration (seconds)  Component                                   Invocations  Input count  Output count
 0.02               command.fields                              26           197,038      197,038
19.78               command.search                              26           -            197,038
 0.38               command.search.calcfields                   25           197,038      197,038
 0.15               command.search.fieldalias                   25           197,038      197,038
 0.08               command.search.index                        26           -            -
 0.00               command.search.index.usec_1_8               6            -            -
 0.00               command.search.index.usec_8_64              36           -            -
 9.75               command.search.kv                           25           -            -
 6.51               command.search.typer                        25           197,038      197,038
 1.41               command.search.rawdata                      25           -            -
 0.83               command.search.lookups                      25           197,038      197,038
 0.32               command.search.tags                         25           197,038      197,038
 0.00               command.search.summary                      26           -            -
 0.00               dispatch.check_disk_usage                   2            -            -
 0.00               dispatch.createdSearchResultInfrastructure  1            -            -
 0.10               dispatch.evaluate                           1            -            -
 0.10               dispatch.evaluate.search                    1            -            -
 9.24               dispatch.fetch                              27           -            -
19.77               dispatch.localSearch                        1            -            -
 0.02               dispatch.preview                            16           -            -
 0.04               dispatch.readEventsInResults                1            -            -
19.78               dispatch.stream.local                       26           -            -
 8.14               dispatch.timeline                           27           -            -
 0.17               dispatch.writeStatus                        38           -            -
 0.02               startup.configuration                       1            -            -
 0.06               startup.handoff                             1            -            -