Hi guys,
Please forgive my English first; it is not my native language.
I have a distributed search deployment that consists of one indexer instance and one search head instance. Their host specifications are as follows:
indexer:
CPU: E5-2682 v4 @ 2.50GHz / 16 cores
Memory: 32 GB
Disk: 1.8 TB (5000 IOPS)
search head:
CPU: E5-2680 v3 @ 2.50GHz / 16 cores
Memory: 32 GB
Disk: 200 GB (3400 IOPS)
I have 170 GB of raw logs ingested into the Splunk indexer every day, across 5 indexes. One of them, named tomcat, is 1.3 TB in size and stores the logs of the backend application. That index is now full. When I search for events in this index, the search is very slow. My search is:
index=tomcat uri="/xxx/xxx/xxx/xxx/xxx" "xxxx"
I'm sorry to use xxx to represent certain words; the API paths involve privacy issues. I am searching for events over the last 7 days, and no results were returned for a long time. I even tried searching the logs for one specific day, but the search speed is still not ideal.
If I wait about 5 minutes, I gradually see some events appear on the page. I checked the job inspector and found that the execution costs of command.search.index, dispatch.finalizeRemoteTimeline, and dispatch.fetch.rcp.phase_0 are high,
but these numbers don't help me much. I tried bypassing the search head and running the search directly in the indexer's web UI, but the search was still slow. Does this mean there is no bottleneck in the search head?
During the search, I watched the host monitoring metrics; the screenshot is as follows:
It seems that the indexer server resources are not exhausted. So I tried restarting the indexer's splunkd service, and unexpectedly the slowness seemed to be relieved: with the same search query and time range, the events were gradually returned, although the speed still did not seem particularly fast.
Just as I was celebrating that I had solved the problem, my colleague told me the next day that the search speed was unsatisfactory again, although results were still returned gradually during the search. So this is not a real solution; it only gives temporary relief.
So, how do you think I should solve this slow search problem? Should I scale out the indexers horizontally and create an indexer cluster?
This long dispatch phase means that it is taking a very long time for Splunk to spawn the search on your indexer. At first glance it would suggest network problems (are both components on prem or in the cloud? If in the cloud, are they in the same cloud zone?) or some DNS issue (causing timeouts).
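One way to check for connectivity or timeout problems between the two instances, just as a rough sketch rather than a definitive diagnosis, is to look at Splunk's own internal logs for errors around the time of a slow search, for example:
index=_internal sourcetype=splunkd log_level=ERROR OR log_level=WARN
If the dispatch delay comes from timeouts between the search head and the indexer, errors from the distributed search components usually show up there.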
They're on the same network, using the intranet, and have 100MB of bandwidth.
Both are cloud instances.
Another technique you can use is a TERM() search. TERM() searches are much faster than raw data searches. Let's assume your uri is
/partner/a/b/c/d
you can do
index=tomcat TERM(a) TERM(b) TERM(c) TERM(d) uri=/partner/a/b/c/d
How much this helps will depend on how unique the terms are, but it will certainly reduce the amount of data that has to be looked at.
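If you want to see how selective each term is before building the full search, a rough sketch (using the index name and the example terms above, and assuming a Splunk version where tstats accepts TERM() in the where clause) is:
| tstats count where index=tomcat TERM(a)
The lower the count for a term, the more that term narrows the search.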
In the job properties, look at the scanCount property, which shows the number of events scanned to produce the results.
Is the search slow even for just the last 60 minutes of data, and does the performance degrade linearly as you increase the time range?
How many events do you get per 24h period? (A quick way to check is sketched after these questions.)
Are you just doing a raw event search over 7 days to demonstrate the problem, or is this part of your use case?
Take a look at the phase_0 property in the job properties to see what your expanded search is.
You can look at the monitoring console to see what the Splunk server metrics look like; perhaps there is a memory issue. Take a look at the resource usage dashboards.
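On the daily volume question above: if you are not sure of the number, one quick way to count events per day (just a sketch using the tomcat index from the post) is:
| tstats count where index=tomcat by _time span=1d
tstats reads only the index files, so it returns quickly even on a large index.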
Thank you for your reply. I have about 1 billion events ingested into the Splunk indexer every day. I checked the monitoring console and didn't see any abnormalities.
According to this chart, a single indexer should be enough for that volume of data. A lot depends on the number of searches being run, however, which is something Splunk's chart tries to capture in the "number of users" figures.
If you have fewer than 24 users but still do a lot of searching, it may be worthwhile to add an indexer or two. Once the data is rebalanced among the indexers, each will perform a fraction of the work and the search should complete in a fraction of the current time.
Also, consider adding a sourcetype specifier to the base search as that can help improve performance.
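For example, the base search could look like the line below; the sourcetype name here is only a placeholder, so substitute whatever sourcetype your tomcat logs actually use:
index=tomcat sourcetype=tomcat_access uri="/xxx/xxx/xxx/xxx/xxx" "xxxx"
Restricting the sourcetype lets Splunk skip events from other sourcetypes stored in the same index.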
Thank you. After investigating the problem, it turned out to be a super sparse search, and I needed more IOPS to solve it. I raised the IOPS to 25,000 and the search speed improved amazingly. It's done!