I have a 5GB license on my Splunk deployment, but abruptly on a single day, 24GB of logs got indexed. After that, the search heads became very slow.
I have indexer clustering in my Splunk setup (3 peers & 1 master). If I search for the logs from the master indexer, it returns the results properly, but not if I do the same from the search heads.
Sounds a bit tricky. When you say the search heads became slow, is it just when searching that day's logs, when there was the roughly 5x spike?
How many concurrent searches is it running? How many real-time searches? Are there any heavy search jobs running? Check CPU and memory utilization, etc. That's where I would start. Go through the Job Inspector and see if you find any heavyweights.
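If it helps, here's the kind of search I'd start with to spot search concurrency building up (a sketch, assuming the default `_audit` index is populated; adjust the span and time range to cover the slow period):

```
index=_audit action=search info=granted
| timechart span=15m count BY user
```

A sudden jump in granted searches per user around the slow window would point at search load on the search heads rather than at the indexed data itself.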
Please post additional info.
First, thanks for the reply.
Around 30 to 35 concurrent and real-time searches are running. But I couldn't call them heavy searches, because on normal days they were (and are) returning proper results and working fine.
CPU and memory utilization were good.
It happened only in that particular time period: between 3:00AM and 3:15AM we got 23GB of data from around 50 hosts. After that, search head performance was very slow for the whole day, but the indexer master was still working fine.
Now both the indexer master and the search heads are working fine.
But I just wanted to know how this affected the search heads and their performance.
Hi Madhan45, if you can run the same search from one search head and it's fine, but not fine when running it in the cluster, it seems there is something going on with the cluster. I would take a look at it from a hardware perspective. Are the search heads pegged from a CPU/memory perspective?
Do you have the Distributed Management Console set up? This can be a great help for getting insight into what the cluster is actually doing.
You mention that this issue seemed to have started the same day that 24GB was indexed. Was this surge of data all in one index? Does the SHC perform ok when searching anything but the index that had the huge amount of data?
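One quick way to answer the "all in one index" question (a sketch, assuming the standard `license_usage.log` on the license master; `b` and `idx` are its built-in fields for bytes and index name):

```
index=_internal source=*license_usage.log* type="Usage"
| eval GB = b/1024/1024/1024
| timechart span=15m sum(GB) BY idx
```

If the 3AM surge shows up under a single `idx`, that narrows the comparison down to searches that touch that index.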
I guess my point here is that it might not be the case that the extra data has any impact on the SHC issue.
If you can't get any visibility into the cluster through the DMC, or through OS-level performance instrumentation (sar, top, etc.), I'd throw a Hail Mary and issue a rolling restart to the SHC (i.e. turn it off and on again). A reinitialization of the cluster might serve as a solution to whatever is going on with it. If that fails, you'd need to open a Splunk support ticket, generate diags, and all that fun stuff.
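For reference, the rolling restarts can be issued from the CLI (hedging here: these are the standard commands, run from `$SPLUNK_HOME/bin` with admin credentials; `cluster-peers` is run on the master node, and `shcluster-members` only applies if the search heads are actually in a search head cluster):

```
# On the cluster master, rolling-restart the indexer peers:
splunk rolling-restart cluster-peers

# On a search head cluster member, rolling-restart the SHC:
splunk rolling-restart shcluster-members
```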
Please let me know if this helps!
First of all, thanks for the reply.
As of now, per the information we gathered, we were good on CPU and memory on the search heads.
We couldn't state that there was something wrong in the SHC, because from the next day everything was back to normal.
We don't have the DMC, and as you mentioned, we immediately restarted both search heads and all the indexers (3 peers & 1 master) when this problem occurred.
We have an indexer cluster (3 peers and 1 master), and the data didn't get indexed on only one indexer.