I've been trying to troubleshoot a search that is incredibly slow. After paring down the events, it turns out that when I search on just one specific 2-second timespan, the search throws an error and slows to a crawl, scanning only about 10 events every few seconds, basically spinning its wheels. Every time, and only for this two-second timespan.
Here's the error I see in the search bar:
[indexer hostname] Events may not be returned in sub-second order due to search memory limits in limits.conf:[search]: maxrawsizeperchunk. See search.log for more information.
So it only happens when the search gets to this two-second span for this one particular index. Also, I can't figure out which search.log it's talking about because there are hundreds on the box.
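On "which search.log": each search that Splunk dispatches gets its own directory under $SPLUNK_HOME/var/run/splunk/dispatch/<search_id>/, containing that search's search.log. A minimal sketch of how you could find the one mentioning this limit (the layout is faked in a temp directory here so the snippet is runnable; on a real box, point `splunk_home` at your actual install, e.g. /opt/splunk):

```python
import os
import tempfile

# Fake the dispatch layout in a temp dir so this sketch runs anywhere;
# the directory name "1700000000.123" stands in for a real search id.
splunk_home = tempfile.mkdtemp()
sid_dir = os.path.join(splunk_home, "var/run/splunk/dispatch/1700000000.123")
os.makedirs(sid_dir)
with open(os.path.join(sid_dir, "search.log"), "w") as f:
    f.write("WARN ... limits.conf:[search]: maxrawsizeperchunk ...\n")

def logs_mentioning(splunk_home, needle="maxrawsizeperchunk"):
    """Return the search.log paths under dispatch/ whose contents mention `needle`."""
    hits = []
    dispatch = os.path.join(splunk_home, "var/run/splunk/dispatch")
    for root, _dirs, files in os.walk(dispatch):
        for name in files:
            if name == "search.log":
                path = os.path.join(root, name)
                with open(path) as f:
                    if needle in f.read():
                        hits.append(path)
    return hits

print(logs_mentioning(splunk_home))  # the one search that tripped the limit
```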
I've seen this issue before where there are too many events within a sub-second timespan for Splunk to process. Newer versions of Splunk are aware of this and actually increment the timestamp at index time to prevent it from happening.
Yes, you could increase maxrawsizeperchunk (Splunk recommended that we set this to a really high value, or unlimited, but it caused browser issues for us).
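For reference, a hedged sketch of what that change would look like. The setting lives in limits.conf under the [search] stanza on the indexers; the value below is illustrative, not a recommendation, and you should confirm the exact key name and default against your version's limits.conf.spec before deploying:

```ini
# limits.conf -- sketch only; verify against your version's limits.conf.spec
[search]
# Caps how many bytes of _raw a single chunk read may accumulate.
# 0 means unlimited, which (as noted above) caused browser issues for us.
max_rawsize_perchunk = 200000000
```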
You should also look into that data to see why there are so many events being returned within the same sub-second. It could be a bad timestamping issue, or a lack of a timestamp in your events causing Splunk to fall back to the current time as the timestamp.
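One way to confirm that kind of pile-up: bucket the event timestamps by whole second and look for suspicious clumps. A minimal sketch with made-up sample data (epoch seconds as floats):

```python
from collections import Counter

# Made-up sample: 5,000 events share one second (e.g. a missing-timestamp
# fallback stamped them all alike), plus two normally spaced events.
timestamps = [1700000000.0] * 5000 + [1700000001.25, 1700000002.5]

per_second = Counter(int(ts) for ts in timestamps)

# Flag any second holding an implausibly large share of events;
# the 1,000 threshold is arbitrary -- tune it to your data volume.
hot = {sec: n for sec, n in per_second.items() if n >= 1000}
print(hot)  # {1700000000: 5000} -- a clump worth inspecting
```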
Was this timestamp-incrementing behavior at index time added or strengthened only in recent releases?
I believe this particular message (maxrawsizeperchunk) has to do with the SIZE of the events.
The _raw is the text of the events, so rawsizeperchunk has to do with what happens when we read a chunk. A chunk can contain different quantities of events based on a few factors (batch mode vs. non-batch, size of buckets, distribution across time, etc.), but is typically capped at ten thousand (10,000) events. To avoid bad scenarios like reading in 10,000 events that are each 1 MB of text (oops, 10 GB of raw and your box falls over), there's also a ceiling on the maximum amount of data we're willing to read in from _raw.
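The two caps described above can be sketched like this. This is an illustration of the idea, not Splunk's internal implementation; the names and limits are assumptions chosen to mirror the numbers in the explanation:

```python
MAX_EVENTS_PER_CHUNK = 10_000           # typical event-count cap, per the text
MAX_RAW_BYTES_PER_CHUNK = 100_000_000   # illustrative byte ceiling on _raw

def take_chunk(events):
    """Read raw event strings into one chunk, stopping at whichever
    ceiling is hit first: event count or accumulated _raw bytes."""
    chunk, raw_bytes = [], 0
    for raw in events:
        chunk.append(raw)
        raw_bytes += len(raw)
        if len(chunk) >= MAX_EVENTS_PER_CHUNK or raw_bytes >= MAX_RAW_BYTES_PER_CHUNK:
            break
    return chunk

# Small events: the 10,000-event cap is hit first.
print(len(take_chunk(iter(["e"] * 20_000))))            # 10000

# Huge 1 MB events: the byte ceiling stops the chunk at ~100 events,
# which is exactly the "10 GB of raw" disaster the ceiling prevents.
print(len(take_chunk(iter(["x" * 1_000_000] * 500))))   # 100
```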
If this feature is working as intended, you could still see a slowdown if that chunk pushed your system into a low-memory scenario, causing some amount of swapping until the search exits. If the feature is not working well, it could cause slowdowns for other reasons (that I don't know).
I would suggest checking the memory situation on the indexer nodes and the search head, and if no one else has a better answer about this problem, working with Support to nail it down.