I have a weird issue where the exact same search, run over the exact same time period, returns a different number of events each time it is run.
This makes any attempt at accurate reporting pointless.
The type of search doesn't matter: whether it includes statistics or is just a plain search, the same search returns different results.
We've checked all the usual suspects: event sampling is turned off and indexing is keeping up without lag, so neither can be skewing the results.
Searches are run directly against the indexes, no data models are involved, and the search logs are identical across the runs we compared.
What we know for sure is that this issue affects only indexes stored in S3 storage. Locally stored indexes do not have this issue.
The S3 storage was tested: it is configured correctly, there are no network disruptions, and the logs show no errors related to it. Nothing hints at a problem.
Yet, the problem remains.
Any idea what may be causing this?
Attaching a screenshot just for visualization; here is the search it was taken from:
index="qualys" sourcetype="qualys:hostDetection" PATCHABLE="YES" NETBIOS="*"
Hi mmarinov,
I am facing the same issue. Did you find anything to resolve it? Thanks in advance.
Hi Sindhi,
No resolution as of now, unfortunately.
Hi,
Are you both using S3 as SmartStore in Splunk, or some other kind of S3 storage? Is this AWS S3 or some other S3 implementation, e.g. on-prem?
r. Ismo
S3 SmartStore is used and the Splunk machines are on AWS EC2 instances.
Have you checked in the MC how caching is working? Do buckets continuously need to be fetched from S3 instead of being served from the cache?
Can you also describe what your environment looks like and what kind of queries are being run (e.g. are they limited to the last 7 days, or do they run over all time)?
r. Ismo
The deployment is as follows:
1. Indexer cluster with 3 indexers
2. Cluster master node which is also a DMC
3. Search head with Enterprise Security
4. Deployment server
5. Heavy forwarder
6. Numerous UFs.
The deployment is on AWS EC2 instances.
The type of query run makes no difference as stated in the original post.
I've now tested with the search from the original post and monitored the S3 cache; see the screenshot.
The first thing to check in such a case is the job inspector window, for any differences between runs, and the job log, for any warnings or errors.
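If the job inspector shows nothing, one further way to narrow it down is to split the count by indexer and compare the breakdown between runs. This is only a sketch: it reuses the search from the original post and relies on the default splunk_server field; run it several times over the same fixed time range.

```
index="qualys" sourcetype="qualys:hostDetection" PATCHABLE="YES" NETBIOS="*"
| stats count by splunk_server
```

If only one indexer's count changes between runs, that would point at that peer (or the buckets it is serving) rather than at the search itself.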
As stated in the original post, the job inspector returns basically identical information for the runs.
The only thing that differs is the execution times, which obviously cannot be identical.
Just to be on the safe side.
Does the number increase or is it "randomly fluctuating"?
Did you try limiting by _index_earliest and _index_latest to see whether the number of results stays constant?
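Something along these lines, as a sketch: the base search is the one from the original post, and the 7-day window snapped to midnight is only an assumed example. Pick a window that is fully in the past so late-arriving events cannot change the count.

```
index="qualys" sourcetype="qualys:hostDetection" PATCHABLE="YES" NETBIOS="*"
    earliest=-7d@d latest=@d
    _index_earliest=-7d@d _index_latest=@d
| stats count
```

Here earliest/latest bound the event timestamps, while _index_earliest/_index_latest bound the time at which the events were indexed.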
It fluctuates.
Limiting with _index_earliest and _index_latest has no effect; the number of events still fluctuates.
I realize this is an old post, but I had the same issue of slight fluctuations in search results. While investigating it, I stumbled upon this still-unanswered question.
In my search query, I use the perc() function. Its documentation says the following:
The perc and upperperc functions give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile.
This means that perc50, for example, may effectively return something like the 50.3rd percentile on one run and the 49.6th on the next. In my case, this was the cause of the fluctuations I observed. Once I swapped it for a function like avg(), the search results were steady.
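To check whether the approximation is what causes the variation in a given search, the approximate and exact functions can be put side by side. This is a sketch only: my_index and response_time are hypothetical placeholders, not names from my actual search.

```
index=my_index earliest=-7d@d latest=@d
| stats perc50(response_time) AS approx_median
        exactperc50(response_time) AS exact_median
        avg(response_time) AS mean
```

If exact_median and mean stay constant across runs while approx_median moves, the approximation is the culprit. Keep in mind that exactperc can be expensive on fields with many distinct values.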