This is one of the main reasons that I absolutely HATE real-time searches. Almost everyone using them unconsciously assumes that events reach Splunk with no latency, but that is NEVER the case! If you have a 1-minute window and many events have a latency between 1 and 2 minutes, your real-time search/alert will miss most of them. When you go back to look later (even 1 minute later), those events will be in the indexers, so you will get more events returned and very different values.
All events have a field called _indextime which is the time the event was indexed. You can start with a search extension like this:
... | eval lagSeconds=_indextime - _time | stats min(lagSeconds) max(lagSeconds) avg(lagSeconds)
Now you have the minimum, maximum, and average indexing lag across your events, which is what you need to PROVE that your event latency is bigger than your real-time window. There are HUGE problems with real-time searches, and by the time I get done explaining all of them to a client, 100% of the time we ditch real-time and architect a different approach.
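To make the arithmetic concrete, here is a minimal sketch in plain Python (not SPL) of what happens to a 1-minute window when most events arrive with 1 to 2 minutes of lag. The event times, the lag values, and the helper name `visible_in_window` are all made up for illustration; only `_time` and `_indextime` correspond to real Splunk fields.

```python
def visible_in_window(events, window_start, window_end, search_time):
    """Events whose _time falls in the window AND that are already indexed at search_time."""
    return [
        e for e in events
        if window_start <= e["_time"] < window_end and e["_indextime"] <= search_time
    ]

# Hypothetical events: (_time, lag in seconds), with times in seconds from 0.
events = [
    {"_time": t, "_indextime": t + lag}
    for t, lag in [(5, 70), (15, 90), (25, 110), (35, 3), (45, 100), (55, 2)]
]

# What a 1-minute real-time window over [0, 60) can see at t=60.
realtime = visible_in_window(events, 0, 60, search_time=60)

# The same window searched again later, once everything has been indexed.
later = visible_in_window(events, 0, 60, search_time=300)

# Same calculation as the eval/stats above: lag = _indextime - _time.
lags = [e["_indextime"] - e["_time"] for e in events]

print(len(realtime), "events seen in real time,", len(later), "events seen later")
print("min/max/avg lag:", min(lags), max(lags), sum(lags) / len(lags))
```

With these made-up numbers, the real-time pass only sees the two low-lag events, while the later search over the same window returns all six.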
All you need is the Machine Learning Toolkit and the pandas Correlation Matrix algorithm:
Simply copy and paste the code and then run:
index=_internal sourcetype=splunkd group=* | timechart usenull=f useother=f count by group | fields - _time | fit CorrelationMatrix * | table index,*
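For intuition about what comes out of that search, here is a rough sketch in plain Python of a pairwise correlation matrix over per-interval counts. It assumes the algorithm computes Pearson correlation between columns (the default method of pandas' `DataFrame.corr`); the function names and the counts below are hypothetical and this is not the toolkit's actual code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical counts per time bucket, one column per group
# (the shape produced by "timechart count by group" once _time is dropped).
counts = {
    "queue": [10, 20, 30, 40],
    "pipeline": [12, 22, 32, 42],
    "thruput": [40, 30, 20, 10],
}

matrix = correlation_matrix(counts)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values near 1 mean two groups rise and fall together, values near -1 mean they move in opposite directions, and the diagonal is always 1.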
Here is the description of the available methods: