I have been working with streamstats for about two years and keep running into the same issue: the maximum number of events it can handle (about 10k by default).
We recently had a requirement to link events to one another based on a ticket number. However, we can have millions of events, and there is no bounded time window: a series of events (sharing the same ticket number) can have a lifetime anywhere from 1 to 6 months. From one event to the next, I want to compute some variables, and there is absolutely no existing field from which I could identify the next or previous event (and even if there were, I would not use join, as it is awful from a performance perspective).
We can raise the streamstats limit, but I understand that this approach is not recommended because it uses a lot of memory and gets slower. And at some point it will hit another limit...
With Splunk, we cannot update existing events on the fly; if we could, I might have tried limiting the search to the latest event of each series...
So what would you suggest to handle this requirement?
Thank you in advance for your help
Thank you for your answer.
This is exactly what I eventually found out, and I should have shared it on this thread. reset_on_change is very helpful.
However, I don't think that a daily summary would help. One series/transaction can run over weeks, and the gap between two events can be more than a day or even a week. But as I have not tried it, it may still apply... The resulting question is: how could I generate a summary index every day or every week without losing the event structure? Summary indexes hold computed values, and in my tests you cannot use streamstats properly on a summary index.
Let's take the following example:
For each event, I want to know the previous color and the next color. Currently, I run streamstats over all events in one direction and then again in the other direction. I want to have this kind of result:
streamstats current=f window=1 last(color) as nextcolor | reverse | streamstats current=f window=1 last(color) as lastcolor
If I were using a summary index, how would I do that?
Thank you !!
Okay, you started with a pretty straightforward question and then you went sideways.
The issue with how many events can be held in memory at one time with streamstats is probably solved by sorting the records by your key (ticket number) and then _time, and using reset_on_change so that Splunk literally does not have to remember anything from records that had prior keys. The only space limitation then is from the sort 0 itself.
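The sort-then-reset approach described above might look roughly like the following. This is only a sketch: the index name tickets and the field name ticket_number are hypothetical placeholders, color is borrowed from the example in the question, and seconds_since_previous is just an illustration of a value computed from one event to the next.

```spl
index=tickets
| sort 0 ticket_number _time
| streamstats current=f window=1 reset_on_change=true last(color) as previous_color last(_time) as previous_time by ticket_number
| eval seconds_since_previous = _time - previous_time
```

Because the events arrive grouped by ticket and in time order, each event only needs the single event before it, and the first event of every ticket gets no previous_color since the state is reset when the key changes.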
However, it may be that your REAL issue has a completely different solution. If you are computing running values, why would you want to run over the same dataset going back several months? If you ran this daily, you would be recalculating the same information hundreds of times.
To me, this use case is asking for either a summary index or a CSV file. You would calculate your values once a day, for example, and write them out. That way, the running totals (or whatever) would always be at most 24 hours behind, and that's all you would have to scan to update them on the fly.
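One way the CSV variant could be sketched is a scheduled daily search that keeps only the most recent state per ticket in a lookup file. Everything named here is an assumption for illustration: the index tickets, the field ticket_number, the lookup file ticket_state.csv (which would have to be created once before the first run), and the output fields last_time and last_color.

```spl
index=tickets earliest=-1d@d latest=@d
| stats max(_time) as last_time latest(color) as last_color by ticket_number
| inputlookup append=true ticket_state.csv
| sort 0 ticket_number -last_time
| dedup ticket_number
| outputlookup ticket_state.csv
```

A search over the current day could then pick up the carried-over state with something like | lookup ticket_state.csv ticket_number OUTPUT last_color and use it as the previous color for the first event of each ticket that day, provided it runs before the lookup is refreshed for that same day.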