I have two streams of data coming into an HTTP Event Collector (HEC). One has call direction (e.g. inbound) and the other has call disposition (e.g. allowed).
At first I was joining these streams with join, but I found a great thread in the community suggesting stats instead, so with some cleanup I have something like this:
index="my_hec_data" resource="somedata*" | stats values(*) as * by id
which works great, and may not even be related to my actual question. Next I want to count by day; cool, just timechart it. But I suppose my real question is:
Is that the most efficient way to count calls by day, or should I do some higher-level aggregation somehow?
I don't even know if that makes sense, but if there are 2M calls a day and I go back 30d, is "counting 60M rows" the best way to display events per day?
To answer the question about "most efficient": unless you use something like summary searches or accelerated data models, timechart or stats+bin are the most efficient ways.
However, if you find you want to be able to look back over 30 days regularly, then the sensible way to do this is to have a search that runs daily, e.g. a little after midnight, that does the counting and saves the results, either to a summary index or to a lookup.
Writing to a summary index is simple, and the data can go back as far as you want to retain it. Using a lookup needs a little management if you want to limit what data you retain.
In both cases, though, you can then simply search the summary index or lookup for your data (and then add in today's data to get current-day figures).
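As a sketch of what that combined search might look like (the summary index name, source, and daily span here are assumptions, not from your setup):

index=my_summary_index source="call_counts_daily" | append [ search index="my_hec_data" resource="somedata*" earliest=@d | bin _time span=1d | stats count by _time ] | stats sum(count) as count by _time

The subsearch counts today's events on the fly, and the outer stats merges them with the stored daily totals.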
If you do use summary indexing, then make your summaries as frequently as you need for any granularity you need for any drilldown purposes.
Automatic summary indexing can be enabled on a scheduled saved search, just by selecting the Edit Summary Indexing option in the edit dropdown.
However, you can also do this manually, with the collect command,
where you just do
search that gathers the info you want to save | collect index=my_summary_index
and this will collect the data you have at the point in the SPL pipeline to that summary index.
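As a rough sketch of a nightly summary search for this use case (the summary index name, the distinct-count on id, and the time window are assumptions to be adjusted for your data):

index="my_hec_data" resource="somedata*" earliest=-1d@d latest=@d | bin _time span=1d | stats dc(id) as call_count by _time | collect index=my_summary_index

Here dc(id) counts distinct call ids per day, which plays the same deduplicating role as the stats-by-id search above.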
Note: Do not believe everything you read on that doc page about _time handling!
_time depends on several things. If you have only a _raw field, _time will be taken from the standard parsing of _raw.
If you don't have _raw but do have a _time field, it is ignored completely. If you run the search as a scheduled saved search, _time will be the time the search runs, but if you run the search manually, it will be different.
So experiment with _time, but be aware that it is not consistent and not as the docs state.
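Given those inconsistencies, one workaround to try (verify the behavior in your own environment) is to set _time explicitly just before collect, so the summary rows are pinned to the day being summarized rather than to whenever the search ran:

... | eval _time=relative_time(now(), "-1d@d") | collect index=my_summary_index

relative_time(now(), "-1d@d") evaluates to midnight at the start of yesterday.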
I ended up trying Log Events and created my raw message using
and it wrote to the index.
So, yeah, this is great. I can create a set of X reports that run nightly to add data to this index.
ETL the Splunk way.
appreciate the time and education!
Thank you! In reading the bucket (bin) doc, it appears to be something chart/timechart use, so do you feel this is "faster" than just using something like
| timechart usenull=f span=1h count by id
My preliminary test shows they are very close in run time (the bin one is a little faster), but I'm trying to learn!
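For reference, the stats+bin form being compared is essentially this (same span and field names as the timechart above):

| bin _time span=1h | stats count by _time, id

which produces one row per hour per id, rather than a charting-oriented table with one column per id.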
thank you, again!
It isn't about speed exactly. timechart is about charting, so by default it limits the number of series. https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchReference/Timechart
You can force a change to the limit, but stats doesn't have that behavior.
Your question asked only about counting. stats will count without introducing unexpected behaviors meant for a different purpose.
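For completeness, if you did want the charting behavior without the series cap, the limit can be lifted like this (check the options against your Splunk version):

| timechart span=1h limit=0 useother=f count by id

limit=0 removes the cap on distinct id series, and useother=f stops the remainder being lumped into an OTHER series.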