Splunk Search

What is the most efficient way to aggregate events by day over two separate streams?

loganseth
Path Finder

I have two streams of data coming into a HEC. One has the call direction (e.g. inbound) and the other has the call disposition (e.g. allowed).

At first I was joining these streams with join, but I found a great thread in the community suggesting stats instead, so with some cleanup I have something like this:

 

index="my_hec_data" resource="somedata*" 
| stats values(*) as * by id

 


This works great and may not even be related to my actual question. Next I want to count by day, so I can just timechart it, but I suppose my real question is:

Is that the most efficient way to count calls by day? Or should I do some higher-level aggregation somehow?

I don't even know if that makes sense, but if there are 2M calls a day and I go back 30d, is "counting 60M rows" the best way to display events per day?


bowesmana
SplunkTrust

As for 'most efficient': unless you use something like summary searches or accelerated data models, timechart or stats with bin are the most efficient ways.
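For reference, the two forms mentioned above might look like this for a daily call count. This is only a sketch: it assumes each call carries the same id in both streams, so dc(id) counts a call once even though it produces two events, and the output field name calls is made up for illustration.

```spl
index="my_hec_data" resource="somedata*"
| bin span=1d _time
| stats dc(id) as calls by _time
```

```spl
index="my_hec_data" resource="somedata*"
| timechart span=1d dc(id) as calls
```

Either way, a call whose two events fall on different sides of midnight would be counted once on each day.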

However, if you want to look back over 30 days regularly, the sensible approach is a search that runs daily, e.g. a little after midnight, does the counting and saves the results, either to a summary index or to a lookup.

Writing to a summary index is simple, and the data can go back as far as you want to retain it. Using a lookup needs a little 'management' if you want to limit what data you retain.

In both cases you can then simply search the summary index or lookup for your data (and add in today's data to get current-day figures).

If you do use summary indexing, generate the summaries at whatever frequency matches the finest granularity you need for drilldowns.
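For the lookup route, the 'management' could be folded into the nightly search itself. A rough sketch, assuming a hypothetical lookup file daily_calls.csv that already exists (the first run would need to create it without the inputlookup line), an arbitrarily chosen 90-day retention window, and that each call shares one id across both streams:

```spl
index="my_hec_data" resource="somedata*" earliest=-1d@d latest=@d
| bin span=1d _time
| stats dc(id) as calls by _time
| inputlookup append=true daily_calls.csv
| where _time >= relative_time(now(), "-90d@d")
| dedup _time
| sort 0 _time
| outputlookup daily_calls.csv
```

Because dedup keeps the first row it sees for each _time, re-running the search for the same day replaces the stored row rather than duplicating it. Reading the counts back is then just | inputlookup daily_calls.csv, which touches a few dozen rows instead of tens of millions of events.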

 

loganseth
Path Finder

love this!  I've written reports that write to a lookup, but what is this 'write to an index' magic?  how do i do that, sir? 👀


bowesmana
SplunkTrust

Automatic summary indexing can be enabled on a scheduled saved search by selecting the Edit Summary Indexing option in the Edit dropdown.

However, you can also do this manually with the collect command:

https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchReference/Collect

where you just do 

search to collect your info you want to save
| collect index=my_summary_index

This writes whatever results exist at that point in the SPL pipeline to that summary index.
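Applied to the daily call count from this thread, a nightly scheduled search might look roughly like the following. It is a sketch under assumptions: my_summary_index has to exist already, the field name calls is made up, and dc(id) assumes one id per call across both streams.

```spl
index="my_hec_data" resource="somedata*" earliest=-1d@d latest=@d
| bin span=1d _time
| stats dc(id) as calls by _time
| collect index=my_summary_index
```

The 30-day view could then read the summary and append today's partial count from the raw data:

```spl
index=my_summary_index earliest=-30d@d latest=@d
| fields _time calls
| append
    [ search index="my_hec_data" resource="somedata*" earliest=@d latest=now
      | bin span=1d _time
      | stats dc(id) as calls by _time ]
| timechart span=1d sum(calls) as calls
```

The append subsearch is subject to the usual subsearch limits, and which day each summary event lands on depends on how collect assigns _time, so it is worth checking the stored timestamps before trusting the chart.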

 

Note: do not believe everything you read on that doc page about _time handling!

_time depends on several things. If you have only a _raw field, _time is taken from the standard timestamp parsing of _raw.

If you don't have _raw, then any _time field you do have is ignored completely. If you run the search as a scheduled saved search, _time will be the time the search runs, but if you run the search manually, it will be different.

So experiment with _time, but be aware that it is not consistent and does not behave as the doc states.
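One way to run that experiment is to look at what actually landed in the summary index after a test run. A small sketch, again assuming the summary index is named my_summary_index; event_time and written_at are illustrative field names:

```spl
index=my_summary_index
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval written_at=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| table event_time written_at _raw
```

Comparing event_time with written_at for a scheduled run and for a manual run should show which timestamp collect assigned in each case.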

 

 

 

loganseth
Path Finder

Awesome! 

I ended up trying the Log Event alert action and created my raw message using

_time="$result._time$";calls="$result.calls$";etc

and it wrote to the index.

So, yeah, this is great. I can create a set of X reports that run nightly to add data to this index.

ETL the Splunk way.

appreciate the time and education!


starcher
Influencer

 

| bucket span=1d
| stats count by _time, id

starcher
Influencer

Oops typing on phone wasn’t helpful

| bucket span=1d _time
| stats count by _time, id

loganseth
Path Finder

Ty! In reading the bucket (bin) doc, it appears to be something that chart/timechart use, so do you feel this is 'faster' than just using something like

| timechart usenull=f span=1h count by id

My preliminary test shows they are very close in run time (the bin one is a little faster), but I'm trying to learn!

thank you, again!


starcher
Influencer

It isn't about speed exactly. Timechart is about charting, so by default it limits the number of values it returns. https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchReference/Timechart
You can change the limit, but stats doesn't have that behavior.

Your question asked only about counting. Stats will count without introducing unexpected behaviors that were designed for a different purpose.
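To make the difference concrete: with a split-by field, timechart keeps the top 10 series by default and groups the remainder into OTHER. Lifting that limit might look like the sketch below, which reuses the index and resource filter from the original question:

```spl
index="my_hec_data" resource="somedata*"
| timechart span=1d limit=0 useother=f count by id
```

With millions of distinct id values this would create one column per id, which is rarely what you want. The bin plus stats form returns one row per day and id, with no series limit to think about.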

loganseth
Path Finder

for sure!  this is really great information and i appreciate it!
