I have an index of access logs, and I want to see how many downloads (each identified by a specific combination of 'ip', 'filename', 'date_mday', 'date_month', and 'date_year') exceed 1000 total 'Bytes'.
The following query gives me believable counts:
index=logs sourcetype=logs
| stats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
but it seems like I should be using eventstats instead:
index=logs sourcetype=logs
| eventstats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
but whenever I do this, it gives me a much smaller number for each filename. I eventually want to take the TotalBytes of these downloads and divide by each file's bitrate to see how many minutes of content were downloaded, so it's important that TotalBytes is correct. Why is it more appropriate to use stats than eventstats?
How many rows does your base search have? The eventstats command has limitations on memory usage and maximum result rows (see limits.conf and search for eventstats), so that might explain the incorrect results if there is a high number of events to be processed. For your scenario, your first implementation, using stats, is the correct and optimal method.
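Since stats is the approach to keep, the eventual minutes-of-content calculation can be layered on top of the same query. Here is a minimal sketch, assuming a hypothetical lookup named bitrates that maps filename to a bitrate field in bits per second (both the lookup and the field name are illustrative, not from this thread):
index=logs sourcetype=logs
| stats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| lookup bitrates filename OUTPUT bitrate
| eval Minutes = (TotalBytes * 8) / bitrate / 60
| stats count as Downloads, sum(Minutes) as TotalMinutes by filename
The eval converts bytes to bits before dividing by the bitrate, and the final stats keeps one row per filename, mirroring the count by filename in the original query.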
Please use this query; you will get the result:
index=logs sourcetype=logs
| eventstats sum(Bytes) as TotalBytes by Bytes,ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
Please try it and let me know the result.
Unfortunately, this gave me way too high a count. I believe this is because it creates a new TotalBytes for each distinct Bytes value within the same download. This doesn't group downloads by the same fields; it just inflates the count by roughly the number of download requests it takes to complete one unique download.
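That diagnosis is easy to reproduce on synthetic data. Here is a minimal sketch using makeresults (all values invented for illustration): three request events that together make up one 600-byte download each land in their own group once Bytes is added to the by clause, so no event ever sees the true download total:
| makeresults count=3
| streamstats count as n
| eval ip="1.2.3.4", filename="video.mp4", Bytes=n*100
| eventstats sum(Bytes) as TotalBytes by Bytes, ip, filename
Each event ends up with TotalBytes equal to its own Bytes value (100, 200, 300) instead of 600, and a later stats count by filename then counts surviving request events rather than unique downloads.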
To answer @somesoni2's question: the base search has hundreds of thousands of rows, so that actually would make sense. Thank you!
@ahofmann, I have converted @somesoni2's comment to an answer. Please accept it to mark this question as answered!
@niketnilay, I am reviewing @logloganathan's answer and will mark the one that worked best as the answer! Thanks!
@ahofmann, I think @somesoni2's point was that eventstats is a resource-consuming command. If you can achieve the same results with stats, then you should use stats!
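To make that trade-off concrete, here is a hedged side-by-side sketch on makeresults data (values invented): stats collapses four request events into one row per group, while eventstats returns all four raw events annotated with the group totals, which is why it holds far more in memory on large result sets:
| makeresults count=4
| streamstats count as n
| eval ip="1.2.3.4", filename=if(n<3, "a.mp4", "b.mp4"), Bytes=n*100
| stats sum(Bytes) as TotalBytes by ip, filename
This returns two rows (a.mp4 with TotalBytes=300, b.mp4 with TotalBytes=700); replacing stats with eventstats on the last line returns all four original events instead, each carrying the matching TotalBytes. When only the aggregate matters, as it does here, stats is the cheaper choice.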