I have an index of access logs, and I want to see how many downloads (each identified by a specific combination of 'ip', 'filename', 'date_mday', 'date_month', and 'date_year') exceed 1000 total 'Bytes'.
The following query gives me believable counts:
index=logs sourcetype=logs
| stats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
but it seems like I should be using eventstats instead:
index=logs sourcetype=logs
| eventstats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
but whenever I do this, it gives me a much smaller number for each filename. I eventually want to take the TotalBytes of these downloads and divide by each file's bitrate to see how many minutes of content were downloaded, so it's important that TotalBytes is correct. Why is it more appropriate to use stats than eventstats?
How many rows does your base search have? The eventstats command has limitations on memory usage and maximum result rows (see limits.conf and search for eventstats), so that might explain the incorrect results if there is a high number of events to be processed. For your scenario, your first implementation, using stats, is the correct and optimal method.
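Since stats is the approach to keep, the eventual minutes-of-content calculation can be layered on top of the same query. Here is a minimal sketch, assuming a hypothetical lookup named bitrates that maps filename to a bitrate field in bits per second (both the lookup and the field name are illustrative, not from this thread):
index=logs sourcetype=logs
| stats sum(Bytes) as TotalBytes by ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| lookup bitrates filename OUTPUT bitrate
| eval Minutes = (TotalBytes * 8) / bitrate / 60
| stats count as Downloads, sum(Minutes) as TotalMinutes by filename
The eval converts bytes to bits before dividing by the bitrate, and the final stats keeps one row per filename, mirroring the count by filename in the original query.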
Please use this query; you will get the result:
index=logs sourcetype=logs
| eventstats sum(Bytes) as TotalBytes by Bytes,ip, filename, date_mday, date_month, date_year
| where TotalBytes > 1000
| stats count by filename
Please try it and let me know the result.
Unfortunately, this gave me way too high a count. I believe this is because it creates a new TotalBytes for each distinct Bytes value within the same download. This doesn't group downloads by the same fields; it just inflates the count by roughly the number of download requests it takes to complete one unique download.
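That diagnosis is easy to reproduce on synthetic data. Here is a minimal sketch using makeresults (all values invented for illustration): three request events that together make up one 600-byte download each land in their own group once Bytes is added to the by clause, so no event ever sees the true download total:
| makeresults count=3
| streamstats count as n
| eval ip="1.2.3.4", filename="video.mp4", Bytes=n*100
| eventstats sum(Bytes) as TotalBytes by Bytes, ip, filename
Each event ends up with TotalBytes equal to its own Bytes value (100, 200, 300) instead of 600, and a later stats count by filename then counts surviving request events rather than unique downloads.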
To answer @somesoni2's question: the base search has hundreds of thousands of rows, so that actually would make sense. Thank you!
@ahofmann, I have converted @somesoni2's comment to an answer. Please accept it to mark this question as answered!
@niketnilay, I am reviewing @logloganathan's answer and will mark the one that worked best as the answer! Thanks!
@ahofmann, I think @somesoni2's point was that eventstats is a resource-consuming command. If you can achieve the same results with stats, then you should use stats!
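To make that trade-off concrete, here is a hedged side-by-side sketch on makeresults data (values invented): stats collapses four request events into one row per group, while eventstats returns all four raw events annotated with the group totals, which is why it holds far more in memory on large result sets:
| makeresults count=4
| streamstats count as n
| eval ip="1.2.3.4", filename=if(n<3, "a.mp4", "b.mp4"), Bytes=n*100
| stats sum(Bytes) as TotalBytes by ip, filename
This returns two rows (a.mp4 with TotalBytes=300, b.mp4 with TotalBytes=700); replacing stats with eventstats on the last line returns all four original events instead, each carrying the matching TotalBytes. When only the aggregate matters, as it does here, stats is the cheaper choice.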