Howdy,
I'm trying to come up with a query that charts the most occurring x_forwarded_for and respective count in each of the bins over whatever window. Currently, the below query creates a sorted chart of the most occurring x_forwarded_for and their respective count over the entire lookback window, instead of each bin. I think I need to fit head 1 in there somewhere.
It's likely some or all of the x_forwarded_for's across those bins are repeats and I'd like that charted, so no unique counts.
Any help is appreciated!
index="canvas_*" cluster="*" | where isnull(user_id)| bin _time span=5m | stats count by x_forwarded_for | sort - count
As I said, if you want top X from each bin, you have to count and filter 🙂
So instead of the | stats first... you do
| streamstats count as rank by _time
| where rank<=X
Oh, and if you have so many results, add limit of 0 to sort. Like instead of
| sort _time - count
do
| sort 0 _time - count
(i keep forgetting about it)
You start pretty well but then you lose the time information along the way
index="canvas_*" cluster=* NOT user_id=*
| bin _time span=5m
| stats count by x_forwarded_for _time
| sort _time - count
| stats first(*) by _time
If I understand correctly what you want to achieve.
If you want more than one "top" result, instead of stats first you'd need to do streamstats count to add rank order and do a where to limit results by this rank value.
If you want more than one "top" result, instead of stats first you'd need to do streamstats count to add rank order and do a where to limit results by this rank value.
Should I interpret this as in more than one "top" result across the first bin or across all bins?
The query you provided almost gets me there. I was hoping it would get the top value across all bins. This seems to just return the highest count value for the first bin.
Hmm. That should have given you top values for every bin. At least that's how I understood your specs.
Drop the last stats and see what you get.
You should get a table of counted x_forwarded_for values sorted by count, groupped by time (divided into 5-minute bins)
@PickleRick wrote:Hmm. That should have given you top values for every bin. At least that's how I understood your specs.
Drop the last stats and see what you get.
You should get a table of counted x_forwarded_for values sorted by count, groupped by time (divided into 5-minute bins)
x_forwarded_for | _time | count |
x.x.x.1 | 2022-02-10 13:20:00 | 191 |
x.x.x.2 | 2022-02-10 13:20:00 | 77 |
x.x.x.3 | 2022-02-10 13:20:00 | 66 |
Ok. If you drop the last stats you should get the table like you showed.
After all the rows with decreasing counts at 2022-02-10 13:20:00, you should get similarily sorted entries from 13:25, then from 13:30 and so on (that's where stats first picks from).
Look at this:
Is there a way to only display the top 10 results from each bin? With my dataset, there are so many different x_forwarded_for that there are 100 pages (max return) with 100 x_forwarded_for ips per page for just the first bin in the lookback window.
As I said, if you want top X from each bin, you have to count and filter 🙂
So instead of the | stats first... you do
| streamstats count as rank by _time
| where rank<=X
Oh, and if you have so many results, add limit of 0 to sort. Like instead of
| sort _time - count
do
| sort 0 _time - count
(i keep forgetting about it)
Ohhhh, you really meant creating rank as a variable and using that. This is working like a charm!
You are a rockstar and have helped me better understand bin and TIL streamstats. Thank you very much PickleRick.
I'm happy I could have been of help 🙂
Have much fun with your splunk!