We have a Splunk instance here at my job that I've inherited. I rarely have to go do anything in it so my Splunk Fu is definitely lacking. We're looking at some new public file transfer appliance and need to know something kind of odd for licensing (according to my manager anyway).
My manager wants to know, which 60 day period has the highest number of unique user id's that successfully authenticated over the last 2 years. Seems odd that someone would come up with a licensing framework where that matters but I do what I'm told...
Here is an example of our current FTP logs when a user successfully logs in:
2012-06-26 22:55:39 dest_ip 21 2172 source_ip 4964 3248 Username 0 0 - 18 RESPONSE: 230-Welcome Username from dest_ip.
I have custom fields setup for source, destination, and username. What I'm having trouble grasping is how to I look for the top 60 days over the last 2 years.
Any help would be greatly appreciated.
If it's a rolling 60 day period, you'll want to use a pretty complex class of reports where you use timechart
OR stats
, and then you use streamstats
.
To start you off with a simpler example, here's a report that searches the _internal index and shows total indexing volume for the sourcetype "splunkd", over a rolling 24 hour period.
index=_internal group="per_sourcetype_thruput" series="splunkd" | timechart span="1h" sum(kb) as KB | streamstats sum(KB) as 24HourRollingKB window=24
Your case though, with the distinct counts, is a little more unusual.
This search will give you, for each day, the rolling distinct count of userId's in the last 30 days up to that point.
foo | bin _time span="1d" | stats values(userId) as userId by _time | streamstats dc(userId) window=30
(You might want to put a sort - _time
before the streamstats. Sort of depends from what side you want to calculate things)
and then from there you can get the 1 day with the highest rolling-30-day-period distinct userId count by just sorting those results and getting the top value:
foo | bin _time span="1d" | stats values(userId) as userId by _time | streamstats dc(userId) as distinctUsers window=30 | sort - distinctUsers | head 1
If it's a rolling 60 day period, you'll want to use a pretty complex class of reports where you use timechart
OR stats
, and then you use streamstats
.
To start you off with a simpler example, here's a report that searches the _internal index and shows total indexing volume for the sourcetype "splunkd", over a rolling 24 hour period.
index=_internal group="per_sourcetype_thruput" series="splunkd" | timechart span="1h" sum(kb) as KB | streamstats sum(KB) as 24HourRollingKB window=24
Your case though, with the distinct counts, is a little more unusual.
This search will give you, for each day, the rolling distinct count of userId's in the last 30 days up to that point.
foo | bin _time span="1d" | stats values(userId) as userId by _time | streamstats dc(userId) window=30
(You might want to put a sort - _time
before the streamstats. Sort of depends from what side you want to calculate things)
and then from there you can get the 1 day with the highest rolling-30-day-period distinct userId count by just sorting those results and getting the top value:
foo | bin _time span="1d" | stats values(userId) as userId by _time | streamstats dc(userId) as distinctUsers window=30 | sort - distinctUsers | head 1
Good question. I believe it's rolling but I'll ask to make sure. Given the wording of the request, it makes me think it's rolling (he says for ANY 60 day period).
Is it a rolling 60 day period, or fixed 2-month period based on something like first-day-of-month?