Im currently having trouble with query to get result of user activity duration after office hour within a month.
i expected result will be like below:
Query A:
user A = 13d 22h 12m 2s
user B = 10d 15h 27m 3s
OR
Query B:
user A=320h 23m 2s
user B=267h 42m 1s
Both answer can be acceptable for me to get as i try multiple set of query for the result above. Both query take time range after 5.30pm until 8.00am only for a month.
Below sample query :
Query A:
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| transaction user,date_hourmin
| stats sum(duration) as duration by user
| sort - duration
| eval string_dur = tostring(round(duration), "duration")
| eval formatted_dur = replace(string_dur,"(?:(\d+)\+)?0?(\d+):0?(\d+):0?(\d+)","\1d \2h \3m \4s")
| eval result=replace(formatted_dur, "^d (0h (0m )?)?","")
| table user duration string_dur formatted_dur result
Query B:
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| convert timeformat="%b %Y" ctime(_time) as date_month
| streamstats earliest(date_hourmin) as time_in latest(date_hourmin) as time_out by date_month
| eval duration=time_out-time_in
| stats values(src_ip) as SourceIP values(srcPort) as SourcePort values(dstService) as DestService earliest(time_in) as TimeIn latest(time_out) as TimeOut values(dest_ip) as DestIP values(dstPort) as DestPort sum(duration) as TotalDuration count by date_month,user
| eval secs=TotalDuration%60,mins=floor((TotalDuration/60)%60),hrs=floor((TotalDuration/3600)%60)
| eval HOURS=if(len(hrs)=1,"0".tostring(hrs), tostring(hrs)),MINUTES=if(len(mins)=1,"0".tostring(mins), tostring(mins)),SECONDS=if(len(secs)=1,"0".tostring(secs), tostring(secs))
| eval Time=HOURS.":".MINUTES.":".SECONDS
| table user SourceIP SourcePort DestIP DestPort DestService Time count
| sort - duration limit=10
Both query display the result, HOWEVER, its look like both query are not giving accurate result.
Ive been struggling for this kind of query for a month now, perhaps im missing something here.
Really appreciated if anyone can help n assist on this. TQ.
If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:
... earliest=-1mon@mon+8h latest=@mon+8h
When user activity crosses time boundaries, where do you begin counting time? For example:
Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.
This returns _time values after 5:30 PM local time or before 8:00 AM local time:
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:
| eval _time=_time - 28800
Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")
Using Splunk audit data as an example:
index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")
You can extend stats with additional fields as needed:
| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")
The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.
If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:
... earliest=-1mon@mon+8h latest=@mon+8h
When user activity crosses time boundaries, where do you begin counting time? For example:
Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.
This returns _time values after 5:30 PM local time or before 8:00 AM local time:
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:
| eval _time=_time - 28800
Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")
Using Splunk audit data as an example:
index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")
You can extend stats with additional fields as needed:
| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")
The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.
Hi tscroggins,
Thank you for your sharing idea, looks like i manage to get the duration and seem accurate for me. Kudos to you.
However, i wonder if can i use this topic to list out top 3 src_ip, src_port, dest_ip, dest_port, service by using stats values? or is it need to change to dc(field) first before get back the values?
what i understand when using the stats values(field), it will list out all the values from the field.
i can limit it using eval field = mvindex(field,0,2) --> for three field, but it will return on the first three field value that it capture.
Ex output:
User src_ip src_port dest_ip dest_port service duration
A 1.1.1.1 22 1.1.1.1 80 http 45h 22m 3s
2.2.2.2 445 2.2.2.2 443 https
3.3.3.3 53 3.3.3.3 53 dns
For now i have to do it manually for each user n all the detail from the user.
really helpful if get breakthrough this problem.
If you've used something like this:
| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_servic
you can use streamstats to find the top 3:
| sort 0 + user - duration
| streamstats count by user
| where count < 4
hi @tscroggins
thanks for your suggest. Its work, but it separate each field which the duration of it also will be split.
| stats range(_time) as duration by date_day user src_ip srcPort dest_ip dstPort dstService
| stats sum(duration) as duration by user src_ip srcPort dest_ip dstPort dstService
| sort 0 + user - duration
| streamstats values(src_ip) as src_ip values(srcPort) as srcPort values(dest_ip) as dest_ip values(dstPort) as dstPort values(dstService) as dstService count by user duration
| where count <4
| fieldformat duration=tostring(duration,"duration")
What i would expect is something like stats values, where it would be display like below.
Sample as above comment that i mention.
User src_ip src_port dest_ip dest_port service duration
10.1.1.1 22 10.1.1.1 80 http
A 10.1.1.2 23 10.1.1.2 443 https 45h 22m 3s
10.1.1.3 25 10.1.1.3 8447 https
10.1.1.4 25 10.1.1.4 80 http
B 10.1.1.5 37 10.1.1.5 443 https 40h 22m 3s
10.1.1.6 445 10.1.1.6 8447 https
Don't add values() aggregations to streamstats. We're only using streamstats to count sorted duration values by user:
| sort 0 + user - duration
| streamstats count by user
| where count <4
| fieldformat duration=tostring(duration,"duration")
The earlier stats command has already provided tabular output with the fields you need. You can rename them in stats or with separate rename commands if desired.