Splunk Search

How to calculate duration time after office hour within a month

Splunkin
Explorer

Im currently having trouble with query to get result of user activity duration after office hour within a month.

i expected result will be like below:
Query A: 
user A = 13d 22h 12m 2s
user B = 10d 15h 27m 3s

OR

Query B: 
user A=320h 23m 2s
user B=267h 42m 1s

 

Both answer can be acceptable for me to get as i try multiple set of query for the result above. Both query take time range after 5.30pm until 8.00am only for a month. 

Below sample query : 
Query A: 
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| transaction user,date_hourmin
| stats sum(duration) as duration by user
| sort - duration
| eval string_dur = tostring(round(duration), "duration")
| eval formatted_dur = replace(string_dur,"(?:(\d+)\+)?0?(\d+):0?(\d+):0?(\d+)","\1d \2h \3m \4s")
| eval result=replace(formatted_dur, "^d (0h (0m )?)?","")
| table user duration string_dur formatted_dur result

Query B: 
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| convert timeformat="%b %Y" ctime(_time) as date_month
| streamstats earliest(date_hourmin) as time_in latest(date_hourmin) as time_out by date_month
| eval duration=time_out-time_in
| stats values(src_ip) as SourceIP values(srcPort) as SourcePort values(dstService) as DestService earliest(time_in) as TimeIn latest(time_out) as TimeOut values(dest_ip) as DestIP values(dstPort) as DestPort sum(duration) as TotalDuration count by date_month,user
| eval secs=TotalDuration%60,mins=floor((TotalDuration/60)%60),hrs=floor((TotalDuration/3600)%60)
| eval HOURS=if(len(hrs)=1,"0".tostring(hrs), tostring(hrs)),MINUTES=if(len(mins)=1,"0".tostring(mins), tostring(mins)),SECONDS=if(len(secs)=1,"0".tostring(secs), tostring(secs))
| eval Time=HOURS.":".MINUTES.":".SECONDS
| table user SourceIP SourcePort DestIP DestPort DestService Time count
| sort - duration limit=10

Both query display the result, HOWEVER, its look like both query are not giving accurate result.
Ive been struggling for this kind of query for a month now, perhaps im missing something here.

Really appreciated if anyone can help n assist on this. TQ. 

Labels (5)
Tags (2)
0 Karma
1 Solution

tscroggins
Motivator

If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:

... earliest=-1mon@mon+8h latest=@mon+8h

When user activity crosses time boundaries, where do you begin counting time? For example:

Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http

Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.

This returns _time values after 5:30 PM local time or before 8:00 AM local time:

| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))

17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:

| eval _time=_time - 28800

Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:

| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

Using Splunk audit data as an example:

index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

You can extend stats with additional fields as needed:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")

The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.

View solution in original post

tscroggins
Motivator

If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:

... earliest=-1mon@mon+8h latest=@mon+8h

When user activity crosses time boundaries, where do you begin counting time? For example:

Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http

Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.

This returns _time values after 5:30 PM local time or before 8:00 AM local time:

| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))

17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:

| eval _time=_time - 28800

Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:

| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

Using Splunk audit data as an example:

index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

You can extend stats with additional fields as needed:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")

The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.

View solution in original post

Splunkin
Explorer

Hi tscroggins,

Thank you for your sharing idea, looks like i manage to get the duration and seem accurate for me. Kudos to you. 

However, i wonder if can i use this topic to list out top 3 src_ip, src_port, dest_ip, dest_port, service by using stats values? or is it need to change to dc(field) first before get back the values? 
what i understand when using the stats values(field), it will list out all the values from the field.
i can limit it using eval field = mvindex(field,0,2) --> for three field, but it will return on the first three field value that it capture.  

Ex output: 

User  src_ip       src_port        dest_ip         dest_port        service        duration
A         1.1.1.1            22             1.1.1.1                   80                   http         45h 22m 3s
             2.2.2.2           445            2.2.2.2                 443                 https       
             3.3.3.3            53             3.3.3.3                  53                    dns                       

 

For now i have to do it manually for each user n all the detail from the user.
really helpful if  get breakthrough this problem. 

0 Karma

tscroggins
Motivator

@Splunkin 

If you've used something like this:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_servic

you can use streamstats to find the top 3:

| sort 0 + user - duration

| streamstats count by user

| where count < 4

0 Karma

Splunkin
Explorer

hi @tscroggins 

thanks for your suggest. Its work, but it separate each field which the duration of it also will be split.

| stats range(_time) as duration by date_day user src_ip srcPort dest_ip dstPort dstService
| stats sum(duration) as duration by user src_ip srcPort dest_ip dstPort dstService
| sort 0 + user - duration
| streamstats values(src_ip) as src_ip values(srcPort) as srcPort values(dest_ip) as dest_ip values(dstPort) as dstPort values(dstService) as dstService count by user duration
| where count <4
| fieldformat duration=tostring(duration,"duration")

Splunkin_0-1615902810402.png

 

What i would expect is something like stats values, where it would be display like below.

Sample as above comment that i mention. 
User          src_ip          src_port           dest_ip            dest_port          service      duration
                10.1.1.1                   22              10.1.1.1                   80                      http
A             10.1.1.2                   23               10.1.1.2               443                     https      45h 22m 3s
                10.1.1.3                   25               10.1.1.3               8447                   https

                 10.1.1.4                  25                10.1.1.4                80                        http
B              10.1.1.5                  37                 10.1.1.5             443                      https      40h 22m 3s
                 10.1.1.6               445                 10.1.1.6             8447                   https

 

 

0 Karma

tscroggins
Motivator

@Splunkin 

Don't add values() aggregations to streamstats. We're only using streamstats to count sorted duration values by user:

| sort 0 + user - duration
| streamstats count by user
| where count <4
| fieldformat duration=tostring(duration,"duration")

The earlier stats command has already provided tabular output with the fields you need. You can rename them in stats or with separate rename commands if desired.

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!