Splunk Search

How to calculate duration time after office hour within a month

Splunkin
Explorer

Im currently having trouble with query to get result of user activity duration after office hour within a month.

i expected result will be like below:
Query A: 
user A = 13d 22h 12m 2s
user B = 10d 15h 27m 3s

OR

Query B: 
user A=320h 23m 2s
user B=267h 42m 1s

 

Both answer can be acceptable for me to get as i try multiple set of query for the result above. Both query take time range after 5.30pm until 8.00am only for a month. 

Below sample query : 
Query A: 
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| transaction user,date_hourmin
| stats sum(duration) as duration by user
| sort - duration
| eval string_dur = tostring(round(duration), "duration")
| eval formatted_dur = replace(string_dur,"(?:(\d+)\+)?0?(\d+):0?(\d+):0?(\d+)","\1d \2h \3m \4s")
| eval result=replace(formatted_dur, "^d (0h (0m )?)?","")
| table user duration string_dur formatted_dur result

Query B: 
| eval date_hourmin = strftime(_time, "%H%M")
| where date_hourmin>=1730 OR date_hourmin<=800
| convert timeformat="%b %Y" ctime(_time) as date_month
| streamstats earliest(date_hourmin) as time_in latest(date_hourmin) as time_out by date_month
| eval duration=time_out-time_in
| stats values(src_ip) as SourceIP values(srcPort) as SourcePort values(dstService) as DestService earliest(time_in) as TimeIn latest(time_out) as TimeOut values(dest_ip) as DestIP values(dstPort) as DestPort sum(duration) as TotalDuration count by date_month,user
| eval secs=TotalDuration%60,mins=floor((TotalDuration/60)%60),hrs=floor((TotalDuration/3600)%60)
| eval HOURS=if(len(hrs)=1,"0".tostring(hrs), tostring(hrs)),MINUTES=if(len(mins)=1,"0".tostring(mins), tostring(mins)),SECONDS=if(len(secs)=1,"0".tostring(secs), tostring(secs))
| eval Time=HOURS.":".MINUTES.":".SECONDS
| table user SourceIP SourcePort DestIP DestPort DestService Time count
| sort - duration limit=10

Both query display the result, HOWEVER, its look like both query are not giving accurate result.
Ive been struggling for this kind of query for a month now, perhaps im missing something here.

Really appreciated if anyone can help n assist on this. TQ. 

Labels (5)
Tags (2)
0 Karma
1 Solution

tscroggins
Influencer

If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:

... earliest=-1mon@mon+8h latest=@mon+8h

When user activity crosses time boundaries, where do you begin counting time? For example:

Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http

Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.

This returns _time values after 5:30 PM local time or before 8:00 AM local time:

| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))

17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:

| eval _time=_time - 28800

Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:

| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

Using Splunk audit data as an example:

index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

You can extend stats with additional fields as needed:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")

The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.

View solution in original post

tscroggins
Influencer

If your boundary is the previous month between 8:00 AM local time on the first day of the month and 8:00 AM local time on the first day of the following month, you can set time constraints in the base search:

... earliest=-1mon@mon+8h latest=@mon+8h

When user activity crosses time boundaries, where do you begin counting time? For example:

Mar 1 14:30 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http
Mar 1 17:45 user=foo src_ip=10.0.0.1 src_port=65535 dest_ip=10.1.0.1 dest_port=80 dest_service=http

Do you begin counting at 17:30 with the assumption the user has been working since 14:30, or do you begin counting at 17:45? For simplicity, I'll assume 17:45 and report the duration of observed activity.

This returns _time values after 5:30 PM local time or before 8:00 AM local time:

| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))

17:30 to 08:00 is less than 24 hours, so we can shift _time values by eight hours (28,800 seconds) to align or normalize them to the same calendar day:

| eval _time=_time - 28800

Finally, we can calculate the range (duration in seconds) of _time by day and user and then sum the duration for a grand total:

| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

Using Splunk audit data as an example:

index=_audit user=* earliest=-1mon@mon+8h latest=@mon+8h
| where (_time > relative_time(_time, "@d+1050m")) OR (_time < relative_time(_time, "@d+8h"))
| eval _time=_time - 28800
| bin _time span=1d as date_day
| stats range(_time) as duration by date_day user
| stats sum(duration) as duration by user
| fieldformat duration=tostring(duration, "duration")

You can extend stats with additional fields as needed:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_service
| fieldformat duration=tostring(duration, "duration")

The fieldformat command uses the tostring function to display the duration field (in seconds) in duration format: [d+]HH:MM:SS[.FFFFFF]. The underlying duration field value remains a decimal number.

Splunkin
Explorer

Hi tscroggins,

Thank you for your sharing idea, looks like i manage to get the duration and seem accurate for me. Kudos to you. 

However, i wonder if can i use this topic to list out top 3 src_ip, src_port, dest_ip, dest_port, service by using stats values? or is it need to change to dc(field) first before get back the values? 
what i understand when using the stats values(field), it will list out all the values from the field.
i can limit it using eval field = mvindex(field,0,2) --> for three field, but it will return on the first three field value that it capture.  

Ex output: 

User  src_ip       src_port        dest_ip         dest_port        service        duration
A         1.1.1.1            22             1.1.1.1                   80                   http         45h 22m 3s
             2.2.2.2           445            2.2.2.2                 443                 https       
             3.3.3.3            53             3.3.3.3                  53                    dns                       

 

For now i have to do it manually for each user n all the detail from the user.
really helpful if  get breakthrough this problem. 

0 Karma

tscroggins
Influencer

@Splunkin 

If you've used something like this:

| stats range(_time) as duration by date_day user src_ip src_port dest_ip dest_port dest_service
| stats sum(duration) as duration by user src_ip src_port dest_ip dest_port dest_servic

you can use streamstats to find the top 3:

| sort 0 + user - duration

| streamstats count by user

| where count < 4

0 Karma

Splunkin
Explorer

hi @tscroggins 

thanks for your suggest. Its work, but it separate each field which the duration of it also will be split.

| stats range(_time) as duration by date_day user src_ip srcPort dest_ip dstPort dstService
| stats sum(duration) as duration by user src_ip srcPort dest_ip dstPort dstService
| sort 0 + user - duration
| streamstats values(src_ip) as src_ip values(srcPort) as srcPort values(dest_ip) as dest_ip values(dstPort) as dstPort values(dstService) as dstService count by user duration
| where count <4
| fieldformat duration=tostring(duration,"duration")

Splunkin_0-1615902810402.png

 

What i would expect is something like stats values, where it would be display like below.

Sample as above comment that i mention. 
User          src_ip          src_port           dest_ip            dest_port          service      duration
                10.1.1.1                   22              10.1.1.1                   80                      http
A             10.1.1.2                   23               10.1.1.2               443                     https      45h 22m 3s
                10.1.1.3                   25               10.1.1.3               8447                   https

                 10.1.1.4                  25                10.1.1.4                80                        http
B              10.1.1.5                  37                 10.1.1.5             443                      https      40h 22m 3s
                 10.1.1.6               445                 10.1.1.6             8447                   https

 

 

0 Karma

tscroggins
Influencer

@Splunkin 

Don't add values() aggregations to streamstats. We're only using streamstats to count sorted duration values by user:

| sort 0 + user - duration
| streamstats count by user
| where count <4
| fieldformat duration=tostring(duration,"duration")

The earlier stats command has already provided tabular output with the fields you need. You can rename them in stats or with separate rename commands if desired.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...