Splunk Search

How to find the time period of successive table lines?

ctaf
Contributor

Hello,

I have a base search which output me something like this:

_time            src_host             src_ip
06/19/16      A                         10.10.10.10
06/20/16      A                         9.9.9.9
06/21/16      A                         9.9.9.9
06/22/16      A                         10.10.10.10
06/24/16      A                         10.10.10.10

And I'd like to have a summary of this data because I have hundred of lines.
Is it possible to get this output:

duration                                src_host                        src_ip
06/19/16,06/22/16-06/24/16              A                           10.10.10.10
06/20/16-06/21/16                       A                           9.9.9.9

Edit: Please note that successive values are surrounding with a dash "-" and non-successive values are separated by a comma ",".

Thank you !

0 Karma

javiergn
Super Champion

Try this:

your base search
| stats values(_time) as duration by src_host, src_ip

If you still want duration values separated by commas then you can do it this way:

your base search
| stats values(_time) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ",")

Hope that helps,
J

EDIT (other options as suggested below):

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")

your base search
 | eval _time = strptime(_time, "%m/%d/%y")
 | sort limit=0 _time, src_host, src_ip
 | streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
 | eval dur = round(dur/86400)
 | eval start_time = if(dur>1, _time, start_time)
 | streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
 | eventstats last(_time) as end_time by start_time, src_host, src_ip
 | eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
 | eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
 | eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
 | stats values(duration) as duration by src_host, src_ip
 | eval duration = mvjoin(duration, ", ")
 | sort limit=0 src_host, src_ip
0 Karma

ctaf
Contributor

thank you for your answer javiergn. But with your solution, I am grouping all the _time values on a single row without order. As you can see in my example, I am surrounding successif values with "-" and the rest is separated by ",". It gives me only the period where the IP was actually leased to the host.

0 Karma

javiergn
Super Champion

Hmm, that's a bit more complicated but doable I think.

See if the following help:

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
0 Karma

ctaf
Contributor

Yes it is indeed more complicated. I am not sure to understand how streamstats works but unfortunately, it is not summed up enough, because the ouput is :

 02/16/16, 02/16/16, 02/16/16 - 02/17/16, 02/17/16 - 02/18/16.....

Following days are not summed up.

0 Karma

javiergn
Super Champion

What do you mean?

This is the csv I'm using to replicate your use case:

_time,src_host, src_ip
06/19/16,A,10.10.10.10
06/20/16,A,9.9.9.9
06/21/16,A,9.9.9.9
06/22/16,A,10.10.10.10
06/24/16,A,10.10.10.10

This is the query:

| inputcsv mycsv.csv
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip

And this is the output:

src_host, src_ip, duration
A, 9.9.9.9, 06/20/16 - 06/21/16
A, 10.10.10.10, 06/19/16, 06/22/16 - 06/24/16 

Isn't that what you are looking for according to your sample?

0 Karma

ctaf
Contributor

Actually, in real, the data contains more lines. Like this:

_time,src_host, src_ip
 06/19/16,A,10.10.10.10
 06/20/16,A,9.9.9.9
 06/21/16,A,9.9.9.9
 06/22/16,A,10.10.10.10
 06/24/16,A,10.10.10.10
 06/25/16,A,10.10.10.10
 06/26/16,A,10.10.10.10
 06/28/16,A,10.10.10.10
 06/29/16,A,10.10.10.10
0 Karma

javiergn
Super Champion

so what would you expect out of this:

  06/22/16,A,10.10.10.10
  06/24/16,A,10.10.10.10
  06/25/16,A,10.10.10.10
  06/26/16,A,10.10.10.10
  06/28/16,A,10.10.10.10
  06/29/16,A,10.10.10.10

a) 06/22/16 - 06/29/16,A,10.10.10.10
b) 06/22/16, 06/24/16 - 06/29/16,A,10.10.10.10
?

0 Karma

ctaf
Contributor

c)

06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16         A           10.10.10.10
0 Karma

javiergn
Super Champion

OK, one more attempt:

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
| eval dur = round(dur/86400)
| eval start_time = if(dur>1, _time, start_time)
| streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
| eventstats last(_time) as end_time by start_time, src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip

Output based on your sample above:

src_host   src_ip   duration
--------------------------------------------------------
A   9.9.9.9   06/20/16 - 06/21/16
A   10.10.10.10   06/19/16, 06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16 
0 Karma

ctaf
Contributor

Again, thank you for your help. But it is still strange because I have an output like this:

07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......

But in my base search output, I have a successive list of the same IP from 07/15/15 to 12/11/15

If I apply your search only from 07/15/15 to 12/11/15, I get the right output. But with a larger scope, I get

07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......
0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...