Hello,
I have a base search which output me something like this:
_time src_host src_ip
06/19/16 A 10.10.10.10
06/20/16 A 9.9.9.9
06/21/16 A 9.9.9.9
06/22/16 A 10.10.10.10
06/24/16 A 10.10.10.10
And I'd like to have a summary of this data because I have hundred of lines.
Is it possible to get this output:
duration src_host src_ip
06/19/16,06/22/16-06/24/16 A 10.10.10.10
06/20/16-06/21/16 A 9.9.9.9
Edit: Please note that successive values are surrounding with a dash "-" and non-successive values are separated by a comma ",".
Thank you !
Try this:
your base search
| stats values(_time) as duration by src_host, src_ip
If you still want duration values separated by commas then you can do it this way:
your base search
| stats values(_time) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ",")
Hope that helps,
J
EDIT (other options as suggested below):
your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
| eval dur = round(dur/86400)
| eval start_time = if(dur>1, _time, start_time)
| streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
| eventstats last(_time) as end_time by start_time, src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip
thank you for your answer javiergn. But with your solution, I am grouping all the _time values on a single row without order. As you can see in my example, I am surrounding successif values with "-" and the rest is separated by ",". It gives me only the period where the IP was actually leased to the host.
Hmm, that's a bit more complicated but doable I think.
See if the following help:
your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
Yes it is indeed more complicated. I am not sure to understand how streamstats works but unfortunately, it is not summed up enough, because the ouput is :
02/16/16, 02/16/16, 02/16/16 - 02/17/16, 02/17/16 - 02/18/16.....
Following days are not summed up.
What do you mean?
This is the csv I'm using to replicate your use case:
_time,src_host, src_ip
06/19/16,A,10.10.10.10
06/20/16,A,9.9.9.9
06/21/16,A,9.9.9.9
06/22/16,A,10.10.10.10
06/24/16,A,10.10.10.10
This is the query:
| inputcsv mycsv.csv
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip
And this is the output:
src_host, src_ip, duration
A, 9.9.9.9, 06/20/16 - 06/21/16
A, 10.10.10.10, 06/19/16, 06/22/16 - 06/24/16
Isn't that what you are looking for according to your sample?
Actually, in real, the data contains more lines. Like this:
_time,src_host, src_ip
06/19/16,A,10.10.10.10
06/20/16,A,9.9.9.9
06/21/16,A,9.9.9.9
06/22/16,A,10.10.10.10
06/24/16,A,10.10.10.10
06/25/16,A,10.10.10.10
06/26/16,A,10.10.10.10
06/28/16,A,10.10.10.10
06/29/16,A,10.10.10.10
so what would you expect out of this:
06/22/16,A,10.10.10.10
06/24/16,A,10.10.10.10
06/25/16,A,10.10.10.10
06/26/16,A,10.10.10.10
06/28/16,A,10.10.10.10
06/29/16,A,10.10.10.10
a) 06/22/16 - 06/29/16,A,10.10.10.10
b) 06/22/16, 06/24/16 - 06/29/16,A,10.10.10.10
?
c)
06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16 A 10.10.10.10
OK, one more attempt:
your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
| eval dur = round(dur/86400)
| eval start_time = if(dur>1, _time, start_time)
| streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
| eventstats last(_time) as end_time by start_time, src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip
Output based on your sample above:
src_host src_ip duration
--------------------------------------------------------
A 9.9.9.9 06/20/16 - 06/21/16
A 10.10.10.10 06/19/16, 06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16
Again, thank you for your help. But it is still strange because I have an output like this:
07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......
But in my base search output, I have a successive list of the same IP from 07/15/15 to 12/11/15
If I apply your search only from 07/15/15 to 12/11/15, I get the right output. But with a larger scope, I get
07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......