Hello,
I'm trying to compare the output of two searches, and display any items that were there yesterday, but not today. The log is a standard access_combined format and I am comparing the clientip field to see which hosts have stopped checking in.
I managed to get this to work using the "set diff" command, but the problem with that is it only displays the differences - there doesn't seem to be a way to find out which items were removed. For example:
| set diff [search sourcetype="access_combined" earliest=-2d@d latest=-1d@d | dedup clientip | table clientip] [search sourcetype="access_combined" earliest=-1d@d latest=-0d@d | dedup clientip | table clientip]
I then found this blog post (http://blogs.splunk.com/2014/03/10/splunk-command-diff/) about piping to a unix-like diff output, which seems to be exactly what I need (Since I can check lines prefixed with a '-'). After some experimentation it seems like this doesn't work well with tables, so I thought maybe I could put them all in one event, sort them, and diff like in the example:
index=main sourcetype="access_combined" earliest=-2d@d latest=-1d@d | dedup clientip | transaction index | sort clientip | table clientip | append [search index=main sourcetype="access_combined" earliest=-1d@d latest=-0d@d | dedup clientip | transaction index | sort clientip | table clientip ] | diff
Unfortunately trying to merge as a transaction didn't work at all (everything was flattened to a single line). Could anyone please give me some pointers on how I can achieve this? I'm still using Splunk 5.x if that matters.
Thanks!
Try this. It counts the number of times each clientip appears, split into a "today" count and a "yesterday" count. Then it looks for any clientip that appeared at least once yesterday, but not today.
sourcetype="access_combined" earliest=-2d@d latest=-1d@d
| eval when=if(_time>now()-86400,"today","yesterday")
| chart count by clientip when
| where yesterday>0 and today=0
| table clientip yesterday today
This should also be a lot faster and easier than using transactions or subsearches of any sort.
Could even go simpler and just list client ips that haven't appeared in the last 24 hours and last data time with sourcetype="access combined" earliest=-2d@d latest=+d@d | dedup clientip | where _time"<"now()-86400 | table host _time
remove "" from around the less than (couldn't post it here without these)
Try this. It counts the number of times each clientip appears, split into a "today" count and a "yesterday" count. Then it looks for any clientip that appeared at least once yesterday, but not today.
sourcetype="access_combined" earliest=-2d@d latest=-1d@d
| eval when=if(_time>now()-86400,"today","yesterday")
| chart count by clientip when
| where yesterday>0 and today=0
| table clientip yesterday today
This should also be a lot faster and easier than using transactions or subsearches of any sort.
Thanks for the reply! Unfortunately, that doesn't seem to work, since where
is equal to yesterday
or today
and not a number (or does this work differently in Splunk 6?). Your method did set me on the right track though - I managed to get the information I wanted using this search:
sourcetype="access_combined" earliest=-2d@d latest=-0d@d | eval when=if(_time>now()-86400,"today","yesterday") | stats first(when) as first, last(when) as last by clientip | where first="yesterday" AND last="yesterday"
In terms of performance, running against production logs, my | set diff
search in the op takes 2 minutes to run against a few million events. Using the eval
method you described, it took more than an hour! I'm definitely going to have to summarize the clientips daily since I only need to know if there was at least one hit per IP so hopefully it will run a lot faster then.
Cheers!
Oops - I made a mistake. What I did would work fine with the chart
command but not stats
. So I revised the answer above. Although your solution will work as well.