Splunk Search

Getting a Diff of two searches (items removed only)

pwnguin
Engager

Hello,

I'm trying to compare the output of two searches, and display any items that were there yesterday, but not today. The log is a standard access_combined format and I am comparing the clientip field to see which hosts have stopped checking in.

I managed to get this to work using the "set diff" command, but the problem with that is it only displays the differences - there doesn't seem to be a way to find out which items were removed. For example:
| set diff [search sourcetype="access_combined" earliest=-2d@d latest=-1d@d | dedup clientip | table clientip] [search sourcetype="access_combined" earliest=-1d@d latest=-0d@d | dedup clientip | table clientip]

I then found this blog post (http://blogs.splunk.com/2014/03/10/splunk-command-diff/) about piping to a unix-like diff output, which seems to be exactly what I need (Since I can check lines prefixed with a '-'). After some experimentation it seems like this doesn't work well with tables, so I thought maybe I could put them all in one event, sort them, and diff like in the example:
index=main sourcetype="access_combined" earliest=-2d@d latest=-1d@d | dedup clientip | transaction index | sort clientip | table clientip | append [search index=main sourcetype="access_combined" earliest=-1d@d latest=-0d@d | dedup clientip | transaction index | sort clientip | table clientip ] | diff

Unfortunately trying to merge as a transaction didn't work at all (everything was flattened to a single line). Could anyone please give me some pointers on how I can achieve this? I'm still using Splunk 5.x if that matters.

Thanks!

Tags (2)
0 Karma
1 Solution

lguinn2
Legend

Try this. It counts the number of times each clientip appears, split into a "today" count and a "yesterday" count. Then it looks for any clientip that appeared at least once yesterday, but not today.

sourcetype="access_combined" earliest=-2d@d latest=-1d@d
| eval when=if(_time>now()-86400,"today","yesterday")
| chart count by clientip when
| where yesterday>0 and today=0
| table clientip yesterday today

This should also be a lot faster and easier than using transactions or subsearches of any sort.

View solution in original post

robertbarton
Explorer

Could even go simpler and just list client ips that haven't appeared in the last 24 hours and last data time with sourcetype="access combined" earliest=-2d@d latest=+d@d | dedup clientip | where _time"<"now()-86400 | table host _time

0 Karma

robertbarton
Explorer

remove "" from around the less than (couldn't post it here without these)

0 Karma

lguinn2
Legend

Try this. It counts the number of times each clientip appears, split into a "today" count and a "yesterday" count. Then it looks for any clientip that appeared at least once yesterday, but not today.

sourcetype="access_combined" earliest=-2d@d latest=-1d@d
| eval when=if(_time>now()-86400,"today","yesterday")
| chart count by clientip when
| where yesterday>0 and today=0
| table clientip yesterday today

This should also be a lot faster and easier than using transactions or subsearches of any sort.

pwnguin
Engager

Thanks for the reply! Unfortunately, that doesn't seem to work, since where is equal to yesterday or today and not a number (or does this work differently in Splunk 6?). Your method did set me on the right track though - I managed to get the information I wanted using this search:

sourcetype="access_combined" earliest=-2d@d latest=-0d@d | eval when=if(_time>now()-86400,"today","yesterday") | stats first(when) as first, last(when) as last by clientip | where first="yesterday" AND last="yesterday"

In terms of performance, running against production logs, my | set diff search in the op takes 2 minutes to run against a few million events. Using the eval method you described, it took more than an hour! I'm definitely going to have to summarize the clientips daily since I only need to know if there was at least one hit per IP so hopefully it will run a lot faster then.

Cheers!

lguinn2
Legend

Oops - I made a mistake. What I did would work fine with the chart command but not stats. So I revised the answer above. Although your solution will work as well.

Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...