Hi, I have 2 lookup which is lookup A and lookup B. My lookup A will be keep update by splunk query and my lookup B is maintain manually.
Both lookup contain same fields which is Hostname, IP and OS.
I need to compare both lookup and bring out the non match Hostname and IP.
Please assist me on this.
Thank You
Because there are three fields, you need to be more descriptive about how want the differences to be highlighted. Maybe you can illustrate different data combinations and desired results?
To start, @bowesmana's formula outputs a line when any field is different; there can be one, two, or three fields that are different. (Also thanks for a great demonstration of the append option in inputlookup!) Let me start with an example.
lookup_A.csv | lookup_B.csv |
Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.3,Windows infoseek.com,10.0.0.5,Solaris yahoo.com,10.0.0.4,AIX | Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.8,Windows yahoo.com,10.0.0.4,Windows |
Here, I only illustrated two variations. There can be more. Specifically, I didn't make variance in Hostname. But I will use it to anchor other variants. If Hostname is also variant, the following formula will still work if you anchor on Hostname; if you anchor on another field, the answer will be rather different depending on other choices you may make.
To highlight differences anchored on Hostname (i.e., based on the assumption that hostname is unique), you can do
| inputlookup lookup_A.csv
| eval origin = "A"
| inputlookup append=t lookup_B.csv
| eval origin = coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1
| fields - originCount
| stats list(*) as * by Hostname
| foreach IP OS ``` anchor on Hostname, seek variance in IP, OS ```
[eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))]
| fields - origins
The above sample data will give
Hostname | IP | OS |
google.com | A:10.0.0.3 B:10.0.0.8 | Windows |
infoseek.com | A:10.0.0.5 | A:Solaris |
yahoo.com | 10.0.0.4 | A:AIX B:Windows |
Is this something you could use?
Something like this
| inputlookup lookup_A
| eval origin="A"
| inputlookup append=t lookup_B
| eval origin=coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1
where you load both inputs and set origin value to be where the data come, then join the two together with stats and show only those that have a single origin
Hi @bowesmana ,
it works great as expected, but is there any way to flag or highlight the differentiate value.
because there are 3 fields are compared. so i need to check both lookup in order to find the missing info.
Because there are three fields, you need to be more descriptive about how want the differences to be highlighted. Maybe you can illustrate different data combinations and desired results?
To start, @bowesmana's formula outputs a line when any field is different; there can be one, two, or three fields that are different. (Also thanks for a great demonstration of the append option in inputlookup!) Let me start with an example.
lookup_A.csv | lookup_B.csv |
Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.3,Windows infoseek.com,10.0.0.5,Solaris yahoo.com,10.0.0.4,AIX | Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.8,Windows yahoo.com,10.0.0.4,Windows |
Here, I only illustrated two variations. There can be more. Specifically, I didn't make variance in Hostname. But I will use it to anchor other variants. If Hostname is also variant, the following formula will still work if you anchor on Hostname; if you anchor on another field, the answer will be rather different depending on other choices you may make.
To highlight differences anchored on Hostname (i.e., based on the assumption that hostname is unique), you can do
| inputlookup lookup_A.csv
| eval origin = "A"
| inputlookup append=t lookup_B.csv
| eval origin = coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1
| fields - originCount
| stats list(*) as * by Hostname
| foreach IP OS ``` anchor on Hostname, seek variance in IP, OS ```
[eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))]
| fields - origins
The above sample data will give
Hostname | IP | OS |
google.com | A:10.0.0.3 B:10.0.0.8 | Windows |
infoseek.com | A:10.0.0.5 | A:Solaris |
yahoo.com | 10.0.0.4 | A:AIX B:Windows |
Is this something you could use?
Hi @yuanliu
i think i want as your example. but what field should i put for below eval?
[eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))] | fields - origins
is it os or ip?
because i cannot use the eval 2 times