Solved: Compare 2 lookup to show non match info

Akmal57 · ‎10-09-2023

Hi, I have 2 lookup which is lookup A and lookup B. My lookup A will be keep update by splunk query and my lookup B is maintain manually.

Both lookup contain same fields which is Hostname, IP and OS.

I need to compare both lookup and bring out the non match Hostname and IP.

Please assist me on this.

Thank You

yuanliu · ‎10-10-2023

Because there are three fields, you need to be more descriptive about how want the differences to be highlighted. Maybe you can illustrate different data combinations and desired results?

To start, @bowesmana's formula outputs a line when any field is different; there can be one, two, or three fields that are different. (Also thanks for a great demonstration of the append option in inputlookup!) Let me start with an example.

lookup_A.csv	lookup_B.csv
Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.3,Windows infoseek.com,10.0.0.5,Solaris yahoo.com,10.0.0.4,AIX	Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.8,Windows yahoo.com,10.0.0.4,Windows

Here, I only illustrated two variations. There can be more. Specifically, I didn't make variance in Hostname. But I will use it to anchor other variants. If Hostname is also variant, the following formula will still work if you anchor on Hostname; if you anchor on another field, the answer will be rather different depending on other choices you may make.

To highlight differences anchored on Hostname (i.e., based on the assumption that hostname is unique), you can do

| inputlookup lookup_A.csv
| eval origin = "A"
| inputlookup append=t lookup_B.csv
| eval origin = coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1
| fields - originCount
| stats list(*) as * by Hostname
| foreach IP OS ``` anchor on Hostname, seek variance in IP, OS ```
    [eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))]
| fields - origins

The above sample data will give

Hostname	IP	OS
google.com	A:10.0.0.3 B:10.0.0.8	Windows
infoseek.com	A:10.0.0.5	A:Solaris
yahoo.com	10.0.0.4	A:AIX B:Windows

Is this something you could use?

View solution in original post

bowesmana · ‎10-09-2023

@Akmal57

Something like this

| inputlookup lookup_A
| eval origin="A"
| inputlookup append=t lookup_B
| eval origin=coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1

where you load both inputs and set origin value to be where the data come, then join the two together with stats and show only those that have a single origin

Akmal57 · ‎10-09-2023

Hi @bowesmana ,

it works great as expected, but is there any way to flag or highlight the differentiate value.

because there are 3 fields are compared. so i need to check both lookup in order to find the missing info.

yuanliu · ‎10-10-2023

Because there are three fields, you need to be more descriptive about how want the differences to be highlighted. Maybe you can illustrate different data combinations and desired results?

To start, @bowesmana's formula outputs a line when any field is different; there can be one, two, or three fields that are different. (Also thanks for a great demonstration of the append option in inputlookup!) Let me start with an example.

lookup_A.csv	lookup_B.csv
Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.3,Windows infoseek.com,10.0.0.5,Solaris yahoo.com,10.0.0.4,AIX	Hostname,IP,OS splunk.com,10.0.0.1,MacOS youtube.com,10.0.0.2,Linux google.com,10.0.0.8,Windows yahoo.com,10.0.0.4,Windows

Here, I only illustrated two variations. There can be more. Specifically, I didn't make variance in Hostname. But I will use it to anchor other variants. If Hostname is also variant, the following formula will still work if you anchor on Hostname; if you anchor on another field, the answer will be rather different depending on other choices you may make.

To highlight differences anchored on Hostname (i.e., based on the assumption that hostname is unique), you can do

| inputlookup lookup_A.csv
| eval origin = "A"
| inputlookup append=t lookup_B.csv
| eval origin = coalesce(origin, "B")
| stats dc(origin) as originCount values(origin) as origins by Hostname IP OS
| where originCount=1
| fields - originCount
| stats list(*) as * by Hostname
| foreach IP OS ``` anchor on Hostname, seek variance in IP, OS ```
    [eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))]
| fields - origins

The above sample data will give

Hostname	IP	OS
google.com	A:10.0.0.3 B:10.0.0.8	Windows
infoseek.com	A:10.0.0.5	A:Solaris
yahoo.com	10.0.0.4	A:AIX B:Windows

Is this something you could use?

Akmal57 · ‎10-10-2023

Hi @yuanliu

i think i want as your example. but what field should i put for below eval?

[eval <<FIELD>> = if(mvindex(<<FIELD>>, 0) == mvindex(<<FIELD>>, 1), mvindex(<<FIELD>>, 0), mvzip(origins, <<FIELD>>, ":"))]
| fields - origins

is it os or ip?

yuanliu · ‎10-10-2023

This subsearch iterates <<FIELD>> between IP and OS. So, both. (<<FIELD>> is not meta code; it is part of foreach syntax.)

Akmal57 · ‎10-11-2023

Hi @yuanliu , noted on that.

Thank you for your assist on this.

its really helpful

Akmal57 · ‎10-10-2023

because i cannot use the eval 2 times

Compare 2 lookup to show non match info

fields

lookup

table

Splunk App Dev Community Updates – What’s New and What’s Next

The Latest Cisco Integrations With Splunk Platform!

Enterprise Security Content Update (ESCU) | New Releases

Are you a member of the Splunk Community?