Splunk Search

Compare two indexes and report on mismatch

Richy_s
Path Finder

I'm comparing two indexes, A and B, using the hostname as the common field. My current search successfully identifies whether each hostname in index A is present in index B. However, I also want to include additional information from index A, such as the operating system and device type, in the output. This information is not present in index B. How can I modify my query to display the operating system alongside the status (missing/ok) for each hostname?

below is the query I am using

index=A sourcetype="Any" | eval Hostname=lower(Hostname) | table Hostname | dedup Hostname | append [ search index=B sourcetype="foo" | eval Hostname=lower(Reporting_Host) | table Hostname | dedup Hostname ] | stats count by Hostname
| eval match=if(count=1, "missing", "ok")
Labels (2)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

First, unless you have prior knowledge that number of Hostname in index A is always larger than that in index B in any search period, "missing" simply means that the name appears only in one index.  The following does not try to address this problem, but will give you what you want, and is much simpler, perhaps more performant.

(index=A sourcetype="Any") OR (index=B sourcetype="foo")
| eval Hostname=coalesce(lower(Hostname), lower(Reporting_Host))
| fields index Hostname operating_system device_type
| stats values(*) as * by Hostname
| eval match=if(mvcount(index) == 1, "missing", "ok")

Not only operating system and device type, you can add any other fields of interest that may only exist in one of indices.

Richy_s
Path Finder

Thank you @yuanliu  @jawahir007  Both of your solutions are working absolutely fine. @yuanliu  yes, index A always has larger number of hosts compared to index B.

I would like to further expand this query to match the IP address aswell.  Can you provide some guidance around that.

index A data 

HostnameIP addressOS
xyz

190.1.1.1,  101.2.2.2, 102.3.3.3, 4.3.2.1

Windows

zbc100.0.1.0Linux
alb190.1.0.2Windows
cgf20.4.2.1Windows
bcn20.5.3.4, 30.4.6.1Solaris

 

Index B

Hostname
zbc
30.4.6.1
alb
101.2.2.2

 

Results

HostnameIP addressOSmatch
xyz

190.1.1.1,  101.2.2.2, 102.3.3.3, 4.3.2.1

Windows

ok(because IP address 101.2.2.2 is matching)

zbc100.0.1.0Linuxok
alb190.1.0.2Windowsok
cgf20.4.2.1Windowsmissing(neither hostname is present nor the IP is matching)
bcn20.5.3.4, 30.4.6.1Solarisyes(IP is matching)

In my initial use case, I compared the hostnames in index A with those in index B. Now, I want to check if the hosts in index A are reporting their IP addresses in index B. If there’s a match, I will mark the corresponding hostname in index A as "ok."

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Assuming that, like in the OP, index A still carries Hostname field that you want to compare with Reporting_Host in index B.  In addition, index A has a field "IP address".  This should get your desired result.

 

index=A sourcetype="Any"
| stats values("IP address") as "IP address" by Hostname OS
| append
    [search index=B sourcetype="foo"
    | stats values(Reporting_Host) as Reporting_Host]
| eventstats values(eval(lower(Reporting_Host))) as Reporting_Host
| where index != "B"
| mvexpand "IP address"
| eval match = if(lower(Hostname) IN (Reporting_Host) OR 'IP address' IN (Reporting_Host), "ok", null())
| stats values("IP address") as "IP address" values(match) as match by Hostname OS
| fillnull match value="missing"

 

Use the following emulation:

 

| makeresults format=csv data="Hostname,	IP address,	OS
xyz,	190.1.1.1:101.2.2.2:102.3.3.3:4.3.2.1,Windows
zbc,	100.0.1.0,	Linux
alb,	190.1.0.2,	Windows
cgf,	20.4.2.1,	Windows
bcn,	20.5.3.4:30.4.6.1,	Solaris"
| eval "IP address" = split('IP address', ":")
| eval index = "A"
| append
    [makeresults format=csv data="Reporting_Host
zbc
30.4.6.1
alb
101.2.2.2"
    | eval index = "B"]
``` the above emulates
index=A sourcetype="Any"
| stats values("IP address") as "IP address" by Hostname OS
| append
    [search index=B sourcetype="foo"
    | stats values(Reporting_Host) as Reporting_Host]
```

 

the result is

HostnameOSIP addressmatch
albWindows190.1.0.2ok
bcnSolaris
20.5.3.4
30.4.6.1
ok
cgfWindows20.4.2.1missing
xyzWindows
101.2.2.2
102.3.3.3
190.1.1.1
4.3.2.1
ok
zbcLinux100.0.1.0ok
0 Karma

Richy_s
Path Finder

@yuanliu Thank you for your response.  I tried below query but it doesn't seem to be working.  When  I further cut down the query for testing, looks like "|where index!=B" is not working.  Everything before this query is working but when I add this condition, I get 0 results.

also, the query seems to be very aggressive.  My index A has almost close to 70k events and index B has around 10k events.  Splunk was crashing few times when I try to run the query.  

Any suggestions, how to address this ?

0 Karma

yuanliu
SplunkTrust
SplunkTrust

My apologies.  I was switching between two different approaches and the filters got crossed.  To use the subsearch method above, modify that line to | where isnotnull(OS)

index=A sourcetype="Any"
| stats values("IP address") as "IP address" by Hostname OS
| append
    [search index=B sourcetype="foo"
    | stats values(Reporting_Host) as Reporting_Host]
| eventstats values(eval(lower(Reporting_Host))) as Reporting_Host
| where isnotnull(OS)
| mvexpand "IP address"
| eval match = if(lower(Hostname) IN (Reporting_Host) OR 'IP address' IN (Reporting_Host), "ok", null())
| stats values("IP address") as "IP address" values(match) as match by Hostname OS
| fillnull match value="missing"

 Depending on your deployment, combining the two index searches could improve performance, like this

(index=A sourcetype="Any") OR (index=B sourcetype="foo")
| eventstats values(eval(lower(Reporting_Host))) as Reporting_Host
| where index != "B"
| mvexpand "IP address"
| eval match = if(lower(Hostname) IN (Reporting_Host) OR 'IP address' IN (Reporting_Host), "ok", null())
| stats values("IP address") as "IP address" values(match) as match by Hostname OS
| fillnull match value="missing"

But eventstats and mvexpand could be bigger performance hindrances.  There could be ways to avoid mvexpand; there could be ways to improve eventstats.  But unless you can isolate the main contributor to slowness, they are not worth exploring.

Performance is a complex subject with any querying language.  You can start by doing some basic tests.  For example, run those two subsearches separately and compare with combined search.  If the total time is comparable, index search is the main hindrance.  That will be very difficult to improve.  Another test could be to add dedup before stats.  And so on.

0 Karma

Richy_s
Path Finder

Hi @yuanliu  Unfortunately none of the below queries are working for me.  First one is crashing splunk so unable to test it.  Second one, I don't get any results.  Could be because the field "Reporting_Host" is present only in index B and since we are excluding index B in the next step, the results are 0.  However I tried renaming the Hostname field in index A and try running the query but no results.

 

Can we test this scenario using a look up table, that might improve the search performance.  Can you give me something in this regards?

0 Karma

Richy_s
Path Finder

Something like this 

index=B | stats count by Reporting_Host | search NOT [| inputlookup inventory.csv ] | table Hostname ]

inventory.csv has the table pickup from index A.  Lookup query is - index=B | stats values(IP address) by Hostname Operating_system

PS This query is not working, just a thought.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You said crash several times.  This makes me think that your server may not have enough memory for that giant eventstats. (Again, this all depends on how many unique host name and IP addresses are in those tens of thousands of events.  If that number is unusually large, it could exceed system RAM.  But it also makes me suspect that your search heads might be under provisioned.)

If you are willing to use inventory lookup, things can improve.  Given that only one field from index B is useful in your logic, inventory should come from this index. (You also said index B is smaller.)

Here is my suggested setup for inventory.csv.

index = B sourcetype="foo"
| dedup Reporting_Host
| table Reporting_Host
| outputlookup inventory.csv

After this,

index=A sourcetype="Any"
| fields "IP address" Hostname OS
| dedup "IP address" Hostname OS
| eval Hostname = lower(Hostname)
| lookup inventory.csv Reporting_Host as Hostname output Reporting_Host as match
| lookup inventory.csv Reporting_Host as "IP address" output Reporting_Host as match
| eval match = if(isnull(match), "missing", "ok")
| table Hostname	"IP address"	OS	match
0 Karma

Richy_s
Path Finder


Hi @yuanliu 

Thank you for your feedback! I will definitely look into the performance issue and plan for further improvements.

Regarding the query, I tried it out, and here’s how it’s working: Index A contains around 70k assets, which serves as our asset inventory. Some hosts in this index have multiple IP addresses assigned to them. Index B has just the hostname, but this can include a mix of IP addresses, FQDNs, and hostnames.

When I ran the query, the first lookup compared the Reporting_Host with the hostnames in Index A and determined whether there was a match. The second lookup compared the Reporting_Host against the IP addresses in Index A to check for matches. However, when we combined these lookups as shown in the query you shared, the results only reflected matches from the second lookup, meaning only the IP addresses were being compared. Additionally, since a host in Index A has multiple IP addresses, the query gives a match for the IP address that corresponds, but for the remaining IP addresses associated with that host, it shows them as missing.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

when we combined these lookups as shown in the query you shared, the results only reflected matches from the second lookup, meaning only the IP addresses were being compared. Additionally, 

My mistake again.  When using the same output name, the second lookup overrides the first.  Use outputnew in the second.

index=A sourcetype="Any"
| fields "IP address" Hostname OS
| dedup "IP address" Hostname OS
| eval Hostname = lower(Hostname)
| lookup inventory.csv Reporting_Host as Hostname output Reporting_Host as match
| lookup inventory.csv Reporting_Host as "IP address" OUTPUTNEW Reporting_Host as match
| eval match = if(isnull(match), "missing", "ok")
| table Hostname	"IP address"	OS	match

 

0 Karma

jawahir007
Communicator
index=A sourcetype="Any" 
| eval Hostname=lower(Hostname)
| table Hostname os device_type ```# Include os and device_type fields```
| dedup Hostname
| append [
search index=B sourcetype="foo"
| eval Hostname=lower(Reporting_Host)
| table Hostname
| dedup Hostname
]
| stats values(os) as os values(device_type) as device_type count by Hostname
| eval match=if(count=1, "missing", "ok")
| table Hostname os device_type match

------

If you find this solution helpful, please consider accepting it and awarding karma points !!

 

 

Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...