Splunk Search

Finding matches between 3 different indexes.

assennikolov
Explorer

I have the following case: I have 3 different indexes (A, B and C). My goal is to find what percentage of the devices found in index B could also be found in index C.

In index A I have fields asset_name and mac_address (~1000 different devices)
In index B I have field src_mac (~150 different devices)
In index C I have field asset_name (~6000) different devices

Basically, I firstly try to find the asset names of all 150 hosts from index B by looking into index A. And then I compare the newly found asset names to the asset names of index C.

index="C"
| stats count by asset_name
| join type=left [search (index=A OR index=B)
| eval all_macAddresses = coalesce(src_mac, mac_address)
| stats values(asset_name) as asset_name values(src_mac) as src_mac values(mac_address) as mac_address by all_macAddresses
| eval match = if(src_mac == mac_address, "match", "no_match")
| where match="match"
| table asset_name all_macAddresses]
| eval new=if(isnull(all_macAddresses,"NOT_OK","OK")
| stats count by new

I managed to get the results by using the above search but I was wondering whether this could be achieved without using any subsearches.

Labels (2)
0 Karma
1 Solution

rnowitzki
Builder

As I still had my browser tab with sample data open, I further played around with it.
I think I got it, but you would have to test it against your real data:

  | eval all_macAddresses = if(testindex="C","C",coalesce(src_mac, mac_address))
  | stats values(testindex) as testindex, values(asset_name) as asset_name    by all_macAddresses
  | mvexpand asset_name
  | stats values(testindex) as testindex by asset_name
  | where asset_name!=""
  | eval match=mvcount(testindex)
  | eval match=if(match=3, "OK", "NOT_OK")

It is based on the assumption that the assets you are looking for will have data in all three indexes: either hostname or mac or both in A, B or C. So the match against against the mvcount of (test)index column is checking for "3" at the end. Those should be the matching ones.
I put "C" as macaddress in the first eval for assets in the C index, just because the following stats command would remove all "C" data otherwise (if mac is empty)...maybe remove some lines one after another to see how the data is put together.

Again, replace testindex with _index to make it run for you.

Would be interesting to see if and how much faster this is compared to the join solution.

--
Karma and/or Solution tagging appreciated.

View solution in original post

rnowitzki
Builder

It is difficult without having the actual data, but this might work:

Instead of searching just index="C" in the first place, search all 3 indexes, so you have all the data in the search results. Then:

    | eval all_macAddresses = coalesce(src_mac, mac_address)
    | stats values(testindex) as testindex  values(all_macAddresses) as all_macAddresses  by asset_name
    | where NOT testindex="B"
    | eval match=mvfilter(testindex LIKE "A%")
    | eval match=match+","+mvfilter(testindex LIKE "C%")
    | eval match=if(match="A,C", "OK", "NOT_OK")
    | stats count by match

(I created a small table with makeresults that has a field "testindex" to simulate your different indexes, so just change testindex to _index)

This was quick and dirty, I guess the handling of the mvfield can be made more efficient, like for example directly filtering somehow for A and C instead of putting those in a new field.

The basic idea is to use stats with all the data instead of joining 2 subsets. So if my SPL does not work as is with your real data, I hope I could put you in the right direction.

--
Karma and/or Solution tagging appreciated.

assennikolov
Explorer

Thanks for the fast reply. I tested your search syntax, however end results shows:

match count
NOT_OK 4462
OK 1066 (~150 expected)

So I assume that at the end of the day it compares only index A to index C (given that index B has around 150 devices only).

Still, your reply was from a great help and I will work on filtering first the asset names from index A which MAC addresses are found in index B.

0 Karma

rnowitzki
Builder

As I still had my browser tab with sample data open, I further played around with it.
I think I got it, but you would have to test it against your real data:

  | eval all_macAddresses = if(testindex="C","C",coalesce(src_mac, mac_address))
  | stats values(testindex) as testindex, values(asset_name) as asset_name    by all_macAddresses
  | mvexpand asset_name
  | stats values(testindex) as testindex by asset_name
  | where asset_name!=""
  | eval match=mvcount(testindex)
  | eval match=if(match=3, "OK", "NOT_OK")

It is based on the assumption that the assets you are looking for will have data in all three indexes: either hostname or mac or both in A, B or C. So the match against against the mvcount of (test)index column is checking for "3" at the end. Those should be the matching ones.
I put "C" as macaddress in the first eval for assets in the C index, just because the following stats command would remove all "C" data otherwise (if mac is empty)...maybe remove some lines one after another to see how the data is put together.

Again, replace testindex with _index to make it run for you.

Would be interesting to see if and how much faster this is compared to the join solution.

--
Karma and/or Solution tagging appreciated.

assennikolov
Explorer

@rnowitzki For some time I was not able to reply to this post. Anyway, the solution you proposed worked and it is around 5 time faster than using the join command. Thanks again for helping me out with this one. 

Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In September, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...