Splunk Search

How to Find Unique Buildings Present in One Search but Not in Another

Tajuddin
Explorer

Hi Splunk Community,

I have a query that retrieves building data from two sources and I need assistance in identifying the unique buildings that are present in buildings_from_search1 but not in buildings_from_search2. Here's the query I'm currently using:

index=buildings_core
|stats values(building_from_search1) as buildings_from_search1 by request_unique_id
| append [ | inputlookup roomlookup_buildings.csv
| stats values(building_from_search2) as buildings_from_search2 ]

Could someone please guide me on how to modify this query to get the unique buildings that are only present in buildings_from_search1 and not in buildings_from_search2?

Thank you in advance for your help!

Labels (1)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

For that, you use inputlookup.  It's simply the reverse logic.  I will use the same assumptions about your index search, with the same assumed field names.

 

| inputlookup roomlookup_buildings.csv where NOT
    [search index= buildings_core "Buildings updated in database*"
    | rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
    | rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
    | eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
    | mvexpand buildings
    | eval building_from_search1 = mvindex(split(buildings, ":"), 1)
    | fields building_from_search1
    | rename building_from_search1 as buildings]
| rename buildings as buildings_only_in_lookup

 

Using the same emulation as shown above, the mock data would give

buildings_only_in_lookup
Antara

Also a point that name buildiing_from_search1 (or building_from_index_search as in your latest comment).  Its value comes from an original field named "buildings" which is the same as in the lookup.  It is much easier to keep using that name on the left-hand side of the assignments because there don't appear to have use of the original value down the stream.

View solution in original post

0 Karma

yuanliu
SplunkTrust
SplunkTrust

First, do not reach for inputlookup when lookup is readily applicable.  Assuming that you have distinct names in indexed events and the lookup, all you need to do is

index=buildings_core
| lookup roomlookup_buildings.csv building_from_search2 as building_from_search1 output building_from_search2
| where isnull(building_from_search2)
| stats values(building_from_search1) as unmatched_buildings_from_search1 by request_unique_id

Here, I assume that you have already defined a lookup called roomlookup_buildings.csv. (I usually name my lookups without that .csv.)

But then, why would you have different field names for the same thing?  If in both indexed events and lookup the field name is building, you can do

index=buildings_core
| lookup roomlookup_buildings.csv building output building as matching_building
| where isnull(matching_building)
| stats values(building) as unique_buildings by request_unique_id

Hope this helps.

Tags (1)

Tajuddin
Explorer

Hi @yuanliu,

Sorry, but the previous approach hasn’t worked for me. Let me provide the full context with the entire query. I am trying to compare building names from two sources: an indexed search and a lookup file.

For example, the building_from_search1 values from the indexed search are:

  • \"Aachen 1\"
  • \"Almanor 1\"
  • \"Almanor 2\"
  • \"Amsterdam\"

The lookup file, which has a column named buildings, contains values like:

  • \"Aachen 1\"
  • \"Almanor 1\"
  • \"Almanor 2"
  • \"Antara"

Currently, I am using the mvappend command to combine both sets and filter for values with a count of 1. However, this approach gives me unique values from both searches, not just the unique values from the indexed search.

The target is to print unique values from the indexed search only. In this example, "Amsterdam" should be included in the result, but I am currently getting both "Amsterdam" and "Antara."

Here is the query I am using:

index= buildings_core "Buildings updated in database*" | rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)" | rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})" | eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",") | mvexpand buildings | eval building_from_search1 = mvindex(split(buildings, ":"), 1) | stats values(building_from_search1) as buildings_from_search1 by request_unique_id | append [ | inputlookup roomlookup_buildings.csv | stats values(buildings) as buildings_from_search2 ] | eval all_buildings = mvappend(buildings_from_search1, buildings_from_search2) | stats count by all_buildings | where count = 1 | stats values(all_buildings) as all_buildings | eval source="buildings_lacking_timezone_data" | table source, all_buildings

0 Karma

yuanliu
SplunkTrust
SplunkTrust

First off, the phrase "doesn't work" conveys little information in the best of cases and should be banished.  Describe your data and illustrate the output, then explain why the output is different from desired output unless it is painfully obvious. (See my Four Commandments below.)

Back to your search.  You already say that search does not meet your requirement.  Why insist on using append?  To get unique buildings in index events, you lookup any matching value, then exclude those matching events.  What is left are events with unmatched buildings.  Not only is this approach more semantic, but using lookup is also more efficient because that's a binary tree search.

About that roomlookup_buildings.csv,

  1. have you defined a lookup to use this file?  In Splunk, a lookup definition can be independent of lookup file, meaning you need a definition. (The lookup definition doesn't have the use the same name as the file, but must use the file as source.  My convention is to name a lookup without .csv but that's up to you.  I will assume that your definition is called roomlookup_buildings.csv.)
  2. Are the column buildings containing one value per row? (I will assume yes.  There is no good reason not to.)
  3. What are those escaped quotation marks?  Are they part of field value or do you simply use them to signal that between quotes are the values? (I will assume the values are between quotes.)

If you have already defined a lookup, let's also call it roomlookup_buildings.csv; and let's assume that each row contains one value for building, i.e.,

buildings
Aachen 1
Almanor 1
Almanor 2
Antara

Further assume that your index search has these events:

building_from_search1request_unique_id
Aachen 1ID 1
Almanor 1ID 2
Almanor 2ID 2
AmsterdamID 3

Then, you run

 

 

| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building

 

 

This should give you

building_from_search1matching_buildingrequest_unique_id
Aachen 1Aachen 1ID 1
Almanor 1Almanor 1ID 2
Almanor 2Almanor 2ID 2
Amsterdam ID 3

Apply the filter,

 

 

| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building
| where isnull(matching_building)

 

 

This results in

building_from_search1matching_buildingrequest_unique_id
Amsterdam ID 3

Then, apply stats to the whole thing

 

 

index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building
| where isnull(matching_building)
| stats values(building_from_search1) as unmatching_buildings by request_unique_id

 

 

That mock data gives

request_unique_idunmatching_buildings
ID 3Amsterdam

Is this what you expect from that mock data?

Here, I am illustrating four golden rules of asking an answerable question in data analytics, which I call Four Commandments:

  • Illustrate data input (in raw text, anonymize as needed), whether they are raw events or output from a search that volunteers here do not have to look at.
  • Illustrate the desired output from illustrated data.
  • Explain the logic between illustrated data and desired output without SPL.
  • If you also illustrate attempted SPL, illustrate actual output and compare with desired output, explain why they look different to you if that is not painfully obvious.

Here is an emulation for you to play with and compare with real data.  This emulation is used to generate the above mock data.  If your real data (including lookup) is different, you need to carefully describe them.

 

 

| makeresults format=csv data="building_from_search1, request_unique_id
Aachen 1, ID 1
Almanor 1, ID 2
Almanor 2, ID 2
Amsterdam, ID 3"
``` the above emulates
index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
```

 

 

 

0 Karma

Tajuddin
Explorer

Hi @yuanliu ,

Thank a lot you for your help on this matter. I apologize for the confusion earlier; it turns out that the building_from_index_search field did indeed contain multiple values per row, which was causing the matching issues. Adding the | mvexpand building_from_index_search line helped by creating separate rows, which resolved the problem and allowed me to obtain the expected output.

Now, I’d like to address a similar requirement. While the current approach effectively identifies the unique values in building_from_index_search that are not present in roomlookup_buildings.csv buildings, I also need to find unique values specifically in the roomlookup_buildings.csv buildings that do not appear in the building_from_index_search field.

Could you help me with a query to achieve this? Thank you in advance for your assistance!

0 Karma

yuanliu
SplunkTrust
SplunkTrust

For that, you use inputlookup.  It's simply the reverse logic.  I will use the same assumptions about your index search, with the same assumed field names.

 

| inputlookup roomlookup_buildings.csv where NOT
    [search index= buildings_core "Buildings updated in database*"
    | rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
    | rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
    | eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
    | mvexpand buildings
    | eval building_from_search1 = mvindex(split(buildings, ":"), 1)
    | fields building_from_search1
    | rename building_from_search1 as buildings]
| rename buildings as buildings_only_in_lookup

 

Using the same emulation as shown above, the mock data would give

buildings_only_in_lookup
Antara

Also a point that name buildiing_from_search1 (or building_from_index_search as in your latest comment).  Its value comes from an original field named "buildings" which is the same as in the lookup.  It is much easier to keep using that name on the left-hand side of the assignments because there don't appear to have use of the original value down the stream.

0 Karma

Tajuddin
Explorer

Hi @yuanliu 

It worked perfectly! Thank you so much for your help; you’ve saved me a great deal of time. I had been struggling for several days to implement this logic to create an alert, and now that I have an efficient approach, I’m happy to accept this as the solution.

Thanks again for your support!

0 Karma
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...