Hi Splunk Community,
I have a query that retrieves building data from two sources and I need assistance in identifying the unique buildings that are present in buildings_from_search1 but not in buildings_from_search2. Here's the query I'm currently using:
index=buildings_core
|stats values(building_from_search1) as buildings_from_search1 by request_unique_id
| append [ | inputlookup roomlookup_buildings.csv
| stats values(building_from_search2) as buildings_from_search2 ]
Could someone please guide me on how to modify this query to get the unique buildings that are only present in buildings_from_search1 and not in buildings_from_search2?
Thank you in advance for your help!
For that, you use inputlookup. It's simply the reverse logic. I will use the same assumptions about your index search, with the same assumed field names.
| inputlookup roomlookup_buildings.csv where NOT
[search index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
| fields building_from_search1
| rename building_from_search1 as buildings]
| rename buildings as buildings_only_in_lookup
Using the same emulation as shown above, the mock data would give
buildings_only_in_lookup |
Antara |
Also a point that name buildiing_from_search1 (or building_from_index_search as in your latest comment). Its value comes from an original field named "buildings" which is the same as in the lookup. It is much easier to keep using that name on the left-hand side of the assignments because there don't appear to have use of the original value down the stream.
First, do not reach for inputlookup when lookup is readily applicable. Assuming that you have distinct names in indexed events and the lookup, all you need to do is
index=buildings_core
| lookup roomlookup_buildings.csv building_from_search2 as building_from_search1 output building_from_search2
| where isnull(building_from_search2)
| stats values(building_from_search1) as unmatched_buildings_from_search1 by request_unique_id
Here, I assume that you have already defined a lookup called roomlookup_buildings.csv. (I usually name my lookups without that .csv.)
But then, why would you have different field names for the same thing? If in both indexed events and lookup the field name is building, you can do
index=buildings_core
| lookup roomlookup_buildings.csv building output building as matching_building
| where isnull(matching_building)
| stats values(building) as unique_buildings by request_unique_id
Hope this helps.
Hi @yuanliu,
Sorry, but the previous approach hasn’t worked for me. Let me provide the full context with the entire query. I am trying to compare building names from two sources: an indexed search and a lookup file.
For example, the building_from_search1 values from the indexed search are:
The lookup file, which has a column named buildings, contains values like:
Currently, I am using the mvappend command to combine both sets and filter for values with a count of 1. However, this approach gives me unique values from both searches, not just the unique values from the indexed search.
The target is to print unique values from the indexed search only. In this example, "Amsterdam" should be included in the result, but I am currently getting both "Amsterdam" and "Antara."
Here is the query I am using:
index= buildings_core "Buildings updated in database*" | rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)" | rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})" | eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",") | mvexpand buildings | eval building_from_search1 = mvindex(split(buildings, ":"), 1) | stats values(building_from_search1) as buildings_from_search1 by request_unique_id | append [ | inputlookup roomlookup_buildings.csv | stats values(buildings) as buildings_from_search2 ] | eval all_buildings = mvappend(buildings_from_search1, buildings_from_search2) | stats count by all_buildings | where count = 1 | stats values(all_buildings) as all_buildings | eval source="buildings_lacking_timezone_data" | table source, all_buildings
First off, the phrase "doesn't work" conveys little information in the best of cases and should be banished. Describe your data and illustrate the output, then explain why the output is different from desired output unless it is painfully obvious. (See my Four Commandments below.)
Back to your search. You already say that search does not meet your requirement. Why insist on using append? To get unique buildings in index events, you lookup any matching value, then exclude those matching events. What is left are events with unmatched buildings. Not only is this approach more semantic, but using lookup is also more efficient because that's a binary tree search.
About that roomlookup_buildings.csv,
If you have already defined a lookup, let's also call it roomlookup_buildings.csv; and let's assume that each row contains one value for building, i.e.,
buildings |
Aachen 1 |
Almanor 1 |
Almanor 2 |
Antara |
Further assume that your index search has these events:
building_from_search1 | request_unique_id |
Aachen 1 | ID 1 |
Almanor 1 | ID 2 |
Almanor 2 | ID 2 |
Amsterdam | ID 3 |
Then, you run
| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building
This should give you
building_from_search1 | matching_building | request_unique_id |
Aachen 1 | Aachen 1 | ID 1 |
Almanor 1 | Almanor 1 | ID 2 |
Almanor 2 | Almanor 2 | ID 2 |
Amsterdam | ID 3 |
Apply the filter,
| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building
| where isnull(matching_building)
This results in
building_from_search1 | matching_building | request_unique_id |
Amsterdam | ID 3 |
Then, apply stats to the whole thing
index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
| lookup roomlookup_buildings.csv buildings as building_from_search1 output buildings as matching_building
| where isnull(matching_building)
| stats values(building_from_search1) as unmatching_buildings by request_unique_id
That mock data gives
request_unique_id | unmatching_buildings |
ID 3 | Amsterdam |
Is this what you expect from that mock data?
Here, I am illustrating four golden rules of asking an answerable question in data analytics, which I call Four Commandments:
Here is an emulation for you to play with and compare with real data. This emulation is used to generate the above mock data. If your real data (including lookup) is different, you need to carefully describe them.
| makeresults format=csv data="building_from_search1, request_unique_id
Aachen 1, ID 1
Almanor 1, ID 2
Almanor 2, ID 2
Amsterdam, ID 3"
``` the above emulates
index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
```
Hi @yuanliu ,
Thank a lot you for your help on this matter. I apologize for the confusion earlier; it turns out that the building_from_index_search field did indeed contain multiple values per row, which was causing the matching issues. Adding the | mvexpand building_from_index_search line helped by creating separate rows, which resolved the problem and allowed me to obtain the expected output.
Now, I’d like to address a similar requirement. While the current approach effectively identifies the unique values in building_from_index_search that are not present in roomlookup_buildings.csv buildings, I also need to find unique values specifically in the roomlookup_buildings.csv buildings that do not appear in the building_from_index_search field.
Could you help me with a query to achieve this? Thank you in advance for your assistance!
For that, you use inputlookup. It's simply the reverse logic. I will use the same assumptions about your index search, with the same assumed field names.
| inputlookup roomlookup_buildings.csv where NOT
[search index= buildings_core "Buildings updated in database*"
| rex "REQUEST_UNIQUE_ID:(?<request_unique_id>[^ ]+)"
| rex "Buildings updated in database:\s(?<buildings>\{[^}]+\})"
| eval buildings = replace(buildings, "[{}]", "") | eval buildings = split(buildings, ",")
| mvexpand buildings
| eval building_from_search1 = mvindex(split(buildings, ":"), 1)
| fields building_from_search1
| rename building_from_search1 as buildings]
| rename buildings as buildings_only_in_lookup
Using the same emulation as shown above, the mock data would give
buildings_only_in_lookup |
Antara |
Also a point that name buildiing_from_search1 (or building_from_index_search as in your latest comment). Its value comes from an original field named "buildings" which is the same as in the lookup. It is much easier to keep using that name on the left-hand side of the assignments because there don't appear to have use of the original value down the stream.
Hi @yuanliu
It worked perfectly! Thank you so much for your help; you’ve saved me a great deal of time. I had been struggling for several days to implement this logic to create an alert, and now that I have an efficient approach, I’m happy to accept this as the solution.
Thanks again for your support!