Re: Filter specific values from a field in main se...

ea-2023 · ‎02-20-2024

In my search I have a field (ResourceId) that contains various cloud resource values. One of these values is InstanceId. The subsearch is returning a list of "active" instances. What I ultimately need to do is filter out only those InstanceIds from the ResourceIds field that DO NOT match the InstanceIds returned from the subsearch (the active instances), while keeping all other values in the ResourceId field.

Sample ResourceId values:
i-987654321abcdefg (active; WAS returned by subsearch)
i-123abcde456abcde (inactive; was NOT a returned value from subsearch)
bucket-name
sg-12423adssvd

Intended Output:
i-987654321abcdefg
bucket-name
sg-12423adssvd

Search (in progress):

index=main ResourceId=*
| join InstanceId type=inner [search index=other type=instance earliest=-2h]
| eval InstanceId=if(in(ResourceId, InstanceId), InstanceId, "NULL")
| table InstanceId

bowesmana · ‎02-21-2024

Using join is not a Splunk way of doing things, generally you would use stats. I'm not entirely clear on what fields exist in what indexes in your example. Does InstanceId exist in index=main data - that is what you are joining on

From your description it sounds like all you want are those InstanceIds that come from the subsearch, so maybe I'm missing something

If you are looking to find only those ResourceId where ResourceId=InstanceId from your current subsearch, but are also looking for other information, then

index=main ResourceId=* OR (index=other type=instance earliest=-2h)
| eval InstanceId=coalesce(ResourceId, InstanceId)
| stats values(*) as * values(index) as indexes count by InstanceId
| where mvcount(indexes)=2

ea-2023 · ‎02-21-2024

I really appreciate the suggestion. You do seem to get what I'm after, though the mvcount(indexes) returned no results. It seems so simple to me conceptually, but finding it very frustrating to try and wrap my head around "how splunk does it". I'm trying to include just the InstanceIds seen in both indexes while keeping the rest of the values from ResourceId in index=main.

Index=main contains many different values in ResourceId (some of which are instance IDs).
Index=other has a field called InstanceId.

This works, but I am not sure how to filter the MV index results (where mvcount(index)=2 did not work)

(index=main ResourceId=i-*) OR (index=other type=instance earliest=-2h)
| eval InstanceId=coalesce(ResourceId, InstanceId)
| stats values(index) as index by InstanceId

I've tried to use if/match, which sounds like it is exactly what I need, but it looks like you can't specify dynamic values for the match.

(index=main ResourceId=i-*) OR (index=other type=instance earliest=-2h)
| eval InstanceId=if(match(ResourceId, InstanceId, ResourceId, "null"))
| stats values(index) as index by InstanceId

bowesmana · ‎02-21-2024

This statement

| stats values(index) as index by InstanceId

should certainly give you a field called index which will contain main/other or both

Doing

| stats values(*) as * dc(index) as index_count by InstanceId

would give you all the values of every field from both indexes and a field called index_count that would contain a 1 or 2

You can't match the resource id against the instanceid as the events are not yet "joined" together, so there will either be a ResourceId (from index=main) OR an InstanceId (from index=other), so the coalesce+stats will join the two datasets together on that now common field (due to coalesce).

Effectively what you are saying is that after the stats, it will show, for each InstanceId (where InstanceId has come from ResourceId in index=main), the values of the indexes those IDs were found in.

After the stats you can then match as needed, so I believe what you are trying to do is to then say

"I need to only show results, where a ResourceId from index=main has also been found as InstanceId from index=other.

So, the logic to decide that is

mvcount(index)=2 (this means it was in both indexes). You could use index_count from the dc(index) example above = that is the same as doing the mvcount.

Doing values(*) as * is simply a way to carry through all fields combined from both indexes when joining the data together - as you have tried the stats values(index) as index... that should simply carry forward the main+other to that field.

Can you given an example of the data you have in both and a search result that highlights what you are getting.

ea-2023 · ‎02-22-2024

I think I may not be explaining a key part of this well enough (or if I am misunderstanding your explanation, I'm sorry!). I need ALL ResourceIds from index=main. The only values I need to filter out are instance IDs (i.e. i-1234567abcdef) that are NOT found in index=other.

So let's say index=main ResourceId=* returns:
i-1234567abcdef
i-abcdef1234567
sg-12345abcde
etc. (any other value that is not an instance ID)

and the index=other search returns InstanceId:
i-abcdef1234567

I need the results to be (filtered out i-1234567abcdef because it was not returned by index=other):
i-abcdef1234567
sg-12345abcde

So I guess a way to think about this is that I am trying to remove any value from ResourceId that matches the string "i-*" IF it was NOT found in index=other, and THEN coalesce ResourceId and InstanceId into a single field.

bowesmana · ‎02-25-2024

Your example is a little unclear, because it stats index=other has i-abcdef1234567 but in the next statement sats it is filtered out i-abcdef1234567 because it was NOT in index=other

Hopefully the following example demonstrates the principle. I am using makeresults to simulate your data set. The stats values combines the two and then the where clause is what you use for your exclusion logic.

If that is not correct based on the above discrepancy, adjust as necessary. You can remove the where clause to see what the data looks like first

| makeresults
| eval index="main", ResourceId=split("i-1234567abcdef,i-abcdef1234567,sg-12345abcde,abc", ",")
| mvexpand ResourceId
| append [
  | makeresults
```
and the index=other search returns InstanceId:
i-abcdef1234567
```
  | eval index="other", InstanceId=split("i-abcdef1234567,i-abcdef1234569",",")
]
| fields - _time
``` The above is just simulating your data setup ```
| eval ResourceId=coalesce(ResourceId, InstanceId)
| stats values(index) as index dc(index) as indexes by ResourceId
| where (indexes=1 AND index="main") OR indexes=2 
```
I need the results to be (filtered out i-1234567abcdef because it was not returned by index=other):
i-abcdef1234567
sg-12345abcde```

Filter specific values from a field in main search by values from same field in subsearch

eval

subsearch

Index This | When is October more than just the tenth month?

Observe and Secure All Apps with Splunk

What’s New & Next in Splunk SOAR

Are you a member of the Splunk Community?