Question regarding search using IN

rnayak · ‎03-17-2025

Hello:

I have a query that extracts a set of 5 request_ids based on certain criteria. I then need to include these request ids in a subsearch using the "IN" operator. I build up the string for the the search using the following:
| stats list(request_id) as req_id_list
| eval req_id_clause="(".mvjoin(req_id_list, ",").")"

I then use it in my query as follows:
| search req_id IN $req_id_clause$

However, Splunk interprets the $reg_id_clause$ as a literal string that is "(req_id1, req_id2...)" and I get an error. What are my options to handle this?

Thanks!

yuanliu · ‎03-17-2025

This is quite confusing. You should not flatten your multivalue req_id_list in the first place. First, make sure your list is multivalue:

| stats list(request_id) as req_id_list

No mvjoin allowed. Then, use with parentheses after IN operator. (See documentation in Logical expression options.)

| search req_id IN ($req_id_clause$)

livehybrid · ‎03-17-2025

Hi @rnayak

Would something like this work for you?

index=_internal 
    [ search index=_internal 
    | dedup source 
    | head 2
    | stats list(source) as source 
    | fields source  ]

In this example it gets 2 sources from _internal and ultimately passes them back in as source="a" OR source="b"
Using this approach I think you should get the same output as using an IN clause?

You can also add a format if you wanted to visualise this differently, but I dont think this will make much difference to your search!

| format "" "" "AND" "" "OR" ""

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will

rnayak · ‎03-17-2025

Will, thank you for your prompt response. This is my original anonymized query that works so long as the request ids retrieved in the second set is a small number. Since the join failed, I am trying to extract request ids from the first search and feed it to the second search to limit the number of results. Hence my attempt to extract the request id as a variable and pass it to the subsequent search.

index="xyz-svc" "blah!"
| rex field=_raw "blah! cust_id=(?<cid>\d+)"
| stats count as error_count by cid
| sort - error_count
| head 10
| join type=left cid
[ search index="xyz-svc" "blah!"
| rex field=_raw "cust_id=(?<cid>\d+)"
| rex field=_raw "(?:INFO\s+\[.*\]\s+)?\[(?<request_id>[A-Za-z0-9@\-]+)\]"
| stats first(request_id) as request_id by cid
| table cid request_id ]
| rename request_id as req_id
| eval req_id=trim(req_id)
| join type=left req_id
[ search tag="prod" index="xyz-svc" Parameters (sourceName OR source_name OR vis-svc)
| rex field=_raw "(?:INFO\s+\[.*\]\s+)?\[(?<req_id>[A-Za-z0-9@\-]+)\]"
| eval req_id=trim(req_id)
| eval vis-svc_match=if(match(_raw, "vis-svc"), "vis-svc", null())
| rex field=_raw "sourceName\"=>\"(?<sourceName>[a-zA-Z]+)\""
| rex field=_raw "source_name\"=>\"(?<source_name>[a-zA-Z]+)\""
| eval app_source=coalesce(vis-svc_match, sourceName, source_name)
| where isnotnull(app_source) AND len(app_source) > 0
| table req_id app_source ]
| table cid app_source req_id error_count

When I look at your suggestion, you recommend that I use fields similar to
index="xyz-svc" "blah!"
| rex field=_raw "blah! cust_id=(?<cid>\d+)"
| stats count as error_count by cid
| sort - error_count
| head 10
| join type=left cid
[ search index="xyz-svc" "blah!"
| rex field=_raw "cust_id=(?<cid>\d+)"
| rex field=_raw "(?:INFO\s+\[.*\]\s+)?\[(?<request_id>[A-Za-z0-9@\-]+)\]"
| stats first(request_id) as request_id by cid
| table cid request_id ]
| rename request_id as req_id
| stats list(req_id) as req_id
| fields req_id

How would I pass this to the second query which needs to filter by
search tag="prod" index="xyz-svc" Parameters (sourceName OR source_name OR vis-svc) and extract request ids from these results and ensure that they are in the req_id extracted above. I was trying the IN for this reason with a map and that was failing. Thank you for your time.

ITWhisperer · ‎03-17-2025

Subsearches execute before the main search, so see if you can create a subsearch to provide filter values which can be used to filter your main search (as I was trying to suggest with my first answer).

isoutamo · ‎03-17-2025

The way which is already presented to you is a normal way how this is done in splunk.

Usually you should also avoid to use join between data sets. It's very inefficient way to do join between data sets. It also has some restrictions like max amount of rows and timeout.

Can you describe what your SPL is trying to do and if possible give some data sets to use? In that way it's much easier to help you.

Please add SPL and data inside code blocks </> in editor.

ITWhisperer · ‎03-17-2025

| search [index <your_index>
  | stats count by request_id
  | rename request_id as req_id
  | fields req_id]

rnayak · ‎03-17-2025

Thank you for your suggestion. Please see my response to Will, I'm trying to figure out how I can use the fields in a subsequent search. Thank you for your time!

Question regarding search using IN

subsearch

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

Are you a member of the Splunk Community?

Question regarding search using IN

subsearch

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...