Hi there,
I'm simplifying the context:
We've had a perfectly working correlation rule for several years now, and for the past 2 days it hasn't been working properly.
The search has to list IPs and then check whether these IPs are in a first lookup and then in a second lookup. If the IPs are in neither lookup, an alert is triggered. The IPs are then added to the second lookup so that they are ignored in future searches.
It looks like this:
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv ]
| search NOT [ | inputlookup 2.csv ]
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
The lookups both have an identical format:
IP
-------
1.1.1.1
2.2.2.2
etc
The first lookup has 1,000 lines.
The second lookup has 55,000 lines.
Everything was working fine, but now we have IPs that are triggering alerts despite being in the second lookup.
Any ideas?
Thanks a lot.
Your subsearch only returns a limited number of rows (possibly only 10,000, not the 55,000 you were expecting).
As an experiment, I tried this:
| makeresults count=50001
| streamstats count as row
| outputlookup large.csv
Followed by
| makeresults count=50010
| streamstats count as row
| search NOT
[| inputlookup large.csv]
I still get 40,010 rows, not the 9 that I "should" get.
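The cap comes from the subsearch result limit (maxout in the [subsearch] stanza of limits.conf, which defaults to 10,000, if I remember correctly). You can check that the lookup file itself is intact by counting it outside a subsearch, since a bare inputlookup isn't subject to that cap:
| inputlookup large.csv
| stats count
which should return the full 50,001.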
You could break up the CSV so that you process it in chunks of 10,000 events:
| makeresults count=50010
| streamstats count as row
| search NOT
[| inputlookup large.csv
| head 10000]
| search NOT
[| inputlookup large.csv
| head 20000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 30000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 40000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 50000
| tail 10000]
That solution isn't perfect, but it's a good tip, thanks dude.
Hi @Treize ,
I don't know why your lookup doesn't run correctly; how many lines are there in your lookups?
Maybe you exceeded the limit of 10,000 rows.
Then, did you change the name of the "ip" field in the main search or in the lookups?
Anyway, your search isn't optimized: it would be better to insert the two subsearches into the main search and declare the field to use with the fields command:
<my search> NOT ([ | inputlookup 1.csv | fields ip ] OR [ | inputlookup 2.csv | fields ip ])
| dedup ip
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
Ciao.
Giuseppe
Thanks, I'll keep the optimization in mind and implement it as soon as my problem is solved.
Hi @Treize ,
The issue is likely due to subsearch limitations. Subsearches have default limits on the number of results (10,000) and runtime, and your second lookup (55,000 lines) exceeds the default output limit. This means the subsearch [ | inputlookup 2.csv ] might only be returning the first 10,000 IPs, causing the outer search to incorrectly identify IPs beyond that limit as "not found".
Use the lookup command instead, which is more efficient and avoids these limits.
<my search>
| dedup ip
| lookup 1.csv IP AS ip OUTPUT IP AS found_in_1
| where isnull(found_in_1)
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
This search first finds unique IPs, then uses lookup to check against 1.csv. It keeps only the IPs not found (where isnull(found_in_1)). It then repeats the process for 2.csv. Finally, it outputs the remaining IPs (those not found in either lookup) to 2.csv.
Quick tip: double-check that the IP column in your lookup files doesn't contain leading/trailing whitespace.
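If there is stray whitespace, a one-off cleanup along these lines should normalise the file (just a sketch; it rewrites 2.csv in place, so test on a copy first):
| inputlookup 2.csv
| eval IP=trim(IP)
| outputlookup 2.csv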
Hello, and thank you for your suggestion!
It works pretty well indeed!
The only problem, and it's my fault: I forgot to specify that 1.csv contains CIDRs and not simple IPs... Any ideas? 😃
I'd never have thought of the subsearch limits. How has it worked all these years, when we passed 10,000 lines years ago?
Hi @Treize
If your "ip" field in the lookup is a CIDR, then configure it as a lookup definition (rather than referencing it as <lookup1>.csv), and under Advanced Options set the Match type to CIDR(ip).
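Once the definition exists, the check stays the same, just pointing at the definition name instead of the .csv file (a sketch; lookup1_cidr is a placeholder name for whatever you call the definition):
| lookup lookup1_cidr IP AS ip OUTPUT IP AS found_in_1
| where isnull(found_in_1)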
The lookup has already been defined. The field is not really called "ip", so in the definition should we put CIDR(ip) because the values are IPs, or CIDR(<field>) with the name of the field it should take into account? In both cases, this solution doesn't work: it can't find the IPs in the lookup's CIDRs...
Hi @Treize
In the Match type box you would do CIDR(fieldName) where fieldName is the name of the field in your lookup which contains the CIDR values.
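If you manage lookups in configuration files rather than through the UI, the equivalent would be a transforms.conf stanza along these lines (a sketch; the stanza name is made up):
[lookup1_cidr]
filename = 1.csv
match_type = CIDR(IP)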
In the meantime, I've come up with a simple idea: a subsearch for the lookup with 1000 lines and a simple "| lookup" command for the lookup with 50,000 lines.
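Something like this (a sketch; it assumes the lookup field is named IP and relies on the search command doing CIDR matching against the values coming from 1.csv):
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv | rename IP AS ip | fields ip ]
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv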
Hi @Treize ,
it could work, but you should add another field to use for the check.
But, given the issue of so many rows, why don't you use a summary index, putting it in the main search so you don't have limits?
something like this:
(<my search>) OR (index=new_summary_index)
| eval ip=coalesce(ip,IP)
| stats
values(index) AS index
dc(index) AS index_count
BY ip
| where index_count=1 AND index=<your_index>
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
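The summary index also has to be populated: you could seed it once from 2.csv and then have the alert search write new IPs into it with collect (a sketch; collect is the standard command for writing into a summary index):
| inputlookup 2.csv
| rename IP AS ip
| table ip
| collect index=new_summary_index
and end the scheduled search with | collect index=new_summary_index instead of (or in addition to) the outputlookup.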
Ciao.
Giuseppe
I don't understand: why would a summary index be better?
We use 2 lookups:
- the 1st because it comes from a third party
- the 2nd because we need to append to it after an IP has been treated as an alert
Hi @Treize ,
because a summary index used in the main search has no limits on the number of results.
Ciao.
Giuseppe