Splunk Search

Search in lookup with subsearch - bug?

Treize
Path Finder

Hi, there,

I'm simplifying the context:
We've had a perfectly working correlation rule for several years now, and for the past 2 days it hasn't been working properly.

The search has to list IPs and then check whether those IPs are in a first lookup and then in a second lookup. If an IP is in neither lookup, an alert is triggered. The IPs are then added to the second lookup so they are ignored in future searches.

It looks like this:
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv ]
| search NOT [ | inputlookup 2.csv ]
| fields ip
| outputlookup append=true override_if_empty=false 2.csv

The lookups are both identical:
IP
-------
1.1.1.1
2.2.2.2
etc

The first lookup has 1000 lines
The second lookup has 55000 lines


Everything was working fine, but now we have IPs that are triggering alerts despite being in the second lookup.

Any ideas?

Thanks a lot.


ITWhisperer
SplunkTrust

Your subsearch only returns a limited number of rows (possibly only 10,000, not the 55,000 you were expecting).

As an experiment, I tried this

| makeresults count=50001
| streamstats count as row
| outputlookup large.csv

Followed by

| makeresults count=50010
| streamstats count as row
| search NOT 
    [| inputlookup large.csv]

I still get 40010 rows, not the 9 rows I "should" get (50010 events minus the 50001 rows in the lookup).
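For reference, that cap comes from limits.conf; a sketch of the relevant stanza (values shown are the usual defaults, which may vary by Splunk version, and raising them has memory and performance costs):

```
[subsearch]
# maximum number of results a subsearch returns (commonly 10000 by default)
maxout = 10000
# maximum subsearch runtime in seconds
maxtime = 60
```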


 

ITWhisperer
SplunkTrust

You could break up the CSV so that you process it in chunks of 10,000 events:

| makeresults count=50010
| streamstats count as row
| search NOT 
    [| inputlookup large.csv
    | head 10000]
| search NOT 
    [| inputlookup large.csv
    | head 20000
    | tail 10000]
| search NOT 
    [| inputlookup large.csv
    | head 30000
    | tail 10000]
| search NOT 
    [| inputlookup large.csv
    | head 40000
    | tail 10000]
| search NOT 
    [| inputlookup large.csv
    | head 50000
    | tail 10000]
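The idea behind the chained NOT clauses can be illustrated outside Splunk; a minimal sketch in plain Python (not Splunk code, with made-up row counts matching the experiment above): excluding a big list in fixed-size chunks gives the same result as excluding it all at once, which is why chunking works around the 10,000-row subsearch cap.

```python
# Mirror the chained `| search NOT [...]` pattern: drop events that
# appear in any chunk of the exclusion list.
def exclude_in_chunks(events, exclusions, chunk_size):
    kept = list(events)
    for start in range(0, len(exclusions), chunk_size):
        chunk = set(exclusions[start:start + chunk_size])
        kept = [e for e in kept if e not in chunk]
    return kept

events = list(range(1, 50011))       # 50010 events, like makeresults count=50010
exclusions = list(range(1, 50002))   # 50001 rows, like large.csv
kept = exclude_in_chunks(events, exclusions, 10000)
print(len(kept), kept[0], kept[-1])  # 9 50002 50010
```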


 

Treize
Path Finder

That solution isn't perfect but it's a good tip, thanks dude


gcusello
SplunkTrust

Hi @Treize ,

I don't know why your lookup isn't working correctly; how many lines are there in your lookups?

Maybe you exceeded the subsearch limit of 10,000 rows.

Also, did you change the name of the "ip" field in the main search or in the lookups?

Anyway, your search isn't optimized: it would be better to put the two subsearches in the main search and declare the field to use with the fields command:

<my search> NOT ([ | inputlookup 1.csv | fields ip ] OR [ | inputlookup 2.csv | fields ip ])
| dedup ip
| fields ip
| outputlookup append=true override_if_empty=false 2.csv

Ciao.

Giuseppe

 

Treize
Path Finder

Thanks, I'll keep the optimization in mind and implement it as soon as my problem is solved.


livehybrid
SplunkTrust

Hi @Treize ,

The issue is likely due to subsearch limitations. Subsearches have default limits on the number of results (10,000) and runtime, and your second lookup (55,000 lines) exceeds the default output limit. This means the subsearch [ | inputlookup 2.csv ] might only be returning the first 10,000 IPs, causing the outer search to incorrectly identify IPs beyond that limit as "not found".

Use the lookup command instead, which is more efficient and avoids these limits.

| dedup ip
| lookup 1.csv IP AS ip OUTPUT IP AS found_in_1
| where isnull(found_in_1)
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv

 

This search first finds unique IPs, then uses lookup to check against 1.csv. It keeps only the IPs not found (where isnull(found_in_1)). It then repeats the process for 2.csv. Finally, it outputs the remaining IPs (those not found in either lookup) to 2.csv.

Quick tip: double-check that the IP column in your lookup files doesn't contain leading/trailing whitespace.

https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches#Subsearch_performance_co...

🌟 Did this answer help you? If so, please consider:

  • Adding kudos to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

Treize
Path Finder

Hello and thank you for your suggestion

 

It works pretty well indeed!

The only problem, and it's my fault: I forgot to specify that 1.csv contains CIDRs, not plain IPs... Any ideas? 😃

 

I'd never have thought of the limits of subsearch. How has it worked all these years when we exceeded 10,000 lines years ago?


livehybrid
SplunkTrust

Hi @Treize 

If your "ip" field in the lookup contains CIDRs, then configure it as a lookup definition (rather than referencing it as <lookup1>.csv), and then under Advanced Options set the Match type to CIDR(ip), as below:

[Screenshot: lookup definition Advanced Options with Match type set to CIDR(ip)]

 


Treize
Path Finder

The lookup has already been defined. The field is not actually named "ip", so in the definition, should we put CIDR(ip) because the values are IPs, or CIDR() with the actual field name it should take into account? In both cases, this solution doesn't work: it can't find the IPs in the lookup's CIDRs...
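As an aside, the expected behaviour of a CIDR match can be sanity-checked outside Splunk; a minimal sketch with Python's stdlib ipaddress module and made-up values (nothing Splunk-specific):

```python
import ipaddress

# CIDR matching = "is this IP inside this network range?"
# Made-up values, just to illustrate what Match type CIDR(field) should do.
net = ipaddress.ip_network("10.1.0.0/16")

print(ipaddress.ip_address("10.1.2.3") in net)   # True: inside the /16
print(ipaddress.ip_address("10.2.0.1") in net)   # False: outside the /16
```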


livehybrid
SplunkTrust

Hi @Treize 

In the Match type box you would use CIDR(fieldName), where fieldName is the name of the field in your lookup that contains the CIDR values.
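For reference, that UI setting corresponds to a transforms.conf stanza along these lines (a sketch only; my_cidr_lookup and cidr_field are placeholders, not your real names):

```
[my_cidr_lookup]
filename = 1.csv
# match_type tells Splunk to treat this field's values as CIDR ranges
match_type = CIDR(cidr_field)
```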

 


Treize
Path Finder

In the meantime, I've come up with a simple idea: a subsearch for the lookup with 1000 lines and a simple "| lookup" command for the lookup with 50,000 lines.
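A sketch of that hybrid, assuming the lookup column is named IP as shown earlier (field names may need adjusting, and note the 1.csv clause still does exact matching, so the CIDR question remains):

```
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv | rename IP AS ip | fields ip ]
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
```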


gcusello
SplunkTrust

Hi @Treize ,

It could work, but you should add another field to use for the check.

But given that you have so many rows, why don't you use a summary index, putting it in the main search so you don't hit the limits?

something like this:

(<my search>) OR (index=new_summary_index)
| eval ip=coalesce(ip,IP)
| stats 
     values(index) AS index
     dc(index) AS index_count
     BY ip
| where index_count=1 AND index=<your_index>
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
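In plain terms, the stats/where pair keeps an IP only when it appears in exactly one index, the live one; a toy sketch of that set logic in Python, with made-up IPs (not Splunk code):

```python
# An IP already recorded in the summary index appears in both indexes,
# so only never-before-seen IPs survive the dc(index)=1 filter.
live_index = {"1.1.1.1", "3.3.3.3"}
summary_index = {"1.1.1.1", "2.2.2.2"}

new_ips = sorted(live_index - summary_index)
print(new_ips)  # ['3.3.3.3']
```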

Ciao.

Giuseppe


Treize
Path Finder

I don't understand why a summary index would be better?

We use 2 lookups:

- 1st because it comes from a third party

- 2nd because we need to append to it after an IP has been handled as an alert

 


gcusello
SplunkTrust
SplunkTrust

Hi @Treize ,

Because a summary index in a main search has no limit on the number of results.

Ciao.

Giuseppe
