Hi there,
I'm simplifying the context:
We've had a perfectly working correlation rule for several years now, and for the past 2 days it hasn't been working properly.
The search has to list IPs and then check whether these IPs are in a first lookup and then in a second lookup. If the IPs are in neither lookup, an alert is triggered. The IPs are then added to the second lookup so that they are ignored in future searches.
It looks like this:
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv ]
| search NOT [ | inputlookup 2.csv ]
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
The lookups both have an identical format:
IP
-------
1.1.1.1
2.2.2.2
etc
The first lookup has 1,000 lines.
The second lookup has 55,000 lines.
Everything was working fine, but now we have IPs that are triggering alerts despite being in the second lookup.
Any ideas?
Thanks a lot.
Your subsearch only returns a limited number of rows (possibly only 10,000, not the 55,000 you were expecting).
As an experiment, I tried this:
| makeresults count=50001
| streamstats count as row
| outputlookup large.csv
Followed by
| makeresults count=50010
| streamstats count as row
| search NOT
[| inputlookup large.csv]
I still get 40,010 rows, not the 9 that I "should" get.
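The cap comes from the subsearch result limit (maxout in the [subsearch] stanza of limits.conf, which defaults to 10,000, if I remember correctly). You can check that the lookup file itself is intact by counting it outside a subsearch, since a bare inputlookup isn't subject to that cap:
| inputlookup large.csv
| stats count
which should return the full 50,001.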
You could break up the CSV so that you process it in chunks of 10,000 events:
| makeresults count=50010
| streamstats count as row
| search NOT
[| inputlookup large.csv
| head 10000]
| search NOT
[| inputlookup large.csv
| head 20000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 30000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 40000
| tail 10000]
| search NOT
[| inputlookup large.csv
| head 50000
| tail 10000]
That solution isn't perfect, but it's a good tip, thanks dude.
Hi @Treize ,
I don't know why your lookup doesn't run correctly; how many lines are there in your lookups?
Maybe you exceeded the limit of 10,000 rows.
Then, did you change the name of the "ip" field in the main search or in the lookups?
Anyway, your search isn't optimized: it would be better to insert the two subsearches into the main search and declare the field to use with the fields command:
<my search> NOT ([ | inputlookup 1.csv | fields ip ] OR [ | inputlookup 2.csv | fields ip ])
| dedup ip
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
Ciao.
Giuseppe
Thanks, I'll keep the optimization in mind and implement it as soon as my problem is solved.
Hi @Treize ,
The issue is likely due to subsearch limitations. Subsearches have default limits on the number of results (10,000) and runtime, and your second lookup (55,000 lines) exceeds the default output limit. This means the subsearch [ | inputlookup 2.csv ] might only be returning the first 10,000 IPs, causing the outer search to incorrectly identify IPs beyond that limit as "not found".
Use the lookup command instead, which is more efficient and avoids these limits.
<my search>
| dedup ip
| lookup 1.csv IP AS ip OUTPUT IP AS found_in_1
| where isnull(found_in_1)
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
This search first finds unique IPs, then uses lookup to check against 1.csv. It keeps only the IPs not found (where isnull(found_in_1)). It then repeats the process for 2.csv. Finally, it outputs the remaining IPs (those not found in either lookup) to 2.csv.
Quick tip: double-check that the IP column in your lookup files doesn't contain leading/trailing whitespace.
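If there is stray whitespace, a one-off cleanup along these lines should normalise the file (just a sketch; it rewrites 2.csv in place, so test on a copy first):
| inputlookup 2.csv
| eval IP=trim(IP)
| outputlookup 2.csv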
Hello, and thank you for your suggestion!
It works pretty well indeed!
The only problem, and it's my fault: I forgot to specify that 1.csv contains CIDRs and not simple IPs... Any ideas? 😃
I'd never have thought of the subsearch limits. How has it worked all these years, when we passed 10,000 lines years ago?
Hi @Treize
If your "ip" field in the lookup is a CIDR, then configure it as a lookup definition (rather than referencing it as <lookup1>.csv), and under Advanced Options set the Match type to CIDR(ip).
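Once the definition exists, the check stays the same, just pointing at the definition name instead of the .csv file (a sketch; lookup1_cidr is a placeholder name for whatever you call the definition):
| lookup lookup1_cidr IP AS ip OUTPUT IP AS found_in_1
| where isnull(found_in_1)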
The lookup has already been defined. The field is not really called "ip", so in the definition should we put CIDR(ip) because the values are IPs, or CIDR(<field>) with the name of the field it should take into account? In both cases, this solution doesn't work: it can't find the IPs in the lookup's CIDRs...
Hi @Treize
In the Match type box you would do CIDR(fieldName) where fieldName is the name of the field in your lookup which contains the CIDR values.
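If you manage lookups in configuration files rather than through the UI, the equivalent would be a transforms.conf stanza along these lines (a sketch; the stanza name is made up):
[lookup1_cidr]
filename = 1.csv
match_type = CIDR(IP)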
In the meantime, I've come up with a simple idea: a subsearch for the lookup with 1000 lines and a simple "| lookup" command for the lookup with 50,000 lines.
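Something like this (a sketch; it assumes the lookup field is named IP and relies on the search command doing CIDR matching against the values coming from 1.csv):
<my search>
| dedup ip
| search NOT [ | inputlookup 1.csv | rename IP AS ip | fields ip ]
| lookup 2.csv IP AS ip OUTPUT IP AS found_in_2
| where isnull(found_in_2)
| fields ip
| outputlookup append=true override_if_empty=false 2.csv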
Hi @Treize ,
it could work, but you should add another field to use for the check.
But, given the issue of so many rows, why don't you use a summary index, putting it in the main search so you don't have limits?
something like this:
(<my search>) OR (index=new_summary_index)
| eval ip=coalesce(ip,IP)
| stats
values(index) AS index
dc(index) AS index_count
BY ip
| where index_count=1 AND index=<your_index>
| fields ip
| outputlookup append=true override_if_empty=false 2.csv
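The summary index also has to be populated: you could seed it once from 2.csv and then have the alert search write new IPs into it with collect (a sketch; collect is the standard command for writing into a summary index):
| inputlookup 2.csv
| rename IP AS ip
| table ip
| collect index=new_summary_index
and end the scheduled search with | collect index=new_summary_index instead of (or in addition to) the outputlookup.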
Ciao.
Giuseppe
I don't understand: why would a summary index be better?
We use 2 lookups:
- the 1st because it comes from a third party
- the 2nd because we need to append to it after an IP has been treated as an alert
Hi @Treize ,
because a summary index used in the main search has no limits on the number of results.
Ciao.
Giuseppe