...dedup Order_Number|search NOT[|inputlookup Order_Details_Lookup.csv|fields Order_Number]|table Order_Number
I need to compare and show all the records that are not present in the lookup file. The query above works, but the lookup file contains more than 10,000 records, and the subsearch truncates at 10k, comparing only those and ignoring the rest. I want to compare all 20k records.
Note: the lookup file can contain even more records, for example 50k or 100k.
Is there any way to increase the subsearch limit?
I just came across this gem via a co-worker. Do:
dedup Order_Number| search NOT[ |inputlookup Order_Details_Lookup.csv |stats values(Order_Number) AS Order_Number] |table Order_Number
That will make the subsearch return a single row with a multi-value field containing all of the order numbers but the individual values will get passed along correctly into the base search.
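If you want to see exactly what the subsearch hands back to the outer search, you can run it standalone and pipe it through `format` (a sketch, using the lookup name from the question; `format` should expand the multi-value field into an OR'ed clause list):

```
| inputlookup Order_Details_Lookup.csv
| stats values(Order_Number) AS Order_Number
| format
```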
This is great. I was stuck using lookup, but it's very slow, and this solves my problem as long as the values() limit isn't too small. I've done some testing, and so far I haven't managed to hit the limit.
This doesn't hit the maxout limit:
|makeresults| eval a=1 | append [|inputlookup large_lookup.csv | head 150000| stats values(clientip) AS clientip ] | stats dc(clientip)
This hits the maxout limit:
|makeresults| eval a=1 | append [|inputlookup large_lookup.csv | head 150000| fields clientip ] | stats dc(clientip)
[subsearch]: Search Processor: Subsearch produced 150000 results, truncating to maxout 50000.
Reading the docs on limits.conf (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf), it looks like there is no limit by default:

maxvalues = <integer>
* Maximum number of values for any field to keep track of.
* When set to "0": Specifies an unlimited number of values.
* Default: 0
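If you ever do need to adjust it, the setting lives in the [stats] stanza of limits.conf; the file path below is the usual local-override location, but check your deployment's conventions:

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[stats]
# 0 = unlimited (the documented default); set a positive
# integer only if you want to cap multi-value field growth
maxvalues = 0
```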
A better answer may be to use the lookup as a lookup rather than just as a mechanism to exclude events with a subsearch.
...| dedup Order_Number|search NOT[|inputlookup Order_Details_Lookup.csv|fields Order_Number]|table Order_Number
Making the assumptions that:
1) there's some other field in the lookup besides Order_Number, and
2) at least one of those other fields is present on all rows,
let's call that field "otherLookupField". We can then do:
...| dedup Order_Number|lookup Order_Details_Lookup.csv Order_Number OUTPUT otherLookupField | search NOT otherLookupField=*
You'll end up filtering out all of the rows whose Order_Number is present in the lookup, and of course there are no longer any limits on the number of rows or on search execution time. No subsearch limits to worry about at all.
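If the lookup has no other field that's reliably populated, a variant is to re-output Order_Number itself under a new name (the field name `matched` here is made up for illustration) and keep only the rows where the lookup found no match:

```
... | dedup Order_Number
| lookup Order_Details_Lookup.csv Order_Number OUTPUT Order_Number AS matched
| where isnull(matched)
| table Order_Number
```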
The problem with this is that it's very slow, as Splunk still has to pull all of the events from disk and then compare them against the lookup. With the subsearch approach, Splunk can use the bloom filters to determine whether a bucket is in or out.
Thanks for your quick reply. I have gone through the provided links, but my problem is still not solved. Even the documentation says the subsearch limit cannot be more than 10,500, whereas I have to compare 20k records. Could you be more specific about which limit I need to set in limits.conf?
[subsearch]
* This stanza controls subsearch results.
* NOTE: This stanza DOES NOT control subsearch results when a subsearch is
  called by commands such as join, append, or appendcols.
* Read more about subsearches in the online documentation.

maxout = <integer>
* Maximum number of results to return from a subsearch.
* This value cannot be greater than or equal to 10500.
* Defaults to 10000.
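For completeness, this is how far you can raise the cap per the spec above; it still cannot cover 20k lookup rows, which is why the stats values() subsearch trick is the practical workaround:

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[subsearch]
# hard-capped below 10500, so this alone
# cannot handle a 20k-row lookup
maxout = 10499
```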