Hi All,
I am running a tstats command and matching against a large lookup file, but I am getting the error "[subsearch]: Subsearch produced 144180 results, truncating to maxout 10000."
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject | `drop_dm_object_name("All_Email")` | search [| inputlookup local_domain_intel.csv| rename domain as src_user |fields src_user ]
My question is: does my search match against the lookup file first and then truncate to 10,000 results? Or does it truncate the lookup file to 10,000 rows first and then compare against my search? I tested this: I added a known domain to the last row of my lookup, saved it, and the search produced no results. If I move that known domain to the first row, the search matches and produces results.
Also, how do I remove this error?
Lookup file rows: 144,180
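To see why row position matters, here is a minimal Python sketch of the order of operations (hypothetical data, not Splunk internals): the inputlookup subsearch runs first and is truncated to maxout rows, and only the surviving rows are then used to filter the outer tstats results.

```python
# Sketch, assuming the subsearch is truncated BEFORE any matching happens.
MAXOUT = 10_000

# Hypothetical lookup with 144,180 domains; the "known" domain sits in the last row.
lookup_rows = [f"domain{i}.example" for i in range(144_180)]
lookup_rows[-1] = "known-domain.example"

# Step 1: the subsearch result is cut down to the first MAXOUT rows.
truncated = lookup_rows[:MAXOUT]

# Step 2: the outer search keeps only events whose src_user survived truncation.
events = ["known-domain.example", "domain5.example"]
matches = [e for e in events if e in set(truncated)]

# The known domain in the last row was cut off, so it never matches.
print(matches)  # only "domain5.example" survives
```

This reproduces the symptom described: a domain in the last row is lost to truncation, while the same domain in the first row matches.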
@sumitkathpal, if your use-case is to show the tstats count only for sources/domains present in the lookup, then instead of using | inputlookup (which runs as a subsearch), you can use the lookup command to identify the domains that are present in the lookup and filter out those that are not. Please try the following query.
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject
| `drop_dm_object_name("All_Email")`
| lookup local_domain_intel.csv domain as src_user outputnew domain as domainFromLookup
| search domainFromLookup!=""
| fields - domainFromLookup
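The lookup-then-filter pattern above can be sketched in Python (field names taken from the query; the data is hypothetical): unlike a subsearch, | lookup enriches every result row, so no 10,000-row truncation applies, and rows that found no match are filtered out afterwards.

```python
# Stand-in for local_domain_intel.csv (hypothetical contents).
lookup = {"a.example", "b.example"}

results = [
    {"src_user": "a.example", "count": 3},
    {"src_user": "z.example", "count": 7},
]

# | lookup local_domain_intel.csv domain as src_user outputnew domain as domainFromLookup
for row in results:
    row["domainFromLookup"] = row["src_user"] if row["src_user"] in lookup else ""

# | search domainFromLookup!=""  followed by  | fields - domainFromLookup
filtered = [
    {k: v for k, v in row.items() if k != "domainFromLookup"}
    for row in results
    if row["domainFromLookup"] != ""
]
print(filtered)  # [{'src_user': 'a.example', 'count': 3}]
```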
Following is the run-anywhere sample approach I used to test:
1) Created 150K events for the lookup. Used streamstats to create unique source names as the domain field. PS: datasource="lookup" is created as an identifier just for demo purposes.
| makeresults count=150000
| fields - _time
| streamstats count as domain
| eval domain="src".printf("%06d",domain), datasource="lookup"
2) Piped outputlookup onto the above result to save it as localtestdata.csv
| outputlookup localtestdata.csv
3) Used a new query to generate stats counts by various sources. PS: the count below can be changed to any number you want to test; I tested with 1.5M as well. The following uses 15K for the demo. The eval functions random() and substr() are used to generate a random count. PS: datasource="tstats" is just for demo purposes.
| makeresults count=15000
| fields - _time
| streamstats count as src_user
| eval src_user=if(src_user<=100,"0",src_user )
| eval src_user="src".printf("%06d",src_user), count=substr("".random(),4), datasource="tstats"
PS: the | eval src_user=if(src_user<=100,"0",src_user) expression has been added to rename the first 100 events to src000000 so that not all events from the search match data in the lookup.
4) Once the lookup file from Steps 1 and 2 is created and you have run a new search with the query from Step 3 to generate your sample events, you can match the src_user field in the raw events with the domain field in the lookup and filter only the matched domains using the following commands:
| lookup localtestdata.csv domain as src_user outputnew domain as domainFromLookup
| search domainFromLookup!=""
| fields - domainFromLookup
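The four steps above can be mirrored in a short Python sketch (sizes and naming taken from the demo; this only reproduces the logic, not Splunk's execution):

```python
import random

# Steps 1+2: 150,000 unique "domain" values, as would be saved to localtestdata.csv.
lookup_domains = [f"src{i:06d}" for i in range(1, 150_001)]

# Step 3: 15,000 sample events; the first 100 collapse to "src000000"
# so that not every event matches the lookup.
events = []
for i in range(1, 15_001):
    src = "src000000" if i <= 100 else f"src{i:06d}"
    events.append({"src_user": src, "count": random.randint(0, 999)})

# Step 4: keep only events whose src_user exists in the lookup.
domain_set = set(lookup_domains)
matched = [e for e in events if e["src_user"] in domain_set]
print(len(matched))  # 14900 — the 100 renamed events found no match
```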
You can change the search to use the lookup command, something like this (change fields as needed):
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject | `drop_dm_object_name("All_Email")` | lookup local_domain_intel.csv domain AS src_user OUTPUT domain | where isnotnull(domain)
This matches src_user in the tstats results against the lookup contents and keeps only the matched results. (Note: inputlookup cannot be used mid-pipeline like this; the lookup command is what enriches and lets you filter the results.)
The subsearch executes first and is limited to returning 10,000 results. This is governed by the maxout setting in limits.conf, but it must always be less than 10,500, so changing this value will not help you. You will have to redesign your query.
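For reference, this is the stanza involved (shown with its default; as noted above, raising it is capped and will not solve the underlying problem):

```
# limits.conf
[subsearch]
maxout = 10000
```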