Hi All,
I am running a tstats command and matching against a large lookup file, but I am getting the error "[subsearch]: Subsearch produced 144180 results, truncating to maxout 10000."
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject | `drop_dm_object_name("All_Email")` | search [| inputlookup local_domain_intel.csv| rename domain as src_user |fields src_user ]
My question is: does my search match against the lookup file first and then truncate to 10,000 results? Or does it truncate the lookup file to 10,000 rows first and then compare against my search? I tested this: I added a known domain to the last row of my lookup, saved it, and the search produced no results. If I move that known domain to the first row, the search matches and produces results.
Also, how do I remove this error?
Lookup file rows: 144,180
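To see why row position matters, here is a minimal Python sketch of the order of operations (hypothetical data, not Splunk internals): the inputlookup subsearch runs first and is truncated to maxout rows, and only the surviving rows are then used to filter the outer tstats results.

```python
# Sketch, assuming the subsearch is truncated BEFORE any matching happens.
MAXOUT = 10_000

# Hypothetical lookup with 144,180 domains; the "known" domain sits in the last row.
lookup_rows = [f"domain{i}.example" for i in range(144_180)]
lookup_rows[-1] = "known-domain.example"

# Step 1: the subsearch result is cut down to the first MAXOUT rows.
truncated = lookup_rows[:MAXOUT]

# Step 2: the outer search keeps only events whose src_user survived truncation.
events = ["known-domain.example", "domain5.example"]
matches = [e for e in events if e in set(truncated)]

# The known domain in the last row was cut off, so it never matches.
print(matches)  # only "domain5.example" survives
```

This reproduces the symptom described: a domain in the last row is lost to truncation, while the same domain in the first row matches.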
@sumitkathpal, if your use-case is to show the tstats count only for sources/domains present in the lookup, then instead of using | inputlookup (which runs as a subsearch), you can use the lookup command to identify the domains that are present in the lookup and filter out those that are not. Please try the following query.
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject
| `drop_dm_object_name("All_Email")`
| lookup local_domain_intel.csv domain as src_user outputnew domain as domainFromLookup
| search domainFromLookup!=""
| fields - domainFromLookup
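The lookup-then-filter pattern above can be sketched in Python (field names taken from the query; the data is hypothetical): unlike a subsearch, | lookup enriches every result row, so no 10,000-row truncation applies, and rows that found no match are filtered out afterwards.

```python
# Stand-in for local_domain_intel.csv (hypothetical contents).
lookup = {"a.example", "b.example"}

results = [
    {"src_user": "a.example", "count": 3},
    {"src_user": "z.example", "count": 7},
]

# | lookup local_domain_intel.csv domain as src_user outputnew domain as domainFromLookup
for row in results:
    row["domainFromLookup"] = row["src_user"] if row["src_user"] in lookup else ""

# | search domainFromLookup!=""  followed by  | fields - domainFromLookup
filtered = [
    {k: v for k, v in row.items() if k != "domainFromLookup"}
    for row in results
    if row["domainFromLookup"] != ""
]
print(filtered)  # [{'src_user': 'a.example', 'count': 3}]
```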
Following is the run-anywhere sample approach I used to test:
1) Created 150K events for the lookup. Used streamstats to create unique source names as the domain field. PS: datasource="lookup" is created as an identifier just for demo purposes.
| makeresults count=150000
| fields - _time
| streamstats count as domain
| eval domain="src".printf("%06d",domain), datasource="lookup"
2) Piped outputlookup onto the above result to save it as localtestdata.csv
| outputlookup localtestdata.csv
3) Used a new query to generate stats counts by various sources. PS: the count below can be changed to any number you want to test; I tested with 1.5M as well. The following uses 15K for the demo. The eval functions random() and substr() are used to generate a random count. PS: datasource="tstats" is just for demo purposes.
| makeresults count=15000
| fields - _time
| streamstats count as src_user
| eval src_user=if(src_user<=100,"0",src_user )
| eval src_user="src".printf("%06d",src_user), count=substr("".random(),4), datasource="tstats"
PS: the | eval src_user=if(src_user<=100,"0",src_user) expression has been added to rename the first 100 events to src000000 so that not all events from the search match data in the lookup.
4) Once the lookup file from Steps 1 and 2 is created and you have run a new search with the query from Step 3 to generate your sample events, you can match the src_user field in the raw events with the domain field in the lookup and filter only the matched domains using the following commands:
| lookup localtestdata.csv domain as src_user outputnew domain as domainFromLookup
| search domainFromLookup!=""
| fields - domainFromLookup
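The four steps above can be mirrored in a short Python sketch (sizes and naming taken from the demo; this only reproduces the logic, not Splunk's execution):

```python
import random

# Steps 1+2: 150,000 unique "domain" values, as would be saved to localtestdata.csv.
lookup_domains = [f"src{i:06d}" for i in range(1, 150_001)]

# Step 3: 15,000 sample events; the first 100 collapse to "src000000"
# so that not every event matches the lookup.
events = []
for i in range(1, 15_001):
    src = "src000000" if i <= 100 else f"src{i:06d}"
    events.append({"src_user": src, "count": random.randint(0, 999)})

# Step 4: keep only events whose src_user exists in the lookup.
domain_set = set(lookup_domains)
matched = [e for e in events if e["src_user"] in domain_set]
print(len(matched))  # 14900 — the 100 renamed events found no match
```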
You can change the search to use the lookup command, something like this (change fields as needed):
| tstats `summariesonly` count from datamodel=Email by All_Email.src_user All_Email.subject | `drop_dm_object_name("All_Email")` | lookup local_domain_intel.csv domain AS src_user OUTPUT domain | where isnotnull(domain)
This matches src_user in the tstats results against the lookup contents and keeps only the matched results. (Note: inputlookup cannot be used mid-pipeline like this; the lookup command is what enriches and lets you filter the results.)
The subsearch executes first and is limited to returning 10,000 results. This is governed by the maxout setting in limits.conf, but it must always be less than 10,500, so changing this value will not help you. You will have to redesign your query.
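For reference, this is the stanza involved (shown with its default; as noted above, raising it is capped and will not solve the underlying problem):

```
# limits.conf
[subsearch]
maxout = 10000
```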