Splunk Search

Why am I getting a limited number of events returned using the join command?

lbogle
Contributor

Hello Splunkers.
I have the search and subsearch below, which work fine on their own, but when I join them to create a 'master' list, I suddenly lose events. The main search returns approximately 9.3K events/hostnames and the subsearch returns approximately 11.8K hostnames. I expected the join to return a number of hostnames somewhere in between, but I only get a little over 4K.
What am I doing wrong?

index=asset_db source="/var/asset_database/fullpull.csv" "System Name"=* NOT "Purpose2"=Farm
| convert timeformat="%m/%d/%Y" mktime("Last Audit") as last_audit_time
| eval timer=now()-(90*24*60*60)
| where last_audit_time>timer
| rename "OS Name" as OS
| rename "System Name" AS hostname
| eval hostname=lower(hostname)
| join hostname
    [search index=assets source="/scratch/cadence_assets/AD-host-report.CSV" earliest=-90d@d latest=-0d@d Name=* "Operating System"=*
    | rename "Operating System" AS OS
    | rename Name AS hostname
    | eval hostname=lower(hostname)
    | fields hostname,OS]

Thanks!

1 Solution

yannK
Splunk Employee

The maximum number of results from a subsearch is 10000.
The maximum for a join is 50000.

see http://docs.splunk.com/Documentation/Splunk/6.1.4/Admin/Limitsconf

[subsearch]
* This stanza controls subsearch results.
* NOTE: This stanza DOES NOT control subsearch results when a subsearch is called by
  commands such as join, append, or appendcols. 
* Read more about subsearches in the online documentation: 
  http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsubsearches

maxout = <integer>
* Maximum number of results to return from a subsearch.
* This value cannot be greater than or equal to 10500.
* Defaults to 10000.

[join]
subsearch_maxout = <integer>
* Maximum result rows in output from subsearch to join against.
* Defaults to 50000

subsearch_maxtime = <integer>
* Maximum search time (in seconds) before auto-finalization of subsearch.
* Defaults to 60 

subsearch_timeout = <integer>
* Maximum time to wait for subsearch to fully finish (in seconds).
* Defaults to 120
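
With roughly 11.8K subsearch rows sitting under the 50000-row subsearch_maxout default quoted above, it may be worth checking how many hostnames the two sources actually share, since join defaults to type=inner and only returns hostnames present on both sides. A rough sketch of that check, reusing the index, source, and field names from the question. The filters are simplified (no Purpose2 or Last Audit conditions), the time range is assumed to come from the time picker, and "sources" is a field name made up for this example:

```
(index=asset_db source="/var/asset_database/fullpull.csv" "System Name"=*)
OR (index=assets source="/scratch/cadence_assets/AD-host-report.CSV" Name=*)
| eval hostname=lower(coalesce('System Name', Name))
| stats values(index) AS sources BY hostname
| eval sources=mvjoin(sources, "+")
| stats count BY sources
```

Each result row counts the hostnames seen only in asset_db, only in assets, or in both. If the combined bucket is in the neighborhood of 4K, the join is probably not hitting a limit and the two inventories simply overlap less than expected. If it is much larger, the 60-second subsearch_maxtime may be finalizing the subsearch early.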

lbogle
Contributor

Odd, then, that I maxed out at less than 5K. This wouldn't be the first time that the way I've crafted a query has run me into a maximum search result limit.
Any suggestions on how I should get those two searches joined? I'm trying to use Splunk to paint a picture of our asset inventory: if I can join the two asset logs by hostname and OS, I can understand what we have in our environment and later search against that main query as a living master repository list. For example, I could use it as a sub/main search against a virus scan log to see which machines have a virus scan utility installed.
Thanks for any assistance.
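
One way to build such a master list without join is to search both sources at once and group by hostname with stats, which involves no subsearch and keeps hostnames that appear in only one source. A minimal sketch, assuming the field names from the original search. "sources" is a made-up field name, the time range is assumed to be set in the time picker so that it covers both sources, and the 90-day audit filter is applied only to asset_db events:

```
(index=asset_db source="/var/asset_database/fullpull.csv" "System Name"=* NOT "Purpose2"=Farm)
OR (index=assets source="/scratch/cadence_assets/AD-host-report.CSV" Name=* "Operating System"=*)
| convert timeformat="%m/%d/%Y" mktime("Last Audit") AS last_audit_time
| where index="assets" OR last_audit_time > now() - (90*24*60*60)
| eval hostname=lower(coalesce('System Name', Name))
| eval OS=coalesce('OS Name', 'Operating System')
| stats values(OS) AS OS values(index) AS sources BY hostname
```

The sources column shows which inventory each hostname came from, so gaps between the two lists stay visible. If join is kept instead, adding type=left (join type=left hostname [...]) retains every row of the main search whether or not the subsearch has a match.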
