I have a query that uses three joins to combine data from a lookup, an index, and a summary index. I'm running the search over All Time for all 158 hosts. If I run it for 10 hosts at a time I get the correct results, but for anything more than 10 hosts the values come back as 0.
Is there a limit on Splunk results, or a setting or .conf file I need to change? Below is the search I am running:
| inputlookup event_time.csv
| table host _time
| rename host as orig_host, _time as Latest_Event_Time
| eval Latest_Event_Time=strftime(Latest_Event_Time,"%Y-%m-%d %H:%M:%S")
| join orig_host max=0
[
search
index=summary
| table _time orig_host Boot_Time PCIe_Bus_Id MODEL PN BOARDPN SN VBIOS INFOROM SMs L2SLICES
| eval GPU = "SN: ".SN." PN: ".PN." BOARDPN: ".BOARDPN
| sort 0 +_time
| streamstats first(Boot_Time) as Start_Time last(Boot_Time) as Boot_Time by SN GPU orig_host
| table SN GPU orig_host Start_Time Boot_Time
| stats last(Boot_Time) as Boot_Time by SN GPU orig_host Start_Time
| replace "" WITH "remove" IN Start_Time
| replace "" WITH "remove" IN Boot_Time
| search Start_Time!=remove
| search Boot_Time!=remove
]
| join orig_host Boot_Time type=left
[ search
index=summary summary_type=preos_fru_boot_time earliest=0
| sort 0 orig_host +_time
| table orig_host _time Boot_Time Last_Boot_Time
| rename Boot_Time as Next_Boot_Time Last_Boot_Time as Boot_Time
]
| sort SN Start_Time
| reverse
| autoregress Start_Time as Next_Host_Start_Time
| autoregress SN as Next_SN
| reverse
| eval Next_Host_Start_Time = if (SN != Next_SN, "", Next_Host_Start_Time)
| eval Next_Host_Start_Time = strptime(Next_Host_Start_Time, "%Y-%m-%d %H:%M:%S")
| eval Next_Boot_Time = strptime(Next_Boot_Time, "%Y-%m-%d %H:%M:%S")
| eval Latest_Event_Time = strptime(Latest_Event_Time, "%Y-%m-%d %H:%M:%S")
| eval Estimated_End_Time = min(Latest_Event_Time, Next_Boot_Time, Next_Host_Start_Time)
| eval Estimated_End_Time = strftime(Estimated_End_Time, "%Y-%m-%d %H:%M:%S")
| eval Next_Host_Start_Time = strftime(Next_Host_Start_Time, "%Y-%m-%d %H:%M:%S")
| eval Next_Boot_Time = strftime(Next_Boot_Time, "%Y-%m-%d %H:%M:%S")
| eval Latest_Event_Time = strftime(Latest_Event_Time, "%Y-%m-%d %H:%M:%S")
| eval search_end_time = now()
| eval Estimated_End_Time = strptime(Estimated_End_Time, "%Y-%m-%d %H:%M:%S")
| eval Estimated_End_Time = min(search_end_time, Estimated_End_Time)
| eval etime = strptime(Start_Time, "%Y-%m-%d %H:%M:%S")
| eval ltime = Estimated_End_Time
| eval Estimated_Installed_Time = ltime - etime
| eval Estimated_Installed_Time = round(Estimated_Installed_Time,0)
| eval Estimated_End_Time = strftime(Estimated_End_Time, "%Y-%m-%d %H:%M:%S")
| eval job_days = 0
| table SN GPU orig_host Start_Time Estimated_End_Time Estimated_Installed_Time job_days
| sort SN Start_Time
| join orig_host max=0 type=left
[ search
index=jobs
| convert dur2sec(start_to_end_time) as diff_in_seconds
| eval job_hours=round((diff_in_seconds)/60/60,1)
| stats sum(job_hours) as total_job_hours by orig_host time_start, time_end, start_to_end_time, partition, submit_line, account, user, job_id
| convert dur2sec(start_to_end_time) as diff_in_seconds
| eval job_days=round((diff_in_seconds)/60/60/24,6)
| fields orig_host time_start time_end start_to_end_time job_days
]
| eval job_start_time = strptime(time_start, "%Y-%m-%d %H:%M:%S")
| eval job_end_time = strptime(time_end, "%Y-%m-%d %H:%M:%S")
| eval Start_Time = strptime(Start_Time, "%Y-%m-%d %H:%M:%S")
| eval Estimated_End_Time = strptime(Estimated_End_Time, "%Y-%m-%d %H:%M:%S")
| eval diff = job_start_time - Start_Time
| eval diff2 = Estimated_End_Time - job_end_time
| search (diff > 0 AND diff2 > 0) OR job_days = 0
| eval Start_Time = strftime(Start_Time, "%Y-%m-%d %H:%M:%S")
| eval Estimated_End_Time = strftime(Estimated_End_Time, "%Y-%m-%d %H:%M:%S")
| fields SN GPU orig_host Start_Time Estimated_End_Time Estimated_Installed_Time time_start time_end start_to_end_time
| convert dur2sec(start_to_end_time) as start_to_end_time
| stats sum(start_to_end_time) as Job_Runtime by SN GPU orig_host Start_Time Estimated_End_Time Estimated_Installed_Time
| eval Job_Runtime = if (Estimated_Installed_Time<Job_Runtime, Estimated_Installed_Time, Job_Runtime)
| eval Estimated_Installed_Time=tostring(Estimated_Installed_Time, "duration")
| eval Job_Runtime=tostring(Job_Runtime, "duration")
| fillnull value="00:00:00" Job_Runtime
| sort SN Start_Time
Hi @power12,
there's a limit to the number of characters in a search launched from the browser; I don't remember the exact value, but it's probably around 10,000 characters.
But probably the most important limit is the time you have to wait for the results of a search!
In other words, as @isoutamo said, Splunk isn't a DB, so the join command should be used only when you have no other solution and need to correlate a few events; otherwise it's a very slow and resource-intensive command.
In addition, there's a 50,000-result limit on subsearches, so there's also the possibility of getting incomplete results.
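Those limits are controlled in limits.conf; a sketch of the relevant stanzas with their usual default values (verify them against the limits.conf spec for your Splunk version):

```
# limits.conf -- usual defaults, verify for your version
[join]
subsearch_maxout = 50000     # max rows a join subsearch may return
subsearch_maxtime = 60       # max seconds a join subsearch may run

[subsearch]
maxout = 10000               # max results for a generic subsearch
maxtime = 60
```

Raising these is possible but usually the wrong fix; restructuring the search is better.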
In conclusion, you should re-design your search using the stats command to correlate the different searches:
e.g. the first inputlookup could be transformed into a lookup command (which works like a left join),
then the other searches could be merged into the same main search, or used to populate a Data Model, which is similar to a database table.
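For example, a minimal sketch of the lookup form (assuming event_time.csv has the host and _time columns used in the original inputlookup):

```
index=summary
| lookup event_time.csv host AS orig_host OUTPUT _time AS Latest_Event_Time
| eval Latest_Event_Time=strftime(Latest_Event_Time, "%Y-%m-%d %H:%M:%S")
```

Hosts with no match in the CSV simply get a null Latest_Event_Time, just like a left join.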
Ciao.
Giuseppe
Hi
I'm not sure if there is any real limit on how many joins you can use, but since SPL is not the same as SQL, you should avoid join whenever possible and replace it with e.g. stats. There are a couple of presentations on how to do this and why.
Your issue is probably that an SPL subsearch can only run for 60 seconds and return 10k rows. You can check whether this is your problem in the Job Inspector. See: Subsearch performance considerations
Another hint: run streaming commands first and transforming commands as late as possible. That way the search runs in parallel on the indexers for as long as possible and returns as small a subset of events to the search head as possible. You can see which commands are transforming and which are streaming in the documentation (e.g. table is transforming and fields is streaming).
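As a sketch of the stats pattern (field names taken from your search; adjust to your data), the jobs join could be replaced by searching both indexes at once and aggregating by the common key:

```
(index=summary summary_type=preos_fru_boot_time) OR index=jobs
| convert dur2sec(start_to_end_time) as diff_in_seconds
| stats last(Boot_Time) as Boot_Time sum(diff_in_seconds) as total_job_seconds by orig_host
```

Because stats is distributable, this runs on the indexers and is not subject to the subsearch row and time limits that join hits.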
r. Ismo