I am looking to export the results of a Splunk search that contains transforming commands. When I run the same search in the web GUI, the live results "hang" at 50,000 stats, but once the search completes it shows more than 300,000 (screenshots provided below).
"preview":true
What am I missing?
Using Python 3.9 and the requests library, my script contains the following:
headers = {'Authorization': 'Splunk %s' % sessionKey}
parameters = {'exec_mode': 'oneshot', 'output_mode': output_type, 'adhoc_search_level': 'fast', 'count': 0}
with post(url=baseurl + '/services/search/jobs/export', params=parameters, data={'search': search_query}, timeout=60, headers=headers, verify=False, stream=True) as response:
How long does your search take? You have a timeout of 60 (seconds?) - could this be increased?
The trouble is not the timeout. When the Splunk search is complete the results are successfully moved into an output file.
I am interested in learning how to return only the final results of ~300,000 rows from Splunk. The current URI path and parameters are leading to an output of more than 1,924,900 rows. I think this is some indication that Splunk is streaming "live" results and I only want the final results.
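One way to keep only the final rows is to filter on the "preview" flag on the client side. The sketch below assumes output_mode is json and that each streamed line is a JSON object with a "preview" key and a "result" key; the helper name final_results is hypothetical and not part of the original script. The export endpoint is also commonly called with preview=false to stop previews at the source, but that parameter is not in the script above, so check it against the REST API reference for your Splunk version.

```python
import json
from typing import Dict, Iterable, Iterator, Union


def final_results(lines: Iterable[Union[bytes, str]]) -> Iterator[Dict]:
    """Yield only the non-preview result rows from an export stream.

    Assumption: every non-blank line is a JSON object shaped like
    {"preview": true|false, "offset": N, "result": {...}}.
    """
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not text.strip():
            continue  # skip blank keep-alive lines
        record = json.loads(text)
        if record.get("preview", False):
            continue  # drop intermediate ("live") rows
        if "result" in record:
            yield record["result"]
```

With the request from the script above, this could be used as `for row in final_results(response.iter_lines()): ...`, so that only rows marked "preview": false reach the output file.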
I don't know how this works but I did notice that you have stream=True - could this be the issue? Have you tried with stream=False?
The "stream=True" that you are referring to is not sent to Splunk's REST API at all.
It is an argument of the Python requests library: it defers downloading the response body so that it can be read incrementally as a stream. Further into the script this allows the results to be written "live" as they are returned, instead of as one massive 2,924,899 line file. In essence, this is a band-aid for the issue I am actually asking about in my question.
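For readers unfamiliar with that pattern, here is a minimal sketch of writing a streamed response to disk line by line. The helper name write_lines and the output file name are hypothetical; the original script's own writing code is not shown in this thread.

```python
from typing import Iterable


def write_lines(lines: Iterable[bytes], path: str) -> int:
    """Write each non-empty line to `path` as it arrives; return the count."""
    written = 0
    with open(path, "wb") as out:
        for line in lines:
            if not line:
                continue  # iter_lines() can yield empty keep-alive lines
            out.write(line + b"\n")
            written += 1
    return written
```

Inside the `with post(..., stream=True) as response:` block, this could be called as `write_lines(response.iter_lines(), "results.json")`. Note that it writes whatever Splunk sends, previews included, which is why it does not solve the row-count problem by itself.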