How can I get the Splunk SDK API to return results faster than 100 kB/second?
Some context: I am trying to create queries for a limited time range that return more than 50,000 rows. I have managed to do this using a blocking query and paginating the results for the reader, as described in the answer here: http://answers.splunk.com/answers/237043/how-to-submit-a-splunk-python-sdk-query-with-a-res.html (sorry, but my reputation of 20 does not allow posting links).
With the code posted in the link above, I can read the query results at a rate of around 100 kB/sec for queries around 10 MB in size, which is not fast enough.
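For reference, the pagination pattern from that answer looks roughly like this (a condensed sketch, not the exact code; the search string, page size, and connection details are placeholders):

import splunklib.client as client
import splunklib.results as results

# Connection details are placeholders -- substitute your own instance.
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# A blocking job returns only once the whole search has finished.
job = service.jobs.create("search index=* | head 60000",
                          exec_mode="blocking")

total = int(job["resultCount"])
offset = 0
page_size = 1000  # the "paginate counter"
while offset < total:
    # Each call re-reads the stored (gzipped) result set up to `offset`,
    # which is why reading gets slower as the offset grows.
    page = job.results(count=page_size, offset=offset)
    for result in results.ResultsReader(page):
        if isinstance(result, dict):
            print(result)
    offset += page_size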
To make reading the results faster, I tried the solution in the accepted answer to this question, but it did not help: http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html
What did help was increasing the paginate counter to 50,000, which implies that the underlying problem is what @ineeman suggested in his answer to that question (http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html, see the answer from Mar 10, 2014 at 05:01 PM). To cut a long story short: using larger and larger offsets makes reading the results slow, because the results are stored zipped on disk and have to be unzipped up to the requested offset on every paginated call. The correct solution to the problem seems to be using the export endpoint of the API and reading the results as they stream to you.
I'd love any hints on how to make my query reader faster. Can I make the paginated blocking query read results faster? If not, can I use the export endpoint to run a query with a limited time range and more than 50,000 results? (A link to examples of the export endpoint in use would be most helpful.)
service.jobs.export is your answer. export doesn't have the 50,000 row limit. Also, be sure to set preview to False.
import datetime
import splunklib.client as client
import splunklib.results as results

# Connection details are placeholders -- substitute your own instance.
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

kwargs_export = {
    "earliest_time": datetime.datetime(2015, 6, 29).isoformat(),
    "latest_time": datetime.datetime(2016, 4, 11).isoformat(),
    "search_mode": "normal",
    "preview": False
}
searchString = "search index=* | head 5"

rr = results.ResultsReader(service.jobs.export(searchString, **kwargs_export))
for result in rr:
    if isinstance(result, results.Message):
        # Diagnostic messages may be returned in the results
        print('%s: %s' % (result.type, result.message))
    elif isinstance(result, dict):
        # Normal events are returned as dicts
        print(result)
Downvoting because export does have the 50,000 event limit.
@Ayn, any workaround to that limit? I know that on Splunk Enterprise only an admin can set maxresultrows in limits.conf.
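For reference, the setting appears to live in the [restapi] stanza of limits.conf (stanza name and default per the Splunk docs; verify against your version before changing it):

# limits.conf -- requires admin access; restart/refresh Splunk afterwards
[restapi]
# Maximum rows the /results and /events job endpoints return per request
maxresultrows = 50000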