
How to submit a Splunk Python SDK query with a restricted time range and return more than 50000 rows?

nikos_d — Thu, 21 May 2015 21:48:01 GMT

I am trying to submit a query which is limited to a restricted time window AND returns more than 50000 rows in Python.

I saw an answer on exceeding the 50000 row limit here but I cannot figure out how to add a custom time range to the query.

The only way I know to submit a query limited to a time range is via the oneshot search of the Python SDK:

    import splunklib.client as client
    import splunklib.results as results

    # HOST, PORT, USERNAME, PASSWORD, basequery, earliest_time and latest_time
    # are defined elsewhere.
    service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

    kwargs_oneshot = {"earliest_time": earliest_time,
                      "latest_time": latest_time,
                      "output_mode": "xml",
                      "count": 0}  # 0 asks for all results (still capped by the server limit)

    searchquery_oneshot = basequery

    oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)

    reader = results.ResultsReader(oneshotsearch_results)

    for item in reader:
        if isinstance(item, dict):  # skip diagnostic results.Message objects
            for val in item.values():
                print(val)

However, querying this way limits my results to 50000 rows. Any workarounds?
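For reference, here is a minimal sketch of how the `earliest_time` and `latest_time` values used above could be built from `datetime` objects. The helper name `time_window_kwargs` is hypothetical, and it assumes the server accepts ISO-8601 timestamps with milliseconds and a UTC offset; relative modifiers such as `-1h` are another option.

```python
from datetime import datetime, timezone


def time_window_kwargs(start, end):
    """Build earliest_time/latest_time search arguments from aware datetimes.

    Hypothetical helper: it only formats the timestamps, it does not talk
    to Splunk. Assumes ISO-8601 strings are an accepted time format.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "earliest_time": start.isoformat(timespec="milliseconds"),
        "latest_time": end.isoformat(timespec="milliseconds"),
    }


# Example: a one-day window in UTC.
window = time_window_kwargs(
    datetime(2015, 5, 20, tzinfo=timezone.utc),
    datetime(2015, 5, 21, tzinfo=timezone.utc),
)
```

The resulting dictionary can be merged into the keyword arguments of either a oneshot search or a regular search job.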

Re: How to submit a Splunk Python SDK query with a restricted time range and return more than 50000 rows?

nikos_d — Thu, 21 May 2015 21:49:56 GMT

This is the link which did not show up above due to my low number of points: http://answers.splunk.com/answers/39243/python-sdk-results-limited-to-50-000.html

Re: How to submit a Splunk Python SDK query with a restricted time range and return more than 50000 rows?

nikos_d — Wed, 27 May 2015 00:37:05 GMT

Adapting from this solution: http://answers.splunk.com/answers/124848/python-sdk-paginate-result-set.html#answer-227017 (thanks @paramagurukarthikeyan for the pointer and the answer), the following seems to work:

    import io
    import sys
    import splunklib.client as client
    import splunklib.results as results

    service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

    # Blocking mode: create() returns only once the search has finished.
    job = service.jobs.create(search, **{"exec_mode": "blocking",
                                         "earliest_time": start_time,
                                         "latest_time": end_time,
                                         "output_mode": "xml",
                                         "maxEvents": 30000000})

    resultCount = int(job["resultCount"])
    offset = 0        # Start at result 0
    count = 50000     # Get sets of count results at a time
    thru_counter = 0  # Rows seen so far across all pages

    while offset < resultCount:
        kwargs_paginate = {"count": count, "offset": offset}

        # Get the next page of search results
        rs = job.results(**kwargs_paginate)
        reader = results.ResultsReader(io.BufferedReader(rs))

        wrt = sys.stdout.write
        for item in reader:
            if not isinstance(item, dict):
                continue  # skip diagnostic results.Message objects
            if not (thru_counter % 50000):  # print only one in 50000 results as sanity check
                wrt(",".join(str(val) for val in item.values()) + "\n")
            thru_counter += 1
        # Increase the offset to get the next set of results
        offset += count
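
The paging logic in the loop above can be separated from the Splunk calls, which makes it easier to check on its own. The following is only a sketch: `page_windows`, `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for something like `job.results(count=..., offset=...)` wrapped in a `ResultsReader`.

```python
def page_windows(total, page_size):
    """Yield (offset, count) pairs that together cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while offset < total:
        # The last page may be shorter than page_size.
        yield offset, min(page_size, total - offset)
        offset += page_size


def fetch_all(fetch_page, total, page_size=50000):
    """Call fetch_page(offset=..., count=...) once per page and yield each row.

    fetch_page is a hypothetical callable supplied by the caller.
    """
    for offset, count in page_windows(total, page_size):
        for row in fetch_page(offset=offset, count=count):
            yield row


# Stand-in data source so the sketch runs without a Splunk server.
fake_rows = list(range(120))


def fake_fetch(offset, count):
    return fake_rows[offset:offset + count]


windows = list(page_windows(120, 50))
rows = list(fetch_all(fake_fetch, total=120, page_size=50))
```

With a real job, `total` would be `int(job["resultCount"])` and the page size would stay at or below the per-request row limit.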

One issue remains: parsing is relatively slow. I am getting ~1300 rows/sec, where each row is about 100 bytes, i.e. ~130 kB/s. The reason is hinted at in @ineeman's answer of March 10, 2014 to this question: http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html
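
To put that rate in perspective, here is the arithmetic as a small sketch. The row rate and row size are the figures quoted above; the 30,000,000-row total is only the `maxEvents` ceiling from the job arguments, used here as a hypothetical worst case, not a measured result count.

```python
rows_per_sec = 1300   # observed parsing rate
bytes_per_row = 100   # approximate row size

bytes_per_sec = rows_per_sec * bytes_per_row   # bytes per second
kilobytes_per_sec = bytes_per_sec / 1000       # kB/s
megabits_per_sec = bytes_per_sec * 8 / 1e6     # Mbit/s


def hours_to_fetch(total_rows, rate=rows_per_sec):
    """Estimated wall-clock hours to parse total_rows at the given rate."""
    return total_rows / rate / 3600


# Hypothetical worst case: a job that hits the maxEvents ceiling.
worst_case_hours = hours_to_fetch(30000000)
```

At this rate the throughput is about 130 kB/s (roughly 1 Mbit/s), and a job that filled the whole ceiling would take several hours to parse.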

I am posting a separate question to see if I can improve the speed of fetching the query results.