Splunk Search

How to submit a Splunk Python SDK query with a restricted time range and return more than 50000 rows?

nikos_d
Explorer

I am trying to use Python to submit a query that is limited to a restricted time window AND returns more than 50000 rows.

I saw an answer on exceeding the 50000-row limit here, but I cannot figure out how to add a custom time range to the query.

The only way I know how to submit a time-limited query is via a oneshot search in the Python SDK:

    import splunklib.client as client
    import splunklib.results as results

    service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

    kwargs_oneshot = {"earliest_time": earliest_time, 
                      "latest_time": latest_time,
                      "output_mode": "xml",
                      "count": 0}

    searchquery_oneshot = basequery

    oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)

    reader = results.ResultsReader(oneshotsearch_results)

    for ix, item in enumerate(reader):
        for val in item.itervalues():
            print(val)

However, querying this way limits my results to 50000 rows. Are there any workarounds?

1 Solution

nikos_d
Explorer

Adapting from this solution: http://answers.splunk.com/answers/124848/python-sdk-paginate-result-set.html#answer-227017 (thanks @paramagurukarthikeyan for the pointer and the answer), the following seems to work:

    import sys
    import io
    import splunklib.results as results
    import splunklib.client as client

    service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

    # Create a blocking search job with a custom time range
    job = service.jobs.create(search, **{"exec_mode": "blocking",
                                         "earliest_time": start_time,
                                         "latest_time": end_time,
                                         "output_mode": "xml",
                                         "maxEvents": 30000000})

    resultCount = int(job["resultCount"])
    offset = 0       # Start at result 0
    count = 50000    # Get sets of `count` results at a time
    thru_counter = 0

    while offset < resultCount:
        kwargs_paginate = {"count": count, "offset": offset}

        # Get this page of search results and display them
        rs = job.results(**kwargs_paginate)
        reader = results.ResultsReader(io.BufferedReader(rs))

        wrt = sys.stdout.write
        for ix, item in enumerate(reader):
            if not (thru_counter % 50000):  # print only one in every 50000 results as a sanity check
                line = ""
                for val in item.itervalues():
                    line += val + ","
                wrt(line[:-1] + "\n")
            thru_counter += 1

        # Increase the offset to get the next set of results
        offset += count
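
For reference, the snippet above assumes the connection details, the search string, and the custom time range are already defined. A minimal illustration of what those placeholders might look like (the host, credentials, and search below are made up; earliest_time/latest_time accept ISO-8601 timestamps or relative time modifiers such as "-7d@d"):

    # Hypothetical placeholder values for the snippet above
    HOST = "splunk.example.com"   # your Splunk server
    PORT = 8089                   # default management port
    USERNAME = "admin"
    PASSWORD = "changeme"

    # The base search plus the custom time range passed to jobs.create()
    search = 'search index=main sourcetype=access_combined | table _time, clientip, status'
    start_time = "2015-01-01T00:00:00.000+00:00"   # becomes earliest_time
    end_time = "2015-01-02T00:00:00.000+00:00"     # becomes latest_time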

One remaining issue is that the parsing is relatively slow (I am getting ~1300 rows/sec with rows of ~100 bytes, i.e. roughly 130 KB/s). The reason is hinted at in @ineeman's answer of March 10, 2014 to this question: http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html
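
A possible workaround for the slow XML parsing (just a sketch, written with Python 3 in mind and not benchmarked): request each page in CSV output mode and parse it with the standard csv module, bypassing results.ResultsReader entirely. The output_mode argument is passed through to the REST results endpoint; whether the stream can be wrapped as below depends on the SDK version, so treat the wrapping as an assumption:

    import csv
    import io

    offset = 0
    count = 50000
    while offset < resultCount:
        # Ask the results endpoint for CSV instead of XML; extra kwargs are
        # passed through to the underlying REST call.
        rs = job.results(output_mode="csv", count=count, offset=offset)

        # Wrap the raw byte stream for text-mode CSV parsing.
        csv_reader = csv.reader(io.TextIOWrapper(io.BufferedReader(rs), encoding="utf-8"))
        header = next(csv_reader, None)  # the first row holds the field names

        for row in csv_reader:
            # row is a plain list of strings; process it here.
            pass

        offset += count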

I am posting a separate question to see if I can improve the speed of fetching the query results.

nikos_d
Explorer

This is the link which did not show up above due to my low number of points: http://answers.splunk.com/answers/39243/python-sdk-results-limited-to-50-000.html
