topic Python SDK Paginate Result Set in Splunk Dev

Python SDK Paginate Result Set

michaudel — Thu, 27 Feb 2014 18:17:01 GMT

I have a fairly simple Python script that gets the results from a search and does some work on them. However, I am having trouble paginating through the results so that I can pull in more than 50K results.

Following the documentation works: it paginates the result set 10 at a time. But this takes a really long time, even just to iterate through 90K results.

For some reason, any value of count other than 10 breaks it: even though my result count is about 90K, the results reader always gives me 0 results.

Any thoughts would be great, thanks for your help.

import splunklib.results as results  # provides results.ResultsReader used below

searchPayroll = """ <some search>"""
# doSearch returns a job from a service connection that runs the search
job = doSearch(searchPayroll)

# Page through the results in blocks of `count` at a time
resultCount = job["resultCount"]    # Number of results this job returned
offset = 0                          # Start at result 0
count = getMaxResults()             # 1 less than the max result count
trackerpayroll = 0                  # Track result count
dictResultSetPayRoll = dict()


while (offset < int(resultCount)):
    kwargs_paginate = {"count": count,
                       "offset": offset}

    # Get the search results
    blocksearch_results = job.results(**kwargs_paginate)
    readerResults = results.ResultsReader(blocksearch_results)

    for result in readerResults:
        # <do stuff here>
        # Problem: result count (from job["resultCount"]) is 90K,
        # but readerResults yields 0 results
        pass

    trackerpayroll += 1
    # Increase the offset to get the next set of results
    offset += count
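
The paging loop above can be checked in isolation from Splunk. The following is only a sketch under assumptions: `iter_pages`, `fake_fetch`, `FAKE_RESULTS` and `SERVER_CAP` are hypothetical names that are not part of the SDK, and the fake server simply stands in for a real job. It advances the offset by the number of rows actually returned, so a server-side cap smaller than the requested count does not cause rows to be skipped.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               total: int,
               page_size: int) -> Iterator[dict]:
    """Yield every row by requesting blocks of up to page_size rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while offset < total:
        rows = fetch_page(offset, page_size)
        if not rows:
            # An empty page before reaching `total` means fewer rows came
            # back than were reported; stop instead of looping forever.
            break
        for row in rows:
            yield row
        # Advance by what was actually received, not by what was requested
        offset += len(rows)


# Hypothetical stand-in for the server: 25 results, at most 10 per request
FAKE_RESULTS = [{"n": i} for i in range(25)]
SERVER_CAP = 10


def fake_fetch(offset: int, count: int) -> List[dict]:
    return FAKE_RESULTS[offset:offset + min(count, SERVER_CAP)]


rows = list(iter_pages(fake_fetch, total=len(FAKE_RESULTS), page_size=1000))
print(len(rows))
```

With the SDK, `fetch_page` could plausibly wrap `job.results(count=count, offset=offset)` and a `results.ResultsReader`, keeping only the dict items, but that wiring is an assumption and is not tested here.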

Re: Python SDK Paginate Result Set

paramagurukarth — Tue, 07 Apr 2015 13:25:27 GMT

Please provide the code for your doSearch method

Re: Python SDK Paginate Result Set

paramagurukarth — Tue, 07 Apr 2015 13:32:05 GMT

In your implementation, if the dosearchjob method internally uses splunk.search.dispatch, add maxEvents=30000000 to its **kwargs, i.e.:

splunk.search.dispatch(searchquery, sessionKey=sessionkey, hostPath=baseurl, earliestTime=earliestTime, latestTime=latestTime, maxEvents=30000000)

Then use the implementation below:

searchjob = dosearchjob(query)
resultCount = searchjob.resultCount
offsetValue = 0
searchresults = ""
# Fetch the results in blocks of 49999 and append each block as text
while offsetValue < resultCount:
    searchresults = searchresults + str(searchjob.getFeed(mode='results', outputMode='csv', count=49999, offset=offsetValue))
    offsetValue = offsetValue + 49999

Use whatever outputMode you want 🙂
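
One thing to watch when concatenating CSV pages as plain strings: if each page comes back with its own header row (an assumption here, worth checking against the actual output), the merged text will contain the header once per page. A minimal sketch of merging pages while keeping only the first header follows; `merge_csv_pages` and the sample pages are hypothetical and not part of any Splunk library.

```python
import csv
import io


def merge_csv_pages(pages):
    """Join CSV pages into one CSV string, keeping only the first header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for page in pages:
        rows = list(csv.reader(io.StringIO(page)))
        if not rows:
            continue  # empty page, nothing to add
        if header is None:
            # The first non-empty page supplies the header
            header = rows[0]
            writer.writerow(header)
        # Drop the first row of a page only when it repeats the header
        body = rows[1:] if rows[0] == header else rows
        writer.writerows(body)
    return out.getvalue()


# Made-up sample pages, each starting with the same header row
page1 = "host,count\nweb01,3\nweb02,5\n"
page2 = "host,count\ndb01,7\n"
merged = merge_csv_pages([page1, page2])
print(merged)
```

Collecting the pages in a list and merging once at the end also avoids repeatedly rebuilding one large string inside the loop.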