Splunk Search

How to change service.jobs.oneshot to return unlimited number of rows in its result set?

fere
Path Finder

I have a Python script to run nightly and extract data using Splunk REST API. Here is the code:

kwargs_oneshot = {'earliest_time': '2014-10-23T08:00:00.000',
                  'latest_time': '2014-10-23T10:00:00.000',
                  'output_mode': 'csv'}
searchquery_oneshot = 'search source=xyz event="watch" | table _time, event | sort - _time'
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
with open('myresults.csv', 'w') as f:
    f.write(oneshotsearch_results.read())

The result set seems to have a limit of 100 records. Is there any way to set it to unlimited? I don't see anything related to that at http://docs.splunk.com/Documentation/PythonSDK/1.2.2/client.html

If not, how else can I make sure I retrieve the entire result set?

Thanks

1 Solution

marco_sulla
Path Finder

SHORT ANSWER

You have to create a $SPLUNK_HOME/etc/system/local/limits.conf file and add this stanza:

[restapi]
maxresultrows = 4294967295

Furthermore, you have to add 0 to your sort command:

query = """
search source=xyz event="watch" | 
table _time event | 
sort 0 - _time
"""

and run the following in your Python code:

service.jobs.oneshot(query, count=0)
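Putting the three pieces together (the limits.conf stanza, sort 0, and count=0), a minimal sketch might look like this. The host, port, and credentials are placeholders, and the build_oneshot_kwargs helper is my own addition for illustration, not part of the SDK:

```python
def build_oneshot_kwargs(earliest, latest):
    """Build oneshot arguments: count=0 disables the per-request
    row cap (subject to maxresultrows in limits.conf)."""
    return {
        'earliest_time': earliest,
        'latest_time': latest,
        'output_mode': 'csv',
        'count': 0,
    }

# Usage (requires the splunk-sdk package and a reachable Splunk server;
# connection details below are placeholders):
#
#   import splunklib.client as client
#   service = client.connect(host='localhost', port=8089,
#                            username='admin', password='changeme')
#   # sort 0 removes the sort command's own 10000-result default
#   query = ('search source=xyz event="watch" '
#            '| table _time, event | sort 0 - _time')
#   kwargs = build_oneshot_kwargs('2014-10-23T08:00:00.000',
#                                 '2014-10-23T10:00:00.000')
#   with open('myresults.csv', 'w') as f:
#       f.write(service.jobs.oneshot(query, **kwargs).read())
```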

LONG ANSWER

If you dig into the minimal and cryptic documentation:
http://docs.splunk.com/Documentation/PythonSDK

you can read about job.oneshot() that:

The oneshot method makes a single roundtrip to the server (as opposed to two for create() followed by results())

So job.oneshot() is (almost) a job.create() followed by a job.results(), and it can therefore take the arguments of create():
http://dev.splunk.com/view/SP-CAAAEE5#searchjobparams

and the arguments of results():
http://docs.splunk.com/Documentation/Splunk/6.2.2/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D...

Since the Python SDK is a wrapper around the REST API, you also have to specify a higher limit for it in limits.conf:
http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Limitsconf

Note that I specified 2^32 - 1 in maxresultrows because, if you run this code on a 32-bit machine, it hangs:

job = splunk_connection.jobs.create(search, max_count=2**32)

This is probably caused by a C for loop.

From sort documentation:

sort <count>+ [desc]

<count>
Syntax: <int>
Description: Specify the number of results to sort. If no count is specified, the default limit of 10000 is used. If "0" is specified, all results will be returned.

http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/Sort



nixonap
Engager

Really appreciated the depth and detail of this answer. It got searching and returning results working in our local dev environment within minutes.

Any ideas for the case where the API consumer doesn't have the ability to change the Splunk instance's maxresultrows? The client we are building will be deployed separately to customers who run their own Splunk, and we won't have the authority to make that change, only to advise that it should be made.


sramakr
New Member

Did you find a solution? It would be great if you could share it.

Thanks


fere
Path Finder

As I mentioned, I changed my code to use a blocking search with pagination. The search stopping at 10000 was my oversight: I had forgotten to include 0 in the sort command. Adding 0 to the sort command and looping over the pages took care of getting all the results back from the search.
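For reference, the blocking-plus-pagination approach fere describes can be sketched as follows. This is a sketch, not fere's actual script: the connection details are placeholders, and the page_offsets helper is my own addition; the rest follows the standard splunklib create()/results() pattern:

```python
def page_offsets(total, page_size):
    """Yield the starting offset of each page needed to cover
    `total` results, `page_size` results at a time."""
    for offset in range(0, total, page_size):
        yield offset

# Usage (requires the splunk-sdk package and a reachable Splunk server):
#
#   import splunklib.client as client
#   import splunklib.results as results
#   service = client.connect(host='localhost', port=8089,
#                            username='admin', password='changeme')
#   # sort 0 keeps the search itself from truncating at 10000 results
#   job = service.jobs.create(
#       'search source=xyz event="watch" | table _time, event | sort 0 - _time',
#       exec_mode='blocking')
#   total = int(job['resultCount'])
#   for offset in page_offsets(total, 1000):
#       for row in results.ResultsReader(job.results(count=1000,
#                                                    offset=offset)):
#           ...  # write each result row to the output file
```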

fere
Path Finder

I changed the Python script to use a blocking search, following the pagination example. It goes through the loop and extracts 100 results at a time (my count size for testing), but it still stops when the offset reaches 10000! How can I make it retrieve hundreds of thousands of events?


fere
Path Finder

I know that adding 'count': 0 lets the result set return 10000 entries. However, I am looking to export about 400000 records (or at least 100000 entries on a nightly basis). What is the best way to do that?

dolivasoh
Contributor

Have you looked at the limits.conf spec? It seems to me you'll be hitting one, if not many, output limits here. Even if you adjust your limits.conf to allow more output, you'll still hit a ceiling, most certainly on subsearches.

fere
Path Finder

Sorry, here are the code lines in a readable format:

kwargs_oneshot = {'earliest_time': '2014-10-23T08:00:00.000',
                  'latest_time': '2014-10-23T10:00:00.000',
                  'output_mode': 'csv'}
searchquery_oneshot = 'search source=xyz event="watch" | table _time, event | sort - _time'
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
with open('myresults.csv', 'w') as f:
    f.write(oneshotsearch_results.read())
