Splunk Search

How do I change service.jobs.oneshot to return an unlimited number of rows in its result set?

Path Finder

I have a Python script to run nightly and extract data using Splunk REST API. Here is the code:

kwargs_oneshot = {'latest_time': '2014-10-23T10:00:00.000', 'earliest_time': '2014-10-23T08:00:00.000', 'output_mode': 'csv'}
searchquery_oneshot = 'search source=xyz event="watch" | table _time, event | sort - _time'
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
with open('myresults.csv', 'w') as f:
    f.write(oneshotsearch_results.read())

The result set seems to have a limit of 100 records. Is there any way to set it to unlimited? I don't see anything related to that at http://docs.splunk.com/Documentation/PythonSDK/1.2.2/client.html

If not, how else can I make sure I retrieve the entire result set?

Thanks

1 Solution

Path Finder

SHORT ANSWER

You have to create a $SPLUNK_HOME/etc/system/local/limits.conf file and add this stanza:

[restapi]
maxresultrows = 4294967295

You also have to add 0 to your sort search command:

query = """
search source=xyz event="watch" | 
table _time event | 
sort 0 - _time
"""

and run in your Python code:

service.jobs.oneshot(query, count=0)
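Putting the short answer together, here is a minimal sketch of the full call. The helper name build_oneshot_args is mine, not from the SDK; the commented usage assumes a service object already connected via splunklib.client.connect:

```python
def build_oneshot_args(earliest, latest):
    """Assemble the search string and oneshot keyword arguments.

    'sort 0' lifts sort's default 10000-row limit, and count=0 asks the
    REST endpoint for all rows (still capped by maxresultrows in the
    [restapi] stanza of limits.conf).
    """
    query = ('search source=xyz event="watch" '
             '| table _time event | sort 0 - _time')
    kwargs = {
        'earliest_time': earliest,
        'latest_time': latest,
        'output_mode': 'csv',
        'count': 0,  # 0 = return all result rows
    }
    return query, kwargs

# Usage against a live Splunk instance (service from splunklib.client.connect):
#   query, kwargs = build_oneshot_args('2014-10-23T08:00:00.000',
#                                      '2014-10-23T10:00:00.000')
#   with open('myresults.csv', 'w') as f:
#       f.write(service.jobs.oneshot(query, **kwargs).read())
```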

LONG ANSWER

If you dig into the minimal and cryptic documentation:
http://docs.splunk.com/Documentation/PythonSDK

you can read for job.oneshot() that

The oneshot method makes a single roundtrip to the server (as opposed to two for create() followed by results()).

So job.oneshot() is (almost) a job.create() followed by a job.results(), which means it can take the arguments of create():
http://dev.splunk.com/view/SP-CAAAEE5#searchjobparams

and the arguments of results():
http://docs.splunk.com/Documentation/Splunk/6.2.2/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D...

Since the Python SDK is a Python wrapper around the REST API, you also have to specify a higher limit for it in limits.conf:
http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Limitsconf

Note that I specified 2^32 - 1 in maxresultrows, because if you run this code on a 32-bit machine, it hangs:

job = splunk_connection.jobs.create(search, max_count=2**32)

This is probably caused by an integer overflow in a C-level for loop.

From sort documentation:

sort [<count>] (<sort-by-clause>)+ [desc]

<count>
Syntax: <int>
Description: Specify the number of results to sort. If no count is specified, the default limit of 10000 is used. If "0" is specified, all results are returned.

http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/Sort


Engager

Really appreciated the depth and detail of this answer. It got searching and returning working right in our local dev environment within minutes.

Any ideas for the case where the API consumer doesn't have the ability to change the Splunk instance's maxresultrows? The client we are building will be deployed separately to customers who run their own Splunk, and we won't have the authority to make that change, only to advise that it should be made.


New Member

Did you find a solution? It would be great if you could share it.

Thanks


Path Finder

As I mentioned, I changed my code to use a blocking search with pagination. The problem with stopping at 10000 was my oversight in forgetting to include 0 in the sort command. Adding 0 to the sort command and looping took care of getting all the results back from the search.
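For reference, the blocking-plus-pagination approach can be sketched like this. The paginate helper and fetch_page callable are my names, not the poster's actual code; fetch_page stands in for a call such as job.results(offset=offset, count=count) on a finished blocking job:

```python
def paginate(fetch_page, page_size=100):
    """Collect all results by fetching fixed-size pages until one comes
    back short, mirroring the SDK's offset/count result paging."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            break
        offset += page_size
    return results

# With splunklib, fetch_page might wrap a finished blocking job, e.g.:
#   job = service.jobs.create(query, exec_mode='blocking')
#   def fetch_page(offset, count):
#       return list(results.ResultsReader(
#           job.results(offset=offset, count=count)))
```

Remember that the query itself still needs sort 0 (or no sort at all), or the job's result set is truncated at 10000 before pagination even starts.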

Path Finder

I changed the Python script to do a blocking search, using the pagination example. It goes through the loop and extracts 100 events at a time (my count size for testing), but it still stops when the offset reaches 10000! How can I make it retrieve hundreds of thousands of events?


Path Finder

I know adding 'count': 0 lets the result set return 10000 entries. However, I am looking to export about 400,000 records (or at least 100,000 entries on a nightly basis). What is the best way to do that?

Contributor

Have you looked at the limits.conf spec? It seems to me you'll hit one if not many output limits here. Even if you adjust your limits.conf to allow more output, you'll still hit a ceiling, most certainly on subsearches.
