Splunk Search

How do I change service.jobs.oneshot to return an unlimited number of rows in its result set?

Path Finder

I have a Python script to run nightly and extract data using Splunk REST API. Here is the code:

kwargs_oneshot = {'latest_time': '2014-10-23T10:00:00.000', 'earliest_time': '2014-10-23T08:00:00.000', 'output_mode': 'csv'}
searchquery_oneshot = 'search source=xyz event="watch" | table _time, event | sort - _time'
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
with open('myresults.csv', 'w') as f:
    f.write(oneshotsearch_results.read())

The result set seems to have a limit of 100 records. Is there any way to set it to unlimited? I don't see anything related to that at http://docs.splunk.com/Documentation/PythonSDK/1.2.2/client.html

If not, how else can I make sure I retrieve the entire result set?

Thanks

1 Solution

Path Finder

SHORT ANSWER

You have to create a $SPLUNK_HOME/etc/system/local/limits.conf file and add this stanza:

[restapi]
maxresultrows = 4294967295

You also have to add 0 to your sort search command:

query = """
search source=xyz event="watch" | 
table _time event | 
sort 0 - _time
"""

and run in your Python code:

service.jobs.oneshot(query, count=0)
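Putting the short answer together, here is a minimal sketch of the full call. The helper name build_oneshot_args is mine, not from the SDK; the commented usage assumes a service object already connected via splunklib.client.connect:

```python
def build_oneshot_args(earliest, latest):
    """Assemble the search string and oneshot keyword arguments.

    'sort 0' lifts sort's default 10000-row limit, and count=0 asks the
    REST endpoint for all rows (still capped by maxresultrows in the
    [restapi] stanza of limits.conf).
    """
    query = ('search source=xyz event="watch" '
             '| table _time event | sort 0 - _time')
    kwargs = {
        'earliest_time': earliest,
        'latest_time': latest,
        'output_mode': 'csv',
        'count': 0,  # 0 = return all result rows
    }
    return query, kwargs

# Usage against a live Splunk instance (service from splunklib.client.connect):
#   query, kwargs = build_oneshot_args('2014-10-23T08:00:00.000',
#                                      '2014-10-23T10:00:00.000')
#   with open('myresults.csv', 'w') as f:
#       f.write(service.jobs.oneshot(query, **kwargs).read())
```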

LONG ANSWER

If you dig into the minimal and cryptic documentation:
http://docs.splunk.com/Documentation/PythonSDK

you can read for job.oneshot() that

The oneshot method makes a single roundtrip to the server (as opposed to two for create() followed by results()).

So job.oneshot() is (almost) a job.create() followed by a job.results(), which means it can take the arguments of create():
http://dev.splunk.com/view/SP-CAAAEE5#searchjobparams

and the arguments of results():
http://docs.splunk.com/Documentation/Splunk/6.2.2/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D...

Since the Python SDK is a Python wrapper around the REST API, you also have to specify a higher limit for it in limits.conf:
http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Limitsconf

Note that I specified 2^32 - 1 in maxresultrows, because if you run this code on a 32-bit machine, it hangs:

job = splunk_connection.jobs.create(search, max_count=2**32)

This is probably caused by an integer overflow in a C-level for loop.

From sort documentation:

sort [<count>] (<sort-by-clause>)+ [desc]

<count>
Syntax: <int>
Description: Specify the number of results to sort. If no count is specified, the default limit of 10000 is used. If "0" is specified, all results are returned.

http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/Sort


Engager

Really appreciated the depth and detail of this answer. It got searching and returning working right in our local dev environment within minutes.

Any ideas for the case where the API consumer doesn't have the ability to change the Splunk instance's maxresultrows? The client we are building will be deployed separately to customers who run their own Splunk, and we won't have the authority to make that change, only to advise that it should be made.


New Member

Did you find a solution? It would be great if you could share it.

Thanks


Path Finder

As I mentioned, I changed my code to use a blocking search with pagination. The problem with stopping at 10000 was my oversight in forgetting to include 0 in the sort command. Adding 0 to the sort command and looping took care of getting all the results back from the search.
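For reference, the blocking-plus-pagination approach can be sketched like this. The paginate helper and fetch_page callable are my names, not the poster's actual code; fetch_page stands in for a call such as job.results(offset=offset, count=count) on a finished blocking job:

```python
def paginate(fetch_page, page_size=100):
    """Collect all results by fetching fixed-size pages until one comes
    back short, mirroring the SDK's offset/count result paging."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            break
        offset += page_size
    return results

# With splunklib, fetch_page might wrap a finished blocking job, e.g.:
#   job = service.jobs.create(query, exec_mode='blocking')
#   def fetch_page(offset, count):
#       return list(results.ResultsReader(
#           job.results(offset=offset, count=count)))
```

Remember that the query itself still needs sort 0 (or no sort at all), or the job's result set is truncated at 10000 before pagination even starts.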

Path Finder

I changed the Python script to do a blocking search, using the pagination example. It goes through the loop and extracts 100 events at a time (my count size for testing), but it still stops when the offset reaches 10000! How can I make it retrieve hundreds of thousands of events?


Path Finder

I know adding 'count': 0 lets the result set return 10000 entries. However, I am looking to export about 400,000 records (or at least 100,000 entries on a nightly basis). What is the best way to do that?

Contributor

Have you looked at the limits.conf spec? It seems to me you'll hit one if not many output limits here. Even if you adjust your limits.conf to allow more output, you'll still hit a ceiling, most certainly on subsearches.
