Splunk Dev

Python SDK results limited to 50,000?

lemon_wire
Engager

I am using the search.py script from the SDK. I have the following set:


--max_count=1000000 --count=0 --output_mode=csv

Yet the results are always limited to 100,000 lines. Why is this?

Doesn't --count=0 mean send me all results?

And --max_count=1000000 means the job is marked as finished at 1,000,000 events? So why do I only get 100,000 records? It is always about 100,000.

CORRECTION: although there are 100,000 lines, there are in fact only 50,000 records! The lines are wrapped and have a CR in them, hence the 100,000.

Why am I hitting a limit of 50,000 records?
Thanks


wcolgate_splunk
Splunk Employee

You can get all the events, even when the result set is greater than the limits specified in the conf files. The concept is the same as described in this Splunk Answers post, though the language there is Java: http://splunk-base.splunk.com/answers/43907/change-default-limit-of-100-results-using-java-sdk
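
For reference, here is a rough sketch of the same paging idea in Python. The connection details are placeholders, and it uses the iterator form of ResultsReader from current splunklib, which differs from the older read()/value style shown further down; the point is simply to pull the result set page by page with count/offset rather than in one call:

import splunklib.client as client
import splunklib.results as results

# Placeholder connection details; substitute your own.
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Run the search as a blocking job so resultCount is populated on return.
job = service.jobs.create("search index=_internal *", exec_mode="blocking")

# Page through the full result set 10,000 rows at a time.
offset, page = 0, 10000
total = int(job["resultCount"])
while offset < total:
    for row in results.ResultsReader(job.results(count=page, offset=offset)):
        print(row)
    offset += page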

wcolgate_splunk
Splunk Employee

There are two ways to extract more events from Splunk than the limit the .conf file specifies.

The first is to submit a search job, and use the results reader to extract them. The second is to use the export endpoint.

Using Jobs:

$ python job.py create "search index=_internal *"
1327495772.12

[ though not necessary, list the jobs ]

$ python job.py list
@0 : 1327495772.12
@1 : scheduler_nobodytesting_c2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2FyZHMgKGV4aXN0aW5nIGpvYiBjYXNlKSB0aW1lY2hhcnQ_at_1327495740_6d60a6288748cac8
@2 : scheduler_nobodytesting_c2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2FyZHMgKGV4aXN0aW5nIGpvYiBjYXNlKQ_at_1327495740_591022e8c9a54450
@3 : scheduler_nobodytesting_c2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2FyZHMgKGV4aXN0aW5nIGpvYiBjYXNlKSB0aW1lY2hhcnQ_at_1327495680_ce6c5552adcd6d74
@4 : scheduler_nobody_testing_c2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2FyZHMgKGV4aXN0aW5nIGpvYiBjYXNlKQ_at_1327495680_d5f9ef87327b453e

$ python job.py results 1327495772.12 --count=10 --offset=100000

[[ results would be here, but I cannot figure out a way to trick the text widget to stop parsing the XML 😛 ]]

PLEASE NOTE: It looks like there is a bug in jobs.py: at line 104, cmdline() calls cmdline() again, but it should call parse() instead.

replace:

 return cmdline(argv, rules)

with:

 return parse(argv, rules)

Using the export endpoint:

Programmatically:

# Assumes "service" is an authenticated splunklib.client.Service connection
# and "search" holds the search string (e.g. "search index=_internal *").
import splunklib.results as results
from pprint import pprint

try:
    result = service.get(
                   "search/jobs/export",
                   search=search,
                   count=0)

    reader = results.ResultsReader(result.body)
    while True:
        kind = reader.read()
        if kind is None:
            break
        if kind == results.RESULT:
            event = reader.value
            pprint(event)
except Exception as e:
    print("Export failed: %s" % e)

However, the export endpoint returns events in reverse chronological order -- and this may not be what you want.
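
If oldest-first output matters, one option (a sketch only, reusing the service handle and search variable from the snippet above) is to ask for the ordering in the search itself:

# Sketch: "sort 0 _time" sorts ascending by _time with no row limit,
# so the exported stream comes back oldest first.
search = 'search index=_internal | sort 0 _time'
result = service.get("search/jobs/export", search=search, count=0)
# ...then read it with results.ResultsReader(result.body) exactly as above.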

If there is anything else you need please do not hesitate to ask.

Wim

Drainy
Champion

I think you might find this is a result of limits.conf; here is a link, and below is an excerpt from the config spec that relates to your problem:

[searchresults]
* This stanza controls search results for a variety of Splunk search commands.

maxresultrows = <integer>
* Configures the maximum number of events generated by search commands which grow the size of
  your result set (such as multikv) or that create events. Other search commands are explicitly
  controlled in specific stanzas below.
* This limit should not exceed 50000. Setting this limit higher than 50000 causes instability.
* Defaults to 50000. 
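
If you do decide to raise it despite that warning, the override goes in a local copy of the file rather than the default one, and typically takes effect after a restart. For example (the path and the 55000 value are only an illustration):

# $SPLUNK_HOME/etc/system/local/limits.conf
[searchresults]
maxresultrows = 55000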

Drainy
Champion

I would say so. The SDK is still limited by what Splunk will physically return to it. The best bet is to test and see, perhaps with something obvious like changing it to 55000 so you can see the effect immediately.
Also, if this turns out to be the right answer, don't forget to click the tick to the left of it 🙂 It just helps others with the same problem in the future.


lemon_wire
Engager

I think you might be right there; that fits. However, some people are returning many more results (found on Google), so does that mean they have edited limits.conf?

Is editing limits.conf my only option?
Thanks
