I am using the search.py script from the SDK, with the following options set:
--maxcount=1000000 --count=0 --outputmode=csv
Yet the results are always limited to 100,000 lines. Why is this?
Doesn't --count=0 mean "send me all results"?
And doesn't --maxcount=1000000 mean "mark the job as finished at 1,000,000 events"? So why do I only get about 100,000 records, every time?
CORRECTION: although there are 100,000 lines, there are in fact only 50,000 records! The lines are wrapped and contain a CR, hence the 100,000.
Why am I hitting a limit of 50,000 records?
I think you might find this is a result of limits.conf. Here is a link, and below is an excerpt from the config that relates to your problem:
[searchresults]
* This stanza controls search results for a variety of Splunk search commands.

maxresultrows = <integer>
* Configures the maximum number of events generated by search commands which grow the size of your result set (such as multikv) or that create events. Other search commands are explicitly controlled in specific stanzas below.
* This limit should not exceed 50000. Setting this limit higher than 50000 causes instability.
* Defaults to 50000.
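If you do decide to raise the limit, a minimal sketch of the override follows. The stanza and setting names come from the excerpt above; the file path is the usual location for local overrides, but treat it as an assumption for your particular deployment:

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[searchresults]
# Raise the cap on result rows. Note the docs warn that values
# above 50000 risk instability.
maxresultrows = 55000
```

A restart of splunkd is generally needed for limits.conf changes to take effect.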
I think you might be right; that fits. However, some people report getting many more results back (found on Google), so does that mean they have edited limits.conf?
Is editing limits.conf my only option?
I would say so. The SDK is still limited by what Splunk will actually return to it. Your best bet is to test and see, perhaps with an obvious value such as 55000 so you can see the effect immediately.
Also, if this turns out to be the right answer, don't forget to click the tick to the left of it 🙂 It just helps others with the same problem in the future.
There are two ways to extract events from Splunk beyond what the .conf file specifies.
The first is to submit a search job and use the results reader to page through them. The second is to use the export endpoint.
$ python job.py create "search index=_internal *"
[ though not necessary, list the jobs ]
$ python job.py list
@0 : 1327495772.12
@1 : schedulernobodytestingc2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2Fy
@2 : schedulernobodytestingc2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2Fy
@3 : schedulernobodytestingc2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2Fy
@4 : schedulernobodytestingc2FtcGxlIHNjaGVkdWxlZCBzZWFyY2ggZm9yIGRhc2hib2Fy
$ python job.py results 1327495772.12 --count=10 --offset=100000
[[ results would be here, but I cannot figure out a way to trick the text widget to stop parsing the XML 😛 ]]
PLEASE NOTE: It looks like there is a bug in jobs.py, where cmdline() calls itself recursively at line 104; it should call parse() instead:

# jobs.py, line 104 -- before:
return cmdline(argv, rules)
# after:
return parse(argv, rules)
try:
    result = service.get("search/jobs/export", search=search, count=0)
    reader = results.ResultsReader(result.body)
    while True:
        kind = reader.read()
        if kind is None:
            break
        if kind == results.RESULT:
            event = reader.value
            pprint(event)
except Exception:
    raise  # handle errors from the export request as appropriate
However, the export endpoint returns events in reverse chronological order -- and this may not be what you want.
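If chronological order matters, one workaround (a sketch, not an SDK API: `stream_events` below is a stand-in for whatever yields events from the export endpoint) is to buffer the events and reverse them. This is only feasible when the full result set fits in memory:

```python
def chronological(events):
    """Collect events (assumed newest-first, as the export endpoint
    returns them) and yield them oldest-first instead."""
    buffered = list(events)  # buffers the whole stream in memory
    return reversed(buffered)

# Stand-in for events read from the export endpoint, newest first.
stream_events = [{"_time": 3}, {"_time": 2}, {"_time": 1}]
ordered = list(chronological(stream_events))
```

For result sets too large to buffer, submitting a normal search job and reading its results in order avoids the problem entirely.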
If there is anything else you need please do not hesitate to ask.
You can get all the events even when the result set is greater than the limits specified in the conf files. The concept is the same as in this Splunk answer, though the language there is Java: http://splunk-base.splunk.com/answers/43907/change-default-limit-of-100-results-using-java-sdk
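The concept from that answer, sketched in Python: page through the results endpoint with count and offset so that no single request exceeds the server-side cap. `fetch_page` below is a stub standing in for the real SDK call (e.g. a job's results request with count/offset parameters), so the names are assumptions and the data is fake, but the paging loop is the point:

```python
PAGE_SIZE = 50000  # stay at or below the maxresultrows cap

# Stub standing in for a real SDK results request with count/offset.
# It just slices a pre-built list so the paging logic runs offline.
_fake_results = [{"event": i} for i in range(120000)]

def fetch_page(offset, count):
    return _fake_results[offset:offset + count]

def all_results(fetch, page_size=PAGE_SIZE):
    """Yield every result by repeatedly requesting one page at a time,
    stopping when a request returns an empty page."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            break
        for row in page:
            yield row
        offset += len(page)

total = sum(1 for _ in all_results(fetch_page))
```

Because each request asks for at most PAGE_SIZE rows, the limits.conf cap is never hit, yet the loop eventually retrieves the entire result set.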