I’m having problems with slow queries when returning a fixed number of events starting from a specified earliest datetime, from an index/sourcetype containing tens to hundreds of millions of records. E.g.:
Using the SDK and a maximum limit of 50000 events, because events appear in descending date order and are cropped at 50000, we only get the LATEST 50000 values which do not start at the requested earliest date. So in this case the resulting date range of the above query is "from": 2015-10-13T12:30:14 "to": 2015-10-13T13:08:41. (where 2015-10-13T13:08:41 is the most recent record in Splunk).
If I use tail, I get the correct values (in ascending date order) starting at the correct earliest datetime, but then the query time is unacceptably long as it needs to process every event (and will take longer and longer with increasing data). E.g.
Use tstats against an accelerated data model to improve performance, passing to its WHERE statement a subsearch that tells tstats the earliest time: | tstats count FROM datamodel=yourdata.modelname WHERE earliest=[| tstats count WHERE index=yourindex sourcetype=yoursourcetype BY _time span=1m | reverse | streamstats sum(count) as sum | where sum>1000 | head 1 | rename _time as search | table search] BY _time, host, modelname.fieldA, modelname.fieldB, modelname.fieldC | fields - count | tail 1000
You'll have to create a data model and accelerated it, then in the search above, replace the references to your data model's name and the two instances of '1000' with however many results you'd like returned. The search will always return a little bit more than in the subsearch's WHERE statement, so the tail is added at the end to ensure the exact number of events you want are returned. N.B. The subsearch WHERE statement's constraints should be identical to that of the datamodel (ie. the indices and sourcetypes searched).