Solved: Slow queries searching for records starting at an ...

jamesar · ‎10-12-2015

Hi Splunkers,

I’m having problems with slow queries when returning a fixed number of events starting from a specified earliest datetime, from an index/sourcetype containing tens to hundreds of millions of records. E.g.:

index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id

Using the SDK and a maximum limit of 50000 events, because events appear in descending date order and are cropped at 50000, we only get the LATEST 50000 values which do not start at the requested earliest date. So in this case the resulting date range of the above query is "from": 2015-10-13T12:30:14 "to": 2015-10-13T13:08:41. (where 2015-10-13T13:08:41 is the most recent record in Splunk).

If I use tail, I get the correct values (in ascending date order) starting at the correct earliest datetime, but then the query time is unacceptably long as it needs to process every event (and will take longer and longer with increasing data). E.g.

index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id| tail 50000

In the dashboard, even a query such as below takes a very long time, looking for 10 records starting at a particular date:

index=test sourcetype=touch_tuio earliest="09/29/2015:00:00:00" | fields _time,host,screen_id| tail 10

It seems Splunk needs to go through EVERY record just to give me the 10 records starting at a specific date (09/29/2015:00:00:00). This takes an extreme amount of time.

Do you know what I may be doing wrong here? Is there another way to query a fixed number of events beginning at a selected earliest date, without using tail, or without the increasing search time?

Thank you!

doksu · ‎10-19-2015

Use tstats against an accelerated data model to improve performance, passing to its WHERE statement a subsearch that tells tstats the earliest time: | tstats count FROM datamodel=yourdata.modelname WHERE earliest=[| tstats count WHERE index=yourindex sourcetype=yoursourcetype BY _time span=1m | reverse | streamstats sum(count) as sum | where sum>1000 | head 1 | rename _time as search | table search] BY _time, host, modelname.fieldA, modelname.fieldB, modelname.fieldC | fields - count | tail 1000

You'll have to create a data model and accelerated it, then in the search above, replace the references to your data model's name and the two instances of '1000' with however many results you'd like returned. The search will always return a little bit more than in the subsearch's WHERE statement, so the tail is added at the end to ensure the exact number of events you want are returned. N.B. The subsearch WHERE statement's constraints should be identical to that of the datamodel (ie. the indices and sourcetypes searched).

Regarding the C# SDK issue, please see: https://answers.splunk.com/answers/119100/search-using-c-sdk-is-returning-only-50000-events.html

View solution in original post

doksu · ‎10-19-2015

Use tstats against an accelerated data model to improve performance, passing to its WHERE statement a subsearch that tells tstats the earliest time: | tstats count FROM datamodel=yourdata.modelname WHERE earliest=[| tstats count WHERE index=yourindex sourcetype=yoursourcetype BY _time span=1m | reverse | streamstats sum(count) as sum | where sum>1000 | head 1 | rename _time as search | table search] BY _time, host, modelname.fieldA, modelname.fieldB, modelname.fieldC | fields - count | tail 1000

You'll have to create a data model and accelerated it, then in the search above, replace the references to your data model's name and the two instances of '1000' with however many results you'd like returned. The search will always return a little bit more than in the subsearch's WHERE statement, so the tail is added at the end to ensure the exact number of events you want are returned. N.B. The subsearch WHERE statement's constraints should be identical to that of the datamodel (ie. the indices and sourcetypes searched).

Regarding the C# SDK issue, please see: https://answers.splunk.com/answers/119100/search-using-c-sdk-is-returning-only-50000-events.html

Slow queries searching for records starting at an earliest datetime using C# SDK and dashboard

CX Day is Coming!

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Are you a member of the Splunk Community?

Slow queries searching for records starting at an earliest datetime using C# SDK and dashboard

CX Day is Coming!

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console