Since upgrading to 7.x, multiple installations I have access to have shown very heavy indexer CPU usage. I have upgraded to 7.1.1 to make sure I have all available bug fixes.
I have chased this down to datamodel accelerations that appear to re-scan the full acceleration time period on every run, rather than just checking the last 5 minutes of new data each time. The datamodels all show 100% complete, yet the acceleration searches run to the full 3600-second limit every time, and the skip ratio for these searches is through the roof.
To reproduce, I installed uberAgent and its event generator in a test instance of Splunk 7.1.1. The base LISPY of the acceleration jobs appears to cover (pretty much) "All time" on every invocation, e.g.:
06-15-2018 11:58:09.119 INFO BatchSearch - Searching index:uberagent with LISPY:'[ AND sourcetype::uberagent:application:outlookpluginload [ OR _indextime::12* _indextime::13* _indextime::14* _indextime::150* _indextime::151* _indextime::1520* _indextime::1521* _indextime::1522* _indextime::1523* _indextime::1524* _indextime::1525* _indextime::1526* _indextime::1527* _indextime::1528* _indextime::152900* _indextime::152901* _indextime::1529020* _indextime::1529021* _indextime::1529022* _indextime::1529023* _indextime::1529024* _indextime::1529025* _indextime::1529026* _indextime::15290270* _indextime::15290271* _indextime::15290272* _indextime::15290273* _indextime::15290274* _indextime::15290275* _indextime::15290276* _indextime::152902770* _indextime::152902771* _indextime::152902772* _indextime::152902773* _indextime::152902774* _indextime::152902775* _indextime::152902776* _indextime::152902777* _indextime::152902778* _indextime::1529027790 _indextime::1529027791 _indextime::1529027792 _indextime::1529027793 _indextime::1529027794 _indextime::1529027795 _indextime::1529027796 _indextime::1529027797 _indextime::1529027798 ] ]'
To me this LISPY appears to match events with an _indextime beginning with "12" or later, which is "All time" for any practical purpose. The same pattern shows up on every acceleration search for every datamodel.
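To see why that OR list amounts to "All time", note that each LISPY term is a string-prefix match against _indextime, which is a Unix epoch in seconds. A small sketch (the helper name is mine, not anything from Splunk) decodes a prefix into the date range it matches:

```python
from datetime import datetime, timezone

def prefix_range(prefix: str) -> tuple[datetime, datetime]:
    """Return the (earliest, latest) UTC times matched by a LISPY
    _indextime prefix term such as _indextime::12* (10-digit epochs)."""
    lo = int(prefix.ljust(10, "0"))  # smallest 10-digit epoch with this prefix
    hi = int(prefix.ljust(10, "9"))  # largest 10-digit epoch with this prefix
    return (datetime.fromtimestamp(lo, tz=timezone.utc),
            datetime.fromtimestamp(hi, tz=timezone.utc))

# A few of the prefixes from the log above; together the OR list
# covers roughly 2008 through the current second.
for p in ["12", "13", "14", "150"]:
    lo, hi = prefix_range(p)
    print(f"_indextime::{p}*  ->  {lo:%Y-%m-%d} .. {hi:%Y-%m-%d}")
```

The first term alone, `_indextime::12*`, already matches everything indexed between early 2008 and early 2011, and the remaining terms fill in the rest of the range up to the moment the search was dispatched.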
I've managed to mitigate the problem by setting acceleration.backfill_time to -1d by default. This makes each acceleration search invocation scan only the last day of data rather than the entire acceleration window. Far from ideal, but it has stopped this from being a complete Splunk-breaking issue, and the accelerations continue to be built and stored.
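For reference, the mitigation is just a per-datamodel setting in datamodels.conf on the search head. A minimal sketch (the stanza name and earliest_time value here are placeholders, not from my actual config):

```ini
# datamodels.conf — stanza name and earliest_time are examples
[My_DataModel]
acceleration = true
acceleration.earliest_time = -1mon
# Workaround: cap how far back each acceleration run will scan
acceleration.backfill_time = -1d
```

This can also be applied from the datamodel's acceleration settings in the UI; either way it needs to be set per datamodel.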
Is there anything else I should be looking at to figure this one out, or ideas on how to fix it? What might be the change in 7.x that has caused this?
There's a bug that only applies when acceleration.backfill_time is unset (which, unfortunately, is the default). The workaround in that case is to set acceleration.backfill_time to the same value as acceleration.earliest_time. Datamodels that already have acceleration.backfill_time set are unaffected.
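Concretely, that workaround looks like the following in datamodels.conf (stanza name and the -3mon value are placeholders; use whatever earliest_time your datamodel already has):

```ini
# datamodels.conf — workaround for the unset backfill_time bug
[My_DataModel]
acceleration = true
acceleration.earliest_time = -3mon
# Set backfill_time explicitly, matching earliest_time
acceleration.backfill_time = -3mon
```

Setting the two values equal keeps the backfill window identical to the acceleration window, so coverage is unchanged while the setting is no longer "unset".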
Is there a target version, such as 7.1.2, for this bug fix yet?
Any chance this can be added to the release notes? A number of our engineers have wasted a lot of time trying to identify the cause of this issue; had it been listed on the release notes page, they would not have lost a day or two chasing it down.