Our app uses an accelerated data model for all searches, which works really well.
I recently stumbled upon a discrepancy that I cannot explain. The Data Models UI always shows the acceleration status as 100% completed, and the Updated field is always within a few seconds of the current time. That is good, of course. However, when I look at how often the populating searches are actually run, things seem to be different: in scheduler.log, nearly all searches of type datamodel_acceleration have a status of skipped.
When I visualize how often populating searches for a specific data model object are run successfully, I get only one successful search every 20 minutes:
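For reference, this is a sketch of the kind of search I used to count successful vs. skipped runs from scheduler.log (the search_type and status fields are as they appear in the _internal index; adjust to your environment):

```
index=_internal sourcetype=scheduler search_type=datamodel_acceleration
| timechart span=1m count(eval(status="success")) AS successful count(eval(status="skipped")) AS skipped
```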
My question is: how can the data model be 100% complete (and dashboards always show current data) when populating searches only run once every 20 minutes?
I have observed this on Splunk Enterprise 6.5.3 on Windows (simple single-instance Splunk environment).
Update 2017-06-15: I tested Splunk Enterprise 6.6.0 on Linux and Splunk Enterprise 6.6.1 on Windows. The issue occurs on those versions & platforms, too.
To fix the high number of skipped data model acceleration searches, Splunk added auto-skewing in Splunk 7.1. Auto-skewing needs to be enabled per data model. Once enabled, Splunk distributes the populating searches across the available time range instead of trying to run them all at the same time.
To enable auto-skewing, add the following to your datamodels.conf:
acceleration.allow_skew = 100%
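For example, to enable it for a single data model, put the setting in that model's stanza (the stanza name below is taken from the sample report later in this thread; substitute your own data model's name):

```
# datamodels.conf -- a sketch; the stanza name must match your data model
[uberAgent.Logon_All]
acceleration.allow_skew = 100%
```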
More information on the uberAgent blog.
Hi @helge ,
I am facing a similar issue. Apart from adding this configuration in the .conf file, is there a way I can make the allow_skew change from the UI?
Thanks
@bsanjeeva I'm afraid I don't know if/how this can be configured via Splunk's UI.
Hi @helge
Was this ever resolved in later versions of Splunk, do you know?
We are using uberAgent (with ITSI) on Splunk 6.6.3 and are seeing this skipped search "feature". Are there any updates to these discussions other than here?
Sample:
| Report Name | App | User | Cron Schedule | Schedule Interval (sec) | Average Runtime (sec) | Interval Load Factor | Total Executions | Skipped Executions | Skip Ratio | Deferred Executions | Average Execution Latency (sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| _ACCELERATE_DM_uberAgent_uberAgent.Citrix_Applications_ACCELERATE_ | uberAgent | nobody | | | 15 | | 12744 | 12732 | 99.91 % | 0 | 0 |
| _ACCELERATE_DM_uberAgent_uberAgent.Logon_All_ACCELERATE_ | uberAgent | nobody | | | 15 | | 12565 | 12553 | 99.90 % | 0 | 0 |
| _ACCELERATE_DM_uberAgent_uberAgent.Citrix_Databases_ACCELERATE_ | uberAgent | nobody | | | 5 | | 12489 | 12477 | 99.90 % | 0 | 0 |
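For what it's worth, a report like this can be approximated from the scheduler log itself. A sketch (field names such as status and run_time are assumed from the scheduler sourcetype in _internal; verify against your instance):

```
index=_internal sourcetype=scheduler savedsearch_name="_ACCELERATE_DM_*"
| stats count AS total_executions
        count(eval(status="skipped")) AS skipped_executions
        avg(run_time) AS avg_runtime_sec
    BY savedsearch_name app user
| eval skip_ratio = round(skipped_executions / total_executions * 100, 2)
```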
@Esky73 Please see the accepted answer which I just added to this question.
thanks @helge
Ok. We spent some time debugging this at #conf2017. We observed that each of the "base event" searches within the data model required a separate search process (acceleration job) to accelerate its piece of the model. Put another way, if there were 42 separate "root" searches within the model, attempting to accelerate the model would want to run 42 separate search jobs. However, due to the number of cores on my laptop (8), my resulting limit for the number of acceleration jobs was 3. Tracing this in the _audit log showed that three jobs started (and completed quickly) to attempt to accelerate the model. However, that meant that the model's "turn" at accelerating was done for this scheduled slot, and the remaining 39 searches were "skipped" by the scheduler. On the next iteration, a different set of (3) searches from the root objects was chosen, so it feels like the models would eventually get a chance to accelerate.
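The concurrency limit mentioned above falls out of the scheduler settings in limits.conf. A sketch of the relevant settings with their approximate defaults (check the spec file for your Splunk version before changing anything):

```
# limits.conf -- approximate defaults, shown for illustration only
[search]
base_max_searches = 6
max_searches_per_cpu = 1

[scheduler]
max_searches_perc = 50
auto_summary_perc = 50
```

On an 8-core machine this works out to roughly (6 + 8 x 1) x 50% x 50% ≈ 3 concurrent auto-summarization jobs, which matches the limit of 3 observed above.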
Finally, due to the "mixed mode" nature of | pivot and | tstats with their default arguments, any events that were in buckets not yet summarized would be searched ad hoc, resulting in a "complete" result set, while not being fully accelerated, per se.
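To illustrate the mixed-mode behavior: tstats takes a summariesonly argument that controls whether unsummarized buckets are searched ad hoc (the data model name below is taken from the sample report above and is just an example). With the default, results look complete even when acceleration is lagging:

```
| tstats summariesonly=false count from datamodel=uberAgent.Logon_All
```

With summariesonly=true, only the accelerated summaries are read, so the count drops if acceleration is behind:

```
| tstats summariesonly=true count from datamodel=uberAgent.Logon_All
```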
The suggestion is to break the model into separate (possibly related) root searches, so that any given acceleration run can accelerate all of the child searches therein.
Thanks for your time, Sanford. After our discussion I noticed that I still do not understand the following: why is the data model showing a status of "100% completed" when it clearly is not?
Is it possible that there is only new data every 20 minutes? I recall noticing that data model accelerations I had running which had no data showed as skipped, which led me to think something was wrong, but after investigating I found it was only DMAs that had no events.
Not really. Every endpoint sends at least a few dozen events twice a minute, and uberAgent is typically deployed to hundreds/thousands of endpoints.
Interesting. I also vaguely remember searches showing up as skipped because they were already running... I'd have to take a look at the app itself...
In case you want to take a look, uberAgent is available here: https://uberagent.com/download/
If you have any questions please email support@uberagent.com. Thanks!
I noticed the following fixed issues in the release notes for Splunk 6.6.3 - maybe they help with this?
Noticing the exact same behavior, with the exact same app (uberAgent). The app developer has indicated that this is normal behavior and not to worry about it. For me, this is running on a staging environment and is underpowered, but it almost seems like skipped data model acceleration searches are continually rescheduled every minute (or less) until they succeed: each of the 28 data model objects is eventually successful within the 5-minute schedule defined for the app, but for every successful search there can be anywhere from 0 to 30 skipped searches.
Does anyone know if this is normal data model acceleration behavior? And if so, what kind of impact would this have on other scheduled and ad-hoc searches?
Trying to figure out if it is acceptable to move this into production - or if there is a problem with the App or our Splunk configuration.
I assume that you either have scheduled real-time searches OR are using ITSI (which does so under the hood). There is a bug in all versions of Splunk that "correctly" but misleadingly says it is skipping the real-time search, because a real-time search will never stop. If it did crash for some reason, it would restart on the next cycle and NOT generate this misleading log; otherwise it complains on every cycle after that. This is a known problem, there is a jira on it, and it should be fixed soon.
ITSI is not installed on that machine, and there are no realtime searches either. Our app does have scheduled historic searches, plus the accelerated data model.
I would be more than happy to submit log files to support if that helps.