Reporting

Accelerated data model 100% complete even though most populating searches are skipped

Builder

Our app uses an accelerated data model for all searches, which works really well.

I recently stumbled upon a discrepancy which I cannot explain. The Data Models UI always shows the acceleration status as 100% completed, and the field Updated is always within a few seconds of the current time. That is good, of course. However, when I look at how often the populating searches are actually run, things seem to be different: in scheduler.log, nearly all searches of type datamodel_acceleration have a status of skipped.
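For anyone wanting to reproduce this, the skipped executions can be pulled from the scheduler log with a search along these lines (a sketch; exact field names may differ between Splunk versions):

```spl
index=_internal sourcetype=scheduler search_type="datamodel_acceleration"
| stats count by savedsearch_name, status
```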

When I visualize how often populating searches for a specific data model object run successfully, I get only one successful search every 20 minutes.

My question is: how can the data model be 100% complete (and dashboards always show current data) when populating searches only run once every 20 minutes?

I have observed this on Splunk Enterprise 6.5.3 on Windows (simple single-instance Splunk environment).

Update 2017-06-15: I tested Splunk Enterprise 6.6.0 on Linux and Splunk Enterprise 6.6.1 on Windows. The issue occurs on those versions and platforms, too.

1 Solution

Builder

To address the high number of skipped data model acceleration searches, Splunk added auto-skewing in Splunk 7.1. Auto-skewing must be enabled per data model. Once enabled, Splunk distributes the populating searches across the available time range instead of trying to run them all at the same time.

To enable auto-skewing, add the following to your datamodels.conf:

acceleration.allow_skew = 100%
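For context, the setting lives in the stanza of the data model you want to skew; a minimal sketch (the stanza name My_Data_Model is illustrative):

```conf
[My_Data_Model]
acceleration = true
acceleration.allow_skew = 100%
```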

More information on the uberAgent blog.


Builder

Hi @helge,

Was this ever resolved in later versions of Splunk, do you know?

We are using uberAgent (with ITSI) on Splunk 6.6.3 and are seeing this skipped-search "feature". Are there any updates to these discussions other than here?

Sample:

| Report Name | App | User | Cron Schedule | Schedule Interval (sec) | Average Runtime (sec) | Interval Load Factor | Total Executions | Skipped Executions | Skip Ratio | Deferred Executions | Average Execution Latency (sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| _ACCELERATE_DM_uberAgent_uberAgent.Citrix_Applications_ACCELERATE_ | uberAgent | nobody | | | 15 | | 12744 | 12732 | 99.91 % | 0 | 0 |
| _ACCELERATE_DM_uberAgent_uberAgent.Logon_All_ACCELERATE_ | uberAgent | nobody | | | 15 | | 12565 | 12553 | 99.90 % | 0 | 0 |
| _ACCELERATE_DM_uberAgent_uberAgent.Citrix_Databases_ACCELERATE_ | uberAgent | nobody | | | 5 | | 12489 | 12477 | 99.90 % | 0 | 0 |

Builder

@Esky73 Please see the accepted answer which I just added to this question.

Builder

Thanks, @helge!


Splunk Employee

Ok. We spent some time debugging this at #conf2017. We observed that all of the "base event" searches within the data model required separate single search processes (acceleration jobs) to accelerate that piece of the model. Put another way, if there were 42 separate "root" searches within the model, attempting to accelerate the model would want to run 42 separate search jobs. However, due to the number of cores on my laptop (8), my resulting limit for the number of acceleration jobs was 3. Tracing this in the _audit log showed that three jobs started (and completed quickly) to attempt to accelerate the model. However, that meant that the model's "turn" at accelerating was done for this scheduled slot, and the remaining 39 searches were "skipped" by the scheduler. On the next iteration, a different set of (3) searches from the root objects was chosen, so it feels like the model would eventually get a chance to accelerate.
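To trace this yourself as described above, a search against the audit index along these lines shows which acceleration jobs actually ran (a sketch; the field values and the savedsearch_name pattern may vary by version):

```spl
index=_audit action=search info=completed savedsearch_name="_ACCELERATE_*"
| stats count by savedsearch_name
```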

Finally, due to the "mixed mode" nature of | pivot and | tstats with their default arguments, any events in buckets not yet summarized are searched ad hoc, resulting in a "complete" result set while not being fully accelerated, per se.
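This mixed-mode behavior can be observed directly: by default | tstats falls back to searching unsummarized buckets ad hoc, while summariesonly=true returns only what has actually been accelerated. If the two counts differ, the summaries are incomplete (the data model name below is a placeholder):

```spl
| tstats count from datamodel=Your_Data_Model

| tstats summariesonly=true count from datamodel=Your_Data_Model
```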

The suggestion is to break the model into separate (possibly related) root searches, so that any given acceleration run can accelerate all of the child searches therein.

Builder

Thanks for your time, Sanford. After our discussion I noticed that I still do not understand the following: why is the data model showing a status of "100% completed" when it clearly is not?


Splunk Employee

Is it possible that there is only new data every 20 minutes? I recall noticing that data model accelerations I had running which had no data showed as skipped, which made me think something was wrong, but after investigating I found it was only DMAs that had no events.


Builder

Not really. Every endpoint sends at least a few dozen events twice a minute, and uberAgent is typically deployed to hundreds/thousands of endpoints.


Splunk Employee

Interesting. I also vaguely remember searches showing up as skipped because they were already running. I'd have to take a look at the app itself...


Builder

In case you want to take a look, uberAgent is available here: https://uberagent.com/download/
If you have any questions, please email support@uberagent.com. Thanks!


Builder

I noticed the following fixed issues in the release notes for Splunk 6.6.3 - maybe they help with this?

  • SPL-142801, SPL-142771: Only one root event search in a DM gets accelerated
  • SPL-141887, SPL-141823: SearchParser Errors for Datamodel Acceleration prevents other scheduled searches from being executed

Noticing the exact same behavior, with the exact same app (uberAgent). The app developer has indicated that this is normal behavior and not to worry about it. For me, this is running on a staging environment and is underpowered, but it almost seems like skipped data model acceleration searches are continually rescheduled every minute (or less) until they are successful: each of the 28 data model objects is eventually successful within the 5-minute schedule defined for the app, but for every successful search there can be anywhere from 0 to 30 skipped searches.

Does anyone know if this is normal data model acceleration behavior? And if so, what kind of impact would this have on other scheduled and adhoc searches?

Trying to figure out if it is acceptable to move this into production - or if there is a problem with the App or our Splunk configuration.

Esteemed Legend

I assume that you either have scheduled real-time searches OR are using ITSI (which does so under the hood). There is a bug in all versions of Splunk that "correctly" but misleadingly logs that it is skipping the real-time search, because a real-time search never stops. If the search crashed for some reason, it would restart on the next cycle and NOT generate this misleading log entry; otherwise, Splunk complains on every cycle after that. This is a known problem, there is a JIRA on it, and it should be fixed soon.


Builder

ITSI is not installed on that machine, and there are no realtime searches either. Our app does have scheduled historic searches, plus the accelerated data model.


Builder

I would be more than happy to submit log files to support if that helps.
