Splunk Enterprise Security

ES Data Model Acceleration - Never Finishes/Stuck at 99%

Communicator

I'm working on tuning our data model accelerations, and the first problem I'm running into is that they never finish. Currently only the Application State data model is accelerated (and enforced), with a 1 day time range, and it stays at 99.##% complete. This concerns me because it constantly ties up resources on our indexers. If I enable multiple accelerations, it brings our indexing tier to its knees.

I have confirmed that all volumes have plenty of free space, and I am storing the datamodel_summary along with the hot buckets, so I'm not sure what the problem could be. I have not made any changes to the default data models.

I appreciate any suggestions.

Communicator

I'm having this exact same problem across multiple data models within ES. For some background, our ES environment is a search head cluster (4 members, 1 deployer), and performance is very sluggish because we have the majority of these data models accelerated, and we typically never see them finish.

On the flip side, our ad-hoc search cluster of 6 members performs just fine. I have an active case open with Support, but I wanted to see whether you ever made any progress or found a resolution to your issues.

Thanks

Communicator

No obvious resolution on my end. I disabled a lot of the data model accelerations, kept only a few specific ones running, and let them go. Eventually, I would catch them at 100% completed. Since then, I've slowly enabled the DMs that I need and it's been performing pretty well. I do have some performance issues at my indexing tier, so I'm hoping that our upcoming server swap will resolve that.

I am interested to know what Support says on this so please let us know, thanks.

Communicator

One more quick question, if you're willing to share: I'm curious whether your environment is physical or virtual, as that's been a topic of discussion here as well. Our servers are scaled out pretty well, so we're just trying to determine where the bottleneck is.

Communicator

Our environment is a blend of both, actually. The original team had very little understanding of the implications of this. I had them order new physical servers to replace all the VMs at the indexing tier. The other issue is that they have three storage technologies in play on the indexers: HBA to FC SAN, VM Datastore and NFS. Needless to say, I disabled the indexers using NFS on day one...

Motivator

Go to Settings > Report acceleration summaries, click the summary ID link, then click Verify and choose fast verification. Let that run, then click the link for the summary status.

I believe the fast verification doesn't check every bucket, only a sampling.

It may provide some more detail like so:

103 buckets failed (5 passed, 38 skipped) 

I'm still not clear on all the reasons for failure. I know one is that a search dependency could have changed, e.g. your search depends on an eventtype that someone edited.

I also know that the docs say you need 100k events per bucket to qualify for acceleration. I'm not certain whether a bucket that doesn't meet that requirement is also marked as failed or skipped during verification.

I recently changed the bucket sizes on several of my indexes, as I believe that was preventing me from reaching the 100k events requirement for report acceleration.

indexes.conf - allows a bucket to grow up to 10GB:
maxDataSize = auto_high_volume

From: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Indexesconf

maxDataSize = <positive integer>|auto|auto_high_volume
    * The maximum size in MB for a hot DB to reach before a roll to warm is triggered.
    * Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this parameter (recommended).
    * You should use "auto_high_volume" for high-volume indexes (such as the main
      index); otherwise, use "auto".  A "high volume index" would typically be
      considered one that gets over 10GB of data per day.
    * Defaults to "auto", which sets the size to 750MB.
    * "auto_high_volume" sets the size to 10GB on 64-bit, and 1GB on 32-bit systems.
    * Although the maximum value you can set this is 1048576 MB, which corresponds to 1 TB, a reasonable 
      number ranges anywhere from 100 to 50000.  Before proceeding with any higher value, please seek
      approval of Splunk Support.
    * If you specify an invalid number or string, maxDataSize will be auto tuned.
    * NOTE: The maximum size of your warm buckets may slightly exceed 'maxDataSize', due to post-processing and 
      timing issues with the rolling policy.
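
To put the two settings in perspective, here is a rough sketch of how many hot buckets an index would fill per day by size alone. The 10 GB/day volume is a hypothetical figure (the docs' threshold for a "high volume" index), and the estimate assumes a 64-bit system and ignores compression, multiple concurrent hot buckets, and rolls triggered by restarts or time limits, so treat it as an order-of-magnitude guide only.

```python
import math

# Bucket size caps quoted in the indexes.conf documentation above (64-bit systems).
MAX_BUCKET_MB = {"auto": 750, "auto_high_volume": 10 * 1024}

def buckets_per_day(daily_volume_mb, setting):
    """Rough count of hot buckets filled per day, by size alone."""
    return math.ceil(daily_volume_mb / MAX_BUCKET_MB[setting])

# Hypothetical index receiving 10 GB/day (an assumption, not a measured value).
daily_mb = 10 * 1024
for setting in MAX_BUCKET_MB:
    print(setting, buckets_per_day(daily_mb, setting))
```

Fewer, larger buckets means more events per bucket, which is the reason for trying auto_high_volume here.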

Motivator

My bad. Looks like the data model accelerations don't show up under report accelerations. Hopefully report acceleration and data model acceleration code bases will be merged at some point.

Communicator

Here is one example: I have the "Intrusion Detection" data model accelerated for 1 day. It has now been running for well over a week, and here are its current stats:

Status: 99.92% Completed
Access Count: 379 Last Access: 2014-11-13T 10:40:01-05:00
Size on Disk: 63.03MB
Summary Range: 86400
Buckets: 1414

I'd take any guesses as to what it could be. Perhaps bucket rotation is an issue? Seems like a lot of buckets for only 63.03MB of data.
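
For a sense of scale, a quick back-of-the-envelope calculation on the figures above, assuming "Size on Disk" is the total size of the acceleration summary and "Buckets" is the number of buckets it spans (the variable and function names are just for illustration):

```python
# Figures copied from the Intrusion Detection summary status above.
summary_size_mb = 63.03
bucket_count = 1414

def avg_summary_kb_per_bucket(size_mb, buckets):
    """Average summary size per bucket, in KB."""
    return size_mb * 1024 / buckets

avg_kb = avg_summary_kb_per_bucket(summary_size_mb, bucket_count)
print(f"{avg_kb:.1f} KB of summary per bucket")
```

That works out to roughly 46 KB of summary per bucket on average, which is consistent with a large number of very small buckets inside a 1 day range.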

I have seen this issue numerous times, but have never received a good response on it.