I'm working on tuning our data model accelerations, and the first problem I'm running into is that they never finish. I currently have only the Application State data model accelerated (and enforced) with a 1 day time range, and it stays at 99.##% complete. This concerns me because it constantly ties up resources on our indexers. If I enable multiple accelerations, it will bring our indexing tier to its knees.
I have ensured that the volumes have enough space, and I am storing the datamodel_summary along with the hot buckets. All volumes have plenty of space so I'm not sure what the problem can be. I have not made any changes to the default data models.
I appreciate any suggestions.
I'm having this exact same problem across multiple data models within ES. For some background, our ES environment is a search head cluster (4 members, 1 deployer), and performance is very sluggish because we have a majority of these data models accelerated, and we typically never see them finish.
On the flip side our ad-hoc search cluster of 6 members performs just fine. I have an active case open with support but I wanted to see if you ever made any progress or resolution to your issues?
No obvious resolution on my end. I disabled a lot of the data model accelerations, kept only very specific ones running, and just let them go. Eventually, I would catch them at 100% completed. Since then, I've slowly enabled the DMs that I need and it's been performing pretty well. I do have some performance issues at my indexing tier, so I'm hoping that our new server swap happening soon will resolve that.
I am interested to know what Support says on this so please let us know, thanks.
One more quick question, if you're willing to share: I'm curious whether your environment is physical or virtual, as that's been a topic of discussion here as well. Our servers are scaled out pretty well; we're just trying to determine the bottleneck.
Our environment is a blend of both, actually. The original team really had very little understanding of the implications of this. I had them order new physical servers to replace all the VMs at the indexing tier. The other issue is that they have three storage technologies at play on the indexers: HBA to FC SAN, VM Datastore and NFS. Needless to say, I disabled the indexers using NFS on day one...
Go to Settings, Report acceleration summaries, click on the summary ID link, then click Verify and choose fast verification. Let that run, then click on the link for summary status.
I believe the fast verification doesn't check every bucket but a sampling.
It may provide some more detail like so:
103 buckets failed (5 passed, 38 skipped)
I'm still not clear on all the reasons for failure. I know one is that a search dependency could have changed, e.g. your search depends on an eventtype which someone edited.
I also know that the docs say you need 100k events per bucket to qualify for acceleration. I'm not certain whether failing to meet that requirement will also set a bucket's verification to failed or skipped.
I recently changed the bucket sizes on several of my indexes, as I believe the old size was preventing me from reaching the 100k events requirement for report acceleration.
indexes.conf - allows a bucket to grow up to 10GB:
maxDataSize = auto_high_volume
maxDataSize = <positive integer>|auto|auto_high_volume
* The maximum size in MB for a hot DB to reach before a roll to warm is triggered.
* Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this parameter (recommended).
* You should use "auto_high_volume" for high-volume indexes (such as the main index); otherwise, use "auto". A "high volume index" would typically be considered one that gets over 10GB of data per day.
* Defaults to "auto", which sets the size to 750MB.
* "auto_high_volume" sets the size to 10GB on 64-bit, and 1GB on 32-bit systems.
* Although the maximum value you can set this is 1048576 MB, which corresponds to 1 TB, a reasonable number ranges anywhere from 100 to 50000. Before proceeding with any higher value, please seek approval of Splunk Support.
* If you specify an invalid number or string, maxDataSize will be auto tuned.
* NOTE: The maximum size of your warm buckets may slightly exceed 'maxDataSize', due to post-processing and timing issues with the rolling policy.
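To sanity-check whether a bucket size is likely to reach the 100k events mentioned above, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under some loud assumptions: it pretends buckets roll purely on size (in practice they also roll on restarts, time span and hot bucket limits), and the function names and the example volumes are made up for illustration, not taken from Splunk.

```python
# Rough estimate of how many events land in one bucket, assuming the bucket
# only rolls when it reaches maxDataSize. Real buckets can roll earlier, so
# treat the result as an upper bound, not a guarantee.

MIN_EVENTS_FOR_ACCELERATION = 100_000  # the "100k events per bucket" figure from the docs

AUTO_MB = 750                   # "auto" per the spec text above
AUTO_HIGH_VOLUME_MB = 10 * 1024  # "auto_high_volume" (10GB on 64-bit), assuming 1GB = 1024MB


def estimated_events_per_bucket(events_per_day, mb_per_day, max_bucket_mb):
    """Estimate events per bucket from an index's daily event count and volume.

    All three inputs are hypothetical measurements you would supply yourself.
    """
    if events_per_day <= 0 or mb_per_day <= 0 or max_bucket_mb <= 0:
        raise ValueError("all inputs must be positive")
    # events per MB, scaled to the bucket size; integer math avoids float drift
    return (max_bucket_mb * events_per_day) // mb_per_day


def meets_event_minimum(events_per_day, mb_per_day, max_bucket_mb):
    """True if the estimated events per bucket reaches the 100k threshold."""
    estimate = estimated_events_per_bucket(events_per_day, mb_per_day, max_bucket_mb)
    return estimate >= MIN_EVENTS_FOR_ACCELERATION


if __name__ == "__main__":
    # Hypothetical index: 10,000 large events per day totalling 100 MB.
    for label, size_mb in (("auto", AUTO_MB), ("auto_high_volume", AUTO_HIGH_VOLUME_MB)):
        estimate = estimated_events_per_bucket(10_000, 100, size_mb)
        print(label, estimate, meets_event_minimum(10_000, 100, size_mb))
```

For a low-volume index the size limit may never be the reason a bucket rolls, in which case this estimate will be far too optimistic and the actual per-bucket event counts are what you need to look at.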
Here is one example: I have the "Intrusion Detection" data model accelerated for 1 day. It has now been running for well over a week, and here are its current stats:
Status: 99.92% Completed
Access Count: 379
Last Access: 2014-11-13T 10:40:01-05:00
Size on Disk: 63.03MB
Summary Range: 86400
I'd take any guesses as to what it could be. Perhaps bucket rotation is an issue? Seems like a lot of buckets for only 63.03MB of data.