I have an accelerated data model that contains 5 event datasets with simple field constraints. When I try to get a count with tstats count from datamodel=X.Y1 where source=A, the search does not seem to use the acceleration, unlike the same search against the other 4 datasets. I then tried passing summariesonly=t in tstats and got a count of 0 for this one dataset, while the others returned the correct counts. Why is this the case for this one dataset (i.e., why is it not summarized)? What might be the cause?
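For reference, a rough sketch of the two searches I compared (the data model, dataset, and field values are placeholders, same as above):

| tstats count from datamodel=X.Y1 where source=A

and, restricted to the summarized data only:

| tstats summariesonly=t count from datamodel=X.Y1 where source=A

The second search returns 0 for this dataset, while the equivalent searches on the other 4 datasets return the expected counts.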
Edit (11/04/2020): I deleted the other 4 datasets and rebuilt the acceleration, and it worked. Maybe I am hitting some data model limit that I am not aware of? This is not random behavior; I did this twice on 2 different instances.
Edit (13/04/2020): Found these errors in search.log, which suggest that it is not finding tsidx files for this particular dataset, i.e. it is not summarized for some reason, even though the data model shows the acceleration status as 100% (Note: I've removed some parts from the log strings):
st_select_handle_new found no TSIDX files warm_rc=[0,0] errno=18
and
Mixed mode is disabled, skipping search for bucket with no TSIDX data: E:\Splunk\indexes\db_1586485508_1586389645_42
Edit (14/04/2020): I was able to fix this issue by simply changing the name of the dataset that was not getting accelerated (it was named executed_background_jobs and I renamed it to batch_jobs_exec). It's still a mystery why it didn't work with the previous name.
It turns out the above fix was just random behavior. It is again having issues creating tsidx files.
Edit (17/04/2020): If anyone knows what factors can cause this issue, I would appreciate the help. There are no troubleshooting docs or guides for data model acceleration whatsoever.
Edit (22/04/2020): I found on the monitoring console that the acceleration searches for this dataset are being skipped 92% of the time (over the last 4 hours), and the searches for the others are also being skipped (all of them above a 60% skip ratio). After reading the answer by @sowings (link), I learned that each root dataset has its own acceleration job (which can be seen in the monitoring console as well), and since I have 5 root datasets there are 5 acceleration searches in total, each running every 5 minutes (as set in the acceleration settings). As the VM has 4 cores and the concurrent job limit is set to 5, it is certain that most of them will be skipped. The monitoring console also shows that the highest skip ratio is for this specific dataset (approx. 92% over the last 4 hours, maybe because it has a large amount of data? Not sure). This also explains my earlier edit where I accelerated a data model with only a single root dataset: since it needed only one job, it was not affected by the skipping issue. I am currently trying the allow_skew parameter available in datamodels.conf, along with reducing the cron frequency for the jobs, and will post an update here. Reference: https://answers.splunk.com/answers/543887/accelerated-data-model-100-complete-even-though-mo.html
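As a rough sketch of what I am changing (the stanza name X stands for my data model, and the values are examples rather than my exact config), the relevant settings in datamodels.conf look something like this:

[X]
acceleration = true
# let the scheduler stagger the acceleration jobs instead of firing them all at the same time
acceleration.allow_skew = 50%
# run the acceleration search every 15 minutes instead of every 5
acceleration.cron_schedule = */15 * * * *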
Thanks,
Harsh
I believe the answer here helped me identify the issue, so if anyone is having this problem, I would suggest first checking in the monitoring console whether any acceleration jobs are being skipped.
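Besides the monitoring console, the scheduler logs can also show skipped acceleration jobs directly; a rough sketch of such a check (assuming the acceleration searches follow the default _ACCELERATE_ naming):

index=_internal sourcetype=scheduler status=skipped savedsearch_name="_ACCELERATE_*"
| stats count by savedsearch_name, reason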
As you already stated, the problem is that the data model acceleration searches are being skipped, so the data model is not being populated with new data. If possible, I would suggest adding more CPU cores. Splunk's reference hardware calls for at least 12 cores per instance; for a dedicated search head, that number increases to 16.
https://docs.splunk.com/Documentation/Splunk/8.0.3/Capacity/Referencehardware
If you have no option to add CPU cores, you will have to optimize the data model acceleration searches. Specifying at least the index of the events has a remarkable impact.
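For example (the index and sourcetype names here are only placeholders), a root event dataset constraint like

sourcetype=my_sourcetype source=A

does not tell the search which index to scan, whereas a constraint such as

index=my_index sourcetype=my_sourcetype source=A

limits it to a single index, which usually makes the acceleration search finish much faster, so it is less likely to still be running when the next scheduled run starts and gets skipped.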
Thanks for the answer (upvoted), I appreciate it. But sorry, I cannot accept this as the answer.
Please convert your edit/update into an answer and accept it. That way other people will be able to see that your problem is solved and find the solution.
I don't think that's a solution. What was the root cause of the issue? I am still troubleshooting and will post an answer with the reason.
Thanks,
Harsh