We just implemented Splunk Enterprise Security about a month ago. We're new to data models, acceleration, and any implications they may have on our Splunk environment.
I noticed the datamodel_summary directory in our firewall logs index ($SPLUNK_HOME/var/lib/splunk/pan_logs/) is getting incredibly large (850 GB, and growing by a couple dozen GB/day).
I need to understand why. We have the Palo Alto app installed as well, and its Palo Alto Networks Firewall Logs data model (7 days acceleration) is 100 GB.
In ES, the Network Traffic data model (30 days acceleration), part of the Splunk_SA_CIM app, is 300+ GB!
There are approx. 350 dirs in the $SPLUNK_HOME/var/lib/splunk/pan_logs/datamodel_summary dir. Are all of these really necessary, or can I institute some kind of cleanup in this directory to recover space?
Any help in understanding how data models are stored/cleaned up would be greatly appreciated.
I don't think you should manually delete any of the data model acceleration files. The size of these files depends on two things: the number of events in the associated index and the number of days of acceleration (the summary range). Given your event volume, I am not surprised that your data model summaries are quite large.
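If you want to see which of the ~350 directories account for most of the space before changing anything, here is a minimal read-only Python sketch. It only sums file sizes under each immediate subdirectory of whatever path you give it; the function name is my own, and I am assuming nothing about what the subdirectory names mean.

```python
import os

def summary_dir_sizes(root):
    """Return {subdirectory name: total bytes} for each immediate child dir of root.

    Read-only: nothing is modified or deleted.
    """
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file disappeared while we were scanning
        sizes[entry.name] = total
    return sizes

def largest_first(sizes):
    """Sort the result so the biggest directories come first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `largest_first(summary_dir_sizes(path))` with the path to your datamodel_summary directory gives a list of (name, bytes) pairs you can print or compare over a few days to see where the growth is.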
To reduce the size, you may want to decrease the number of days of acceleration for some (or all) data models. For the same data model and index, 30 days of acceleration is going to be roughly 4x as large as 7 days.
The usual estimate for the size of the data model summaries: accelerated data model storage per year = daily inbound data volume (GB or MB) * 3.4
You might want to take a look at this page of the documentation: Accelerate data models
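To make the arithmetic concrete, here is a small Python sketch of the two estimates above. It assumes the "inbound data amount" is your daily indexed volume and that the 3.4 multiplier gives roughly a year of summary storage in the same unit; it also assumes summary size scales about linearly with the acceleration range. The function names and the 50 GB/day figure are made up for illustration.

```python
def estimate_yearly_summary_gb(daily_ingest_gb, multiplier=3.4):
    """Rule-of-thumb summary storage: daily inbound volume times 3.4.

    Assumes the recommended retention settings for the accelerated data models.
    """
    return daily_ingest_gb * multiplier

def summary_range_ratio(long_days, short_days):
    """Rough size ratio between two acceleration ranges on the same data.

    Assumes summary size grows about linearly with the number of days.
    """
    if long_days <= 0 or short_days <= 0:
        raise ValueError("acceleration ranges must be positive")
    return long_days / short_days

# Hypothetical daily volume, only to show how the numbers combine.
daily_gb = 50
print(f"Estimated summary storage: {estimate_yearly_summary_gb(daily_gb):.0f} GB")
print(f"30 days vs 7 days: about {summary_range_ratio(30, 7):.1f}x")
```

If what you see on disk is far above what this kind of estimate gives for your real daily volume, that is a good number to have on hand when talking to support.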
Thanks for the response.
How did you come up with the "data model summary = Inbound data amount (GB or MB) * 3.4" statement? Wouldn't it depend on the summary range of the data model?
That calculation is published in the Splunk® Enterprise Security Installation and Upgrade Manual, in the section on data model acceleration storage and retention.
I just looked it up and it also says "This formula assumes that you are using the recommended retention rates for the accelerated data models." Here is the link:
If you are seeing something very different from what the documentation suggests, I think you should file a support ticket. If you just stop accelerating the data models, I am concerned that it might have a negative effect on your Enterprise Security correlation searches and alerts...