Splunk Enterprise Security

Can I clean up the datamodel_summary directories that are growing by a couple dozen GB/day?

ericlarsen
Path Finder

We just implemented Splunk Enterprise Security about a month ago. We're new to data models, acceleration, and any implications they may have on our Splunk environment.

I noticed the datamodel_summary directory in our firewall logs index ($SPLUNK_HOME/var/lib/splunk/pan_logs/) is growing incredibly large (850GB and growing a couple dozen GB/day).

I need to understand why. We also have the Palo Alto app installed, and its Palo Alto Networks Firewall Logs data model (7 days of acceleration) is 100 GB.

In ES, the Network Traffic data model (30 days of acceleration), part of the Splunk_SA_CIM app, is 300+ GB!

There are approx. 350 dirs in the $SPLUNK_HOME/var/lib/splunk/pan_logs/datamodel_summary dir. Are all of these really necessary, or can I institute some kind of cleanup in this directory to recover space?
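Before changing anything, it can help to see which of those ~350 directories are actually consuming the space. A minimal sketch (this is a hypothetical helper, not Splunk tooling — point it at the datamodel_summary path from the post):

```python
import os

def dir_size_bytes(path):
    """Total bytes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def summary_report(datamodel_summary_dir):
    """Return (subdirectory, size_in_bytes) pairs, largest first."""
    report = []
    for name in os.listdir(datamodel_summary_dir):
        sub = os.path.join(datamodel_summary_dir, name)
        if os.path.isdir(sub):
            report.append((name, dir_size_bytes(sub)))
    return sorted(report, key=lambda item: item[1], reverse=True)
```

Sorting by size shows which summaries dominate before you touch any retention settings.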

Any help in understanding how data models are stored/cleaned up would be greatly appreciated.
Thanks.


lguinn2
Legend

I don't think that you should manually delete any of the data model acceleration files. The size of these files depends on two things: the number of events in the associated index and the number of days of acceleration. I am not surprised that your data model summary information is quite large.

To fix it, you may want to decrease the number of days of acceleration (the summary range) for some or all of your data models. For the same index, a 30-day summary range will be roughly 4x as large as a 7-day range (30/7 ≈ 4.3).

The usual estimate for the size of the data model summary = Inbound data amount (GB or MB) * 3.4
You might want to take a look at this page of the documentation: Accelerate data models
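To make the arithmetic concrete, the rule of thumb works out like this (the 50 GB/day figure below is a made-up example, not a number from this thread):

```python
def estimated_summary_gb(inbound_gb):
    """Docs' rule of thumb: data model summary ~= inbound data * 3.4."""
    return inbound_gb * 3.4

# Hypothetical example: 50 GB/day of inbound data predicts roughly
# 170 GB of data model summary storage at the recommended retention.
print(round(estimated_summary_gb(50), 1))
```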


ericlarsen
Path Finder

Thanks for the response.

How did you come up with the "data model summary = Inbound data amount (GB or MB) * 3.4" statement? Wouldn't it depend on the summary range of the data model?


lguinn2
Legend

That calculation is published in the Splunk® Enterprise Security Installation and Upgrade Manual, in the section on data model acceleration storage and retention.

I just looked it up and it also says "This formula assumes that you are using the recommended retention rates for the accelerated data models." Here is the link:
http://docs.splunk.com/Documentation/ES/4.5.1/Install/Datamodels

If you are seeing something very different from what the documentation suggests, I think you should file a support ticket. If you simply stop accelerating the data models, I am concerned that it could have a negative effect on your Enterprise Security correlation searches and alerts.
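For reference, shortening a model's summary range is a datamodels.conf setting. A sketch only, assuming a local override for the CIM Network Traffic model (in practice you would normally change this through the data model's Edit Acceleration dialog rather than by hand):

```ini
# Sketch: $SPLUNK_HOME/etc/apps/Splunk_SA_CIM/local/datamodels.conf
# Shorten the Network Traffic summary range from 30 days to 7 days.
[Network_Traffic]
acceleration = true
acceleration.earliest_time = -7d
```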
