
Palo Alto Networks App for Splunk: How should I plan disk space for the size of datamodel summary files?

MonkeyK
Builder

We added Palo Alto data to our Splunk environment a little over two months ago and installed the Palo Alto Networks App for Splunk (v5.2.0).

After two months, we are seeing PA App datamodel_summary files that approach the size of the total indexed data. I need to plan disk space appropriately, and I am not sure what I should be asking for. Is this normal for the Palo Alto Networks App for Splunk?

Is there a rule of thumb for how I should think about datamodel_summary size relative to indexed volume?

What options do I have for containing the PA App datamodel?


panguy
Contributor

MonkeyK,

We have worked with the Splunk DataModel Team to optimize our datamodel as best we could. However, since every customer's needs are different, we have included fields in the datamodel that may not be important in your environment. Your Splunk admin can remove the fields that are not of importance to you, which will help shrink the datamodel's storage needs.
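If it helps, a quick way to see which accelerated fields actually carry values before pruning is a per-field count with tstats. This is only a sketch: the datamodel name (pan_firewall), object name (log), and field names (log.src_user, log.url) are assumptions here, so substitute the names you see under Settings > Data models in your version of the app:

| tstats summariesonly=true count AS total_events, count(log.src_user) AS src_user_events, count(log.url) AS url_events from datamodel=pan_firewall

Fields that come back with zero or near-zero counts are good candidates for removal from the accelerated datamodel.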

In an effort to continue optimizing our data model, could you please provide feedback on which fields you removed and why? We would really appreciate it.

Regards,

Paul


MonkeyK
Builder

by "approach the size of the total indexed data" I mean that I have been told by my Splunk admin that after two months, our pan_logs index is 830GB and he gave me two numbers for the datamodel_summary files: 670GB and 850GB. So the datamodel_summary file disk needs appear to be on the order of the indexed data.

I need to understand whether this is a linear trend that will continue, and what ability I have to control it. If I cannot, I cannot help my Splunk admin define storage needs.


adamsaul
Communicator

MonkeyK,

ddrillic brings up a good point: what level of data model acceleration is your PA App set to? Limiting the amount of historical data you accelerate will significantly reduce the summary storage consumption.
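To see what you currently have, a REST search can list the size of each accelerated datamodel summary. This is a sketch, assuming your Splunk version exposes the admin/summarization endpoint (summary.size is reported in bytes):

| rest splunk_server=* count=0 /services/admin/summarization by_tstats=t | eval size_gb=round('summary.size'/1024/1024/1024, 2) | table summary.id, eai:acl.app, size_gb, summary.complete

On a standalone instance, splunk_server=local is enough; in a distributed deployment the summaries live on the indexers, so splunk_server=* gives the fuller picture.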


MonkeyK
Builder

I am not so much concerned about limiting the amount of data in general as I am about being able to plan for what I will need. Whatever the number, it's a management decision on cost/benefit. But if I estimate wrong and we run out of space or budget, things will not go well for me or our Splunk implementation.

That said, I just looked at the acceleration stats. They show 7 days of data model acceleration and ~100GB size on disk. So there is either something more, or some way that we can be more aggressive in cleaning up the datamodel_summary files.


adamsaul
Communicator

The default appears to be 7 Days of acceleration for Firewall Logs, Endpoint Logs, and Wildfire Malware Reports.
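That window comes from each datamodel's acceleration settings (acceleration.earliest_time), which you can adjust in Settings > Data models > Edit Acceleration or in datamodels.conf on the search head; summary data older than the window is aged out automatically. A minimal sketch, with the stanza name pan_firewall assumed (it has to match the app's actual datamodel name):

# local/datamodels.conf (stanza name is an assumption; use the app's datamodel name)
[pan_firewall]
acceleration = true
# keep only the last 7 days of summaries; older summary data ages out
acceleration.earliest_time = -7d
# optionally cap how far back a rebuild will backfill
acceleration.backfill_time = -7d

Shrinking that window shrinks the datamodel_summary directories, at the cost of accelerated (tstats) searches only being fast inside the window.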

I'd run the search below to determine the daily usage against your Palo Alto indexes and summary indexes, so that you can project an average monthly usage for Palo Alto data. The search returns usage per index on a day-by-day basis.

index=_internal source="*license_usage.log*" type=Usage | eval yearmonthday=strftime(_time, "%Y%m%d") | eval yearmonth=strftime(_time, "%Y%m") | stats sum(eval(b/1024/1024/1024)) AS volume_gb by idx yearmonthday yearmonth | chart sum(volume_gb) over yearmonthday by idx
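To turn that into a projection, a variation along these lines (assuming the pan_logs index name mentioned earlier) averages the daily indexed volume and extrapolates it out to a month and a year; keep in mind that license_usage.log reflects indexed volume, not the size of the summary files themselves:

index=_internal source="*license_usage.log*" type=Usage idx=pan_logs | bin _time span=1d | stats sum(eval(b/1024/1024/1024)) AS daily_gb by _time | stats avg(daily_gb) AS avg_daily_gb | eval projected_monthly_gb=round(avg_daily_gb*30,1), projected_yearly_gb=round(avg_daily_gb*365,1)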

adamsaul
Communicator

MonkeyK,

When you say "...approach the size of the total indexed data", what do you mean? Total indexed data over the last two months or today?

Summary indexes are great for large time window searches such as annual reporting, so they will be a subset of your overall indexed data.


ddrillic
Ultra Champion

It's interesting here, on the Palo Alto Networks App for Splunk page.

It says:

-- Datamodel acceleration might rebuild itself after installation due to updated constraints - ...

I just wonder whether you use datamodel acceleration...
