How can I reduce the amount of data being saved to my index?

BrendanCO
Path Finder

Hi guys! I have multiple Palo Alto Networks devices sending their syslog data to my Splunk instance. I've tailored what I can on the Palo Alto side of the house, but I was wondering if there are some easy ways to reduce what is being saved in my index. I'm bumping up against my licensed amount and would like to trim it down before I upgrade my license, if possible.

Thanks in advance!

gcusello
SplunkTrust

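Hi BrendanCO,
first you need to work out which events you don't need for your monitoring, and write a regex that matches the events you want to keep.
Once you're sure, you can filter using the usual method, which sends everything to the null queue and then routes the events matching your regex back to the index queue (the order of the two transforms matters):
in props.conf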

[your_sourcetype]
TRANSFORMS-set-filter = set_nullqueue,set_filter

in transforms.conf

########## Discard #########
[set_nullqueue]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
########## Filter ##########
[set_filter]
REGEX = your_regex
DEST_KEY = queue
FORMAT = indexQueue

Bye.
Giuseppe

DalJeanis
Legend

Primarily, you need to determine the use cases for your data and then whitelist what you want or blacklist what you don't want.

The easiest way to get started is to sample everything you're currently getting, then check the Patterns tab and see what each kind of record is. For each common pattern, identify the purpose of that record and whether and how you are likely to use it RIGHT NOW. If you have a current use, whitelist it; if not, greylist it. If it seems totally useless, blacklist it.

Next, kill all those records from your pull and pull another set from a different day, and repeat the process. Soon you will get to the point where all the remaining records are anomalies. Then you start paring down your greylist into black and white, documenting your choices. Ideally, you probably want to be sending your blacklist to the null queue and keeping the anomalies, unless there are just too many of them.
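
Sending a blacklist to the null queue could look something like the sketch below. The stanza name drop_noise, the class name drop-noise, and both placeholders (your_sourcetype, your_blacklist_regex) are assumptions for illustration; substitute your own. Unlike the keep-only-matches configuration earlier in the thread, this one discards only the events that match and indexes everything else.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
[your_sourcetype]
# Apply the hypothetical drop_noise transform to every event of this sourcetype
TRANSFORMS-drop-noise = drop_noise

# transforms.conf
[drop_noise]
# Placeholder: a regex matching only the events on your blacklist
REGEX = your_blacklist_regex
DEST_KEY = queue
# Matching events are routed to the null queue and never indexed
FORMAT = nullQueue
```

Events that do not match the regex, including the anomalies, continue to the index as usual.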

DalJeanis
Legend

Another potential saving is to identify patterns where the data being collected is redundant. Windows events are notorious for 'splaining unnecessarily, for example. Use SEDCMD in your props.conf to kill the redundant verbiage without losing the actual data.
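
As a rough sketch of that idea, a SEDCMD setting takes a sed-style substitution and applies it to the raw event at parse time. The sourcetype placeholder, the class name strip_explanation, and the regex below are assumptions for illustration only; check them against your own events before using anything like this, since the text removed is gone for good.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
[your_sourcetype]
# Hypothetical example: remove everything from the explanatory sentence
# "This event is generated ..." through the end of the event
SEDCMD-strip_explanation = s/This event is generated[\S\s]+$//g
```

The event fields that come before the removed text are kept; only the trailing boilerplate is cut.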
