How can I reduce the amount of data being saved to my index?

BrendanCO
Path Finder

Hi guys! I have multiple Palo Alto Networks devices (used with the Palo Alto Networks App for Splunk) sending their syslog data to my Splunk instance. I've tailored what I can on the Palo Alto side of the house, but was wondering if there are some easy ways to reduce what is being saved in my index. I'm bumping up against my licensed amount and would like to trim it down before I upgrade my license, if possible.

Thanks in advance!

gcusello
SplunkTrust

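Hi BrendanCO,
first you have to work out which events you actually need for your monitoring and find a regex that matches them.
When you're sure, you can filter using the usual method: the first transform sends every event to the nullQueue, and the second routes the events matching your regex back to the indexQueue, so only those are indexed.
in props.conf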

[your_sourcetype]
TRANSFORMS-set-filter = set_nullqueue,set_filter

in transforms.conf

########## Discard #########
[set_nullqueue]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
########## Filter ##########
[set_filter]
REGEX = your_regex
DEST_KEY = queue
FORMAT = indexQueue
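
The configuration above keeps only the events that match your_regex and drops everything else. If it is easier to describe the events you don't need, the inverse is a single transform that sends only the matching events to the nullQueue. A minimal sketch, assuming a hypothetical stanza name set_discard and a placeholder regex that you would replace after checking it against your own events:

```ini
# props.conf
[your_sourcetype]
TRANSFORMS-set-discard = set_discard

# transforms.conf
# Hypothetical stanza name. Events matching REGEX are dropped before indexing;
# everything else is indexed as usual.
[set_discard]
# Placeholder. As an illustration only, a pattern such as ,TRAFFIC,start,
# would match session-start traffic logs, assuming the comma-separated
# Palo Alto syslog layout.
REGEX = regex_matching_events_to_drop
DEST_KEY = queue
FORMAT = nullQueue
```

Either variant has to live on the first full Splunk instance that parses the data (an indexer or heavy forwarder, not a universal forwarder).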

Bye.
Giuseppe

DalJeanis
SplunkTrust

Primarily, you need to determine the use cases for your data and then whitelist what you want or blacklist what you don't want.

The easiest way to get started is to sample everything you're currently getting, then check the Patterns tab and see what each kind of record is. For each common pattern, identify what the purpose of that record is and whether and how you are likely to use it, RIGHT NOW. If you have a current use, whitelist it; if not, greylist it. If it seems totally useless, blacklist it.

Next, kill all those records from your pull and pull another set from a different day, and repeat the process. Soon you will get to the point where all the remaining records are anomalies. Then you start paring down your greylist into black and white, documenting your choices. Ideally, you probably want to be sending your blacklist to the null queue and keeping the anomalies, unless there are just too many of them.

DalJeanis
SplunkTrust

Another potential saving is to identify patterns where the data being collected is redundant. Windows events are notorious for 'splaining unnecessarily, for example. You can use SEDCMD in your props.conf to kill the redundant verbiage without losing the actual data.
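
A minimal sketch of what that could look like, assuming the redundant explanation starts with the phrase "This event is generated" and runs to the end of the event. The class name strip_explanation and the phrase itself are assumptions, so check them against your own events before using this:

```ini
# props.conf
[your_sourcetype]
# SEDCMD-<class> applies a sed-style substitution to the raw event before it is indexed.
# Here everything from the assumed boilerplate phrase to the end of the event is removed.
SEDCMD-strip_explanation = s/This event is generated[\s\S]*$//g
```

The removed text is gone for good once the event is indexed, so it is worth testing the expression on a sample first.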
