How can I reduce the amount of data being saved to my index?

BrendanCO
Path Finder

Hi guys! I have multiple Palo Alto Network Apps for Splunk devices sending their syslog data to my Splunk instance. I've tailored what I can on the Palo Alto side of the house but was wondering if there are some easy ways to reduce what is being saved in my index? I'm bumping up against my licensed amount and would like to trim it down before I upgrade my license, if possible.

Thanks in advance!

gcusello
SplunkTrust

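Hi BrendanCO,
first you need to work out which events you don't need for your monitoring, and find a regex that separates them from the events you do need.
Once you're sure, you can filter them using the usual methods. Note that in the configuration below, set_nullqueue runs first and sends every event to the nullQueue, then set_filter routes the events matching your_regex back to the indexQueue, so your_regex has to match the events you want to keep.
in props.conf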

[your_sourcetype]
TRANSFORMS-set-filter = set_nullqueue,set_filter

in transforms.conf

########## Discard #########
[set_nullqueue]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
########## Filter ##########
[set_filter]
REGEX = your_regex
DEST_KEY = queue
FORMAT = indexQueue
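
If it is easier to describe what to drop than what to keep, the inverse arrangement is a single transform that sends only the matching events to the nullQueue and leaves everything else alone. The following is a minimal sketch, not a tested configuration: the stanza name drop_traffic_start and the regex are illustrative assumptions (the regex assumes comma-separated Palo Alto syslog in which the log type and subtype appear as ",TRAFFIC,start,"), so check them against your own raw events first.

```
# props.conf
[your_sourcetype]
# Only one transform is needed: events that do not match are indexed as usual
TRANSFORMS-drop = drop_traffic_start

# transforms.conf
# Hypothetical stanza name; the regex is an assumption about the event format
[drop_traffic_start]
REGEX = ,TRAFFIC,start,
DEST_KEY = queue
FORMAT = nullQueue
```

These settings are applied at parsing time, so they belong on the indexers or heavy forwarders that parse the data, not on universal forwarders, and they only affect events indexed after the change.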

Bye.
Giuseppe

DalJeanis
Legend

Primarily, you need to determine the use cases for your data and then whitelist what you want or blacklist what you don't want.

The easiest way to get started is to sample everything you're currently getting, then check the Patterns tab and see what each kind of record is. For each common pattern, identify the purpose of that record and whether and how you are likely to use it, RIGHT NOW. If you have a current use, whitelist it; if not, greylist it. If it seems totally useless, blacklist it.

Next, kill all those records from your pull and pull another set from a different day, and repeat the process. Soon you will get to the point where all the remaining records are anomalies. Then you start paring down your greylist into black and white, documenting your choices. Ideally, you probably want to be sending your blacklist to the null queue and keeping the anomalies, unless there are just too many of them.

DalJeanis
Legend

Another potential saving is to identify patterns where the data being collected is redundant (Windows events are notorious for 'splaining unnecessarily, for example) and use SEDCMD in your props.conf to kill the redundant verbiage without losing the actual data.
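
As a rough sketch of what that can look like: SEDCMD takes a sed-style substitution, s/regex/replacement/flags, and rewrites the raw event before it is indexed. The class name strip_explanation and the regex below are assumptions for illustration; the regex presumes the boilerplate starts with the words "This event is generated" and runs to the end of the event, so adjust it to whatever trailing text your events actually carry.

```
# props.conf
[your_sourcetype]
# Hypothetical class name; deletes everything from the explanatory
# paragraph to the end of the event and keeps the fields before it
SEDCMD-strip_explanation = s/This event is generated[\s\S]*$//
```

Run the regex against a sample of events in search first, since an overly broad pattern will silently remove data you wanted to keep.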
