
How should I configure the Splunk Add-on for Unix and Linux to index at the advertised rate?

davidocsch
Loves-to-Learn

When I configured the Splunk Add-on for Unix and Linux with the defaults and chose "enable all inputs", it indexed approximately 100 KB/s per host, which exceeds our 1 GB/day limit.

This doesn't match the volume given in the "Indexing volume" section of the "What data are collected?" page, which says:

The Splunk App for Unix and Linux collects around 200MB of data per host per day. The app can collect slightly more or less based on individual host activity.

I have tried disabling some of the performance-related metrics, increasing polling intervals, etc., but I couldn't make a significant difference to the indexing rate.
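To put the two figures on the same scale, here is a rough conversion sketch. It assumes decimal units (1 MB = 1000 KB) and a rate that stays constant over the whole day; the function names are made up for illustration and are not part of Splunk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def kb_per_s_to_mb_per_day(rate_kb_per_s):
    """Convert a sustained rate in KB/s to a daily volume in MB (decimal units assumed)."""
    return rate_kb_per_s * SECONDS_PER_DAY / 1000


def mb_per_day_to_kb_per_s(volume_mb_per_day):
    """Convert a daily volume in MB to the equivalent sustained rate in KB/s."""
    return volume_mb_per_day * 1000 / SECONDS_PER_DAY


observed_mb_per_day = kb_per_s_to_mb_per_day(100)    # observed rate from this post
advertised_kb_per_s = mb_per_day_to_kb_per_s(200)    # documented 200 MB/day figure

print(f"Observed:   {observed_mb_per_day:.0f} MB/day per host")
print(f"Advertised: {advertised_kb_per_s:.2f} KB/s per host")
print(f"Ratio:      {observed_mb_per_day / 200:.1f}x the documented volume")
```

Under those assumptions, a sustained 100 KB/s is about 8,640 MB/day per host, while 200 MB/day corresponds to only about 2.3 KB/s.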


lguinn2
Legend

When you first start any monitor input, Splunk reads the existing files and directories. Typically, these files hold weeks of data, so it takes a while for Splunk to "catch up." In the beginning, Splunk will therefore index much more data than it does in steady state.
If this is a problem for you, I suggest:
1. Identify the files and directories being monitored. One of them is probably /var/log.
2. Go to each file or directory and "tidy it up." In particular, /var/log may have many old files that could be deleted. If you want to keep a file but don't want Splunk to index it, move it to another directory, such as /var/oldlogs.
3. Disable any monitor inputs in the Unix and Linux add-on that you want to ignore.
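For step 2, a small script can help you see which old files are sitting in a monitored directory before you decide what to delete or move. This is only a sketch: `find_stale_files` is a hypothetical helper, and the /var/log path and 14-day threshold in the usage note are assumptions you should adjust. It only lists files; it does not delete or move anything.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_in_bytes) for regular files under root whose
    last modification is older than max_age_days, largest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(root).rglob("*"):
        try:
            # Skip symlinks so a file is not reported through a link.
            if path.is_symlink() or not path.is_file():
                continue
            info = path.stat()
        except OSError:
            # Unreadable or vanished entries are skipped, not fatal.
            continue
        if info.st_mtime < cutoff:
            stale.append((path, info.st_size))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For example, `find_stale_files("/var/log", 14)` would return the files under /var/log that have not been modified in the last two weeks, which are the likeliest candidates to clean up or move to somewhere like /var/oldlogs.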
