We have a log file that a team wants to index in Splunk every 30 minutes, and we would like to keep the log data at the source even after it is indexed in Splunk. What options do we have?
For example, to monitor the Apache web server access_log and error_log, I create the stanza below in inputs.conf. Here you do not need to specify an interval such as 5 or 30 minutes; whenever the file content changes, the new log data is picked up and sent for indexing.
[monitor://<path>]
E.g. inputs.conf
[monitor:///var/log/httpd]
sourcetype = access_combined
index = httpd_logs
ignoreOlderThan = 7d
How does the monitor processor work?
Specify a path to a file or directory and the monitor processor consumes any new data written to that file or directory. This is how you can monitor live application logs such as those coming from Web access logs, Java 2 Platform Enterprise Edition (J2EE) or .NET applications, and so on.
Splunk software monitors and indexes the file or directory as new data appears. You can also specify a mounted or shared directory, including network file systems, as long as Splunk software can read from the directory. If the specified directory contains subdirectories, the monitor process recursively examines them for new files, as long as the directories can be read.
You can include or exclude files or directories from being read by using whitelists and blacklists.
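As a sketch, the whitelist and blacklist settings accept regular expressions matched against the full file path. An illustrative stanza (the patterns here are hypothetical, not from the question above) might look like:

```
[monitor:///var/log/httpd]
# Index only files whose names start with access_log
whitelist = access_log.*
# Skip compressed rotated files
blacklist = \.gz$
```

If a file matches both lists, the blacklist wins and the file is skipped.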
If you disable or delete a monitor input, Splunk software does not stop indexing the files that the input references; it only stops checking those files for new data. To stop all in-process data indexing, you must stop and restart the Splunk server.
Interval parameter
E.g. interval = 300 # runs once every 5 minutes
Use the interval parameter to schedule and monitor scripts. The interval parameter specifies how long a script waits before it restarts.
The interval parameter is useful for a script that performs a task periodically. The script performs a specific task and then exits. The interval parameter specifies when the script restarts to perform the task again.
The interval parameter is also useful to ensure that a script restarts, even if a previous instance of the script exits unexpectedly.
Entering an empty value for interval results in the script only being executed on startup and on endpoint reload (that is, when the input is edited).
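To make the interval behavior concrete, a minimal scripted-input stanza in inputs.conf might look like the following (the script path, sourcetype, and index names here are placeholders, not taken from this thread):

```
[script://$SPLUNK_HOME/bin/scripts/check_disk.sh]
# Re-run the script every 30 minutes (1800 seconds)
interval = 1800
sourcetype = disk_usage
index = os_metrics
disabled = 0
```

Note that interval applies to scripted inputs like this one; plain [monitor://] inputs do not use it, since they pick up new data as it is written.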
I'm confused by your question.
First, why can't you just monitor it normally and let Splunk index the new events as they occur?
Second, Splunk doesn't delete your source data - it just reads it - so it will still be at your source.
How does the file content change? Do you want to grab only the difference from 30 minutes back?
Yes, only the new data. It's an application log file; data only gets appended to it.
Any specific reason for monitoring every 30 minutes? If your data has timestamps, then there can be data for every minute.
This 30-minute interval was decided by the customer to avoid some performance issues. Can we increase that 1 minute to 30 if we have timestamps in the logs?
Is it being read by a Universal Forwarder?
Do they experience a performance issue that they are trying to work around or do they just think that a 30 minute interval will be better?
If this log file is very busy, then ingesting it in larger 30-minute batches could have a greater performance impact than streaming it continuously: little bites versus big bites.
Please describe the performance issue and show us the inputs.conf stanza.