Getting Data In

How to monitor current and future log files and how long should the data be retained in Splunk?

pprakash2
Explorer

I am generating log files with date appended to the log file.

Examples:
xxxx_20172702.log
xxxx_20172602.log
xxxx_20172502.log

Is it possible to monitor all these log files as well as the log files that will be generated daily in the future? If yes, how do I configure this in the inputs.conf file? I follow this naming convention so that I can archive files that are 10 days old and retain the rest. If I maintained a single log file, it would be difficult to archive the older data. Please let me know the best approach.

Also, once the data in a log file is indexed, how long is the data retained in the Splunk index? Is it kept until we clean up the index in Splunk?


adayton20
Contributor

This documentation might help answer some of the questions you have regarding monitoring files and directories:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Monitorfilesanddirectorieswithinputs.conf

If your logs are generated from a script, try reading about scripted inputs:
http://docs.splunk.com/Documentation/Splunk/6.5.2/AdvancedDev/ScriptSetup

I think in your case, you might take a look at Example 2 from the documentation pertaining to monitoring files and directories:
To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

or for Windows

[monitor://C:\path\to\your\stuff\*.log]

Splunk will monitor any new files added to, and any changes made to existing files in, the directories you set up a monitoring stanza on.
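For your naming convention specifically, a wildcard in the monitor stanza will pick up both the existing dated files and any files created in the future, so no reconfiguration is needed when a new day's log appears. A minimal sketch for inputs.conf, assuming your logs live in /var/log/myapp and should go to a hypothetical index named app_logs (adjust path, index, and sourcetype to your environment):

```
[monitor:///var/log/myapp/xxxx_*.log]
index = app_logs
sourcetype = myapp_log
disabled = false
```

Splunk tracks which files it has already read by content checksum rather than by filename, so archiving the files that are older than 10 days out of this directory should not cause the remaining files to be re-indexed.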

The retention of your data depends on a number of factors. Most of the time, this depends on how much storage you have: Splunk automatically "freezes" (by default, deletes) the oldest buckets in an index when the index exceeds its configured size or age limits. You can set a retirement policy for your data by either size or age (time), and you can even do this per index if you care more about the retention of specific data. This document explains a bit about how Splunk data "ages" and "rolls" to different buckets depending on your settings:
https://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Setaretirementandarchivingpolicy
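As a concrete sketch of such a retirement policy, the settings below in indexes.conf would keep roughly 10 days of data in a hypothetical index named app_logs. frozenTimePeriodInSecs is expressed in seconds (10 days = 864000); whichever limit (age or size) is reached first triggers freezing. The paths and values here are illustrative assumptions, not your actual configuration:

```
[app_logs]
homePath   = $SPLUNK_DB/app_logs/db
coldPath   = $SPLUNK_DB/app_logs/colddb
thawedPath = $SPLUNK_DB/app_logs/thaweddb
# Freeze (delete, unless an archive destination such as coldToFrozenDir
# is configured) buckets whose newest event is older than 10 days
frozenTimePeriodInSecs = 864000
# Also cap the total size of the index, in MB
maxTotalDataSizeMB = 500000
```

Without any such settings, data is retained until the default limits are hit (the default freeze period is roughly six years), or until you clean the index manually.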

If you want to determine how long your data is hanging around, you could try this:

| dbinspect 
    [ eventcount summarize="false" index=* 
    | dedup index 
    | fields index] 
| stats min(startEpoch) AS startEpoch,min(modTime) AS modTime by index,splunk_server 
| convert ctime(startEpoch) AS startEpoch 
| rename modTime AS "Oldest Bucket",startEpoch AS "Earliest Event Time"