Getting Data In

How to configure inputs.conf on a Splunk forwarder to only read a file once?

echalex
Builder

Hello,

Is there a way to tell the Splunk forwarder to stop monitoring a file after it has been indexed once? We are having performance issues Splunking Oracle audit logs (XML), since there are tens of thousands of them. We have already reduced the retention period in Oracle to 24 hours, but the performance issue remains. I know we could use ignoreOlderThan, but then we risk missing files if the forwarder is down for longer than that period.
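
For reference, this is roughly what our monitor stanza looks like today; the path, sourcetype, and threshold below are illustrative rather than our exact config:

[monitor:///u01/app/oracle/admin/ORCL/adump/*.xml]
sourcetype = oracle:audit:xml
ignoreOlderThan = 1d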

1 Solution

woodcock
Esteemed Legend

You can use ignoreOlderThan, but beware that it does not work the way most people think it does: once Splunk ignores a file for the first time, the file goes onto a blacklist and will never be examined again, even if new data is written to it!

http://answers.splunk.com/answers/242194/missing-events-from-monitored-logs.html

Also read here:

http://answers.splunk.com/answers/57819/when-is-it-appropriate-to-set-followtail-to-true.html

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that creates selective soft links into it from the real directory (/source/path/) for any file that has been touched in the last 5 minutes (or whatever your threshold is), like this:

*/5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -mmin -5 | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}

The nice thing about this hack is that you can create a second cron job to remove links for files that have not changed in a while (if the forwarder has too many files to sort through, it slows WAY down even when those files hold no new data), and if a file ever does get touched again, the first cron job will add it back!
Don't forget to set up that second cron job with whatever logic lets you be sure a file will never be written to again, or you will end up with tens of thousands of links in this directory, too.
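
A sketch of that cleanup job, assuming a 7-day threshold and the same paths as above (tune both to your own retention logic):

# Hourly: remove links whose target file has not changed in 7 days (10080 minutes).
0 * * * * /bin/find -L /destination/path/ -maxdepth 1 -type f -mmin +10080 -exec /bin/rm -f {} +
# Also remove links whose target has been deleted (-L with -type l matches only broken links).
30 * * * * /bin/find -L /destination/path/ -maxdepth 1 -type l -exec /bin/rm -f {} +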

The idea is that whatever logic you desire goes into the two cron jobs instead of into inputs.conf.


echalex
Builder

Thanks. It is not actually a problem that the files will be read only once and never checked for updates: the files are generated once and never touched again. My real worry was that someone might shut down the forwarder for longer than the retention period. However, this is no longer a real problem.

0 Karma

echalex
Builder

I should add that a non-destructive solution would be ideal. Splunk has read-only access to these files.

0 Karma

javiergn
SplunkTrust
SplunkTrust

You could try batch mode. From inputs.conf.spec:

[batch://<path>]
* One time, destructive input of files in <path>.
* For continuous, non-destructive inputs of files, use monitor instead.

# Additional attributes:

move_policy = sinkhole
* IMPORTANT: This attribute/value pair is required. You *must* include
  "move_policy = sinkhole" when defining batch inputs.
* This loads the file destructively.
* Do not use the batch input type for files you do not want to consume
  destructively.
* As long as this is set, Splunk won't keep track of indexed files. Without the
  "move_policy = sinkhole" setting, it won't load the files destructively and
  will keep track of them.


# IMPORTANT: The following attribute is not used by batch:
# source = <string>

followSymlink = [true|false]
* Works similarly to the same setting for monitor, but will not delete files
  after following a symlink out of the monitored directory.

# The following settings work identically as for [monitor::] stanzas, documented above
host_regex = <regular expression>
host_segment = <integer>
crcSalt = <string>
recursive = [true|false]
whitelist = <regular expression>
blacklist = <regular expression>
initCrcLength = <integer>
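
Since batch is destructive and you said Splunk only has read access to the originals, you would have to copy the files into a staging directory first and point the batch input there. A sketch of the stanza (the path and sourcetype are made up for illustration):

[batch:///var/splunk_staging/oracle_audit]
move_policy = sinkhole
whitelist = \.xml$
sourcetype = oracle:audit:xml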

javiergn
SplunkTrust
SplunkTrust

Can you tell Oracle to place the logs somewhere else that Splunk has write access to?
Or are these logs used by other teams?

ignoreOlderThan is still the way to go; realistically, why would your forwarder be down for such a long period without anyone noticing?

Alternatively, maybe you could write a script that uses the Splunk CLI to load each file with the oneshot option. A rough shell sketch (note you would also need to track which files have already been loaded, or you will index duplicates):

for file in "$AUDIT_DIR"/*.xml; do
    splunk add oneshot "$file"
done

See this:
http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/MonitorfilesanddirectoriesusingtheCLI

0 Karma