Getting Data In

How to configure inputs.conf on a Splunk forwarder to only read a file once?

echalex
Builder

Hello,

Is there a way to tell the Splunk forwarder to stop monitoring a file after it has been indexed once? We are having performance issues Splunking Oracle audit logs (XML), since there are tens of thousands of them. We have already reduced the retention period in Oracle to 24 hours, but the performance issue remains. I know we could use ignoreOlderThan, but then we risk missing files if the forwarder is down for longer than that period.
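
For reference, this is roughly what our monitor stanza looks like today; the path, sourcetype, and threshold below are illustrative rather than our exact config:

[monitor:///u01/app/oracle/admin/ORCL/adump/*.xml]
sourcetype = oracle:audit:xml
ignoreOlderThan = 1d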

1 Solution

woodcock
Esteemed Legend

You can use ignoreOlderThan, but beware that it does not work the way most people think it does: once Splunk ignores a file for the first time, the file goes onto a blacklist and will never be examined again, even if new data is written to it!

http://answers.splunk.com/answers/242194/missing-events-from-monitored-logs.html

Also read here:

http://answers.splunk.com/answers/57819/when-is-it-appropriate-to-set-followtail-to-true.html

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that creates selective soft links into it from the real directory (/source/path/) for any file that has been touched in the last 5 minutes (or whatever your threshold is), like this:

*/5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -mmin -5 | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}

The nice thing about this hack is that you can create a second cron job to remove links for files that have not changed in a while (if the forwarder has too many files to sort through, it slows WAY down even when those files hold no new data), and if a file ever does get touched again, the first cron job will add it back!
Don't forget to set up that second cron job with whatever logic lets you be sure a file will never be written to again, or you will end up with tens of thousands of links in this directory, too.
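
A sketch of that cleanup job, assuming a 7-day threshold and the same paths as above (tune both to your own retention logic):

# Hourly: remove links whose target file has not changed in 7 days (10080 minutes).
0 * * * * /bin/find -L /destination/path/ -maxdepth 1 -type f -mmin +10080 -exec /bin/rm -f {} +
# Also remove links whose target has been deleted (-L with -type l matches only broken links).
30 * * * * /bin/find -L /destination/path/ -maxdepth 1 -type l -exec /bin/rm -f {} +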

The idea is that whatever logic you desire goes into the two cron jobs instead of into inputs.conf.


echalex
Builder

Thanks. It is not actually a problem that the files will be read only once and never checked for updates: the files are generated once and never touched again. My real worry was that someone might shut down the forwarder for longer than the retention period. However, this is no longer a real problem.

0 Karma

echalex
Builder

I should add that a non-destructive solution would be ideal. Splunk has read-only access to these files.

0 Karma

javiergn
SplunkTrust
SplunkTrust

You could try batch mode. From inputs.conf.spec:

[batch://<path>]
* One time, destructive input of files in <path>.
* For continuous, non-destructive inputs of files, use monitor instead.

# Additional attributes:

move_policy = sinkhole
* IMPORTANT: This attribute/value pair is required. You *must* include
  "move_policy = sinkhole" when defining batch inputs.
* This loads the file destructively.
* Do not use the batch input type for files you do not want to consume
  destructively.
* As long as this is set, Splunk won't keep track of indexed files. Without the
  "move_policy = sinkhole" setting, it won't load the files destructively and
  will keep track of them.


# IMPORTANT: The following attribute is not used by batch:
# source = <string>

followSymlink = [true|false]
* Works similarly to the same setting for monitor, but will not delete files
  after following a symlink out of the monitored directory.

# The following settings work identically as for [monitor::] stanzas, documented above
host_regex = <regular expression>
host_segment = <integer>
crcSalt = <string>
recursive = [true|false]
whitelist = <regular expression>
blacklist = <regular expression>
initCrcLength = <integer>
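
Since batch is destructive and you said Splunk only has read access to the originals, you would have to copy the files into a staging directory first and point the batch input there. A sketch of the stanza (the path and sourcetype are made up for illustration):

[batch:///var/splunk_staging/oracle_audit]
move_policy = sinkhole
whitelist = \.xml$
sourcetype = oracle:audit:xml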

javiergn
SplunkTrust
SplunkTrust

Can you tell Oracle to place the logs somewhere else that Splunk has write access to?
Or are these logs used by other teams?

ignoreOlderThan is still the way to go; realistically, why would your forwarder be down for such a long period without anyone noticing?

Alternatively, maybe you could write a script that uses the Splunk CLI to load each file with the oneshot option. A rough shell sketch (note you would also need to track which files have already been loaded, or you will index duplicates):

for file in "$AUDIT_DIR"/*.xml; do
    splunk add oneshot "$file"
done

See this:
http://docs.splunk.com/Documentation/Splunk/6.4.0/Data/MonitorfilesanddirectoriesusingtheCLI

0 Karma