I'm trying to monitor files on a Windows server and it isn't working. I've placed a few stanzas like this into etc/deployment-apps/Exchange/local/inputs.conf:
[monitor://D:\Microsoft\Exchange Server\V14\Logging\RPC Client Access\*.log]
disabled = false
followTail = 1
sourcetype = exchange
I've verified that the deployment server is pushing this app out to the server.
I've verified that the path is correct.
I've verified that log files in the directory, including the currently open log file, can be opened up with 'notepad'.
I've verified that the Universal Forwarder was restarted after the app was verified as in place.
I pushed the app to a second server and manually created a log file at the appropriate path, and that promptly showed up in Splunk. So it doesn't appear to be syntax or anything.
What else can I do to debug this? Is there a splunk log somewhere that might talk about what it sees for files, tries to open, and any errors that crop up along the way?
Update after following jbsplunk's note:
The FileStatus link was very useful. I can now see that two things are happening:
1) Some files are being indexed, but instead of sourcetype 'exchange', they're logging under sourcetypes 'exchange-103' and 'exchange-112'. Any idea why the sourcetype I've specified in my monitor stanza is being modified?
2) Other files are not being matched, possibly because *.log doesn't match *.LOG. Is it the case that Splunk monitor path specifications are case sensitive on Windows where the filesystem is case insensitive? (angle brackets replaced with {} due to HTML rendering conflict with forum)
{s:key name="D:\Microsoft\Exchange Server\V14\Logging\RPC Client Access\RCA_20120208-1.LOG"}
{s:dict}
{s:key name="parent"}D:\Microsoft\Exchange Server\V14\Logging\RPC Client Access\*.log{/s:key}
{s:key name="type"}File did not match whitelist '^D:\\Microsoft\\Exchange Server\\V14\\Logging\\RPC Client Access\\[^\\]*\.log$'.{/s:key}
{/s:dict}
{/s:key}
From $SPLUNK_HOME/bin you can run 'splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus'. It reports the status of each monitored file: whether Splunk read it, what its size was when it was read, and what percentage of it Splunk has read. If a file was ignored for some reason, it tells you why.
http://splunk-base.splunk.com/answers/26664/cant-add-my-complete-list-of-sources
http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
With regard to your observations:
On the first point, the sourcetype picks up a numeric suffix (such as 'exchange-103') when the file structure changes and Splunk's 'learned' app believes it has found a new header.
Please see this post for details on how to correct for this:
http://splunk-base.splunk.com/answers/42522/input-data-from-directory-sourcetype-changing
On the second issue, Splunk is case sensitive when it expands a wildcard input into a regex, so LOG is not the same as log. Instead of putting the wildcard in the monitor path, it might be better simply to monitor the directory, D:\Microsoft\Exchange Server\V14\Logging\RPC Client Access\, and use the whitelist option in inputs.conf:
whitelist = <regular expression>
* If set, files from this path are monitored only if they match the specified regex.
* Takes precedence over the deprecated _whitelist attribute, which functions the same way.
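As a rough sketch of the directory-plus-whitelist approach, the stanza might look like the following. The character-class regex is an assumption chosen so that both .log and .LOG match; it does not come from the original posts. The remaining settings are copied from the stanza in the question.

```ini
# inputs.conf (sketch; settings other than whitelist are copied from the question)
[monitor://D:\Microsoft\Exchange Server\V14\Logging\RPC Client Access]
disabled = false
followTail = 1
sourcetype = exchange
# Assumed regex: accepts the extension in any mix of upper and lower case
whitelist = \.[Ll][Oo][Gg]$
```

After restarting the forwarder, the FileStatus call can be used to check whether files such as RCA_20120208-1.LOG are still reported as not matching the whitelist.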
The reason that your sourcetype is getting messed up is just a bad default in Splunk. Add this to props.conf on the machine where the files are being read (where the monitor inputs are configured):
[exchange]
CHECK_FOR_HEADER = false
That's better than the way referenced here, as it prevents it from getting messed up in the first place, rather than fixing it back after it got messed up.
This also solves the problem, with fewer contortions, as you say. Thanks!
Using
Replacing .log with .LOG did fix the second issue. I agree a directory/regex would be a more applicable solution, but believe I can rely upon Exchange to continue with LOG.
The props/transforms fix you suggested isn't working yet. Most likely cause is that I'm trying to match the munged sourcetype of [exchange*] in props.conf; the doc page isn't clear to me whether wildcards for sourcetype
Updated answer to reflect the information you've provided, I hope this helps!
Thank you, this is very helpful. It helped me identify two questions, which I posted back up to the main question to keep it all together, and I would greatly appreciate any thoughts you have given that new data.