Splunk Universal Forwarder 6.5.1 seems to skip data added to a log file while the Splunk service was not running.
Problem: The forwarder is configured to forward the logs of a custom Java-based application for indexing. It does this correctly until the service is stopped for any reason, e.g. a system update. If the custom application writes additional data to the log while the forwarder is stopped, those records are not picked up after the service restarts.
The relevant stanza in inputs.conf looks like this (it allows the logs of a few apps for the given sourcetype):
[monitor:///var/log/splunk/custom/*/<SOURCETYPE_NAME>/...]
sourcetype=<SOURCETYPE_NAME>
whitelist = (a|b|c).*
followSymlink = true
I know that the Splunk forwarder does not reindex files by default and that it uses a CRC to handle rolling log files
(http://docs.splunk.com/Documentation/Splunk/6.0.1/Data/Howlogfilerotationishandled)
It seems that only the CRC of the beginning of the file is checked once the service is running again. In the logs I even see that, after starting back up, it checks the file:
08-22-2018 18:31:22.655 +0200 INFO WatchedFile - Will begin reading at offset=91875 for file='/var/log/splunk/custom/<INDEX_NAME>/<SOURCETYPE_NAME>/logs/c.2018-08-22.2.log'.
The offset mentioned is exactly the last offset the service had seen before being stopped (checked with the Linux dd utility), but I do not see the data from that offset onward in the search results.
Question: How can I make sure that the data added to that log file while Splunk was down actually gets indexed and shows up in searches?
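The dd check described above can also be reproduced in a few lines of Python. This is only a minimal sketch: the function name `read_from_offset` is made up for illustration, and the path and offset to pass in would be the ones reported in the WatchedFile log line.

```python
def read_from_offset(path, offset):
    """Return the bytes of `path` from byte `offset` to the end of the file.

    Illustrative helper (not part of Splunk): it shows what a reader that
    resumes at a saved offset would still have to pick up.
    """
    with open(path, "rb") as f:
        f.seek(offset)   # jump to the offset the forwarder reported
        return f.read()  # everything appended after that point


# Example usage (hypothetical call; substitute the real file path):
# tail = read_from_offset("/path/to/c.2018-08-22.2.log", 91875)
# print(tail.decode(errors="replace"))
```

If the returned bytes are non-empty, the file does contain data past the offset that the forwarder said it would resume from.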
Given that splunkd.log states that it does start reading at the correct point in the file, could it be a timestamping issue? What does your time config look like for this sourcetype? Have you tried searching over All Time to see if you can find the events that occurred during the forwarder downtime?
Thanks!
No specific time config, just the default values. The file is monitored, indexed, and forwarded by the forwarder.
I have not tested the "All Time" period, because the expected log messages could look like duplicates if one ignores the timestamp. However, I have bucketed the events by "_time" with a span of one minute for the corresponding date, then compared the counts with the actual source files.
The counts for the buckets that correspond to the time when the forwarder was running are correct, but there are no counts (raw events) for the portion added during the artificial forwarder outage in my test case.
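For reference, the file-side half of that per-minute comparison could be sketched roughly as follows. The timestamp layout is an assumption (the application's real log format is not shown in this thread), and `count_per_minute` is a hypothetical helper name, so the pattern would need adjusting to the actual logs.

```python
import re
from collections import Counter

# ASSUMPTION: each event starts with a timestamp like "2018-08-22 18:31:22".
# The real application's format is not shown here; adjust the pattern to match.
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def count_per_minute(lines):
    """Count event lines per minute.

    Lines without a leading timestamp (e.g. stack-trace continuation lines)
    are skipped, so each multi-line event is counted once.
    """
    counts = Counter()
    for line in lines:
        m = MINUTE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Example usage (hypothetical path):
# with open("/path/to/c.2018-08-22.2.log") as f:
#     for minute, n in sorted(count_per_minute(f).items()):
#         print(minute, n)
```

The resulting per-minute counts can then be placed next to the one-minute buckets from the search to see exactly which minutes are missing.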
Configure and use persistent queues to prevent data loss during an outage.
persistentQueueSize = <integer>[KB|MB|GB|TB]
* Maximum size of the persistent queue file.
* Defaults to 0 (no persistent queue).
* If set to some value other than 0, persistentQueueSize must be larger than
the in-memory queue size (as defined by the 'queueSize' setting in
inputs.conf or 'maxSize' settings in [queue] stanzas in server.conf).
* Persistent queues can help prevent loss of transient data. For information on
persistent queues and how the 'queueSize' and 'persistentQueueSize' settings
interact, see the online documentation.
Thanks a lot for the suggestion, but before testing I would like to ask for a short clarification.
I am not sure I understand how a persistent queue would help if the Splunk service is stopped or killed, as the checking is then done by CRCs on the file. Furthermore, the documentation you mentioned states:
Persistent queues are not available for these input types:
Monitor
Batch
File system change monitor
splunktcp (input from Splunk forwarders)
I would consider appending lines to a log file to fall under the "File system change monitor" input type.
Could you please elaborate a bit and explain why the persistent queue is a good solution for monitoring rolled log files? Thanks!
Valid question. Persistent queues don't help for downtime of the forwarder itself, only for downtime of downstream components (e.g. intermediate forwarder, indexer).