A red banner message in the GUI (below) about a file lock on sentinel.txt is preventing updates from the GUI.
Error fixing dangling data: Failed to lock /mnt/search_head_pool/etc/apps/sentinel.txt with return code 1: Success
I do not presently have access to the logs of the system (Splunk 4.2.2 101277 on RHEL).
Yes, you can delete these files while Splunk is stopped. They are re-created on demand.
If you see a "stale" sentinel.txt.lock file remaining while Splunk is stopped, that is probably the source of this error.
What is the output of "splunk pooling validate"?
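To check for leftover lock files before deleting anything, something like the following sketch may help. It only lists files named sentinel.txt.lock under a pool directory and reports how old each one is; it does not delete anything. The function name find_lock_files is made up for illustration, and the example path is the one from the error message in the question. Run it only while Splunk is stopped, since a lock file seen while Splunk is running may be legitimately held.

```python
import os
import time


def find_lock_files(pool_root, name="sentinel.txt.lock"):
    """Return a list of (path, age_in_seconds) for every file called
    `name` found under `pool_root`.

    Hypothetical helper, not part of Splunk. It only reads the
    filesystem and never removes anything.
    """
    now = time.time()
    found = []
    for dirpath, _dirnames, filenames in os.walk(pool_root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            # Age is based on last-modified time of the lock file.
            found.append((path, now - os.path.getmtime(path)))
    return found
```

For example, find_lock_files("/mnt/search_head_pool") returns one entry per lock file found. Any entry that shows up while Splunk is stopped is a candidate stale lock.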
The most common reasons are mentioned in the comment immediately preceding yours: 1) stale lock file (caused by a crash, for example), or 2) poor performance of shared storage, leading to slow I/O and contention on the lock file.
Some improvements were made to splunkd to reduce the amount of I/O we perform against sentinel.txt; these improvements landed in 5.0.6 and 6.0 (SPL-66563).
@ewoo, under what circumstances will this "failed to lock sentinel.txt" error occur?
The error is displayed if:
1) the user triggers an action that requires writing to a conf file, and
2) the write fails because the file-based mutex cannot be acquired
You can't suppress the error. The underlying failure must be addressed -- remove stale lock files, investigate contention on the lock file and/or performance of shared storage, etc.
Why would this error be displayed to the user? Can it be suppressed?
This file is only created/used when pooling is enabled.
The file itself acts as the synchronization mechanism for conf writes. In other words, a member of the search head pool (SHP) must "own" this lockfile in order to make conf changes. If a pool member X finds the lockfile already owned by another member Y, X will wait for Y to relinquish ownership of the lockfile.
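The general idea of a lockfile used as a mutex can be sketched as below. This is not Splunk's actual implementation, which is not shown in this thread; it is a minimal illustration that assumes ownership is taken by atomically creating the file and given up by removing it. The names acquire, release and LockTimeout, the timeout and the polling interval are all invented for the example.

```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lockfile stays owned by someone else for too long."""


def acquire(lock_path, owner, timeout=5.0, poll=0.05):
    """Take ownership of lock_path, waiting while another member owns it.

    Illustrative only: O_CREAT | O_EXCL makes creation fail if the file
    already exists, so only one caller at a time can create it.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else (member Y) owns the lock: wait and retry.
            if time.monotonic() >= deadline:
                raise LockTimeout("failed to lock " + lock_path)
            time.sleep(poll)
            continue
        # We created the file, so we own the lock; record who we are.
        with os.fdopen(fd, "w") as f:
            f.write(owner)
        return


def release(lock_path):
    """Relinquish ownership by removing the lockfile."""
    os.remove(lock_path)
```

In this model, a member that crashes between acquire and release leaves the file behind, and every later acquire waits until it times out. That is the "stale lock" case described above. Slow shared storage has a similar effect, because each attempt takes longer and members spend more time contending for the same file.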
How does Splunk handle this file in Pooling mode? Which server gets the "lock"? What happens when multiple servers need to lock the file?