Can someone explain the normal source of these errors? I've seen them both in the search.log (in the dispatch folder) and when exporting a bucket using exporttool. Also, what is the appropriate action after these errors are encountered?
invalid read: addr=4ab60c4 ERROR SplitCompression - invalid separator: path=db_1267889549_1188869197_13/rawdata/2501302080.gz, offset=5623968, separator='l', expected='|'
invalid read: addr=4ab7148 ERROR SplitCompression - gzip seek failure: file=db_1267889549_1188869197_13/rawdata/2501302080.gz, hit unexpected EOF at 7279072 while trying to seek to 15375232
ERROR databasePartitionPolicy - Could not read event: cd=235:137137517 index=_internal
Example 3 was taken from a search.log file in a different failure situation, but there were a bunch of errors like that intermixed with a bunch of the invalid separator messages.
A SplitCompression error essentially means that the index files (tsidx) do not agree with the event text files (rawdata). This can result from a defect in Splunk, in the operating system, or in the hardware, or from a power failure.
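Given the "gzip seek failure ... hit unexpected EOF" message in example 2, one quick check is whether the rawdata gzip stream itself is truncated. This is a generic sketch in plain Python (not a Splunk tool), and `gz_stream_intact` is just a name chosen here:

```python
import gzip

def gz_stream_intact(path):
    """Return True if the gzip stream decompresses cleanly to EOF.

    A truncated member raises EOFError; a corrupt one raises OSError
    (gzip.BadGzipFile on newer Pythons). Either way, the rawdata file
    cannot be read past the point the error message reported.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # stream through in 1 MiB chunks
                pass
        return True
    except (EOFError, OSError):
        return False
```

`gzip -t <file>` from the command line performs essentially the same check.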
Correlate the timestamps of the bucket and its rawdata files with any crash_log files in your var/log/splunk directory, as well as with any errors that landed in splunkd.log around those times. Perhaps this will help identify a defect (if any). If you experienced system crashes or power loss around that time, that's most likely the cause.
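To help with that correlation: warm/cold bucket directory names conventionally encode the newest and oldest event times as epoch seconds (db_<latest>_<earliest>_<localid>). Treating that layout as an assumption to verify against your own paths, a small sketch to turn the example bucket's name into a human-readable window:

```python
from datetime import datetime, timezone

def bucket_window(bucket_dir):
    # Assumes the db_<latest_epoch>_<earliest_epoch>_<local_id> naming
    # convention for warm/cold buckets; verify against your own directories.
    _, latest, earliest, _ = bucket_dir.split("_")
    as_utc = lambda t: datetime.fromtimestamp(int(t), tz=timezone.utc)
    return as_utc(earliest), as_utc(latest)

earliest, latest = bucket_window("db_1267889549_1188869197_13")
print(earliest, "->", latest)
```

Any crash_log or splunkd.log entries inside (or shortly after) that window are the ones worth examining first.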
databasePartitionPolicy is in charge of which buckets exist, where they should be located (warm/cold, etc.), and what should be searched. An error emitted by it could mean, for example, that a bucket was frozen while a search was running.
That the tsidx files do not agree with the rawdata may indicate that you have experienced data loss, but in some cases the events may be written again regardless. This comes down to the index updates not being fully synchronized, in an atomic way, with the rawdata updates: an unclean shutdown and restart might leave the two mutually inconsistent, even though the rawdata itself still holds all events that were received.
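A toy illustration (plain Python, nothing Splunk-specific) of why two separate writes that are not atomic with respect to each other can diverge after a crash:

```python
# Toy model of a non-atomic dual write: the raw event store and the
# index are updated in two separate steps, so a crash between them
# leaves the index behind the raw store.
rawdata, index = [], {}

def write_event(event, crash_before_index=False):
    rawdata.append(event)              # step 1: append raw event text
    if crash_before_index:
        return                         # simulated crash: index never updated
    index[event] = len(rawdata) - 1    # step 2: record offset in the index

write_event("event A")
write_event("event B", crash_before_index=True)

# The index now knows about fewer events than rawdata holds; a lookup
# through the index misses "event B" even though its text survived.
print(len(rawdata), len(index))  # prints: 2 1
```

This is why a tsidx/rawdata disagreement does not automatically mean the event text is gone, only that the two structures got out of step.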
In general, these messages merit investigation. Since some of the messages in splunkd.log are best interpreted in the context of both experience troubleshooting Splunk and the adjoining messages, you would be well served to engage support with a nice juicy diag and a request for root-cause analysis. If your local system administration team can correlate the errors to system events, though, that's probably your answer.
I suppose that having multiple copies of splunkd running concurrently could cause this problem as well? (That's a really annoying problem.)