Hello,
I have a log file stored in an S3 bucket that I am trying to index properly. I am using SQS-based messaging to pull the log down with the Splunk_AWS_Add-on.
The current process is not working as expected: the log is completely re-indexed every time the file is re-uploaded, instead of only the new data being added to the index.
According to the Splunk documentation, indexing works by checking the first and last 256 bytes of a file to detect differences. If it finds a change at the top, it immediately re-indexes the whole file; if not, it parses through the file until it finds new data and adds only the events appended at the end to the index.
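For anyone following along, here is how I understand that mechanism, as a rough Python sketch (this is my mental model, not Splunk's actual code; the `seen` dict stands in for Splunk's internal file-tracking database):

```python
import os
import zlib

CRC_LEN = 256  # length Splunk hashes at the head of a file

def decide(path, seen):
    """seen maps head-CRC -> last indexed offset (the seek pointer)."""
    with open(path, "rb") as f:
        crc = zlib.crc32(f.read(CRC_LEN))
    size = os.path.getsize(path)

    if crc not in seen:
        # Head changed (or file never seen): treated as a brand-new
        # file, so the whole thing gets indexed.
        seen[crc] = size
        return "index entire file"
    if size > seen[crc]:
        # Head matches and the file grew: index only the appended tail.
        start = seen[crc]
        seen[crc] = size
        return f"index bytes {start}..{size}"
    return "nothing new, skip"
```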
This should work fine with a file that only ever has data appended at the end, like the one I am currently using. After searching through a lot of configuration and documentation notes, I haven't found anything on the Splunk side that could remedy this. All the options point to a monitored file input and its parameters, which is not an option with SQS-based retrieval in the add-on.
The one thing I have found that may be causing this is that S3 attaches metadata to each object, and that metadata changes with every new upload and overwrite. Is Splunk reading this metadata, treating the object as a new file, and immediately re-indexing the entire thing?
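For reference, this is how I have been inspecting the object metadata before and after an overwrite (boto3 sketch; the bucket and key names are placeholders for my real ones):

```python
import boto3

s3 = boto3.client("s3")

# "my-log-bucket" and "logs/app.log" are placeholders for the real
# bucket and key. Run this before and after the overwrite.
resp = s3.head_object(Bucket="my-log-bucket", Key="logs/app.log")

# ETag and LastModified change on every overwrite, even when the new
# object is just the old content plus appended lines.
print(resp["ETag"], resp["LastModified"], resp["ContentLength"])
```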
Any help on this would be greatly appreciated. This is the last hiccup in standardizing our custom logs.
I'm seeing a similar problem.
The file hashes are the same, but 'stat <file>' shows the metadata has changed. Splunk seems to re-read a file whenever the metadata changes, even if the content has not.
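A quick way to confirm that only the metadata moved, not the content (Python sketch; the path is a placeholder):

```python
import hashlib
import os

def snapshot(path):
    """Return (content hash, mtime, ctime) for a file."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    st = os.stat(path)
    return digest, st.st_mtime, st.st_ctime

# Call once before and once after the file is pulled down again;
# in my case the digest is identical but mtime/ctime differ.
print(snapshot("/var/log/app.log"))  # placeholder path
```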
Did you get a resolution?