I have no doubt this is a configuration problem, but unfortunately I can't find how to proceed.
The problem occurs when a new Citrix image is put out to the user base. The image is updated and saved, then pushed out to the environment. Once this is done and the system starts the image, Splunk starts to re-ingest all of the data that was ingested previously. The UF is unaware this data was already ingested.
I looked into a couple of items, such as followTail (recommended against) and ignoreOlderThan (I believe the file is appended, not rolled over).
Normally I wouldn't be bothered by this, but it causes around an additional 30GB to be indexed. This doesn't occur too often (1-2 times per month), but it does trigger a license warning when it occurs.
Thanks for any help!
I suspect the problem arises because reloading the image destroys the UF's fishbucket. The fishbucket is an index the UF uses to keep track of its position in the files it monitors. If it's lost or reset to a previous state, then files will be re-indexed (if still present). You can find the fishbucket in $SPLUNK_DB/fishbucket.
One solution to the problem is to back up the fishbucket then restore it after reloading the Citrix image and before starting Splunk.
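As a rough illustration of the backup-and-restore idea, here is a minimal Python sketch. The function names and the backup location are my own assumptions, not anything Splunk ships; the only Splunk-specific detail it relies on is that the fishbucket lives in a "fishbucket" directory under SPLUNK_DB. Both steps should run while the forwarder is stopped.

```python
import shutil
from pathlib import Path


def backup_fishbucket(splunk_db, backup_root):
    """Copy <splunk_db>/fishbucket to <backup_root>/fishbucket.

    Hypothetical helper: any older backup copy is replaced.
    """
    source = Path(splunk_db) / "fishbucket"
    target = Path(backup_root) / "fishbucket"
    if not source.is_dir():
        raise FileNotFoundError(f"no fishbucket found at {source}")
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)  # creates missing parent directories
    return target


def restore_fishbucket(backup_root, splunk_db):
    """Copy <backup_root>/fishbucket back to <splunk_db>/fishbucket.

    Hypothetical helper: run after the image reload and before Splunk starts,
    so the UF sees its previous file positions.
    """
    source = Path(backup_root) / "fishbucket"
    target = Path(splunk_db) / "fishbucket"
    if not source.is_dir():
        raise FileNotFoundError(f"no backup found at {source}")
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

You would call backup_fishbucket(splunk_db, persistent_dir) before the image is replaced and restore_fishbucket(persistent_dir, splunk_db) afterwards, with both paths filled in for your environment. Note that a backup is only as current as the moment it was taken, so anything written to the monitored files after the backup may still be indexed twice.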
So, I talked to one of the server guys to learn more about what the environment does. The image itself (where Splunk runs) is not editable. So I assume the fishbucket can never keep its updates, and every time the server is rebooted / the image is reloaded, it re-grabs all of the event log data it has access to (or, per another post I found, the relevant state may be in the modinputs directory). There is a persistent disk available, so the question now is: can the fishbucket/modinputs be moved to this persistent directory? I think we can do this with a symbolic link.
Yes, you should be able to move the fishbucket directory. Change the value of SPLUNK_DB in $SPLUNK_HOME/etc/splunk-launch.conf. Symbolic links don't always work with Splunk.
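For illustration, the change in splunk-launch.conf might look like the fragment below. The drive letter and directory are placeholders I made up for the persistent disk; substitute whatever path your environment actually provides.

```
# $SPLUNK_HOME/etc/splunk-launch.conf
# Hypothetical path on the persistent disk; replace with your own.
SPLUNK_DB=D:\SplunkPersistent\var\lib\splunk
```

With this in place the fishbucket would be at $SPLUNK_DB/fishbucket on the persistent disk. The setting has to be baked into the image so it survives a reload, and the forwarder needs a restart to pick it up. Expect one more re-ingest on the first start unless the existing fishbucket is copied to the new location beforehand. I have not verified that the modinputs checkpoints follow SPLUNK_DB in every version, so check where they end up after the change.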