Hi folks,
I've rolled out Splunk UFs on Citrix workstations, but found out that the storage was non-persistent. This caused the workstations to be "wiped" every night, including deletion of the fishbucket, which made the UF reindex all historical data on the workstation. We tried to solve this by creating a symbolic link to separate persistent storage. That is, the fishbucket folder would point to a folder on a different path, "tricking" Splunk into storing the fishbucket files on the persistent storage. Unfortunately, I can find no "official" way of changing the location of the fishbucket.
The problem is that this doesn't work. Even though the symbolic link seems to work just fine, and Splunk writes the fishbucket files to the persistent storage, we still get reindexing of data every night when the workstations are wiped. I can find no error messages in the Splunk internal logs. Has anyone solved this problem? Are there any insights into exactly how the fishbucket works that could help me? Has anyone created a GPO for this?
Any help is much appreciated. Thanks!
Hi. Thanks for your comments, guys. The fix was to create a symbolic link from the modinputs directory to a separate disk with persistent storage (as we had already done with the fishbucket directory). What I didn't know beforehand is that the Windows event log input does not use the fishbucket to keep track of which logs the UF has already read, but uses the modinputs directory instead.
Side note: unlike the fishbucket, which is created as soon as the UF is installed, the modinputs directory isn't created until a scripted input, such as the Windows event log input, is actually activated. This means we had to create a script that runs every time the Citrix workstations have been cleaned, creates the modinputs directory before the UF is started, and links it to the separate modinputs directory on the persistent storage using a symbolic link.
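A minimal sketch of such a startup script, written in Python purely for illustration (the actual script was not posted). The install path, the D:\SplunkState location, and the link_to_persistent helper are hypothetical, and it assumes both fishbucket and modinputs live under var\lib\splunk in the forwarder's install directory. Creating symbolic links on Windows needs administrator rights (or Developer Mode), and the script has to finish before the forwarder service starts.

```python
import os
import shutil
from pathlib import Path

# Hypothetical paths: adjust to the actual UF install and persistent disk.
SPLUNK_STATE_DIR = Path(r"C:\Program Files\SplunkUniversalForwarder\var\lib\splunk")
PERSISTENT_ROOT = Path(r"D:\SplunkState")

# Assumption: these two directories hold the UF's read-position state
# (fishbucket for file monitors, modinputs for Windows event log checkpoints).
STATE_DIRS = ("fishbucket", "modinputs")


def link_to_persistent(local_dir: Path, persistent_dir: Path) -> None:
    """Make local_dir a directory symlink that points at persistent_dir."""
    if local_dir.is_symlink():
        if os.path.realpath(local_dir) == os.path.realpath(persistent_dir):
            return  # already linked correctly; nothing to do
        raise RuntimeError(f"{local_dir} links somewhere unexpected")

    persistent_dir.parent.mkdir(parents=True, exist_ok=True)
    if local_dir.is_dir():
        if not persistent_dir.exists():
            # First run: seed persistent storage with the current local state.
            shutil.move(str(local_dir), str(persistent_dir))
        else:
            # After a wipe: the persistent copy is authoritative, so drop the
            # freshly created local directory.
            shutil.rmtree(local_dir)

    # modinputs may not exist yet, so create both sides before linking.
    persistent_dir.mkdir(parents=True, exist_ok=True)
    local_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(persistent_dir, local_dir, target_is_directory=True)


def main() -> None:
    for name in STATE_DIRS:
        link_to_persistent(SPLUNK_STATE_DIR / name, PERSISTENT_ROOT / name)


# Only meant to run on the Windows workstations, before the UF starts.
if __name__ == "__main__" and os.name == "nt":
    main()
```

On the first run the persistent copy is seeded from the local directory; after each wipe the freshly created local directory is discarded and re-pointed at the persistent copy. One way to guarantee the ordering is to set the forwarder service to manual start and have the same startup task start it afterwards, but that is a deployment choice, not something described in this thread.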
What types of events are getting reindexed - Windows event logs?
If the files are deposited atomically (written all at once rather than growing as new lines are appended to the end), then you can use a [batch://<path>] stanza instead of [monitor://<path>] and set move_policy = sinkhole, which causes Splunk to delete each file after it has been read and forwarded. If the source keeps appending new data to the end of the file, then you need to make sure that something rotates the file (on *NIX, look at logrotate) and then use batch with sinkhole on the rotated files, which are no longer growing. This will obviously delay the arrival of the data.
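For the growing-file case, the rotation step can be sketched in Python as a stand-in for logrotate on hosts that lack it. The rotate_into_spool helper and the spool directory are hypothetical; the spool directory is assumed to be the path of a [batch://<path>] stanza with move_policy = sinkhole, so Splunk deletes each rotated file once it has been forwarded.

```python
import os
import time
from pathlib import Path
from typing import Optional


def rotate_into_spool(active_log: Path, spool_dir: Path) -> Optional[Path]:
    """Move a finished log file into the directory watched by a batch input.

    Assumes the writing application has closed active_log (or reopens it by
    name) and that spool_dir is on the same volume, so os.replace is a rename
    rather than a copy and the batch input never sees a half-written file.
    """
    if not active_log.exists() or active_log.stat().st_size == 0:
        return None  # nothing to hand over yet
    spool_dir.mkdir(parents=True, exist_ok=True)
    # A nanosecond timestamp keeps successive rotations from colliding.
    target = spool_dir / f"{active_log.stem}.{time.time_ns()}{active_log.suffix}"
    os.replace(active_log, target)
    return target
```

Run it on a schedule (for example from Task Scheduler or cron). Each run hands the current file over to the batch input, assuming the application creates a new file on its next write, so the indexing delay is roughly the interval between runs.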
If you're getting the same data re-indexed after wiping the server, then the wiping must be selective. Perhaps the re-indexed historical data could be included in the wipe, or the fishbucket could be excluded from it.