Our indexer and all forwarders are running 4.1.2. Recently we developed a need to send events from our forwarders to a syslog server in syslog format. Three days ago I configured them to keep sending events to our Splunk indexer and additionally send them to the syslog server. It was a little challenging to accomplish because SplunkLightForwarder disables syslog output capabilities. I needed to maintain a small footprint on our Splunk forwarders, so I didn't want to switch to SplunkForwarder. I ended up creating a new app called wsu-lightforwarder that is an exact copy of SplunkLightForwarder, with the exception of default-mode.conf. After some trial and error I found which options to change in default-mode.conf to enable syslog output. I then disabled SplunkLightForwarder and deployed wsu-lightforwarder. I thought I had succeeded, but I just found out the fishbucket database has grown large on several systems. For example, the fishbucket is ~3 GB on one of our domain controllers after only three days. It is just a matter of time before this becomes a problem on many other systems.
One option I considered is adding indexes.conf to my new wsu-lightforwarder app and setting the _thefishbucket database to something like maxTotalDataSizeMB=100. However, as cautioned at http://blogs.splunk.com/2008/08/14/what-is-this-fishbucket-thing/ I am concerned this might result in duplicate indexing.
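For reference, the stanza I was considering would look something like this (just a sketch; I have not deployed it, and 100 is only an example value):
[_thefishbucket]
# Cap the total size of the fishbucket index (example value only)
maxTotalDataSizeMB = 100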
I thought SplunkLightForwarder disabled the fishbucket altogether. Is there a way to disable the fishbucket yet still maintain the syslog output capabilities?
We have many Linux systems with a small /opt partition, and most of our systems could not withstand a fishbucket that grows to many gigabytes. How can I keep a small footprint and still retain the syslog capabilities?
Note: We use followTail=1 or current_only=1 for all the logs we monitor.
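For illustration, those settings sit in inputs.conf roughly like this (the stanza names and paths below are examples, not our actual inputs):
# Example Linux file monitor: start reading at the end of the file
[monitor:///var/log/messages]
followTail = 1
# Example Windows event log input: only collect events generated while Splunk is running
[WinEventLog:Security]
current_only = 1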
Below is my default-mode.conf file from our wsu-lightforwarder app. Again this app is identical to SplunkLightForwarder with the exception of this file. You will see my comments that reflect changes from the original file.
# This file turns off certain pipelines and processors. It is similar to the one in SplunkLightForwarder, but does not disable as many so that syslog forwarding will work. Changes are noted below. 2010-05-14
#Turn off a processor
[pipeline:indexerPipe]
# The active disabled_processors list below omits syslog-output-generic-processor, so syslog output remains enabled.
#disabled_processors = indexandforward, diskusage, signing,tcp-output-generic-processor, syslog-output-generic-processor, http-output-generic-processor, stream-output-processor
disabled_processors = indexandforward, diskusage, signing, tcp-output-generic-processor, http-output-generic-processor, stream-output-processor
[pipeline:distributedDeployment]
disabled = true
[pipeline:distributedSearch]
disabled = true
[pipeline:fifo]
disabled = true
# Enabled
[pipeline:merging]
#disabled = true
disabled = false
# Enabled
[pipeline:typing]
#disabled = true
disabled = false
# Enabled
[pipeline:udp]
disabled = false
[pipeline:tcp]
disabled = true
[pipeline:syslogfifo]
disabled = true
# Enabled
[pipeline:syslogudp]
disabled = false
# Enabled
# Note: Commenting out the following stanza and its disabled_processors line caused output to POIROT to fail, although syslog output still worked. Instead, I left the value for this attribute blank.
[pipeline:parsing]
#disabled_processors = utf8, linebreaker, header, sendOut
disabled_processors =
# do not start the scheduler if in lwf mode
[pipeline:scheduler]
disabled_processors = LiveSplunks
You can delete the existing fishbucket contents with the following command:
./splunk clean eventdata _thefishbucket
Splunk now tracks the files it has seen and indexed/forwarded in the splunk_private_db
directory under $SPLUNK_DB/fishbucket,
so as long as the 'many gigs' of data isn't inside this directory, it can safely be removed.
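The clean is normally run from $SPLUNK_HOME/bin with splunkd stopped, something along these lines:
./splunk stop
./splunk clean eventdata _thefishbucket
./splunk start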
The original post by Jaci was done on my behalf. I have additional information that hopefully will give further clues to identify the issue.
The large volume of data is accumulating in “C:\Program Files\Splunk\var\lib\splunk\fishbucket\db” and not in the …\fishbucket\splunk_private_db folder.
Three days ago, after identifying the fishbucket accumulation problem, I used the clean command and deleted several gigabytes of data on our domain controllers, but the data keeps accumulating (note: I use our domain controllers as the example because they generate the most events for Splunk to consume, but most or all of our Splunk clients have the problem). On one of the domain controllers there is already 1.75 GB in the …\fishbucket\db folder after being cleaned three days ago. What I believe is occurring is that all of the events normally shipped off to the indexer are still being shipped, but are additionally ending up in the …\fishbucket\db folder. The question is why? I disabled the local SplunkLightForwarder app and deployed and enabled a new app called wsu-lightforwarder. As discussed in the first post, wsu-lightforwarder is an exact copy of SplunkLightForwarder with the exception of default-mode.conf. There must be a setting in default-mode.conf that is causing the problem, but I don't know which one.
As a test, on one of the domain controllers I disabled wsu-lightforwarder, re-enabled SplunkLightForwarder, and used the clean command. This essentially reverted the Splunk client to its original state before the problem started. I found the …\fishbucket\db folder did not grow. Then I went back to using my wsu-lightforwarder app and the problem started again. The problem is definitely with wsu-lightforwarder, and again the only difference from SplunkLightForwarder is default-mode.conf (see first post). What could be causing this? Can I just disable the fishbucket in indexes.conf? I need to keep the small footprint of a light forwarder but must be able to output syslog, so I need to find a way to make my wsu-lightforwarder app work.
Thank you gkanapathy and mick for your posts. Any further help would be greatly appreciated.
The fishbucket is not disabled. However, as of 4.0, it is no longer a Splunk index. The CRCs are stored in a completely different (and much smaller) data structure.
I have no idea what is in your fishbucket. If it's an old Splunk index left over from an upgrade, you can probably delete the Splunk index part of it just fine. (The actual new CRC data is still stored in the fishbucket folder, but in a different subfolder.)
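If you want to confirm which part is growing, comparing the two subdirectories should tell you; on Linux that would be something like the following (assuming $SPLUNK_DB points at your Splunk database directory, usually var/lib/splunk under the install; the equivalent folder-properties check works on Windows):
du -sh $SPLUNK_DB/fishbucket/db $SPLUNK_DB/fishbucket/splunk_private_db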