Getting Data In

How do I avoid duplicate entries when switching from the universal (light) forwarder to the heavy forwarder?


I have the SUF (Splunk Universal Forwarder) running on a few servers monitoring a typical logfile, and I want to replace it with the heavy forwarder so I can filter out DEBUG entries with a regex using props.conf and transforms.conf.
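For reference, this kind of filtering is usually done with Splunk's nullQueue routing pattern. A minimal sketch, where the sourcetype and transform names are made-up placeholders:

```ini
# props.conf (on the heavy forwarder)
[my_sourcetype]
TRANSFORMS-drop_debug = drop_debug

# transforms.conf
[drop_debug]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
```

Events matching the regex are routed to nullQueue and discarded before they reach the index.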

My current procedure is:

1. Shut down the universal forwarder.
2. Start the heavy forwarder.
3. Log in to the heavy forwarder's web UI to add a new index.
4. Add an inputs.conf to the heavy forwarder.
5. Restart the heavy forwarder.

I believe this will result in duplicate entries, so what is the best way to migrate from the light forwarder to the heavy forwarder while avoiding them?


Esteemed Legend

These are the exact steps that I just used.

SHUTDOWN & PREP (as user "root"):

service splunk stop
mkdir /opt/splunk/
chown splunk:wheel /opt/splunk/

BACKUP (as user "root"):

cp /opt/splunkforwarder/etc/auth/splunk.secret /tmp/
cd /opt/splunkforwarder/var/lib/splunk/
tar czvf /tmp/fishbucket.tgz ./fishbucket
cd /opt/splunkforwarder/etc/apps/
tar czvf /tmp/deploymentclient.tgz ./YourDeploymentClientAppNameGoesHere
cp /etc/init.d/splunk /tmp/splunk.init
chmod a+rw /tmp/*

DESTROY (as user "root"):

cd /opt/
rm -rf ./splunkforwarder/

REINSTALL (as user "splunk"):

cd /tmp/
wget ...
cd /opt/
tar xzvf /tmp/splunk...tgz

RESTORE (as user "splunk"):

cp /tmp/splunk.secret /opt/splunk/etc/auth/splunk.secret
cd /opt/splunk/etc/apps/
tar xvf /tmp/deploymentclient.tgz
mkdir -p /opt/splunk/var/lib/splunk/
cd /opt/splunk/var/lib/splunk/
tar xvf /tmp/fishbucket.tgz

FIRST TIME RUN (as user "splunk"):

/opt/splunk/bin/splunk start

SERVICE FIXUP (as user "root"):

/opt/splunk/bin/splunk enable boot-start -user splunk
cat /tmp/splunk.init | sed "s%splunkforwarder/bin/%splunk/bin/%" > /etc/init.d/splunk
systemctl daemon-reload
service splunk restart

Splunk Employee

The way any Splunk instance keeps track of the data it has read is by writing entries into an internal index called fishbucket. It's in $SPLUNK_HOME/var/lib/splunk/fishbucket by default.
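If you want to see what the fishbucket has recorded for a particular file, Splunk ships a btprobe tool. A sketch, assuming a default install path and /var/log/messages as the monitored file (both are examples):

```shell
# Inspect the fishbucket record (CRC/seek pointer) for one monitored file.
/opt/splunk/bin/splunk cmd btprobe \
    -d /opt/splunk/var/lib/splunk/fishbucket/splunk_private_db \
    --file /var/log/messages --validate
```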

If you're actually running an LWF, as in the Light Weight Forwarder app, you can upgrade to a HWF without duplicates by simply enabling the SplunkForwarder app and disabling the SplunkLightForwarder app. You're using the terms LWF and UF interchangeably, so I'm unclear which configuration is actually in place.

If you're using a SUF, you can make a backup of your SUF instance, uninstall and then install a full Splunk instance. Copy the fishbucket from the SUF to the HWF and you'll avoid duplicates.

Alternatively, figure out which data has been read and move it out of the directory where it exists so that Splunk won't see the older data.

That being said, unless you're filtering out the majority of the data on the HWF, this work is better done by the indexer. Cooked events are larger than raw data, so routing everything through a HWF can frequently mean more work for the indexer(s). If you're only dropping a small portion of the data, move the nullQueue work to the indexing tier.

Splunk Employee

In that instance, copying the fishbucket would still result in duplicates for other sources, because the checksums wouldn't match the records correctly. You'll need to manually move the already-read files out of the way, or use a setting like ignoreOlderThan in inputs.conf.
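A sketch of the ignoreOlderThan approach, assuming a hypothetical monitored path; files whose modification time is older than the threshold are skipped entirely:

```ini
# inputs.conf (on the heavy forwarder)
[monitor:///var/log/myapp/app.log]
index = main
sourcetype = my_sourcetype
# Skip any file not modified within the last day (s, m, h, d suffixes work).
ignoreOlderThan = 1d
```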



The host's syslog file was copied to the indexer via syslog-ng, and the indexer ingested it as a local file data input.

Now we want to install a heavy forwarder on the host and send data directly to the same indexer. No more syslog-ng. No more copying logfiles to the indexer.

The question is how to avoid duplicate entries during the switch.
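One possible cutover order, sketched as shell steps (service names and paths are assumptions, and the indexer-side change is a manual config edit):

```shell
# 1. Stop syslog-ng on the host so no new lines reach the indexer's copy.
service syslog-ng stop

# 2. On the indexer: disable the [monitor://...] input for the copied file
#    and let it finish indexing what it already has.

# 3. Rotate the live log on the host so the heavy forwarder starts from a
#    fresh, empty file and never re-reads data the indexer already indexed.
#    (The logging daemon may need a reload to reopen the file.)
mv /var/log/myapp.log /var/log/myapp.log.pre-hf
touch /var/log/myapp.log

# 4. Install and start the heavy forwarder monitoring /var/log/myapp.log.
```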


Splunk Employee

Can you please elaborate on the configuration? I don't think I understand what you're trying to explain.

Are you saying that you were formerly using syslog-ng to deliver data directly to the indexer, and now you'd like to use the heavy forwarder where you're running syslog-ng?

If so, how was the indexer configured to receive that data?



We figured it'd be better to do the filtering on the hosts because there are many of them while we only have a couple of indexers for each environment.

We're using the universal forwarder. I had thought it was the same thing as the light forwarder; my mistake.

I tried unsuccessfully to edit my question (for some reason, the recaptcha always rejects my submission when editing instead of posting) to ask:

What is the best way to migrate from using a log file copied from the host to the indexer by syslog-ng as a data input, to using the heavy forwarder on the host itself?
