Getting Data In

How do I avoid duplicate entries when switching from a universal (light) forwarder to a heavy forwarder?

gozulin
Communicator

I have the Splunk Universal Forwarder (SUF) running on a few servers, monitoring a typical log file, and I want to replace it with a heavy forwarder so I can filter out DEBUG entries with a REGEX in props.conf and transforms.conf.
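For reference, DEBUG filtering with props.conf and transforms.conf is usually done by routing matching events to the nullQueue. The sketch below writes such a pair into an app's local directory. The function name, the app directory, the sourcetype `my_app_log` and the regex `\bDEBUG\b` are all hypothetical placeholders; adjust them to the real sourcetype and log format.

```shell
# Hypothetical helper: write a props.conf/transforms.conf pair that sends any
# event matching DEBUG to the nullQueue (i.e. drops it before indexing).
# Usage: write_debug_filter <app_local_dir> <sourcetype>
write_debug_filter() {
    local_dir=$1
    sourcetype=$2
    mkdir -p "$local_dir" || return 1

    # props.conf: attach the transform to the sourcetype being filtered.
    cat > "$local_dir/props.conf" <<EOF
[$sourcetype]
TRANSFORMS-drop_debug = drop_debug
EOF

    # transforms.conf: events whose raw text matches REGEX go to the nullQueue.
    # The quoted 'EOF' keeps the backslashes in the regex literal.
    cat > "$local_dir/transforms.conf" <<'EOF'
[drop_debug]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
EOF
}
```

Example: `write_debug_filter /opt/splunk/etc/apps/filter_debug/local my_app_log`, then restart the instance that parses the data (the heavy forwarder here) so the new transform is loaded.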

My current procedure is:

1. Shut down the universal forwarder
2. Start the heavy forwarder
3. Log in to the heavy forwarder's web UI to add a new index
4. Add an inputs.conf to the heavy forwarder
5. Restart the heavy forwarder

I believe this will result in duplicate entries, and I'm wondering what the best way is to migrate from the light forwarder to the heavy forwarder while avoiding those duplicates.

Thanks!

woodcock
Esteemed Legend

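These are the exact steps I just used to turn a universal forwarder into a heavy forwarder while carrying over its fishbucket, so already-read data is not indexed again.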

SHUTDOWN & PREP (as user "root"):

service splunk stop
mkdir /opt/splunk/
chown splunk:wheel /opt/splunk/

BACKUP (as user "root"):

cp /opt/splunkforwarder/etc/auth/splunk.secret /tmp/
cd /opt/splunkforwarder/var/lib/splunk/
# the fishbucket holds the forwarder's read checkpoints; restoring it prevents re-reading
tar czvf /tmp/fishbucket.tgz ./fishbucket
cd /opt/splunkforwarder/etc/apps/
tar czvf /tmp/deploymentclient.tgz ./YourDeploymentClientAppNameGoesHere
cp /etc/init.d/splunk /tmp/splunk.init
chmod a+r /tmp/splunk.secret /tmp/fishbucket.tgz /tmp/deploymentclient.tgz /tmp/splunk.init
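Because the DESTROY step is irreversible, it may be worth confirming the backups first. A minimal sketch, assuming the four files from the BACKUP step were written to /tmp with exactly those names; the function name is hypothetical.

```shell
# Hypothetical pre-flight check: confirm each backup file exists and is
# non-empty, and that each tarball can be listed, before deleting anything.
# Usage: verify_backup <backup_dir>
verify_backup() {
    dir=$1
    for f in splunk.secret splunk.init fishbucket.tgz deploymentclient.tgz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    for t in fishbucket.tgz deploymentclient.tgz; do
        # tar tzf only lists the archive; a non-zero exit means it is unreadable.
        if ! tar tzf "$dir/$t" > /dev/null; then
            echo "unreadable archive: $dir/$t" >&2
            return 1
        fi
    done
    echo "backup OK"
}
```

Example: `verify_backup /tmp && echo "safe to continue"`.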

DESTROY (as user "root"):

cd /opt/
rm -rf /opt/splunkforwarder/

REINSTALL (as user "splunk"):

cd /tmp/
wget ...
cd /opt/
tar xzvf /tmp/splunk...tgz

RESTORE (as user "splunk"):

cp /tmp/splunk.secret /opt/splunk/etc/auth/splunk.secret
cd /opt/splunk/etc/apps/
tar xvf /tmp/deploymentclient.tgz
mkdir -p /opt/splunk/var/lib/splunk/
cd /opt/splunk/var/lib/splunk/
tar xvf /tmp/fishbucket.tgz

FIRST TIME RUN (as user "splunk"):

/opt/splunk/bin/splunk start

SERVICE FIXUP (as user "root"):

/opt/splunk/bin/splunk enable boot-start -user splunk
sed "s%splunkforwarder/bin/%splunk/bin/%" /tmp/splunk.init > /etc/init.d/splunk
systemctl daemon-reload
service splunk restart

jbsplunk
Splunk Employee

Every Splunk instance keeps track of the data it has read by writing entries into an internal index called the fishbucket. By default it lives in $SPLUNK_HOME/var/lib/splunk/fishbucket.

If you're actually running an LWF, meaning the light forwarder (the SplunkLightForwarder app), you can upgrade to a HWF without duplicates by enabling the SplunkForwarder app and disabling the SplunkLightForwarder app. You're using the terms LWF and UF interchangeably, so I'm unclear about which configuration is actually in place.

If you're using a SUF, you can back up your SUF instance, uninstall it, and then install a full Splunk instance. Copy the fishbucket from the SUF to the HWF and you'll avoid duplicates.
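A minimal sketch of the fishbucket copy, assuming both installs sit on the same host at the default locations /opt/splunkforwarder and /opt/splunk and that both instances are stopped. The function name is hypothetical; run it as the user that owns the new install, or chown the copied directory afterwards.

```shell
# Hypothetical helper: copy the fishbucket (file-read checkpoints) from the
# universal forwarder's install into the full Splunk install.
# Both instances must be stopped while this runs.
# Usage: copy_fishbucket <uf_home> <hwf_home>
copy_fishbucket() {
    # Refuse to run with empty arguments, since the rm -rf below uses them.
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: copy_fishbucket <uf_home> <hwf_home>" >&2
        return 1
    fi
    src="$1/var/lib/splunk/fishbucket"
    dst_parent="$2/var/lib/splunk"
    if [ ! -d "$src" ]; then
        echo "no fishbucket at $src" >&2
        return 1
    fi
    mkdir -p "$dst_parent" || return 1
    # Replace any fishbucket the new install may already have created.
    rm -rf "$dst_parent/fishbucket"
    # -R copies the whole tree; -p keeps modes and timestamps.
    cp -Rp "$src" "$dst_parent/"
}
```

Example: `copy_fishbucket /opt/splunkforwarder /opt/splunk`, then start the full instance for the first time.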

Alternatively, figure out which data has already been read and move it out of the monitored directory so that Splunk won't see the older data.
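One way to sketch the "move it out of the way" approach, assuming that files not modified in the last N days are the ones already indexed. That criterion is an assumption, so check it against what is actually in the index first. The function name and directories are hypothetical.

```shell
# Hypothetical helper: move files that were already indexed out of a monitored
# directory so a forwarder with an empty fishbucket never sees them.
# "Already read" is approximated by modification age (an assumption).
# Usage: archive_old_logs <monitored_dir> <archive_dir> <days>
archive_old_logs() {
    src=$1
    dest=$2
    days=$3
    mkdir -p "$dest" || return 1
    # -mtime +N matches files whose age, in whole 24-hour periods, is greater
    # than N; only the top level of the monitored directory is considered.
    find "$src" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$dest"/ \;
}
```

Example: `archive_old_logs /var/log/myapp /var/log/myapp-archive 7`. The archive directory should not itself be covered by any monitor stanza.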

That said, unless you're filtering out the majority of the data on the HWF, this work is better done on the indexer. Because cooked events are larger, forwarding them from a HWF can frequently mean more work for the indexer(s). If you're only filtering out a smaller share of the data, move the nullQueue work to the indexing tier.

jbsplunk
Splunk Employee

In that case, copying the fishbucket would cause duplicates for other sources, because the checksums wouldn't match the records correctly. You'll need to manually move the files that have already been read out of the way, or use a setting like ignoreOlderThan in inputs.conf.
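A minimal sketch of the ignoreOlderThan option, written as a monitor stanza for the heavy forwarder. The function name, app directory, path, index, sourcetype and the `7d` window are hypothetical placeholders.

```shell
# Hypothetical helper: write a monitor stanza that skips files whose
# modification time is older than the given window, so rotated logs that
# were already indexed another way are not read again.
# Usage: write_monitor_input <app_local_dir> <path> <index> <sourcetype> <window>
write_monitor_input() {
    mkdir -p "$1" || return 1
    cat > "$1/inputs.conf" <<EOF
[monitor://$2]
index = $3
sourcetype = $4
# Skip files not modified within this window (e.g. 1d, 12h).
ignoreOlderThan = $5
EOF
}
```

Example: `write_monitor_input /opt/splunk/etc/apps/my_inputs/local /var/log/myapp app_logs my_app_log 7d`. This only skips files that are old by modification time; the live file that is still being written has a recent timestamp, so it will be picked up and read from the beginning unless its checkpoint is already in the fishbucket.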


gozulin
Communicator

The host's syslog file was copied to the indexer via syslog-ng, and the indexer ingested it as a local file input.

Now we want to install a heavy forwarder on the host and send data to the same indexer: no more syslog-ng, and no more copying log files to the indexer.

The question is how to avoid duplicate entries during the switch.


jbsplunk
Splunk Employee

Can you please elaborate on the configuration? I don't think I understand what you're trying to explain.

Are you saying that you were formerly using syslog-ng to deliver data directly to the indexer, and now you'd like to run the heavy forwarder on the host where syslog-ng runs?

If so, how was the indexer configured to receive that data?


gozulin
Communicator

We figured it'd be better to do the filtering on the hosts because there are many of them, while we only have a couple of indexers per environment.

We're using the universal forwarder. I thought it was the same thing as the light forwarder. My mistake.

I tried unsuccessfully to edit my question (for some reason, reCAPTCHA always rejects my answers when I edit instead of posting) to ask:

What is the best way to migrate from ingesting a log file that syslog-ng copies from the host to the indexer, to running a heavy forwarder on the host itself?
