Re: How do i avoid duplicate entries when switchin...

gozulin · ‎02-05-2014

I have the SUF running on a few servers monitoring a typical logfile, and I want to replace it with the heavy forwarder so I can filter out DEBUG entries with a REGEX using the props.conf and transforms.conf files.
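For readers following along, the DEBUG filtering described above is usually done by routing matching events to nullQueue. The following is only a minimal sketch: the app directory, the sourcetype name `my_app_log`, the transform name `drop_debug`, and the regex are all assumptions for illustration and need to be adapted to the real log format.

```shell
#!/bin/sh
# Sketch only: APP_DIR, the sourcetype name and the regex are hypothetical.
APP_DIR="${APP_DIR:-./debug_filter/local}"
mkdir -p "$APP_DIR"

# props.conf: attach a transform to the (hypothetical) sourcetype.
cat > "$APP_DIR/props.conf" <<'EOF'
[my_app_log]
TRANSFORMS-drop_debug = drop_debug
EOF

# transforms.conf: events matching REGEX are routed to nullQueue,
# which means they are discarded instead of being indexed.
cat > "$APP_DIR/transforms.conf" <<'EOF'
[drop_debug]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
EOF
```

The same two stanzas can live on either a heavy forwarder or an indexer, since both parse events.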

My current procedure is:

shut down universal forwarder
start heavy forwarder
login to the heavy forwarder's web ui to add a new index
add an inputs.conf to the heavy forwarder
restart the heavy forwarder

I believe this will result in duplicate entries, and I'm wondering what the best way is to migrate from light forwarder to heavy forwarder while avoiding them.

Thanks!

woodcock · ‎07-05-2018

These are the exact steps that I just used.

SHUTDOWN & PREP (as user "root"):

service splunk stop
mkdir /opt/splunk/
chown splunk:wheel /opt/splunk/

BACKUP (as user "root"):

cp /opt/splunkforwarder/etc/auth/splunk.secret /tmp/
cd /opt/splunkforwarder/var/lib/splunk/
tar czvf /tmp/fishbucket.tgz ./fishbucket
cd /opt/splunkforwarder/etc/apps/
tar czvf /tmp/deploymentclient.tgz ./YourDeploymentClientAppNameGoesHere
cp /etc/init.d/splunk /tmp/splunk.init
chmod a+rw /tmp/*

DESTROY (as user "root"):

cd /opt/
rm -rf ./splunkforwarder/

REINSTALL (as user "splunk"):

cd /tmp/
wget ...
cd /opt/
tar xzvf /tmp/splunk...tgz

RESTORE (as user "splunk"):

cp /tmp/splunk.secret /opt/splunk/etc/auth/splunk.secret
cd /opt/splunk/etc/apps/
tar xvf /tmp/deploymentclient.tgz
mkdir -p /opt/splunk/var/lib/splunk/
cd /opt/splunk/var/lib/splunk/
tar xvf /tmp/fishbucket.tgz

FIRST TIME RUN (as user "splunk"):

/opt/splunk/bin/splunk start

SERVICE FIXUP (as user "root"):

/opt/splunk/bin/splunk enable boot-start -user splunk
cat /tmp/splunk.init | sed "s%splunkforwarder/bin/%splunk/bin/%" > /etc/init.d/splunk
systemctl daemon-reload
service splunk restart

jbsplunk · ‎02-05-2014

The way any Splunk instance keeps track of the data it has read is by writing entries into an internal index called fishbucket. It's in $SPLUNK_HOME/var/lib/splunk/fishbucket by default.

If you're actually running an LWF, as in the Light Weight Forwarder app, you can upgrade to HWF without duplicates by just enabling the SplunkForwarder app and disabling the SplunkLightForwarder app. You're using the terms for LWF and UF interchangeably, so I am unclear about which configuration is actually in place.

If you're using a SUF, you can make a backup of your SUF instance, uninstall and then install a full Splunk instance. Copy the fishbucket from the SUF to the HWF and you'll avoid duplicates.

Alternatively, figure out which data has been read and move it out of the directory where it exists so that Splunk won't see the older data.
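A rough sketch of the "move it out of the way" approach, assuming the already-indexed data sits in rotated files that are no longer being written to. The function name `archive_old_logs`, the paths, and the 24-hour cutoff are hypothetical; `-maxdepth` is a GNU/BSD `find` extension. This does not help with a single live file that contains both old and new events.

```shell
#!/bin/sh
# Sketch only: the function name, directories and cutoff are assumptions.
# Moves regular files last modified at least 24 hours ago from the
# monitored directory into an archive directory outside of it.
archive_old_logs() {
    log_dir=$1
    archive_dir=$2
    mkdir -p "$archive_dir"
    # -mtime +0 matches files whose age is one full day or more.
    find "$log_dir" -maxdepth 1 -type f -mtime +0 \
        -exec mv {} "$archive_dir"/ \;
}

# Example call (hypothetical paths), run before starting the new forwarder:
# archive_old_logs /var/log/myapp /var/log/myapp_already_indexed
```

Keeping the archive directory outside the monitored path matters, otherwise the forwarder would simply pick the files up again from their new location.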

That being said, unless you're filtering out the majority of the data on the HWF, this work is better done by the indexer. Because cooked events are larger, forwarding them from an HWF can frequently be more work for the indexer(s). If you're only filtering out a smaller share of the data, move the nullQueue work to the indexing tier.

jbsplunk · ‎02-06-2014

In that instance, copying the fishbucket would result in duplicates within other sources due to the fact that checksums wouldn't match records correctly. You'll need to move the files which have already been read out of the way manually or use a setting like ignoreOlderThan in inputs.conf.
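As a hedged illustration of the ignoreOlderThan option, the sketch below writes a monitor stanza that skips files whose modification time is older than one day. The app directory, the monitored path, the index name and the `1d` threshold are assumptions, not values from this thread; because the setting looks at file modification time, it only helps with files that are no longer being updated.

```shell
#!/bin/sh
# Sketch only: INPUTS_DIR, the monitored path, the index name and the
# threshold are hypothetical and must be adapted.
INPUTS_DIR="${INPUTS_DIR:-./hwf_inputs/local}"
mkdir -p "$INPUTS_DIR"

# inputs.conf: monitor a directory but skip files not modified
# within the last day.
cat > "$INPUTS_DIR/inputs.conf" <<'EOF'
[monitor:///var/log/myapp]
ignoreOlderThan = 1d
index = my_index
EOF
```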

gozulin · ‎02-06-2014

The host's syslog file was copied to the indexer via syslog-ng. The indexer ingested it as a local file data input.

Now, we want to install heavy forwarder on host and send data to the same indexer. No more syslog-ng. No more copying logfiles to the indexer.

The question is how do we avoid duplicate entries during the switch.

jbsplunk · ‎02-05-2014

Can you please elaborate on the configuration? I don't think I understand what you're trying to explain.

Are you saying that you were formerly using syslog-ng to deliver data directly to the indexer, and now you'd like to use the heavy forwarder where you're running syslog-ng?

If so, how was the indexer configured to receive that data?

gozulin · ‎02-05-2014

We figured it'd be better to do the filtering on the hosts because there are many of them while we only have a couple of indexers for each environment.

We're using the universal forwarder. I thought it was the same thing as the light forwarder. My mistake.

I tried to edit my question unsuccessfully (for some reason, the recaptcha will always mark my answers as wrong when editing instead of posting) to ask about:

best way to migrate from using a log file copied from the host by syslog-ng to the indexer as a data input to using the heavy forwarder on the host itself?

How do i avoid duplicate entries when switching from universal (light) forwarder to heavy forwarder?
