Getting Data In

How to avoid or minimize duplication of data during the switch of data input of the same data from heavy forwarder to syslogger?

mlevsh
Builder

Our Heavy Forwarder (HF) monitors a shared directory for a proxy log file provided by our proxy team.
We want to switch from monitoring that log in the shared directory on the Heavy Forwarder
to monitoring the logs on our Syslog server.
Right now the data input is running on the Heavy Forwarder, but we are already receiving the same data on our Syslog server.
So we want to disable the data input on the HF and enable the data input on the Syslog server.

How do we avoid or minimize duplication of data while switching the input for the same data from the Heavy Forwarder to the Syslog server?

We cannot use ignoreOlderThan in inputs.conf, because ignoreOlderThan looks at the file's modification timestamp and our logs are being appended to constantly.


Richfez
SplunkTrust

I'm assuming you have a UF/HF set up on the Syslog server and it's all ready to go, and you just haven't turned on that input yet?

If by "minimizing" you mean it's good enough to have only a little overlap (a few seconds) or to be missing only a few events (again, a couple of seconds' worth), then here's one way.

  • Stop Splunk on the Syslog server.
  • Make the changes to the config so that when you start it, the Syslog server will send this data in.
  • Stop the HF, and immediately truncate the syslog file. See below for discussion.
  • Start the Syslog server's Splunk instance (HF/UF).
  • Confirm data is coming in.
  • Finish decommissioning the HF (or remove that input and restart Splunk on it if it has other inputs).

Truncating the log can be done in any one of three dozen ways. rm <file> && touch <file> would probably be the least likely to impact any other inputs.
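As one more of those three dozen ways, here is a minimal Python sketch that empties the file in place instead of removing and recreating it. This is my own illustration, not a Splunk feature, and the function name truncate_in_place is made up for the example. One reason to consider it: a syslog daemon that already has the file open will generally keep writing to the original file after it has been removed, until it is told to reopen its outputs. The demo runs against a temporary file, so it is safe to try as-is.

```python
import os
import tempfile


def truncate_in_place(path: str) -> None:
    """Empty the file at `path` without deleting it (hypothetical helper)."""
    # os.truncate cuts the file to the given length; 0 bytes leaves an
    # empty file at the same path.
    os.truncate(path, 0)


# Demo on a throwaway file standing in for the syslog file.
fd, demo_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
with open(demo_path, "w") as f:
    f.write("old event 1\nold event 2\n")

truncate_in_place(demo_path)
print(os.path.getsize(demo_path))  # 0
os.remove(demo_path)
```

Whether this or rm plus touch behaves better depends on how your syslog daemon opens and writes the file, so try it on a test file first.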

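When you stop the HF and truncate the syslog file, you could do either one first. Which order you do it in - if you don't do any more work - will determine whether you have a few seconds of possible gap or a few seconds of overlap. Truncate first and the HF is still indexing events that also land in the fresh syslog file, so you get overlap. Stop the HF first and whatever arrives before the truncate gets thrown away unindexed, so you get a gap. In most data, a tiny bit of extra or a few seconds missing isn't a big deal.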
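To make the ordering trade-off concrete, here is a tiny Python sketch. It assumes both paths receive the same events at the same moment and that the HF indexes events as they are written (no forwarding delay), which is a simplification. The function name and the times are made up for the example.

```python
from typing import Tuple


def cutover_effect(hf_stop: float, truncate_at: float) -> Tuple[str, float]:
    """Return ("overlap" | "gap" | "none", seconds) for two cutover times.

    Times are seconds on any common clock (hypothetical example values).
    """
    if truncate_at < hf_stop:
        # Events written after the truncate stay in the syslog file, and
        # the HF also indexed them until it stopped: duplicates.
        return ("overlap", hf_stop - truncate_at)
    if truncate_at > hf_stop:
        # Events written after the HF stopped were wiped by the truncate
        # before anything indexed them: missing data.
        return ("gap", truncate_at - hf_stop)
    return ("none", 0.0)


print(cutover_effect(hf_stop=100.0, truncate_at=97.0))   # ('overlap', 3.0)
print(cutover_effect(hf_stop=100.0, truncate_at=102.5))  # ('gap', 2.5)
```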

If you are willing to do more work to get NO overlap or gaps, you could do this.

  • Stop Splunk on the Syslog server.
  • Make the changes to the config so when you start it, the syslog server will send this data in.
  • Truncate the syslog file.
  • Stop the HF a few moments later, leaving the syslog file with a bit of overlap.
  • Open Splunk, search your index/sourcetype/whatever, and find the most recent event.
  • Manually edit the syslog system's log file and delete the overlap events up to that point.
  • Start Splunk on the Syslog server.
  • Confirm data is coming in.
  • Finish decommissioning the HF (or remove that input and restart Splunk on it if it has other inputs).
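The "delete the overlap events" step can be done in a text editor, but if the file is large, something like this Python sketch may help. It assumes you can pick a piece of text (a timestamp plus a distinctive part of the request, say) that identifies the most recent event Splunk already has and that also appears in the matching line of the syslog file. The function name and sample lines are made up for the example.

```python
from typing import List


def drop_overlap(lines: List[str], last_indexed: str) -> List[str]:
    """Return the lines that come after the last line containing `last_indexed`.

    `last_indexed` is text identifying the most recent event already in
    Splunk (an assumption: it must also appear in the syslog file's line).
    """
    cut = -1
    for i, line in enumerate(lines):
        if last_indexed in line:
            cut = i  # remember the last matching line
    # If nothing matched, cut stays -1 and every line is kept, which
    # errs on the side of duplicates rather than lost events.
    return lines[cut + 1:]


sample = [
    "10:00:01 GET /a",
    "10:00:02 GET /b",
    "10:00:03 GET /c",
]
print(drop_overlap(sample, "10:00:02 GET /b"))  # ['10:00:03 GET /c']
```

Read the file's lines in, run them through this while Splunk is still stopped on the Syslog server, and write the result back. Keep a copy of the original file until you've confirmed the data looks right.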

I'd recommend reading through this a time or two, and doing a practice run to confirm you've got all the steps ready. But really, with Splunk off in both places temporarily, that data's not coming in - you could spend a few minutes at that point fixing things up and making sure they're ready, then just turn it on and test.

So, test before implementing. I am not responsible if this process results in your server catching fire.

Also don't forget to make sure sourcetypes and all that are set up right on the new input!

Wishing you luck and Happy Splunking,
-Rich


mlevsh
Builder

@rich7177 Thank you so much for the detailed answer!
Since we receive a lot of other data types on the same Syslog servers, we might not be able to stop Splunk on the Syslog server.
But truncating the syslog log for the data source we are trying to switch sounds like a very good idea!
Thank you


Richfez
SplunkTrust

Great!

I would like to point out a couple of general things:

If the HF is doing TCP or UDP network inputs, then yes - stopping it will stop those inputs and that data will be lost. This is exactly why it's better not to use Splunk to listen on network ports for data.

BUT, turning off the Splunk HF/UF on the syslog server - as long as it's just reading files that syslog creates - will only delay the data for a bit. As long as syslog is running, the Splunk UF/HF can be stopped without issue - the syslog server will continue to listen for data and to write it to disk, then when Splunk gets turned back on it'll start reading those files from where it left off and grab the bit of backlog. NOTE: just make sure no logs rotate while you have Splunk turned off. Usually not an issue - they're probably set to rotate nightly, so just don't do this overnight. 🙂
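If the "reads from where it left off" idea is new, here is a toy Python illustration of the concept. To be clear, this is not how Splunk implements it (Splunk keeps its own record of how far it has read each file); it's just a sketch of a reader that remembers a byte offset, with made-up names and a temporary file standing in for the syslog file.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Read whatever was appended after byte `offset`; return (lines, new_offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8").splitlines(), offset + len(data)


def append(path: str, text: str) -> None:
    """Stand-in for the syslog daemon appending to its file."""
    with open(path, "ab") as f:
        f.write(text.encode("utf-8"))


fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)

append(log_path, "event 1\nevent 2\n")
first, checkpoint = read_new_lines(log_path, 0)  # reader is running

# Reader is "stopped" here; syslog keeps writing to disk.
append(log_path, "event 3\nevent 4\n")

backlog, checkpoint = read_new_lines(log_path, checkpoint)  # reader restarted
print(first)    # ['event 1', 'event 2']
print(backlog)  # ['event 3', 'event 4']
os.remove(log_path)
```

It also shows why rotation matters in this toy model: the saved offset only means something for the same file, so if the file gets swapped out while the reader is down, the remembered position no longer lines up with the data.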
