Getting Data In

Best practice for getting data into Splunk without a forwarder?

Motivator

A client is interested in the best ways to get data into Splunk without installing a forwarder on all of their machines. Is there a doc on this somewhere?

I can start with:

  1. Use the forwarder on all machines, pointing either to the indexer or to another forwarder.
  2. Use the forwarder only on a central log server that already logs everything.
  3. Have a syslog-style daemon on the indexer accept data on a network port and write it to files.
  4. Put files on a network share accessible to the indexer.
  5. Poll servers via WMI (Windows).
  6. Write a script on a forwarder or indexer to pull data from somewhere.
  7. Periodically upload data from another system to the indexer.

Anyone have other items to add, or a better way of ranking the list above?
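For options (1) and (2), the forwarder side usually comes down to two small configuration files. A minimal sketch, with placeholder hostnames and paths:

```ini
# outputs.conf on the forwarder -- where to send the data
# (indexer.example.com is a placeholder)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer.example.com:9997

# inputs.conf on the forwarder -- what to collect
[monitor:///var/log/messages]
sourcetype = syslog
```

The same outputs.conf shape works whether the target is an indexer or an intermediate forwarder listening on the same port.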


Communicator

I wouldn't use the qualifier "all" in (1).

Not all machines will necessarily have the capacity or network architecture to allow it; even so, you may still want to install a forwarder on as many machines as you can.

Even if you include a forwarder as part of a standard build, you'll still have systems that end up as "exceptions" for whatever reason.


Re: (3), Splunk can listen for syslog input directly; you don't need a separate daemon for this.
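For reference, a minimal sketch of a direct syslog input on the indexer (the port and sourcetype below are typical choices, not requirements):

```ini
# inputs.conf on the indexer -- listen for syslog on UDP 514
[udp://514]
sourcetype = syslog
connection_host = dns
```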


Missing: you can scp the data back to the indexer with a scripted input, a job scheduler, or cron, then pick the data up with a local file monitor. This can be handy when you want to grab some data off a production server without having to submit a change control to install or change anything on that server. 😉 Also, sometimes the log files are created that way - as the result of batch jobs - and it hardly seems worthwhile to run a full-time Splunk forwarder when you only need the output of a batch job once per day.
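As a rough illustration of the scp-plus-cron approach (the host name and paths are hypothetical), the pull side can be a single crontab entry on the indexer, with Splunk monitoring the drop directory:

```cron
# Pull the batch job's output once per day at 00:15.
# prod-app01 and the paths below are placeholders.
15 0 * * * scp -q batchuser@prod-app01:/var/log/batch/report.log /opt/splunk_drop/prod-app01/
```

A matching [monitor:///opt/splunk_drop] stanza in inputs.conf then indexes the files as they arrive.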


Another architecture option:

Sometimes you might want to forward the Splunk data to another forwarder, and have it send the data on to the indexer from there.

I can think of two reasons off the top of my head:

  1. You want to minimize the number of firewall holes you have to open, so you designate one relay box to pull everything from the other side of the firewall and send it through.

  2. You want to run lightweight forwarders on your production systems, but need a "full-fledged" forwarder to do filtering before the data ends up on the indexer.
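For case (2), filtering on the intermediate "full" forwarder is typically done with a props.conf/transforms.conf pair. A hedged sketch that drops DEBUG-level lines before they reach the indexer (the stanza names are placeholders):

```ini
# props.conf on the intermediate forwarder
[syslog]
TRANSFORMS-drop_noise = drop_debug

# transforms.conf on the intermediate forwarder --
# route matching events to the nullQueue (i.e. discard them)
[drop_debug]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```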


A Note on NFS:

It is slow. If you have a lot of log files, you will quickly run into latency issues. On the other hand, it is usually quick and easy to set up, and can dramatically shorten the deployment time for many low volume sources.


I would argue that which solution is "best" depends on what you are trying to do and what kind of data you are working with. That is why there are so many possible solutions.


Motivator

Re: (3), yes, Splunk can read from network ports, but having Splunk monitor a syslog-written file is a best practice so Splunk doesn't lose data during restarts. Re: the "missing" scp option, that's what I meant by (6). I added multi-tiered forwarding to the forwarder item - thanks!
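The syslog-to-file pattern mentioned above can be sketched as: a syslog daemon (rsyslog or syslog-ng) writes incoming messages to disk, and Splunk monitors that directory. The paths below are illustrative:

```ini
# inputs.conf -- index the files the syslog daemon writes; nothing
# is lost while Splunk restarts, since the daemon keeps writing
[monitor:///var/log/remote]
sourcetype = syslog
host_segment = 4   # derive host from the 4th path segment, assuming
                   # the daemon writes to /var/log/remote/<host>/...
```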

Motivator

On Windows you also have remote WMI, though I would rank that pretty low, slightly above your option 5.
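If memory serves, remote WMI collection is configured in wmi.conf on a Windows Splunk instance. A hedged sketch with placeholder stanza and server names; verify the attribute names against the docs for the version in use:

```ini
# wmi.conf -- poll the Application event log from two remote hosts
# (winhost1 / winhost2 are placeholders)
[WMI:RemoteApplicationLogs]
server = winhost1, winhost2
interval = 10
event_log_file = Application
```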


Motivator

Thanks! Added...
