Getting Data In

Best practice for getting data into Splunk without a forwarder?

Jason
Motivator

A client is interested in the best ways to get data into Splunk without installing a forwarder on every machine. Is there a doc on this somewhere?

I can start with:

  1. Use the forwarder on all machines, pointing either to the indexer or to another forwarder.
  2. Use the forwarder only on a central log server that is already logging everything.
  3. Have a daemon (syslog-ish) on the indexer accept data on a network port and write it to files.
  4. Put files on a network share accessible to the indexer.
  5. Poll servers via WMI (Windows).
  6. Write a script on a forwarder or indexer to pull data from somewhere.
  7. Periodically upload from another system to the indexer.
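
For options (1) and (2), the forwarder side is typically just an outputs.conf entry pointing at the receiver. A minimal sketch, assuming a hypothetical indexer host `splunk-idx.example.com` listening on the default receiving port 9997:

```ini
# outputs.conf on the forwarder (hostname and port are placeholders)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunk-idx.example.com:9997
```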

Anyone have other items to add, or a better way of ranking the list above?

1 Solution

rotten
Communicator

I wouldn't use the qualifier "all" in (1).

Not all machines will necessarily have the capacity or network architecture to allow it; however, you may still want to install a forwarder on as many machines as you can.

Even if you include a forwarder as part of your standard build, you'll still have systems that end up as "exceptions" for whatever reason.


RE: (3) Splunk can listen for syslog inputs directly. You don't need to do this manually.
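
For reference, a direct network input is a one-stanza change. A sketch, assuming the standard syslog port:

```ini
# inputs.conf on the indexer: listen for syslog on UDP 514
[udp://514]
sourcetype = syslog
```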


Missing: You can scp the data back to the indexer with a scripted input, a job scheduler, or cron, then pick the data up with a local file monitor. This can be handy when you want to grab some data off a production server without having to submit a change control to install something new or change anything on that server. 😉 Also, sometimes the log files are created that way - as the result of batch jobs - and it hardly seems worthwhile running a full-time Splunk forwarder when you only need the output from a batch job once per day.
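
A sketch of that pattern; the host, user, and paths here are all placeholders:

```
# crontab on the indexer host: pull yesterday's batch output nightly
15 0 * * * scp batchuser@prod01.example.com:/var/log/batch/*.log /opt/staging/prod01/

# inputs.conf: index whatever lands in the staging directory
[monitor:///opt/staging/prod01]
sourcetype = batch_logs
```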


Another architecture option:

Sometimes you might want to forward the Splunk data to another forwarder, and then send it on to the indexer from there.

I can think of two reasons off the top of my head:

  1. You want to minimize the number of holes in a firewall that you have to open, so you designate one relay box to pull everything from the other side of the firewall and send it through.

  2. You want to run lightweight forwarders on your production systems, but need a "full-fledged" forwarder to do filtering before the data ends up on the indexer.
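
A sketch of the relay arrangement in outputs.conf terms, with placeholder hostnames:

```ini
# outputs.conf on each production (lightweight) forwarder:
# everything goes to the designated relay, so only the relay
# needs a firewall hole through to the indexer
[tcpout:relay]
server = relay.example.com:9997

# outputs.conf on the relay (full) forwarder, which can also
# filter before passing data along
[tcpout:indexers]
server = splunk-idx.example.com:9997
```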


A Note on NFS:

It is slow. If you have a lot of log files, you will quickly run into latency issues. On the other hand, it is usually quick and easy to set up, and can dramatically shorten the deployment time for many low volume sources.
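
Configuration-wise, an NFS source looks the same as any local directory - which is part of why it deploys so quickly. A sketch, with a placeholder mount path:

```ini
# inputs.conf: monitor a log directory on an NFS mount
[monitor:///mnt/nfs/applogs]
sourcetype = app_logs
disabled = false
```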


I would argue which solution is "best" depends on what you are trying to do and what kind of data you are working with. That is why there are so many possible solutions.



Jason
Motivator

Re: (3), yes, Splunk can read from network ports, but having Splunk monitor a syslog-written file is a best practice so Splunk doesn't lose data during restarts.

Re: Missing, that's what I meant by (6).

I added multi-tiered forwarding to the forwarder line - thanks!
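
The syslog-to-file best practice looks roughly like this - a sketch assuming rsyslog, with a placeholder file path:

```
# rsyslog.conf on the indexer host: accept remote syslog and write to disk
module(load="imudp")
input(type="imudp" port="514")
*.* /var/log/remote-syslog.log

# inputs.conf: Splunk tails the file, so events written while Splunk
# is restarting are picked up instead of lost
[monitor:///var/log/remote-syslog.log]
sourcetype = syslog
```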

ftk
Motivator

On Windows you also have remote WMI, though I would rank that pretty low, slightly above your option 5.
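
Remote WMI collection is configured in wmi.conf on a Windows Splunk instance. A sketch - the server names are placeholders, and the exact attributes should be checked against the wmi.conf reference:

```ini
# wmi.conf: poll the Application event log from two remote hosts over WMI
[WMI:RemoteApplication]
interval = 10
event_log_file = Application
server = winhost1, winhost2
```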


Jason
Motivator

Thanks! Added...
