
file source load balancing

nickhills — Sun, 15 Jan 2012 22:45:39 GMT

I am about to start indexing a large volume of CDRs (call detail records), which I will be retrieving via SFTP.

Currently, we Splunk our real-time data using forwarders on our servers, which load balance into a pool of indexers.

This spreads the load fairly evenly across the pool, and also means we can take one of the indexers down for updates etc. without affecting our ability to index and report on real-time events.

What is the best way to take 'flat' files and spread the indexing across the pool?
I have thought about writing a script to read the files from the server, and then using a forwarder to push the data to the index pool. Is this the best way?

Is there a way to have an indexer retrieve the files, and then push the events back to the pool without indexing them locally first?

Are there other options that I haven't even considered?

thanks,
Nick

Re: file source load balancing

Damien_Dallimor — Mon, 16 Jan 2012 01:12:58 GMT

I'm going to presume that installing a Universal Forwarder (UF) directly on the CDR server is not an option for you.

So you could have a dedicated/standalone UF that receives the CDR log events and load balances them into your Indexer cluster.
The UF won't index the events locally; it will simply forward them on.

So, a few ideas for getting the CDR log events to the UF:

Can you send the log events via syslog (UDP), or stream them over TCP, from the CDR server to the UF?
Your SFTP scripted input could download the files and write their contents to STDOUT (which the UF monitors), and simply not persist the files to disk.
The CDR server could write the logs to shared storage (e.g. a NAS), which the UF mounts and monitors.
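
A rough sketch of the second idea, assuming the scripted input is written in Python. All names here (`stream_new_files`, `open_remote`, `seen`) are made up for illustration: `open_remote` stands in for whatever call your SFTP client uses to open a remote file, and `seen` would have to be saved somewhere between runs, since each run of a scripted input is a fresh process.

```python
import sys


def stream_new_files(names, open_remote, seen, out=sys.stdout):
    """Write every line of each not-yet-seen file to `out`.

    names       -- iterable of remote file names (hypothetical listing)
    open_remote -- callable returning a text file-like object for a name
                   (placeholder for the real SFTP client call)
    seen        -- set of names already forwarded; updated in place
    Returns the number of lines written.
    """
    emitted = 0
    for name in sorted(names):
        if name in seen:
            continue  # already forwarded on an earlier run
        with open_remote(name) as fh:
            for line in fh:
                # Make sure every event ends with a newline so that
                # events from consecutive files do not run together.
                out.write(line if line.endswith("\n") else line + "\n")
                emitted += 1
        seen.add(name)
    out.flush()
    return emitted
```

Nothing is written to local disk here; the forwarder only ever sees the lines on STDOUT and passes them on to the indexers.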

Re: file source load balancing

nickhills — Mon, 16 Jan 2012 20:45:30 GMT

As you guessed, the actual system producing the CDRs is outside of our control, so my thinking was to use a UF on a dedicated instance to download the CSV files, and then set them up as an input.

I was just curious whether there are other ways to tackle this. Is there a way to do this with a single indexer, i.e. have an indexer forward to itself (or to the pool it is in)?

It seems the sensible approach is a dedicated forwarding node, so I can also add my scripted inputs there (they currently run on an indexer, and are thus indexed on only one node).

Re: file source load balancing

Damien_Dallimor — Tue, 17 Jan 2012 03:27:14 GMT

In this scenario, the dedicated UF architecture will be the cleanest way of achieving high availability and failover into your cluster of Splunk Indexers.

Re: file source load balancing

nickhills — Mon, 27 Feb 2012 21:22:56 GMT

This is actually the method we went for.
A quick script grabs the CDRs by FTP, and a dedicated forwarder then squirts the events at the indexing pool, which gives us some fault tolerance and HA.
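
For anyone after something similar, here is a minimal sketch of what such a grab script might look like using Python's standard `ftplib`. This is not the actual script; the function name, the directory arguments, and the `seen` set are assumptions for illustration. Files are first written to a staging directory and then moved into the directory the forwarder monitors, so the forwarder should never pick up a half-downloaded file (both directories are assumed to be on the same filesystem).

```python
import os
from ftplib import FTP  # used by the commented usage example below


def fetch_new_cdrs(ftp, remote_dir, staging_dir, spool_dir, seen):
    """Download files not yet in `seen` and move them into spool_dir.

    ftp         -- a connected, logged-in ftplib.FTP object
    staging_dir -- scratch directory the forwarder does NOT monitor
    spool_dir   -- directory the forwarder monitors (hypothetical path)
    seen        -- set of remote names already fetched; updated in place
    Returns the list of local paths that were added to spool_dir.
    """
    ftp.cwd(remote_dir)
    fetched = []
    for name in sorted(ftp.nlst()):
        if name in seen:
            continue
        local_name = os.path.basename(name)
        tmp_path = os.path.join(staging_dir, local_name)
        final_path = os.path.join(spool_dir, local_name)
        with open(tmp_path, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        # Move the finished file into place in a single step.
        os.replace(tmp_path, final_path)
        seen.add(name)
        fetched.append(final_path)
    return fetched


# Example call (host, credentials, and paths are placeholders):
#   with FTP("cdr.example.com") as ftp:
#       ftp.login("user", "password")
#       fetch_new_cdrs(ftp, "/cdr", "/tmp/cdr_staging", "/opt/cdr_spool", set())
```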

Thanks for your input!