Getting Data In

file source load balancing

nickhills
Ultra Champion

I am just about to start indexing a large volume of CDRs (call detail records), which I will be retrieving via SFTP.

Currently, we Splunk our real-time data using forwarders on our servers, which load balance into a pool of indexers.

This spreads the load fairly evenly across the pool, and also means we can take one of the indexers down for updates etc. without affecting our ability to index and report on real-time events.

What is the best way to take 'flat' files and spread the indexing across the pool?
I have thought about writing a script to read the files from the server, and then using a forwarder to push the data to the indexer pool. Is this the best way?

Is there a way to have an indexer retrieve the files and then push the events back to the pool without indexing them locally first?

Are there other options that I haven't even considered?

thanks,
Nick

If my comment helps, please give it a thumbs up!

Damien_Dallimor
Ultra Champion

I'm going to presume that installing a Universal Forwarder (UF) directly on the CDR server is not an option for you.

So you could have a dedicated/standalone UF that receives the CDR log events and load balances them into your indexer cluster.
The UF won't index the events locally; it will simply forward them on.
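
For illustration, load balancing from the UF into the pool is just a matter of listing all the indexers in outputs.conf. A minimal sketch (hostnames, ports, and the group name below are placeholders):

```ini
# outputs.conf on the dedicated UF (hypothetical hosts/ports)
[tcpout]
defaultGroup = cdr_indexer_pool

[tcpout:cdr_indexer_pool]
# With multiple servers listed, the UF automatically load balances
# across them, so one indexer can be taken down without stopping ingestion.
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
```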

A few ideas for getting the CDR log events to the UF:

  • Can you syslog (UDP) or stream over TCP the log events from the CDR server to the UF?
  • Your SFTP scripted input could download the files, write the contents to STDOUT (which the UF captures), and simply not persist the file to disk.
  • The CDR server could write the logs to shared storage (i.e. a NAS) which the UF mounts and monitors.
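
The second idea above can be sketched as a scripted input. The SFTP download itself is elided (it could be any SFTP client; the handle shown is hypothetical); the relay just streams lines to STDOUT, which Splunk indexes as events from a scripted input:

```python
import io
import sys

def stream_to_stdout(remote_file, out=sys.stdout):
    """Relay each line of a downloaded CDR file to stdout.

    Run as a Splunk scripted input, whatever is written to stdout gets
    indexed as events; nothing is ever persisted to local disk.
    """
    for line in remote_file:
        out.write(line)

# In a real deployment, `remote_file` would be an open handle from an
# SFTP client (e.g. sftp.open(path) -- hypothetical); any file-like
# object yielding text lines works.
```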



nickhills
Ultra Champion

This is actually the method we went for.
A quick script grabs the CDRs by FTP, and then a dedicated forwarder squirts the events at the indexer pool, which gives us some fault tolerance and HA.

Thanks for your input!


Damien_Dallimor
Ultra Champion

In this scenario, the dedicated UF architecture will be the cleanest way of achieving high availability and failover into your cluster of Splunk Indexers.

nickhills
Ultra Champion

As you guessed, the actual system producing the CDRs is outside of our control, so my thinking was to use a UF on a dedicated instance to download the CSV files and then set them up as an input.

I was just curious if there are other ways to tackle this. Is there a way to do this with a single indexer, i.e. have an indexer forward to itself (or the pool it is in)?

It seems like the sensible idea is a dedicated forwarding node, so I can add my scripted inputs as well (currently running on an indexer, and thus indexed on only one node).
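
For what it's worth, a full Splunk instance can be configured to forward without indexing locally via outputs.conf. A sketch under that assumption (hostnames are placeholders); note that forwarding to a group that includes the instance itself risks loops and duplicates, which is one reason the dedicated forwarder is cleaner:

```ini
# outputs.conf on an indexer acting as a forwarder (hypothetical hosts)
[tcpout]
defaultGroup = pool
# false = forward only; do not index events on this instance
indexAndForward = false

[tcpout:pool]
server = idx2.example.com:9997, idx3.example.com:9997
```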
