file source load balancing

nickhills
Ultra Champion

I am about to start indexing a large volume of CDRs (call detail records), which I will retrieve via SFTP.

Currently, we Splunk our real-time data using forwarders on our servers, which load balance into a pool of indexers.

This spreads the load fairly evenly across the pool, and it also means we can take one of the indexers down for updates without affecting our ability to index and report on real-time events.

What is the best way to take 'flat' files and spread the indexing across the pool?
I have thought about writing a script to read the files from the server and then using a forwarder to push the data to the indexer pool. Is this the best way?

Is there a way to have an indexer retrieve the files and then push the events back to the pool without indexing them locally first?

Are there other options that I haven't considered?

thanks,
Nick

If my comment helps, please give it a thumbs up!

Damien_Dallimor
Ultra Champion

I'm going to presume that installing a Universal Forwarder (UF) directly on the CDR server is not an option for you.

In that case, you could have a dedicated, standalone UF that receives the CDR log events and load balances them into your indexer cluster.
The UF won't index the events locally; it simply forwards them on.

Here are a few ideas for getting the CDR log events to the UF:

  • Can you send the log events from the CDR server to the UF via syslog (UDP), or stream them over TCP?
  • Your SFTP scripted input could download the files and write their contents to STDOUT (which the UF monitors), without ever persisting the files to disk.
  • The CDR server could write the logs to shared storage (e.g. a NAS), which the UF mounts and monitors.
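
As a rough illustration of the second idea, here is a minimal Python sketch of a scripted input that streams remote files to STDOUT without writing them to local disk. Everything named here is an assumption for illustration: `stream_new_files`, `InMemorySFTP`, and the `/cdr` path are hypothetical, and the `sftp` argument is assumed to expose `listdir()` and `open()` in the way paramiko's `SFTPClient` does (the thread does not name a library). The caller is also assumed to persist the `seen` set between runs.

```python
import io
import sys


def stream_new_files(sftp, remote_dir, seen, out=None):
    """Write the lines of every not-yet-seen remote file to `out`.

    `sftp` is any object with listdir(path) and open(path, mode) methods
    (assumed shape, e.g. paramiko's SFTPClient). `seen` is a set of file
    names already forwarded; it is updated in place. Returns the names of
    the files streamed during this call.
    """
    if out is None:
        out = sys.stdout  # a scripted input hands its STDOUT to the forwarder
    forwarded = []
    for name in sorted(sftp.listdir(remote_dir)):
        if name in seen:
            continue  # already streamed on an earlier run
        with sftp.open(f"{remote_dir}/{name}", "r") as remote_file:
            for line in remote_file:
                if isinstance(line, bytes):
                    line = line.decode("utf-8", errors="replace")
                # make sure every event ends with exactly one line break
                out.write(line if line.endswith("\n") else line + "\n")
        seen.add(name)
        forwarded.append(name)
    out.flush()
    return forwarded


class InMemorySFTP:
    """Hypothetical stand-in for a real SFTP client, for dry runs only."""

    def __init__(self, files):
        self.files = files  # maps file name -> file contents

    def listdir(self, path):
        return list(self.files)

    def open(self, path, mode="r"):
        return io.StringIO(self.files[path.rsplit("/", 1)[-1]])


if __name__ == "__main__":
    # Dry run against the stand-in; a real run would pass a connected client.
    demo = InMemorySFTP({"cdr_001.csv": "caller,callee\n100,200\n"})
    stream_new_files(demo, "/cdr", set())
```

Because nothing is persisted locally, a file that fails halfway through would need to be fetched again on the next run, so tracking `seen` only after a file completes (as above) is one reasonable choice rather than the only one.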

nickhills
Ultra Champion

This is actually the method we went for.
A quick script grabs the CDRs by FTP, and a dedicated forwarder then sends the events to the indexer pool, which gives us some fault tolerance and HA.
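
For anyone wanting a starting point, a download step along these lines might look like the Python sketch below. It is an illustration under stated assumptions, not the actual script: `fetch_to_spool`, `staging_dir`, and `spool_dir` are hypothetical names, the `ftp` argument is assumed to behave like the standard library's `ftplib.FTP` (`nlst()` and `retrbinary()`), and the forwarder is assumed to monitor `spool_dir`. Each file is written to a staging directory first and then renamed into place, so the forwarder should not see a half-downloaded file.

```python
import os


def fetch_to_spool(ftp, staging_dir, spool_dir):
    """Download remote files that are not yet in spool_dir.

    `ftp` is any object with nlst() and retrbinary(cmd, callback), the
    shape of ftplib.FTP. staging_dir and spool_dir are assumed to sit on
    the same filesystem so that os.replace() is an atomic rename.
    Returns the names of the files fetched during this call.
    """
    fetched = []
    for remote_name in sorted(ftp.nlst()):
        name = os.path.basename(remote_name)
        final_path = os.path.join(spool_dir, name)
        if os.path.exists(final_path):
            continue  # already downloaded on an earlier run
        temp_path = os.path.join(staging_dir, name + ".part")
        with open(temp_path, "wb") as handle:
            ftp.retrbinary(f"RETR {remote_name}", handle.write)
        # Move the finished file into the monitored directory in one step.
        os.replace(temp_path, final_path)
        fetched.append(name)
    return fetched
```

Note that the "already downloaded" check relies on files staying in `spool_dir`; if they are cleaned up after indexing, the script would need some other record of what it has fetched.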

Thanks for your input!


Damien_Dallimor
Ultra Champion

In this scenario, the dedicated UF architecture will be the cleanest way of achieving high availability and failover into your cluster of Splunk Indexers.

nickhills
Ultra Champion

As you guessed, the system producing the CDRs is outside of our control, so my thinking was to use a UF on a dedicated instance to download the CSV files and then set them up as an input.

I was just curious whether there are other ways to tackle this. Is there a way to do this with a single indexer, i.e. have an indexer forward to itself (or to the pool it is in)?

It seems the sensible option is a dedicated forwarding node, so that I can also move my scripted inputs there (they currently run on an indexer, and are thus indexed on only one node).
