Deployment Architecture

Best practice to ingest data from external data sources

marzio
New Member

Hi, what is the best practice for ingesting data from external (internet-based) data sources when only syslog or native forwarding is available? Would a set of load-balanced heavy forwarders in the DMZ, acting as relays to the internal indexers, be the right approach?

Direct channels from external networks to internal networks are not an option, due to security requirements.

PickleRick
SplunkTrust

As always with security-related zoning, the protocol is only part of the question; you also have to decide the direction in which the connection is initiated.

For some use cases it's OK to have the connection initiated by a UF "outside" connecting to an HF "inside" (which is relatively safe even over the internet if you use mutual TLS - just make sure it's configured properly and working!). For other cases it might be necessary to use some form of "pull" method instead of "push". For that you might have to use third-party modules or even custom-written scripted/modular inputs to pull data from some external repository.
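To illustrate the "pull" option: a scripted input is just a program that Splunk runs on a schedule, indexing whatever it writes to stdout. Below is a minimal sketch in Python, assuming a hypothetical external repository whose records carry a monotonically increasing `id`. The record shape, the helper names `new_records` and `emit`, and the sample data are all made up for illustration; the actual fetch from the repository and the checkpoint storage are left out.

```python
import json
import sys


def new_records(records, last_seen_id):
    """Return the records newer than the checkpoint, oldest first.

    Assumes (hypothetically) that every record is a dict with an integer "id".
    """
    fresh = [r for r in records if r["id"] > last_seen_id]
    return sorted(fresh, key=lambda r: r["id"])


def emit(records, out=sys.stdout):
    """Write one JSON event per line and return the new checkpoint (or None)."""
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
    return records[-1]["id"] if records else None


if __name__ == "__main__":
    # Stand-in for data pulled from the external repository.
    pulled = [
        {"id": 12, "msg": "link up"},
        {"id": 11, "msg": "link down"},
        {"id": 10, "msg": "already indexed"},
    ]
    checkpoint = emit(new_records(pulled, last_seen_id=10))
    # A real input would persist `checkpoint` before exiting.
```

The point of this design is that the internal side initiates every connection, so no inbound channel from the external network is needed.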

For syslog you could also use a remote "collector" that forwards over an encrypted channel (like rsyslog with RELP) into your internal network, where you'd hand the data over to your Splunk forwarders.

There are many possible setups depending on your limitations and restrictions.

marzio
New Member

Hello PickleRick,

unfortunately I cannot move to a "pull" paradigm for these very specific data sources (network appliances). I do use the "pull" paradigm with dedicated TAs (most of which I developed myself using the Add-on Builder) for other cloud-based or external sources.

In the end I am investigating the use of a cloud-based log stream processor that can forward data to Splunk using HEC.

Thank you!

isoutamo
SplunkTrust

Both of those are usable solutions. It mostly depends on how you are collecting events on the source side. If you are using syslog there, then keep using it. If collection is done by UF/HF, then just set up a couple of IHFs/IUFs (intermediate heavy/universal forwarders) in your DMZ with TLS to get the data into your site. And remember that using S2S through external LBs such as F5 or AWS NLB is not (officially) supported. Basically this means that on the source side you must use outputs.conf with the FQDNs of the intermediate forwarders.

r. Ismo

marzio
New Member

Hello Isoutamo,

thank you for pointing out that Splunk-to-Splunk forwarding is not officially supported via external network devices.

In the end I am investigating the use of a cloud-based log stream processor that can forward data to Splunk using HEC. In this scenario, I would configure a reverse proxy to forward the HEC stream to the existing load-balanced Heavy Forwarders in the internal network, where the HEC input is active.
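For reference, this is roughly what a sender posting to HEC through the reverse proxy might look like. It is only a sketch using the Python standard library: the host name `hec-proxy.example.com`, the token, the sourcetype and the helper names are placeholders, not values from this environment. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are the standard HEC conventions.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype=None, host=None):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    payload = {"event": event}
    if sourcetype:
        payload["sourcetype"] = sourcetype
    if host:
        payload["host"] = host
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return HEC's JSON reply as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values; nothing is sent here, the request is only built.
    req = build_hec_request(
        "https://hec-proxy.example.com:8088",
        "00000000-0000-0000-0000-000000000000",
        {"message": "interface Gi0/1 down"},
        sourcetype="appliance:syslog",
    )
    print(req.get_method(), req.full_url)
```

Because HEC is plain HTTPS, the reverse proxy can terminate or pass through TLS and balance across the Heavy Forwarders like any other web traffic.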

Thank you!
