Hi,
I’m trying to use Splunk as a log aggregation solution, and eventually as a SIEM. I have three industrial plants that are completely air-gapped, with no permanent internet access.
The idea is to deploy a syslog server at each plant to collect logs locally and then forward them to a central Splunk installation.
Any component or software will be:
- downloaded or installed during a temporary internet connection (via a cellular modem),
- then moved into a fully air-gapped production environment.
I’ve reviewed SC4S (Splunk Connect for Syslog) through the official Splunk documentation and several videos.
In theory, it looks like a very powerful and well-designed solution. However, in practice, I find the documentation quite difficult to follow, especially when considering:
- an air-gapped environment,
- no internet connectivity,
- and very large log volumes (high EPS / high throughput).
From a practical and low-complexity perspective, is it better to use SC4S, or to simply deploy syslog-ng or rsyslog (open-source solutions) on Linux servers at each plant and forward logs to the central Splunk instance?
What is the best deployment model for an air-gapped industrial environment?
- SC4S standalone at each site?
- Simple syslog collectors with forwarding?
Which option is more stable and easier to operate long term?
Given a preference for open-source solutions:
- Is relying on syslog-ng / rsyslog considered a professionally acceptable approach with Splunk?
- Or has SC4S effectively become the best practice, one that shouldn’t be bypassed?
From an operating system perspective, which of these is better suited to handling very large volumes of log data, and which is more stable and easier to maintain in a 24/7 production environment?
- Ubuntu Server
- CentOS / Rocky Linux / AlmaLinux
The end goal is a solution that is:
- stable,
- simple to operate,
- capable of handling very large data volumes,
- suitable for air-gapped industrial environments,
- free of excessive operational complexity.
I’m trying to find a balance between simplicity and best practices.
I want to use Splunk correctly, but at the same time avoid introducing operational complexity that exceeds the team’s current capabilities.
Any advice or real-world experience would be greatly appreciated.
SC4S _is_ open source. It's just syslog-ng with extra steps.
The choice between syslog-ng, SC4S and rsyslog usually comes down to prior experience and personal preference (I'm a big fan of rsyslog, but that's just me).
I'm not 100% sure how SC4S handles writing to files (by default it's designed to push events to HTTP Event Collector (HEC) inputs).
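For context, a HEC input accepts JSON events over HTTP(S). Below is a minimal Python sketch of roughly the kind of request a HEC client sends to the `/services/collector/event` endpoint. The URL (8088 is HEC's usual default port), token, host and sourcetype are placeholders, and the exact payload SC4S itself produces may differ. The request is only built, not sent:

```python
import json
import urllib.request


def build_hec_request(url: str, token: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a POST to Splunk's HEC /services/collector/event endpoint."""
    # HEC accepts several JSON event objects concatenated in one request body (batching).
    body = "".join(json.dumps(e) for e in events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Splunk {token}",  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and token: replace with your own values.
req = build_hec_request(
    "https://splunk.example.local:8088/services/collector/event",
    "00000000-0000-0000-0000-000000000000",
    [{"time": 1700000000, "host": "plc-gw-01", "sourcetype": "syslog",
      "event": "<134>Nov 14 22:13:20 plc-gw-01 app: started"}],
)
# urllib.request.urlopen(req) would actually send it, which requires a reachable Splunk.
```

The key point for an air-gapped setup is that this is a live network push, so it only works where the collector can reach the Splunk instance directly.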
Anyway, I think your main challenge will not be the particular syslog solution but the overall process of moving the data, since you have air-gapped environment(s) and a large volume of data. An air gap means that you will have to save the data to files at one site, move the files on removable media to another site, and ingest them into Splunk from there. That will add significant latency to your events.
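To get a feel for how much data each media transfer has to carry, here is a back-of-the-envelope Python sketch: raw volume per day = EPS × average event size × 86,400 s. The 10,000 EPS, 500-byte events and weekly swap below are purely hypothetical, as is the optional compression ratio; plug in your own measured numbers.

```python
def daily_volume_gb(eps: float, avg_event_bytes: float) -> float:
    """Raw log volume per day in GB (10**9 bytes): EPS x bytes/event x 86,400 s."""
    return eps * avg_event_bytes * 86_400 / 1e9


def transfer_volume_gb(eps: float, avg_event_bytes: float,
                       days_between_transfers: float,
                       compression_ratio: float = 1.0) -> float:
    """Volume that accumulates between two media transfers, optionally after compression."""
    return daily_volume_gb(eps, avg_event_bytes) * days_between_transfers / compression_ratio


# Hypothetical plant: 10,000 EPS, 500-byte events, media swapped weekly.
print(f"{daily_volume_gb(10_000, 500):.0f} GB/day")             # 432 GB/day
print(f"{transfer_volume_gb(10_000, 500, 7):.0f} GB/transfer")  # 3024 GB/transfer
```

With those made-up numbers a single plant produces about 3 TB of raw logs per weekly transfer, and the oldest events are already about a week old by the time they reach Splunk. Multiply by the number of plants.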
Writing the events to files with a predefined naming scheme should be relatively easy in both syslog-ng and rsyslog (again, I'm not sure whether SC4S can do it easily). Unless you want to modify the events before writing them, this part will be fairly straightforward. You might want to think through the overall process and automate as much of it as possible (for example, mounting and unmounting the removable storage, copying the files, and so on). And you will have to put some effort into preventing duplicates.
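A minimal Python sketch of the copy-to-media step, assuming a hypothetical per-host, per-hour layout like `<root>/<host>/<YYYY-MM-DD>/<host>_<YYYYMMDD>_<HH>.log` and assuming only closed (already rotated) files are in the source directory. It keeps a JSON manifest on the source side recording "relative path + SHA-256" for every file already shipped, so re-running the copy doesn't put the same file on the media twice. The layout, function names and manifest format are illustrative only. They are not something syslog-ng or rsyslog provide.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def log_path(root: Path, host: str, ts: datetime) -> Path:
    """Hypothetical layout: <root>/<host>/<YYYY-MM-DD>/<host>_<YYYYMMDD>_<HH>.log, in UTC."""
    ts = ts.astimezone(timezone.utc)
    return root / host / f"{ts:%Y-%m-%d}" / f"{host}_{ts:%Y%m%d_%H}.log"


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large log files aren't loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_new_files(src_root: Path, media_root: Path, manifest_path: Path) -> list:
    """Copy closed *.log files to removable media, skipping ones already in the manifest."""
    shipped = set(json.loads(manifest_path.read_text())) if manifest_path.exists() else set()
    copied = []
    for f in sorted(src_root.rglob("*.log")):
        rel = f.relative_to(src_root)
        key = f"{rel.as_posix()} {sha256_of(f)}"  # same path + same content = already shipped
        if key in shipped:
            continue
        dest = media_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, dest)  # copy2 also preserves the modification time
        shipped.add(key)
        copied.append(dest)
    # Written once at the end: an interrupted run may re-copy some files next time.
    manifest_path.write_text(json.dumps(sorted(shipped), indent=1))
    return copied
```

This only prevents duplicates on the way out of the plant. You would still want the ingesting side to avoid re-reading files it has already indexed, for example by moving processed files out of the monitored directory.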
And don't forget about retention policy and file rotation at the source site(s).
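Retention at the source can be as simple as a scheduled job that removes files older than N days. A minimal Python sketch, assuming plain `*.log` files and using modification time as the file's age. It runs as a dry run by default, and you'd only point it at files you have confirmed reached the central site:

```python
import time
from pathlib import Path
from typing import Optional


def prune_old_logs(root: Path, retention_days: float,
                   now: Optional[float] = None, dry_run: bool = True) -> list:
    """List (and, unless dry_run, delete) *.log files under root older than retention_days.

    Age is judged by file modification time (st_mtime).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86_400
    expired = [p for p in sorted(root.rglob("*.log")) if p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in expired:
            p.unlink()
    return expired
```

For example, with a hypothetical 30-day retention you would first call `prune_old_logs(Path("/var/log/remote"), 30)` to see what would go, then repeat with `dry_run=False`. Size the retention window so the source disk can hold at least a couple of transfer intervals' worth of data in case a media swap is missed.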