Hey all, I just wanted to get people's opinions on the best method for getting firewall data into Splunk. Our firewall logs arrive via syslog, and we're using rsyslog, which is working fine. The firewall data comes into a central point, which then forwards it to our heavy forwarders.
So the data path looks like this: (firewall) > (firewall log collection node) > (load balancer) > (HF) > (indexer)
The catch is that the host showing up in Splunk was the firewall log collection node instead of the firewall itself. We can get the firewall's host name by extracting it from the message. The question is: where is the better place to do that extraction?
The messages look like this:
blah blah blah originsicname=CN\=THIS_IS_THE_HOSTNAME,O\=somethingelse sequencenum=3291 some more blah blah blah
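To sanity-check what a host name regex should capture from a message like this, here's a minimal Python sketch. It compares the greedy pattern used in the rsyslog template below (`(.+),O`) with a tighter character-class version (`[^,]+`, which stops at the first comma). The tighter pattern is only a suggestion, not part of the original config. Python's `re` should behave the same as ERE for these two simple patterns, but that's an assumption worth testing in rsyslog itself.

```python
import re
from typing import Optional

# Sample message from the post; "CN\=" contains a literal backslash.
SAMPLE = r"blah blah blah originsicname=CN\=THIS_IS_THE_HOSTNAME,O\=somethingelse sequencenum=3291 some more blah blah blah"

# Same pattern as the rsyslog template: any three chars ("CN\"), "=", then a greedy capture up to the LAST ",O".
GREEDY = re.compile(r"originsicname=...=(.+),O")
# Hypothetical tighter alternative: capture everything up to the first comma.
STRICT = re.compile(r"originsicname=CN\\=([^,]+)")


def extract_host(message: str, pattern: re.Pattern) -> Optional[str]:
    """Return capture group 1, or None when the message doesn't match."""
    match = pattern.search(message)
    return match.group(1) if match else None


print(extract_host(SAMPLE, GREEDY))  # THIS_IS_THE_HOSTNAME
print(extract_host(SAMPLE, STRICT))  # THIS_IS_THE_HOSTNAME

# A later ",O" anywhere in the message makes the greedy pattern over-capture.
tricky = SAMPLE + " note=foo,Other"
print(extract_host(tricky, GREEDY))  # over-captures past the host name
print(extract_host(tricky, STRICT))  # THIS_IS_THE_HOSTNAME
```

If real messages can contain another `,O` after the CN value, the greedy version would produce a very odd directory name, so the character-class form may be the safer choice.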
Option 1: Let rsyslog do it.
The messages come in, and a regex routine in rsyslog extracts the host name from each message and writes the message to a file whose folder path contains that host name. The templates and rsyslog script are below.
# One directory per extracted firewall host name
template(name="checkpoint_host_extrated-dynaFile" type="string" string="/var/log/syslog/%$MYHOSTNAME%/checkpoint_firewall_514/%!extracted_firewall_hostname%/%$YEAR%-%$MONTH%-%$DAY%-%$HOUR%.log")
# Group 1 = the CN value: text after "originsicname=CN\=" up to the last ",O" (greedy match)
template(name="firewall_host_extraction_originsicname" type="string" string="%msg:R,ERE,1,FIELD:originsicname=...=(.+),O--end%")
if $rawmsg contains ["originsicname=CN"] then {
    reset $!extracted_firewall_hostname = exec_template("firewall_host_extraction_originsicname");
    # "rawmsg_format" is a template defined elsewhere (not shown here)
    action(name="checkpoint_firewall_514-write" type="omfile" DynaFile="checkpoint_host_extrated-dynaFile" template="rawmsg_format" dynaFileCacheSize="5" closeTimeout="5" ioBufferSize="64k" fileOwner="splunk" dirOwner="splunk" dirGroup="splunk" fileGroup="splunk" fileCreateMode="0755" dirCreateMode="0755")
}
This works great: the host name is recorded properly. I may still need to add some error handling for messages where the regex doesn't match. Some relevant documentation:
https://www.rsyslog.com/doc/master/configuration/nomatch.html
https://www.rsyslog.com/regex/
https://www.rsyslog.com/doc/v8-stable/configuration/property_replacer.html
https://www.rsyslog.com/how-to-use-set-variable-and-exec_template/
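For the no-match error handling, one option is to fall back to a fixed directory whenever no usable host name is extracted. Per the nomatch page linked above, the `FIELD` mode used in the template returns the entire msg property when the regex fails, which would end up in the file path; `BLANK` (empty string) or `DFLT` (`**NO MATCH**`) may be easier to test for. The Python sketch below only models the path-building logic; the `unknown_firewall` fallback, the `SAFE_NAME` check, the `build_log_path` helper and the sample names `collector01` / `fw-edge-01` are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical fallback directory for messages without a usable host name.
FALLBACK_HOST = "unknown_firewall"
# Tighter version of the extraction regex (stops at the first comma).
HOST_RE = re.compile(r"originsicname=CN\\=([^,]+)")
# Assumption: only accept values that are safe as a single directory name.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def firewall_host(message: str) -> str:
    """Extracted host name, or FALLBACK_HOST when there is no usable match."""
    match = HOST_RE.search(message)
    if match and SAFE_NAME.fullmatch(match.group(1)):
        return match.group(1)
    return FALLBACK_HOST


def build_log_path(collector: str, message: str, ts: datetime) -> str:
    """Mirror the dynaFile layout: /var/log/syslog/<collector>/checkpoint_firewall_514/<fw host>/YYYY-MM-DD-HH.log"""
    return (
        f"/var/log/syslog/{collector}/checkpoint_firewall_514/"
        f"{firewall_host(message)}/{ts:%Y-%m-%d-%H}.log"
    )


ts = datetime(2023, 5, 1, 14)
print(build_log_path("collector01", r"x originsicname=CN\=fw-edge-01,O\=corp y", ts))
# /var/log/syslog/collector01/checkpoint_firewall_514/fw-edge-01/2023-05-01-14.log
print(build_log_path("collector01", "x sequencenum=3291 y", ts))
# /var/log/syslog/collector01/checkpoint_firewall_514/unknown_firewall/2023-05-01-14.log
```

In rsyslog itself this would presumably be an extra `if ... then set` on `$!extracted_firewall_hostname` before the `action()`, so unmatched messages land in one predictable folder instead of a path built from the whole message.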
Option 2: have Splunk do the extraction when the heavy forwarder reads the log file.
I haven't written this yet, but if it's more efficient, I could put the effort into it.
Thoughts? (FYI, I included all the info above for reference, in case anyone else needs to do a regex extraction of a field via rsyslog/syslog.)
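For comparison, Option 2 would usually be an index-time transform on the heavy forwarder: a props.conf stanza for the sourcetype with a `TRANSFORMS-` entry pointing at a transforms.conf stanza that sets `DEST_KEY = MetaData:Host`, `FORMAT = host::$1`, and a `REGEX` such as `originsicname=CN\\=([^,]+)`. When the regex doesn't match an event, the transform simply isn't applied and the event keeps whatever host it already had. The Python sketch below only models that behavior (it is not Splunk code), and the event values and host names are hypothetical.

```python
import re
from dataclasses import dataclass

# Models an index-time host override; this is not Splunk's implementation.
HOST_REGEX = re.compile(r"originsicname=CN\\=([^,]+)")


@dataclass
class Event:
    raw: str
    host: str


def override_host(event: Event) -> Event:
    """Set host from capture group 1; leave it unchanged when the regex doesn't match."""
    match = HOST_REGEX.search(event.raw)
    if match:
        event.host = match.group(1)
    return event


events = [
    Event(r"a originsicname=CN\=fw-edge-01,O\=corp b", host="log-collector"),
    Event("a sequencenum=3291 b", host="log-collector"),
]
for e in events:
    print(override_host(e).host)
# fw-edge-01
# log-collector
```

The trade-off is where the regex runs: in Option 2 every event pays for the regex in the HF's parsing pipeline, while in Option 1 rsyslog has already done the work and Splunk only reads the path.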
Option 1 is more efficient, with less overhead on Splunk's parsing and other queues. And, as you mentioned, if some of your _raw events don't contain the firewall host name, option 2 is not a solution for your case.
---
An upvote would be appreciated if it helps!
Hi @mlody11
You can go with Option 1:
If you can extract the source firewall host name and write it into the absolute path of the log file, and you run a Splunk UF on the host where the *.log files are written, then use the host_segment setting in inputs.conf to override the default host field.
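With the path layout from the dynaFile template above, the firewall host name is the sixth `/`-separated segment (`var`, `log`, `syslog`, collector name, `checkpoint_firewall_514`, firewall host), so a monitor stanza such as `[monitor:///var/log/syslog/*/checkpoint_firewall_514/*/*.log]` would presumably use `host_segment = 6`. The Python sketch below models how host_segment picks a segment, assuming Splunk counts from 1 starting after the leading slash (the inputs.conf docs use `/var/log/<host>` with `host_segment = 3`); the sample path is hypothetical.

```python
from typing import Optional


def host_from_segment(path: str, segment: int) -> Optional[str]:
    """Return the Nth '/'-separated path segment (1-based), or None if out of range."""
    parts = [p for p in path.split("/") if p]  # drop the empty piece before the leading '/'
    if 1 <= segment <= len(parts):
        return parts[segment - 1]
    return None


path = "/var/log/syslog/collector01/checkpoint_firewall_514/fw-edge-01/2023-05-01-14.log"
print(host_from_segment(path, 6))  # fw-edge-01
print(host_from_segment(path, 4))  # collector01
```

Keeping the directory depth fixed matters here: if a fallback folder for unmatched messages sits at the same depth, those events simply get the fallback name as their host.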
Yup, that's exactly what I'm doing: overriding the host name with the value from the log path, which rsyslog creates by extracting the host name from the message.
The question really is, which option is more efficient?
Also, I should mention that some of the logs contain the host name and some don't, so extracting it only from the ones that have it is also something to consider.