host name extraction / regex from syslog using Rsyslog or Splunk

mlody11
Engager

Hey all, I just wanted to get people's opinions on the best method for getting firewall data into Splunk. We have firewall logs coming in via syslog. We are using rsyslog and it's working fine. The firewall data arrives at a central collection point, which then forwards it to our heavy forwarders.

So the data path looks like this (firewall) > (firewall log collection node) > (load balancer) > (HF) > Indexer

The catch is that the host recorded in Splunk is the firewall log collection node instead of the firewall itself. To get the host name of the firewall, we can extract it from the message. The question is, where is it better to do that extraction?

The messages look like this:

 

blah blah blah originsicname=CN\=THIS_IS_THE_HOSTNAME,O\=somethingelse sequencenum=3291 some more blah blah blah

 

 

Option 1: Let rsyslog do it.

The messages come in and a regex routine in rsyslog extracts the host from each message, then writes the log to a directory path that includes that host name. The templates and the rsyslog rule are below.

 

template(name="checkpoint_host_extrated-dynaFile" type="string" string="/var/log/syslog/%$MYHOSTNAME%/checkpoint_firewall_514/%!extracted_firewall_hostname%/%$YEAR%-%$MONTH%-%$DAY%-%$HOUR%.log")
template(name="firewall_host_extraction_originsicname" type="string" string="%msg:R,ERE,1,FIELD:originsicname=...=(.+),O--end%")

 

 

if $rawmsg contains ["originsicname=CN"] then {
  reset $!extracted_firewall_hostname = exec_template("firewall_host_extraction_originsicname");
  action(name="checkpoint_firewall_514-write" type="omfile" DynaFile="checkpoint_host_extrated-dynaFile" template="rawmsg_format"
         dynaFileCacheSize="5" closeTimeout="5" ioBufferSize="64k"
         fileOwner="splunk" fileGroup="splunk" dirOwner="splunk" dirGroup="splunk" fileCreateMode="0755" dirCreateMode="0755")
}

 

This works great: the host name is recorded properly. I may still need to add some error handling for messages where the regex doesn't match. Some documentation regarding this:

https://www.rsyslog.com/doc/master/configuration/nomatch.html

https://www.rsyslog.com/regex/

https://www.rsyslog.com/doc/v8-stable/configuration/property_replacer.html

https://www.rsyslog.com/how-to-use-set-variable-and-exec_template/

 

Option 2: Have Splunk do the extraction when the heavy forwarder reads the log file.

I haven't written this, but if it is more efficient, I could put the effort into it.
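
For reference, Option 2 could be sketched as an index-time host override on the heavy forwarder. This is an untested sketch under stated assumptions, not a verified config: the sourcetype name checkpoint:firewall and the transform stanza name are hypothetical, and the regex assumes the hostname never contains a comma.

```ini
# props.conf on the heavy forwarder
# (sourcetype name is an assumption; use whatever your input assigns)
[checkpoint:firewall]
TRANSFORMS-set_fw_host = checkpoint_host_from_originsicname

# transforms.conf on the heavy forwarder
# (stanza name is hypothetical)
[checkpoint_host_from_originsicname]
# The raw event contains a literal backslash before "=", hence the "\\"
REGEX = originsicname=CN\\=([^,]+),O
FORMAT = host::$1
DEST_KEY = MetaData:Host
```

Events that do not match the REGEX are left alone and keep whatever host the input assigned, so messages without originsicname would still show the collection node as their host.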


Thoughts? (FYI, I added all that info above for reference in case anyone else needs to do a regex extraction of a field via rsyslog / syslog).


venkatasri
Motivator

@mlody11 

Option 1 is the more efficient choice: it puts less overhead on the parsing and other queues in Splunk. And as you mentioned, if some of your _raw events don't contain the firewall hostname, Option 2 is not a solution for your case.

---

An upvote would be appreciated if it helps!

venkatasri
Motivator

Hi @mlody11 

You can go with Option 1:

If you can extract the source firewall hostname and write it into the absolute path of the log file, and you are running a Splunk UF on the host where the *.log files are written, then use the host_segment setting in inputs.conf to override the default host field.
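
As a rough sketch of that setting, assuming the directory layout from the dynaFile template in the question (/var/log/syslog/<collector>/checkpoint_firewall_514/<firewall host>/<file>.log), the firewall host is the sixth path segment. The sourcetype name below is hypothetical, and the wildcarded monitor path is one possible way to select the files:

```ini
# inputs.conf on the forwarder that reads the rsyslog output files
[monitor:///var/log/syslog/*/checkpoint_firewall_514/*/*.log]
# Path segments: 1=var 2=log 3=syslog 4=<collector> 5=checkpoint_firewall_514 6=<firewall host>
host_segment = 6
# Assumed sourcetype name; replace with your own
sourcetype = checkpoint:firewall
```

If the directory depth ever changes, the segment number has to change with it, so it is worth checking a few real file paths before relying on the value 6.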



mlody11
Engager

Yup, that's exactly what I'm doing: overriding the host with the name taken from the log path, which rsyslog builds from the host it extracts from the message.

The question really is, which option is more efficient?

Also, I guess I should mention that some of the logs have it and some of them don't, so extracting only from the ones that have it is also something to consider.
