Below is one improvement (which saves a significant amount of money/resources) and one bug fix (which improves ingest) relevant for everyone who supports Fortinet.
In our environment, Fortigate firewall logs accounted for about 20% of our Splunk license usage. This is a typical Fortigate traffic event:
time=17:28:50 devname="my3700d" devid="FGT37D4615801234" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="mirror-dmz" eventtime=1570055330 srcip=192.0.2.35 srcname="venus.example.local" srcport=41394 srcintf="internal.123" srcintfrole="undefined" dstip=18.71.0.20 dstport=389 dstintf="external.466" dstintfrole="undefined" poluuid="b64cd2aa-8b22-51e9-36fd-43ef903c833b" sessionid=3049101971 proto=6 action="accept" policyid=188 policytype="policy" service="LDAP" dstcountry="United States" srccountry="Reserved" trandisp="noop" duration=275 sentbyte=386 rcvdbyte=60 sentpkt=6 rcvdpkt=1 appcat="unscanned" sentdelta=386 rcvddelta=60
this event is over 650 bytes long. Since Splunk licensing is based on raw event size, stripping down that event before it gets ingested by Splunk can have a dramatic effect on license usage. Fortinet does not have any settings for making its events less chatty. So, we used syslog-ng's rewrite functionality to trim the events like this:
time=17:28:50 devname="my3700d" devid="FGT37D4615801234" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="mirror-dmz" eventtime=1570055330 srcip=192.0.2.35 srcname="venus.example.local" srcport=41394 srcintf="internal.123" srcintfrole="undefined" dstip=18.71.0.20 dstport=389 dstintf="external.466" dstintfrole="undefined" poluuid="b64cd2aa-8b22-51e9-36fd-43ef903c833b" sessionid=3049101971 proto=6 action="accept" policyid=188 policytype="policy" service="LDAP" dstcountry="United States" srccountry="Reserved" trandisp="noop" duration=275 sentbyte=386 rcvdbyte=60 sentpkt=6 rcvdpkt=1 appcat="unscanned" sentdelta=386 rcvddelta=60
which is under 400 bytes. Many events (like drops and blocks) shrink even further. It's up to you to decide which fields to keep and which to discard. We removed fields that were not of interest to the firewall admins. In addition to the fields shown, we could have safely deleted the human-readable 'time' field (see item #2 below). We could also have totally discarded the src/dst name fields which are pulled from reverse DNS, but in our case we elected to just trim the FQDN down to the short name. I'm probably also going to rename the 'action' field to 'vendor-action' or something else to avoid collision and confusion with the CIM-mapped 'action'.
In syslog-ng, the configuration looks like this:
# TCP/601 - RFC5424, octet counting
source s_aggregation_tcp_601 {
syslog(
ip(0.0.0.0) transport(tcp) port(601)
);
};
rewrite rewrite_fortigate {
#subst ( '^time=\d\d:\d\d:\d\d(\.\d{3,6})? ', "", value("MESSAGE") );
subst ( ' devid="[A-Z0-9]{16}" ', " ", value("MESSAGE") );
subst ( ' logid="[0-9]{10}" ', " ", value("MESSAGE") );
subst ( ' srcname="([\w-]+)\.example\.local" ', ' srcname="$1" ', value("MESSAGE") );
subst ( ' srcintfrole="[a-z]+" ', " ", value("MESSAGE") );
subst ( ' dstname="([\w-]+)\.example\.local" ', ' dstname="$1" ', value("MESSAGE") );
subst ( ' dstintfrole="[a-z]+" ', " ", value("MESSAGE") );
subst ( ' poluuid="[a-z0-9-]{36}" ', " ", value("MESSAGE") );
subst ( ' sessionid=[0-9]+ ', " ", value("MESSAGE") );
subst ( ' policytype="policy" ', " ", value("MESSAGE") );
subst ( ' srccountry="[^"]+" ', " ", value("MESSAGE") );
subst ( ' dstcountry="[^"]+" ', " ", value("MESSAGE") );
subst ( ' service="[^"]+" ', " ", value("MESSAGE") );
subst ( ' appcat="unscanned"', "", value("MESSAGE") );
subst ( ' crscore=[0-9]+ craction=[0-9]+ crlevel="[a-z]+"', "", value("MESSAGE") );
};
filter f_firewall {
netmask(192.0.2.1/32)
;};
template t_msg_only { template("${MESSAGE}\n"); template_escape(no); };
destination d_firewall { file("/app/$LOGHOST/log/$R_YEAR-$R_MONTH-$R_DAY/firewall/$SOURCEIP/$R_HOUR-$LEVEL-$FACILITY.log" suppress(0) template(t_msg_only)); };
log { source(s_aggregation_tcp_601); filter(f_firewall); rewrite(rewrite_fortigate); destination(d_firewall); flags(final); };
Note that since last year Fortinet versions v6.0+ support modern RFC5424/RFC6514 syslog. So, you can now do syslog over TCP instead of over UDP. However, be careful that in syslog-ng you must use the syslog() driver instead of the network() driver. The network() driver is for legacy BSD syslog, which is what most devices today still use.
Because I discard the 'devid' field, I needed to make one modification to the Fortinet TA deployed to the Heavy Forwarder on the syslog-ng server. I added this file to the TA:
# $SPLUNK_HOME/etc/apps/Splunk_TA_fortinet_fortigate/local/transforms.conf
[force_sourcetype_fgt_traffic]
DEST_KEY = MetaData:Sourcetype
REGEX = \stype=\"?traffic\"?\s
FORMAT = sourcetype::fgt_traffic
[force_sourcetype_fgt_utm]
DEST_KEY = MetaData:Sourcetype
REGEX = \stype=\"?utm\"?\s
FORMAT = sourcetype::fgt_utm
[force_sourcetype_fgt_event]
DEST_KEY = MetaData:Sourcetype
REGEX = \stype=\"?event\"?\s
FORMAT = sourcetype::fgt_event
By using syslog-ng to trim our logs, we cut our daily firewall log ingest from 180GB to about 70GB. At today's pricing, the value of that 110GB savings in Splunk license is maybe $60,000 per year. Trimming events also made them more readable (when using the 'Search and Reporting' app), faster to query, and consume less disk. For any deep debugging that requires those trimmed fields, the firewall admins still have access to those events in Fortianalyzer.
There is a bug in current versions (up to 1.6.1) of the Splunk add-on for Fortinet that breaks timestamp extraction. The bug causes Splunk timestamp extraction to guess the timestamp, which sometimes fails. When it fails, Splunk will intermittently backdate recent events with the right time but a random date from the past ~10 years. Most events do get properly assigned today's date - only a few events get backdated. So perhaps the biggest ramification of the bug is that it gives Splunk buckets a longer date range. Your firewall log buckets will say they contain years of data instead of a few days or weeks of data, because a few events have the wrong date.
The fix is adding this file to the TA, to explicitly extract epoch time from events:
# $SPLUNK_HOME/etc/apps/Splunk_TA_fortinet_fortigate/local/props.conf
[fgt_log]
TIME_PREFIX = \seventtime=
TIME_FORMAT = %s
MAX_TIMESTAMP_LOOKAHEAD = 28
This bug has been reported to Fortinet and should be fixed in upcoming versions of the TA.
Thanks Satyen for the input.
As Satyen also pointed out, please trim the events according to your own needs. Some trimming may break dashboards on the Fortinet App.
Thanks Satyen for the input.
As Satyen also pointed out, please trim the events according to your own needs. Some trimming may break dashboards on the Fortinet App.
Hi @satyenshah / @jerryzhao ,
Thanks for the awesome post. We had the same problem in our environment and we had applied this fix in our syslog-ng server yesterday and we see a considerable difference in the raw logs length of Fortinet logs after the unwanted fields are stripped. But we also noticed that, we no longer receive any logs in the catch-all sourcetype (fgt_log). Is this something expected?
Regards,
Vishnu.
I don't see that happening in my environment. Am running TA version 1.6.1. I haven't checked if it works with 1.6.2.
Thanks @satyenshah ,
We are running v.1.6.2 of the TA. Not sure if this is a specific problem with this TA version.