I have the same or a very similar issue to the one @token2 asked about in "TA-pfsense sourcetyping only catching filterlog" on 3-28-2020. As of the posting of this message, there has been no response to their message.
Splunk has a local data input ingesting pfsense on udp/5140, writing logs to index fw with a sourcetype of "pfsense".
Running the query `index=fw | stats count by sourcetype` over all time returned the following sourcetypes:
pfsense | 1004981 |
pfsense:dhclient | 298 |
pfsense:filterdns | 2240 |
pfsense:filterlog | 564704 |
pfsense:syslogd | 3 |
However, when running tcpdump on the listening interface, I can see more types of events being sent on the wire: dhcpd, nginx, unbound, etc.
When searching for these events with 'index=fw dhcpd' or 'index=fw nginx', zero (0) events are returned. So it's not that they are being sourcetyp-ed incorrectly; they are being dropped altogether.
According to Splunk's transforms.conf documentation for 8.0.4 (note I have 8.0.3),
* If the REGEX for a field extraction configuration does not have the capturing groups specified in the FORMAT, searches that use that configuration will not return events.
This comment does not appear in the transforms.conf documentation for 8.0.3. I don't know if that means this is new behavior, previously undocumented behavior, or something else. Regardless, in order to verify that, I have focused my efforts on troubleshooting the REGEX used by the pfsense_sourcetyper stanza.
I've tried using the pfsense_sourcetyper REGEX in transforms.conf from a forked version of TA-pfsense to attempt to get the other events sourcetyp-ed correctly, but this did not work.
I hope this post shows that I've tried to do some initial legwork in solving the problem before asking for help. In addition to everything above, I have also done the following:
If anyone in the Splunk community has any pointers or can provide any assistance in helping me get TA-pfsense to correctly sourcetype all the events being sent from pfsense, I would be very grateful!
SOLVED. Here's my working set-up; hopefully it helps others.
# inputs.conf
[udp://fw.homenet:5140]
connection_host = dns
index = fw
sourcetype = pfsense
no_appending_timestamp = false
disabled = 0
# props.conf
[pfsense]
TRANSFORMS-sourcetype = pfsense_sourcetyper
SHOULD_LINEMERGE = false
SEDCMD-event_cleaner = s/\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+//g
# transforms.conf
[pfsense_sourcetyper]
REGEX = ^\(?(\w+)\)?
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::pfsense:$1
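A minimal sketch in Python for trying the two expressions above outside Splunk. It only exercises the regexes themselves; it does not model the order in which Splunk applies SEDCMD and TRANSFORMS. The sample event and the helper names (`strip_timestamp`, `sourcetype_for`) are made up for illustration.

```python
import re

# Same expression as SEDCMD-event_cleaner in props.conf above.
TIMESTAMP = re.compile(r"\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+")
# Same expression as REGEX in [pfsense_sourcetyper] above.
SOURCETYPER = re.compile(r"^\(?(\w+)\)?")


def strip_timestamp(event: str) -> str:
    """Mimic s/<timestamp>//g: delete every syslog-style timestamp."""
    return TIMESTAMP.sub("", event)


def sourcetype_for(event: str, default: str = "pfsense") -> str:
    """Mimic FORMAT = sourcetype::pfsense:$1; keep the default on no match."""
    m = SOURCETYPER.search(event)
    return f"pfsense:{m.group(1)}" if m else default


# Hypothetical event, not taken from a real pfsense box.
raw = "Sep 13 23:59:59 filterlog: example payload"
cleaned = strip_timestamp(raw)

print(cleaned)                  # filterlog: example payload
print(sourcetype_for(cleaned))  # pfsense:filterlog
print(sourcetype_for(raw))      # pfsense:Sep
```

The last line shows why the `^` anchor matters: if the event still starts with the timestamp when the REGEX sees it, the first word captured is the month, not the process name.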
As best as I've been able to figure out, the way Splunk processes events is as follows:
If this was helpful, I'd appreciate a thumbs up so other people can find it.
Thanks @pkt_nspktr! I combined your approach with @hansb's and I finally got it working! Great work, and thank you so much for sharing!
Regards,
Dave
I was having the same issues, and the solution above wasn't working for me, so I read a lot of documentation to understand how overriding source types on a per-event basis works.
According to the Getting Data In 8.0.2007 manual, in the chapter 'Override source types on a per-event basis', the syntax should be:
[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = sourcetype::<your_custom_sourcetype_value>
DEST_KEY = MetaData:Sourcetype
In one of my experiments I didn't change the expressions; I only swapped the order of the last two settings:
[pfsense_sourcetyper]
# The REGEX setting specifies the regular expression that points to a
# field in the event that you want to extract
# Matches: timestamp without year, optional host, then the application.
# Only the second group captures (it has no '?:'): the application
# name without the trailing ':'
# Example: Sep 13 23:59:59 xxx.yyy.edu filterlog:
REGEX = \w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?:[\w.]+\s)?(\w+)
#
# The FORMAT setting specifies the name of the new sourcetype
# $1 here refers to the second group, but first extraction above
# example: sourcetype::pfsense:dhcp
FORMAT = sourcetype::pfsense:$1
#
# write the value from FORMAT to the source type of the event
# MetaData:Sourcetype : The source type of the event,
# the value must be prefixed by "sourcetype::"
DEST_KEY = MetaData:Sourcetype
After that, I got 14 different sourcetypes instead of just 3:
pfsense:filterlog | 80,220 | 97.568% |
pfsense:dhcpd | 729 | 0.887% |
pfsense | 614 | 0.747% |
pfsense:unbound | 263 | 0.32% |
pfsense:filterdns | 144 | 0.175% |
pfsense:openvpn | 96 | 0.117% |
pfsense:gw1 | 54 | 0.066% |
pfsense:check_reload_status | 48 | 0.058% |
pfsense:dpinger | 12 | 0.014% |
pfsense:php | 12 | 0.014% |
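The pfsense_sourcetyper REGEX above can be checked outside Splunk with a few lines of Python. This is only a sketch: the first sample line comes from the comment in the stanza, the second (without a host) is an assumed variant, and `classify` is a made-up helper name.

```python
import re

# Same expression as REGEX in the [pfsense_sourcetyper] stanza above:
# timestamp, optional host (non-capturing), then the application name.
SOURCETYPER = re.compile(
    r"\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?:[\w.]+\s)?(\w+)"
)


def classify(event: str, default: str = "pfsense") -> str:
    """Mimic FORMAT = sourcetype::pfsense:$1; keep the default on no match."""
    m = SOURCETYPER.search(event)
    return f"pfsense:{m.group(1)}" if m else default


samples = [
    "Sep 13 23:59:59 xxx.yyy.edu filterlog: rest of event",  # with host
    "Sep 13 23:59:59 dhcpd: rest of event",                  # assumed: no host
    "no timestamp here",                                      # falls through
]
for line in samples:
    print(classify(line))
```

Events the REGEX does not match keep the sourcetype set in inputs.conf, which is one way plain `pfsense` rows can remain in a table like the one above.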
I also found that the EXTRACT for dhcp matched fewer of my dhcpd events than it should, so I changed it:
< EXTRACT-ipv4_dhcp = dhcpd:\s(?<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?<dest_ip>\S+) (?:from|to) (?<src_mac>\S+) via (?<src_interface>\S+)
---
> EXTRACT-ipv4_dhcp = (?<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?<dest_ip>\S+) (?:from|to) (?<src_mac>\S+) \(.*\) via (?<src_interface>\S+)
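To see what the change does, here is a small Python sketch comparing the two expressions, reading the `<` line as the original and the `>` line as the modified version. Python needs `(?P<name>...)` instead of `(?<name>...)`, so the group syntax is adapted; the two dhcpd lines are invented samples, not real log data.

```python
import re

# Original EXTRACT ('<' line), with named groups in Python syntax.
OLD = re.compile(
    r"dhcpd:\s(?P<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) "
    r"(?P<dest_ip>\S+) (?:from|to) (?P<src_mac>\S+) via (?P<src_interface>\S+)"
)
# Modified EXTRACT ('>' line): no 'dhcpd:' prefix, and a parenthesised
# token is required between the MAC address and 'via'.
NEW = re.compile(
    r"(?P<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) "
    r"(?P<dest_ip>\S+) (?:from|to) (?P<src_mac>\S+) \(.*\) via (?P<src_interface>\S+)"
)

# Hypothetical events: one with a client name in parentheses, one without.
with_name = "dhcpd: DHCPACK on 192.168.1.50 to 00:11:22:33:44:55 (laptop) via em0"
without_name = "dhcpd: DHCPREQUEST for 192.168.1.50 from 00:11:22:33:44:55 via em0"

for label, pattern in (("old", OLD), ("new", NEW)):
    for event in (with_name, without_name):
        m = pattern.search(event)
        print(label, m.groupdict() if m else None)
```

On these samples the original expression only matches the line without the parenthesised name, and the modified one only matches the line with it, so which one fits depends on what your dhcpd actually logs.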
kind regards,
hansb