TA-pfsense transforms.conf pfsense_sourcetyper broken?

pkt_nspktr
Explorer

I have the same issue, or a very similar one, as the one @token2 asked about in "TA-pfsense sourcetyping only catching filterlog" on 3-28-2020. As of this posting, there has been no response to their message.

  • Splunk Enterprise Version:8.0.3 Build:a6754d8441bf on host splunk. (This is the Trial Version of Enterprise running on my home network. It is a single host install, no forwarders, separate indexers or anything fancy.)
  • TA-pfsense Version 2.2.1, released Oct. 29, 2019.
  • pfSense 2.4.5-RELEASE-p1 (host fw), configured to send all syslog events to splunk on port udp/5140.

Splunk has a local data input ingesting pfsense events on udp/5140, writing them to index fw with a sourcetype of "pfsense".

Running the query `index=fw | stats count by sourcetype` over all time returned the following sourcetypes:

sourcetype           count
pfsense              1004981
pfsense:dhclient     298
pfsense:filterdns    2240
pfsense:filterlog    564704
pfsense:syslogd      3

However, when running tcpdump on the listening interface, I can see more types of events being sent on the wire: dhcpd, nginx, unbound, etc.

When I search for these events with 'index=fw dhcpd' or 'index=fw nginx', zero (0) events are returned. So it's not that they are being sourcetyped incorrectly; they are being dropped altogether.

According to Splunk's transforms.conf documentation for 8.0.4 (note I have 8.0.3),

* If the REGEX for a field extraction configuration does not have the
    capturing groups specified in the FORMAT, searches that use that
    configuration will not return events.

This comment does not appear in the transforms.conf documentation for 8.0.3.  I don't know if that means this is new behavior, previously undocumented behavior, or something else. Regardless, in order to verify that, I have focused my efforts on troubleshooting the REGEX used by the pfsense_sourcetyper stanza.

I've tried using the pfsense_sourcetyper REGEX in transforms.conf from a forked version of TA-pfsense to get the other events sourcetyped correctly, but this did not work.

  1. Original (Datapunctum): REGEX = \w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?:[\w.]+\s)?(\w+)
  2. Forked (Apocrathia):      REGEX = ^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+(\w+)(\[\d+\])?:
  3. I wrote a small python script to validate the regular expressions were capturing the correct data (they are).
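
A minimal sketch of the kind of Python check described in item 3 might look like the following. It is not the original script, and the sample log lines are hypothetical (only the host-prefixed one echoes a format quoted later in this thread), so substitute lines captured with tcpdump. Python's re module is close to, but not identical to, the PCRE engine Splunk uses, so a match here suggests the pattern is sound without proving how Splunk behaves at index time.

```python
import re

# The two REGEX values quoted above, copied verbatim.
ORIGINAL = re.compile(r"\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?:[\w.]+\s)?(\w+)")
FORKED = re.compile(r"^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+(\w+)(\[\d+\])?:")

# Hypothetical sample lines; replace them with real captured syslog lines.
SAMPLES = [
    "Jul  2 09:03:51 dhcpd: DHCPACK on 192.168.1.50 to aa:bb:cc:dd:ee:ff via em1",
    "Jul  2 09:03:52 nginx: 192.168.1.10 - - GET / HTTP/1.1",
    "Sep 13 23:59:59 xxx.yyy.edu filterlog: 5,,,1000000103,em0,match,block,in,4",
]


def captured_service(pattern, line):
    """Return capture group 1 (the service name), or None if there is no match."""
    match = pattern.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in SAMPLES:
        print(captured_service(ORIGINAL, line), captured_service(FORKED, line), line, sep=" | ")
```

On these samples the forked pattern does not match the line that carries a hostname after the timestamp, because it expects the service name immediately after the time; the original pattern handles that case through its optional non-capturing host group.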

I hope this post shows that I've tried to do some initial legwork in solving the problem before asking for help. In addition to everything above, I have also done the following:

  1. I read this April 2020 post on setting up a dashboard on trenchesofit.com, but it does not address using sourcetypes other than "pfsense:filterlog".
  2. I searched the Splunk Community forums for "TA-pfsense"; there were 21 unique posts returned. The most recent one, and the most similar to mine, is the post by @token2 that I mentioned above. The rest were unhelpful in that they were too old or did not apply to this situation.
  3. I've done websearches for troubleshooting TA-pfsense and found a few forked instances of the code on github (where I found the updated REGEX above), but none seemed to work.
  4. I've uninstalled TA-pfsense and re-installed it. I've deleted indexes and renamed them.

If anyone in the Splunk community has any pointers or can provide any assistance in helping me get TA-pfsense to correctly sourcetype all the events being sent from pfsense, I would be very grateful!

1 Solution

pkt_nspktr
Explorer

SOLVED. Here's my working setup; hopefully it helps others.

# inputs.conf
[udp://fw.homenet:5140]
connection_host = dns
index = fw
sourcetype = pfsense
no_appending_timestamp = false
disabled = 0

 

# props.conf
[pfsense]
TRANSFORMS-sourcetype = pfsense_sourcetyper
SHOULD_LINEMERGE = false
SEDCMD-event_cleaner = s/\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+//g

 

# transforms.conf
[pfsense_sourcetyper]
REGEX = ^\(?(\w+)\)?
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::pfsense:$1

 

As best as I've been able to figure out, the way Splunk processes events is as follows:

  1. event is received on udp/5140; the hostname of the sending system is looked up via DNS and assigned to the host field of the event
  2. the event's sourcetype field is assigned the value of pfsense
  3. Splunk adds the hostname and the timestamp to the event, but does not modify the _raw event string
  4. the event is assigned to the "fw" index
  5. The props.conf's [pfsense] stanza says for events with a sourcetype field of pfsense, the specified actions need to happen:
    1. SEDCMD-event_cleaner: remove the syslog formatted timestamp (i.e. "Jul 2 09:03:51") from the event
    2. TRANSFORMS-sourcetype: lookup the specified stanza in the transforms.conf file and perform the associated actions
  6. In transforms.conf, the [pfsense_sourcetyper] stanza uses the regular expression specified by REGEX to define a capture group that can be used to modify/transform fields in the event. The REGEX is compared against the SEDCMD-modified _raw event. In this particular case, we're going to modify the MetaData:Sourcetype field and use the FORMAT directive to specify how that will look (appending the first capture group from REGEX ("$1") to the string "pfsense:").
  7. After the transforms.conf stanza has been executed, the event should now have the sourcetype field set to a value of pfsense:<service>. The remaining props.conf's stanzas will match on that value at search runtime to provide the extracted values specified under each stanza.
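
The effect of steps 5 and 6 can be sketched in Python. This only mimics the two regular expressions from the config above with Python's re module; it is not how Splunk implements SEDCMD or TRANSFORMS. The function names and the sample events used in the note below are hypothetical.

```python
import re

# SEDCMD-event_cleaner from props.conf, expressed as a Python pattern.
SYSLOG_TIMESTAMP = re.compile(r"\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+")
# REGEX from the [pfsense_sourcetyper] stanza in transforms.conf.
SOURCETYPER = re.compile(r"^\(?(\w+)\)?")


def clean_event(raw):
    """Mimic s/<timestamp>//g: remove every syslog-style timestamp from the event."""
    return SYSLOG_TIMESTAMP.sub("", raw)


def derive_sourcetype(raw, default="pfsense"):
    """Mimic FORMAT = sourcetype::pfsense:$1, applied to the cleaned event.

    Falls back to the input's sourcetype when the REGEX does not match.
    """
    match = SOURCETYPER.search(clean_event(raw))
    return f"pfsense:{match.group(1)}" if match else default
```

For a hypothetical event such as "Jul  2 09:03:51 dhcpd: DHCPACK on 192.168.1.50 to aa:bb:cc:dd:ee:ff via em1", clean_event leaves "dhcpd: DHCPACK on ..." and derive_sourcetype returns "pfsense:dhcpd". An event whose cleaned text does not start with a word character keeps the plain "pfsense" sourcetype.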

If this was helpful, I'd appreciate a thumbs up so other people can find it.

dconnett_splunk
Splunk Employee

Thanks @pkt_nspktr  I combined your approach with @hansb and I finally got it working! Great work, and thank you so much for sharing!


Regards,

Dave


hansb
New Member

I was having the same issues. The solution above wasn't working for me, so I read a lot of documentation to understand how overriding source types on a per-event basis works.

According to the chapter 'Override source types on a per-event basis' in the Getting Data In 8.0.2007 manual, the syntax should be:

[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = sourcetype::<your_custom_sourcetype_value>
DEST_KEY = MetaData:Sourcetype

In one of my experiments I didn't change the expressions at all; I only swapped the order of the last two settings:

[pfsense_sourcetyper]
# The REGEX setting specifies the regular expression that points to a
# field in the event that you want to extract.
# It matches a timestamp without a year, an optional host, and the
# application name. The second group is the only capturing group (it has
# no '?:'), and it captures the application name without the ':'.
# Example: Sep 13 23:59:59 xxx.yyy.edu filterlog:
REGEX = \w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s(?:[\w.]+\s)?(\w+)
#
# The FORMAT setting specifies the name of the new sourcetype
# $1 here refers to the second group, but first extraction above
# example: sourcetype::pfsense:dhcp
FORMAT = sourcetype::pfsense:$1
#
# write the value from FORMAT to the source type of the event
# MetaData:Sourcetype : The source type of the event,
# the value must be prefixed by "sourcetype::"
DEST_KEY = MetaData:Sourcetype

After that, I got 14 different sourcetypes instead of just 3:

sourcetype                     count     percent
pfsense:filterlog              80,220    97.568%
pfsense:dhcpd                  729       0.887%
pfsense                        614       0.747%
pfsense:unbound                263       0.32%
pfsense:filterdns              144       0.175%
pfsense:openvpn                96        0.117%
pfsense:gw1                    54        0.066%
pfsense:check_reload_status    48        0.058%
pfsense:dpinger                12        0.014%
pfsense:php                    12        0.014%

I also found that the EXTRACT for DHCP matched fewer of my dhcpd events than it should have, so I changed it as follows:

< EXTRACT-ipv4_dhcp = dhcpd:\s(?<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?<dest_ip>\S+) (?:from|to) (?<src_mac>\S+) via (?<src_interface>\S+)
---
> EXTRACT-ipv4_dhcp = (?<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?<dest_ip>\S+) (?:from|to) (?<src_mac>\S+) \(.*\) via (?<src_interface>\S+)
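
The difference between the two extractions can be checked with a short Python sketch. It assumes the "<" line is the extraction shipped with the TA and the ">" line is the modified one. The two sample events are hypothetical, modeled on dhcpd lines with and without a client hostname in parentheses. Python spells named groups as (?P<name>...), so the patterns are converted from the (?<name>...) form used in props.conf.

```python
import re

# "<" line of the diff, with named groups converted to Python syntax.
OLD_EXTRACT = re.compile(
    r"dhcpd:\s(?P<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?P<dest_ip>\S+) "
    r"(?:from|to) (?P<src_mac>\S+) via (?P<src_interface>\S+)"
)
# ">" line of the diff: no "dhcpd:" prefix, and a required "(...)" before "via".
NEW_EXTRACT = re.compile(
    r"(?P<vendor_action>DHCPACK|DHCPREQUEST) (?:on|for) (?P<dest_ip>\S+) "
    r"(?:from|to) (?P<src_mac>\S+) \(.*\) via (?P<src_interface>\S+)"
)

# Hypothetical events; replace them with lines from your own index.
WITH_HOSTNAME = "dhcpd: DHCPACK on 192.168.1.50 to aa:bb:cc:dd:ee:ff (laptop) via em1"
WITHOUT_HOSTNAME = "dhcpd: DHCPACK on 192.168.1.50 to aa:bb:cc:dd:ee:ff via em1"


def extract(pattern, event):
    """Return the named fields as a dict, or None if the pattern does not match."""
    match = pattern.search(event)
    return match.groupdict() if match else None
```

On these samples the old pattern only matches the event without a hostname, and the new pattern only matches the event with one. If both forms appear in your data, making the parenthesized part optional may be worth testing.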

 

kind regards,
hansb
