I've scoured the Splunk answers site for all the regex/rex/transforms/props threads and still can't figure this out. The data is the syslog output from a pfSense firewall with the extraneous newline character filtered out on the pfSense side (to make parsing easier). I'm convinced Splunk is somehow not finding a regex match in my log files. I can tell it's picking up my props.conf and transforms.conf files just fine because other elements of these files are working as expected.
My transforms.conf file:
[sourcetype_pfsense_by_proto]
DEST_KEY = MetaData:Sourcetype
REGEX = proto\s(\S+)
FORMAT = sourcetype::pfsense_$1
[pfsenseCommonFields]
REGEX = pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6}) rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): (?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+): \((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)\s+(?P<srcip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<srcport>\d{1,5}) > (?P<dstip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<dstport>\d{1,5}):
Don't be thrown off by all the number groupings in the middle; they are simply the IP address fields, with the definition of an octet (from the transforms.conf documentation page) pasted in for each octet. Here is my props.conf file, although I've tried all three ways (EXTRACT, TRANSFORMS and REPORT) with the same results:
[host::10.11.12.13]
TRUNCATE = 0
REPORT-pfsenseCommonFields = pfsenseCommonFields
[source::udp:514]
TRANSFORMS-pfsense_by_proto = sourcetype_pfsense_by_proto
Sample of the data that this is meant to parse:
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61077 > 54.247.105.180.443: Flags [S], cksum 0x4730 (correct), seq 2560336237, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.062553 rule 3/0(match): block in on re0: (tos 0x0, ttl 128, id 10232, offset 0, flags [none], proto UDP (17), length 229) 10.11.12.50.138 > 10.11.12.255.138: NBT UDP PACKET(138)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:00.133234 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14291, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61068 > 54.247.105.180.443: Flags [S], cksum 0xe445 (correct), seq 2412318003, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.742183 rule 82/0(match): pass in on re0: (tos 0x0, ttl 128, id 31516, offset 0, flags [none], proto UDP (17), length 73) 10.11.12.50.50363 > 10.11.12.13.53: 56921+ A? mcs1-870f.broker.sophos.com. (45)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.742183 rule 82/0(match): pass in on re0: (tos 0x0, ttl 128, id 31516, offset 0, flags [none], proto UDP (17), length 73) 10.11.12.50.50363 > 10.11.12.13.53: 56921+ A? mcs1-870f.broker.sophos.com. (45)
Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:01.999673 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3012, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:00.429468 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14341, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61199 > 54.247.105.180.443: Flags [S], cksum 0x5232 (correct), seq 977794117, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:10.162735 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14322, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61192 > 54.247.105.180.443: Flags [S], cksum 0x340c (correct), seq 1407056092, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76) 10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48
Dec 10 21:38:17 10.11.12.13 Dec 11 05:38:07 pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52) 10.11.12.50.61077 > 54.247.105.180.443: Flags [S], cksum 0x4730 (correct), seq 2560336237, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
(I'm planning to parse more fields later, once I get these common fields working.) When I copy/paste this same regex into a rex command, it parses all my fields just fine. Based on a hint from another Splunk Answers thread, I suspect the issue is a difference in how the regex is parsed (possibly a difference in whitespace trimming), but the other thread wasn't clear on the solution. Here is the rex line that successfully parses the fields:
sourcetype=pfsens* | rex field=_raw "pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6}) rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): (?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+): \((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)\s+(?P<srcip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<srcport>\d{1,5}) [\>] (?P<dstip>(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?))\.(?P<dstport>\d{1,5})"
This rex statement will show three sourcetypes (pfsense, pfsense_TCP and pfsense_UDP) under normal conditions, but won't show any of my named fields.
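As a sanity check outside Splunk, the pattern can be run against one of the sample lines with Python's `re` module, which accepts the same `(?P<name>...)` group syntax. This is only a sketch: the names `OCTET`, `IPV4`, `COMMON_FIELDS` and `extract_common_fields` are made up for this illustration, and a match here only shows that the regex itself is sound, not how Splunk applies it.

```python
import re

# One IPv4 octet, copied from the pattern in the post.
OCTET = r"(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)"
IPV4 = r"\.".join([OCTET] * 4)

# Same regex as the pfsenseCommonFields transform, split up for readability.
COMMON_FIELDS = re.compile("".join([
    r"pf: (?P<duration>\d{2}:\d{2}:\d{2}\.\d{6})",
    r" rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\):",
    r" (?P<action>\w+) (?P<direction>\w+) on (?P<interface>[A-Za-z0-9]+):",
    r" \((?P<ipheader>[A-Za-z0-9, ]*\[[A-Za-z0-9, ]*\][A-Za-z0-9, ]*",
    r"\([A-Za-z0-9, ]*\)[A-Za-z0-9, ]*)\)",
    r"\s+(?P<srcip>", IPV4, r")\.(?P<srcport>\d{1,5})",
    r" > (?P<dstip>", IPV4, r")\.(?P<dstport>\d{1,5}):",
]))


def extract_common_fields(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = COMMON_FIELDS.search(line)
    return match.groupdict() if match else None


# First sample event from the post.
SAMPLE = (
    "Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:08.012132 "
    "rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, "
    "offset 0, flags [DF], proto UDP (17), length 76) "
    "10.11.12.101.123 > 198.55.111.5.123: NTPv4, length 48"
)

fields = extract_common_fields(SAMPLE)
print(fields)
```

If this prints a dict with every field filled in, the regex is unlikely to be the problem and the cause is more likely in how or where the search is run.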
I know it's not an issue with the host:: or source:: stanzas because I've swapped the labels with the TRANSFORMS line that adjusts the sourcetype based on protocol, and the sourcetype rename per protocol continues to work just fine.
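For readers unfamiliar with the sourcetype rename, here is a rough Python imitation of what the `sourcetype_pfsense_by_proto` transform does with its REGEX and FORMAT settings. It is a sketch of the idea only, not Splunk's implementation; the function name `sourcetype_for` and the fallback value `pfsense` are assumptions (the fallback is inferred from the three sourcetypes listed above).

```python
import re

# Same pattern as REGEX in the sourcetype_pfsense_by_proto stanza.
PROTO = re.compile(r"proto\s(\S+)")


def sourcetype_for(raw_event, default="pfsense"):
    """Mimic FORMAT = sourcetype::pfsense_$1; keep the default if no match."""
    match = PROTO.search(raw_event)
    return f"pfsense_{match.group(1)}" if match else default


# Fragments taken from the sample events in the post.
udp_event = "pf: 00:00:08.012132 rule 3/0(match): block in on re0: (tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76)"
tcp_event = "pf: 00:00:03.050708 rule 89/0(match): pass in on re0: (tos 0x0, ttl 128, id 14314, offset 0, flags [DF], proto TCP (6), length 52)"

print(sourcetype_for(udp_event))
print(sourcetype_for(tcp_event))
print(sourcetype_for("an event without a protocol field"))
```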
More history: I exhausted the 60-day Enterprise trial but did not change any of the Enterprise settings. This instance has been reverted to a Free license, so if it's an embedded/leftover permissions issue I'll need guidance on how to fix it under the hood. My field extractions settings page shows:
Name                                            Type            Extraction/Transform  Owner     App     Sharing  Status
host::10.11.12.13 : REPORT-pfsenseCommonFields  Uses transform  pfsenseCommonFields   No owner  system  Global   Enabled
And my field transformations settings page lists the following:
Name                         Owner     App     Sharing  Status
pfsenseCommonFields          No owner  system  Global   Enabled
sourcetype_pfsense_by_proto  No owner  system  Global   Enabled
In other words, Splunk seems to be recognizing the elements of the props.conf and transforms.conf files just fine, and permissions are Global all around.
Any help would be greatly appreciated.
Rookie mistake. I forgot that Fast Mode turns off field discovery, so extracted fields like these don't show up unless the search itself references them (I also forgot I was in Fast Mode). Flipped to Verbose Mode and they're all there. Sorry for all the hassle. Hopefully my idiocy benefits someone else down the road sometime.
Are you putting these settings on the instance where you search (like a search head) or some other instance? Your regex looks OK, so one theory would be that you're putting your settings in the wrong Splunk instance.
In addition to @Ayn's comment, see this great wiki article to learn more about it: http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings
Have you had a look at the existing app for pfSense, https://apps.splunk.com/app/1527? Maybe it already fits your needs.
I was put off by the Universal Forwarder requirement. I'm pretty sure the whole reason for that is to mitigate the newline character that tcpdump adds in there, but I've already mitigated that manually. And now that I'm 98% down this path I've 1) learned a TON about how Splunk works under the hood and 2) don't want to give up when I'm this close!
I've always used (?<name>) rather than (?P<name>) for field extractions.
Also, unless you really need to validate the data, consider simplifying your regex. If the contents of the ipheader field, for example, ever change, this regex will fail, but [\S\s]+? will continue to work.
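A loosened version along those lines might look like the sketch below, tested with Python's `re` module (which needs the `(?P<name>...)` form). The lazy `[\S\s]+?` for ipheader is the suggestion above; relaxing the duration to `\S+` and the IP addresses to `\d{1,3}(?:\.\d{1,3}){3}` without range validation is an extra assumption of this sketch, as are the names `LOOSE_FIELDS`, `SAMPLE` and `VARIANT`. The `flags [+]` variant is a made-up line used only to show that an unexpected character in the header no longer breaks the match.

```python
import re

# Loosened pattern: ipheader is "anything, lazily", IPs are not range-checked.
LOOSE_FIELDS = re.compile(
    r"pf: (?P<duration>\S+) rule (?P<rulenum>\d+/\d+)\((?P<reason>\w+)\): "
    r"(?P<action>\w+) (?P<direction>\w+) on (?P<interface>\w+): "
    r"\((?P<ipheader>[\S\s]+?)\)\s+"
    r"(?P<srcip>\d{1,3}(?:\.\d{1,3}){3})\.(?P<srcport>\d{1,5}) > "
    r"(?P<dstip>\d{1,3}(?:\.\d{1,3}){3})\.(?P<dstport>\d{1,5}):"
)

# Sample event from the post.
SAMPLE = (
    "Dec 10 21:38:20 10.11.12.13 Dec 11 05:38:10 pf: 00:00:02.742183 "
    "rule 82/0(match): pass in on re0: (tos 0x0, ttl 128, id 31516, "
    "offset 0, flags [none], proto UDP (17), length 73) "
    "10.11.12.50.50363 > 10.11.12.13.53: 56921+ A? mcs1-870f.broker.sophos.com. (45)"
)

# Hypothetical variant whose header contains a character outside [A-Za-z0-9, ].
VARIANT = SAMPLE.replace("flags [none]", "flags [+]")

for line in (SAMPLE, VARIANT):
    match = LOOSE_FIELDS.search(line)
    print(match.group("ipheader"), match.group("srcip"), match.group("dstport"))
```

The lazy quantifier stops at the first closing parenthesis that is followed by whitespace and a source address, so the inner `(17)` does not end the header early.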
Thank you for the suggestions. I tried your (?<name>) suggestion but got the same results. Works fine in the rex command, still no dice from transforms.conf.
The ipheader is as simple as I could get it. I don't need to validate it, in fact I was going to put in another regex parse later 'on ipheader' to split out the various elements. But for now I have to deal with the fact that there are spaces inside of it, and there is always one set of parentheses inside of it as well. So my regex string will always need to locate one set of parentheses before it can start looking for end of match.
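The second pass over ipheader could be fairly simple, since the sample headers are a comma-separated list of "name value" pairs. A minimal sketch in Python, assuming every element follows that shape (the function name `split_ipheader` is made up, and in Splunk this step would be a second extraction rather than Python code):

```python
def split_ipheader(ipheader):
    """Split 'tos 0xc0, ttl 64, ...' into a dict of name -> value strings."""
    fields = {}
    for part in ipheader.split(", "):
        # The first word is the element name; the rest is kept as its value.
        key, _, value = part.partition(" ")
        fields[key] = value
    return fields


# ipheader contents taken from the first sample event in the post.
HEADER = "tos 0xc0, ttl 64, id 3013, offset 0, flags [DF], proto UDP (17), length 76"

for name, value in split_ipheader(HEADER).items():
    print(name, "=", value)
```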