
Why does a regex extract fields differently in props.conf than the same regex used in the SPL rex command?

torowa
Path Finder

Hi Splunkers.

I'm trying to extract fields from Windows DNS debug logs but running into extraction issues for some events.

For most events the fields extract OK.
For some events, however, the regex returns more than it should in the field,
i.e. the field value plus the remaining text of the raw event.

For most events it extracts the domain correctly, for example as (3)web(4)site(5)again(3)net(0). When it fails, it extracts the questionname field as (3)web(4)site(5)again(3)net(0) plus all the remaining text to the end of the event.

The regex in use is straight out of props.conf in the Splunk TA for Windows:

] (?<questiontype>\w+)\s+(?<questionname>.*)

Sample data:

-------
28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd 10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)
UDP response info at 1234523DDF690A11
Socket = 736
Remote addr 10.20.222.111, port 62754
Time Query=20130697, Queued=0, Expire=0
Buf length = 0x0200 (512)
Msg length = 0x0054 (84)
Message:
XID 0x54c5
Flags 0x8180
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 2
NSCOUNT 0
ARCOUNT 0
QUESTION SECTION:
[snipped for brevity]

--------

If I use the regex from props.conf above in a rex command in SPL, the field is extracted correctly.
The same regex also works fine in regex101 and similar tools, using the event that causes the issue as test data.
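The rex/regex101 result can be reproduced outside Splunk with a short Python sketch. It assumes Python's `re` module behaves like PCRE for the constructs used here (`.`, `\w`, `\s`) with default flags, and translates PCRE's `(?<name>...)` named groups into Python's `(?P<name>...)` syntax. The names `TA_REGEX` and `EVENT` are illustrative only.

```python
import re

# The TA's props.conf regex in Python syntax: PCRE's (?<name>...) becomes
# (?P<name>...). The leading "]" is escaped; it matches a literal "]" either way.
TA_REGEX = re.compile(r"\] (?P<questiontype>\w+)\s+(?P<questionname>.*)")

# First lines of the sample event, with the line breaks kept as "\n".
EVENT = (
    "28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd "
    "10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)\n"
    "UDP response info at 1234523DDF690A11\n"
    "Socket = 736\n"
)

match = TA_REGEX.search(EVENT)
# Without the DOTALL flag, "." does not match "\n", so ".*" stops at the end
# of the first line, which is what rex and regex101 show.
print(match.group("questiontype"))  # A
print(match.group("questionname"))  # (3)web(4)site(5)again(3)net(0)
```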

Can anyone explain why the regex works differently when used in props.conf than in direct SPL, and where I should be looking?
As mentioned above, the issue only occurs for some events. Note that the DNS events are a mix of single-line and multi-line events, and only some of the multi-line ones have the issue.


Thanks in advance.


richgalloway
SplunkTrust

I believe it's a matter of whether or not a newline is present. At search time there will be a newline (there won't be one for single-line events, but in that case it doesn't matter). At index time, newlines are stripped before regex processing, so the behavior is different.
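To illustrate the point, this Python sketch runs the TA regex (translated to Python's `(?P<name>...)` syntax) against the sample with and without line breaks. It doesn't model how Splunk removes the newlines; replacing them with spaces is an assumption made purely for illustration, and `LINES` is a hypothetical name.

```python
import re

# The TA's regex in Python syntax ((?P<name>...) instead of PCRE's (?<name>...)).
TA_REGEX = re.compile(r"\] (?P<questiontype>\w+)\s+(?P<questionname>.*)")

LINES = [
    "28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd "
    "10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)",
    "UDP response info at 1234523DDF690A11",
    "Socket = 736",
]

with_newlines = "\n".join(LINES)
# Illustrative assumption only: the line breaks are gone by the time the regex
# runs, simulated here by joining the lines with spaces.
without_newlines = " ".join(LINES)

for label, text in (("with newlines", with_newlines), ("without newlines", without_newlines)):
    # ".*" can only stop at a "\n"; with none left it runs to the end of the text.
    print(f"{label}: {TA_REGEX.search(text).group('questionname')!r}")
```

With the line breaks gone, `.*` has nothing to stop at, so `questionname` picks up the rest of the event, which matches the symptom described in the question.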

I think the problem can be avoided with a slightly different regex:

] (?<questiontype>\w+)\s+(?<questionname>\S*)
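A quick Python check of the suggested regex against both variants of the sample (again translated to `(?P<name>...)` syntax, with spaces standing in for removed line breaks as an illustrative assumption). It relies on the domain name never containing whitespace, which holds for the `(n)label` format in the sample; `FIXED_REGEX` and `LINES` are hypothetical names.

```python
import re

# The suggested regex in Python syntax. "\S*" stops at the first whitespace
# character (space, tab, "\r" or "\n"), so it no longer depends on "\n".
FIXED_REGEX = re.compile(r"\] (?P<questiontype>\w+)\s+(?P<questionname>\S*)")

LINES = [
    "28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd "
    "10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)",
    "UDP response info at 1234523DDF690A11",
    "Socket = 736",
]

# "\n" keeps the line breaks; " " is the illustrative stand-in for removed newlines.
results = [
    FIXED_REGEX.search(sep.join(LINES)).group("questionname")
    for sep in ("\n", " ")
]
print(results)
```

Both variants yield `(3)web(4)site(5)again(3)net(0)`. Note that `\S*` only stops if some whitespace remains where the line break was; if the lines were joined with nothing in between, it would also pick up the first word of the next line.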