
Why does a regex extract fields differently in props.conf than the same regex used in the SPL rex command?

torowa
Path Finder

Hi Splunkers.

I'm trying to extract fields from Windows DNS debug logs, but I'm running into extraction issues for some events.

For most events the fields extract fine.
For some events, though, the regex returns more than it should in the field, i.e. it returns the field value plus the remaining text in the raw event.

For most events the domain is extracted correctly, for example (3)web(4)site(5)again(3)net(0). When it fails, the questionname field is extracted as (3)web(4)site(5)again(3)net(0) plus the remaining text to the end of the event.

The regex in use is taken straight from props.conf in the Splunk TA for Windows:

] (?<questiontype>\w+)\s+(?<questionname>.*)

Sample data:

-------
28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd 10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)
UDP response info at 1234523DDF690A11
Socket = 736
Remote addr 10.20.222.111, port 62754
Time Query=20130697, Queued=0, Expire=0
Buf length = 0x0200 (512)
Msg length = 0x0054 (84)
Message:
XID 0x54c5
Flags 0x8180
QR 1 (RESPONSE)
OPCODE 0 (QUERY)
AA 0
TC 0
RD 1
RA 1
Z 0
CD 0
AD 0
RCODE 0 (NOERROR)
QCOUNT 1
ACOUNT 2
NSCOUNT 0
ARCOUNT 0
QUESTION SECTION:
[snipped for brevity]

--------

If I use the regex from props.conf above in a rex command via SPL, the field is extracted correctly.
The same regex also works fine in regex101 etc. (using the same event that causes the issue as test data).
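
For anyone who wants to reproduce the symptom outside Splunk, here is a minimal Python sketch. It only illustrates that one pattern can give both results depending on whether the engine lets "." match a line break; it does not claim to show what Splunk does internally. The group syntax is changed to Python's (?P<name>...), the event is a shortened copy of the sample above, and the function name extract_questionname is made up for illustration.

```python
import re

# Pattern from the post, with Python's (?P<name>...) named-group syntax.
PATTERN = r"] (?P<questiontype>\w+)\s+(?P<questionname>.*)"

# Shortened copy of the sample event, line breaks intact.
EVENT = (
    "28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd 10.20.222.111 "
    "54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)\n"
    "UDP response info at 1234523DDF690A11\n"
    "Socket = 736"
)


def extract_questionname(event, flags=0):
    """Return the questionname capture, or None if the pattern does not match."""
    match = re.search(PATTERN, event, flags)
    return match.group("questionname") if match else None


# Default behaviour: "." does not match "\n", so ".*" stops at the end of line 1.
print(repr(extract_questionname(EVENT)))

# With DOTALL, "." also matches "\n", so ".*" runs to the end of the event.
print(repr(extract_questionname(EVENT, re.DOTALL)))
```

The first call returns only the domain; the second returns the domain plus everything after it, which matches the failure described above.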

Can anyone explain why the regex works differently when used in props.conf than in direct SPL, and where I should be looking?
As mentioned above, the issue only occurs for some events. Note that DNS events are both single-line and multi-line, and only some of the multi-line events have the issue.

Thanks in advance.


richgalloway
SplunkTrust

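I believe it's a matter of whether a newline is present or not. At search time, a multi-line event still contains its newlines. A single-line event has none, but that doesn't matter in this case. At index time, newlines are stripped before regex processing, so the behavior is different. Since . does not match a newline by default, .* stops at the end of the first line when the newline is there and runs to the end of the event when it is not.

I think the problem can be avoided with a slightly different regex, one that stops at the first whitespace character: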

] (?<questiontype>\w+)\s+(?<questionname>\S*)
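
A rough way to check this outside Splunk is the Python sketch below. It compares the original and the suggested pattern on a shortened copy of the sample event, once with line breaks and once with the lines joined by spaces. The joined version is only an assumed stand-in for an event whose newlines are gone; how Splunk actually presents the text to the regex is not verified here. Group syntax is changed to Python's (?P<name>...), and the names ORIGINAL, SUGGESTED and question_fields are made up for illustration.

```python
import re

# Both patterns from the thread, with Python's (?P<name>...) named-group syntax.
ORIGINAL = re.compile(r"] (?P<questiontype>\w+)\s+(?P<questionname>.*)")
SUGGESTED = re.compile(r"] (?P<questiontype>\w+)\s+(?P<questionname>\S*)")

LINES = [
    "28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd 10.20.222.111 "
    "54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0)",
    "UDP response info at 1234523DDF690A11",
    "Socket = 736",
]

with_newlines = "\n".join(LINES)
# Assumption: spaces stand in for the removed line breaks.
flattened = " ".join(LINES)


def question_fields(pattern, event):
    """Return (questiontype, questionname), or None if there is no match."""
    match = pattern.search(event)
    if match is None:
        return None
    return match.group("questiontype"), match.group("questionname")


for label, event in (("with newlines", with_newlines), ("flattened", flattened)):
    print(label, "original :", question_fields(ORIGINAL, event))
    print(label, "suggested:", question_fields(SUGGESTED, event))
```

The suggested pattern returns the same type and name for both inputs, while the original one swallows the rest of the flattened event. The trade-off is that \S* would cut off a question name containing whitespace, which the sample data suggests does not happen in this log format.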
---
If this reply helps you, Karma would be appreciated.