I'm trying to extract fields from Windows DNS debug logs but running into extraction issues for some events.
For most events the fields extract OK. For some events, though, the regex returns more than it should: the field value plus the remaining text in the raw event.
For example, in most events the domain is extracted correctly as (3)web(4)site(5)again(3)net(0), but when it fails, the questionname field contains (3)web(4)site(5)again(3)net(0) plus everything that follows to the end of the event.
Regex in use is straight out of the Splunk TA for Windows from props.conf:
------- 28/10/2022 12:29:22 PM 07AC PACKET 1234523DDF690A11 UDP Snd 10.20.222.111 54c5 R Q [8081 DR NOERROR] A (3)web(4)site(5)again(3)net(0) UDP response info at 1234523DDF690A11 Socket = 736 Remote addr 10.20.222.111, port 62754 Time Query=20130697, Queued=0, Expire=0 Buf length = 0x0200 (512) Msg length = 0x0054 (84) Message: XID 0x54c5 Flags 0x8180 QR 1 (RESPONSE) OPCODE 0 (QUERY) AA 0 TC 0 RD 1 RA 1 Z 0 CD 0 AD 0 RCODE 0 (NOERROR) QCOUNT 1 ACOUNT 2 NSCOUNT 0 ARCOUNT 0 QUESTION SECTION: [snipped for brevity]
If I use the regex from the props.conf above in a rex command via SPL, the field is extracted correctly. The same regex also works fine in regex101 etc. (using the same event that causes the issue as test data).
Can anyone explain why the regex works differently when used in props.conf than in direct SPL, and where I should be looking? As mentioned above, the issue only occurs for some events. Note that DNS events are both single-line and multi-line, and only some of the multi-line events have the issue.
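I believe it's a matter of whether a newline is present or not. At search time, there will be a newline after the question name in a multi-line event. There will not be one for single-line events, but it doesn't matter in that case because nothing follows the name. At index time, newlines are stripped before regex processing, so the behavior is different: a pattern that relies on the newline to end the match would keep going to the end of the event.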
I think the problem can be avoided with a slightly different regex.
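To illustrate the idea, here is a rough Python sketch rather than anything Splunk-specific. Both patterns are hypothetical (neither is the regex shipped in the TA), the sample is a shortened version of the event above, and the position of the newline in it is my assumption. The first pattern lets the newline end the match; the second only accepts the (length)label sequence, so it stops at the end of the name whether or not a newline follows.

```python
import re

# Shortened sample based on the event in the question.
# ASSUMPTION: the original event has a line break right after the question name.
multi_line = (
    "[8081   DR  NOERROR] A      (3)web(4)site(5)again(3)net(0)\n"
    "UDP response info at 1234523DDF690A11"
)
# Same event with the newline replaced by a space, as if it had been stripped.
flattened = multi_line.replace("\n", " ")

# HYPOTHETICAL pattern that depends on the newline: "." does not match "\n"
# by default, so ".*" stops at the line break if there is one.
newline_dependent = re.compile(r"\]\s+\w+\s+(?P<questionname>.*)")

# HYPOTHETICAL tighter pattern: one or more "(<digits>)<label>" groups,
# which cannot run past the end of the name.
label_only = re.compile(r"\]\s+\w+\s+(?P<questionname>(?:\(\d+\)[\w-]*)+)")


def extract(pattern, event):
    """Return the questionname capture, or None if the pattern does not match."""
    match = pattern.search(event)
    return match.group("questionname") if match else None


if __name__ == "__main__":
    for name, event in (("multi_line", multi_line), ("flattened", flattened)):
        print(name, "newline_dependent:", repr(extract(newline_dependent, event)))
        print(name, "label_only:       ", repr(extract(label_only, event)))
```

With the newline in place both patterns return just the name; with it removed, only the label-only pattern does. If that matches what you see in your data, the same character-class approach could be tried in a local props.conf override of the extraction, after testing it against both single-line and multi-line events.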