Splunk Search

Splunk Regex Engine Fails?

morethanyell
Builder

We're trying to extract fields that match this pattern, [ FIELD_NAME = S0m3 Valu3 w\ reaLLy $pec!aL ch*rac+3rs ], and write them to tsidx so that they're consumable by tstats. We're using the transforms/props pairing below:

# transforms.conf
[hello_transforms]
REGEX = (?<key>[\w]+)\s\=\s(?<value>[^\]]+)
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

#props.conf
[hello]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRANSFORMS-capturer = hello_transforms

While this does what's expected for most of the fields (i.e. the fields are written to disk, verified through walklex), some values are not captured in full. For example:
[ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]
Splunk only captured "A" (screenshot not shown).

The regex itself is valid (screenshot not shown).

Do you think this is the fault of Splunk's regex engine, or do I have something wrong in my configs?

Thanks in advance.

0 Karma

to4kawa
Ultra Champion

Sample:

| makeresults 
| eval _raw="Feb 7 11:25:20 SYD-UTIL-02 ADAuditPlus [ Category = LogonReports ] [ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]"
| rex max_match=0 "\[\s*(?<key>\S+)\s\=\s(?<value>.*?)\]"

transforms.conf

 REGEX = \[\s*(\S+)\s\=\s(.*?)\]

The closing ] is needed in the pattern.

If you use FORMAT in transforms.conf, named capture groups are not needed.

Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

Not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

cf. Configure index-time field extraction

0 Karma

morethanyell
Builder

Same result

0 Karma

to4kawa
Ultra Champion

@morethanyell
Did you restart/refresh Splunk?
At least for [ REMARKS = A Kerberos authentication ticket (TGT) was requested. ], I do not get the same result.

0 Karma

morethanyell
Builder

I edited transforms.conf with your regex, stopped Splunk, deleted the index using "clean eventdata" (don't worry, it's a dev machine), restarted Splunk, and then re-indexed the file using one-shot. It still fails to capture the entire value. It stops at whitespace.

0 Karma

morethanyell
Builder

My old regex also works with | rex, but it does not work in transforms.conf.

0 Karma

to4kawa
Ultra Champion

@morethanyell
We both made a mistake. My answer has been updated.
I'm sorry.

0 Karma

morethanyell
Builder

Same issue, mate. I've used your transforms stanza and it still fails to capture the entire value; it halts at whitespace.


[aap_fields_discov]
REGEX = \[\s*(\S+)\s\=\s(.*?)\s\]
REPEAT_MATCH = true
WRITE_META = true

0 Karma

to4kawa
Ultra Champion

(T_T)

SEDCMD-whitespace = s/\s/ /g

Why does the REGEX halt at whitespace?
I don't understand.

0 Karma

morethanyell
Builder

On paper, it should capture this:
[ FIELDNAME = The quick brown fox jumps over the lazy dog. ]
If you try it with | rex or on regex101.com, it does work. But when implemented in transforms.conf, it only captures "The", so the field will be "FIELDNAME = The" instead of the entire "FIELDNAME = The quick brown fox jumps over the lazy dog."
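
For reference, the same check can be reproduced outside Splunk. The following is a minimal sketch in Python, assuming the pattern from the [aap_fields_discov] stanza above with the unnecessary escape on = dropped. The sample event string is assembled from the examples in this thread. Python's re module is not the engine Splunk uses, so this only confirms that the pattern itself matches the full value; it says nothing about how index-time extraction behaves.

```python
import re

# Pattern from the transforms.conf stanza in this thread (unnamed groups).
PATTERN = re.compile(r"\[\s*(\S+)\s=\s(.*?)\s\]")

# Sample event assembled from the examples in this thread (an assumption).
event = (
    "[ Category = LogonReports ] "
    "[ FIELDNAME = The quick brown fox jumps over the lazy dog. ]"
)

# findall returns one (key, value) tuple per bracketed pair.
pairs = PATTERN.findall(event)

for key, value in pairs:
    print(f"{key} = {value}")
```

Both pairs come back with their complete values, including the embedded spaces, which matches what | rex and regex101.com show.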

It's no longer useful to show evidence that the regex works via | rex or regex101.com because, as I've said before, it does work there. It does not work when used in transforms.conf for index-time field extraction.

Out of frustration, I've changed the strategy for capturing the fields: I now enclose the values in double quotes (e.g. [ FIELDNAME = s0m3 vaLu3 ] becomes [ FIELDNAME ="s0m3 vaLu3" ]) using SEDCMD in props.conf instead of relying on transforms.conf.
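
To illustrate the rewrite described above, here is a rough sketch in Python of the before/after transformation. It is not the actual SEDCMD used (that setting was not posted); the function name quote_values and the pattern are assumptions chosen to reproduce the example in this post.

```python
import re

# Matches "[ KEY = some value ]" pairs; same shape as the pattern used in this thread.
BRACKETED = re.compile(r"\[\s*(\S+)\s=\s(.*?)\s\]")


def quote_values(raw: str) -> str:
    """Wrap each bracketed value in double quotes (hypothetical helper)."""
    # A function replacement avoids backslash-escape handling in the value.
    return BRACKETED.sub(lambda m: f'[ {m.group(1)} ="{m.group(2)}" ]', raw)


print(quote_values("[ FIELDNAME = s0m3 vaLu3 ]"))
```

Note that this sketch does not escape double quotes that already appear inside a value, so such values would need extra handling.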

Thanks for the help.

0 Karma