Regex to remove optional trailing text from field with transforms/props

bowesmana
SplunkTrust

I have a field called Title, which may sometimes end with the text

 Ends 9 P.M.

or variants of it that differ in case.

I can easily do this in my search

| rex mode=sed field=Title "s/(?i) Ends 9.?p.?m.?//"

which performs the job nicely, but I want to be able to do this as standard, so I tried setting up a transform and field extraction with the following regex

(.*)((?i) ends [0-9]*.?[ap].?m)?

but the optional ? at the end of the 'ends...' group means that the first (.*) will capture all text, including the 'ends...' section, so the result is no change.

If I get rid of the last ? then it works for fields that have the 'ends...' text, but fields without it no longer match at all, so they lose their value.
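
The two behaviours described above can be reproduced outside Splunk. The following is a minimal Python sketch, not Splunk's own PCRE engine; the sample titles and the `group1` helper are made up for illustration. Python only accepts a global inline flag at the start of a pattern, so `re.IGNORECASE` stands in for the inline `(?i)` here.

```python
import re

# Hypothetical sample values for the Title field.
with_suffix = "Weekend Unreserved Ends 9 P.M."
without_suffix = "Weekend Unreserved"

# Suffix group optional: the greedy (.*) consumes the whole string and
# the optional group is simply skipped.
optional = re.compile(r"(.*)( ends [0-9]*.?[ap].?m)?", re.IGNORECASE)

# Suffix group mandatory: (.*) has to backtrack to let the suffix match,
# but a title without the suffix cannot match at all.
mandatory = re.compile(r"(.*)( ends [0-9]*.?[ap].?m)", re.IGNORECASE)


def group1(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


for pattern in (optional, mandatory):
    for title in (with_suffix, without_suffix):
        print(repr(group1(pattern, title)))
```

With the optional group, both titles come back unchanged; with the mandatory group, the suffix is stripped from the first title and the second yields no match.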

Any help on the right regex, or on a way to set up a 'sed' style regex in conf?

DalJeanis
Legend

Try this -

(?i)(^.*(?=\s*ends\s+\d+\s?[ap]\.?m\.?.*)|^.*)

This is a case-insensitive flag (?i) followed by a single capture group that has two alternatives. The first alternative matches anything, followed by a positive lookahead (?=...) for a value like " ends 9 pm". You'll notice I've allowed for 2-digit hours, optional periods, etc. If that alternative fails, the second one takes everything. Both alternatives are anchored at the beginning of the string; the first ends where the lookahead starts to match, and the second takes the entire string.
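
As a quick check of that explanation, here is a minimal Python sketch, assuming Python's `re` treats these constructs the same way as Splunk's PCRE. The sample titles and function names are made up. One detail it surfaces: because the `\s*` sits inside the lookahead and the `.*` is greedy, the space before "Ends" stays in the capture. The second pattern is a hypothetical variant, not from this thread, that uses a lazy `.*?` so the whitespace is left out.

```python
import re

# The regex from this post, unchanged.
accepted = re.compile(r"(?i)(^.*(?=\s*ends\s+\d+\s?[ap]\.?m\.?.*)|^.*)")

# Hypothetical variant (an assumption, not from the thread): a lazy .*?
# stops before the whitespace that precedes "ends", or runs to the end
# of the string when no suffix is present.
trimmed = re.compile(r"(?i)^(.*?)(?=\s*ends\s+\d+\s?[ap]\.?m\.?\s*$|$)")


def strip_suffix(title):
    """Group 1 of the accepted regex."""
    return accepted.search(title).group(1)


def strip_suffix_trimmed(title):
    """Group 1 of the lazy variant."""
    return trimmed.search(title).group(1)


for title in ("Weekend Unreserved Ends 9 P.M.",
              "Late Show ends 10pm",
              "Weekend Unreserved"):
    print(repr(strip_suffix(title)), repr(strip_suffix_trimmed(title)))
```

Either way there is exactly one capture group, which is what lets a single `$1` reference work whether or not the suffix is present.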

bowesmana
SplunkTrust

Ah, that's the trick with the positive lookahead... That single capture group is the key, which means I can use Title::$1 in the transforms.conf and it works.

Out of interest, would lookaround work to remove prefixes to strings? I played around with a few attempts, but I don't see that it would.

Thanks

DalJeanis
Legend

Updated. Lookaround is definitely not the right tool for that.

You just need to take group 2 from this one:

^(drop this prefix )?(.*)
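
A minimal Python sketch of why this works for prefixes, with a made-up sample title and helper name: the optional group comes before the greedy `(.*)`, so it gets the first chance to consume the prefix, and group 2 is always the part to keep.

```python
import re

# "drop this prefix " is the placeholder prefix from the regex above.
prefix_re = re.compile(r"^(drop this prefix )?(.*)")


def strip_prefix(title):
    """Return group 2: the title with the optional prefix removed."""
    return prefix_re.match(title).group(2)


print(repr(strip_prefix("drop this prefix Weekend Unreserved")))
print(repr(strip_prefix("Weekend Unreserved")))
```

Both calls return the bare title, so the same group number can be used whether or not the prefix is there.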

horsefez
Motivator

Hi bowesmana,

try out this regex and see if it will do the trick.

https://regex101.com/r/3MSGhl/2

(.+)(?:((?i)\sends\s[1]?[0-9]\s[ap]\.m\.))$

bowesmana
SplunkTrust

That has the same problem as my original, i.e. it does not capture anything in the capture group unless the text does have the trailing "ends..." phrase. For example, the phrase

Weekend Unreserved

gets an empty capture group 1, because the regex requires the "ends..." text to be present for a match.

bowesmana
SplunkTrust

I could make two alternatives within the regex, but then I am not sure how to assign the correct numbered capture group to the Title.

horsefez
Motivator

How about this one?

https://regex101.com/r/3MSGhl/3

bowesmana
SplunkTrust

That still has the same issue. How does it work with transforms.conf, where the assignment is done with

Title::$X

where X is the capture group number? Unless it's always the same number, how do you assign more than one capture group to the same field?
