Regex in Field Transform not greedy?

Olli1919
Path Finder

Hi Base,

could it be that Regexes in Field Transforms are not greedy?

I am using this field transformation to extract sld.tld from hostnames:

[hostname_query_sub1]
CLEAN_KEYS = 1
MV_ADD = 0
SOURCE_KEY = querystring1
REGEX = ([^\.]+\.[^\.]+\.[^\.]+)
FORMAT = t2d::$1

Which gives these results:
1.2.3.4.in-addr.arpa -> 1.2.3
subdomain.subdomain.sld.tld -> subdomain.subdomain.sld

Using REGEX = ([^\.]+\.[^\.]+\.[^\.]+)$ works as intended:
1.2.3.4.in-addr.arpa -> 4.in-addr.arpa
subdomain.subdomain.sld.tld -> subdomain.sld.tld

Shouldn't the expression match to the end of the string even without "$"?


Ayn
Legend

It's working as you've instructed it to. The expression you've supplied cannot extend to the end of the string, because each [^\.]+ only matches characters that are NOT a period ("."), so the whole pattern covers exactly three dot-separated groups. The regex engine returns the earliest (leftmost) match it finds, which here is the first three groups. That is simply default regex engine operation and is not to be confused with whether the match itself is greedy. Greediness only determines whether a quantifier tries to match as much as possible or as little as possible from a given starting position.

If you want the last three groups before the end of the string, you need to anchor the match at the end of the string, just like you've done in the last regex in your post.
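
To see the difference outside of Splunk, here is a minimal sketch using Python's re module as a stand-in for the PCRE engine that Splunk uses. The two patterns are the ones from the question; the helper name extract is made up for illustration only.

```python
import re

# The two patterns from the question, unchanged.
UNANCHORED = re.compile(r"([^\.]+\.[^\.]+\.[^\.]+)")
ANCHORED = re.compile(r"([^\.]+\.[^\.]+\.[^\.]+)$")

hosts = ["1.2.3.4.in-addr.arpa", "subdomain.subdomain.sld.tld"]


def extract(pattern, host):
    """Return the first capture group of the leftmost match, or None."""
    m = pattern.search(host)
    return m.group(1) if m else None


for host in hosts:
    # Unanchored: the engine stops at the leftmost position where three groups fit.
    # Anchored: earlier positions fail at "$", so the engine moves right until
    # the three groups end exactly at the end of the string.
    print(host, "->", extract(UNANCHORED, host), "|", extract(ANCHORED, host))
```

Both patterns are equally greedy; the only thing that changes is which starting position is the first one that lets the whole expression succeed.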

Ayn
Legend

It's fast, I would write it in the same way 🙂


Olli1919
Path Finder

Thank you for the explanation. So my first character class does not walk past the first dot, thanks. If I may follow up on normal regex behavior: is this regex method (negated character class plus anchor) fast, or could it be optimized further?
