Help in optimising a horrible regex (46K+ steps)

Nikobobinus
Explorer

Hi Splunkers,

I am trying to extract a string that appears twice within a longer string, with various prefixes and suffixes added. Only the very start and end of the full string are static values ('AZ-' and '-VMSS').

Example data:

AZ-203-dev-app-1-build-agents-203-dev-app-1-build-agents0006GA-1720624093-VMSS

AZ-eun-dev-005-pqu-ado-vmss-eun-dev-005-pqu-ado-vmss005X89-1720625975-VMSS

AZ-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000-1720637733-VMSS

 

I have a working rex command to extract the relevant data (temp_hostname4):

| rex field=source_hostname "(?i)^AZ(?<cap1>(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(?<temp_hostname4>([A-Z0-9]+-?)+)-\d{10}-VMSS$"

 

Which correctly extracts:

203-dev-app-1-build-agents0006GA

eun-dev-005-pqu-ado-vmss005X89

DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000
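The extraction above can be checked outside Splunk with a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<name>...)` and is not the PCRE engine Splunk uses, so it is only useful for confirming the output and says nothing about step counts. The helper name `extract` is made up for illustration.

```python
import re

# The rex pattern from the post, with Python-style named groups.
# \1 refers to cap1, the first capturing group.
ORIGINAL = re.compile(
    r"(?i)^AZ(?P<cap1>(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})"
    r"-(?P<temp_hostname4>([A-Z0-9]+-?)+)-\d{10}-VMSS$"
)

# The three example hostnames from the post.
SAMPLES = [
    "AZ-203-dev-app-1-build-agents-203-dev-app-1-build-agents0006GA-1720624093-VMSS",
    "AZ-eun-dev-005-pqu-ado-vmss-eun-dev-005-pqu-ado-vmss005X89-1720625975-VMSS",
    "AZ-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000-1720637733-VMSS",
]


def extract(pattern, hostname):
    """Return the temp_hostname4 group, or None when the pattern does not match."""
    match = pattern.match(hostname)
    return match.group("temp_hostname4") if match else None


for host in SAMPLES:
    print(extract(ORIGINAL, host))
```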

 

But let's face it, this is horrible! According to regex101 this takes 46K+ steps, which can't be nice for Splunk to apply to c.20K records several times per day.

Can anyone suggest optimisations to bring that number down?

 

For added complication (and for clarity to anyone reading this), it's temp_hostname4 because there are multiple other ways the hostname might have been... manipulated before it gets to Splunk, sometimes with the string repeated, sometimes not, resulting in the following SPL. I could use coalesce rather than case, but that's hardly important right now, and separating the regex statements seemed like the saner thing to do in this instance 😉

| rex field=source_hostname "(?i)^AZ(?<cap1>(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(?<temp_hostname4>([A-Z0-9]+-?)+)-\d{10}-VMSS$"
| rex field=source_hostname "(?i)^AZ-(?<temp_hostname3>[^.]+)-\d{10}-VMSS$"
| rex field=source_hostname "(?i)^AZ-(?<temp_hostname2>[^.]+)-\d{10}$"
| rex field=source_hostname "(?i)^(?<temp_hostname1>[^.]+)_\d{10}$"

| eval alias_source_of=case(
!isnull(temp_hostname4), temp_hostname4,
!isnull(temp_hostname3), temp_hostname3,
!isnull(temp_hostname2), temp_hostname2,
!isnull(temp_hostname1), temp_hostname1,
1=1, null()
)
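The fallback logic of the four rex commands plus the case() can be sketched in Python as "first pattern that matches wins". This is only an illustration of the priority order: the function name `alias_source_of` mirrors the SPL field, and the non-repeated hostnames in the usage lines are invented examples, since the post gives no sample data for those formats.

```python
import re

# The four patterns from the SPL above, most specific first,
# rewritten with Python-style named groups.
PATTERNS = [
    ("temp_hostname4", re.compile(
        r"(?i)^AZ(?P<cap1>(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})"
        r"-(?P<temp_hostname4>([A-Z0-9]+-?)+)-\d{10}-VMSS$")),
    ("temp_hostname3", re.compile(r"(?i)^AZ-(?P<temp_hostname3>[^.]+)-\d{10}-VMSS$")),
    ("temp_hostname2", re.compile(r"(?i)^AZ-(?P<temp_hostname2>[^.]+)-\d{10}$")),
    ("temp_hostname1", re.compile(r"(?i)^(?P<temp_hostname1>[^.]+)_\d{10}$")),
]


def alias_source_of(source_hostname):
    """Return the capture from the first matching pattern, or None."""
    for group_name, pattern in PATTERNS:
        match = pattern.match(source_hostname)
        if match:
            return match.group(group_name)
    return None


# Hypothetical hostnames, one per non-repeated format.
print(alias_source_of("AZ-web01-1720624093-VMSS"))
print(alias_source_of("AZ-web01-1720624093"))
print(alias_source_of("web01_1720624093"))
```

Because the patterns are tried in order, a repeated-string hostname is handled by the first pattern even though the second one would also match it.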

Any suggestions for optimising the regex would be greatly appreciated.


ITWhisperer
SplunkTrust

Try this https://regex101.com/r/rlI3Xl/2

| rex field=source_hostname "(?i)^AZ(?<cap1>[A-Z0-9-]+?)(?=\1[A-Z0-9]{6})(?<temp_hostname4>\1[A-Z0-9]{6})-\d{10}-VMSS$"


Nikobobinus
Explorer
@ITWhisperer and @PickleRick, thank you both very much! Technically PickleRick's mimics the precise result better, but takes c.13K steps, while ITWhisperer's answer takes just 332 steps and leaves a leading hyphen (which is easy enough to strip out).

I'm going to accept ITWhisperer's as the solution for its efficiency, but wanted to call out that PickleRick's result, as a pure regex solution, is technically better.
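For anyone wanting to see the leading hyphen and one way around it, here is a small Python sketch. `ACCEPTED` is ITWhisperer's pattern with Python-style named groups. `VARIANT` is a hypothetical tweak, not something posted in this thread, that moves the hyphens outside the capture groups and drops the lookahead; its step count has not been measured here, and Python's `re` is not Splunk's PCRE engine, so treat it as an idea to test on regex101 rather than a drop-in replacement.

```python
import re

# The three example hostnames from the original post.
SAMPLES = [
    "AZ-203-dev-app-1-build-agents-203-dev-app-1-build-agents0006GA-1720624093-VMSS",
    "AZ-eun-dev-005-pqu-ado-vmss-eun-dev-005-pqu-ado-vmss005X89-1720625975-VMSS",
    "AZ-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000-1720637733-VMSS",
]

# Accepted answer: cap1 includes the hyphen after "AZ", so the
# captured hostname starts with a hyphen too.
ACCEPTED = re.compile(
    r"(?i)^AZ(?P<cap1>[A-Z0-9-]+?)(?=\1[A-Z0-9]{6})"
    r"(?P<temp_hostname4>\1[A-Z0-9]{6})-\d{10}-VMSS$"
)

# Hypothetical variant: both separating hyphens are matched literally
# outside the groups, so the capture needs no clean-up.
VARIANT = re.compile(
    r"(?i)^AZ-(?P<cap1>[A-Z0-9-]+?)-"
    r"(?P<temp_hostname4>\1[A-Z0-9]{6})-\d{10}-VMSS$"
)


def extract(pattern, hostname):
    """Return the temp_hostname4 group, or None when the pattern does not match."""
    match = pattern.match(hostname)
    return match.group("temp_hostname4") if match else None


for host in SAMPLES:
    stripped = extract(ACCEPTED, host).lstrip("-")  # remove the leading hyphen
    print(stripped, extract(VARIANT, host))
```

In SPL, the stripping step could be done with an eval after the rex, for example with `ltrim(temp_hostname4, "-")`.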

Thank you both!

PickleRick
SplunkTrust
(?i)^AZ-(?<temp_hostname4>([-A-Z0-9]+))(?:[-A-Z0-9]+?)(?=\1).*-VMSS$

According to regex101, it matches your 3 events in slightly less than 13k steps (which gives about 4k steps per event).

https://regex101.com/r/8h4zwD/1
